From patchwork Fri Oct 14 07:54:40 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Jiang, Haochen"
X-Patchwork-Id: 1689920
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 1/6] Support Intel AVX-IFMA
Date: Fri, 14 Oct 2022 15:54:40 +0800
Message-Id: <20221014075445.7938-2-haochen.jiang@intel.com>
X-Mailer: git-send-email 2.18.1
In-Reply-To: <20221014075445.7938-1-haochen.jiang@intel.com>
References: <20221014075445.7938-1-haochen.jiang@intel.com>
From: "Jiang, Haochen"
Reply-To: Haochen Jiang
Cc: hongtao.liu@intel.com, Hongyu Wang

From: Hongyu Wang

gcc/

	* common/config/i386/i386-common.cc
	(OPTION_MASK_ISA2_AVXIFMA_SET, OPTION_MASK_ISA2_AVXIFMA_UNSET):
	New macros.
	(OPTION_MASK_ISA2_AVX2_UNSET): Add OPTION_MASK_ISA2_AVXIFMA_UNSET.
	(ix86_handle_option): Handle -mavxifma.
	* common/config/i386/i386-cpuinfo.h (processor_features):
	Add FEATURE_AVXIFMA.
	* common/config/i386/i386-isas.h: Add ISA_NAMES_TABLE_ENTRY
	for avxifma.
	* common/config/i386/cpuinfo.h (get_available_features):
	Detect avxifma.
	* config.gcc: Add avxifmaintrin.h.
	* config/i386/avxifmaintrin.h: New file.
	* config/i386/cpuid.h (bit_AVXIFMA): New.
	* config/i386/i386-builtin.def: Add new builtins.
	* config/i386/i386-c.cc (ix86_target_macros_internal): Define
	__AVXIFMA__.
	* config/i386/i386-options.cc (isa2_opts): Add -mavxifma.
	(ix86_valid_target_attribute_inner_p): Handle avxifma.
	* config/i386/i386.h (TARGET_AVXIFMA, TARGET_AVXIFMA_P, PTA_AVXIFMA):
	New.
	* config/i386/i386.opt: Add option -mavxifma.
	* config/i386/immintrin.h: Include avxifmaintrin.h.
	* config/i386/sse.md (vpamdd52): Remove.
	(avx_vpmadd52_, vpamdd52, vpamdd52_maskz_1): New define_insn.
	* doc/invoke.texi: Document -mavxifma.
	* doc/extend.texi: Document avxifma.
	* doc/sourcebuild.texi: Document target avxifma.

gcc/testsuite/

	* gcc.target/i386/avx512ifma-vpmaddhuq-1.c: Rename...
	* gcc.target/i386/avx512ifma-vpmaddhuq-1a.c: ...to this.
	* gcc.target/i386/avx512ifma-vpmaddluq-1.c: Ditto.
	* gcc.target/i386/avx512ifma-vpmaddluq-1a.c: Ditto.
	* gcc.target/i386/avx512vl-vpmaddhuq-2.c: Ditto.
	* gcc.target/i386/avx512vl-vpmaddhuq-2a.c: Ditto.
	* gcc.target/i386/avx512vl-vpmaddluq-2.c: Ditto.
	* gcc.target/i386/avx512vl-vpmaddluq-2a.c: Ditto.
	* gcc.target/i386/avx-check.h: Add avxifma check.
	* gcc.target/i386/avx512ifma-vpmaddhuq-1b.c: New test.
	* gcc.target/i386/avx512ifma-vpmaddluq-1b.c: Ditto.
	* gcc.target/i386/avx512vl-vpmaddhuq-2b.c: Ditto.
	* gcc.target/i386/avx512vl-vpmaddluq-2b.c: Ditto.
	* gcc.target/i386/avx-ifma-1.c: Ditto.
	* gcc.target/i386/avx-ifma-vpmaddhuq-2.c: Ditto.
	* gcc.target/i386/avx-ifma-vpmaddluq-2.c: Ditto.
	* gcc.target/i386/sse-12.c: Add -mavxifma.
	* gcc.target/i386/sse-13.c: Ditto.
	* gcc.target/i386/sse-14.c: Ditto.
	* gcc.target/i386/sse-22.c: Ditto.
	* gcc.target/i386/sse-23.c: Ditto.
	* g++.dg/other/i386-2.C: Ditto.
	* g++.dg/other/i386-3.C: Ditto.
	* gcc.target/i386/builtin_target.c: Detect avxifma.
	* gcc.target/i386/funcspec-56.inc: Add new target attribute.
	* lib/target-supports.exp (check_effective_target_avxifma): New.
---
 gcc/common/config/i386/cpuinfo.h | 2 +
 gcc/common/config/i386/i386-common.cc | 20 ++++-
 gcc/common/config/i386/i386-cpuinfo.h | 1 +
 gcc/common/config/i386/i386-isas.h | 1 +
 gcc/config.gcc | 3 +-
 gcc/config/i386/avxifmaintrin.h | 78 +++++++++++++++++++
 gcc/config/i386/cpuid.h | 1 +
 gcc/config/i386/i386-builtin.def | 6 ++
 gcc/config/i386/i386-c.cc | 2 +
 gcc/config/i386/i386-isa.def | 1 +
 gcc/config/i386/i386-options.cc | 4 +-
 gcc/config/i386/i386.opt | 5 ++
 gcc/config/i386/immintrin.h | 2 +
 gcc/config/i386/sse.md | 42 +++++++++-
 gcc/doc/extend.texi | 5 ++
 gcc/doc/invoke.texi | 9 ++-
 gcc/doc/sourcebuild.texi | 3 +
 gcc/testsuite/g++.dg/other/i386-2.C | 2 +-
 gcc/testsuite/g++.dg/other/i386-3.C | 2 +-
 gcc/testsuite/gcc.target/i386/avx-check.h | 6 +-
 gcc/testsuite/gcc.target/i386/avx-ifma-1.c | 20 +++++
 .../gcc.target/i386/avx-ifma-vpmaddhuq-2.c | 72 +++++++++++++++++
 .../gcc.target/i386/avx-ifma-vpmaddluq-2.c | 61 +++++++++++++++
 ...pmaddhuq-1.c => avx512ifma-vpmaddhuq-1a.c} | 0
 .../gcc.target/i386/avx512ifma-vpmaddhuq-1b.c | 33 ++++++++
 ...pmaddluq-1.c => avx512ifma-vpmaddluq-1a.c} | 0
 .../gcc.target/i386/avx512ifma-vpmaddluq-1b.c | 33 ++++++++
 gcc/testsuite/gcc.target/i386/funcspec-56.inc | 2 +
 gcc/testsuite/gcc.target/i386/sse-12.c | 2 +-
 gcc/testsuite/gcc.target/i386/sse-13.c | 2 +-
 gcc/testsuite/gcc.target/i386/sse-14.c | 2 +-
 gcc/testsuite/gcc.target/i386/sse-22.c | 4 +-
 gcc/testsuite/gcc.target/i386/sse-23.c | 2 +-
 gcc/testsuite/lib/target-supports.exp | 12 +++
 34 files changed, 423 insertions(+), 17 deletions(-)
 create mode 100644 gcc/config/i386/avxifmaintrin.h
 create mode 100644
gcc/testsuite/gcc.target/i386/avx-ifma-1.c create mode 100644 gcc/testsuite/gcc.target/i386/avx-ifma-vpmaddhuq-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx-ifma-vpmaddluq-2.c rename gcc/testsuite/gcc.target/i386/{avx512ifma-vpmaddhuq-1.c => avx512ifma-vpmaddhuq-1a.c} (100%) create mode 100644 gcc/testsuite/gcc.target/i386/avx512ifma-vpmaddhuq-1b.c rename gcc/testsuite/gcc.target/i386/{avx512ifma-vpmaddluq-1.c => avx512ifma-vpmaddluq-1a.c} (100%) create mode 100644 gcc/testsuite/gcc.target/i386/avx512ifma-vpmaddluq-1b.c diff --git a/gcc/common/config/i386/cpuinfo.h b/gcc/common/config/i386/cpuinfo.h index b5c1b21e554..9bb21c6cacc 100644 --- a/gcc/common/config/i386/cpuinfo.h +++ b/gcc/common/config/i386/cpuinfo.h @@ -793,6 +793,8 @@ get_available_features (struct __processor_model *cpu_model, { if (eax & bit_AVXVNNI) set_feature (FEATURE_AVXVNNI); + if (eax & bit_AVXIFMA) + set_feature (FEATURE_AVXIFMA); } if (avx512_usable) { diff --git a/gcc/common/config/i386/i386-common.cc b/gcc/common/config/i386/i386-common.cc index d6a68dc9b1d..4de7906b247 100644 --- a/gcc/common/config/i386/i386-common.cc +++ b/gcc/common/config/i386/i386-common.cc @@ -76,6 +76,7 @@ along with GCC; see the file COPYING3. If not see (OPTION_MASK_ISA_AVX512VL | OPTION_MASK_ISA_AVX512F_SET) #define OPTION_MASK_ISA_AVX512IFMA_SET \ (OPTION_MASK_ISA_AVX512IFMA | OPTION_MASK_ISA_AVX512F_SET) +#define OPTION_MASK_ISA2_AVXIFMA_SET OPTION_MASK_ISA2_AVXIFMA #define OPTION_MASK_ISA_AVX512VBMI_SET \ (OPTION_MASK_ISA_AVX512VBMI | OPTION_MASK_ISA_AVX512BW_SET) #define OPTION_MASK_ISA2_AVX5124FMAPS_SET OPTION_MASK_ISA2_AVX5124FMAPS @@ -212,7 +213,8 @@ along with GCC; see the file COPYING3. 
If not see #define OPTION_MASK_ISA_AVX2_UNSET \ (OPTION_MASK_ISA_AVX2 | OPTION_MASK_ISA_AVX512F_UNSET) #define OPTION_MASK_ISA2_AVX2_UNSET \ - (OPTION_MASK_ISA2_AVXVNNI_UNSET | OPTION_MASK_ISA2_AVX512F_UNSET) + (OPTION_MASK_ISA2_AVXIFMA_UNSET | OPTION_MASK_ISA2_AVXVNNI_UNSET \ + | OPTION_MASK_ISA2_AVX512F_UNSET) #define OPTION_MASK_ISA_AVX512F_UNSET \ (OPTION_MASK_ISA_AVX512F | OPTION_MASK_ISA_AVX512CD_UNSET \ | OPTION_MASK_ISA_AVX512PF_UNSET | OPTION_MASK_ISA_AVX512ER_UNSET \ @@ -230,6 +232,7 @@ along with GCC; see the file COPYING3. If not see (OPTION_MASK_ISA_AVX512BW | OPTION_MASK_ISA_AVX512VBMI_UNSET) #define OPTION_MASK_ISA_AVX512VL_UNSET OPTION_MASK_ISA_AVX512VL #define OPTION_MASK_ISA_AVX512IFMA_UNSET OPTION_MASK_ISA_AVX512IFMA +#define OPTION_MASK_ISA2_AVXIFMA_UNSET OPTION_MASK_ISA2_AVXIFMA #define OPTION_MASK_ISA_AVX512VBMI_UNSET OPTION_MASK_ISA_AVX512VBMI #define OPTION_MASK_ISA2_AVX5124FMAPS_UNSET OPTION_MASK_ISA2_AVX5124FMAPS #define OPTION_MASK_ISA2_AVX5124VNNIW_UNSET OPTION_MASK_ISA2_AVX5124VNNIW @@ -1124,6 +1127,21 @@ ix86_handle_option (struct gcc_options *opts, } return true; + case OPT_mavxifma: + if (value) + { + opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AVXIFMA_SET; + opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AVXIFMA_SET; + opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX2_SET; + opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX2_SET; + } + else + { + opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA2_AVXIFMA_UNSET; + opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AVXIFMA_UNSET; + } + return true; + case OPT_mfma: if (value) { diff --git a/gcc/common/config/i386/i386-cpuinfo.h b/gcc/common/config/i386/i386-cpuinfo.h index 643fbd97378..968f9a56a6c 100644 --- a/gcc/common/config/i386/i386-cpuinfo.h +++ b/gcc/common/config/i386/i386-cpuinfo.h @@ -240,6 +240,7 @@ enum processor_features FEATURE_X86_64_V2, FEATURE_X86_64_V3, FEATURE_X86_64_V4, + FEATURE_AVXIFMA, CPU_FEATURE_MAX }; diff --git a/gcc/common/config/i386/i386-isas.h 
b/gcc/common/config/i386/i386-isas.h index 2d0646a68f8..b05b4bb8f0d 100644 --- a/gcc/common/config/i386/i386-isas.h +++ b/gcc/common/config/i386/i386-isas.h @@ -175,4 +175,5 @@ ISA_NAMES_TABLE_START ISA_NAMES_TABLE_ENTRY("x86-64-v2", FEATURE_X86_64_V2, P_X86_64_V2, NULL) ISA_NAMES_TABLE_ENTRY("x86-64-v3", FEATURE_X86_64_V3, P_X86_64_V3, NULL) ISA_NAMES_TABLE_ENTRY("x86-64-v4", FEATURE_X86_64_V4, P_X86_64_V4, NULL) + ISA_NAMES_TABLE_ENTRY("avxifma", FEATURE_AVXIFMA, P_NONE, "-mavxifma") ISA_NAMES_TABLE_END diff --git a/gcc/config.gcc b/gcc/config.gcc index 8d5972fecf7..12365abbf86 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -421,7 +421,8 @@ i[34567]86-*-* | x86_64-*-*) tsxldtrkintrin.h amxtileintrin.h amxint8intrin.h amxbf16intrin.h x86gprintrin.h uintrintrin.h hresetintrin.h keylockerintrin.h avxvnniintrin.h - mwaitintrin.h avx512fp16intrin.h avx512fp16vlintrin.h" + mwaitintrin.h avx512fp16intrin.h avx512fp16vlintrin.h + avxifmaintrin.h" ;; ia64-*-*) extra_headers=ia64intrin.h diff --git a/gcc/config/i386/avxifmaintrin.h b/gcc/config/i386/avxifmaintrin.h new file mode 100644 index 00000000000..8f512c3ecb0 --- /dev/null +++ b/gcc/config/i386/avxifmaintrin.h @@ -0,0 +1,78 @@ +/* Copyright (C) 2020 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. 
+ + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + . */ + +#ifndef _IMMINTRIN_H_INCLUDED +#error "Never use directly; include instead." +#endif + +#ifndef _AVXIFMAINTRIN_H_INCLUDED +#define _AVXIFMAINTRIN_H_INCLUDED + +#ifndef __AVXIFMA__ +#pragma GCC push_options +#pragma GCC target("avxifma") +#define __DISABLE_AVXIFMA__ +#endif /* __AVXIFMA__ */ + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_madd52lo_avx_epu64 (__m128i __X, __m128i __Y, __m128i __Z) +{ + return (__m128i) __builtin_ia32_avx_vpmadd52luq128 ((__v2di) __X, + (__v2di) __Y, + (__v2di) __Z); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_madd52hi_avx_epu64 (__m128i __X, __m128i __Y, __m128i __Z) +{ + return (__m128i) __builtin_ia32_avx_vpmadd52huq128 ((__v2di) __X, + (__v2di) __Y, + (__v2di) __Z); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_madd52lo_avx_epu64 (__m256i __X, __m256i __Y, __m256i __Z) +{ + return (__m256i) __builtin_ia32_avx_vpmadd52luq256 ((__v4di) __X, + (__v4di) __Y, + (__v4di) __Z); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_madd52hi_avx_epu64 (__m256i __X, __m256i __Y, __m256i __Z) +{ + return (__m256i) __builtin_ia32_avx_vpmadd52huq256 ((__v4di) __X, + (__v4di) __Y, + (__v4di) __Z); +} + +#ifdef __DISABLE_AVXIFMA__ +#undef __DISABLE_AVXIFMA__ +#pragma GCC pop_options +#endif /* __DISABLE_AVXIFMA__ */ + +#endif /* _AVXIFMAINTRIN_H_INCLUDED */ diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h index a4c2fed7eda..9885699efd5 100644 --- a/gcc/config/i386/cpuid.h +++ b/gcc/config/i386/cpuid.h @@ -28,6 +28,7 @@ #define bit_AVXVNNI (1 << 4) #define bit_AVX512BF16 (1 << 5) #define 
bit_HRESET (1 << 22) +#define bit_AVXIFMA (1 << 23) /* %ecx */ #define bit_SSE3 (1 << 0) diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def index dea52a28d28..4a89099a00f 100644 --- a/gcc/config/i386/i386-builtin.def +++ b/gcc/config/i386/i386-builtin.def @@ -2499,6 +2499,12 @@ BDESC (OPTION_MASK_ISA_AVX512IFMA | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpamdd BDESC (OPTION_MASK_ISA_AVX512IFMA | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpamdd52huqv2di_mask, "__builtin_ia32_vpmadd52huq128_mask", IX86_BUILTIN_VPMADD52HUQ128, UNKNOWN, (int) V2DI_FTYPE_V2DI_V2DI_V2DI_UQI) BDESC (OPTION_MASK_ISA_AVX512IFMA | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpamdd52huqv2di_maskz, "__builtin_ia32_vpmadd52huq128_maskz", IX86_BUILTIN_VPMADD52HUQ128_MASKZ, UNKNOWN, (int) V2DI_FTYPE_V2DI_V2DI_V2DI_UQI) +/* AVX_IFMA */ +BDESC (0, OPTION_MASK_ISA2_AVXIFMA, CODE_FOR_avx_vpmadd52luq_v4di, "__builtin_ia32_avx_vpmadd52luq256", IX86_BUILTIN_AVX_VPMADD52LUQ256, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI_V4DI) +BDESC (0, OPTION_MASK_ISA2_AVXIFMA, CODE_FOR_avx_vpmadd52huq_v4di, "__builtin_ia32_avx_vpmadd52huq256", IX86_BUILTIN_AVX_VPMADD52HUQ256, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI_V4DI) +BDESC (0, OPTION_MASK_ISA2_AVXIFMA, CODE_FOR_avx_vpmadd52luq_v2di, "__builtin_ia32_avx_vpmadd52luq128", IX86_BUILTIN_AVX_VPMADD52LUQ128, UNKNOWN, (int) V2DI_FTYPE_V2DI_V2DI_V2DI) +BDESC (0, OPTION_MASK_ISA2_AVXIFMA, CODE_FOR_avx_vpmadd52huq_v2di, "__builtin_ia32_avx_vpmadd52huq128", IX86_BUILTIN_AVX_VPMADD52HUQ128, UNKNOWN, (int) V2DI_FTYPE_V2DI_V2DI_V2DI) + /* AVX512VBMI */ BDESC (OPTION_MASK_ISA_AVX512VBMI, 0, CODE_FOR_vpmultishiftqbv64qi_mask, "__builtin_ia32_vpmultishiftqb512_mask", IX86_BUILTIN_VPMULTISHIFTQB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI) BDESC (OPTION_MASK_ISA_AVX512VBMI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpmultishiftqbv32qi_mask, "__builtin_ia32_vpmultishiftqb256_mask", IX86_BUILTIN_VPMULTISHIFTQB256, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI_V32QI_USI) diff
--git a/gcc/config/i386/i386-c.cc b/gcc/config/i386/i386-c.cc index eb0e3b36a76..3494ec035d5 100644 --- a/gcc/config/i386/i386-c.cc +++ b/gcc/config/i386/i386-c.cc @@ -633,6 +633,8 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag, def_or_undef (parse_in, "__WIDEKL__"); if (isa_flag2 & OPTION_MASK_ISA2_AVXVNNI) def_or_undef (parse_in, "__AVXVNNI__"); + if (isa_flag2 & OPTION_MASK_ISA2_AVXIFMA) + def_or_undef (parse_in, "__AVXIFMA__"); if (TARGET_IAMCU) { def_or_undef (parse_in, "__iamcu"); diff --git a/gcc/config/i386/i386-isa.def b/gcc/config/i386/i386-isa.def index 83659d0bea4..6e0254ce418 100644 --- a/gcc/config/i386/i386-isa.def +++ b/gcc/config/i386/i386-isa.def @@ -109,3 +109,4 @@ DEF_PTA(KL) DEF_PTA(WIDEKL) DEF_PTA(AVXVNNI) DEF_PTA(AVX512FP16) +DEF_PTA(AVXIFMA) diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc index acb2291e70f..5facb64c2a8 100644 --- a/gcc/config/i386/i386-options.cc +++ b/gcc/config/i386/i386-options.cc @@ -226,7 +226,8 @@ static struct ix86_target_opts isa2_opts[] = { "-mkl", OPTION_MASK_ISA2_KL }, { "-mwidekl", OPTION_MASK_ISA2_WIDEKL }, { "-mavxvnni", OPTION_MASK_ISA2_AVXVNNI }, - { "-mavx512fp16", OPTION_MASK_ISA2_AVX512FP16 } + { "-mavx512fp16", OPTION_MASK_ISA2_AVX512FP16 }, + { "-mavxifma", OPTION_MASK_ISA2_AVXIFMA } }; static struct ix86_target_opts isa_opts[] = { @@ -1072,6 +1073,7 @@ ix86_valid_target_attribute_inner_p (tree fndecl, tree args, char *p_strings[], IX86_ATTR_ISA ("hreset", OPT_mhreset), IX86_ATTR_ISA ("avxvnni", OPT_mavxvnni), IX86_ATTR_ISA ("avx512fp16", OPT_mavx512fp16), + IX86_ATTR_ISA ("avxifma", OPT_mavxifma), /* enum options */ IX86_ATTR_ENUM ("fpmath=", OPT_mfpmath_), diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt index 0dbaacb57ed..36e28b7063d 100644 --- a/gcc/config/i386/i386.opt +++ b/gcc/config/i386/i386.opt @@ -1214,3 +1214,8 @@ Do not use GOT to access external symbols. 
-param=x86-stlf-window-ninsns= Target Joined UInteger Var(x86_stlf_window_ninsns) Init(64) Param Instructions number above which STFL stall penalty can be compensated. + +mavxifma +Target Mask(ISA2_AVXIFMA) Var(ix86_isa_flags2) Save +Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, and +AVXIFMA built-in functions and code generation. diff --git a/gcc/config/i386/immintrin.h b/gcc/config/i386/immintrin.h index 6afd78c2b6f..e9d4e975243 100644 --- a/gcc/config/i386/immintrin.h +++ b/gcc/config/i386/immintrin.h @@ -44,6 +44,8 @@ #include +#include + #include #include diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 076064f97e6..331347569ea 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -27867,6 +27867,19 @@ (define_int_attr vpmadd52type [(UNSPEC_VPMADD52LUQ "luq") (UNSPEC_VPMADD52HUQ "huq")]) +(define_insn "avx_vpmadd52_" + [(set (match_operand:VI8_AVX2 0 "register_operand" "=x") + (unspec:VI8_AVX2 + [(match_operand:VI8_AVX2 1 "register_operand" "0") + (match_operand:VI8_AVX2 2 "register_operand" "x") + (match_operand:VI8_AVX2 3 "nonimmediate_operand" "xm")] + VPMADD52))] + "TARGET_AVXIFMA" + "%{vex%} vpmadd52\t{%3, %2, %0|%0, %2, %3}" + [(set_attr "type" "ssemuladd") + (set_attr "prefix" "vex") + (set_attr "mode" "")]) + (define_expand "vpamdd52huq_maskz" [(match_operand:VI8_AVX512VL 0 "register_operand") (match_operand:VI8_AVX512VL 1 "register_operand") @@ -27895,7 +27908,7 @@ DONE; }) -(define_insn "vpamdd52" +(define_insn "vpamdd52" [(set (match_operand:VI8_AVX512VL 0 "register_operand" "=v") (unspec:VI8_AVX512VL [(match_operand:VI8_AVX512VL 1 "register_operand" "0") @@ -27903,7 +27916,32 @@ (match_operand:VI8_AVX512VL 3 "nonimmediate_operand" "vm")] VPMADD52))] "TARGET_AVX512IFMA" - "vpmadd52\t{%3, %2, %0|%0, %2, %3}" +{ + if ( <=32 + && TARGET_AVXIFMA + && !EXT_REX_SSE_REG_P (operands[1]) + && !EXT_REX_SSE_REG_P (operands[2]) + && !EXT_REX_SSE_REG_P (operands[3])) + return "%{vex%} vpmadd52\t{%3, %2, %0|%0, 
%2, %3}"; + else + return "vpmadd52\t{%3, %2, %0|%0, %2, %3}"; +} + [(set_attr "type" "ssemuladd") + (set_attr "prefix" "maybe_evex") + (set_attr "mode" "")]) + +(define_insn "vpamdd52_maskz_1" + [(set (match_operand:VI8_AVX512VL 0 "register_operand" "=v") + (vec_merge:VI8_AVX512VL + (unspec:VI8_AVX512VL + [(match_operand:VI8_AVX512VL 1 "register_operand" "0") + (match_operand:VI8_AVX512VL 2 "register_operand" "v") + (match_operand:VI8_AVX512VL 3 "nonimmediate_operand" "vm")] + VPMADD52) + (match_operand:VI8_AVX512VL 4 "const0_operand" "C") + (match_operand: 5 "register_operand" "Yk")))] + "TARGET_AVX512IFMA" + "vpmadd52\t{%3, %2, %0%{%5%}%{z%}|%0%{%5%}%{z%}, %2, %3}" [(set_attr "type" "ssemuladd") (set_attr "prefix" "evex") (set_attr "mode" "")]) diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index cfbe32afce9..edecf5c0070 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -7060,6 +7060,11 @@ Enable/disable the generation of the WIDEKL instructions. @cindex @code{target("avxvnni")} function attribute, x86 Enable/disable the generation of the AVXVNNI instructions. +@item avxifma +@itemx no-avxifma +@cindex @code{target("avxifma")} function attribute, x86 +Enable/disable the generation of the AVXIFMA instructions. + @item cld @itemx no-cld @cindex @code{target("cld")} function attribute, x86 diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index a9ecc4426a4..886fc1d0164 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -1436,7 +1436,7 @@ See RS/6000 and PowerPC Options. 
-mavx5124fmaps -mavx512vnni -mavx5124vnniw -mprfchw -mrdpid @gol -mrdseed -msgx -mavx512vp2intersect -mserialize -mtsxldtrk@gol -mamx-tile -mamx-int8 -mamx-bf16 -muintr -mhreset -mavxvnni@gol --mavx512fp16 @gol +-mavx512fp16 -mavxifma @gol -mcldemote -mms-bitfields -mno-align-stringops -minline-all-stringops @gol -minline-stringops-dynamically -mstringop-strategy=@var{alg} @gol -mkl -mwidekl @gol @@ -32893,6 +32893,9 @@ preferred alignment to @option{-mpreferred-stack-boundary=2}. @need 200 @itemx -mwidekl @opindex mwidekl +@need 200 +@itemx -mavxifma +@opindex mavxifma These switches enable the use of instructions in the MMX, SSE, SSE2, SSE3, SSSE3, SSE4, SSE4A, SSE4.1, SSE4.2, AVX, AVX2, AVX512F, AVX512PF, AVX512ER, AVX512CD, AVX512VL, AVX512BW, AVX512DQ, AVX512IFMA, AVX512VBMI, SHA, @@ -32902,8 +32905,8 @@ WBNOINVD, FMA4, PREFETCHW, RDPID, PREFETCHWT1, RDSEED, SGX, XOP, LWP, XSAVEOPT, XSAVEC, XSAVES, RTM, HLE, TBM, MWAITX, CLZERO, PKU, AVX512VBMI2, GFNI, VAES, WAITPKG, VPCLMULQDQ, AVX512BITALG, MOVDIRI, MOVDIR64B, AVX512BF16, ENQCMD, AVX512VPOPCNTDQ, AVX5124FMAPS, AVX512VNNI, AVX5124VNNIW, SERIALIZE, -UINTR, HRESET, AMXTILE, AMXINT8, AMXBF16, KL, WIDEKL, AVXVNNI, AVX512FP16 -or CLDEMOTE extended instruction sets. Each has a corresponding +UINTR, HRESET, AMXTILE, AMXINT8, AMXBF16, KL, WIDEKL, AVXVNNI, AVX512FP16, +AVXIFMA or CLDEMOTE extended instruction sets. Each has a corresponding @option{-mno-} option to disable use of these instructions. These extensions are also available as built-in functions: see diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi index c81e2ffd43a..0173acf4a65 100644 --- a/gcc/doc/sourcebuild.texi +++ b/gcc/doc/sourcebuild.texi @@ -2490,6 +2490,9 @@ Target supports the execution of @code{avx512f} instructions. @item avx512vp2intersect Target supports the execution of @code{avx512vp2intersect} instructions. +@item avxifma +Target supports the execution of @code{avxifma} instructions. 
+ @item amx_tile Target supports the execution of @code{amx-tile} instructions. diff --git a/gcc/testsuite/g++.dg/other/i386-2.C b/gcc/testsuite/g++.dg/other/i386-2.C index fba3d1ac684..5388606779b 100644 --- a/gcc/testsuite/g++.dg/other/i386-2.C +++ b/gcc/testsuite/g++.dg/other/i386-2.C @@ -1,5 +1,5 @@ /* { dg-do compile { target i?86-*-* x86_64-*-* } } */ -/* { dg-options "-O -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16" } */ +/* { dg-options "-O -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16 -mavxifma" } */ /* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, mm3dnow.h, fma4intrin.h, xopintrin.h, abmintrin.h, bmiintrin.h, tbmintrin.h, lwpintrin.h, diff --git a/gcc/testsuite/g++.dg/other/i386-3.C b/gcc/testsuite/g++.dg/other/i386-3.C index 5cc0fa83457..86cedd3d32f 100644 --- 
a/gcc/testsuite/g++.dg/other/i386-3.C +++ b/gcc/testsuite/g++.dg/other/i386-3.C @@ -1,5 +1,5 @@ /* { dg-do compile { target i?86-*-* x86_64-*-* } } */ -/* { dg-options "-O -fkeep-inline-functions -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16" } */ +/* { dg-options "-O -fkeep-inline-functions -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16 -mavxifma" } */ /* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, mm3dnow.h, fma4intrin.h, xopintrin.h, abmintrin.h, bmiintrin.h, tbmintrin.h, lwpintrin.h, diff --git a/gcc/testsuite/gcc.target/i386/avx-check.h b/gcc/testsuite/gcc.target/i386/avx-check.h index 7ddca9d7b80..24ee6ab4efd 100644 --- a/gcc/testsuite/gcc.target/i386/avx-check.h +++ b/gcc/testsuite/gcc.target/i386/avx-check.h @@ -22,7 +22,11 @@ main () /* Run AVX test only if host has AVX support. 
*/ if (((ecx & (bit_AVX | bit_OSXSAVE)) == (bit_AVX | bit_OSXSAVE)) - && avx_os_support ()) + && avx_os_support () +#ifdef AVXIFMA + && __builtin_cpu_supports ("avxifma") +#endif + ) { do_test (); #ifdef DEBUG diff --git a/gcc/testsuite/gcc.target/i386/avx-ifma-1.c b/gcc/testsuite/gcc.target/i386/avx-ifma-1.c new file mode 100644 index 00000000000..6388373123c --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx-ifma-1.c @@ -0,0 +1,20 @@ +/* { dg-do compile } */ +/* { dg-options "-mavxifma -O2" } */ +/* { dg-final { scan-assembler-times "\{vex\} vpmadd52huq\[ \\t\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+" 1 } } */ +/* { dg-final { scan-assembler-times "\{vex\} vpmadd52luq\[ \\t\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+" 1 } } */ +/* { dg-final { scan-assembler-times "\{vex\} vpmadd52huq\[ \\t\]+\[^\n\]*%ymm\[0-9\]+\[^\n\]*%ymm\[0-9\]+\[^\n\]*%ymm\[0-9\]+" 1 } } */ +/* { dg-final { scan-assembler-times "\{vex\} vpmadd52luq\[ \\t\]+\[^\n\]*%ymm\[0-9\]+\[^\n\]*%ymm\[0-9\]+\[^\n\]*%ymm\[0-9\]+" 1 } } */ + +#include + +volatile __m256i x,y,z; +volatile __m128i x_,y_,z_; + +void extern +avxifma_test (void) +{ + x = _mm256_madd52hi_avx_epu64 (x, y, z); + x = _mm256_madd52lo_avx_epu64 (x, y, z); + x_ = _mm_madd52hi_avx_epu64 (x_, y_, z_); + x_ = _mm_madd52lo_avx_epu64 (x_, y_, z_); +} diff --git a/gcc/testsuite/gcc.target/i386/avx-ifma-vpmaddhuq-2.c b/gcc/testsuite/gcc.target/i386/avx-ifma-vpmaddhuq-2.c new file mode 100644 index 00000000000..c9efee33091 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx-ifma-vpmaddhuq-2.c @@ -0,0 +1,72 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavxifma" } */ +/* { dg-require-effective-target avxifma } */ +#define AVXIFMA +#ifndef CHECK +#define CHECK "avx-check.h" +#endif + +#ifndef TEST +#define TEST avx_test +#endif + +#include CHECK + +void +CALC (long long *r, long long *s1, long long *s2, long long *s3, int size) +{ + int i; + long long a,b; + + for (i = 0; i < size; i++) + { + /* 
Simulate higher 52 bits out of 104 bits, + by shifting operands with 0 in lower 26 bits. */ + a = s2[i] >> 26; + b = s3[i] >> 26; + r[i] = a * b + s1[i]; + } +} + +void +TEST (void) +{ + union256i_q src1_256, src2_256, dst_256; + union128i_q src1_128, src2_128, dst_128; + long long dst_ref_256[4], dst_ref_128[2]; + int i; + + for (i = 0; i < 4; i++) + { + src1_256.a[i] = 15 + 3467 * i; + src2_256.a[i] = 9217 + i; + src1_256.a[i] = src1_256.a[i] << 26; + src2_256.a[i] = src2_256.a[i] << 26; + src1_256.a[i] &= ((1LL << 52) - 1); + src2_256.a[i] &= ((1LL << 52) - 1); + dst_256.a[i] = -1; + } + + for (i = 0; i < 2; i++) + { + src1_128.a[i] = 16 + 3467 * i; + src2_128.a[i] = 9127 + i; + src1_128.a[i] = src1_128.a[i] << 26; + src2_128.a[i] = src2_128.a[i] << 26; + src1_128.a[i] &= ((1LL << 52) - 1); + src2_128.a[i] &= ((1LL << 52) - 1); + dst_128.a[i] = -1; + } + + CALC (dst_ref_256, dst_256.a, src1_256.a, src2_256.a, 4); + dst_256.x = _mm256_madd52hi_avx_epu64 (dst_256.x, src1_256.x, src2_256.x); + if (check_union256i_q (dst_256, dst_ref_256)) + abort (); + + CALC (dst_ref_128, dst_128.a, src1_128.a, src2_128.a, 2); + dst_128.x = _mm_madd52hi_avx_epu64 (dst_128.x, src1_128.x, src2_128.x); + if (check_union128i_q (dst_128, dst_ref_128)) + abort (); + +} + diff --git a/gcc/testsuite/gcc.target/i386/avx-ifma-vpmaddluq-2.c b/gcc/testsuite/gcc.target/i386/avx-ifma-vpmaddluq-2.c new file mode 100644 index 00000000000..600978ea9ad --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx-ifma-vpmaddluq-2.c @@ -0,0 +1,61 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavxifma" } */ +/* { dg-require-effective-target avxifma } */ +#define AVXIFMA +#ifndef CHECK +#define CHECK "avx-check.h" +#endif + +#ifndef TEST +#define TEST avx_test +#endif + +#include CHECK + +void +CALC (unsigned long long *r, unsigned long long *s1, + unsigned long long *s2, unsigned long long *s3, + int size) +{ + int i; + + for (i = 0; i < size; i++) + { + r[i] = s2[i] * s3[i] + s1[i]; + } +} + +void +TEST
(void) +{ + union256i_q src1_256, src2_256, dst_256; + union128i_q src1_128, src2_128, dst_128; + unsigned long long dst_ref_256[4], dst_ref_128[2]; + int i; + + for (i = 0; i < 4; i++) + { + src1_256.a[i] = 3450 * i; + src2_256.a[i] = 7863 * i; + dst_256.a[i] = 117; + } + + for (i = 0; i < 2; i++) + { + src1_128.a[i] = 3540 * i; + src2_128.a[i] = 7683 * i; + dst_128.a[i] = 117; + } + + CALC (dst_ref_256, dst_256.a, src1_256.a, src2_256.a, 4); + dst_256.x = _mm256_madd52lo_avx_epu64 (dst_256.x, src1_256.x, src2_256.x); + if (check_union256i_q (dst_256, dst_ref_256)) + abort (); + + CALC (dst_ref_128, dst_128.a, src1_128.a, src2_128.a, 2); + dst_128.x = _mm_madd52lo_avx_epu64 (dst_128.x, src1_128.x, src2_128.x); + if (check_union128i_q (dst_128, dst_ref_128)) + abort (); + +} + diff --git a/gcc/testsuite/gcc.target/i386/avx512ifma-vpmaddhuq-1.c b/gcc/testsuite/gcc.target/i386/avx512ifma-vpmaddhuq-1a.c similarity index 100% rename from gcc/testsuite/gcc.target/i386/avx512ifma-vpmaddhuq-1.c rename to gcc/testsuite/gcc.target/i386/avx512ifma-vpmaddhuq-1a.c diff --git a/gcc/testsuite/gcc.target/i386/avx512ifma-vpmaddhuq-1b.c b/gcc/testsuite/gcc.target/i386/avx512ifma-vpmaddhuq-1b.c new file mode 100644 index 00000000000..67e94baa01b --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512ifma-vpmaddhuq-1b.c @@ -0,0 +1,33 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512ifma -mavx512vl -mavxifma -O2" } */ +/* { dg-final { scan-assembler-times "vpmadd52huq\[ \\t\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+" 3 } } */ +/* { dg-final { scan-assembler-times "\{vex\} vpmadd52huq\[ \\t\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+" 1 } } */ +/* { dg-final { scan-assembler-times "vpmadd52huq\[ \\t\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\[^\{\]" 1 } } */ +/* { dg-final { scan-assembler-times "vpmadd52huq\[ \\t\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}" 1 } } */ +/* { 
dg-final { scan-assembler-times "vpmadd52huq\[ \\t\]+\[^\n\]*%ymm\[0-9\]+\[^\n\]*%ymm\[0-9\]+\[^\n\]*%ymm\[0-9\]+" 3 } } */ +/* { dg-final { scan-assembler-times "\{vex\} vpmadd52huq\[ \\t\]+\[^\n\]*%ymm\[0-9\]+\[^\n\]*%ymm\[0-9\]+\[^\n\]*%ymm\[0-9\]+" 1 } } */ +/* { dg-final { scan-assembler-times "vpmadd52huq\[ \\t\]+\[^\n\]*%ymm\[0-9\]+\[^\n\]*%ymm\[0-9\]+\[^\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\[^\{\]" 1 } } */ +/* { dg-final { scan-assembler-times "vpmadd52huq\[ \\t\]+\[^\n\]*%ymm\[0-9\]+\[^\n\]*%ymm\[0-9\]+\[^\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}" 1 } } */ +/* { dg-final { scan-assembler-times "vpmadd52huq\[ \\t\]+\[^\n\]*%zmm\[0-9\]+\[^\n\]*%zmm\[0-9\]+\[^\n\]*%zmm\[0-9\]+" 3 } } */ +/* { dg-final { scan-assembler-times "vpmadd52huq\[ \\t\]+\[^\n\]*%zmm\[0-9\]+\[^\n\]*%zmm\[0-9\]+\[^\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\[^\{\]" 1 } } */ +/* { dg-final { scan-assembler-times "vpmadd52huq\[ \\t\]+\[^\n\]*%zmm\[0-9\]+\[^\n\]*%zmm\[0-9\]+\[^\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}" 1 } } */ + +#include + +volatile __m512i _x1, _y1, _z1; +volatile __m256i _x2, _y2, _z2; +volatile __m128i _x3, _y3, _z3; + +void extern +avx512ifma_test (void) +{ + _x3 = _mm_madd52hi_epu64 (_x3, _y3, _z3); + _x3 = _mm_mask_madd52hi_epu64 (_x3, 2, _y3, _z3); + _x3 = _mm_maskz_madd52hi_epu64 (2, _x3, _y3, _z3); + _x2 = _mm256_madd52hi_epu64 (_x2, _y2, _z2); + _x2 = _mm256_mask_madd52hi_epu64 (_x2, 3, _y2, _z2); + _x2 = _mm256_maskz_madd52hi_epu64 (3, _x2, _y2, _z2); + _x1 = _mm512_madd52hi_epu64 (_x1, _y1, _z1); + _x1 = _mm512_mask_madd52hi_epu64 (_x1, 3, _y1, _z1); + _x1 = _mm512_maskz_madd52hi_epu64 (3, _x1, _y1, _z1); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512ifma-vpmaddluq-1.c b/gcc/testsuite/gcc.target/i386/avx512ifma-vpmaddluq-1a.c similarity index 100% rename from gcc/testsuite/gcc.target/i386/avx512ifma-vpmaddluq-1.c rename to gcc/testsuite/gcc.target/i386/avx512ifma-vpmaddluq-1a.c diff --git a/gcc/testsuite/gcc.target/i386/avx512ifma-vpmaddluq-1b.c 
b/gcc/testsuite/gcc.target/i386/avx512ifma-vpmaddluq-1b.c new file mode 100644 index 00000000000..4b8ea27f403 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512ifma-vpmaddluq-1b.c @@ -0,0 +1,33 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512ifma -mavx512vl -mavxifma -O2" } */ +/* { dg-final { scan-assembler-times "vpmadd52luq\[ \\t\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+" 3 } } */ +/* { dg-final { scan-assembler-times "\{vex\} vpmadd52luq\[ \\t\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+" 1 } } */ +/* { dg-final { scan-assembler-times "vpmadd52luq\[ \\t\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\[^\{\]" 1 } } */ +/* { dg-final { scan-assembler-times "vpmadd52luq\[ \\t\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}" 1 } } */ +/* { dg-final { scan-assembler-times "vpmadd52luq\[ \\t\]+\[^\n\]*%ymm\[0-9\]+\[^\n\]*%ymm\[0-9\]+\[^\n\]*%ymm\[0-9\]+" 3 } } */ +/* { dg-final { scan-assembler-times "\{vex\} vpmadd52luq\[ \\t\]+\[^\n\]*%ymm\[0-9\]+\[^\n\]*%ymm\[0-9\]+\[^\n\]*%ymm\[0-9\]+" 1 } } */ +/* { dg-final { scan-assembler-times "vpmadd52luq\[ \\t\]+\[^\n\]*%ymm\[0-9\]+\[^\n\]*%ymm\[0-9\]+\[^\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\[^\{\]" 1 } } */ +/* { dg-final { scan-assembler-times "vpmadd52luq\[ \\t\]+\[^\n\]*%ymm\[0-9\]+\[^\n\]*%ymm\[0-9\]+\[^\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}" 1 } } */ +/* { dg-final { scan-assembler-times "vpmadd52luq\[ \\t\]+\[^\n\]*%zmm\[0-9\]+\[^\n\]*%zmm\[0-9\]+\[^\n\]*%zmm\[0-9\]+" 3 } } */ +/* { dg-final { scan-assembler-times "vpmadd52luq\[ \\t\]+\[^\n\]*%zmm\[0-9\]+\[^\n\]*%zmm\[0-9\]+\[^\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\[^\{\]" 1 } } */ +/* { dg-final { scan-assembler-times "vpmadd52luq\[ \\t\]+\[^\n\]*%zmm\[0-9\]+\[^\n\]*%zmm\[0-9\]+\[^\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}" 1 } } */ + +#include + +volatile __m512i _x1, _y1, _z1; +volatile __m256i _x2, _y2, _z2; +volatile __m128i _x3, _y3, _z3; + +void extern +avx512ifma_test 
(void) +{ + _x3 = _mm_madd52lo_epu64 (_x3, _y3, _z3); + _x3 = _mm_mask_madd52lo_epu64 (_x3, 2, _y3, _z3); + _x3 = _mm_maskz_madd52lo_epu64 (2, _x3, _y3, _z3); + _x2 = _mm256_madd52lo_epu64 (_x2, _y2, _z2); + _x2 = _mm256_mask_madd52lo_epu64 (_x2, 3, _y2, _z2); + _x2 = _mm256_maskz_madd52lo_epu64 (3, _x2, _y2, _z2); + _x1 = _mm512_madd52lo_epu64 (_x1, _y1, _z1); + _x1 = _mm512_mask_madd52lo_epu64 (_x1, 3, _y1, _z1); + _x1 = _mm512_maskz_madd52lo_epu64 (3, _x1, _y1, _z1); +} diff --git a/gcc/testsuite/gcc.target/i386/funcspec-56.inc b/gcc/testsuite/gcc.target/i386/funcspec-56.inc index b76dddb86a2..466555c0d06 100644 --- a/gcc/testsuite/gcc.target/i386/funcspec-56.inc +++ b/gcc/testsuite/gcc.target/i386/funcspec-56.inc @@ -80,6 +80,7 @@ extern void test_keylocker (void) __attribute__((__target__("kl"))); extern void test_widekl (void) __attribute__((__target__("widekl"))); extern void test_avxvnni (void) __attribute__((__target__("avxvnni"))); extern void test_avx512fp16 (void) __attribute__((__target__("avx512fp16"))); +extern void test_avxifma (void) __attribute__((__target__("avxifma"))); extern void test_no_sgx (void) __attribute__((__target__("no-sgx"))); extern void test_no_avx5124fmaps(void) __attribute__((__target__("no-avx5124fmaps"))); @@ -161,6 +162,7 @@ extern void test_no_keylocker (void) __attribute__((__target__("no-kl"))); extern void test_no_widekl (void) __attribute__((__target__("no-widekl"))); extern void test_no_avxvnni (void) __attribute__((__target__("no-avxvnni"))); extern void test_no_avx512fp16 (void) __attribute__((__target__("no-avx512fp16"))); +extern void test_no_avxifma (void) __attribute__((__target__("no-avxifma"))); extern void test_arch_nocona (void) __attribute__((__target__("arch=nocona"))); extern void test_arch_core2 (void) __attribute__((__target__("arch=core2"))); diff --git a/gcc/testsuite/gcc.target/i386/sse-12.c b/gcc/testsuite/gcc.target/i386/sse-12.c index 375d4d1b4de..fde56261d8f 100644 --- 
a/gcc/testsuite/gcc.target/i386/sse-12.c +++ b/gcc/testsuite/gcc.target/i386/sse-12.c @@ -3,7 +3,7 @@ popcntintrin.h gfniintrin.h and mm_malloc.h are usable with -O -std=c89 -pedantic-errors. */ /* { dg-do compile } */ -/* { dg-options "-O -std=c89 -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512bw -mavx512dq -mavx512vl -mavx512vbmi -mavx512vbmi2 -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni" } */ +/* { dg-options "-O -std=c89 -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512bw -mavx512dq -mavx512vl -mavx512vbmi -mavx512vbmi2 -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavxifma" } */ #include diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c index e285c307d00..bb29555babe 100644 --- a/gcc/testsuite/gcc.target/i386/sse-13.c +++ b/gcc/testsuite/gcc.target/i386/sse-13.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt 
-mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512vbmi2 -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mavx512vp2intersect -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16" } */ +/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512vbmi2 -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mavx512vp2intersect -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16 -mavxifma" } */ /* { dg-add-options bind_pic_locally } */ #include diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c index f41493b93f3..f2701ddaaf9 100644 --- a/gcc/testsuite/gcc.target/i386/sse-14.c +++ b/gcc/testsuite/gcc.target/i386/sse-14.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 
-mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mpconfig -mwbnoinvd -mavx512vl -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16" } */ +/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mpconfig -mwbnoinvd -mavx512vl -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16 -mavxifma" } */ /* { dg-add-options bind_pic_locally } */ #include diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c index 31492ef3697..3d196975b1e 100644 --- a/gcc/testsuite/gcc.target/i386/sse-22.c +++ b/gcc/testsuite/gcc.target/i386/sse-22.c @@ -103,7 +103,7 @@ #ifndef DIFFERENT_PRAGMAS -#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,avx512vl,avx512bw,avx512dq,avx512vbmi,avx512vbmi2,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16") +#pragma GCC target 
("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,avx512vl,avx512bw,avx512dq,avx512vbmi,avx512vbmi2,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16,avxifma") #endif /* Following intrinsics require immediate arguments. They @@ -220,7 +220,7 @@ test_4 (_mm_cmpestrz, int, __m128i, int, __m128i, int, 1) /* immintrin.h (AVX/AVX2/RDRND/FSGSBASE/F16C/RTM/AVX512F/SHA) */ #ifdef DIFFERENT_PRAGMAS -#pragma GCC target ("avx,avx2,rdrnd,fsgsbase,f16c,rtm,avx512f,avx512er,avx512cd,avx512pf,sha,avx512vl,avx512bw,avx512dq,avx512ifma,avx512vbmi,avx512vbmi2,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16") +#pragma GCC target ("avx,avx2,rdrnd,fsgsbase,f16c,rtm,avx512f,avx512er,avx512cd,avx512pf,sha,avx512vl,avx512bw,avx512dq,avx512ifma,avx512vbmi,avx512vbmi2,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16,avxifma") #endif #include test_1 (_cvtss_sh, unsigned short, float, 1) diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c index f71a7b29157..d3a233f90fc 100644 --- a/gcc/testsuite/gcc.target/i386/sse-23.c +++ b/gcc/testsuite/gcc.target/i386/sse-23.c @@ -843,6 +843,6 @@ #define __builtin_ia32_vpclmulqdq_v2di(A, B, C) __builtin_ia32_vpclmulqdq_v2di(A, B, 1) #define __builtin_ia32_vpclmulqdq_v8di(A, B, C) __builtin_ia32_vpclmulqdq_v8di(A, B, 1) -#pragma GCC target 
("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,fma,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,xsavec,xsaves,clflushopt,avx512bw,avx512dq,avx512vl,avx512vbmi,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,clwb,mwaitx,clzero,pku,sgx,rdpid,gfni,avx512vbmi2,vpclmulqdq,avx512bitalg,pconfig,wbnoinvd,avx512bf16,enqcmd,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16") +#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,fma,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,xsavec,xsaves,clflushopt,avx512bw,avx512dq,avx512vl,avx512vbmi,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,clwb,mwaitx,clzero,pku,sgx,rdpid,gfni,avx512vbmi2,vpclmulqdq,avx512bitalg,pconfig,wbnoinvd,avx512bf16,enqcmd,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16,avxifma") #include diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index fdd88e6a516..69de3b96bfc 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -9506,6 +9506,18 @@ proc check_effective_target_avxvnni { } { } "-mavxvnni" ] } +# Return 1 if avxifma instructions can be compiled. +proc check_effective_target_avxifma { } { + return [check_no_compiler_messages avxifma object { + typedef long long __v4di __attribute__ ((__vector_size__ (32))); + __v4di + _mm256_maddlo_avx_epu64 (__v4di __X, __v4di __Y, __v4di __Z) + { + return __builtin_ia32_avx_vpmadd52luq256 (__X, __Y, __Z); + } + } "-O0 -mavxifma" ] +} + # Return 1 if sse instructions can be compiled. 
proc check_effective_target_sse { } { return [check_no_compiler_messages sse object { From patchwork Fri Oct 14 07:54:41 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Jiang, Haochen" X-Patchwork-Id: 1689923 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=RSkzEvwC; dkim-atps=neutral Received: from sourceware.org (ip-8-43-85-97.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-384) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4Mpdxh447rz23jn for ; Fri, 14 Oct 2022 18:56:24 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 7D0003839DCA for ; Fri, 14 Oct 2022 07:56:22 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 7D0003839DCA DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1665734182; bh=JoBt8gvQqSCR4MxTz3CQ6NiXXqyxYjF4Ni7KwZhq7ZE=; h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=RSkzEvwCZjGIVL+tH7B1Np/8Z5u6sDHyUetMIdKAPawZPc/6UBc+7WNym/ppX8TZ0 0l9sP5WA1+RH3cnpVAhxDJoiKEfeoMcm6OXAJLn0qrWd7cuBT07xzfsgUarP5zxxD4 WnYliiBye6LtJSz2SkBsfIpjky/iBrR2cRwmgiGY= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mga18.intel.com (mga18.intel.com 
[134.134.136.126]) by sourceware.org (Postfix) with ESMTPS id 824593858C39 for ; Fri, 14 Oct 2022 07:55:05 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 824593858C39 X-IronPort-AV: E=McAfee;i="6500,9779,10499"; a="288597870" X-IronPort-AV: E=Sophos;i="5.95,182,1661842800"; d="scan'208";a="288597870" Received: from orsmga002.jf.intel.com ([10.7.209.21]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Oct 2022 00:55:00 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6500,9779,10499"; a="627488400" X-IronPort-AV: E=Sophos;i="5.95,182,1661842800"; d="scan'208";a="627488400" Received: from shvmail03.sh.intel.com ([10.239.245.20]) by orsmga002.jf.intel.com with ESMTP; 14 Oct 2022 00:54:48 -0700 Received: from shliclel320.sh.intel.com (shliclel320.sh.intel.com [10.239.240.127]) by shvmail03.sh.intel.com (Postfix) with ESMTP id E05DD1009C8D; Fri, 14 Oct 2022 15:54:47 +0800 (CST) To: gcc-patches@gcc.gnu.org Subject: [PATCH 2/6] Support Intel AVX-VNNI-INT8 Date: Fri, 14 Oct 2022 15:54:41 +0800 Message-Id: <20221014075445.7938-3-haochen.jiang@intel.com> X-Mailer: git-send-email 2.18.1 In-Reply-To: <20221014075445.7938-1-haochen.jiang@intel.com> References: <20221014075445.7938-1-haochen.jiang@intel.com> X-Spam-Status: No, score=-11.7 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_NUMSUBJECT, KAM_SHORT, SPF_HELO_NONE, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Haochen Jiang via Gcc-patches From: "Jiang, Haochen" Reply-To: Haochen Jiang Cc: hongtao.liu@intel.com Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" 
From: Kong Lingling

gcc/ChangeLog

	* common/config/i386/cpuinfo.h (get_available_features): Detect avxvnniint8.
	* common/config/i386/i386-common.cc (OPTION_MASK_ISA2_AVXVNNIINT8_SET): New.
	(OPTION_MASK_ISA2_AVXVNNIINT8_UNSET): Ditto.
	(ix86_handle_option): Handle -mavxvnniint8.
	* common/config/i386/i386-cpuinfo.h (enum processor_features): Add FEATURE_AVXVNNIINT8.
	* common/config/i386/i386-isas.h: Add ISA_NAMES_TABLE_ENTRY for avxvnniint8.
	* config.gcc: Add avxvnniint8intrin.h.
	* config/i386/avxvnniint8intrin.h: New file.
	* config/i386/cpuid.h (bit_AVXVNNIINT8): New.
	* config/i386/i386-builtin.def: Add new builtins.
	* config/i386/i386-c.cc (ix86_target_macros_internal): Define __AVXVNNIINT8__.
	* config/i386/i386-options.cc (isa2_opts): Add -mavxvnniint8.
	(ix86_valid_target_attribute_inner_p): Handle avxvnniint8.
	* config/i386/i386-isa.def: Add DEF_PTA(AVXVNNIINT8).
	* config/i386/i386.opt: Add option -mavxvnniint8.
	* config/i386/immintrin.h: Include avxvnniint8intrin.h.
	* config/i386/sse.md (vpdp_): New define_insn.
	* doc/extend.texi: Document avxvnniint8.
	* doc/invoke.texi: Document -mavxvnniint8.
	* doc/sourcebuild.texi: Document target avxvnniint8.

gcc/testsuite/ChangeLog

	* g++.dg/other/i386-2.C: Add -mavxvnniint8.
	* g++.dg/other/i386-3.C: Ditto.
	* gcc.target/i386/avx-check.h: Add avxvnniint8 check.
	* gcc.target/i386/sse-12.c: Add -mavxvnniint8.
	* gcc.target/i386/sse-13.c: Ditto.
	* gcc.target/i386/sse-14.c: Ditto.
	* gcc.target/i386/sse-22.c: Ditto.
	* gcc.target/i386/sse-23.c: Ditto.
	* gcc.target/i386/funcspec-56.inc: Add new target attribute.
	* lib/target-supports.exp (check_effective_target_avxvnniint8): New.
	* gcc.target/i386/avxvnniint8-1.c: Ditto.
	* gcc.target/i386/avxvnniint8-vpdpbssd-2.c: Ditto.
	* gcc.target/i386/avxvnniint8-vpdpbssds-2.c: Ditto.
	* gcc.target/i386/avxvnniint8-vpdpbsud-2.c: Ditto.
	* gcc.target/i386/avxvnniint8-vpdpbsuds-2.c: Ditto.
	* gcc.target/i386/avxvnniint8-vpdpbuud-2.c: Ditto.
	* gcc.target/i386/avxvnniint8-vpdpbuuds-2.c: Ditto.
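Each AVX-VNNI-INT8 intrinsic maps to one instruction that multiplies four pairs of bytes per 32-bit lane, sums the four products, and adds that sum to the lane's accumulator. As a rough illustration, here is a scalar sketch of the signed-by-signed pair behind `_mm_dpbssd_epi32` (wrapping) and `_mm_dpbssds_epi32` (signed saturation), based on our reading of Intel's ISA extensions reference rather than on GCC's implementation. The helpers `lane_dot4_s8s8`, `dpbssd_ref` and `dpbssds_ref` are hypothetical and not part of GCC. The `su` and `uu` forms differ in treating the second source (or both sources) as unsigned bytes; the saturation rule for the unsigned forms is not modelled here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: dot product of the four signed bytes of lane i.
   a and b each hold 4*n bytes; lane i uses bytes 4*i .. 4*i+3. */
static int64_t
lane_dot4_s8s8 (const int8_t *a, const int8_t *b, size_t i)
{
  int64_t sum = 0;
  for (size_t k = 0; k < 4; k++)
    sum += (int64_t) a[4 * i + k] * (int64_t) b[4 * i + k];
  return sum;
}

/* Sketch of the non-saturating form (_mm_dpbssd_epi32):
   w[i] += dot4 (a, b, i), wrapping modulo 2^32. */
void
dpbssd_ref (int32_t *w, const int8_t *a, const int8_t *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      /* Add in unsigned arithmetic so overflow wraps instead of being
         undefined behaviour.  Converting back to int32_t relies on GCC's
         documented modulo behaviour for out-of-range values. */
      uint32_t acc = (uint32_t) w[i] + (uint32_t) lane_dot4_s8s8 (a, b, i);
      w[i] = (int32_t) acc;
    }
}

/* Sketch of the saturating form (_mm_dpbssds_epi32):
   the exact sum is clamped to [INT32_MIN, INT32_MAX]. */
void
dpbssds_ref (int32_t *w, const int8_t *a, const int8_t *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      int64_t acc = (int64_t) w[i] + lane_dot4_s8s8 (a, b, i);
      if (acc > INT32_MAX)
        acc = INT32_MAX;
      else if (acc < INT32_MIN)
        acc = INT32_MIN;
      w[i] = (int32_t) acc;
    }
}
```

A reference of this kind can be compared lane by lane against the intrinsic's result on random inputs, which is the same approach the existing `*-2.c` run tests take with their `CALC` helpers.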
Co-authored-by: Hongyu Wang Co-authored-by: Haochen Jiang --- gcc/common/config/i386/cpuinfo.h | 2 + gcc/common/config/i386/i386-common.cc | 22 ++- gcc/common/config/i386/i386-cpuinfo.h | 1 + gcc/common/config/i386/i386-isas.h | 2 + gcc/config.gcc | 2 +- gcc/config/i386/avxvnniint8intrin.h | 138 ++++++++++++++++++ gcc/config/i386/cpuid.h | 1 + gcc/config/i386/i386-builtin.def | 14 ++ gcc/config/i386/i386-c.cc | 2 + gcc/config/i386/i386-isa.def | 1 + gcc/config/i386/i386-options.cc | 4 +- gcc/config/i386/i386.opt | 5 + gcc/config/i386/immintrin.h | 2 + gcc/config/i386/sse.md | 31 ++++ gcc/doc/extend.texi | 5 + gcc/doc/invoke.texi | 9 +- gcc/doc/sourcebuild.texi | 3 + gcc/testsuite/g++.dg/other/i386-2.C | 2 +- gcc/testsuite/g++.dg/other/i386-3.C | 2 +- gcc/testsuite/gcc.target/i386/avx-check.h | 3 + gcc/testsuite/gcc.target/i386/avxvnniint8-1.c | 43 ++++++ .../gcc.target/i386/avxvnniint8-vpdpbssd-2.c | 72 +++++++++ .../gcc.target/i386/avxvnniint8-vpdpbssds-2.c | 72 +++++++++ .../gcc.target/i386/avxvnniint8-vpdpbsud-2.c | 72 +++++++++ .../gcc.target/i386/avxvnniint8-vpdpbsuds-2.c | 72 +++++++++ .../gcc.target/i386/avxvnniint8-vpdpbuud-2.c | 72 +++++++++ .../gcc.target/i386/avxvnniint8-vpdpbuuds-2.c | 72 +++++++++ gcc/testsuite/gcc.target/i386/funcspec-56.inc | 2 + gcc/testsuite/gcc.target/i386/sse-12.c | 2 +- gcc/testsuite/gcc.target/i386/sse-13.c | 2 +- gcc/testsuite/gcc.target/i386/sse-14.c | 2 +- gcc/testsuite/gcc.target/i386/sse-22.c | 4 +- gcc/testsuite/gcc.target/i386/sse-23.c | 2 +- gcc/testsuite/lib/target-supports.exp | 12 ++ 34 files changed, 738 insertions(+), 14 deletions(-) create mode 100644 gcc/config/i386/avxvnniint8intrin.h create mode 100644 gcc/testsuite/gcc.target/i386/avxvnniint8-1.c create mode 100644 gcc/testsuite/gcc.target/i386/avxvnniint8-vpdpbssd-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avxvnniint8-vpdpbssds-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avxvnniint8-vpdpbsud-2.c create mode 100644 
gcc/testsuite/gcc.target/i386/avxvnniint8-vpdpbsuds-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avxvnniint8-vpdpbuud-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avxvnniint8-vpdpbuuds-2.c diff --git a/gcc/common/config/i386/cpuinfo.h b/gcc/common/config/i386/cpuinfo.h index 9bb21c6cacc..bed88003f8e 100644 --- a/gcc/common/config/i386/cpuinfo.h +++ b/gcc/common/config/i386/cpuinfo.h @@ -795,6 +795,8 @@ get_available_features (struct __processor_model *cpu_model, set_feature (FEATURE_AVXVNNI); if (eax & bit_AVXIFMA) set_feature (FEATURE_AVXIFMA); + if (edx & bit_AVXVNNIINT8) + set_feature (FEATURE_AVXVNNIINT8); } if (avx512_usable) { diff --git a/gcc/common/config/i386/i386-common.cc b/gcc/common/config/i386/i386-common.cc index 4de7906b247..6a2a7e3d25a 100644 --- a/gcc/common/config/i386/i386-common.cc +++ b/gcc/common/config/i386/i386-common.cc @@ -108,6 +108,7 @@ along with GCC; see the file COPYING3. If not see #define OPTION_MASK_ISA2_AMX_TILE_SET OPTION_MASK_ISA2_AMX_TILE #define OPTION_MASK_ISA2_AMX_INT8_SET OPTION_MASK_ISA2_AMX_INT8 #define OPTION_MASK_ISA2_AMX_BF16_SET OPTION_MASK_ISA2_AMX_BF16 +#define OPTION_MASK_ISA2_AVXVNNIINT8_SET OPTION_MASK_ISA2_AVXVNNIINT8 /* SSE4 includes both SSE4.1 and SSE4.2. -msse4 should be the same as -msse4.2. */ @@ -214,7 +215,7 @@ along with GCC; see the file COPYING3. If not see (OPTION_MASK_ISA_AVX2 | OPTION_MASK_ISA_AVX512F_UNSET) #define OPTION_MASK_ISA2_AVX2_UNSET \ (OPTION_MASK_ISA2_AVXIFMA_UNSET | OPTION_MASK_ISA2_AVXVNNI_UNSET \ - | OPTION_MASK_ISA2_AVX512F_UNSET) + | OPTION_MASK_ISA2_AVXVNNIINT8_UNSET | OPTION_MASK_ISA2_AVX512F_UNSET) #define OPTION_MASK_ISA_AVX512F_UNSET \ (OPTION_MASK_ISA_AVX512F | OPTION_MASK_ISA_AVX512CD_UNSET \ | OPTION_MASK_ISA_AVX512PF_UNSET | OPTION_MASK_ISA_AVX512ER_UNSET \ @@ -278,6 +279,7 @@ along with GCC; see the file COPYING3. 
If not see #define OPTION_MASK_ISA2_KL_UNSET \ (OPTION_MASK_ISA2_KL | OPTION_MASK_ISA2_WIDEKL_UNSET) #define OPTION_MASK_ISA2_WIDEKL_UNSET OPTION_MASK_ISA2_WIDEKL +#define OPTION_MASK_ISA2_AVXVNNIINT8_UNSET OPTION_MASK_ISA2_AVXVNNIINT8 /* SSE4 includes both SSE4.1 and SSE4.2. -mno-sse4 should the same as -mno-sse4.1. */ @@ -1142,6 +1144,24 @@ ix86_handle_option (struct gcc_options *opts, } return true; + case OPT_mavxvnniint8: + if (value) + { + opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AVXVNNIINT8_SET; + opts->x_ix86_isa_flags2_explicit |= + OPTION_MASK_ISA2_AVXVNNIINT8_SET; + opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX2_SET; + opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX2_SET; + } + else + { + opts->x_ix86_isa_flags2 &= + ~OPTION_MASK_ISA2_AVXVNNIINT8_UNSET; + opts->x_ix86_isa_flags2_explicit |= + OPTION_MASK_ISA2_AVXVNNIINT8_UNSET; + } + return true; + case OPT_mfma: if (value) { diff --git a/gcc/common/config/i386/i386-cpuinfo.h b/gcc/common/config/i386/i386-cpuinfo.h index 968f9a56a6c..9a6b92fab79 100644 --- a/gcc/common/config/i386/i386-cpuinfo.h +++ b/gcc/common/config/i386/i386-cpuinfo.h @@ -241,6 +241,7 @@ enum processor_features FEATURE_X86_64_V3, FEATURE_X86_64_V4, FEATURE_AVXIFMA, + FEATURE_AVXVNNIINT8, CPU_FEATURE_MAX }; diff --git a/gcc/common/config/i386/i386-isas.h b/gcc/common/config/i386/i386-isas.h index b05b4bb8f0d..8c1f351056c 100644 --- a/gcc/common/config/i386/i386-isas.h +++ b/gcc/common/config/i386/i386-isas.h @@ -176,4 +176,6 @@ ISA_NAMES_TABLE_START ISA_NAMES_TABLE_ENTRY("x86-64-v3", FEATURE_X86_64_V3, P_X86_64_V3, NULL) ISA_NAMES_TABLE_ENTRY("x86-64-v4", FEATURE_X86_64_V4, P_X86_64_V4, NULL) ISA_NAMES_TABLE_ENTRY("avxifma", FEATURE_AVXIFMA, P_NONE, "-mavxifma") + ISA_NAMES_TABLE_ENTRY("avxvnniint8", FEATURE_AVXVNNIINT8, + P_NONE, "-mavxvnniint8") ISA_NAMES_TABLE_END diff --git a/gcc/config.gcc b/gcc/config.gcc index 12365abbf86..4df78238910 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -422,7 +422,7 @@ i[34567]86-*-* 
| x86_64-*-*) amxbf16intrin.h x86gprintrin.h uintrintrin.h hresetintrin.h keylockerintrin.h avxvnniintrin.h mwaitintrin.h avx512fp16intrin.h avx512fp16vlintrin.h - avxifmaintrin.h" + avxifmaintrin.h avxvnniint8intrin.h" ;; ia64-*-*) extra_headers=ia64intrin.h diff --git a/gcc/config/i386/avxvnniint8intrin.h b/gcc/config/i386/avxvnniint8intrin.h new file mode 100644 index 00000000000..362e6f65c2a --- /dev/null +++ b/gcc/config/i386/avxvnniint8intrin.h @@ -0,0 +1,138 @@ +/* Copyright (C) 2020 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + . */ + +#if !defined _IMMINTRIN_H_INCLUDED +#error "Never use directly; include instead." 
+#endif + +#ifndef _AVXVNNIINT8INTRIN_H_INCLUDED +#define _AVXVNNIINT8INTRIN_H_INCLUDED + +#if !defined(__AVXVNNIINT8__) +#pragma GCC push_options +#pragma GCC target("avxvnniint8") +#define __DISABLE_AVXVNNIINT8__ +#endif /* __AVXVNNIINT8__ */ + +extern __inline __m128i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_dpbssd_epi32 (__m128i __W, __m128i __A, __m128i __B) +{ + return (__m128i) + __builtin_ia32_vpdpbssd128 ((__v4si) __W, (__v4si) __A, (__v4si) __B); +} + +extern __inline __m128i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_dpbssds_epi32 (__m128i __W, __m128i __A, __m128i __B) +{ + return (__m128i) + __builtin_ia32_vpdpbssds128 ((__v4si) __W, (__v4si) __A, (__v4si) __B); +} + +extern __inline __m128i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_dpbsud_epi32 (__m128i __W, __m128i __A, __m128i __B) +{ + return (__m128i) + __builtin_ia32_vpdpbsud128 ((__v4si) __W, (__v4si) __A, (__v4si) __B); +} + +extern __inline __m128i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_dpbsuds_epi32 (__m128i __W, __m128i __A, __m128i __B) +{ + return (__m128i) + __builtin_ia32_vpdpbsuds128 ((__v4si) __W, (__v4si) __A, (__v4si) __B); +} + +extern __inline __m128i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_dpbuud_epi32 (__m128i __W, __m128i __A, __m128i __B) +{ + return (__m128i) + __builtin_ia32_vpdpbuud128 ((__v4si) __W, (__v4si) __A, (__v4si) __B); +} + +extern __inline __m128i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_dpbuuds_epi32 (__m128i __W, __m128i __A, __m128i __B) +{ + return (__m128i) + __builtin_ia32_vpdpbuuds128 ((__v4si) __W, (__v4si) __A, (__v4si) __B); +} + +extern __inline __m256i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_dpbssd_epi32 (__m256i __W, __m256i __A, __m256i __B) +{ + return (__m256i) + __builtin_ia32_vpdpbssd256 ((__v8si) __W, (__v8si) __A, 
(__v8si) __B); +} + +extern __inline __m256i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_dpbssds_epi32 (__m256i __W, __m256i __A, __m256i __B) +{ + return (__m256i) + __builtin_ia32_vpdpbssds256 ((__v8si) __W, (__v8si) __A, (__v8si) __B); +} + +extern __inline __m256i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_dpbsud_epi32 (__m256i __W, __m256i __A, __m256i __B) +{ + return (__m256i) + __builtin_ia32_vpdpbsud256 ((__v8si) __W, (__v8si) __A, (__v8si) __B); +} + +extern __inline __m256i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_dpbsuds_epi32 (__m256i __W, __m256i __A, __m256i __B) +{ + return (__m256i) + __builtin_ia32_vpdpbsuds256 ((__v8si) __W, (__v8si) __A, (__v8si) __B); +} + +extern __inline __m256i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_dpbuud_epi32 (__m256i __W, __m256i __A, __m256i __B) +{ + return (__m256i) + __builtin_ia32_vpdpbuud256 ((__v8si) __W, (__v8si) __A, (__v8si) __B); +} + +extern __inline __m256i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_dpbuuds_epi32 (__m256i __W, __m256i __A, __m256i __B) +{ + return (__m256i) + __builtin_ia32_vpdpbuuds256 ((__v8si) __W, (__v8si) __A, (__v8si) __B); +} + +#ifdef __DISABLE_AVXVNNIINT8__ +#undef __DISABLE_AVXVNNIINT8__ +#pragma GCC pop_options +#endif /* __DISABLE_AVXVNNIINT8__ */ + +#endif /* _AVXVNNIINT8INTRIN_H_INCLUDED */ diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h index 9885699efd5..f5fad22149a 100644 --- a/gcc/config/i386/cpuid.h +++ b/gcc/config/i386/cpuid.h @@ -49,6 +49,7 @@ #define bit_RDRND (1 << 30) /* %edx */ +#define bit_AVXVNNIINT8 (1 << 4) #define bit_CMPXCHG8B (1 << 8) #define bit_CMOV (1 << 15) #define bit_MMX (1 << 23) diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def index 4a89099a00f..e6edae5728b 100644 --- a/gcc/config/i386/i386-builtin.def +++
b/gcc/config/i386/i386-builtin.def @@ -2696,6 +2696,20 @@ BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_A BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpdpwssds_v4si_mask, "__builtin_ia32_vpdpwssds_v4si_mask", IX86_BUILTIN_VPDPWSSDSV4SI_MASK, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI) BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpdpwssds_v4si_maskz, "__builtin_ia32_vpdpwssds_v4si_maskz", IX86_BUILTIN_VPDPWSSDSV4SI_MASKZ, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI) +/* AVXVNNIINT8 */ +BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT8, CODE_FOR_vpdpbssd_v8si, "__builtin_ia32_vpdpbssd256", IX86_BUILTIN_VPDPBSSDV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI) +BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT8, CODE_FOR_vpdpbssds_v8si, "__builtin_ia32_vpdpbssds256", IX86_BUILTIN_VPDPBSSDSV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI) +BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT8, CODE_FOR_vpdpbsud_v8si, "__builtin_ia32_vpdpbsud256", IX86_BUILTIN_VPDPBSUDV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI) +BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT8, CODE_FOR_vpdpbsuds_v8si, "__builtin_ia32_vpdpbsuds256", IX86_BUILTIN_VPDPBSUDSV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI) +BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT8, CODE_FOR_vpdpbuud_v8si, "__builtin_ia32_vpdpbuud256", IX86_BUILTIN_VPDPBUUDV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI) +BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT8, CODE_FOR_vpdpbuuds_v8si, "__builtin_ia32_vpdpbuuds256", IX86_BUILTIN_VPDPBUUDSV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI) +BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT8, CODE_FOR_vpdpbssd_v4si, "__builtin_ia32_vpdpbssd128", IX86_BUILTIN_VPDPBSSDV4SI, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI) +BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT8, CODE_FOR_vpdpbssds_v4si, "__builtin_ia32_vpdpbssds128", IX86_BUILTIN_VPDPBSSDSV4SI, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI) +BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT8, CODE_FOR_vpdpbsud_v4si, 
"__builtin_ia32_vpdpbsud128", IX86_BUILTIN_VPDPBSUDV4SI, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI) +BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT8, CODE_FOR_vpdpbsuds_v4si, "__builtin_ia32_vpdpbsuds128", IX86_BUILTIN_VPDPBSUDSV4SI, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI) +BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT8, CODE_FOR_vpdpbuud_v4si, "__builtin_ia32_vpdpbuud128", IX86_BUILTIN_VPDPBUUDV4SI, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI) +BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT8, CODE_FOR_vpdpbuuds_v4si, "__builtin_ia32_vpdpbuuds128", IX86_BUILTIN_VPDPBUUDSV4SI, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI) + /* VPCLMULQDQ */ BDESC (OPTION_MASK_ISA_VPCLMULQDQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpclmulqdq_v2di, "__builtin_ia32_vpclmulqdq_v2di", IX86_BUILTIN_VPCLMULQDQ2, UNKNOWN, (int) V2DI_FTYPE_V2DI_V2DI_INT) BDESC (OPTION_MASK_ISA_VPCLMULQDQ | OPTION_MASK_ISA_AVX, 0, CODE_FOR_vpclmulqdq_v4di, "__builtin_ia32_vpclmulqdq_v4di", IX86_BUILTIN_VPCLMULQDQ4, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI_INT) diff --git a/gcc/config/i386/i386-c.cc b/gcc/config/i386/i386-c.cc index 3494ec035d5..a9a35c0a18a 100644 --- a/gcc/config/i386/i386-c.cc +++ b/gcc/config/i386/i386-c.cc @@ -635,6 +635,8 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag, def_or_undef (parse_in, "__AVXVNNI__"); if (isa_flag2 & OPTION_MASK_ISA2_AVXIFMA) def_or_undef (parse_in, "__AVXIFMA__"); + if (isa_flag2 & OPTION_MASK_ISA2_AVXVNNIINT8) + def_or_undef (parse_in, "__AVXVNNIINT8__"); if (TARGET_IAMCU) { def_or_undef (parse_in, "__iamcu"); diff --git a/gcc/config/i386/i386-isa.def b/gcc/config/i386/i386-isa.def index 6e0254ce418..c95b917c6ce 100644 --- a/gcc/config/i386/i386-isa.def +++ b/gcc/config/i386/i386-isa.def @@ -110,3 +110,4 @@ DEF_PTA(WIDEKL) DEF_PTA(AVXVNNI) DEF_PTA(AVX512FP16) DEF_PTA(AVXIFMA) +DEF_PTA(AVXVNNIINT8) diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc index 5facb64c2a8..3e6d04433a6 100644 --- a/gcc/config/i386/i386-options.cc +++ b/gcc/config/i386/i386-options.cc @@ 
-227,7 +227,8 @@ static struct ix86_target_opts isa2_opts[] = { "-mwidekl", OPTION_MASK_ISA2_WIDEKL }, { "-mavxvnni", OPTION_MASK_ISA2_AVXVNNI }, { "-mavx512fp16", OPTION_MASK_ISA2_AVX512FP16 }, - { "-mavxifma", OPTION_MASK_ISA2_AVXIFMA } + { "-mavxifma", OPTION_MASK_ISA2_AVXIFMA }, + { "-mavxvnniint8", OPTION_MASK_ISA2_AVXVNNIINT8 } }; static struct ix86_target_opts isa_opts[] = { @@ -1074,6 +1075,7 @@ ix86_valid_target_attribute_inner_p (tree fndecl, tree args, char *p_strings[], IX86_ATTR_ISA ("avxvnni", OPT_mavxvnni), IX86_ATTR_ISA ("avx512fp16", OPT_mavx512fp16), IX86_ATTR_ISA ("avxifma", OPT_mavxifma), + IX86_ATTR_ISA ("avxvnniint8", OPT_mavxvnniint8), /* enum options */ IX86_ATTR_ENUM ("fpmath=", OPT_mfpmath_), diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt index 36e28b7063d..53d534f6392 100644 --- a/gcc/config/i386/i386.opt +++ b/gcc/config/i386/i386.opt @@ -1219,3 +1219,8 @@ mavxifma Target Mask(ISA2_AVXIFMA) Var(ix86_isa_flags2) Save Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, and AVXIFMA built-in functions and code generation. + +mavxvnniint8 +Target Mask(ISA2_AVXVNNIINT8) Var(ix86_isa_flags2) Save +Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2 and +AVXVNNIINT8 built-in functions and code generation. 
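For reviewers, a scalar model of one 32-bit lane may help when reading the CALC reference functions in the run-time tests further down. This is a sketch under stated assumptions, not part of the patch: the names dp_sign, widen, dp_lane and dp_lane_selftest are hypothetical, and the model assumes the documented Intel behaviour, namely that the first source's bytes are signed for the SS and SU forms, the second source's bytes are signed only for SS, the plain forms wrap modulo 2^32, SSDS and SUDS saturate to the signed dword range, and UUDS saturates to the unsigned dword range.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: they model the arithmetic of one 32-bit lane of
   VPDPB{SS,SU,UU}D[S]; they are not part of GCC or <immintrin.h>.  */
typedef enum { DP_SS, DP_SU, DP_UU } dp_sign;

/* Sign- or zero-extend byte K of a 4-byte group.  */
static int64_t
widen (const uint8_t *p, int k, int is_signed)
{
  return is_signed ? (int64_t) (int8_t) p[k] : (int64_t) p[k];
}

/* Accumulator W plus the four byte products of A and B.  The sum is
   formed in 64 bits, so nothing overflows before the final step: the
   non-saturating forms wrap modulo 2^32, the saturating forms clamp.  */
static uint32_t
dp_lane (uint32_t w, const uint8_t a[4], const uint8_t b[4],
         dp_sign kind, int saturate)
{
  int a_signed = kind != DP_UU;    /* first source signed for SS and SU */
  int b_signed = kind == DP_SS;    /* second source signed only for SS */
  int acc_signed = kind != DP_UU;  /* accumulator signed for SS and SU */
  int64_t sum = acc_signed ? (int64_t) (int32_t) w : (int64_t) w;

  for (int k = 0; k < 4; k++)
    sum += widen (a, k, a_signed) * widen (b, k, b_signed);

  if (saturate)
    {
      if (acc_signed)
        {
          if (sum > INT32_MAX)
            sum = INT32_MAX;
          else if (sum < INT32_MIN)
            sum = INT32_MIN;
        }
      else if (sum > (int64_t) UINT32_MAX)
        sum = UINT32_MAX;
    }

  /* Conversion to uint32_t keeps the low 32 bits (wrap-around).  */
  return (uint32_t) sum;
}

/* A few spot checks worked out by hand.  */
static void
dp_lane_selftest (void)
{
  const uint8_t a[4] = { 0xff, 2, 3, 4 };  /* signed view: -1, 2, 3, 4 */
  const uint8_t b[4] = { 10, 20, 30, 40 };
  const uint8_t one[4] = { 1, 0, 0, 0 };
  const uint8_t neg1[4] = { 0xff, 0, 0, 0 };

  assert (dp_lane (0, a, b, DP_SS, 0) == 280u);    /* -10 + 40 + 90 + 160 */
  assert (dp_lane (0, a, b, DP_UU, 0) == 2840u);   /* 2550 + 40 + 90 + 160 */
  assert (dp_lane (0x7fffffffu, one, one, DP_SS, 0) == 0x80000000u);
  assert (dp_lane (0x7fffffffu, one, one, DP_SS, 1) == 0x7fffffffu);
  assert (dp_lane (0x80000000u, neg1, one, DP_SS, 1) == 0x80000000u);
  assert (dp_lane (0xffffffffu, one, one, DP_UU, 1) == 0xffffffffu);
  assert (dp_lane (0xffffffffu, one, one, DP_UU, 0) == 0u);
}
```

Under the same assumptions, _mm256_dpbssds_epi32 (W, A, B) would correspond to applying dp_lane (W[i], &A[4*i], &B[4*i], DP_SS, 1) to each of the eight dword lanes i. Forming the sum in 64 bits keeps the model simple and makes the point where saturation or wrap-around is applied explicit.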
diff --git a/gcc/config/i386/immintrin.h b/gcc/config/i386/immintrin.h index e9d4e975243..ddea249d09b 100644 --- a/gcc/config/i386/immintrin.h +++ b/gcc/config/i386/immintrin.h @@ -46,6 +46,8 @@ #include +#include <avxvnniint8intrin.h> + #include #include diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 331347569ea..49490a213ea 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -200,6 +200,13 @@ UNSPEC_COMPLEX_FCMUL UNSPEC_COMPLEX_MASK + ;; For AVX-VNNI-INT8 support + UNSPEC_VPDPBSSD + UNSPEC_VPDPBSSDS + UNSPEC_VPDPBSUD + UNSPEC_VPDPBSUDS + UNSPEC_VPDPBUUD + UNSPEC_VPDPBUUDS ]) (define_c_enum "unspecv" [ @@ -29241,3 +29248,27 @@ gcc_unreachable (); DONE; }) + +(define_int_iterator VPDOTPROD + [UNSPEC_VPDPBSSD + UNSPEC_VPDPBSSDS + UNSPEC_VPDPBSUD + UNSPEC_VPDPBSUDS + UNSPEC_VPDPBUUD + UNSPEC_VPDPBUUDS]) + +(define_int_attr vpdotprodtype + [(UNSPEC_VPDPBSSD "bssd") (UNSPEC_VPDPBSSDS "bssds") + (UNSPEC_VPDPBSUD "bsud") (UNSPEC_VPDPBSUDS "bsuds") + (UNSPEC_VPDPBUUD "buud") (UNSPEC_VPDPBUUDS "buuds")]) + +(define_insn "vpdp<vpdotprodtype>_<mode>" + [(set (match_operand:VI4_AVX 0 "register_operand" "=x") + (unspec:VI4_AVX + [(match_operand:VI4_AVX 1 "register_operand" "0") + (match_operand:VI4_AVX 2 "register_operand" "x") + (match_operand:VI4_AVX 3 "nonimmediate_operand" "xm")] + VPDOTPROD))] + "TARGET_AVXVNNIINT8" + "vpdp<vpdotprodtype>\t{%3, %2, %0|%0, %2, %3}" + [(set_attr "prefix" "vex")]) diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index edecf5c0070..9a8de9fc226 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -7065,6 +7065,11 @@ Enable/disable the generation of the AVXVNNI instructions. @cindex @code{target("avxifma")} function attribute, x86 Enable/disable the generation of the AVXIFMA instructions. +@item avxvnniint8 +@itemx no-avxvnniint8 +@cindex @code{target("avxvnniint8")} function attribute, x86 +Enable/disable the generation of the AVXVNNIINT8 instructions.
+ @item cld @itemx no-cld @cindex @code{target("cld")} function attribute, x86 diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 886fc1d0164..d4ff7549bf3 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -1436,7 +1436,7 @@ See RS/6000 and PowerPC Options. -mavx5124fmaps -mavx512vnni -mavx5124vnniw -mprfchw -mrdpid @gol -mrdseed -msgx -mavx512vp2intersect -mserialize -mtsxldtrk@gol -mamx-tile -mamx-int8 -mamx-bf16 -muintr -mhreset -mavxvnni@gol --mavx512fp16 -mavxifma @gol +-mavx512fp16 -mavxifma -mavxvnniint8 @gol -mcldemote -mms-bitfields -mno-align-stringops -minline-all-stringops @gol -minline-stringops-dynamically -mstringop-strategy=@var{alg} @gol -mkl -mwidekl @gol @@ -32896,6 +32896,9 @@ preferred alignment to @option{-mpreferred-stack-boundary=2}. @need 200 @itemx -mavxifma @opindex mavxifma +@need 200 +@itemx -mavxvnniint8 +@opindex mavxvnniint8 These switches enable the use of instructions in the MMX, SSE, SSE2, SSE3, SSSE3, SSE4, SSE4A, SSE4.1, SSE4.2, AVX, AVX2, AVX512F, AVX512PF, AVX512ER, AVX512CD, AVX512VL, AVX512BW, AVX512DQ, AVX512IFMA, AVX512VBMI, SHA, @@ -32906,8 +32909,8 @@ XSAVEOPT, XSAVEC, XSAVES, RTM, HLE, TBM, MWAITX, CLZERO, PKU, AVX512VBMI2, GFNI, VAES, WAITPKG, VPCLMULQDQ, AVX512BITALG, MOVDIRI, MOVDIR64B, AVX512BF16, ENQCMD, AVX512VPOPCNTDQ, AVX5124FMAPS, AVX512VNNI, AVX5124VNNIW, SERIALIZE, UINTR, HRESET, AMXTILE, AMXINT8, AMXBF16, KL, WIDEKL, AVXVNNI, AVX512FP16, -AVXIFMA or CLDEMOTE extended instruction sets. Each has a corresponding -@option{-mno-} option to disable use of these instructions. +AVXIFMA, AVXVNNIINT8 or CLDEMOTE extended instruction sets. Each has a +corresponding @option{-mno-} option to disable use of these instructions. 
These extensions are also available as built-in functions: see @ref{x86 Built-in Functions}, for details of the functions enabled and diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi index 0173acf4a65..e21a1d381e0 100644 --- a/gcc/doc/sourcebuild.texi +++ b/gcc/doc/sourcebuild.texi @@ -2493,6 +2493,9 @@ Target supports the execution of @code{avx512vp2intersect} instructions. @item avxifma Target supports the execution of @code{avxifma} instructions. +@item avxvnniint8 +Target supports the execution of @code{avxvnniint8} instructions. + @item amx_tile Target supports the execution of @code{amx-tile} instructions. diff --git a/gcc/testsuite/g++.dg/other/i386-2.C b/gcc/testsuite/g++.dg/other/i386-2.C index 5388606779b..ebd01fe47bc 100644 --- a/gcc/testsuite/g++.dg/other/i386-2.C +++ b/gcc/testsuite/g++.dg/other/i386-2.C @@ -1,5 +1,5 @@ /* { dg-do compile { target i?86-*-* x86_64-*-* } } */ -/* { dg-options "-O -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16 -mavxifma" } */ +/* { dg-options "-O -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi 
-mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16 -mavxifma -mavxvnniint8" } */ /* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, mm3dnow.h, fma4intrin.h, xopintrin.h, abmintrin.h, bmiintrin.h, tbmintrin.h, lwpintrin.h, diff --git a/gcc/testsuite/g++.dg/other/i386-3.C b/gcc/testsuite/g++.dg/other/i386-3.C index 86cedd3d32f..b66498f1d4c 100644 --- a/gcc/testsuite/g++.dg/other/i386-3.C +++ b/gcc/testsuite/g++.dg/other/i386-3.C @@ -1,5 +1,5 @@ /* { dg-do compile { target i?86-*-* x86_64-*-* } } */ -/* { dg-options "-O -fkeep-inline-functions -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16 -mavxifma" } */ +/* { dg-options "-O -fkeep-inline-functions -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd 
-mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16 -mavxifma -mavxvnniint8" } */ /* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, mm3dnow.h, fma4intrin.h, xopintrin.h, abmintrin.h, bmiintrin.h, tbmintrin.h, lwpintrin.h, diff --git a/gcc/testsuite/gcc.target/i386/avx-check.h b/gcc/testsuite/gcc.target/i386/avx-check.h index 24ee6ab4efd..77507ca2edc 100644 --- a/gcc/testsuite/gcc.target/i386/avx-check.h +++ b/gcc/testsuite/gcc.target/i386/avx-check.h @@ -25,6 +25,9 @@ main () && avx_os_support () #ifdef AVXIFMA && __builtin_cpu_supports ("avxifma") +#endif +#ifdef AVXVNNIINT8 + && __builtin_cpu_supports ("avxvnniint8") #endif ) { diff --git a/gcc/testsuite/gcc.target/i386/avxvnniint8-1.c b/gcc/testsuite/gcc.target/i386/avxvnniint8-1.c new file mode 100644 index 00000000000..d6942f34d6e --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avxvnniint8-1.c @@ -0,0 +1,43 @@ +/* { dg-do compile } */ +/* { dg-options "-mavxvnniint8 -O2" } */ +/* { dg-final { scan-assembler-times "vpdpbssd\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpbssd\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpbssds\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpbssds\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpbsud\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpbsud\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpbsuds\[ 
\\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpbsuds\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpbuud\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpbuud\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpbuuds\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpbuuds\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ + + +#include <immintrin.h> + +volatile __m256i x,y,z; +volatile __m128i x_,y_,z_; +volatile __mmask8 m; + +void extern +avxvnniint8_test (void) +{ + x = _mm256_dpbssd_epi32 (x, y, z); + x_ = _mm_dpbssd_epi32 (x_, y_, z_); + + x = _mm256_dpbssds_epi32 (x, y, z); + x_ = _mm_dpbssds_epi32 (x_, y_, z_); + + x = _mm256_dpbsud_epi32 (x, y, z); + x_ = _mm_dpbsud_epi32 (x_, y_, z_); + + x = _mm256_dpbsuds_epi32 (x, y, z); + x_ = _mm_dpbsuds_epi32 (x_, y_, z_); + + x = _mm256_dpbuud_epi32 (x, y, z); + x_ = _mm_dpbuud_epi32 (x_, y_, z_); + + x = _mm256_dpbuuds_epi32 (x, y, z); + x_ = _mm_dpbuuds_epi32 (x_, y_, z_); +} diff --git a/gcc/testsuite/gcc.target/i386/avxvnniint8-vpdpbssd-2.c b/gcc/testsuite/gcc.target/i386/avxvnniint8-vpdpbssd-2.c new file mode 100644 index 00000000000..5016de39621 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avxvnniint8-vpdpbssd-2.c @@ -0,0 +1,72 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavxvnniint8" } */ +/* { dg-require-effective-target avxvnniint8 } */ +#define AVXVNNIINT8 +#ifndef CHECK +#define CHECK "avx-check.h" +#endif + +#ifndef TEST +#define TEST avx_test +#endif + +#include CHECK + +static void +CALC (int *r, int
*dst, char *s1, char *s2, int size) +{ + short tempres[32]; + for (int i = 0; i < size; i++) { + tempres[i] = (short) s1[i] * (short) s2[i]; + } + for (int i = 0; i < size / 4; i++) { + long long test = (long long) dst[i] + tempres[i * 4] + tempres[i * 4 + 1] + + tempres[i * 4 + 2] + tempres[i * 4 + 3]; + r[i] = test; + } +} + +void +TEST (void) +{ + int i; + union256i_d res_256; + union256i_b src2_256; + union256i_b src1_256; + int res_ref_256[8]; + + for (i = 0; i < 32; i++) + { + int sign = i % 2 ? 1 : -1; + src1_256.a[i] = 10 + 3 * i + sign; + src2_256.a[i] = sign * 10 * i * i; + } + + for (i = 0; i < 8; i++) + res_256.a[i] = 0x7fffffff; + + CALC (res_ref_256, res_256.a, src1_256.a, src2_256.a, 32); + res_256.x = _mm256_dpbssd_epi32 (res_256.x, src1_256.x, src2_256.x); + if (check_union256i_d (res_256, res_ref_256)) + abort (); + + union128i_d res_128; + union128i_b src2_128; + union128i_b src1_128; + int res_ref_128[4]; + + for (i = 0; i < 16; i++) + { + int sign = i % 2 ? 1 : -1; + src1_128.a[i] = 10 + 3 * i * i + sign; + src2_128.a[i] = sign * 10 * i * i; + } + + for (i = 0; i < 4; i++) + res_128.a[i] = 0x7fffffff; + + CALC (res_ref_128, res_128.a, src1_128.a, src2_128.a, 16); + res_128.x = _mm_dpbssd_epi32 (res_128.x, src1_128.x, src2_128.x); + if (check_union128i_d (res_128, res_ref_128)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avxvnniint8-vpdpbssds-2.c b/gcc/testsuite/gcc.target/i386/avxvnniint8-vpdpbssds-2.c new file mode 100644 index 00000000000..6de5062e917 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avxvnniint8-vpdpbssds-2.c @@ -0,0 +1,72 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavxvnniint8" } */ +/* { dg-require-effective-target avxvnniint8 } */ +#define AVXVNNIINT8 +#ifndef CHECK +#define CHECK "avx-check.h" +#endif + +#ifndef TEST +#define TEST avx_test +#endif + +#include CHECK + +static void +CALC (int *r, int *dst, char *s1, char *s2, int size) +{ + short tempres[32]; + for (int i = 0; i < size; i++) { + tempres[i] 
= (short) s1[i] * (short) s2[i]; + } + for (int i = 0; i < size / 4; i++) { + long long test = (long long) dst[i] + tempres[i * 4] + tempres[i * 4 + 1] + + tempres[i * 4 + 2] + tempres[i * 4 + 3]; + r[i] = test > 0x7FFFFFFF ? 0x7FFFFFFF : test < -0x7FFFFFFF - 1 ? -0x7FFFFFFF - 1 : test; + } +} + +void +TEST (void) +{ + int i; + union256i_d res_256; + union256i_b src2_256; + union256i_b src1_256; + int res_ref_256[8]; + + for (i = 0; i < 32; i++) + { + int sign = i % 2 ? 1 : -1; + src1_256.a[i] = 10 + 3 * i + sign; + src2_256.a[i] = sign * 10 * i * i; + } + + for (i = 0; i < 8; i++) + res_256.a[i] = 0x7fffffff; + + CALC (res_ref_256, res_256.a, src1_256.a, src2_256.a, 32); + res_256.x = _mm256_dpbssds_epi32 (res_256.x, src1_256.x, src2_256.x); + if (check_union256i_d (res_256, res_ref_256)) + abort (); + + union128i_d res_128; + union128i_b src2_128; + union128i_b src1_128; + int res_ref_128[4]; + + for (i = 0; i < 16; i++) + { + int sign = i % 2 ? 1 : -1; + src1_128.a[i] = 10 + 3 * i * i + sign; + src2_128.a[i] = sign * 10 * i * i; + } + + for (i = 0; i < 4; i++) + res_128.a[i] = 0x7fffffff; + + CALC (res_ref_128, res_128.a, src1_128.a, src2_128.a, 16); + res_128.x = _mm_dpbssds_epi32 (res_128.x, src1_128.x, src2_128.x); + if (check_union128i_d (res_128, res_ref_128)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avxvnniint8-vpdpbsud-2.c b/gcc/testsuite/gcc.target/i386/avxvnniint8-vpdpbsud-2.c new file mode 100644 index 00000000000..6e4ffd1c7be --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avxvnniint8-vpdpbsud-2.c @@ -0,0 +1,72 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavxvnniint8" } */ +/* { dg-require-effective-target avxvnniint8 } */ +#define AVXVNNIINT8 +#ifndef CHECK +#define CHECK "avx-check.h" +#endif + +#ifndef TEST +#define TEST avx_test +#endif + +#include CHECK + +static void +CALC (int *r, int *dst, char *s1, unsigned char *s2, int size) +{ + short tempres[32]; + for (int i = 0; i < size; i++) { + tempres[i] = (short) s1[i] * (unsigned short) s2[i]; + } + for (int i = 0; i
< size / 4; i++) { + long long test = (long long) dst[i] + tempres[i * 4] + tempres[i * 4 + 1] + + tempres[i * 4 + 2] + tempres[i * 4 + 3]; + r[i] = test; + } +} + +void +TEST (void) +{ + int i; + union256i_d res_256; + union256i_b src1_256; + union256i_ub src2_256; + int res_ref_256[8]; + + for (i = 0; i < 32; i++) + { + int sign = i % 2 ? 1 : -1; + src1_256.a[i] = 10 + 3 * i + sign; + src2_256.a[i] = sign * 10 * i * i; + } + + for (i = 0; i < 8; i++) + res_256.a[i] = 0x7fffffff; + + CALC (res_ref_256, res_256.a, src1_256.a, src2_256.a, 32); + res_256.x = _mm256_dpbsud_epi32 (res_256.x, src1_256.x, src2_256.x); + if (check_union256i_d (res_256, res_ref_256)) + abort (); + + union128i_d res_128; + union128i_b src1_128; + union128i_ub src2_128; + int res_ref_128[4]; + + for (i = 0; i < 16; i++) + { + int sign = i % 2 ? 1 : -1; + src1_128.a[i] = 10 + 3 * i * i + sign; + src2_128.a[i] = sign * 10 * i * i; + } + + for (i = 0; i < 4; i++) + res_128.a[i] = 0x7fffffff; + + CALC (res_ref_128, res_128.a, src1_128.a, src2_128.a, 16); + res_128.x = _mm_dpbsud_epi32 (res_128.x, src1_128.x, src2_128.x); + if (check_union128i_d (res_128, res_ref_128)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avxvnniint8-vpdpbsuds-2.c b/gcc/testsuite/gcc.target/i386/avxvnniint8-vpdpbsuds-2.c new file mode 100644 index 00000000000..ad4b6047ecd --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avxvnniint8-vpdpbsuds-2.c @@ -0,0 +1,72 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavxvnniint8" } */ +/* { dg-require-effective-target avxvnniint8 } */ +#define AVXVNNIINT8 +#ifndef CHECK +#define CHECK "avx-check.h" +#endif + +#ifndef TEST +#define TEST avx_test +#endif + +#include CHECK + +static void +CALC (int *r, int *dst, char *s1, unsigned char *s2, int size) +{ + short tempres[32]; + for (int i = 0; i < size; i++) { + tempres[i] = (short) s1[i] * (unsigned short) s2[i]; + } + for (int i = 0; i < size / 4; i++) { + long long test = (long long) dst[i] + tempres[i * 4] + tempres[i * 
4 + 1] + + tempres[i * 4 + 2] + tempres[i * 4 + 3]; + r[i] = test > 0x7FFFFFFF ? 0x7FFFFFFF : test < -0x7FFFFFFF - 1 ? -0x7FFFFFFF - 1 : test; + } +} + +void +TEST (void) +{ + int i; + union256i_d res_256; + union256i_b src1_256; + union256i_ub src2_256; + int res_ref_256[8]; + + for (i = 0; i < 32; i++) + { + int sign = i % 2 ? 1 : -1; + src1_256.a[i] = 10 + 3 * i + sign; + src2_256.a[i] = sign * 10 * i * i; + } + + for (i = 0; i < 8; i++) + res_256.a[i] = 0x7fffffff; + + CALC (res_ref_256, res_256.a, src1_256.a, src2_256.a, 32); + res_256.x = _mm256_dpbsuds_epi32 (res_256.x, src1_256.x, src2_256.x); + if (check_union256i_d (res_256, res_ref_256)) + abort (); + + union128i_d res_128; + union128i_b src1_128; + union128i_ub src2_128; + int res_ref_128[4]; + + for (i = 0; i < 16; i++) + { + int sign = i % 2 ? 1 : -1; + src1_128.a[i] = 10 + 3 * i * i + sign; + src2_128.a[i] = sign * 10 * i * i; + } + + for (i = 0; i < 4; i++) + res_128.a[i] = 0x7fffffff; + + CALC (res_ref_128, res_128.a, src1_128.a, src2_128.a, 16); + res_128.x = _mm_dpbsuds_epi32 (res_128.x, src1_128.x, src2_128.x); + if (check_union128i_d (res_128, res_ref_128)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avxvnniint8-vpdpbuud-2.c b/gcc/testsuite/gcc.target/i386/avxvnniint8-vpdpbuud-2.c new file mode 100644 index 00000000000..6590915a459 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avxvnniint8-vpdpbuud-2.c @@ -0,0 +1,72 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavxvnniint8" } */ +/* { dg-require-effective-target avxvnniint8 } */ +#define AVXVNNIINT8 +#ifndef CHECK +#define CHECK "avx-check.h" +#endif + +#ifndef TEST +#define TEST avx_test +#endif + +#include CHECK + +static void +CALC (unsigned int *r, unsigned int *dst, unsigned char *s1, unsigned char *s2, int size) +{ + unsigned short tempres[32]; + for (int i = 0; i < size; i++) { + tempres[i] = (unsigned short) s1[i] * (unsigned short) s2[i]; + } + for (int i = 0; i < size / 4; i++) { + unsigned int test = (unsigned int) dst[i] + tempres[i * 4] + 
tempres[i * 4 +
1] + + tempres[i * 4 + 2] + tempres[i * 4 + 3]; + r[i] = test; + } +} + +void +TEST (void) +{ + int i; + union256i_ud res_256; + union256i_ub src2_256; + union256i_ub src1_256; + unsigned int res_ref_256[8]; + + for (i = 0; i < 32; i++) + { + int sign = i % 2 ? 1 : -1; + src1_256.a[i] = 10 + 3 * i + sign; + src2_256.a[i] = sign * 10 * i * i; + } + + for (i = 0; i < 8; i++) + res_256.a[i] = 0x7fffffff; + + CALC (res_ref_256, res_256.a, src1_256.a, src2_256.a, 32); + res_256.x = _mm256_dpbuud_epi32 (res_256.x, src1_256.x, src2_256.x); + if (check_union256i_ud (res_256, res_ref_256)) + abort (); + + union128i_ud res_128; + union128i_ub src2_128; + union128i_ub src1_128; + unsigned int res_ref_128[4]; + + for (i = 0; i < 16; i++) + { + int sign = i % 2 ? 1 : -1; + src1_128.a[i] = 10 + 3 * i * i + sign; + src2_128.a[i] = sign * 10 * i * i; + } + + for (i = 0; i < 4; i++) + res_128.a[i] = 0x7fffffff; + + CALC (res_ref_128, res_128.a, src1_128.a, src2_128.a, 16); + res_128.x = _mm_dpbuud_epi32 (res_128.x, src1_128.x, src2_128.x); + if (check_union128i_ud (res_128, res_ref_128)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avxvnniint8-vpdpbuuds-2.c b/gcc/testsuite/gcc.target/i386/avxvnniint8-vpdpbuuds-2.c new file mode 100644 index 00000000000..970e4a5d408 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avxvnniint8-vpdpbuuds-2.c @@ -0,0 +1,72 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavxvnniint8" } */ +/* { dg-require-effective-target avxvnniint8 } */ +#define AVXVNNIINT8 +#ifndef CHECK +#define CHECK "avx-check.h" +#endif + +#ifndef TEST +#define TEST avx_test +#endif + +#include CHECK + +static void +CALC (unsigned int *r, unsigned int *dst, unsigned char *s1, unsigned char *s2, int size) +{ + unsigned short tempres[32]; + for (int i = 0; i < size; i++) { + tempres[i] = (unsigned short) s1[i] * (unsigned short) s2[i]; + } + for (int i = 0; i < size / 4; i++) { + unsigned long long test = (unsigned long long) dst[i] + tempres[i * 4] + tempres[i * 4 + 1] +
tempres[i * 4 + 2] + tempres[i * 4 + 3]; + r[i] = test > 0xFFFFFFFF ? 0xFFFFFFFF : test; + } +} + +void +TEST (void) +{ + int i; + union256i_ud res_256; + union256i_ub src2_256; + union256i_ub src1_256; + unsigned int res_ref_256[8]; + + for (i = 0; i < 32; i++) + { + int sign = i % 2 ? 1 : -1; + src1_256.a[i] = 10 + 3 * i + sign; + src2_256.a[i] = sign * 10 * i * i; + } + + for (i = 0; i < 8; i++) + res_256.a[i] = 0x7fffffff; + + CALC (res_ref_256, res_256.a, src1_256.a, src2_256.a, 32); + res_256.x = _mm256_dpbuuds_epi32 (res_256.x, src1_256.x, src2_256.x); + if (check_union256i_ud (res_256, res_ref_256)) + abort (); + + union128i_ud res_128; + union128i_ub src2_128; + union128i_ub src1_128; + unsigned int res_ref_128[4]; + + for (i = 0; i < 16; i++) + { + int sign = i % 2 ? 1 : -1; + src1_128.a[i] = 10 + 3 * i * i + sign; + src2_128.a[i] = sign * 10 * i * i; + } + + for (i = 0; i < 4; i++) + res_128.a[i] = 0x7fffffff; + + CALC (res_ref_128, res_128.a, src1_128.a, src2_128.a, 16); + res_128.x = _mm_dpbuuds_epi32 (res_128.x, src1_128.x, src2_128.x); + if (check_union128i_ud (res_128, res_ref_128)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/funcspec-56.inc b/gcc/testsuite/gcc.target/i386/funcspec-56.inc index 466555c0d06..a681bffe3e7 100644 --- a/gcc/testsuite/gcc.target/i386/funcspec-56.inc +++ b/gcc/testsuite/gcc.target/i386/funcspec-56.inc @@ -81,6 +81,7 @@ extern void test_widekl (void) __attribute__((__target__("widekl"))); extern void test_avxvnni (void) __attribute__((__target__("avxvnni"))); extern void test_avx512fp16 (void) __attribute__((__target__("avx512fp16"))); extern void test_avxifma (void) __attribute__((__target__("avxifma"))); +extern void test_avxvnniint8 (void) __attribute__((__target__("avxvnniint8"))); extern void test_no_sgx (void) __attribute__((__target__("no-sgx"))); extern void test_no_avx5124fmaps(void) __attribute__((__target__("no-avx5124fmaps"))); @@ -163,6 +164,7 @@ extern void test_no_widekl (void) 
__attribute__((__target__("no-widekl"))); extern void test_no_avxvnni (void) __attribute__((__target__("no-avxvnni"))); extern void test_no_avx512fp16 (void) __attribute__((__target__("no-avx512fp16"))); extern void test_no_avxifma (void) __attribute__((__target__("no-avxifma"))); +extern void test_no_avxvnniint8 (void) __attribute__((__target__("no-avxvnniint8"))); extern void test_arch_nocona (void) __attribute__((__target__("arch=nocona"))); extern void test_arch_core2 (void) __attribute__((__target__("arch=core2"))); diff --git a/gcc/testsuite/gcc.target/i386/sse-12.c b/gcc/testsuite/gcc.target/i386/sse-12.c index fde56261d8f..ddde2df6657 100644 --- a/gcc/testsuite/gcc.target/i386/sse-12.c +++ b/gcc/testsuite/gcc.target/i386/sse-12.c @@ -3,7 +3,7 @@ popcntintrin.h gfniintrin.h and mm_malloc.h are usable with -O -std=c89 -pedantic-errors. */ /* { dg-do compile } */ -/* { dg-options "-O -std=c89 -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512bw -mavx512dq -mavx512vl -mavx512vbmi -mavx512vbmi2 -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavxifma" } */ +/* { dg-options "-O -std=c89 -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512bw -mavx512dq -mavx512vl -mavx512vbmi -mavx512vbmi2 -mavx512ifma -mavx5124fmaps 
-mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavxifma -mavxvnniint8" } */ #include diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c index bb29555babe..2b293216c6f 100644 --- a/gcc/testsuite/gcc.target/i386/sse-13.c +++ b/gcc/testsuite/gcc.target/i386/sse-13.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512vbmi2 -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mavx512vp2intersect -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16 -mavxifma" } */ +/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512vbmi2 -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mavx512vp2intersect -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16 -mavxifma -mavxvnniint8" } */ /* { dg-add-options 
bind_pic_locally } */ #include diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c index f2701ddaaf9..78b51048b90 100644 --- a/gcc/testsuite/gcc.target/i386/sse-14.c +++ b/gcc/testsuite/gcc.target/i386/sse-14.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mpconfig -mwbnoinvd -mavx512vl -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16 -mavxifma" } */ +/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mpconfig -mwbnoinvd -mavx512vl -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16 -mavxifma -mavxvnniint8" } */ /* { dg-add-options bind_pic_locally } */ #include diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c index 3d196975b1e..cc1c8cfa4be 100644 --- a/gcc/testsuite/gcc.target/i386/sse-22.c +++ b/gcc/testsuite/gcc.target/i386/sse-22.c @@ -103,7 
+103,7 @@ #ifndef DIFFERENT_PRAGMAS -#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,avx512vl,avx512bw,avx512dq,avx512vbmi,avx512vbmi2,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16,avxifma") +#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,avx512vl,avx512bw,avx512dq,avx512vbmi,avx512vbmi2,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16,avxifma,avxvnniint8") #endif /* Following intrinsics require immediate arguments. They @@ -220,7 +220,7 @@ test_4 (_mm_cmpestrz, int, __m128i, int, __m128i, int, 1) /* immintrin.h (AVX/AVX2/RDRND/FSGSBASE/F16C/RTM/AVX512F/SHA) */ #ifdef DIFFERENT_PRAGMAS -#pragma GCC target ("avx,avx2,rdrnd,fsgsbase,f16c,rtm,avx512f,avx512er,avx512cd,avx512pf,sha,avx512vl,avx512bw,avx512dq,avx512ifma,avx512vbmi,avx512vbmi2,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16,avxifma") +#pragma GCC target ("avx,avx2,rdrnd,fsgsbase,f16c,rtm,avx512f,avx512er,avx512cd,avx512pf,sha,avx512vl,avx512bw,avx512dq,avx512ifma,avx512vbmi,avx512vbmi2,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16,avxifma,avxvnniint8") #endif #include test_1 (_cvtss_sh, unsigned short, float, 1) diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c index 
d3a233f90fc..270f4483491 100644 --- a/gcc/testsuite/gcc.target/i386/sse-23.c +++ b/gcc/testsuite/gcc.target/i386/sse-23.c @@ -843,6 +843,6 @@ #define __builtin_ia32_vpclmulqdq_v2di(A, B, C) __builtin_ia32_vpclmulqdq_v2di(A, B, 1) #define __builtin_ia32_vpclmulqdq_v8di(A, B, C) __builtin_ia32_vpclmulqdq_v8di(A, B, 1) -#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,fma,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,xsavec,xsaves,clflushopt,avx512bw,avx512dq,avx512vl,avx512vbmi,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,clwb,mwaitx,clzero,pku,sgx,rdpid,gfni,avx512vbmi2,vpclmulqdq,avx512bitalg,pconfig,wbnoinvd,avx512bf16,enqcmd,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16,avxifma") +#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,fma,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,xsavec,xsaves,clflushopt,avx512bw,avx512dq,avx512vl,avx512vbmi,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,clwb,mwaitx,clzero,pku,sgx,rdpid,gfni,avx512vbmi2,vpclmulqdq,avx512bitalg,pconfig,wbnoinvd,avx512bf16,enqcmd,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16,avxifma,avxvnniint8") #include diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index 69de3b96bfc..64ccfc746bd 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -9518,6 +9518,18 @@ proc check_effective_target_avxifma { } { } "-O0 -mavxifma" ] } +# Return 1 if avxvnniint8 instructions can be compiled. 
+proc check_effective_target_avxvnniint8 { } { + return [check_no_compiler_messages avxvnniint8 object { + typedef int __v8si __attribute__ ((__vector_size__ (32))); + __v8si + _mm256_dpbssd_epi32 (__v8si __A, __v8si __B, __v8si __C) + { + return __builtin_ia32_vpdpbssd256 (__A, __B, __C); + } + } "-O0 -mavxvnniint8" ] +} + # Return 1 if sse instructions can be compiled. proc check_effective_target_sse { } { return [check_no_compiler_messages sse object { From patchwork Fri Oct 14 07:54:42 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Jiang, Haochen" X-Patchwork-Id: 1689921
To: gcc-patches@gcc.gnu.org Subject: [PATCH 3/6] i386: Add intrinsic for vector __bf16 Date: Fri, 14 Oct 2022 15:54:42 +0800 Message-Id: <20221014075445.7938-4-haochen.jiang@intel.com> X-Mailer: git-send-email 2.18.1 In-Reply-To: <20221014075445.7938-1-haochen.jiang@intel.com> References: <20221014075445.7938-1-haochen.jiang@intel.com> From: "Jiang, Haochen" Reply-To: Haochen Jiang Cc: hongtao.liu@intel.com
From: konglin1 gcc/ChangeLog: * config/i386/avx512fp16intrin.h : New intrinsic. (_mm_load_sbf16): Ditto. (_mm_mask_load_sbf16): Ditto. (_mm_maskz_load_sbf16): Ditto. (_mm_mask_store_sbf16): Ditto. (_mm_mask_move_sbf16): Ditto. (_mm_maskz_move_sbf16): Ditto. * config/i386/avx512bf16intrin.h: New intrinsic. (_mm_setzero_pbf16): Ditto. (_mm256_setzero_pbf16): Ditto. (_mm512_setzero_pbf16): Ditto. (_mm512_undefined_pbf16): Ditto. (_mm512_set1_pbf16): Ditto. (_mm512_set_pbf16): Ditto. (_mm512_setr_pbf16): Ditto. (_mm_castpbf16_ps): Ditto. (_mm256_castpbf16_ps): Ditto. (_mm512_castpbf16_ps): Ditto. (_mm_castpbf16_pd): Ditto. (_mm256_castpbf16_pd): Ditto. (_mm512_castpbf16_pd): Ditto. (_mm_castpbf16_si128): Ditto. (_mm256_castpbf16_si256): Ditto. (_mm512_castpbf16_si512): Ditto. (_mm_castps_pbf16): Ditto. (_mm256_castps_pbf16): Ditto. (_mm512_castps_pbf16): Ditto. (_mm_castpd_pbf16): Ditto. (_mm256_castpd_pbf16): Ditto. (_mm512_castpd_pbf16): Ditto. (_mm_castsi128_pbf16): Ditto. (_mm256_castsi256_pbf16): Ditto. (_mm512_castsi512_pbf16): Ditto. (_mm256_castpbf16256_pbf16128): Ditto. (_mm512_castpbf16512_pbf16128): Ditto. (_mm512_castpbf16512_pbf16256): Ditto. (_mm256_castpbf16128_pbf16256): Ditto. (_mm512_castpbf16128_pbf16512): Ditto. (_mm512_castpbf16256_pbf16512): Ditto. (_mm256_zextpbf16128_pbf16256): Ditto. (_mm512_zextpbf16128_pbf16512): Ditto. (_mm512_zextpbf16256_pbf16512): Ditto. (_mm512_abs_pbf16): Ditto. (_mm512_load_pbf16): Ditto. (_mm256_load_pbf16): Ditto.
(_mm_load_pbf16): Ditto. (_mm512_loadu_pbf16): Ditto. (_mm256_loadu_pbf16): Ditto. (_mm_loadu_pbf16): Ditto. (_mm_store_sbf16): Ditto. (_mm512_store_pbf16): Ditto. (_mm256_store_pbf16): Ditto. (_mm_store_pbf16): Ditto. (_mm512_storeu_pbf16): Ditto. (_mm256_storeu_pbf16): Ditto. (_mm_storeu_pbf16): Ditto. (_mm_move_sbf16): Ditto. (_mm512_mask_blend_pbf16): Ditto. (_mm512_permutex2var_pbf16): Ditto. (_mm512_permutexvar_pbf16): Ditto. (_mm512_bcstnebf16_ps): Ditto. (_mm512_mask_bcstnebf16_ps): Ditto. (_mm512_bcstnesh_ps): Ditto. (_mm512_mask_bcstnesh_ps): Ditto. (_mm512_maskz_bcstnesh_ps): Ditto. (_mm512_cvtne2ps_ph): Ditto. (_mm512_mask_cvtne2ps_ph): Ditto. (_mm512_cvtne_round2ps_ph): Ditto. (_mm512_mask_cvtne_round2ps_ph): Ditto. (_mm512_cvtneebf16_ps): Ditto. (_mm512_mask_cvtneebf16_ps): Ditto. (_mm512_maskz_cvtneebf16_ps): Ditto. (_mm512_cvtneeph_ps): Ditto. (_mm512_mask_cvtneeph_ps): Ditto. (_mm512_cvtneobf16_ps): Ditto. (_mm512_mask_cvtneobf16_ps): Ditto. (_mm512_maskz_cvtneobf16_ps): Ditto. (_mm512_cvtneoph_ps): Ditto. (_mm512_mask_cvtneoph_ps): Ditto. * config/i386/avx512bf16vlintrin.h: New intrinsic. (_mm_cvtsbf16_bf16): Ditto. (_mm256_cvtsbf16_bf16): Ditto. (_mm256_undefined_pbf16): Ditto. (_mm_undefined_pbf16): Ditto. (_mm_set_sbf16): Ditto. (_mm_set1_pbf16): Ditto. (_mm256_set1_pbf16): Ditto. (_mm_set_pbf16): Ditto. (_mm256_set_pbf16): Ditto. (_mm_setr_pbf16): Ditto. (_mm256_setr_pbf16): Ditto. (_mm256_abs_pbf16): Ditto. (_mm_abs_pbf16): Ditto. (_mm_mask_blend_pbf16): Ditto. (_mm256_mask_blend_pbf16): Ditto. (_mm_permutex2var_pbf16): Ditto. (_mm256_permutex2var_pbf16): Ditto. (_mm_permutexvar_pbf16): Ditto. (_mm256_permutexvar_pbf16): Ditto. (_mm_cvtneebf16_ps): Change bf16 mode. (_mm256_cvtneebf16_ps): Ditto. (_mm_cvtneobf16_ps): Ditto. (_mm256_cvtneobf16_ps): Ditto. (_mm_mask_cvtneebf16_ps): Ditto. (_mm_maskz_cvtneebf16_ps): Ditto. (_mm256_mask_cvtneebf16_ps): Ditto. (_mm256_maskz_cvtneebf16_ps): Ditto. (_mm_mask_cvtneobf16_ps): Ditto.
(_mm_maskz_cvtneobf16_ps): Ditto. (_mm256_mask_cvtneobf16_ps): Ditto. (_mm256_maskz_cvtneobf16_ps): Ditto. * config/i386/immintrin.h: Add SSE2 dependency for avx512bf16. --- gcc/config/i386/avx512bf16intrin.h | 418 +++++++++++++++++++++++++++ gcc/config/i386/avx512bf16vlintrin.h | 177 ++++++++++++ gcc/config/i386/avx512fp16intrin.h | 70 +++++ gcc/config/i386/immintrin.h | 2 + 4 files changed, 667 insertions(+) diff --git a/gcc/config/i386/avx512bf16intrin.h b/gcc/config/i386/avx512bf16intrin.h index b6e9ddad157..d09a59c1509 100644 --- a/gcc/config/i386/avx512bf16intrin.h +++ b/gcc/config/i386/avx512bf16intrin.h @@ -51,6 +51,424 @@ _mm_cvtsbh_ss (__bfloat16 __A) return __tmp.a; } +extern __inline __m512bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_setzero_pbf16 (void) +{ + return (__m512bf16)(__v32bf) _mm512_setzero_ps (); +} + +extern __inline __m512bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_undefined_pbf16 (void) +{ + __m512bf16 __Y = __Y; + return __Y; +} + +extern __inline __m512bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_set1_pbf16 (__bf16 __h) +{ + return (__m512bf16)(__v32bf) {__h, __h, __h, __h, __h, __h, __h, __h, + __h, __h, __h, __h, __h, __h, __h, __h, + __h, __h, __h, __h, __h, __h, __h, __h, + __h, __h, __h, __h, __h, __h, __h, __h}; +} + +extern __inline __m512bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_set_pbf16 (__bf16 __h1, __bf16 __h2, __bf16 __h3, __bf16 __h4, + __bf16 __h5, __bf16 __h6, __bf16 __h7, __bf16 __h8, + __bf16 __h9, __bf16 __h10, __bf16 __h11, __bf16 __h12, + __bf16 __h13, __bf16 __h14, __bf16 __h15, __bf16 __h16, + __bf16 __h17, __bf16 __h18, __bf16 __h19, __bf16 __h20, + __bf16 __h21, __bf16 __h22, __bf16 __h23, __bf16 __h24, + __bf16 __h25, __bf16 __h26, __bf16 __h27, __bf16 __h28, + __bf16 __h29, __bf16 __h30, __bf16 __h31, __bf16 __h32) +{ + return + (__m512bf16)(__v32bf) {__h32,
__h31, __h30, __h29, __h28, __h27, __h26, + __h25, __h24, __h23, __h22, __h21, __h20, __h19, + __h18, __h17, __h16, __h15, __h14, __h13, __h12, + __h11, __h10, __h9, __h8, __h7, __h6, __h5, + __h4, __h3, __h2, __h1}; +} + +#define _mm512_setr_pbf16(h1, h2, h3, h4, h5, h6, h7, h8, h9, h10, h11, h12, \ + h13, h14, h15, h16, h17, h18, h19, h20, h21, h22, \ + h23, h24, h25, h26, h27, h28, h29, h30, h31, h32) \ + _mm512_set_pbf16 ((h32), (h31), (h30), (h29), (h28), (h27), (h26), (h25), \ + (h24), (h23), (h22), (h21), (h20), (h19), (h18), (h17), \ + (h16), (h15), (h14), (h13), (h12), (h11), (h10), (h9), \ + (h8), (h7), (h6), (h5), (h4), (h3), (h2), (h1)) + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_castpbf16_ps (__m128bf16 __a) +{ + return (__m128) __a; +} + +extern __inline __m256 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_castpbf16_ps (__m256bf16 __a) +{ + return (__m256) __a; +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_castpbf16_ps (__m512bf16 __a) +{ + return (__m512) __a; +} + +extern __inline __m128d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_castpbf16_pd (__m128bf16 __a) +{ + return (__m128d) __a; +} + +extern __inline __m256d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_castpbf16_pd (__m256bf16 __a) +{ + return (__m256d) __a; +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_castpbf16_pd (__m512bf16 __a) +{ + return (__m512d) __a; +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_castpbf16_si128 (__m128bf16 __a) +{ + return (__m128i) __a; +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_castpbf16_si256 (__m256bf16 __a) +{ + return (__m256i) __a; +} + +extern __inline __m512i 
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_castpbf16_si512 (__m512bf16 __a) +{ + return (__m512i) __a; +} + +extern __inline __m128bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_castps_pbf16 (__m128 __a) +{ + return (__m128bf16) __a; +} + +extern __inline __m256bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_castps_pbf16 (__m256 __a) +{ + return (__m256bf16) __a; +} + +extern __inline __m512bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_castps_pbf16 (__m512 __a) +{ + return (__m512bf16) __a; +} + +extern __inline __m128bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_castpd_pbf16 (__m128d __a) +{ + return (__m128bf16) __a; +} + +extern __inline __m256bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_castpd_pbf16 (__m256d __a) +{ + return (__m256bf16) __a; +} + +extern __inline __m512bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_castpd_pbf16 (__m512d __a) +{ + return (__m512bf16) __a; +} + +extern __inline __m128bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_castsi128_pbf16 (__m128i __a) +{ + return (__m128bf16) __a; +} + +extern __inline __m256bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_castsi256_pbf16 (__m256i __a) +{ + return (__m256bf16) __a; +} + +extern __inline __m512bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_castsi512_pbf16 (__m512i __a) +{ + return (__m512bf16) __a; +} + +extern __inline __m128bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_castpbf16256_pbf16128 (__m256bf16 __a) +{ + return __builtin_shufflevector (__a, __a, 0, 1, 2, 3, 4, 5, 6, 7); +} + +extern __inline __m128bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_castpbf16512_pbf16128 
(__m512bf16 __a) +{ + return __builtin_shufflevector (__a, __a, 0, 1, 2, 3, 4, 5, 6, 7); +} + +extern __inline __m256bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_castpbf16512_pbf16256 (__m512bf16 __a) +{ + return __builtin_shufflevector (__a, __a, 0, 1, 2, 3, 4, 5, 6, 7, + 8, 9, 10, 11, 12, 13, 14, 15); +} + +extern __inline __m256bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_castpbf16128_pbf16256 (__m128bf16 __a) +{ + return __builtin_shufflevector (__a, __a, 0, 1, 2, 3, 4, 5, 6, 7, + -1, -1, -1, -1, -1, -1, -1, -1); +} + +extern __inline __m512bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_castpbf16128_pbf16512 (__m128bf16 __a) +{ + return __builtin_shufflevector (__a, __a, 0, 1, 2, 3, 4, 5, 6, 7, -1, -1, + -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, + -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1); +} + +extern __inline __m512bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_castpbf16256_pbf16512 (__m256bf16 __a) +{ + return __builtin_shufflevector (__a, __a, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, + 11, 12, 13, 14, 15, -1, -1, -1, -1, -1, -1, + -1, -1, -1, -1, -1, -1, -1, -1, -1, -1); +} + +extern __inline __m256bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_zextpbf16128_pbf16256 (__m128bf16 __A) +{ + return (__m256bf16) _mm256_insertf128_ps (_mm256_setzero_ps (), + (__m128) __A, 0); +} + +extern __inline __m512bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_zextpbf16128_pbf16512 (__m128bf16 __A) +{ + return (__m512bf16) _mm512_insertf32x4 (_mm512_setzero_ps (), + (__m128) __A, 0); +} + +extern __inline __m512bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_zextpbf16256_pbf16512 (__m256bf16 __A) +{ + return (__m512bf16) _mm512_insertf64x4 (_mm512_setzero_pd (), + (__m256d) __A, 0); +} + +extern __inline __m512bf16 +__attribute__ 
((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_abs_pbf16 (__m512bf16 __A) +{ + return + (__m512bf16) _mm512_and_epi32 (_mm512_set1_epi32 (0x7FFF7FFF), + (__m512i) __A); +} + +/* Loads, using vmovsh if AVX512FP16 is enabled.  */ +extern __inline __m512bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_load_pbf16 (void const *__p) +{ + return *(const __m512bf16 *) __p; +} + +extern __inline __m256bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_load_pbf16 (void const *__p) +{ + return *(const __m256bf16 *) __p; +} + +extern __inline __m128bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_load_pbf16 (void const *__p) +{ + return *(const __m128bf16 *) __p; +} + +extern __inline __m512bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_loadu_pbf16 (void const *__p) +{ + struct __loadu_pbf16 + { + __m512bf16_u __v; + } __attribute__((__packed__, __may_alias__)); + return ((const struct __loadu_pbf16 *) __p)->__v; +} + +extern __inline __m256bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_loadu_pbf16 (void const *__p) +{ + struct __loadu_pbf16 + { + __m256bf16_u __v; + } __attribute__((__packed__, __may_alias__)); + return ((const struct __loadu_pbf16 *) __p)->__v; +} + +extern __inline __m128bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_loadu_pbf16 (void const *__p) +{ + struct __loadu_pbf16 + { + __m128bf16_u __v; + } __attribute__((__packed__, __may_alias__)); + return ((const struct __loadu_pbf16 *) __p)->__v; +} + +/* Stores, using vmovsh if AVX512FP16 is enabled.  */ +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_store_sbf16 (void *__dp, __m128bf16 __a) +{ + struct __mm_store_sbf16_struct + { + __bf16 __u; + } __attribute__((__packed__, __may_alias__)); + ((struct __mm_store_sbf16_struct *) __dp)->__u = __a[0]; +} + +extern __inline
void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_store_pbf16 (void *__P, __m512bf16 __A) +{ + *(__m512bf16 *) __P = __A; +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_store_pbf16 (void *__P, __m256bf16 __A) +{ + *(__m256bf16 *) __P = __A; +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_store_pbf16 (void *__P, __m128bf16 __A) +{ + *(__m128bf16 *) __P = __A; +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_storeu_pbf16 (void *__P, __m512bf16 __A) +{ + struct __storeu_pbf16 { + __m512bf16_u __v; + } __attribute__((__packed__, __may_alias__)); + ((struct __storeu_pbf16 *) __P)->__v = __A; +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_storeu_pbf16 (void *__P, __m256bf16 __A) +{ + struct __storeu_pbf16 + { + __m256bf16_u __v; + } __attribute__((__packed__, __may_alias__)); + ((struct __storeu_pbf16 *) __P)->__v = __A; +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_storeu_pbf16 (void *__P, __m128bf16 __A) +{ + struct __storeu_pbf16 + { + __m128bf16_u __v; + } __attribute__((__packed__, __may_alias__)); + ((struct __storeu_pbf16 *) __P)->__v = __A; +} + +/* Moves, using vmovsh if AVX512FP16 is enabled.  */ +extern __inline __m128bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_move_sbf16 (__m128bf16 __a, __m128bf16 __b) +{ + __a[0] = __b[0]; + return __a; +} + +extern __inline __m512bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_blend_pbf16 (__mmask32 __U, __m512bf16 __A, __m512bf16 __W) +{ + return (__m512bf16) __builtin_ia32_movdquhi512_mask ((__v32hi) __W, + (__v32hi) __A, + (__mmask32) __U); +} + +extern __inline __m512bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_permutex2var_pbf16 (__m512bf16 __A, __m512i __I, __m512bf16 __B) +{ + return (__m512bf16) __builtin_ia32_vpermi2varhi512_mask ((__v32hi) __A, + (__v32hi) __I, + (__v32hi) __B, + (__mmask32)-1); +} + +extern __inline __m512bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_permutexvar_pbf16 (__m512i __A, __m512bf16 __B) +{ + return (__m512bf16) __builtin_ia32_permvarhi512_mask ((__v32hi) __B, + (__v32hi) __A, + (__v32hi) + (_mm512_setzero_si512 ()), + (__mmask32)-1); +} + /* vcvtne2ps2bf16 */ extern __inline __m512bh diff --git a/gcc/config/i386/avx512bf16vlintrin.h b/gcc/config/i386/avx512bf16vlintrin.h index 969335ff358..732623a94a2 100644 --- a/gcc/config/i386/avx512bf16vlintrin.h +++ b/gcc/config/i386/avx512bf16vlintrin.h @@ -44,6 +44,183 @@ typedef short __m256bh __attribute__ ((__vector_size__ (32), __may_alias__)); typedef short __m128bh __attribute__ ((__vector_size__ (16), __may_alias__)); typedef unsigned short __bfloat16; + +extern __inline __bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtsbf16_bf16 (__m128bf16 __a) +{ + return __a[0]; +} + +extern __inline __bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtsbf16_bf16 (__m256bf16 __a) +{ + return __a[0]; +} + +extern __inline __m256bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_undefined_pbf16 (void) +{ + __m256bf16 __Y = __Y; + return __Y; +} + +extern __inline __m128bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_undefined_pbf16 (void) +{ + __m128bf16 __Y = __Y; + return __Y; +} + +extern __inline __m128bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_setzero_pbf16 (void) +{ + return (__m128bf16)(__v8bf) _mm_setzero_ps (); +} + +extern __inline __m256bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_setzero_pbf16 (void) +{ + return (__m256bf16)(__v16bf) 
_mm256_setzero_ps (); +} + +extern __inline __m128bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_set_sbf16 (__bf16 __h) +{ + return (__v8bf) + __builtin_shufflevector ((__v8bf){__h, __h, __h, __h, __h, __h, __h, __h}, + (__v8bf) _mm_setzero_pbf16 (), 0, + 8, 8, 8, 8, 8, 8, 8); +} + +extern __inline __m128bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_set1_pbf16 (__bf16 __h) +{ + return (__m128bf16)(__v8bf) {__h, __h, __h, __h, __h, __h, __h, __h}; +} + +extern __inline __m256bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_set1_pbf16 (__bf16 __h) +{ + return (__m256bf16)(__v16bf) {__h, __h, __h, __h, __h, __h, __h, __h, + __h, __h, __h, __h, __h, __h, __h, __h}; +} + +extern __inline __m128bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_set_pbf16 (__bf16 __h1, __bf16 __h2, __bf16 __h3, __bf16 __h4, + __bf16 __h5, __bf16 __h6, __bf16 __h7, __bf16 __h8) +{ + return (__m128bf16)(__v8bf) {__h8, __h7, __h6, __h5, + __h4, __h3, __h2, __h1}; +} + +extern __inline __m256bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_set_pbf16 (__bf16 __h1, __bf16 __h2, __bf16 __h3, __bf16 __h4, + __bf16 __h5, __bf16 __h6, __bf16 __h7, __bf16 __h8, + __bf16 __h9, __bf16 __h10, __bf16 __h11, __bf16 __h12, + __bf16 __h13, __bf16 __h14, __bf16 __h15, __bf16 __h16) +{ + return (__m256bf16)(__v16bf) {__h16, __h15, __h14, __h13, __h12, __h11, + __h10, __h9, __h8, __h7, __h6, __h5, + __h4, __h3, __h2, __h1}; +} + +#define _mm_setr_pbf16(bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8) \ + _mm_set_pbf16 ((bf8), (bf7), (bf6), (bf5), (bf4), (bf3), (bf2), (bf1)) + +#define _mm256_setr_pbf16(bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8, bf9, bf10, \ + bf11, bf12, bf13, bf14, bf15, bf16) \ + _mm256_set_pbf16 ((bf16), (bf15), (bf14), (bf13), (bf12), (bf11), (bf10), \ + (bf9), (bf8), (bf7), (bf6), (bf5), (bf4), (bf3), (bf2), \ + (bf1)) + +extern __inline __m256bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_abs_pbf16 (__m256bf16 __A) +{ + return (__m256bf16) _mm256_and_si256 (_mm256_set1_epi32 (0x7FFF7FFF), + (__m256i)__A); +} + +extern __inline __m128bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_abs_pbf16 (__m128bf16 __A) +{ + return (__m128bf16) _mm_and_si128 (_mm_set1_epi32 (0x7FFF7FFF), + (__m128i)__A); +} + +extern __inline __m128bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_blend_pbf16 (__mmask8 __U, __m128bf16 __A, __m128bf16 __W) +{ + return (__m128bf16) + __builtin_ia32_movdquhi128_mask ((__v8hi) __W, + (__v8hi) __A, + (__mmask8) __U); +} + +extern __inline __m256bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_blend_pbf16 (__mmask16 __U, __m256bf16 __A, __m256bf16 __W) +{ + return (__m256bf16) + __builtin_ia32_movdquhi256_mask ((__v16hi) __W, + (__v16hi) __A, + (__mmask16) __U); +} + +extern __inline __m128bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_permutex2var_pbf16 (__m128bf16 __A, __m128i __I, __m128bf16 __B) +{ + return (__m128bf16) + __builtin_ia32_vpermi2varhi128_mask ((__v8hi) __A, + (__v8hi) __I, + (__v8hi) __B, + (__mmask8) -1); +} + +extern __inline __m256bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_permutex2var_pbf16 (__m256bf16 __A, __m256i __I, __m256bf16 __B) +{ + return (__m256bf16) __builtin_ia32_vpermi2varhi256_mask ((__v16hi) __A, + (__v16hi) __I, + (__v16hi) __B, + (__mmask16)-1); +} + +extern __inline __m128bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_permutexvar_pbf16 (__m128i __A, __m128bf16 __B) +{ + return (__m128bf16) __builtin_ia32_permvarhi128_mask ((__v8hi) __B, + (__v8hi) __A, + (__v8hi) + (_mm_setzero_si128 ()), + (__mmask8) -1); +} + +extern __inline __m256bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_permutexvar_pbf16 (__m256i __A, __m256bf16 __B) +{ + return (__m256bf16) 
__builtin_ia32_permvarhi256_mask ((__v16hi) __B, + (__v16hi) __A, + (__v16hi) + (_mm256_setzero_si256 ()), + (__mmask16) -1); +} /* vcvtne2ps2bf16 */ extern __inline __m256bh diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h index 75f7475ad18..82b814abde2 100644 --- a/gcc/config/i386/avx512fp16intrin.h +++ b/gcc/config/i386/avx512fp16intrin.h @@ -53,6 +53,18 @@ typedef _Float16 __m256h_u __attribute__ ((__vector_size__ (32), \ typedef _Float16 __m512h_u __attribute__ ((__vector_size__ (64), \ __may_alias__, __aligned__ (1))); + +/* Internal data types for implementing the bf16 intrinsics. */ +typedef __bf16 __v32bf __attribute__((__vector_size__(64), __aligned__(64))); +typedef __bf16 __m512bf16 __attribute__((__vector_size__(64), __aligned__(64))); +typedef __bf16 __m512bf16_u __attribute__((__vector_size__(64), __aligned__(1))); +typedef __bf16 __v8bf __attribute__((__vector_size__(16), __aligned__(16))); +typedef __bf16 __m128bf16 __attribute__((__vector_size__(16), __aligned__(16))); +typedef __bf16 __m128bf16_u __attribute__((__vector_size__(16), __aligned__(1))); +typedef __bf16 __v16bf __attribute__((__vector_size__(32), __aligned__(32))); +typedef __bf16 __m256bf16 __attribute__((__vector_size__(32), __aligned__(32))); +typedef __bf16 __m256bf16_u __attribute__((__vector_size__(32), __aligned__(1))); + extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm_set_ph (_Float16 __A7, _Float16 __A6, _Float16 __A5, @@ -2771,6 +2783,44 @@ _mm_mask_store_sh (_Float16 const* __A, __mmask8 __B, __m128h __C) __builtin_ia32_storesh_mask (__A, __C, __B); } +extern __inline __m128bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_load_sbf16 (void const *__dp) +{ + return (__m128bf16) + __builtin_ia32_loadsh_mask ((_Float16 const*) __dp, + _mm_setzero_ph(), + (__mmask8) -1); +} + +extern __inline __m128bf16 +__attribute__ ((__gnu_inline__, __always_inline__, 
__artificial__)) +_mm_mask_load_sbf16 (__m128bf16 __A, __mmask8 __B, const void *__C) +{ + return (__m128bf16) + __builtin_ia32_loadsh_mask ((_Float16 const*) __C, + (__v8hf) __A, + (__mmask8) __B); +} + +extern __inline __m128bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_load_sbf16 (__mmask8 __A, const void *__B) +{ + return (__m128bf16) + __builtin_ia32_loadsh_mask ((_Float16 const*) __B, + _mm_setzero_ph(), + (__mmask8) __A); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_store_sbf16 (const void *__A, __mmask8 __B, __m128bf16 __C) +{ + __builtin_ia32_storesh_mask ((_Float16 const*) __A, + (__v8hf) __C, (__mmask8) __B); +} + extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm_move_sh (__m128h __A, __m128h __B) @@ -2793,6 +2843,26 @@ _mm_maskz_move_sh (__mmask8 __A, __m128h __B, __m128h __C) return __builtin_ia32_vmovsh_mask (__B, __C, _mm_setzero_ph (), __A); } +extern __inline __m128bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_move_sbf16 (__m128bf16 __A, __mmask8 __B, + __m128bf16 __C, __m128bf16 __D) +{ + return (__m128bf16) + __builtin_ia32_vmovsh_mask ((__v8hf) __C, (__v8hf) __D, + (__v8hf) __A, (__mmask8) __B); +} + +extern __inline __m128bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_move_sbf16 (__mmask8 __A, __m128bf16 __B, __m128bf16 __C) +{ + return (__m128bf16) + __builtin_ia32_vmovsh_mask ((__v8hf) __B, (__v8hf) __C, + _mm_setzero_ph(), + (__mmask8) __A); +} + /* Intrinsics vcvtph2dq. 
*/ extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) diff --git a/gcc/config/i386/immintrin.h b/gcc/config/i386/immintrin.h index ddea249d09b..c62d50f1951 100644 --- a/gcc/config/i386/immintrin.h +++ b/gcc/config/i386/immintrin.h @@ -118,9 +118,11 @@ #include +#ifdef __SSE2__ #include #include +#endif #include From patchwork Fri Oct 14 07:54:43 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Jiang, Haochen" X-Patchwork-Id: 1689925
To: gcc-patches@gcc.gnu.org Subject: [PATCH 4/6] Support Intel AVX-NE-CONVERT Date: Fri, 14 Oct 2022 15:54:43 +0800 Message-Id: <20221014075445.7938-5-haochen.jiang@intel.com> X-Mailer: git-send-email 2.18.1 In-Reply-To: <20221014075445.7938-1-haochen.jiang@intel.com> References: <20221014075445.7938-1-haochen.jiang@intel.com>
X-Patchwork-Original-From: Haochen Jiang via Gcc-patches From: "Jiang, Haochen" Reply-To: Haochen Jiang Cc: hongtao.liu@intel.com From: Kong Lingling gcc/ChangeLog: * common/config/i386/i386-common.cc (OPTION_MASK_ISA2_AVXNECONVERT_SET, OPTION_MASK_ISA2_AVXNECONVERT_UNSET): New. (OPTION_MASK_ISA2_AVX2_UNSET): Add OPTION_MASK_ISA2_AVXNECONVERT_UNSET. (ix86_handle_option): Handle -mavxneconvert; unset avxneconvert when avx2 is disabled. * common/config/i386/i386-cpuinfo.h (processor_features): Add FEATURE_AVXNECONVERT. * common/config/i386/i386-isas.h: Add ISA_NAMES_TABLE_ENTRY for avxneconvert. * common/config/i386/cpuinfo.h (get_available_features): Detect avxneconvert. * config.gcc: Add avxneconvertintrin.h. * config/i386/avxneconvertintrin.h: New. * config/i386/cpuid.h (bit_AVXNECONVERT): New. * config/i386/i386-builtin-types.def: Add DEF_VECTOR_TYPE (V8BF, BFLOAT16), DEF_VECTOR_TYPE (V16BF, BFLOAT16), DEF_VECTOR_TYPE (V32BF, BFLOAT16), DEF_POINTER_TYPE (PCV8HF, V8HF, CONST), DEF_POINTER_TYPE (PCV8BF, V8BF, CONST), DEF_POINTER_TYPE (PCV16HF, V16HF, CONST), DEF_POINTER_TYPE (PCV16BF, V16BF, CONST), DEF_FUNCTION_TYPE (V4SF, PCSHORT), DEF_FUNCTION_TYPE (V8SF, PCSHORT), DEF_FUNCTION_TYPE (V4SF, PCV8HF), DEF_FUNCTION_TYPE (V8SF, PCV16HF), DEF_FUNCTION_TYPE (V8BF, V8SF), DEF_FUNCTION_TYPE (V8BF, V4SF), DEF_FUNCTION_TYPE (V4SF, PCV8BF), DEF_FUNCTION_TYPE (V8SF, PCV16BF). * config/i386/i386-builtin.def: Add new builtins. * config/i386/i386-c.cc (ix86_target_macros_internal): Define __AVXNECONVERT__. * config/i386/i386-expand.cc (ix86_expand_args_builtin): Handle V8BF_FTYPE_V8SF and V8BF_FTYPE_V4SF. (ix86_expand_special_args_builtin): Handle V4SF_FTYPE_PCSHORT, V8SF_FTYPE_PCSHORT, V4SF_FTYPE_PCV8BF, V4SF_FTYPE_PCV8HF, V8SF_FTYPE_PCV16BF and V8SF_FTYPE_PCV16HF. * config/i386/i386-isa.def: Add DEF_PTA(AVXNECONVERT). * config/i386/i386-options.cc (isa2_opts): Add -mavxneconvert. (ix86_valid_target_attribute_inner_p): Handle avxneconvert. * config/i386/i386.opt: Add option -mavxneconvert. * config/i386/immintrin.h: Include avxneconvertintrin.h. * config/i386/sse.md: (avx_vbcstne2ps_), (avx_vcvtne2ps_), (avx_vcvtne2ps_), (avx_vcvtneps2bf16_): New define_insn. (avx512f_cvtneps2bf16_): Ditto.
(avx512f_cvtneps2bf16__mask): Ditto. * doc/invoke.texi: Document -mavxneconvert. * doc/extend.texi: Document avxneconvert. * doc/sourcebuild.texi: Document target avxneconvert. gcc/testsuite/ChangeLog: * gcc.target/i386/avx-check.h: Add avxneconvert check. * gcc.target/i386/funcspec-56.inc: Add new target attribute. * gcc.target/i386/sse-12.c: Add -mavxneconvert. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-14.c: Ditto. * gcc.target/i386/sse-22.c: Ditto. * gcc.target/i386/sse-23.c: Ditto. * g++.dg/other/i386-2.C: Ditto. * g++.dg/other/i386-3.C: Ditto. * lib/target-supports.exp (check_effective_target_avxneconvert): New. * gcc.target/i386/avx-ne-convert-1.c: New test. * gcc.target/i386/avx-ne-convert-vbcstnebf162ps-2.c: Ditto. * gcc.target/i386/avx-ne-convert-vbcstnesh2ps-2.c: Ditto. * gcc.target/i386/avx-ne-convert-vcvtneebf162ps-2.c: Ditto. * gcc.target/i386/avx-ne-convert-vcvtneeph2ps-2.c: Ditto. * gcc.target/i386/avx-ne-convert-vcvtneobf162ps-2.c: Ditto. * gcc.target/i386/avx-ne-convert-vcvtneoph2ps-2.c: Ditto. * gcc.target/i386/avx-ne-convert-vcvtneps2bf16-2.c: Ditto. * gcc.target/i386/avx512bf16vl-vcvtneps2bf16-1.c: Rename... * gcc.target/i386/avx512bf16vl-vcvtneps2bf16-1a.c: ...to this. * gcc.target/i386/avx512bf16vl-vcvtneps2bf16-1b.c: New test.
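The builtin list above does not spell out what VCVTNEPS2BF16 and the even/odd VCVTNE{E,O}BF162PS conversions compute per element. As a rough illustration only, here is a portable scalar model written from our reading of the Intel SDM pseudocode (`convert_fp32_to_bfloat16`): round to nearest even, denormal inputs flushed to signed zero, NaNs forced quiet. The helper names (`cvtne_f32_to_bf16`, `bf16_to_f32`, `cvtne_even_odd_bf16_ps`) are hypothetical and are not part of this patch or of `avxneconvertintrin.h`; treat this as a sketch for reasoning about test expectations, not as the hardware definition.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Bit-cast helpers; memcpy avoids strict-aliasing problems.  */
static uint32_t
f32_bits (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);
  return u;
}

static float
bits_f32 (uint32_t u)
{
  float f;
  memcpy (&f, &u, sizeof f);
  return f;
}

/* Hypothetical model of one lane of VCVTNEPS2BF16 (fp32 -> bf16):
   denormals become signed zero, infinities are truncated, NaNs are
   truncated with the quiet bit forced on, and normal numbers are
   rounded to nearest even using an integer rounding bias.  */
static uint16_t
cvtne_f32_to_bf16 (float x)
{
  uint32_t u = f32_bits (x);
  uint32_t exp = (u >> 23) & 0xFF;
  uint32_t man = u & 0x7FFFFF;

  if (exp == 0)                                /* zero or denormal */
    return (uint16_t) ((u >> 16) & 0x8000);
  if (exp == 0xFF)
    {
      if (man == 0)                            /* +/- infinity */
        return (uint16_t) (u >> 16);
      return (uint16_t) ((u >> 16) | 0x0040);  /* quiet the NaN */
    }
  uint32_t lsb = (u >> 16) & 1;                /* LSB of the kept half */
  u += 0x7FFF + lsb;                           /* ties round to even */
  return (uint16_t) (u >> 16);
}

/* bf16 -> fp32 is exact: the bf16 bits become the upper 16 bits.  */
static float
bf16_to_f32 (uint16_t h)
{
  return bits_f32 ((uint32_t) h << 16);
}

/* Hypothetical model of the 128-bit VCVTNEEBF162PS (odd == 0) and
   VCVTNEOBF162PS (odd == 1): of 8 bf16 values in memory, convert the
   even- or odd-indexed elements into 4 floats.  */
static void
cvtne_even_odd_bf16_ps (const uint16_t src[8], float dst[4], int odd)
{
  assert (odd == 0 || odd == 1);
  for (int i = 0; i < 4; i++)
    dst[i] = bf16_to_f32 (src[2 * i + odd]);
}
```

Under these assumptions, `_mm_cvtneebf16_ps` applied to 8 packed bf16 values should correspond to `cvtne_even_odd_bf16_ps (src, dst, 0)` and `_mm_cvtneobf16_ps` to the `odd == 1` case; the integer rounding bias is the same trick the SDM pseudocode uses, which avoids depending on the current MXCSR rounding mode.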
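The cpuid.h hunk defines `bit_AVXNECONVERT` as bit 5, and cpuinfo.h tests it in EDX right next to `bit_AVXVNNIINT8`; per the Intel SDM that register is EDX of CPUID leaf 7, sub-leaf 1. A minimal runtime check might look like the sketch below. It assumes GCC's or Clang's `<cpuid.h>` (`__get_cpuid`, `__get_cpuid_count`, `bit_OSXSAVE`, `bit_AVX`), defines `bit_AVXNECONVERT` itself in case the installed header predates this patch, and also checks XCR0, because the instructions are VEX-encoded and need OS-enabled YMM state. The function name `cpu_has_avxneconvert` is hypothetical.

```c
#include <stdbool.h>

#if defined (__i386__) || defined (__x86_64__)
#include <cpuid.h>
#ifndef bit_AVXNECONVERT
#define bit_AVXNECONVERT (1 << 5)   /* CPUID.(EAX=7,ECX=1):EDX[5] */
#endif
#endif

/* Hypothetical helper: true if the CPU reports AVX-NE-CONVERT and the
   OS has enabled the XMM/YMM state that VEX instructions require.  */
static bool
cpu_has_avxneconvert (void)
{
#if defined (__i386__) || defined (__x86_64__)
  unsigned int eax, ebx, ecx, edx;

  /* Leaf 1: AVX must be present and the OS must use XSAVE.  */
  if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx))
    return false;
  if (!(ecx & bit_OSXSAVE) || !(ecx & bit_AVX))
    return false;

  /* XCR0 bits 1 (SSE state) and 2 (AVX state) must both be enabled.  */
  unsigned int xcr0_lo, xcr0_hi;
  __asm__ volatile ("xgetbv" : "=a" (xcr0_lo), "=d" (xcr0_hi) : "c" (0));
  (void) xcr0_hi;
  if ((xcr0_lo & 0x6) != 0x6)
    return false;

  /* Leaf 7, sub-leaf 0: EAX is the highest supported sub-leaf.  */
  if (!__get_cpuid_count (7, 0, &eax, &ebx, &ecx, &edx) || eax < 1)
    return false;

  /* Leaf 7, sub-leaf 1: AVX-NE-CONVERT is reported in EDX.  */
  __get_cpuid_count (7, 1, &eax, &ebx, &ecx, &edx);
  return (edx & bit_AVXNECONVERT) != 0;
#else
  return false;                     /* not an x86 target */
#endif
}
```

With this patch applied, `__builtin_cpu_supports ("avxneconvert")` should report the same feature through the new `FEATURE_AVXNECONVERT` / `ISA_NAMES_TABLE_ENTRY` plumbing, which is usually the simpler choice in GCC-compiled code; the hand-written check is mainly useful with older compilers.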
--- gcc/common/config/i386/cpuinfo.h | 2 + gcc/common/config/i386/i386-common.cc | 21 ++- gcc/common/config/i386/i386-cpuinfo.h | 1 + gcc/common/config/i386/i386-isas.h | 2 + gcc/config.gcc | 2 +- gcc/config/i386/avxneconvertintrin.h | 140 ++++++++++++++++++ gcc/config/i386/cpuid.h | 1 + gcc/config/i386/i386-builtin-types.def | 17 +++ gcc/config/i386/i386-builtin.def | 18 +++ gcc/config/i386/i386-c.cc | 2 + gcc/config/i386/i386-expand.cc | 8 + gcc/config/i386/i386-isa.def | 1 + gcc/config/i386/i386-options.cc | 4 +- gcc/config/i386/i386.opt | 5 + gcc/config/i386/immintrin.h | 4 + gcc/config/i386/sse.md | 100 ++++++++++++- gcc/doc/extend.texi | 5 + gcc/doc/invoke.texi | 9 +- gcc/doc/sourcebuild.texi | 3 + gcc/testsuite/g++.dg/other/i386-2.C | 2 +- gcc/testsuite/g++.dg/other/i386-3.C | 2 +- gcc/testsuite/gcc.target/i386/avx-check.h | 3 + .../gcc.target/i386/avx-ne-convert-1.c | 45 ++++++ .../i386/avx-ne-convert-vbcstnebf162ps-2.c | 54 +++++++ .../i386/avx-ne-convert-vbcstnesh2ps-2.c | 42 ++++++ .../i386/avx-ne-convert-vcvtneebf162ps-2.c | 73 +++++++++ .../i386/avx-ne-convert-vcvtneeph2ps-2.c | 66 +++++++++ .../i386/avx-ne-convert-vcvtneobf162ps-2.c | 75 ++++++++++ .../i386/avx-ne-convert-vcvtneoph2ps-2.c | 66 +++++++++ .../i386/avx-ne-convert-vcvtneps2bf16-2.c | 58 ++++++++ ...16-1.c => avx512bf16vl-vcvtneps2bf16-1a.c} | 0 .../i386/avx512bf16vl-vcvtneps2bf16-1b.c | 27 ++++ gcc/testsuite/gcc.target/i386/funcspec-56.inc | 2 + gcc/testsuite/gcc.target/i386/sse-12.c | 2 +- gcc/testsuite/gcc.target/i386/sse-13.c | 2 +- gcc/testsuite/gcc.target/i386/sse-14.c | 2 +- gcc/testsuite/gcc.target/i386/sse-22.c | 4 +- gcc/testsuite/gcc.target/i386/sse-23.c | 2 +- gcc/testsuite/lib/target-supports.exp | 12 ++ 39 files changed, 868 insertions(+), 16 deletions(-) create mode 100644 gcc/config/i386/avxneconvertintrin.h create mode 100644 gcc/testsuite/gcc.target/i386/avx-ne-convert-1.c create mode 100644 gcc/testsuite/gcc.target/i386/avx-ne-convert-vbcstnebf162ps-2.c create mode 
100644 gcc/testsuite/gcc.target/i386/avx-ne-convert-vbcstnesh2ps-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx-ne-convert-vcvtneebf162ps-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx-ne-convert-vcvtneeph2ps-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx-ne-convert-vcvtneobf162ps-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx-ne-convert-vcvtneoph2ps-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx-ne-convert-vcvtneps2bf16-2.c rename gcc/testsuite/gcc.target/i386/{avx512bf16vl-vcvtneps2bf16-1.c => avx512bf16vl-vcvtneps2bf16-1a.c} (100%) create mode 100644 gcc/testsuite/gcc.target/i386/avx512bf16vl-vcvtneps2bf16-1b.c diff --git a/gcc/common/config/i386/cpuinfo.h b/gcc/common/config/i386/cpuinfo.h index bed88003f8e..e9fd586704d 100644 --- a/gcc/common/config/i386/cpuinfo.h +++ b/gcc/common/config/i386/cpuinfo.h @@ -797,6 +797,8 @@ get_available_features (struct __processor_model *cpu_model, set_feature (FEATURE_AVXIFMA); if (edx & bit_AVXVNNIINT8) set_feature (FEATURE_AVXVNNIINT8); + if (edx & bit_AVXNECONVERT) + set_feature (FEATURE_AVXNECONVERT); } if (avx512_usable) { diff --git a/gcc/common/config/i386/i386-common.cc b/gcc/common/config/i386/i386-common.cc index 6a2a7e3d25a..f9c906f75cf 100644 --- a/gcc/common/config/i386/i386-common.cc +++ b/gcc/common/config/i386/i386-common.cc @@ -109,6 +109,7 @@ along with GCC; see the file COPYING3. If not see #define OPTION_MASK_ISA2_AMX_INT8_SET OPTION_MASK_ISA2_AMX_INT8 #define OPTION_MASK_ISA2_AMX_BF16_SET OPTION_MASK_ISA2_AMX_BF16 #define OPTION_MASK_ISA2_AVXVNNIINT8_SET OPTION_MASK_ISA2_AVXVNNIINT8 +#define OPTION_MASK_ISA2_AVXNECONVERT_SET OPTION_MASK_ISA2_AVXNECONVERT /* SSE4 includes both SSE4.1 and SSE4.2. -msse4 should be the same as -msse4.2. */ @@ -215,7 +216,8 @@ along with GCC; see the file COPYING3. 
If not see (OPTION_MASK_ISA_AVX2 | OPTION_MASK_ISA_AVX512F_UNSET) #define OPTION_MASK_ISA2_AVX2_UNSET \ (OPTION_MASK_ISA2_AVXIFMA_UNSET | OPTION_MASK_ISA2_AVXVNNI_UNSET \ - | OPTION_MASK_ISA2_AVXVNNIINT8_UNSET | OPTION_MASK_ISA2_AVX512F_UNSET) + | OPTION_MASK_ISA2_AVXVNNIINT8_UNSET | OPTION_MASK_ISA2_AVXNECONVERT_UNSET \ + | OPTION_MASK_ISA2_AVX512F_UNSET) #define OPTION_MASK_ISA_AVX512F_UNSET \ (OPTION_MASK_ISA_AVX512F | OPTION_MASK_ISA_AVX512CD_UNSET \ | OPTION_MASK_ISA_AVX512PF_UNSET | OPTION_MASK_ISA_AVX512ER_UNSET \ @@ -280,6 +282,7 @@ along with GCC; see the file COPYING3. If not see (OPTION_MASK_ISA2_KL | OPTION_MASK_ISA2_WIDEKL_UNSET) #define OPTION_MASK_ISA2_WIDEKL_UNSET OPTION_MASK_ISA2_WIDEKL #define OPTION_MASK_ISA2_AVXVNNIINT8_UNSET OPTION_MASK_ISA2_AVXVNNIINT8 +#define OPTION_MASK_ISA2_AVXNECONVERT_UNSET OPTION_MASK_ISA2_AVXNECONVERT /* SSE4 includes both SSE4.1 and SSE4.2. -mno-sse4 should the same as -mno-sse4.1. */ @@ -1162,6 +1165,22 @@ ix86_handle_option (struct gcc_options *opts, } return true; + case OPT_mavxneconvert: + if (value) + { + opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AVXNECONVERT_SET; + opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AVXNECONVERT_SET; + opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX2_SET; + opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX2_SET; + } + else + { + opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA2_AVXNECONVERT_UNSET; + opts->x_ix86_isa_flags2_explicit + |= OPTION_MASK_ISA2_AVXNECONVERT_UNSET; + } + return true; + case OPT_mfma: if (value) { diff --git a/gcc/common/config/i386/i386-cpuinfo.h b/gcc/common/config/i386/i386-cpuinfo.h index 9a6b92fab79..2d3fbfc817a 100644 --- a/gcc/common/config/i386/i386-cpuinfo.h +++ b/gcc/common/config/i386/i386-cpuinfo.h @@ -242,6 +242,7 @@ enum processor_features FEATURE_X86_64_V4, FEATURE_AVXIFMA, FEATURE_AVXVNNIINT8, + FEATURE_AVXNECONVERT, CPU_FEATURE_MAX }; diff --git a/gcc/common/config/i386/i386-isas.h b/gcc/common/config/i386/i386-isas.h index 
8c1f351056c..bceaee589ee 100644 --- a/gcc/common/config/i386/i386-isas.h +++ b/gcc/common/config/i386/i386-isas.h @@ -178,4 +178,6 @@ ISA_NAMES_TABLE_START ISA_NAMES_TABLE_ENTRY("avxifma", FEATURE_AVXIFMA, P_NONE, "-mavxifma") ISA_NAMES_TABLE_ENTRY("avxvnniint8", FEATURE_AVXVNNIINT8, P_NONE, "-mavxvnniint8") + ISA_NAMES_TABLE_ENTRY("avxneconvert", FEATURE_AVXNECONVERT, + P_NONE, "-mavxneconvert") ISA_NAMES_TABLE_END diff --git a/gcc/config.gcc b/gcc/config.gcc index 4df78238910..840b62aee61 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -422,7 +422,7 @@ i[34567]86-*-* | x86_64-*-*) amxbf16intrin.h x86gprintrin.h uintrintrin.h hresetintrin.h keylockerintrin.h avxvnniintrin.h mwaitintrin.h avx512fp16intrin.h avx512fp16vlintrin.h - avxifmaintrin.h avxvnniint8intrin.h" + avxifmaintrin.h avxvnniint8intrin.h avxneconvertintrin.h" ;; ia64-*-*) extra_headers=ia64intrin.h diff --git a/gcc/config/i386/avxneconvertintrin.h b/gcc/config/i386/avxneconvertintrin.h new file mode 100644 index 00000000000..30199384725 --- /dev/null +++ b/gcc/config/i386/avxneconvertintrin.h @@ -0,0 +1,140 @@ +/* Copyright (C) 2021 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. 
+ + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + <http://www.gnu.org/licenses/>. */ + +#ifndef _IMMINTRIN_H_INCLUDED +#error "Never use <avxneconvertintrin.h> directly; include <immintrin.h> instead." +#endif + +#ifndef _AVXNECONVERTINTRIN_H_INCLUDED +#define _AVXNECONVERTINTRIN_H_INCLUDED + +#ifndef __AVXNECONVERT__ +#pragma GCC push_options +#pragma GCC target ("avxneconvert") +#define __DISABLE_AVXNECONVERT__ +#endif /* __AVXNECONVERT__ */ + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_bcstnebf16_ps (const void *__P) +{ + return (__m128) __builtin_ia32_vbcstnebf162ps128 ((const short *) __P); +} + +extern __inline __m256 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_bcstnebf16_ps (const void *__P) +{ + return (__m256) __builtin_ia32_vbcstnebf162ps256 ((const short *) __P); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_bcstnesh_ps (const void *__P) +{ + return (__m128) __builtin_ia32_vbcstnesh2ps128 ((const short *) __P); +} + +extern __inline __m256 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_bcstnesh_ps (const void *__P) +{ + return (__m256) __builtin_ia32_vbcstnesh2ps256 ((const short *) __P); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtneebf16_ps (const __m128bf16 *__A) +{ + return (__m128) __builtin_ia32_vcvtneebf162ps128 ((const __v8bf *) __A); +} + +extern __inline __m256 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtneebf16_ps (const __m256bf16 *__A) +{ + return (__m256) __builtin_ia32_vcvtneebf162ps256 ((const __v16bf *) __A); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtneeph_ps (const __m128h *__A) +{ + return (__m128)
__builtin_ia32_vcvtneeph2ps128 ((const __v8hf *) __A); +} + +extern __inline __m256 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtneeph_ps (const __m256h *__A) +{ + return (__m256) __builtin_ia32_vcvtneeph2ps256 ((const __v16hf *) __A); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtneobf16_ps (const __m128bf16 *__A) +{ + return (__m128) __builtin_ia32_vcvtneobf162ps128 ((const __v8bf *) __A); +} + +extern __inline __m256 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtneobf16_ps (const __m256bf16 *__A) +{ + return (__m256) __builtin_ia32_vcvtneobf162ps256 ((const __v16bf *) __A); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtneoph_ps (const __m128h *__A) +{ + return (__m128) __builtin_ia32_vcvtneoph2ps128 ((const __v8hf *) __A); +} + +extern __inline __m256 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtneoph_ps (const __m256h *__A) +{ + return (__m256) __builtin_ia32_vcvtneoph2ps256 ((const __v16hf *) __A); +} + +extern __inline __m128bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtneps_avx_pbh (__m128 __A) +{ + return (__m128bf16) __builtin_ia32_vcvtneps2bf16128 (__A); +} + +extern __inline __m128bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtneps_avx_pbh (__m256 __A) +{ + return (__m128bf16) __builtin_ia32_vcvtneps2bf16256 (__A); +} + +#ifdef __DISABLE_AVXNECONVERT__ +#undef __DISABLE_AVXNECONVERT__ +#pragma GCC pop_options +#endif /* __DISABLE_AVXNECONVERT__ */ + +#endif /* _AVXNECONVERTINTRIN_H_INCLUDED */ diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h index f5fad22149a..18bbc0cb7be 100644 --- a/gcc/config/i386/cpuid.h +++ b/gcc/config/i386/cpuid.h @@ -50,6 +50,7 @@ /* %edx */ #define bit_AVXVNNIINT8 (1 << 4) +#define bit_AVXNECONVERT (1 << 5) #define 
bit_CMPXCHG8B (1 << 8) #define bit_CMOV (1 << 15) #define bit_MMX (1 << 23) diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def index 63a360b0f8b..ebf6e5b4ad8 100644 --- a/gcc/config/i386/i386-builtin-types.def +++ b/gcc/config/i386/i386-builtin-types.def @@ -87,6 +87,7 @@ DEF_VECTOR_TYPE (V8QI, QI) DEF_VECTOR_TYPE (V2DF, DOUBLE) DEF_VECTOR_TYPE (V4SF, FLOAT) DEF_VECTOR_TYPE (V8HF, FLOAT16) +DEF_VECTOR_TYPE (V8BF, BFLOAT16) DEF_VECTOR_TYPE (V2DI, DI) DEF_VECTOR_TYPE (V4SI, SI) DEF_VECTOR_TYPE (V8HI, HI) @@ -100,6 +101,7 @@ DEF_VECTOR_TYPE (V16UQI, UQI, V16QI) DEF_VECTOR_TYPE (V4DF, DOUBLE) DEF_VECTOR_TYPE (V8SF, FLOAT) DEF_VECTOR_TYPE (V16HF, FLOAT16) +DEF_VECTOR_TYPE (V16BF, BFLOAT16) DEF_VECTOR_TYPE (V4DI, DI) DEF_VECTOR_TYPE (V8SI, SI) DEF_VECTOR_TYPE (V16HI, HI) @@ -111,6 +113,7 @@ DEF_VECTOR_TYPE (V16UHI, UHI, V16HI) # AVX512F vectors DEF_VECTOR_TYPE (V32SF, FLOAT) DEF_VECTOR_TYPE (V32HF, FLOAT16) +DEF_VECTOR_TYPE (V32BF, BFLOAT16) DEF_VECTOR_TYPE (V16SF, FLOAT) DEF_VECTOR_TYPE (V8DF, DOUBLE) DEF_VECTOR_TYPE (V8DI, DI) @@ -179,6 +182,10 @@ DEF_POINTER_TYPE (PCV4DF, V4DF, CONST) DEF_POINTER_TYPE (PCV4SF, V4SF, CONST) DEF_POINTER_TYPE (PCV8DF, V8DF, CONST) DEF_POINTER_TYPE (PCV8SF, V8SF, CONST) +DEF_POINTER_TYPE (PCV8HF, V8HF, CONST) +DEF_POINTER_TYPE (PCV8BF, V8BF, CONST) +DEF_POINTER_TYPE (PCV16HF, V16HF, CONST) +DEF_POINTER_TYPE (PCV16BF, V16BF, CONST) DEF_POINTER_TYPE (PCV16SF, V16SF, CONST) DEF_POINTER_TYPE (PCV2DI, V2DI, CONST) @@ -254,12 +261,14 @@ DEF_FUNCTION_TYPE (V4DF, V4SI) DEF_FUNCTION_TYPE (V8DF, V8DF) DEF_FUNCTION_TYPE (V4HI, V4HI) DEF_FUNCTION_TYPE (V4SF, PCFLOAT) +DEF_FUNCTION_TYPE (V4SF, PCSHORT) DEF_FUNCTION_TYPE (V4SF, V2DF) DEF_FUNCTION_TYPE (V4SF, V2DF, V4SF, UQI) DEF_FUNCTION_TYPE (V4SF, V4DF) DEF_FUNCTION_TYPE (V4SF, V4DF, V4SF, UQI) DEF_FUNCTION_TYPE (V4SF, V4SF) DEF_FUNCTION_TYPE (V4SF, PCV4SF) +DEF_FUNCTION_TYPE (V4SF, PCV8HF) DEF_FUNCTION_TYPE (V4SF, V4SI) DEF_FUNCTION_TYPE (V4SF, V8SF) 
DEF_FUNCTION_TYPE (V4SF, V8HI) @@ -275,8 +284,10 @@ DEF_FUNCTION_TYPE (V8HI, V16QI) DEF_FUNCTION_TYPE (V8HI, V8HI) DEF_FUNCTION_TYPE (V8QI, V8QI) DEF_FUNCTION_TYPE (V8SF, PCFLOAT) +DEF_FUNCTION_TYPE (V8SF, PCSHORT) DEF_FUNCTION_TYPE (V8SF, PCV4SF) DEF_FUNCTION_TYPE (V8SF, PCV8SF) +DEF_FUNCTION_TYPE (V8SF, PCV16HF) DEF_FUNCTION_TYPE (V8SF, V4SF) DEF_FUNCTION_TYPE (V8SF, V8SF) DEF_FUNCTION_TYPE (V8SF, V8SI) @@ -1389,3 +1400,9 @@ DEF_FUNCTION_TYPE (V32HF, V32HF) DEF_FUNCTION_TYPE_ALIAS (V8HF_FTYPE_V8HF, ROUND) DEF_FUNCTION_TYPE_ALIAS (V16HF_FTYPE_V16HF, ROUND) DEF_FUNCTION_TYPE_ALIAS (V32HF_FTYPE_V32HF, ROUND) + +# AVXNECONVERT builtins +DEF_FUNCTION_TYPE (V8BF, V8SF) +DEF_FUNCTION_TYPE (V8BF, V4SF) +DEF_FUNCTION_TYPE (V4SF, PCV8BF) +DEF_FUNCTION_TYPE (V8SF, PCV16BF) diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def index e6edae5728b..a429577180c 100644 --- a/gcc/config/i386/i386-builtin.def +++ b/gcc/config/i386/i386-builtin.def @@ -274,6 +274,20 @@ BDESC (OPTION_MASK_ISA_RTM, 0, CODE_FOR_xbegin, "__builtin_ia32_xbegin", IX86_BU BDESC (OPTION_MASK_ISA_RTM, 0, CODE_FOR_xend, "__builtin_ia32_xend", IX86_BUILTIN_XEND, UNKNOWN, (int) VOID_FTYPE_VOID) BDESC (OPTION_MASK_ISA_RTM, 0, CODE_FOR_xtest, "__builtin_ia32_xtest", IX86_BUILTIN_XTEST, UNKNOWN, (int) INT_FTYPE_VOID) +/* AVX-NE-CONVERT */ +BDESC (0, OPTION_MASK_ISA2_AVXNECONVERT, CODE_FOR_vbcstnebf162ps_v4sf, "__builtin_ia32_vbcstnebf162ps128", IX86_BUILTIN_VBCSTNEBF162PS128, UNKNOWN, (int) V4SF_FTYPE_PCSHORT) +BDESC (0, OPTION_MASK_ISA2_AVXNECONVERT, CODE_FOR_vbcstnebf162ps_v8sf, "__builtin_ia32_vbcstnebf162ps256", IX86_BUILTIN_VBCSTNEBF162PS256, UNKNOWN, (int) V8SF_FTYPE_PCSHORT) +BDESC (0, OPTION_MASK_ISA2_AVXNECONVERT, CODE_FOR_vbcstnesh2ps_v4sf, "__builtin_ia32_vbcstnesh2ps128", IX86_BUILTIN_VBCSTNESH2PS128, UNKNOWN, (int) V4SF_FTYPE_PCSHORT) +BDESC (0, OPTION_MASK_ISA2_AVXNECONVERT, CODE_FOR_vbcstnesh2ps_v8sf, "__builtin_ia32_vbcstnesh2ps256", IX86_BUILTIN_VBCSTNESH2PS256, UNKNOWN, 
(int) V8SF_FTYPE_PCSHORT) +BDESC (0, OPTION_MASK_ISA2_AVXNECONVERT, CODE_FOR_vcvtneebf162ps_v4sf, "__builtin_ia32_vcvtneebf162ps128", IX86_BUILTIN_VCVTNEEBF162PS128, UNKNOWN, (int) V4SF_FTYPE_PCV8BF) +BDESC (0, OPTION_MASK_ISA2_AVXNECONVERT, CODE_FOR_vcvtneebf162ps_v8sf, "__builtin_ia32_vcvtneebf162ps256", IX86_BUILTIN_VCVTNEEBF162PS256, UNKNOWN, (int) V8SF_FTYPE_PCV16BF) +BDESC (0, OPTION_MASK_ISA2_AVXNECONVERT, CODE_FOR_vcvtneeph2ps_v4sf, "__builtin_ia32_vcvtneeph2ps128", IX86_BUILTIN_VCVTNEEPH2PS128, UNKNOWN, (int) V4SF_FTYPE_PCV8HF) +BDESC (0, OPTION_MASK_ISA2_AVXNECONVERT, CODE_FOR_vcvtneeph2ps_v8sf, "__builtin_ia32_vcvtneeph2ps256", IX86_BUILTIN_VCVTNEEPH2PS256, UNKNOWN, (int) V8SF_FTYPE_PCV16HF) +BDESC (0, OPTION_MASK_ISA2_AVXNECONVERT, CODE_FOR_vcvtneobf162ps_v4sf, "__builtin_ia32_vcvtneobf162ps128", IX86_BUILTIN_VCVTNEOBF162PS128, UNKNOWN, (int) V4SF_FTYPE_PCV8BF) +BDESC (0, OPTION_MASK_ISA2_AVXNECONVERT, CODE_FOR_vcvtneobf162ps_v8sf, "__builtin_ia32_vcvtneobf162ps256", IX86_BUILTIN_VCVTNEOBF162PS256, UNKNOWN, (int) V8SF_FTYPE_PCV16BF) +BDESC (0, OPTION_MASK_ISA2_AVXNECONVERT, CODE_FOR_vcvtneoph2ps_v4sf, "__builtin_ia32_vcvtneoph2ps128", IX86_BUILTIN_VCVTNEOPH2PS128, UNKNOWN, (int) V4SF_FTYPE_PCV8HF) +BDESC (0, OPTION_MASK_ISA2_AVXNECONVERT, CODE_FOR_vcvtneoph2ps_v8sf, "__builtin_ia32_vcvtneoph2ps256", IX86_BUILTIN_VCVTNEOPH2PS256, UNKNOWN, (int) V8SF_FTYPE_PCV16HF) + /* AVX512BW */ BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_loadv32hi_mask, "__builtin_ia32_loaddquhi512_mask", IX86_BUILTIN_LOADDQUHI512_MASK, UNKNOWN, (int) V32HI_FTYPE_PCSHORT_V32HI_USI) BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_loadv64qi_mask, "__builtin_ia32_loaddquqi512_mask", IX86_BUILTIN_LOADDQUQI512_MASK, UNKNOWN, (int) V64QI_FTYPE_PCCHAR_V64QI_UDI) @@ -2809,6 +2823,10 @@ BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v4sf, "__builti BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v4sf_mask, "__builtin_ia32_dpbf16ps_v4sf_mask", 
IX86_BUILTIN_DPHI16PS_V4SF_MASK, UNKNOWN, (int) V4SF_FTYPE_V4SF_V8HI_V8HI_UQI) BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v4sf_maskz, "__builtin_ia32_dpbf16ps_v4sf_maskz", IX86_BUILTIN_DPHI16PS_V4SF_MASKZ, UNKNOWN, (int) V4SF_FTYPE_V4SF_V8HI_V8HI_UQI) +/* AVX-NE-CONVERT */ +BDESC (0, OPTION_MASK_ISA2_AVXNECONVERT, CODE_FOR_avx_vcvtneps2bf16_v4sf, "__builtin_ia32_vcvtneps2bf16128", IX86_BUILTIN_VCVTNEPS2BF16128, UNKNOWN, (int) V8BF_FTYPE_V4SF) +BDESC (0, OPTION_MASK_ISA2_AVXNECONVERT, CODE_FOR_avx_vcvtneps2bf16_v8sf, "__builtin_ia32_vcvtneps2bf16256", IX86_BUILTIN_VCVTNEPS2BF16256, UNKNOWN, (int) V8BF_FTYPE_V8SF) + /* AVX512FP16. */ BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_addv8hf3_mask, "__builtin_ia32_addph128_mask", IX86_BUILTIN_ADDPH128_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_addv16hf3_mask, "__builtin_ia32_addph256_mask", IX86_BUILTIN_ADDPH256_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI) diff --git a/gcc/config/i386/i386-c.cc b/gcc/config/i386/i386-c.cc index a9a35c0a18a..48934df664c 100644 --- a/gcc/config/i386/i386-c.cc +++ b/gcc/config/i386/i386-c.cc @@ -637,6 +637,8 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag, def_or_undef (parse_in, "__AVXIFMA__"); if (isa_flag2 & OPTION_MASK_ISA2_AVXVNNIINT8) def_or_undef (parse_in, "__AVXVNNIINT8__"); + if (isa_flag2 & OPTION_MASK_ISA2_AVXNECONVERT) + def_or_undef (parse_in, "__AVXNECONVERT__"); if (TARGET_IAMCU) { def_or_undef (parse_in, "__iamcu"); diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc index a0f8a98986e..1e29fe584af 100644 --- a/gcc/config/i386/i386-expand.cc +++ b/gcc/config/i386/i386-expand.cc @@ -10427,7 +10427,9 @@ ix86_expand_args_builtin (const struct builtin_description *d, case V4DI_FTYPE_V4DI: case V16HI_FTYPE_V16SF: case V8HI_FTYPE_V8SF: + case V8BF_FTYPE_V8SF: case V8HI_FTYPE_V4SF: + case V8BF_FTYPE_V4SF: nargs = 1; 
       break;
     case V4SF_FTYPE_V4SF_VEC_MERGE:
@@ -11860,6 +11862,12 @@ ix86_expand_special_args_builtin (const struct builtin_description *d,
     case V8SF_FTYPE_PCV4SF:
     case V8SF_FTYPE_PCFLOAT:
     case V4SF_FTYPE_PCFLOAT:
+    case V4SF_FTYPE_PCSHORT:
+    case V4SF_FTYPE_PCV8BF:
+    case V4SF_FTYPE_PCV8HF:
+    case V8SF_FTYPE_PCSHORT:
+    case V8SF_FTYPE_PCV16BF:
+    case V8SF_FTYPE_PCV16HF:
     case V4DF_FTYPE_PCV2DF:
     case V4DF_FTYPE_PCDOUBLE:
     case V2DF_FTYPE_PCDOUBLE:
diff --git a/gcc/config/i386/i386-isa.def b/gcc/config/i386/i386-isa.def
index c95b917c6ce..4ea3f96f69f 100644
--- a/gcc/config/i386/i386-isa.def
+++ b/gcc/config/i386/i386-isa.def
@@ -111,3 +111,4 @@ DEF_PTA(AVXVNNI)
 DEF_PTA(AVX512FP16)
 DEF_PTA(AVXIFMA)
 DEF_PTA(AVXVNNIINT8)
+DEF_PTA(AVXNECONVERT)
diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
index 3e6d04433a6..e59e2d8aeaf 100644
--- a/gcc/config/i386/i386-options.cc
+++ b/gcc/config/i386/i386-options.cc
@@ -228,7 +228,8 @@ static struct ix86_target_opts isa2_opts[] =
   { "-mavxvnni", OPTION_MASK_ISA2_AVXVNNI },
   { "-mavx512fp16", OPTION_MASK_ISA2_AVX512FP16 },
   { "-mavxifma", OPTION_MASK_ISA2_AVXIFMA },
-  { "-mavxvnniint8", OPTION_MASK_ISA2_AVXVNNIINT8 }
+  { "-mavxvnniint8", OPTION_MASK_ISA2_AVXVNNIINT8 },
+  { "-mavxneconvert", OPTION_MASK_ISA2_AVXNECONVERT }
 };
 static struct ix86_target_opts isa_opts[] =
 {
@@ -1076,6 +1077,7 @@ ix86_valid_target_attribute_inner_p (tree fndecl, tree args, char *p_strings[],
     IX86_ATTR_ISA ("avx512fp16", OPT_mavx512fp16),
     IX86_ATTR_ISA ("avxifma", OPT_mavxifma),
     IX86_ATTR_ISA ("avxvnniint8", OPT_mavxvnniint8),
+    IX86_ATTR_ISA ("avxneconvert", OPT_mavxneconvert),

     /* enum options */
     IX86_ATTR_ENUM ("fpmath=", OPT_mfpmath_),
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index 53d534f6392..6e07b89ac4c 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -1224,3 +1224,8 @@ mavxvnniint8
 Target Mask(ISA2_AVXVNNIINT8) Var(ix86_isa_flags2) Save
 Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2 and
 AVXVNNIINT8 built-in functions and code generation.
+
+mavxneconvert
+Target Mask(ISA2_AVXNECONVERT) Var(ix86_isa_flags2) Save
+Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, and
+AVXNECONVERT built-in functions and code generation.
diff --git a/gcc/config/i386/immintrin.h b/gcc/config/i386/immintrin.h
index c62d50f1951..d7433f639c8 100644
--- a/gcc/config/i386/immintrin.h
+++ b/gcc/config/i386/immintrin.h
@@ -124,6 +124,10 @@
 #include 
 #endif

+#ifdef __AVX2__
+#include 
+#endif
+
 #include 

 #include 
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 49490a213ea..bef4447de62 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -171,6 +171,14 @@
   UNSPEC_VPMADDWDACCD
   UNSPEC_VPMADDWDACCSSD

+  ;; For AVXNECONVERT support
+  UNSPEC_VCVTNEBF16SF
+  UNSPEC_VCVTNESHSF
+  UNSPEC_VCVTNEEBF16SF
+  UNSPEC_VCVTNEEPHSF
+  UNSPEC_VCVTNEOBF16SF
+  UNSPEC_VCVTNEOPHSF
+
   ;; For VAES support
   UNSPEC_VAESDEC
   UNSPEC_VAESDECLAST
@@ -28930,9 +28938,69 @@
 ;; Converting from SF to BF
 (define_mode_attr sf_cvt_bf16
   [(V4SF "V8HI") (V8SF "V8HI") (V16SF "V16HI")])
+(define_mode_attr sf_cvt_bfloat16
+  [(V4SF "V8BF") (V8SF "V8BF")])
 ;; Mapping from BF to SF
 (define_mode_attr sf_bf16
   [(V4SF "V8HI") (V8SF "V16HI") (V16SF "V32HI")])
+(define_mode_attr sf_bfloat16
+  [(V4SF "V8BF") (V8SF "V16BF") (V16SF "V32BF")])
+;; Mapping from PH to SF
+(define_mode_attr ph_cvt_sf
+  [(V4SF "V8HF") (V8SF "V16HF")])
+
+(define_int_iterator VBCSTNE
+  [UNSPEC_VCVTNEBF16SF
+   UNSPEC_VCVTNESHSF])
+
+(define_int_attr vbcstnetype
+  [(UNSPEC_VCVTNEBF16SF "bf16") (UNSPEC_VCVTNESHSF "sh")])
+
+(define_insn "vbcstne2ps_"
+  [(set (match_operand:VF1_128_256 0 "register_operand" "=x")
+        (vec_duplicate:VF1_128_256
+          (unspec:SF
+            [(match_operand:HI 1 "memory_operand" "m")]
+            VBCSTNE)))]
+  "TARGET_AVXNECONVERT"
+  "vbcstne2ps\t{%1, %0|%0, %1}"
+  [(set_attr "prefix" "vex")
+   (set_attr "mode" "")])
+
+(define_int_iterator VCVTNEBF16
+  [UNSPEC_VCVTNEEBF16SF
+   UNSPEC_VCVTNEOBF16SF])
+
+(define_int_attr vcvtnebf16type
+  [(UNSPEC_VCVTNEEBF16SF "ebf16")
+   (UNSPEC_VCVTNEOBF16SF "obf16")])
+(define_insn "vcvtne2ps_"
+  [(set (match_operand:VF1_128_256 0 "register_operand" "=x")
+        (unspec:VF1_128_256
+          [(match_operand: 1 "memory_operand" "m")]
+          VCVTNEBF16))]
+  "TARGET_AVXNECONVERT"
+  "vcvtne2ps\t{%1, %0|%0, %1}"
+  [(set_attr "prefix" "vex")
+   (set_attr "mode" "")])
+
+(define_int_iterator VCVTNEPH
+  [UNSPEC_VCVTNEEPHSF
+   UNSPEC_VCVTNEOPHSF])
+
+(define_int_attr vcvtnephtype
+  [(UNSPEC_VCVTNEEPHSF "eph")
+   (UNSPEC_VCVTNEOPHSF "oph")])
+
+(define_insn "vcvtne2ps_"
+  [(set (match_operand:VF1_128_256 0 "register_operand" "=x")
+        (unspec:VF1_128_256
+          [(match_operand: 1 "memory_operand" "m")]
+          VCVTNEPH))]
+  "TARGET_AVXNECONVERT"
+  "vcvtne2ps\t{%1, %0|%0, %1}"
+  [(set_attr "prefix" "vex")
+   (set_attr "mode" "")])

 (define_expand "avx512f_cvtne2ps2bf16__maskz"
   [(match_operand:BF16 0 "register_operand")
@@ -28966,13 +29034,41 @@
   DONE;
 })

-(define_insn "avx512f_cvtneps2bf16_"
+(define_insn "avx_vcvtneps2bf16_"
+  [(set (match_operand: 0 "register_operand" "=v")
+        (unspec:
+          [(match_operand:VF1_128_256 1 "register_operand" "v")]
+          UNSPEC_VCVTNEPS2BF16))]
+  "TARGET_AVXNECONVERT"
+  "%{vex%} vcvtneps2bf16\t{%1, %0|%0, %1}"
+  [(set_attr "prefix" "vex")])
+
+(define_insn "avx512f_cvtneps2bf16_"
   [(set (match_operand: 0 "register_operand" "=v")
         (unspec:
           [(match_operand:VF1_AVX512VL 1 "register_operand" "v")]
           UNSPEC_VCVTNEPS2BF16))]
   "TARGET_AVX512BF16"
-  "vcvtneps2bf16\t{%1, %0|%0, %1}")
+  {
+    if ( <=32
+        && TARGET_AVXNECONVERT
+        && !EXT_REX_SSE_REG_P (operands[0])
+        && !EXT_REX_SSE_REG_P (operands[1]))
+      return "%{vex%} vcvtneps2bf16\t{%1, %0|%0, %1}";
+    else
+      return "vcvtneps2bf16\t{%1, %0|%0, %1}";
+  })
+
+(define_insn "avx512f_cvtneps2bf16__mask"
+  [(set (match_operand: 0 "register_operand" "=v")
+        (vec_merge:
+          (unspec:
+            [(match_operand:VF1_AVX512VL 1 "register_operand" "v")]
+            UNSPEC_VCVTNEPS2BF16)
+          (match_operand: 2 "nonimm_or_0_operand" "0C")
+          (match_operand: 3 "register_operand" "Yk")))]
+  "TARGET_AVX512BF16"
+  "vcvtneps2bf16\t{%1, %0%{%3%}%N2|%0%{%3%}%N2, %1}")

 (define_expand "avx512f_dpbf16ps__maskz"
   [(match_operand:VF1_AVX512VL 0 "register_operand")
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 9a8de9fc226..0a4396f92bb 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -7070,6 +7070,11 @@ Enable/disable the generation of the AVXIFMA instructions.
 @cindex @code{target("avxvnniint8")} function attribute, x86
 Enable/disable the generation of the AVXVNNIINT8 instructions.

+@item avxneconvert
+@itemx no-avxneconvert
+@cindex @code{target("avxneconvert")} function attribute, x86
+Enable/disable the generation of the AVXNECONVERT instructions.
+
 @item cld
 @itemx no-cld
 @cindex @code{target("cld")} function attribute, x86
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index d4ff7549bf3..307fb7fa441 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1436,7 +1436,7 @@ See RS/6000 and PowerPC Options.
 -mavx5124fmaps -mavx512vnni -mavx5124vnniw -mprfchw -mrdpid @gol
 -mrdseed -msgx -mavx512vp2intersect -mserialize -mtsxldtrk@gol
 -mamx-tile -mamx-int8 -mamx-bf16 -muintr -mhreset -mavxvnni@gol
--mavx512fp16 -mavxifma -mavxvnniint8 @gol
+-mavx512fp16 -mavxifma -mavxvnniint8 -mavxneconvert @gol
 -mcldemote -mms-bitfields -mno-align-stringops -minline-all-stringops @gol
 -minline-stringops-dynamically -mstringop-strategy=@var{alg} @gol
 -mkl -mwidekl @gol
@@ -32899,6 +32899,9 @@ preferred alignment to @option{-mpreferred-stack-boundary=2}.
 @need 200
 @itemx -mavxvnniint8
 @opindex mavxvnniint8
+@need 200
+@itemx -mavxneconvert
+@opindex mavxneconvert
 These switches enable the use of instructions in the MMX, SSE,
 SSE2, SSE3, SSSE3, SSE4, SSE4A, SSE4.1, SSE4.2, AVX, AVX2, AVX512F, AVX512PF,
 AVX512ER, AVX512CD, AVX512VL, AVX512BW, AVX512DQ, AVX512IFMA, AVX512VBMI, SHA,
@@ -32909,8 +32912,8 @@ XSAVEOPT, XSAVEC, XSAVES, RTM, HLE, TBM, MWAITX, CLZERO, PKU, AVX512VBMI2,
 GFNI, VAES, WAITPKG, VPCLMULQDQ, AVX512BITALG, MOVDIRI, MOVDIR64B, AVX512BF16,
 ENQCMD, AVX512VPOPCNTDQ, AVX5124FMAPS, AVX512VNNI, AVX5124VNNIW, SERIALIZE,
 UINTR, HRESET, AMXTILE, AMXINT8, AMXBF16, KL, WIDEKL, AVXVNNI, AVX512FP16,
-AVXIFMA, AVXVNNIINT8 or CLDEMOTE extended instruction sets. Each has a
-corresponding @option{-mno-} option to disable use of these instructions.
+AVXIFMA, AVXVNNIINT8, AVXNECONVERT or CLDEMOTE extended instruction sets. Each
+has a corresponding @option{-mno-} option to disable use of these instructions.

 These extensions are also available as built-in functions: see
 @ref{x86 Built-in Functions}, for details of the functions enabled and
diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index e21a1d381e0..a12175b6498 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -2493,6 +2493,9 @@ Target supports the execution of @code{avx512vp2intersect} instructions.
 @item avxifma
 Target supports the execution of @code{avxifma} instructions.

+@item avxneconvert
+Target supports the execution of @code{avxneconvert} instructions.
+
 @item avxvnniint8
 Target supports the execution of @code{avxvnniint8} instructions.

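The run tests that follow encode the element layout of the new loads. The even forms (`vcvtneebf162ps`, `vcvtneeph2ps`) read the low 16 bits of each dword, which is element 2*i. The odd forms (`vcvtneobf162ps`, `vcvtneoph2ps`) read the high 16 bits, element 2*i+1. Below is a minimal scalar sketch of the BF16 variants and of `vcvtneps2bf16`. It assumes little-endian element order and leaves out exception and denormal behaviour. The FP16 variants select elements the same way but need a real half-precision conversion, which is omitted here. The `ref_*`, `bf16_to_fp32` and `fp32_to_bf16_rne` helpers are hypothetical and are not part of GCC or `immintrin.h`.

```c
/* Hypothetical scalar reference model of the AVX-NE-CONVERT BF16
   conversions; illustration only, not GCC or intrinsics code.  */
#include <stdint.h>
#include <string.h>

/* BF16 -> FP32 widening: the BF16 bits become the upper 16 bits of the
   FP32 encoding, so the conversion is exact.  */
static float bf16_to_fp32 (uint16_t h)
{
  uint32_t bits = (uint32_t) h << 16;
  float f;
  memcpy (&f, &bits, sizeof f);
  return f;
}

/* Like vcvtneebf162ps: FP32 lane i comes from BF16 element 2*i,
   i.e. the low half of dword i in little-endian memory.  */
static void ref_cvtneebf16_ps (const uint16_t *src, float *dst, int lanes)
{
  for (int i = 0; i < lanes; i++)
    dst[i] = bf16_to_fp32 (src[2 * i]);
}

/* Like vcvtneobf162ps: FP32 lane i comes from BF16 element 2*i+1,
   i.e. the high half of dword i.  */
static void ref_cvtneobf16_ps (const uint16_t *src, float *dst, int lanes)
{
  for (int i = 0; i < lanes; i++)
    dst[i] = bf16_to_fp32 (src[2 * i + 1]);
}

/* Like vbcstnebf162ps: one BF16 value is widened and broadcast.  */
static void ref_bcstnebf16_ps (const uint16_t *src, float *dst, int lanes)
{
  float f = bf16_to_fp32 (*src);
  for (int i = 0; i < lanes; i++)
    dst[i] = f;
}

/* Like one element of vcvtneps2bf16: round to nearest, ties to even.
   NaNs are quieted; denormal flushing is NOT modelled.  */
static uint16_t fp32_to_bf16_rne (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);
  if ((u & 0x7fffffffu) > 0x7f800000u)   /* NaN: keep upper bits, set quiet bit */
    return (uint16_t) ((u >> 16) | 0x0040u);
  u += 0x7fffu + ((u >> 16) & 1u);       /* add half an ULP, ties to even */
  return (uint16_t) (u >> 16);
}
```

All of the `vcvtneps2bf16` run test's inputs are exactly representable in BF16. For such values, rounding to nearest even gives the same result as the plain `>> 16` truncation the test uses to build its reference values.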
diff --git a/gcc/testsuite/g++.dg/other/i386-2.C b/gcc/testsuite/g++.dg/other/i386-2.C
index ebd01fe47bc..dd3e71f25ed 100644
--- a/gcc/testsuite/g++.dg/other/i386-2.C
+++ b/gcc/testsuite/g++.dg/other/i386-2.C
@@ -1,5 +1,5 @@
 /* { dg-do compile { target i?86-*-* x86_64-*-* } } */
-/* { dg-options "-O -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16 -mavxifma -mavxvnniint8" } */
+/* { dg-options "-O -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16 -mavxifma -mavxvnniint8 -mavxneconvert" } */

 /* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, mm3dnow.h, fma4intrin.h,
    xopintrin.h, abmintrin.h, bmiintrin.h, tbmintrin.h, lwpintrin.h,
diff --git a/gcc/testsuite/g++.dg/other/i386-3.C b/gcc/testsuite/g++.dg/other/i386-3.C
index b66498f1d4c..cd7045cc4e4 100644
--- a/gcc/testsuite/g++.dg/other/i386-3.C
+++ b/gcc/testsuite/g++.dg/other/i386-3.C
@@ -1,5 +1,5 @@
 /* { dg-do compile { target i?86-*-* x86_64-*-* } } */
-/* { dg-options "-O -fkeep-inline-functions -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16 -mavxifma -mavxvnniint8" } */
+/* { dg-options "-O -fkeep-inline-functions -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16 -mavxifma -mavxvnniint8 -mavxneconvert" } */

 /* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, mm3dnow.h, fma4intrin.h,
    xopintrin.h, abmintrin.h, bmiintrin.h, tbmintrin.h, lwpintrin.h,
diff --git a/gcc/testsuite/gcc.target/i386/avx-check.h b/gcc/testsuite/gcc.target/i386/avx-check.h
index 77507ca2edc..666eff50780 100644
--- a/gcc/testsuite/gcc.target/i386/avx-check.h
+++ b/gcc/testsuite/gcc.target/i386/avx-check.h
@@ -28,6 +28,9 @@ main ()
 #endif
 #ifdef AVXVNNIINT8
       && __builtin_cpu_supports ("avxvnniint8")
+#endif
+#ifdef AVXNECONVERT
+      && __builtin_cpu_supports ("avxneconvert")
 #endif
       )
     {
diff --git a/gcc/testsuite/gcc.target/i386/avx-ne-convert-1.c b/gcc/testsuite/gcc.target/i386/avx-ne-convert-1.c
new file mode 100644
index 00000000000..b1848037e81
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx-ne-convert-1.c
@@ -0,0 +1,45 @@
+/* { dg-do compile } */
+/* { dg-options "-mavxneconvert -O2" } */
+/* { dg-final { scan-assembler-times "vbcstnebf162ps\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vbcstnebf162ps\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vbcstnesh2ps\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vbcstnesh2ps\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtneebf162ps\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtneebf162ps\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtneeph2ps\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtneeph2ps\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtneobf162ps\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtneobf162ps\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtneoph2ps\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtneoph2ps\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "\{vex\} vcvtneps2bf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "\{vex\} vcvtneps2bf16\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+#include 
+
+volatile __m128 x1;
+volatile __m256 x2;
+volatile __m128bf16 res1, res2;
+const void *a;
+__m128bf16 *b;
+__m256bf16 *c;
+__m128h *d;
+__m256h *e;
+
+void extern
+avx_ne_convert_test (void)
+{
+  x1 = _mm_bcstnebf16_ps (a);
+  x2 = _mm256_bcstnebf16_ps (a);
+  x1 = _mm_bcstnesh_ps (a);
+  x2 = _mm256_bcstnesh_ps (a);
+  x1 = _mm_cvtneebf16_ps (b);
+  x2 = _mm256_cvtneebf16_ps (c);
+  x1 = _mm_cvtneeph_ps (d);
+  x2 = _mm256_cvtneeph_ps (e);
+  x1 = _mm_cvtneobf16_ps (b);
+  x2 = _mm256_cvtneobf16_ps (c);
+  x1 = _mm_cvtneoph_ps (d);
+  x2 = _mm256_cvtneoph_ps (e);
+  res1 = _mm_cvtneps_avx_pbh (x1);
+  res2 = _mm256_cvtneps_avx_pbh (x2);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx-ne-convert-vbcstnebf162ps-2.c b/gcc/testsuite/gcc.target/i386/avx-ne-convert-vbcstnebf162ps-2.c
new file mode 100644
index 00000000000..2707c58f7cd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx-ne-convert-vbcstnebf162ps-2.c
@@ -0,0 +1,54 @@
+/* { dg-do run } */
+/* { dg-options "-mavxneconvert -O2" } */
+/* { dg-require-effective-target avxneconvert } */
+#define AVXNECONVERT
+#include 
+
+#ifndef CHECK
+#define CHECK "avx-check.h"
+#endif
+
+#ifndef TEST
+#define TEST avx_test
+#endif
+
+#include CHECK
+
+typedef union
+{
+  uint32_t int32;
+  float flt;
+} float_int_t;
+
+static uint16_t convert_fp32_to_bf16 (float fp)
+{
+  float_int_t fi;
+  fi.flt = fp;
+  return ((fi.int32 >> 16) & 0xffff);
+}
+
+void TEST (void)
+{
+  union128 dst_128;
+  union256 dst_256;
+  float res_ref_128[4], res_ref_256[8], fp32;
+  uint16_t var;
+  fp32 = (float) 3 * 2 + 5.5;
+  for (int i = 0; i < 4; i++)
+    {
+      res_ref_128[i] = fp32;
+      dst_128.a[i] = 117;
+    }
+  for (int i = 0; i < 8; i++)
+    {
+      res_ref_256[i] = fp32;
+      dst_256.a[i] = 117;
+    }
+  var = convert_fp32_to_bf16 (fp32);
+  dst_128.x = _mm_bcstnebf16_ps (&var);
+  dst_256.x = _mm256_bcstnebf16_ps (&var);
+  if (check_union128 (dst_128, res_ref_128))
+    abort();
+  if (check_union256 (dst_256, res_ref_256))
+    abort();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx-ne-convert-vbcstnesh2ps-2.c b/gcc/testsuite/gcc.target/i386/avx-ne-convert-vbcstnesh2ps-2.c
new file mode 100644
index 00000000000..0e6f38334b8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx-ne-convert-vbcstnesh2ps-2.c
@@ -0,0 +1,42 @@
+/* { dg-do run } */
+/* { dg-options "-mavxneconvert -mf16c -O2" } */
+/* { dg-require-effective-target avxneconvert } */
+#define AVXNECONVERT
+#include 
+#include 
+
+#ifndef CHECK
+#define CHECK "avx-check.h"
+#endif
+
+#ifndef TEST
+#define TEST avx_test
+#endif
+
+#include CHECK
+
+void TEST (void)
+{
+  union128 dst_128;
+  union256 dst_256;
+  float res_ref_128[4], res_ref_256[8], fp32;
+  uint16_t var;
+  fp32 = (float) 3 * 2 + 8.5;
+  for (int i = 0; i < 4; i++)
+    {
+      res_ref_128[i] = fp32;
+      dst_128.a[i] = 117;
+    }
+  for (int i = 0; i < 8; i++)
+    {
+      res_ref_256[i] = fp32;
+      dst_256.a[i] = 117;
+    }
+  var = _cvtss_sh (fp32, 0);
+  dst_128.x = _mm_bcstnesh_ps (&var);
+  dst_256.x = _mm256_bcstnesh_ps (&var);
+  if (check_union128 (dst_128, res_ref_128))
+    abort();
+  if (check_union256 (dst_256, res_ref_256))
+    abort();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx-ne-convert-vcvtneebf162ps-2.c b/gcc/testsuite/gcc.target/i386/avx-ne-convert-vcvtneebf162ps-2.c
new file mode 100644
index 00000000000..c80f3fdedec
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx-ne-convert-vcvtneebf162ps-2.c
@@ -0,0 +1,73 @@
+/* { dg-do run } */
+/* { dg-options "-mavxneconvert -O2" } */
+/* { dg-require-effective-target avxneconvert } */
+#define AVXNECONVERT
+#include 
+
+#ifndef CHECK
+#define CHECK "avx-check.h"
+#endif
+
+#ifndef TEST
+#define TEST avx_test
+#endif
+
+#include CHECK
+
+typedef union
+{
+  uint32_t int32;
+  float flt;
+} float_int_t;
+
+typedef union
+{
+  __m128bf16 x;
+  uint32_t a[4];
+} union128bf16_i;
+
+typedef union
+{
+  __m256bf16 x;
+  uint32_t a[8];
+} union256bf16_i;
+
+static uint16_t convert_fp32_to_bf16 (float fp)
+{
+  float_int_t fi;
+  fi.flt = fp;
+  return ((fi.int32 >> 16) & 0xffff);
+}
+
+void TEST (void)
+{
+  union128 dst_128;
+  union256 dst_256;
+  float res_ref_128[4], res_ref_256[8], fp32;
+  uint16_t bf16;
+  union128bf16_i src_128bh;
+  union256bf16_i src_256bh;
+
+  for (int i = 0; i < 4; i++)
+    {
+      fp32 = (float) 3 * i + 5 + i * 0.5;
+      bf16 = convert_fp32_to_bf16 (fp32);
+      src_128bh.a[i] = bf16; // store bf16 at the lower part of the dword
+      res_ref_128[i] = fp32;
+      dst_128.a[i] = 117;
+    }
+  for (int i = 0; i < 8; i++)
+    {
+      fp32 = (float) 3 * i + 5 + i * 0.5;
+      bf16 = convert_fp32_to_bf16 (fp32);
+      src_256bh.a[i] = bf16; // store bf16 at the lower part of the dword
+      res_ref_256[i] = fp32;
+      dst_256.a[i] = 117;
+    }
+  dst_128.x = _mm_cvtneebf16_ps (&src_128bh.x);
+  dst_256.x = _mm256_cvtneebf16_ps (&src_256bh.x);
+  if (check_union128 (dst_128, res_ref_128))
+    abort();
+  if (check_union256 (dst_256, res_ref_256))
+    abort();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx-ne-convert-vcvtneeph2ps-2.c b/gcc/testsuite/gcc.target/i386/avx-ne-convert-vcvtneeph2ps-2.c
new file mode 100644
index 00000000000..a862894746d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx-ne-convert-vcvtneeph2ps-2.c
@@ -0,0 +1,66 @@
+/* { dg-do run } */
+/* { dg-options "-mavxneconvert -mf16c -O2" } */
+/* { dg-require-effective-target avxneconvert } */
+#define AVXNECONVERT
+#include 
+
+#ifndef CHECK
+#define CHECK "avx-check.h"
+#endif
+
+#ifndef TEST
+#define TEST avx_test
+#endif
+
+#include CHECK
+
+typedef union
+{
+  uint32_t int32;
+  float flt;
+} float_int_t;
+
+typedef union
+{
+  __m128h x;
+  uint32_t a[4];
+} union128h;
+
+typedef union
+{
+  __m256h x;
+  uint32_t a[8];
+} union256h;
+
+void TEST (void)
+{
+  union128 dst_128;
+  union256 dst_256;
+  float res_ref_128[4], res_ref_256[8], fp32;
+  uint16_t fp16;
+  union128h src_128h;
+  union256h src_256h;
+
+  for (int i = 0; i < 4; i++)
+    {
+      fp32 = (float) 3 * i + 5 + i * 0.5;
+      fp16 = _cvtss_sh (fp32, 0);
+      src_128h.a[i] = fp16;
+      res_ref_128[i] = fp32;
+      dst_128.a[i] = 117;
+    }
+  for (int i = 0; i < 8; i++)
+    {
+      fp32 = (float) 3 * i + 5 + i * 0.5;
+      fp16 = _cvtss_sh (fp32, 0);
+      src_256h.a[i] = fp16;
+      res_ref_256[i] = fp32;
+      dst_256.a[i] = 117;
+    }
+  dst_128.x = _mm_cvtneeph_ps (&src_128h.x);
+  dst_256.x = _mm256_cvtneeph_ps (&src_256h.x);
+  if (check_union128 (dst_128, res_ref_128))
+    abort();
+  if (check_union256 (dst_256, res_ref_256))
+    abort();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx-ne-convert-vcvtneobf162ps-2.c b/gcc/testsuite/gcc.target/i386/avx-ne-convert-vcvtneobf162ps-2.c
new file mode 100644
index 00000000000..d95aee067ae
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx-ne-convert-vcvtneobf162ps-2.c
@@ -0,0 +1,75 @@
+/* { dg-do run } */
+/* { dg-options "-mavxneconvert -O2" } */
+/* { dg-require-effective-target avxneconvert } */
+#define AVXNECONVERT
+#include 
+
+#ifndef CHECK
+#define CHECK "avx-check.h"
+#endif
+
+#ifndef TEST
+#define TEST avx_test
+#endif
+
+#include CHECK
+
+typedef union
+{
+  uint32_t int32;
+  float flt;
+} float_int_t;
+
+typedef union
+{
+  __m128bf16 x;
+  uint32_t a[4];
+} union128bf16_i;
+
+typedef union
+{
+  __m256bf16 x;
+  uint32_t a[8];
+} union256bf16_i;
+
+static uint16_t convert_fp32_to_bf16 (float fp)
+{
+  float_int_t fi;
+  fi.flt = fp;
+  return ((fi.int32 >> 16) & 0xffff);
+}
+
+void TEST (void)
+{
+  union128 dst_128;
+  union256 dst_256;
+  float res_ref_128[4], res_ref_256[8], fp32;
+  uint16_t bf16;
+  union128bf16_i src_128bh;
+  union256bf16_i src_256bh;
+
+  for (int i = 0; i < 4; i++)
+    {
+      fp32 = (float) 3 * i + 5 + i * 0.5;
+      bf16 = convert_fp32_to_bf16 (fp32);
+      // store bf16 at the upper part of the dword
+      src_128bh.a[i] = (bf16 << 16) & 0xffff0000;
+      res_ref_128[i] = fp32;
+      dst_128.a[i] = 117;
+    }
+  for (int i = 0; i < 8; i++)
+    {
+      fp32 = (float) 3 * i + 5 + i * 0.5;
+      bf16 = convert_fp32_to_bf16 (fp32);
+      // store bf16 at the upper part of the dword
+      src_256bh.a[i] = (bf16 << 16) & 0xffff0000;
+      res_ref_256[i] = fp32;
+      dst_256.a[i] = 117;
+    }
+  dst_128.x = _mm_cvtneobf16_ps (&src_128bh.x);
+  dst_256.x = _mm256_cvtneobf16_ps (&src_256bh.x);
+  if (check_union128 (dst_128, res_ref_128))
+    abort();
+  if (check_union256 (dst_256, res_ref_256))
+    abort();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx-ne-convert-vcvtneoph2ps-2.c b/gcc/testsuite/gcc.target/i386/avx-ne-convert-vcvtneoph2ps-2.c
new file mode 100644
index 00000000000..95eb5d74765
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx-ne-convert-vcvtneoph2ps-2.c
@@ -0,0 +1,66 @@
+/* { dg-do run } */
+/* { dg-options "-mavxneconvert -mf16c -O2" } */
+/* { dg-require-effective-target avxneconvert } */
+#define AVXNECONVERT
+#include 
+
+#ifndef CHECK
+#define CHECK "avx-check.h"
+#endif
+
+#ifndef TEST
+#define TEST avx_test
+#endif
+
+#include CHECK
+
+typedef union
+{
+  uint32_t int32;
+  float flt;
+} float_int_t;
+
+typedef union
+{
+  __m128h x;
+  uint32_t a[4];
+} union128h;
+
+typedef union
+{
+  __m256h x;
+  uint32_t a[8];
+} union256h;
+
+void TEST (void)
+{
+  union128 dst_128;
+  union256 dst_256;
+  float res_ref_128[4], res_ref_256[8], fp32;
+  uint16_t fp16;
+  union128h src_128h;
+  union256h src_256h;
+
+  for (int i = 0; i < 4; i++)
+    {
+      fp32 = (float) 3 * i + 5 + i * 0.5;
+      fp16 = _cvtss_sh (fp32, 0);
+      src_128h.a[i] = fp16 << 16;
+      res_ref_128[i] = fp32;
+      dst_128.a[i] = 117;
+    }
+  for (int i = 0; i < 8; i++)
+    {
+      fp32 = (float) 3 * i + 5 + i * 0.5;
+      fp16 = _cvtss_sh (fp32, 0);
+      src_256h.a[i] = fp16 << 16;
+      res_ref_256[i] = fp32;
+      dst_256.a[i] = 117;
+    }
+  dst_128.x = _mm_cvtneoph_ps (&src_128h.x);
+  dst_256.x = _mm256_cvtneoph_ps (&src_256h.x);
+  if (check_union128 (dst_128, res_ref_128))
+    abort();
+  if (check_union256 (dst_256, res_ref_256))
+    abort();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx-ne-convert-vcvtneps2bf16-2.c b/gcc/testsuite/gcc.target/i386/avx-ne-convert-vcvtneps2bf16-2.c
new file mode 100644
index 00000000000..0861521111a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx-ne-convert-vcvtneps2bf16-2.c
@@ -0,0 +1,58 @@
+/* { dg-do run } */
+/* { dg-options "-mavxneconvert -O2" } */
+/* { dg-require-effective-target avxneconvert } */
+#define AVXNECONVERT
+#include 
+
+#ifndef CHECK
+#define CHECK "avx-check.h"
+#endif
+
+#ifndef TEST
+#define TEST avx_test
+#endif
+
+#include CHECK
+
+typedef union
+{
+  uint32_t int32;
+  float flt;
+} float_int_t;
+
+typedef union
+{
+  __m128bf16 x;
+  unsigned short a[8];
+} union128bf16;
+
+void TEST (void)
+{
+  union128 src_128;
+  union256 src_256;
+  union128bf16 dst_128, dst_256;
+  uint16_t res_ref_128[8] = {0}, res_ref_256[8];
+  float_int_t fp32;
+  for (int i = 0; i < 4; i++)
+    {
+      fp32.flt = (float) 2 * i + 7 + i * 0.25;
+      src_128.a[i] = fp32.flt;
+      res_ref_128[i] = fp32.int32 >> 16;
+      dst_128.a[i] = 117;
+    }
+
+  for (int i = 0; i < 8; i++)
+    {
+      fp32.flt = (float) 2 * i + 7 + i * 0.25;
+      src_256.a[i] = fp32.flt;
+      res_ref_256[i] = fp32.int32 >> 16;
+      dst_256.a[i] = 117;
+    }
+  dst_128.x = _mm_cvtneps_avx_pbh (src_128.x);
+  dst_256.x = _mm256_cvtneps_avx_pbh (src_256.x);
+
+  if (checkVus (dst_128.a, res_ref_128, 8))
+    abort();
+  if (checkVus (dst_256.a, res_ref_256, 8))
+    abort();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bf16vl-vcvtneps2bf16-1.c b/gcc/testsuite/gcc.target/i386/avx512bf16vl-vcvtneps2bf16-1a.c
similarity index 100%
rename from gcc/testsuite/gcc.target/i386/avx512bf16vl-vcvtneps2bf16-1.c
rename to gcc/testsuite/gcc.target/i386/avx512bf16vl-vcvtneps2bf16-1a.c
diff --git a/gcc/testsuite/gcc.target/i386/avx512bf16vl-vcvtneps2bf16-1b.c b/gcc/testsuite/gcc.target/i386/avx512bf16vl-vcvtneps2bf16-1b.c
new file mode 100644
index 00000000000..8b5d6a644bc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bf16vl-vcvtneps2bf16-1b.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bf16 -mavx512vl -mavxneconvert -O2" } */
+/* { dg-final { scan-assembler-times "\{vex\} vcvtneps2bf16\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtneps2bf16\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtneps2bf16\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "\{vex\} vcvtneps2bf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtneps2bf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtneps2bf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include 
+
+volatile __m128bh res1, res2;
+volatile __m128 x1;
+volatile __m256 x2;
+volatile __mmask8 m8;
+
+void extern
+avx512bf16_test (void)
+{
+  res2 = _mm256_cvtneps_pbh (x2);
+  res2 = _mm256_mask_cvtneps_pbh (res2, m8, x2);
+  res2 = _mm256_maskz_cvtneps_pbh (m8, x2);
+
+  res1 = _mm_cvtneps_pbh (x1);
+  res1 = _mm_mask_cvtneps_pbh (res1, m8, x1);
+  res1 = _mm_maskz_cvtneps_pbh (m8, x1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/funcspec-56.inc b/gcc/testsuite/gcc.target/i386/funcspec-56.inc
index a681bffe3e7..b3d33df7c9c 100644
--- a/gcc/testsuite/gcc.target/i386/funcspec-56.inc
+++ b/gcc/testsuite/gcc.target/i386/funcspec-56.inc
@@ -82,6 +82,7 @@ extern void test_avxvnni (void) __attribute__((__target__("avxvnni")));
 extern void test_avx512fp16 (void) __attribute__((__target__("avx512fp16")));
 extern void test_avxifma (void) __attribute__((__target__("avxifma")));
 extern void test_avxvnniint8 (void) __attribute__((__target__("avxvnniint8")));
+extern void test_avxneconvert (void) __attribute__((__target__("avxneconvert")));

 extern void test_no_sgx (void) __attribute__((__target__("no-sgx")));
 extern void test_no_avx5124fmaps(void) __attribute__((__target__("no-avx5124fmaps")));
@@ -165,6 +166,7 @@ extern void test_no_avxvnni (void) __attribute__((__target__("no-avxvnni")));
 extern void test_no_avx512fp16 (void) __attribute__((__target__("no-avx512fp16")));
 extern void test_no_avxifma (void) __attribute__((__target__("no-avxifma")));
 extern void test_no_avxvnniint8 (void) __attribute__((__target__("no-avxvnniint8")));
+extern void test_no_avxneconvert (void) __attribute__((__target__("no-avxneconvert")));

 extern void test_arch_nocona (void) __attribute__((__target__("arch=nocona")));
 extern void test_arch_core2 (void) __attribute__((__target__("arch=core2")));
diff --git a/gcc/testsuite/gcc.target/i386/sse-12.c b/gcc/testsuite/gcc.target/i386/sse-12.c
index ddde2df6657..3eabc49a6ab 100644
--- a/gcc/testsuite/gcc.target/i386/sse-12.c
+++ b/gcc/testsuite/gcc.target/i386/sse-12.c
@@ -3,7 +3,7 @@
    popcntintrin.h gfniintrin.h and mm_malloc.h are usable
    with -O -std=c89 -pedantic-errors.  */
 /* { dg-do compile } */
-/* { dg-options "-O -std=c89 -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512bw -mavx512dq -mavx512vl -mavx512vbmi -mavx512vbmi2 -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavxifma -mavxvnniint8" } */
+/* { dg-options "-O -std=c89 -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512bw -mavx512dq -mavx512vl -mavx512vbmi -mavx512vbmi2 -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavxifma -mavxvnniint8 -mavxneconvert" } */

 #include 

diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c
index 2b293216c6f..b9cdfb690d1 100644
--- a/gcc/testsuite/gcc.target/i386/sse-13.c
+++ b/gcc/testsuite/gcc.target/i386/sse-13.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512vbmi2 -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mavx512vp2intersect -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16 -mavxifma -mavxvnniint8" } */
+/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512vbmi2 -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mavx512vp2intersect -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16 -mavxifma -mavxvnniint8 -mavxneconvert" } */
 /* { dg-add-options bind_pic_locally } */

 #include 
diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c
index 78b51048b90..b6ee3806dcc 100644
--- a/gcc/testsuite/gcc.target/i386/sse-14.c
+++ b/gcc/testsuite/gcc.target/i386/sse-14.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mpconfig -mwbnoinvd -mavx512vl -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16 -mavxifma -mavxvnniint8" } */
+/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mpconfig -mwbnoinvd -mavx512vl -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16 -mavxifma -mavxvnniint8 -mavxneconvert" } */
 /* { dg-add-options bind_pic_locally } */

 #include 
diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c
index cc1c8cfa4be..71ac0f3da19 100644
--- a/gcc/testsuite/gcc.target/i386/sse-22.c
+++ b/gcc/testsuite/gcc.target/i386/sse-22.c
@@ -103,7 +103,7 @@


 #ifndef DIFFERENT_PRAGMAS
-#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,avx512vl,avx512bw,avx512dq,avx512vbmi,avx512vbmi2,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16,avxifma,avxvnniint8")
+#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,avx512vl,avx512bw,avx512dq,avx512vbmi,avx512vbmi2,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16,avxifma,avxvnniint8,avxneconvert")
 #endif

 /* Following intrinsics require immediate arguments.  They
@@ -220,7 +220,7 @@ test_4 (_mm_cmpestrz, int, __m128i, int, __m128i, int, 1)

 /* immintrin.h (AVX/AVX2/RDRND/FSGSBASE/F16C/RTM/AVX512F/SHA) */
 #ifdef DIFFERENT_PRAGMAS
-#pragma GCC target ("avx,avx2,rdrnd,fsgsbase,f16c,rtm,avx512f,avx512er,avx512cd,avx512pf,sha,avx512vl,avx512bw,avx512dq,avx512ifma,avx512vbmi,avx512vbmi2,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16,avxifma,avxvnniint8")
+#pragma GCC target ("avx,avx2,rdrnd,fsgsbase,f16c,rtm,avx512f,avx512er,avx512cd,avx512pf,sha,avx512vl,avx512bw,avx512dq,avx512ifma,avx512vbmi,avx512vbmi2,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16,avxifma,avxvnniint8,avxneconvert")
 #endif
 #include 
 test_1 (_cvtss_sh, unsigned short, float, 1)
diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c
index 270f4483491..898dde80c8f 100644
--- a/gcc/testsuite/gcc.target/i386/sse-23.c
+++ b/gcc/testsuite/gcc.target/i386/sse-23.c
@@ -843,6 +843,6 @@
 #define __builtin_ia32_vpclmulqdq_v2di(A, B, C) __builtin_ia32_vpclmulqdq_v2di(A, B, 1)
 #define __builtin_ia32_vpclmulqdq_v8di(A, B, C) __builtin_ia32_vpclmulqdq_v8di(A, B, 1)

-#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,fma,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,xsavec,xsaves,clflushopt,avx512bw,avx512dq,avx512vl,avx512vbmi,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,clwb,mwaitx,clzero,pku,sgx,rdpid,gfni,avx512vbmi2,vpclmulqdq,avx512bitalg,pconfig,wbnoinvd,avx512bf16,enqcmd,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16,avxifma,avxvnniint8")
+#pragma GCC target 
("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,fma,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,xsavec,xsaves,clflushopt,avx512bw,avx512dq,avx512vl,avx512vbmi,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,clwb,mwaitx,clzero,pku,sgx,rdpid,gfni,avx512vbmi2,vpclmulqdq,avx512bitalg,pconfig,wbnoinvd,avx512bf16,enqcmd,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16,avxifma,avxvnniint8,avxneconvert") #include diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index 64ccfc746bd..9228e810c45 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -9530,6 +9530,18 @@ proc check_effective_target_avxvnniint8 { } { } "-O0 -mavxvnniint8" ] } +# Return 1 if avxneconvert instructions can be compiled. +proc check_effective_target_avxneconvert { } { + return [check_no_compiler_messages avxneconvert object { + typedef float __m128 __attribute__ ((__vector_size__ (16), __may_alias__)); + __m128 + _mm_bcstnebf16_ps (const void *__P) + { + return (__m128) __builtin_ia32_vbcstnebf162ps128 ((const short *) __P); + } + } "-O0 -mavxneconvert" ] +} + # Return 1 if sse instructions can be compiled. 
proc check_effective_target_sse { } { return [check_no_compiler_messages sse object { From patchwork Fri Oct 14 07:54:44 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Jiang, Haochen" X-Patchwork-Id: 1689924 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=WPR4GDf1; dkim-atps=neutral Received: from sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-384) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4Mpdyg72Qvz23jn for ; Fri, 14 Oct 2022 18:57:15 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id F1A03384D191 for ; Fri, 14 Oct 2022 07:57:13 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org F1A03384D191 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1665734234; bh=RKVzZCX3A6wG4RdRtKw3FHefe0ecxl+t6y+eP3ANtsY=; h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=WPR4GDf18rMeAkNFKewFW6hpmQE69fCqgyhDCnAhgFs5MCY7sssuif7WJDhBPY2lF lfGVToiUOUYXXX/ClGGAvfnqSTrfeYJJ3rrvfoqyV1t6OOZtfZErxilu9zxRcw4s6p shKVDwZ6WrNnsBD4qiI9Mnzjzm6DzAiLL8kXLaWI= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from 
mga12.intel.com (mga12.intel.com [192.55.52.136]) by sourceware.org (Postfix) with ESMTPS id 97C83385414D for ; Fri, 14 Oct 2022 07:55:09 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 97C83385414D X-IronPort-AV: E=McAfee;i="6500,9779,10499"; a="285038211" X-IronPort-AV: E=Sophos;i="5.95,182,1661842800"; d="scan'208";a="285038211" Received: from orsmga002.jf.intel.com ([10.7.209.21]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Oct 2022 00:55:08 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6500,9779,10499"; a="627488475" X-IronPort-AV: E=Sophos;i="5.95,182,1661842800"; d="scan'208";a="627488475" Received: from shvmail03.sh.intel.com ([10.239.245.20]) by orsmga002.jf.intel.com with ESMTP; 14 Oct 2022 00:54:50 -0700 Received: from shliclel320.sh.intel.com (shliclel320.sh.intel.com [10.239.240.127]) by shvmail03.sh.intel.com (Postfix) with ESMTP id 01A5E1009C90; Fri, 14 Oct 2022 15:54:48 +0800 (CST) To: gcc-patches@gcc.gnu.org Subject: [PATCH 5/6] Support Intel CMPccXADD Date: Fri, 14 Oct 2022 15:54:44 +0800 Message-Id: <20221014075445.7938-6-haochen.jiang@intel.com> X-Mailer: git-send-email 2.18.1 In-Reply-To: <20221014075445.7938-1-haochen.jiang@intel.com> References: <20221014075445.7938-1-haochen.jiang@intel.com> X-Spam-Status: No, score=-12.0 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, SPF_HELO_PASS, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Haochen Jiang via Gcc-patches From: "Jiang, Haochen" Reply-To: Haochen Jiang Cc: hongtao.liu@intel.com Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender:
"Gcc-patches" gcc/ChangeLog: * common/config/i386/cpuinfo.h (get_available_features): Detect cmpccxadd. * common/config/i386/i386-common.cc (OPTION_MASK_ISA2_CMPCCXADD_SET, OPTION_MASK_ISA2_CMPCCXADD_UNSET): New. (ix86_handle_option): Handle -mcmpccxadd. * common/config/i386/i386-cpuinfo.h (enum processor_features): Add FEATURE_CMPCCXADD. * common/config/i386/i386-isas.h: Add ISA_NAMES_TABLE_ENTRY for cmpccxadd. * config.gcc: Add cmpccxaddintrin.h. * config/i386/cpuid.h (bit_CMPCCXADD): New. * config/i386/i386-builtin-types.def: Add DEF_FUNCTION_TYPE(INT, PINT, INT, INT, INT) and DEF_FUNCTION_TYPE(LONGLONG, PLONGLONG, LONGLONG, LONGLONG, INT). * config/i386/i386-builtin.def (BDESC): Add new builtins. * config/i386/i386-c.cc (ix86_target_macros_internal): Define __CMPCCXADD__. * config/i386/i386-expand.cc (ix86_expand_special_args_builtin): Add new local variable to indicate constant position. Handle INT_FTYPE_PINT_INT_INT_INT and LONGLONG_FTYPE_PLONGLONG_LONGLONG_LONGLONG_INT. * config/i386/i386-isa.def (CMPCCXADD): Add DEF_PTA(CMPCCXADD). * config/i386/i386-options.cc (isa2_opts): Add -mcmpccxadd. (ix86_valid_target_attribute_inner_p): Handle cmpccxadd. * config/i386/i386.opt: Add option -mcmpccxadd. * config/i386/sync.md (cmpccxadd_): New define insn. * config/i386/x86gprintrin.h: Include cmpccxaddintrin.h. * doc/extend.texi: Document cmpccxadd. * doc/invoke.texi: Document -mcmpccxadd. * doc/sourcebuild.texi: Document target cmpccxadd. * config/i386/cmpccxaddintrin.h: New file. gcc/testsuite/ChangeLog: * g++.dg/other/i386-2.C: Add -mcmpccxadd. * g++.dg/other/i386-3.C: Ditto. * gcc.target/i386/avx-1.c: Add builtin define for enum. * gcc.target/i386/funcspec-56.inc: Add new target attribute. * gcc.target/i386/sse-13.c: Add builtin define for enum. * gcc.target/i386/sse-23.c: Ditto. * gcc.target/i386/x86gprintrin-1.c: Add -mcmpccxadd for 64 bit target. * gcc.target/i386/x86gprintrin-2.c: Add -mcmpccxadd for 64 bit target.
Add builtin define for enum. * gcc.target/i386/x86gprintrin-3.c: Add -mcmpccxadd for 64 bit target. * gcc.target/i386/x86gprintrin-4.c: Add -mcmpccxadd for 64 bit target. * gcc.target/i386/x86gprintrin-5.c: Add -mcmpccxadd for 64 bit target. Add builtin define for enum. * gcc.target/i386/cmpccxadd-1.c: New test. * gcc.target/i386/cmpccxadd-2.c: New test. --- gcc/common/config/i386/cpuinfo.h | 2 + gcc/common/config/i386/i386-common.cc | 15 ++ gcc/common/config/i386/i386-cpuinfo.h | 1 + gcc/common/config/i386/i386-isas.h | 1 + gcc/config.gcc | 3 +- gcc/config/i386/cmpccxaddintrin.h | 89 +++++++++++ gcc/config/i386/cpuid.h | 1 + gcc/config/i386/i386-builtin-types.def | 4 + gcc/config/i386/i386-builtin.def | 4 + gcc/config/i386/i386-c.cc | 2 + gcc/config/i386/i386-expand.cc | 22 ++- gcc/config/i386/i386-isa.def | 1 + gcc/config/i386/i386-options.cc | 4 +- gcc/config/i386/i386.opt | 5 + gcc/config/i386/sync.md | 42 ++++++ gcc/config/i386/x86gprintrin.h | 2 + gcc/doc/extend.texi | 5 + gcc/doc/invoke.texi | 10 +- gcc/doc/sourcebuild.texi | 3 + gcc/testsuite/g++.dg/other/i386-2.C | 2 +- gcc/testsuite/g++.dg/other/i386-3.C | 2 +- gcc/testsuite/gcc.target/i386/avx-1.c | 4 + gcc/testsuite/gcc.target/i386/cmpccxadd-1.c | 61 ++++++++ gcc/testsuite/gcc.target/i386/cmpccxadd-2.c | 138 ++++++++++++++++++ gcc/testsuite/gcc.target/i386/funcspec-56.inc | 2 + gcc/testsuite/gcc.target/i386/sse-13.c | 6 +- gcc/testsuite/gcc.target/i386/sse-23.c | 6 +- .../gcc.target/i386/x86gprintrin-1.c | 2 +- .../gcc.target/i386/x86gprintrin-2.c | 6 +- .../gcc.target/i386/x86gprintrin-3.c | 2 +- .../gcc.target/i386/x86gprintrin-4.c | 2 +- .../gcc.target/i386/x86gprintrin-5.c | 6 +- gcc/testsuite/lib/target-supports.exp | 10 ++ 33 files changed, 450 insertions(+), 15 deletions(-) create mode 100644 gcc/config/i386/cmpccxaddintrin.h create mode 100644 gcc/testsuite/gcc.target/i386/cmpccxadd-1.c create mode 100644 gcc/testsuite/gcc.target/i386/cmpccxadd-2.c diff --git a/gcc/common/config/i386/cpuinfo.h
b/gcc/common/config/i386/cpuinfo.h index e9fd586704d..f73834b086c 100644 --- a/gcc/common/config/i386/cpuinfo.h +++ b/gcc/common/config/i386/cpuinfo.h @@ -789,6 +789,8 @@ get_available_features (struct __processor_model *cpu_model, __cpuid_count (7, 1, eax, ebx, ecx, edx); if (eax & bit_HRESET) set_feature (FEATURE_HRESET); + if (eax & bit_CMPCCXADD) + set_feature(FEATURE_CMPCCXADD); if (avx_usable) { if (eax & bit_AVXVNNI) diff --git a/gcc/common/config/i386/i386-common.cc b/gcc/common/config/i386/i386-common.cc index f9c906f75cf..75966779d82 100644 --- a/gcc/common/config/i386/i386-common.cc +++ b/gcc/common/config/i386/i386-common.cc @@ -110,6 +110,7 @@ along with GCC; see the file COPYING3. If not see #define OPTION_MASK_ISA2_AMX_BF16_SET OPTION_MASK_ISA2_AMX_BF16 #define OPTION_MASK_ISA2_AVXVNNIINT8_SET OPTION_MASK_ISA2_AVXVNNIINT8 #define OPTION_MASK_ISA2_AVXNECONVERT_SET OPTION_MASK_ISA2_AVXNECONVERT +#define OPTION_MASK_ISA2_CMPCCXADD_SET OPTION_MASK_ISA2_CMPCCXADD /* SSE4 includes both SSE4.1 and SSE4.2. -msse4 should be the same as -msse4.2. */ @@ -283,6 +284,7 @@ along with GCC; see the file COPYING3. If not see #define OPTION_MASK_ISA2_WIDEKL_UNSET OPTION_MASK_ISA2_WIDEKL #define OPTION_MASK_ISA2_AVXVNNIINT8_UNSET OPTION_MASK_ISA2_AVXVNNIINT8 #define OPTION_MASK_ISA2_AVXNECONVERT_UNSET OPTION_MASK_ISA2_AVXNECONVERT +#define OPTION_MASK_ISA2_CMPCCXADD_UNSET OPTION_MASK_ISA2_CMPCCXADD /* SSE4 includes both SSE4.1 and SSE4.2. -mno-sse4 should the same as -mno-sse4.1. 
*/ @@ -1181,6 +1183,19 @@ ix86_handle_option (struct gcc_options *opts, } return true; + case OPT_mcmpccxadd: + if (value) + { + opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_CMPCCXADD_SET; + opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_CMPCCXADD_SET; + } + else + { + opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA2_CMPCCXADD_UNSET; + opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_CMPCCXADD_UNSET; + } + return true; + case OPT_mfma: if (value) { diff --git a/gcc/common/config/i386/i386-cpuinfo.h b/gcc/common/config/i386/i386-cpuinfo.h index 2d3fbfc817a..5a61d817007 100644 --- a/gcc/common/config/i386/i386-cpuinfo.h +++ b/gcc/common/config/i386/i386-cpuinfo.h @@ -243,6 +243,7 @@ enum processor_features FEATURE_AVXIFMA, FEATURE_AVXVNNIINT8, FEATURE_AVXNECONVERT, + FEATURE_CMPCCXADD, CPU_FEATURE_MAX }; diff --git a/gcc/common/config/i386/i386-isas.h b/gcc/common/config/i386/i386-isas.h index bceaee589ee..3035e4a8186 100644 --- a/gcc/common/config/i386/i386-isas.h +++ b/gcc/common/config/i386/i386-isas.h @@ -180,4 +180,5 @@ ISA_NAMES_TABLE_START P_NONE, "-mavxvnniint8") ISA_NAMES_TABLE_ENTRY("avxneconvert", FEATURE_AVXNECONVERT, P_NONE, "-mavxneconvert") + ISA_NAMES_TABLE_ENTRY("cmpccxadd", FEATURE_CMPCCXADD, P_NONE, "-mcmpccxadd") ISA_NAMES_TABLE_END diff --git a/gcc/config.gcc b/gcc/config.gcc index 840b62aee61..fe063bfbb26 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -422,7 +422,8 @@ i[34567]86-*-* | x86_64-*-*) amxbf16intrin.h x86gprintrin.h uintrintrin.h hresetintrin.h keylockerintrin.h avxvnniintrin.h mwaitintrin.h avx512fp16intrin.h avx512fp16vlintrin.h - avxifmaintrin.h avxvnniint8intrin.h avxneconvertintrin.h" + avxifmaintrin.h avxvnniint8intrin.h avxneconvertintrin.h + cmpccxaddintrin.h" ;; ia64-*-*) extra_headers=ia64intrin.h diff --git a/gcc/config/i386/cmpccxaddintrin.h b/gcc/config/i386/cmpccxaddintrin.h new file mode 100644 index 00000000000..74ae015476d --- /dev/null +++ b/gcc/config/i386/cmpccxaddintrin.h @@ -0,0 +1,89 @@ +/* Copyright 
(C) 2022 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + . */ + +#ifndef _X86GPRINTRIN_H_INCLUDED +#error "Never use directly; include instead." +#endif + +#ifndef _CMPCCXADDINTRIN_H_INCLUDED +#define _CMPCCXADDINTRIN_H_INCLUDED + +#ifdef __x86_64__ + +#ifndef __CMPCCXADD__ +#pragma GCC push_options +#pragma GCC target("cmpccxadd") +#define __DISABLE_CMPCCXADD__ +#endif /* __CMPCCXADD__ */ + +typedef enum { + _CMPCCX_BE, /* Below or equal. */ + _CMPCCX_B, /* Below. */ + _CMPCCX_LE, /* Less or equal. */ + _CMPCCX_L, /* Less. */ + _CMPCCX_NBE, /* Neither below nor equal. */ + _CMPCCX_NB, /* Not below. */ + _CMPCCX_NLE, /* Neither less nor equal. */ + _CMPCCX_NL, /* Not less. */ + _CMPCCX_NO, /* No overflow. */ + _CMPCCX_NP, /* No parity. */ + _CMPCCX_NS, /* No sign. */ + _CMPCCX_NZ, /* Not zero. */ + _CMPCCX_O, /* Overflow. */ + _CMPCCX_P, /* Parity. */ + _CMPCCX_S, /* Sign. */ + _CMPCCX_Z, /* Zero.
*/ +} _CMPCCX_ENUM; + +#ifdef __OPTIMIZE__ +extern __inline int +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +__cmpccxadd_epi32 (int *__A, int __B, int __C, const _CMPCCX_ENUM __D) +{ + return __builtin_ia32_cmpccxadd (__A, __B, __C, __D); +} + +extern __inline long long +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +__cmpccxadd_epi64 (long long *__A, long long __B, long long __C, + const _CMPCCX_ENUM __D) +{ + return __builtin_ia32_cmpccxadd64 (__A, __B, __C, __D); +} +#else +#define __cmpccxadd_epi32(A,B,C,D) \ +__builtin_ia32_cmpccxadd((int *) (A), (int) (B), (int) (C), \ + (_CMPCCX_ENUM)(D)) +#define __cmpccxadd_epi64(A,B,C,D) \ +__builtin_ia32_cmpccxadd64((long long *) (A), (long long) (B), \ + (long long) (C), (_CMPCCX_ENUM)(D)) +#endif + +#ifdef __DISABLE_CMPCCXADD__ +#undef __DISABLE_CMPCCXADD__ +#pragma GCC pop_options +#endif /* __DISABLE_CMPCCXADD__ */ + +#endif + +#endif /* _CMPCCXADDINTRIN_H_INCLUDED */ diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h index 18bbc0cb7be..19c0d033921 100644 --- a/gcc/config/i386/cpuid.h +++ b/gcc/config/i386/cpuid.h @@ -27,6 +27,7 @@ /* %eax */ #define bit_AVXVNNI (1 << 4) #define bit_AVX512BF16 (1 << 5) +#define bit_CMPCCXADD (1 << 7) #define bit_HRESET (1 << 22) #define bit_AVXIFMA (1 << 23) diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def index ebf6e5b4ad8..922348fcd60 100644 --- a/gcc/config/i386/i386-builtin-types.def +++ b/gcc/config/i386/i386-builtin-types.def @@ -1406,3 +1406,7 @@ DEF_FUNCTION_TYPE (V8BF, V8SF) DEF_FUNCTION_TYPE (V4SF, PCV8BF) DEF_FUNCTION_TYPE (V8SF, PCV16BF) + +# CMPccXADD builtins +DEF_FUNCTION_TYPE (INT, PINT, INT, INT, INT) +DEF_FUNCTION_TYPE (LONGLONG, PLONGLONG, LONGLONG, LONGLONG, INT) diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def index a429577180c..d4d4fda1d4a 100644 --- a/gcc/config/i386/i386-builtin.def +++
b/gcc/config/i386/i386-builtin.def @@ -288,6 +288,10 @@ BDESC (0, OPTION_MASK_ISA2_AVXNECONVERT, CODE_FOR_vcvtneobf162ps_v8sf, "__builti BDESC (0, OPTION_MASK_ISA2_AVXNECONVERT, CODE_FOR_vcvtneoph2ps_v4sf, "__builtin_ia32_vcvtneoph2ps128", IX86_BUILTIN_VCVTNEOPH2PS128, UNKNOWN, (int) V4SF_FTYPE_PCV8HF) BDESC (0, OPTION_MASK_ISA2_AVXNECONVERT, CODE_FOR_vcvtneoph2ps_v8sf, "__builtin_ia32_vcvtneoph2ps256", IX86_BUILTIN_VCVTNEOPH2PS256, UNKNOWN, (int) V8SF_FTYPE_PCV16HF) +/* CMPCCXADD */ +BDESC (OPTION_MASK_ISA_64BIT, OPTION_MASK_ISA2_CMPCCXADD, CODE_FOR_cmpccxadd_si, "__builtin_ia32_cmpccxadd", IX86_BUILTIN_CMPCCXADD, UNKNOWN, (int) INT_FTYPE_PINT_INT_INT_INT) +BDESC (OPTION_MASK_ISA_64BIT, OPTION_MASK_ISA2_CMPCCXADD, CODE_FOR_cmpccxadd_di, "__builtin_ia32_cmpccxadd64", IX86_BUILTIN_CMPCCXADD64, UNKNOWN, (int) LONGLONG_FTYPE_PLONGLONG_LONGLONG_LONGLONG_INT) + /* AVX512BW */ BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_loadv32hi_mask, "__builtin_ia32_loaddquhi512_mask", IX86_BUILTIN_LOADDQUHI512_MASK, UNKNOWN, (int) V32HI_FTYPE_PCSHORT_V32HI_USI) BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_loadv64qi_mask, "__builtin_ia32_loaddquqi512_mask", IX86_BUILTIN_LOADDQUQI512_MASK, UNKNOWN, (int) V64QI_FTYPE_PCCHAR_V64QI_UDI) diff --git a/gcc/config/i386/i386-c.cc b/gcc/config/i386/i386-c.cc index 48934df664c..9885a724d0f 100644 --- a/gcc/config/i386/i386-c.cc +++ b/gcc/config/i386/i386-c.cc @@ -639,6 +639,8 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag, def_or_undef (parse_in, "__AVXVNNIINT8__"); if (isa_flag2 & OPTION_MASK_ISA2_AVXNECONVERT) def_or_undef (parse_in, "__AVXNECONVERT__"); + if (isa_flag2 & OPTION_MASK_ISA2_CMPCCXADD) + def_or_undef (parse_in, "__CMPCCXADD__"); if (TARGET_IAMCU) { def_or_undef (parse_in, "__iamcu"); diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc index 1e29fe584af..cad2eb728fd 100644 --- a/gcc/config/i386/i386-expand.cc +++ b/gcc/config/i386/i386-expand.cc @@ -11825,8 +11825,9 @@ 
ix86_expand_special_args_builtin (const struct builtin_description *d, tree arg; rtx pat, op; unsigned int i, nargs, arg_adjust, memory; + unsigned int constant = 100; bool aligned_mem = false; - rtx xops[3]; + rtx xops[4]; enum insn_code icode = d->icode; const struct insn_data_d *insn_p = &insn_data[icode]; machine_mode tmode = insn_p->operand[0].mode; @@ -12115,6 +12116,13 @@ ix86_expand_special_args_builtin (const struct builtin_description *d, klass = load; memory = 0; break; + case INT_FTYPE_PINT_INT_INT_INT: + case LONGLONG_FTYPE_PLONGLONG_LONGLONG_LONGLONG_INT: + nargs = 4; + klass = load; + memory = 0; + constant = 3; + break; default: gcc_unreachable (); } @@ -12180,6 +12188,15 @@ ix86_expand_special_args_builtin (const struct builtin_description *d, if (MEM_ALIGN (op) < align) set_mem_align (op, align); } + else if (i == constant) + { + /* This must be the constant. */ + if (!insn_p->operand[nargs].predicate(op, SImode)) + { + error ("the fourth argument must be one of enum %qs", "_CMPCCX_ENUM"); + return const0_rtx; + } + } else { /* This must be register. 
*/ @@ -12221,6 +12238,9 @@ ix86_expand_special_args_builtin (const struct builtin_description *d, case 3: pat = GEN_FCN (icode) (target, xops[0], xops[1], xops[2]); break; + case 4: + pat = GEN_FCN (icode) (target, xops[0], xops[1], xops[2], xops[3]); + break; default: gcc_unreachable (); } diff --git a/gcc/config/i386/i386-isa.def b/gcc/config/i386/i386-isa.def index 4ea3f96f69f..7ffc73ba23e 100644 --- a/gcc/config/i386/i386-isa.def +++ b/gcc/config/i386/i386-isa.def @@ -112,3 +112,4 @@ DEF_PTA(AVX512FP16) DEF_PTA(AVXIFMA) DEF_PTA(AVXVNNIINT8) DEF_PTA(AVXNECONVERT) +DEF_PTA(CMPCCXADD) diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc index e59e2d8aeaf..fb872afdfb5 100644 --- a/gcc/config/i386/i386-options.cc +++ b/gcc/config/i386/i386-options.cc @@ -229,7 +229,8 @@ static struct ix86_target_opts isa2_opts[] = { "-mavx512fp16", OPTION_MASK_ISA2_AVX512FP16 }, { "-mavxifma", OPTION_MASK_ISA2_AVXIFMA }, { "-mavxvnniint8", OPTION_MASK_ISA2_AVXVNNIINT8 }, - { "-mavxneconvert", OPTION_MASK_ISA2_AVXNECONVERT } + { "-mavxneconvert", OPTION_MASK_ISA2_AVXNECONVERT }, + { "-mcmpccxadd", OPTION_MASK_ISA2_CMPCCXADD } }; static struct ix86_target_opts isa_opts[] = { @@ -1078,6 +1079,7 @@ ix86_valid_target_attribute_inner_p (tree fndecl, tree args, char *p_strings[], IX86_ATTR_ISA ("avxifma", OPT_mavxifma), IX86_ATTR_ISA ("avxvnniint8", OPT_mavxvnniint8), IX86_ATTR_ISA ("avxneconvert", OPT_mavxneconvert), + IX86_ATTR_ISA ("cmpccxadd", OPT_mcmpccxadd), /* enum options */ IX86_ATTR_ENUM ("fpmath=", OPT_mfpmath_), diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt index 6e07b89ac4c..c4a3bdcf960 100644 --- a/gcc/config/i386/i386.opt +++ b/gcc/config/i386/i386.opt @@ -1229,3 +1229,8 @@ mavxneconvert Target Mask(ISA2_AVXNECONVERT) Var(ix86_isa_flags2) Save Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, and AVXNECONVERT build-in functions and code generation. 
+ +mcmpccxadd +Target Mask(ISA2_CMPCCXADD) Var(ix86_isa_flags2) Save +Support CMPCCXADD built-in functions and +code generation. diff --git a/gcc/config/i386/sync.md b/gcc/config/i386/sync.md index 92634d538cb..2b6f2f4c826 100644 --- a/gcc/config/i386/sync.md +++ b/gcc/config/i386/sync.md @@ -37,6 +37,9 @@ UNSPECV_CMPXCHG UNSPECV_XCHG UNSPECV_LOCK + + ;; For CMPccXADD support + UNSPECV_CMPCCXADD ]) (define_expand "sse2_lfence" @@ -1061,3 +1064,42 @@ (any_logic:SWI (match_dup 0) (match_dup 1)))] "" "lock{%;} %K2{}\t{%1, %0|%0, %1}") + +;; CMPCCXADD + +(define_insn "@cmpccxadd__1" + [(set (match_operand:SWI48x 1 "register_operand" "+r") + (match_operand:SWI48x 0 "memory_operand" "+m")) + (set (match_dup 0) + (unspec_volatile:SWI48x + [(match_dup 0) + (match_dup 1) + (match_operand:SWI48x 2 "register_operand" "r") + (match_operand:SI 3 "const_0_to_15_operand" "n")] + UNSPECV_CMPCCXADD)) + (clobber (reg:CC FLAGS_REG))] + "TARGET_CMPCCXADD && TARGET_64BIT" +{ + char buf[128]; + const char *ops = "cmp%sxadd\t{%%2, %%1, %%0|%%0, %%1, %%2}"; + char const *cc[16] = {"be", "b", "le", "l", "nbe", "nb", "nle", "nl", + "no", "np", "ns", "nz", "o", "p", "s", "z"}; + + snprintf (buf, sizeof (buf), ops, cc[INTVAL (operands[3])]); + output_asm_insn (buf, operands); + return ""; +}) + +(define_expand "cmpccxadd_" + [(match_operand:SWI48x 0 "register_operand") + (match_operand:SWI48x 1 "memory_operand") + (match_operand:SWI48x 2 "register_operand") + (match_operand:SWI48x 3 "register_operand") + (match_operand:SI 4 "const_0_to_15_operand")] + "TARGET_CMPCCXADD && TARGET_64BIT" +{ + emit_insn (gen_cmpccxadd_1 (mode, operands[1], operands[2], + operands[3], operands[4])); + emit_move_insn (operands[0], operands[2]); + DONE; +}) diff --git a/gcc/config/i386/x86gprintrin.h b/gcc/config/i386/x86gprintrin.h index e0be01d5e78..a84fbe9137d 100644 --- a/gcc/config/i386/x86gprintrin.h +++ b/gcc/config/i386/x86gprintrin.h @@ -52,6
+52,8 @@ #include +#include + #include #include diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index 0a4396f92bb..34c23240dfb 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -7075,6 +7075,11 @@ Enable/disable the generation of the AVXVNNIINT8 instructions. @cindex @code{target("avxneconvert")} function attribute, x86 Enable/disable the generation of the AVXNECONVERT instructions. +@item cmpccxadd +@itemx no-cmpccxadd +@cindex @code{target("cmpccxadd")} function attribute, x86 +Enable/disable the generation of the CMPccXADD instructions. + @item cld @itemx no-cld @cindex @code{target("cld")} function attribute, x86 diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 307fb7fa441..cbbc0201828 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -1436,7 +1436,7 @@ See RS/6000 and PowerPC Options. -mavx5124fmaps -mavx512vnni -mavx5124vnniw -mprfchw -mrdpid @gol -mrdseed -msgx -mavx512vp2intersect -mserialize -mtsxldtrk@gol -mamx-tile -mamx-int8 -mamx-bf16 -muintr -mhreset -mavxvnni@gol --mavx512fp16 -mavxifma -mavxvnniint8 -mavxneconvert @gol +-mavx512fp16 -mavxifma -mavxvnniint8 -mavxneconvert -mcmpccxadd @gol -mcldemote -mms-bitfields -mno-align-stringops -minline-all-stringops @gol -minline-stringops-dynamically -mstringop-strategy=@var{alg} @gol -mkl -mwidekl @gol @@ -32902,6 +32902,9 @@ preferred alignment to @option{-mpreferred-stack-boundary=2}. 
@need 200 @itemx -mavxneconvert @opindex mavxneconvert +@need 200 +@itemx -mcmpccxadd +@opindex mcmpccxadd These switches enable the use of instructions in the MMX, SSE, SSE2, SSE3, SSSE3, SSE4, SSE4A, SSE4.1, SSE4.2, AVX, AVX2, AVX512F, AVX512PF, AVX512ER, AVX512CD, AVX512VL, AVX512BW, AVX512DQ, AVX512IFMA, AVX512VBMI, SHA, @@ -32912,8 +32915,9 @@ XSAVEOPT, XSAVEC, XSAVES, RTM, HLE, TBM, MWAITX, CLZERO, PKU, AVX512VBMI2, GFNI, VAES, WAITPKG, VPCLMULQDQ, AVX512BITALG, MOVDIRI, MOVDIR64B, AVX512BF16, ENQCMD, AVX512VPOPCNTDQ, AVX5124FMAPS, AVX512VNNI, AVX5124VNNIW, SERIALIZE, UINTR, HRESET, AMXTILE, AMXINT8, AMXBF16, KL, WIDEKL, AVXVNNI, AVX512FP16, -AVXIFMA, AVXVNNIINT8, AVXNECONVERT or CLDEMOTE extended instruction sets. Each -has a corresponding @option{-mno-} option to disable use of these instructions. +AVXIFMA, AVXVNNIINT8, AVXNECONVERT, CMPCCXADD or CLDEMOTE extended instruction +sets. Each has a corresponding @option{-mno-} option to disable use of these +instructions. These extensions are also available as built-in functions: see @ref{x86 Built-in Functions}, for details of the functions enabled and diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi index a12175b6498..714595d33bf 100644 --- a/gcc/doc/sourcebuild.texi +++ b/gcc/doc/sourcebuild.texi @@ -2511,6 +2511,9 @@ Target supports the execution of @code{amx-bf16} instructions. @item cell_hw Test system can execute AltiVec and Cell PPU instructions. +@item cmpccxadd +Target supports the execution of @code{cmpccxadd} instructions. + @item coldfire_fpu Target uses a ColdFire FPU. 
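The documentation hunks above only name the new ISA. For reviewers unfamiliar with the instruction, here is a minimal, non-atomic C sketch of what __cmpccxadd_epi32 (A, B, C, D) is expected to compute, assuming the CMPccXADD semantics described in Intel's ISA Extensions programming reference: the value at A is compared with B (flags set as for *A - B), C is added to *A only if condition D holds, and the original *A is returned in every case. The names cmpccx_cond, cmpccx_cond_holds and cmpccxadd32_model are hypothetical and not part of the patch; the enumerators mirror the order of _CMPCCX_ENUM so their numeric values (0 to 15) match.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of _CMPCCX_ENUM from cmpccxaddintrin.h; same
   order, so each enumerator has the same value (0..15).  */
typedef enum {
  CC_BE, CC_B, CC_LE, CC_L, CC_NBE, CC_NB, CC_NLE, CC_NL,
  CC_NO, CC_NP, CC_NS, CC_NZ, CC_O, CC_P, CC_S, CC_Z
} cmpccx_cond;

/* Return whether condition CC holds for the flags that a 32-bit
   compare of MEM against B (i.e. MEM - B) would produce.  */
static bool
cmpccx_cond_holds (uint32_t mem, uint32_t b, cmpccx_cond cc)
{
  uint32_t r = mem - b;            /* wraps modulo 2^32, like the CPU */
  bool cf = mem < b;               /* unsigned borrow */
  bool zf = r == 0;
  bool sf = (r >> 31) & 1;
  /* Signed overflow: operands differ in sign and the result's sign
     differs from MEM's.  */
  bool of = (((mem ^ b) & (mem ^ r)) >> 31) & 1;
  /* PF is set when the low byte of the result has even parity.  */
  unsigned ones = 0;
  for (int i = 0; i < 8; i++)
    ones += (r >> i) & 1;
  bool pf = (ones & 1) == 0;

  switch (cc)
    {
    case CC_BE:  return cf || zf;
    case CC_B:   return cf;
    case CC_LE:  return zf || (sf != of);
    case CC_L:   return sf != of;
    case CC_NBE: return !cf && !zf;
    case CC_NB:  return !cf;
    case CC_NLE: return !zf && (sf == of);
    case CC_NL:  return sf == of;
    case CC_NO:  return !of;
    case CC_NP:  return !pf;
    case CC_NS:  return !sf;
    case CC_NZ:  return !zf;
    case CC_O:   return of;
    case CC_P:   return pf;
    case CC_S:   return sf;
    case CC_Z:   return zf;
    }
  return false;
}

/* Non-atomic model of __cmpccxadd_epi32 (MEM, B, C, CC).  The real
   instruction performs the load, compare, conditional add and store
   as one atomic read-modify-write.  */
static int32_t
cmpccxadd32_model (int32_t *mem, int32_t b, int32_t c, cmpccx_cond cc)
{
  int32_t old = *mem;
  if (cmpccx_cond_holds ((uint32_t) old, (uint32_t) b, cc))
    *mem = (int32_t) ((uint32_t) old + (uint32_t) c);  /* wrapping add */
  return old;
}
```

Note that B/BE compare the operands as unsigned while L/LE compare them as signed, so with *A == -1 and B == 1 the L condition adds C but the B condition leaves memory unchanged.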
diff --git a/gcc/testsuite/g++.dg/other/i386-2.C b/gcc/testsuite/g++.dg/other/i386-2.C index dd3e71f25ed..f7dbbbbf619 100644 --- a/gcc/testsuite/g++.dg/other/i386-2.C +++ b/gcc/testsuite/g++.dg/other/i386-2.C @@ -1,5 +1,5 @@ /* { dg-do compile { target i?86-*-* x86_64-*-* } } */ -/* { dg-options "-O -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16 -mavxifma -mavxvnniint8 -mavxneconvert" } */ +/* { dg-options "-O -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16 -mavxifma -mavxvnniint8 -mavxneconvert -mcmpccxadd" } */ /* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, mm3dnow.h, fma4intrin.h, xopintrin.h, abmintrin.h, bmiintrin.h, tbmintrin.h, lwpintrin.h, diff --git a/gcc/testsuite/g++.dg/other/i386-3.C b/gcc/testsuite/g++.dg/other/i386-3.C index cd7045cc4e4..2ac5d9f2df5 100644 --- 
a/gcc/testsuite/g++.dg/other/i386-3.C +++ b/gcc/testsuite/g++.dg/other/i386-3.C @@ -1,5 +1,5 @@ /* { dg-do compile { target i?86-*-* x86_64-*-* } } */ -/* { dg-options "-O -fkeep-inline-functions -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16 -mavxifma -mavxvnniint8 -mavxneconvert" } */ +/* { dg-options "-O -fkeep-inline-functions -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16 -mavxifma -mavxvnniint8 -mavxneconvert -mcmpccxadd" } */ /* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, mm3dnow.h, fma4intrin.h, xopintrin.h, abmintrin.h, bmiintrin.h, tbmintrin.h, lwpintrin.h, diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c index 154e7b3b107..051a1b59b5b 100644 --- a/gcc/testsuite/gcc.target/i386/avx-1.c +++ b/gcc/testsuite/gcc.target/i386/avx-1.c @@ -835,6 +835,10 @@ #define 
__builtin_ia32_bextri_u32(X, Y) __builtin_ia32_bextri_u32 (X, 1) #define __builtin_ia32_bextri_u64(X, Y) __builtin_ia32_bextri_u64 (X, 1) +/* cmpccxadd.h */ +#define __builtin_ia32_cmpccxadd(A, B, C, D) __builtin_ia32_cmpccxadd(A, B, C, 1) +#define __builtin_ia32_cmpccxadd64(A, B, C, D) __builtin_ia32_cmpccxadd64(A, B, C, 1) + #include #include #include diff --git a/gcc/testsuite/gcc.target/i386/cmpccxadd-1.c b/gcc/testsuite/gcc.target/i386/cmpccxadd-1.c new file mode 100644 index 00000000000..699ed9b2dc2 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/cmpccxadd-1.c @@ -0,0 +1,61 @@ +/* { dg-do compile { target { ! ia32 } } } */ +/* { dg-options "-O2 -mcmpccxadd" } */ +/* { dg-final { scan-assembler-times "cmpbexadd\[ \\t\]" 2 } } */ +/* { dg-final { scan-assembler-times "cmpbxadd\[ \\t\]" 2 } } */ +/* { dg-final { scan-assembler-times "cmplexadd\[ \\t\]" 2 } } */ +/* { dg-final { scan-assembler-times "cmplxadd\[ \\t\]" 2 } } */ +/* { dg-final { scan-assembler-times "cmpnbexadd\[ \\t\]" 2 } } */ +/* { dg-final { scan-assembler-times "cmpnbxadd\[ \\t\]" 2 } } */ +/* { dg-final { scan-assembler-times "cmpnlexadd\[ \\t\]" 2 } } */ +/* { dg-final { scan-assembler-times "cmpnlxadd\[ \\t\]" 2 } } */ +/* { dg-final { scan-assembler-times "cmpnoxadd\[ \\t\]" 2 } } */ +/* { dg-final { scan-assembler-times "cmpnpxadd\[ \\t\]" 2 } } */ +/* { dg-final { scan-assembler-times "cmpnsxadd\[ \\t\]" 2 } } */ +/* { dg-final { scan-assembler-times "cmpnzxadd\[ \\t\]" 2 } } */ +/* { dg-final { scan-assembler-times "cmpoxadd\[ \\t\]" 2 } } */ +/* { dg-final { scan-assembler-times "cmppxadd\[ \\t\]" 2 } } */ +/* { dg-final { scan-assembler-times "cmpsxadd\[ \\t\]" 2 } } */ +/* { dg-final { scan-assembler-times "cmpzxadd\[ \\t\]" 2 } } */ +#include + +int *a; +int b, c; +long long *d; +long long e, f; + +void extern +cmpccxadd_test(void) +{ + b = __cmpccxadd_epi32 (a, b, c, _CMPCCX_BE); + e = __cmpccxadd_epi64 (d, e, f, _CMPCCX_BE); + b = __cmpccxadd_epi32 (a, b, c, _CMPCCX_B); + e = 
__cmpccxadd_epi64 (d, e, f, _CMPCCX_B); + b = __cmpccxadd_epi32 (a, b, c, _CMPCCX_LE); + e = __cmpccxadd_epi64 (d, e, f, _CMPCCX_LE); + b = __cmpccxadd_epi32 (a, b, c, _CMPCCX_L); + e = __cmpccxadd_epi64 (d, e, f, _CMPCCX_L); + b = __cmpccxadd_epi32 (a, b, c, _CMPCCX_NBE); + e = __cmpccxadd_epi64 (d, e, f, _CMPCCX_NBE); + b = __cmpccxadd_epi32 (a, b, c, _CMPCCX_NB); + e = __cmpccxadd_epi64 (d, e, f, _CMPCCX_NB); + b = __cmpccxadd_epi32 (a, b, c, _CMPCCX_NLE); + e = __cmpccxadd_epi64 (d, e, f, _CMPCCX_NLE); + b = __cmpccxadd_epi32 (a, b, c, _CMPCCX_NL); + e = __cmpccxadd_epi64 (d, e, f, _CMPCCX_NL); + b = __cmpccxadd_epi32 (a, b, c, _CMPCCX_NO); + e = __cmpccxadd_epi64 (d, e, f, _CMPCCX_NO); + b = __cmpccxadd_epi32 (a, b, c, _CMPCCX_NP); + e = __cmpccxadd_epi64 (d, e, f, _CMPCCX_NP); + b = __cmpccxadd_epi32 (a, b, c, _CMPCCX_NS); + e = __cmpccxadd_epi64 (d, e, f, _CMPCCX_NS); + b = __cmpccxadd_epi32 (a, b, c, _CMPCCX_NZ); + e = __cmpccxadd_epi64 (d, e, f, _CMPCCX_NZ); + b = __cmpccxadd_epi32 (a, b, c, _CMPCCX_O); + e = __cmpccxadd_epi64 (d, e, f, _CMPCCX_O); + b = __cmpccxadd_epi32 (a, b, c, _CMPCCX_P); + e = __cmpccxadd_epi64 (d, e, f, _CMPCCX_P); + b = __cmpccxadd_epi32 (a, b, c, _CMPCCX_S); + e = __cmpccxadd_epi64 (d, e, f, _CMPCCX_S); + b = __cmpccxadd_epi32 (a, b, c, _CMPCCX_Z); + e = __cmpccxadd_epi64 (d, e, f, _CMPCCX_Z); +} diff --git a/gcc/testsuite/gcc.target/i386/cmpccxadd-2.c b/gcc/testsuite/gcc.target/i386/cmpccxadd-2.c new file mode 100644 index 00000000000..76d17803fbb --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/cmpccxadd-2.c @@ -0,0 +1,138 @@ +/* { dg-do run { target { ! 
ia32 } } } */ +/* { dg-options "-O2 -mcmpccxadd" } */ +/* { dg-require-effective-target cmpccxadd } */ + +#include +#include + +int +main() +{ + if (!__builtin_cpu_supports("cmpccxadd")) + return 0; + + int srcdest1[16] = { 1,1,1,1,2,1,2,1,1,2,2,2,-2147483648,4,1,1 }; + int srcdest2[16] = { 1,2,1,2,1,1,1,1,1,1,1,1,1,1,2,1 }; + int src3[16] = { 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 }; + int _srcdest1[16], _srcdest2[16], res[16], cond[16]; + long long srcdest1_64[16] = { 1,1,1,1,2,1,2,1,1,2,2,2,-9223372036854775807LL-1,4,1,1 }; + long long srcdest2_64[16] = { 1,2,1,2,1,1,1,1,1,1,1,1,1,1,2,1 }; + long long src3_64[16] = { 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 }; + long long _srcdest1_64[16], _srcdest2_64[16], res_64[16], cond_64[16]; + + int tmp2[16]; + long long tmp2_64[16]; + + int cf[16] = { 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0 }; + int of[16] = { 0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0 }; + int sf[16] = { 0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0 }; + int zf[16] = { 1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1 }; + int af[16] = { 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 }; + int pf[16] = { 0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0 }; + + for (int i = 0; i < 16; i++) + { + tmp2[i] = srcdest1[i] + src3[i]; + tmp2_64[i] = srcdest1_64[i] + src3_64[i]; + } + + cond[0] = (cf[0] || zf[0]) == 1 ? 1 : 0; + cond[1] = cf[1] == 1 ? 1 : 0; + cond[2] = (((sf[2] && !of[2]) || (!sf[2] && of[2])) || zf[2]) == 1 ? 1 : 0; + cond[3] = ((sf[3] && !of[3]) || (!sf[3] && of[3])) == 1 ? 1 : 0; + cond[4] = (cf[4] || zf[4]) == 0 ? 1 : 0; + cond[5] = cf[5] == 0 ? 1 : 0; + cond[6] = (((sf[6] && !of[6]) || (!sf[6] && of[6])) || zf[6]) == 0 ? 1 : 0; + cond[7] = ((sf[7] && !of[7]) || (!sf[7] && of[7])) == 0 ? 1 : 0; + cond[8] = of[8] == 0 ? 1 : 0; + cond[9] = pf[9] == 0 ? 1 : 0; + cond[10] = sf[10] == 0 ? 1 : 0; + cond[11] = zf[11] == 0 ? 1 : 0; + cond[12] = of[12] == 1 ? 1 : 0; + cond[13] = pf[13] == 1 ? 1 : 0; + cond[14] = sf[14] == 1 ? 1 : 0; + cond[15] = zf[15] == 1 ? 1 : 0; + + cond_64[0] = (cf[0] || zf[0]) == 1 ? 1 : 0; + cond_64[1] = cf[1] == 1 ? 
1 : 0; + cond_64[2] = (((sf[2] && !of[2]) || (!sf[2] && of[2])) || zf[2]) == 1 ? 1 : 0; + cond_64[3] = ((sf[3] && !of[3]) || (!sf[3] && of[3])) == 1 ? 1 : 0; + cond_64[4] = (cf[4] || zf[4]) == 0 ? 1 : 0; + cond_64[5] = cf[5] == 0 ? 1 : 0; + cond_64[6] = (((sf[6] && !of[6]) || (!sf[6] && of[6])) || zf[6]) == 0 ? 1 : 0; + cond_64[7] = ((sf[7] && !of[7]) || (!sf[7] && of[7])) == 0 ? 1 : 0; + cond_64[8] = of[8] == 0 ? 1 : 0; + cond_64[9] = pf[9] == 0 ? 1 : 0; + cond_64[10] = sf[10] == 0 ? 1 : 0; + cond_64[11] = zf[11] == 0 ? 1 : 0; + cond_64[12] = of[12] == 1 ? 1 : 0; + cond_64[13] = pf[13] == 1 ? 1 : 0; + cond_64[14] = sf[14] == 1 ? 1 : 0; + cond_64[15] = zf[15] == 1 ? 1 : 0; + + for (int i = 0; i < 16; i++) + { + if (cond[i] == 1) + { + _srcdest1[i] = tmp2[i]; + } + else + { + _srcdest1[i] = srcdest1[i]; + } + if (cond_64[i] == 1) + { + _srcdest1_64[i] = tmp2_64[i]; + } + else + { + _srcdest1_64[i] = srcdest1_64[i]; + } + _srcdest2[i] = srcdest1[i]; + _srcdest2_64[i] = srcdest1_64[i]; + } + + res[0] = __cmpccxadd_epi32 (&srcdest1[0], srcdest2[0], src3[0], _CMPCCX_BE); + res[1] = __cmpccxadd_epi32 (&srcdest1[1], srcdest2[1], src3[1], _CMPCCX_B); + res[2] = __cmpccxadd_epi32 (&srcdest1[2], srcdest2[2], src3[2], _CMPCCX_LE); + res[3] = __cmpccxadd_epi32 (&srcdest1[3], srcdest2[3], src3[3], _CMPCCX_L); + res[4] = __cmpccxadd_epi32 (&srcdest1[4], srcdest2[4], src3[4], _CMPCCX_NBE); + res[5] = __cmpccxadd_epi32 (&srcdest1[5], srcdest2[5], src3[5], _CMPCCX_NB); + res[6] = __cmpccxadd_epi32 (&srcdest1[6], srcdest2[6], src3[6], _CMPCCX_NLE); + res[7] = __cmpccxadd_epi32 (&srcdest1[7], srcdest2[7], src3[7], _CMPCCX_NL); + res[8] = __cmpccxadd_epi32 (&srcdest1[8], srcdest2[8], src3[8], _CMPCCX_NO); + res[9] = __cmpccxadd_epi32 (&srcdest1[9], srcdest2[9], src3[9], _CMPCCX_NP); + res[10] = __cmpccxadd_epi32 (&srcdest1[10], srcdest2[10], src3[10], _CMPCCX_NS); + res[11] = __cmpccxadd_epi32 (&srcdest1[11], srcdest2[11], src3[11], _CMPCCX_NZ); + res[12] = __cmpccxadd_epi32 
(&srcdest1[12], srcdest2[12], src3[12], _CMPCCX_O); + res[13] = __cmpccxadd_epi32 (&srcdest1[13], srcdest2[13], src3[13], _CMPCCX_P); + res[14] = __cmpccxadd_epi32 (&srcdest1[14], srcdest2[14], src3[14], _CMPCCX_S); + res[15] = __cmpccxadd_epi32 (&srcdest1[15], srcdest2[15], src3[15], _CMPCCX_Z); + + res_64[0] = __cmpccxadd_epi64(&srcdest1_64[0], srcdest2_64[0], src3_64[0], _CMPCCX_BE); + res_64[1] = __cmpccxadd_epi64(&srcdest1_64[1], srcdest2_64[1], src3_64[1], _CMPCCX_B); + res_64[2] = __cmpccxadd_epi64(&srcdest1_64[2], srcdest2_64[2], src3_64[2], _CMPCCX_LE); + res_64[3] = __cmpccxadd_epi64(&srcdest1_64[3], srcdest2_64[3], src3_64[3], _CMPCCX_L); + res_64[4] = __cmpccxadd_epi64(&srcdest1_64[4], srcdest2_64[4], src3_64[4], _CMPCCX_NBE); + res_64[5] = __cmpccxadd_epi64(&srcdest1_64[5], srcdest2_64[5], src3_64[5], _CMPCCX_NB); + res_64[6] = __cmpccxadd_epi64(&srcdest1_64[6], srcdest2_64[6], src3_64[6], _CMPCCX_NLE); + res_64[7] = __cmpccxadd_epi64(&srcdest1_64[7], srcdest2_64[7], src3_64[7], _CMPCCX_NL); + res_64[8] = __cmpccxadd_epi64(&srcdest1_64[8], srcdest2_64[8], src3_64[8], _CMPCCX_NO); + res_64[9] = __cmpccxadd_epi64(&srcdest1_64[9], srcdest2_64[9], src3_64[9], _CMPCCX_NP); + res_64[10] = __cmpccxadd_epi64(&srcdest1_64[10], srcdest2_64[10], src3_64[10], _CMPCCX_NS); + res_64[11] = __cmpccxadd_epi64(&srcdest1_64[11], srcdest2_64[11], src3_64[11], _CMPCCX_NZ); + res_64[12] = __cmpccxadd_epi64(&srcdest1_64[12], srcdest2_64[12], src3_64[12], _CMPCCX_O); + res_64[13] = __cmpccxadd_epi64(&srcdest1_64[13], srcdest2_64[13], src3_64[13], _CMPCCX_P); + res_64[14] = __cmpccxadd_epi64(&srcdest1_64[14], srcdest2_64[14], src3_64[14], _CMPCCX_S); + res_64[15] = __cmpccxadd_epi64(&srcdest1_64[15], srcdest2_64[15], src3_64[15], _CMPCCX_Z); + + for (int i = 0; i < 16; i++) + { + if ((srcdest1[i] != _srcdest1[i]) || (res[i] != _srcdest2[i])) + abort(); + if ((srcdest1_64[i] != _srcdest1_64[i]) || (res_64[i] != _srcdest2_64[i])) + abort(); + } + + return 0; +} diff --git 
a/gcc/testsuite/gcc.target/i386/funcspec-56.inc b/gcc/testsuite/gcc.target/i386/funcspec-56.inc index b3d33df7c9c..2e35a7ae50e 100644 --- a/gcc/testsuite/gcc.target/i386/funcspec-56.inc +++ b/gcc/testsuite/gcc.target/i386/funcspec-56.inc @@ -83,6 +83,7 @@ extern void test_avx512fp16 (void) __attribute__((__target__("avx512fp16"))); extern void test_avxifma (void) __attribute__((__target__("avxifma"))); extern void test_avxvnniint8 (void) __attribute__((__target__("avxvnniint8"))); extern void test_avxneconvert (void) __attribute__((__target__("avxneconvert"))); +extern void test_cmpccxadd (void) __attribute__((__target__("cmpccxadd"))); extern void test_no_sgx (void) __attribute__((__target__("no-sgx"))); extern void test_no_avx5124fmaps(void) __attribute__((__target__("no-avx5124fmaps"))); @@ -167,6 +168,7 @@ extern void test_no_avx512fp16 (void) __attribute__((__target__("no-avx512fp16" extern void test_no_avxifma (void) __attribute__((__target__("no-avxifma"))); extern void test_no_avxvnniint8 (void) __attribute__((__target__("no-avxvnniint8"))); extern void test_no_avxneconvert (void) __attribute__((__target__("no-avxneconvert"))); +extern void test_no_cmpccxadd (void) __attribute__((__target__("no-cmpccxadd"))); extern void test_arch_nocona (void) __attribute__((__target__("arch=nocona"))); extern void test_arch_core2 (void) __attribute__((__target__("arch=core2"))); diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c index b9cdfb690d1..e947b4347f4 100644 --- a/gcc/testsuite/gcc.target/i386/sse-13.c +++ b/gcc/testsuite/gcc.target/i386/sse-13.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves 
-mclflushopt -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512vbmi2 -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mavx512vp2intersect -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16 -mavxifma -mavxvnniint8 -mavxneconvert" } */ +/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512vbmi2 -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mavx512vp2intersect -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16 -mavxifma -mavxvnniint8 -mavxneconvert -mcmpccxadd" } */ /* { dg-add-options bind_pic_locally } */ #include @@ -842,4 +842,8 @@ #define __builtin_ia32_vpclmulqdq_v2di(A, B, C) __builtin_ia32_vpclmulqdq_v2di(A, B, 1) #define __builtin_ia32_vpclmulqdq_v8di(A, B, C) __builtin_ia32_vpclmulqdq_v8di(A, B, 1) +/* cmpccxadd.h */ +#define __builtin_ia32_cmpccxadd(A, B, C, D) __builtin_ia32_cmpccxadd(A, B, C, 1) +#define __builtin_ia32_cmpccxadd64(A, B, C, D) __builtin_ia32_cmpccxadd64(A, B, C, 1) + #include diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c index 898dde80c8f..757ba9c9a7d 100644 --- a/gcc/testsuite/gcc.target/i386/sse-23.c +++ b/gcc/testsuite/gcc.target/i386/sse-23.c @@ -843,6 +843,10 @@ #define __builtin_ia32_vpclmulqdq_v2di(A, B, C) __builtin_ia32_vpclmulqdq_v2di(A, B, 1) #define __builtin_ia32_vpclmulqdq_v8di(A, B, C) 
__builtin_ia32_vpclmulqdq_v8di(A, B, 1) -#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,fma,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,xsavec,xsaves,clflushopt,avx512bw,avx512dq,avx512vl,avx512vbmi,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,clwb,mwaitx,clzero,pku,sgx,rdpid,gfni,avx512vbmi2,vpclmulqdq,avx512bitalg,pconfig,wbnoinvd,avx512bf16,enqcmd,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16,avxifma,avxvnniint8,avxneconvert") +/* cmpccxadd.h */ +#define __builtin_ia32_cmpccxadd(A, B, C, D) __builtin_ia32_cmpccxadd(A, B, C, 1) +#define __builtin_ia32_cmpccxadd64(A, B, C, D) __builtin_ia32_cmpccxadd64(A, B, C, 1) + +#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,fma,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,xsavec,xsaves,clflushopt,avx512bw,avx512dq,avx512vl,avx512vbmi,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,clwb,mwaitx,clzero,pku,sgx,rdpid,gfni,avx512vbmi2,vpclmulqdq,avx512bitalg,pconfig,wbnoinvd,avx512bf16,enqcmd,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16,avxifma,avxvnniint8,avxneconvert,cmpccxadd") #include diff --git a/gcc/testsuite/gcc.target/i386/x86gprintrin-1.c b/gcc/testsuite/gcc.target/i386/x86gprintrin-1.c index 293be094b78..76de89d0cb7 100644 --- a/gcc/testsuite/gcc.target/i386/x86gprintrin-1.c +++ b/gcc/testsuite/gcc.target/i386/x86gprintrin-1.c @@ -1,7 +1,7 @@ /* Test that is usable with -O -std=c89 -pedantic-errors. 
*/ /* { dg-do compile } */ /* { dg-options "-O -std=c89 -pedantic-errors -march=x86-64 -madx -mbmi -mbmi2 -mcldemote -mclflushopt -mclwb -mclzero -menqcmd -mfsgsbase -mfxsr -mhreset -mlzcnt -mlwp -mmovdiri -mmwaitx -mpconfig -mpopcnt -mpku -mptwrite -mrdpid -mrdrnd -mrdseed -mrtm -mserialize -msgx -mshstk -mtbm -mtsxldtrk -mwaitpkg -mwbnoinvd -mxsave -mxsavec -mxsaveopt -mxsaves -mno-sse -mno-mmx" } */ -/* { dg-additional-options "-muintr" { target { ! ia32 } } } */ +/* { dg-additional-options "-mcmpccxadd -muintr" { target { ! ia32 } } } */ #include diff --git a/gcc/testsuite/gcc.target/i386/x86gprintrin-2.c b/gcc/testsuite/gcc.target/i386/x86gprintrin-2.c index c6330275746..aefad77f864 100644 --- a/gcc/testsuite/gcc.target/i386/x86gprintrin-2.c +++ b/gcc/testsuite/gcc.target/i386/x86gprintrin-2.c @@ -1,7 +1,7 @@ /* { dg-do compile } */ /* { dg-options "-O2 -Werror-implicit-function-declaration -march=x86-64 -madx -mbmi -mbmi2 -mcldemote -mclflushopt -mclwb -mclzero -menqcmd -mfsgsbase -mfxsr -mhreset -mlzcnt -mlwp -mmovdiri -mmwaitx -mpconfig -mpopcnt -mpku -mptwrite -mrdpid -mrdrnd -mrdseed -mrtm -mserialize -msgx -mshstk -mtbm -mtsxldtrk -mwaitpkg -mwbnoinvd -mxsave -mxsavec -mxsaveopt -mxsaves -mno-sse -mno-mmx" } */ /* { dg-add-options bind_pic_locally } */ -/* { dg-additional-options "-muintr" { target { ! ia32 } } } */ +/* { dg-additional-options "-mcmpccxadd -muintr" { target { ! ia32 } } } */ /* Test that the intrinsics in compile with optimization. 
All of them are defined as inline functions that reference the proper @@ -28,4 +28,8 @@ /* rtmintrin.h */ #define __builtin_ia32_xabort(N) __builtin_ia32_xabort(1) +/* cmpccxadd.h */ +#define __builtin_ia32_cmpccxadd(A, B, C, D) __builtin_ia32_cmpccxadd(A, B, C, 1) +#define __builtin_ia32_cmpccxadd64(A, B, C, D) __builtin_ia32_cmpccxadd64(A, B, C, 1) + #include diff --git a/gcc/testsuite/gcc.target/i386/x86gprintrin-3.c b/gcc/testsuite/gcc.target/i386/x86gprintrin-3.c index 3a7e1f4a10d..261c9180aa0 100644 --- a/gcc/testsuite/gcc.target/i386/x86gprintrin-3.c +++ b/gcc/testsuite/gcc.target/i386/x86gprintrin-3.c @@ -1,7 +1,7 @@ /* { dg-do compile } */ /* { dg-options "-O0 -Werror-implicit-function-declaration -march=x86-64 -madx -mbmi -mbmi2 -mcldemote -mclflushopt -mclwb -mclzero -menqcmd -mfsgsbase -mfxsr -mhreset -mlzcnt -mlwp -mmovdiri -mmwaitx -mpconfig -mpopcnt -mpku -mptwrite -mrdpid -mrdrnd -mrdseed -mrtm -mserialize -msgx -mshstk -mtbm -mtsxldtrk -mwaitpkg -mwbnoinvd -mxsave -mxsavec -mxsaveopt -mxsaves -mno-sse -mno-mmx" } */ /* { dg-add-options bind_pic_locally } */ -/* { dg-additional-options "-muintr" { target { ! ia32 } } } */ +/* { dg-additional-options "-mcmpccxadd -muintr" { target { ! ia32 } } } */ /* Test that the intrinsics in compile without optimization. 
All of them are defined as inline functions that reference the proper diff --git a/gcc/testsuite/gcc.target/i386/x86gprintrin-4.c b/gcc/testsuite/gcc.target/i386/x86gprintrin-4.c index d8a6126e5dc..7f76b870934 100644 --- a/gcc/testsuite/gcc.target/i386/x86gprintrin-4.c +++ b/gcc/testsuite/gcc.target/i386/x86gprintrin-4.c @@ -15,7 +15,7 @@ #ifndef DIFFERENT_PRAGMAS #ifdef __x86_64__ -#pragma GCC target ("adx,bmi,bmi2,fsgsbase,fxsr,hreset,lwp,lzcnt,popcnt,rdrnd,rdseed,tbm,rtm,serialize,tsxldtrk,uintr,xsaveopt") +#pragma GCC target ("adx,bmi,bmi2,cmpccxadd,fsgsbase,fxsr,hreset,lwp,lzcnt,popcnt,rdrnd,rdseed,tbm,rtm,serialize,tsxldtrk,uintr,xsaveopt") #else #pragma GCC target ("adx,bmi,bmi2,fsgsbase,fxsr,hreset,lwp,lzcnt,popcnt,rdrnd,rdseed,tbm,rtm,serialize,tsxldtrk,xsaveopt") #endif diff --git a/gcc/testsuite/gcc.target/i386/x86gprintrin-5.c b/gcc/testsuite/gcc.target/i386/x86gprintrin-5.c index 9ef66fdad54..54d826c4f46 100644 --- a/gcc/testsuite/gcc.target/i386/x86gprintrin-5.c +++ b/gcc/testsuite/gcc.target/i386/x86gprintrin-5.c @@ -27,8 +27,12 @@ /* rtmintrin.h */ #define __builtin_ia32_xabort(M) __builtin_ia32_xabort(1) +/* cmpccxadd.h */ +#define __builtin_ia32_cmpccxadd(A, B, C, D) __builtin_ia32_cmpccxadd(A, B, C, 1) +#define __builtin_ia32_cmpccxadd64(A, B, C, D) __builtin_ia32_cmpccxadd64(A, B, C, 1) + #ifdef __x86_64__ -#pragma GCC target ("adx,bmi,bmi2,clflushopt,clwb,clzero,enqcmd,fsgsbase,fxsr,hreset,lwp,lzcnt,mwaitx,pconfig,pku,popcnt,rdpid,rdrnd,rdseed,tbm,rtm,serialize,sgx,tsxldtrk,uintr,xsavec,xsaveopt,xsaves,wbnoinvd") +#pragma GCC target ("adx,bmi,bmi2,clflushopt,clwb,clzero,cmpccxadd,enqcmd,fsgsbase,fxsr,hreset,lwp,lzcnt,mwaitx,pconfig,pku,popcnt,rdpid,rdrnd,rdseed,tbm,rtm,serialize,sgx,tsxldtrk,uintr,xsavec,xsaveopt,xsaves,wbnoinvd") #else #pragma GCC target ("adx,bmi,bmi2,clflushopt,clwb,clzero,enqcmd,fsgsbase,fxsr,hreset,lwp,lzcnt,mwaitx,pconfig,pku,popcnt,rdpid,rdrnd,rdseed,tbm,rtm,serialize,sgx,tsxldtrk,xsavec,xsaveopt,xsaves,wbnoinvd") #endif 
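The run-time test cmpccxadd-2.c in this patch encodes a contract that can be stated as a small portable C model:

1. Read the old value at the memory operand.
2. Evaluate the condition on the flags that `old - src2` would set, as `cmp` sets them.
3. Add `src3` to memory only when the condition holds.
4. Always return the old value.

This is a sketch reconstructed from the test's expected results. It is not GCC's expansion or Intel's pseudocode. `cc_t`, `eval_cc32` and `cmpccxadd32_model` are hypothetical names, and the enum order only follows the x86 condition-code encoding. The model is single-threaded and makes no atomicity claims.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical condition codes, listed in the order of the x86 "cc"
   encoding (O = 0 ... NLE = 15).  */
typedef enum
{
  CC_O, CC_NO, CC_B, CC_NB, CC_Z, CC_NZ, CC_BE, CC_NBE,
  CC_S, CC_NS, CC_P, CC_NP, CC_L, CC_NL, CC_LE, CC_NLE
} cc_t;

/* Evaluate CC on the flags that a 32-bit "cmp" of A with B (A - B) sets.  */
static bool
eval_cc32 (int32_t a, int32_t b, cc_t cc)
{
  uint32_t ua = (uint32_t) a, ub = (uint32_t) b;
  uint32_t diff = ua - ub;                            /* wraps mod 2^32 */
  bool cf = ua < ub;                                  /* unsigned borrow */
  bool zf = diff == 0;
  bool sf = (diff >> 31) & 1;
  bool of = (((ua ^ ub) & (ua ^ diff)) >> 31) & 1;    /* signed overflow */
  unsigned ones = 0;
  for (uint32_t low = diff & 0xff; low != 0; low >>= 1)
    ones += low & 1;
  bool pf = (ones % 2) == 0;               /* PF: even parity of low byte */

  switch (cc)
    {
    case CC_O:   return of;
    case CC_NO:  return !of;
    case CC_B:   return cf;
    case CC_NB:  return !cf;
    case CC_Z:   return zf;
    case CC_NZ:  return !zf;
    case CC_BE:  return cf || zf;
    case CC_NBE: return !cf && !zf;
    case CC_S:   return sf;
    case CC_NS:  return !sf;
    case CC_P:   return pf;
    case CC_NP:  return !pf;
    case CC_L:   return sf != of;
    case CC_NL:  return sf == of;
    case CC_LE:  return zf || sf != of;
    case CC_NLE: return !zf && sf == of;
    }
  assert (0 && "invalid condition code");
  return false;
}

/* Single-threaded model: return the old *MEM and add ADD to *MEM only
   when CC holds for the comparison of that old value with CMP.  */
static int32_t
cmpccxadd32_model (int32_t *mem, int32_t cmp, int32_t add, cc_t cc)
{
  int32_t old = *mem;
  if (eval_cc32 (old, cmp, cc))
    /* Wrapping add; converting back to int32_t is implementation-defined
       in C, and GCC defines it as modulo 2^32.  */
    *mem = (int32_t) ((uint32_t) old + (uint32_t) add);
  return old;
}
```

A 64-bit analogue follows by switching to `int64_t`/`uint64_t` and taking SF and OF from bit 63. Feeding the sixteen operand triples from cmpccxadd-2.c through this model makes every condition true. Each memory word is therefore incremented by one, and each call returns the pre-increment value, which is what the test checks against `_srcdest1` and `_srcdest2`.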
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index 9228e810c45..d3b9aafb8f0 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -9542,6 +9542,16 @@ proc check_effective_target_avxneconvert { } { } "-O0 -mavxneconvert" ] } +# Return 1 if cmpccxadd instructions can be compiled. +proc check_effective_target_cmpccxadd { } { + return [check_no_compiler_messages cmpccxadd object { + int _cmpccxadd_epi32 (int *__A, int __B, int __C, const int __D) + { + return (int)__builtin_ia32_cmpccxadd (__A, __B, __C, 1); + } + } "-mcmpccxadd" ] +} + # Return 1 if sse instructions can be compiled. proc check_effective_target_sse { } { return [check_no_compiler_messages sse object {
From patchwork Fri Oct 14 07:54:45 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Jiang, Haochen"
X-Patchwork-Id: 1689922
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 6/6] Initial Sierra Forest Support
Date: Fri, 14 Oct 2022 15:54:45 +0800
Message-Id: <20221014075445.7938-7-haochen.jiang@intel.com>
X-Mailer: git-send-email 2.18.1
In-Reply-To: <20221014075445.7938-1-haochen.jiang@intel.com>
References: <20221014075445.7938-1-haochen.jiang@intel.com>
From: "Jiang, Haochen"
Reply-To: Haochen Jiang
Cc: hongtao.liu@intel.com

gcc/ChangeLog:

	* common/config/i386/cpuinfo.h (get_intel_cpu): Add Sierra Forest.
	* common/config/i386/i386-common.cc (processor_names): Add Sierra Forest.
	(processor_alias_table): Ditto.
	* common/config/i386/i386-cpuinfo.h (enum processor_types): Add
	INTEL_SIERRAFOREST.
	* config.gcc: Add -march=sierraforest.
	* config/i386/driver-i386.cc (host_detect_local_cpu): Handle Sierra Forest.
	* config/i386/i386-c.cc (ix86_target_macros_internal): Ditto.
	* config/i386/i386-options.cc (m_SIERRAFOREST): New define.
	(processor_cost_table): Add Sierra Forest.
	* config/i386/i386.h (enum processor_type): Add PROCESSOR_SIERRAFOREST.
	(PTA_SIERRAFOREST): Ditto.
	* doc/extend.texi: Add Sierra Forest.
	* doc/invoke.texi: Ditto.

gcc/testsuite/ChangeLog:

	* g++.target/i386/mv16.C: Add Sierra Forest.
	* gcc.target/i386/funcspec-56.inc: Handle new march.
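In `get_intel_cpu`, Sierra Forest is selected by the new `case 0xaf`. That model number is the "display" model built from the CPUID leaf 1 EAX signature. For family 6 (and family 15), the extended-model field in bits 19:16 is folded into the model. The sketch below shows that decoding. `decode_signature` and `is_sierraforest_signature` are hypothetical helpers. The field layout is the standard CPUID one, stepping (bits 3:0) is deliberately ignored, and only the family/model test implied by `case 0xaf` is mirrored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: split a CPUID leaf 1 EAX signature into the
   display family and model.  The extended model (bits 19:16) is folded
   in for families 6 and 15; the extended family (bits 27:20) only for 15.  */
static void
decode_signature (uint32_t eax, unsigned *family, unsigned *model)
{
  assert (family != NULL && model != NULL);
  unsigned fam = (eax >> 8) & 0x0f;
  unsigned mod = (eax >> 4) & 0x0f;
  unsigned ext_model = (eax >> 12) & 0xf0;   /* bits 19:16, pre-shifted by 4 */
  unsigned ext_family = (eax >> 20) & 0xff;  /* bits 27:20 */

  if (fam == 0x0f)
    {
      fam += ext_family;
      mod += ext_model;
    }
  else if (fam == 0x06)
    mod += ext_model;

  *family = fam;
  *model = mod;
}

/* Hypothetical check for the family 6, model 0xaf pair.  */
static bool
is_sierraforest_signature (uint32_t eax)
{
  unsigned family, model;
  decode_signature (eax, &family, &model);
  return family == 0x06 && model == 0xaf;
}
```

For example, the signature 0x000A06F0 decodes to family 6, model 0xaf. Changing only the stepping nibble, as in 0x000A06F3, does not change the result.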
---
 gcc/common/config/i386/cpuinfo.h              | 6 ++++++
 gcc/common/config/i386/i386-common.cc         | 3 +++
 gcc/common/config/i386/i386-cpuinfo.h         | 1 +
 gcc/config.gcc                                | 3 ++-
 gcc/config/i386/driver-i386.cc                | 5 ++++-
 gcc/config/i386/i386-c.cc                     | 7 +++++++
 gcc/config/i386/i386-options.cc               | 2 ++
 gcc/config/i386/i386.h                        | 3 +++
 gcc/doc/extend.texi                           | 3 +++
 gcc/doc/invoke.texi                           | 8 ++++++++
 gcc/testsuite/g++.target/i386/mv16.C          | 6 ++++++
 gcc/testsuite/gcc.target/i386/funcspec-56.inc | 1 +
 12 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/gcc/common/config/i386/cpuinfo.h b/gcc/common/config/i386/cpuinfo.h index f73834b086c..cc499c46ed0 100644 --- a/gcc/common/config/i386/cpuinfo.h +++ b/gcc/common/config/i386/cpuinfo.h @@ -516,6 +516,12 @@ get_intel_cpu (struct __processor_model *cpu_model, cpu_model->__cpu_type = INTEL_COREI7; cpu_model->__cpu_subtype = INTEL_COREI7_SAPPHIRERAPIDS; break; + case 0xaf: + /* Sierra Forest. */ + cpu = "sierraforest"; + CHECK___builtin_cpu_is ("sierraforest"); + cpu_model->__cpu_type = INTEL_SIERRAFOREST; + break; case 0x17: case 0x1d: /* Penryn. 
*/ diff --git a/gcc/common/config/i386/i386-common.cc b/gcc/common/config/i386/i386-common.cc index 75966779d82..6ccc4d2f03c 100644 --- a/gcc/common/config/i386/i386-common.cc +++ b/gcc/common/config/i386/i386-common.cc @@ -1874,6 +1874,7 @@ const char *const processor_names[] = "goldmont", "goldmont-plus", "tremont", + "sierraforest", "knl", "knm", "skylake", @@ -2019,6 +2020,8 @@ const pta processor_alias_table[] = M_CPU_TYPE (INTEL_GOLDMONT_PLUS), P_PROC_SSE4_2}, {"tremont", PROCESSOR_TREMONT, CPU_HASWELL, PTA_TREMONT, M_CPU_TYPE (INTEL_TREMONT), P_PROC_SSE4_2}, + {"sierraforest", PROCESSOR_SIERRAFOREST, CPU_HASWELL, PTA_SIERRAFOREST, + M_CPU_SUBTYPE (INTEL_SIERRAFOREST), P_PROC_AVX2}, {"knl", PROCESSOR_KNL, CPU_SLM, PTA_KNL, M_CPU_TYPE (INTEL_KNL), P_PROC_AVX512F}, {"knm", PROCESSOR_KNM, CPU_SLM, PTA_KNM, diff --git a/gcc/common/config/i386/i386-cpuinfo.h b/gcc/common/config/i386/i386-cpuinfo.h index 5a61d817007..a71a10ebbd7 100644 --- a/gcc/common/config/i386/i386-cpuinfo.h +++ b/gcc/common/config/i386/i386-cpuinfo.h @@ -58,6 +58,7 @@ enum processor_types INTEL_TREMONT, AMDFAM19H, ZHAOXIN_FAM7H, + INTEL_SIERRAFOREST, CPU_TYPE_MAX, BUILTIN_CPU_TYPE_MAX = CPU_TYPE_MAX }; diff --git a/gcc/config.gcc b/gcc/config.gcc index fe063bfbb26..c0e10a72bd5 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -665,7 +665,8 @@ slm nehalem westmere sandybridge ivybridge haswell broadwell bonnell \ silvermont knl knm skylake-avx512 cannonlake icelake-client icelake-server \ skylake goldmont goldmont-plus tremont cascadelake tigerlake cooperlake \ sapphirerapids alderlake rocketlake eden-x2 nano nano-1000 nano-2000 nano-3000 \ -nano-x2 eden-x4 nano-x4 lujiazui x86-64 x86-64-v2 x86-64-v3 x86-64-v4 native" +nano-x2 eden-x4 nano-x4 lujiazui x86-64 x86-64-v2 x86-64-v3 x86-64-v4 \ +sierraforest native" # Additional x86 processors supported by --with-cpu=. Each processor # MUST be separated by exactly one space. 
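The driver-i386.cc hunk that follows changes the fallback that `-march=native` uses for family-6 models it does not recognize. When AVX is present, the driver now tests AVX-VNNI-INT8 before the existing AVX512VP2INTERSECT (Tiger Lake) and TSXLDTRK (Sapphire Rapids) checks, and the first matching feature wins. The sketch below models only those three checks using a hypothetical bitmask. `cpu_guess`, the `F_*` bits, `guess_unknown_family6` and `guess_name` are illustrative names, not the driver's `has_feature`/`FEATURE_*` API. Every branch the hunk does not show collapses to `GUESS_UNKNOWN`.

```c
#include <assert.h>

/* Hypothetical feature bits standing in for the driver's FEATURE_* checks.  */
enum
{
  F_AVX = 1u << 0,
  F_AVXVNNIINT8 = 1u << 1,
  F_AVX512VP2INTERSECT = 1u << 2,
  F_TSXLDTRK = 1u << 3
};

typedef enum
{
  GUESS_UNKNOWN,
  GUESS_SIERRAFOREST,
  GUESS_TIGERLAKE,
  GUESS_SAPPHIRERAPIDS
} cpu_guess;

/* Order matters: the first feature that matches decides the guess.  */
static cpu_guess
guess_unknown_family6 (unsigned features)
{
  if (!(features & F_AVX))
    return GUESS_UNKNOWN;          /* non-AVX fallbacks are not modelled */
  if (features & F_AVXVNNIINT8)
    return GUESS_SIERRAFOREST;     /* the new check, tested first */
  if (features & F_AVX512VP2INTERSECT)
    return GUESS_TIGERLAKE;
  if (features & F_TSXLDTRK)
    return GUESS_SAPPHIRERAPIDS;
  return GUESS_UNKNOWN;            /* later checks are outside this hunk */
}

/* Printable -march style name for a guess.  */
static const char *
guess_name (cpu_guess g)
{
  static const char *const names[] =
    { "unknown", "sierraforest", "tigerlake", "sapphirerapids" };
  assert ((unsigned) g < sizeof names / sizeof names[0]);
  return names[g];
}
```

Because the new test runs first, `guess_unknown_family6 (F_AVX | F_AVXVNNIINT8 | F_AVX512VP2INTERSECT)` yields `GUESS_SIERRAFOREST`, not `GUESS_TIGERLAKE`.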
diff --git a/gcc/config/i386/driver-i386.cc b/gcc/config/i386/driver-i386.cc index ef567045c67..be205a56ea2 100644 --- a/gcc/config/i386/driver-i386.cc +++ b/gcc/config/i386/driver-i386.cc @@ -589,8 +589,11 @@ const char *host_detect_local_cpu (int argc, const char **argv) /* This is unknown family 0x6 CPU. */ if (has_feature (FEATURE_AVX)) { + /* Assume Sierra Forest. */ + if (has_feature (FEATURE_AVXVNNIINT8)) + cpu = "sierraforest"; /* Assume Tiger Lake */ - if (has_feature (FEATURE_AVX512VP2INTERSECT)) + else if (has_feature (FEATURE_AVX512VP2INTERSECT)) cpu = "tigerlake"; /* Assume Sapphire Rapids. */ else if (has_feature (FEATURE_TSXLDTRK)) diff --git a/gcc/config/i386/i386-c.cc b/gcc/config/i386/i386-c.cc index 9885a724d0f..4494c412995 100644 --- a/gcc/config/i386/i386-c.cc +++ b/gcc/config/i386/i386-c.cc @@ -198,6 +198,10 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag, def_or_undef (parse_in, "__tremont"); def_or_undef (parse_in, "__tremont__"); break; + case PROCESSOR_SIERRAFOREST: + def_or_undef (parse_in, "__sierraforest"); + def_or_undef (parse_in, "__sierraforest__"); + break; case PROCESSOR_KNL: def_or_undef (parse_in, "__knl"); def_or_undef (parse_in, "__knl__"); @@ -377,6 +381,9 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag, case PROCESSOR_TREMONT: def_or_undef (parse_in, "__tune_tremont__"); break; + case PROCESSOR_SIERRAFOREST: + def_or_undef (parse_in, "__tune_sierraforest__"); + break; case PROCESSOR_KNL: def_or_undef (parse_in, "__tune_knl__"); break; diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc index fb872afdfb5..4526dc09fc4 100644 --- a/gcc/config/i386/i386-options.cc +++ b/gcc/config/i386/i386-options.cc @@ -136,6 +136,7 @@ along with GCC; see the file COPYING3. If not see #define m_GOLDMONT (HOST_WIDE_INT_1U<