Message ID | 20141121113856.GB2323@msticlxl7.ims.intel.com |
---|---|
State | New |
Headers | show |
On Fri, Nov 21, 2014 at 12:38 PM, Ilya Tocar <tocarip.intel@gmail.com> wrote: > On 20 Nov 09:43, Uros Bizjak wrote: >> On Wed, Nov 19, 2014 at 6:32 PM, Ilya Tocar <tocarip.intel@gmail.com> wrote: >> > Hi, >> > >> > New revision of Intel ISA reference [1] has new instructions: >> > Clwb, pcommit and new flavors of AVX512. Patch bellow adds them. >> > I understand that stage 1 is closed, however those changes shouldn't >> > affect anything outside if i386 backend. And are extremely unlikely to >> > break existing functionality, and I personally think it's desirable for >> > newest GCC to support newest spec. >> > Bootstrapped/regtestsed on x86_64-unknown-linux-gnu. >> > Ok for trunk? >> >> Please split the patch into patch series, like it was done previously >> for AVX512F patches. >> >> Uros. >> > This part adds avx512vbmi. > I'll send vpermi2b autogen patch together with v64qi const perm later. > Boostraps/passes make check. > Ok for trunk? > > > gcc/ > * common/config/i386/i386-common.c (OPTION_MASK_ISA_AVX512VBMI_SET< Please remove "<" in the above line. > OPTION_MASK_ISA_AVX512VBMI_UNSET): New. > (ix86_handle_option): Handle OPT_mavx512vbmi. > * config.gcc: Add avx512vbmiintrin.h, avx512vbmivlintrin.h. > * config/i386/avx512vbmiintrin.h: New file. > * config/i386/avx512vbmivlintrin.h: Ditto. > * config/i386/cpuid.h (bit_AVX512VBMI): New. > * config/i386/driver-i386.c (host_detect_local_cpu): Detect avx512vbmi. > * config/i386/i386-c.c (ix86_target_macros_internal): Define > __AVX512VBMI__. > * config/i386/i386.c (ix86_target_string): Add -mavx512vbmi. > (PTA_AVX512VBMI): Define. > (ix86_option_override_internal): Handle new options. > (ix86_valid_target_attribute_inner_p): Add avx512vbmi, > (ix86_builtins): Add IX86_BUILTIN_VPMULTISHIFTQB512, > IX86_BUILTIN_VPMULTISHIFTQB256, IX86_BUILTIN_VPMULTISHIFTQB128, > IX86_BUILTIN_VPERMVARQI512_MASK, IX86_BUILTIN_VPERMT2VARQI512, > IX86_BUILTIN_VPERMT2VARQI512_MASKZ, IX86_BUILTIN_VPERMI2VARQI512, > IX86_BUILTIN_VPERMVARQI256_MASK, IX86_BUILTIN_VPERMVARQI128_MASK, > IX86_BUILTIN_VPERMT2VARQI256, IX86_BUILTIN_VPERMT2VARQI256_MASKZ, > IX86_BUILTIN_VPERMT2VARQI128, IX86_BUILTIN_VPERMI2VARQI256, > IX86_BUILTIN_VPERMI2VARQI128. > (bdesc_special_args): Add __builtin_ia32_vpmultishiftqb512_mask, > __builtin_ia32_vpmultishiftqb256_mask, > __builtin_ia32_vpmultishiftqb128_mask, > __builtin_ia32_permvarqi512_mask, __builtin_ia32_vpermt2varqi512_mask, > __builtin_ia32_vpermt2varqi512_maskz, > __builtin_ia32_vpermi2varqi512_mask, __builtin_ia32_permvarqi256_mask, > __builtin_ia32_permvarqi128_mask, __builtin_ia32_vpermt2varqi256_mask, > __builtin_ia32_vpermt2varqi256_maskz, > __builtin_ia32_vpermt2varqi128_mask, > __builtin_ia32_vpermt2varqi128_maskz, > __builtin_ia32_vpermi2varqi256_mask, > __builtin_ia32_vpermi2varqi128_mask. > (ix86_hard_regno_mode_ok): Allow big masks for AVX512VBMI. > * config/i386/i386.h (TARGET_AVX512VBMI, TARGET_AVX512VBMI_P): Define. > * config/i386/i386.opt: Add mavx512vbmi. > * config/i386/immintrin.h: Include avx512vbmiintrin.h, > avx512vbmivlintrin.h. > * config/i386/sse.md (unspec): Add UNSPEC_VPMULTISHIFT. > (VI1_AVX512VL): New iterator. > (<avx512>_permvar<mode><mask_name>): Use it. > (<avx512>_vpermi2var<mode>3_maskz): Ditto. > (<avx512>_vpermi2var<mode>3<sd_maskz_name>): Ditto. > (<avx512>_vpermi2var<mode>3_mask): Ditto. > (<avx512>_vpermt2var<mode>3_maskz): Ditto. > (<avx512>_vpermt2var<mode>3<sd_maskz_name>): Ditto. > (<avx512>_vpermt2var<mode>3_mask): Ditto. > (vpmultishiftqb<mode><mask_name>): Ditto. > > gcc/testsuite/ > > * g++.dg/other/i386-2.C: Add -mavx512vbmi. > * g++.dg/other/i386-3.C: Ditto. > * gcc.target/i386/avx512f-helper.h: Add avx512vbmi-check.h. > * gcc.target/i386/avx512vbmi-check.h: Ditto. > * gcc.target/i386/avx512vbmi-vpermb-1.c: Ditto. > * gcc.target/i386/avx512vbmi-vpermb-2.c: Ditto. > * gcc.target/i386/avx512vbmi-vpermi2b-1.c: Ditto. > * gcc.target/i386/avx512vbmi-vpermi2b-2.c: Ditto. > * gcc.target/i386/avx512vbmi-vpermt2b-1.c: Ditto. > * gcc.target/i386/avx512vbmi-vpermt2b-2.c: Ditto. > * gcc.target/i386/avx512vbmi-vpmultishiftqb-1.c: Ditto. > * gcc.target/i386/avx512vbmi-vpmultishiftqb-2.c: Ditto. > * gcc.target/i386/avx512vl-vpermb-2.c: Ditto. > * gcc.target/i386/avx512vl-vpermi2b-2.c: Ditto. > * gcc.target/i386/avx512vl-vpermt2b-2.c: Ditto. > * gcc.target/i386/avx512vl-vpmaddhuq-2.c: Ditto. > * gcc.target/i386/avx512vl-vpmaddluq-2.c: Ditto. > * gcc.target/i386/avx512vl-vpmultishiftqb-2.c: Ditto. > * gcc.target/i386/i386.exp (check_effective_target_avx512vbmi): New. > * gcc.target/i386/sse-12.c: Add new options. > * gcc.target/i386/sse-13.c: Ditto. > * gcc.target/i386/sse-14.c: Ditto. > * gcc.target/i386/sse-22.c: Ditto. > * gcc.target/i386/sse-23.c: Ditto. OK. Thanks, Uros. > > --- > gcc/common/config/i386/i386-common.c | 16 ++ > gcc/config.gcc | 6 +- > gcc/config/i386/avx512vbmiintrin.h | 159 ++++++++++++ > gcc/config/i386/avx512vbmivlintrin.h | 275 +++++++++++++++++++++ > gcc/config/i386/cpuid.h | 1 + > gcc/config/i386/driver-i386.c | 6 +- > gcc/config/i386/i386-c.c | 2 + > gcc/config/i386/i386.c | 42 +++- > gcc/config/i386/i386.h | 2 + > gcc/config/i386/i386.opt | 4 + > gcc/config/i386/immintrin.h | 4 + > gcc/config/i386/sse.md | 115 +++++++++ > gcc/testsuite/g++.dg/other/i386-2.C | 2 +- > gcc/testsuite/g++.dg/other/i386-3.C | 2 +- > gcc/testsuite/gcc.target/i386/avx512f-helper.h | 5 + > gcc/testsuite/gcc.target/i386/avx512vbmi-check.h | 46 ++++ > .../gcc.target/i386/avx512vbmi-vpermb-1.c | 34 +++ > .../gcc.target/i386/avx512vbmi-vpermb-2.c | 51 ++++ > .../gcc.target/i386/avx512vbmi-vpermi2b-1.c | 25 ++ > .../gcc.target/i386/avx512vbmi-vpermi2b-2.c | 58 +++++ > .../gcc.target/i386/avx512vbmi-vpermt2b-1.c | 37 +++ > .../gcc.target/i386/avx512vbmi-vpermt2b-2.c | 70 ++++++ > .../gcc.target/i386/avx512vbmi-vpmultishiftqb-1.c | 31 +++ > .../gcc.target/i386/avx512vbmi-vpmultishiftqb-2.c | 68 +++++ > gcc/testsuite/gcc.target/i386/avx512vl-vpermb-2.c | 14 ++ > .../gcc.target/i386/avx512vl-vpermi2b-2.c | 14 ++ > .../gcc.target/i386/avx512vl-vpermt2b-2.c | 14 ++ > .../gcc.target/i386/avx512vl-vpmultishiftqb-2.c | 14 ++ > gcc/testsuite/gcc.target/i386/i386.exp | 15 ++ > gcc/testsuite/gcc.target/i386/sse-12.c | 2 +- > gcc/testsuite/gcc.target/i386/sse-13.c | 2 +- > gcc/testsuite/gcc.target/i386/sse-14.c | 2 +- > gcc/testsuite/gcc.target/i386/sse-22.c | 4 +- > gcc/testsuite/gcc.target/i386/sse-23.c | 2 +- > 34 files changed, 1131 insertions(+), 13 deletions(-) > create mode 100644 gcc/config/i386/avx512vbmiintrin.h > create mode 100644 gcc/config/i386/avx512vbmivlintrin.h > create mode 100644 gcc/testsuite/gcc.target/i386/avx512vbmi-check.h > create mode 100644 gcc/testsuite/gcc.target/i386/avx512vbmi-vpermb-1.c > create mode 100644 gcc/testsuite/gcc.target/i386/avx512vbmi-vpermb-2.c > create mode 100644 gcc/testsuite/gcc.target/i386/avx512vbmi-vpermi2b-1.c > create mode 100644 gcc/testsuite/gcc.target/i386/avx512vbmi-vpermi2b-2.c > create mode 100644 gcc/testsuite/gcc.target/i386/avx512vbmi-vpermt2b-1.c > create mode 100644 gcc/testsuite/gcc.target/i386/avx512vbmi-vpermt2b-2.c > create mode 100644 gcc/testsuite/gcc.target/i386/avx512vbmi-vpmultishiftqb-1.c > create mode 100644 gcc/testsuite/gcc.target/i386/avx512vbmi-vpmultishiftqb-2.c > create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-vpermb-2.c > create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-vpermi2b-2.c > create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-vpermt2b-2.c > create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-vpmultishiftqb-2.c > > diff --git a/gcc/common/config/i386/i386-common.c b/gcc/common/config/i386/i386-common.c > index 73044a0..1c4f15e 100644 > --- a/gcc/common/config/i386/i386-common.c > +++ b/gcc/common/config/i386/i386-common.c > @@ -73,6 +73,8 @@ along with GCC; see the file COPYING3. If not see > (OPTION_MASK_ISA_AVX512VL | OPTION_MASK_ISA_AVX512F_SET) > #define OPTION_MASK_ISA_AVX512IFMA_SET \ > (OPTION_MASK_ISA_AVX512IFMA | OPTION_MASK_ISA_AVX512F_SET) > +#define OPTION_MASK_ISA_AVX512VBMI_SET \ > + (OPTION_MASK_ISA_AVX512VBMI | OPTION_MASK_ISA_AVX512F_SET) > #define OPTION_MASK_ISA_RTM_SET OPTION_MASK_ISA_RTM > #define OPTION_MASK_ISA_PRFCHW_SET OPTION_MASK_ISA_PRFCHW > #define OPTION_MASK_ISA_RDSEED_SET OPTION_MASK_ISA_RDSEED > @@ -170,6 +172,7 @@ along with GCC; see the file COPYING3. If not see > #define OPTION_MASK_ISA_AVX512BW_UNSET OPTION_MASK_ISA_AVX512BW > #define OPTION_MASK_ISA_AVX512VL_UNSET OPTION_MASK_ISA_AVX512VL > #define OPTION_MASK_ISA_AVX512IFMA_UNSET OPTION_MASK_ISA_AVX512IFMA > +#define OPTION_MASK_ISA_AVX512VBMI_UNSET OPTION_MASK_ISA_AVX512VBMI > #define OPTION_MASK_ISA_RTM_UNSET OPTION_MASK_ISA_RTM > #define OPTION_MASK_ISA_PRFCHW_UNSET OPTION_MASK_ISA_PRFCHW > #define OPTION_MASK_ISA_RDSEED_UNSET OPTION_MASK_ISA_RDSEED > @@ -459,6 +462,19 @@ ix86_handle_option (struct gcc_options *opts, > } > return true; > > + case OPT_mavx512vbmi: > + if (value) > + { > + opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX512VBMI_SET; > + opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512VBMI_SET; > + } > + else > + { > + opts->x_ix86_isa_flags &= ~OPTION_MASK_ISA_AVX512VBMI_UNSET; > + opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512VBMI_UNSET; > + } > + return true; > + > case OPT_mfma: > if (value) > { > diff --git a/gcc/config.gcc b/gcc/config.gcc > index dbf4191..da2a723 100644 > --- a/gcc/config.gcc > +++ b/gcc/config.gcc > @@ -368,7 +368,8 @@ i[34567]86-*-*) > shaintrin.h clflushoptintrin.h xsavecintrin.h > xsavesintrin.h avx512dqintrin.h avx512bwintrin.h > avx512vlintrin.h avx512vlbwintrin.h avx512vldqintrin.h > - avx512ifmaintrin.h avx512ifmavlintrin.h" > + avx512ifmaintrin.h avx512ifmavlintrin.h avx512vbmiintrin.h > + avx512vbmivlintrin.h" > ;; > x86_64-*-*) > cpu_type=i386 > @@ -388,7 +389,8 @@ x86_64-*-*) > shaintrin.h clflushoptintrin.h xsavecintrin.h > xsavesintrin.h avx512dqintrin.h avx512bwintrin.h > avx512vlintrin.h avx512vlbwintrin.h avx512vldqintrin.h > - avx512ifmaintrin.h avx512ifmavlintrin.h" > + avx512ifmaintrin.h avx512ifmavlintrin.h avx512vbmiintrin.h > + avx512vbmivlintrin.h" > ;; > ia64-*-*) > extra_headers=ia64intrin.h > diff --git a/gcc/config/i386/avx512vbmiintrin.h b/gcc/config/i386/avx512vbmiintrin.h > new file mode 100644 > index 0000000..c2c59ce > --- /dev/null > +++ b/gcc/config/i386/avx512vbmiintrin.h > @@ -0,0 +1,159 @@ > +/* Copyright (C) 2013-2014 Free Software Foundation, Inc. > + > + This file is part of GCC. > + > + GCC is free software; you can redistribute it and/or modify > + it under the terms of the GNU General Public License as published by > + the Free Software Foundation; either version 3, or (at your option) > + any later version. > + > + GCC is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > + GNU General Public License for more details. > + > + Under Section 7 of GPL version 3, you are granted additional > + permissions described in the GCC Runtime Library Exception, version > + 3.1, as published by the Free Software Foundation. > + > + You should have received a copy of the GNU General Public License and > + a copy of the GCC Runtime Library Exception along with this program; > + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see > + <http://www.gnu.org/licenses/>. */ > + > +#ifndef _IMMINTRIN_H_INCLUDED > +#error "Never use <avx512vbmiintrin.h> directly; include <immintrin.h> instead." > +#endif > + > +#ifndef _AVX512VBMIINTRIN_H_INCLUDED > +#define _AVX512VBMIINTRIN_H_INCLUDED > + > +#ifndef __AVX512VBMI__ > +#pragma GCC push_options > +#pragma GCC target("avx512vbmi") > +#define __DISABLE_AVX512VBMI__ > +#endif /* __AVX512VBMI__ */ > + > +extern __inline __m512i > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) > +_mm512_mask_multishift_epi64_epi8 (__m512i __W, __mmask64 __M, __m512i __X, __m512i __Y) > +{ > + return (__m512i) __builtin_ia32_vpmultishiftqb512_mask ((__v64qi) __X, > + (__v64qi) __Y, > + (__v64qi) __W, > + (__mmask64) __M); > +} > + > +extern __inline __m512i > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) > +_mm512_maskz_multishift_epi64_epi8 (__mmask64 __M, __m512i __X, __m512i __Y) > +{ > + return (__m512i) __builtin_ia32_vpmultishiftqb512_mask ((__v64qi) __X, > + (__v64qi) __Y, > + (__v64qi) > + _mm512_setzero_si512 (), > + (__mmask64) __M); > +} > + > +extern __inline __m512i > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) > +_mm512_multishift_epi64_epi8 (__m512i __X, __m512i __Y) > +{ > + return (__m512i) __builtin_ia32_vpmultishiftqb512_mask ((__v64qi) __X, > + (__v64qi) __Y, > + (__v64qi) > + _mm512_undefined_si512 (), > + (__mmask64) -1); > +} > + > +extern __inline __m512i > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) > +_mm512_permutexvar_epi8 (__m512i __A, __m512i __B) > +{ > + return (__m512i) __builtin_ia32_permvarqi512_mask ((__v64qi) __B, > + (__v64qi) __A, > + (__v64qi) > + _mm512_undefined_si512 (), > + (__mmask64) -1); > +} > + > +extern __inline __m512i > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) > +_mm512_maskz_permutexvar_epi8 (__mmask64 __M, __m512i __A, > + __m512i __B) > +{ > + return (__m512i) __builtin_ia32_permvarqi512_mask ((__v64qi) __B, > + (__v64qi) __A, > + (__v64qi) > + _mm512_setzero_si512(), > + (__mmask64) __M); > +} > + > +extern __inline __m512i > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) > +_mm512_mask_permutexvar_epi8 (__m512i __W, __mmask64 __M, __m512i __A, > + __m512i __B) > +{ > + return (__m512i) __builtin_ia32_permvarqi512_mask ((__v64qi) __B, > + (__v64qi) __A, > + (__v64qi) __W, > + (__mmask64) __M); > +} > + > +extern __inline __m512i > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) > +_mm512_permutex2var_epi8 (__m512i __A, __m512i __I, __m512i __B) > +{ > + return (__m512i) __builtin_ia32_vpermt2varqi512_mask ((__v64qi) __I > + /* idx */ , > + (__v64qi) __A, > + (__v64qi) __B, > + (__mmask64) - > + 1); > +} > + > +extern __inline __m512i > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) > +_mm512_mask_permutex2var_epi8 (__m512i __A, __mmask64 __U, > + __m512i __I, __m512i __B) > +{ > + return (__m512i) __builtin_ia32_vpermt2varqi512_mask ((__v64qi) __I > + /* idx */ , > + (__v64qi) __A, > + (__v64qi) __B, > + (__mmask64) > + __U); > +} > + > +extern __inline __m512i > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) > +_mm512_mask2_permutex2var_epi8 (__m512i __A, __m512i __I, > + __mmask64 __U, __m512i __B) > +{ > + return (__m512i) __builtin_ia32_vpermi2varqi512_mask ((__v64qi) __A, > + (__v64qi) __I > + /* idx */ , > + (__v64qi) __B, > + (__mmask64) > + __U); > +} > + > +extern __inline __m512i > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) > +_mm512_maskz_permutex2var_epi8 (__mmask64 __U, __m512i __A, > + __m512i __I, __m512i __B) > +{ > + return (__m512i) __builtin_ia32_vpermt2varqi512_maskz ((__v64qi) __I > + /* idx */ , > + (__v64qi) __A, > + (__v64qi) __B, > + (__mmask64) > + __U); > +} > + > +#ifdef __DISABLE_AVX512VBMI__ > +#undef __DISABLE_AVX512VBMI__ > +#pragma GCC pop_options > +#endif /* __DISABLE_AVX512VBMI__ */ > + > +#endif /* _AVX512VBMIINTRIN_H_INCLUDED */ > diff --git a/gcc/config/i386/avx512vbmivlintrin.h b/gcc/config/i386/avx512vbmivlintrin.h > new file mode 100644 > index 0000000..b4ecdeb > --- /dev/null > +++ b/gcc/config/i386/avx512vbmivlintrin.h > @@ -0,0 +1,275 @@ > +/* Copyright (C) 2013-2014 Free Software Foundation, Inc. > + > + This file is part of GCC. > + > + GCC is free software; you can redistribute it and/or modify > + it under the terms of the GNU General Public License as published by > + the Free Software Foundation; either version 3, or (at your option) > + any later version. > + > + GCC is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > + GNU General Public License for more details. > + > + Under Section 7 of GPL version 3, you are granted additional > + permissions described in the GCC Runtime Library Exception, version > + 3.1, as published by the Free Software Foundation. > + > + You should have received a copy of the GNU General Public License and > + a copy of the GCC Runtime Library Exception along with this program; > + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see > + <http://www.gnu.org/licenses/>. */ > + > +#ifndef _IMMINTRIN_H_INCLUDED > +#error "Never use <avx512vbmivlintrin.h> directly; include <immintrin.h> instead." > +#endif > + > +#ifndef _AVX512VBMIVLINTRIN_H_INCLUDED > +#define _AVX512VBMIVLINTRIN_H_INCLUDED > + > +#if !defined(__AVX512VL__) || !defined(__AVX512VBMI__) > +#pragma GCC push_options > +#pragma GCC target("avx512vbmi,avx512vl") > +#define __DISABLE_AVX512VBMIVL__ > +#endif /* __AVX512VBMIVL__ */ > + > +extern __inline __m256i > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) > +_mm256_mask_multishift_epi64_epi8 (__m256i __W, __mmask32 __M, __m256i __X, __m256i __Y) > +{ > + return (__m256i) __builtin_ia32_vpmultishiftqb256_mask ((__v32qi) __X, > + (__v32qi) __Y, > + (__v32qi) __W, > + (__mmask32) __M); > +} > + > +extern __inline __m256i > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) > +_mm256_maskz_multishift_epi64_epi8 (__mmask32 __M, __m256i __X, __m256i __Y) > +{ > + return (__m256i) __builtin_ia32_vpmultishiftqb256_mask ((__v32qi) __X, > + (__v32qi) __Y, > + (__v32qi) > + _mm256_setzero_si256 (), > + (__mmask32) __M); > +} > + > +extern __inline __m256i > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) > +_mm256_multishift_epi64_epi8 (__m256i __X, __m256i __Y) > +{ > + return (__m256i) __builtin_ia32_vpmultishiftqb256_mask ((__v32qi) __X, > + (__v32qi) __Y, > + (__v32qi) > + _mm256_undefined_si256 (), > + (__mmask32) -1); > +} > + > +extern __inline __m128i > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) > +_mm_mask_multishift_epi64_epi8 (__m128i __W, __mmask16 __M, __m128i __X, __m128i __Y) > +{ > + return (__m128i) __builtin_ia32_vpmultishiftqb128_mask ((__v16qi) __X, > + (__v16qi) __Y, > + (__v16qi) __W, > + (__mmask16) __M); > +} > + > +extern __inline __m128i > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) > +_mm_maskz_multishift_epi64_epi8 (__mmask16 __M, __m128i __X, __m128i __Y) > +{ > + return (__m128i) __builtin_ia32_vpmultishiftqb128_mask ((__v16qi) __X, > + (__v16qi) __Y, > + (__v16qi) > + _mm_setzero_si128 (), > + (__mmask16) __M); > +} > + > +extern __inline __m128i > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) > +_mm_multishift_epi64_epi8 (__m128i __X, __m128i __Y) > +{ > + return (__m128i) __builtin_ia32_vpmultishiftqb128_mask ((__v16qi) __X, > + (__v16qi) __Y, > + (__v16qi) > + _mm_undefined_si128 (), > + (__mmask16) -1); > +} > + > +extern __inline __m256i > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) > +_mm256_permutexvar_epi8 (__m256i __A, __m256i __B) > +{ > + return (__m256i) __builtin_ia32_permvarqi256_mask ((__v32qi) __B, > + (__v32qi) __A, > + (__v32qi) > + _mm256_undefined_si256 (), > + (__mmask32) -1); > +} > + > +extern __inline __m256i > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) > +_mm256_maskz_permutexvar_epi8 (__mmask32 __M, __m256i __A, > + __m256i __B) > +{ > + return (__m256i) __builtin_ia32_permvarqi256_mask ((__v32qi) __B, > + (__v32qi) __A, > + (__v32qi) > + _mm256_setzero_si256 (), > + (__mmask32) __M); > +} > + > +extern __inline __m256i > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) > +_mm256_mask_permutexvar_epi8 (__m256i __W, __mmask32 __M, __m256i __A, > + __m256i __B) > +{ > + return (__m256i) __builtin_ia32_permvarqi256_mask ((__v32qi) __B, > + (__v32qi) __A, > + (__v32qi) __W, > + (__mmask32) __M); > +} > + > +extern __inline __m128i > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) > +_mm_permutexvar_epi8 (__m128i __A, __m128i __B) > +{ > + return (__m128i) __builtin_ia32_permvarqi128_mask ((__v16qi) __B, > + (__v16qi) __A, > + (__v16qi) > + _mm_undefined_si128 (), > + (__mmask16) -1); > +} > + > +extern __inline __m128i > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) > +_mm_maskz_permutexvar_epi8 (__mmask16 __M, __m128i __A, __m128i __B) > +{ > + return (__m128i) __builtin_ia32_permvarqi128_mask ((__v16qi) __B, > + (__v16qi) __A, > + (__v16qi) > + _mm_setzero_si128 (), > + (__mmask16) __M); > +} > + > +extern __inline __m128i > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) > +_mm_mask_permutexvar_epi8 (__m128i __W, __mmask16 __M, __m128i __A, > + __m128i __B) > +{ > + return (__m128i) __builtin_ia32_permvarqi128_mask ((__v16qi) __B, > + (__v16qi) __A, > + (__v16qi) __W, > + (__mmask16) __M); > +} > + > +extern __inline __m256i > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) > +_mm256_permutex2var_epi8 (__m256i __A, __m256i __I, __m256i __B) > +{ > + return (__m256i) __builtin_ia32_vpermt2varqi256_mask ((__v32qi) __I > + /* idx */ , > + (__v32qi) __A, > + (__v32qi) __B, > + (__mmask32) - > + 1); > +} > + > +extern __inline __m256i > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) > +_mm256_mask_permutex2var_epi8 (__m256i __A, __mmask32 __U, > + __m256i __I, __m256i __B) > +{ > + return (__m256i) __builtin_ia32_vpermt2varqi256_mask ((__v32qi) __I > + /* idx */ , > + (__v32qi) __A, > + (__v32qi) __B, > + (__mmask32) > + __U); > +} > + > +extern __inline __m256i > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) > +_mm256_mask2_permutex2var_epi8 (__m256i __A, __m256i __I, > + __mmask32 __U, __m256i __B) > +{ > + return (__m256i) __builtin_ia32_vpermi2varqi256_mask ((__v32qi) __A, > + (__v32qi) __I > + /* idx */ , > + (__v32qi) __B, > + (__mmask32) > + __U); > +} > + > +extern __inline __m256i > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) > +_mm256_maskz_permutex2var_epi8 (__mmask32 __U, __m256i __A, > + __m256i __I, __m256i __B) > +{ > + return (__m256i) __builtin_ia32_vpermt2varqi256_maskz ((__v32qi) __I > + /* idx */ , > + (__v32qi) __A, > + (__v32qi) __B, > + (__mmask32) > + __U); > +} > + > +extern __inline __m128i > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) > +_mm_permutex2var_epi8 (__m128i __A, __m128i __I, __m128i __B) > +{ > + return (__m128i) __builtin_ia32_vpermt2varqi128_mask ((__v16qi) __I > + /* idx */ , > + (__v16qi) __A, > + (__v16qi) __B, > + (__mmask16) - > + 1); > +} > + > +extern __inline __m128i > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) > +_mm_mask_permutex2var_epi8 (__m128i __A, __mmask16 __U, __m128i __I, > + __m128i __B) > +{ > + return (__m128i) __builtin_ia32_vpermt2varqi128_mask ((__v16qi) __I > + /* idx */ , > + (__v16qi) __A, > + (__v16qi) __B, > + (__mmask16) > + __U); > +} > + > +extern __inline __m128i > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) > +_mm_mask2_permutex2var_epi8 (__m128i __A, __m128i __I, __mmask16 __U, > + __m128i __B) > +{ > + return (__m128i) __builtin_ia32_vpermi2varqi128_mask ((__v16qi) __A, > + (__v16qi) __I > + /* idx */ , > + (__v16qi) __B, > + (__mmask16) > + __U); > +} > + > +extern __inline __m128i > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) > +_mm_maskz_permutex2var_epi8 (__mmask16 __U, __m128i __A, __m128i __I, > + __m128i __B) > +{ > + return (__m128i) __builtin_ia32_vpermt2varqi128_maskz ((__v16qi) __I > + /* idx */ , > + (__v16qi) __A, > + (__v16qi) __B, > + (__mmask16) > + __U); > +} > + > +#ifdef __DISABLE_AVX512VBMIVL__ > +#undef __DISABLE_AVX512VBMIVL__ > +#pragma GCC pop_options > +#endif /* __DISABLE_AVX512VBMIVL__ */ > + > +#endif /* _AVX512VBMIVLINTRIN_H_INCLUDED */ > diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h > index e3e1ed6..0efb1a4 100644 > --- a/gcc/config/i386/cpuid.h > +++ b/gcc/config/i386/cpuid.h > @@ -87,6 +87,7 @@ > > /* %ecx */ > #define bit_PREFETCHWT1 (1 << 0) > +#define bit_AVX512VBMI (1 << 1) > > /* Extended State Enumeration Sub-leaf (%eax == 13, %ecx == 1) */ > #define bit_XSAVEOPT (1 << 0) > diff --git a/gcc/config/i386/driver-i386.c b/gcc/config/i386/driver-i386.c > index cb82945..72dfd04 100644 > --- a/gcc/config/i386/driver-i386.c > +++ b/gcc/config/i386/driver-i386.c > @@ -412,7 +412,7 @@ const char *host_detect_local_cpu (int argc, const char **argv) > unsigned int has_avx512f = 0, has_sha = 0, has_prefetchwt1 = 0; > unsigned int has_clflushopt = 0, has_xsavec = 0, has_xsaves = 0; > unsigned int has_avx512dq = 0, has_avx512bw = 0, has_avx512vl = 0; > - unsigned int has_avx512ifma = 0; > + unsigned int has_avx512vbmi = 0, has_avx512ifma = 0; > > bool arch; > > @@ -497,6 +497,7 @@ const char *host_detect_local_cpu (int argc, const char **argv) > has_avx512vl = ebx & bit_AVX512IFMA; > > has_prefetchwt1 = ecx & bit_PREFETCHWT1; > + has_avx512vl = ecx & bit_AVX512VBMI; > } > > if (max_level >= 13) > @@ -928,6 +929,7 @@ const char *host_detect_local_cpu (int argc, const char **argv) > const char *avx512bw = has_avx512bw ? " -mavx512bw" : " -mno-avx512bw"; > const char *avx512vl = has_avx512vl ? " -mavx512vl" : " -mno-avx512vl"; > const char *avx512ifma = has_avx512ifma ? " -mavx512ifma" : " -mno-avx512ifma"; > + const char *avx512vbmi = has_avx512vbmi ? " -mavx512vbmi" : " -mno-avx512vbmi"; > > options = concat (options, mmx, mmx3dnow, sse, sse2, sse3, ssse3, > sse4a, cx16, sahf, movbe, aes, sha, pclmul, > @@ -937,7 +939,7 @@ const char *host_detect_local_cpu (int argc, const char **argv) > fxsr, xsave, xsaveopt, avx512f, avx512er, > avx512cd, avx512pf, prefetchwt1, clflushopt, > xsavec, xsaves, avx512dq, avx512bw, avx512vl, > - avx512ifma, NULL); > + avx512ifma, avx512vbmi, NULL); > } > > done: > diff --git a/gcc/config/i386/i386-c.c b/gcc/config/i386/i386-c.c > index bf993d1..798eaa6 100644 > --- a/gcc/config/i386/i386-c.c > +++ b/gcc/config/i386/i386-c.c > @@ -351,6 +351,8 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag, > def_or_undef (parse_in, "__AVX512BW__"); > if (isa_flag & OPTION_MASK_ISA_AVX512VL) > def_or_undef (parse_in, "__AVX512VL__"); > + if (isa_flag & OPTION_MASK_ISA_AVX512VBMI) > + def_or_undef (parse_in, "__AVX512VBMI__"); > if (isa_flag & OPTION_MASK_ISA_AVX512IFMA) > def_or_undef (parse_in, "__AVX512IFMA__"); > if (isa_flag & OPTION_MASK_ISA_FMA) > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c > index bf46258..baf3166 100644 > --- a/gcc/config/i386/i386.c > +++ b/gcc/config/i386/i386.c > @@ -2619,6 +2619,7 @@ ix86_target_string (HOST_WIDE_INT isa, int flags, const char *arch, > { "-mavx512bw", OPTION_MASK_ISA_AVX512BW }, > { "-mavx512vl", OPTION_MASK_ISA_AVX512VL }, > { "-mavx512ifma", OPTION_MASK_ISA_AVX512IFMA }, > + { "-mavx512vbmi", OPTION_MASK_ISA_AVX512VBMI }, > { "-msse4a", OPTION_MASK_ISA_SSE4A }, > { "-msse4.2", OPTION_MASK_ISA_SSE4_2 }, > { "-msse4.1", OPTION_MASK_ISA_SSE4_1 }, > @@ -3155,6 +3156,7 @@ ix86_option_override_internal (bool main_args_p, > #define PTA_AVX512BW (HOST_WIDE_INT_1 << 51) > #define PTA_AVX512VL (HOST_WIDE_INT_1 << 52) > #define PTA_AVX512IFMA (HOST_WIDE_INT_1 << 53) > +#define PTA_AVX512VBMI (HOST_WIDE_INT_1 << 54) > > #define PTA_CORE2 \ > (PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3 | PTA_SSSE3 \ > @@ -3735,6 +3737,9 @@ ix86_option_override_internal (bool main_args_p, > if (processor_alias_table[i].flags & PTA_MPX > && !(opts->x_ix86_isa_flags_explicit & OPTION_MASK_ISA_MPX)) > opts->x_ix86_isa_flags |= OPTION_MASK_ISA_MPX; > + if (processor_alias_table[i].flags & PTA_AVX512VBMI > + && !(opts->x_ix86_isa_flags_explicit & OPTION_MASK_ISA_AVX512VBMI)) > + opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX512VBMI; > if (processor_alias_table[i].flags & PTA_AVX512IFMA > && !(opts->x_ix86_isa_flags_explicit & OPTION_MASK_ISA_AVX512IFMA)) > opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX512IFMA; > @@ -4654,6 +4659,7 @@ ix86_valid_target_attribute_inner_p (tree args, char *p_strings[], > IX86_ATTR_ISA ("clflushopt", OPT_mclflushopt), > IX86_ATTR_ISA ("xsavec", OPT_mxsavec), > IX86_ATTR_ISA ("xsaves", OPT_mxsaves), > + IX86_ATTR_ISA ("avx512vbmi", OPT_mavx512vbmi), > IX86_ATTR_ISA ("avx512ifma", OPT_mavx512ifma), > > /* enum options */ > @@ -30064,6 +30070,23 @@ enum ix86_builtins > IX86_BUILTIN_VPMADD52LUQ128_MASKZ, > IX86_BUILTIN_VPMADD52HUQ128_MASKZ, > > + /* AVX-512VBMI */ > + IX86_BUILTIN_VPMULTISHIFTQB512, > + IX86_BUILTIN_VPMULTISHIFTQB256, > + IX86_BUILTIN_VPMULTISHIFTQB128, > + IX86_BUILTIN_VPERMVARQI512_MASK, > + IX86_BUILTIN_VPERMT2VARQI512, > + IX86_BUILTIN_VPERMT2VARQI512_MASKZ, > + IX86_BUILTIN_VPERMI2VARQI512, > + IX86_BUILTIN_VPERMVARQI256_MASK, > + IX86_BUILTIN_VPERMVARQI128_MASK, > + IX86_BUILTIN_VPERMT2VARQI256, > + IX86_BUILTIN_VPERMT2VARQI256_MASKZ, > + IX86_BUILTIN_VPERMT2VARQI128, > + IX86_BUILTIN_VPERMT2VARQI128_MASKZ, > + IX86_BUILTIN_VPERMI2VARQI256, > + IX86_BUILTIN_VPERMI2VARQI128, > + > /* SHA builtins. */ > IX86_BUILTIN_SHA1MSG1, > IX86_BUILTIN_SHA1MSG2, > @@ -32749,6 +32772,22 @@ static const struct builtin_description bdesc_args[] = > { OPTION_MASK_ISA_AVX512IFMA | OPTION_MASK_ISA_AVX512VL, CODE_FOR_vpamdd52huqv2di_mask, "__builtin_ia32_vpmadd52huq128_mask", IX86_BUILTIN_VPMADD52HUQ128, UNKNOWN, (int) V2DI_FTYPE_V2DI_V2DI_V2DI_QI }, > { OPTION_MASK_ISA_AVX512IFMA | OPTION_MASK_ISA_AVX512VL, CODE_FOR_vpamdd52huqv2di_maskz, "__builtin_ia32_vpmadd52huq128_maskz", IX86_BUILTIN_VPMADD52HUQ128_MASKZ, UNKNOWN, (int) V2DI_FTYPE_V2DI_V2DI_V2DI_QI }, > > + /* AVX512VBMI */ > + { OPTION_MASK_ISA_AVX512VBMI, CODE_FOR_vpmultishiftqbv64qi_mask, "__builtin_ia32_vpmultishiftqb512_mask", IX86_BUILTIN_VPMULTISHIFTQB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_DI }, > + { OPTION_MASK_ISA_AVX512VBMI | OPTION_MASK_ISA_AVX512VL, CODE_FOR_vpmultishiftqbv32qi_mask, "__builtin_ia32_vpmultishiftqb256_mask", IX86_BUILTIN_VPMULTISHIFTQB256, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI_V32QI_SI }, > + { OPTION_MASK_ISA_AVX512VBMI | OPTION_MASK_ISA_AVX512VL, CODE_FOR_vpmultishiftqbv16qi_mask, "__builtin_ia32_vpmultishiftqb128_mask", IX86_BUILTIN_VPMULTISHIFTQB128, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI_V16QI_HI }, > + { OPTION_MASK_ISA_AVX512VBMI, CODE_FOR_avx512bw_permvarv64qi_mask, "__builtin_ia32_permvarqi512_mask", IX86_BUILTIN_VPERMVARQI512_MASK, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_DI }, > + { OPTION_MASK_ISA_AVX512VBMI, CODE_FOR_avx512bw_vpermt2varv64qi3_mask, "__builtin_ia32_vpermt2varqi512_mask", IX86_BUILTIN_VPERMT2VARQI512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_DI }, > + { OPTION_MASK_ISA_AVX512VBMI, CODE_FOR_avx512bw_vpermt2varv64qi3_maskz, "__builtin_ia32_vpermt2varqi512_maskz", IX86_BUILTIN_VPERMT2VARQI512_MASKZ, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_DI }, > + { OPTION_MASK_ISA_AVX512VBMI, CODE_FOR_avx512bw_vpermi2varv64qi3_mask, "__builtin_ia32_vpermi2varqi512_mask", IX86_BUILTIN_VPERMI2VARQI512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_DI }, > + { OPTION_MASK_ISA_AVX512VBMI | OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_permvarv32qi_mask, "__builtin_ia32_permvarqi256_mask", IX86_BUILTIN_VPERMVARQI256_MASK, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI_V32QI_SI }, > + { OPTION_MASK_ISA_AVX512VBMI | OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_permvarv16qi_mask, "__builtin_ia32_permvarqi128_mask", IX86_BUILTIN_VPERMVARQI128_MASK, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI_V16QI_HI }, > + { OPTION_MASK_ISA_AVX512VBMI | OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_vpermt2varv32qi3_mask, "__builtin_ia32_vpermt2varqi256_mask", IX86_BUILTIN_VPERMT2VARQI256, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI_V32QI_SI }, > + { OPTION_MASK_ISA_AVX512VBMI | OPTION_MASK_ISA_AVX512VBMI | OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_vpermt2varv32qi3_maskz, "__builtin_ia32_vpermt2varqi256_maskz", IX86_BUILTIN_VPERMT2VARQI256_MASKZ, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI_V32QI_SI }, > + { OPTION_MASK_ISA_AVX512VBMI | OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_vpermt2varv16qi3_mask, "__builtin_ia32_vpermt2varqi128_mask", IX86_BUILTIN_VPERMT2VARQI128, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI_V16QI_HI }, > + { OPTION_MASK_ISA_AVX512VBMI | OPTION_MASK_ISA_AVX512VBMI | OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_vpermt2varv16qi3_maskz, "__builtin_ia32_vpermt2varqi128_maskz", IX86_BUILTIN_VPERMT2VARQI128_MASKZ, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI_V16QI_HI }, > + { OPTION_MASK_ISA_AVX512VBMI | OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_vpermi2varv32qi3_mask, "__builtin_ia32_vpermi2varqi256_mask", IX86_BUILTIN_VPERMI2VARQI256, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI_V32QI_SI }, > + { OPTION_MASK_ISA_AVX512VBMI | OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_vpermi2varv16qi3_mask, "__builtin_ia32_vpermi2varqi128_mask", IX86_BUILTIN_VPERMI2VARQI128, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI_V16QI_HI }, > }; > > /* Builtins with rounding support. */ > @@ -41521,7 +41560,8 @@ ix86_hard_regno_mode_ok (int regno, machine_mode mode) > return VALID_FP_MODE_P (mode); > if (MASK_REGNO_P (regno)) > return (VALID_MASK_REG_MODE (mode) > - || (TARGET_AVX512BW && VALID_MASK_AVX512BW_MODE (mode))); > + || ((TARGET_AVX512BW || TARGET_AVX512VBMI) > + && VALID_MASK_AVX512BW_MODE (mode))); > if (BND_REGNO_P (regno)) > return VALID_BND_REG_MODE (mode); > if (SSE_REGNO_P (regno)) > diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h > index 481d68c..2596f81 100644 > --- a/gcc/config/i386/i386.h > +++ b/gcc/config/i386/i386.h > @@ -77,6 +77,8 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see > #define TARGET_AVX512BW_P(x) TARGET_ISA_AVX512BW_P(x) > #define TARGET_AVX512VL TARGET_ISA_AVX512VL > #define TARGET_AVX512VL_P(x) TARGET_ISA_AVX512VL_P(x) > +#define TARGET_AVX512VBMI TARGET_ISA_AVX512VBMI > +#define TARGET_AVX512VBMI_P(x) TARGET_ISA_AVX512VBMI_P(x) > #define TARGET_AVX512IFMA TARGET_ISA_AVX512IFMA > #define TARGET_AVX512IFMA_P(x) TARGET_ISA_AVX512IFMA_P(x) > #define TARGET_FMA TARGET_ISA_FMA > diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt > index 012ff8d..f6ba8a7 100644 > --- a/gcc/config/i386/i386.opt > +++ b/gcc/config/i386/i386.opt > @@ -657,6 +657,10 @@ mavx512ifma > Target Report Mask(ISA_AVX512IFMA) Var(ix86_isa_flags) Save > Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2 and AVX512F and AVX512IFMA built-in functions and code generation > > +mavx512vbmi > +Target Report Mask(ISA_AVX512VBMI) Var(ix86_isa_flags) Save > +Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2 and AVX512F and AVX512VBMI built-in functions and code generation > + > mfma > Target Report Mask(ISA_FMA) Var(ix86_isa_flags) Save > Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX and FMA built-in functions and code generation > diff --git a/gcc/config/i386/immintrin.h b/gcc/config/i386/immintrin.h > index 5f11432..931ff15 100644 > --- a/gcc/config/i386/immintrin.h > +++ b/gcc/config/i386/immintrin.h > @@ -64,6 +64,10 @@ > > #include <avx512ifmavlintrin.h> > > +#include <avx512vbmiintrin.h> > + > +#include <avx512vbmivlintrin.h> > + > #include <shaintrin.h> > > #include <lzcntintrin.h> > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md > index 61cc904..ca5d720 100644 > --- a/gcc/config/i386/sse.md > +++ b/gcc/config/i386/sse.md > @@ -145,6 +145,9 @@ > ;; For AVX512IFMA support > UNSPEC_VPMADD52LUQ > UNSPEC_VPMADD52HUQ > + > + ;; For AVX512VBMI support > + UNSPEC_VPMULTISHIFT > ]) > > (define_c_enum "unspecv" [ > @@ -179,6 +182,9 @@ > [V64QI (V16QI "TARGET_AVX512VL") (V32QI "TARGET_AVX512VL") > V32HI (V16HI "TARGET_AVX512VL") (V8HI "TARGET_AVX512VL")]) > > +(define_mode_iterator VI1_AVX512VL > + [V64QI (V16QI "TARGET_AVX512VL") (V32QI "TARGET_AVX512VL")]) > + > ;; All vector modes > (define_mode_iterator V > [(V32QI "TARGET_AVX") V16QI > @@ -16469,6 +16475,18 @@ > (set_attr "mode" "<sseinsnmode>")]) > > (define_insn "<avx512>_permvar<mode><mask_name>" > + [(set (match_operand:VI1_AVX512VL 0 "register_operand" "=v") > + (unspec:VI1_AVX512VL > + [(match_operand:VI1_AVX512VL 1 "nonimmediate_operand" "vm") > + (match_operand:<sseintvecmode> 2 "register_operand" "v")] > + UNSPEC_VPERMVAR))] > + "TARGET_AVX512VBMI && <mask_mode512bit_condition>" > + "vperm<ssemodesuffix>\t{%1, %2, %0<mask_operand3>|%0<mask_operand3>, %2, %1}" > + [(set_attr "type" "sselog") > + (set_attr "prefix" "<mask_prefix2>") > + (set_attr "mode" "<sseinsnmode>")]) > + > +(define_insn "<avx512>_permvar<mode><mask_name>" > [(set (match_operand:VI2_AVX512VL 0 "register_operand" "=v") > (unspec:VI2_AVX512VL > [(match_operand:VI2_AVX512VL 1 "nonimmediate_operand" "vm") > @@ -17011,6 +17029,20 @@ > }) > > (define_expand "<avx512>_vpermi2var<mode>3_maskz" > + [(match_operand:VI1_AVX512VL 0 "register_operand") > + (match_operand:VI1_AVX512VL 1 "register_operand") > + (match_operand:<sseintvecmode> 2 "register_operand") > + (match_operand:VI1_AVX512VL 3 "nonimmediate_operand") > + (match_operand:<avx512fmaskmode> 4 "register_operand")] > + "TARGET_AVX512VBMI" > +{ > + emit_insn (gen_<avx512>_vpermi2var<mode>3_maskz_1 ( > + operands[0], operands[1], operands[2], operands[3], > + CONST0_RTX (<MODE>mode), operands[4])); > + DONE; > +}) > + > +(define_expand "<avx512>_vpermi2var<mode>3_maskz" > [(match_operand:VI2_AVX512VL 0 "register_operand" "=v") > (match_operand:VI2_AVX512VL 1 "register_operand" "v") > (match_operand:<sseintvecmode> 2 "register_operand" "0") > @@ -17038,6 +17070,19 @@ > (set_attr "mode" "<sseinsnmode>")]) > > (define_insn "<avx512>_vpermi2var<mode>3<sd_maskz_name>" > + [(set (match_operand:VI1_AVX512VL 0 "register_operand" "=v") > + (unspec:VI1_AVX512VL > + [(match_operand:VI1_AVX512VL 1 "register_operand" "v") > + (match_operand:<sseintvecmode> 2 "register_operand" "0") > + (match_operand:VI1_AVX512VL 3 "nonimmediate_operand" "vm")] > + UNSPEC_VPERMI2))] > + "TARGET_AVX512VBMI" > + "vpermi2<ssemodesuffix>\t{%3, %1, %0<sd_mask_op4>|%0<sd_mask_op4>, %1, %3}" > + [(set_attr "type" "sselog") > + (set_attr "prefix" "evex") > + (set_attr "mode" "<sseinsnmode>")]) > + > +(define_insn "<avx512>_vpermi2var<mode>3<sd_maskz_name>" > [(set (match_operand:VI2_AVX512VL 0 "register_operand" "=v") > (unspec:VI2_AVX512VL > [(match_operand:VI2_AVX512VL 1 "register_operand" "v") > @@ -17067,6 +17112,22 @@ > (set_attr "mode" "<sseinsnmode>")]) > > (define_insn "<avx512>_vpermi2var<mode>3_mask" > + [(set (match_operand:VI1_AVX512VL 0 "register_operand" "=v") > + (vec_merge:VI1_AVX512VL > + (unspec:VI1_AVX512VL > + [(match_operand:VI1_AVX512VL 1 "register_operand" "v") > + (match_operand:<sseintvecmode> 2 "register_operand" "0") > + (match_operand:VI1_AVX512VL 3 "nonimmediate_operand" "vm")] > + UNSPEC_VPERMI2_MASK) > + (match_dup 0) > + (match_operand:<avx512fmaskmode> 4 "register_operand" "Yk")))] > + "TARGET_AVX512VBMI" > + "vpermi2<ssemodesuffix>\t{%3, %1, %0%{%4%}|%0%{%4%}, %1, %3}" > + [(set_attr "type" "sselog") > + (set_attr "prefix" "evex") > + (set_attr "mode" "<sseinsnmode>")]) > + > +(define_insn "<avx512>_vpermi2var<mode>3_mask" > [(set (match_operand:VI2_AVX512VL 0 "register_operand" "=v") > (vec_merge:VI2_AVX512VL > (unspec:VI2_AVX512VL > @@ -17097,6 +17158,20 @@ > }) > > (define_expand "<avx512>_vpermt2var<mode>3_maskz" > + [(match_operand:VI1_AVX512VL 0 "register_operand" "=v") > + (match_operand:<sseintvecmode> 1 "register_operand" "v") > + (match_operand:VI1_AVX512VL 2 "register_operand" "0") > + (match_operand:VI1_AVX512VL 3 "nonimmediate_operand" "vm") > + (match_operand:<avx512fmaskmode> 4 "register_operand" "Yk")] > + "TARGET_AVX512VBMI" > +{ > + emit_insn (gen_<avx512>_vpermt2var<mode>3_maskz_1 ( > + operands[0], operands[1], operands[2], operands[3], > + CONST0_RTX (<MODE>mode), operands[4])); > + DONE; > +}) > + > +(define_expand "<avx512>_vpermt2var<mode>3_maskz" > [(match_operand:VI2_AVX512VL 0 "register_operand" "=v") > (match_operand:<sseintvecmode> 1 "register_operand" "v") > (match_operand:VI2_AVX512VL 2 "register_operand" "0") > @@ -17124,6 +17199,19 @@ > (set_attr "mode" "<sseinsnmode>")]) > > (define_insn "<avx512>_vpermt2var<mode>3<sd_maskz_name>" > + [(set (match_operand:VI1_AVX512VL 0 "register_operand" "=v") > + (unspec:VI1_AVX512VL > + [(match_operand:<sseintvecmode> 1 "register_operand" "v") > + (match_operand:VI1_AVX512VL 2 "register_operand" "0") > + (match_operand:VI1_AVX512VL 3 "nonimmediate_operand" "vm")] > + UNSPEC_VPERMT2))] > + "TARGET_AVX512VBMI" > + "vpermt2<ssemodesuffix>\t{%3, %1, %0<sd_mask_op4>|%0<sd_mask_op4>, %1, %3}" > + [(set_attr "type" "sselog") > + (set_attr "prefix" "evex") > + (set_attr "mode" "<sseinsnmode>")]) > + > +(define_insn "<avx512>_vpermt2var<mode>3<sd_maskz_name>" > [(set (match_operand:VI2_AVX512VL 0 "register_operand" "=v") > (unspec:VI2_AVX512VL > [(match_operand:<sseintvecmode> 1 "register_operand" "v") > @@ -17153,6 +17241,22 @@ > (set_attr "mode" "<sseinsnmode>")]) > > (define_insn "<avx512>_vpermt2var<mode>3_mask" > + [(set (match_operand:VI1_AVX512VL 0 "register_operand" "=v") > + (vec_merge:VI1_AVX512VL > + (unspec:VI1_AVX512VL > + [(match_operand:<sseintvecmode> 1 "register_operand" "v") > + (match_operand:VI1_AVX512VL 2 "register_operand" "0") > + (match_operand:VI1_AVX512VL 3 "nonimmediate_operand" "vm")] > + UNSPEC_VPERMT2) > + (match_dup 2) > + (match_operand:<avx512fmaskmode> 4 "register_operand" "Yk")))] > + "TARGET_AVX512VBMI" > + "vpermt2<ssemodesuffix>\t{%3, %1, %0%{%4%}|%0%{%4%}, %1, %3}" > + [(set_attr "type" "sselog") > + (set_attr "prefix" "evex") > + (set_attr "mode" "<sseinsnmode>")]) > + > +(define_insn "<avx512>_vpermt2var<mode>3_mask" > [(set (match_operand:VI2_AVX512VL 0 "register_operand" "=v") > (vec_merge:VI2_AVX512VL > (unspec:VI2_AVX512VL > @@ -18519,3 +18623,14 @@ > (set_attr "prefix" "evex") > (set_attr "mode" "<sseinsnmode>")]) > > +(define_insn "vpmultishiftqb<mode><mask_name>" > + [(set (match_operand:VI1_AVX512VL 0 "register_operand" "=v") > + (unspec:VI1_AVX512VL > + [(match_operand:VI1_AVX512VL 1 "register_operand" "v") > + (match_operand:VI1_AVX512VL 2 "nonimmediate_operand" "vm")] > + UNSPEC_VPMULTISHIFT))] > + "TARGET_AVX512VBMI" > + "vpmultishiftqb\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}" > + [(set_attr "type" "sselog") > + (set_attr "prefix" "evex") > + (set_attr "mode" "<sseinsnmode>")]) > diff --git a/gcc/testsuite/g++.dg/other/i386-2.C b/gcc/testsuite/g++.dg/other/i386-2.C > index a69a5e3..0368d35 100644 > --- a/gcc/testsuite/g++.dg/other/i386-2.C > +++ b/gcc/testsuite/g++.dg/other/i386-2.C > @@ -1,5 +1,5 @@ > /* { dg-do compile { target i?86-*-* x86_64-*-* } } */ > -/* { dg-options "-O -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma" } */ > +/* { dg-options "-O -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi" } */ > > /* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, mm3dnow.h, fma4intrin.h, > xopintrin.h, abmintrin.h, bmiintrin.h, tbmintrin.h, lwpintrin.h, > diff --git a/gcc/testsuite/g++.dg/other/i386-3.C b/gcc/testsuite/g++.dg/other/i386-3.C > index d3a5bbd..3a3d5ff 100644 > --- a/gcc/testsuite/g++.dg/other/i386-3.C > +++ b/gcc/testsuite/g++.dg/other/i386-3.C > @@ -1,5 +1,5 @@ > /* { dg-do compile { target i?86-*-* x86_64-*-* } } */ > -/* { dg-options "-O -fkeep-inline-functions -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma" } */ > +/* { dg-options "-O -fkeep-inline-functions -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi" } */ > > /* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, mm3dnow.h, fma4intrin.h, > xopintrin.h, abmintrin.h, bmiintrin.h, tbmintrin.h, lwpintrin.h, > diff --git a/gcc/testsuite/gcc.target/i386/avx512f-helper.h b/gcc/testsuite/gcc.target/i386/avx512f-helper.h > index d177429..e270cd2 100644 > --- a/gcc/testsuite/gcc.target/i386/avx512f-helper.h > +++ b/gcc/testsuite/gcc.target/i386/avx512f-helper.h > @@ -22,6 +22,8 @@ > #include "avx512vl-check.h" > #elif defined (AVX512IFMA) > #include "avx512ifma-check.h" > +#elif defined (AVX512VBMI) > +#include "avx512vbmi-check.h" > #endif > > /* Macros expansion. */ > @@ -130,6 +132,9 @@ avx512vl_test (void) { test_256 (); test_128 (); } > #elif defined (AVX512IFMA) > void > avx512ifma_test (void) { test_512 (); } > +#elif defined (AVX512VBMI) > +void > +avx512vbmi_test (void) { test_512 (); } > #endif > > #endif /* AVX512F_HELPER_INCLUDED */ > diff --git a/gcc/testsuite/gcc.target/i386/avx512vbmi-check.h b/gcc/testsuite/gcc.target/i386/avx512vbmi-check.h > new file mode 100644 > index 0000000..591ff06 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/avx512vbmi-check.h > @@ -0,0 +1,46 @@ > +#include <stdlib.h> > +#include "cpuid.h" > +#include "m512-check.h" > +#include "avx512f-os-support.h" > + > +static void avx512vbmi_test (void); > + > +static void __attribute__ ((noinline)) do_test (void) > +{ > + avx512vbmi_test (); > +} > + > +int > +main () > +{ > + unsigned int eax, ebx, ecx, edx; > + > + if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx)) > + return 0; > + > + if ((ecx & bit_OSXSAVE) == (bit_OSXSAVE)) > + { > + if (__get_cpuid_max (0, NULL) < 7) > + return 0; > + > + __cpuid_count (7, 0, eax, ebx, ecx, edx); > + > + if ((avx512f_os_support ()) && ((ebx & bit_AVX512VBMI) == bit_AVX512VBMI)) > + { > + do_test (); > +#ifdef DEBUG > + printf ("PASSED\n"); > +#endif > + return 0; > + } > +#ifdef DEBUG > + printf ("SKIPPED\n"); > +#endif > + } > +#ifdef DEBUG > + else > + printf ("SKIPPED\n"); > +#endif > + > + return 0; > +} > diff --git a/gcc/testsuite/gcc.target/i386/avx512vbmi-vpermb-1.c b/gcc/testsuite/gcc.target/i386/avx512vbmi-vpermb-1.c > new file mode 100644 > index 0000000..59e568c > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/avx512vbmi-vpermb-1.c > @@ -0,0 +1,34 @@ > +/* { dg-do compile } */ > +/* { dg-options "-mavx512vbmi -mavx512vl -O2" } */ > +/* { dg-final { scan-assembler-times "vpermb\[ \\t\]+\[^\n\]*%zmm\[0-9\]\[^\{\]" 3 } } */ > +/* { dg-final { scan-assembler-times "vpermb\[ \\t\]+\[^\n\]*%zmm\[0-9\]\{%k\[1-7\]\}\[^\{\]" 1 } } */ > +/* { dg-final { scan-assembler-times "vpermb\[ \\t\]+\[^\n\]*%zmm\[0-9\]\{%k\[1-7\]\}\{z\}" 1 } } */ > +/* { dg-final { scan-assembler-times "vpermb\[ \\t\]+\[^\n\]*%ymm\[0-9\]\[^\{\]" 3 } } */ > +/* { dg-final { scan-assembler-times "vpermb\[ \\t\]+\[^\n\]*%ymm\[0-9\]\{%k\[1-7\]\}\[^\{\]" 1 } } */ > +/* { dg-final { scan-assembler-times "vpermb\[ \\t\]+\[^\n\]*%ymm\[0-9\]\{%k\[1-7\]\}\{z\}" 1 } } */ > +/* { dg-final { scan-assembler-times "vpermb\[ \\t\]+\[^\n\]*%zmm\[0-9\]\[^\{\]" 3 } } */ > +/* { dg-final { scan-assembler-times "vpermb\[ \\t\]+\[^\n\]*%zmm\[0-9\]\{%k\[1-7\]\}\[^\{\]" 1 } } */ > +/* { dg-final { scan-assembler-times "vpermb\[ \\t\]+\[^\n\]*%zmm\[0-9\]\{%k\[1-7\]\}\{z\}" 1 } } */ > + > +#include <immintrin.h> > + > +volatile __m512i x1; > +volatile __m256i x2; > +volatile __m128i x3; > +volatile __mmask64 m1; > +volatile __mmask32 m2; > +volatile __mmask16 m3; > + > +void extern > +avx512bw_test (void) > +{ > + x1 = _mm512_permutexvar_epi8 (x1, x1); > + x1 = _mm512_maskz_permutexvar_epi8 (m1, x1, x1); > + x1 = _mm512_mask_permutexvar_epi8 (x1, m1, x1, x1); > + x2 = _mm256_permutexvar_epi8 (x2, x2); > + x2 = _mm256_maskz_permutexvar_epi8 (m2, x2, x2); > + x2 = _mm256_mask_permutexvar_epi8 (x2, m2, x2, x2); > + x3 = _mm_permutexvar_epi8 (x3, x3); > + x3 = _mm_maskz_permutexvar_epi8 (m3, x3, x3); > + x3 = _mm_mask_permutexvar_epi8 (x3, m3, x3, x3); > +} > diff --git a/gcc/testsuite/gcc.target/i386/avx512vbmi-vpermb-2.c b/gcc/testsuite/gcc.target/i386/avx512vbmi-vpermb-2.c > new file mode 100644 > index 0000000..fa22fd9 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/avx512vbmi-vpermb-2.c > @@ -0,0 +1,51 @@ > +/* { dg-do run } */ > +/* { dg-options "-O2 -mavx512vbmi -DAVX512VBMI" } */ > +/* { dg-require-effective-target avx512vbmi } */ > + > +#include "avx512f-helper.h" > + > +#define SIZE (AVX512F_LEN / 8) > +#include "avx512f-mask-type.h" > + > +void > +CALC (char *ind, char *src, char *res) > +{ > + int i; > + > + for (i = 0; i < SIZE; i++) > + { > + res[i] = src[ind[i] & (SIZE - 1)]; > + } > +} > + > +void > +TEST (void) > +{ > + UNION_TYPE (AVX512F_LEN, i_b) s1, s2, res1, res2, res3; > + char res_ref[SIZE]; > + MASK_TYPE mask = MASK_VALUE; > + int i; > + > + for (i = 0; i < SIZE; i++) > + { > + s1.a[i] = i * i * i; > + s2.a[i] = i + 20; > + res2.a[i] = DEFAULT_VALUE; > + } > + > + res1.x = INTRINSIC (_permutexvar_epi8) (s1.x, s2.x); > + res2.x = INTRINSIC (_mask_permutexvar_epi8) (res2.x, mask, s1.x, s2.x); > + res3.x = INTRINSIC (_maskz_permutexvar_epi8) (mask, s1.x, s2.x); > + CALC (s1.a, s2.a, res_ref); > + > + if (UNION_CHECK (AVX512F_LEN, i_b) (res1, res_ref)) > + abort (); > + > + MASK_MERGE (i_b)(res_ref, mask, SIZE); > + if (UNION_CHECK (AVX512F_LEN, i_b) (res2, res_ref)) > + abort (); > + > + MASK_ZERO (i_b)(res_ref, mask, SIZE); > + if (UNION_CHECK (AVX512F_LEN, i_b) (res3, res_ref)) > + abort (); > +} > diff --git a/gcc/testsuite/gcc.target/i386/avx512vbmi-vpermi2b-1.c b/gcc/testsuite/gcc.target/i386/avx512vbmi-vpermi2b-1.c > new file mode 100644 > index 0000000..f760c76 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/avx512vbmi-vpermi2b-1.c > @@ -0,0 +1,25 @@ > +/* { dg-do compile } */ > +/* { dg-options "-mavx512vbmi -mavx512vl -O2" } */ > +/* { dg-final { scan-assembler-times "vpermi2b\[ \\t\]+\[^\n\]*%zmm\[0-9\]\{%k\[1-7\]\}\[^\{\]" 1 } } */ > +/* { dg-final { scan-assembler-times "vpermi2b\[ \\t\]+\[^\n\]*%ymm\[0-9\]\{%k\[1-7\]\}\[^\{\]" 1 } } */ > +/* { dg-final { scan-assembler-times "vpermi2b\[ \\t\]+\[^\n\]*%xmm\[0-9\]\{%k\[1-7\]\}\[^\{\]" 1 } } */ > + > +#include <immintrin.h> > + > +volatile __m512i x3; > +volatile __m256i x2; > +volatile __m128i x1; > +volatile __m512i z; > +volatile __m256i y; > +volatile __m128i x; > +volatile __mmask32 m3; > +volatile __mmask16 m2; > +volatile __mmask8 m1; > + > +void extern > +avx512bw_test (void) > +{ > + x3 = _mm512_mask2_permutex2var_epi8 (x3, z, m3, x3); > + x2 = _mm256_mask2_permutex2var_epi8 (x2, y, m2, x2); > + x1 = _mm_mask2_permutex2var_epi8 (x1, x, m1, x1); > +} > diff --git a/gcc/testsuite/gcc.target/i386/avx512vbmi-vpermi2b-2.c b/gcc/testsuite/gcc.target/i386/avx512vbmi-vpermi2b-2.c > new file mode 100644 > index 0000000..694b23b > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/avx512vbmi-vpermi2b-2.c > @@ -0,0 +1,58 @@ > +/* { dg-do run } */ > +/* { dg-options "-O2 -mavx512vbmi -DAVX512VBMI" } */ > +/* { dg-require-effective-target avx512vbmi } */ > + > +#include "avx512f-helper.h" > + > +#define SIZE (AVX512F_LEN / 8) > +#include "math.h" > +#include "values.h" > +#include "avx512f-mask-type.h" > + > +#define NUM 32 > + > +void > +CALC (char *dst, char *src1, char *ind, char *src2) > +{ > + int i; > + > + for (i = 0; i < SIZE; i++) > + { > + unsigned long long offset = ind[i] & (SIZE - 1); > + unsigned long long cond = ind[i] & SIZE; > + > + dst[i] = cond ? src2[offset] : src1[offset]; > + } > +} > + > +void > +TEST (void) > +{ > + int i, j; > + UNION_TYPE (AVX512F_LEN, i_b) s1, s2, res, ind; > + char res_ref[SIZE]; > + > + MASK_TYPE mask = MASK_VALUE; > + > + for (i = 0; i < NUM; i++) > + { > + for (j = 0; j < SIZE; j++) > + { > + ind.a[j] = DEFAULT_VALUE; > + s1.a[j] = i * 2 * j + 1; > + s2.a[j] = i * 2 * j; > + > + res.a[j] = DEFAULT_VALUE; > + } > + > + CALC (res_ref, s1.a, ind.a, s2.a); > + > + res.x = > + INTRINSIC (_mask2_permutex2var_epi8) (s1.x, ind.x, mask, > + s2.x); > + > + MASK_MERGE (i_b) (res_ref, mask, SIZE); > + if (UNION_CHECK (AVX512F_LEN, i_b) (res, res_ref)) > + abort (); > + } > +} > diff --git a/gcc/testsuite/gcc.target/i386/avx512vbmi-vpermt2b-1.c b/gcc/testsuite/gcc.target/i386/avx512vbmi-vpermt2b-1.c > new file mode 100644 > index 0000000..2e67a54 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/avx512vbmi-vpermt2b-1.c > @@ -0,0 +1,37 @@ > +/* { dg-do compile } */ > +/* { dg-options "-mavx512vbmi -mavx512vl -O2" } */ > +/* { dg-final { scan-assembler-times "vpermt2b\[ \\t\]+\[^\n\]*%zmm\[0-9\]" 3 } } */ > +/* { dg-final { scan-assembler-times "vpermt2b\[ \\t\]+\[^\n\]*%ymm\[0-9\]" 3 } } * > +/* { dg-final { scan-assembler-times "vpermt2b\[ \\t\]+\[^\n\]*%xmm\[0-9\]" 3 } } */ > +/* { dg-final { scan-assembler-times "vpermt2b\[ \\t\]+\[^\n\]*%zmm\[0-9\]\{%k\[1-7\]\}\[^\{\]" 1 } } */ > +/* { dg-final { scan-assembler-times "vpermt2b\[ \\t\]+\[^\n\]*%ymm\[0-9\]\{%k\[1-7\]\}\[^\{\]" 1 } } */ > +/* { dg-final { scan-assembler-times "vpermt2b\[ \\t\]+\[^\n\]*%xmm\[0-9\]\{%k\[1-7\]\}\[^\{\]" 1 } } */ > +/* { dg-final { scan-assembler-times "vpermt2b\[ \\t\]+\[^\n\]*%zmm\[0-9\]\{%k\[1-7\]\}\{z\}" 1 } } */ > +/* { dg-final { scan-assembler-times "vpermt2b\[ \\t\]+\[^\n\]*%ymm\[0-9\]\{%k\[1-7\]\}\{z\}" 1 } } */ > +/* { dg-final { scan-assembler-times "vpermt2b\[ \\t\]+\[^\n\]*%xmm\[0-9\]\{%k\[1-7\]\}\{z\}" 1 } } */ > + > +#include <immintrin.h> > + > +volatile __m512i x3; > +volatile __m256i x2; > +volatile __m128i x1; > +volatile __m512i z; > +volatile __m256i y; > +volatile __m128i x; > +volatile __mmask32 m3; > +volatile __mmask16 m2; > +volatile __mmask8 m1; > + > +void extern > +avx512bw_test (void) > +{ > + x3 = _mm512_permutex2var_epi8 (x3, z, x3); > + x3 = _mm512_mask_permutex2var_epi8 (x3, m3, z, x3); > + x3 = _mm512_maskz_permutex2var_epi8 (m3, x3, z, x3); > + x2 = _mm256_permutex2var_epi8 (x2, y, x2); > + x2 = _mm256_mask_permutex2var_epi8 (x2, m2, y, x2); > + x2 = _mm256_maskz_permutex2var_epi8 (m2, x2, y, x2); > + x1 = _mm_permutex2var_epi8 (x1, x, x1); > + x1 = _mm_mask_permutex2var_epi8 (x1, m1, x, x1); > + x1 = _mm_maskz_permutex2var_epi8 (m1, x1, x, x1); > +} > diff --git a/gcc/testsuite/gcc.target/i386/avx512vbmi-vpermt2b-2.c b/gcc/testsuite/gcc.target/i386/avx512vbmi-vpermt2b-2.c > new file mode 100644 > index 0000000..c9f46596 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/avx512vbmi-vpermt2b-2.c > @@ -0,0 +1,70 @@ > +/* { dg-do run } */ > +/* { dg-options "-O2 -mavx512vbmi -DAVX512VBMI" } */ > +/* { dg-require-effective-target avx512vbmi } */ > + > +#include "avx512f-helper.h" > + > +#define SIZE (AVX512F_LEN / 8) > +#include "math.h" > +#include "values.h" > +#include "avx512f-mask-type.h" > + > +#define NUM 32 > + > +void > +CALC (char *dst, char *src1, char *ind, char *src2) > +{ > + int i; > + > + for (i = 0; i < SIZE; i++) > + { > + unsigned long long offset = ind[i] & (SIZE - 1); > + unsigned long long cond = ind[i] & SIZE; > + > + dst[i] = cond ? src2[offset] : src1[offset]; > + } > +} > + > +void > +TEST (void) > +{ > + int i, j; > + UNION_TYPE (AVX512F_LEN, i_b) s1, s2, res1, res2, res3, ind; > + char res_ref[SIZE]; > + > + MASK_TYPE mask = MASK_VALUE; > + > + for (i = 0; i < NUM; i++) > + { > + for (j = 0; j < SIZE; j++) > + { > + ind.a[j] = i * (j << 1); > + s1.a[j] = DEFAULT_VALUE; > + s2.a[j] = 1.5 * i * 2 * j; > + > + res1.a[j] = DEFAULT_VALUE; > + res2.a[j] = DEFAULT_VALUE; > + res3.a[j] = DEFAULT_VALUE; > + } > + > + CALC (res_ref, s1.a, ind.a, s2.a); > + > + res1.x = INTRINSIC (_permutex2var_epi8) (s1.x, ind.x, s2.x); > + res2.x = > + INTRINSIC (_mask_permutex2var_epi8) (s1.x, mask, ind.x, s2.x); > + res3.x = > + INTRINSIC (_maskz_permutex2var_epi8) (mask, s1.x, ind.x, > + s2.x); > + > + if (UNION_CHECK (AVX512F_LEN, i_b) (res1, res_ref)) > + abort (); > + > + MASK_MERGE (i_b) (res_ref, mask, SIZE); > + if (UNION_CHECK (AVX512F_LEN, i_b) (res2, res_ref)) > + abort (); > + > + MASK_ZERO (i_b) (res_ref, mask, SIZE); > + if (UNION_CHECK (AVX512F_LEN, i_b) (res3, res_ref)) > + abort (); > + } > +} > diff --git a/gcc/testsuite/gcc.target/i386/avx512vbmi-vpmultishiftqb-1.c b/gcc/testsuite/gcc.target/i386/avx512vbmi-vpmultishiftqb-1.c > new file mode 100644 > index 0000000..145591c > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/avx512vbmi-vpmultishiftqb-1.c > @@ -0,0 +1,31 @@ > +/* { dg-do compile } */ > +/* { dg-options "-mavx512vbmi -mavx512vl -O2" } */ > +/* { dg-final { scan-assembler-times "vpmultishiftqb\[ \\t\]+\[^\n\]*%xmm\[0-9\]\[^\n\]*%xmm\[0-9\]\[^\n\]*%xmm\[0-9\]" 3 } } */ > +/* { dg-final { scan-assembler-times "vpmultishiftqb\[ \\t\]+\[^\n\]*%xmm\[0-9\]\[^\n\]*%xmm\[0-9\]\[^\n\]*%xmm\[0-9\]\{%k\[1-7\]\}\[^\{\]" 1 } } */ > +/* { dg-final { scan-assembler-times "vpmultishiftqb\[ \\t\]+\[^\n\]*%xmm\[0-9\]\[^\n\]*%xmm\[0-9\]\[^\n\]*%xmm\[0-9\]\{%k\[1-7\]\}\{z\}" 1 } } */ > +/* { dg-final { scan-assembler-times "vpmultishiftqb\[ \\t\]+\[^\n\]*%ymm\[0-9\]\[^\n\]*%ymm\[0-9\]\[^\n\]*%ymm\[0-9\]" 3 } } */ > +/* { dg-final { scan-assembler-times "vpmultishiftqb\[ \\t\]+\[^\n\]*%ymm\[0-9\]\[^\n\]*%ymm\[0-9\]\[^\n\]*%ymm\[0-9\]\{%k\[1-7\]\}\[^\{\]" 1 } } */ > +/* { dg-final { scan-assembler-times "vpmultishiftqb\[ \\t\]+\[^\n\]*%ymm\[0-9\]\[^\n\]*%ymm\[0-9\]\[^\n\]*%ymm\[0-9\]\{%k\[1-7\]\}\{z\}" 1 } } */ > +/* { dg-final { scan-assembler-times "vpmultishiftqb\[ \\t\]+\[^\n\]*%zmm\[0-9\]\[^\n\]*%zmm\[0-9\]\[^\n\]*%zmm\[0-9\]" 3 } } */ > +/* { dg-final { scan-assembler-times "vpmultishiftqb\[ \\t\]+\[^\n\]*%zmm\[0-9\]\[^\n\]*%zmm\[0-9\]\[^\n\]*%zmm\[0-9\]\{%k\[1-7\]\}\[^\{\]" 1 } } */ > +/* { dg-final { scan-assembler-times "vpmultishiftqb\[ \\t\]+\[^\n\]*%zmm\[0-9\]\[^\n\]*%zmm\[0-9\]\[^\n\]*%zmm\[0-9\]\{%k\[1-7\]\}\{z\}" 1 } } */ > + > +#include <immintrin.h> > + > +volatile __m512i _x1, _y1, _z1; > +volatile __m256i _x2, _y2, _z2; > +volatile __m128i _x3, _y3, _z3; > + > +void extern > +avx512vbmi_test (void) > +{ > + _x3 = _mm_multishift_epi64_epi8 (_y3, _z3); > + _x3 = _mm_mask_multishift_epi64_epi8 (_x3, 2, _y3, _z3); > + _x3 = _mm_maskz_multishift_epi64_epi8 (2, _y3, _z3); > + _x2 = _mm256_multishift_epi64_epi8 (_y2, _z2); > + _x2 = _mm256_mask_multishift_epi64_epi8 (_x2, 3, _y2, _z2); > + _x2 = _mm256_maskz_multishift_epi64_epi8 (3, _y2, _z2); > + _x1 = _mm512_multishift_epi64_epi8 (_y1, _z1); > + _x1 = _mm512_mask_multishift_epi64_epi8 (_x1, 3, _y1, _z1); > + _x1 = _mm512_maskz_multishift_epi64_epi8 (3, _y1, _z1); > +} > diff --git a/gcc/testsuite/gcc.target/i386/avx512vbmi-vpmultishiftqb-2.c b/gcc/testsuite/gcc.target/i386/avx512vbmi-vpmultishiftqb-2.c > new file mode 100644 > index 0000000..936d938 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/avx512vbmi-vpmultishiftqb-2.c > @@ -0,0 +1,68 @@ > +/* { dg-do run } */ > +/* { dg-options "-O2 -mavx512vbmi -DAVX512VBMI" } */ > +/* { dg-require-effective-target avx512vbmi } */ > + > +#include "avx512f-helper.h" > + > +#define SIZE (AVX512F_LEN / 8) > +#include "avx512f-mask-type.h" > + > +void > +CALC (char *r, char *s1, char *s2) > +{ > + int i, j, k; > + long long a, b, ctrl; > + > + for (i = 0; i < SIZE / sizeof (long long); i++) > + { > + union > + { > + long long x; > + char a[sizeof(long long)]; > + } src; > + > + for (j = 0; j < sizeof (long long); j++) > + src.a[j] = s2[i * sizeof (long long) + j]; > + for (j = 0; j < sizeof (long long); j++) > + { > + ctrl = s1[i * sizeof (long long) + j] & ((1 << sizeof (long long)) - 1); > + r[i * sizeof (long long) + j] = 0; > + for (k = 0; k < 8; k++) > + { > + r[i * sizeof (long long) + j] |= ((src.x >> ((ctrl + k) % (sizeof (long long) * 8))) & 1) << k; > + } > + } > + } > +} > + > +void > +TEST (void) > +{ > + UNION_TYPE (AVX512F_LEN, i_b) src1, src2, dst1, dst2, dst3; > + char dst_ref[SIZE]; > + int i; > + MASK_TYPE mask = MASK_VALUE; > + > + for (i = 0; i < SIZE; i++) > + { > + src1.a[i] = 15 + 3467 * i; > + src2.a[i] = 9217 + i; > + dst2.a[i] = DEFAULT_VALUE; > + } > + > + CALC (dst_ref, src1.a, src2.a); > + dst1.x = INTRINSIC (_multishift_epi64_epi8) (src1.x, src2.x); > + dst2.x = INTRINSIC (_mask_multishift_epi64_epi8) (dst2.x, mask, src1.x, src2.x); > + dst3.x = INTRINSIC (_maskz_multishift_epi64_epi8) (mask, src1.x, src2.x); > + > + if (UNION_CHECK (AVX512F_LEN, i_b) (dst1, dst_ref)) > + abort (); > + > + MASK_MERGE (i_b) (dst_ref, mask, SIZE); > + if (UNION_CHECK (AVX512F_LEN, i_b) (dst2, dst_ref)) > + abort (); > + > + MASK_ZERO (i_b) (dst_ref, mask, SIZE); > + if (UNION_CHECK (AVX512F_LEN, i_b) (dst3, dst_ref)) > + abort (); > +} > diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-vpermb-2.c b/gcc/testsuite/gcc.target/i386/avx512vl-vpermb-2.c > new file mode 100644 > index 0000000..377f34e > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/avx512vl-vpermb-2.c > @@ -0,0 +1,14 @@ > +/* { dg-do run } */ > +/* { dg-options "-O2 -mavx512vbmi -mavx512vl -DAVX512VL" } */ > +/* { dg-require-effective-target avx512vl } */ > + > +#define AVX512F_LEN 256 > +#define AVX512F_LEN_HALF 128 > +#include "avx512vbmi-vpermb-2.c" > + > +#undef AVX512F_LEN > +#undef AVX512F_LEN_HALF > + > +#define AVX512F_LEN 128 > +#define AVX512F_LEN_HALF 128 > +#include "avx512vbmi-vpermb-2.c" > diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-vpermi2b-2.c b/gcc/testsuite/gcc.target/i386/avx512vl-vpermi2b-2.c > new file mode 100644 > index 0000000..bd5dfc5 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/avx512vl-vpermi2b-2.c > @@ -0,0 +1,14 @@ > +/* { dg-do run } */ > +/* { dg-options "-O2 -mavx512vbmi -mavx512vl -DAVX512VL" } */ > +/* { dg-require-effective-target avx512vl } */ > + > +#define AVX512F_LEN 256 > +#define AVX512F_LEN_HALF 128 > +#include "avx512vbmi-vpermi2b-2.c" > + > +#undef AVX512F_LEN > +#undef AVX512F_LEN_HALF > + > +#define AVX512F_LEN 128 > +#define AVX512F_LEN_HALF 128 > +#include "avx512vbmi-vpermi2b-2.c" > diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-vpermt2b-2.c b/gcc/testsuite/gcc.target/i386/avx512vl-vpermt2b-2.c > new file mode 100644 > index 0000000..a83eeb7 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/avx512vl-vpermt2b-2.c > @@ -0,0 +1,14 @@ > +/* { dg-do run } */ > +/* { dg-options "-O2 -mavx512vbmi -mavx512vl -DAVX512VL" } */ > +/* { dg-require-effective-target avx512vl } */ > + > +#define AVX512F_LEN 256 > +#define AVX512F_LEN_HALF 128 > +#include "avx512vbmi-vpermt2b-2.c" > + > +#undef AVX512F_LEN > +#undef AVX512F_LEN_HALF > + > +#define AVX512F_LEN 128 > +#define AVX512F_LEN_HALF 128 > +#include "avx512vbmi-vpermt2b-2.c" > diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-vpmultishiftqb-2.c b/gcc/testsuite/gcc.target/i386/avx512vl-vpmultishiftqb-2.c > new file mode 100644 > index 0000000..d215e23 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/avx512vl-vpmultishiftqb-2.c > @@ -0,0 +1,14 @@ > +/* { dg-do run } */ > +/* { dg-options "-O2 -mavx512vbmi -mavx512vl -DAVX512VL" } */ > +/* { dg-require-effective-target avx512vl } */ > + > +#define AVX512F_LEN 256 > +#define AVX512F_LEN_HALF 128 > +#include "avx512vbmi-vpmultishiftqb-2.c" > + > +#undef AVX512F_LEN > +#undef AVX512F_LEN_HALF > + > +#define AVX512F_LEN 128 > +#define AVX512F_LEN_HALF 128 > +#include "avx512vbmi-vpmultishiftqb-2.c" > diff --git a/gcc/testsuite/gcc.target/i386/i386.exp b/gcc/testsuite/gcc.target/i386/i386.exp > index 060eed3..ca5ef06 100644 > --- a/gcc/testsuite/gcc.target/i386/i386.exp > +++ b/gcc/testsuite/gcc.target/i386/i386.exp > @@ -365,6 +365,21 @@ proc check_effective_target_avx512ifma { } { > } "-mavx512ifma" ] > } > > +# Return 1 if avx512vbmi instructions can be compiled. > +proc check_effective_target_avx512vbmi { } { > + return [check_no_compiler_messages avx512vbmi object { > + typedef char __v64qi __attribute__ ((__vector_size__ (64))); > + __v64qi > + _mm512_multishift_epi64_epi8 (__v64qi __X, __v64qi __Y) > + { > + return (__v64qi) __builtin_ia32_vpmultishiftqb512_mask ((__v64qi) __X, > + (__v64qi) __Y, > + (__v64qi) __Y, > + -1); > + } > + } "-mavx512vbmi" ] > +} > + > # If a testcase doesn't have special options, use these. > global DEFAULT_CFLAGS > if ![info exists DEFAULT_CFLAGS] then { > diff --git a/gcc/testsuite/gcc.target/i386/sse-12.c b/gcc/testsuite/gcc.target/i386/sse-12.c > index 1d8fa82..a83db92 100644 > --- a/gcc/testsuite/gcc.target/i386/sse-12.c > +++ b/gcc/testsuite/gcc.target/i386/sse-12.c > @@ -3,7 +3,7 @@ > popcntintrin.h and mm_malloc.h are usable > with -O -std=c89 -pedantic-errors. */ > /* { dg-do compile } */ > -/* { dg-options "-O -std=c89 -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512bw -mavx512dq -mavx512vl -mavx512ifma" } */ > +/* { dg-options "-O -std=c89 -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512bw -mavx512dq -mavx512vl -mavx512vbmi -mavx512ifma" } */ > > #include <x86intrin.h> > > diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c > index 878c475..f1d9157 100644 > --- a/gcc/testsuite/gcc.target/i386/sse-13.c > +++ b/gcc/testsuite/gcc.target/i386/sse-13.c > @@ -1,5 +1,5 @@ > /* { dg-do compile } */ > -/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512vl -mavx512dq -mavx512bw -mavx512ifma" } */ > +/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512ifma" } */ > > #include <mm_malloc.h> > > diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c > index 4d3acb4..bc10109 100644 > --- a/gcc/testsuite/gcc.target/i386/sse-14.c > +++ b/gcc/testsuite/gcc.target/i386/sse-14.c > @@ -1,5 +1,5 @@ > /* { dg-do compile } */ > -/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma" } */ > +/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi" } */ > /* { dg-add-options bind_pic_locally } */ > > #include <mm_malloc.h> > diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c > index 7861cf5..d54d1db 100644 > --- a/gcc/testsuite/gcc.target/i386/sse-22.c > +++ b/gcc/testsuite/gcc.target/i386/sse-22.c > @@ -100,7 +100,7 @@ > > > #ifndef DIFFERENT_PRAGMAS > -#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,avx512vl,avx512bw,avx512dq,avx512ifma") > +#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,avx512vl,avx512bw,avx512dq,avx512vbmi,avx512ifma") > #endif > > /* Following intrinsics require immediate arguments. They > @@ -215,7 +215,7 @@ test_4 (_mm_cmpestrz, int, __m128i, int, __m128i, int, 1) > > /* immintrin.h (AVX/AVX2/RDRND/FSGSBASE/F16C/RTM/AVX512F/SHA) */ > #ifdef DIFFERENT_PRAGMAS > -#pragma GCC target ("avx,avx2,rdrnd,fsgsbase,f16c,rtm,avx512f,avx512er,avx512cd,avx512pf,sha,avx512vl,avx512bw,avx512dq,avx512ifma") > +#pragma GCC target ("avx,avx2,rdrnd,fsgsbase,f16c,rtm,avx512f,avx512er,avx512cd,avx512pf,sha,avx512vl,avx512bw,avx512dq,avx512ifma,avx512vbmi") > #endif > #include <immintrin.h> > test_1 (_cvtss_sh, unsigned short, float, 1) > diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c > index 85d403e..e699bd3 100644 > --- a/gcc/testsuite/gcc.target/i386/sse-23.c > +++ b/gcc/testsuite/gcc.target/i386/sse-23.c > @@ -594,7 +594,7 @@ > #define __builtin_ia32_extracti64x2_256_mask(A, E, C, D) __builtin_ia32_extracti64x2_256_mask(A, 1, C, D) > #define __builtin_ia32_extractf64x2_256_mask(A, E, C, D) __builtin_ia32_extractf64x2_256_mask(A, 1, C, D) > > -#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,fma,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,xsavec,xsaves,clflushopt,avx512bw,avx512dq,avx512vl,avx512ifma") > +#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,fma,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,xsavec,xsaves,clflushopt,avx512bw,avx512dq,avx512vl,avx512vbmi,avx512ifma") > #include <wmmintrin.h> > #include <smmintrin.h> > #include <mm3dnow.h> > -- > 1.8.3.1 >
diff --git a/gcc/common/config/i386/i386-common.c b/gcc/common/config/i386/i386-common.c index 73044a0..1c4f15e 100644 --- a/gcc/common/config/i386/i386-common.c +++ b/gcc/common/config/i386/i386-common.c @@ -73,6 +73,8 @@ along with GCC; see the file COPYING3. If not see (OPTION_MASK_ISA_AVX512VL | OPTION_MASK_ISA_AVX512F_SET) #define OPTION_MASK_ISA_AVX512IFMA_SET \ (OPTION_MASK_ISA_AVX512IFMA | OPTION_MASK_ISA_AVX512F_SET) +#define OPTION_MASK_ISA_AVX512VBMI_SET \ + (OPTION_MASK_ISA_AVX512VBMI | OPTION_MASK_ISA_AVX512F_SET) #define OPTION_MASK_ISA_RTM_SET OPTION_MASK_ISA_RTM #define OPTION_MASK_ISA_PRFCHW_SET OPTION_MASK_ISA_PRFCHW #define OPTION_MASK_ISA_RDSEED_SET OPTION_MASK_ISA_RDSEED @@ -170,6 +172,7 @@ along with GCC; see the file COPYING3. If not see #define OPTION_MASK_ISA_AVX512BW_UNSET OPTION_MASK_ISA_AVX512BW #define OPTION_MASK_ISA_AVX512VL_UNSET OPTION_MASK_ISA_AVX512VL #define OPTION_MASK_ISA_AVX512IFMA_UNSET OPTION_MASK_ISA_AVX512IFMA +#define OPTION_MASK_ISA_AVX512VBMI_UNSET OPTION_MASK_ISA_AVX512VBMI #define OPTION_MASK_ISA_RTM_UNSET OPTION_MASK_ISA_RTM #define OPTION_MASK_ISA_PRFCHW_UNSET OPTION_MASK_ISA_PRFCHW #define OPTION_MASK_ISA_RDSEED_UNSET OPTION_MASK_ISA_RDSEED @@ -459,6 +462,19 @@ ix86_handle_option (struct gcc_options *opts, } return true; + case OPT_mavx512vbmi: + if (value) + { + opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX512VBMI_SET; + opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512VBMI_SET; + } + else + { + opts->x_ix86_isa_flags &= ~OPTION_MASK_ISA_AVX512VBMI_UNSET; + opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512VBMI_UNSET; + } + return true; + case OPT_mfma: if (value) { diff --git a/gcc/config.gcc b/gcc/config.gcc index dbf4191..da2a723 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -368,7 +368,8 @@ i[34567]86-*-*) shaintrin.h clflushoptintrin.h xsavecintrin.h xsavesintrin.h avx512dqintrin.h avx512bwintrin.h avx512vlintrin.h avx512vlbwintrin.h avx512vldqintrin.h - avx512ifmaintrin.h avx512ifmavlintrin.h" + avx512ifmaintrin.h avx512ifmavlintrin.h avx512vbmiintrin.h + avx512vbmivlintrin.h" ;; x86_64-*-*) cpu_type=i386 @@ -388,7 +389,8 @@ x86_64-*-*) shaintrin.h clflushoptintrin.h xsavecintrin.h xsavesintrin.h avx512dqintrin.h avx512bwintrin.h avx512vlintrin.h avx512vlbwintrin.h avx512vldqintrin.h - avx512ifmaintrin.h avx512ifmavlintrin.h" + avx512ifmaintrin.h avx512ifmavlintrin.h avx512vbmiintrin.h + avx512vbmivlintrin.h" ;; ia64-*-*) extra_headers=ia64intrin.h diff --git a/gcc/config/i386/avx512vbmiintrin.h b/gcc/config/i386/avx512vbmiintrin.h new file mode 100644 index 0000000..c2c59ce --- /dev/null +++ b/gcc/config/i386/avx512vbmiintrin.h @@ -0,0 +1,159 @@ +/* Copyright (C) 2013-2014 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + <http://www.gnu.org/licenses/>. */ + +#ifndef _IMMINTRIN_H_INCLUDED +#error "Never use <avx512vbmiintrin.h> directly; include <immintrin.h> instead." +#endif + +#ifndef _AVX512VBMIINTRIN_H_INCLUDED +#define _AVX512VBMIINTRIN_H_INCLUDED + +#ifndef __AVX512VBMI__ +#pragma GCC push_options +#pragma GCC target("avx512vbmi") +#define __DISABLE_AVX512VBMI__ +#endif /* __AVX512VBMI__ */ + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_multishift_epi64_epi8 (__m512i __W, __mmask64 __M, __m512i __X, __m512i __Y) +{ + return (__m512i) __builtin_ia32_vpmultishiftqb512_mask ((__v64qi) __X, + (__v64qi) __Y, + (__v64qi) __W, + (__mmask64) __M); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_multishift_epi64_epi8 (__mmask64 __M, __m512i __X, __m512i __Y) +{ + return (__m512i) __builtin_ia32_vpmultishiftqb512_mask ((__v64qi) __X, + (__v64qi) __Y, + (__v64qi) + _mm512_setzero_si512 (), + (__mmask64) __M); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_multishift_epi64_epi8 (__m512i __X, __m512i __Y) +{ + return (__m512i) __builtin_ia32_vpmultishiftqb512_mask ((__v64qi) __X, + (__v64qi) __Y, + (__v64qi) + _mm512_undefined_si512 (), + (__mmask64) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_permutexvar_epi8 (__m512i __A, __m512i __B) +{ + return (__m512i) __builtin_ia32_permvarqi512_mask ((__v64qi) __B, + (__v64qi) __A, + (__v64qi) + _mm512_undefined_si512 (), + (__mmask64) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_permutexvar_epi8 (__mmask64 __M, __m512i __A, + __m512i __B) +{ + return (__m512i) __builtin_ia32_permvarqi512_mask ((__v64qi) __B, + (__v64qi) __A, + (__v64qi) + _mm512_setzero_si512(), + (__mmask64) __M); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_permutexvar_epi8 (__m512i __W, __mmask64 __M, __m512i __A, + __m512i __B) +{ + return (__m512i) __builtin_ia32_permvarqi512_mask ((__v64qi) __B, + (__v64qi) __A, + (__v64qi) __W, + (__mmask64) __M); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_permutex2var_epi8 (__m512i __A, __m512i __I, __m512i __B) +{ + return (__m512i) __builtin_ia32_vpermt2varqi512_mask ((__v64qi) __I + /* idx */ , + (__v64qi) __A, + (__v64qi) __B, + (__mmask64) - + 1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_permutex2var_epi8 (__m512i __A, __mmask64 __U, + __m512i __I, __m512i __B) +{ + return (__m512i) __builtin_ia32_vpermt2varqi512_mask ((__v64qi) __I + /* idx */ , + (__v64qi) __A, + (__v64qi) __B, + (__mmask64) + __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask2_permutex2var_epi8 (__m512i __A, __m512i __I, + __mmask64 __U, __m512i __B) +{ + return (__m512i) __builtin_ia32_vpermi2varqi512_mask ((__v64qi) __A, + (__v64qi) __I + /* idx */ , + (__v64qi) __B, + (__mmask64) + __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_permutex2var_epi8 (__mmask64 __U, __m512i __A, + __m512i __I, __m512i __B) +{ + return (__m512i) __builtin_ia32_vpermt2varqi512_maskz ((__v64qi) __I + /* idx */ , + (__v64qi) __A, + (__v64qi) __B, + (__mmask64) + __U); +} + +#ifdef __DISABLE_AVX512VBMI__ +#undef __DISABLE_AVX512VBMI__ +#pragma GCC pop_options +#endif /* __DISABLE_AVX512VBMI__ */ + +#endif /* _AVX512VBMIINTRIN_H_INCLUDED */ diff --git a/gcc/config/i386/avx512vbmivlintrin.h b/gcc/config/i386/avx512vbmivlintrin.h new file mode 100644 index 0000000..b4ecdeb --- /dev/null +++ b/gcc/config/i386/avx512vbmivlintrin.h @@ -0,0 +1,275 @@ +/* Copyright (C) 2013-2014 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + <http://www.gnu.org/licenses/>. */ + +#ifndef _IMMINTRIN_H_INCLUDED +#error "Never use <avx512vbmivlintrin.h> directly; include <immintrin.h> instead." +#endif + +#ifndef _AVX512VBMIVLINTRIN_H_INCLUDED +#define _AVX512VBMIVLINTRIN_H_INCLUDED + +#if !defined(__AVX512VL__) || !defined(__AVX512VBMI__) +#pragma GCC push_options +#pragma GCC target("avx512vbmi,avx512vl") +#define __DISABLE_AVX512VBMIVL__ +#endif /* __AVX512VBMIVL__ */ + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_multishift_epi64_epi8 (__m256i __W, __mmask32 __M, __m256i __X, __m256i __Y) +{ + return (__m256i) __builtin_ia32_vpmultishiftqb256_mask ((__v32qi) __X, + (__v32qi) __Y, + (__v32qi) __W, + (__mmask32) __M); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_multishift_epi64_epi8 (__mmask32 __M, __m256i __X, __m256i __Y) +{ + return (__m256i) __builtin_ia32_vpmultishiftqb256_mask ((__v32qi) __X, + (__v32qi) __Y, + (__v32qi) + _mm256_setzero_si256 (), + (__mmask32) __M); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_multishift_epi64_epi8 (__m256i __X, __m256i __Y) +{ + return (__m256i) __builtin_ia32_vpmultishiftqb256_mask ((__v32qi) __X, + (__v32qi) __Y, + (__v32qi) + _mm256_undefined_si256 (), + (__mmask32) -1); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_multishift_epi64_epi8 (__m128i __W, __mmask16 __M, __m128i __X, __m128i __Y) +{ + return (__m128i) __builtin_ia32_vpmultishiftqb128_mask ((__v16qi) __X, + (__v16qi) __Y, + (__v16qi) __W, + (__mmask16) __M); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_multishift_epi64_epi8 (__mmask16 __M, __m128i __X, __m128i __Y) +{ + return (__m128i) __builtin_ia32_vpmultishiftqb128_mask ((__v16qi) __X, + (__v16qi) __Y, + (__v16qi) + _mm_setzero_si128 (), + (__mmask16) __M); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_multishift_epi64_epi8 (__m128i __X, __m128i __Y) +{ + return (__m128i) __builtin_ia32_vpmultishiftqb128_mask ((__v16qi) __X, + (__v16qi) __Y, + (__v16qi) + _mm_undefined_si128 (), + (__mmask16) -1); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_permutexvar_epi8 (__m256i __A, __m256i __B) +{ + return (__m256i) __builtin_ia32_permvarqi256_mask ((__v32qi) __B, + (__v32qi) __A, + (__v32qi) + _mm256_undefined_si256 (), + (__mmask32) -1); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_permutexvar_epi8 (__mmask32 __M, __m256i __A, + __m256i __B) +{ + return (__m256i) __builtin_ia32_permvarqi256_mask ((__v32qi) __B, + (__v32qi) __A, + (__v32qi) + _mm256_setzero_si256 (), + (__mmask32) __M); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_permutexvar_epi8 (__m256i __W, __mmask32 __M, __m256i __A, + __m256i __B) +{ + return (__m256i) __builtin_ia32_permvarqi256_mask ((__v32qi) __B, + (__v32qi) __A, + (__v32qi) __W, + (__mmask32) __M); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_permutexvar_epi8 (__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_permvarqi128_mask ((__v16qi) __B, + (__v16qi) __A, + (__v16qi) + _mm_undefined_si128 (), + (__mmask16) -1); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_permutexvar_epi8 (__mmask16 __M, __m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_permvarqi128_mask ((__v16qi) __B, + (__v16qi) __A, + (__v16qi) + _mm_setzero_si128 (), + (__mmask16) __M); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_permutexvar_epi8 (__m128i __W, __mmask16 __M, __m128i __A, + __m128i __B) +{ + return (__m128i) __builtin_ia32_permvarqi128_mask ((__v16qi) __B, + (__v16qi) __A, + (__v16qi) __W, + (__mmask16) __M); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_permutex2var_epi8 (__m256i __A, __m256i __I, __m256i __B) +{ + return (__m256i) __builtin_ia32_vpermt2varqi256_mask ((__v32qi) __I + /* idx */ , + (__v32qi) __A, + (__v32qi) __B, + (__mmask32) - + 1); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_permutex2var_epi8 (__m256i __A, __mmask32 __U, + __m256i __I, __m256i __B) +{ + return (__m256i) __builtin_ia32_vpermt2varqi256_mask ((__v32qi) __I + /* idx */ , + (__v32qi) __A, + (__v32qi) __B, + (__mmask32) + __U); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask2_permutex2var_epi8 (__m256i __A, __m256i __I, + __mmask32 __U, __m256i __B) +{ + return (__m256i) __builtin_ia32_vpermi2varqi256_mask ((__v32qi) __A, + (__v32qi) __I + /* idx */ , + (__v32qi) __B, + (__mmask32) + __U); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_permutex2var_epi8 (__mmask32 __U, __m256i __A, + __m256i __I, __m256i __B) +{ + return (__m256i) __builtin_ia32_vpermt2varqi256_maskz ((__v32qi) __I + /* idx */ , + (__v32qi) __A, + (__v32qi) __B, + (__mmask32) + __U); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_permutex2var_epi8 (__m128i __A, __m128i __I, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpermt2varqi128_mask ((__v16qi) __I + /* idx */ , + (__v16qi) __A, + (__v16qi) __B, + (__mmask16) - + 1); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_permutex2var_epi8 (__m128i __A, __mmask16 __U, __m128i __I, + __m128i __B) +{ + return (__m128i) __builtin_ia32_vpermt2varqi128_mask ((__v16qi) __I + /* idx */ , + (__v16qi) __A, + (__v16qi) __B, + (__mmask16) + __U); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask2_permutex2var_epi8 (__m128i __A, __m128i __I, __mmask16 __U, + __m128i __B) +{ + return (__m128i) __builtin_ia32_vpermi2varqi128_mask ((__v16qi) __A, + (__v16qi) __I + /* idx */ , + (__v16qi) __B, + (__mmask16) + __U); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_permutex2var_epi8 (__mmask16 __U, __m128i __A, __m128i __I, + __m128i __B) +{ + return (__m128i) __builtin_ia32_vpermt2varqi128_maskz ((__v16qi) __I + /* idx */ , + (__v16qi) __A, + (__v16qi) __B, + (__mmask16) + __U); +} + +#ifdef __DISABLE_AVX512VBMIVL__ +#undef __DISABLE_AVX512VBMIVL__ +#pragma GCC pop_options +#endif /* __DISABLE_AVX512VBMIVL__ */ + +#endif /* _AVX512VBMIVLINTRIN_H_INCLUDED */ diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h index e3e1ed6..0efb1a4 100644 --- a/gcc/config/i386/cpuid.h +++ b/gcc/config/i386/cpuid.h @@ -87,6 +87,7 @@ /* %ecx */ #define bit_PREFETCHWT1 (1 << 0) +#define bit_AVX512VBMI (1 << 1) /* Extended State Enumeration Sub-leaf (%eax == 13, %ecx == 1) */ #define bit_XSAVEOPT (1 << 0) diff --git a/gcc/config/i386/driver-i386.c b/gcc/config/i386/driver-i386.c index cb82945..72dfd04 100644 --- a/gcc/config/i386/driver-i386.c +++ b/gcc/config/i386/driver-i386.c @@ -412,7 +412,7 @@ const char *host_detect_local_cpu (int argc, const char **argv) unsigned int has_avx512f = 0, has_sha = 0, has_prefetchwt1 = 0; unsigned int has_clflushopt = 0, has_xsavec = 0, has_xsaves = 0; unsigned int has_avx512dq = 0, has_avx512bw = 0, has_avx512vl = 0; - unsigned int has_avx512ifma = 0; + unsigned int has_avx512vbmi = 0, has_avx512ifma = 0; bool arch; @@ -497,6 +497,7 @@ const char *host_detect_local_cpu (int argc, const char **argv) has_avx512vl = ebx & bit_AVX512IFMA; has_prefetchwt1 = ecx & bit_PREFETCHWT1; + has_avx512vl = ecx & bit_AVX512VBMI; } if (max_level >= 13) @@ -928,6 +929,7 @@ const char *host_detect_local_cpu (int argc, const char **argv) const char *avx512bw = has_avx512bw ? " -mavx512bw" : " -mno-avx512bw"; const char *avx512vl = has_avx512vl ? " -mavx512vl" : " -mno-avx512vl"; const char *avx512ifma = has_avx512ifma ? " -mavx512ifma" : " -mno-avx512ifma"; + const char *avx512vbmi = has_avx512vbmi ? " -mavx512vbmi" : " -mno-avx512vbmi"; options = concat (options, mmx, mmx3dnow, sse, sse2, sse3, ssse3, sse4a, cx16, sahf, movbe, aes, sha, pclmul, @@ -937,7 +939,7 @@ const char *host_detect_local_cpu (int argc, const char **argv) fxsr, xsave, xsaveopt, avx512f, avx512er, avx512cd, avx512pf, prefetchwt1, clflushopt, xsavec, xsaves, avx512dq, avx512bw, avx512vl, - avx512ifma, NULL); + avx512ifma, avx512vbmi, NULL); } done: diff --git a/gcc/config/i386/i386-c.c b/gcc/config/i386/i386-c.c index bf993d1..798eaa6 100644 --- a/gcc/config/i386/i386-c.c +++ b/gcc/config/i386/i386-c.c @@ -351,6 +351,8 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag, def_or_undef (parse_in, "__AVX512BW__"); if (isa_flag & OPTION_MASK_ISA_AVX512VL) def_or_undef (parse_in, "__AVX512VL__"); + if (isa_flag & OPTION_MASK_ISA_AVX512VBMI) + def_or_undef (parse_in, "__AVX512VBMI__"); if (isa_flag & OPTION_MASK_ISA_AVX512IFMA) def_or_undef (parse_in, "__AVX512IFMA__"); if (isa_flag & OPTION_MASK_ISA_FMA) diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index bf46258..baf3166 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -2619,6 +2619,7 @@ ix86_target_string (HOST_WIDE_INT isa, int flags, const char *arch, { "-mavx512bw", OPTION_MASK_ISA_AVX512BW }, { "-mavx512vl", OPTION_MASK_ISA_AVX512VL }, { "-mavx512ifma", OPTION_MASK_ISA_AVX512IFMA }, + { "-mavx512vbmi", OPTION_MASK_ISA_AVX512VBMI }, { "-msse4a", OPTION_MASK_ISA_SSE4A }, { "-msse4.2", OPTION_MASK_ISA_SSE4_2 }, { "-msse4.1", OPTION_MASK_ISA_SSE4_1 }, @@ -3155,6 +3156,7 @@ ix86_option_override_internal (bool main_args_p, #define PTA_AVX512BW (HOST_WIDE_INT_1 << 51) #define PTA_AVX512VL (HOST_WIDE_INT_1 << 52) #define PTA_AVX512IFMA (HOST_WIDE_INT_1 << 53) +#define PTA_AVX512VBMI (HOST_WIDE_INT_1 << 54) #define PTA_CORE2 \ (PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3 | PTA_SSSE3 \ @@ -3735,6 +3737,9 @@ ix86_option_override_internal (bool main_args_p, if (processor_alias_table[i].flags & PTA_MPX && !(opts->x_ix86_isa_flags_explicit & OPTION_MASK_ISA_MPX)) opts->x_ix86_isa_flags |= OPTION_MASK_ISA_MPX; + if (processor_alias_table[i].flags & PTA_AVX512VBMI + && !(opts->x_ix86_isa_flags_explicit & OPTION_MASK_ISA_AVX512VBMI)) + opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX512VBMI; if (processor_alias_table[i].flags & PTA_AVX512IFMA && !(opts->x_ix86_isa_flags_explicit & OPTION_MASK_ISA_AVX512IFMA)) opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX512IFMA; @@ -4654,6 +4659,7 @@ ix86_valid_target_attribute_inner_p (tree args, char *p_strings[], IX86_ATTR_ISA ("clflushopt", OPT_mclflushopt), IX86_ATTR_ISA ("xsavec", OPT_mxsavec), IX86_ATTR_ISA ("xsaves", OPT_mxsaves), + IX86_ATTR_ISA ("avx512vbmi", OPT_mavx512vbmi), IX86_ATTR_ISA ("avx512ifma", OPT_mavx512ifma), /* enum options */ @@ -30064,6 +30070,23 @@ enum ix86_builtins IX86_BUILTIN_VPMADD52LUQ128_MASKZ, IX86_BUILTIN_VPMADD52HUQ128_MASKZ, + /* AVX-512VBMI */ + IX86_BUILTIN_VPMULTISHIFTQB512, + IX86_BUILTIN_VPMULTISHIFTQB256, + IX86_BUILTIN_VPMULTISHIFTQB128, + IX86_BUILTIN_VPERMVARQI512_MASK, + IX86_BUILTIN_VPERMT2VARQI512, + IX86_BUILTIN_VPERMT2VARQI512_MASKZ, + IX86_BUILTIN_VPERMI2VARQI512, + IX86_BUILTIN_VPERMVARQI256_MASK, + IX86_BUILTIN_VPERMVARQI128_MASK, + IX86_BUILTIN_VPERMT2VARQI256, + IX86_BUILTIN_VPERMT2VARQI256_MASKZ, + IX86_BUILTIN_VPERMT2VARQI128, + IX86_BUILTIN_VPERMT2VARQI128_MASKZ, + IX86_BUILTIN_VPERMI2VARQI256, + IX86_BUILTIN_VPERMI2VARQI128, + /* SHA builtins. */ IX86_BUILTIN_SHA1MSG1, IX86_BUILTIN_SHA1MSG2, @@ -32749,6 +32772,22 @@ static const struct builtin_description bdesc_args[] = { OPTION_MASK_ISA_AVX512IFMA | OPTION_MASK_ISA_AVX512VL, CODE_FOR_vpamdd52huqv2di_mask, "__builtin_ia32_vpmadd52huq128_mask", IX86_BUILTIN_VPMADD52HUQ128, UNKNOWN, (int) V2DI_FTYPE_V2DI_V2DI_V2DI_QI }, { OPTION_MASK_ISA_AVX512IFMA | OPTION_MASK_ISA_AVX512VL, CODE_FOR_vpamdd52huqv2di_maskz, "__builtin_ia32_vpmadd52huq128_maskz", IX86_BUILTIN_VPMADD52HUQ128_MASKZ, UNKNOWN, (int) V2DI_FTYPE_V2DI_V2DI_V2DI_QI }, + /* AVX512VBMI */ + { OPTION_MASK_ISA_AVX512VBMI, CODE_FOR_vpmultishiftqbv64qi_mask, "__builtin_ia32_vpmultishiftqb512_mask", IX86_BUILTIN_VPMULTISHIFTQB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_DI }, + { OPTION_MASK_ISA_AVX512VBMI | OPTION_MASK_ISA_AVX512VL, CODE_FOR_vpmultishiftqbv32qi_mask, "__builtin_ia32_vpmultishiftqb256_mask", IX86_BUILTIN_VPMULTISHIFTQB256, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI_V32QI_SI }, + { OPTION_MASK_ISA_AVX512VBMI | OPTION_MASK_ISA_AVX512VL, CODE_FOR_vpmultishiftqbv16qi_mask, "__builtin_ia32_vpmultishiftqb128_mask", IX86_BUILTIN_VPMULTISHIFTQB128, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI_V16QI_HI }, + { OPTION_MASK_ISA_AVX512VBMI, CODE_FOR_avx512bw_permvarv64qi_mask, "__builtin_ia32_permvarqi512_mask", IX86_BUILTIN_VPERMVARQI512_MASK, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_DI }, + { OPTION_MASK_ISA_AVX512VBMI, CODE_FOR_avx512bw_vpermt2varv64qi3_mask, "__builtin_ia32_vpermt2varqi512_mask", IX86_BUILTIN_VPERMT2VARQI512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_DI }, + { OPTION_MASK_ISA_AVX512VBMI, CODE_FOR_avx512bw_vpermt2varv64qi3_maskz, "__builtin_ia32_vpermt2varqi512_maskz", IX86_BUILTIN_VPERMT2VARQI512_MASKZ, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_DI }, + { OPTION_MASK_ISA_AVX512VBMI, CODE_FOR_avx512bw_vpermi2varv64qi3_mask, "__builtin_ia32_vpermi2varqi512_mask", IX86_BUILTIN_VPERMI2VARQI512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_DI }, + { OPTION_MASK_ISA_AVX512VBMI | OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_permvarv32qi_mask, "__builtin_ia32_permvarqi256_mask", IX86_BUILTIN_VPERMVARQI256_MASK, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI_V32QI_SI }, + { OPTION_MASK_ISA_AVX512VBMI | OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_permvarv16qi_mask, "__builtin_ia32_permvarqi128_mask", IX86_BUILTIN_VPERMVARQI128_MASK, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI_V16QI_HI }, + { OPTION_MASK_ISA_AVX512VBMI | OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_vpermt2varv32qi3_mask, "__builtin_ia32_vpermt2varqi256_mask", IX86_BUILTIN_VPERMT2VARQI256, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI_V32QI_SI }, + { OPTION_MASK_ISA_AVX512VBMI | OPTION_MASK_ISA_AVX512VBMI | OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_vpermt2varv32qi3_maskz, "__builtin_ia32_vpermt2varqi256_maskz", IX86_BUILTIN_VPERMT2VARQI256_MASKZ, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI_V32QI_SI }, + { OPTION_MASK_ISA_AVX512VBMI | OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_vpermt2varv16qi3_mask, "__builtin_ia32_vpermt2varqi128_mask", IX86_BUILTIN_VPERMT2VARQI128, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI_V16QI_HI }, + { OPTION_MASK_ISA_AVX512VBMI | OPTION_MASK_ISA_AVX512VBMI | OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_vpermt2varv16qi3_maskz, "__builtin_ia32_vpermt2varqi128_maskz", IX86_BUILTIN_VPERMT2VARQI128_MASKZ, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI_V16QI_HI }, + { OPTION_MASK_ISA_AVX512VBMI | OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_vpermi2varv32qi3_mask, "__builtin_ia32_vpermi2varqi256_mask", IX86_BUILTIN_VPERMI2VARQI256, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI_V32QI_SI }, + { OPTION_MASK_ISA_AVX512VBMI | OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_vpermi2varv16qi3_mask, "__builtin_ia32_vpermi2varqi128_mask", IX86_BUILTIN_VPERMI2VARQI128, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI_V16QI_HI }, }; /* Builtins with rounding support. */ @@ -41521,7 +41560,8 @@ ix86_hard_regno_mode_ok (int regno, machine_mode mode) return VALID_FP_MODE_P (mode); if (MASK_REGNO_P (regno)) return (VALID_MASK_REG_MODE (mode) - || (TARGET_AVX512BW && VALID_MASK_AVX512BW_MODE (mode))); + || ((TARGET_AVX512BW || TARGET_AVX512VBMI) + && VALID_MASK_AVX512BW_MODE (mode))); if (BND_REGNO_P (regno)) return VALID_BND_REG_MODE (mode); if (SSE_REGNO_P (regno)) diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h index 481d68c..2596f81 100644 --- a/gcc/config/i386/i386.h +++ b/gcc/config/i386/i386.h @@ -77,6 +77,8 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see #define TARGET_AVX512BW_P(x) TARGET_ISA_AVX512BW_P(x) #define TARGET_AVX512VL TARGET_ISA_AVX512VL #define TARGET_AVX512VL_P(x) TARGET_ISA_AVX512VL_P(x) +#define TARGET_AVX512VBMI TARGET_ISA_AVX512VBMI +#define TARGET_AVX512VBMI_P(x) TARGET_ISA_AVX512VBMI_P(x) #define TARGET_AVX512IFMA TARGET_ISA_AVX512IFMA #define TARGET_AVX512IFMA_P(x) TARGET_ISA_AVX512IFMA_P(x) #define TARGET_FMA TARGET_ISA_FMA diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt index 012ff8d..f6ba8a7 100644 --- a/gcc/config/i386/i386.opt +++ b/gcc/config/i386/i386.opt @@ -657,6 +657,10 @@ mavx512ifma Target Report Mask(ISA_AVX512IFMA) Var(ix86_isa_flags) Save Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2 and AVX512F and AVX512IFMA built-in functions and code generation +mavx512vbmi +Target Report Mask(ISA_AVX512VBMI) Var(ix86_isa_flags) Save +Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2 and AVX512F and AVX512VBMI built-in functions and code generation + mfma Target Report Mask(ISA_FMA) Var(ix86_isa_flags) Save Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX and FMA built-in functions and code generation diff --git a/gcc/config/i386/immintrin.h b/gcc/config/i386/immintrin.h index 5f11432..931ff15 100644 --- a/gcc/config/i386/immintrin.h +++ b/gcc/config/i386/immintrin.h @@ -64,6 +64,10 @@ #include <avx512ifmavlintrin.h> +#include <avx512vbmiintrin.h> + +#include <avx512vbmivlintrin.h> + #include <shaintrin.h> #include <lzcntintrin.h> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 61cc904..ca5d720 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -145,6 +145,9 @@ ;; For AVX512IFMA support UNSPEC_VPMADD52LUQ UNSPEC_VPMADD52HUQ + + ;; For AVX512VBMI support + UNSPEC_VPMULTISHIFT ]) (define_c_enum "unspecv" [ @@ -179,6 +182,9 @@ [V64QI (V16QI "TARGET_AVX512VL") (V32QI "TARGET_AVX512VL") V32HI (V16HI "TARGET_AVX512VL") (V8HI "TARGET_AVX512VL")]) +(define_mode_iterator VI1_AVX512VL + [V64QI (V16QI "TARGET_AVX512VL") (V32QI "TARGET_AVX512VL")]) + ;; All vector modes (define_mode_iterator V [(V32QI "TARGET_AVX") V16QI @@ -16469,6 +16475,18 @@ (set_attr "mode" "<sseinsnmode>")]) (define_insn "<avx512>_permvar<mode><mask_name>" + [(set (match_operand:VI1_AVX512VL 0 "register_operand" "=v") + (unspec:VI1_AVX512VL + [(match_operand:VI1_AVX512VL 1 "nonimmediate_operand" "vm") + (match_operand:<sseintvecmode> 2 "register_operand" "v")] + UNSPEC_VPERMVAR))] + "TARGET_AVX512VBMI && <mask_mode512bit_condition>" + "vperm<ssemodesuffix>\t{%1, %2, %0<mask_operand3>|%0<mask_operand3>, %2, %1}" + [(set_attr "type" "sselog") + (set_attr "prefix" "<mask_prefix2>") + (set_attr "mode" "<sseinsnmode>")]) + +(define_insn "<avx512>_permvar<mode><mask_name>" [(set (match_operand:VI2_AVX512VL 0 "register_operand" "=v") (unspec:VI2_AVX512VL [(match_operand:VI2_AVX512VL 1 "nonimmediate_operand" "vm") @@ -17011,6 +17029,20 @@ }) (define_expand "<avx512>_vpermi2var<mode>3_maskz" + [(match_operand:VI1_AVX512VL 0 "register_operand") + (match_operand:VI1_AVX512VL 1 "register_operand") + (match_operand:<sseintvecmode> 2 "register_operand") + (match_operand:VI1_AVX512VL 3 "nonimmediate_operand") + (match_operand:<avx512fmaskmode> 4 "register_operand")] + "TARGET_AVX512VBMI" +{ + emit_insn (gen_<avx512>_vpermi2var<mode>3_maskz_1 ( + operands[0], operands[1], operands[2], operands[3], + CONST0_RTX (<MODE>mode), operands[4])); + DONE; +}) + +(define_expand "<avx512>_vpermi2var<mode>3_maskz" [(match_operand:VI2_AVX512VL 0 "register_operand" "=v") (match_operand:VI2_AVX512VL 1 "register_operand" "v") (match_operand:<sseintvecmode> 2 "register_operand" "0") @@ -17038,6 +17070,19 @@ (set_attr "mode" "<sseinsnmode>")]) (define_insn "<avx512>_vpermi2var<mode>3<sd_maskz_name>" + [(set (match_operand:VI1_AVX512VL 0 "register_operand" "=v") + (unspec:VI1_AVX512VL + [(match_operand:VI1_AVX512VL 1 "register_operand" "v") + (match_operand:<sseintvecmode> 2 "register_operand" "0") + (match_operand:VI1_AVX512VL 3 "nonimmediate_operand" "vm")] + UNSPEC_VPERMI2))] + "TARGET_AVX512VBMI" + "vpermi2<ssemodesuffix>\t{%3, %1, %0<sd_mask_op4>|%0<sd_mask_op4>, %1, %3}" + [(set_attr "type" "sselog") + (set_attr "prefix" "evex") + (set_attr "mode" "<sseinsnmode>")]) + +(define_insn "<avx512>_vpermi2var<mode>3<sd_maskz_name>" [(set (match_operand:VI2_AVX512VL 0 "register_operand" "=v") (unspec:VI2_AVX512VL [(match_operand:VI2_AVX512VL 1 "register_operand" "v") @@ -17067,6 +17112,22 @@ (set_attr "mode" "<sseinsnmode>")]) (define_insn "<avx512>_vpermi2var<mode>3_mask" + [(set (match_operand:VI1_AVX512VL 0 "register_operand" "=v") + (vec_merge:VI1_AVX512VL + (unspec:VI1_AVX512VL + [(match_operand:VI1_AVX512VL 1 "register_operand" "v") + (match_operand:<sseintvecmode> 2 "register_operand" "0") + (match_operand:VI1_AVX512VL 3 "nonimmediate_operand" "vm")] + UNSPEC_VPERMI2_MASK) + (match_dup 0) + (match_operand:<avx512fmaskmode> 4 "register_operand" "Yk")))] + "TARGET_AVX512VBMI" + "vpermi2<ssemodesuffix>\t{%3, %1, %0%{%4%}|%0%{%4%}, %1, %3}" + [(set_attr "type" "sselog") + (set_attr "prefix" "evex") + (set_attr "mode" "<sseinsnmode>")]) + +(define_insn "<avx512>_vpermi2var<mode>3_mask" [(set (match_operand:VI2_AVX512VL 0 "register_operand" "=v") (vec_merge:VI2_AVX512VL (unspec:VI2_AVX512VL @@ -17097,6 +17158,20 @@ }) (define_expand "<avx512>_vpermt2var<mode>3_maskz" + [(match_operand:VI1_AVX512VL 0 "register_operand" "=v") + (match_operand:<sseintvecmode> 1 "register_operand" "v") + (match_operand:VI1_AVX512VL 2 "register_operand" "0") + (match_operand:VI1_AVX512VL 3 "nonimmediate_operand" "vm") + (match_operand:<avx512fmaskmode> 4 "register_operand" "Yk")] + "TARGET_AVX512VBMI" +{ + emit_insn (gen_<avx512>_vpermt2var<mode>3_maskz_1 ( + operands[0], operands[1], operands[2], operands[3], + CONST0_RTX (<MODE>mode), operands[4])); + DONE; +}) + +(define_expand "<avx512>_vpermt2var<mode>3_maskz" [(match_operand:VI2_AVX512VL 0 "register_operand" "=v") (match_operand:<sseintvecmode> 1 "register_operand" "v") (match_operand:VI2_AVX512VL 2 "register_operand" "0") @@ -17124,6 +17199,19 @@ (set_attr "mode" "<sseinsnmode>")]) (define_insn "<avx512>_vpermt2var<mode>3<sd_maskz_name>" + [(set (match_operand:VI1_AVX512VL 0 "register_operand" "=v") + (unspec:VI1_AVX512VL + [(match_operand:<sseintvecmode> 1 "register_operand" "v") + (match_operand:VI1_AVX512VL 2 "register_operand" "0") + (match_operand:VI1_AVX512VL 3 "nonimmediate_operand" "vm")] + UNSPEC_VPERMT2))] + "TARGET_AVX512VBMI" + "vpermt2<ssemodesuffix>\t{%3, %1, %0<sd_mask_op4>|%0<sd_mask_op4>, %1, %3}" + [(set_attr "type" "sselog") + (set_attr "prefix" "evex") + (set_attr "mode" "<sseinsnmode>")]) + +(define_insn "<avx512>_vpermt2var<mode>3<sd_maskz_name>" [(set (match_operand:VI2_AVX512VL 0 "register_operand" "=v") (unspec:VI2_AVX512VL [(match_operand:<sseintvecmode> 1 "register_operand" "v") @@ -17153,6 +17241,22 @@ (set_attr "mode" "<sseinsnmode>")]) (define_insn "<avx512>_vpermt2var<mode>3_mask" + [(set (match_operand:VI1_AVX512VL 0 "register_operand" "=v") + (vec_merge:VI1_AVX512VL + (unspec:VI1_AVX512VL + [(match_operand:<sseintvecmode> 1 "register_operand" "v") + (match_operand:VI1_AVX512VL 2 "register_operand" "0") + (match_operand:VI1_AVX512VL 3 "nonimmediate_operand" "vm")] + UNSPEC_VPERMT2) + (match_dup 2) + (match_operand:<avx512fmaskmode> 4 "register_operand" "Yk")))] + "TARGET_AVX512VBMI" + "vpermt2<ssemodesuffix>\t{%3, %1, %0%{%4%}|%0%{%4%}, %1, %3}" + [(set_attr "type" "sselog") + (set_attr "prefix" "evex") + (set_attr "mode" "<sseinsnmode>")]) + +(define_insn "<avx512>_vpermt2var<mode>3_mask" [(set (match_operand:VI2_AVX512VL 0 "register_operand" "=v") (vec_merge:VI2_AVX512VL (unspec:VI2_AVX512VL @@ -18519,3 +18623,14 @@ (set_attr "prefix" "evex") (set_attr "mode" "<sseinsnmode>")]) +(define_insn "vpmultishiftqb<mode><mask_name>" + [(set (match_operand:VI1_AVX512VL 0 "register_operand" "=v") + (unspec:VI1_AVX512VL + [(match_operand:VI1_AVX512VL 1 "register_operand" "v") + (match_operand:VI1_AVX512VL 2 "nonimmediate_operand" "vm")] + UNSPEC_VPMULTISHIFT))] + "TARGET_AVX512VBMI" + "vpmultishiftqb\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}" + [(set_attr "type" "sselog") + (set_attr "prefix" "evex") + (set_attr "mode" "<sseinsnmode>")]) diff --git a/gcc/testsuite/g++.dg/other/i386-2.C b/gcc/testsuite/g++.dg/other/i386-2.C index a69a5e3..0368d35 100644 --- a/gcc/testsuite/g++.dg/other/i386-2.C +++ b/gcc/testsuite/g++.dg/other/i386-2.C @@ -1,5 +1,5 @@ /* { dg-do compile { target i?86-*-* x86_64-*-* } } */ -/* { dg-options "-O -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma" } */ +/* { dg-options "-O -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi" } */ /* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, mm3dnow.h, fma4intrin.h, xopintrin.h, abmintrin.h, bmiintrin.h, tbmintrin.h, lwpintrin.h, diff --git a/gcc/testsuite/g++.dg/other/i386-3.C b/gcc/testsuite/g++.dg/other/i386-3.C index d3a5bbd..3a3d5ff 100644 --- a/gcc/testsuite/g++.dg/other/i386-3.C +++ b/gcc/testsuite/g++.dg/other/i386-3.C @@ -1,5 +1,5 @@ /* { dg-do compile { target i?86-*-* x86_64-*-* } } */ -/* { dg-options "-O -fkeep-inline-functions -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma" } */ +/* { dg-options "-O -fkeep-inline-functions -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi" } */ /* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, mm3dnow.h, fma4intrin.h, xopintrin.h, abmintrin.h, bmiintrin.h, tbmintrin.h, lwpintrin.h, diff --git a/gcc/testsuite/gcc.target/i386/avx512f-helper.h b/gcc/testsuite/gcc.target/i386/avx512f-helper.h index d177429..e270cd2 100644 --- a/gcc/testsuite/gcc.target/i386/avx512f-helper.h +++ b/gcc/testsuite/gcc.target/i386/avx512f-helper.h @@ -22,6 +22,8 @@ #include "avx512vl-check.h" #elif defined (AVX512IFMA) #include "avx512ifma-check.h" +#elif defined (AVX512VBMI) +#include "avx512vbmi-check.h" #endif /* Macros expansion. */ @@ -130,6 +132,9 @@ avx512vl_test (void) { test_256 (); test_128 (); } #elif defined (AVX512IFMA) void avx512ifma_test (void) { test_512 (); } +#elif defined (AVX512VBMI) +void +avx512vbmi_test (void) { test_512 (); } #endif #endif /* AVX512F_HELPER_INCLUDED */ diff --git a/gcc/testsuite/gcc.target/i386/avx512vbmi-check.h b/gcc/testsuite/gcc.target/i386/avx512vbmi-check.h new file mode 100644 index 0000000..591ff06 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512vbmi-check.h @@ -0,0 +1,46 @@ +#include <stdlib.h> +#include "cpuid.h" +#include "m512-check.h" +#include "avx512f-os-support.h" + +static void avx512vbmi_test (void); + +static void __attribute__ ((noinline)) do_test (void) +{ + avx512vbmi_test (); +} + +int +main () +{ + unsigned int eax, ebx, ecx, edx; + + if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx)) + return 0; + + if ((ecx & bit_OSXSAVE) == (bit_OSXSAVE)) + { + if (__get_cpuid_max (0, NULL) < 7) + return 0; + + __cpuid_count (7, 0, eax, ebx, ecx, edx); + + if ((avx512f_os_support ()) && ((ebx & bit_AVX512VBMI) == bit_AVX512VBMI)) + { + do_test (); +#ifdef DEBUG + printf ("PASSED\n"); +#endif + return 0; + } +#ifdef DEBUG + printf ("SKIPPED\n"); +#endif + } +#ifdef DEBUG + else + printf ("SKIPPED\n"); +#endif + + return 0; +} diff --git a/gcc/testsuite/gcc.target/i386/avx512vbmi-vpermb-1.c b/gcc/testsuite/gcc.target/i386/avx512vbmi-vpermb-1.c new file mode 100644 index 0000000..59e568c --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512vbmi-vpermb-1.c @@ -0,0 +1,34 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512vbmi -mavx512vl -O2" } */ +/* { dg-final { scan-assembler-times "vpermb\[ \\t\]+\[^\n\]*%zmm\[0-9\]\[^\{\]" 3 } } */ +/* { dg-final { scan-assembler-times "vpermb\[ \\t\]+\[^\n\]*%zmm\[0-9\]\{%k\[1-7\]\}\[^\{\]" 1 } } */ +/* { dg-final { scan-assembler-times "vpermb\[ \\t\]+\[^\n\]*%zmm\[0-9\]\{%k\[1-7\]\}\{z\}" 1 } } */ +/* { dg-final { scan-assembler-times "vpermb\[ \\t\]+\[^\n\]*%ymm\[0-9\]\[^\{\]" 3 } } */ +/* { dg-final { scan-assembler-times "vpermb\[ \\t\]+\[^\n\]*%ymm\[0-9\]\{%k\[1-7\]\}\[^\{\]" 1 } } */ +/* { dg-final { scan-assembler-times "vpermb\[ \\t\]+\[^\n\]*%ymm\[0-9\]\{%k\[1-7\]\}\{z\}" 1 } } */ +/* { dg-final { scan-assembler-times "vpermb\[ \\t\]+\[^\n\]*%zmm\[0-9\]\[^\{\]" 3 } } */ +/* { dg-final { scan-assembler-times "vpermb\[ \\t\]+\[^\n\]*%zmm\[0-9\]\{%k\[1-7\]\}\[^\{\]" 1 } } */ +/* { dg-final { scan-assembler-times "vpermb\[ \\t\]+\[^\n\]*%zmm\[0-9\]\{%k\[1-7\]\}\{z\}" 1 } } */ + +#include <immintrin.h> + +volatile __m512i x1; +volatile __m256i x2; +volatile __m128i x3; +volatile __mmask64 m1; +volatile __mmask32 m2; +volatile __mmask16 m3; + +void extern +avx512bw_test (void) +{ + x1 = _mm512_permutexvar_epi8 (x1, x1); + x1 = _mm512_maskz_permutexvar_epi8 (m1, x1, x1); + x1 = _mm512_mask_permutexvar_epi8 (x1, m1, x1, x1); + x2 = _mm256_permutexvar_epi8 (x2, x2); + x2 = _mm256_maskz_permutexvar_epi8 (m2, x2, x2); + x2 = _mm256_mask_permutexvar_epi8 (x2, m2, x2, x2); + x3 = _mm_permutexvar_epi8 (x3, x3); + x3 = _mm_maskz_permutexvar_epi8 (m3, x3, x3); + x3 = _mm_mask_permutexvar_epi8 (x3, m3, x3, x3); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512vbmi-vpermb-2.c b/gcc/testsuite/gcc.target/i386/avx512vbmi-vpermb-2.c new file mode 100644 index 0000000..fa22fd9 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512vbmi-vpermb-2.c @@ -0,0 +1,51 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx512vbmi -DAVX512VBMI" } */ +/* { dg-require-effective-target avx512vbmi } */ + +#include "avx512f-helper.h" + +#define SIZE (AVX512F_LEN / 8) +#include "avx512f-mask-type.h" + +void +CALC (char *ind, char *src, char *res) +{ + int i; + + for (i = 0; i < SIZE; i++) + { + res[i] = src[ind[i] & (SIZE - 1)]; + } +} + +void +TEST (void) +{ + UNION_TYPE (AVX512F_LEN, i_b) s1, s2, res1, res2, res3; + char res_ref[SIZE]; + MASK_TYPE mask = MASK_VALUE; + int i; + + for (i = 0; i < SIZE; i++) + { + s1.a[i] = i * i * i; + s2.a[i] = i + 20; + res2.a[i] = DEFAULT_VALUE; + } + + res1.x = INTRINSIC (_permutexvar_epi8) (s1.x, s2.x); + res2.x = INTRINSIC (_mask_permutexvar_epi8) (res2.x, mask, s1.x, s2.x); + res3.x = INTRINSIC (_maskz_permutexvar_epi8) (mask, s1.x, s2.x); + CALC (s1.a, s2.a, res_ref); + + if (UNION_CHECK (AVX512F_LEN, i_b) (res1, res_ref)) + abort (); + + MASK_MERGE (i_b)(res_ref, mask, SIZE); + if (UNION_CHECK (AVX512F_LEN, i_b) (res2, res_ref)) + abort (); + + MASK_ZERO (i_b)(res_ref, mask, SIZE); + if (UNION_CHECK (AVX512F_LEN, i_b) (res3, res_ref)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512vbmi-vpermi2b-1.c b/gcc/testsuite/gcc.target/i386/avx512vbmi-vpermi2b-1.c new file mode 100644 index 0000000..f760c76 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512vbmi-vpermi2b-1.c @@ -0,0 +1,25 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512vbmi -mavx512vl -O2" } */ +/* { dg-final { scan-assembler-times "vpermi2b\[ \\t\]+\[^\n\]*%zmm\[0-9\]\{%k\[1-7\]\}\[^\{\]" 1 } } */ +/* { dg-final { scan-assembler-times "vpermi2b\[ \\t\]+\[^\n\]*%ymm\[0-9\]\{%k\[1-7\]\}\[^\{\]" 1 } } */ +/* { dg-final { scan-assembler-times "vpermi2b\[ \\t\]+\[^\n\]*%xmm\[0-9\]\{%k\[1-7\]\}\[^\{\]" 1 } } */ + +#include <immintrin.h> + +volatile __m512i x3; +volatile __m256i x2; +volatile __m128i x1; +volatile __m512i z; +volatile __m256i y; +volatile __m128i x; +volatile __mmask32 m3; +volatile __mmask16 m2; +volatile __mmask8 m1; + +void extern +avx512bw_test (void) +{ + x3 = _mm512_mask2_permutex2var_epi8 (x3, z, m3, x3); + x2 = _mm256_mask2_permutex2var_epi8 (x2, y, m2, x2); + x1 = _mm_mask2_permutex2var_epi8 (x1, x, m1, x1); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512vbmi-vpermi2b-2.c b/gcc/testsuite/gcc.target/i386/avx512vbmi-vpermi2b-2.c new file mode 100644 index 0000000..694b23b --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512vbmi-vpermi2b-2.c @@ -0,0 +1,58 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx512vbmi -DAVX512VBMI" } */ +/* { dg-require-effective-target avx512vbmi } */ + +#include "avx512f-helper.h" + +#define SIZE (AVX512F_LEN / 8) +#include "math.h" +#include "values.h" +#include "avx512f-mask-type.h" + +#define NUM 32 + +void +CALC (char *dst, char *src1, char *ind, char *src2) +{ + int i; + + for (i = 0; i < SIZE; i++) + { + unsigned long long offset = ind[i] & (SIZE - 1); + unsigned long long cond = ind[i] & SIZE; + + dst[i] = cond ? src2[offset] : src1[offset]; + } +} + +void +TEST (void) +{ + int i, j; + UNION_TYPE (AVX512F_LEN, i_b) s1, s2, res, ind; + char res_ref[SIZE]; + + MASK_TYPE mask = MASK_VALUE; + + for (i = 0; i < NUM; i++) + { + for (j = 0; j < SIZE; j++) + { + ind.a[j] = DEFAULT_VALUE; + s1.a[j] = i * 2 * j + 1; + s2.a[j] = i * 2 * j; + + res.a[j] = DEFAULT_VALUE; + } + + CALC (res_ref, s1.a, ind.a, s2.a); + + res.x = + INTRINSIC (_mask2_permutex2var_epi8) (s1.x, ind.x, mask, + s2.x); + + MASK_MERGE (i_b) (res_ref, mask, SIZE); + if (UNION_CHECK (AVX512F_LEN, i_b) (res, res_ref)) + abort (); + } +} diff --git a/gcc/testsuite/gcc.target/i386/avx512vbmi-vpermt2b-1.c b/gcc/testsuite/gcc.target/i386/avx512vbmi-vpermt2b-1.c new file mode 100644 index 0000000..2e67a54 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512vbmi-vpermt2b-1.c @@ -0,0 +1,37 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512vbmi -mavx512vl -O2" } */ +/* { dg-final { scan-assembler-times "vpermt2b\[ \\t\]+\[^\n\]*%zmm\[0-9\]" 3 } } */ +/* { dg-final { scan-assembler-times "vpermt2b\[ \\t\]+\[^\n\]*%ymm\[0-9\]" 3 } } * +/* { dg-final { scan-assembler-times "vpermt2b\[ \\t\]+\[^\n\]*%xmm\[0-9\]" 3 } } */ +/* { dg-final { scan-assembler-times "vpermt2b\[ \\t\]+\[^\n\]*%zmm\[0-9\]\{%k\[1-7\]\}\[^\{\]" 1 } } */ +/* { dg-final { scan-assembler-times "vpermt2b\[ \\t\]+\[^\n\]*%ymm\[0-9\]\{%k\[1-7\]\}\[^\{\]" 1 } } */ +/* { dg-final { scan-assembler-times "vpermt2b\[ \\t\]+\[^\n\]*%xmm\[0-9\]\{%k\[1-7\]\}\[^\{\]" 1 } } */ +/* { dg-final { scan-assembler-times "vpermt2b\[ \\t\]+\[^\n\]*%zmm\[0-9\]\{%k\[1-7\]\}\{z\}" 1 } } */ +/* { dg-final { scan-assembler-times "vpermt2b\[ \\t\]+\[^\n\]*%ymm\[0-9\]\{%k\[1-7\]\}\{z\}" 1 } } */ +/* { dg-final { scan-assembler-times "vpermt2b\[ \\t\]+\[^\n\]*%xmm\[0-9\]\{%k\[1-7\]\}\{z\}" 1 } } */ + +#include <immintrin.h> + +volatile __m512i x3; +volatile __m256i x2; +volatile __m128i x1; +volatile __m512i z; +volatile __m256i y; +volatile __m128i x; +volatile __mmask32 m3; +volatile __mmask16 m2; +volatile __mmask8 m1; + +void extern +avx512bw_test (void) +{ + x3 = _mm512_permutex2var_epi8 (x3, z, x3); + x3 = _mm512_mask_permutex2var_epi8 (x3, m3, z, x3); + x3 = _mm512_maskz_permutex2var_epi8 (m3, x3, z, x3); + x2 = _mm256_permutex2var_epi8 (x2, y, x2); + x2 = _mm256_mask_permutex2var_epi8 (x2, m2, y, x2); + x2 = _mm256_maskz_permutex2var_epi8 (m2, x2, y, x2); + x1 = _mm_permutex2var_epi8 (x1, x, x1); + x1 = _mm_mask_permutex2var_epi8 (x1, m1, x, x1); + x1 = _mm_maskz_permutex2var_epi8 (m1, x1, x, x1); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512vbmi-vpermt2b-2.c b/gcc/testsuite/gcc.target/i386/avx512vbmi-vpermt2b-2.c new file mode 100644 index 0000000..c9f46596 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512vbmi-vpermt2b-2.c @@ -0,0 +1,70 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx512vbmi -DAVX512VBMI" } */ +/* { dg-require-effective-target avx512vbmi } */ + +#include "avx512f-helper.h" + +#define SIZE (AVX512F_LEN / 8) +#include "math.h" +#include "values.h" +#include "avx512f-mask-type.h" + +#define NUM 32 + +void +CALC (char *dst, char *src1, char *ind, char *src2) +{ + int i; + + for (i = 0; i < SIZE; i++) + { + unsigned long long offset = ind[i] & (SIZE - 1); + unsigned long long cond = ind[i] & SIZE; + + dst[i] = cond ? src2[offset] : src1[offset]; + } +} + +void +TEST (void) +{ + int i, j; + UNION_TYPE (AVX512F_LEN, i_b) s1, s2, res1, res2, res3, ind; + char res_ref[SIZE]; + + MASK_TYPE mask = MASK_VALUE; + + for (i = 0; i < NUM; i++) + { + for (j = 0; j < SIZE; j++) + { + ind.a[j] = i * (j << 1); + s1.a[j] = DEFAULT_VALUE; + s2.a[j] = 1.5 * i * 2 * j; + + res1.a[j] = DEFAULT_VALUE; + res2.a[j] = DEFAULT_VALUE; + res3.a[j] = DEFAULT_VALUE; + } + + CALC (res_ref, s1.a, ind.a, s2.a); + + res1.x = INTRINSIC (_permutex2var_epi8) (s1.x, ind.x, s2.x); + res2.x = + INTRINSIC (_mask_permutex2var_epi8) (s1.x, mask, ind.x, s2.x); + res3.x = + INTRINSIC (_maskz_permutex2var_epi8) (mask, s1.x, ind.x, + s2.x); + + if (UNION_CHECK (AVX512F_LEN, i_b) (res1, res_ref)) + abort (); + + MASK_MERGE (i_b) (res_ref, mask, SIZE); + if (UNION_CHECK (AVX512F_LEN, i_b) (res2, res_ref)) + abort (); + + MASK_ZERO (i_b) (res_ref, mask, SIZE); + if (UNION_CHECK (AVX512F_LEN, i_b) (res3, res_ref)) + abort (); + } +} diff --git a/gcc/testsuite/gcc.target/i386/avx512vbmi-vpmultishiftqb-1.c b/gcc/testsuite/gcc.target/i386/avx512vbmi-vpmultishiftqb-1.c new file mode 100644 index 0000000..145591c --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512vbmi-vpmultishiftqb-1.c @@ -0,0 +1,31 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512vbmi -mavx512vl -O2" } */ +/* { dg-final { scan-assembler-times "vpmultishiftqb\[ \\t\]+\[^\n\]*%xmm\[0-9\]\[^\n\]*%xmm\[0-9\]\[^\n\]*%xmm\[0-9\]" 3 } } */ +/* { dg-final { scan-assembler-times "vpmultishiftqb\[ \\t\]+\[^\n\]*%xmm\[0-9\]\[^\n\]*%xmm\[0-9\]\[^\n\]*%xmm\[0-9\]\{%k\[1-7\]\}\[^\{\]" 1 } } */ +/* { dg-final { scan-assembler-times "vpmultishiftqb\[ \\t\]+\[^\n\]*%xmm\[0-9\]\[^\n\]*%xmm\[0-9\]\[^\n\]*%xmm\[0-9\]\{%k\[1-7\]\}\{z\}" 1 } } */ +/* { dg-final { scan-assembler-times "vpmultishiftqb\[ \\t\]+\[^\n\]*%ymm\[0-9\]\[^\n\]*%ymm\[0-9\]\[^\n\]*%ymm\[0-9\]" 3 } } */ +/* { dg-final { scan-assembler-times "vpmultishiftqb\[ \\t\]+\[^\n\]*%ymm\[0-9\]\[^\n\]*%ymm\[0-9\]\[^\n\]*%ymm\[0-9\]\{%k\[1-7\]\}\[^\{\]" 1 } } */ +/* { dg-final { scan-assembler-times "vpmultishiftqb\[ \\t\]+\[^\n\]*%ymm\[0-9\]\[^\n\]*%ymm\[0-9\]\[^\n\]*%ymm\[0-9\]\{%k\[1-7\]\}\{z\}" 1 } } */ +/* { dg-final { scan-assembler-times "vpmultishiftqb\[ \\t\]+\[^\n\]*%zmm\[0-9\]\[^\n\]*%zmm\[0-9\]\[^\n\]*%zmm\[0-9\]" 3 } } */ +/* { dg-final { scan-assembler-times "vpmultishiftqb\[ \\t\]+\[^\n\]*%zmm\[0-9\]\[^\n\]*%zmm\[0-9\]\[^\n\]*%zmm\[0-9\]\{%k\[1-7\]\}\[^\{\]" 1 } } */ +/* { dg-final { scan-assembler-times "vpmultishiftqb\[ \\t\]+\[^\n\]*%zmm\[0-9\]\[^\n\]*%zmm\[0-9\]\[^\n\]*%zmm\[0-9\]\{%k\[1-7\]\}\{z\}" 1 } } */ + +#include <immintrin.h> + +volatile __m512i _x1, _y1, _z1; +volatile __m256i _x2, _y2, _z2; +volatile __m128i _x3, _y3, _z3; + +void extern +avx512vbmi_test (void) +{ + _x3 = _mm_multishift_epi64_epi8 (_y3, _z3); + _x3 = _mm_mask_multishift_epi64_epi8 (_x3, 2, _y3, _z3); + _x3 = _mm_maskz_multishift_epi64_epi8 (2, _y3, _z3); + _x2 = _mm256_multishift_epi64_epi8 (_y2, _z2); + _x2 = _mm256_mask_multishift_epi64_epi8 (_x2, 3, _y2, _z2); + _x2 = _mm256_maskz_multishift_epi64_epi8 (3, _y2, _z2); + _x1 = _mm512_multishift_epi64_epi8 (_y1, _z1); + _x1 = _mm512_mask_multishift_epi64_epi8 (_x1, 3, _y1, _z1); + _x1 = _mm512_maskz_multishift_epi64_epi8 (3, _y1, _z1); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512vbmi-vpmultishiftqb-2.c b/gcc/testsuite/gcc.target/i386/avx512vbmi-vpmultishiftqb-2.c new file mode 100644 index 0000000..936d938 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512vbmi-vpmultishiftqb-2.c @@ -0,0 +1,68 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx512vbmi -DAVX512VBMI" } */ +/* { dg-require-effective-target avx512vbmi } */ + +#include "avx512f-helper.h" + +#define SIZE (AVX512F_LEN / 8) +#include "avx512f-mask-type.h" + +void +CALC (char *r, char *s1, char *s2) +{ + int i, j, k; + long long a, b, ctrl; + + for (i = 0; i < SIZE / sizeof (long long); i++) + { + union + { + long long x; + char a[sizeof(long long)]; + } src; + + for (j = 0; j < sizeof (long long); j++) + src.a[j] = s2[i * sizeof (long long) + j]; + for (j = 0; j < sizeof (long long); j++) + { + ctrl = s1[i * sizeof (long long) + j] & ((1 << sizeof (long long)) - 1); + r[i * sizeof (long long) + j] = 0; + for (k = 0; k < 8; k++) + { + r[i * sizeof (long long) + j] |= ((src.x >> ((ctrl + k) % (sizeof (long long) * 8))) & 1) << k; + } + } + } +} + +void +TEST (void) +{ + UNION_TYPE (AVX512F_LEN, i_b) src1, src2, dst1, dst2, dst3; + char dst_ref[SIZE]; + int i; + MASK_TYPE mask = MASK_VALUE; + + for (i = 0; i < SIZE; i++) + { + src1.a[i] = 15 + 3467 * i; + src2.a[i] = 9217 + i; + dst2.a[i] = DEFAULT_VALUE; + } + + CALC (dst_ref, src1.a, src2.a); + dst1.x = INTRINSIC (_multishift_epi64_epi8) (src1.x, src2.x); + dst2.x = INTRINSIC (_mask_multishift_epi64_epi8) (dst2.x, mask, src1.x, src2.x); + dst3.x = INTRINSIC (_maskz_multishift_epi64_epi8) (mask, src1.x, src2.x); + + if (UNION_CHECK (AVX512F_LEN, i_b) (dst1, dst_ref)) + abort (); + + MASK_MERGE (i_b) (dst_ref, mask, SIZE); + if (UNION_CHECK (AVX512F_LEN, i_b) (dst2, dst_ref)) + abort (); + + MASK_ZERO (i_b) (dst_ref, mask, SIZE); + if (UNION_CHECK (AVX512F_LEN, i_b) (dst3, dst_ref)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-vpermb-2.c b/gcc/testsuite/gcc.target/i386/avx512vl-vpermb-2.c new file mode 100644 index 0000000..377f34e --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512vl-vpermb-2.c @@ -0,0 +1,14 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx512vbmi -mavx512vl -DAVX512VL" } */ +/* { dg-require-effective-target avx512vl } */ + +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx512vbmi-vpermb-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx512vbmi-vpermb-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-vpermi2b-2.c b/gcc/testsuite/gcc.target/i386/avx512vl-vpermi2b-2.c new file mode 100644 index 0000000..bd5dfc5 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512vl-vpermi2b-2.c @@ -0,0 +1,14 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx512vbmi -mavx512vl -DAVX512VL" } */ +/* { dg-require-effective-target avx512vl } */ + +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx512vbmi-vpermi2b-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx512vbmi-vpermi2b-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-vpermt2b-2.c b/gcc/testsuite/gcc.target/i386/avx512vl-vpermt2b-2.c new file mode 100644 index 0000000..a83eeb7 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512vl-vpermt2b-2.c @@ -0,0 +1,14 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx512vbmi -mavx512vl -DAVX512VL" } */ +/* { dg-require-effective-target avx512vl } */ + +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx512vbmi-vpermt2b-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx512vbmi-vpermt2b-2.c" diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-vpmultishiftqb-2.c b/gcc/testsuite/gcc.target/i386/avx512vl-vpmultishiftqb-2.c new file mode 100644 index 0000000..d215e23 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512vl-vpmultishiftqb-2.c @@ -0,0 +1,14 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx512vbmi -mavx512vl -DAVX512VL" } */ +/* { dg-require-effective-target avx512vl } */ + +#define AVX512F_LEN 256 +#define AVX512F_LEN_HALF 128 +#include "avx512vbmi-vpmultishiftqb-2.c" + +#undef AVX512F_LEN +#undef AVX512F_LEN_HALF + +#define AVX512F_LEN 128 +#define AVX512F_LEN_HALF 128 +#include "avx512vbmi-vpmultishiftqb-2.c" diff --git a/gcc/testsuite/gcc.target/i386/i386.exp b/gcc/testsuite/gcc.target/i386/i386.exp index 060eed3..ca5ef06 100644 --- a/gcc/testsuite/gcc.target/i386/i386.exp +++ b/gcc/testsuite/gcc.target/i386/i386.exp @@ -365,6 +365,21 @@ proc check_effective_target_avx512ifma { } { } "-mavx512ifma" ] } +# Return 1 if avx512vbmi instructions can be compiled. +proc check_effective_target_avx512vbmi { } { + return [check_no_compiler_messages avx512vbmi object { + typedef char __v64qi __attribute__ ((__vector_size__ (64))); + __v64qi + _mm512_multishift_epi64_epi8 (__v64qi __X, __v64qi __Y) + { + return (__v64qi) __builtin_ia32_vpmultishiftqb512_mask ((__v64qi) __X, + (__v64qi) __Y, + (__v64qi) __Y, + -1); + } + } "-mavx512vbmi" ] +} + # If a testcase doesn't have special options, use these. global DEFAULT_CFLAGS if ![info exists DEFAULT_CFLAGS] then { diff --git a/gcc/testsuite/gcc.target/i386/sse-12.c b/gcc/testsuite/gcc.target/i386/sse-12.c index 1d8fa82..a83db92 100644 --- a/gcc/testsuite/gcc.target/i386/sse-12.c +++ b/gcc/testsuite/gcc.target/i386/sse-12.c @@ -3,7 +3,7 @@ popcntintrin.h and mm_malloc.h are usable with -O -std=c89 -pedantic-errors. */ /* { dg-do compile } */ -/* { dg-options "-O -std=c89 -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512bw -mavx512dq -mavx512vl -mavx512ifma" } */ +/* { dg-options "-O -std=c89 -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512bw -mavx512dq -mavx512vl -mavx512vbmi -mavx512ifma" } */ #include <x86intrin.h> diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c index 878c475..f1d9157 100644 --- a/gcc/testsuite/gcc.target/i386/sse-13.c +++ b/gcc/testsuite/gcc.target/i386/sse-13.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512vl -mavx512dq -mavx512bw -mavx512ifma" } */ +/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512ifma" } */ #include <mm_malloc.h> diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c index 4d3acb4..bc10109 100644 --- a/gcc/testsuite/gcc.target/i386/sse-14.c +++ b/gcc/testsuite/gcc.target/i386/sse-14.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma" } */ +/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi" } */ /* { dg-add-options bind_pic_locally } */ #include <mm_malloc.h> diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c index 7861cf5..d54d1db 100644 --- a/gcc/testsuite/gcc.target/i386/sse-22.c +++ b/gcc/testsuite/gcc.target/i386/sse-22.c @@ -100,7 +100,7 @@ #ifndef DIFFERENT_PRAGMAS -#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,avx512vl,avx512bw,avx512dq,avx512ifma") +#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,avx512vl,avx512bw,avx512dq,avx512vbmi,avx512ifma") #endif /* Following intrinsics require immediate arguments. They @@ -215,7 +215,7 @@ test_4 (_mm_cmpestrz, int, __m128i, int, __m128i, int, 1) /* immintrin.h (AVX/AVX2/RDRND/FSGSBASE/F16C/RTM/AVX512F/SHA) */ #ifdef DIFFERENT_PRAGMAS -#pragma GCC target ("avx,avx2,rdrnd,fsgsbase,f16c,rtm,avx512f,avx512er,avx512cd,avx512pf,sha,avx512vl,avx512bw,avx512dq,avx512ifma") +#pragma GCC target ("avx,avx2,rdrnd,fsgsbase,f16c,rtm,avx512f,avx512er,avx512cd,avx512pf,sha,avx512vl,avx512bw,avx512dq,avx512ifma,avx512vbmi") #endif #include <immintrin.h> test_1 (_cvtss_sh, unsigned short, float, 1) diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c index 85d403e..e699bd3 100644 --- a/gcc/testsuite/gcc.target/i386/sse-23.c +++ b/gcc/testsuite/gcc.target/i386/sse-23.c @@ -594,7 +594,7 @@ #define __builtin_ia32_extracti64x2_256_mask(A, E, C, D) __builtin_ia32_extracti64x2_256_mask(A, 1, C, D) #define __builtin_ia32_extractf64x2_256_mask(A, E, C, D) __builtin_ia32_extractf64x2_256_mask(A, 1, C, D) -#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,fma,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,xsavec,xsaves,clflushopt,avx512bw,avx512dq,avx512vl,avx512ifma") +#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,fma,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,xsavec,xsaves,clflushopt,avx512bw,avx512dq,avx512vl,avx512vbmi,avx512ifma") #include <wmmintrin.h> #include <smmintrin.h> #include <mm3dnow.h>