Message ID | 20230921072013.2124750-1-lin1.hu@intel.com |
---|---|
Headers | show |
Series | Support -mevex512 for AVX512 | expand |
On Thu, Sep 21, 2023 at 3:22 PM Hu, Lin1 <lin1.hu@intel.com> wrote: > > Hi all, > > After previous discussion, instead of supporting option -mavx10.1, we > will first introduct option -m[no-]evex512, which will enable/disable > 512 bit register and 64 bit mask register. > > It will not change the current option behavior since if AVX512F is > enabled with no evex512 option specified, it will automatically enable > 512 bit register and 64 bit mask register. > > How the patches go comes following: > > Patch 1 added initial support for option -mevex512. > > Patch 2-6 refined current intrin file to push evex512 target for all > 512 bit intrins. Those scalar intrins remained untouched. > > Patch 7-11 added OPTION_MASK_ISA2_EVEX512 for all related builtins. > > Patch 12 disabled zmm register, 512 bit libmvec call for no-evex512, > also requested evex512 for vectorization when using 512 bit register. > > Patch 13-17 supported evex512 in related patterns. > > Patch 18 added testcases for -mno-evex512 and allowed its usage. > > The patches currently cause scan-asm fail for pr89229-{5,6,7}b.c since > we will emit scalar vmovss here. When trying to use x/ymm 16+ w/o > avx512vl but with avx512f+evex512, I suppose we could either emit scalar > or zmm instructions. It is quite a rare case on HW since there is no > HW w/o avx512vl but with avx512f, so I prefer to not to add maintainence > effort here to get a slightly perf improvement. But it could be changed > to former behavior. To make it easier for people to test before committing, I pushed the patch to the vendor branch refs/vendors/ix86/heads/evex512. Welcome to try it out. > > Discussions are welcomed for all the patches. > > Thx, > Haochen > > Haochen Jiang (18): > Initial support for -mevex512 > Push evex512 target for 512 bit intrins > Push evex512 target for 512 bit intrins > Push evex512 target for 512 bit intrins > Push evex512 target for 512 bit intrins > Push evex512 target for 512 bit intrins > Add OPTION_MASK_ISA2_EVEX512 for 512 bit builtins > Add OPTION_MASK_ISA2_EVEX512 for 512 bit builtins > Add OPTION_MASK_ISA2_EVEX512 for 512 bit builtins > Add OPTION_MASK_ISA2_EVEX512 for 512 bit builtins > Add OPTION_MASK_ISA2_EVEX512 for 512 bit builtins > Disable zmm register and 512 bit libmvec call when !TARGET_EVEX512 > Support -mevex512 for AVX512F intrins > Support -mevex512 for AVX512DQ intrins > Support -mevex512 for AVX512BW intrins > Support -mevex512 for > AVX512{IFMA,VBMI,VNNI,BF16,VPOPCNTDQ,VBMI2,BITALG,VP2INTERSECT},VAES,GFNI,VPCLMULQDQ > intrins > Support -mevex512 for AVX512FP16 intrins > Allow -mno-evex512 usage > > gcc/common/config/i386/i386-common.cc | 15 + > gcc/config.gcc | 19 +- > gcc/config/i386/avx5124fmapsintrin.h | 2 +- > gcc/config/i386/avx5124vnniwintrin.h | 2 +- > gcc/config/i386/avx512bf16intrin.h | 31 +- > gcc/config/i386/avx512bitalgintrin.h | 155 +- > gcc/config/i386/avx512bitalgvlintrin.h | 180 + > gcc/config/i386/avx512bwintrin.h | 291 +- > gcc/config/i386/avx512dqintrin.h | 1840 +- > gcc/config/i386/avx512erintrin.h | 2 +- > gcc/config/i386/avx512fintrin.h | 19663 +++++++++--------- > gcc/config/i386/avx512fp16intrin.h | 8925 ++++---- > gcc/config/i386/avx512ifmaintrin.h | 4 +- > gcc/config/i386/avx512pfintrin.h | 2 +- > gcc/config/i386/avx512vbmi2intrin.h | 4 +- > gcc/config/i386/avx512vbmiintrin.h | 4 +- > gcc/config/i386/avx512vnniintrin.h | 4 +- > gcc/config/i386/avx512vp2intersectintrin.h | 4 +- > gcc/config/i386/avx512vpopcntdqintrin.h | 4 +- > gcc/config/i386/gfniintrin.h | 76 +- > gcc/config/i386/i386-builtin.def | 1312 +- > gcc/config/i386/i386-builtins.cc | 96 +- > gcc/config/i386/i386-c.cc | 2 + > gcc/config/i386/i386-expand.cc | 18 +- > gcc/config/i386/i386-options.cc | 33 +- > gcc/config/i386/i386.cc | 168 +- > gcc/config/i386/i386.h | 7 +- > gcc/config/i386/i386.md | 127 +- > gcc/config/i386/i386.opt | 4 + > gcc/config/i386/immintrin.h | 2 + > gcc/config/i386/predicates.md | 3 +- > gcc/config/i386/sse.md | 854 +- > gcc/config/i386/vaesintrin.h | 4 +- > gcc/config/i386/vpclmulqdqintrin.h | 4 +- > gcc/testsuite/gcc.target/i386/noevex512-1.c | 13 + > gcc/testsuite/gcc.target/i386/noevex512-2.c | 13 + > gcc/testsuite/gcc.target/i386/noevex512-3.c | 13 + > gcc/testsuite/gcc.target/i386/pr89229-5b.c | 2 +- > gcc/testsuite/gcc.target/i386/pr89229-6b.c | 2 +- > gcc/testsuite/gcc.target/i386/pr89229-7b.c | 2 +- > gcc/testsuite/gcc.target/i386/pr90096.c | 2 +- > 41 files changed, 17170 insertions(+), 16738 deletions(-) > create mode 100644 gcc/config/i386/avx512bitalgvlintrin.h > create mode 100644 gcc/testsuite/gcc.target/i386/noevex512-1.c > create mode 100644 gcc/testsuite/gcc.target/i386/noevex512-2.c > create mode 100644 gcc/testsuite/gcc.target/i386/noevex512-3.c > > -- > 2.31.1 >
Thanks for the new patch! I see that there's a new __EVEX512__ define. Will there be some __EVEX256__ (or maybe some max EVEX width) define, so that code can detect whether the compiler supports AVX10.1/256 without resorting to version checks?
Hi,
Thanks for you reply.
I'd like to verify that our understanding of your requirements is correct, and that __EVEX256__ can be considered a default macro to determine whether the compiler supports the __EVEX***__ series of switches.
For example:
I have a segment of code like:
#if defined(__EVEX512__):
__mm512.*__;
#else
__mm256.*__;
#endif
But __EVEX512__ is undefined that doesn't mean I only need 256bit, maybe I use gcc-13, so I can still use 512bit.
So the code should be:
#if defined(__EVEX512__):
__mm512.*__;
#elif defined(__EVEX256__):
__mm256.*__;
#else
__mm512.*__;
#endif
If we understand correctly, we'll consider the request. But since we're about to have a vacation, follow-up replies may be a bit slower.
BRs,
Lin
-----Original Message-----
From: ZiNgA BuRgA <zingaburga@hotmail.com>
Sent: Thursday, September 28, 2023 8:32 AM
To: Hu, Lin1 <lin1.hu@intel.com>; gcc-patches@gcc.gnu.org
Subject: Re: [PATCH 00/18] Support -mevex512 for AVX512
Thanks for the new patch!
I see that there's a new __EVEX512__ define. Will there be some __EVEX256__ (or maybe some max EVEX width) define, so that code can detect whether the compiler supports AVX10.1/256 without resorting to version checks?
That sounds about right. The code I had in mind would perhaps look like: #if defined(__AVX512BW__) && defined(__AVX512VL__) #if defined(__EVEX256__) && !defined(__EVEX512__) // compiled code is AVX10.1/256 and AVX512 compatible #else // compiled code is only AVX512 compatible #endif // some code which only uses 256b instructions __m256i... #endif The '__EVEX256__' define would avoid needing to check compiler versions. Hopefully you can align it with whatever Clang does: https://discourse.llvm.org/t/rfc-design-for-avx10-feature-support/72661/18 Thanks! On 28/09/2023 12:26 pm, Hu, Lin1 wrote: > Hi, > > Thanks for you reply. > > I'd like to verify that our understanding of your requirements is correct, and that __EVEX256__ can be considered a default macro to determine whether the compiler supports the __EVEX***__ series of switches. > > For example: > > I have a segment of code like: > #if defined(__EVEX512__): > __mm512.*__; > #else > __mm256.*__; > #endif > > But __EVEX512__ is undefined that doesn't mean I only need 256bit, maybe I use gcc-13, so I can still use 512bit. > > So the code should be: > #if defined(__EVEX512__): > __mm512.*__; > #elif defined(__EVEX256__): > __mm256.*__; > #else > __mm512.*__; > #endif > > If we understand correctly, we'll consider the request. But since we're about to have a vacation, follow-up replies may be a bit slower. > > BRs, > Lin > > -----Original Message----- > From: ZiNgA BuRgA <zingaburga@hotmail.com> > Sent: Thursday, September 28, 2023 8:32 AM > To: Hu, Lin1 <lin1.hu@intel.com>; gcc-patches@gcc.gnu.org > Subject: Re: [PATCH 00/18] Support -mevex512 for AVX512 > > Thanks for the new patch! > > I see that there's a new __EVEX512__ define. Will there be some __EVEX256__ (or maybe some max EVEX width) define, so that code can detect whether the compiler supports AVX10.1/256 without resorting to version checks? > >
On Thu, Sep 28, 2023 at 11:23 AM ZiNgA BuRgA <zingaburga@hotmail.com> wrote: > > That sounds about right. The code I had in mind would perhaps look like: > > > #if defined(__AVX512BW__) && defined(__AVX512VL__) > #if defined(__EVEX256__) && !defined(__EVEX512__) > // compiled code is AVX10.1/256 and AVX512 compatible > #else > // compiled code is only AVX512 compatible > #endif > > // some code which only uses 256b instructions > __m256i... > #endif > > > The '__EVEX256__' define would avoid needing to check compiler versions. Sounds reasonable, regarding how to set __EVEX256__, I think it should be set/unset along with __AVX512VL__ and __EVEX512__ should not unset __EVEX256__. > Hopefully you can align it with whatever Clang does: > https://discourse.llvm.org/t/rfc-design-for-avx10-feature-support/72661/18 > > Thanks! > > On 28/09/2023 12:26 pm, Hu, Lin1 wrote: > > Hi, > > > > Thanks for you reply. > > > > I'd like to verify that our understanding of your requirements is correct, and that __EVEX256__ can be considered a default macro to determine whether the compiler supports the __EVEX***__ series of switches. > > > > For example: > > > > I have a segment of code like: > > #if defined(__EVEX512__): > > __mm512.*__; > > #else > > __mm256.*__; > > #endif > > > > But __EVEX512__ is undefined that doesn't mean I only need 256bit, maybe I use gcc-13, so I can still use 512bit. > > > > So the code should be: > > #if defined(__EVEX512__): > > __mm512.*__; > > #elif defined(__EVEX256__): > > __mm256.*__; > > #else > > __mm512.*__; > > #endif > > > > If we understand correctly, we'll consider the request. But since we're about to have a vacation, follow-up replies may be a bit slower. > > > > BRs, > > Lin > > > > -----Original Message----- > > From: ZiNgA BuRgA <zingaburga@hotmail.com> > > Sent: Thursday, September 28, 2023 8:32 AM > > To: Hu, Lin1 <lin1.hu@intel.com>; gcc-patches@gcc.gnu.org > > Subject: Re: [PATCH 00/18] Support -mevex512 for AVX512 > > > > Thanks for the new patch! > > > > I see that there's a new __EVEX512__ define. Will there be some __EVEX256__ (or maybe some max EVEX width) define, so that code can detect whether the compiler supports AVX10.1/256 without resorting to version checks? > > > > >