[00/18] Support -mevex512 for AVX512

Message ID	20230921072013.2124750-1-lin1.hu@intel.com
From: "Hu, Lin1" <lin1.hu@intel.com>
To: gcc-patches@gcc.gnu.org
Cc: hongtao.liu@intel.com, ubizjak@gmail.com, haochen.jiang@intel.com
Subject: [PATCH 00/18] Support -mevex512 for AVX512
Date: Thu, 21 Sep 2023 15:19:55 +0800

Hu, Lin1 Sept. 21, 2023, 7:19 a.m. UTC

Hi all,

After the previous discussion, instead of supporting the option -mavx10.1,
we will first introduce the option -m[no-]evex512, which enables/disables
512 bit registers and 64 bit mask registers.

It will not change the current option behavior: if AVX512F is
enabled and no evex512 option is specified, 512 bit registers and
64 bit mask registers are enabled automatically.
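
As a rough illustration of that default, here is a toy model in C. It is a sketch only and not GCC's option-handling code: the enum, the function name and the treatment of evex512 without AVX512F are all assumptions made for the example.

```c
#include <stdbool.h>

/* Hypothetical tri-state: how -m[no-]evex512 appeared on the command line. */
enum evex512_opt { EVEX512_UNSPECIFIED, EVEX512_ON, EVEX512_OFF };

/* Toy model of the default described above, not GCC's implementation.
   Returns true when 512 bit vector registers and 64 bit mask registers
   would be usable. */
bool evex512_enabled(bool avx512f, enum evex512_opt opt)
{
    /* Assumption: without AVX512F there is nothing for evex512 to widen. */
    if (!avx512f)
        return false;
    /* An explicit -mno-evex512 turns the 512 bit state off. */
    if (opt == EVEX512_OFF)
        return false;
    /* -mevex512, or no evex512 option at all, leaves it on. */
    return true;
}
```

With this model, evex512_enabled(true, EVEX512_UNSPECIFIED) is true, which matches the statement that existing -mavx512f builds keep their current behavior.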

The patches are organized as follows:

Patch 1 adds initial support for the option -mevex512.

Patches 2-6 refine the current intrin files to push the evex512 target
for all 512 bit intrins. Scalar intrins remain untouched.

Patches 7-11 add OPTION_MASK_ISA2_EVEX512 for all related builtins.

Patch 12 disables zmm registers and 512 bit libmvec calls for no-evex512,
and also requires evex512 for vectorization when using 512 bit registers.

Patches 13-17 support evex512 in the related patterns.

Patch 18 adds testcases for -mno-evex512 and allows its usage.

The patches currently cause scan-asm failures for pr89229-{5,6,7}b.c since
we now emit scalar vmovss there. When trying to use x/ymm 16+ w/o
avx512vl but with avx512f+evex512, I suppose we could emit either scalar
or zmm instructions. This is quite a rare case since there is no
HW w/o avx512vl but with avx512f, so I prefer not to add maintenance
effort here for a slight perf improvement. But it could be changed
back to the former behavior.

Discussions are welcome for all the patches.

Thx,
Haochen

Haochen Jiang (18):
  Initial support for -mevex512
  Push evex512 target for 512 bit intrins
  Push evex512 target for 512 bit intrins
  Push evex512 target for 512 bit intrins
  Push evex512 target for 512 bit intrins
  Push evex512 target for 512 bit intrins
  Add OPTION_MASK_ISA2_EVEX512 for 512 bit builtins
  Add OPTION_MASK_ISA2_EVEX512 for 512 bit builtins
  Add OPTION_MASK_ISA2_EVEX512 for 512 bit builtins
  Add OPTION_MASK_ISA2_EVEX512 for 512 bit builtins
  Add OPTION_MASK_ISA2_EVEX512 for 512 bit builtins
  Disable zmm register and 512 bit libmvec call when !TARGET_EVEX512
  Support -mevex512 for AVX512F intrins
  Support -mevex512 for AVX512DQ intrins
  Support -mevex512 for AVX512BW intrins
  Support -mevex512 for
    AVX512{IFMA,VBMI,VNNI,BF16,VPOPCNTDQ,VBMI2,BITALG,VP2INTERSECT},VAES,GFNI,VPCLMULQDQ
    intrins
  Support -mevex512 for AVX512FP16 intrins
  Allow -mno-evex512 usage

 gcc/common/config/i386/i386-common.cc       |    15 +
 gcc/config.gcc                              |    19 +-
 gcc/config/i386/avx5124fmapsintrin.h        |     2 +-
 gcc/config/i386/avx5124vnniwintrin.h        |     2 +-
 gcc/config/i386/avx512bf16intrin.h          |    31 +-
 gcc/config/i386/avx512bitalgintrin.h        |   155 +-
 gcc/config/i386/avx512bitalgvlintrin.h      |   180 +
 gcc/config/i386/avx512bwintrin.h            |   291 +-
 gcc/config/i386/avx512dqintrin.h            |  1840 +-
 gcc/config/i386/avx512erintrin.h            |     2 +-
 gcc/config/i386/avx512fintrin.h             | 19663 +++++++++---------
 gcc/config/i386/avx512fp16intrin.h          |  8925 ++++----
 gcc/config/i386/avx512ifmaintrin.h          |     4 +-
 gcc/config/i386/avx512pfintrin.h            |     2 +-
 gcc/config/i386/avx512vbmi2intrin.h         |     4 +-
 gcc/config/i386/avx512vbmiintrin.h          |     4 +-
 gcc/config/i386/avx512vnniintrin.h          |     4 +-
 gcc/config/i386/avx512vp2intersectintrin.h  |     4 +-
 gcc/config/i386/avx512vpopcntdqintrin.h     |     4 +-
 gcc/config/i386/gfniintrin.h                |    76 +-
 gcc/config/i386/i386-builtin.def            |  1312 +-
 gcc/config/i386/i386-builtins.cc            |    96 +-
 gcc/config/i386/i386-c.cc                   |     2 +
 gcc/config/i386/i386-expand.cc              |    18 +-
 gcc/config/i386/i386-options.cc             |    33 +-
 gcc/config/i386/i386.cc                     |   168 +-
 gcc/config/i386/i386.h                      |     7 +-
 gcc/config/i386/i386.md                     |   127 +-
 gcc/config/i386/i386.opt                    |     4 +
 gcc/config/i386/immintrin.h                 |     2 +
 gcc/config/i386/predicates.md               |     3 +-
 gcc/config/i386/sse.md                      |   854 +-
 gcc/config/i386/vaesintrin.h                |     4 +-
 gcc/config/i386/vpclmulqdqintrin.h          |     4 +-
 gcc/testsuite/gcc.target/i386/noevex512-1.c |    13 +
 gcc/testsuite/gcc.target/i386/noevex512-2.c |    13 +
 gcc/testsuite/gcc.target/i386/noevex512-3.c |    13 +
 gcc/testsuite/gcc.target/i386/pr89229-5b.c  |     2 +-
 gcc/testsuite/gcc.target/i386/pr89229-6b.c  |     2 +-
 gcc/testsuite/gcc.target/i386/pr89229-7b.c  |     2 +-
 gcc/testsuite/gcc.target/i386/pr90096.c     |     2 +-
 41 files changed, 17170 insertions(+), 16738 deletions(-)
 create mode 100644 gcc/config/i386/avx512bitalgvlintrin.h
 create mode 100644 gcc/testsuite/gcc.target/i386/noevex512-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/noevex512-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/noevex512-3.c

Hongtao Liu Sept. 22, 2023, 3:30 a.m. UTC | #1

On Thu, Sep 21, 2023 at 3:22 PM Hu, Lin1 <lin1.hu@intel.com> wrote:
> The patches currently cause scan-asm failures for pr89229-{5,6,7}b.c since
> we now emit scalar vmovss there. When trying to use x/ymm 16+ w/o
> avx512vl but with avx512f+evex512, I suppose we could emit either scalar
> or zmm instructions. This is quite a rare case since there is no
> HW w/o avx512vl but with avx512f, so I prefer not to add maintenance
> effort here for a slight perf improvement. But it could be changed
> back to the former behavior.
To make it easier for people to test before committing, I pushed the
patches to the vendor branch
refs/vendors/ix86/heads/evex512.
You are welcome to try it out.


ZiNgA BuRgA Sept. 28, 2023, 12:32 a.m. UTC | #2

Thanks for the new patch!

I see that there's a new __EVEX512__ define.  Will there be some 
__EVEX256__ (or maybe some max EVEX width) define, so that code can 
detect whether the compiler supports AVX10.1/256 without resorting to 
version checks?

Hu, Lin1 Sept. 28, 2023, 2:26 a.m. UTC | #3

Hi, 

Thanks for your reply.

I'd like to verify that we understand your request correctly: __EVEX256__ would serve as a baseline macro that tells code whether the compiler supports the __EVEX***__ series of switches at all.

For example:

Suppose I have a segment of code like:
#if defined(__EVEX512__)
/* use _mm512_* intrinsics */
#else
/* use _mm256_* intrinsics */
#endif

But __EVEX512__ being undefined does not mean that I can only use 256 bit: maybe I am using an older compiler such as gcc-13, which predates the macro, so I can still use 512 bit.

So the code should be:
#if defined(__EVEX512__)
/* use _mm512_* intrinsics */
#elif defined(__EVEX256__)
/* use _mm256_* intrinsics */
#else
/* use _mm512_* intrinsics */
#endif
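
The three-way check can also be written as a small compilable function, which may be easier to experiment with. This is only a sketch: __EVEX256__ is the macro proposed in this thread and is not known to be defined by any compiler, the name max_evex_bits is made up for the example, and the extra __AVX512F__ guard is an assumption added so that the file builds without any AVX512 flags.

```c
/* Returns the widest EVEX vector length, in bits, that this translation
   unit may use, or 0 when AVX512F is not enabled at all.
   max_evex_bits is a hypothetical helper, not part of any header. */
int max_evex_bits(void)
{
#if !defined(__AVX512F__)
    return 0;     /* assumption: no AVX512F means no EVEX vectors */
#elif defined(__EVEX512__)
    return 512;   /* compiler knows the switch and 512 bit is enabled */
#elif defined(__EVEX256__)
    return 256;   /* proposed macro: switch is known, 512 bit is disabled */
#else
    return 512;   /* older compiler without the EVEX macros, e.g. gcc-13 */
#endif
}
```

Without __EVEX256__, the last two branches cannot be told apart, which is the reason for the request.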

If we have understood correctly, we'll consider the request. Since we're about to go on vacation, follow-up replies may be a bit slow.

BRs,
Lin

ZiNgA BuRgA Sept. 28, 2023, 3:23 a.m. UTC | #4

That sounds about right.  The code I had in mind would perhaps look like:


#if defined(__AVX512BW__) && defined(__AVX512VL__)
     #if defined(__EVEX256__) && !defined(__EVEX512__)
         // compiled code is AVX10.1/256 and AVX512 compatible
     #else
         // compiled code is only AVX512 compatible
     #endif

     // some code which only uses 256b instructions
     __m256i...
#endif


The '__EVEX256__' define would avoid needing to check compiler versions.
Hopefully you can align it with whatever Clang does: 
https://discourse.llvm.org/t/rfc-design-for-avx10-feature-support/72661/18

Thanks!

Hongtao Liu Oct. 7, 2023, 2:33 a.m. UTC | #5

On Thu, Sep 28, 2023 at 11:23 AM ZiNgA BuRgA <zingaburga@hotmail.com> wrote:
>
> That sounds about right.  The code I had in mind would perhaps look like:
>
>
> #if defined(__AVX512BW__) && defined(__AVX512VL__)
>      #if defined(__EVEX256__) && !defined(__EVEX512__)
>          // compiled code is AVX10.1/256 and AVX512 compatible
>      #else
>          // compiled code is only AVX512 compatible
>      #endif
>
>      // some code which only uses 256b instructions
>      __m256i...
> #endif
>
>
> The '__EVEX256__' define would avoid needing to check compiler versions.
Sounds reasonable. As for how to set __EVEX256__, I think it should
be set/unset along with __AVX512VL__, and defining __EVEX512__ should
not unset __EVEX256__.
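
To make the proposed rule concrete, here is a small truth-table style sketch in C. It models the suggestion only: nothing here is committed GCC behavior, and the struct and function names are hypothetical.

```c
#include <stdbool.h>

/* Which of the two macros would be defined under the proposal. */
struct evex_macros {
    bool evex256;   /* stands for the proposed __EVEX256__ */
    bool evex512;   /* stands for __EVEX512__ */
};

/* Hypothetical model of the rule suggested above:
   __EVEX256__ follows __AVX512VL__, and __EVEX512__ never clears it. */
struct evex_macros evex_macros_for(bool avx512vl, bool evex512)
{
    struct evex_macros m;
    m.evex256 = avx512vl;   /* set/unset along with AVX512VL only */
    m.evex512 = evex512;    /* independent of evex256 */
    return m;
}
```

Under this model, the check `defined(__EVEX256__) && !defined(__EVEX512__)` from the quoted snippet corresponds to evex_macros_for(true, false).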

> Hopefully you can align it with whatever Clang does:
> https://discourse.llvm.org/t/rfc-design-for-avx10-feature-support/72661/18
