[00/12] AVX10.2: Support new instructions

Series

AVX10.2: Support new instructions

  [00/12] AVX10.2: Support new instructions
  [01/12] i386: Refactor m512-check.h
  [02/12,1/2] AVX10.2: Support media instructions
  [03/12,2/2] AVX10.2: Support media instructions
  [04/12] AVX10.2: Support convert instructions
  [05/12,1/2] AVX10.2: Support BF16 instructions
  [06/12,2/2] AVX10.2: Support BF16 instructions
  [07/12,1/2] AVX10.2: Support saturating convert instructions
  [08/12,2/2] AVX10.2: Support saturating convert instructions
  [09/12] AVX10.2: Support minmax instructions
  [10/12] AVX10.2: Support vector copy instructions
  [11/12] AVX10.2: Support compare instructions
  [12/12] i386: Add bf8 -> fp16 intrin

Message ID

20240819085717.193256-1-haochen.jiang@intel.com

Headers

DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 15F6E3864C64
From: Haochen Jiang <haochen.jiang@intel.com>
To: gcc-patches@gcc.gnu.org
Cc: hongtao.liu@intel.com,
	zewei.mo@pitt.edu,
	ubizjak@gmail.com
Subject: [PATCH 00/12] AVX10.2: Support new instructions
Date: Mon, 19 Aug 2024 01:56:44 -0700
Message-ID: <20240819085717.193256-1-haochen.jiang@intel.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: list
Errors-To: gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org

Message

Haochen Jiang Aug. 19, 2024, 8:56 a.m. UTC

Hi all,

The AVX10.2 ymm rounding patches were merged to trunk about
6 hours ago. As mentioned before, the next step is AVX10.2 new
instruction support.

This patch series can be divided into three parts.

The first patch refactors m512-check.h in the testsuite so that the
AVX-512 helper functions and unions can be reused, and so that ABI
warnings are avoided when using AVX10.

The following ten patches add support for all AVX10.2 new instructions,
including:

  - AI Datatypes, Conversions, and post-Convolution Instructions.
  - Media Acceleration.
  - IEEE-754-2019 Minimum and Maximum Support.
  - Saturating Conversions.
  - Zero-extending Partial Vector Copies.
  - FP Scalar Comparison.
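
To make two of the listed categories concrete, here is a minimal scalar C model of the semantics they target. It uses only the C standard library and is not the GCC patterns or intrinsics from this series. ieee_minimum/ieee_maximum follow the IEEE 754-2019 minimum/maximum operations (a NaN input propagates, and -0.0 orders below +0.0). This differs from the legacy SSE minps/maxps, which return the second operand when either input is NaN. sat_cvtt_f32_i32 is a hypothetical helper for a saturating, truncating float-to-int32 conversion. The legacy cvttps2dq instead returns the integer indefinite value 0x80000000 for out-of-range inputs. Mapping NaN to 0 is an assumption of this sketch and should be checked against the AVX10.2 specification.

```c
#include <math.h>
#include <stdint.h>

/* IEEE 754-2019 minimum: NaN in either input yields NaN; -0.0 < +0.0. */
static float ieee_minimum(float a, float b)
{
    if (isnan(a) || isnan(b))
        return NAN;                 /* propagate a quiet NaN */
    if (a == 0.0f && b == 0.0f)
        return signbit(a) ? a : b;  /* prefer -0.0 over +0.0 */
    return a < b ? a : b;
}

/* IEEE 754-2019 maximum: NaN in either input yields NaN; +0.0 > -0.0. */
static float ieee_maximum(float a, float b)
{
    if (isnan(a) || isnan(b))
        return NAN;
    if (a == 0.0f && b == 0.0f)
        return signbit(a) ? b : a;  /* prefer +0.0 over -0.0 */
    return a > b ? a : b;
}

/* Hypothetical helper: truncate toward zero and clamp to the int32_t range.
   NaN -> 0 is an assumption of this sketch, not a statement about AVX10.2. */
static int32_t sat_cvtt_f32_i32(float x)
{
    if (isnan(x))
        return 0;
    if (x >= 2147483648.0f)         /* 2^31 is exactly representable */
        return INT32_MAX;
    if (x <= -2147483648.0f)
        return INT32_MIN;
    return (int32_t)x;              /* in range: C truncates toward zero */
}
```

With these definitions, ieee_minimum(0.0f, -0.0f) returns -0.0f. sat_cvtt_f32_i32(3.0e10f) returns INT32_MAX rather than the integer indefinite value.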

For the FP Scalar Comparison part (a.k.a. comx instructions), we will only
provide pattern support, not intrinsic support, since for common usage it
is redundant with the existing comi intrinsics. We will also add some
optimizations for common usage with comx instructions afterwards. If there
are strong requests, we will add intrinsic support in the future.

The final patch adds a bf8 -> fp16 intrinsic for convenience. Since bf8
and fp16 have the same number of exponent bits, converting bf8 to fp16
only requires extending the fraction part, so we implement it with a
sequence of existing instructions instead of a new instruction. This is
the same approach as for bf16 -> fp32 conversion.
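
A minimal C sketch of the bit-level reasoning above follows. It assumes bf8 here denotes the E5M2 format (1 sign, 5 exponent, 2 fraction bits) and fp16 is IEEE binary16 (1 sign, 5 exponent, 10 fraction bits). Both use a 5-bit exponent with bias 15. Placing the bf8 byte in the high byte of a 16-bit word and zero-filling the low fraction bits therefore yields the same value, including zeros, subnormals, infinities and NaNs. bf16 -> fp32 is the same trick with a 16-bit shift. All helper names are hypothetical and are not the intrinsics added by the patch. The instruction sequence mentioned in the comments is only one plausible lowering.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: widen one E5M2 (bf8) value to binary16 bits.
   Same exponent width and bias, so only the fraction is zero-extended. */
static uint16_t bf8_to_fp16_bits(uint8_t bf8)
{
    return (uint16_t)(bf8 << 8);
}

/* Hypothetical array form: a loop a vectorizer could lower to a
   zero-extend + shift sequence (e.g. vpmovzxbw then vpsllw on x86);
   the sequence the patch actually emits may differ. */
static void bf8_to_fp16_array(uint16_t *dst, const uint8_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = bf8_to_fp16_bits(src[i]);
}

/* Same idea for bf16 -> fp32, assuming float is IEEE binary32:
   a bf16 value is the upper half of a binary32 value. */
static float bf16_to_fp32(uint16_t bf16)
{
    uint32_t bits = (uint32_t)bf16 << 16;
    float f;
    memcpy(&f, &bits, sizeof f);  /* well-defined type pun */
    return f;
}
```

For example, the E5M2 encoding 0x3C (1.0) widens to the binary16 encoding 0x3C00, which is also 1.0.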

After all these patches are merged, the next step will be optimizations
based on the AVX10.2 new instructions, including vnni vectorization, bf16
vectorization, comx optimization, etc.

Bootstrapped on x86_64-pc-linux-gnu. Ok for trunk?

Thx,
Haochen

Comments

Hongtao Liu Aug. 26, 2024, 1:45 a.m. UTC

On Mon, Aug 19, 2024 at 4:57 PM Haochen Jiang <haochen.jiang@intel.com> wrote:
> Bootstrapped on x86_64-pc-linux-gnu. Ok for trunk?
Ok for all 12 patches.