[00/12] AVX10.2: Support new instructions

Message ID 20240819085717.193256-1-haochen.jiang@intel.com

Message

Haochen Jiang Aug. 19, 2024, 8:56 a.m. UTC
Hi all,

The AVX10.2 ymm rounding patches were merged to trunk around
6 hours ago. As mentioned before, the next step is AVX10.2 new
instruction support.

This patch series can be divided into three parts.

The first patch will refactor m512-check.h under testsuite to reuse
AVX-512 helper functions and unions and avoid ABI warnings when using
AVX10.

The following ten patches add support for all new AVX10.2 instructions,
including:

  - AI Datatypes, Conversions, and post-Convolution Instructions.
  - Media Acceleration.
  - IEEE-754-2019 Minimum and Maximum Support.
  - Saturating Conversions.
  - Zero-extending Partial Vector Copies.
  - FP Scalar Comparison.

For the FP Scalar Comparison part (a.k.a. the comx instructions), we only
provide pattern support and no intrin support, since intrinsics would be
redundant with the comi ones for common usage. We will also add some
optimizations afterwards for common usage with comx instructions. If there
are strong requests, we will add intrin support in the future.

The final patch adds a bf8 -> fp16 intrin for convenience. Since bf8 and
fp16 have the same number of exponent bits, the conversion only needs to
widen the fraction part, so we use a sequence of instructions rather than
a new instruction. This is just like the bf16 -> fp32 conversion.
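
To illustrate the bf8 -> fp16 point, here is a minimal scalar sketch (not
the code from the patch). It assumes bf8 is the 1-5-2 layout (sign, 5
exponent bits, 2 fraction bits) and fp16 is IEEE binary16 (1-5-10), so the
conversion is a left shift by 8 bits, in the same way that bf16 -> fp32 is
a left shift by 16 bits. The function names are hypothetical and are not
the intrinsics added by this series.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: widen a bf8 value (assumed 1-5-2 layout) to the
   bit pattern of an fp16 value (1-5-10).  Sign and exponent fields line
   up, so only the fraction needs 8 zero bits appended.  */
static uint16_t
bf8_to_fp16_bits (uint8_t bf8)
{
  return (uint16_t) ((uint16_t) bf8 << 8);
}

/* Hypothetical helper: the analogous bf16 (1-8-7) -> fp32 (1-8-23)
   conversion, which appends 16 zero fraction bits.  */
static float
bf16_to_fp32 (uint16_t bf16)
{
  uint32_t bits = (uint32_t) bf16 << 16;
  float f;
  memcpy (&f, &bits, sizeof f);   /* Reinterpret the bits as a float.  */
  return f;
}
```

For example, under these assumptions the bf8 encoding of 1.0 is 0x3c and
the fp16 encoding of 1.0 is 0x3c00, so bf8_to_fp16_bits (0x3c) yields
0x3c00.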

After all these patches are merged, the next step will be optimizations
based on the new AVX10.2 instructions, including vnni vectorization, bf16
vectorization, comx optimization, etc.

Bootstrapped on x86-64-pc-linux-gnu. Ok for trunk?

Thx,
Haochen

Comments

Hongtao Liu Aug. 26, 2024, 1:45 a.m. UTC | #1
On Mon, Aug 19, 2024 at 4:57 PM Haochen Jiang <haochen.jiang@intel.com> wrote:
> Bootstrapped on x86-64-pc-linux-gnu. Ok for trunk?
Ok for all 12 patches.