[v2,0/4] aarch64: Add fp8 sve foundation

Message ID 20241108161020.921071-1-claudio.bantaloukas@arm.com

Claudio Bantaloukas Nov. 8, 2024, 4:10 p.m. UTC
The ACLE defines a new set of fp8 vector types and intrinsics that operate on
them. Some intrinsics treat the vectors as if they were bags of bits; others
require an additional argument of type fpm_t.
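
The split between the two kinds of intrinsics can be illustrated with a small
portable sketch. It does not use the ACLE types or intrinsics: fp8_bits,
fp8_format, reverse_fp8 and fp8_to_float are hypothetical names, and the
explicit format argument only stands in for the role that fpm_t plays. The two
layouts shown (E5M2 and E4M3) are an assumption for illustration, since this
cover letter does not name the formats, and Inf/NaN encodings are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration only; not the ACLE definitions. */
typedef uint8_t fp8_bits;                        /* one opaque 8-bit element */
typedef enum { FMT_E5M2, FMT_E4M3 } fp8_format;  /* stands in for the mode argument */

/* Format-agnostic operation: reversing elements only moves bytes around,
   so it never needs to know how the 8 bits are interpreted. */
static void reverse_fp8(fp8_bits *v, size_t n)
{
  assert(v != NULL || n == 0);
  for (size_t i = 0; i < n / 2; i++) {
    fp8_bits tmp = v[i];
    v[i] = v[n - 1 - i];
    v[n - 1 - i] = tmp;
  }
}

/* Exact power of two for the small exponents used below. */
static float pow2f_small(int e)
{
  float r = 1.0f;
  for (; e > 0; e--) r *= 2.0f;
  for (; e < 0; e++) r *= 0.5f;
  return r;
}

/* Format-dependent operation: widening to float needs the format, because
   the same byte splits into exponent and mantissa differently.
   Handles finite values only; Inf/NaN encodings are not modelled. */
static float fp8_to_float(fp8_bits b, fp8_format fmt)
{
  int man_bits = (fmt == FMT_E5M2) ? 2 : 3;
  int exp_bits = 7 - man_bits;
  int bias = (1 << (exp_bits - 1)) - 1;
  int sign = b >> 7;
  int exp = (b >> man_bits) & ((1 << exp_bits) - 1);
  int man = b & ((1 << man_bits) - 1);
  float mag;

  if (exp == 0)  /* subnormal: no implicit leading one */
    mag = (float) man * pow2f_small(1 - bias - man_bits);
  else           /* normal: implicit leading one above the mantissa */
    mag = (float) ((1 << man_bits) | man) * pow2f_small(exp - bias - man_bits);
  return sign ? -mag : mag;
}
```

In this model the byte 0x3C decodes to 1.0 when read as E5M2 and to 1.5 when
read as E4M3, while reverse_fp8 gives the same result whichever format the
caller has in mind.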

The following patches introduce:
- the types
- intrinsics that operate without the fpm_t type
- foundational changes that will be used to implement intrinsics that take an
  fpm_t as their final argument
- conversion intrinsics

Compared to v1, this series adds:
- A change to fix the return of scalar fp8 values
- Tests for sve<->simd conversions
- Support for svcvt* intrinsics, along with supporting shapes

Is this ok for master? I do not have commit rights yet, so if it is ok, could someone commit it on my behalf?

Regression tested on aarch64-unknown-linux-gnu.

Thanks,
Claudio Bantaloukas


Claudio Bantaloukas (4):
  aarch64: return scalar fp8 values in fp registers
  aarch64: Add basic svmfloat8_t support to arm_sve.h
  aarch64: specify fpm mode in function instances and groups
  aarch64: add svcvt* FP8 intrinsics

 .../aarch64/aarch64-sve-builtins-base.cc      |  15 +-
 .../aarch64/aarch64-sve-builtins-base.def     |   3 +-
 .../aarch64/aarch64-sve-builtins-shapes.cc    |  77 ++++-
 .../aarch64/aarch64-sve-builtins-shapes.h     |   2 +
 .../aarch64/aarch64-sve-builtins-sme.def      | 130 ++++----
 .../aarch64/aarch64-sve-builtins-sve2.cc      |  48 ++-
 .../aarch64/aarch64-sve-builtins-sve2.def     | 108 ++++---
 .../aarch64/aarch64-sve-builtins-sve2.h       |   6 +
 gcc/config/aarch64/aarch64-sve-builtins.cc    |  61 +++-
 gcc/config/aarch64/aarch64-sve-builtins.def   |   7 +-
 gcc/config/aarch64/aarch64-sve-builtins.h     |  26 +-
 gcc/config/aarch64/aarch64-sve2.md            |  52 +++
 gcc/config/aarch64/aarch64.cc                 |   3 +-
 gcc/config/aarch64/aarch64.h                  |   5 +
 gcc/config/aarch64/iterators.md               |  25 ++
 .../aarch64/sve/acle/general-c++/mangle_1.C   |   2 +
 .../aarch64/sve/acle/general-c++/mangle_2.C   |   2 +
 .../gcc.target/aarch64/fp8_scalar_1.c         |   4 +-
 .../aarch64/sve/acle/asm/clasta_mf8.c         |  52 +++
 .../aarch64/sve/acle/asm/clastb_mf8.c         |  52 +++
 .../aarch64/sve/acle/asm/create2_1.c          |  15 +
 .../aarch64/sve/acle/asm/create3_1.c          |  11 +
 .../aarch64/sve/acle/asm/create4_1.c          |  12 +
 .../aarch64/sve/acle/asm/dup_lane_mf8.c       | 124 ++++++++
 .../gcc.target/aarch64/sve/acle/asm/dup_mf8.c |  31 ++
 .../aarch64/sve/acle/asm/dup_neonq_mf8.c      |  30 ++
 .../aarch64/sve/acle/asm/dupq_lane_mf8.c      |  48 +++
 .../gcc.target/aarch64/sve/acle/asm/ext_mf8.c |  73 +++++
 .../aarch64/sve/acle/asm/get2_mf8.c           |  55 ++++
 .../aarch64/sve/acle/asm/get3_mf8.c           | 108 +++++++
 .../aarch64/sve/acle/asm/get4_mf8.c           | 179 +++++++++++
 .../aarch64/sve/acle/asm/get_neonq_mf8.c      |  33 ++
 .../aarch64/sve/acle/asm/insr_mf8.c           |  22 ++
 .../aarch64/sve/acle/asm/lasta_mf8.c          |  12 +
 .../aarch64/sve/acle/asm/lastb_mf8.c          |  12 +
 .../gcc.target/aarch64/sve/acle/asm/ld1_mf8.c | 162 ++++++++++
 .../aarch64/sve/acle/asm/ld1ro_mf8.c          | 121 +++++++
 .../aarch64/sve/acle/asm/ld1rq_mf8.c          | 137 ++++++++
 .../gcc.target/aarch64/sve/acle/asm/ld2_mf8.c | 204 ++++++++++++
 .../gcc.target/aarch64/sve/acle/asm/ld3_mf8.c | 246 +++++++++++++++
 .../gcc.target/aarch64/sve/acle/asm/ld4_mf8.c | 290 +++++++++++++++++
 .../aarch64/sve/acle/asm/ldff1_mf8.c          |  91 ++++++
 .../aarch64/sve/acle/asm/ldnf1_mf8.c          | 155 +++++++++
 .../aarch64/sve/acle/asm/ldnt1_mf8.c          | 162 ++++++++++
 .../gcc.target/aarch64/sve/acle/asm/len_mf8.c |  12 +
 .../aarch64/sve/acle/asm/reinterpret_bf16.c   |  17 +
 .../aarch64/sve/acle/asm/reinterpret_f16.c    |  17 +
 .../aarch64/sve/acle/asm/reinterpret_f32.c    |  17 +
 .../aarch64/sve/acle/asm/reinterpret_f64.c    |  17 +
 .../aarch64/sve/acle/asm/reinterpret_mf8.c    | 297 ++++++++++++++++++
 .../aarch64/sve/acle/asm/reinterpret_s16.c    |  17 +
 .../aarch64/sve/acle/asm/reinterpret_s32.c    |  17 +
 .../aarch64/sve/acle/asm/reinterpret_s64.c    |  17 +
 .../aarch64/sve/acle/asm/reinterpret_s8.c     |  17 +
 .../aarch64/sve/acle/asm/reinterpret_u16.c    |  28 ++
 .../aarch64/sve/acle/asm/reinterpret_u32.c    |  28 ++
 .../aarch64/sve/acle/asm/reinterpret_u64.c    |  28 ++
 .../aarch64/sve/acle/asm/reinterpret_u8.c     |  28 ++
 .../gcc.target/aarch64/sve/acle/asm/rev_mf8.c |  21 ++
 .../gcc.target/aarch64/sve/acle/asm/sel_mf8.c |  30 ++
 .../aarch64/sve/acle/asm/set2_mf8.c           |  41 +++
 .../aarch64/sve/acle/asm/set3_mf8.c           |  63 ++++
 .../aarch64/sve/acle/asm/set4_mf8.c           |  87 +++++
 .../aarch64/sve/acle/asm/set_neonq_mf8.c      |  23 ++
 .../aarch64/sve/acle/asm/splice_mf8.c         |  33 ++
 .../gcc.target/aarch64/sve/acle/asm/st1_mf8.c | 162 ++++++++++
 .../gcc.target/aarch64/sve/acle/asm/st2_mf8.c | 204 ++++++++++++
 .../gcc.target/aarch64/sve/acle/asm/st3_mf8.c | 246 +++++++++++++++
 .../gcc.target/aarch64/sve/acle/asm/st4_mf8.c | 290 +++++++++++++++++
 .../aarch64/sve/acle/asm/stnt1_mf8.c          | 162 ++++++++++
 .../gcc.target/aarch64/sve/acle/asm/tbl_mf8.c |  30 ++
 .../aarch64/sve/acle/asm/test_sve_acle.h      |   2 +-
 .../aarch64/sve/acle/asm/trn1_mf8.c           |  30 ++
 .../aarch64/sve/acle/asm/trn1q_mf8.c          |  33 ++
 .../aarch64/sve/acle/asm/trn2_mf8.c           |  30 ++
 .../aarch64/sve/acle/asm/trn2q_mf8.c          |  33 ++
 .../aarch64/sve/acle/asm/undef2_1.c           |   7 +
 .../aarch64/sve/acle/asm/undef3_1.c           |   7 +
 .../aarch64/sve/acle/asm/undef4_1.c           |   7 +
 .../gcc.target/aarch64/sve/acle/asm/undef_1.c |   7 +
 .../aarch64/sve/acle/asm/uzp1_mf8.c           |  30 ++
 .../aarch64/sve/acle/asm/uzp1q_mf8.c          |  33 ++
 .../aarch64/sve/acle/asm/uzp2_mf8.c           |  30 ++
 .../aarch64/sve/acle/asm/uzp2q_mf8.c          |  33 ++
 .../aarch64/sve/acle/asm/zip1_mf8.c           |  30 ++
 .../aarch64/sve/acle/asm/zip1q_mf8.c          |  33 ++
 .../aarch64/sve/acle/asm/zip2_mf8.c           |  30 ++
 .../aarch64/sve/acle/asm/zip2q_mf8.c          |  33 ++
 .../general-c/unary_convert_narrowxn_fpm_1.c  |  38 +++
 .../acle/general-c/unary_convertxn_fpm_1.c    |  60 ++++
 .../gcc.target/aarch64/sve/pcs/annotate_1.c   |   8 +
 .../gcc.target/aarch64/sve/pcs/annotate_2.c   |   8 +
 .../gcc.target/aarch64/sve/pcs/annotate_3.c   |   8 +
 .../gcc.target/aarch64/sve/pcs/annotate_4.c   |  12 +
 .../gcc.target/aarch64/sve/pcs/annotate_5.c   |  12 +
 .../gcc.target/aarch64/sve/pcs/annotate_6.c   |  12 +
 .../gcc.target/aarch64/sve/pcs/annotate_7.c   |   8 +
 .../aarch64/sve/pcs/args_5_be_mf8.c           |  63 ++++
 .../aarch64/sve/pcs/args_5_le_mf8.c           |  58 ++++
 .../aarch64/sve/pcs/args_6_be_mf8.c           |  71 +++++
 .../aarch64/sve/pcs/args_6_le_mf8.c           |  70 +++++
 .../aarch64/sve/pcs/gnu_vectors_1.c           |  12 +-
 .../aarch64/sve/pcs/gnu_vectors_2.c           |  10 +-
 .../gcc.target/aarch64/sve/pcs/return_4.c     |  21 +-
 .../aarch64/sve/pcs/return_4_1024.c           |  21 +-
 .../gcc.target/aarch64/sve/pcs/return_4_128.c |  21 +-
 .../aarch64/sve/pcs/return_4_2048.c           |  21 +-
 .../gcc.target/aarch64/sve/pcs/return_4_256.c |  21 +-
 .../gcc.target/aarch64/sve/pcs/return_4_512.c |  21 +-
 .../gcc.target/aarch64/sve/pcs/return_5.c     |  21 +-
 .../aarch64/sve/pcs/return_5_1024.c           |  21 +-
 .../gcc.target/aarch64/sve/pcs/return_5_128.c |  21 +-
 .../aarch64/sve/pcs/return_5_2048.c           |  21 +-
 .../gcc.target/aarch64/sve/pcs/return_5_256.c |  21 +-
 .../gcc.target/aarch64/sve/pcs/return_5_512.c |  21 +-
 .../gcc.target/aarch64/sve/pcs/return_6.c     |  24 ++
 .../aarch64/sve/pcs/return_6_1024.c           |  22 ++
 .../gcc.target/aarch64/sve/pcs/return_6_128.c |  19 ++
 .../aarch64/sve/pcs/return_6_2048.c           |  22 ++
 .../gcc.target/aarch64/sve/pcs/return_6_256.c |  22 ++
 .../gcc.target/aarch64/sve/pcs/return_6_512.c |  22 ++
 .../gcc.target/aarch64/sve/pcs/return_7.c     |  28 ++
 .../gcc.target/aarch64/sve/pcs/return_8.c     |  29 ++
 .../gcc.target/aarch64/sve/pcs/return_9.c     |  33 ++
 .../aarch64/sve/pcs/varargs_2_mf8.c           | 170 ++++++++++
 .../aarch64/sve2/acle/asm/cvt_mf8.c           |  48 +++
 .../aarch64/sve2/acle/asm/cvtlt_mf8.c         |  47 +++
 .../aarch64/sve2/acle/asm/cvtn_mf8.c          |  59 ++++
 .../aarch64/sve2/acle/asm/tbl2_mf8.c          |  31 ++
 .../aarch64/sve2/acle/asm/tbx_mf8.c           |  37 +++
 .../aarch64/sve2/acle/asm/whilerw_mf8.c       |  50 +++
 .../aarch64/sve2/acle/asm/whilewr_mf8.c       |  50 +++
 gcc/testsuite/lib/target-supports.exp         |   2 +-
 133 files changed, 6618 insertions(+), 169 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/clasta_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/clastb_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/dup_lane_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/dup_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/dup_neonq_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/dupq_lane_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ext_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/get2_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/get3_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/get4_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/get_neonq_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/insr_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lasta_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lastb_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1rq_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld2_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld3_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld4_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnt1_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/len_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/rev_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/sel_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/set2_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/set3_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/set4_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/set_neonq_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/splice_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st2_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st3_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st4_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/stnt1_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tbl_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/trn1_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/trn1q_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/trn2_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/trn2q_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/uzp1_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/uzp1q_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/uzp2_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/uzp2q_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/zip1_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/zip1q_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/zip2_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/zip2q_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/unary_convert_narrowxn_fpm_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/unary_convertxn_fpm_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pcs/args_5_be_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pcs/args_5_le_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pcs/args_6_be_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pcs/args_6_le_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pcs/varargs_2_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/cvt_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/cvtlt_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/cvtn_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/tbl2_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/tbx_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/whilerw_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/whilewr_mf8.c