Message ID | 20240916093819.12740-1-christophe.lyon@arm.com |
---|---|
Series | arm, MVE: Refactor the vst and vld intrinsics |
ping?

On Mon, 16 Sept 2024 at 11:39, Christophe Lyon <christophe.lyon@arm.com> wrote:
>
> From: Alfie Richards <Alfie.Richards@arm.com>
>
> Hi,
>
> This patch series refactors the MVE vst and vld intrinsics to use the builtins
> framework.
>
> This is a prerequisite for a later patch which adds GIMPLE folding, which in
> turn enables some optimisations that were being missed.
>
> I have cross-compiled on an x64 machine and regtested it.
>
> Ok for master?
>
> Thanks,
> Alfie Richards
>
> Alfie Richards (5):
>   arm: [MVE intrinsics] fix vst tests
>   arm: [MVE intrinsics] Add load_ext intrinsic shape
>   arm: [MVE intrinsics] Add load_extending and store_truncating function
>     bases
>   arm: [MVE intrinsics] Add support for predicated contiguous loads and
>     stores
>   arm: [MVE intrinsics] Rework MVE vld/vst intrinsics
>
> [...]
>
> 138 files changed, 783 insertions(+), 1805 deletions(-)
>
> --
> 2.34.1
>

From: Alfie Richards <Alfie.Richards@arm.com>

Hi,

This patch series refactors the MVE vst and vld intrinsics to use the builtins
framework.

This is a prerequisite for a later patch which adds GIMPLE folding, which in
turn enables some optimisations that were being missed.

I have cross-compiled on an x64 machine and regtested it.

Ok for master?

Thanks,
Alfie Richards

Alfie Richards (5):
  arm: [MVE intrinsics] fix vst tests
  arm: [MVE intrinsics] Add load_ext intrinsic shape
  arm: [MVE intrinsics] Add load_extending and store_truncating function
    bases
  arm: [MVE intrinsics] Add support for predicated contiguous loads and
    stores
  arm: [MVE intrinsics] Rework MVE vld/vst intrinsics

 gcc/config/arm/arm-mve-builtins-base.cc | 135 ++-
 gcc/config/arm/arm-mve-builtins-base.def | 20 +-
 gcc/config/arm/arm-mve-builtins-base.h | 6 +
 gcc/config/arm/arm-mve-builtins-functions.h | 119 ++-
 gcc/config/arm/arm-mve-builtins-shapes.cc | 30 +-
 gcc/config/arm/arm-mve-builtins-shapes.h | 1 +
 gcc/config/arm/arm-mve-builtins.cc | 19 +-
 gcc/config/arm/arm-protos.h | 3 +
 gcc/config/arm/arm.cc | 15 +
 gcc/config/arm/arm_mve.h | 978 +-----------------
 gcc/config/arm/arm_mve_builtins.def | 38 -
 gcc/config/arm/iterators.md | 37 +-
 gcc/config/arm/mve.md | 662 ++++--------
 gcc/config/arm/unspecs.md | 29 +-
 .../arm/mve/intrinsics/vst1q_p_f16.c | 4 +-
 .../arm/mve/intrinsics/vst1q_p_f32.c | 4 +-
 .../arm/mve/intrinsics/vst1q_p_s16.c | 4 +-
 .../arm/mve/intrinsics/vst1q_p_s32.c | 4 +-
 .../arm/mve/intrinsics/vst1q_p_s8.c | 4 +-
 .../arm/mve/intrinsics/vst1q_p_u16.c | 4 +-
 .../arm/mve/intrinsics/vst1q_p_u32.c | 4 +-
 .../arm/mve/intrinsics/vst1q_p_u8.c | 4 +-
 .../gcc.target/arm/mve/intrinsics/vst2q_f16.c | 4 +-
 .../gcc.target/arm/mve/intrinsics/vst2q_f32.c | 4 +-
 .../gcc.target/arm/mve/intrinsics/vst2q_s16.c | 4 +-
 .../gcc.target/arm/mve/intrinsics/vst2q_s32.c | 4 +-
 .../gcc.target/arm/mve/intrinsics/vst2q_s8.c | 4 +-
 .../gcc.target/arm/mve/intrinsics/vst2q_u16.c | 4 +-
 .../gcc.target/arm/mve/intrinsics/vst2q_u32.c | 4 +-
 .../gcc.target/arm/mve/intrinsics/vst2q_u8.c | 4 +-
 .../gcc.target/arm/mve/intrinsics/vst4q_f16.c | 4 +-
 .../gcc.target/arm/mve/intrinsics/vst4q_f32.c | 4 +-
 .../gcc.target/arm/mve/intrinsics/vst4q_s16.c | 4 +-
 .../gcc.target/arm/mve/intrinsics/vst4q_s32.c | 4 +-
 .../gcc.target/arm/mve/intrinsics/vst4q_s8.c | 4 +-
 .../gcc.target/arm/mve/intrinsics/vst4q_u16.c | 4 +-
 .../gcc.target/arm/mve/intrinsics/vst4q_u32.c | 4 +-
 .../gcc.target/arm/mve/intrinsics/vst4q_u8.c | 4 +-
 .../arm/mve/intrinsics/vstrbq_p_s16.c | 4 +-
 .../arm/mve/intrinsics/vstrbq_p_s32.c | 4 +-
 .../arm/mve/intrinsics/vstrbq_p_s8.c | 4 +-
 .../arm/mve/intrinsics/vstrbq_p_u16.c | 4 +-
 .../arm/mve/intrinsics/vstrbq_p_u32.c | 4 +-
 .../arm/mve/intrinsics/vstrbq_p_u8.c | 4 +-
 .../arm/mve/intrinsics/vstrbq_s16.c | 4 +-
 .../arm/mve/intrinsics/vstrbq_s32.c | 4 +-
 .../gcc.target/arm/mve/intrinsics/vstrbq_s8.c | 4 +-
 .../intrinsics/vstrbq_scatter_offset_p_s16.c | 4 +-
 .../intrinsics/vstrbq_scatter_offset_p_s32.c | 4 +-
 .../intrinsics/vstrbq_scatter_offset_p_s8.c | 4 +-
 .../intrinsics/vstrbq_scatter_offset_p_u16.c | 4 +-
 .../intrinsics/vstrbq_scatter_offset_p_u32.c | 4 +-
 .../intrinsics/vstrbq_scatter_offset_p_u8.c | 4 +-
 .../intrinsics/vstrbq_scatter_offset_s16.c | 4 +-
 .../intrinsics/vstrbq_scatter_offset_s32.c | 4 +-
 .../mve/intrinsics/vstrbq_scatter_offset_s8.c | 4 +-
 .../intrinsics/vstrbq_scatter_offset_u16.c | 4 +-
 .../intrinsics/vstrbq_scatter_offset_u32.c | 4 +-
 .../mve/intrinsics/vstrbq_scatter_offset_u8.c | 4 +-
 .../arm/mve/intrinsics/vstrbq_u16.c | 4 +-
 .../arm/mve/intrinsics/vstrbq_u32.c | 4 +-
 .../gcc.target/arm/mve/intrinsics/vstrbq_u8.c | 4 +-
 .../intrinsics/vstrdq_scatter_base_p_s64.c | 4 +-
 .../intrinsics/vstrdq_scatter_base_p_u64.c | 4 +-
 .../mve/intrinsics/vstrdq_scatter_base_s64.c | 4 +-
 .../mve/intrinsics/vstrdq_scatter_base_u64.c | 4 +-
 .../intrinsics/vstrdq_scatter_base_wb_p_s64.c | 4 +-
 .../intrinsics/vstrdq_scatter_base_wb_p_u64.c | 4 +-
 .../intrinsics/vstrdq_scatter_base_wb_s64.c | 4 +-
 .../intrinsics/vstrdq_scatter_base_wb_u64.c | 4 +-
 .../intrinsics/vstrdq_scatter_offset_p_s64.c | 4 +-
 .../intrinsics/vstrdq_scatter_offset_p_u64.c | 4 +-
 .../intrinsics/vstrdq_scatter_offset_s64.c | 4 +-
 .../intrinsics/vstrdq_scatter_offset_u64.c | 4 +-
 .../vstrdq_scatter_shifted_offset_p_s64.c | 4 +-
 .../vstrdq_scatter_shifted_offset_p_u64.c | 4 +-
 .../vstrdq_scatter_shifted_offset_s64.c | 4 +-
 .../vstrdq_scatter_shifted_offset_u64.c | 4 +-
 .../arm/mve/intrinsics/vstrhq_f16.c | 4 +-
 .../arm/mve/intrinsics/vstrhq_p_f16.c | 4 +-
 .../arm/mve/intrinsics/vstrhq_p_s16.c | 4 +-
 .../arm/mve/intrinsics/vstrhq_p_s32.c | 4 +-
 .../arm/mve/intrinsics/vstrhq_p_u16.c | 4 +-
 .../arm/mve/intrinsics/vstrhq_p_u32.c | 4 +-
 .../arm/mve/intrinsics/vstrhq_s16.c | 4 +-
 .../arm/mve/intrinsics/vstrhq_s32.c | 4 +-
 .../intrinsics/vstrhq_scatter_offset_f16.c | 4 +-
 .../intrinsics/vstrhq_scatter_offset_p_f16.c | 4 +-
 .../intrinsics/vstrhq_scatter_offset_p_s16.c | 4 +-
 .../intrinsics/vstrhq_scatter_offset_p_s32.c | 4 +-
 .../intrinsics/vstrhq_scatter_offset_p_u16.c | 4 +-
 .../intrinsics/vstrhq_scatter_offset_p_u32.c | 4 +-
 .../intrinsics/vstrhq_scatter_offset_s16.c | 4 +-
 .../intrinsics/vstrhq_scatter_offset_s32.c | 4 +-
 .../intrinsics/vstrhq_scatter_offset_u16.c | 4 +-
 .../intrinsics/vstrhq_scatter_offset_u32.c | 4 +-
 .../vstrhq_scatter_shifted_offset_f16.c | 4 +-
 .../vstrhq_scatter_shifted_offset_p_f16.c | 4 +-
 .../vstrhq_scatter_shifted_offset_p_s16.c | 4 +-
 .../vstrhq_scatter_shifted_offset_p_s32.c | 4 +-
 .../vstrhq_scatter_shifted_offset_p_u16.c | 4 +-
 .../vstrhq_scatter_shifted_offset_p_u32.c | 4 +-
 .../vstrhq_scatter_shifted_offset_s16.c | 4 +-
 .../vstrhq_scatter_shifted_offset_s32.c | 4 +-
 .../vstrhq_scatter_shifted_offset_u16.c | 4 +-
 .../vstrhq_scatter_shifted_offset_u32.c | 4 +-
 .../arm/mve/intrinsics/vstrhq_u16.c | 4 +-
 .../arm/mve/intrinsics/vstrhq_u32.c | 4 +-
 .../arm/mve/intrinsics/vstrwq_f32.c | 4 +-
 .../arm/mve/intrinsics/vstrwq_p_f32.c | 4 +-
 .../arm/mve/intrinsics/vstrwq_p_s32.c | 4 +-
 .../arm/mve/intrinsics/vstrwq_p_u32.c | 4 +-
 .../arm/mve/intrinsics/vstrwq_s32.c | 4 +-
 .../mve/intrinsics/vstrwq_scatter_base_f32.c | 4 +-
 .../intrinsics/vstrwq_scatter_base_p_f32.c | 4 +-
 .../intrinsics/vstrwq_scatter_base_p_s32.c | 4 +-
 .../intrinsics/vstrwq_scatter_base_p_u32.c | 4 +-
 .../mve/intrinsics/vstrwq_scatter_base_s32.c | 4 +-
 .../mve/intrinsics/vstrwq_scatter_base_u32.c | 4 +-
 .../intrinsics/vstrwq_scatter_base_wb_f32.c | 4 +-
 .../intrinsics/vstrwq_scatter_base_wb_p_f32.c | 4 +-
 .../intrinsics/vstrwq_scatter_base_wb_p_s32.c | 4 +-
 .../intrinsics/vstrwq_scatter_base_wb_p_u32.c | 4 +-
 .../intrinsics/vstrwq_scatter_base_wb_s32.c | 4 +-
 .../intrinsics/vstrwq_scatter_base_wb_u32.c | 4 +-
 .../intrinsics/vstrwq_scatter_offset_f32.c | 4 +-
 .../intrinsics/vstrwq_scatter_offset_p_f32.c | 4 +-
 .../intrinsics/vstrwq_scatter_offset_p_s32.c | 4 +-
 .../intrinsics/vstrwq_scatter_offset_p_u32.c | 4 +-
 .../intrinsics/vstrwq_scatter_offset_s32.c | 4 +-
 .../intrinsics/vstrwq_scatter_offset_u32.c | 4 +-
 .../vstrwq_scatter_shifted_offset_f32.c | 4 +-
 .../vstrwq_scatter_shifted_offset_p_f32.c | 4 +-
 .../vstrwq_scatter_shifted_offset_p_s32.c | 4 +-
 .../vstrwq_scatter_shifted_offset_p_u32.c | 4 +-
 .../vstrwq_scatter_shifted_offset_s32.c | 4 +-
 .../vstrwq_scatter_shifted_offset_u32.c | 4 +-
 .../arm/mve/intrinsics/vstrwq_u32.c | 4 +-
 138 files changed, 783 insertions(+), 1805 deletions(-)
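
For readers unfamiliar with the terms used in the patch titles, the following is a minimal scalar sketch in plain C of what an "extending load" and a "truncating store" compute, modelled on the byte/halfword forms (for example the vldrbq_s16 and vstrbq_s16 intrinsics). It is not code from this series and does not use arm_mve.h; the function names model_load_extending_s8_to_s16 and model_store_truncating_s16_to_u8 are hypothetical, and predication is deliberately left out.

```c
#include <stdint.h>

/* Scalar model only, not the GCC implementation.  An MVE vector is
   128 bits wide, so it holds 8 lanes of 16 bits.  */
enum { LANES16 = 8 };

/* Extending load: read 8 consecutive bytes from memory and sign-extend
   each one into a 16-bit lane.  (Hypothetical helper name.)  */
static void
model_load_extending_s8_to_s16 (int16_t lanes[LANES16], const int8_t *base)
{
  for (int i = 0; i < LANES16; i++)
    lanes[i] = (int16_t) base[i];
}

/* Truncating store: keep only the low 8 bits of each 16-bit lane and
   write them to 8 consecutive bytes.  (Hypothetical helper name.)  */
static void
model_store_truncating_s16_to_u8 (uint8_t *base, const int16_t lanes[LANES16])
{
  for (int i = 0; i < LANES16; i++)
    base[i] = (uint8_t) ((uint16_t) lanes[i] & 0xffu);
}
```

In this model a lane value of 0x1234 is stored as the single byte 0x34, and a memory byte of -3 is loaded as the 16-bit lane value -3.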