Message ID | AM0PR08MB538007365371A43722463E699BC80@AM0PR08MB5380.eurprd08.prod.outlook.com |
---|---|
State | New |
Series | [ARM] : Fix for MVE ACLE intrinsics with writeback (PR94317). |
Hi Srinath,

> -----Original Message-----
> From: Srinath Parvathaneni <Srinath.Parvathaneni@arm.com>
> Sent: 31 March 2020 17:13
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>
> Subject: [GCC][PATCH][ARM]: Fix for MVE ACLE intrinsics with writeback
> (PR94317).
>
> Hello,
>
> Following MVE ACLE intrinsics have an issue with writeback to the base
> address.
>
> vldrdq_gather_base_wb_s64, vldrdq_gather_base_wb_u64,
> vldrdq_gather_base_wb_z_s64, vldrdq_gather_base_wb_z_u64,
> vldrwq_gather_base_wb_s32, vldrwq_gather_base_wb_u32,
> vldrwq_gather_base_wb_z_s32, vldrwq_gather_base_wb_z_u32,
> vldrwq_gather_base_wb_f32, vldrwq_gather_base_wb_z_f32.
>
> This patch fixes the bug reported in PR94317 by adding separate builtin calls
> to update the result and writeback to base address for the above intrinsics.
>
> Please refer to M-profile Vector Extension (MVE) intrinsics [1] for more
> details.
> [1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics
>
> Regression tested on arm-none-eabi and found no regressions.
>
> Ok for trunk?

Thanks, I've pushed this patch to master.
Kyrill

>
> Thanks,
> Srinath.
>
> gcc/ChangeLog:
>
> 2020-03-31 Srinath Parvathaneni <srinath.parvathaneni@arm.com>
>
> PR target/94317
> * config/arm/arm-builtins.c (LDRGBWBXU_QUALIFIERS): Define.
> (LDRGBWBXU_Z_QUALIFIERS): Likewise.
> * config/arm/arm_mve.h (__arm_vldrdq_gather_base_wb_s64): Modify
> intrinsic definition by adding a new builtin call to writeback into base
> address.
> (__arm_vldrdq_gather_base_wb_u64): Likewise.
> (__arm_vldrdq_gather_base_wb_z_s64): Likewise.
> (__arm_vldrdq_gather_base_wb_z_u64): Likewise.
> (__arm_vldrwq_gather_base_wb_s32): Likewise.
> (__arm_vldrwq_gather_base_wb_u32): Likewise.
> (__arm_vldrwq_gather_base_wb_z_s32): Likewise.
> (__arm_vldrwq_gather_base_wb_z_u32): Likewise.
> (__arm_vldrwq_gather_base_wb_f32): Likewise.
> (__arm_vldrwq_gather_base_wb_z_f32): Likewise.
> * config/arm/arm_mve_builtins.def (vldrwq_gather_base_wb_z_u): Modify
> builtin's qualifier.
> (vldrdq_gather_base_wb_z_u): Likewise.
> (vldrwq_gather_base_wb_u): Likewise.
> (vldrdq_gather_base_wb_u): Likewise.
> (vldrwq_gather_base_wb_z_s): Likewise.
> (vldrwq_gather_base_wb_z_f): Likewise.
> (vldrdq_gather_base_wb_z_s): Likewise.
> (vldrwq_gather_base_wb_s): Likewise.
> (vldrwq_gather_base_wb_f): Likewise.
> (vldrdq_gather_base_wb_s): Likewise.
> (vldrwq_gather_base_nowb_z_u): Define builtin.
> (vldrdq_gather_base_nowb_z_u): Likewise.
> (vldrwq_gather_base_nowb_u): Likewise.
> (vldrdq_gather_base_nowb_u): Likewise.
> (vldrwq_gather_base_nowb_z_s): Likewise.
> (vldrwq_gather_base_nowb_z_f): Likewise.
> (vldrdq_gather_base_nowb_z_s): Likewise.
> (vldrwq_gather_base_nowb_s): Likewise.
> (vldrwq_gather_base_nowb_f): Likewise.
> (vldrdq_gather_base_nowb_s): Likewise.
> * config/arm/mve.md (mve_vldrwq_gather_base_nowb_<supf>v4si): Define RTL
> pattern.
> (mve_vldrwq_gather_base_wb_<supf>v4si): Modify RTL pattern.
> (mve_vldrwq_gather_base_nowb_z_<supf>v4si): Define RTL pattern.
> (mve_vldrwq_gather_base_wb_z_<supf>v4si): Modify RTL pattern.
> (mve_vldrwq_gather_base_wb_fv4sf): Modify RTL pattern.
> (mve_vldrwq_gather_base_nowb_fv4sf): Define RTL pattern.
> (mve_vldrwq_gather_base_wb_z_fv4sf): Modify RTL pattern.
> (mve_vldrwq_gather_base_nowb_z_fv4sf): Define RTL pattern.
> (mve_vldrdq_gather_base_nowb_<supf>v4di): Define RTL pattern.
> (mve_vldrdq_gather_base_wb_<supf>v4di): Modify RTL pattern.
> (mve_vldrdq_gather_base_nowb_z_<supf>v4di): Define RTL pattern.
> (mve_vldrdq_gather_base_wb_z_<supf>v4di): Modify RTL pattern.
>
> gcc/testsuite/ChangeLog:
>
> 2020-03-31 Srinath Parvathaneni <srinath.parvathaneni@arm.com>
>
> PR target/94317
> * gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_s64.c: Modify
> * gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_u64.c: Likewise.
> * gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_s64.c: > Likewise. > * gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_u64.c: > Likewise. > * gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_f32.c: > Likewise. > * gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_s32.c: > Likewise. > * gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_u32.c: > Likewise. > * gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_f32.c: > Likewise. > * gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_s32.c: > Likewise. > * gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_u32.c: > Likewise. > > > > ############### Attachment also inlined for ease of reply > ############### > > > diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c > index > 56f0db21ea95dcd738877daba27f1cb60f0d5a32..832b9107424fd9a4a0ee272 > b773b3d0929172370 100644 > --- a/gcc/config/arm/arm-builtins.c > +++ b/gcc/config/arm/arm-builtins.c > @@ -719,6 +719,17 @@ > arm_quinop_unone_unone_unone_unone_imm_unone_qualifiers[SIMD_M > AX_BUILTIN_ARGS] > (arm_quinop_unone_unone_unone_unone_imm_unone_qualifiers) > > static enum arm_type_qualifiers > +arm_ldrgbwbxu_qualifiers[SIMD_MAX_BUILTIN_ARGS] > + = { qualifier_unsigned, qualifier_unsigned, qualifier_immediate}; > +#define LDRGBWBXU_QUALIFIERS (arm_ldrgbwbxu_qualifiers) > + > +static enum arm_type_qualifiers > +arm_ldrgbwbxu_z_qualifiers[SIMD_MAX_BUILTIN_ARGS] > + = { qualifier_unsigned, qualifier_unsigned, qualifier_immediate, > + qualifier_unsigned}; > +#define LDRGBWBXU_Z_QUALIFIERS (arm_ldrgbwbxu_z_qualifiers) > + > +static enum arm_type_qualifiers > arm_ldrgbwbs_qualifiers[SIMD_MAX_BUILTIN_ARGS] > = { qualifier_none, qualifier_unsigned, qualifier_immediate}; #define > LDRGBWBS_QUALIFIERS (arm_ldrgbwbs_qualifiers) diff --git > a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h index > f1dcdc2153217e796c58526ba0e5be11be642234..47a6268e0800958f49d4623 > 8fe34ec749d243929 100644 > --- a/gcc/config/arm/arm_mve.h > 
+++ b/gcc/config/arm/arm_mve.h > @@ -13903,8 +13903,8 @@ __attribute__ ((__always_inline__, > __gnu_inline__, __artificial__)) > __arm_vldrdq_gather_base_wb_s64 (uint64x2_t * __addr, const int __offset) > { > int64x2_t > - result = __builtin_mve_vldrdq_gather_base_wb_sv2di (*__addr, __offset); > - __addr += __offset; > + result = __builtin_mve_vldrdq_gather_base_nowb_sv2di (*__addr, > + __offset); *__addr = __builtin_mve_vldrdq_gather_base_wb_sv2di > + (*__addr, __offset); > return result; > } > > @@ -13913,8 +13913,8 @@ __attribute__ ((__always_inline__, > __gnu_inline__, __artificial__)) > __arm_vldrdq_gather_base_wb_u64 (uint64x2_t * __addr, const int > __offset) { > uint64x2_t > - result = __builtin_mve_vldrdq_gather_base_wb_uv2di (*__addr, __offset); > - __addr += __offset; > + result = __builtin_mve_vldrdq_gather_base_nowb_uv2di (*__addr, > + __offset); *__addr = __builtin_mve_vldrdq_gather_base_wb_uv2di > + (*__addr, __offset); > return result; > } > > @@ -13923,8 +13923,8 @@ __attribute__ ((__always_inline__, > __gnu_inline__, __artificial__)) > __arm_vldrdq_gather_base_wb_z_s64 (uint64x2_t * __addr, const int > __offset, mve_pred16_t __p) { > int64x2_t > - result = __builtin_mve_vldrdq_gather_base_wb_z_sv2di (*__addr, __offset, > __p); > - __addr += __offset; > + result = __builtin_mve_vldrdq_gather_base_nowb_z_sv2di (*__addr, > + __offset, __p); *__addr = __builtin_mve_vldrdq_gather_base_wb_z_sv2di > + (*__addr, __offset, __p); > return result; > } > > @@ -13933,8 +13933,8 @@ __attribute__ ((__always_inline__, > __gnu_inline__, __artificial__)) > __arm_vldrdq_gather_base_wb_z_u64 (uint64x2_t * __addr, const int > __offset, mve_pred16_t __p) { > uint64x2_t > - result = __builtin_mve_vldrdq_gather_base_wb_z_uv2di (*__addr, > __offset, __p); > - __addr += __offset; > + result = __builtin_mve_vldrdq_gather_base_nowb_z_uv2di (*__addr, > + __offset, __p); *__addr = __builtin_mve_vldrdq_gather_base_wb_z_uv2di > + (*__addr, __offset, __p); > return result; > } > 
> @@ -13943,8 +13943,8 @@ __attribute__ ((__always_inline__, > __gnu_inline__, __artificial__)) > __arm_vldrwq_gather_base_wb_s32 (uint32x4_t * __addr, const int > __offset) { > int32x4_t > - result = __builtin_mve_vldrwq_gather_base_wb_sv4si (*__addr, __offset); > - __addr += __offset; > + result = __builtin_mve_vldrwq_gather_base_nowb_sv4si (*__addr, > + __offset); *__addr = __builtin_mve_vldrwq_gather_base_wb_sv4si > + (*__addr, __offset); > return result; > } > > @@ -13953,8 +13953,8 @@ __attribute__ ((__always_inline__, > __gnu_inline__, __artificial__)) > __arm_vldrwq_gather_base_wb_u32 (uint32x4_t * __addr, const int > __offset) { > uint32x4_t > - result = __builtin_mve_vldrwq_gather_base_wb_uv4si (*__addr, __offset); > - __addr += __offset; > + result = __builtin_mve_vldrwq_gather_base_nowb_uv4si (*__addr, > + __offset); *__addr = __builtin_mve_vldrwq_gather_base_wb_uv4si > + (*__addr, __offset); > return result; > } > > @@ -13963,8 +13963,8 @@ __attribute__ ((__always_inline__, > __gnu_inline__, __artificial__)) > __arm_vldrwq_gather_base_wb_z_s32 (uint32x4_t * __addr, const int > __offset, mve_pred16_t __p) { > int32x4_t > - result = __builtin_mve_vldrwq_gather_base_wb_z_sv4si (*__addr, > __offset, __p); > - __addr += __offset; > + result = __builtin_mve_vldrwq_gather_base_nowb_z_sv4si (*__addr, > + __offset, __p); *__addr = __builtin_mve_vldrwq_gather_base_wb_z_sv4si > + (*__addr, __offset, __p); > return result; > } > > @@ -13973,8 +13973,8 @@ __attribute__ ((__always_inline__, > __gnu_inline__, __artificial__)) > __arm_vldrwq_gather_base_wb_z_u32 (uint32x4_t * __addr, const int > __offset, mve_pred16_t __p) { > uint32x4_t > - result = __builtin_mve_vldrwq_gather_base_wb_z_uv4si (*__addr, > __offset, __p); > - __addr += __offset; > + result = __builtin_mve_vldrwq_gather_base_nowb_z_uv4si (*__addr, > + __offset, __p); *__addr = __builtin_mve_vldrwq_gather_base_wb_z_uv4si > + (*__addr, __offset, __p); > return result; > } > > @@ -19372,8 +19372,8 @@ 
__attribute__ ((__always_inline__, > __gnu_inline__, __artificial__)) > __arm_vldrwq_gather_base_wb_f32 (uint32x4_t * __addr, const int __offset) > { > float32x4_t > - result = __builtin_mve_vldrwq_gather_base_wb_fv4sf (*__addr, __offset); > - __addr += __offset; > + result = __builtin_mve_vldrwq_gather_base_nowb_fv4sf (*__addr, > + __offset); *__addr = __builtin_mve_vldrwq_gather_base_wb_fv4sf > + (*__addr, __offset); > return result; > } > > @@ -19382,8 +19382,8 @@ __attribute__ ((__always_inline__, > __gnu_inline__, __artificial__)) > __arm_vldrwq_gather_base_wb_z_f32 (uint32x4_t * __addr, const int > __offset, mve_pred16_t __p) { > float32x4_t > - result = __builtin_mve_vldrwq_gather_base_wb_z_fv4sf (*__addr, > __offset, __p); > - __addr += __offset; > + result = __builtin_mve_vldrwq_gather_base_nowb_z_fv4sf (*__addr, > + __offset, __p); *__addr = __builtin_mve_vldrwq_gather_base_wb_z_fv4sf > + (*__addr, __offset, __p); > return result; > } > > diff --git a/gcc/config/arm/arm_mve_builtins.def > b/gcc/config/arm/arm_mve_builtins.def > index > 2fb975944b9fdac9de4b5a1bec3962be410637f1..753e40a951d071c1ab77476 > a1cc4779e91689178 100644 > --- a/gcc/config/arm/arm_mve_builtins.def > +++ b/gcc/config/arm/arm_mve_builtins.def > @@ -847,16 +847,26 @@ VAR1 (STRSBWBS, vstrdq_scatter_base_wb_s, v2di) > VAR1 (STRSBWBS_P, vstrwq_scatter_base_wb_p_s, v4si) > VAR1 (STRSBWBS_P, vstrwq_scatter_base_wb_p_f, v4sf) > VAR1 (STRSBWBS_P, vstrdq_scatter_base_wb_p_s, v2di) > -VAR1 (LDRGBWBU_Z, vldrwq_gather_base_wb_z_u, v4si) > -VAR1 (LDRGBWBU_Z, vldrdq_gather_base_wb_z_u, v2di) > -VAR1 (LDRGBWBU, vldrwq_gather_base_wb_u, v4si) > -VAR1 (LDRGBWBU, vldrdq_gather_base_wb_u, v2di) > -VAR1 (LDRGBWBS_Z, vldrwq_gather_base_wb_z_s, v4si) > -VAR1 (LDRGBWBS_Z, vldrwq_gather_base_wb_z_f, v4sf) > -VAR1 (LDRGBWBS_Z, vldrdq_gather_base_wb_z_s, v2di) > -VAR1 (LDRGBWBS, vldrwq_gather_base_wb_s, v4si) > -VAR1 (LDRGBWBS, vldrwq_gather_base_wb_f, v4sf) > -VAR1 (LDRGBWBS, vldrdq_gather_base_wb_s, v2di) > 
+VAR1 (LDRGBWBU_Z, vldrwq_gather_base_nowb_z_u, v4si) > +VAR1 (LDRGBWBU_Z, vldrdq_gather_base_nowb_z_u, v2di) > +VAR1 (LDRGBWBU, vldrwq_gather_base_nowb_u, v4si) > +VAR1 (LDRGBWBU, vldrdq_gather_base_nowb_u, v2di) > +VAR1 (LDRGBWBS_Z, vldrwq_gather_base_nowb_z_s, v4si) > +VAR1 (LDRGBWBS_Z, vldrwq_gather_base_nowb_z_f, v4sf) > +VAR1 (LDRGBWBS_Z, vldrdq_gather_base_nowb_z_s, v2di) > +VAR1 (LDRGBWBS, vldrwq_gather_base_nowb_s, v4si) > +VAR1 (LDRGBWBS, vldrwq_gather_base_nowb_f, v4sf) > +VAR1 (LDRGBWBS, vldrdq_gather_base_nowb_s, v2di) > +VAR1 (LDRGBWBXU_Z, vldrdq_gather_base_wb_z_s, v2di) > +VAR1 (LDRGBWBXU_Z, vldrdq_gather_base_wb_z_u, v2di) > +VAR1 (LDRGBWBXU, vldrdq_gather_base_wb_s, v2di) > +VAR1 (LDRGBWBXU, vldrdq_gather_base_wb_u, v2di) > +VAR1 (LDRGBWBXU_Z, vldrwq_gather_base_wb_z_s, v4si) > +VAR1 (LDRGBWBXU_Z, vldrwq_gather_base_wb_z_f, v4sf) > +VAR1 (LDRGBWBXU_Z, vldrwq_gather_base_wb_z_u, v4si) > +VAR1 (LDRGBWBXU, vldrwq_gather_base_wb_s, v4si) > +VAR1 (LDRGBWBXU, vldrwq_gather_base_wb_f, v4sf) > +VAR1 (LDRGBWBXU, vldrwq_gather_base_wb_u, v4si) > VAR1 (BINOP_NONE_NONE_NONE, vadciq_s, v4si) > VAR1 (BINOP_UNONE_UNONE_UNONE, vadciq_u, v4si) > VAR1 (BINOP_NONE_NONE_NONE, vadcq_s, v4si) diff --git > a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md index > df602b07840bb4ccb9aa2a9b10992ba7078452ba..d1028f4542b4972b4080e46 > 544c86d625d77383a 100644 > --- a/gcc/config/arm/mve.md > +++ b/gcc/config/arm/mve.md > @@ -10420,6 +10420,20 @@ > (unspec:V4SI [(const_int 0)] VLDRWGBWBQ)] > "TARGET_HAVE_MVE" > { > + rtx ignore_result = gen_reg_rtx (V4SImode); > + emit_insn ( > + gen_mve_vldrwq_gather_base_wb_<supf>v4si_insn (ignore_result, > operands[0], > + operands[1], operands[2])); > + DONE; > +}) > + > +(define_expand "mve_vldrwq_gather_base_nowb_<supf>v4si" > + [(match_operand:V4SI 0 "s_register_operand") > + (match_operand:V4SI 1 "s_register_operand") > + (match_operand:SI 2 "mve_vldrd_immediate") > + (unspec:V4SI [(const_int 0)] VLDRWGBWBQ)] > + "TARGET_HAVE_MVE" > +{ > 
rtx ignore_wb = gen_reg_rtx (V4SImode); > emit_insn ( > gen_mve_vldrwq_gather_base_wb_<supf>v4si_insn (operands[0], > ignore_wb, @@ -10459,6 +10473,21 @@ > (unspec:V4SI [(const_int 0)] VLDRWGBWBQ)] > "TARGET_HAVE_MVE" > { > + rtx ignore_result = gen_reg_rtx (V4SImode); > + emit_insn ( > + gen_mve_vldrwq_gather_base_wb_z_<supf>v4si_insn (ignore_result, > operands[0], > + operands[1], operands[2], > + operands[3])); > + DONE; > +}) > +(define_expand "mve_vldrwq_gather_base_nowb_z_<supf>v4si" > + [(match_operand:V4SI 0 "s_register_operand") > + (match_operand:V4SI 1 "s_register_operand") > + (match_operand:SI 2 "mve_vldrd_immediate") > + (match_operand:HI 3 "vpr_register_operand") > + (unspec:V4SI [(const_int 0)] VLDRWGBWBQ)] > + "TARGET_HAVE_MVE" > +{ > rtx ignore_wb = gen_reg_rtx (V4SImode); > emit_insn ( > gen_mve_vldrwq_gather_base_wb_z_<supf>v4si_insn (operands[0], > ignore_wb, @@ -10487,12 +10516,26 @@ > ops[0] = operands[0]; > ops[1] = operands[2]; > ops[2] = operands[3]; > - output_asm_insn ("vpst\;\tvldrwt.u32\t%q0, [%q1, %2]!",ops); > + output_asm_insn ("vpst\;vldrwt.u32\t%q0, [%q1, %2]!",ops); > return ""; > } > [(set_attr "length" "8")]) > > (define_expand "mve_vldrwq_gather_base_wb_fv4sf" > + [(match_operand:V4SI 0 "s_register_operand") > + (match_operand:V4SI 1 "s_register_operand") > + (match_operand:SI 2 "mve_vldrd_immediate") > + (unspec:V4SI [(const_int 0)] VLDRWQGBWB_F)] > + "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT" > +{ > + rtx ignore_result = gen_reg_rtx (V4SFmode); > + emit_insn ( > + gen_mve_vldrwq_gather_base_wb_fv4sf_insn (ignore_result, operands[0], > + operands[1], operands[2])); > + DONE; > +}) > + > +(define_expand "mve_vldrwq_gather_base_nowb_fv4sf" > [(match_operand:V4SF 0 "s_register_operand") > (match_operand:V4SI 1 "s_register_operand") > (match_operand:SI 2 "mve_vldrd_immediate") @@ -10531,6 +10574,22 > @@ > [(set_attr "length" "4")]) > > (define_expand "mve_vldrwq_gather_base_wb_z_fv4sf" > + [(match_operand:V4SI 0 
"s_register_operand") > + (match_operand:V4SI 1 "s_register_operand") > + (match_operand:SI 2 "mve_vldrd_immediate") > + (match_operand:HI 3 "vpr_register_operand") > + (unspec:V4SI [(const_int 0)] VLDRWQGBWB_F)] > + "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT" > +{ > + rtx ignore_result = gen_reg_rtx (V4SFmode); > + emit_insn ( > + gen_mve_vldrwq_gather_base_wb_z_fv4sf_insn (ignore_result, > operands[0], > + operands[1], operands[2], > + operands[3])); > + DONE; > +}) > + > +(define_expand "mve_vldrwq_gather_base_nowb_z_fv4sf" > [(match_operand:V4SF 0 "s_register_operand") > (match_operand:V4SI 1 "s_register_operand") > (match_operand:SI 2 "mve_vldrd_immediate") @@ -10566,7 +10625,7 > @@ > ops[0] = operands[0]; > ops[1] = operands[2]; > ops[2] = operands[3]; > - output_asm_insn ("vpst\;\tvldrwt.u32\t%q0, [%q1, %2]!",ops); > + output_asm_insn ("vpst\;vldrwt.u32\t%q0, [%q1, %2]!",ops); > return ""; > } > [(set_attr "length" "8")]) > @@ -10578,6 +10637,20 @@ > (unspec:V2DI [(const_int 0)] VLDRDGBWBQ)] > "TARGET_HAVE_MVE" > { > + rtx ignore_result = gen_reg_rtx (V2DImode); > + emit_insn ( > + gen_mve_vldrdq_gather_base_wb_<supf>v2di_insn (ignore_result, > operands[0], > + operands[1], operands[2])); > + DONE; > +}) > + > +(define_expand "mve_vldrdq_gather_base_nowb_<supf>v2di" > + [(match_operand:V2DI 0 "s_register_operand") > + (match_operand:V2DI 1 "s_register_operand") > + (match_operand:SI 2 "mve_vldrd_immediate") > + (unspec:V2DI [(const_int 0)] VLDRDGBWBQ)] > + "TARGET_HAVE_MVE" > +{ > rtx ignore_wb = gen_reg_rtx (V2DImode); > emit_insn ( > gen_mve_vldrdq_gather_base_wb_<supf>v2di_insn (operands[0], > ignore_wb, @@ -10585,6 +10658,7 @@ > DONE; > }) > > + > ;; > ;; [vldrdq_gather_base_wb_s vldrdq_gather_base_wb_u] ;; @@ -10617,6 > +10691,22 @@ > (unspec:V2DI [(const_int 0)] VLDRDGBWBQ)] > "TARGET_HAVE_MVE" > { > + rtx ignore_result = gen_reg_rtx (V2DImode); > + emit_insn ( > + gen_mve_vldrdq_gather_base_wb_z_<supf>v2di_insn (ignore_result, > operands[0], > + 
operands[1], operands[2], > + operands[3])); > + DONE; > +}) > + > +(define_expand "mve_vldrdq_gather_base_nowb_z_<supf>v2di" > + [(match_operand:V2DI 0 "s_register_operand") > + (match_operand:V2DI 1 "s_register_operand") > + (match_operand:SI 2 "mve_vldrd_immediate") > + (match_operand:HI 3 "vpr_register_operand") > + (unspec:V2DI [(const_int 0)] VLDRDGBWBQ)] > + "TARGET_HAVE_MVE" > +{ > rtx ignore_wb = gen_reg_rtx (V2DImode); > emit_insn ( > gen_mve_vldrdq_gather_base_wb_z_<supf>v2di_insn (operands[0], > ignore_wb, @@ -10660,7 +10750,7 @@ > ops[0] = operands[0]; > ops[1] = operands[2]; > ops[2] = operands[3]; > - output_asm_insn ("vpst\;\tvldrdt.u64\t%q0, [%q1, %2]!",ops); > + output_asm_insn ("vpst\;vldrdt.u64\t%q0, [%q1, %2]!",ops); > return ""; > } > [(set_attr "length" "8")]) > diff --git > a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_s64.c > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_s64.c > index > a5c5a61345cb0a46abc7796ceff195698cabe804..0d1ee769ec64b55c7559ce9d > c14f8a6ae2e43e34 100644 > --- > a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_s64.c > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_ > +++ s64.c > @@ -10,4 +10,6 @@ foo (uint64x2_t * addr) > return vldrdq_gather_base_wb_s64 (addr, 8); } > > -/* { dg-final { scan-assembler "vldrd.64" } } */ > +/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } > +} */ > +/* { dg-final { scan-assembler "vldrd.64\tq\[0-9\]+, \\\[q\[0-9\]+, > +#\[0-9\]+\\\]!" } } */ > +/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } > +} */ > diff --git > a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_u64. > c > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_u64. > c > index > 442bca92a43c05124717bf6ea0c44672941091f0..cb2a41bdcd32b553a93d3bcc > 4787d506f1b54f74 100644 > --- > a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_u64. 
> c > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_ > +++ u64.c > @@ -10,4 +10,6 @@ foo (uint64x2_t * addr) > return vldrdq_gather_base_wb_u64 (addr, 8); } > > -/* { dg-final { scan-assembler "vldrd.64" } } */ > +/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } > +} */ > +/* { dg-final { scan-assembler "vldrd.64\tq\[0-9\]+, \\\[q\[0-9\]+, > +#\[0-9\]+\\\]!" } } */ > +/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } > +} */ > diff --git > a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_s6 > 4.c > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_s6 > 4.c > index > 1863d0835e12328b7b7bb824f59e3d441042f56d..243fbeacc3429025202da2ff > 157ade38a472e123 100644 > --- > a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_s6 > 4.c > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_ > +++ z_s64.c > @@ -8,4 +8,8 @@ int64x2_t foo (uint64x2_t * addr, mve_pred16_t p) > return vldrdq_gather_base_wb_z_s64 (addr, 1016, p); } > > -/* { dg-final { scan-assembler "vldrdt.u64" } } */ > +/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } > +} */ > +/* { dg-final { scan-assembler "vmsr\t P0, r\[0-9\]+.*$" } } */ > +/* { dg-final { scan-assembler "vpst" } } */ > +/* { dg-final { scan-assembler "vldrdt.u64\tq\[0-9\]+, \\\[q\[0-9\]+, > +#\[0-9\]+\\\]!" 
} } */ > +/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } > +} */ > diff --git > a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_u6 > 4.c > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_u6 > 4.c > index > 7ba272a112607b0e57a3d4659e5b4033044af83c..10ba42405fe8fde9d4f8993 > b20e41a59c7bb2e77 100644 > --- > a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_u6 > 4.c > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_ > +++ z_u64.c > @@ -8,4 +8,8 @@ uint64x2_t foo (uint64x2_t * addr, mve_pred16_t p) > return vldrdq_gather_base_wb_z_u64 (addr, 8, p); } > > -/* { dg-final { scan-assembler "vldrdt.u64" } } */ > +/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } > +} */ > +/* { dg-final { scan-assembler "vmsr\t P0, r\[0-9\]+.*" } } */ > +/* { dg-final { scan-assembler "vpst" } } */ > +/* { dg-final { scan-assembler "vldrdt.u64\tq\[0-9\]+, \\\[q\[0-9\]+, > +#\[0-9\]+\\\]!" } } */ > +/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } > +} */ > diff --git > a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_f32. > c > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_f32. > c > index > 6b496873f173e30414ffcddf50513758bc8ca770..db8108e37325c4e1fafd2293d > 48eba0c33309073 100644 > --- > a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_f32. > c > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_ > +++ f32.c > @@ -10,4 +10,6 @@ foo (uint32x4_t * addr) > return vldrwq_gather_base_wb_f32 (addr, 8); } > > -/* { dg-final { scan-assembler "vldrw.u32" } } */ > +/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } > +} */ > +/* { dg-final { scan-assembler "vldrw.u32\tq\[0-9\]+, \\\[q\[0-9\]+, > +#\[0-9\]+\\\]!" 
} } */ > +/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } > +} */ > diff --git > a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_s32. > c > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_s32. > c > index > 9bbbd0d701546b5ec224129aef49e632addea550..3da64e218e2c0789e996be > 551650033567eba4e5 100644 > --- > a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_s32. > c > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_ > +++ s32.c > @@ -10,4 +10,6 @@ foo (uint32x4_t * addr) > return vldrwq_gather_base_wb_s32 (addr, 8); } > > -/* { dg-final { scan-assembler "vldrw.u32" } } */ > +/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } > +} */ > +/* { dg-final { scan-assembler "vldrw.u32\tq\[0-9\]+, \\\[q\[0-9\]+, > +#\[0-9\]+\\\]!" } } */ > +/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } > +} */ > diff --git > a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_u32. > c > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_u32. > c > index > 774230b290367a7d28f0c8579be26fc9c75db1cb..2597ee11608bfe21d697f225 > 0bee7e69c0cc7aec 100644 > --- > a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_u32. > c > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_ > +++ u32.c > @@ -10,4 +10,6 @@ foo (uint32x4_t * addr) > return vldrwq_gather_base_wb_u32 (addr, 8); } > > -/* { dg-final { scan-assembler "vldrw.u32" } } */ > +/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } > +} */ > +/* { dg-final { scan-assembler "vldrw.u32\tq\[0-9\]+, \\\[q\[0-9\]+, > +#\[0-9\]+\\\]!" 
} } */ > +/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } > +} */ > diff --git > a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_f3 > 2.c > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_f3 > 2.c > index > 6400f014a88ccf34fef15effff65f9b1267dbd5f..f1ba63855be254d96806c16317 > 7e32856294c106 100644 > --- > a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_f3 > 2.c > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_ > +++ z_f32.c > @@ -10,4 +10,8 @@ foo (uint32x4_t * addr, mve_pred16_t p) > return vldrwq_gather_base_wb_z_f32 (addr, 8, p); } > > -/* { dg-final { scan-assembler "vldrwt.u32" } } */ > +/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } > +} */ > +/* { dg-final { scan-assembler "vmsr\tP0, r\[0-9\]+.*" } } */ > +/* { dg-final { scan-assembler "vpst" } } */ > +/* { dg-final { scan-assembler "vldrwt.u32\tq\[0-9\]+, \\\[q\[0-9\]+, > +#\[0-9\]+\\\]!" } } */ > +/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } > +} */ > diff --git > a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_s3 > 2.c > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_s3 > 2.c > index > de7006c51f17665b80b83fd5ea034477b7a7e778..56da5a46c64d2946ceade86 > 89105048e19efdc6a 100644 > --- > a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_s3 > 2.c > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_ > +++ z_s32.c > @@ -10,4 +10,8 @@ foo (uint32x4_t * addr, mve_pred16_t p) > return vldrwq_gather_base_wb_z_s32 (addr, 8, p); } > > -/* { dg-final { scan-assembler "vldrwt.u32" } } */ > +/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } > +} */ > +/* { dg-final { scan-assembler "vmsr\t P0, r\[0-9\]+.*" } } */ > +/* { dg-final { scan-assembler "vpst" } } */ > +/* { dg-final { scan-assembler "vldrwt.u32\tq\[0-9\]+, \\\[q\[0-9\]+, > 
+#\[0-9\]+\\\]!" } } */ > +/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } > +} */ > diff --git > a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_u3 > 2.c > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_u > 32.c > index > 6c9608f07ba966876804f56403a4352a51a0e0c4..63165d97c1a7b4120be0363 > 48a09b73afddd36d1 100644 > --- > a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_u3 > 2.c > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_ > +++ z_u32.c > @@ -10,4 +10,8 @@ foo (uint32x4_t * addr, mve_pred16_t p) > return vldrwq_gather_base_wb_z_u32 (addr, 8, p); } > > -/* { dg-final { scan-assembler "vldrwt.u32" } } */ > +/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } > +} */ > +/* { dg-final { scan-assembler "vmsr\t P0, r\[0-9\]+.*" } } */ > +/* { dg-final { scan-assembler "vpst" } } */ > +/* { dg-final { scan-assembler "vldrwt.u32\tq\[0-9\]+, \\\[q\[0-9\]+, > +#\[0-9\]+\\\]!" } } */ > +/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } > +} */
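For readers following the thread, here is a rough scalar sketch of the behaviour change in the 32-bit header wrappers. It is not the GCC implementation. It assumes the `[q, #imm]!` form seen in the test patterns is pre-indexed, so each lane loads from base plus offset and that address is written back to the base lane. The types `addr_vec_t` and `data_vec_t`, the array `toy_memory`, and all function names are hypothetical stand-ins, and "addresses" are modelled as byte offsets into the toy array.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for uint32x4_t / int32x4_t; not the arm_mve.h types. */
typedef struct { uint32_t lane[4]; } addr_vec_t;
typedef struct { int32_t lane[4]; } data_vec_t;

/* Toy memory; "addresses" in this model are byte offsets into it. */
int32_t toy_memory[16];

/* Load one 32-bit element from a byte offset in the toy memory. */
static int32_t load32(uint32_t byte_addr)
{
    int32_t value;
    assert(byte_addr + sizeof value <= sizeof toy_memory);
    memcpy(&value, (const unsigned char *)toy_memory + byte_addr, sizeof value);
    return value;
}

/* Model of the old wrapper: the lanes are loaded, but the caller's address
   vector is never updated, because "__addr += __offset" only moved the
   wrapper's local pointer. */
data_vec_t model_old_header(const addr_vec_t *addr, int offset)
{
    data_vec_t result;
    for (int i = 0; i < 4; i++)
        result.lane[i] = load32(addr->lane[i] + (uint32_t)offset);
    return result;
}

/* Model of the patched wrapper: same loaded data, and each base lane is
   updated to the address it was loaded from (the writeback). */
data_vec_t model_patched_header(addr_vec_t *addr, int offset)
{
    data_vec_t result;
    for (int i = 0; i < 4; i++) {
        uint32_t ea = addr->lane[i] + (uint32_t)offset; /* effective address */
        result.lane[i] = load32(ea);
        addr->lane[i] = ea;
    }
    return result;
}
```

In this model, calling `model_patched_header` repeatedly with the same offset walks each lane through memory, which is the usage pattern the writeback intrinsics are meant to support; with `model_old_header` every call would reload the same elements.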
diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 56f0db21ea95dcd738877daba27f1cb60f0d5a32..832b9107424fd9a4a0ee272b773b3d0929172370 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -719,6 +719,17 @@ arm_quinop_unone_unone_unone_unone_imm_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   (arm_quinop_unone_unone_unone_unone_imm_unone_qualifiers)

 static enum arm_type_qualifiers
+arm_ldrgbwbxu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_unsigned, qualifier_immediate};
+#define LDRGBWBXU_QUALIFIERS (arm_ldrgbwbxu_qualifiers)
+
+static enum arm_type_qualifiers
+arm_ldrgbwbxu_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_unsigned, qualifier_immediate,
+      qualifier_unsigned};
+#define LDRGBWBXU_Z_QUALIFIERS (arm_ldrgbwbxu_z_qualifiers)
+
+static enum arm_type_qualifiers
 arm_ldrgbwbs_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_unsigned, qualifier_immediate};
 #define LDRGBWBS_QUALIFIERS (arm_ldrgbwbs_qualifiers)
diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index f1dcdc2153217e796c58526ba0e5be11be642234..47a6268e0800958f49d46238fe34ec749d243929 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -13903,8 +13903,8 @@ __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vldrdq_gather_base_wb_s64 (uint64x2_t * __addr, const int __offset)
 {
   int64x2_t
-  result = __builtin_mve_vldrdq_gather_base_wb_sv2di (*__addr, __offset);
-  __addr += __offset;
+  result = __builtin_mve_vldrdq_gather_base_nowb_sv2di (*__addr, __offset);
+  *__addr = __builtin_mve_vldrdq_gather_base_wb_sv2di (*__addr, __offset);
   return result;
 }

@@ -13913,8 +13913,8 @@ __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vldrdq_gather_base_wb_u64 (uint64x2_t * __addr, const int __offset)
 {
   uint64x2_t
-  result = __builtin_mve_vldrdq_gather_base_wb_uv2di (*__addr, __offset);
-  __addr += __offset;
+  result = __builtin_mve_vldrdq_gather_base_nowb_uv2di (*__addr, __offset);
+  *__addr = __builtin_mve_vldrdq_gather_base_wb_uv2di (*__addr, __offset);
   return result;
 }

@@ -13923,8 +13923,8 @@ __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vldrdq_gather_base_wb_z_s64 (uint64x2_t * __addr, const int __offset, mve_pred16_t __p)
 {
   int64x2_t
-  result = __builtin_mve_vldrdq_gather_base_wb_z_sv2di (*__addr, __offset, __p);
-  __addr += __offset;
+  result = __builtin_mve_vldrdq_gather_base_nowb_z_sv2di (*__addr, __offset, __p);
+  *__addr = __builtin_mve_vldrdq_gather_base_wb_z_sv2di (*__addr, __offset, __p);
   return result;
 }

@@ -13933,8 +13933,8 @@ __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vldrdq_gather_base_wb_z_u64 (uint64x2_t * __addr, const int __offset, mve_pred16_t __p)
 {
   uint64x2_t
-  result = __builtin_mve_vldrdq_gather_base_wb_z_uv2di (*__addr, __offset, __p);
-  __addr += __offset;
+  result = __builtin_mve_vldrdq_gather_base_nowb_z_uv2di (*__addr, __offset, __p);
+  *__addr = __builtin_mve_vldrdq_gather_base_wb_z_uv2di (*__addr, __offset, __p);
   return result;
 }

@@ -13943,8 +13943,8 @@ __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vldrwq_gather_base_wb_s32 (uint32x4_t * __addr, const int __offset)
 {
   int32x4_t
-  result = __builtin_mve_vldrwq_gather_base_wb_sv4si (*__addr, __offset);
-  __addr += __offset;
+  result = __builtin_mve_vldrwq_gather_base_nowb_sv4si (*__addr, __offset);
+  *__addr = __builtin_mve_vldrwq_gather_base_wb_sv4si (*__addr, __offset);
   return result;
 }

@@ -13953,8 +13953,8 @@ __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vldrwq_gather_base_wb_u32 (uint32x4_t * __addr, const int __offset)
 {
   uint32x4_t
-  result = __builtin_mve_vldrwq_gather_base_wb_uv4si (*__addr, __offset);
-  __addr += __offset;
+  result = __builtin_mve_vldrwq_gather_base_nowb_uv4si (*__addr, __offset);
+  *__addr = __builtin_mve_vldrwq_gather_base_wb_uv4si (*__addr, __offset);
   return result;
 }

@@ -13963,8 +13963,8 @@ __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vldrwq_gather_base_wb_z_s32 (uint32x4_t * __addr, const int __offset, mve_pred16_t __p)
 {
   int32x4_t
-  result = __builtin_mve_vldrwq_gather_base_wb_z_sv4si (*__addr, __offset, __p);
-  __addr += __offset;
+  result = __builtin_mve_vldrwq_gather_base_nowb_z_sv4si (*__addr, __offset, __p);
+  *__addr = __builtin_mve_vldrwq_gather_base_wb_z_sv4si (*__addr, __offset, __p);
   return result;
 }

@@ -13973,8 +13973,8 @@ __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vldrwq_gather_base_wb_z_u32 (uint32x4_t * __addr, const int __offset, mve_pred16_t __p)
 {
   uint32x4_t
-  result = __builtin_mve_vldrwq_gather_base_wb_z_uv4si (*__addr, __offset, __p);
-  __addr += __offset;
+  result = __builtin_mve_vldrwq_gather_base_nowb_z_uv4si (*__addr, __offset, __p);
+  *__addr = __builtin_mve_vldrwq_gather_base_wb_z_uv4si (*__addr, __offset, __p);
   return result;
 }

@@ -19372,8 +19372,8 @@ __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vldrwq_gather_base_wb_f32 (uint32x4_t * __addr, const int __offset)
 {
   float32x4_t
-  result = __builtin_mve_vldrwq_gather_base_wb_fv4sf (*__addr, __offset);
-  __addr += __offset;
+  result = __builtin_mve_vldrwq_gather_base_nowb_fv4sf (*__addr, __offset);
+  *__addr = __builtin_mve_vldrwq_gather_base_wb_fv4sf (*__addr, __offset);
   return result;
 }

@@ -19382,8 +19382,8 @@ __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vldrwq_gather_base_wb_z_f32 (uint32x4_t * __addr, const int __offset, mve_pred16_t __p)
 {
   float32x4_t
-  result = __builtin_mve_vldrwq_gather_base_wb_z_fv4sf (*__addr, __offset, __p);
-  __addr += __offset;
+  result = __builtin_mve_vldrwq_gather_base_nowb_z_fv4sf (*__addr, __offset, __p);
+  *__addr = __builtin_mve_vldrwq_gather_base_wb_z_fv4sf (*__addr, __offset, __p);
   return result;
 }

diff --git a/gcc/config/arm/arm_mve_builtins.def b/gcc/config/arm/arm_mve_builtins.def
index
2fb975944b9fdac9de4b5a1bec3962be410637f1..753e40a951d071c1ab77476a1cc4779e91689178 100644 --- a/gcc/config/arm/arm_mve_builtins.def +++ b/gcc/config/arm/arm_mve_builtins.def @@ -847,16 +847,26 @@ VAR1 (STRSBWBS, vstrdq_scatter_base_wb_s, v2di) VAR1 (STRSBWBS_P, vstrwq_scatter_base_wb_p_s, v4si) VAR1 (STRSBWBS_P, vstrwq_scatter_base_wb_p_f, v4sf) VAR1 (STRSBWBS_P, vstrdq_scatter_base_wb_p_s, v2di) -VAR1 (LDRGBWBU_Z, vldrwq_gather_base_wb_z_u, v4si) -VAR1 (LDRGBWBU_Z, vldrdq_gather_base_wb_z_u, v2di) -VAR1 (LDRGBWBU, vldrwq_gather_base_wb_u, v4si) -VAR1 (LDRGBWBU, vldrdq_gather_base_wb_u, v2di) -VAR1 (LDRGBWBS_Z, vldrwq_gather_base_wb_z_s, v4si) -VAR1 (LDRGBWBS_Z, vldrwq_gather_base_wb_z_f, v4sf) -VAR1 (LDRGBWBS_Z, vldrdq_gather_base_wb_z_s, v2di) -VAR1 (LDRGBWBS, vldrwq_gather_base_wb_s, v4si) -VAR1 (LDRGBWBS, vldrwq_gather_base_wb_f, v4sf) -VAR1 (LDRGBWBS, vldrdq_gather_base_wb_s, v2di) +VAR1 (LDRGBWBU_Z, vldrwq_gather_base_nowb_z_u, v4si) +VAR1 (LDRGBWBU_Z, vldrdq_gather_base_nowb_z_u, v2di) +VAR1 (LDRGBWBU, vldrwq_gather_base_nowb_u, v4si) +VAR1 (LDRGBWBU, vldrdq_gather_base_nowb_u, v2di) +VAR1 (LDRGBWBS_Z, vldrwq_gather_base_nowb_z_s, v4si) +VAR1 (LDRGBWBS_Z, vldrwq_gather_base_nowb_z_f, v4sf) +VAR1 (LDRGBWBS_Z, vldrdq_gather_base_nowb_z_s, v2di) +VAR1 (LDRGBWBS, vldrwq_gather_base_nowb_s, v4si) +VAR1 (LDRGBWBS, vldrwq_gather_base_nowb_f, v4sf) +VAR1 (LDRGBWBS, vldrdq_gather_base_nowb_s, v2di) +VAR1 (LDRGBWBXU_Z, vldrdq_gather_base_wb_z_s, v2di) +VAR1 (LDRGBWBXU_Z, vldrdq_gather_base_wb_z_u, v2di) +VAR1 (LDRGBWBXU, vldrdq_gather_base_wb_s, v2di) +VAR1 (LDRGBWBXU, vldrdq_gather_base_wb_u, v2di) +VAR1 (LDRGBWBXU_Z, vldrwq_gather_base_wb_z_s, v4si) +VAR1 (LDRGBWBXU_Z, vldrwq_gather_base_wb_z_f, v4sf) +VAR1 (LDRGBWBXU_Z, vldrwq_gather_base_wb_z_u, v4si) +VAR1 (LDRGBWBXU, vldrwq_gather_base_wb_s, v4si) +VAR1 (LDRGBWBXU, vldrwq_gather_base_wb_f, v4sf) +VAR1 (LDRGBWBXU, vldrwq_gather_base_wb_u, v4si) VAR1 (BINOP_NONE_NONE_NONE, vadciq_s, v4si) VAR1 
(BINOP_UNONE_UNONE_UNONE, vadciq_u, v4si) VAR1 (BINOP_NONE_NONE_NONE, vadcq_s, v4si) diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md index df602b07840bb4ccb9aa2a9b10992ba7078452ba..d1028f4542b4972b4080e46544c86d625d77383a 100644 --- a/gcc/config/arm/mve.md +++ b/gcc/config/arm/mve.md @@ -10420,6 +10420,20 @@ (unspec:V4SI [(const_int 0)] VLDRWGBWBQ)] "TARGET_HAVE_MVE" { + rtx ignore_result = gen_reg_rtx (V4SImode); + emit_insn ( + gen_mve_vldrwq_gather_base_wb_<supf>v4si_insn (ignore_result, operands[0], + operands[1], operands[2])); + DONE; +}) + +(define_expand "mve_vldrwq_gather_base_nowb_<supf>v4si" + [(match_operand:V4SI 0 "s_register_operand") + (match_operand:V4SI 1 "s_register_operand") + (match_operand:SI 2 "mve_vldrd_immediate") + (unspec:V4SI [(const_int 0)] VLDRWGBWBQ)] + "TARGET_HAVE_MVE" +{ rtx ignore_wb = gen_reg_rtx (V4SImode); emit_insn ( gen_mve_vldrwq_gather_base_wb_<supf>v4si_insn (operands[0], ignore_wb, @@ -10459,6 +10473,21 @@ (unspec:V4SI [(const_int 0)] VLDRWGBWBQ)] "TARGET_HAVE_MVE" { + rtx ignore_result = gen_reg_rtx (V4SImode); + emit_insn ( + gen_mve_vldrwq_gather_base_wb_z_<supf>v4si_insn (ignore_result, operands[0], + operands[1], operands[2], + operands[3])); + DONE; +}) +(define_expand "mve_vldrwq_gather_base_nowb_z_<supf>v4si" + [(match_operand:V4SI 0 "s_register_operand") + (match_operand:V4SI 1 "s_register_operand") + (match_operand:SI 2 "mve_vldrd_immediate") + (match_operand:HI 3 "vpr_register_operand") + (unspec:V4SI [(const_int 0)] VLDRWGBWBQ)] + "TARGET_HAVE_MVE" +{ rtx ignore_wb = gen_reg_rtx (V4SImode); emit_insn ( gen_mve_vldrwq_gather_base_wb_z_<supf>v4si_insn (operands[0], ignore_wb, @@ -10487,12 +10516,26 @@ ops[0] = operands[0]; ops[1] = operands[2]; ops[2] = operands[3]; - output_asm_insn ("vpst\;\tvldrwt.u32\t%q0, [%q1, %2]!",ops); + output_asm_insn ("vpst\;vldrwt.u32\t%q0, [%q1, %2]!",ops); return ""; } [(set_attr "length" "8")]) (define_expand "mve_vldrwq_gather_base_wb_fv4sf" + [(match_operand:V4SI 0 
"s_register_operand") + (match_operand:V4SI 1 "s_register_operand") + (match_operand:SI 2 "mve_vldrd_immediate") + (unspec:V4SI [(const_int 0)] VLDRWQGBWB_F)] + "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT" +{ + rtx ignore_result = gen_reg_rtx (V4SFmode); + emit_insn ( + gen_mve_vldrwq_gather_base_wb_fv4sf_insn (ignore_result, operands[0], + operands[1], operands[2])); + DONE; +}) + +(define_expand "mve_vldrwq_gather_base_nowb_fv4sf" [(match_operand:V4SF 0 "s_register_operand") (match_operand:V4SI 1 "s_register_operand") (match_operand:SI 2 "mve_vldrd_immediate") @@ -10531,6 +10574,22 @@ [(set_attr "length" "4")]) (define_expand "mve_vldrwq_gather_base_wb_z_fv4sf" + [(match_operand:V4SI 0 "s_register_operand") + (match_operand:V4SI 1 "s_register_operand") + (match_operand:SI 2 "mve_vldrd_immediate") + (match_operand:HI 3 "vpr_register_operand") + (unspec:V4SI [(const_int 0)] VLDRWQGBWB_F)] + "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT" +{ + rtx ignore_result = gen_reg_rtx (V4SFmode); + emit_insn ( + gen_mve_vldrwq_gather_base_wb_z_fv4sf_insn (ignore_result, operands[0], + operands[1], operands[2], + operands[3])); + DONE; +}) + +(define_expand "mve_vldrwq_gather_base_nowb_z_fv4sf" [(match_operand:V4SF 0 "s_register_operand") (match_operand:V4SI 1 "s_register_operand") (match_operand:SI 2 "mve_vldrd_immediate") @@ -10566,7 +10625,7 @@ ops[0] = operands[0]; ops[1] = operands[2]; ops[2] = operands[3]; - output_asm_insn ("vpst\;\tvldrwt.u32\t%q0, [%q1, %2]!",ops); + output_asm_insn ("vpst\;vldrwt.u32\t%q0, [%q1, %2]!",ops); return ""; } [(set_attr "length" "8")]) @@ -10578,6 +10637,20 @@ (unspec:V2DI [(const_int 0)] VLDRDGBWBQ)] "TARGET_HAVE_MVE" { + rtx ignore_result = gen_reg_rtx (V2DImode); + emit_insn ( + gen_mve_vldrdq_gather_base_wb_<supf>v2di_insn (ignore_result, operands[0], + operands[1], operands[2])); + DONE; +}) + +(define_expand "mve_vldrdq_gather_base_nowb_<supf>v2di" + [(match_operand:V2DI 0 "s_register_operand") + (match_operand:V2DI 1 "s_register_operand") 
+ (match_operand:SI 2 "mve_vldrd_immediate") + (unspec:V2DI [(const_int 0)] VLDRDGBWBQ)] + "TARGET_HAVE_MVE" +{ rtx ignore_wb = gen_reg_rtx (V2DImode); emit_insn ( gen_mve_vldrdq_gather_base_wb_<supf>v2di_insn (operands[0], ignore_wb, @@ -10585,6 +10658,7 @@ DONE; }) + ;; ;; [vldrdq_gather_base_wb_s vldrdq_gather_base_wb_u] ;; @@ -10617,6 +10691,22 @@ (unspec:V2DI [(const_int 0)] VLDRDGBWBQ)] "TARGET_HAVE_MVE" { + rtx ignore_result = gen_reg_rtx (V2DImode); + emit_insn ( + gen_mve_vldrdq_gather_base_wb_z_<supf>v2di_insn (ignore_result, operands[0], + operands[1], operands[2], + operands[3])); + DONE; +}) + +(define_expand "mve_vldrdq_gather_base_nowb_z_<supf>v2di" + [(match_operand:V2DI 0 "s_register_operand") + (match_operand:V2DI 1 "s_register_operand") + (match_operand:SI 2 "mve_vldrd_immediate") + (match_operand:HI 3 "vpr_register_operand") + (unspec:V2DI [(const_int 0)] VLDRDGBWBQ)] + "TARGET_HAVE_MVE" +{ rtx ignore_wb = gen_reg_rtx (V2DImode); emit_insn ( gen_mve_vldrdq_gather_base_wb_z_<supf>v2di_insn (operands[0], ignore_wb, @@ -10660,7 +10750,7 @@ ops[0] = operands[0]; ops[1] = operands[2]; ops[2] = operands[3]; - output_asm_insn ("vpst\;\tvldrdt.u64\t%q0, [%q1, %2]!",ops); + output_asm_insn ("vpst\;vldrdt.u64\t%q0, [%q1, %2]!",ops); return ""; } [(set_attr "length" "8")]) diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_s64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_s64.c index a5c5a61345cb0a46abc7796ceff195698cabe804..0d1ee769ec64b55c7559ce9dc14f8a6ae2e43e34 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_s64.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_s64.c @@ -10,4 +10,6 @@ foo (uint64x2_t * addr) return vldrdq_gather_base_wb_s64 (addr, 8); } -/* { dg-final { scan-assembler "vldrd.64" } } */ +/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ +/* { dg-final { scan-assembler "vldrd.64\tq\[0-9\]+, \\\[q\[0-9\]+, 
#\[0-9\]+\\\]!" } } */ +/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_u64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_u64.c index 442bca92a43c05124717bf6ea0c44672941091f0..cb2a41bdcd32b553a93d3bcc4787d506f1b54f74 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_u64.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_u64.c @@ -10,4 +10,6 @@ foo (uint64x2_t * addr) return vldrdq_gather_base_wb_u64 (addr, 8); } -/* { dg-final { scan-assembler "vldrd.64" } } */ +/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ +/* { dg-final { scan-assembler "vldrd.64\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-9\]+\\\]!" } } */ +/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_s64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_s64.c index 1863d0835e12328b7b7bb824f59e3d441042f56d..243fbeacc3429025202da2ff157ade38a472e123 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_s64.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_s64.c @@ -8,4 +8,8 @@ int64x2_t foo (uint64x2_t * addr, mve_pred16_t p) return vldrdq_gather_base_wb_z_s64 (addr, 1016, p); } -/* { dg-final { scan-assembler "vldrdt.u64" } } */ +/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ +/* { dg-final { scan-assembler "vmsr\t P0, r\[0-9\]+.*$" } } */ +/* { dg-final { scan-assembler "vpst" } } */ +/* { dg-final { scan-assembler "vldrdt.u64\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-9\]+\\\]!" 
} } */ +/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_u64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_u64.c index 7ba272a112607b0e57a3d4659e5b4033044af83c..10ba42405fe8fde9d4f8993b20e41a59c7bb2e77 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_u64.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_u64.c @@ -8,4 +8,8 @@ uint64x2_t foo (uint64x2_t * addr, mve_pred16_t p) return vldrdq_gather_base_wb_z_u64 (addr, 8, p); } -/* { dg-final { scan-assembler "vldrdt.u64" } } */ +/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ +/* { dg-final { scan-assembler "vmsr\t P0, r\[0-9\]+.*" } } */ +/* { dg-final { scan-assembler "vpst" } } */ +/* { dg-final { scan-assembler "vldrdt.u64\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-9\]+\\\]!" } } */ +/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_f32.c index 6b496873f173e30414ffcddf50513758bc8ca770..db8108e37325c4e1fafd2293d48eba0c33309073 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_f32.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_f32.c @@ -10,4 +10,6 @@ foo (uint32x4_t * addr) return vldrwq_gather_base_wb_f32 (addr, 8); } -/* { dg-final { scan-assembler "vldrw.u32" } } */ +/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ +/* { dg-final { scan-assembler "vldrw.u32\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-9\]+\\\]!" 
} } */ +/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_s32.c index 9bbbd0d701546b5ec224129aef49e632addea550..3da64e218e2c0789e996be551650033567eba4e5 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_s32.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_s32.c @@ -10,4 +10,6 @@ foo (uint32x4_t * addr) return vldrwq_gather_base_wb_s32 (addr, 8); } -/* { dg-final { scan-assembler "vldrw.u32" } } */ +/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ +/* { dg-final { scan-assembler "vldrw.u32\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-9\]+\\\]!" } } */ +/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_u32.c index 774230b290367a7d28f0c8579be26fc9c75db1cb..2597ee11608bfe21d697f2250bee7e69c0cc7aec 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_u32.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_u32.c @@ -10,4 +10,6 @@ foo (uint32x4_t * addr) return vldrwq_gather_base_wb_u32 (addr, 8); } -/* { dg-final { scan-assembler "vldrw.u32" } } */ +/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ +/* { dg-final { scan-assembler "vldrw.u32\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-9\]+\\\]!" 
} } */ +/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_f32.c index 6400f014a88ccf34fef15effff65f9b1267dbd5f..f1ba63855be254d96806c163177e32856294c106 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_f32.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_f32.c @@ -10,4 +10,8 @@ foo (uint32x4_t * addr, mve_pred16_t p) return vldrwq_gather_base_wb_z_f32 (addr, 8, p); } -/* { dg-final { scan-assembler "vldrwt.u32" } } */ +/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ +/* { dg-final { scan-assembler "vmsr\tP0, r\[0-9\]+.*" } } */ +/* { dg-final { scan-assembler "vpst" } } */ +/* { dg-final { scan-assembler "vldrwt.u32\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-9\]+\\\]!" } } */ +/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_s32.c index de7006c51f17665b80b83fd5ea034477b7a7e778..56da5a46c64d2946ceade8689105048e19efdc6a 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_s32.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_s32.c @@ -10,4 +10,8 @@ foo (uint32x4_t * addr, mve_pred16_t p) return vldrwq_gather_base_wb_z_s32 (addr, 8, p); } -/* { dg-final { scan-assembler "vldrwt.u32" } } */ +/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ +/* { dg-final { scan-assembler "vmsr\t P0, r\[0-9\]+.*" } } */ +/* { dg-final { scan-assembler "vpst" } } */ +/* { dg-final { scan-assembler "vldrwt.u32\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-9\]+\\\]!" 
} } */ +/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_u32.c index 6c9608f07ba966876804f56403a4352a51a0e0c4..63165d97c1a7b4120be036348a09b73afddd36d1 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_u32.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_u32.c @@ -10,4 +10,8 @@ foo (uint32x4_t * addr, mve_pred16_t p) return vldrwq_gather_base_wb_z_u32 (addr, 8, p); } -/* { dg-final { scan-assembler "vldrwt.u32" } } */ +/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ +/* { dg-final { scan-assembler "vmsr\t P0, r\[0-9\]+.*" } } */ +/* { dg-final { scan-assembler "vpst" } } */ +/* { dg-final { scan-assembler "vldrwt.u32\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-9\]+\\\]!" } } */ +/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
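
For readers following the arm_mve.h hunks above: the removed `__addr += __offset;` only advanced the local pointer parameter, so the caller's base vector was never updated, whereas the added `*__addr = ...` stores the written-back base through the pointer. The behaviour can be illustrated with a rough scalar model in plain C. This is only a sketch under stated assumptions, not GCC code: `model_u32x4`, `model_fill_words`, `model_gather_base_wb` and `model_gather_base_wb_old` are hypothetical names, memory is a flat byte buffer, and base lanes are treated as byte offsets into that buffer rather than real addresses.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar stand-in for uint32x4_t; not the real MVE type.  */
typedef struct { uint32_t lane[4]; } model_u32x4;

/* Fill MEM with NWORDS 32-bit words; word k holds the value 100 + k.  */
static void
model_fill_words (uint8_t *mem, unsigned nwords)
{
  for (unsigned k = 0; k < nwords; k++)
    {
      uint32_t w = 100u + k;
      memcpy (mem + 4u * k, &w, sizeof w);
    }
}

/* Assumed model of the patched intrinsic: each lane loads from
   base + offset, and base + offset is written back through ADDR.  */
static model_u32x4
model_gather_base_wb (const uint8_t *mem, model_u32x4 *addr, int offset)
{
  assert (mem != NULL && addr != NULL);
  model_u32x4 result;
  for (int i = 0; i < 4; i++)
    {
      uint32_t ea = addr->lane[i] + (uint32_t) offset;
      memcpy (&result.lane[i], mem + ea, sizeof (uint32_t));
      addr->lane[i] = ea;	/* Write-back seen by the caller.  */
    }
  return result;
}

/* Assumed model of the pre-patch behaviour: the load is the same, but
   the updated base never reaches the caller's vector.  */
static model_u32x4
model_gather_base_wb_old (const uint8_t *mem, model_u32x4 *addr, int offset)
{
  model_u32x4 discarded = *addr;	/* Write-back lands in a copy.  */
  return model_gather_base_wb (mem, &discarded, offset);
}
```

Calling both models with base lanes {0, 4, 8, 12} and offset 8 on a buffer prepared by `model_fill_words` returns the same loaded lanes, but only `model_gather_base_wb` leaves the caller's base at {8, 12, 16, 20}; that difference is what the new `vldrb.8`/`vstrb.8` scan-assembler patterns around the gather load check for.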