Message ID | AM0PR08MB538031E167ECB404A2F2B7439B9D0@AM0PR08MB5380.eurprd08.prod.outlook.com |
---|---|
State | New |
Headers | show |
Series | [GCC-10,Backport] arm: Fix the wrong code-gen generated by MVE vector load/store intrinsics (PR94959). | expand |
> -----Original Message----- > From: Srinath Parvathaneni <Srinath.Parvathaneni@arm.com> > Sent: 16 June 2020 11:50 > To: gcc-patches@gcc.gnu.org > Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com> > Subject: [PATCH][GCC-10 Backport] arm: Fix the wrong code-gen generated > by MVE vector load/store intrinsics (PR94959). > > Hello, > > Few MVE intrinsics like vldrbq_s32, vldrhq_s32 etc., the assembler > instructions > generated by current compiler are wrong. > eg: vldrbq_s32 generates an assembly instructions `vldrb.s32 q0,[ip]`. > But as per Arm-arm second argument in above instructions must also be a > low > register (<= r7). This patch fixes this issue by creating a new predicate > "mve_memory_operand" and constraint "Ux" which allows low registers as > arguments > to the generated instructions depending on the mode of the argument. A > new constraint > "Ul" is created to handle loading to PC-relative addressing modes for vector > store/load intrinsiscs. > All the corresponding MVE intrinsic generating wrong code-gen as vldrbq_s32 > are modified in this patch. > > Please refer to M-profile Vector Extension (MVE) intrinsics [1] and Armv8-M > Architecture Reference Manual [2] for more details. > [1] https://developer.arm.com/architectures/instruction-sets/simd- > isas/helium/mve-intrinsics > [2] https://developer.arm.com/docs/ddi0553/latest > > Regression tested on arm-none-eabi and found no regressions. > > Ok for gcc-10 branch? Ok. Thanks, Kyrill > > Thanks, > Srinath. > > gcc/ChangeLog: > > 2020-06-09 Srinath Parvathaneni <srinath.parvathaneni@arm.com> > > Backported from mainline > 2020-05-20 Srinath Parvathaneni <srinath.parvathaneni@arm.com> > Andre Vieira <andre.simoesdiasvieira@arm.com> > > PR target/94959 > * config/arm/arm-protos.h (arm_mode_base_reg_class): Function > declaration. > (mve_vector_mem_operand): Likewise. > * config/arm/arm.c (thumb2_legitimate_address_p): For MVE target > check > the load from memory to a core register is legitimate for give mode. > (mve_vector_mem_operand): Define function. > (arm_print_operand): Modify comment. > (arm_mode_base_reg_class): Define. > * config/arm/arm.h (MODE_BASE_REG_CLASS): Modify to add check > for > TARGET_HAVE_MVE and expand to arm_mode_base_reg_class on > TRUE. > * config/arm/constraints.md (Ux): Likewise. > (Ul): Likewise. > * config/arm/mve.md (mve_mov): Replace constraint Us with Ux and > also > add support for missing Vector Store Register and Vector Load > Register. > Add a new alternative to support load from memory to PC (or label) > in > vector store/load. > (mve_vstrbq_<supf><mode>): Modify constraint Us to Ux. > (mve_vldrbq_<supf><mode>): Modify constriant Us to Ux, predicate > to > mve_memory_operand and also modify the MVE instructions to emit. > (mve_vldrbq_z_<supf><mode>): Modify constraint Us to Ux. > (mve_vldrhq_fv8hf): Modify constriant Us to Ux, predicate to > mve_memory_operand and also modify the MVE instructions to emit. > (mve_vldrhq_<supf><mode>): Modify constriant Us to Ux, predicate > to > mve_memory_operand and also modify the MVE instructions to emit. > (mve_vldrhq_z_fv8hf): Likewise. > (mve_vldrhq_z_<supf><mode>): Likewise. > (mve_vldrwq_fv4sf): Likewise. > (mve_vldrwq_<supf>v4si): Likewise. > (mve_vldrwq_z_fv4sf): Likewise. > (mve_vldrwq_z_<supf>v4si): Likewise. > (mve_vld1q_f<mode>): Modify constriant Us to Ux. > (mve_vld1q_<supf><mode>): Likewise. > (mve_vstrhq_fv8hf): Modify constriant Us to Ux, predicate to > mve_memory_operand. > (mve_vstrhq_p_fv8hf): Modify constriant Us to Ux, predicate to > mve_memory_operand and also modify the MVE instructions to emit. > (mve_vstrhq_p_<supf><mode>): Likewise. > (mve_vstrhq_<supf><mode>): Modify constriant Us to Ux, predicate > to > mve_memory_operand. > (mve_vstrwq_fv4sf): Modify constriant Us to Ux. > (mve_vstrwq_p_fv4sf): Modify constriant Us to Ux and also modify > the MVE > instructions to emit. > (mve_vstrwq_p_<supf>v4si): Likewise. > (mve_vstrwq_<supf>v4si): Likewise.Modify constriant Us to Ux. > * config/arm/predicates.md (mve_memory_operand): Define. > > gcc/testsuite/ChangeLog: > > 2020-06-09 Srinath Parvathaneni <srinath.parvathaneni@arm.com> > > Backported from mainline > 2020-05-20 Srinath Parvathaneni <srinath.parvathaneni@arm.com> > > PR target/94959 > * gcc.target/arm/mve/intrinsics/mve_vector_float2.c: Modify. > * gcc.target/arm/mve/intrinsics/mve_vldr.c: New test. > * gcc.target/arm/mve/intrinsics/mve_vldr_z.c: Likewise. > * gcc.target/arm/mve/intrinsics/mve_vstr.c: Likewise. > * gcc.target/arm/mve/intrinsics/mve_vstr_p.c: Likewise. > * gcc.target/arm/mve/intrinsics/vld1q_f16.c: Modify. > * gcc.target/arm/mve/intrinsics/vld1q_f32.c: Likewise. > * gcc.target/arm/mve/intrinsics/vld1q_s16.c: Likewise. > * gcc.target/arm/mve/intrinsics/vld1q_s32.c: Likewise. > * gcc.target/arm/mve/intrinsics/vld1q_s8.c: Likewise. > * gcc.target/arm/mve/intrinsics/vld1q_u16.c: Likewise. > * gcc.target/arm/mve/intrinsics/vld1q_u32.c: Likewise. > * gcc.target/arm/mve/intrinsics/vld1q_u8.c: Likewise. > * gcc.target/arm/mve/intrinsics/vld1q_z_f16.c: Likewise. > * gcc.target/arm/mve/intrinsics/vld1q_z_f32.c: Likewise. > * gcc.target/arm/mve/intrinsics/vld1q_z_s16.c: Likewise. > * gcc.target/arm/mve/intrinsics/vld1q_z_s32.c: Likewise. > * gcc.target/arm/mve/intrinsics/vld1q_z_s8.c: Likewise. > * gcc.target/arm/mve/intrinsics/vld1q_z_u16.c: Likewise. > * gcc.target/arm/mve/intrinsics/vld1q_z_u32.c: Likewise. > * gcc.target/arm/mve/intrinsics/vld1q_z_u8.c: Likewise. > * gcc.target/arm/mve/intrinsics/vldrbq_s8.c: Likewise. > * gcc.target/arm/mve/intrinsics/vldrbq_u8.c: Likewise. > * gcc.target/arm/mve/intrinsics/vldrbq_z_s8.c: Likewise. > * gcc.target/arm/mve/intrinsics/vldrbq_z_u8.c: Likewise. > * gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_s64.c: > Likewise. > * gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_u64.c: > Likewise. > * gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_s64.c: > Likewise. > * gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_u64.c: > Likewise. > * gcc.target/arm/mve/intrinsics/vldrhq_f16.c: Likewise. > * gcc.target/arm/mve/intrinsics/vldrhq_s16.c: Likewise. > * gcc.target/arm/mve/intrinsics/vldrhq_s32.c: Likewise. > * gcc.target/arm/mve/intrinsics/vldrhq_u16.c: Likewise. > * gcc.target/arm/mve/intrinsics/vldrhq_u32.c: Likewise. > * gcc.target/arm/mve/intrinsics/vldrhq_z_f16.c: Likewise. > * gcc.target/arm/mve/intrinsics/vldrhq_z_s16.c: Likewise. > * gcc.target/arm/mve/intrinsics/vldrhq_z_s32.c: Likewise. > * gcc.target/arm/mve/intrinsics/vldrhq_z_u16.c: Likewise. > * gcc.target/arm/mve/intrinsics/vldrhq_z_u32.c: Likewise. > * gcc.target/arm/mve/intrinsics/vldrwq_f32.c: Likewise. > * gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_f32.c: > Likewise. > * gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_s32.c: > Likewise. > * gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_u32.c: > Likewise. > * gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_f32.c: > Likewise. > * gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_s32.c: > Likewise. > * gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_u32.c: > Likewise. > * gcc.target/arm/mve/intrinsics/vldrwq_s32.c: Likewise. > * gcc.target/arm/mve/intrinsics/vldrwq_u32.c: Likewise. > * gcc.target/arm/mve/intrinsics/vldrwq_z_f32.c: Likewise. > * gcc.target/arm/mve/intrinsics/vldrwq_z_s32.c: Likewise. > * gcc.target/arm/mve/intrinsics/vldrwq_z_u32.c: Likewise. > * gcc.target/arm/mve/intrinsics/vuninitializedq_float.c: Likewise. > * gcc.target/arm/mve/intrinsics/vuninitializedq_float1.c: Likewise. > * gcc.target/arm/mve/intrinsics/vuninitializedq_int.c: Likewise. > * gcc.target/arm/mve/intrinsics/vuninitializedq_int1.c: Likewise. > > > ############### Attachment also inlined for ease of reply > ############### > > > diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h > index > 9571b60f84f947851639de94501b8bccd0149727..33d162c3e00590ab96de56f > 20380f4ae4f200849 100644 > --- a/gcc/config/arm/arm-protos.h > +++ b/gcc/config/arm/arm-protos.h > @@ -64,6 +64,8 @@ extern bool arm_q_bit_access (void); > extern bool arm_ge_bits_access (void); > > #ifdef RTX_CODE > +enum reg_class > +arm_mode_base_reg_class (machine_mode); > extern void arm_gen_unlikely_cbranch (enum rtx_code, machine_mode > cc_mode, > rtx label_ref); > extern bool arm_vector_mode_supported_p (machine_mode); > @@ -114,6 +116,7 @@ extern bool arm_tls_referenced_p (rtx); > > extern int arm_coproc_mem_operand (rtx, bool); > extern int neon_vector_mem_operand (rtx, int, bool); > +extern int mve_vector_mem_operand (machine_mode, rtx, bool); > extern int neon_struct_mem_operand (rtx); > > extern rtx *neon_vcmla_lane_prepare_operands (rtx *); > diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h > index > 0126f390abb2650e0b81cb59d55b1ce608490d4a..30e1d6dc994e18012fd2e5a > 1bbd7c69134ee100c 100644 > --- a/gcc/config/arm/arm.h > +++ b/gcc/config/arm/arm.h > @@ -1292,11 +1292,13 @@ extern const char > *fp_sysreg_names[NB_FP_SYSREGS]; > > /* For the Thumb the high registers cannot be used as base registers > when addressing quantities in QI or HI mode; if we don't know the > - mode, then we must be conservative. */ > + mode, then we must be conservative. For MVE we need to load from > + memory to low regs based on given modes i.e [Rn], Rn <= LO_REGS. */ > #define MODE_BASE_REG_CLASS(MODE) \ > - (TARGET_32BIT ? CORE_REGS \ > + (TARGET_HAVE_MVE ? arm_mode_base_reg_class (MODE) \ > + :(TARGET_32BIT ? CORE_REGS \ > : GET_MODE_SIZE (MODE) >= 4 ? BASE_REGS \ > - : LO_REGS) > + : LO_REGS)) > > /* For Thumb we cannot support SP+reg addressing, so we return LO_REGS > instead of BASE_REGS. */ > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c > index > b169250918c13c6eabf55146a79081514d171571..01bc1b8ae9b72700ca5ae08 > 40ee4496fd686b623 100644 > --- a/gcc/config/arm/arm.c > +++ b/gcc/config/arm/arm.c > @@ -8443,6 +8443,10 @@ thumb2_legitimate_address_p (machine_mode > mode, rtx x, int strict_p) > bool use_ldrd; > enum rtx_code code = GET_CODE (x); > > + if (TARGET_HAVE_MVE > + && (mode == V8QImode || mode == E_V4QImode || mode == > V4HImode)) > + return mve_vector_mem_operand (mode, x, strict_p); > + > if (arm_address_register_rtx_p (x, strict_p)) > return 1; > > @@ -13257,6 +13261,80 @@ arm_coproc_mem_operand (rtx op, bool wb) > return FALSE; > } > > +/* This function returns TRUE on matching mode and op. > +1. For given modes, check for [Rn], return TRUE for Rn <= LO_REGS. > +2. For other modes, check for [Rn], return TRUE for Rn < R15 (expect R13). > */ > +int > +mve_vector_mem_operand (machine_mode mode, rtx op, bool strict) > +{ > + enum rtx_code code; > + HOST_WIDE_INT val; > + int reg_no; > + > + /* Match: (mem (reg)). */ > + if (REG_P (op)) > + { > + int reg_no = REGNO (op); > + return (((mode == E_V8QImode || mode == E_V4QImode || mode == > E_V4HImode) > + ? reg_no <= LAST_LO_REGNUM > + :(reg_no < LAST_ARM_REGNUM && reg_no != SP_REGNUM)) > + || (!strict && reg_no >= FIRST_PSEUDO_REGISTER)); > + } > + code = GET_CODE (op); > + > + if (code == POST_INC || code == PRE_DEC > + || code == PRE_INC || code == POST_DEC) > + { > + reg_no = REGNO (XEXP (op, 0)); > + return (((mode == E_V8QImode || mode == E_V4QImode || mode == > E_V4HImode) > + ? reg_no <= LAST_LO_REGNUM > + :(reg_no < LAST_ARM_REGNUM && reg_no != SP_REGNUM)) > + || (!strict && reg_no >= FIRST_PSEUDO_REGISTER)); > + } > + else if ((code == POST_MODIFY || code == PRE_MODIFY) > + && GET_CODE (XEXP (op, 1)) == PLUS && REG_P (XEXP (XEXP (op, 1), > 1))) > + { > + reg_no = REGNO (XEXP (op, 0)); > + val = INTVAL (XEXP ( XEXP (op, 1), 1)); > + switch (mode) > + { > + case E_V16QImode: > + if (abs_hwi (val)) > + return ((reg_no < LAST_ARM_REGNUM && reg_no != SP_REGNUM) > + || (!strict && reg_no >= FIRST_PSEUDO_REGISTER)); > + case E_V8HImode: > + case E_V8HFmode: > + if (abs (val) <= 255) > + return ((reg_no < LAST_ARM_REGNUM && reg_no != SP_REGNUM) > + || (!strict && reg_no >= FIRST_PSEUDO_REGISTER)); > + case E_V8QImode: > + case E_V4QImode: > + if (abs_hwi (val)) > + return (reg_no <= LAST_LO_REGNUM > + || (!strict && reg_no >= FIRST_PSEUDO_REGISTER)); > + case E_V4HImode: > + case E_V4HFmode: > + if (val % 2 == 0 && abs (val) <= 254) > + return (reg_no <= LAST_LO_REGNUM > + || (!strict && reg_no >= FIRST_PSEUDO_REGISTER)); > + case E_V4SImode: > + case E_V4SFmode: > + if (val % 4 == 0 && abs (val) <= 508) > + return ((reg_no < LAST_ARM_REGNUM && reg_no != SP_REGNUM) > + || (!strict && reg_no >= FIRST_PSEUDO_REGISTER)); > + case E_V2DImode: > + case E_V2DFmode: > + case E_TImode: > + if (val % 4 == 0 && val >= 0 && val <= 1020) > + return ((reg_no < LAST_ARM_REGNUM && reg_no != SP_REGNUM) > + || (!strict && reg_no >= FIRST_PSEUDO_REGISTER)); > + default: > + return FALSE; > + } > + } > + return FALSE; > +} > + > /* Return TRUE if OP is a memory operand which we can load or store a > vector > to/from. TYPE is one of the following values: > 0 - Vector load/stor (vldr) > @@ -13324,15 +13402,6 @@ neon_vector_mem_operand (rtx op, int type, > bool strict) > && (INTVAL (XEXP (ind, 1)) & 3) == 0) > return TRUE; > > - if (type == 1 && TARGET_HAVE_MVE > - && (GET_CODE (ind) == POST_INC || GET_CODE (ind) == PRE_DEC)) > - { > - rtx ind1 = XEXP (ind, 0); > - if (!REG_P (ind1)) > - return 0; > - return VFP_REGNO_OK_FOR_SINGLE (REGNO (ind1)); > - } > - > return FALSE; > } > > @@ -24019,7 +24088,7 @@ arm_print_operand (FILE *stream, rtx x, int > code) > } > return; > > - /* To print the memory operand with "Us" constraint. Based on the > rtx_code > + /* To print the memory operand with "Ux" constraint. Based on the > rtx_code > the memory operands output looks like following. > 1. [Rn], #+/-<imm> > 2. [Rn, #+/-<imm>]! > @@ -33389,6 +33458,18 @@ arm_gen_far_branch (rtx * operands, int > pos_label, const char * dest, > return ""; > } > > +/* If given mode matches, load from memory to LO_REGS. > + (i.e [Rn], Rn <= LO_REGS). */ > +enum reg_class > +arm_mode_base_reg_class (machine_mode mode) > +{ > + if (TARGET_HAVE_MVE > + && (mode == E_V8QImode || mode == E_V4QImode || mode == > E_V4HImode)) > + return LO_REGS; > + > + return MODE_BASE_REG_REG_CLASS (mode); > +} > + > struct gcc_target targetm = TARGET_INITIALIZER; > > #include "gt-arm.h" > diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md > index > fed6c7c84032dd8aba45142b59b980b4a6240d6d..011badc9957655a0fba6794 > 6c1db6fa6334b2bbb 100644 > --- a/gcc/config/arm/constraints.md > +++ b/gcc/config/arm/constraints.md > @@ -39,7 +39,7 @@ > ;; in all states: Pf, Pg > > ;; The following memory constraints have been used: > -;; in ARM/Thumb-2 state: Uh, Ut, Uv, Uy, Un, Um, Us, Up, Uf > +;; in ARM/Thumb-2 state: Uh, Ut, Uv, Uy, Un, Um, Us, Up, Uf, Ux, Ul > ;; in ARM state: Uq > ;; in Thumb state: Uu, Uw > ;; in all states: Q > @@ -47,6 +47,18 @@ > (define_register_constraint "Up" "TARGET_HAVE_MVE ? VPR_REG : > NO_REGS" > "MVE VPR register") > > +(define_memory_constraint "Ul" > + "@internal > + In ARM/Thumb-2 state a valid address for load instruction with XEXP (op, 0) > + being label of the literal data item to be loaded." > + (and (match_code "mem") > + (match_test "TARGET_HAVE_MVE && reload_completed > + && (GET_CODE (XEXP (op, 0)) == LABEL_REF > + || (GET_CODE (XEXP (op, 0)) == CONST > + && GET_CODE (XEXP (XEXP (op, 0), 0)) == PLUS > + && GET_CODE (XEXP (XEXP (XEXP (op, 0), 0), 0)) == > LABEL_REF > + && CONST_INT_P (XEXP (XEXP (XEXP (op, 0), 0), > 1))))"))) > + > (define_register_constraint "Uf" "TARGET_HAVE_MVE ? VFPCC_REG : > NO_REGS" > "MVE FPCCR register") > > @@ -467,6 +479,15 @@ > (and (match_code "mem") > (match_test "TARGET_32BIT && neon_vector_mem_operand (op, 1, > true)"))) > > +(define_memory_constraint "Ux" > + "@internal > + In ARM/Thumb-2 state a valid address and load into CORE regs or only to > + LO_REGS based on mode of op." > + (and (match_code "mem") > + (match_test "(TARGET_HAVE_MVE || TARGET_HAVE_MVE_FLOAT) > + && mve_vector_mem_operand (GET_MODE (op), > + XEXP (op, 0), true)"))) > + > (define_memory_constraint "Uq" > "@internal > In ARM state an address valid in ldrsb instructions." > diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md > index > f43dabbfd4f15b602f0627a9b0ea423064501e51..986fbfe2abae5f1e91e65f1ff > 5c84709c43c4617 100644 > --- a/gcc/config/arm/mve.md > +++ b/gcc/config/arm/mve.md > @@ -666,8 +666,8 @@ > (define_int_iterator VSHLCQ_M [VSHLCQ_M_S VSHLCQ_M_U]) > > (define_insn "*mve_mov<mode>" > - [(set (match_operand:MVE_types 0 "nonimmediate_operand" > "=w,w,r,w,w,r,w,Us") > - (match_operand:MVE_types 1 "general_operand" > "w,r,w,Dn,Usi,r,Dm,w"))] > + [(set (match_operand:MVE_types 0 "nonimmediate_operand" > "=w,w,r,w,w,r,w,Ux,w") > + (match_operand:MVE_types 1 "general_operand" > "w,r,w,Dn,Uxi,r,Dm,w,Ul"))] > "TARGET_HAVE_MVE || TARGET_HAVE_MVE_FLOAT" > { > if (which_alternative == 3 || which_alternative == 6) > @@ -686,6 +686,50 @@ > sprintf (templ, "vmov.i%d\t%%q0, %%x1 @ <mode>", width); > return templ; > } > + > + if (which_alternative == 4 || which_alternative == 7) > + { > + rtx ops[2]; > + int regno = (which_alternative == 7) > + ? REGNO (operands[1]) : REGNO (operands[0]); > + > + ops[0] = operands[0]; > + ops[1] = operands[1]; > + if (<MODE>mode == V2DFmode || <MODE>mode == V2DImode) > + { > + if (which_alternative == 7) > + { > + ops[1] = gen_rtx_REG (DImode, regno); > + output_asm_insn ("vstr.64\t%P1, %E0",ops); > + } > + else > + { > + ops[0] = gen_rtx_REG (DImode, regno); > + output_asm_insn ("vldr.64\t%P0, %E1",ops); > + } > + } > + else if (<MODE>mode == TImode) > + { > + if (which_alternative == 7) > + output_asm_insn ("vstr.64\t%q1, %E0",ops); > + else > + output_asm_insn ("vldr.64\t%q0, %E1",ops); > + } > + else > + { > + if (which_alternative == 7) > + { > + ops[1] = gen_rtx_REG (TImode, regno); > + output_asm_insn > ("vstr<V_sz_elem1>.<V_sz_elem>\t%q1, %E0",ops); > + } > + else > + { > + ops[0] = gen_rtx_REG (TImode, regno); > + output_asm_insn > ("vldr<V_sz_elem1>.<V_sz_elem>\t%q0, %E1",ops); > + } > + } > + return ""; > + } > switch (which_alternative) > { > case 0: > @@ -694,26 +738,19 @@ > return "vmov\t%e0, %Q1, %R1 @ <mode>\;vmov\t%f0, %J1, %K1"; > case 2: > return "vmov\t%Q0, %R0, %e1 @ <mode>\;vmov\t%J0, %K0, %f1"; > - case 4: > - if (MEM_P (operands[1]) > - && (GET_CODE (XEXP (operands[1], 0)) == LABEL_REF > - || GET_CODE (XEXP (operands[1], 0)) == CONST)) > - return output_move_neon (operands); > - else > - return "vldrb.8 %q0, %E1"; > case 5: > return output_move_quad (operands); > - case 7: > - return "vstrb.8 %q1, %E0"; > + case 8: > + return output_move_neon (operands); > default: > gcc_unreachable (); > return ""; > } > } > - [(set_attr "type" > "mve_move,mve_move,mve_move,mve_move,mve_load,multiple,mve_mov > e,mve_store") > - (set_attr "length" "4,8,8,4,8,8,4,4") > - (set_attr "thumb2_pool_range" "*,*,*,*,1018,*,*,*") > - (set_attr "neg_pool_range" "*,*,*,*,996,*,*,*")]) > + [(set_attr "type" > "mve_move,mve_move,mve_move,mve_move,mve_load,multiple,mve_mov > e,mve_store,mve_load") > + (set_attr "length" "4,8,8,4,8,8,4,4,4") > + (set_attr "thumb2_pool_range" "*,*,*,*,1018,*,*,*,*") > + (set_attr "neg_pool_range" "*,*,*,*,996,*,*,*,*")]) > > (define_insn "*mve_mov<mode>" > [(set (match_operand:MVE_types 0 "s_register_operand" "=w,w") > @@ -8047,7 +8084,7 @@ > ;; [vstrbq_s vstrbq_u] > ;; > (define_insn "mve_vstrbq_<supf><mode>" > - [(set (match_operand:<MVE_B_ELEM> 0 "memory_operand" "=Us") > + [(set (match_operand:<MVE_B_ELEM> 0 "mve_memory_operand" "=Ux") > (unspec:<MVE_B_ELEM> [(match_operand:MVE_2 1 > "s_register_operand" "w")] > VSTRBQ)) > ] > @@ -8133,7 +8170,7 @@ > ;; > (define_insn "mve_vldrbq_<supf><mode>" > [(set (match_operand:MVE_2 0 "s_register_operand" "=w") > - (unspec:MVE_2 [(match_operand:<MVE_B_ELEM> 1 > "memory_operand" "Us")] > + (unspec:MVE_2 [(match_operand:<MVE_B_ELEM> 1 > "mve_memory_operand" "Ux")] > VLDRBQ)) > ] > "TARGET_HAVE_MVE" > @@ -8142,7 +8179,10 @@ > int regno = REGNO (operands[0]); > ops[0] = gen_rtx_REG (TImode, regno); > ops[1] = operands[1]; > - output_asm_insn ("vldrb.<supf><V_sz_elem>\t%q0, %E1",ops); > + if (<V_sz_elem> == 8) > + output_asm_insn ("vldrb.<V_sz_elem>\t%q0, %E1",ops); > + else > + output_asm_insn ("vldrb.<supf><V_sz_elem>\t%q0, %E1",ops); > return ""; > } > [(set_attr "length" "4")]) > @@ -8216,7 +8256,7 @@ > ;; [vstrbq_p_s vstrbq_p_u] > ;; > (define_insn "mve_vstrbq_p_<supf><mode>" > - [(set (match_operand:<MVE_B_ELEM> 0 "memory_operand" "=Us") > + [(set (match_operand:<MVE_B_ELEM> 0 "mve_memory_operand" "=Ux") > (unspec:<MVE_B_ELEM> [(match_operand:MVE_2 1 > "s_register_operand" "w") > (match_operand:HI 2 "vpr_register_operand" > "Up")] > VSTRBQ)) > @@ -8227,7 +8267,7 @@ > int regno = REGNO (operands[1]); > ops[1] = gen_rtx_REG (TImode, regno); > ops[0] = operands[0]; > - output_asm_insn ("vpst\n\tvstrbt.<V_sz_elem>\t%q1, %E0",ops); > + output_asm_insn ("vpst\;vstrbt.<V_sz_elem>\t%q1, %E0",ops); > return ""; > } > [(set_attr "length" "8")]) > @@ -8262,7 +8302,7 @@ > ;; > (define_insn "mve_vldrbq_z_<supf><mode>" > [(set (match_operand:MVE_2 0 "s_register_operand" "=w") > - (unspec:MVE_2 [(match_operand:<MVE_B_ELEM> 1 > "memory_operand" "Us") > + (unspec:MVE_2 [(match_operand:<MVE_B_ELEM> 1 > "mve_memory_operand" "Ux") > (match_operand:HI 2 "vpr_register_operand" "Up")] > VLDRBQ)) > ] > @@ -8272,7 +8312,10 @@ > int regno = REGNO (operands[0]); > ops[0] = gen_rtx_REG (TImode, regno); > ops[1] = operands[1]; > - output_asm_insn ("vpst\n\tvldrbt.<supf><V_sz_elem>\t%q0, %E1",ops); > + if (<V_sz_elem> == 8) > + output_asm_insn ("vpst\;vldrbt.<V_sz_elem>\t%q0, %E1",ops); > + else > + output_asm_insn ("vpst\;vldrbt.<supf><V_sz_elem>\t%q0, %E1",ops); > return ""; > } > [(set_attr "length" "8")]) > @@ -8303,7 +8346,7 @@ > ;; > (define_insn "mve_vldrhq_fv8hf" > [(set (match_operand:V8HF 0 "s_register_operand" "=w") > - (unspec:V8HF [(match_operand:V8HI 1 "memory_operand" "Us")] > + (unspec:V8HF [(match_operand:V8HI 1 "mve_memory_operand" > "Ux")] > VLDRHQ_F)) > ] > "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT" > @@ -8312,7 +8355,7 @@ > int regno = REGNO (operands[0]); > ops[0] = gen_rtx_REG (TImode, regno); > ops[1] = operands[1]; > - output_asm_insn ("vldrh.f16\t%q0, %E1",ops); > + output_asm_insn ("vldrh.16\t%q0, %E1",ops); > return ""; > } > [(set_attr "length" "4")]) > @@ -8414,12 +8457,11 @@ > [(set_attr "length" "8")]) > > ;; > -;; > ;; [vldrhq_s, vldrhq_u] > ;; > (define_insn "mve_vldrhq_<supf><mode>" > [(set (match_operand:MVE_6 0 "s_register_operand" "=w") > - (unspec:MVE_6 [(match_operand:<MVE_H_ELEM> 1 > "memory_operand" "Us")] > + (unspec:MVE_6 [(match_operand:<MVE_H_ELEM> 1 > "mve_memory_operand" "Ux")] > VLDRHQ)) > ] > "TARGET_HAVE_MVE" > @@ -8428,7 +8470,10 @@ > int regno = REGNO (operands[0]); > ops[0] = gen_rtx_REG (TImode, regno); > ops[1] = operands[1]; > - output_asm_insn ("vldrh.<supf><V_sz_elem>\t%q0, %E1",ops); > + if (<V_sz_elem> == 16) > + output_asm_insn ("vldrh.16\t%q0, %E1",ops); > + else > + output_asm_insn ("vldrh.<supf><V_sz_elem>\t%q0, %E1",ops); > return ""; > } > [(set_attr "length" "4")]) > @@ -8438,7 +8483,7 @@ > ;; > (define_insn "mve_vldrhq_z_fv8hf" > [(set (match_operand:V8HF 0 "s_register_operand" "=w") > - (unspec:V8HF [(match_operand:V8HI 1 "memory_operand" "Us") > + (unspec:V8HF [(match_operand:V8HI 1 "mve_memory_operand" > "Ux") > (match_operand:HI 2 "vpr_register_operand" "Up")] > VLDRHQ_F)) > ] > @@ -8448,7 +8493,7 @@ > int regno = REGNO (operands[0]); > ops[0] = gen_rtx_REG (TImode, regno); > ops[1] = operands[1]; > - output_asm_insn ("vpst\n\tvldrht.f16\t%q0, %E1",ops); > + output_asm_insn ("vpst\;vldrht.16\t%q0, %E1",ops); > return ""; > } > [(set_attr "length" "8")]) > @@ -8458,7 +8503,7 @@ > ;; > (define_insn "mve_vldrhq_z_<supf><mode>" > [(set (match_operand:MVE_6 0 "s_register_operand" "=w") > - (unspec:MVE_6 [(match_operand:<MVE_H_ELEM> 1 > "memory_operand" "Us") > + (unspec:MVE_6 [(match_operand:<MVE_H_ELEM> 1 > "mve_memory_operand" "Ux") > (match_operand:HI 2 "vpr_register_operand" "Up")] > VLDRHQ)) > ] > @@ -8468,7 +8513,10 @@ > int regno = REGNO (operands[0]); > ops[0] = gen_rtx_REG (TImode, regno); > ops[1] = operands[1]; > - output_asm_insn ("vpst\n\tvldrht.<supf><V_sz_elem>\t%q0, %E1",ops); > + if (<V_sz_elem> == 16) > + output_asm_insn ("vpst\;vldrht.16\t%q0, %E1",ops); > + else > + output_asm_insn ("vpst\;vldrht.<supf><V_sz_elem>\t%q0, %E1",ops); > return ""; > } > [(set_attr "length" "8")]) > @@ -8478,7 +8526,7 @@ > ;; > (define_insn "mve_vldrwq_fv4sf" > [(set (match_operand:V4SF 0 "s_register_operand" "=w") > - (unspec:V4SF [(match_operand:V4SI 1 "memory_operand" "Us")] > + (unspec:V4SF [(match_operand:V4SI 1 "memory_operand" "Ux")] > VLDRWQ_F)) > ] > "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT" > @@ -8487,7 +8535,7 @@ > int regno = REGNO (operands[0]); > ops[0] = gen_rtx_REG (TImode, regno); > ops[1] = operands[1]; > - output_asm_insn ("vldrw.f32\t%q0, %E1",ops); > + output_asm_insn ("vldrw.32\t%q0, %E1",ops); > return ""; > } > [(set_attr "length" "4")]) > @@ -8497,7 +8545,7 @@ > ;; > (define_insn "mve_vldrwq_<supf>v4si" > [(set (match_operand:V4SI 0 "s_register_operand" "=w") > - (unspec:V4SI [(match_operand:V4SI 1 "memory_operand" "Us")] > + (unspec:V4SI [(match_operand:V4SI 1 "memory_operand" "Ux")] > VLDRWQ)) > ] > "TARGET_HAVE_MVE" > @@ -8506,7 +8554,7 @@ > int regno = REGNO (operands[0]); > ops[0] = gen_rtx_REG (TImode, regno); > ops[1] = operands[1]; > - output_asm_insn ("vldrw.<supf>32\t%q0, %E1",ops); > + output_asm_insn ("vldrw.32\t%q0, %E1",ops); > return ""; > } > [(set_attr "length" "4")]) > @@ -8516,7 +8564,7 @@ > ;; > (define_insn "mve_vldrwq_z_fv4sf" > [(set (match_operand:V4SF 0 "s_register_operand" "=w") > - (unspec:V4SF [(match_operand:V4SI 1 "memory_operand" "Us") > + (unspec:V4SF [(match_operand:V4SI 1 "memory_operand" "Ux") > (match_operand:HI 2 "vpr_register_operand" "Up")] > VLDRWQ_F)) > ] > @@ -8526,7 +8574,7 @@ > int regno = REGNO (operands[0]); > ops[0] = gen_rtx_REG (TImode, regno); > ops[1] = operands[1]; > - output_asm_insn ("vpst\n\tvldrwt.f32\t%q0, %E1",ops); > + output_asm_insn ("vpst\;vldrwt.32\t%q0, %E1",ops); > return ""; > } > [(set_attr "length" "8")]) > @@ -8536,7 +8584,7 @@ > ;; > (define_insn "mve_vldrwq_z_<supf>v4si" > [(set (match_operand:V4SI 0 "s_register_operand" "=w") > - (unspec:V4SI [(match_operand:V4SI 1 "memory_operand" "Us") > + (unspec:V4SI [(match_operand:V4SI 1 "memory_operand" "Ux") > (match_operand:HI 2 "vpr_register_operand" "Up")] > VLDRWQ)) > ] > @@ -8546,14 +8594,14 @@ > int regno = REGNO (operands[0]); > ops[0] = gen_rtx_REG (TImode, regno); > ops[1] = operands[1]; > - output_asm_insn ("vpst\n\tvldrwt.<supf>32\t%q0, %E1",ops); > + output_asm_insn ("vpst\;vldrwt.32\t%q0, %E1",ops); > return ""; > } > [(set_attr "length" "8")]) > > (define_expand "mve_vld1q_f<mode>" > [(match_operand:MVE_0 0 "s_register_operand") > - (unspec:MVE_0 [(match_operand:<MVE_CNVT> 1 "memory_operand")] > VLD1Q_F) > + (unspec:MVE_0 [(match_operand:<MVE_CNVT> 1 > "mve_memory_operand")] VLD1Q_F) > ] > "TARGET_HAVE_MVE || TARGET_HAVE_MVE_FLOAT" > { > @@ -8563,7 +8611,7 @@ > > (define_expand "mve_vld1q_<supf><mode>" > [(match_operand:MVE_2 0 "s_register_operand") > - (unspec:MVE_2 [(match_operand:MVE_2 1 "memory_operand")] VLD1Q) > + (unspec:MVE_2 [(match_operand:MVE_2 1 "mve_memory_operand")] > VLD1Q) > ] > "TARGET_HAVE_MVE" > { > @@ -8991,7 +9039,7 @@ > ;; [vstrhq_f] > ;; > (define_insn "mve_vstrhq_fv8hf" > - [(set (match_operand:V8HI 0 "memory_operand" "=Us") > + [(set (match_operand:V8HI 0 "mve_memory_operand" "=Ux") > (unspec:V8HI [(match_operand:V8HF 1 "s_register_operand" "w")] > VSTRHQ_F)) > ] > @@ -9010,7 +9058,7 @@ > ;; [vstrhq_p_f] > ;; > (define_insn "mve_vstrhq_p_fv8hf" > - [(set (match_operand:V8HI 0 "memory_operand" "=Us") > + [(set (match_operand:V8HI 0 "mve_memory_operand" "=Ux") > (unspec:V8HI [(match_operand:V8HF 1 "s_register_operand" "w") > (match_operand:HI 2 "vpr_register_operand" "Up")] > VSTRHQ_F)) > @@ -9021,7 +9069,7 @@ > int regno = REGNO (operands[1]); > ops[1] = gen_rtx_REG (TImode, regno); > ops[0] = operands[0]; > - output_asm_insn ("vpst\n\tvstrht.16\t%q1, %E0",ops); > + output_asm_insn ("vpst\;vstrht.16\t%q1, %E0",ops); > return ""; > } > [(set_attr "length" "8")]) > @@ -9030,7 +9078,7 @@ > ;; [vstrhq_p_s vstrhq_p_u] > ;; > (define_insn "mve_vstrhq_p_<supf><mode>" > - [(set (match_operand:<MVE_H_ELEM> 0 "memory_operand" "=Us") > + [(set (match_operand:<MVE_H_ELEM> 0 "mve_memory_operand" "=Ux") > (unspec:<MVE_H_ELEM> [(match_operand:MVE_6 1 > "s_register_operand" "w") > (match_operand:HI 2 "vpr_register_operand" > "Up")] > VSTRHQ)) > @@ -9041,7 +9089,7 @@ > int regno = REGNO (operands[1]); > ops[1] = gen_rtx_REG (TImode, regno); > ops[0] = operands[0]; > - output_asm_insn ("vpst\n\tvstrht.<V_sz_elem>\t%q1, %E0",ops); > + output_asm_insn ("vpst\;vstrht.<V_sz_elem>\t%q1, %E0",ops); > return ""; > } > [(set_attr "length" "8")]) > @@ -9093,7 +9141,7 @@ > ;; [vstrhq_scatter_shifted_offset_p_s vstrhq_scatter_shifted_offset_p_u] > ;; > (define_insn "mve_vstrhq_scatter_shifted_offset_p_<supf><mode>" > - [(set (match_operand:<MVE_H_ELEM> 0 "memory_operand" "=Us") > + [(set (match_operand:<MVE_H_ELEM> 0 "memory_operand" "=Ux") > (unspec:<MVE_H_ELEM> > [(match_operand:MVE_6 1 "s_register_operand" "w") > (match_operand:MVE_6 2 "s_register_operand" "w") > @@ -9136,7 +9184,7 @@ > ;; [vstrhq_s, vstrhq_u] > ;; > (define_insn "mve_vstrhq_<supf><mode>" > - [(set (match_operand:<MVE_H_ELEM> 0 "memory_operand" "=Us") > + [(set (match_operand:<MVE_H_ELEM> 0 "mve_memory_operand" "=Ux") > (unspec:<MVE_H_ELEM> [(match_operand:MVE_6 1 > "s_register_operand" "w")] > VSTRHQ)) > ] > @@ -9155,7 +9203,7 @@ > ;; [vstrwq_f] > ;; > (define_insn "mve_vstrwq_fv4sf" > - [(set (match_operand:V4SI 0 "memory_operand" "=Us") > + [(set (match_operand:V4SI 0 "memory_operand" "=Ux") > (unspec:V4SI [(match_operand:V4SF 1 "s_register_operand" "w")] > VSTRWQ_F)) > ] > @@ -9174,7 +9222,7 @@ > ;; [vstrwq_p_f] > ;; > (define_insn "mve_vstrwq_p_fv4sf" > - [(set (match_operand:V4SI 0 "memory_operand" "=Us") > + [(set (match_operand:V4SI 0 "memory_operand" "=Ux") > (unspec:V4SI [(match_operand:V4SF 1 "s_register_operand" "w") > (match_operand:HI 2 "vpr_register_operand" "Up")] > VSTRWQ_F)) > @@ -9185,7 +9233,7 @@ > int regno = REGNO (operands[1]); > ops[1] = gen_rtx_REG (TImode, regno); > ops[0] = operands[0]; > - output_asm_insn ("vpst\n\tvstrwt.32\t%q1, %E0",ops); > + output_asm_insn ("vpst\;vstrwt.32\t%q1, %E0",ops); > return ""; > } > [(set_attr "length" "8")]) > @@ -9194,7 +9242,7 @@ > ;; [vstrwq_p_s vstrwq_p_u] > ;; > (define_insn "mve_vstrwq_p_<supf>v4si" > - [(set (match_operand:V4SI 0 "memory_operand" "=Us") > + [(set (match_operand:V4SI 0 "memory_operand" "=Ux") > (unspec:V4SI [(match_operand:V4SI 1 "s_register_operand" "w") > (match_operand:HI 2 "vpr_register_operand" "Up")] > VSTRWQ)) > @@ -9205,7 +9253,7 @@ > int regno = REGNO (operands[1]); > ops[1] = gen_rtx_REG (TImode, regno); > ops[0] = operands[0]; > - output_asm_insn ("vpst\n\tvstrwt.32\t%q1, %E0",ops); > + output_asm_insn ("vpst\;vstrwt.32\t%q1, %E0",ops); > return ""; > } > [(set_attr "length" "8")]) > @@ -9214,7 +9262,7 @@ > ;; [vstrwq_s vstrwq_u] > ;; > (define_insn "mve_vstrwq_<supf>v4si" > - [(set (match_operand:V4SI 0 "memory_operand" "=Us") > + [(set (match_operand:V4SI 0 "memory_operand" "=Ux") > (unspec:V4SI [(match_operand:V4SI 1 "s_register_operand" "w")] > VSTRWQ)) > ] > diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md > index > 009862e012c9ce3bbe446a89aacb750f47be66f0..c57ad73577e1eebebc8951e > d5b4fb544dd3381f8 100644 > --- a/gcc/config/arm/predicates.md > +++ b/gcc/config/arm/predicates.md > @@ -31,6 +31,12 @@ > || REGNO_REG_CLASS (REGNO (op)) != NO_REGS)); > }) > > +(define_predicate "mve_memory_operand" > + (and (match_code "mem") > + (match_test "TARGET_32BIT > + && mve_vector_mem_operand (GET_MODE (op), XEXP (op, > 0), > + false)"))) > + > ;; True for immediates in the range of 1 to 16 for MVE. > (define_predicate "mve_imm_16" > (match_test "satisfies_constraint_Rd (op)")) > diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_float2.c > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_float2.c > index > e3cf8f8207d603243eae22be9a90bbb1e8a73a58..35f83c6b298aaf2b80937131 > 59b32de17ff96bd2 100644 > --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_float2.c > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_float2.c > @@ -11,10 +11,6 @@ foo32 () > return b; > } > > -/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]" } } */ > -/* { dg-final { scan-assembler "vstrb.*" } } */ > -/* { dg-final { scan-assembler "vldr.64*" } } */ > - > float16x8_t > foo16 () > { > @@ -22,6 +18,9 @@ foo16 () > return b; > } > > -/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]" } } */ > -/* { dg-final { scan-assembler "vstrb.*" } } */ > -/* { dg-final { scan-assembler "vldr.64.*" } } */ > +/* { dg-final { scan-assembler-times "vmov\\tq\[0-7\], q\[0-7\]" 2 } } */ > +/* { dg-final { scan-assembler-times "vstrw.32*" 1 } } */ > +/* { dg-final { scan-assembler-times "vstrh.16*" 1 } } */ > +/* { dg-final { scan-assembler-times "vldrw.32*" 1 } } */ > +/* { dg-final { scan-assembler-times "vldrh.16*" 1 } } */ > +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ > diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vldr.c > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vldr.c > new file mode 100644 > index > 0000000000000000000000000000000000000000..15656ed8c3c8c3ab95bbb5d > e59dafdab864b28db > --- /dev/null > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vldr.c > @@ -0,0 +1,61 @@ > +/* { dg-require-effective-target arm_v8_1m_mve_ok } */ > +/* { dg-add-options arm_v8_1m_mve } */ > +/* { dg-additional-options "-O2" } */ > + > +#include "arm_mve.h" > +void > +foo (uint16_t row_x_col, int8_t *out) > +{ > + for (;;) > + { > + int32x4_t out_3; > + int8_t *rhs_0; > + int8_t *lhs_3; > + int i_row_x_col; > + for (;i_row_x_col < row_x_col; i_row_x_col++) > + { > + int32x4_t ker_0 = vldrbq_s32(rhs_0); > + int32x4_t ip_3 = vldrbq_s32(lhs_3); > + out_3 = vmulq_s32(ip_3, ker_0); > + } > + vstrbq_s32(out, out_3); > + } > +} > + > +void > +foo1 (uint16_t row_x_col, int8_t *out) > +{ > + for (;;) > + { > + int16x8_t out_3; > + int8_t *rhs_0; > + int8_t *lhs_3; > + int i_row_x_col; > + for (; i_row_x_col < row_x_col; i_row_x_col++) > + { > + int16x8_t ker_0 = vldrbq_s16(rhs_0); > + int16x8_t ip_3 = vldrbq_s16(lhs_3); > + out_3 = vmulq_s16(ip_3, ker_0); > + } > + vstrbq_s16(out, out_3); > + } > +} > + > +void > +foo2 (uint16_t row_x_col, int16_t *out) > +{ > + for (;;) > + { > + int32x4_t out_3; > + int16_t *rhs_0; > + int16_t *lhs_3; > + int i_row_x_col; > + for (; i_row_x_col < row_x_col; i_row_x_col++) > + { > + int32x4_t ker_0 = vldrhq_s32(rhs_0); > + int32x4_t ip_3 = vldrhq_s32(lhs_3); > + out_3 = vmulq_s32(ip_3, ker_0); > + } > + vstrhq_s32(out, out_3); > + } > +} > diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vldr_z.c > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vldr_z.c > new file mode 100644 > index > 0000000000000000000000000000000000000000..ae640837d14f41cc617ac56c > 57ca120be615ac31 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vldr_z.c > @@ -0,0 +1,73 @@ > +/* { dg-require-effective-target arm_v8_1m_mve_ok } */ > +/* { dg-add-options arm_v8_1m_mve } */ > +/* { dg-additional-options "-O2" } */ > + > +#include "arm_mve.h" > +void > +foo (uint16_t row_len, const int32_t *bias, int8_t *out) > +{ > + int i_out_ch; > + for (;;) > + { > + int8_t *ip_c3; > + int32_t acc_3; > + int32_t row_loop_cnt = row_len; > + int32x4_t res = {acc_3}; > + uint32x4_t scatter_offset; > + int i_row_loop; > + for (; i_row_loop < row_loop_cnt; i_row_loop++) > + { > + mve_pred16_t p; > + int16x8_t r0; > + int16x8_t c3 = vldrbq_z_s16(ip_c3, p); > + acc_3 = vmladavaq_p_s16(acc_3, r0, c3, p); > + } > + vstrbq_scatter_offset_s32(&out[i_out_ch], scatter_offset, res); > + } > +} > + > +void > +foo1 (uint16_t row_len, const int32_t *bias, int8_t *out) > +{ > + int i_out_ch; > + for (;;) > + { > + int8_t *ip_c3; > + int32_t acc_3; > + int32_t row_loop_cnt = row_len; > + int i_row_loop; > + int32x4_t res = {acc_3}; > + uint32x4_t scatter_offset; > + for (; i_row_loop < row_loop_cnt; i_row_loop++) > + { > + mve_pred16_t p; > + int32x4_t r0; > + int32x4_t c3 = vldrbq_z_s32(ip_c3, p); > + acc_3 = vmladavaq_p_s32(acc_3, r0, c3, p); > + } > + vstrbq_scatter_offset_s32(&out[i_out_ch], scatter_offset, res); > + } > +} > + > +void > +foo2 (uint16_t row_len, const int32_t *bias, int8_t *out) > +{ > + int i_out_ch; > + for (;;) > + { > + int16_t *ip_c3; > + int32_t acc_3; > + int32_t row_loop_cnt = row_len; > + int i_row_loop; > + int32x4_t res = {acc_3}; > + uint32x4_t scatter_offset; > + for (; i_row_loop < row_loop_cnt; i_row_loop++) > + { > + mve_pred16_t p; > + int32x4_t r0; > + int32x4_t c3 = vldrhq_z_s32(ip_c3, p); > + acc_3 = vmladavaq_p_s32(acc_3, r0, c3, p); > + } > + vstrbq_scatter_offset_s32(&out[i_out_ch], scatter_offset, res); > + } > +} > diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vstr.c > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vstr.c > new file mode 100644 > index > 0000000000000000000000000000000000000000..dd785f28bc02beae828a648 > 6fdcf3a374829ac0d > --- /dev/null > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vstr.c > @@ -0,0 +1,43 @@ > +/* { dg-require-effective-target arm_v8_1m_mve_ok } */ > +/* { dg-add-options arm_v8_1m_mve } */ > +/* { dg-additional-options "-O2" } */ > + > +#include "arm_mve.h" > +void > +foo (const int32_t *output_bias, int8_t *out, uint16_t num_ch) > +{ > + int32_t loop_count = num_ch; > + const int32_t *bias = output_bias; > + int i_loop_cnt; > + for (; i_loop_cnt < loop_count; out += 4, i_loop_cnt++) > + { > + int32x4_t out_0 = vldrwq_s32(bias); > + vstrbq_s32(out, out_0); > + } > +} > + > +void > +foo1 (const int16_t *output_bias, int8_t *out, uint16_t num_ch) > +{ > + int32_t loop_count = num_ch; > + const int16_t *bias = output_bias; > + int i_loop_cnt; > + for (; i_loop_cnt < loop_count; out += 4, i_loop_cnt++) > + { > + int16x8_t out_0 = vldrhq_s16(bias); > + vstrbq_s16(out, out_0); > + } > +} > + > +void > +foo2 (const int32_t *output_bias, int16_t *out, uint16_t num_ch) > +{ > + int32_t loop_count = num_ch; > + const int32_t *bias = output_bias; > + int i_loop_cnt; > + for (; i_loop_cnt < loop_count; out += 4, i_loop_cnt++) > + { > + int32x4_t out_0 = vldrwq_s32(bias); > + vstrhq_s32(out, out_0); > + } > +} > diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vstr_p.c > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vstr_p.c > new file mode 100644 > index > 0000000000000000000000000000000000000000..8b222f1be0a95031189e792 > bf9afa22411fa867a > --- /dev/null > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vstr_p.c > @@ -0,0 +1,42 @@ > +/* { dg-require-effective-target arm_v8_1m_mve_ok } */ > +/* { dg-add-options arm_v8_1m_mve } */ > +/* { dg-additional-options "-O2" } */ > + > +#include "arm_mve.h" > +void > +foo1 (int8_t *x, int32_t * i1) > +{ > + mve_pred16_t p; > + int32x4_t x_0; > + int32_t * bias1 = i1; > + for (;; x++) > + { > + x_0 = vldrwq_s32(bias1); > + vstrbq_p_s32(x, x_0, p); > + } > +} > +void > +foo2 (int8_t *x, int16_t * i1) > +{ > + mve_pred16_t p; > + int16x8_t x_0; > + int16_t * bias1 = i1; > + for (;; x++) > + { > + x_0 = vldrhq_s16(bias1); > + vstrbq_p_s16(x, x_0, p); > + } > +} > + > +void > +foo3 (int16_t *x, int32_t * i1) > +{ > + mve_pred16_t p; > + int32x4_t x_0; > + int32_t * bias1 = i1; > + for (;; x++) > + { > + x_0 = vldrwq_s32(bias1); > + vstrhq_p_s32(x, x_0, p); > + } > +} > diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_f16.c > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_f16.c > index > 5e42f634412309411e4a6257cc3042a9ab280e06..699e40d0e3b503f6c02abaa > 3f4f976343081f108 100644 > --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_f16.c > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_f16.c > @@ -10,12 +10,11 @@ foo (float16_t const * base) > return vld1q_f16 (base); > } > > -/* { dg-final { scan-assembler "vldrh.f16" } } */ > - > float16x8_t > foo1 (float16_t const * base) > { > return vld1q (base); > } > > -/* { dg-final { scan-assembler "vldrh.f16" } } */ > +/* { dg-final { scan-assembler-times "vldrh.16" 2 } } */ > +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ > diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_f32.c > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_f32.c > index > 99d1a7a9c5e66b4ae99d5184756bb65b8bc5e852..865923033629c273b1a31f5 > 7c0589e0ab1e6fc24 100644 > --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_f32.c > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_f32.c > @@ -10,12 +10,11 @@ foo (float32_t const * base) > return vld1q_f32 (base); > } > > -/* { dg-final { scan-assembler "vldrw.f32" } } */ > - > float32x4_t > foo1 (float32_t const * base) > { > return vld1q (base); > } > > -/* { dg-final { scan-assembler "vldrw.f32" } } */ > +/* { dg-final { scan-assembler-times "vldrw.32" 2 } } */ > +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ > diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_s16.c > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_s16.c > index > d77f98ea8893959adfbc2688645d0d36dd826816..f4f04f534db63c5b77927d8e > 2ea967bb705012cc 100644 > --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_s16.c > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_s16.c > @@ -10,12 +10,11 @@ foo (int16_t const * base) > return vld1q_s16 (base); > } > > -/* { dg-final { scan-assembler "vldrh.s16" } } */ > - > int16x8_t > foo1 (int16_t const * base) > { > return vld1q (base); > } > > -/* { dg-final { scan-assembler "vldrh.s16" } } */ > +/* { dg-final { scan-assembler-times "vldrh.16" 2 } } */ > +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ > diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_s32.c > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_s32.c > index > 9a7f024f735f1715d6e577aaf08e217b52ad66e7..e0f661667515f3d3e94cd052 > b4bbdef9c33c06dc 100644 > --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_s32.c > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_s32.c > @@ -10,12 +10,11 @@ foo (int32_t const * base) > return vld1q_s32 (base); > } > > -/* { dg-final { scan-assembler "vldrw.s32" } } */ > - > int32x4_t > foo1 (int32_t const * base) > { > return vld1q (base); > } > > -/* { dg-final { scan-assembler "vldrw.s32" } } */ > +/* { dg-final { scan-assembler-times "vldrw.32" 2 } } */ > +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ > diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_s8.c > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_s8.c > index > 9c67bb60110081c1ed6c65f6986bbc08b0e2a691..1b7edead6b1a5489f2c668a > 69136b5fed463c703 100644 > --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_s8.c > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_s8.c > @@ -10,12 +10,11 @@ foo (int8_t const * base) > return vld1q_s8 (base); > } > > -/* { dg-final { scan-assembler "vldrb.s8" } } */ > - > int8x16_t > foo1 (int8_t const * base) > { > return vld1q (base); > } > > -/* { dg-final { scan-assembler "vldrb.s8" } } */ > +/* { dg-final { scan-assembler-times "vldrb.8" 2 } } */ > +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ > diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_u16.c > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_u16.c > index > 2bef21a5a1dcaf265c052ddb689df9b12d4419ae..50e1f5cedcbe42d7f6325535 > 9795007bfe5ffc0e 100644 > --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_u16.c > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_u16.c > @@ -10,12 +10,11 @@ foo (uint16_t const * base) > return vld1q_u16 (base); > } > > -/* { dg-final { scan-assembler "vldrh.u16" } } */ > - > uint16x8_t > foo1 (uint16_t const * base) > { > return vld1q (base); > } > > -/* { dg-final { scan-assembler "vldrh.u16" } } */ > +/* { dg-final { scan-assembler-times "vldrh.16" 2 } } */ > +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ > diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_u32.c > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_u32.c > index > 01a1dd611ed68281e32e8719e11137a9b5626398..a13fe824382f825a32f865fc > 5937712a2f278faf 100644 > --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_u32.c > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_u32.c > @@ -10,12 +10,11 @@ foo (uint32_t const * base) > return vld1q_u32 (base); > } > > -/* { dg-final { scan-assembler "vldrw.u32" } } */ > - > uint32x4_t > foo1 (uint32_t const * base) > { > return vld1q (base); > } > > -/* { dg-final { scan-assembler "vldrw.u32" } } */ > +/* { dg-final { scan-assembler-times "vldrw.32" 2 } } */ > +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ > diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_u8.c > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_u8.c > index > 997bc1b212d228668b7a6f36a615168a52ac1af0..dfd1deb93f0f485fb2491a3b > 21821c284c0da437 100644 > --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_u8.c > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_u8.c > @@ -10,12 +10,11 @@ foo (uint8_t const * base) > return vld1q_u8 (base); > } > > -/* { dg-final { scan-assembler "vldrb.u8" } } */ > - > uint8x16_t > foo1 (uint8_t const * base) > { > return vld1q (base); > } > > -/* { dg-final { scan-assembler "vldrb.u8" } } */ > +/* { dg-final { scan-assembler-times "vldrb.8" 2 } } */ > +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ > diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_f16.c > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_f16.c > index > ea5593a9dd19d682089021902e7bf283bb54041f..3c32e408e420e2d393b5abc > c96bd59e5d048ec34 100644 > --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_f16.c > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_f16.c > @@ -10,12 +10,12 @@ foo (float16_t const * base, mve_pred16_t p) > return vld1q_z_f16 (base, p); > } > > -/* { dg-final { scan-assembler "vldrht.f16" } } */ > - > float16x8_t > foo1 (float16_t const * base, mve_pred16_t p) > { > return vld1q_z (base, p); > } > > -/* { dg-final { scan-assembler "vldrht.f16" } } */ > +/* { dg-final { scan-assembler-times "vpst" 2 } } */ > +/* { dg-final { scan-assembler-times "vldrht.16" 2 } } */ > +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ > diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_f32.c > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_f32.c > index > 28937cd18aa9692d357cf553a71e81f78e184dc5..3fc935c889bea0fec7858e034 > 002b4a521afab65 100644 > --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_f32.c > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_f32.c > @@ -10,12 +10,12 @@ foo (float32_t const * base, mve_pred16_t p) > return vld1q_z_f32 (base, p); > } > > -/* { dg-final { scan-assembler "vldrwt.f32" } } */ > - > float32x4_t > foo1 (float32_t const * base, mve_pred16_t p) > { > return vld1q_z (base, p); > } > > -/* { dg-final { scan-assembler "vldrwt.f32" } } */ > +/* { dg-final { scan-assembler-times "vpst" 2 } } */ > +/* { dg-final { scan-assembler-times "vldrwt.32" 2 } } */ > +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ > diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_s16.c > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_s16.c > index > 81a1c439d6e034341a08ea28050f7bab35237808..49cc81092f359249c5178332 > c1ca6e18076eabdb 100644 > --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_s16.c > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_s16.c > @@ -10,12 +10,12 @@ foo (int16_t const * base, mve_pred16_t p) > return vld1q_z_s16 (base, p); > } > > -/* { dg-final { scan-assembler "vldrht.s16" } } */ > - > int16x8_t > foo1 (int16_t const * base, mve_pred16_t p) > { > return vld1q_z (base, p); > } > > -/* { dg-final { scan-assembler "vldrht.s16" } } */ > +/* { dg-final { scan-assembler-times "vpst" 2 } } */ > +/* { dg-final { scan-assembler-times "vldrht.16" 2 } } */ > +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ > diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_s32.c > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_s32.c > index > d03ab345f1920f03e6a609bb330b514eae81779c..ec317cd70e8f5cb2a5f83bbd > caf90b18ae148615 100644 > --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_s32.c > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_s32.c > @@ -10,12 +10,12 @@ foo (int32_t const * base, mve_pred16_t p) > return vld1q_z_s32 (base, p); > } > > -/* { dg-final { scan-assembler "vldrwt.s32" } } */ > - > int32x4_t > foo1 (int32_t const * base, mve_pred16_t p) > { > return vld1q_z (base, p); > } > > -/* { dg-final { scan-assembler "vldrwt.s32" } } */ > +/* { dg-final { scan-assembler-times "vpst" 2 } } */ > +/* { dg-final { scan-assembler-times "vldrwt.32" 2 } } */ > +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ > diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_s8.c > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_s8.c > index > e535662c7d04437843b9c6aee516ba7a0ceaa214..538c140e78e8d858fe1b42d > 73ca06ad774f3f4da 100644 > --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_s8.c > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_s8.c > @@ -10,12 +10,12 @@ foo (int8_t const * base, mve_pred16_t p) > return vld1q_z_s8 (base, p); > } > > -/* { dg-final { scan-assembler "vldrbt.s8" } } */ > - > int8x16_t > foo1 (int8_t const * base, mve_pred16_t p) > { > return vld1q_z (base, p); > } > > -/* { dg-final { scan-assembler "vldrbt.s8" } } */ > +/* { dg-final { scan-assembler-times "vpst" 2 } } */ > +/* { dg-final { scan-assembler-times "vldrbt.8" 2 } } */ > +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ > diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_u16.c > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_u16.c > index > 3f20f4ed9ca6a74fbbce1f471d0a49d89075f1ad..e5e588a187e9524a76d9d9b3 > a2c799338989d7f6 100644 > --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_u16.c > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_u16.c > @@ -10,12 +10,12 @@ foo (uint16_t const * base, mve_pred16_t p) > return vld1q_z_u16 (base, p); > } > > -/* { dg-final { scan-assembler "vldrht.u16" } } */ > - > uint16x8_t > foo1 (uint16_t const * base, mve_pred16_t p) > { > return vld1q_z (base, p); > } > > -/* { dg-final { scan-assembler "vldrht.u16" } } */ > +/* { dg-final { scan-assembler-times "vpst" 2 } } */ > +/* { dg-final { scan-assembler-times "vldrht.16" 2 } } */ > +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ > diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_u32.c > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_u32.c > index > 1d3b53e38e8bc3123b85f89570d8656988b9b278..999beefa7e8669dc15b70f4 > 841adfbc08d018622 100644 > --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_u32.c > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_u32.c > @@ -10,12 +10,12 @@ foo (uint32_t const * base, mve_pred16_t p) > return vld1q_z_u32 (base, p); > } > > -/* { dg-final { scan-assembler "vldrwt.u32" } } */ > - > uint32x4_t > foo1 (uint32_t const * base, mve_pred16_t p) > { > return vld1q_z (base, p); > } > > -/* { dg-final { scan-assembler "vldrwt.u32" } } */ > +/* { dg-final { scan-assembler-times "vpst" 2 } } */ > +/* { dg-final { scan-assembler-times "vldrwt.32" 2 } } */ > +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ > diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_u8.c > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_u8.c > index > 47d3f6fa4c70773f7a4c549dcf8a3b884cebab92..172053c71422f5daad1555932 > c9af84deee0c8d9 100644 > --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_u8.c > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_u8.c > @@ -10,12 +10,12 @@ foo (uint8_t const * base, mve_pred16_t p) > return vld1q_z_u8 (base, p); > } > > -/* { dg-final { scan-assembler "vldrbt.u8" } } */ > - > uint8x16_t > foo1 (uint8_t const * base, mve_pred16_t p) > { > return vld1q_z (base, p); > } > > -/* { dg-final { scan-assembler "vldrbt.u8" } } */ > +/* { dg-final { scan-assembler-times "vpst" 2 } } */ > +/* { dg-final { scan-assembler-times "vldrbt.8" 2 } } */ > +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ > diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_s8.c > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_s8.c > index > 886491f005284596099eeec4d90e67f82b5e967f..ec2f2176ccfebbef00447aed1 > 53069ce3be9491c 100644 > --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_s8.c > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_s8.c > @@ -10,4 +10,5 @@ foo (int8_t const * base) > return vldrbq_s8 (base); > } > > -/* { dg-final { scan-assembler "vldrb.s8" } } */ > +/* { dg-final { scan-assembler-times "vldrb.8" 1 } } */ > +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ > diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_u8.c > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_u8.c > index > e58120a2b64d6a9f0efaf80a5a47d9c80f5f4d34..d07b472a4ffe79d4615ae2a1e > 15606a34b9ac765 100644 > --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_u8.c > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_u8.c > @@ -10,4 +10,5 @@ foo (uint8_t const * base) > return vldrbq_u8 (base); > } > > -/* { dg-final { scan-assembler "vldrb.u8" } } */ > +/* { dg-final { scan-assembler-times "vldrb.8" 1 } } */ > +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ > diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_s8.c > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_s8.c > index > 7d66c704516b5db6cbd3f36645e6a44031201bf1..aed3c9100638a2e86b91b3b > f42040b4b728fd725 100644 > --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_s8.c > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_s8.c > @@ -10,4 +10,6 @@ foo (int8_t const * base, mve_pred16_t p) > return vldrbq_z_s8 (base, p); > } > > -/* { dg-final { scan-assembler "vldrbt.s8" } } */ > +/* { dg-final { scan-assembler-times "vpst" 1 } } */ > +/* { dg-final { scan-assembler-times "vldrbt.8" 1 } } */ > +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ > diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_u8.c > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_u8.c > index > 05ae2628d5645ffa3bfae83867f95151c887ef75..54c61e744543d5bba03dfdd2 > 3e1ceb1e8f398a1a 100644 > --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_u8.c > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_u8.c > @@ -10,4 +10,6 @@ foo (uint8_t const * base, mve_pred16_t p) > return vldrbq_z_u8 (base, p); > } > > -/* { dg-final { scan-assembler "vldrbt.u8" } } */ > +/* { dg-final { scan-assembler-times "vpst" 1 } } */ > +/* { dg-final { scan-assembler-times "vldrbt.8" 1 } } */ > +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ > diff --git > a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_s64.c > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_s64.c > index > 0d1ee769ec64b55c7559ce9dc14f8a6ae2e43e34..7420d0198e7450f566644a74 > bac925170b49d688 100644 > --- > a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_s64.c > +++ > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_s64.c > @@ -10,6 +10,7 @@ foo (uint64x2_t * addr) > return vldrdq_gather_base_wb_s64 (addr, 8); > } > > -/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ > /* { dg-final { scan-assembler "vldrd.64\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0- > 9\]+\\\]!" } } */ > -/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ > +/* { dg-final { scan-assembler-times "vldr.64" 1 } } */ > +/* { dg-final { scan-assembler-times "vstr.64" 1 } } */ > +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ > diff --git > a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_u64. > c > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_u64. > c > index > cb2a41bdcd32b553a93d3bcc4787d506f1b54f74..ebe5b2fd70c7e9c1ebe6e2ee > db185975afea36b9 100644 > --- > a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_u64. > c > +++ > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_u64. > c > @@ -10,6 +10,7 @@ foo (uint64x2_t * addr) > return vldrdq_gather_base_wb_u64 (addr, 8); > } > > -/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ > /* { dg-final { scan-assembler "vldrd.64\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0- > 9\]+\\\]!" } } */ > -/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ > +/* { dg-final { scan-assembler-times "vldr.64" 1 } } */ > +/* { dg-final { scan-assembler-times "vstr.64" 1 } } */ > +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ > diff --git > a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_s6 > 4.c > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_s6 > 4.c > index > 243fbeacc3429025202da2ff157ade38a472e123..231a24a1e5550b444c5476bf > c0d1f6802a4952c8 100644 > --- > a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_s6 > 4.c > +++ > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_s6 > 4.c > @@ -8,8 +8,8 @@ int64x2_t foo (uint64x2_t * addr, mve_pred16_t p) > return vldrdq_gather_base_wb_z_s64 (addr, 1016, p); > } > > -/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ > -/* { dg-final { scan-assembler "vmsr\t P0, r\[0-9\]+.*$" } } */ > /* { dg-final { scan-assembler "vpst" } } */ > /* { dg-final { scan-assembler "vldrdt.u64\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0- > 9\]+\\\]!" } } */ > -/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ > +/* { dg-final { scan-assembler-times "vldr.64" 1 } } */ > +/* { dg-final { scan-assembler-times "vstr.64" 1 } } */ > +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ > diff --git > a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_u6 > 4.c > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_u6 > 4.c > index > 10ba42405fe8fde9d4f8993b20e41a59c7bb2e77..b8d9b5c139150536721b1a6 > 6636ce9b5a86bf093 100644 > --- > a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_u6 > 4.c > +++ > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_u6 > 4.c > @@ -8,8 +8,8 @@ uint64x2_t foo (uint64x2_t * addr, mve_pred16_t p) > return vldrdq_gather_base_wb_z_u64 (addr, 8, p); > } > > -/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ > -/* { dg-final { scan-assembler "vmsr\t P0, r\[0-9\]+.*" } } */ > /* { dg-final { scan-assembler "vpst" } } */ > /* { dg-final { scan-assembler "vldrdt.u64\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0- > 9\]+\\\]!" } } */ > -/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ > +/* { dg-final { scan-assembler-times "vldr.64" 1 } } */ > +/* { dg-final { scan-assembler-times "vstr.64" 1 } } */ > +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ > diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_f16.c > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_f16.c > index > b79c0e9bfe49e99557ebfb14bc03f8aaf40ab925..05bef418d822a7a994002f40 > 73b65178e42346a2 100644 > --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_f16.c > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_f16.c > @@ -10,4 +10,5 @@ foo (float16_t const * base) > return vldrhq_f16 (base); > } > > -/* { dg-final { scan-assembler "vldrh.f16" } } */ > +/* { dg-final { scan-assembler-times "vldrh.16" 1 } } */ > +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ > diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_s16.c > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_s16.c > index > 4872eb555f3e7b8b7ec86b8ba377f0a0452b1aa0..7c977b6a6995f11cd03380b > a11585452bdddbe89 100644 > --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_s16.c > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_s16.c > @@ -10,4 +10,5 @@ foo (int16_t const * base) > return vldrhq_s16 (base); > } > > -/* { dg-final { scan-assembler "vldrh.s16" } } */ > +/* { dg-final { scan-assembler-times "vldrh.16" 1 } } */ > +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ > diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_s32.c > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_s32.c > index > e73e208c26a63fb7caaa68f4d5a52f1fa1c904fa..229b52163faa2566e17f13209 > 40a066313d2c853 100644 > --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_s32.c > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_s32.c > @@ -10,4 +10,5 @@ foo (int16_t const * base) > return vldrhq_s32 (base); > } > > -/* { dg-final { scan-assembler "vldrh.s32" } } */ > +/* { dg-final { scan-assembler-times "vldrh.s32" 1 } } */ > +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ > diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_u16.c > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_u16.c > index > 6b285d45aaa158c01f3e043d2e71214ed824f79a..07f6d9e3944a976886b35f3c > 7a042046f3c7498a 100644 > --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_u16.c > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_u16.c > @@ -10,4 +10,5 @@ foo (uint16_t const * base) > return vldrhq_u16 (base); > } > > -/* { dg-final { scan-assembler "vldrh.u16" } } */ > +/* { dg-final { scan-assembler-times "vldrh.16" 1 } } */ > +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ > diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_u32.c > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_u32.c > index > 994cd4a20badd807c8be54aa45f788e8b9420fd9..cd24f01831f77d1da50ca624 > a7b6c800a9b616fd 100644 > --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_u32.c > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_u32.c > @@ -10,4 +10,5 @@ foo (uint16_t const * base) > return vldrhq_u32 (base); > } > > -/* { dg-final { scan-assembler "vldrh.u32" } } */ > +/* { dg-final { scan-assembler-times "vldrh.u32" 1 } } */ > +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ > diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_f16.c > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_f16.c > index > 2b866a99dd4c8a213887fe120cfe6dec35f84f87..dd0fc9c7b733114f6e229f781 > 55afe53cca675b7 100644 > --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_f16.c > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_f16.c > @@ -10,4 +10,6 @@ foo (float16_t const * base, mve_pred16_t p) > return vldrhq_z_f16 (base, p); > } > > -/* { dg-final { scan-assembler "vldrht.f16" } } */ > +/* { dg-final { scan-assembler-times "vpst" 1 } } */ > +/* { dg-final { scan-assembler-times "vldrht.16" 1 } } */ > +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ > diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_s16.c > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_s16.c > index > 6c92c50ba12a502713ee7d2e7cf719edf848fe9c..36d3458d95c91d13631e511a > c1294e544791336a 100644 > --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_s16.c > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_s16.c > @@ -10,4 +10,6 @@ foo (int16_t const * base, mve_pred16_t p) > return vldrhq_z_s16 (base, p); > } > > -/* { dg-final { scan-assembler "vldrht.s16" } } */ > +/* { dg-final { scan-assembler-times "vpst" 1 } } */ > +/* { dg-final { scan-assembler-times "vldrht.16" 1 } } */ > +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ > diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_s32.c > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_s32.c > index > 4cd97ba5743ef1dcd8dc368ef75dc6df2391f69c..9c67b479be79c8377682a448 > 8d365cd853df7a2c 100644 > --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_s32.c > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_s32.c > @@ -10,4 +10,6 @@ foo (int16_t const * base, mve_pred16_t p) > return vldrhq_z_s32 (base, p); > } > > -/* { dg-final { scan-assembler "vldrht.s32" } } */ > +/* { dg-final { scan-assembler-times "vpst" 1 } } */ > +/* { dg-final { scan-assembler-times "vldrht.s32" 1 } } */ > +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ > diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_u16.c > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_u16.c > index > 80ae0e5cd17fe158b152873d144e8b1217ad8e33..26354b5971aca3c9f003559 > d5261866a019a70ef 100644 > --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_u16.c > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_u16.c > @@ -10,4 +10,6 @@ foo (uint16_t const * base, mve_pred16_t p) > return vldrhq_z_u16 (base, p); > } > > -/* { dg-final { scan-assembler "vldrht.u16" } } */ > +/* { dg-final { scan-assembler-times "vpst" 1 } } */ > +/* { dg-final { scan-assembler-times "vldrht.16" 1 } } */ > +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ > diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_u32.c > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_u32.c > index > 1a8590116eb017d9744d62b1dd7d07f539b390f0..948fe5ee5b46701ce6a7e80 > d4b6a3d54690d921f 100644 > --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_u32.c > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_u32.c > @@ -10,4 +10,6 @@ foo (uint16_t const * base, mve_pred16_t p) > return vldrhq_z_u32 (base, p); > } > > -/* { dg-final { scan-assembler "vldrht.u32" } } */ > +/* { dg-final { scan-assembler-times "vpst" 1 } } */ > +/* { dg-final { scan-assembler-times "vldrht.u32" 1 } } */ > +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ > diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_f32.c > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_f32.c > index > 2c834ae53df93b97f7d7f9600fc4eba7d6c3400d..143079aa23fe8a45c381e33e2 > 0adbd4bb91a539c 100644 > --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_f32.c > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_f32.c > @@ -10,4 +10,5 @@ foo (float32_t const * base) > return vldrwq_f32 (base); > } > > -/* { dg-final { scan-assembler "vldrw.f32" } } */ > +/* { dg-final { scan-assembler-times "vldrw.32" 1 } } */ > +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ > diff --git > a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_f32. > c > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_f32. > c > index > db8108e37325c4e1fafd2293d48eba0c33309073..8e2994f75d7d488e968dd9c > d4847900d2438475a 100644 > --- > a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_f32. > c > +++ > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_f32. > c > @@ -10,6 +10,7 @@ foo (uint32x4_t * addr) > return vldrwq_gather_base_wb_f32 (addr, 8); > } > > -/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ > +/* { dg-final { scan-assembler "vldrw.32\tq\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ > /* { dg-final { scan-assembler "vldrw.u32\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0- > 9\]+\\\]!" } } */ > -/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ > +/* { dg-final { scan-assembler "vstrw.32\tq\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ > +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ > diff --git > a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_s32. > c > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_s32. > c > index > 3da64e218e2c0789e996be551650033567eba4e5..e5054738b75ec7378a6a289 > e9c071721f9a6a4d0 100644 > --- > a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_s32. > c > +++ > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_s32. > c > @@ -10,6 +10,7 @@ foo (uint32x4_t * addr) > return vldrwq_gather_base_wb_s32 (addr, 8); > } > > -/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ > +/* { dg-final { scan-assembler "vldrw.32\tq\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ > /* { dg-final { scan-assembler "vldrw.u32\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0- > 9\]+\\\]!" } } */ > -/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ > +/* { dg-final { scan-assembler "vstrw.32\tq\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ > +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ > diff --git > a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_u32. > c > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_u32. > c > index > 2597ee11608bfe21d697f2250bee7e69c0cc7aec..7f39414143bdfb3bcbc059dc > dcba0472c0a63459 100644 > --- > a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_u32. > c > +++ > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_u32. > c > @@ -10,6 +10,7 @@ foo (uint32x4_t * addr) > return vldrwq_gather_base_wb_u32 (addr, 8); > } > > -/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ > +/* { dg-final { scan-assembler "vldrw.32\tq\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ > /* { dg-final { scan-assembler "vldrw.u32\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0- > 9\]+\\\]!" } } */ > -/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ > +/* { dg-final { scan-assembler "vstrw.32\tq\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ > +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ > diff --git > a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_f3 > 2.c > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_f3 > 2.c > index > 9fb47daf486fafdb897618453958e776a069d432..f3219e2e8254f542916b1fdc6 > d633e5512c08cfe 100644 > --- > a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_f3 > 2.c > +++ > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_f3 > 2.c > @@ -10,8 +10,9 @@ foo (uint32x4_t * addr, mve_pred16_t p) > return vldrwq_gather_base_wb_z_f32 (addr, 8, p); > } > > -/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ > +/* { dg-final { scan-assembler "vldrw.32\tq\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ > /* { dg-final { scan-assembler "vmsr\t P0, r\[0-9\]+.*" } } */ > /* { dg-final { scan-assembler "vpst" } } */ > /* { dg-final { scan-assembler "vldrwt.u32\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0- > 9\]+\\\]!" } } */ > -/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ > +/* { dg-final { scan-assembler "vstrw.32\tq\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ > +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ > diff --git > a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_s3 > 2.c > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_s3 > 2.c > index > 56da5a46c64d2946ceade8689105048e19efdc6a..4d093d243fe63e3f98cffaf15 > fcb41fa4611b41e 100644 > --- > a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_s3 > 2.c > +++ > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_s3 > 2.c > @@ -10,8 +10,9 @@ foo (uint32x4_t * addr, mve_pred16_t p) > return vldrwq_gather_base_wb_z_s32 (addr, 8, p); > } > > -/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ > +/* { dg-final { scan-assembler "vldrw.32\tq\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ > /* { dg-final { scan-assembler "vmsr\t P0, r\[0-9\]+.*" } } */ > /* { dg-final { scan-assembler "vpst" } } */ > /* { dg-final { scan-assembler "vldrwt.u32\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0- > 9\]+\\\]!" } } */ > -/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ > +/* { dg-final { scan-assembler "vstrw.32\tq\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ > +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ > diff --git > a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_u3 > 2.c > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_u > 32.c > index > 63165d97c1a7b4120be036348a09b73afddd36d1..e796522a49c6c1929f2f64e > e27e36eda9a1a95d3 100644 > --- > a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_u3 > 2.c > +++ > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_u > 32.c > @@ -10,8 +10,9 @@ foo (uint32x4_t * addr, mve_pred16_t p) > return vldrwq_gather_base_wb_z_u32 (addr, 8, p); > } > > -/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ > +/* { dg-final { scan-assembler "vldrw.32\tq\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ > /* { dg-final { scan-assembler "vmsr\t P0, r\[0-9\]+.*" } } */ > /* { dg-final { scan-assembler "vpst" } } */ > /* { dg-final { scan-assembler "vldrwt.u32\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0- > 9\]+\\\]!" } } */ > -/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ > +/* { dg-final { scan-assembler "vstrw.32\tq\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ > +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ > diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_s32.c > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_s32.c > index > f48c29f8bff5f0b57802d1673c433b70311f8fc0..860dd324d256511a5802a0970 > 19f1a9a7cd52e9b 100644 > --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_s32.c > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_s32.c > @@ -10,4 +10,5 @@ foo (int32_t const * base) > return vldrwq_s32 (base); > } > > -/* { dg-final { scan-assembler "vldrw.s32" } } */ > +/* { dg-final { scan-assembler-times "vldrw.32" 1 } } */ > +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ > diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_u32.c > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_u32.c > index > 7c722200ecc5c642d6b8e3e0be69601a325b7f53..513ed49fb6eb7a88a51df58a > 521ff0669af89ad1 100644 > --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_u32.c > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_u32.c > @@ -10,4 +10,5 @@ foo (uint32_t const * base) > return vldrwq_u32 (base); > } > > -/* { dg-final { scan-assembler "vldrw.u32" } } */ > +/* { dg-final { scan-assembler-times "vldrw.32" 1 } } */ > +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ > diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_z_f32.c > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_z_f32.c > index > bcdcecab46875864125c4232a75931faf0bcb54f..3e0a6a60bcf4374ec09f33600 > 1b04e6fda524913 100644 > --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_z_f32.c > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_z_f32.c > @@ -10,4 +10,6 @@ foo (float32_t const * base, mve_pred16_t p) > return vldrwq_z_f32 (base, p); > } > > -/* { dg-final { scan-assembler "vldrwt.f32" } } */ > +/* { dg-final { scan-assembler-times "vpst" 1 } } */ > +/* { dg-final { scan-assembler-times "vldrwt.32" 1 } } */ > +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ > diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_z_s32.c > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_z_s32.c > index > fd32b30565627078561f6f04214a15a9a1643a68..82b914885b55d7b7d076500 > 726ac9e174f8c0ece 100644 > --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_z_s32.c > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_z_s32.c > @@ -10,4 +10,6 @@ foo (int32_t const * base, mve_pred16_t p) > return vldrwq_z_s32 (base, p); > } > > -/* { dg-final { scan-assembler "vldrwt.s32" } } */ > +/* { dg-final { scan-assembler-times "vpst" 1 } } */ > +/* { dg-final { scan-assembler-times "vldrwt.32" 1 } } */ > +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ > diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_z_u32.c > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_z_u32.c > index > f49440438348582745eabd4589bef40ee07f8deb..6a66e1678815b7b4984ed01 > 1d108bc48ab44c963 100644 > --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_z_u32.c > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_z_u32.c > @@ -10,4 +10,6 @@ foo (uint32_t const * base, mve_pred16_t p) > return vldrwq_z_u32 (base, p); > } > > -/* { dg-final { scan-assembler "vldrwt.u32" } } */ > +/* { dg-final { scan-assembler-times "vpst" 1 } } */ > +/* { dg-final { scan-assembler-times "vldrwt.32" 1 } } */ > +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ > diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_float.c > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_float.c > index > 52bad05b6219621ada414dc74ab2deebdd1c93e3..739f282c476f2611245a20d > fc0d121eba289a788 100644 > --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_float.c > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_float.c > @@ -1,6 +1,6 @@ > /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */ > /* { dg-add-options arm_v8_1m_mve_fp } */ > -/* { dg-additional-options "-O0" } */ > +/* { dg-additional-options "-O2" } */ > > #include "arm_mve.h" > > @@ -14,4 +14,6 @@ foo () > fb = vuninitializedq_f32 (); > } > > -/* { dg-final { scan-assembler-times "vstrb.8" 4 } } */ > +/* { dg-final { scan-assembler-times "vstrh.16" 1 } } */ > +/* { dg-final { scan-assembler-times "vstrw.32" 1 } } */ > +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ > diff --git > a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_float1.c > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_float1.c > index > c6724a52074c6ce0361fdba66c4add831e8c13db..a9130607f26915af39e41d9f > 1181131bcbd1ef32 100644 > --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_float1.c > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_float1.c > @@ -1,6 +1,6 @@ > /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */ > /* { dg-add-options arm_v8_1m_mve_fp } */ > -/* { dg-additional-options "-O0" } */ > +/* { dg-additional-options "-O2" } */ > > #include "arm_mve.h" > > @@ -14,4 +14,6 @@ foo () > fb = vuninitializedq (fbb); > } > > -/* { dg-final { scan-assembler-times "vstrb.8" 6 } } */ > +/* { dg-final { scan-assembler-times "vstrh.16" 1 } } */ > +/* { dg-final { scan-assembler-times "vstrw.32" 1 } } */ > +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ > diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_int.c > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_int.c > index > 13a0109a9b5380cd83f48154df231081ddb8f08e..bf6692fe57322ac9ed5c949a > 9697d3ed7a565acc 100644 > --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_int.c > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_int.c > @@ -1,6 +1,6 @@ > /* { dg-require-effective-target arm_v8_1m_mve_ok } */ > /* { dg-add-options arm_v8_1m_mve } */ > -/* { dg-additional-options "-O0" } */ > +/* { dg-additional-options "-O2" } */ > > #include "arm_mve.h" > int8x16_t a; > @@ -25,4 +25,8 @@ foo () > ud = vuninitializedq_u64 (); > } > > -/* { dg-final { scan-assembler-times "vstrb.8" 16 } } */ > +/* { dg-final { scan-assembler-times "vstrb.8" 2 } } */ > +/* { dg-final { scan-assembler-times "vstrh.16" 2 } } */ > +/* { dg-final { scan-assembler-times "vstrw.32" 2 } } */ > +/* { dg-final { scan-assembler-times "vstr.64" 2 } } */ > +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ > diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_int1.c > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_int1.c > index > a321398709e65ee7daadfab9c6089116baccde83..4f66a07ac29030482a2643e1 > 0907d0dae24743af 100644 > --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_int1.c > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_int1.c > @@ -1,6 +1,6 @@ > /* { dg-require-effective-target arm_v8_1m_mve_ok } */ > /* { dg-add-options arm_v8_1m_mve } */ > -/* { dg-additional-options "-O0" } */ > +/* { dg-additional-options "-O2" } */ > > #include "arm_mve.h" > > @@ -26,4 +26,8 @@ foo () > ud = vuninitializedq (udd); > } > > -/* { dg-final { scan-assembler-times "vstrb.8" 24 } } */ > +/* { dg-final { scan-assembler-times "vstrb.8" 2 } } */ > +/* { dg-final { scan-assembler-times "vstrh.16" 2 } } */ > +/* { dg-final { scan-assembler-times "vstrw.32" 2 } } */ > +/* { dg-final { scan-assembler-times "vstr.64" 2 } } */ > +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h index 9571b60f84f947851639de94501b8bccd0149727..33d162c3e00590ab96de56f20380f4ae4f200849 100644 --- a/gcc/config/arm/arm-protos.h +++ b/gcc/config/arm/arm-protos.h @@ -64,6 +64,8 @@ extern bool arm_q_bit_access (void); extern bool arm_ge_bits_access (void); #ifdef RTX_CODE +enum reg_class +arm_mode_base_reg_class (machine_mode); extern void arm_gen_unlikely_cbranch (enum rtx_code, machine_mode cc_mode, rtx label_ref); extern bool arm_vector_mode_supported_p (machine_mode); @@ -114,6 +116,7 @@ extern bool arm_tls_referenced_p (rtx); extern int arm_coproc_mem_operand (rtx, bool); extern int neon_vector_mem_operand (rtx, int, bool); +extern int mve_vector_mem_operand (machine_mode, rtx, bool); extern int neon_struct_mem_operand (rtx); extern rtx *neon_vcmla_lane_prepare_operands (rtx *); diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h index 0126f390abb2650e0b81cb59d55b1ce608490d4a..30e1d6dc994e18012fd2e5a1bbd7c69134ee100c 100644 --- a/gcc/config/arm/arm.h +++ b/gcc/config/arm/arm.h @@ -1292,11 +1292,13 @@ extern const char *fp_sysreg_names[NB_FP_SYSREGS]; /* For the Thumb the high registers cannot be used as base registers when addressing quantities in QI or HI mode; if we don't know the - mode, then we must be conservative. */ + mode, then we must be conservative. For MVE we need to load from + memory to low regs based on given modes i.e [Rn], Rn <= LO_REGS. */ #define MODE_BASE_REG_CLASS(MODE) \ - (TARGET_32BIT ? CORE_REGS \ + (TARGET_HAVE_MVE ? arm_mode_base_reg_class (MODE) \ + :(TARGET_32BIT ? CORE_REGS \ : GET_MODE_SIZE (MODE) >= 4 ? BASE_REGS \ - : LO_REGS) + : LO_REGS)) /* For Thumb we cannot support SP+reg addressing, so we return LO_REGS instead of BASE_REGS. */ diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index b169250918c13c6eabf55146a79081514d171571..01bc1b8ae9b72700ca5ae0840ee4496fd686b623 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -8443,6 +8443,10 @@ thumb2_legitimate_address_p (machine_mode mode, rtx x, int strict_p) bool use_ldrd; enum rtx_code code = GET_CODE (x); + if (TARGET_HAVE_MVE + && (mode == V8QImode || mode == E_V4QImode || mode == V4HImode)) + return mve_vector_mem_operand (mode, x, strict_p); + if (arm_address_register_rtx_p (x, strict_p)) return 1; @@ -13257,6 +13261,80 @@ arm_coproc_mem_operand (rtx op, bool wb) return FALSE; } +/* This function returns TRUE on matching mode and op. +1. For given modes, check for [Rn], return TRUE for Rn <= LO_REGS. +2. For other modes, check for [Rn], return TRUE for Rn < R15 (expect R13). */ +int +mve_vector_mem_operand (machine_mode mode, rtx op, bool strict) +{ + enum rtx_code code; + HOST_WIDE_INT val; + int reg_no; + + /* Match: (mem (reg)). */ + if (REG_P (op)) + { + int reg_no = REGNO (op); + return (((mode == E_V8QImode || mode == E_V4QImode || mode == E_V4HImode) + ? reg_no <= LAST_LO_REGNUM + :(reg_no < LAST_ARM_REGNUM && reg_no != SP_REGNUM)) + || (!strict && reg_no >= FIRST_PSEUDO_REGISTER)); + } + code = GET_CODE (op); + + if (code == POST_INC || code == PRE_DEC + || code == PRE_INC || code == POST_DEC) + { + reg_no = REGNO (XEXP (op, 0)); + return (((mode == E_V8QImode || mode == E_V4QImode || mode == E_V4HImode) + ? reg_no <= LAST_LO_REGNUM + :(reg_no < LAST_ARM_REGNUM && reg_no != SP_REGNUM)) + || (!strict && reg_no >= FIRST_PSEUDO_REGISTER)); + } + else if ((code == POST_MODIFY || code == PRE_MODIFY) + && GET_CODE (XEXP (op, 1)) == PLUS && REG_P (XEXP (XEXP (op, 1), 1))) + { + reg_no = REGNO (XEXP (op, 0)); + val = INTVAL (XEXP ( XEXP (op, 1), 1)); + switch (mode) + { + case E_V16QImode: + if (abs_hwi (val)) + return ((reg_no < LAST_ARM_REGNUM && reg_no != SP_REGNUM) + || (!strict && reg_no >= FIRST_PSEUDO_REGISTER)); + case E_V8HImode: + case E_V8HFmode: + if (abs (val) <= 255) + return ((reg_no < LAST_ARM_REGNUM && reg_no != SP_REGNUM) + || (!strict && reg_no >= FIRST_PSEUDO_REGISTER)); + case E_V8QImode: + case E_V4QImode: + if (abs_hwi (val)) + return (reg_no <= LAST_LO_REGNUM + || (!strict && reg_no >= FIRST_PSEUDO_REGISTER)); + case E_V4HImode: + case E_V4HFmode: + if (val % 2 == 0 && abs (val) <= 254) + return (reg_no <= LAST_LO_REGNUM + || (!strict && reg_no >= FIRST_PSEUDO_REGISTER)); + case E_V4SImode: + case E_V4SFmode: + if (val % 4 == 0 && abs (val) <= 508) + return ((reg_no < LAST_ARM_REGNUM && reg_no != SP_REGNUM) + || (!strict && reg_no >= FIRST_PSEUDO_REGISTER)); + case E_V2DImode: + case E_V2DFmode: + case E_TImode: + if (val % 4 == 0 && val >= 0 && val <= 1020) + return ((reg_no < LAST_ARM_REGNUM && reg_no != SP_REGNUM) + || (!strict && reg_no >= FIRST_PSEUDO_REGISTER)); + default: + return FALSE; + } + } + return FALSE; +} + /* Return TRUE if OP is a memory operand which we can load or store a vector to/from. TYPE is one of the following values: 0 - Vector load/stor (vldr) @@ -13324,15 +13402,6 @@ neon_vector_mem_operand (rtx op, int type, bool strict) && (INTVAL (XEXP (ind, 1)) & 3) == 0) return TRUE; - if (type == 1 && TARGET_HAVE_MVE - && (GET_CODE (ind) == POST_INC || GET_CODE (ind) == PRE_DEC)) - { - rtx ind1 = XEXP (ind, 0); - if (!REG_P (ind1)) - return 0; - return VFP_REGNO_OK_FOR_SINGLE (REGNO (ind1)); - } - return FALSE; } @@ -24019,7 +24088,7 @@ arm_print_operand (FILE *stream, rtx x, int code) } return; - /* To print the memory operand with "Us" constraint. Based on the rtx_code + /* To print the memory operand with "Ux" constraint. Based on the rtx_code the memory operands output looks like following. 1. [Rn], #+/-<imm> 2. [Rn, #+/-<imm>]! @@ -33389,6 +33458,18 @@ arm_gen_far_branch (rtx * operands, int pos_label, const char * dest, return ""; } +/* If given mode matches, load from memory to LO_REGS. + (i.e [Rn], Rn <= LO_REGS). */ +enum reg_class +arm_mode_base_reg_class (machine_mode mode) +{ + if (TARGET_HAVE_MVE + && (mode == E_V8QImode || mode == E_V4QImode || mode == E_V4HImode)) + return LO_REGS; + + return MODE_BASE_REG_REG_CLASS (mode); +} + struct gcc_target targetm = TARGET_INITIALIZER; #include "gt-arm.h" diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md index fed6c7c84032dd8aba45142b59b980b4a6240d6d..011badc9957655a0fba67946c1db6fa6334b2bbb 100644 --- a/gcc/config/arm/constraints.md +++ b/gcc/config/arm/constraints.md @@ -39,7 +39,7 @@ ;; in all states: Pf, Pg ;; The following memory constraints have been used: -;; in ARM/Thumb-2 state: Uh, Ut, Uv, Uy, Un, Um, Us, Up, Uf +;; in ARM/Thumb-2 state: Uh, Ut, Uv, Uy, Un, Um, Us, Up, Uf, Ux, Ul ;; in ARM state: Uq ;; in Thumb state: Uu, Uw ;; in all states: Q @@ -47,6 +47,18 @@ (define_register_constraint "Up" "TARGET_HAVE_MVE ? VPR_REG : NO_REGS" "MVE VPR register") +(define_memory_constraint "Ul" + "@internal + In ARM/Thumb-2 state a valid address for load instruction with XEXP (op, 0) + being label of the literal data item to be loaded." + (and (match_code "mem") + (match_test "TARGET_HAVE_MVE && reload_completed + && (GET_CODE (XEXP (op, 0)) == LABEL_REF + || (GET_CODE (XEXP (op, 0)) == CONST + && GET_CODE (XEXP (XEXP (op, 0), 0)) == PLUS + && GET_CODE (XEXP (XEXP (XEXP (op, 0), 0), 0)) == LABEL_REF + && CONST_INT_P (XEXP (XEXP (XEXP (op, 0), 0), 1))))"))) + (define_register_constraint "Uf" "TARGET_HAVE_MVE ? VFPCC_REG : NO_REGS" "MVE FPCCR register") @@ -467,6 +479,15 @@ (and (match_code "mem") (match_test "TARGET_32BIT && neon_vector_mem_operand (op, 1, true)"))) +(define_memory_constraint "Ux" + "@internal + In ARM/Thumb-2 state a valid address and load into CORE regs or only to + LO_REGS based on mode of op." + (and (match_code "mem") + (match_test "(TARGET_HAVE_MVE || TARGET_HAVE_MVE_FLOAT) + && mve_vector_mem_operand (GET_MODE (op), + XEXP (op, 0), true)"))) + (define_memory_constraint "Uq" "@internal In ARM state an address valid in ldrsb instructions." diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md index f43dabbfd4f15b602f0627a9b0ea423064501e51..986fbfe2abae5f1e91e65f1ff5c84709c43c4617 100644 --- a/gcc/config/arm/mve.md +++ b/gcc/config/arm/mve.md @@ -666,8 +666,8 @@ (define_int_iterator VSHLCQ_M [VSHLCQ_M_S VSHLCQ_M_U]) (define_insn "*mve_mov<mode>" - [(set (match_operand:MVE_types 0 "nonimmediate_operand" "=w,w,r,w,w,r,w,Us") - (match_operand:MVE_types 1 "general_operand" "w,r,w,Dn,Usi,r,Dm,w"))] + [(set (match_operand:MVE_types 0 "nonimmediate_operand" "=w,w,r,w,w,r,w,Ux,w") + (match_operand:MVE_types 1 "general_operand" "w,r,w,Dn,Uxi,r,Dm,w,Ul"))] "TARGET_HAVE_MVE || TARGET_HAVE_MVE_FLOAT" { if (which_alternative == 3 || which_alternative == 6) @@ -686,6 +686,50 @@ sprintf (templ, "vmov.i%d\t%%q0, %%x1 @ <mode>", width); return templ; } + + if (which_alternative == 4 || which_alternative == 7) + { + rtx ops[2]; + int regno = (which_alternative == 7) + ? REGNO (operands[1]) : REGNO (operands[0]); + + ops[0] = operands[0]; + ops[1] = operands[1]; + if (<MODE>mode == V2DFmode || <MODE>mode == V2DImode) + { + if (which_alternative == 7) + { + ops[1] = gen_rtx_REG (DImode, regno); + output_asm_insn ("vstr.64\t%P1, %E0",ops); + } + else + { + ops[0] = gen_rtx_REG (DImode, regno); + output_asm_insn ("vldr.64\t%P0, %E1",ops); + } + } + else if (<MODE>mode == TImode) + { + if (which_alternative == 7) + output_asm_insn ("vstr.64\t%q1, %E0",ops); + else + output_asm_insn ("vldr.64\t%q0, %E1",ops); + } + else + { + if (which_alternative == 7) + { + ops[1] = gen_rtx_REG (TImode, regno); + output_asm_insn ("vstr<V_sz_elem1>.<V_sz_elem>\t%q1, %E0",ops); + } + else + { + ops[0] = gen_rtx_REG (TImode, regno); + output_asm_insn ("vldr<V_sz_elem1>.<V_sz_elem>\t%q0, %E1",ops); + } + } + return ""; + } switch (which_alternative) { case 0: @@ -694,26 +738,19 @@ return "vmov\t%e0, %Q1, %R1 @ <mode>\;vmov\t%f0, %J1, %K1"; case 2: return "vmov\t%Q0, %R0, %e1 @ <mode>\;vmov\t%J0, %K0, %f1"; - case 4: - if (MEM_P (operands[1]) - && (GET_CODE (XEXP (operands[1], 0)) == LABEL_REF - || GET_CODE (XEXP (operands[1], 0)) == CONST)) - return output_move_neon (operands); - else - return "vldrb.8 %q0, %E1"; case 5: return output_move_quad (operands); - case 7: - return "vstrb.8 %q1, %E0"; + case 8: + return output_move_neon (operands); default: gcc_unreachable (); return ""; } } - [(set_attr "type" "mve_move,mve_move,mve_move,mve_move,mve_load,multiple,mve_move,mve_store") - (set_attr "length" "4,8,8,4,8,8,4,4") - (set_attr "thumb2_pool_range" "*,*,*,*,1018,*,*,*") - (set_attr "neg_pool_range" "*,*,*,*,996,*,*,*")]) + [(set_attr "type" "mve_move,mve_move,mve_move,mve_move,mve_load,multiple,mve_move,mve_store,mve_load") + (set_attr "length" "4,8,8,4,8,8,4,4,4") + (set_attr "thumb2_pool_range" "*,*,*,*,1018,*,*,*,*") + (set_attr "neg_pool_range" "*,*,*,*,996,*,*,*,*")]) (define_insn "*mve_mov<mode>" [(set (match_operand:MVE_types 0 "s_register_operand" "=w,w") @@ -8047,7 +8084,7 @@ ;; [vstrbq_s vstrbq_u] ;; (define_insn "mve_vstrbq_<supf><mode>" - [(set (match_operand:<MVE_B_ELEM> 0 "memory_operand" "=Us") + [(set (match_operand:<MVE_B_ELEM> 0 "mve_memory_operand" "=Ux") (unspec:<MVE_B_ELEM> [(match_operand:MVE_2 1 "s_register_operand" "w")] VSTRBQ)) ] @@ -8133,7 +8170,7 @@ ;; (define_insn "mve_vldrbq_<supf><mode>" [(set (match_operand:MVE_2 0 "s_register_operand" "=w") - (unspec:MVE_2 [(match_operand:<MVE_B_ELEM> 1 "memory_operand" "Us")] + (unspec:MVE_2 [(match_operand:<MVE_B_ELEM> 1 "mve_memory_operand" "Ux")] VLDRBQ)) ] "TARGET_HAVE_MVE" @@ -8142,7 +8179,10 @@ int regno = REGNO (operands[0]); ops[0] = gen_rtx_REG (TImode, regno); ops[1] = operands[1]; - output_asm_insn ("vldrb.<supf><V_sz_elem>\t%q0, %E1",ops); + if (<V_sz_elem> == 8) + output_asm_insn ("vldrb.<V_sz_elem>\t%q0, %E1",ops); + else + output_asm_insn ("vldrb.<supf><V_sz_elem>\t%q0, %E1",ops); return ""; } [(set_attr "length" "4")]) @@ -8216,7 +8256,7 @@ ;; [vstrbq_p_s vstrbq_p_u] ;; (define_insn "mve_vstrbq_p_<supf><mode>" - [(set (match_operand:<MVE_B_ELEM> 0 "memory_operand" "=Us") + [(set (match_operand:<MVE_B_ELEM> 0 "mve_memory_operand" "=Ux") (unspec:<MVE_B_ELEM> [(match_operand:MVE_2 1 "s_register_operand" "w") (match_operand:HI 2 "vpr_register_operand" "Up")] VSTRBQ)) @@ -8227,7 +8267,7 @@ int regno = REGNO (operands[1]); ops[1] = gen_rtx_REG (TImode, regno); ops[0] = operands[0]; - output_asm_insn ("vpst\n\tvstrbt.<V_sz_elem>\t%q1, %E0",ops); + output_asm_insn ("vpst\;vstrbt.<V_sz_elem>\t%q1, %E0",ops); return ""; } [(set_attr "length" "8")]) @@ -8262,7 +8302,7 @@ ;; (define_insn "mve_vldrbq_z_<supf><mode>" [(set (match_operand:MVE_2 0 "s_register_operand" "=w") - (unspec:MVE_2 [(match_operand:<MVE_B_ELEM> 1 "memory_operand" "Us") + (unspec:MVE_2 [(match_operand:<MVE_B_ELEM> 1 "mve_memory_operand" "Ux") (match_operand:HI 2 "vpr_register_operand" "Up")] VLDRBQ)) ] @@ -8272,7 +8312,10 @@ int regno = REGNO (operands[0]); ops[0] = gen_rtx_REG (TImode, regno); ops[1] = operands[1]; - output_asm_insn ("vpst\n\tvldrbt.<supf><V_sz_elem>\t%q0, %E1",ops); + if (<V_sz_elem> == 8) + output_asm_insn ("vpst\;vldrbt.<V_sz_elem>\t%q0, %E1",ops); + else + output_asm_insn ("vpst\;vldrbt.<supf><V_sz_elem>\t%q0, %E1",ops); return ""; } [(set_attr "length" "8")]) @@ -8303,7 +8346,7 @@ ;; (define_insn "mve_vldrhq_fv8hf" [(set (match_operand:V8HF 0 "s_register_operand" "=w") - (unspec:V8HF [(match_operand:V8HI 1 "memory_operand" "Us")] + (unspec:V8HF [(match_operand:V8HI 1 "mve_memory_operand" "Ux")] VLDRHQ_F)) ] "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT" @@ -8312,7 +8355,7 @@ int regno = REGNO (operands[0]); ops[0] = gen_rtx_REG (TImode, regno); ops[1] = operands[1]; - output_asm_insn ("vldrh.f16\t%q0, %E1",ops); + output_asm_insn ("vldrh.16\t%q0, %E1",ops); return ""; } [(set_attr "length" "4")]) @@ -8414,12 +8457,11 @@ [(set_attr "length" "8")]) ;; -;; ;; [vldrhq_s, vldrhq_u] ;; (define_insn "mve_vldrhq_<supf><mode>" [(set (match_operand:MVE_6 0 "s_register_operand" "=w") - (unspec:MVE_6 [(match_operand:<MVE_H_ELEM> 1 "memory_operand" "Us")] + (unspec:MVE_6 [(match_operand:<MVE_H_ELEM> 1 "mve_memory_operand" "Ux")] VLDRHQ)) ] "TARGET_HAVE_MVE" @@ -8428,7 +8470,10 @@ int regno = REGNO (operands[0]); ops[0] = gen_rtx_REG (TImode, regno); ops[1] = operands[1]; - output_asm_insn ("vldrh.<supf><V_sz_elem>\t%q0, %E1",ops); + if (<V_sz_elem> == 16) + output_asm_insn ("vldrh.16\t%q0, %E1",ops); + else + output_asm_insn ("vldrh.<supf><V_sz_elem>\t%q0, %E1",ops); return ""; } [(set_attr "length" "4")]) @@ -8438,7 +8483,7 @@ ;; (define_insn "mve_vldrhq_z_fv8hf" [(set (match_operand:V8HF 0 "s_register_operand" "=w") - (unspec:V8HF [(match_operand:V8HI 1 "memory_operand" "Us") + (unspec:V8HF [(match_operand:V8HI 1 "mve_memory_operand" "Ux") (match_operand:HI 2 "vpr_register_operand" "Up")] VLDRHQ_F)) ] @@ -8448,7 +8493,7 @@ int regno = REGNO (operands[0]); ops[0] = gen_rtx_REG (TImode, regno); ops[1] = operands[1]; - output_asm_insn ("vpst\n\tvldrht.f16\t%q0, %E1",ops); + output_asm_insn ("vpst\;vldrht.16\t%q0, %E1",ops); return ""; } [(set_attr "length" "8")]) @@ -8458,7 +8503,7 @@ ;; (define_insn "mve_vldrhq_z_<supf><mode>" [(set (match_operand:MVE_6 0 "s_register_operand" "=w") - (unspec:MVE_6 [(match_operand:<MVE_H_ELEM> 1 "memory_operand" "Us") + (unspec:MVE_6 [(match_operand:<MVE_H_ELEM> 1 "mve_memory_operand" "Ux") (match_operand:HI 2 "vpr_register_operand" "Up")] VLDRHQ)) ] @@ -8468,7 +8513,10 @@ int regno = REGNO (operands[0]); ops[0] = gen_rtx_REG (TImode, regno); ops[1] = operands[1]; - output_asm_insn ("vpst\n\tvldrht.<supf><V_sz_elem>\t%q0, %E1",ops); + if (<V_sz_elem> == 16) + output_asm_insn ("vpst\;vldrht.16\t%q0, %E1",ops); + else + output_asm_insn ("vpst\;vldrht.<supf><V_sz_elem>\t%q0, %E1",ops); return ""; } [(set_attr "length" "8")]) @@ -8478,7 +8526,7 @@ ;; (define_insn "mve_vldrwq_fv4sf" [(set (match_operand:V4SF 0 "s_register_operand" "=w") - (unspec:V4SF [(match_operand:V4SI 1 "memory_operand" "Us")] + (unspec:V4SF [(match_operand:V4SI 1 "memory_operand" "Ux")] VLDRWQ_F)) ] "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT" @@ -8487,7 +8535,7 @@ int regno = REGNO (operands[0]); ops[0] = gen_rtx_REG (TImode, regno); ops[1] = operands[1]; - output_asm_insn ("vldrw.f32\t%q0, %E1",ops); + output_asm_insn ("vldrw.32\t%q0, %E1",ops); return ""; } [(set_attr "length" "4")]) @@ -8497,7 +8545,7 @@ ;; (define_insn "mve_vldrwq_<supf>v4si" [(set (match_operand:V4SI 0 "s_register_operand" "=w") - (unspec:V4SI [(match_operand:V4SI 1 "memory_operand" "Us")] + (unspec:V4SI [(match_operand:V4SI 1 "memory_operand" "Ux")] VLDRWQ)) ] "TARGET_HAVE_MVE" @@ -8506,7 +8554,7 @@ int regno = REGNO (operands[0]); ops[0] = gen_rtx_REG (TImode, regno); ops[1] = operands[1]; - output_asm_insn ("vldrw.<supf>32\t%q0, %E1",ops); + output_asm_insn ("vldrw.32\t%q0, %E1",ops); return ""; } [(set_attr "length" "4")]) @@ -8516,7 +8564,7 @@ ;; (define_insn "mve_vldrwq_z_fv4sf" [(set (match_operand:V4SF 0 "s_register_operand" "=w") - (unspec:V4SF [(match_operand:V4SI 1 "memory_operand" "Us") + (unspec:V4SF [(match_operand:V4SI 1 "memory_operand" "Ux") (match_operand:HI 2 "vpr_register_operand" "Up")] VLDRWQ_F)) ] @@ -8526,7 +8574,7 @@ int regno = REGNO (operands[0]); ops[0] = gen_rtx_REG (TImode, regno); ops[1] = operands[1]; - output_asm_insn ("vpst\n\tvldrwt.f32\t%q0, %E1",ops); + output_asm_insn ("vpst\;vldrwt.32\t%q0, %E1",ops); return ""; } [(set_attr "length" "8")]) @@ -8536,7 +8584,7 @@ ;; (define_insn "mve_vldrwq_z_<supf>v4si" [(set (match_operand:V4SI 0 "s_register_operand" "=w") - (unspec:V4SI [(match_operand:V4SI 1 "memory_operand" "Us") + (unspec:V4SI [(match_operand:V4SI 1 "memory_operand" "Ux") (match_operand:HI 2 "vpr_register_operand" "Up")] VLDRWQ)) ] @@ -8546,14 +8594,14 @@ int regno = REGNO (operands[0]); ops[0] = gen_rtx_REG (TImode, regno); ops[1] = operands[1]; - output_asm_insn ("vpst\n\tvldrwt.<supf>32\t%q0, %E1",ops); + output_asm_insn ("vpst\;vldrwt.32\t%q0, %E1",ops); return ""; } [(set_attr "length" "8")]) (define_expand "mve_vld1q_f<mode>" [(match_operand:MVE_0 0 "s_register_operand") - (unspec:MVE_0 [(match_operand:<MVE_CNVT> 1 "memory_operand")] VLD1Q_F) + (unspec:MVE_0 [(match_operand:<MVE_CNVT> 1 "mve_memory_operand")] VLD1Q_F) ] "TARGET_HAVE_MVE || TARGET_HAVE_MVE_FLOAT" { @@ -8563,7 +8611,7 @@ (define_expand "mve_vld1q_<supf><mode>" [(match_operand:MVE_2 0 "s_register_operand") - (unspec:MVE_2 [(match_operand:MVE_2 1 "memory_operand")] VLD1Q) + (unspec:MVE_2 [(match_operand:MVE_2 1 "mve_memory_operand")] VLD1Q) ] "TARGET_HAVE_MVE" { @@ -8991,7 +9039,7 @@ ;; [vstrhq_f] ;; (define_insn "mve_vstrhq_fv8hf" - [(set (match_operand:V8HI 0 "memory_operand" "=Us") + [(set (match_operand:V8HI 0 "mve_memory_operand" "=Ux") (unspec:V8HI [(match_operand:V8HF 1 "s_register_operand" "w")] VSTRHQ_F)) ] @@ -9010,7 +9058,7 @@ ;; [vstrhq_p_f] ;; (define_insn "mve_vstrhq_p_fv8hf" - [(set (match_operand:V8HI 0 "memory_operand" "=Us") + [(set (match_operand:V8HI 0 "mve_memory_operand" "=Ux") (unspec:V8HI [(match_operand:V8HF 1 "s_register_operand" "w") (match_operand:HI 2 "vpr_register_operand" "Up")] VSTRHQ_F)) @@ -9021,7 +9069,7 @@ int regno = REGNO (operands[1]); ops[1] = gen_rtx_REG (TImode, regno); ops[0] = operands[0]; - output_asm_insn ("vpst\n\tvstrht.16\t%q1, %E0",ops); + output_asm_insn ("vpst\;vstrht.16\t%q1, %E0",ops); return ""; } [(set_attr "length" "8")]) @@ -9030,7 +9078,7 @@ ;; [vstrhq_p_s vstrhq_p_u] ;; (define_insn "mve_vstrhq_p_<supf><mode>" - [(set (match_operand:<MVE_H_ELEM> 0 "memory_operand" "=Us") + [(set (match_operand:<MVE_H_ELEM> 0 "mve_memory_operand" "=Ux") (unspec:<MVE_H_ELEM> [(match_operand:MVE_6 1 "s_register_operand" "w") (match_operand:HI 2 "vpr_register_operand" "Up")] VSTRHQ)) @@ -9041,7 +9089,7 @@ int regno = REGNO (operands[1]); ops[1] = gen_rtx_REG (TImode, regno); ops[0] = operands[0]; - output_asm_insn ("vpst\n\tvstrht.<V_sz_elem>\t%q1, %E0",ops); + output_asm_insn ("vpst\;vstrht.<V_sz_elem>\t%q1, %E0",ops); return ""; } [(set_attr "length" "8")]) @@ -9093,7 +9141,7 @@ ;; [vstrhq_scatter_shifted_offset_p_s vstrhq_scatter_shifted_offset_p_u] ;; (define_insn "mve_vstrhq_scatter_shifted_offset_p_<supf><mode>" - [(set (match_operand:<MVE_H_ELEM> 0 "memory_operand" "=Us") + [(set (match_operand:<MVE_H_ELEM> 0 "memory_operand" "=Ux") (unspec:<MVE_H_ELEM> [(match_operand:MVE_6 1 "s_register_operand" "w") (match_operand:MVE_6 2 "s_register_operand" "w") @@ -9136,7 +9184,7 @@ ;; [vstrhq_s, vstrhq_u] ;; (define_insn "mve_vstrhq_<supf><mode>" - [(set (match_operand:<MVE_H_ELEM> 0 "memory_operand" "=Us") + [(set (match_operand:<MVE_H_ELEM> 0 "mve_memory_operand" "=Ux") (unspec:<MVE_H_ELEM> [(match_operand:MVE_6 1 "s_register_operand" "w")] VSTRHQ)) ] @@ -9155,7 +9203,7 @@ ;; [vstrwq_f] ;; (define_insn "mve_vstrwq_fv4sf" - [(set (match_operand:V4SI 0 "memory_operand" "=Us") + [(set (match_operand:V4SI 0 "memory_operand" "=Ux") (unspec:V4SI [(match_operand:V4SF 1 "s_register_operand" "w")] VSTRWQ_F)) ] @@ -9174,7 +9222,7 @@ ;; [vstrwq_p_f] ;; (define_insn "mve_vstrwq_p_fv4sf" - [(set (match_operand:V4SI 0 "memory_operand" "=Us") + [(set (match_operand:V4SI 0 "memory_operand" "=Ux") (unspec:V4SI [(match_operand:V4SF 1 "s_register_operand" "w") (match_operand:HI 2 "vpr_register_operand" "Up")] VSTRWQ_F)) @@ -9185,7 +9233,7 @@ int regno = REGNO (operands[1]); ops[1] = gen_rtx_REG (TImode, regno); ops[0] = operands[0]; - output_asm_insn ("vpst\n\tvstrwt.32\t%q1, %E0",ops); + output_asm_insn ("vpst\;vstrwt.32\t%q1, %E0",ops); return ""; } [(set_attr "length" "8")]) @@ -9194,7 +9242,7 @@ ;; [vstrwq_p_s vstrwq_p_u] ;; (define_insn "mve_vstrwq_p_<supf>v4si" - [(set (match_operand:V4SI 0 "memory_operand" "=Us") + [(set (match_operand:V4SI 0 "memory_operand" "=Ux") (unspec:V4SI [(match_operand:V4SI 1 "s_register_operand" "w") (match_operand:HI 2 "vpr_register_operand" "Up")] VSTRWQ)) @@ -9205,7 +9253,7 @@ int regno = REGNO (operands[1]); ops[1] = gen_rtx_REG (TImode, regno); ops[0] = operands[0]; - output_asm_insn ("vpst\n\tvstrwt.32\t%q1, %E0",ops); + output_asm_insn ("vpst\;vstrwt.32\t%q1, %E0",ops); return ""; } [(set_attr "length" "8")]) @@ -9214,7 +9262,7 @@ ;; [vstrwq_s vstrwq_u] ;; (define_insn "mve_vstrwq_<supf>v4si" - [(set (match_operand:V4SI 0 "memory_operand" "=Us") + [(set (match_operand:V4SI 0 "memory_operand" "=Ux") (unspec:V4SI [(match_operand:V4SI 1 "s_register_operand" "w")] VSTRWQ)) ] diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md index 009862e012c9ce3bbe446a89aacb750f47be66f0..c57ad73577e1eebebc8951ed5b4fb544dd3381f8 100644 --- a/gcc/config/arm/predicates.md +++ b/gcc/config/arm/predicates.md @@ -31,6 +31,12 @@ || REGNO_REG_CLASS (REGNO (op)) != NO_REGS)); }) +(define_predicate "mve_memory_operand" + (and (match_code "mem") + (match_test "TARGET_32BIT + && mve_vector_mem_operand (GET_MODE (op), XEXP (op, 0), + false)"))) + ;; True for immediates in the range of 1 to 16 for MVE. (define_predicate "mve_imm_16" (match_test "satisfies_constraint_Rd (op)")) diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_float2.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_float2.c index e3cf8f8207d603243eae22be9a90bbb1e8a73a58..35f83c6b298aaf2b8093713159b32de17ff96bd2 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_float2.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_float2.c @@ -11,10 +11,6 @@ foo32 () return b; } -/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]" } } */ -/* { dg-final { scan-assembler "vstrb.*" } } */ -/* { dg-final { scan-assembler "vldr.64*" } } */ - float16x8_t foo16 () { @@ -22,6 +18,9 @@ foo16 () return b; } -/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]" } } */ -/* { dg-final { scan-assembler "vstrb.*" } } */ -/* { dg-final { scan-assembler "vldr.64.*" } } */ +/* { dg-final { scan-assembler-times "vmov\\tq\[0-7\], q\[0-7\]" 2 } } */ +/* { dg-final { scan-assembler-times "vstrw.32*" 1 } } */ +/* { dg-final { scan-assembler-times "vstrh.16*" 1 } } */ +/* { dg-final { scan-assembler-times "vldrw.32*" 1 } } */ +/* { dg-final { scan-assembler-times "vldrh.16*" 1 } } */ +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vldr.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vldr.c new file mode 100644 index 0000000000000000000000000000000000000000..15656ed8c3c8c3ab95bbb5de59dafdab864b28db --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vldr.c @@ -0,0 +1,61 @@ +/* { dg-require-effective-target arm_v8_1m_mve_ok } */ +/* { dg-add-options arm_v8_1m_mve } */ +/* { dg-additional-options "-O2" } */ + +#include "arm_mve.h" +void +foo (uint16_t row_x_col, int8_t *out) +{ + for (;;) + { + int32x4_t out_3; + int8_t *rhs_0; + int8_t *lhs_3; + int i_row_x_col; + for (;i_row_x_col < row_x_col; i_row_x_col++) + { + int32x4_t ker_0 = vldrbq_s32(rhs_0); + int32x4_t ip_3 = vldrbq_s32(lhs_3); + out_3 = vmulq_s32(ip_3, ker_0); + } + vstrbq_s32(out, out_3); + } +} + +void +foo1 (uint16_t row_x_col, int8_t *out) +{ + for (;;) + { + int16x8_t out_3; + int8_t *rhs_0; + int8_t *lhs_3; + int i_row_x_col; + for (; i_row_x_col < row_x_col; i_row_x_col++) + { + int16x8_t ker_0 = vldrbq_s16(rhs_0); + int16x8_t ip_3 = vldrbq_s16(lhs_3); + out_3 = vmulq_s16(ip_3, ker_0); + } + vstrbq_s16(out, out_3); + } +} + +void +foo2 (uint16_t row_x_col, int16_t *out) +{ + for (;;) + { + int32x4_t out_3; + int16_t *rhs_0; + int16_t *lhs_3; + int i_row_x_col; + for (; i_row_x_col < row_x_col; i_row_x_col++) + { + int32x4_t ker_0 = vldrhq_s32(rhs_0); + int32x4_t ip_3 = vldrhq_s32(lhs_3); + out_3 = vmulq_s32(ip_3, ker_0); + } + vstrhq_s32(out, out_3); + } +} diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vldr_z.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vldr_z.c new file mode 100644 index 0000000000000000000000000000000000000000..ae640837d14f41cc617ac56c57ca120be615ac31 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vldr_z.c @@ -0,0 +1,73 @@ +/* { dg-require-effective-target arm_v8_1m_mve_ok } */ +/* { dg-add-options arm_v8_1m_mve } */ +/* { dg-additional-options "-O2" } */ + +#include "arm_mve.h" +void +foo (uint16_t row_len, const int32_t *bias, int8_t *out) +{ + int i_out_ch; + for (;;) + { + int8_t *ip_c3; + int32_t acc_3; + int32_t row_loop_cnt = row_len; + int32x4_t res = {acc_3}; + uint32x4_t scatter_offset; + int i_row_loop; + for (; i_row_loop < row_loop_cnt; i_row_loop++) + { + mve_pred16_t p; + int16x8_t r0; + int16x8_t c3 = vldrbq_z_s16(ip_c3, p); + acc_3 = vmladavaq_p_s16(acc_3, r0, c3, p); + } + vstrbq_scatter_offset_s32(&out[i_out_ch], scatter_offset, res); + } +} + +void +foo1 (uint16_t row_len, const int32_t *bias, int8_t *out) +{ + int i_out_ch; + for (;;) + { + int8_t *ip_c3; + int32_t acc_3; + int32_t row_loop_cnt = row_len; + int i_row_loop; + int32x4_t res = {acc_3}; + uint32x4_t scatter_offset; + for (; i_row_loop < row_loop_cnt; i_row_loop++) + { + mve_pred16_t p; + int32x4_t r0; + int32x4_t c3 = vldrbq_z_s32(ip_c3, p); + acc_3 = vmladavaq_p_s32(acc_3, r0, c3, p); + } + vstrbq_scatter_offset_s32(&out[i_out_ch], scatter_offset, res); + } +} + +void +foo2 (uint16_t row_len, const int32_t *bias, int8_t *out) +{ + int i_out_ch; + for (;;) + { + int16_t *ip_c3; + int32_t acc_3; + int32_t row_loop_cnt = row_len; + int i_row_loop; + int32x4_t res = {acc_3}; + uint32x4_t scatter_offset; + for (; i_row_loop < row_loop_cnt; i_row_loop++) + { + mve_pred16_t p; + int32x4_t r0; + int32x4_t c3 = vldrhq_z_s32(ip_c3, p); + acc_3 = vmladavaq_p_s32(acc_3, r0, c3, p); + } + vstrbq_scatter_offset_s32(&out[i_out_ch], scatter_offset, res); + } +} diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vstr.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vstr.c new file mode 100644 index 0000000000000000000000000000000000000000..dd785f28bc02beae828a6486fdcf3a374829ac0d --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vstr.c @@ -0,0 +1,43 @@ +/* { dg-require-effective-target arm_v8_1m_mve_ok } */ +/* { dg-add-options arm_v8_1m_mve } */ +/* { dg-additional-options "-O2" } */ + +#include "arm_mve.h" +void +foo (const int32_t *output_bias, int8_t *out, uint16_t num_ch) +{ + int32_t loop_count = num_ch; + const int32_t *bias = output_bias; + int i_loop_cnt; + for (; i_loop_cnt < loop_count; out += 4, i_loop_cnt++) + { + int32x4_t out_0 = vldrwq_s32(bias); + vstrbq_s32(out, out_0); + } +} + +void +foo1 (const int16_t *output_bias, int8_t *out, uint16_t num_ch) +{ + int32_t loop_count = num_ch; + const int16_t *bias = output_bias; + int i_loop_cnt; + for (; i_loop_cnt < loop_count; out += 4, i_loop_cnt++) + { + int16x8_t out_0 = vldrhq_s16(bias); + vstrbq_s16(out, out_0); + } +} + +void +foo2 (const int32_t *output_bias, int16_t *out, uint16_t num_ch) +{ + int32_t loop_count = num_ch; + const int32_t *bias = output_bias; + int i_loop_cnt; + for (; i_loop_cnt < loop_count; out += 4, i_loop_cnt++) + { + int32x4_t out_0 = vldrwq_s32(bias); + vstrhq_s32(out, out_0); + } +} diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vstr_p.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vstr_p.c new file mode 100644 index 0000000000000000000000000000000000000000..8b222f1be0a95031189e792bf9afa22411fa867a --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vstr_p.c @@ -0,0 +1,42 @@ +/* { dg-require-effective-target arm_v8_1m_mve_ok } */ +/* { dg-add-options arm_v8_1m_mve } */ +/* { dg-additional-options "-O2" } */ + +#include "arm_mve.h" +void +foo1 (int8_t *x, int32_t * i1) +{ + mve_pred16_t p; + int32x4_t x_0; + int32_t * bias1 = i1; + for (;; x++) + { + x_0 = vldrwq_s32(bias1); + vstrbq_p_s32(x, x_0, p); + } +} +void +foo2 (int8_t *x, int16_t * i1) +{ + mve_pred16_t p; + int16x8_t x_0; + int16_t * bias1 = i1; + for (;; x++) + { + x_0 = vldrhq_s16(bias1); + vstrbq_p_s16(x, x_0, p); + } +} + +void +foo3 (int16_t *x, int32_t * i1) +{ + mve_pred16_t p; + int32x4_t x_0; + int32_t * bias1 = i1; + for (;; x++) + { + x_0 = vldrwq_s32(bias1); + vstrhq_p_s32(x, x_0, p); + } +} diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_f16.c index 5e42f634412309411e4a6257cc3042a9ab280e06..699e40d0e3b503f6c02abaa3f4f976343081f108 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_f16.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_f16.c @@ -10,12 +10,11 @@ foo (float16_t const * base) return vld1q_f16 (base); } -/* { dg-final { scan-assembler "vldrh.f16" } } */ - float16x8_t foo1 (float16_t const * base) { return vld1q (base); } -/* { dg-final { scan-assembler "vldrh.f16" } } */ +/* { dg-final { scan-assembler-times "vldrh.16" 2 } } */ +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_f32.c index 99d1a7a9c5e66b4ae99d5184756bb65b8bc5e852..865923033629c273b1a31f57c0589e0ab1e6fc24 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_f32.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_f32.c @@ -10,12 +10,11 @@ foo (float32_t const * base) return vld1q_f32 (base); } -/* { dg-final { scan-assembler "vldrw.f32" } } */ - float32x4_t foo1 (float32_t const * base) { return vld1q (base); } -/* { dg-final { scan-assembler "vldrw.f32" } } */ +/* { dg-final { scan-assembler-times "vldrw.32" 2 } } */ +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_s16.c index d77f98ea8893959adfbc2688645d0d36dd826816..f4f04f534db63c5b77927d8e2ea967bb705012cc 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_s16.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_s16.c @@ -10,12 +10,11 @@ foo (int16_t const * base) return vld1q_s16 (base); } -/* { dg-final { scan-assembler "vldrh.s16" } } */ - int16x8_t foo1 (int16_t const * base) { return vld1q (base); } -/* { dg-final { scan-assembler "vldrh.s16" } } */ +/* { dg-final { scan-assembler-times "vldrh.16" 2 } } */ +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_s32.c index 9a7f024f735f1715d6e577aaf08e217b52ad66e7..e0f661667515f3d3e94cd052b4bbdef9c33c06dc 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_s32.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_s32.c @@ -10,12 +10,11 @@ foo (int32_t const * base) return vld1q_s32 (base); } -/* { dg-final { scan-assembler "vldrw.s32" } } */ - int32x4_t foo1 (int32_t const * base) { return vld1q (base); } -/* { dg-final { scan-assembler "vldrw.s32" } } */ +/* { dg-final { scan-assembler-times "vldrw.32" 2 } } */ +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_s8.c index 9c67bb60110081c1ed6c65f6986bbc08b0e2a691..1b7edead6b1a5489f2c668a69136b5fed463c703 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_s8.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_s8.c @@ -10,12 +10,11 @@ foo (int8_t const * base) return vld1q_s8 (base); } -/* { dg-final { scan-assembler "vldrb.s8" } } */ - int8x16_t foo1 (int8_t const * base) { return vld1q (base); } -/* { dg-final { scan-assembler "vldrb.s8" } } */ +/* { dg-final { scan-assembler-times "vldrb.8" 2 } } */ +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_u16.c index 2bef21a5a1dcaf265c052ddb689df9b12d4419ae..50e1f5cedcbe42d7f63255359795007bfe5ffc0e 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_u16.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_u16.c @@ -10,12 +10,11 @@ foo (uint16_t const * base) return vld1q_u16 (base); } -/* { dg-final { scan-assembler "vldrh.u16" } } */ - uint16x8_t foo1 (uint16_t const * base) { return vld1q (base); } -/* { dg-final { scan-assembler "vldrh.u16" } } */ +/* { dg-final { scan-assembler-times "vldrh.16" 2 } } */ +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_u32.c index 01a1dd611ed68281e32e8719e11137a9b5626398..a13fe824382f825a32f865fc5937712a2f278faf 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_u32.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_u32.c @@ -10,12 +10,11 @@ foo (uint32_t const * base) return vld1q_u32 (base); } -/* { dg-final { scan-assembler "vldrw.u32" } } */ - uint32x4_t foo1 (uint32_t const * base) { return vld1q (base); } -/* { dg-final { scan-assembler "vldrw.u32" } } */ +/* { dg-final { scan-assembler-times "vldrw.32" 2 } } */ +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_u8.c index 997bc1b212d228668b7a6f36a615168a52ac1af0..dfd1deb93f0f485fb2491a3b21821c284c0da437 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_u8.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_u8.c @@ -10,12 +10,11 @@ foo (uint8_t const * base) return vld1q_u8 (base); } -/* { dg-final { scan-assembler "vldrb.u8" } } */ - uint8x16_t foo1 (uint8_t const * base) { return vld1q (base); } -/* { dg-final { scan-assembler "vldrb.u8" } } */ +/* { dg-final { scan-assembler-times "vldrb.8" 2 } } */ +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_f16.c index ea5593a9dd19d682089021902e7bf283bb54041f..3c32e408e420e2d393b5abcc96bd59e5d048ec34 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_f16.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_f16.c @@ -10,12 +10,12 @@ foo (float16_t const * base, mve_pred16_t p) return vld1q_z_f16 (base, p); } -/* { dg-final { scan-assembler "vldrht.f16" } } */ - float16x8_t foo1 (float16_t const * base, mve_pred16_t p) { return vld1q_z (base, p); } -/* { dg-final { scan-assembler "vldrht.f16" } } */ +/* { dg-final { scan-assembler-times "vpst" 2 } } */ +/* { dg-final { scan-assembler-times "vldrht.16" 2 } } */ +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_f32.c index 28937cd18aa9692d357cf553a71e81f78e184dc5..3fc935c889bea0fec7858e034002b4a521afab65 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_f32.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_f32.c @@ -10,12 +10,12 @@ foo (float32_t const * base, mve_pred16_t p) return vld1q_z_f32 (base, p); } -/* { dg-final { scan-assembler "vldrwt.f32" } } */ - float32x4_t foo1 (float32_t const * base, mve_pred16_t p) { return vld1q_z (base, p); } -/* { dg-final { scan-assembler "vldrwt.f32" } } */ +/* { dg-final { scan-assembler-times "vpst" 2 } } */ +/* { dg-final { scan-assembler-times "vldrwt.32" 2 } } */ +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_s16.c index 81a1c439d6e034341a08ea28050f7bab35237808..49cc81092f359249c5178332c1ca6e18076eabdb 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_s16.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_s16.c @@ -10,12 +10,12 @@ foo (int16_t const * base, mve_pred16_t p) return vld1q_z_s16 (base, p); } -/* { dg-final { scan-assembler "vldrht.s16" } } */ - int16x8_t foo1 (int16_t const * base, mve_pred16_t p) { return vld1q_z (base, p); } -/* { dg-final { scan-assembler "vldrht.s16" } } */ +/* { dg-final { scan-assembler-times "vpst" 2 } } */ +/* { dg-final { scan-assembler-times "vldrht.16" 2 } } */ +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_s32.c index d03ab345f1920f03e6a609bb330b514eae81779c..ec317cd70e8f5cb2a5f83bbdcaf90b18ae148615 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_s32.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_s32.c @@ -10,12 +10,12 @@ foo (int32_t const * base, mve_pred16_t p) return vld1q_z_s32 (base, p); } -/* { dg-final { scan-assembler "vldrwt.s32" } } */ - int32x4_t foo1 (int32_t const * base, mve_pred16_t p) { return vld1q_z (base, p); } -/* { dg-final { scan-assembler "vldrwt.s32" } } */ +/* { dg-final { scan-assembler-times "vpst" 2 } } */ +/* { dg-final { scan-assembler-times "vldrwt.32" 2 } } */ +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_s8.c index e535662c7d04437843b9c6aee516ba7a0ceaa214..538c140e78e8d858fe1b42d73ca06ad774f3f4da 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_s8.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_s8.c @@ -10,12 +10,12 @@ foo (int8_t const * base, mve_pred16_t p) return vld1q_z_s8 (base, p); } -/* { dg-final { scan-assembler "vldrbt.s8" } } */ - int8x16_t foo1 (int8_t const * base, mve_pred16_t p) { return vld1q_z (base, p); } -/* { dg-final { scan-assembler "vldrbt.s8" } } */ +/* { dg-final { scan-assembler-times "vpst" 2 } } */ +/* { dg-final { scan-assembler-times "vldrbt.8" 2 } } */ +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_u16.c index 3f20f4ed9ca6a74fbbce1f471d0a49d89075f1ad..e5e588a187e9524a76d9d9b3a2c799338989d7f6 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_u16.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_u16.c @@ -10,12 +10,12 @@ foo (uint16_t const * base, mve_pred16_t p) return vld1q_z_u16 (base, p); } -/* { dg-final { scan-assembler "vldrht.u16" } } */ - uint16x8_t foo1 (uint16_t const * base, mve_pred16_t p) { return vld1q_z (base, p); } -/* { dg-final { scan-assembler "vldrht.u16" } } */ +/* { dg-final { scan-assembler-times "vpst" 2 } } */ +/* { dg-final { scan-assembler-times "vldrht.16" 2 } } */ +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_u32.c index 1d3b53e38e8bc3123b85f89570d8656988b9b278..999beefa7e8669dc15b70f4841adfbc08d018622 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_u32.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_u32.c @@ -10,12 +10,12 @@ foo (uint32_t const * base, mve_pred16_t p) return vld1q_z_u32 (base, p); } -/* { dg-final { scan-assembler "vldrwt.u32" } } */ - uint32x4_t foo1 (uint32_t const * base, mve_pred16_t p) { return vld1q_z (base, p); } -/* { dg-final { scan-assembler "vldrwt.u32" } } */ +/* { dg-final { scan-assembler-times "vpst" 2 } } */ +/* { dg-final { scan-assembler-times "vldrwt.32" 2 } } */ +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_u8.c index 47d3f6fa4c70773f7a4c549dcf8a3b884cebab92..172053c71422f5daad1555932c9af84deee0c8d9 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_u8.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_u8.c @@ -10,12 +10,12 @@ foo (uint8_t const * base, mve_pred16_t p) return vld1q_z_u8 (base, p); } -/* { dg-final { scan-assembler "vldrbt.u8" } } */ - uint8x16_t foo1 (uint8_t const * base, mve_pred16_t p) { return vld1q_z (base, p); } -/* { dg-final { scan-assembler "vldrbt.u8" } } */ +/* { dg-final { scan-assembler-times "vpst" 2 } } */ +/* { dg-final { scan-assembler-times "vldrbt.8" 2 } } */ +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_s8.c index 886491f005284596099eeec4d90e67f82b5e967f..ec2f2176ccfebbef00447aed153069ce3be9491c 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_s8.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_s8.c @@ -10,4 +10,5 @@ foo (int8_t const * base) return vldrbq_s8 (base); } -/* { dg-final { scan-assembler "vldrb.s8" } } */ +/* { dg-final { scan-assembler-times "vldrb.8" 1 } } */ +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_u8.c index e58120a2b64d6a9f0efaf80a5a47d9c80f5f4d34..d07b472a4ffe79d4615ae2a1e15606a34b9ac765 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_u8.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_u8.c @@ -10,4 +10,5 @@ foo (uint8_t const * base) return vldrbq_u8 (base); } -/* { dg-final { scan-assembler "vldrb.u8" } } */ +/* { dg-final { scan-assembler-times "vldrb.8" 1 } } */ +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_s8.c index 7d66c704516b5db6cbd3f36645e6a44031201bf1..aed3c9100638a2e86b91b3bf42040b4b728fd725 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_s8.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_s8.c @@ -10,4 +10,6 @@ foo (int8_t const * base, mve_pred16_t p) return vldrbq_z_s8 (base, p); } -/* { dg-final { scan-assembler "vldrbt.s8" } } */ +/* { dg-final { scan-assembler-times "vpst" 1 } } */ +/* { dg-final { scan-assembler-times "vldrbt.8" 1 } } */ +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_u8.c index 05ae2628d5645ffa3bfae83867f95151c887ef75..54c61e744543d5bba03dfdd23e1ceb1e8f398a1a 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_u8.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_u8.c @@ -10,4 +10,6 @@ foo (uint8_t const * base, mve_pred16_t p) return vldrbq_z_u8 (base, p); } -/* { dg-final { scan-assembler "vldrbt.u8" } } */ +/* { dg-final { scan-assembler-times "vpst" 1 } } */ +/* { dg-final { scan-assembler-times "vldrbt.8" 1 } } */ +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_s64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_s64.c index 0d1ee769ec64b55c7559ce9dc14f8a6ae2e43e34..7420d0198e7450f566644a74bac925170b49d688 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_s64.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_s64.c @@ -10,6 +10,7 @@ foo (uint64x2_t * addr) return vldrdq_gather_base_wb_s64 (addr, 8); } -/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ /* { dg-final { scan-assembler "vldrd.64\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-9\]+\\\]!" } } */ -/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ +/* { dg-final { scan-assembler-times "vldr.64" 1 } } */ +/* { dg-final { scan-assembler-times "vstr.64" 1 } } */ +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_u64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_u64.c index cb2a41bdcd32b553a93d3bcc4787d506f1b54f74..ebe5b2fd70c7e9c1ebe6e2eedb185975afea36b9 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_u64.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_u64.c @@ -10,6 +10,7 @@ foo (uint64x2_t * addr) return vldrdq_gather_base_wb_u64 (addr, 8); } -/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ /* { dg-final { scan-assembler "vldrd.64\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-9\]+\\\]!" } } */ -/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ +/* { dg-final { scan-assembler-times "vldr.64" 1 } } */ +/* { dg-final { scan-assembler-times "vstr.64" 1 } } */ +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_s64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_s64.c index 243fbeacc3429025202da2ff157ade38a472e123..231a24a1e5550b444c5476bfc0d1f6802a4952c8 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_s64.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_s64.c @@ -8,8 +8,8 @@ int64x2_t foo (uint64x2_t * addr, mve_pred16_t p) return vldrdq_gather_base_wb_z_s64 (addr, 1016, p); } -/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ -/* { dg-final { scan-assembler "vmsr\t P0, r\[0-9\]+.*$" } } */ /* { dg-final { scan-assembler "vpst" } } */ /* { dg-final { scan-assembler "vldrdt.u64\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-9\]+\\\]!" } } */ -/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ +/* { dg-final { scan-assembler-times "vldr.64" 1 } } */ +/* { dg-final { scan-assembler-times "vstr.64" 1 } } */ +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_u64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_u64.c index 10ba42405fe8fde9d4f8993b20e41a59c7bb2e77..b8d9b5c139150536721b1a66636ce9b5a86bf093 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_u64.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_u64.c @@ -8,8 +8,8 @@ uint64x2_t foo (uint64x2_t * addr, mve_pred16_t p) return vldrdq_gather_base_wb_z_u64 (addr, 8, p); } -/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ -/* { dg-final { scan-assembler "vmsr\t P0, r\[0-9\]+.*" } } */ /* { dg-final { scan-assembler "vpst" } } */ /* { dg-final { scan-assembler "vldrdt.u64\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-9\]+\\\]!" } } */ -/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ +/* { dg-final { scan-assembler-times "vldr.64" 1 } } */ +/* { dg-final { scan-assembler-times "vstr.64" 1 } } */ +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_f16.c index b79c0e9bfe49e99557ebfb14bc03f8aaf40ab925..05bef418d822a7a994002f4073b65178e42346a2 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_f16.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_f16.c @@ -10,4 +10,5 @@ foo (float16_t const * base) return vldrhq_f16 (base); } -/* { dg-final { scan-assembler "vldrh.f16" } } */ +/* { dg-final { scan-assembler-times "vldrh.16" 1 } } */ +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_s16.c index 4872eb555f3e7b8b7ec86b8ba377f0a0452b1aa0..7c977b6a6995f11cd03380ba11585452bdddbe89 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_s16.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_s16.c @@ -10,4 +10,5 @@ foo (int16_t const * base) return vldrhq_s16 (base); } -/* { dg-final { scan-assembler "vldrh.s16" } } */ +/* { dg-final { scan-assembler-times "vldrh.16" 1 } } */ +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_s32.c index e73e208c26a63fb7caaa68f4d5a52f1fa1c904fa..229b52163faa2566e17f1320940a066313d2c853 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_s32.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_s32.c @@ -10,4 +10,5 @@ foo (int16_t const * base) return vldrhq_s32 (base); } -/* { dg-final { scan-assembler "vldrh.s32" } } */ +/* { dg-final { scan-assembler-times "vldrh.s32" 1 } } */ +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_u16.c index 6b285d45aaa158c01f3e043d2e71214ed824f79a..07f6d9e3944a976886b35f3c7a042046f3c7498a 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_u16.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_u16.c @@ -10,4 +10,5 @@ foo (uint16_t const * base) return vldrhq_u16 (base); } -/* { dg-final { scan-assembler "vldrh.u16" } } */ +/* { dg-final { scan-assembler-times "vldrh.16" 1 } } */ +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_u32.c index 994cd4a20badd807c8be54aa45f788e8b9420fd9..cd24f01831f77d1da50ca624a7b6c800a9b616fd 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_u32.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_u32.c @@ -10,4 +10,5 @@ foo (uint16_t const * base) return vldrhq_u32 (base); } -/* { dg-final { scan-assembler "vldrh.u32" } } */ +/* { dg-final { scan-assembler-times "vldrh.u32" 1 } } */ +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_f16.c index 2b866a99dd4c8a213887fe120cfe6dec35f84f87..dd0fc9c7b733114f6e229f78155afe53cca675b7 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_f16.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_f16.c @@ -10,4 +10,6 @@ foo (float16_t const * base, mve_pred16_t p) return vldrhq_z_f16 (base, p); } -/* { dg-final { scan-assembler "vldrht.f16" } } */ +/* { dg-final { scan-assembler-times "vpst" 1 } } */ +/* { dg-final { scan-assembler-times "vldrht.16" 1 } } */ +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_s16.c index 6c92c50ba12a502713ee7d2e7cf719edf848fe9c..36d3458d95c91d13631e511ac1294e544791336a 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_s16.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_s16.c @@ -10,4 +10,6 @@ foo (int16_t const * base, mve_pred16_t p) return vldrhq_z_s16 (base, p); } -/* { dg-final { scan-assembler "vldrht.s16" } } */ +/* { dg-final { scan-assembler-times "vpst" 1 } } */ +/* { dg-final { scan-assembler-times "vldrht.16" 1 } } */ +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_s32.c index 4cd97ba5743ef1dcd8dc368ef75dc6df2391f69c..9c67b479be79c8377682a4488d365cd853df7a2c 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_s32.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_s32.c @@ -10,4 +10,6 @@ foo (int16_t const * base, mve_pred16_t p) return vldrhq_z_s32 (base, p); } -/* { dg-final { scan-assembler "vldrht.s32" } } */ +/* { dg-final { scan-assembler-times "vpst" 1 } } */ +/* { dg-final { scan-assembler-times "vldrht.s32" 1 } } */ +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_u16.c index 80ae0e5cd17fe158b152873d144e8b1217ad8e33..26354b5971aca3c9f003559d5261866a019a70ef 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_u16.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_u16.c @@ -10,4 +10,6 @@ foo (uint16_t const * base, mve_pred16_t p) return vldrhq_z_u16 (base, p); } -/* { dg-final { scan-assembler "vldrht.u16" } } */ +/* { dg-final { scan-assembler-times "vpst" 1 } } */ +/* { dg-final { scan-assembler-times "vldrht.16" 1 } } */ +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_u32.c index 1a8590116eb017d9744d62b1dd7d07f539b390f0..948fe5ee5b46701ce6a7e80d4b6a3d54690d921f 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_u32.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_u32.c @@ -10,4 +10,6 @@ foo (uint16_t const * base, mve_pred16_t p) return vldrhq_z_u32 (base, p); } -/* { dg-final { scan-assembler "vldrht.u32" } } */ +/* { dg-final { scan-assembler-times "vpst" 1 } } */ +/* { dg-final { scan-assembler-times "vldrht.u32" 1 } } */ +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_f32.c index 2c834ae53df93b97f7d7f9600fc4eba7d6c3400d..143079aa23fe8a45c381e33e20adbd4bb91a539c 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_f32.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_f32.c @@ -10,4 +10,5 @@ foo (float32_t const * base) return vldrwq_f32 (base); } -/* { dg-final { scan-assembler "vldrw.f32" } } */ +/* { dg-final { scan-assembler-times "vldrw.32" 1 } } */ +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_f32.c index db8108e37325c4e1fafd2293d48eba0c33309073..8e2994f75d7d488e968dd9cd4847900d2438475a 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_f32.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_f32.c @@ -10,6 +10,7 @@ foo (uint32x4_t * addr) return vldrwq_gather_base_wb_f32 (addr, 8); } -/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ +/* { dg-final { scan-assembler "vldrw.32\tq\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ /* { dg-final { scan-assembler "vldrw.u32\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-9\]+\\\]!" } } */ -/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ +/* { dg-final { scan-assembler "vstrw.32\tq\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_s32.c index 3da64e218e2c0789e996be551650033567eba4e5..e5054738b75ec7378a6a289e9c071721f9a6a4d0 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_s32.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_s32.c @@ -10,6 +10,7 @@ foo (uint32x4_t * addr) return vldrwq_gather_base_wb_s32 (addr, 8); } -/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ +/* { dg-final { scan-assembler "vldrw.32\tq\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ /* { dg-final { scan-assembler "vldrw.u32\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-9\]+\\\]!" } } */ -/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ +/* { dg-final { scan-assembler "vstrw.32\tq\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_u32.c index 2597ee11608bfe21d697f2250bee7e69c0cc7aec..7f39414143bdfb3bcbc059dcdcba0472c0a63459 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_u32.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_u32.c @@ -10,6 +10,7 @@ foo (uint32x4_t * addr) return vldrwq_gather_base_wb_u32 (addr, 8); } -/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ +/* { dg-final { scan-assembler "vldrw.32\tq\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ /* { dg-final { scan-assembler "vldrw.u32\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-9\]+\\\]!" } } */ -/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ +/* { dg-final { scan-assembler "vstrw.32\tq\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_f32.c index 9fb47daf486fafdb897618453958e776a069d432..f3219e2e8254f542916b1fdc6d633e5512c08cfe 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_f32.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_f32.c @@ -10,8 +10,9 @@ foo (uint32x4_t * addr, mve_pred16_t p) return vldrwq_gather_base_wb_z_f32 (addr, 8, p); } -/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ +/* { dg-final { scan-assembler "vldrw.32\tq\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ /* { dg-final { scan-assembler "vmsr\t P0, r\[0-9\]+.*" } } */ /* { dg-final { scan-assembler "vpst" } } */ /* { dg-final { scan-assembler "vldrwt.u32\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-9\]+\\\]!" } } */ -/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ +/* { dg-final { scan-assembler "vstrw.32\tq\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_s32.c index 56da5a46c64d2946ceade8689105048e19efdc6a..4d093d243fe63e3f98cffaf15fcb41fa4611b41e 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_s32.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_s32.c @@ -10,8 +10,9 @@ foo (uint32x4_t * addr, mve_pred16_t p) return vldrwq_gather_base_wb_z_s32 (addr, 8, p); } -/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ +/* { dg-final { scan-assembler "vldrw.32\tq\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ /* { dg-final { scan-assembler "vmsr\t P0, r\[0-9\]+.*" } } */ /* { dg-final { scan-assembler "vpst" } } */ /* { dg-final { scan-assembler "vldrwt.u32\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-9\]+\\\]!" } } */ -/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ +/* { dg-final { scan-assembler "vstrw.32\tq\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_u32.c index 63165d97c1a7b4120be036348a09b73afddd36d1..e796522a49c6c1929f2f64ee27e36eda9a1a95d3 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_u32.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_u32.c @@ -10,8 +10,9 @@ foo (uint32x4_t * addr, mve_pred16_t p) return vldrwq_gather_base_wb_z_u32 (addr, 8, p); } -/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ +/* { dg-final { scan-assembler "vldrw.32\tq\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ /* { dg-final { scan-assembler "vmsr\t P0, r\[0-9\]+.*" } } */ /* { dg-final { scan-assembler "vpst" } } */ /* { dg-final { scan-assembler "vldrwt.u32\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-9\]+\\\]!" } } */ -/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ +/* { dg-final { scan-assembler "vstrw.32\tq\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */ +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_s32.c index f48c29f8bff5f0b57802d1673c433b70311f8fc0..860dd324d256511a5802a097019f1a9a7cd52e9b 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_s32.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_s32.c @@ -10,4 +10,5 @@ foo (int32_t const * base) return vldrwq_s32 (base); } -/* { dg-final { scan-assembler "vldrw.s32" } } */ +/* { dg-final { scan-assembler-times "vldrw.32" 1 } } */ +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_u32.c index 7c722200ecc5c642d6b8e3e0be69601a325b7f53..513ed49fb6eb7a88a51df58a521ff0669af89ad1 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_u32.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_u32.c @@ -10,4 +10,5 @@ foo (uint32_t const * base) return vldrwq_u32 (base); } -/* { dg-final { scan-assembler "vldrw.u32" } } */ +/* { dg-final { scan-assembler-times "vldrw.32" 1 } } */ +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_z_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_z_f32.c index bcdcecab46875864125c4232a75931faf0bcb54f..3e0a6a60bcf4374ec09f336001b04e6fda524913 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_z_f32.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_z_f32.c @@ -10,4 +10,6 @@ foo (float32_t const * base, mve_pred16_t p) return vldrwq_z_f32 (base, p); } -/* { dg-final { scan-assembler "vldrwt.f32" } } */ +/* { dg-final { scan-assembler-times "vpst" 1 } } */ +/* { dg-final { scan-assembler-times "vldrwt.32" 1 } } */ +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_z_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_z_s32.c index fd32b30565627078561f6f04214a15a9a1643a68..82b914885b55d7b7d076500726ac9e174f8c0ece 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_z_s32.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_z_s32.c @@ -10,4 +10,6 @@ foo (int32_t const * base, mve_pred16_t p) return vldrwq_z_s32 (base, p); } -/* { dg-final { scan-assembler "vldrwt.s32" } } */ +/* { dg-final { scan-assembler-times "vpst" 1 } } */ +/* { dg-final { scan-assembler-times "vldrwt.32" 1 } } */ +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_z_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_z_u32.c index f49440438348582745eabd4589bef40ee07f8deb..6a66e1678815b7b4984ed011d108bc48ab44c963 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_z_u32.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_z_u32.c @@ -10,4 +10,6 @@ foo (uint32_t const * base, mve_pred16_t p) return vldrwq_z_u32 (base, p); } -/* { dg-final { scan-assembler "vldrwt.u32" } } */ +/* { dg-final { scan-assembler-times "vpst" 1 } } */ +/* { dg-final { scan-assembler-times "vldrwt.32" 1 } } */ +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_float.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_float.c index 52bad05b6219621ada414dc74ab2deebdd1c93e3..739f282c476f2611245a20dfc0d121eba289a788 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_float.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_float.c @@ -1,6 +1,6 @@ /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */ /* { dg-add-options arm_v8_1m_mve_fp } */ -/* { dg-additional-options "-O0" } */ +/* { dg-additional-options "-O2" } */ #include "arm_mve.h" @@ -14,4 +14,6 @@ foo () fb = vuninitializedq_f32 (); } -/* { dg-final { scan-assembler-times "vstrb.8" 4 } } */ +/* { dg-final { scan-assembler-times "vstrh.16" 1 } } */ +/* { dg-final { scan-assembler-times "vstrw.32" 1 } } */ +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_float1.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_float1.c index c6724a52074c6ce0361fdba66c4add831e8c13db..a9130607f26915af39e41d9f1181131bcbd1ef32 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_float1.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_float1.c @@ -1,6 +1,6 @@ /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */ /* { dg-add-options arm_v8_1m_mve_fp } */ -/* { dg-additional-options "-O0" } */ +/* { dg-additional-options "-O2" } */ #include "arm_mve.h" @@ -14,4 +14,6 @@ foo () fb = vuninitializedq (fbb); } -/* { dg-final { scan-assembler-times "vstrb.8" 6 } } */ +/* { dg-final { scan-assembler-times "vstrh.16" 1 } } */ +/* { dg-final { scan-assembler-times "vstrw.32" 1 } } */ +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_int.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_int.c index 13a0109a9b5380cd83f48154df231081ddb8f08e..bf6692fe57322ac9ed5c949a9697d3ed7a565acc 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_int.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_int.c @@ -1,6 +1,6 @@ /* { dg-require-effective-target arm_v8_1m_mve_ok } */ /* { dg-add-options arm_v8_1m_mve } */ -/* { dg-additional-options "-O0" } */ +/* { dg-additional-options "-O2" } */ #include "arm_mve.h" int8x16_t a; @@ -25,4 +25,8 @@ foo () ud = vuninitializedq_u64 (); } -/* { dg-final { scan-assembler-times "vstrb.8" 16 } } */ +/* { dg-final { scan-assembler-times "vstrb.8" 2 } } */ +/* { dg-final { scan-assembler-times "vstrh.16" 2 } } */ +/* { dg-final { scan-assembler-times "vstrw.32" 2 } } */ +/* { dg-final { scan-assembler-times "vstr.64" 2 } } */ +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_int1.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_int1.c index a321398709e65ee7daadfab9c6089116baccde83..4f66a07ac29030482a2643e10907d0dae24743af 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_int1.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_int1.c @@ -1,6 +1,6 @@ /* { dg-require-effective-target arm_v8_1m_mve_ok } */ /* { dg-add-options arm_v8_1m_mve } */ -/* { dg-additional-options "-O0" } */ +/* { dg-additional-options "-O2" } */ #include "arm_mve.h" @@ -26,4 +26,8 @@ foo () ud = vuninitializedq (udd); } -/* { dg-final { scan-assembler-times "vstrb.8" 24 } } */ +/* { dg-final { scan-assembler-times "vstrb.8" 2 } } */ +/* { dg-final { scan-assembler-times "vstrh.16" 2 } } */ +/* { dg-final { scan-assembler-times "vstrw.32" 2 } } */ +/* { dg-final { scan-assembler-times "vstr.64" 2 } } */ +/* { dg-final { scan-assembler-not "__ARM_undef" } } */