diff mbox series

[GCC-10,Backport] arm: Fix the wrong code-gen generated by MVE vector load/store intrinsics (PR94959).

Message ID AM0PR08MB538031E167ECB404A2F2B7439B9D0@AM0PR08MB5380.eurprd08.prod.outlook.com
State New
Headers show
Series [GCC-10,Backport] arm: Fix the wrong code-gen generated by MVE vector load/store intrinsics (PR94959). | expand

Commit Message

Srinath Parvathaneni June 16, 2020, 10:50 a.m. UTC
Hello,

Few MVE intrinsics like vldrbq_s32, vldrhq_s32 etc., the assembler instructions
generated by current compiler are wrong.
eg: vldrbq_s32 generates an assembly instructions `vldrb.s32 q0,[ip]`.
But as per Arm-arm second argument in above instructions must also be a low
register (<= r7). This patch fixes this issue by creating a new predicate
"mve_memory_operand" and constraint "Ux" which allows low registers as arguments
to the generated instructions depending on the mode of the argument. A new constraint
"Ul" is created to handle loading to PC-relative addressing modes for vector
store/load intrinsiscs.
All the corresponding MVE intrinsic generating wrong code-gen as vldrbq_s32
are modified in this patch.

Please refer to M-profile Vector Extension (MVE) intrinsics [1] and Armv8-M Architecture Reference Manual [2] for more details.
[1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics
[2] https://developer.arm.com/docs/ddi0553/latest

Regression tested on arm-none-eabi and found no regressions.

Ok for gcc-10 branch?

Thanks,
Srinath.

gcc/ChangeLog:

2020-06-09  Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	Backported from mainline
	2020-05-20  Srinath Parvathaneni  <srinath.parvathaneni@arm.com>
		    Andre Vieira  <andre.simoesdiasvieira@arm.com>

	PR target/94959
	* config/arm/arm-protos.h (arm_mode_base_reg_class): Function
	declaration.
	(mve_vector_mem_operand): Likewise.
	* config/arm/arm.c (thumb2_legitimate_address_p): For MVE target check
	the load from memory to a core register is legitimate for give mode.
	(mve_vector_mem_operand): Define function.
	(arm_print_operand): Modify comment.
	(arm_mode_base_reg_class): Define.
	* config/arm/arm.h (MODE_BASE_REG_CLASS): Modify to add check for
	TARGET_HAVE_MVE and expand to arm_mode_base_reg_class on TRUE.
	* config/arm/constraints.md (Ux): Likewise.
	(Ul): Likewise.
	* config/arm/mve.md (mve_mov): Replace constraint Us with Ux and also
	add support for missing Vector Store Register and Vector Load Register.
	Add a new alternative to support load from memory to PC (or label) in
	vector store/load.
	(mve_vstrbq_<supf><mode>): Modify constraint Us to Ux.
	(mve_vldrbq_<supf><mode>): Modify constriant Us to Ux, predicate to
	mve_memory_operand and also modify the MVE instructions to emit.
	(mve_vldrbq_z_<supf><mode>): Modify constraint Us to Ux.
	(mve_vldrhq_fv8hf): Modify constriant Us to Ux, predicate to
	mve_memory_operand and also modify the MVE instructions to emit.
	(mve_vldrhq_<supf><mode>): Modify constriant Us to Ux, predicate to
	mve_memory_operand and also modify the MVE instructions to emit.
	(mve_vldrhq_z_fv8hf): Likewise.
	(mve_vldrhq_z_<supf><mode>): Likewise.
	(mve_vldrwq_fv4sf): Likewise.
	(mve_vldrwq_<supf>v4si): Likewise.
	(mve_vldrwq_z_fv4sf): Likewise.
	(mve_vldrwq_z_<supf>v4si): Likewise.
	(mve_vld1q_f<mode>): Modify constriant Us to Ux.
	(mve_vld1q_<supf><mode>): Likewise.
	(mve_vstrhq_fv8hf): Modify constriant Us to Ux, predicate to
	mve_memory_operand.
	(mve_vstrhq_p_fv8hf): Modify constriant Us to Ux, predicate to
	mve_memory_operand and also modify the MVE instructions to emit.
	(mve_vstrhq_p_<supf><mode>): Likewise.
	(mve_vstrhq_<supf><mode>): Modify constriant Us to Ux, predicate to
	mve_memory_operand.
	(mve_vstrwq_fv4sf): Modify constriant Us to Ux.
	(mve_vstrwq_p_fv4sf): Modify constriant Us to Ux and also modify the MVE
	instructions to emit.
	(mve_vstrwq_p_<supf>v4si): Likewise.
	(mve_vstrwq_<supf>v4si): Likewise.Modify constriant Us to Ux.
	* config/arm/predicates.md (mve_memory_operand): Define.

gcc/testsuite/ChangeLog:

2020-06-09  Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

        Backported from mainline
        2020-05-20  Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	PR target/94959
	* gcc.target/arm/mve/intrinsics/mve_vector_float2.c: Modify.
	* gcc.target/arm/mve/intrinsics/mve_vldr.c: New test.
	* gcc.target/arm/mve/intrinsics/mve_vldr_z.c: Likewise.
	* gcc.target/arm/mve/intrinsics/mve_vstr.c: Likewise.
	* gcc.target/arm/mve/intrinsics/mve_vstr_p.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vld1q_f16.c: Modify.
	* gcc.target/arm/mve/intrinsics/vld1q_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vld1q_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vld1q_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vld1q_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vld1q_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vld1q_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vld1q_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vld1q_z_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vld1q_z_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vld1q_z_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vld1q_z_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vld1q_z_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vld1q_z_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vld1q_z_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vld1q_z_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrbq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrbq_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrbq_z_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrbq_z_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_s64.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_u64.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_s64.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_u64.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrhq_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrhq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrhq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrhq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrhq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrhq_z_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrhq_z_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrhq_z_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrhq_z_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrhq_z_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrwq_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrwq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrwq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrwq_z_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrwq_z_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrwq_z_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vuninitializedq_float.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vuninitializedq_float1.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vuninitializedq_int.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vuninitializedq_int1.c: Likewise.


###############     Attachment also inlined for ease of reply    ###############

Comments

Kyrylo Tkachov June 16, 2020, 11:47 a.m. UTC | #1
> -----Original Message-----
> From: Srinath Parvathaneni <Srinath.Parvathaneni@arm.com>
> Sent: 16 June 2020 11:50
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> Subject: [PATCH][GCC-10 Backport] arm: Fix the wrong code-gen generated
> by MVE vector load/store intrinsics (PR94959).
> 
> Hello,
> 
> Few MVE intrinsics like vldrbq_s32, vldrhq_s32 etc., the assembler
> instructions
> generated by current compiler are wrong.
> eg: vldrbq_s32 generates an assembly instructions `vldrb.s32 q0,[ip]`.
> But as per Arm-arm second argument in above instructions must also be a
> low
> register (<= r7). This patch fixes this issue by creating a new predicate
> "mve_memory_operand" and constraint "Ux" which allows low registers as
> arguments
> to the generated instructions depending on the mode of the argument. A
> new constraint
> "Ul" is created to handle loading to PC-relative addressing modes for vector
> store/load intrinsiscs.
> All the corresponding MVE intrinsic generating wrong code-gen as vldrbq_s32
> are modified in this patch.
> 
> Please refer to M-profile Vector Extension (MVE) intrinsics [1] and Armv8-M
> Architecture Reference Manual [2] for more details.
> [1] https://developer.arm.com/architectures/instruction-sets/simd-
> isas/helium/mve-intrinsics
> [2] https://developer.arm.com/docs/ddi0553/latest
> 
> Regression tested on arm-none-eabi and found no regressions.
> 
> Ok for gcc-10 branch?

Ok.
Thanks,
Kyrill

> 
> Thanks,
> Srinath.
> 
> gcc/ChangeLog:
> 
> 2020-06-09  Srinath Parvathaneni  <srinath.parvathaneni@arm.com>
> 
> 	Backported from mainline
> 	2020-05-20  Srinath Parvathaneni  <srinath.parvathaneni@arm.com>
> 		    Andre Vieira  <andre.simoesdiasvieira@arm.com>
> 
> 	PR target/94959
> 	* config/arm/arm-protos.h (arm_mode_base_reg_class): Function
> 	declaration.
> 	(mve_vector_mem_operand): Likewise.
> 	* config/arm/arm.c (thumb2_legitimate_address_p): For MVE target
> check
> 	the load from memory to a core register is legitimate for give mode.
> 	(mve_vector_mem_operand): Define function.
> 	(arm_print_operand): Modify comment.
> 	(arm_mode_base_reg_class): Define.
> 	* config/arm/arm.h (MODE_BASE_REG_CLASS): Modify to add check
> for
> 	TARGET_HAVE_MVE and expand to arm_mode_base_reg_class on
> TRUE.
> 	* config/arm/constraints.md (Ux): Likewise.
> 	(Ul): Likewise.
> 	* config/arm/mve.md (mve_mov): Replace constraint Us with Ux and
> also
> 	add support for missing Vector Store Register and Vector Load
> Register.
> 	Add a new alternative to support load from memory to PC (or label)
> in
> 	vector store/load.
> 	(mve_vstrbq_<supf><mode>): Modify constraint Us to Ux.
> 	(mve_vldrbq_<supf><mode>): Modify constriant Us to Ux, predicate
> to
> 	mve_memory_operand and also modify the MVE instructions to emit.
> 	(mve_vldrbq_z_<supf><mode>): Modify constraint Us to Ux.
> 	(mve_vldrhq_fv8hf): Modify constriant Us to Ux, predicate to
> 	mve_memory_operand and also modify the MVE instructions to emit.
> 	(mve_vldrhq_<supf><mode>): Modify constriant Us to Ux, predicate
> to
> 	mve_memory_operand and also modify the MVE instructions to emit.
> 	(mve_vldrhq_z_fv8hf): Likewise.
> 	(mve_vldrhq_z_<supf><mode>): Likewise.
> 	(mve_vldrwq_fv4sf): Likewise.
> 	(mve_vldrwq_<supf>v4si): Likewise.
> 	(mve_vldrwq_z_fv4sf): Likewise.
> 	(mve_vldrwq_z_<supf>v4si): Likewise.
> 	(mve_vld1q_f<mode>): Modify constriant Us to Ux.
> 	(mve_vld1q_<supf><mode>): Likewise.
> 	(mve_vstrhq_fv8hf): Modify constriant Us to Ux, predicate to
> 	mve_memory_operand.
> 	(mve_vstrhq_p_fv8hf): Modify constriant Us to Ux, predicate to
> 	mve_memory_operand and also modify the MVE instructions to emit.
> 	(mve_vstrhq_p_<supf><mode>): Likewise.
> 	(mve_vstrhq_<supf><mode>): Modify constriant Us to Ux, predicate
> to
> 	mve_memory_operand.
> 	(mve_vstrwq_fv4sf): Modify constriant Us to Ux.
> 	(mve_vstrwq_p_fv4sf): Modify constriant Us to Ux and also modify
> the MVE
> 	instructions to emit.
> 	(mve_vstrwq_p_<supf>v4si): Likewise.
> 	(mve_vstrwq_<supf>v4si): Likewise.Modify constriant Us to Ux.
> 	* config/arm/predicates.md (mve_memory_operand): Define.
> 
> gcc/testsuite/ChangeLog:
> 
> 2020-06-09  Srinath Parvathaneni  <srinath.parvathaneni@arm.com>
> 
>         Backported from mainline
>         2020-05-20  Srinath Parvathaneni  <srinath.parvathaneni@arm.com>
> 
> 	PR target/94959
> 	* gcc.target/arm/mve/intrinsics/mve_vector_float2.c: Modify.
> 	* gcc.target/arm/mve/intrinsics/mve_vldr.c: New test.
> 	* gcc.target/arm/mve/intrinsics/mve_vldr_z.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/mve_vstr.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/mve_vstr_p.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vld1q_f16.c: Modify.
> 	* gcc.target/arm/mve/intrinsics/vld1q_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vld1q_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vld1q_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vld1q_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vld1q_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vld1q_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vld1q_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vld1q_z_f16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vld1q_z_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vld1q_z_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vld1q_z_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vld1q_z_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vld1q_z_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vld1q_z_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vld1q_z_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vldrbq_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vldrbq_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vldrbq_z_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vldrbq_z_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_s64.c:
> Likewise.
> 	* gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_u64.c:
> Likewise.
> 	* gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_s64.c:
> Likewise.
> 	* gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_u64.c:
> Likewise.
> 	* gcc.target/arm/mve/intrinsics/vldrhq_f16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vldrhq_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vldrhq_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vldrhq_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vldrhq_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vldrhq_z_f16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vldrhq_z_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vldrhq_z_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vldrhq_z_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vldrhq_z_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vldrwq_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_f32.c:
> Likewise.
> 	* gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_s32.c:
> Likewise.
> 	* gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_u32.c:
> Likewise.
> 	* gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_f32.c:
> Likewise.
> 	* gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_s32.c:
> Likewise.
> 	* gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_u32.c:
> Likewise.
> 	* gcc.target/arm/mve/intrinsics/vldrwq_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vldrwq_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vldrwq_z_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vldrwq_z_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vldrwq_z_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vuninitializedq_float.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vuninitializedq_float1.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vuninitializedq_int.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vuninitializedq_int1.c: Likewise.
> 
> 
> ###############     Attachment also inlined for ease of reply
> ###############
> 
> 
> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
> index
> 9571b60f84f947851639de94501b8bccd0149727..33d162c3e00590ab96de56f
> 20380f4ae4f200849 100644
> --- a/gcc/config/arm/arm-protos.h
> +++ b/gcc/config/arm/arm-protos.h
> @@ -64,6 +64,8 @@ extern bool arm_q_bit_access (void);
>  extern bool arm_ge_bits_access (void);
> 
>  #ifdef RTX_CODE
> +enum reg_class
> +arm_mode_base_reg_class (machine_mode);
>  extern void arm_gen_unlikely_cbranch (enum rtx_code, machine_mode
> cc_mode,
>  				      rtx label_ref);
>  extern bool arm_vector_mode_supported_p (machine_mode);
> @@ -114,6 +116,7 @@ extern bool arm_tls_referenced_p (rtx);
> 
>  extern int arm_coproc_mem_operand (rtx, bool);
>  extern int neon_vector_mem_operand (rtx, int, bool);
> +extern int mve_vector_mem_operand (machine_mode, rtx, bool);
>  extern int neon_struct_mem_operand (rtx);
> 
>  extern rtx *neon_vcmla_lane_prepare_operands (rtx *);
> diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
> index
> 0126f390abb2650e0b81cb59d55b1ce608490d4a..30e1d6dc994e18012fd2e5a
> 1bbd7c69134ee100c 100644
> --- a/gcc/config/arm/arm.h
> +++ b/gcc/config/arm/arm.h
> @@ -1292,11 +1292,13 @@ extern const char
> *fp_sysreg_names[NB_FP_SYSREGS];
> 
>  /* For the Thumb the high registers cannot be used as base registers
>     when addressing quantities in QI or HI mode; if we don't know the
> -   mode, then we must be conservative.  */
> +   mode, then we must be conservative. For MVE we need to load from
> +   memory to low regs based on given modes i.e [Rn], Rn <= LO_REGS.  */
>  #define MODE_BASE_REG_CLASS(MODE)				\
> -  (TARGET_32BIT ? CORE_REGS					\
> +   (TARGET_HAVE_MVE ? arm_mode_base_reg_class (MODE)		\
> +   :(TARGET_32BIT ? CORE_REGS					\
>     : GET_MODE_SIZE (MODE) >= 4 ? BASE_REGS			\
> -   : LO_REGS)
> +   : LO_REGS))
> 
>  /* For Thumb we cannot support SP+reg addressing, so we return LO_REGS
>     instead of BASE_REGS.  */
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index
> b169250918c13c6eabf55146a79081514d171571..01bc1b8ae9b72700ca5ae08
> 40ee4496fd686b623 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -8443,6 +8443,10 @@ thumb2_legitimate_address_p (machine_mode
> mode, rtx x, int strict_p)
>    bool use_ldrd;
>    enum rtx_code code = GET_CODE (x);
> 
> +  if (TARGET_HAVE_MVE
> +      && (mode == V8QImode || mode == E_V4QImode || mode ==
> V4HImode))
> +    return mve_vector_mem_operand (mode, x, strict_p);
> +
>    if (arm_address_register_rtx_p (x, strict_p))
>      return 1;
> 
> @@ -13257,6 +13261,80 @@ arm_coproc_mem_operand (rtx op, bool wb)
>    return FALSE;
>  }
> 
> +/* This function returns TRUE on matching mode and op.
> +1. For given modes, check for [Rn], return TRUE for Rn <= LO_REGS.
> +2. For other modes, check for [Rn], return TRUE for Rn < R15 (expect R13).
> */
> +int
> +mve_vector_mem_operand (machine_mode mode, rtx op, bool strict)
> +{
> +  enum rtx_code code;
> +  HOST_WIDE_INT val;
> +  int  reg_no;
> +
> +  /* Match: (mem (reg)).  */
> +  if (REG_P (op))
> +    {
> +      int reg_no = REGNO (op);
> +      return (((mode == E_V8QImode || mode == E_V4QImode || mode ==
> E_V4HImode)
> +	       ? reg_no <= LAST_LO_REGNUM
> +	       :(reg_no < LAST_ARM_REGNUM && reg_no != SP_REGNUM))
> +	      || (!strict && reg_no >= FIRST_PSEUDO_REGISTER));
> +    }
> +  code = GET_CODE (op);
> +
> +  if (code == POST_INC || code == PRE_DEC
> +      || code == PRE_INC || code == POST_DEC)
> +    {
> +      reg_no = REGNO (XEXP (op, 0));
> +      return (((mode == E_V8QImode || mode == E_V4QImode || mode ==
> E_V4HImode)
> +	       ? reg_no <= LAST_LO_REGNUM
> +	       :(reg_no < LAST_ARM_REGNUM && reg_no != SP_REGNUM))
> +	      || (!strict && reg_no >= FIRST_PSEUDO_REGISTER));
> +    }
> +  else if ((code == POST_MODIFY || code == PRE_MODIFY)
> +	   && GET_CODE (XEXP (op, 1)) == PLUS && REG_P (XEXP (XEXP (op, 1),
> 1)))
> +    {
> +      reg_no = REGNO (XEXP (op, 0));
> +      val = INTVAL (XEXP ( XEXP (op, 1), 1));
> +      switch (mode)
> +	{
> +	  case E_V16QImode:
> +	    if (abs_hwi (val))
> +	      return ((reg_no < LAST_ARM_REGNUM && reg_no != SP_REGNUM)
> +		      || (!strict && reg_no >= FIRST_PSEUDO_REGISTER));
> +	  case E_V8HImode:
> +	  case E_V8HFmode:
> +	    if (abs (val) <= 255)
> +	      return ((reg_no < LAST_ARM_REGNUM && reg_no != SP_REGNUM)
> +		      || (!strict && reg_no >= FIRST_PSEUDO_REGISTER));
> +	  case E_V8QImode:
> +	  case E_V4QImode:
> +	    if (abs_hwi (val))
> +	      return (reg_no <= LAST_LO_REGNUM
> +		      || (!strict && reg_no >= FIRST_PSEUDO_REGISTER));
> +	  case E_V4HImode:
> +	  case E_V4HFmode:
> +	    if (val % 2 == 0 && abs (val) <= 254)
> +	      return (reg_no <= LAST_LO_REGNUM
> +		      || (!strict && reg_no >= FIRST_PSEUDO_REGISTER));
> +	  case E_V4SImode:
> +	  case E_V4SFmode:
> +	    if (val % 4 == 0 && abs (val) <= 508)
> +	      return ((reg_no < LAST_ARM_REGNUM && reg_no != SP_REGNUM)
> +		      || (!strict && reg_no >= FIRST_PSEUDO_REGISTER));
> +	  case E_V2DImode:
> +	  case E_V2DFmode:
> +	  case E_TImode:
> +	    if (val % 4 == 0 && val >= 0 && val <= 1020)
> +	      return ((reg_no < LAST_ARM_REGNUM && reg_no != SP_REGNUM)
> +		      || (!strict && reg_no >= FIRST_PSEUDO_REGISTER));
> +	  default:
> +	    return FALSE;
> +	}
> +    }
> +  return FALSE;
> +}
> +
>  /* Return TRUE if OP is a memory operand which we can load or store a
> vector
>     to/from. TYPE is one of the following values:
>      0 - Vector load/stor (vldr)
> @@ -13324,15 +13402,6 @@ neon_vector_mem_operand (rtx op, int type,
> bool strict)
>        && (INTVAL (XEXP (ind, 1)) & 3) == 0)
>      return TRUE;
> 
> -  if (type == 1 && TARGET_HAVE_MVE
> -      && (GET_CODE (ind) == POST_INC || GET_CODE (ind) == PRE_DEC))
> -    {
> -      rtx ind1 = XEXP (ind, 0);
> -      if (!REG_P (ind1))
> -	return 0;
> -      return VFP_REGNO_OK_FOR_SINGLE (REGNO (ind1));
> -    }
> -
>    return FALSE;
>  }
> 
> @@ -24019,7 +24088,7 @@ arm_print_operand (FILE *stream, rtx x, int
> code)
>        }
>        return;
> 
> -    /* To print the memory operand with "Us" constraint.  Based on the
> rtx_code
> +    /* To print the memory operand with "Ux" constraint.  Based on the
> rtx_code
>         the memory operands output looks like following.
>         1. [Rn], #+/-<imm>
>         2. [Rn, #+/-<imm>]!
> @@ -33389,6 +33458,18 @@ arm_gen_far_branch (rtx * operands, int
> pos_label, const char * dest,
>    return "";
>  }
> 
> +/* If given mode matches, load from memory to LO_REGS.
> +   (i.e [Rn], Rn <= LO_REGS).  */
> +enum reg_class
> +arm_mode_base_reg_class (machine_mode mode)
> +{
> +  if (TARGET_HAVE_MVE
> +      && (mode == E_V8QImode || mode == E_V4QImode || mode ==
> E_V4HImode))
> +    return LO_REGS;
> +
> +  return MODE_BASE_REG_REG_CLASS (mode);
> +}
> +
>  struct gcc_target targetm = TARGET_INITIALIZER;
> 
>  #include "gt-arm.h"
> diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
> index
> fed6c7c84032dd8aba45142b59b980b4a6240d6d..011badc9957655a0fba6794
> 6c1db6fa6334b2bbb 100644
> --- a/gcc/config/arm/constraints.md
> +++ b/gcc/config/arm/constraints.md
> @@ -39,7 +39,7 @@
>  ;; in all states: Pf, Pg
> 
>  ;; The following memory constraints have been used:
> -;; in ARM/Thumb-2 state: Uh, Ut, Uv, Uy, Un, Um, Us, Up, Uf
> +;; in ARM/Thumb-2 state: Uh, Ut, Uv, Uy, Un, Um, Us, Up, Uf, Ux, Ul
>  ;; in ARM state: Uq
>  ;; in Thumb state: Uu, Uw
>  ;; in all states: Q
> @@ -47,6 +47,18 @@
>  (define_register_constraint "Up" "TARGET_HAVE_MVE ? VPR_REG :
> NO_REGS"
>    "MVE VPR register")
> 
> +(define_memory_constraint "Ul"
> + "@internal
> +  In ARM/Thumb-2 state a valid address for load instruction with XEXP (op, 0)
> +  being label of the literal data item to be loaded."
> + (and (match_code "mem")
> +      (match_test "TARGET_HAVE_MVE && reload_completed
> +		   && (GET_CODE (XEXP (op, 0)) == LABEL_REF
> +		       || (GET_CODE (XEXP (op, 0)) == CONST
> +			   && GET_CODE (XEXP (XEXP (op, 0), 0)) == PLUS
> +			   && GET_CODE (XEXP (XEXP (XEXP (op, 0), 0), 0)) ==
> LABEL_REF
> +			   && CONST_INT_P (XEXP (XEXP (XEXP (op, 0), 0),
> 1))))")))
> +
>  (define_register_constraint "Uf" "TARGET_HAVE_MVE ? VFPCC_REG :
> NO_REGS"
>    "MVE FPCCR register")
> 
> @@ -467,6 +479,15 @@
>   (and (match_code "mem")
>        (match_test "TARGET_32BIT && neon_vector_mem_operand (op, 1,
> true)")))
> 
> +(define_memory_constraint "Ux"
> + "@internal
> +  In ARM/Thumb-2 state a valid address and load into CORE regs or only to
> +  LO_REGS based on mode of op."
> + (and (match_code "mem")
> +      (match_test "(TARGET_HAVE_MVE || TARGET_HAVE_MVE_FLOAT)
> +		   && mve_vector_mem_operand (GET_MODE (op),
> +					      XEXP (op, 0), true)")))
> +
>  (define_memory_constraint "Uq"
>   "@internal
>    In ARM state an address valid in ldrsb instructions."
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index
> f43dabbfd4f15b602f0627a9b0ea423064501e51..986fbfe2abae5f1e91e65f1ff
> 5c84709c43c4617 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -666,8 +666,8 @@
>  (define_int_iterator VSHLCQ_M [VSHLCQ_M_S VSHLCQ_M_U])
> 
>  (define_insn "*mve_mov<mode>"
> -  [(set (match_operand:MVE_types 0 "nonimmediate_operand"
> "=w,w,r,w,w,r,w,Us")
> -	(match_operand:MVE_types 1 "general_operand"
> "w,r,w,Dn,Usi,r,Dm,w"))]
> +  [(set (match_operand:MVE_types 0 "nonimmediate_operand"
> "=w,w,r,w,w,r,w,Ux,w")
> +	(match_operand:MVE_types 1 "general_operand"
> "w,r,w,Dn,Uxi,r,Dm,w,Ul"))]
>    "TARGET_HAVE_MVE || TARGET_HAVE_MVE_FLOAT"
>  {
>    if (which_alternative == 3 || which_alternative == 6)
> @@ -686,6 +686,50 @@
>  	sprintf (templ, "vmov.i%d\t%%q0, %%x1  @ <mode>", width);
>        return templ;
>      }
> +
> +  if (which_alternative == 4 || which_alternative == 7)
> +    {
> +      rtx ops[2];
> +      int regno = (which_alternative == 7)
> +		  ? REGNO (operands[1]) : REGNO (operands[0]);
> +
> +      ops[0] = operands[0];
> +      ops[1] = operands[1];
> +      if (<MODE>mode == V2DFmode || <MODE>mode == V2DImode)
> +	{
> +	  if (which_alternative == 7)
> +	    {
> +	      ops[1] = gen_rtx_REG (DImode, regno);
> +	      output_asm_insn ("vstr.64\t%P1, %E0",ops);
> +	    }
> +	  else
> +	    {
> +	      ops[0] = gen_rtx_REG (DImode, regno);
> +	      output_asm_insn ("vldr.64\t%P0, %E1",ops);
> +	    }
> +	}
> +      else if (<MODE>mode == TImode)
> +	{
> +	  if (which_alternative == 7)
> +	    output_asm_insn ("vstr.64\t%q1, %E0",ops);
> +	  else
> +	    output_asm_insn ("vldr.64\t%q0, %E1",ops);
> +	}
> +      else
> +	{
> +	  if (which_alternative == 7)
> +	    {
> +	      ops[1] = gen_rtx_REG (TImode, regno);
> +	      output_asm_insn
> ("vstr<V_sz_elem1>.<V_sz_elem>\t%q1, %E0",ops);
> +	    }
> +	  else
> +	    {
> +	      ops[0] = gen_rtx_REG (TImode, regno);
> +	      output_asm_insn
> ("vldr<V_sz_elem1>.<V_sz_elem>\t%q0, %E1",ops);
> +	    }
> +	}
> +      return "";
> +    }
>    switch (which_alternative)
>      {
>      case 0:
> @@ -694,26 +738,19 @@
>        return "vmov\t%e0, %Q1, %R1  @ <mode>\;vmov\t%f0, %J1, %K1";
>      case 2:
>        return "vmov\t%Q0, %R0, %e1  @ <mode>\;vmov\t%J0, %K0, %f1";
> -    case 4:
> -      if (MEM_P (operands[1])
> -	  && (GET_CODE (XEXP (operands[1], 0)) == LABEL_REF
> -	      || GET_CODE (XEXP (operands[1], 0)) == CONST))
> -	return output_move_neon (operands);
> -      else
> -	return "vldrb.8 %q0, %E1";
>      case 5:
>        return output_move_quad (operands);
> -    case 7:
> -      return "vstrb.8 %q1, %E0";
> +    case 8:
> +	return output_move_neon (operands);
>      default:
>        gcc_unreachable ();
>        return "";
>      }
>  }
> -  [(set_attr "type"
> "mve_move,mve_move,mve_move,mve_move,mve_load,multiple,mve_mov
> e,mve_store")
> -   (set_attr "length" "4,8,8,4,8,8,4,4")
> -   (set_attr "thumb2_pool_range" "*,*,*,*,1018,*,*,*")
> -   (set_attr "neg_pool_range" "*,*,*,*,996,*,*,*")])
> +  [(set_attr "type"
> "mve_move,mve_move,mve_move,mve_move,mve_load,multiple,mve_mov
> e,mve_store,mve_load")
> +   (set_attr "length" "4,8,8,4,8,8,4,4,4")
> +   (set_attr "thumb2_pool_range" "*,*,*,*,1018,*,*,*,*")
> +   (set_attr "neg_pool_range" "*,*,*,*,996,*,*,*,*")])
> 
>  (define_insn "*mve_mov<mode>"
>    [(set (match_operand:MVE_types 0 "s_register_operand" "=w,w")
> @@ -8047,7 +8084,7 @@
>  ;; [vstrbq_s vstrbq_u]
>  ;;
>  (define_insn "mve_vstrbq_<supf><mode>"
> -  [(set (match_operand:<MVE_B_ELEM> 0 "memory_operand" "=Us")
> +  [(set (match_operand:<MVE_B_ELEM> 0 "mve_memory_operand" "=Ux")
>  	(unspec:<MVE_B_ELEM> [(match_operand:MVE_2 1
> "s_register_operand" "w")]
>  	 VSTRBQ))
>    ]
> @@ -8133,7 +8170,7 @@
>  ;;
>  (define_insn "mve_vldrbq_<supf><mode>"
>    [(set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(unspec:MVE_2 [(match_operand:<MVE_B_ELEM> 1
> "memory_operand" "Us")]
> +	(unspec:MVE_2 [(match_operand:<MVE_B_ELEM> 1
> "mve_memory_operand" "Ux")]
>  	 VLDRBQ))
>    ]
>    "TARGET_HAVE_MVE"
> @@ -8142,7 +8179,10 @@
>     int regno = REGNO (operands[0]);
>     ops[0] = gen_rtx_REG (TImode, regno);
>     ops[1]  = operands[1];
> -   output_asm_insn ("vldrb.<supf><V_sz_elem>\t%q0, %E1",ops);
> +   if (<V_sz_elem> == 8)
> +     output_asm_insn ("vldrb.<V_sz_elem>\t%q0, %E1",ops);
> +   else
> +     output_asm_insn ("vldrb.<supf><V_sz_elem>\t%q0, %E1",ops);
>     return "";
>  }
>    [(set_attr "length" "4")])
> @@ -8216,7 +8256,7 @@
>  ;; [vstrbq_p_s vstrbq_p_u]
>  ;;
>  (define_insn "mve_vstrbq_p_<supf><mode>"
> -  [(set (match_operand:<MVE_B_ELEM> 0 "memory_operand" "=Us")
> +  [(set (match_operand:<MVE_B_ELEM> 0 "mve_memory_operand" "=Ux")
>  	(unspec:<MVE_B_ELEM> [(match_operand:MVE_2 1
> "s_register_operand" "w")
>  			      (match_operand:HI 2 "vpr_register_operand"
> "Up")]
>  	 VSTRBQ))
> @@ -8227,7 +8267,7 @@
>     int regno = REGNO (operands[1]);
>     ops[1] = gen_rtx_REG (TImode, regno);
>     ops[0]  = operands[0];
> -   output_asm_insn ("vpst\n\tvstrbt.<V_sz_elem>\t%q1, %E0",ops);
> +   output_asm_insn ("vpst\;vstrbt.<V_sz_elem>\t%q1, %E0",ops);
>     return "";
>  }
>    [(set_attr "length" "8")])
> @@ -8262,7 +8302,7 @@
>  ;;
>  (define_insn "mve_vldrbq_z_<supf><mode>"
>    [(set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(unspec:MVE_2 [(match_operand:<MVE_B_ELEM> 1
> "memory_operand" "Us")
> +	(unspec:MVE_2 [(match_operand:<MVE_B_ELEM> 1
> "mve_memory_operand" "Ux")
>  		       (match_operand:HI 2 "vpr_register_operand" "Up")]
>  	 VLDRBQ))
>    ]
> @@ -8272,7 +8312,10 @@
>     int regno = REGNO (operands[0]);
>     ops[0] = gen_rtx_REG (TImode, regno);
>     ops[1]  = operands[1];
> -   output_asm_insn ("vpst\n\tvldrbt.<supf><V_sz_elem>\t%q0, %E1",ops);
> +   if (<V_sz_elem> == 8)
> +     output_asm_insn ("vpst\;vldrbt.<V_sz_elem>\t%q0, %E1",ops);
> +   else
> +     output_asm_insn ("vpst\;vldrbt.<supf><V_sz_elem>\t%q0, %E1",ops);
>     return "";
>  }
>    [(set_attr "length" "8")])
> @@ -8303,7 +8346,7 @@
>  ;;
>  (define_insn "mve_vldrhq_fv8hf"
>    [(set (match_operand:V8HF 0 "s_register_operand" "=w")
> -	(unspec:V8HF [(match_operand:V8HI 1 "memory_operand" "Us")]
> +	(unspec:V8HF [(match_operand:V8HI 1 "mve_memory_operand"
> "Ux")]
>  	 VLDRHQ_F))
>    ]
>    "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> @@ -8312,7 +8355,7 @@
>     int regno = REGNO (operands[0]);
>     ops[0] = gen_rtx_REG (TImode, regno);
>     ops[1]  = operands[1];
> -   output_asm_insn ("vldrh.f16\t%q0, %E1",ops);
> +   output_asm_insn ("vldrh.16\t%q0, %E1",ops);
>     return "";
>  }
>    [(set_attr "length" "4")])
> @@ -8414,12 +8457,11 @@
>    [(set_attr "length" "8")])
> 
>  ;;
> -;;
>  ;; [vldrhq_s, vldrhq_u]
>  ;;
>  (define_insn "mve_vldrhq_<supf><mode>"
>    [(set (match_operand:MVE_6 0 "s_register_operand" "=w")
> -	(unspec:MVE_6 [(match_operand:<MVE_H_ELEM> 1
> "memory_operand" "Us")]
> +	(unspec:MVE_6 [(match_operand:<MVE_H_ELEM> 1
> "mve_memory_operand" "Ux")]
>  	 VLDRHQ))
>    ]
>    "TARGET_HAVE_MVE"
> @@ -8428,7 +8470,10 @@
>     int regno = REGNO (operands[0]);
>     ops[0] = gen_rtx_REG (TImode, regno);
>     ops[1]  = operands[1];
> -   output_asm_insn ("vldrh.<supf><V_sz_elem>\t%q0, %E1",ops);
> +   if (<V_sz_elem> == 16)
> +     output_asm_insn ("vldrh.16\t%q0, %E1",ops);
> +   else
> +     output_asm_insn ("vldrh.<supf><V_sz_elem>\t%q0, %E1",ops);
>     return "";
>  }
>    [(set_attr "length" "4")])
> @@ -8438,7 +8483,7 @@
>  ;;
>  (define_insn "mve_vldrhq_z_fv8hf"
>    [(set (match_operand:V8HF 0 "s_register_operand" "=w")
> -	(unspec:V8HF [(match_operand:V8HI 1 "memory_operand" "Us")
> +	(unspec:V8HF [(match_operand:V8HI 1 "mve_memory_operand"
> "Ux")
>  	(match_operand:HI 2 "vpr_register_operand" "Up")]
>  	 VLDRHQ_F))
>    ]
> @@ -8448,7 +8493,7 @@
>     int regno = REGNO (operands[0]);
>     ops[0] = gen_rtx_REG (TImode, regno);
>     ops[1]  = operands[1];
> -   output_asm_insn ("vpst\n\tvldrht.f16\t%q0, %E1",ops);
> +   output_asm_insn ("vpst\;vldrht.16\t%q0, %E1",ops);
>     return "";
>  }
>    [(set_attr "length" "8")])
> @@ -8458,7 +8503,7 @@
>  ;;
>  (define_insn "mve_vldrhq_z_<supf><mode>"
>    [(set (match_operand:MVE_6 0 "s_register_operand" "=w")
> -	(unspec:MVE_6 [(match_operand:<MVE_H_ELEM> 1
> "memory_operand" "Us")
> +	(unspec:MVE_6 [(match_operand:<MVE_H_ELEM> 1
> "mve_memory_operand" "Ux")
>  	(match_operand:HI 2 "vpr_register_operand" "Up")]
>  	 VLDRHQ))
>    ]
> @@ -8468,7 +8513,10 @@
>     int regno = REGNO (operands[0]);
>     ops[0] = gen_rtx_REG (TImode, regno);
>     ops[1]  = operands[1];
> -   output_asm_insn ("vpst\n\tvldrht.<supf><V_sz_elem>\t%q0, %E1",ops);
> +   if (<V_sz_elem> == 16)
> +     output_asm_insn ("vpst\;vldrht.16\t%q0, %E1",ops);
> +   else
> +     output_asm_insn ("vpst\;vldrht.<supf><V_sz_elem>\t%q0, %E1",ops);
>     return "";
>  }
>    [(set_attr "length" "8")])
> @@ -8478,7 +8526,7 @@
>  ;;
>  (define_insn "mve_vldrwq_fv4sf"
>    [(set (match_operand:V4SF 0 "s_register_operand" "=w")
> -	(unspec:V4SF [(match_operand:V4SI 1 "memory_operand" "Us")]
> +	(unspec:V4SF [(match_operand:V4SI 1 "memory_operand" "Ux")]
>  	 VLDRWQ_F))
>    ]
>    "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> @@ -8487,7 +8535,7 @@
>     int regno = REGNO (operands[0]);
>     ops[0] = gen_rtx_REG (TImode, regno);
>     ops[1]  = operands[1];
> -   output_asm_insn ("vldrw.f32\t%q0, %E1",ops);
> +   output_asm_insn ("vldrw.32\t%q0, %E1",ops);
>     return "";
>  }
>    [(set_attr "length" "4")])
> @@ -8497,7 +8545,7 @@
>  ;;
>  (define_insn "mve_vldrwq_<supf>v4si"
>    [(set (match_operand:V4SI 0 "s_register_operand" "=w")
> -	(unspec:V4SI [(match_operand:V4SI 1 "memory_operand" "Us")]
> +	(unspec:V4SI [(match_operand:V4SI 1 "memory_operand" "Ux")]
>  	 VLDRWQ))
>    ]
>    "TARGET_HAVE_MVE"
> @@ -8506,7 +8554,7 @@
>     int regno = REGNO (operands[0]);
>     ops[0] = gen_rtx_REG (TImode, regno);
>     ops[1]  = operands[1];
> -   output_asm_insn ("vldrw.<supf>32\t%q0, %E1",ops);
> +   output_asm_insn ("vldrw.32\t%q0, %E1",ops);
>     return "";
>  }
>    [(set_attr "length" "4")])
> @@ -8516,7 +8564,7 @@
>  ;;
>  (define_insn "mve_vldrwq_z_fv4sf"
>    [(set (match_operand:V4SF 0 "s_register_operand" "=w")
> -	(unspec:V4SF [(match_operand:V4SI 1 "memory_operand" "Us")
> +	(unspec:V4SF [(match_operand:V4SI 1 "memory_operand" "Ux")
>  	(match_operand:HI 2 "vpr_register_operand" "Up")]
>  	 VLDRWQ_F))
>    ]
> @@ -8526,7 +8574,7 @@
>     int regno = REGNO (operands[0]);
>     ops[0] = gen_rtx_REG (TImode, regno);
>     ops[1]  = operands[1];
> -   output_asm_insn ("vpst\n\tvldrwt.f32\t%q0, %E1",ops);
> +   output_asm_insn ("vpst\;vldrwt.32\t%q0, %E1",ops);
>     return "";
>  }
>    [(set_attr "length" "8")])
> @@ -8536,7 +8584,7 @@
>  ;;
>  (define_insn "mve_vldrwq_z_<supf>v4si"
>    [(set (match_operand:V4SI 0 "s_register_operand" "=w")
> -	(unspec:V4SI [(match_operand:V4SI 1 "memory_operand" "Us")
> +	(unspec:V4SI [(match_operand:V4SI 1 "memory_operand" "Ux")
>  	(match_operand:HI 2 "vpr_register_operand" "Up")]
>  	 VLDRWQ))
>    ]
> @@ -8546,14 +8594,14 @@
>     int regno = REGNO (operands[0]);
>     ops[0] = gen_rtx_REG (TImode, regno);
>     ops[1]  = operands[1];
> -   output_asm_insn ("vpst\n\tvldrwt.<supf>32\t%q0, %E1",ops);
> +   output_asm_insn ("vpst\;vldrwt.32\t%q0, %E1",ops);
>     return "";
>  }
>    [(set_attr "length" "8")])
> 
>  (define_expand "mve_vld1q_f<mode>"
>    [(match_operand:MVE_0 0 "s_register_operand")
> -   (unspec:MVE_0 [(match_operand:<MVE_CNVT> 1 "memory_operand")]
> VLD1Q_F)
> +   (unspec:MVE_0 [(match_operand:<MVE_CNVT> 1
> "mve_memory_operand")] VLD1Q_F)
>    ]
>    "TARGET_HAVE_MVE || TARGET_HAVE_MVE_FLOAT"
>  {
> @@ -8563,7 +8611,7 @@
> 
>  (define_expand "mve_vld1q_<supf><mode>"
>    [(match_operand:MVE_2 0 "s_register_operand")
> -   (unspec:MVE_2 [(match_operand:MVE_2 1 "memory_operand")] VLD1Q)
> +   (unspec:MVE_2 [(match_operand:MVE_2 1 "mve_memory_operand")]
> VLD1Q)
>    ]
>    "TARGET_HAVE_MVE"
>  {
> @@ -8991,7 +9039,7 @@
>  ;; [vstrhq_f]
>  ;;
>  (define_insn "mve_vstrhq_fv8hf"
> -  [(set (match_operand:V8HI 0 "memory_operand" "=Us")
> +  [(set (match_operand:V8HI 0 "mve_memory_operand" "=Ux")
>  	(unspec:V8HI [(match_operand:V8HF 1 "s_register_operand" "w")]
>  	 VSTRHQ_F))
>    ]
> @@ -9010,7 +9058,7 @@
>  ;; [vstrhq_p_f]
>  ;;
>  (define_insn "mve_vstrhq_p_fv8hf"
> -  [(set (match_operand:V8HI 0 "memory_operand" "=Us")
> +  [(set (match_operand:V8HI 0 "mve_memory_operand" "=Ux")
>  	(unspec:V8HI [(match_operand:V8HF 1 "s_register_operand" "w")
>  		      (match_operand:HI 2 "vpr_register_operand" "Up")]
>  	 VSTRHQ_F))
> @@ -9021,7 +9069,7 @@
>     int regno = REGNO (operands[1]);
>     ops[1] = gen_rtx_REG (TImode, regno);
>     ops[0]  = operands[0];
> -   output_asm_insn ("vpst\n\tvstrht.16\t%q1, %E0",ops);
> +   output_asm_insn ("vpst\;vstrht.16\t%q1, %E0",ops);
>     return "";
>  }
>    [(set_attr "length" "8")])
> @@ -9030,7 +9078,7 @@
>  ;; [vstrhq_p_s vstrhq_p_u]
>  ;;
>  (define_insn "mve_vstrhq_p_<supf><mode>"
> -  [(set (match_operand:<MVE_H_ELEM> 0 "memory_operand" "=Us")
> +  [(set (match_operand:<MVE_H_ELEM> 0 "mve_memory_operand" "=Ux")
>  	(unspec:<MVE_H_ELEM> [(match_operand:MVE_6 1
> "s_register_operand" "w")
>  			      (match_operand:HI 2 "vpr_register_operand"
> "Up")]
>  	 VSTRHQ))
> @@ -9041,7 +9089,7 @@
>     int regno = REGNO (operands[1]);
>     ops[1] = gen_rtx_REG (TImode, regno);
>     ops[0]  = operands[0];
> -   output_asm_insn ("vpst\n\tvstrht.<V_sz_elem>\t%q1, %E0",ops);
> +   output_asm_insn ("vpst\;vstrht.<V_sz_elem>\t%q1, %E0",ops);
>     return "";
>  }
>    [(set_attr "length" "8")])
> @@ -9093,7 +9141,7 @@
>  ;; [vstrhq_scatter_shifted_offset_p_s vstrhq_scatter_shifted_offset_p_u]
>  ;;
>  (define_insn "mve_vstrhq_scatter_shifted_offset_p_<supf><mode>"
> -  [(set (match_operand:<MVE_H_ELEM> 0 "memory_operand" "=Us")
> +  [(set (match_operand:<MVE_H_ELEM> 0 "memory_operand" "=Ux")
>  	(unspec:<MVE_H_ELEM>
>  		[(match_operand:MVE_6 1 "s_register_operand" "w")
>  		 (match_operand:MVE_6 2 "s_register_operand" "w")
> @@ -9136,7 +9184,7 @@
>  ;; [vstrhq_s, vstrhq_u]
>  ;;
>  (define_insn "mve_vstrhq_<supf><mode>"
> -  [(set (match_operand:<MVE_H_ELEM> 0 "memory_operand" "=Us")
> +  [(set (match_operand:<MVE_H_ELEM> 0 "mve_memory_operand" "=Ux")
>  	(unspec:<MVE_H_ELEM> [(match_operand:MVE_6 1
> "s_register_operand" "w")]
>  	 VSTRHQ))
>    ]
> @@ -9155,7 +9203,7 @@
>  ;; [vstrwq_f]
>  ;;
>  (define_insn "mve_vstrwq_fv4sf"
> -  [(set (match_operand:V4SI 0 "memory_operand" "=Us")
> +  [(set (match_operand:V4SI 0 "memory_operand" "=Ux")
>  	(unspec:V4SI [(match_operand:V4SF 1 "s_register_operand" "w")]
>  	 VSTRWQ_F))
>    ]
> @@ -9174,7 +9222,7 @@
>  ;; [vstrwq_p_f]
>  ;;
>  (define_insn "mve_vstrwq_p_fv4sf"
> -  [(set (match_operand:V4SI 0 "memory_operand" "=Us")
> +  [(set (match_operand:V4SI 0 "memory_operand" "=Ux")
>  	(unspec:V4SI [(match_operand:V4SF 1 "s_register_operand" "w")
>  		      (match_operand:HI 2 "vpr_register_operand" "Up")]
>  	 VSTRWQ_F))
> @@ -9185,7 +9233,7 @@
>     int regno = REGNO (operands[1]);
>     ops[1] = gen_rtx_REG (TImode, regno);
>     ops[0]  = operands[0];
> -   output_asm_insn ("vpst\n\tvstrwt.32\t%q1, %E0",ops);
> +   output_asm_insn ("vpst\;vstrwt.32\t%q1, %E0",ops);
>     return "";
>  }
>    [(set_attr "length" "8")])
> @@ -9194,7 +9242,7 @@
>  ;; [vstrwq_p_s vstrwq_p_u]
>  ;;
>  (define_insn "mve_vstrwq_p_<supf>v4si"
> -  [(set (match_operand:V4SI 0 "memory_operand" "=Us")
> +  [(set (match_operand:V4SI 0 "memory_operand" "=Ux")
>  	(unspec:V4SI [(match_operand:V4SI 1 "s_register_operand" "w")
>  		      (match_operand:HI 2 "vpr_register_operand" "Up")]
>  	 VSTRWQ))
> @@ -9205,7 +9253,7 @@
>     int regno = REGNO (operands[1]);
>     ops[1] = gen_rtx_REG (TImode, regno);
>     ops[0]  = operands[0];
> -   output_asm_insn ("vpst\n\tvstrwt.32\t%q1, %E0",ops);
> +   output_asm_insn ("vpst\;vstrwt.32\t%q1, %E0",ops);
>     return "";
>  }
>    [(set_attr "length" "8")])
> @@ -9214,7 +9262,7 @@
>  ;; [vstrwq_s vstrwq_u]
>  ;;
>  (define_insn "mve_vstrwq_<supf>v4si"
> -  [(set (match_operand:V4SI 0 "memory_operand" "=Us")
> +  [(set (match_operand:V4SI 0 "memory_operand" "=Ux")
>  	(unspec:V4SI [(match_operand:V4SI 1 "s_register_operand" "w")]
>  	 VSTRWQ))
>    ]
> diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
> index
> 009862e012c9ce3bbe446a89aacb750f47be66f0..c57ad73577e1eebebc8951e
> d5b4fb544dd3381f8 100644
> --- a/gcc/config/arm/predicates.md
> +++ b/gcc/config/arm/predicates.md
> @@ -31,6 +31,12 @@
>  	      || REGNO_REG_CLASS (REGNO (op)) != NO_REGS));
>  })
> 
> +(define_predicate "mve_memory_operand"
> +  (and (match_code "mem")
> +       (match_test "TARGET_32BIT
> +		    && mve_vector_mem_operand (GET_MODE (op), XEXP (op,
> 0),
> +					       false)")))
> +
>  ;; True for immediates in the range of 1 to 16 for MVE.
>  (define_predicate "mve_imm_16"
>    (match_test "satisfies_constraint_Rd (op)"))
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_float2.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_float2.c
> index
> e3cf8f8207d603243eae22be9a90bbb1e8a73a58..35f83c6b298aaf2b80937131
> 59b32de17ff96bd2 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_float2.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_float2.c
> @@ -11,10 +11,6 @@ foo32 ()
>    return b;
>  }
> 
> -/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]"  }  } */
> -/* { dg-final { scan-assembler "vstrb.*" }  } */
> -/* { dg-final { scan-assembler "vldr.64*" }  } */
> -
>  float16x8_t
>  foo16 ()
>  {
> @@ -22,6 +18,9 @@ foo16 ()
>    return b;
>  }
> 
> -/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]"  }  } */
> -/* { dg-final { scan-assembler "vstrb.*" }  } */
> -/* { dg-final { scan-assembler "vldr.64.*" }  } */
> +/* { dg-final { scan-assembler-times "vmov\\tq\[0-7\], q\[0-7\]" 2 } } */
> +/* { dg-final { scan-assembler-times "vstrw.32*" 1 } } */
> +/* { dg-final { scan-assembler-times "vstrh.16*" 1 } } */
> +/* { dg-final { scan-assembler-times "vldrw.32*" 1 } } */
> +/* { dg-final { scan-assembler-times "vldrh.16*" 1 } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vldr.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vldr.c
> new file mode 100644
> index
> 0000000000000000000000000000000000000000..15656ed8c3c8c3ab95bbb5d
> e59dafdab864b28db
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vldr.c
> @@ -0,0 +1,61 @@
> +/* { dg-require-effective-target arm_v8_1m_mve_ok } */
> +/* { dg-add-options arm_v8_1m_mve } */
> +/* { dg-additional-options "-O2" } */
> +
> +#include "arm_mve.h"
> +void
> +foo (uint16_t row_x_col, int8_t *out)
> +{
> +  for (;;)
> +   {
> +     int32x4_t out_3;
> +     int8_t *rhs_0;
> +     int8_t *lhs_3;
> +     int i_row_x_col;
> +     for (;i_row_x_col < row_x_col; i_row_x_col++)
> +      {
> +	int32x4_t ker_0 = vldrbq_s32(rhs_0);
> +	int32x4_t ip_3 = vldrbq_s32(lhs_3);
> +	out_3 = vmulq_s32(ip_3, ker_0);
> +      }
> +     vstrbq_s32(out, out_3);
> +   }
> +}
> +
> +void
> +foo1 (uint16_t row_x_col, int8_t *out)
> +{
> +  for (;;)
> +   {
> +     int16x8_t out_3;
> +     int8_t *rhs_0;
> +     int8_t *lhs_3;
> +     int i_row_x_col;
> +     for (; i_row_x_col < row_x_col; i_row_x_col++)
> +      {
> +	int16x8_t ker_0 = vldrbq_s16(rhs_0);
> +	int16x8_t ip_3 = vldrbq_s16(lhs_3);
> +	out_3 = vmulq_s16(ip_3, ker_0);
> +      }
> +     vstrbq_s16(out, out_3);
> +   }
> +}
> +
> +void
> +foo2 (uint16_t row_x_col, int16_t *out)
> +{
> +  for (;;)
> +   {
> +     int32x4_t out_3;
> +     int16_t *rhs_0;
> +     int16_t *lhs_3;
> +     int i_row_x_col;
> +     for (; i_row_x_col < row_x_col; i_row_x_col++)
> +      {
> +	int32x4_t ker_0 = vldrhq_s32(rhs_0);
> +	int32x4_t ip_3 = vldrhq_s32(lhs_3);
> +	out_3 = vmulq_s32(ip_3, ker_0);
> +      }
> +     vstrhq_s32(out, out_3);
> +   }
> +}
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vldr_z.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vldr_z.c
> new file mode 100644
> index
> 0000000000000000000000000000000000000000..ae640837d14f41cc617ac56c
> 57ca120be615ac31
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vldr_z.c
> @@ -0,0 +1,73 @@
> +/* { dg-require-effective-target arm_v8_1m_mve_ok } */
> +/* { dg-add-options arm_v8_1m_mve } */
> +/* { dg-additional-options "-O2" } */
> +
> +#include "arm_mve.h"
> +void
> +foo (uint16_t row_len, const int32_t *bias, int8_t *out)
> +{
> +  int i_out_ch;
> +  for (;;)
> +   {
> +     int8_t *ip_c3;
> +     int32_t acc_3;
> +     int32_t row_loop_cnt = row_len;
> +     int32x4_t res = {acc_3};
> +     uint32x4_t scatter_offset;
> +     int i_row_loop;
> +     for (; i_row_loop < row_loop_cnt; i_row_loop++)
> +      {
> +	mve_pred16_t p;
> +	int16x8_t r0;
> +	int16x8_t c3 = vldrbq_z_s16(ip_c3, p);
> +	acc_3 = vmladavaq_p_s16(acc_3, r0, c3, p);
> +      }
> +     vstrbq_scatter_offset_s32(&out[i_out_ch], scatter_offset, res);
> +   }
> +}
> +
> +void
> +foo1 (uint16_t row_len, const int32_t *bias, int8_t *out)
> +{
> +  int i_out_ch;
> +  for (;;)
> +   {
> +     int8_t *ip_c3;
> +     int32_t acc_3;
> +     int32_t row_loop_cnt = row_len;
> +     int i_row_loop;
> +     int32x4_t res = {acc_3};
> +     uint32x4_t scatter_offset;
> +     for (; i_row_loop < row_loop_cnt; i_row_loop++)
> +      {
> +	mve_pred16_t p;
> +	int32x4_t r0;
> +	int32x4_t c3 = vldrbq_z_s32(ip_c3, p);
> +	acc_3 = vmladavaq_p_s32(acc_3, r0, c3, p);
> +      }
> +     vstrbq_scatter_offset_s32(&out[i_out_ch], scatter_offset, res);
> +   }
> +}
> +
> +void
> +foo2 (uint16_t row_len, const int32_t *bias, int8_t *out)
> +{
> +  int i_out_ch;
> +  for (;;)
> +   {
> +     int16_t *ip_c3;
> +     int32_t acc_3;
> +     int32_t row_loop_cnt = row_len;
> +     int i_row_loop;
> +     int32x4_t res = {acc_3};
> +     uint32x4_t scatter_offset;
> +     for (; i_row_loop < row_loop_cnt; i_row_loop++)
> +      {
> +	mve_pred16_t p;
> +	int32x4_t r0;
> +	int32x4_t c3 = vldrhq_z_s32(ip_c3, p);
> +	acc_3 = vmladavaq_p_s32(acc_3, r0, c3, p);
> +      }
> +     vstrbq_scatter_offset_s32(&out[i_out_ch], scatter_offset, res);
> +   }
> +}
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vstr.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vstr.c
> new file mode 100644
> index
> 0000000000000000000000000000000000000000..dd785f28bc02beae828a648
> 6fdcf3a374829ac0d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vstr.c
> @@ -0,0 +1,43 @@
> +/* { dg-require-effective-target arm_v8_1m_mve_ok } */
> +/* { dg-add-options arm_v8_1m_mve } */
> +/* { dg-additional-options "-O2" } */
> +
> +#include "arm_mve.h"
> +void
> +foo (const int32_t *output_bias, int8_t *out, uint16_t num_ch)
> +{
> +  int32_t loop_count = num_ch;
> +  const int32_t *bias = output_bias;
> +  int i_loop_cnt;
> +  for (; i_loop_cnt < loop_count; out += 4, i_loop_cnt++)
> +   {
> +     int32x4_t out_0 = vldrwq_s32(bias);
> +     vstrbq_s32(out, out_0);
> +   }
> +}
> +
> +void
> +foo1 (const int16_t *output_bias, int8_t *out, uint16_t num_ch)
> +{
> +  int32_t loop_count = num_ch;
> +  const int16_t *bias = output_bias;
> +  int i_loop_cnt;
> +  for (; i_loop_cnt < loop_count; out += 4, i_loop_cnt++)
> +   {
> +     int16x8_t out_0 = vldrhq_s16(bias);
> +     vstrbq_s16(out, out_0);
> +   }
> +}
> +
> +void
> +foo2 (const int32_t *output_bias, int16_t *out, uint16_t num_ch)
> +{
> +  int32_t loop_count = num_ch;
> +  const int32_t *bias = output_bias;
> +  int i_loop_cnt;
> +  for (; i_loop_cnt < loop_count; out += 4, i_loop_cnt++)
> +   {
> +     int32x4_t out_0 = vldrwq_s32(bias);
> +     vstrhq_s32(out, out_0);
> +   }
> +}
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vstr_p.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vstr_p.c
> new file mode 100644
> index
> 0000000000000000000000000000000000000000..8b222f1be0a95031189e792
> bf9afa22411fa867a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vstr_p.c
> @@ -0,0 +1,42 @@
> +/* { dg-require-effective-target arm_v8_1m_mve_ok } */
> +/* { dg-add-options arm_v8_1m_mve } */
> +/* { dg-additional-options "-O2" } */
> +
> +#include "arm_mve.h"
> +void
> +foo1 (int8_t *x, int32_t * i1)
> +{
> +  mve_pred16_t p;
> +  int32x4_t x_0;
> +  int32_t * bias1 = i1;
> +  for (;; x++)
> +  {
> +    x_0 = vldrwq_s32(bias1);
> +    vstrbq_p_s32(x, x_0, p);
> +  }
> +}
> +void
> +foo2 (int8_t *x, int16_t * i1)
> +{
> +  mve_pred16_t p;
> +  int16x8_t x_0;
> +  int16_t * bias1 = i1;
> +  for (;; x++)
> +  {
> +    x_0 = vldrhq_s16(bias1);
> +    vstrbq_p_s16(x, x_0, p);
> +  }
> +}
> +
> +void
> +foo3 (int16_t *x, int32_t * i1)
> +{
> +  mve_pred16_t p;
> +  int32x4_t x_0;
> +  int32_t * bias1 = i1;
> +  for (;; x++)
> +  {
> +    x_0 = vldrwq_s32(bias1);
> +    vstrhq_p_s32(x, x_0, p);
> +  }
> +}
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_f16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_f16.c
> index
> 5e42f634412309411e4a6257cc3042a9ab280e06..699e40d0e3b503f6c02abaa
> 3f4f976343081f108 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_f16.c
> @@ -10,12 +10,11 @@ foo (float16_t const * base)
>    return vld1q_f16 (base);
>  }
> 
> -/* { dg-final { scan-assembler "vldrh.f16"  }  } */
> -
>  float16x8_t
>  foo1 (float16_t const * base)
>  {
>    return vld1q (base);
>  }
> 
> -/* { dg-final { scan-assembler "vldrh.f16"  }  } */
> +/* { dg-final { scan-assembler-times "vldrh.16" 2 }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_f32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_f32.c
> index
> 99d1a7a9c5e66b4ae99d5184756bb65b8bc5e852..865923033629c273b1a31f5
> 7c0589e0ab1e6fc24 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_f32.c
> @@ -10,12 +10,11 @@ foo (float32_t const * base)
>    return vld1q_f32 (base);
>  }
> 
> -/* { dg-final { scan-assembler "vldrw.f32"  }  } */
> -
>  float32x4_t
>  foo1 (float32_t const * base)
>  {
>    return vld1q (base);
>  }
> 
> -/* { dg-final { scan-assembler "vldrw.f32"  }  } */
> +/* { dg-final { scan-assembler-times "vldrw.32" 2 }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_s16.c
> index
> d77f98ea8893959adfbc2688645d0d36dd826816..f4f04f534db63c5b77927d8e
> 2ea967bb705012cc 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_s16.c
> @@ -10,12 +10,11 @@ foo (int16_t const * base)
>    return vld1q_s16 (base);
>  }
> 
> -/* { dg-final { scan-assembler "vldrh.s16"  }  } */
> -
>  int16x8_t
>  foo1 (int16_t const * base)
>  {
>    return vld1q (base);
>  }
> 
> -/* { dg-final { scan-assembler "vldrh.s16"  }  } */
> +/* { dg-final { scan-assembler-times "vldrh.16" 2 }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_s32.c
> index
> 9a7f024f735f1715d6e577aaf08e217b52ad66e7..e0f661667515f3d3e94cd052
> b4bbdef9c33c06dc 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_s32.c
> @@ -10,12 +10,11 @@ foo (int32_t const * base)
>    return vld1q_s32 (base);
>  }
> 
> -/* { dg-final { scan-assembler "vldrw.s32"  }  } */
> -
>  int32x4_t
>  foo1 (int32_t const * base)
>  {
>    return vld1q (base);
>  }
> 
> -/* { dg-final { scan-assembler "vldrw.s32"  }  } */
> +/* { dg-final { scan-assembler-times "vldrw.32" 2 }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_s8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_s8.c
> index
> 9c67bb60110081c1ed6c65f6986bbc08b0e2a691..1b7edead6b1a5489f2c668a
> 69136b5fed463c703 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_s8.c
> @@ -10,12 +10,11 @@ foo (int8_t const * base)
>    return vld1q_s8 (base);
>  }
> 
> -/* { dg-final { scan-assembler "vldrb.s8"  }  } */
> -
>  int8x16_t
>  foo1 (int8_t const * base)
>  {
>    return vld1q (base);
>  }
> 
> -/* { dg-final { scan-assembler "vldrb.s8"  }  } */
> +/* { dg-final { scan-assembler-times "vldrb.8" 2 }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_u16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_u16.c
> index
> 2bef21a5a1dcaf265c052ddb689df9b12d4419ae..50e1f5cedcbe42d7f6325535
> 9795007bfe5ffc0e 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_u16.c
> @@ -10,12 +10,11 @@ foo (uint16_t const * base)
>    return vld1q_u16 (base);
>  }
> 
> -/* { dg-final { scan-assembler "vldrh.u16"  }  } */
> -
>  uint16x8_t
>  foo1 (uint16_t const * base)
>  {
>    return vld1q (base);
>  }
> 
> -/* { dg-final { scan-assembler "vldrh.u16"  }  } */
> +/* { dg-final { scan-assembler-times "vldrh.16" 2 }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_u32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_u32.c
> index
> 01a1dd611ed68281e32e8719e11137a9b5626398..a13fe824382f825a32f865fc
> 5937712a2f278faf 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_u32.c
> @@ -10,12 +10,11 @@ foo (uint32_t const * base)
>    return vld1q_u32 (base);
>  }
> 
> -/* { dg-final { scan-assembler "vldrw.u32"  }  } */
> -
>  uint32x4_t
>  foo1 (uint32_t const * base)
>  {
>    return vld1q (base);
>  }
> 
> -/* { dg-final { scan-assembler "vldrw.u32"  }  } */
> +/* { dg-final { scan-assembler-times "vldrw.32" 2 }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_u8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_u8.c
> index
> 997bc1b212d228668b7a6f36a615168a52ac1af0..dfd1deb93f0f485fb2491a3b
> 21821c284c0da437 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_u8.c
> @@ -10,12 +10,11 @@ foo (uint8_t const * base)
>    return vld1q_u8 (base);
>  }
> 
> -/* { dg-final { scan-assembler "vldrb.u8"  }  } */
> -
>  uint8x16_t
>  foo1 (uint8_t const * base)
>  {
>    return vld1q (base);
>  }
> 
> -/* { dg-final { scan-assembler "vldrb.u8"  }  } */
> +/* { dg-final { scan-assembler-times "vldrb.8" 2 }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_f16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_f16.c
> index
> ea5593a9dd19d682089021902e7bf283bb54041f..3c32e408e420e2d393b5abc
> c96bd59e5d048ec34 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_f16.c
> @@ -10,12 +10,12 @@ foo (float16_t const * base, mve_pred16_t p)
>    return vld1q_z_f16 (base, p);
>  }
> 
> -/* { dg-final { scan-assembler "vldrht.f16"  }  } */
> -
>  float16x8_t
>  foo1 (float16_t const * base, mve_pred16_t p)
>  {
>    return vld1q_z (base, p);
>  }
> 
> -/* { dg-final { scan-assembler "vldrht.f16"  }  } */
> +/* { dg-final { scan-assembler-times "vpst" 2 }  } */
> +/* { dg-final { scan-assembler-times "vldrht.16" 2 }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_f32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_f32.c
> index
> 28937cd18aa9692d357cf553a71e81f78e184dc5..3fc935c889bea0fec7858e034
> 002b4a521afab65 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_f32.c
> @@ -10,12 +10,12 @@ foo (float32_t const * base, mve_pred16_t p)
>    return vld1q_z_f32 (base, p);
>  }
> 
> -/* { dg-final { scan-assembler "vldrwt.f32"  }  } */
> -
>  float32x4_t
>  foo1 (float32_t const * base, mve_pred16_t p)
>  {
>    return vld1q_z (base, p);
>  }
> 
> -/* { dg-final { scan-assembler "vldrwt.f32"  }  } */
> +/* { dg-final { scan-assembler-times "vpst" 2 }  } */
> +/* { dg-final { scan-assembler-times "vldrwt.32" 2 }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_s16.c
> index
> 81a1c439d6e034341a08ea28050f7bab35237808..49cc81092f359249c5178332
> c1ca6e18076eabdb 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_s16.c
> @@ -10,12 +10,12 @@ foo (int16_t const * base, mve_pred16_t p)
>    return vld1q_z_s16 (base, p);
>  }
> 
> -/* { dg-final { scan-assembler "vldrht.s16"  }  } */
> -
>  int16x8_t
>  foo1 (int16_t const * base, mve_pred16_t p)
>  {
>    return vld1q_z (base, p);
>  }
> 
> -/* { dg-final { scan-assembler "vldrht.s16"  }  } */
> +/* { dg-final { scan-assembler-times "vpst" 2 }  } */
> +/* { dg-final { scan-assembler-times "vldrht.16" 2 }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_s32.c
> index
> d03ab345f1920f03e6a609bb330b514eae81779c..ec317cd70e8f5cb2a5f83bbd
> caf90b18ae148615 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_s32.c
> @@ -10,12 +10,12 @@ foo (int32_t const * base, mve_pred16_t p)
>    return vld1q_z_s32 (base, p);
>  }
> 
> -/* { dg-final { scan-assembler "vldrwt.s32"  }  } */
> -
>  int32x4_t
>  foo1 (int32_t const * base, mve_pred16_t p)
>  {
>    return vld1q_z (base, p);
>  }
> 
> -/* { dg-final { scan-assembler "vldrwt.s32"  }  } */
> +/* { dg-final { scan-assembler-times "vpst" 2 }  } */
> +/* { dg-final { scan-assembler-times "vldrwt.32" 2 }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_s8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_s8.c
> index
> e535662c7d04437843b9c6aee516ba7a0ceaa214..538c140e78e8d858fe1b42d
> 73ca06ad774f3f4da 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_s8.c
> @@ -10,12 +10,12 @@ foo (int8_t const * base, mve_pred16_t p)
>    return vld1q_z_s8 (base, p);
>  }
> 
> -/* { dg-final { scan-assembler "vldrbt.s8"  }  } */
> -
>  int8x16_t
>  foo1 (int8_t const * base, mve_pred16_t p)
>  {
>    return vld1q_z (base, p);
>  }
> 
> -/* { dg-final { scan-assembler "vldrbt.s8"  }  } */
> +/* { dg-final { scan-assembler-times "vpst" 2 }  } */
> +/* { dg-final { scan-assembler-times "vldrbt.8" 2 }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_u16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_u16.c
> index
> 3f20f4ed9ca6a74fbbce1f471d0a49d89075f1ad..e5e588a187e9524a76d9d9b3
> a2c799338989d7f6 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_u16.c
> @@ -10,12 +10,12 @@ foo (uint16_t const * base, mve_pred16_t p)
>    return vld1q_z_u16 (base, p);
>  }
> 
> -/* { dg-final { scan-assembler "vldrht.u16"  }  } */
> -
>  uint16x8_t
>  foo1 (uint16_t const * base, mve_pred16_t p)
>  {
>    return vld1q_z (base, p);
>  }
> 
> -/* { dg-final { scan-assembler "vldrht.u16"  }  } */
> +/* { dg-final { scan-assembler-times "vpst" 2 }  } */
> +/* { dg-final { scan-assembler-times "vldrht.16" 2 }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_u32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_u32.c
> index
> 1d3b53e38e8bc3123b85f89570d8656988b9b278..999beefa7e8669dc15b70f4
> 841adfbc08d018622 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_u32.c
> @@ -10,12 +10,12 @@ foo (uint32_t const * base, mve_pred16_t p)
>    return vld1q_z_u32 (base, p);
>  }
> 
> -/* { dg-final { scan-assembler "vldrwt.u32"  }  } */
> -
>  uint32x4_t
>  foo1 (uint32_t const * base, mve_pred16_t p)
>  {
>    return vld1q_z (base, p);
>  }
> 
> -/* { dg-final { scan-assembler "vldrwt.u32"  }  } */
> +/* { dg-final { scan-assembler-times "vpst" 2 }  } */
> +/* { dg-final { scan-assembler-times "vldrwt.32" 2 }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_u8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_u8.c
> index
> 47d3f6fa4c70773f7a4c549dcf8a3b884cebab92..172053c71422f5daad1555932
> c9af84deee0c8d9 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_u8.c
> @@ -10,12 +10,12 @@ foo (uint8_t const * base, mve_pred16_t p)
>    return vld1q_z_u8 (base, p);
>  }
> 
> -/* { dg-final { scan-assembler "vldrbt.u8"  }  } */
> -
>  uint8x16_t
>  foo1 (uint8_t const * base, mve_pred16_t p)
>  {
>    return vld1q_z (base, p);
>  }
> 
> -/* { dg-final { scan-assembler "vldrbt.u8"  }  } */
> +/* { dg-final { scan-assembler-times "vpst" 2 }  } */
> +/* { dg-final { scan-assembler-times "vldrbt.8" 2 }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_s8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_s8.c
> index
> 886491f005284596099eeec4d90e67f82b5e967f..ec2f2176ccfebbef00447aed1
> 53069ce3be9491c 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_s8.c
> @@ -10,4 +10,5 @@ foo (int8_t const * base)
>    return vldrbq_s8 (base);
>  }
> 
> -/* { dg-final { scan-assembler "vldrb.s8"  }  } */
> +/* { dg-final { scan-assembler-times "vldrb.8" 1 }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_u8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_u8.c
> index
> e58120a2b64d6a9f0efaf80a5a47d9c80f5f4d34..d07b472a4ffe79d4615ae2a1e
> 15606a34b9ac765 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_u8.c
> @@ -10,4 +10,5 @@ foo (uint8_t const * base)
>    return vldrbq_u8 (base);
>  }
> 
> -/* { dg-final { scan-assembler "vldrb.u8"  }  } */
> +/* { dg-final { scan-assembler-times "vldrb.8" 1 }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_s8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_s8.c
> index
> 7d66c704516b5db6cbd3f36645e6a44031201bf1..aed3c9100638a2e86b91b3b
> f42040b4b728fd725 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_s8.c
> @@ -10,4 +10,6 @@ foo (int8_t const * base, mve_pred16_t p)
>    return vldrbq_z_s8 (base, p);
>  }
> 
> -/* { dg-final { scan-assembler "vldrbt.s8"  }  } */
> +/* { dg-final { scan-assembler-times "vpst" 1 }  } */
> +/* { dg-final { scan-assembler-times "vldrbt.8" 1 }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_u8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_u8.c
> index
> 05ae2628d5645ffa3bfae83867f95151c887ef75..54c61e744543d5bba03dfdd2
> 3e1ceb1e8f398a1a 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_u8.c
> @@ -10,4 +10,6 @@ foo (uint8_t const * base, mve_pred16_t p)
>    return vldrbq_z_u8 (base, p);
>  }
> 
> -/* { dg-final { scan-assembler "vldrbt.u8"  }  } */
> +/* { dg-final { scan-assembler-times "vpst" 1 }  } */
> +/* { dg-final { scan-assembler-times "vldrbt.8" 1 }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> diff --git
> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_s64.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_s64.c
> index
> 0d1ee769ec64b55c7559ce9dc14f8a6ae2e43e34..7420d0198e7450f566644a74
> bac925170b49d688 100644
> ---
> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_s64.c
> +++
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_s64.c
> @@ -10,6 +10,7 @@ foo (uint64x2_t * addr)
>    return vldrdq_gather_base_wb_s64 (addr, 8);
>  }
> 
> -/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
>  /* { dg-final { scan-assembler "vldrd.64\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-
> 9\]+\\\]!" } } */
> -/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler-times "vldr.64" 1 } } */
> +/* { dg-final { scan-assembler-times "vstr.64" 1 } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> diff --git
> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_u64.
> c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_u64.
> c
> index
> cb2a41bdcd32b553a93d3bcc4787d506f1b54f74..ebe5b2fd70c7e9c1ebe6e2ee
> db185975afea36b9 100644
> ---
> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_u64.
> c
> +++
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_u64.
> c
> @@ -10,6 +10,7 @@ foo (uint64x2_t * addr)
>    return vldrdq_gather_base_wb_u64 (addr, 8);
>  }
> 
> -/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
>  /* { dg-final { scan-assembler "vldrd.64\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-
> 9\]+\\\]!" } } */
> -/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler-times "vldr.64" 1 } } */
> +/* { dg-final { scan-assembler-times "vstr.64" 1 } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> diff --git
> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_s6
> 4.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_s6
> 4.c
> index
> 243fbeacc3429025202da2ff157ade38a472e123..231a24a1e5550b444c5476bf
> c0d1f6802a4952c8 100644
> ---
> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_s6
> 4.c
> +++
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_s6
> 4.c
> @@ -8,8 +8,8 @@ int64x2_t foo (uint64x2_t * addr, mve_pred16_t p)
>      return vldrdq_gather_base_wb_z_s64 (addr, 1016, p);
>  }
> 
> -/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
> -/* { dg-final { scan-assembler "vmsr\t P0, r\[0-9\]+.*$" } } */
>  /* { dg-final { scan-assembler "vpst" } } */
>  /* { dg-final { scan-assembler "vldrdt.u64\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-
> 9\]+\\\]!" } } */
> -/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler-times "vldr.64" 1 } } */
> +/* { dg-final { scan-assembler-times "vstr.64" 1 } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> diff --git
> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_u6
> 4.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_u6
> 4.c
> index
> 10ba42405fe8fde9d4f8993b20e41a59c7bb2e77..b8d9b5c139150536721b1a6
> 6636ce9b5a86bf093 100644
> ---
> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_u6
> 4.c
> +++
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_u6
> 4.c
> @@ -8,8 +8,8 @@ uint64x2_t foo (uint64x2_t * addr, mve_pred16_t p)
>      return vldrdq_gather_base_wb_z_u64 (addr, 8, p);
>  }
> 
> -/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
> -/* { dg-final { scan-assembler "vmsr\t P0, r\[0-9\]+.*" } } */
>  /* { dg-final { scan-assembler "vpst" } } */
>  /* { dg-final { scan-assembler "vldrdt.u64\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-
> 9\]+\\\]!" } } */
> -/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler-times "vldr.64" 1 } } */
> +/* { dg-final { scan-assembler-times "vstr.64" 1 } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_f16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_f16.c
> index
> b79c0e9bfe49e99557ebfb14bc03f8aaf40ab925..05bef418d822a7a994002f40
> 73b65178e42346a2 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_f16.c
> @@ -10,4 +10,5 @@ foo (float16_t const * base)
>    return vldrhq_f16 (base);
>  }
> 
> -/* { dg-final { scan-assembler "vldrh.f16"  }  } */
> +/* { dg-final { scan-assembler-times "vldrh.16" 1 }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_s16.c
> index
> 4872eb555f3e7b8b7ec86b8ba377f0a0452b1aa0..7c977b6a6995f11cd03380b
> a11585452bdddbe89 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_s16.c
> @@ -10,4 +10,5 @@ foo (int16_t const * base)
>    return vldrhq_s16 (base);
>  }
> 
> -/* { dg-final { scan-assembler "vldrh.s16"  }  } */
> +/* { dg-final { scan-assembler-times "vldrh.16" 1 }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_s32.c
> index
> e73e208c26a63fb7caaa68f4d5a52f1fa1c904fa..229b52163faa2566e17f13209
> 40a066313d2c853 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_s32.c
> @@ -10,4 +10,5 @@ foo (int16_t const * base)
>    return vldrhq_s32 (base);
>  }
> 
> -/* { dg-final { scan-assembler "vldrh.s32"  }  } */
> +/* { dg-final { scan-assembler-times "vldrh.s32" 1 }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_u16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_u16.c
> index
> 6b285d45aaa158c01f3e043d2e71214ed824f79a..07f6d9e3944a976886b35f3c
> 7a042046f3c7498a 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_u16.c
> @@ -10,4 +10,5 @@ foo (uint16_t const * base)
>    return vldrhq_u16 (base);
>  }
> 
> -/* { dg-final { scan-assembler "vldrh.u16"  }  } */
> +/* { dg-final { scan-assembler-times "vldrh.16" 1 }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_u32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_u32.c
> index
> 994cd4a20badd807c8be54aa45f788e8b9420fd9..cd24f01831f77d1da50ca624
> a7b6c800a9b616fd 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_u32.c
> @@ -10,4 +10,5 @@ foo (uint16_t const * base)
>    return vldrhq_u32 (base);
>  }
> 
> -/* { dg-final { scan-assembler "vldrh.u32"  }  } */
> +/* { dg-final { scan-assembler-times "vldrh.u32" 1 }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_f16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_f16.c
> index
> 2b866a99dd4c8a213887fe120cfe6dec35f84f87..dd0fc9c7b733114f6e229f781
> 55afe53cca675b7 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_f16.c
> @@ -10,4 +10,6 @@ foo (float16_t const * base, mve_pred16_t p)
>    return vldrhq_z_f16 (base, p);
>  }
> 
> -/* { dg-final { scan-assembler "vldrht.f16"  }  } */
> +/* { dg-final { scan-assembler-times "vpst" 1 }  } */
> +/* { dg-final { scan-assembler-times "vldrht.16" 1 }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_s16.c
> index
> 6c92c50ba12a502713ee7d2e7cf719edf848fe9c..36d3458d95c91d13631e511a
> c1294e544791336a 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_s16.c
> @@ -10,4 +10,6 @@ foo (int16_t const * base, mve_pred16_t p)
>    return vldrhq_z_s16 (base, p);
>  }
> 
> -/* { dg-final { scan-assembler "vldrht.s16"  }  } */
> +/* { dg-final { scan-assembler-times "vpst" 1 }  } */
> +/* { dg-final { scan-assembler-times "vldrht.16" 1 }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_s32.c
> index
> 4cd97ba5743ef1dcd8dc368ef75dc6df2391f69c..9c67b479be79c8377682a448
> 8d365cd853df7a2c 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_s32.c
> @@ -10,4 +10,6 @@ foo (int16_t const * base, mve_pred16_t p)
>    return vldrhq_z_s32 (base, p);
>  }
> 
> -/* { dg-final { scan-assembler "vldrht.s32"  }  } */
> +/* { dg-final { scan-assembler-times "vpst" 1 }  } */
> +/* { dg-final { scan-assembler-times "vldrht.s32" 1 }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_u16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_u16.c
> index
> 80ae0e5cd17fe158b152873d144e8b1217ad8e33..26354b5971aca3c9f003559
> d5261866a019a70ef 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_u16.c
> @@ -10,4 +10,6 @@ foo (uint16_t const * base, mve_pred16_t p)
>    return vldrhq_z_u16 (base, p);
>  }
> 
> -/* { dg-final { scan-assembler "vldrht.u16"  }  } */
> +/* { dg-final { scan-assembler-times "vpst" 1 }  } */
> +/* { dg-final { scan-assembler-times "vldrht.16" 1 }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_u32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_u32.c
> index
> 1a8590116eb017d9744d62b1dd7d07f539b390f0..948fe5ee5b46701ce6a7e80
> d4b6a3d54690d921f 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_u32.c
> @@ -10,4 +10,6 @@ foo (uint16_t const * base, mve_pred16_t p)
>    return vldrhq_z_u32 (base, p);
>  }
> 
> -/* { dg-final { scan-assembler "vldrht.u32"  }  } */
> +/* { dg-final { scan-assembler-times "vpst" 1 }  } */
> +/* { dg-final { scan-assembler-times "vldrht.u32" 1 }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_f32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_f32.c
> index
> 2c834ae53df93b97f7d7f9600fc4eba7d6c3400d..143079aa23fe8a45c381e33e2
> 0adbd4bb91a539c 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_f32.c
> @@ -10,4 +10,5 @@ foo (float32_t const * base)
>    return vldrwq_f32 (base);
>  }
> 
> -/* { dg-final { scan-assembler "vldrw.f32"  }  } */
> +/* { dg-final { scan-assembler-times "vldrw.32" 1 }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> diff --git
> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_f32.
> c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_f32.
> c
> index
> db8108e37325c4e1fafd2293d48eba0c33309073..8e2994f75d7d488e968dd9c
> d4847900d2438475a 100644
> ---
> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_f32.
> c
> +++
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_f32.
> c
> @@ -10,6 +10,7 @@ foo (uint32x4_t * addr)
>    return vldrwq_gather_base_wb_f32 (addr, 8);
>  }
> 
> -/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler "vldrw.32\tq\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
>  /* { dg-final { scan-assembler "vldrw.u32\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-
> 9\]+\\\]!" } } */
> -/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler "vstrw.32\tq\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> diff --git
> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_s32.
> c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_s32.
> c
> index
> 3da64e218e2c0789e996be551650033567eba4e5..e5054738b75ec7378a6a289
> e9c071721f9a6a4d0 100644
> ---
> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_s32.
> c
> +++
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_s32.
> c
> @@ -10,6 +10,7 @@ foo (uint32x4_t * addr)
>    return vldrwq_gather_base_wb_s32 (addr, 8);
>  }
> 
> -/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler "vldrw.32\tq\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
>  /* { dg-final { scan-assembler "vldrw.u32\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-
> 9\]+\\\]!" } } */
> -/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler "vstrw.32\tq\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> diff --git
> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_u32.
> c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_u32.
> c
> index
> 2597ee11608bfe21d697f2250bee7e69c0cc7aec..7f39414143bdfb3bcbc059dc
> dcba0472c0a63459 100644
> ---
> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_u32.
> c
> +++
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_u32.
> c
> @@ -10,6 +10,7 @@ foo (uint32x4_t * addr)
>    return vldrwq_gather_base_wb_u32 (addr, 8);
>  }
> 
> -/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler "vldrw.32\tq\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
>  /* { dg-final { scan-assembler "vldrw.u32\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-
> 9\]+\\\]!" } } */
> -/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler "vstrw.32\tq\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> diff --git
> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_f3
> 2.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_f3
> 2.c
> index
> 9fb47daf486fafdb897618453958e776a069d432..f3219e2e8254f542916b1fdc6
> d633e5512c08cfe 100644
> ---
> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_f3
> 2.c
> +++
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_f3
> 2.c
> @@ -10,8 +10,9 @@ foo (uint32x4_t * addr, mve_pred16_t p)
>    return vldrwq_gather_base_wb_z_f32 (addr, 8, p);
>  }
> 
> -/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler "vldrw.32\tq\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
>  /* { dg-final { scan-assembler "vmsr\t P0, r\[0-9\]+.*" } } */
>  /* { dg-final { scan-assembler "vpst" } } */
>  /* { dg-final { scan-assembler "vldrwt.u32\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-
> 9\]+\\\]!" } } */
> -/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler "vstrw.32\tq\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> diff --git
> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_s3
> 2.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_s3
> 2.c
> index
> 56da5a46c64d2946ceade8689105048e19efdc6a..4d093d243fe63e3f98cffaf15
> fcb41fa4611b41e 100644
> ---
> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_s3
> 2.c
> +++
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_s3
> 2.c
> @@ -10,8 +10,9 @@ foo (uint32x4_t * addr, mve_pred16_t p)
>    return vldrwq_gather_base_wb_z_s32 (addr, 8, p);
>  }
> 
> -/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler "vldrw.32\tq\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
>  /* { dg-final { scan-assembler "vmsr\t P0, r\[0-9\]+.*" } } */
>  /* { dg-final { scan-assembler "vpst" } } */
>  /* { dg-final { scan-assembler "vldrwt.u32\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-
> 9\]+\\\]!" } } */
> -/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler "vstrw.32\tq\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> diff --git
> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_u3
> 2.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_u
> 32.c
> index
> 63165d97c1a7b4120be036348a09b73afddd36d1..e796522a49c6c1929f2f64e
> e27e36eda9a1a95d3 100644
> ---
> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_u3
> 2.c
> +++
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_u
> 32.c
> @@ -10,8 +10,9 @@ foo (uint32x4_t * addr, mve_pred16_t p)
>    return vldrwq_gather_base_wb_z_u32 (addr, 8, p);
>  }
> 
> -/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler "vldrw.32\tq\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
>  /* { dg-final { scan-assembler "vmsr\t P0, r\[0-9\]+.*" } } */
>  /* { dg-final { scan-assembler "vpst" } } */
>  /* { dg-final { scan-assembler "vldrwt.u32\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-
> 9\]+\\\]!" } } */
> -/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler "vstrw.32\tq\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_s32.c
> index
> f48c29f8bff5f0b57802d1673c433b70311f8fc0..860dd324d256511a5802a0970
> 19f1a9a7cd52e9b 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_s32.c
> @@ -10,4 +10,5 @@ foo (int32_t const * base)
>    return vldrwq_s32 (base);
>  }
> 
> -/* { dg-final { scan-assembler "vldrw.s32"  }  } */
> +/* { dg-final { scan-assembler-times "vldrw.32" 1 }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_u32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_u32.c
> index
> 7c722200ecc5c642d6b8e3e0be69601a325b7f53..513ed49fb6eb7a88a51df58a
> 521ff0669af89ad1 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_u32.c
> @@ -10,4 +10,5 @@ foo (uint32_t const * base)
>    return vldrwq_u32 (base);
>  }
> 
> -/* { dg-final { scan-assembler "vldrw.u32"  }  } */
> +/* { dg-final { scan-assembler-times "vldrw.32" 1 }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_z_f32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_z_f32.c
> index
> bcdcecab46875864125c4232a75931faf0bcb54f..3e0a6a60bcf4374ec09f33600
> 1b04e6fda524913 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_z_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_z_f32.c
> @@ -10,4 +10,6 @@ foo (float32_t const * base, mve_pred16_t p)
>    return vldrwq_z_f32 (base, p);
>  }
> 
> -/* { dg-final { scan-assembler "vldrwt.f32"  }  } */
> +/* { dg-final { scan-assembler-times "vpst" 1 }  } */
> +/* { dg-final { scan-assembler-times "vldrwt.32" 1 }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_z_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_z_s32.c
> index
> fd32b30565627078561f6f04214a15a9a1643a68..82b914885b55d7b7d076500
> 726ac9e174f8c0ece 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_z_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_z_s32.c
> @@ -10,4 +10,6 @@ foo (int32_t const * base, mve_pred16_t p)
>    return vldrwq_z_s32 (base, p);
>  }
> 
> -/* { dg-final { scan-assembler "vldrwt.s32"  }  } */
> +/* { dg-final { scan-assembler-times "vpst" 1 }  } */
> +/* { dg-final { scan-assembler-times "vldrwt.32" 1 }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_z_u32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_z_u32.c
> index
> f49440438348582745eabd4589bef40ee07f8deb..6a66e1678815b7b4984ed01
> 1d108bc48ab44c963 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_z_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_z_u32.c
> @@ -10,4 +10,6 @@ foo (uint32_t const * base, mve_pred16_t p)
>    return vldrwq_z_u32 (base, p);
>  }
> 
> -/* { dg-final { scan-assembler "vldrwt.u32"  }  } */
> +/* { dg-final { scan-assembler-times "vpst" 1 }  } */
> +/* { dg-final { scan-assembler-times "vldrwt.32" 1 }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_float.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_float.c
> index
> 52bad05b6219621ada414dc74ab2deebdd1c93e3..739f282c476f2611245a20d
> fc0d121eba289a788 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_float.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_float.c
> @@ -1,6 +1,6 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
> -/* { dg-additional-options "-O0" } */
> +/* { dg-additional-options "-O2" } */
> 
>  #include "arm_mve.h"
> 
> @@ -14,4 +14,6 @@ foo ()
>    fb = vuninitializedq_f32 ();
>  }
> 
> -/* { dg-final { scan-assembler-times "vstrb.8" 4 } } */
> +/* { dg-final { scan-assembler-times "vstrh.16" 1 } } */
> +/* { dg-final { scan-assembler-times "vstrw.32" 1 } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> diff --git
> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_float1.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_float1.c
> index
> c6724a52074c6ce0361fdba66c4add831e8c13db..a9130607f26915af39e41d9f
> 1181131bcbd1ef32 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_float1.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_float1.c
> @@ -1,6 +1,6 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
> -/* { dg-additional-options "-O0" } */
> +/* { dg-additional-options "-O2" } */
> 
>  #include "arm_mve.h"
> 
> @@ -14,4 +14,6 @@ foo ()
>    fb = vuninitializedq (fbb);
>  }
> 
> -/* { dg-final { scan-assembler-times "vstrb.8" 6 } } */
> +/* { dg-final { scan-assembler-times "vstrh.16" 1 } } */
> +/* { dg-final { scan-assembler-times "vstrw.32" 1 } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_int.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_int.c
> index
> 13a0109a9b5380cd83f48154df231081ddb8f08e..bf6692fe57322ac9ed5c949a
> 9697d3ed7a565acc 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_int.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_int.c
> @@ -1,6 +1,6 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
> -/* { dg-additional-options "-O0" } */
> +/* { dg-additional-options "-O2" } */
> 
>  #include "arm_mve.h"
>  int8x16_t a;
> @@ -25,4 +25,8 @@ foo ()
>    ud = vuninitializedq_u64 ();
>  }
> 
> -/* { dg-final { scan-assembler-times "vstrb.8" 16 } } */
> +/* { dg-final { scan-assembler-times "vstrb.8" 2 } } */
> +/* { dg-final { scan-assembler-times "vstrh.16" 2 } } */
> +/* { dg-final { scan-assembler-times "vstrw.32" 2 } } */
> +/* { dg-final { scan-assembler-times "vstr.64" 2 } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_int1.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_int1.c
> index
> a321398709e65ee7daadfab9c6089116baccde83..4f66a07ac29030482a2643e1
> 0907d0dae24743af 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_int1.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_int1.c
> @@ -1,6 +1,6 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
> -/* { dg-additional-options "-O0" } */
> +/* { dg-additional-options "-O2" } */
> 
>  #include "arm_mve.h"
> 
> @@ -26,4 +26,8 @@ foo ()
>    ud = vuninitializedq (udd);
>  }
> 
> -/* { dg-final { scan-assembler-times "vstrb.8" 24 } } */
> +/* { dg-final { scan-assembler-times "vstrb.8" 2 } } */
> +/* { dg-final { scan-assembler-times "vstrh.16" 2 } } */
> +/* { dg-final { scan-assembler-times "vstrw.32" 2 } } */
> +/* { dg-final { scan-assembler-times "vstr.64" 2 } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
diff mbox series

Patch

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 9571b60f84f947851639de94501b8bccd0149727..33d162c3e00590ab96de56f20380f4ae4f200849 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -64,6 +64,8 @@  extern bool arm_q_bit_access (void);
 extern bool arm_ge_bits_access (void);
 
 #ifdef RTX_CODE
+enum reg_class
+arm_mode_base_reg_class (machine_mode);
 extern void arm_gen_unlikely_cbranch (enum rtx_code, machine_mode cc_mode,
 				      rtx label_ref);
 extern bool arm_vector_mode_supported_p (machine_mode);
@@ -114,6 +116,7 @@  extern bool arm_tls_referenced_p (rtx);
 
 extern int arm_coproc_mem_operand (rtx, bool);
 extern int neon_vector_mem_operand (rtx, int, bool);
+extern int mve_vector_mem_operand (machine_mode, rtx, bool);
 extern int neon_struct_mem_operand (rtx);
 
 extern rtx *neon_vcmla_lane_prepare_operands (rtx *);
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 0126f390abb2650e0b81cb59d55b1ce608490d4a..30e1d6dc994e18012fd2e5a1bbd7c69134ee100c 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -1292,11 +1292,13 @@  extern const char *fp_sysreg_names[NB_FP_SYSREGS];
 
 /* For the Thumb the high registers cannot be used as base registers
    when addressing quantities in QI or HI mode; if we don't know the
-   mode, then we must be conservative.  */
+   mode, then we must be conservative. For MVE we need to load from
+   memory to low regs based on given modes i.e [Rn], Rn <= LO_REGS.  */
 #define MODE_BASE_REG_CLASS(MODE)				\
-  (TARGET_32BIT ? CORE_REGS					\
+   (TARGET_HAVE_MVE ? arm_mode_base_reg_class (MODE)		\
+   :(TARGET_32BIT ? CORE_REGS					\
    : GET_MODE_SIZE (MODE) >= 4 ? BASE_REGS			\
-   : LO_REGS)
+   : LO_REGS))
 
 /* For Thumb we cannot support SP+reg addressing, so we return LO_REGS
    instead of BASE_REGS.  */
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index b169250918c13c6eabf55146a79081514d171571..01bc1b8ae9b72700ca5ae0840ee4496fd686b623 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -8443,6 +8443,10 @@  thumb2_legitimate_address_p (machine_mode mode, rtx x, int strict_p)
   bool use_ldrd;
   enum rtx_code code = GET_CODE (x);
 
+  if (TARGET_HAVE_MVE
+      && (mode == V8QImode || mode == E_V4QImode || mode == V4HImode))
+    return mve_vector_mem_operand (mode, x, strict_p);
+
   if (arm_address_register_rtx_p (x, strict_p))
     return 1;
 
@@ -13257,6 +13261,80 @@  arm_coproc_mem_operand (rtx op, bool wb)
   return FALSE;
 }
 
+/* This function returns TRUE on matching mode and op.
+1. For given modes, check for [Rn], return TRUE for Rn <= LO_REGS.
+2. For other modes, check for [Rn], return TRUE for Rn < R15 (expect R13).  */
+int
+mve_vector_mem_operand (machine_mode mode, rtx op, bool strict)
+{
+  enum rtx_code code;
+  HOST_WIDE_INT val;
+  int  reg_no;
+
+  /* Match: (mem (reg)).  */
+  if (REG_P (op))
+    {
+      int reg_no = REGNO (op);
+      return (((mode == E_V8QImode || mode == E_V4QImode || mode == E_V4HImode)
+	       ? reg_no <= LAST_LO_REGNUM
+	       :(reg_no < LAST_ARM_REGNUM && reg_no != SP_REGNUM))
+	      || (!strict && reg_no >= FIRST_PSEUDO_REGISTER));
+    }
+  code = GET_CODE (op);
+
+  if (code == POST_INC || code == PRE_DEC
+      || code == PRE_INC || code == POST_DEC)
+    {
+      reg_no = REGNO (XEXP (op, 0));
+      return (((mode == E_V8QImode || mode == E_V4QImode || mode == E_V4HImode)
+	       ? reg_no <= LAST_LO_REGNUM
+	       :(reg_no < LAST_ARM_REGNUM && reg_no != SP_REGNUM))
+	      || (!strict && reg_no >= FIRST_PSEUDO_REGISTER));
+    }
+  else if ((code == POST_MODIFY || code == PRE_MODIFY)
+	   && GET_CODE (XEXP (op, 1)) == PLUS && REG_P (XEXP (XEXP (op, 1), 1)))
+    {
+      reg_no = REGNO (XEXP (op, 0));
+      val = INTVAL (XEXP ( XEXP (op, 1), 1));
+      switch (mode)
+	{
+	  case E_V16QImode:
+	    if (abs_hwi (val))
+	      return ((reg_no < LAST_ARM_REGNUM && reg_no != SP_REGNUM)
+		      || (!strict && reg_no >= FIRST_PSEUDO_REGISTER));
+	  case E_V8HImode:
+	  case E_V8HFmode:
+	    if (abs (val) <= 255)
+	      return ((reg_no < LAST_ARM_REGNUM && reg_no != SP_REGNUM)
+		      || (!strict && reg_no >= FIRST_PSEUDO_REGISTER));
+	  case E_V8QImode:
+	  case E_V4QImode:
+	    if (abs_hwi (val))
+	      return (reg_no <= LAST_LO_REGNUM
+		      || (!strict && reg_no >= FIRST_PSEUDO_REGISTER));
+	  case E_V4HImode:
+	  case E_V4HFmode:
+	    if (val % 2 == 0 && abs (val) <= 254)
+	      return (reg_no <= LAST_LO_REGNUM
+		      || (!strict && reg_no >= FIRST_PSEUDO_REGISTER));
+	  case E_V4SImode:
+	  case E_V4SFmode:
+	    if (val % 4 == 0 && abs (val) <= 508)
+	      return ((reg_no < LAST_ARM_REGNUM && reg_no != SP_REGNUM)
+		      || (!strict && reg_no >= FIRST_PSEUDO_REGISTER));
+	  case E_V2DImode:
+	  case E_V2DFmode:
+	  case E_TImode:
+	    if (val % 4 == 0 && val >= 0 && val <= 1020)
+	      return ((reg_no < LAST_ARM_REGNUM && reg_no != SP_REGNUM)
+		      || (!strict && reg_no >= FIRST_PSEUDO_REGISTER));
+	  default:
+	    return FALSE;
+	}
+    }
+  return FALSE;
+}
+
 /* Return TRUE if OP is a memory operand which we can load or store a vector
    to/from. TYPE is one of the following values:
     0 - Vector load/stor (vldr)
@@ -13324,15 +13402,6 @@  neon_vector_mem_operand (rtx op, int type, bool strict)
       && (INTVAL (XEXP (ind, 1)) & 3) == 0)
     return TRUE;
 
-  if (type == 1 && TARGET_HAVE_MVE
-      && (GET_CODE (ind) == POST_INC || GET_CODE (ind) == PRE_DEC))
-    {
-      rtx ind1 = XEXP (ind, 0);
-      if (!REG_P (ind1))
-	return 0;
-      return VFP_REGNO_OK_FOR_SINGLE (REGNO (ind1));
-    }
-
   return FALSE;
 }
 
@@ -24019,7 +24088,7 @@  arm_print_operand (FILE *stream, rtx x, int code)
       }
       return;
 
-    /* To print the memory operand with "Us" constraint.  Based on the rtx_code
+    /* To print the memory operand with "Ux" constraint.  Based on the rtx_code
        the memory operands output looks like following.
        1. [Rn], #+/-<imm>
        2. [Rn, #+/-<imm>]!
@@ -33389,6 +33458,18 @@  arm_gen_far_branch (rtx * operands, int pos_label, const char * dest,
   return "";
 }
 
+/* If given mode matches, load from memory to LO_REGS.
+   (i.e [Rn], Rn <= LO_REGS).  */
+enum reg_class
+arm_mode_base_reg_class (machine_mode mode)
+{
+  if (TARGET_HAVE_MVE
+      && (mode == E_V8QImode || mode == E_V4QImode || mode == E_V4HImode))
+    return LO_REGS;
+
+  return MODE_BASE_REG_REG_CLASS (mode);
+}
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-arm.h"
diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
index fed6c7c84032dd8aba45142b59b980b4a6240d6d..011badc9957655a0fba67946c1db6fa6334b2bbb 100644
--- a/gcc/config/arm/constraints.md
+++ b/gcc/config/arm/constraints.md
@@ -39,7 +39,7 @@ 
 ;; in all states: Pf, Pg
 
 ;; The following memory constraints have been used:
-;; in ARM/Thumb-2 state: Uh, Ut, Uv, Uy, Un, Um, Us, Up, Uf
+;; in ARM/Thumb-2 state: Uh, Ut, Uv, Uy, Un, Um, Us, Up, Uf, Ux, Ul
 ;; in ARM state: Uq
 ;; in Thumb state: Uu, Uw
 ;; in all states: Q
@@ -47,6 +47,18 @@ 
 (define_register_constraint "Up" "TARGET_HAVE_MVE ? VPR_REG : NO_REGS"
   "MVE VPR register")
 
+(define_memory_constraint "Ul"
+ "@internal
+  In ARM/Thumb-2 state a valid address for load instruction with XEXP (op, 0)
+  being label of the literal data item to be loaded."
+ (and (match_code "mem")
+      (match_test "TARGET_HAVE_MVE && reload_completed
+		   && (GET_CODE (XEXP (op, 0)) == LABEL_REF
+		       || (GET_CODE (XEXP (op, 0)) == CONST
+			   && GET_CODE (XEXP (XEXP (op, 0), 0)) == PLUS
+			   && GET_CODE (XEXP (XEXP (XEXP (op, 0), 0), 0)) == LABEL_REF
+			   && CONST_INT_P (XEXP (XEXP (XEXP (op, 0), 0), 1))))")))
+
 (define_register_constraint "Uf" "TARGET_HAVE_MVE ? VFPCC_REG : NO_REGS"
   "MVE FPCCR register")
 
@@ -467,6 +479,15 @@ 
  (and (match_code "mem")
       (match_test "TARGET_32BIT && neon_vector_mem_operand (op, 1, true)")))
 
+(define_memory_constraint "Ux"
+ "@internal
+  In ARM/Thumb-2 state a valid address and load into CORE regs or only to
+  LO_REGS based on mode of op."
+ (and (match_code "mem")
+      (match_test "(TARGET_HAVE_MVE || TARGET_HAVE_MVE_FLOAT)
+		   && mve_vector_mem_operand (GET_MODE (op),
+					      XEXP (op, 0), true)")))
+
 (define_memory_constraint "Uq"
  "@internal
   In ARM state an address valid in ldrsb instructions."
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index f43dabbfd4f15b602f0627a9b0ea423064501e51..986fbfe2abae5f1e91e65f1ff5c84709c43c4617 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -666,8 +666,8 @@ 
 (define_int_iterator VSHLCQ_M [VSHLCQ_M_S VSHLCQ_M_U])
 
 (define_insn "*mve_mov<mode>"
-  [(set (match_operand:MVE_types 0 "nonimmediate_operand" "=w,w,r,w,w,r,w,Us")
-	(match_operand:MVE_types 1 "general_operand" "w,r,w,Dn,Usi,r,Dm,w"))]
+  [(set (match_operand:MVE_types 0 "nonimmediate_operand" "=w,w,r,w,w,r,w,Ux,w")
+	(match_operand:MVE_types 1 "general_operand" "w,r,w,Dn,Uxi,r,Dm,w,Ul"))]
   "TARGET_HAVE_MVE || TARGET_HAVE_MVE_FLOAT"
 {
   if (which_alternative == 3 || which_alternative == 6)
@@ -686,6 +686,50 @@ 
 	sprintf (templ, "vmov.i%d\t%%q0, %%x1  @ <mode>", width);
       return templ;
     }
+
+  if (which_alternative == 4 || which_alternative == 7)
+    {
+      rtx ops[2];
+      int regno = (which_alternative == 7)
+		  ? REGNO (operands[1]) : REGNO (operands[0]);
+
+      ops[0] = operands[0];
+      ops[1] = operands[1];
+      if (<MODE>mode == V2DFmode || <MODE>mode == V2DImode)
+	{
+	  if (which_alternative == 7)
+	    {
+	      ops[1] = gen_rtx_REG (DImode, regno);
+	      output_asm_insn ("vstr.64\t%P1, %E0",ops);
+	    }
+	  else
+	    {
+	      ops[0] = gen_rtx_REG (DImode, regno);
+	      output_asm_insn ("vldr.64\t%P0, %E1",ops);
+	    }
+	}
+      else if (<MODE>mode == TImode)
+	{
+	  if (which_alternative == 7)
+	    output_asm_insn ("vstr.64\t%q1, %E0",ops);
+	  else
+	    output_asm_insn ("vldr.64\t%q0, %E1",ops);
+	}
+      else
+	{
+	  if (which_alternative == 7)
+	    {
+	      ops[1] = gen_rtx_REG (TImode, regno);
+	      output_asm_insn ("vstr<V_sz_elem1>.<V_sz_elem>\t%q1, %E0",ops);
+	    }
+	  else
+	    {
+	      ops[0] = gen_rtx_REG (TImode, regno);
+	      output_asm_insn ("vldr<V_sz_elem1>.<V_sz_elem>\t%q0, %E1",ops);
+	    }
+	}
+      return "";
+    }
   switch (which_alternative)
     {
     case 0:
@@ -694,26 +738,19 @@ 
       return "vmov\t%e0, %Q1, %R1  @ <mode>\;vmov\t%f0, %J1, %K1";
     case 2:
       return "vmov\t%Q0, %R0, %e1  @ <mode>\;vmov\t%J0, %K0, %f1";
-    case 4:
-      if (MEM_P (operands[1])
-	  && (GET_CODE (XEXP (operands[1], 0)) == LABEL_REF
-	      || GET_CODE (XEXP (operands[1], 0)) == CONST))
-	return output_move_neon (operands);
-      else
-	return "vldrb.8 %q0, %E1";
     case 5:
       return output_move_quad (operands);
-    case 7:
-      return "vstrb.8 %q1, %E0";
+    case 8:
+	return output_move_neon (operands);
     default:
       gcc_unreachable ();
       return "";
     }
 }
-  [(set_attr "type" "mve_move,mve_move,mve_move,mve_move,mve_load,multiple,mve_move,mve_store")
-   (set_attr "length" "4,8,8,4,8,8,4,4")
-   (set_attr "thumb2_pool_range" "*,*,*,*,1018,*,*,*")
-   (set_attr "neg_pool_range" "*,*,*,*,996,*,*,*")])
+  [(set_attr "type" "mve_move,mve_move,mve_move,mve_move,mve_load,multiple,mve_move,mve_store,mve_load")
+   (set_attr "length" "4,8,8,4,8,8,4,4,4")
+   (set_attr "thumb2_pool_range" "*,*,*,*,1018,*,*,*,*")
+   (set_attr "neg_pool_range" "*,*,*,*,996,*,*,*,*")])
 
 (define_insn "*mve_mov<mode>"
   [(set (match_operand:MVE_types 0 "s_register_operand" "=w,w")
@@ -8047,7 +8084,7 @@ 
 ;; [vstrbq_s vstrbq_u]
 ;;
 (define_insn "mve_vstrbq_<supf><mode>"
-  [(set (match_operand:<MVE_B_ELEM> 0 "memory_operand" "=Us")
+  [(set (match_operand:<MVE_B_ELEM> 0 "mve_memory_operand" "=Ux")
 	(unspec:<MVE_B_ELEM> [(match_operand:MVE_2 1 "s_register_operand" "w")]
 	 VSTRBQ))
   ]
@@ -8133,7 +8170,7 @@ 
 ;;
 (define_insn "mve_vldrbq_<supf><mode>"
   [(set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:<MVE_B_ELEM> 1 "memory_operand" "Us")]
+	(unspec:MVE_2 [(match_operand:<MVE_B_ELEM> 1 "mve_memory_operand" "Ux")]
 	 VLDRBQ))
   ]
   "TARGET_HAVE_MVE"
@@ -8142,7 +8179,10 @@ 
    int regno = REGNO (operands[0]);
    ops[0] = gen_rtx_REG (TImode, regno);
    ops[1]  = operands[1];
-   output_asm_insn ("vldrb.<supf><V_sz_elem>\t%q0, %E1",ops);
+   if (<V_sz_elem> == 8)
+     output_asm_insn ("vldrb.<V_sz_elem>\t%q0, %E1",ops);
+   else
+     output_asm_insn ("vldrb.<supf><V_sz_elem>\t%q0, %E1",ops);
    return "";
 }
   [(set_attr "length" "4")])
@@ -8216,7 +8256,7 @@ 
 ;; [vstrbq_p_s vstrbq_p_u]
 ;;
 (define_insn "mve_vstrbq_p_<supf><mode>"
-  [(set (match_operand:<MVE_B_ELEM> 0 "memory_operand" "=Us")
+  [(set (match_operand:<MVE_B_ELEM> 0 "mve_memory_operand" "=Ux")
 	(unspec:<MVE_B_ELEM> [(match_operand:MVE_2 1 "s_register_operand" "w")
 			      (match_operand:HI 2 "vpr_register_operand" "Up")]
 	 VSTRBQ))
@@ -8227,7 +8267,7 @@ 
    int regno = REGNO (operands[1]);
    ops[1] = gen_rtx_REG (TImode, regno);
    ops[0]  = operands[0];
-   output_asm_insn ("vpst\n\tvstrbt.<V_sz_elem>\t%q1, %E0",ops);
+   output_asm_insn ("vpst\;vstrbt.<V_sz_elem>\t%q1, %E0",ops);
    return "";
 }
   [(set_attr "length" "8")])
@@ -8262,7 +8302,7 @@ 
 ;;
 (define_insn "mve_vldrbq_z_<supf><mode>"
   [(set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:<MVE_B_ELEM> 1 "memory_operand" "Us")
+	(unspec:MVE_2 [(match_operand:<MVE_B_ELEM> 1 "mve_memory_operand" "Ux")
 		       (match_operand:HI 2 "vpr_register_operand" "Up")]
 	 VLDRBQ))
   ]
@@ -8272,7 +8312,10 @@ 
    int regno = REGNO (operands[0]);
    ops[0] = gen_rtx_REG (TImode, regno);
    ops[1]  = operands[1];
-   output_asm_insn ("vpst\n\tvldrbt.<supf><V_sz_elem>\t%q0, %E1",ops);
+   if (<V_sz_elem> == 8)
+     output_asm_insn ("vpst\;vldrbt.<V_sz_elem>\t%q0, %E1",ops);
+   else
+     output_asm_insn ("vpst\;vldrbt.<supf><V_sz_elem>\t%q0, %E1",ops);
    return "";
 }
   [(set_attr "length" "8")])
@@ -8303,7 +8346,7 @@ 
 ;;
 (define_insn "mve_vldrhq_fv8hf"
   [(set (match_operand:V8HF 0 "s_register_operand" "=w")
-	(unspec:V8HF [(match_operand:V8HI 1 "memory_operand" "Us")]
+	(unspec:V8HF [(match_operand:V8HI 1 "mve_memory_operand" "Ux")]
 	 VLDRHQ_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -8312,7 +8355,7 @@ 
    int regno = REGNO (operands[0]);
    ops[0] = gen_rtx_REG (TImode, regno);
    ops[1]  = operands[1];
-   output_asm_insn ("vldrh.f16\t%q0, %E1",ops);
+   output_asm_insn ("vldrh.16\t%q0, %E1",ops);
    return "";
 }
   [(set_attr "length" "4")])
@@ -8414,12 +8457,11 @@ 
   [(set_attr "length" "8")])
 
 ;;
-;;
 ;; [vldrhq_s, vldrhq_u]
 ;;
 (define_insn "mve_vldrhq_<supf><mode>"
   [(set (match_operand:MVE_6 0 "s_register_operand" "=w")
-	(unspec:MVE_6 [(match_operand:<MVE_H_ELEM> 1 "memory_operand" "Us")]
+	(unspec:MVE_6 [(match_operand:<MVE_H_ELEM> 1 "mve_memory_operand" "Ux")]
 	 VLDRHQ))
   ]
   "TARGET_HAVE_MVE"
@@ -8428,7 +8470,10 @@ 
    int regno = REGNO (operands[0]);
    ops[0] = gen_rtx_REG (TImode, regno);
    ops[1]  = operands[1];
-   output_asm_insn ("vldrh.<supf><V_sz_elem>\t%q0, %E1",ops);
+   if (<V_sz_elem> == 16)
+     output_asm_insn ("vldrh.16\t%q0, %E1",ops);
+   else
+     output_asm_insn ("vldrh.<supf><V_sz_elem>\t%q0, %E1",ops);
    return "";
 }
   [(set_attr "length" "4")])
@@ -8438,7 +8483,7 @@ 
 ;;
 (define_insn "mve_vldrhq_z_fv8hf"
   [(set (match_operand:V8HF 0 "s_register_operand" "=w")
-	(unspec:V8HF [(match_operand:V8HI 1 "memory_operand" "Us")
+	(unspec:V8HF [(match_operand:V8HI 1 "mve_memory_operand" "Ux")
 	(match_operand:HI 2 "vpr_register_operand" "Up")]
 	 VLDRHQ_F))
   ]
@@ -8448,7 +8493,7 @@ 
    int regno = REGNO (operands[0]);
    ops[0] = gen_rtx_REG (TImode, regno);
    ops[1]  = operands[1];
-   output_asm_insn ("vpst\n\tvldrht.f16\t%q0, %E1",ops);
+   output_asm_insn ("vpst\;vldrht.16\t%q0, %E1",ops);
    return "";
 }
   [(set_attr "length" "8")])
@@ -8458,7 +8503,7 @@ 
 ;;
 (define_insn "mve_vldrhq_z_<supf><mode>"
   [(set (match_operand:MVE_6 0 "s_register_operand" "=w")
-	(unspec:MVE_6 [(match_operand:<MVE_H_ELEM> 1 "memory_operand" "Us")
+	(unspec:MVE_6 [(match_operand:<MVE_H_ELEM> 1 "mve_memory_operand" "Ux")
 	(match_operand:HI 2 "vpr_register_operand" "Up")]
 	 VLDRHQ))
   ]
@@ -8468,7 +8513,10 @@ 
    int regno = REGNO (operands[0]);
    ops[0] = gen_rtx_REG (TImode, regno);
    ops[1]  = operands[1];
-   output_asm_insn ("vpst\n\tvldrht.<supf><V_sz_elem>\t%q0, %E1",ops);
+   if (<V_sz_elem> == 16)
+     output_asm_insn ("vpst\;vldrht.16\t%q0, %E1",ops);
+   else
+     output_asm_insn ("vpst\;vldrht.<supf><V_sz_elem>\t%q0, %E1",ops);
    return "";
 }
   [(set_attr "length" "8")])
@@ -8478,7 +8526,7 @@ 
 ;;
 (define_insn "mve_vldrwq_fv4sf"
   [(set (match_operand:V4SF 0 "s_register_operand" "=w")
-	(unspec:V4SF [(match_operand:V4SI 1 "memory_operand" "Us")]
+	(unspec:V4SF [(match_operand:V4SI 1 "memory_operand" "Ux")]
 	 VLDRWQ_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -8487,7 +8535,7 @@ 
    int regno = REGNO (operands[0]);
    ops[0] = gen_rtx_REG (TImode, regno);
    ops[1]  = operands[1];
-   output_asm_insn ("vldrw.f32\t%q0, %E1",ops);
+   output_asm_insn ("vldrw.32\t%q0, %E1",ops);
    return "";
 }
   [(set_attr "length" "4")])
@@ -8497,7 +8545,7 @@ 
 ;;
 (define_insn "mve_vldrwq_<supf>v4si"
   [(set (match_operand:V4SI 0 "s_register_operand" "=w")
-	(unspec:V4SI [(match_operand:V4SI 1 "memory_operand" "Us")]
+	(unspec:V4SI [(match_operand:V4SI 1 "memory_operand" "Ux")]
 	 VLDRWQ))
   ]
   "TARGET_HAVE_MVE"
@@ -8506,7 +8554,7 @@ 
    int regno = REGNO (operands[0]);
    ops[0] = gen_rtx_REG (TImode, regno);
    ops[1]  = operands[1];
-   output_asm_insn ("vldrw.<supf>32\t%q0, %E1",ops);
+   output_asm_insn ("vldrw.32\t%q0, %E1",ops);
    return "";
 }
   [(set_attr "length" "4")])
@@ -8516,7 +8564,7 @@ 
 ;;
 (define_insn "mve_vldrwq_z_fv4sf"
   [(set (match_operand:V4SF 0 "s_register_operand" "=w")
-	(unspec:V4SF [(match_operand:V4SI 1 "memory_operand" "Us")
+	(unspec:V4SF [(match_operand:V4SI 1 "memory_operand" "Ux")
 	(match_operand:HI 2 "vpr_register_operand" "Up")]
 	 VLDRWQ_F))
   ]
@@ -8526,7 +8574,7 @@ 
    int regno = REGNO (operands[0]);
    ops[0] = gen_rtx_REG (TImode, regno);
    ops[1]  = operands[1];
-   output_asm_insn ("vpst\n\tvldrwt.f32\t%q0, %E1",ops);
+   output_asm_insn ("vpst\;vldrwt.32\t%q0, %E1",ops);
    return "";
 }
   [(set_attr "length" "8")])
@@ -8536,7 +8584,7 @@ 
 ;;
 (define_insn "mve_vldrwq_z_<supf>v4si"
   [(set (match_operand:V4SI 0 "s_register_operand" "=w")
-	(unspec:V4SI [(match_operand:V4SI 1 "memory_operand" "Us")
+	(unspec:V4SI [(match_operand:V4SI 1 "memory_operand" "Ux")
 	(match_operand:HI 2 "vpr_register_operand" "Up")]
 	 VLDRWQ))
   ]
@@ -8546,14 +8594,14 @@ 
    int regno = REGNO (operands[0]);
    ops[0] = gen_rtx_REG (TImode, regno);
    ops[1]  = operands[1];
-   output_asm_insn ("vpst\n\tvldrwt.<supf>32\t%q0, %E1",ops);
+   output_asm_insn ("vpst\;vldrwt.32\t%q0, %E1",ops);
    return "";
 }
   [(set_attr "length" "8")])
 
 (define_expand "mve_vld1q_f<mode>"
   [(match_operand:MVE_0 0 "s_register_operand")
-   (unspec:MVE_0 [(match_operand:<MVE_CNVT> 1 "memory_operand")] VLD1Q_F)
+   (unspec:MVE_0 [(match_operand:<MVE_CNVT> 1 "mve_memory_operand")] VLD1Q_F)
   ]
   "TARGET_HAVE_MVE || TARGET_HAVE_MVE_FLOAT"
 {
@@ -8563,7 +8611,7 @@ 
 
 (define_expand "mve_vld1q_<supf><mode>"
   [(match_operand:MVE_2 0 "s_register_operand")
-   (unspec:MVE_2 [(match_operand:MVE_2 1 "memory_operand")] VLD1Q)
+   (unspec:MVE_2 [(match_operand:MVE_2 1 "mve_memory_operand")] VLD1Q)
   ]
   "TARGET_HAVE_MVE"
 {
@@ -8991,7 +9039,7 @@ 
 ;; [vstrhq_f]
 ;;
 (define_insn "mve_vstrhq_fv8hf"
-  [(set (match_operand:V8HI 0 "memory_operand" "=Us")
+  [(set (match_operand:V8HI 0 "mve_memory_operand" "=Ux")
 	(unspec:V8HI [(match_operand:V8HF 1 "s_register_operand" "w")]
 	 VSTRHQ_F))
   ]
@@ -9010,7 +9058,7 @@ 
 ;; [vstrhq_p_f]
 ;;
 (define_insn "mve_vstrhq_p_fv8hf"
-  [(set (match_operand:V8HI 0 "memory_operand" "=Us")
+  [(set (match_operand:V8HI 0 "mve_memory_operand" "=Ux")
 	(unspec:V8HI [(match_operand:V8HF 1 "s_register_operand" "w")
 		      (match_operand:HI 2 "vpr_register_operand" "Up")]
 	 VSTRHQ_F))
@@ -9021,7 +9069,7 @@ 
    int regno = REGNO (operands[1]);
    ops[1] = gen_rtx_REG (TImode, regno);
    ops[0]  = operands[0];
-   output_asm_insn ("vpst\n\tvstrht.16\t%q1, %E0",ops);
+   output_asm_insn ("vpst\;vstrht.16\t%q1, %E0",ops);
    return "";
 }
   [(set_attr "length" "8")])
@@ -9030,7 +9078,7 @@ 
 ;; [vstrhq_p_s vstrhq_p_u]
 ;;
 (define_insn "mve_vstrhq_p_<supf><mode>"
-  [(set (match_operand:<MVE_H_ELEM> 0 "memory_operand" "=Us")
+  [(set (match_operand:<MVE_H_ELEM> 0 "mve_memory_operand" "=Ux")
 	(unspec:<MVE_H_ELEM> [(match_operand:MVE_6 1 "s_register_operand" "w")
 			      (match_operand:HI 2 "vpr_register_operand" "Up")]
 	 VSTRHQ))
@@ -9041,7 +9089,7 @@ 
    int regno = REGNO (operands[1]);
    ops[1] = gen_rtx_REG (TImode, regno);
    ops[0]  = operands[0];
-   output_asm_insn ("vpst\n\tvstrht.<V_sz_elem>\t%q1, %E0",ops);
+   output_asm_insn ("vpst\;vstrht.<V_sz_elem>\t%q1, %E0",ops);
    return "";
 }
   [(set_attr "length" "8")])
@@ -9093,7 +9141,7 @@ 
 ;; [vstrhq_scatter_shifted_offset_p_s vstrhq_scatter_shifted_offset_p_u]
 ;;
 (define_insn "mve_vstrhq_scatter_shifted_offset_p_<supf><mode>"
-  [(set (match_operand:<MVE_H_ELEM> 0 "memory_operand" "=Us")
+  [(set (match_operand:<MVE_H_ELEM> 0 "memory_operand" "=Ux")
 	(unspec:<MVE_H_ELEM>
 		[(match_operand:MVE_6 1 "s_register_operand" "w")
 		 (match_operand:MVE_6 2 "s_register_operand" "w")
@@ -9136,7 +9184,7 @@ 
 ;; [vstrhq_s, vstrhq_u]
 ;;
 (define_insn "mve_vstrhq_<supf><mode>"
-  [(set (match_operand:<MVE_H_ELEM> 0 "memory_operand" "=Us")
+  [(set (match_operand:<MVE_H_ELEM> 0 "mve_memory_operand" "=Ux")
 	(unspec:<MVE_H_ELEM> [(match_operand:MVE_6 1 "s_register_operand" "w")]
 	 VSTRHQ))
   ]
@@ -9155,7 +9203,7 @@ 
 ;; [vstrwq_f]
 ;;
 (define_insn "mve_vstrwq_fv4sf"
-  [(set (match_operand:V4SI 0 "memory_operand" "=Us")
+  [(set (match_operand:V4SI 0 "memory_operand" "=Ux")
 	(unspec:V4SI [(match_operand:V4SF 1 "s_register_operand" "w")]
 	 VSTRWQ_F))
   ]
@@ -9174,7 +9222,7 @@ 
 ;; [vstrwq_p_f]
 ;;
 (define_insn "mve_vstrwq_p_fv4sf"
-  [(set (match_operand:V4SI 0 "memory_operand" "=Us")
+  [(set (match_operand:V4SI 0 "memory_operand" "=Ux")
 	(unspec:V4SI [(match_operand:V4SF 1 "s_register_operand" "w")
 		      (match_operand:HI 2 "vpr_register_operand" "Up")]
 	 VSTRWQ_F))
@@ -9185,7 +9233,7 @@ 
    int regno = REGNO (operands[1]);
    ops[1] = gen_rtx_REG (TImode, regno);
    ops[0]  = operands[0];
-   output_asm_insn ("vpst\n\tvstrwt.32\t%q1, %E0",ops);
+   output_asm_insn ("vpst\;vstrwt.32\t%q1, %E0",ops);
    return "";
 }
   [(set_attr "length" "8")])
@@ -9194,7 +9242,7 @@ 
 ;; [vstrwq_p_s vstrwq_p_u]
 ;;
 (define_insn "mve_vstrwq_p_<supf>v4si"
-  [(set (match_operand:V4SI 0 "memory_operand" "=Us")
+  [(set (match_operand:V4SI 0 "memory_operand" "=Ux")
 	(unspec:V4SI [(match_operand:V4SI 1 "s_register_operand" "w")
 		      (match_operand:HI 2 "vpr_register_operand" "Up")]
 	 VSTRWQ))
@@ -9205,7 +9253,7 @@ 
    int regno = REGNO (operands[1]);
    ops[1] = gen_rtx_REG (TImode, regno);
    ops[0]  = operands[0];
-   output_asm_insn ("vpst\n\tvstrwt.32\t%q1, %E0",ops);
+   output_asm_insn ("vpst\;vstrwt.32\t%q1, %E0",ops);
    return "";
 }
   [(set_attr "length" "8")])
@@ -9214,7 +9262,7 @@ 
 ;; [vstrwq_s vstrwq_u]
 ;;
 (define_insn "mve_vstrwq_<supf>v4si"
-  [(set (match_operand:V4SI 0 "memory_operand" "=Us")
+  [(set (match_operand:V4SI 0 "memory_operand" "=Ux")
 	(unspec:V4SI [(match_operand:V4SI 1 "s_register_operand" "w")]
 	 VSTRWQ))
   ]
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index 009862e012c9ce3bbe446a89aacb750f47be66f0..c57ad73577e1eebebc8951ed5b4fb544dd3381f8 100644
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -31,6 +31,12 @@ 
 	      || REGNO_REG_CLASS (REGNO (op)) != NO_REGS));
 })
 
+(define_predicate "mve_memory_operand"
+  (and (match_code "mem")
+       (match_test "TARGET_32BIT
+		    && mve_vector_mem_operand (GET_MODE (op), XEXP (op, 0),
+					       false)")))
+
 ;; True for immediates in the range of 1 to 16 for MVE.
 (define_predicate "mve_imm_16"
   (match_test "satisfies_constraint_Rd (op)"))
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_float2.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_float2.c
index e3cf8f8207d603243eae22be9a90bbb1e8a73a58..35f83c6b298aaf2b8093713159b32de17ff96bd2 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_float2.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_float2.c
@@ -11,10 +11,6 @@  foo32 ()
   return b;
 }
 
-/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]"  }  } */
-/* { dg-final { scan-assembler "vstrb.*" }  } */
-/* { dg-final { scan-assembler "vldr.64*" }  } */
-
 float16x8_t
 foo16 ()
 {
@@ -22,6 +18,9 @@  foo16 ()
   return b;
 }
 
-/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]"  }  } */
-/* { dg-final { scan-assembler "vstrb.*" }  } */
-/* { dg-final { scan-assembler "vldr.64.*" }  } */
+/* { dg-final { scan-assembler-times "vmov\\tq\[0-7\], q\[0-7\]" 2 } } */
+/* { dg-final { scan-assembler-times "vstrw.32*" 1 } } */
+/* { dg-final { scan-assembler-times "vstrh.16*" 1 } } */
+/* { dg-final { scan-assembler-times "vldrw.32*" 1 } } */
+/* { dg-final { scan-assembler-times "vldrh.16*" 1 } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vldr.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vldr.c
new file mode 100644
index 0000000000000000000000000000000000000000..15656ed8c3c8c3ab95bbb5de59dafdab864b28db
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vldr.c
@@ -0,0 +1,61 @@ 
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-add-options arm_v8_1m_mve } */
+/* { dg-additional-options "-O2" } */
+
+#include "arm_mve.h"
+void
+foo (uint16_t row_x_col, int8_t *out)
+{
+  for (;;)
+   {
+     int32x4_t out_3;
+     int8_t *rhs_0;
+     int8_t *lhs_3;
+     int i_row_x_col;
+     for (;i_row_x_col < row_x_col; i_row_x_col++)
+      {
+	int32x4_t ker_0 = vldrbq_s32(rhs_0);
+	int32x4_t ip_3 = vldrbq_s32(lhs_3);
+	out_3 = vmulq_s32(ip_3, ker_0);
+      }
+     vstrbq_s32(out, out_3);
+   }
+}
+
+void
+foo1 (uint16_t row_x_col, int8_t *out)
+{
+  for (;;)
+   {
+     int16x8_t out_3;
+     int8_t *rhs_0;
+     int8_t *lhs_3;
+     int i_row_x_col;
+     for (; i_row_x_col < row_x_col; i_row_x_col++)
+      {
+	int16x8_t ker_0 = vldrbq_s16(rhs_0);
+	int16x8_t ip_3 = vldrbq_s16(lhs_3);
+	out_3 = vmulq_s16(ip_3, ker_0);
+      }
+     vstrbq_s16(out, out_3);
+   }
+}
+
+void
+foo2 (uint16_t row_x_col, int16_t *out)
+{
+  for (;;)
+   {
+     int32x4_t out_3;
+     int16_t *rhs_0;
+     int16_t *lhs_3;
+     int i_row_x_col;
+     for (; i_row_x_col < row_x_col; i_row_x_col++)
+      {
+	int32x4_t ker_0 = vldrhq_s32(rhs_0);
+	int32x4_t ip_3 = vldrhq_s32(lhs_3);
+	out_3 = vmulq_s32(ip_3, ker_0);
+      }
+     vstrhq_s32(out, out_3);
+   }
+}
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vldr_z.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vldr_z.c
new file mode 100644
index 0000000000000000000000000000000000000000..ae640837d14f41cc617ac56c57ca120be615ac31
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vldr_z.c
@@ -0,0 +1,73 @@ 
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-add-options arm_v8_1m_mve } */
+/* { dg-additional-options "-O2" } */
+
+#include "arm_mve.h"
+void
+foo (uint16_t row_len, const int32_t *bias, int8_t *out)
+{
+  int i_out_ch;
+  for (;;)
+   {
+     int8_t *ip_c3;
+     int32_t acc_3;
+     int32_t row_loop_cnt = row_len;
+     int32x4_t res = {acc_3};
+     uint32x4_t scatter_offset;
+     int i_row_loop;
+     for (; i_row_loop < row_loop_cnt; i_row_loop++)
+      {
+	mve_pred16_t p;
+	int16x8_t r0;
+	int16x8_t c3 = vldrbq_z_s16(ip_c3, p);
+	acc_3 = vmladavaq_p_s16(acc_3, r0, c3, p);
+      }
+     vstrbq_scatter_offset_s32(&out[i_out_ch], scatter_offset, res);
+   }
+}
+
+void
+foo1 (uint16_t row_len, const int32_t *bias, int8_t *out)
+{
+  int i_out_ch;
+  for (;;)
+   {
+     int8_t *ip_c3;
+     int32_t acc_3;
+     int32_t row_loop_cnt = row_len;
+     int i_row_loop;
+     int32x4_t res = {acc_3};
+     uint32x4_t scatter_offset;
+     for (; i_row_loop < row_loop_cnt; i_row_loop++)
+      {
+	mve_pred16_t p;
+	int32x4_t r0;
+	int32x4_t c3 = vldrbq_z_s32(ip_c3, p);
+	acc_3 = vmladavaq_p_s32(acc_3, r0, c3, p);
+      }
+     vstrbq_scatter_offset_s32(&out[i_out_ch], scatter_offset, res);
+   }
+}
+
+void
+foo2 (uint16_t row_len, const int32_t *bias, int8_t *out)
+{
+  int i_out_ch;
+  for (;;)
+   {
+     int16_t *ip_c3;
+     int32_t acc_3;
+     int32_t row_loop_cnt = row_len;
+     int i_row_loop;
+     int32x4_t res = {acc_3};
+     uint32x4_t scatter_offset;
+     for (; i_row_loop < row_loop_cnt; i_row_loop++)
+      {
+	mve_pred16_t p;
+	int32x4_t r0;
+	int32x4_t c3 = vldrhq_z_s32(ip_c3, p);
+	acc_3 = vmladavaq_p_s32(acc_3, r0, c3, p);
+      }
+     vstrbq_scatter_offset_s32(&out[i_out_ch], scatter_offset, res);
+   }
+}
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vstr.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vstr.c
new file mode 100644
index 0000000000000000000000000000000000000000..dd785f28bc02beae828a6486fdcf3a374829ac0d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vstr.c
@@ -0,0 +1,43 @@ 
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-add-options arm_v8_1m_mve } */
+/* { dg-additional-options "-O2" } */
+
+#include "arm_mve.h"
+void
+foo (const int32_t *output_bias, int8_t *out, uint16_t num_ch)
+{
+  int32_t loop_count = num_ch;
+  const int32_t *bias = output_bias;
+  int i_loop_cnt;
+  for (; i_loop_cnt < loop_count; out += 4, i_loop_cnt++)
+   {
+     int32x4_t out_0 = vldrwq_s32(bias);
+     vstrbq_s32(out, out_0);
+   }
+}
+
+void
+foo1 (const int16_t *output_bias, int8_t *out, uint16_t num_ch)
+{
+  int32_t loop_count = num_ch;
+  const int16_t *bias = output_bias;
+  int i_loop_cnt;
+  for (; i_loop_cnt < loop_count; out += 4, i_loop_cnt++)
+   {
+     int16x8_t out_0 = vldrhq_s16(bias);
+     vstrbq_s16(out, out_0);
+   }
+}
+
+void
+foo2 (const int32_t *output_bias, int16_t *out, uint16_t num_ch)
+{
+  int32_t loop_count = num_ch;
+  const int32_t *bias = output_bias;
+  int i_loop_cnt;
+  for (; i_loop_cnt < loop_count; out += 4, i_loop_cnt++)
+   {
+     int32x4_t out_0 = vldrwq_s32(bias);
+     vstrhq_s32(out, out_0);
+   }
+}
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vstr_p.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vstr_p.c
new file mode 100644
index 0000000000000000000000000000000000000000..8b222f1be0a95031189e792bf9afa22411fa867a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vstr_p.c
@@ -0,0 +1,42 @@ 
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-add-options arm_v8_1m_mve } */
+/* { dg-additional-options "-O2" } */
+
+#include "arm_mve.h"
+void
+foo1 (int8_t *x, int32_t * i1)
+{
+  mve_pred16_t p;
+  int32x4_t x_0;
+  int32_t * bias1 = i1;
+  for (;; x++)
+  {
+    x_0 = vldrwq_s32(bias1);
+    vstrbq_p_s32(x, x_0, p);
+  }
+}
+void
+foo2 (int8_t *x, int16_t * i1)
+{
+  mve_pred16_t p;
+  int16x8_t x_0;
+  int16_t * bias1 = i1;
+  for (;; x++)
+  {
+    x_0 = vldrhq_s16(bias1);
+    vstrbq_p_s16(x, x_0, p);
+  }
+}
+
+void
+foo3 (int16_t *x, int32_t * i1)
+{
+  mve_pred16_t p;
+  int32x4_t x_0;
+  int32_t * bias1 = i1;
+  for (;; x++)
+  {
+    x_0 = vldrwq_s32(bias1);
+    vstrhq_p_s32(x, x_0, p);
+  }
+}
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_f16.c
index 5e42f634412309411e4a6257cc3042a9ab280e06..699e40d0e3b503f6c02abaa3f4f976343081f108 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_f16.c
@@ -10,12 +10,11 @@  foo (float16_t const * base)
   return vld1q_f16 (base);
 }
 
-/* { dg-final { scan-assembler "vldrh.f16"  }  } */
-
 float16x8_t
 foo1 (float16_t const * base)
 {
   return vld1q (base);
 }
 
-/* { dg-final { scan-assembler "vldrh.f16"  }  } */
+/* { dg-final { scan-assembler-times "vldrh.16" 2 }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_f32.c
index 99d1a7a9c5e66b4ae99d5184756bb65b8bc5e852..865923033629c273b1a31f57c0589e0ab1e6fc24 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_f32.c
@@ -10,12 +10,11 @@  foo (float32_t const * base)
   return vld1q_f32 (base);
 }
 
-/* { dg-final { scan-assembler "vldrw.f32"  }  } */
-
 float32x4_t
 foo1 (float32_t const * base)
 {
   return vld1q (base);
 }
 
-/* { dg-final { scan-assembler "vldrw.f32"  }  } */
+/* { dg-final { scan-assembler-times "vldrw.32" 2 }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_s16.c
index d77f98ea8893959adfbc2688645d0d36dd826816..f4f04f534db63c5b77927d8e2ea967bb705012cc 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_s16.c
@@ -10,12 +10,11 @@  foo (int16_t const * base)
   return vld1q_s16 (base);
 }
 
-/* { dg-final { scan-assembler "vldrh.s16"  }  } */
-
 int16x8_t
 foo1 (int16_t const * base)
 {
   return vld1q (base);
 }
 
-/* { dg-final { scan-assembler "vldrh.s16"  }  } */
+/* { dg-final { scan-assembler-times "vldrh.16" 2 }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_s32.c
index 9a7f024f735f1715d6e577aaf08e217b52ad66e7..e0f661667515f3d3e94cd052b4bbdef9c33c06dc 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_s32.c
@@ -10,12 +10,11 @@  foo (int32_t const * base)
   return vld1q_s32 (base);
 }
 
-/* { dg-final { scan-assembler "vldrw.s32"  }  } */
-
 int32x4_t
 foo1 (int32_t const * base)
 {
   return vld1q (base);
 }
 
-/* { dg-final { scan-assembler "vldrw.s32"  }  } */
+/* { dg-final { scan-assembler-times "vldrw.32" 2 }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_s8.c
index 9c67bb60110081c1ed6c65f6986bbc08b0e2a691..1b7edead6b1a5489f2c668a69136b5fed463c703 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_s8.c
@@ -10,12 +10,11 @@  foo (int8_t const * base)
   return vld1q_s8 (base);
 }
 
-/* { dg-final { scan-assembler "vldrb.s8"  }  } */
-
 int8x16_t
 foo1 (int8_t const * base)
 {
   return vld1q (base);
 }
 
-/* { dg-final { scan-assembler "vldrb.s8"  }  } */
+/* { dg-final { scan-assembler-times "vldrb.8" 2 }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_u16.c
index 2bef21a5a1dcaf265c052ddb689df9b12d4419ae..50e1f5cedcbe42d7f63255359795007bfe5ffc0e 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_u16.c
@@ -10,12 +10,11 @@  foo (uint16_t const * base)
   return vld1q_u16 (base);
 }
 
-/* { dg-final { scan-assembler "vldrh.u16"  }  } */
-
 uint16x8_t
 foo1 (uint16_t const * base)
 {
   return vld1q (base);
 }
 
-/* { dg-final { scan-assembler "vldrh.u16"  }  } */
+/* { dg-final { scan-assembler-times "vldrh.16" 2 }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_u32.c
index 01a1dd611ed68281e32e8719e11137a9b5626398..a13fe824382f825a32f865fc5937712a2f278faf 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_u32.c
@@ -10,12 +10,11 @@  foo (uint32_t const * base)
   return vld1q_u32 (base);
 }
 
-/* { dg-final { scan-assembler "vldrw.u32"  }  } */
-
 uint32x4_t
 foo1 (uint32_t const * base)
 {
   return vld1q (base);
 }
 
-/* { dg-final { scan-assembler "vldrw.u32"  }  } */
+/* { dg-final { scan-assembler-times "vldrw.32" 2 }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_u8.c
index 997bc1b212d228668b7a6f36a615168a52ac1af0..dfd1deb93f0f485fb2491a3b21821c284c0da437 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_u8.c
@@ -10,12 +10,11 @@  foo (uint8_t const * base)
   return vld1q_u8 (base);
 }
 
-/* { dg-final { scan-assembler "vldrb.u8"  }  } */
-
 uint8x16_t
 foo1 (uint8_t const * base)
 {
   return vld1q (base);
 }
 
-/* { dg-final { scan-assembler "vldrb.u8"  }  } */
+/* { dg-final { scan-assembler-times "vldrb.8" 2 }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_f16.c
index ea5593a9dd19d682089021902e7bf283bb54041f..3c32e408e420e2d393b5abcc96bd59e5d048ec34 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_f16.c
@@ -10,12 +10,12 @@  foo (float16_t const * base, mve_pred16_t p)
   return vld1q_z_f16 (base, p);
 }
 
-/* { dg-final { scan-assembler "vldrht.f16"  }  } */
-
 float16x8_t
 foo1 (float16_t const * base, mve_pred16_t p)
 {
   return vld1q_z (base, p);
 }
 
-/* { dg-final { scan-assembler "vldrht.f16"  }  } */
+/* { dg-final { scan-assembler-times "vpst" 2 }  } */
+/* { dg-final { scan-assembler-times "vldrht.16" 2 }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_f32.c
index 28937cd18aa9692d357cf553a71e81f78e184dc5..3fc935c889bea0fec7858e034002b4a521afab65 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_f32.c
@@ -10,12 +10,12 @@  foo (float32_t const * base, mve_pred16_t p)
   return vld1q_z_f32 (base, p);
 }
 
-/* { dg-final { scan-assembler "vldrwt.f32"  }  } */
-
 float32x4_t
 foo1 (float32_t const * base, mve_pred16_t p)
 {
   return vld1q_z (base, p);
 }
 
-/* { dg-final { scan-assembler "vldrwt.f32"  }  } */
+/* { dg-final { scan-assembler-times "vpst" 2 }  } */
+/* { dg-final { scan-assembler-times "vldrwt.32" 2 }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_s16.c
index 81a1c439d6e034341a08ea28050f7bab35237808..49cc81092f359249c5178332c1ca6e18076eabdb 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_s16.c
@@ -10,12 +10,12 @@  foo (int16_t const * base, mve_pred16_t p)
   return vld1q_z_s16 (base, p);
 }
 
-/* { dg-final { scan-assembler "vldrht.s16"  }  } */
-
 int16x8_t
 foo1 (int16_t const * base, mve_pred16_t p)
 {
   return vld1q_z (base, p);
 }
 
-/* { dg-final { scan-assembler "vldrht.s16"  }  } */
+/* { dg-final { scan-assembler-times "vpst" 2 }  } */
+/* { dg-final { scan-assembler-times "vldrht.16" 2 }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_s32.c
index d03ab345f1920f03e6a609bb330b514eae81779c..ec317cd70e8f5cb2a5f83bbdcaf90b18ae148615 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_s32.c
@@ -10,12 +10,12 @@  foo (int32_t const * base, mve_pred16_t p)
   return vld1q_z_s32 (base, p);
 }
 
-/* { dg-final { scan-assembler "vldrwt.s32"  }  } */
-
 int32x4_t
 foo1 (int32_t const * base, mve_pred16_t p)
 {
   return vld1q_z (base, p);
 }
 
-/* { dg-final { scan-assembler "vldrwt.s32"  }  } */
+/* { dg-final { scan-assembler-times "vpst" 2 }  } */
+/* { dg-final { scan-assembler-times "vldrwt.32" 2 }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_s8.c
index e535662c7d04437843b9c6aee516ba7a0ceaa214..538c140e78e8d858fe1b42d73ca06ad774f3f4da 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_s8.c
@@ -10,12 +10,12 @@  foo (int8_t const * base, mve_pred16_t p)
   return vld1q_z_s8 (base, p);
 }
 
-/* { dg-final { scan-assembler "vldrbt.s8"  }  } */
-
 int8x16_t
 foo1 (int8_t const * base, mve_pred16_t p)
 {
   return vld1q_z (base, p);
 }
 
-/* { dg-final { scan-assembler "vldrbt.s8"  }  } */
+/* { dg-final { scan-assembler-times "vpst" 2 }  } */
+/* { dg-final { scan-assembler-times "vldrbt.8" 2 }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_u16.c
index 3f20f4ed9ca6a74fbbce1f471d0a49d89075f1ad..e5e588a187e9524a76d9d9b3a2c799338989d7f6 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_u16.c
@@ -10,12 +10,12 @@  foo (uint16_t const * base, mve_pred16_t p)
   return vld1q_z_u16 (base, p);
 }
 
-/* { dg-final { scan-assembler "vldrht.u16"  }  } */
-
 uint16x8_t
 foo1 (uint16_t const * base, mve_pred16_t p)
 {
   return vld1q_z (base, p);
 }
 
-/* { dg-final { scan-assembler "vldrht.u16"  }  } */
+/* { dg-final { scan-assembler-times "vpst" 2 }  } */
+/* { dg-final { scan-assembler-times "vldrht.16" 2 }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_u32.c
index 1d3b53e38e8bc3123b85f89570d8656988b9b278..999beefa7e8669dc15b70f4841adfbc08d018622 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_u32.c
@@ -10,12 +10,12 @@  foo (uint32_t const * base, mve_pred16_t p)
   return vld1q_z_u32 (base, p);
 }
 
-/* { dg-final { scan-assembler "vldrwt.u32"  }  } */
-
 uint32x4_t
 foo1 (uint32_t const * base, mve_pred16_t p)
 {
   return vld1q_z (base, p);
 }
 
-/* { dg-final { scan-assembler "vldrwt.u32"  }  } */
+/* { dg-final { scan-assembler-times "vpst" 2 }  } */
+/* { dg-final { scan-assembler-times "vldrwt.32" 2 }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_u8.c
index 47d3f6fa4c70773f7a4c549dcf8a3b884cebab92..172053c71422f5daad1555932c9af84deee0c8d9 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_u8.c
@@ -10,12 +10,12 @@  foo (uint8_t const * base, mve_pred16_t p)
   return vld1q_z_u8 (base, p);
 }
 
-/* { dg-final { scan-assembler "vldrbt.u8"  }  } */
-
 uint8x16_t
 foo1 (uint8_t const * base, mve_pred16_t p)
 {
   return vld1q_z (base, p);
 }
 
-/* { dg-final { scan-assembler "vldrbt.u8"  }  } */
+/* { dg-final { scan-assembler-times "vpst" 2 }  } */
+/* { dg-final { scan-assembler-times "vldrbt.8" 2 }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_s8.c
index 886491f005284596099eeec4d90e67f82b5e967f..ec2f2176ccfebbef00447aed153069ce3be9491c 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_s8.c
@@ -10,4 +10,5 @@  foo (int8_t const * base)
   return vldrbq_s8 (base);
 }
 
-/* { dg-final { scan-assembler "vldrb.s8"  }  } */
+/* { dg-final { scan-assembler-times "vldrb.8" 1 }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_u8.c
index e58120a2b64d6a9f0efaf80a5a47d9c80f5f4d34..d07b472a4ffe79d4615ae2a1e15606a34b9ac765 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_u8.c
@@ -10,4 +10,5 @@  foo (uint8_t const * base)
   return vldrbq_u8 (base);
 }
 
-/* { dg-final { scan-assembler "vldrb.u8"  }  } */
+/* { dg-final { scan-assembler-times "vldrb.8" 1 }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_s8.c
index 7d66c704516b5db6cbd3f36645e6a44031201bf1..aed3c9100638a2e86b91b3bf42040b4b728fd725 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_s8.c
@@ -10,4 +10,6 @@  foo (int8_t const * base, mve_pred16_t p)
   return vldrbq_z_s8 (base, p);
 }
 
-/* { dg-final { scan-assembler "vldrbt.s8"  }  } */
+/* { dg-final { scan-assembler-times "vpst" 1 }  } */
+/* { dg-final { scan-assembler-times "vldrbt.8" 1 }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_u8.c
index 05ae2628d5645ffa3bfae83867f95151c887ef75..54c61e744543d5bba03dfdd23e1ceb1e8f398a1a 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_u8.c
@@ -10,4 +10,6 @@  foo (uint8_t const * base, mve_pred16_t p)
   return vldrbq_z_u8 (base, p);
 }
 
-/* { dg-final { scan-assembler "vldrbt.u8"  }  } */
+/* { dg-final { scan-assembler-times "vpst" 1 }  } */
+/* { dg-final { scan-assembler-times "vldrbt.8" 1 }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_s64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_s64.c
index 0d1ee769ec64b55c7559ce9dc14f8a6ae2e43e34..7420d0198e7450f566644a74bac925170b49d688 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_s64.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_s64.c
@@ -10,6 +10,7 @@  foo (uint64x2_t * addr)
   return vldrdq_gather_base_wb_s64 (addr, 8);
 }
 
-/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
 /* { dg-final { scan-assembler "vldrd.64\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-9\]+\\\]!" } } */
-/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler-times "vldr.64" 1 } } */
+/* { dg-final { scan-assembler-times "vstr.64" 1 } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_u64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_u64.c
index cb2a41bdcd32b553a93d3bcc4787d506f1b54f74..ebe5b2fd70c7e9c1ebe6e2eedb185975afea36b9 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_u64.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_u64.c
@@ -10,6 +10,7 @@  foo (uint64x2_t * addr)
   return vldrdq_gather_base_wb_u64 (addr, 8);
 }
 
-/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
 /* { dg-final { scan-assembler "vldrd.64\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-9\]+\\\]!" } } */
-/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler-times "vldr.64" 1 } } */
+/* { dg-final { scan-assembler-times "vstr.64" 1 } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_s64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_s64.c
index 243fbeacc3429025202da2ff157ade38a472e123..231a24a1e5550b444c5476bfc0d1f6802a4952c8 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_s64.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_s64.c
@@ -8,8 +8,8 @@  int64x2_t foo (uint64x2_t * addr, mve_pred16_t p)
     return vldrdq_gather_base_wb_z_s64 (addr, 1016, p);
 }
 
-/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
-/* { dg-final { scan-assembler "vmsr\t P0, r\[0-9\]+.*$" } } */
 /* { dg-final { scan-assembler "vpst" } } */
 /* { dg-final { scan-assembler "vldrdt.u64\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-9\]+\\\]!" } } */
-/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler-times "vldr.64" 1 } } */
+/* { dg-final { scan-assembler-times "vstr.64" 1 } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_u64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_u64.c
index 10ba42405fe8fde9d4f8993b20e41a59c7bb2e77..b8d9b5c139150536721b1a66636ce9b5a86bf093 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_u64.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_u64.c
@@ -8,8 +8,8 @@  uint64x2_t foo (uint64x2_t * addr, mve_pred16_t p)
     return vldrdq_gather_base_wb_z_u64 (addr, 8, p);
 }
 
-/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
-/* { dg-final { scan-assembler "vmsr\t P0, r\[0-9\]+.*" } } */
 /* { dg-final { scan-assembler "vpst" } } */
 /* { dg-final { scan-assembler "vldrdt.u64\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-9\]+\\\]!" } } */
-/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler-times "vldr.64" 1 } } */
+/* { dg-final { scan-assembler-times "vstr.64" 1 } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_f16.c
index b79c0e9bfe49e99557ebfb14bc03f8aaf40ab925..05bef418d822a7a994002f4073b65178e42346a2 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_f16.c
@@ -10,4 +10,5 @@  foo (float16_t const * base)
   return vldrhq_f16 (base);
 }
 
-/* { dg-final { scan-assembler "vldrh.f16"  }  } */
+/* { dg-final { scan-assembler-times "vldrh.16" 1 }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_s16.c
index 4872eb555f3e7b8b7ec86b8ba377f0a0452b1aa0..7c977b6a6995f11cd03380ba11585452bdddbe89 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_s16.c
@@ -10,4 +10,5 @@  foo (int16_t const * base)
   return vldrhq_s16 (base);
 }
 
-/* { dg-final { scan-assembler "vldrh.s16"  }  } */
+/* { dg-final { scan-assembler-times "vldrh.16" 1 }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_s32.c
index e73e208c26a63fb7caaa68f4d5a52f1fa1c904fa..229b52163faa2566e17f1320940a066313d2c853 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_s32.c
@@ -10,4 +10,5 @@  foo (int16_t const * base)
   return vldrhq_s32 (base);
 }
 
-/* { dg-final { scan-assembler "vldrh.s32"  }  } */
+/* { dg-final { scan-assembler-times "vldrh.s32" 1 }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_u16.c
index 6b285d45aaa158c01f3e043d2e71214ed824f79a..07f6d9e3944a976886b35f3c7a042046f3c7498a 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_u16.c
@@ -10,4 +10,5 @@  foo (uint16_t const * base)
   return vldrhq_u16 (base);
 }
 
-/* { dg-final { scan-assembler "vldrh.u16"  }  } */
+/* { dg-final { scan-assembler-times "vldrh.16" 1 }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_u32.c
index 994cd4a20badd807c8be54aa45f788e8b9420fd9..cd24f01831f77d1da50ca624a7b6c800a9b616fd 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_u32.c
@@ -10,4 +10,5 @@  foo (uint16_t const * base)
   return vldrhq_u32 (base);
 }
 
-/* { dg-final { scan-assembler "vldrh.u32"  }  } */
+/* { dg-final { scan-assembler-times "vldrh.u32" 1 }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_f16.c
index 2b866a99dd4c8a213887fe120cfe6dec35f84f87..dd0fc9c7b733114f6e229f78155afe53cca675b7 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_f16.c
@@ -10,4 +10,6 @@  foo (float16_t const * base, mve_pred16_t p)
   return vldrhq_z_f16 (base, p);
 }
 
-/* { dg-final { scan-assembler "vldrht.f16"  }  } */
+/* { dg-final { scan-assembler-times "vpst" 1 }  } */
+/* { dg-final { scan-assembler-times "vldrht.16" 1 }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_s16.c
index 6c92c50ba12a502713ee7d2e7cf719edf848fe9c..36d3458d95c91d13631e511ac1294e544791336a 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_s16.c
@@ -10,4 +10,6 @@  foo (int16_t const * base, mve_pred16_t p)
   return vldrhq_z_s16 (base, p);
 }
 
-/* { dg-final { scan-assembler "vldrht.s16"  }  } */
+/* { dg-final { scan-assembler-times "vpst" 1 }  } */
+/* { dg-final { scan-assembler-times "vldrht.16" 1 }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_s32.c
index 4cd97ba5743ef1dcd8dc368ef75dc6df2391f69c..9c67b479be79c8377682a4488d365cd853df7a2c 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_s32.c
@@ -10,4 +10,6 @@  foo (int16_t const * base, mve_pred16_t p)
   return vldrhq_z_s32 (base, p);
 }
 
-/* { dg-final { scan-assembler "vldrht.s32"  }  } */
+/* { dg-final { scan-assembler-times "vpst" 1 }  } */
+/* { dg-final { scan-assembler-times "vldrht.s32" 1 }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_u16.c
index 80ae0e5cd17fe158b152873d144e8b1217ad8e33..26354b5971aca3c9f003559d5261866a019a70ef 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_u16.c
@@ -10,4 +10,6 @@  foo (uint16_t const * base, mve_pred16_t p)
   return vldrhq_z_u16 (base, p);
 }
 
-/* { dg-final { scan-assembler "vldrht.u16"  }  } */
+/* { dg-final { scan-assembler-times "vpst" 1 }  } */
+/* { dg-final { scan-assembler-times "vldrht.16" 1 }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_u32.c
index 1a8590116eb017d9744d62b1dd7d07f539b390f0..948fe5ee5b46701ce6a7e80d4b6a3d54690d921f 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_u32.c
@@ -10,4 +10,6 @@  foo (uint16_t const * base, mve_pred16_t p)
   return vldrhq_z_u32 (base, p);
 }
 
-/* { dg-final { scan-assembler "vldrht.u32"  }  } */
+/* { dg-final { scan-assembler-times "vpst" 1 }  } */
+/* { dg-final { scan-assembler-times "vldrht.u32" 1 }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_f32.c
index 2c834ae53df93b97f7d7f9600fc4eba7d6c3400d..143079aa23fe8a45c381e33e20adbd4bb91a539c 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_f32.c
@@ -10,4 +10,5 @@  foo (float32_t const * base)
   return vldrwq_f32 (base);
 }
 
-/* { dg-final { scan-assembler "vldrw.f32"  }  } */
+/* { dg-final { scan-assembler-times "vldrw.32" 1 }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_f32.c
index db8108e37325c4e1fafd2293d48eba0c33309073..8e2994f75d7d488e968dd9cd4847900d2438475a 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_f32.c
@@ -10,6 +10,7 @@  foo (uint32x4_t * addr)
   return vldrwq_gather_base_wb_f32 (addr, 8);
 }
 
-/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler "vldrw.32\tq\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
 /* { dg-final { scan-assembler "vldrw.u32\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-9\]+\\\]!" } } */
-/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler "vstrw.32\tq\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_s32.c
index 3da64e218e2c0789e996be551650033567eba4e5..e5054738b75ec7378a6a289e9c071721f9a6a4d0 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_s32.c
@@ -10,6 +10,7 @@  foo (uint32x4_t * addr)
   return vldrwq_gather_base_wb_s32 (addr, 8);
 }
 
-/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler "vldrw.32\tq\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
 /* { dg-final { scan-assembler "vldrw.u32\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-9\]+\\\]!" } } */
-/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler "vstrw.32\tq\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_u32.c
index 2597ee11608bfe21d697f2250bee7e69c0cc7aec..7f39414143bdfb3bcbc059dcdcba0472c0a63459 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_u32.c
@@ -10,6 +10,7 @@  foo (uint32x4_t * addr)
   return vldrwq_gather_base_wb_u32 (addr, 8);
 }
 
-/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler "vldrw.32\tq\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
 /* { dg-final { scan-assembler "vldrw.u32\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-9\]+\\\]!" } } */
-/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler "vstrw.32\tq\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_f32.c
index 9fb47daf486fafdb897618453958e776a069d432..f3219e2e8254f542916b1fdc6d633e5512c08cfe 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_f32.c
@@ -10,8 +10,9 @@  foo (uint32x4_t * addr, mve_pred16_t p)
   return vldrwq_gather_base_wb_z_f32 (addr, 8, p);
 }
 
-/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler "vldrw.32\tq\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
 /* { dg-final { scan-assembler "vmsr\t P0, r\[0-9\]+.*" } } */
 /* { dg-final { scan-assembler "vpst" } } */
 /* { dg-final { scan-assembler "vldrwt.u32\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-9\]+\\\]!" } } */
-/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler "vstrw.32\tq\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_s32.c
index 56da5a46c64d2946ceade8689105048e19efdc6a..4d093d243fe63e3f98cffaf15fcb41fa4611b41e 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_s32.c
@@ -10,8 +10,9 @@  foo (uint32x4_t * addr, mve_pred16_t p)
   return vldrwq_gather_base_wb_z_s32 (addr, 8, p);
 }
 
-/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler "vldrw.32\tq\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
 /* { dg-final { scan-assembler "vmsr\t P0, r\[0-9\]+.*" } } */
 /* { dg-final { scan-assembler "vpst" } } */
 /* { dg-final { scan-assembler "vldrwt.u32\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-9\]+\\\]!" } } */
-/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler "vstrw.32\tq\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_u32.c
index 63165d97c1a7b4120be036348a09b73afddd36d1..e796522a49c6c1929f2f64ee27e36eda9a1a95d3 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_u32.c
@@ -10,8 +10,9 @@  foo (uint32x4_t * addr, mve_pred16_t p)
   return vldrwq_gather_base_wb_z_u32 (addr, 8, p);
 }
 
-/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler "vldrw.32\tq\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
 /* { dg-final { scan-assembler "vmsr\t P0, r\[0-9\]+.*" } } */
 /* { dg-final { scan-assembler "vpst" } } */
 /* { dg-final { scan-assembler "vldrwt.u32\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-9\]+\\\]!" } } */
-/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler "vstrw.32\tq\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_s32.c
index f48c29f8bff5f0b57802d1673c433b70311f8fc0..860dd324d256511a5802a097019f1a9a7cd52e9b 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_s32.c
@@ -10,4 +10,5 @@  foo (int32_t const * base)
   return vldrwq_s32 (base);
 }
 
-/* { dg-final { scan-assembler "vldrw.s32"  }  } */
+/* { dg-final { scan-assembler-times "vldrw.32" 1 }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_u32.c
index 7c722200ecc5c642d6b8e3e0be69601a325b7f53..513ed49fb6eb7a88a51df58a521ff0669af89ad1 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_u32.c
@@ -10,4 +10,5 @@  foo (uint32_t const * base)
   return vldrwq_u32 (base);
 }
 
-/* { dg-final { scan-assembler "vldrw.u32"  }  } */
+/* { dg-final { scan-assembler-times "vldrw.32" 1 }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_z_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_z_f32.c
index bcdcecab46875864125c4232a75931faf0bcb54f..3e0a6a60bcf4374ec09f336001b04e6fda524913 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_z_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_z_f32.c
@@ -10,4 +10,6 @@  foo (float32_t const * base, mve_pred16_t p)
   return vldrwq_z_f32 (base, p);
 }
 
-/* { dg-final { scan-assembler "vldrwt.f32"  }  } */
+/* { dg-final { scan-assembler-times "vpst" 1 }  } */
+/* { dg-final { scan-assembler-times "vldrwt.32" 1 }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_z_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_z_s32.c
index fd32b30565627078561f6f04214a15a9a1643a68..82b914885b55d7b7d076500726ac9e174f8c0ece 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_z_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_z_s32.c
@@ -10,4 +10,6 @@  foo (int32_t const * base, mve_pred16_t p)
   return vldrwq_z_s32 (base, p);
 }
 
-/* { dg-final { scan-assembler "vldrwt.s32"  }  } */
+/* { dg-final { scan-assembler-times "vpst" 1 }  } */
+/* { dg-final { scan-assembler-times "vldrwt.32" 1 }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_z_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_z_u32.c
index f49440438348582745eabd4589bef40ee07f8deb..6a66e1678815b7b4984ed011d108bc48ab44c963 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_z_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_z_u32.c
@@ -10,4 +10,6 @@  foo (uint32_t const * base, mve_pred16_t p)
   return vldrwq_z_u32 (base, p);
 }
 
-/* { dg-final { scan-assembler "vldrwt.u32"  }  } */
+/* { dg-final { scan-assembler-times "vpst" 1 }  } */
+/* { dg-final { scan-assembler-times "vldrwt.32" 1 }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_float.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_float.c
index 52bad05b6219621ada414dc74ab2deebdd1c93e3..739f282c476f2611245a20dfc0d121eba289a788 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_float.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_float.c
@@ -1,6 +1,6 @@ 
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
-/* { dg-additional-options "-O0" } */
+/* { dg-additional-options "-O2" } */
 
 #include "arm_mve.h"
 
@@ -14,4 +14,6 @@  foo ()
   fb = vuninitializedq_f32 ();
 }
 
-/* { dg-final { scan-assembler-times "vstrb.8" 4 } } */
+/* { dg-final { scan-assembler-times "vstrh.16" 1 } } */
+/* { dg-final { scan-assembler-times "vstrw.32" 1 } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_float1.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_float1.c
index c6724a52074c6ce0361fdba66c4add831e8c13db..a9130607f26915af39e41d9f1181131bcbd1ef32 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_float1.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_float1.c
@@ -1,6 +1,6 @@ 
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
-/* { dg-additional-options "-O0" } */
+/* { dg-additional-options "-O2" } */
 
 #include "arm_mve.h"
 
@@ -14,4 +14,6 @@  foo ()
   fb = vuninitializedq (fbb);
 }
 
-/* { dg-final { scan-assembler-times "vstrb.8" 6 } } */
+/* { dg-final { scan-assembler-times "vstrh.16" 1 } } */
+/* { dg-final { scan-assembler-times "vstrw.32" 1 } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_int.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_int.c
index 13a0109a9b5380cd83f48154df231081ddb8f08e..bf6692fe57322ac9ed5c949a9697d3ed7a565acc 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_int.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_int.c
@@ -1,6 +1,6 @@ 
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
-/* { dg-additional-options "-O0" } */
+/* { dg-additional-options "-O2" } */
 
 #include "arm_mve.h"
 int8x16_t a;
@@ -25,4 +25,8 @@  foo ()
   ud = vuninitializedq_u64 ();
 }
 
-/* { dg-final { scan-assembler-times "vstrb.8" 16 } } */
+/* { dg-final { scan-assembler-times "vstrb.8" 2 } } */
+/* { dg-final { scan-assembler-times "vstrh.16" 2 } } */
+/* { dg-final { scan-assembler-times "vstrw.32" 2 } } */
+/* { dg-final { scan-assembler-times "vstr.64" 2 } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_int1.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_int1.c
index a321398709e65ee7daadfab9c6089116baccde83..4f66a07ac29030482a2643e10907d0dae24743af 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_int1.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_int1.c
@@ -1,6 +1,6 @@ 
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
-/* { dg-additional-options "-O0" } */
+/* { dg-additional-options "-O2" } */
 
 #include "arm_mve.h"
 
@@ -26,4 +26,8 @@  foo ()
   ud = vuninitializedq (udd);
 }
 
-/* { dg-final { scan-assembler-times "vstrb.8" 24 } } */
+/* { dg-final { scan-assembler-times "vstrb.8" 2 } } */
+/* { dg-final { scan-assembler-times "vstrh.16" 2 } } */
+/* { dg-final { scan-assembler-times "vstrw.32" 2 } } */
+/* { dg-final { scan-assembler-times "vstr.64" 2 } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */