diff mbox series

arm: Prevent ICE when doloop dec_set is not PLUS_EXPR

Message ID dbfb9ba1-a820-435c-8184-346722843718@arm.com
State New
Headers show
Series arm: Prevent ICE when doloop dec_set is not PLUS_EXPR | expand

Commit Message

Andre Vieira (lists) July 26, 2024, 2:05 p.m. UTC
This patch refactors and fixes an issue where 
arm_mve_dlstp_check_dec_counter
was making an assumption about the form of what a candidate for a dec_insn.

It also makes sure that if it does not initially encounter a 'set' in such a
form it tries to find another set that could be the right one.

gcc/ChangeLog:

	* config/arm/arm.cc (check_dec_insn): New helper function containing
	code hoisted from...
	(arm_mve_dlstp_check_dec_counter): ... here. Use check_dec_insn to
	check the validity of the candidate dec_insn.

gcc/testsuite/ChangeLog:

	* gcc.targer/arm/mve/dlstp-loop-form.c: New test.


Regression tested mve.exp for arm-none-eabi with -mcpu=cortex-m85.

OK for trunk?

Kind regards,
Andre Vieira

Comments

Christophe Lyon July 30, 2024, 8:31 p.m. UTC | #1
Hi Andre,

On 7/26/24 16:05, Andre Vieira (lists) wrote:
> 
> This patch refactors and fixes an issue where 
> arm_mve_dlstp_check_dec_counter
> was making an assumption about the form of what a candidate for a dec_insn.

I think this lacks some verb? (eg .... what a candidate for a dec_insn 
"is" or "should look like" or .... ?)

 > -  if (!NONDEBUG_INSN_P (dec_insn))
 > +  if (!check_dec_insn(dec_insn, condcount))
missing space before '('


 > -    return NULL;
 > +    if (!check_dec_insn(dec_insn, condcount))
same


> 
> It also makes sure that if it does not initially encounter a 'set' in 
> such a
> form it tries to find another set that could be the right one.
> 
> gcc/ChangeLog:
> 
>      * config/arm/arm.cc (check_dec_insn): New helper function containing
>      code hoisted from...
>      (arm_mve_dlstp_check_dec_counter): ... here. Use check_dec_insn to
>      check the validity of the candidate dec_insn.
> 
> gcc/testsuite/ChangeLog:
> 
>      * gcc.targer/arm/mve/dlstp-loop-form.c: New test.
> 
> 
> Regression tested mve.exp for arm-none-eabi with -mcpu=cortex-m85.

I manually tried to exercise the testcase with a cross-compiler, and 
found the same issue as the Linaro CI should have reported (there was a 
temporary breakage).

You can find detailed logs from Linaro in gcc.log.1.xz under 
https://ci.linaro.org/job/tcwg_gcc_check--master-arm-precommit/8357/artifact/artifacts/artifacts.precommit/00-sumfiles/

Basically the testcase fails to compile with loads of
dlstp-loop-form.c:6:9: warning: 'pure' attribute on function returning 
'void' [-Wattributes]
then

dlstp-loop-form.c:7:37: error: unknown type name 'float16x8_t'; did you 
mean 'int16x8_t'?
dlstp-loop-form.c: In function 'n':
dlstp-loop-form.c:18:8: error: subscripted value is neither array nor 
pointer nor vector
dlstp-loop-form.c:21:13: error: passing 'e' {aka 'int'} to argument 2 of 
'vfmsq_m', which expects an MVE vector type

Why would the test pass for you?

Thanks,

Christophe

> OK for trunk?
> 
> Kind regards,
> Andre Vieira
Andre Vieira (lists) July 31, 2024, 9:29 a.m. UTC | #2
Hi Christophe,

Thanks for the comments, attached new version for testcase, see below 
new cover letter:

This patch refactors and fixes an issue where 
arm_mve_dlstp_check_dec_counter
was making an assumption about the form of what a candidate for a dec_insn.
This dec_insn is the instruction that decreases the loop counter inside a
decrementing loop and we expect it to have the following form:
(set (reg CONDCOUNT)
      (plus (reg CONDCOUNT)
            (const_int)))

Where CONDCOUNT is the loop counter, and const int is the negative constant
used to decrement it.

This patch also improves our search for a valid dec_insn.  Before this patch
we'd only look for a dec_insn inside the loop header if the loop latch was
empty.  We now also search the loop header if the loop latch is not 
empty but
the last instruction is not a valid dec_insn.  This could potentially be 
improved
to search all instructions inside the loop latch.

gcc/ChangeLog:

     * config/arm/arm.cc (check_dec_insn): New helper function containing
     code hoisted from...
     (arm_mve_dlstp_check_dec_counter): ... here. Use check_dec_insn to
     check the validity of the candidate dec_insn.

gcc/testsuite/ChangeLog:

     * gcc.targer/arm/mve/dlstp-loop-form.c: New test.


On 30/07/2024 21:31, Christophe Lyon wrote:
> I manually tried to exercise the testcase with a cross-compiler, and 
> found the same issue as the Linaro CI should have reported (there was a 
> temporary breakage).
> 
> You can find detailed logs from Linaro in gcc.log.1.xz under 
> https://ci.linaro.org/job/tcwg_gcc_check--master-arm-precommit/8357/artifact/artifacts/artifacts.precommit/00-sumfiles/
> 
> Basically the testcase fails to compile with loads of
> dlstp-loop-form.c:6:9: warning: 'pure' attribute on function returning 
> 'void' [-Wattributes]
> then
> 
> dlstp-loop-form.c:7:37: error: unknown type name 'float16x8_t'; did you 
> mean 'int16x8_t'?
> dlstp-loop-form.c: In function 'n':
> dlstp-loop-form.c:18:8: error: subscripted value is neither array nor 
> pointer nor vector
> dlstp-loop-form.c:21:13: error: passing 'e' {aka 'int'} to argument 2 of 
> 'vfmsq_m', which expects an MVE vector type
> 
> Why would the test pass for you?

Because I tested with a toolchain configured for cortex-m85, which has 
mve.fp enabled by default, which means I didn't realize the testcase 
required arm_v8_1m_mve_fp_ok instead of arm_v8_1m_mve_ok.

Addressed that now.
diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 92cd168e65937ef7350477464e8b0becf85bceed..e3c3db5c816bfaedf3afb775a0436d4b7c984b51 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -35214,6 +35214,30 @@ arm_mve_dlstp_check_inc_counter (loop *loop, rtx_insn* vctp_insn,
   return vctp_insn;
 }
 
+/* Helper function to 'arm_mve_dlstp_check_dec_counter' to make sure DEC_INSN
+   is of the expected form:
+   (set (reg a) (plus (reg a) (const_int)))
+   where (reg a) is the same as CONDCOUNT.  */
+
+static bool
+check_dec_insn (rtx_insn *dec_insn, rtx condcount)
+{
+  if (!NONDEBUG_INSN_P (dec_insn))
+    return false;
+  rtx dec_set = single_set (dec_insn);
+  if (!dec_set
+      || !REG_P (SET_DEST (dec_set))
+      || GET_CODE (SET_SRC (dec_set)) != PLUS
+      || !REG_P (XEXP (SET_SRC (dec_set), 0))
+      || !CONST_INT_P (XEXP (SET_SRC (dec_set), 1))
+      || REGNO (SET_DEST (dec_set))
+	  != REGNO (XEXP (SET_SRC (dec_set), 0))
+      || REGNO (SET_DEST (dec_set)) != REGNO (condcount))
+    return false;
+
+  return true;
+}
+
 /* Helper function to `arm_mve_loop_valid_for_dlstp`.  In the case of a
    counter that is decrementing, ensure that it is decrementing by the
    right amount in each iteration and that the target condition is what
@@ -35232,30 +35256,19 @@ arm_mve_dlstp_check_dec_counter (loop *loop, rtx_insn* vctp_insn,
      modified.  */
   rtx_insn *dec_insn = BB_END (loop->latch);
   /* If not in the loop latch, try to find the decrement in the loop header.  */
-  if (!NONDEBUG_INSN_P (dec_insn))
+  if (!check_dec_insn (dec_insn, condcount))
   {
     df_ref temp = df_bb_regno_only_def_find (loop->header, REGNO (condcount));
     /* If we haven't been able to find the decrement, bail out.  */
     if (!temp)
       return NULL;
     dec_insn = DF_REF_INSN (temp);
-  }
 
-  rtx dec_set = single_set (dec_insn);
-
-  /* Next, ensure that it is a PLUS of the form:
-     (set (reg a) (plus (reg a) (const_int)))
-     where (reg a) is the same as condcount.  */
-  if (!dec_set
-      || !REG_P (SET_DEST (dec_set))
-      || !REG_P (XEXP (SET_SRC (dec_set), 0))
-      || !CONST_INT_P (XEXP (SET_SRC (dec_set), 1))
-      || REGNO (SET_DEST (dec_set))
-	  != REGNO (XEXP (SET_SRC (dec_set), 0))
-      || REGNO (SET_DEST (dec_set)) != REGNO (condcount))
-    return NULL;
+    if (!check_dec_insn (dec_insn, condcount))
+      return NULL;
+  }
 
-  decrementnum = INTVAL (XEXP (SET_SRC (dec_set), 1));
+  decrementnum = INTVAL (XEXP (SET_SRC (single_set (dec_insn)), 1));
 
   /* This decrementnum is the number of lanes/elements it decrements from the
      remaining number of lanes/elements to process in the loop, for this reason
diff --git a/gcc/testsuite/gcc.target/arm/mve/dlstp-loop-form.c b/gcc/testsuite/gcc.target/arm/mve/dlstp-loop-form.c
new file mode 100644
index 0000000000000000000000000000000000000000..a1b26873d7908035c726e3724c91b186c697bc60
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/dlstp-loop-form.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
+/* { dg-options "-Ofast" } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+#pragma GCC arm "arm_mve_types.h"
+#pragma GCC arm "arm_mve.h" false
+typedef __attribute__((aligned(2))) float16x8_t e;
+mve_pred16_t c(long d) { return __builtin_mve_vctp16qv8bi(d); }
+int f();
+void n() {
+  int g, h, *i, j;
+  mve_pred16_t k;
+  e acc;
+  e l;
+  e m;
+  for (;;) {
+    j = g;
+    acc[g];
+    for (; h < g; h += 8) {
+      k = c(j);
+      acc = vfmsq_m(acc, l, m, k);
+      j -= 8;
+    }
+    i[g] = f(acc);
+  }
+}
+
Christophe Lyon July 31, 2024, 2:15 p.m. UTC | #3
Hi Andre,

On 7/31/24 11:29, Andre Vieira (lists) wrote:
> Hi Christophe,
> 
> Thanks for the comments, attached new version for testcase, see below 
> new cover letter:

Thanks for the improved cover letter, it is indeed clearer.

> 
> This patch refactors and fixes an issue where 
> arm_mve_dlstp_check_dec_counter
> was making an assumption about the form of what a candidate for a dec_insn.
> This dec_insn is the instruction that decreases the loop counter inside a
> decrementing loop and we expect it to have the following form:
> (set (reg CONDCOUNT)
>       (plus (reg CONDCOUNT)
>             (const_int)))
> 
> Where CONDCOUNT is the loop counter, and const int is the negative constant
> used to decrement it.
> 
> This patch also improves our search for a valid dec_insn.  Before this 
> patch
> we'd only look for a dec_insn inside the loop header if the loop latch was
> empty.  We now also search the loop header if the loop latch is not 
> empty but
> the last instruction is not a valid dec_insn.  This could potentially be 
> improved
> to search all instructions inside the loop latch.
> 
> gcc/ChangeLog:
> 
>      * config/arm/arm.cc (check_dec_insn): New helper function containing
>      code hoisted from...
>      (arm_mve_dlstp_check_dec_counter): ... here. Use check_dec_insn to
>      check the validity of the candidate dec_insn.
> 
> gcc/testsuite/ChangeLog:
> 
>      * gcc.targer/arm/mve/dlstp-loop-form.c: New test.
> 
> 
> On 30/07/2024 21:31, Christophe Lyon wrote:
>> I manually tried to exercise the testcase with a cross-compiler, and 
>> found the same issue as the Linaro CI should have reported (there was 
>> a temporary breakage).
>>
>> You can find detailed logs from Linaro in gcc.log.1.xz under 
>> https://ci.linaro.org/job/tcwg_gcc_check--master-arm-precommit/8357/artifact/artifacts/artifacts.precommit/00-sumfiles/
>>
>> Basically the testcase fails to compile with loads of
>> dlstp-loop-form.c:6:9: warning: 'pure' attribute on function returning 
>> 'void' [-Wattributes]
>> then
>>
>> dlstp-loop-form.c:7:37: error: unknown type name 'float16x8_t'; did 
>> you mean 'int16x8_t'?
>> dlstp-loop-form.c: In function 'n':
>> dlstp-loop-form.c:18:8: error: subscripted value is neither array nor 
>> pointer nor vector
>> dlstp-loop-form.c:21:13: error: passing 'e' {aka 'int'} to argument 2 
>> of 'vfmsq_m', which expects an MVE vector type
>>
>> Why would the test pass for you?
> 
> Because I tested with a toolchain configured for cortex-m85, which has 
> mve.fp enabled by default, which means I didn't realize the testcase 
> required arm_v8_1m_mve_fp_ok instead of arm_v8_1m_mve_ok.
> 
> Addressed that now.

Thanks, I thought you meant you ran the testsuite with -mcpu=cortex-m85 
in RUNTESTFLAGS.

Regarding the patch, did you consider making the new check_dec_insn 
helper return an rtx (NULL or dec_set) instead of bool?
I think it would save a call to single_set when computing decrementnum, 
but that's nitpicking.

Thanks,

Christophe
Andre Vieira (lists) July 31, 2024, 4:06 p.m. UTC | #4
This patch refactors and fixes an issue where 
arm_mve_dlstp_check_dec_counter
     was making an assumption about the form of what a candidate for a 
dec_insn
     should be, which caused an ICE.
     This dec_insn is the instruction that decreases the loop counter 
inside a
     decrementing loop and we expect it to have the following form:
     (set (reg CONDCOUNT)
          (plus (reg CONDCOUNT)
                (const_int)))

     Where CONDCOUNT is the loop counter, and const int is the negative 
constant
     used to decrement it.

     This patch also improves our search for a valid dec_insn.  Before 
this patch
     we'd only look for a dec_insn inside the loop header if the loop 
latch was
     empty.  We now also search the loop header if the loop latch is not 
empty but
     the last instruction is not a valid dec_insn.  This could 
potentially be improved
     to search all instructions inside the loop latch.

     gcc/ChangeLog:

             * config/arm/arm.cc (check_dec_insn): New helper function 
containing
             code hoisted from...
             (arm_mve_dlstp_check_dec_counter): ... here. Use 
check_dec_insn to
             check the validity of the candidate dec_insn.

     gcc/testsuite/ChangeLog:

             * gcc.targer/arm/mve/dlstp-loop-form.c: New test.

On 31/07/2024 15:15, Christophe Lyon wrote:
>> Because I tested with a toolchain configured for cortex-m85, which has 
>> mve.fp enabled by default, which means I didn't realize the testcase 
>> required arm_v8_1m_mve_fp_ok instead of arm_v8_1m_mve_ok.
>>
>> Addressed that now.
> 
> Thanks, I thought you meant you ran the testsuite with -mcpu=cortex-m85 
> in RUNTESTFLAGS.

To be fair, that's not a terrible assumption. But what I did was I 
configured the toolchain (and single multilib) for I ran them in a build 
configured for armv8.1-m.main+mve.fp+fp.dp and fpu=auto (and 
float-abi=hard).

> 
> Regarding the patch, did you consider making the new check_dec_insn 
> helper return an rtx (NULL or dec_set) instead of bool?
> I think it would save a call to single_set when computing decrementnum, 
> but that's nitpicking.

Yeah I had also contemplated that, I'm OK either way, doesn't look too 
bad with the rtx return. See attached.

> 
> Thanks,
> 
> Christophe
diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 92cd168e65937ef7350477464e8b0becf85bceed..363a972170b37275372bb8bf30d510876021c8c0 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -35214,6 +35214,32 @@ arm_mve_dlstp_check_inc_counter (loop *loop, rtx_insn* vctp_insn,
   return vctp_insn;
 }
 
+/* Helper function to 'arm_mve_dlstp_check_dec_counter' to make sure DEC_INSN
+   is of the expected form:
+   (set (reg a) (plus (reg a) (const_int)))
+   where (reg a) is the same as CONDCOUNT.
+   Return a rtx with the set if it is in the right format or NULL_RTX
+   otherwise.  */
+
+static rtx
+check_dec_insn (rtx_insn *dec_insn, rtx condcount)
+{
+  if (!NONDEBUG_INSN_P (dec_insn))
+    return NULL_RTX;
+  rtx dec_set = single_set (dec_insn);
+  if (!dec_set
+      || !REG_P (SET_DEST (dec_set))
+      || GET_CODE (SET_SRC (dec_set)) != PLUS
+      || !REG_P (XEXP (SET_SRC (dec_set), 0))
+      || !CONST_INT_P (XEXP (SET_SRC (dec_set), 1))
+      || REGNO (SET_DEST (dec_set))
+	  != REGNO (XEXP (SET_SRC (dec_set), 0))
+      || REGNO (SET_DEST (dec_set)) != REGNO (condcount))
+    return NULL_RTX;
+
+  return dec_set;
+}
+
 /* Helper function to `arm_mve_loop_valid_for_dlstp`.  In the case of a
    counter that is decrementing, ensure that it is decrementing by the
    right amount in each iteration and that the target condition is what
@@ -35230,30 +35256,19 @@ arm_mve_dlstp_check_dec_counter (loop *loop, rtx_insn* vctp_insn,
      loop latch.  Here we simply need to verify that this counter is the same
      reg that is also used in the vctp_insn and that it is not otherwise
      modified.  */
-  rtx_insn *dec_insn = BB_END (loop->latch);
+  rtx dec_set = check_dec_insn (BB_END (loop->latch), condcount);
   /* If not in the loop latch, try to find the decrement in the loop header.  */
-  if (!NONDEBUG_INSN_P (dec_insn))
+  if (dec_set == NULL_RTX)
   {
     df_ref temp = df_bb_regno_only_def_find (loop->header, REGNO (condcount));
     /* If we haven't been able to find the decrement, bail out.  */
     if (!temp)
       return NULL;
-    dec_insn = DF_REF_INSN (temp);
-  }
-
-  rtx dec_set = single_set (dec_insn);
+    dec_set = check_dec_insn (DF_REF_INSN (temp), condcount);
 
-  /* Next, ensure that it is a PLUS of the form:
-     (set (reg a) (plus (reg a) (const_int)))
-     where (reg a) is the same as condcount.  */
-  if (!dec_set
-      || !REG_P (SET_DEST (dec_set))
-      || !REG_P (XEXP (SET_SRC (dec_set), 0))
-      || !CONST_INT_P (XEXP (SET_SRC (dec_set), 1))
-      || REGNO (SET_DEST (dec_set))
-	  != REGNO (XEXP (SET_SRC (dec_set), 0))
-      || REGNO (SET_DEST (dec_set)) != REGNO (condcount))
-    return NULL;
+    if (dec_set == NULL_RTX)
+      return NULL;
+  }
 
   decrementnum = INTVAL (XEXP (SET_SRC (dec_set), 1));
 
diff --git a/gcc/testsuite/gcc.target/arm/mve/dlstp-loop-form.c b/gcc/testsuite/gcc.target/arm/mve/dlstp-loop-form.c
new file mode 100644
index 0000000000000000000000000000000000000000..a1b26873d7908035c726e3724c91b186c697bc60
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/dlstp-loop-form.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
+/* { dg-options "-Ofast" } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+#pragma GCC arm "arm_mve_types.h"
+#pragma GCC arm "arm_mve.h" false
+typedef __attribute__((aligned(2))) float16x8_t e;
+mve_pred16_t c(long d) { return __builtin_mve_vctp16qv8bi(d); }
+int f();
+void n() {
+  int g, h, *i, j;
+  mve_pred16_t k;
+  e acc;
+  e l;
+  e m;
+  for (;;) {
+    j = g;
+    acc[g];
+    for (; h < g; h += 8) {
+      k = c(j);
+      acc = vfmsq_m(acc, l, m, k);
+      j -= 8;
+    }
+    i[g] = f(acc);
+  }
+}
+
Christophe Lyon July 31, 2024, 4:35 p.m. UTC | #5
On 7/31/24 18:06, Andre Vieira (lists) wrote:
> This patch refactors and fixes an issue where 
> arm_mve_dlstp_check_dec_counter
>      was making an assumption about the form of what a candidate for a 
> dec_insn
>      should be, which caused an ICE.
>      This dec_insn is the instruction that decreases the loop counter 
> inside a
>      decrementing loop and we expect it to have the following form:
>      (set (reg CONDCOUNT)
>           (plus (reg CONDCOUNT)
>                 (const_int)))
> 
>      Where CONDCOUNT is the loop counter, and const int is the negative 
> constant
>      used to decrement it.
> 
>      This patch also improves our search for a valid dec_insn.  Before 
> this patch
>      we'd only look for a dec_insn inside the loop header if the loop 
> latch was
>      empty.  We now also search the loop header if the loop latch is not 
> empty but
>      the last instruction is not a valid dec_insn.  This could 
> potentially be improved
>      to search all instructions inside the loop latch.
> 
>      gcc/ChangeLog:
> 
>              * config/arm/arm.cc (check_dec_insn): New helper function 
> containing
>              code hoisted from...
>              (arm_mve_dlstp_check_dec_counter): ... here. Use 
> check_dec_insn to
>              check the validity of the candidate dec_insn.
> 
>      gcc/testsuite/ChangeLog:
> 
>              * gcc.targer/arm/mve/dlstp-loop-form.c: New test.
> 
> On 31/07/2024 15:15, Christophe Lyon wrote:
>>> Because I tested with a toolchain configured for cortex-m85, which 
>>> has mve.fp enabled by default, which means I didn't realize the 
>>> testcase required arm_v8_1m_mve_fp_ok instead of arm_v8_1m_mve_ok.
>>>
>>> Addressed that now.
>>
>> Thanks, I thought you meant you ran the testsuite with 
>> -mcpu=cortex-m85 in RUNTESTFLAGS.
> 
> To be fair, that's not a terrible assumption. But what I did was I 
> configured the toolchain (and single multilib) for I ran them in a build 
> configured for armv8.1-m.main+mve.fp+fp.dp and fpu=auto (and 
> float-abi=hard).
> 
>>
>> Regarding the patch, did you consider making the new check_dec_insn 
>> helper return an rtx (NULL or dec_set) instead of bool?
>> I think it would save a call to single_set when computing 
>> decrementnum, but that's nitpicking.
> 
> Yeah I had also contemplated that, I'm OK either way, doesn't look too 
> bad with the rtx return. See attached.
> 

Thanks, LGTM.

Christophe

>>
>> Thanks,
>>
>> Christophe
diff mbox series

Patch

diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 92cd168e65937ef7350477464e8b0becf85bceed..e145a69ed4c6c5b925c1dc2bc25b98f2a1af5849 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -35214,6 +35214,30 @@  arm_mve_dlstp_check_inc_counter (loop *loop, rtx_insn* vctp_insn,
   return vctp_insn;
 }
 
+/* Helper function to 'arm_mve_dlstp_check_dec_counter' to make sure DEC_INSN
+   is of the expected form:
+   (set (reg a) (plus (reg a) (const_int)))
+   where (reg a) is the same as CONDCOUNT.  */
+
+static bool
+check_dec_insn (rtx_insn *dec_insn, rtx condcount)
+{
+  if (!NONDEBUG_INSN_P (dec_insn))
+    return false;
+  rtx dec_set = single_set (dec_insn);
+  if (!dec_set
+      || !REG_P (SET_DEST (dec_set))
+      || GET_CODE (SET_SRC (dec_set)) != PLUS
+      || !REG_P (XEXP (SET_SRC (dec_set), 0))
+      || !CONST_INT_P (XEXP (SET_SRC (dec_set), 1))
+      || REGNO (SET_DEST (dec_set))
+	  != REGNO (XEXP (SET_SRC (dec_set), 0))
+      || REGNO (SET_DEST (dec_set)) != REGNO (condcount))
+    return false;
+
+  return true;
+}
+
 /* Helper function to `arm_mve_loop_valid_for_dlstp`.  In the case of a
    counter that is decrementing, ensure that it is decrementing by the
    right amount in each iteration and that the target condition is what
@@ -35232,30 +35256,19 @@  arm_mve_dlstp_check_dec_counter (loop *loop, rtx_insn* vctp_insn,
      modified.  */
   rtx_insn *dec_insn = BB_END (loop->latch);
   /* If not in the loop latch, try to find the decrement in the loop header.  */
-  if (!NONDEBUG_INSN_P (dec_insn))
+  if (!check_dec_insn(dec_insn, condcount))
   {
     df_ref temp = df_bb_regno_only_def_find (loop->header, REGNO (condcount));
     /* If we haven't been able to find the decrement, bail out.  */
     if (!temp)
       return NULL;
     dec_insn = DF_REF_INSN (temp);
-  }
 
-  rtx dec_set = single_set (dec_insn);
-
-  /* Next, ensure that it is a PLUS of the form:
-     (set (reg a) (plus (reg a) (const_int)))
-     where (reg a) is the same as condcount.  */
-  if (!dec_set
-      || !REG_P (SET_DEST (dec_set))
-      || !REG_P (XEXP (SET_SRC (dec_set), 0))
-      || !CONST_INT_P (XEXP (SET_SRC (dec_set), 1))
-      || REGNO (SET_DEST (dec_set))
-	  != REGNO (XEXP (SET_SRC (dec_set), 0))
-      || REGNO (SET_DEST (dec_set)) != REGNO (condcount))
-    return NULL;
+    if (!check_dec_insn(dec_insn, condcount))
+      return NULL;
+  }
 
-  decrementnum = INTVAL (XEXP (SET_SRC (dec_set), 1));
+  decrementnum = INTVAL (XEXP (SET_SRC (single_set (dec_insn)), 1));
 
   /* This decrementnum is the number of lanes/elements it decrements from the
      remaining number of lanes/elements to process in the loop, for this reason
diff --git a/gcc/testsuite/gcc.target/arm/mve/dlstp-loop-form.c b/gcc/testsuite/gcc.target/arm/mve/dlstp-loop-form.c
new file mode 100644
index 0000000000000000000000000000000000000000..20c9dcd2984ba75701d853bcbba10caf12b5f2bc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/dlstp-loop-form.c
@@ -0,0 +1,27 @@ 
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-options "-Ofast" } */
+/* { dg-add-options arm_v8_1m_mve } */
+#pragma GCC arm "arm_mve_types.h"
+#pragma GCC arm "arm_mve.h" false
+typedef __attribute__((aligned(2))) float16x8_t e;
+mve_pred16_t c(long d) { return __builtin_mve_vctp16qv8bi(d); }
+int f();
+void n() {
+  int g, h, *i, j;
+  mve_pred16_t k;
+  e acc;
+  e l;
+  e m;
+  for (;;) {
+    j = g;
+    acc[g];
+    for (; h < g; h += 8) {
+      k = c(j);
+      acc = vfmsq_m(acc, l, m, k);
+      j -= 8;
+    }
+    i[g] = f(acc);
+  }
+}
+