diff mbox series

cprop: Do not set REG_EQUAL note when simplifying paradoxical subreg [PR110206]

Message ID CAFULd4YrNr8GovPOMAhP29hw0edQeF7eBSKFif3NRu4RYF69+A@mail.gmail.com
State New
Headers show
Series cprop: Do not set REG_EQUAL note when simplifying paradoxical subreg [PR110206] | expand

Commit Message

Uros Bizjak July 14, 2023, 6:22 a.m. UTC
cprop1 pass does not consider paradoxical subreg and for (insn 22) claims
that it equals 8 elements of HImodeby setting REG_EQUAL note:

(insn 21 19 22 4 (set (reg:V4QI 98)
        (mem/u/c:V4QI (symbol_ref/u:DI ("*.LC1") [flags 0x2]) [0  S4
A32])) "pr110206.c":12:42 1530 {*movv4qi_internal}
     (expr_list:REG_EQUAL (const_vector:V4QI [
                (const_int -52 [0xffffffffffffffcc]) repeated x4
            ])
        (nil)))
(insn 22 21 23 4 (set (reg:V8HI 100)
        (zero_extend:V8HI (vec_select:V8QI (subreg:V16QI (reg:V4QI 98) 0)
                (parallel [
                        (const_int 0 [0])
                        (const_int 1 [0x1])
                        (const_int 2 [0x2])
                        (const_int 3 [0x3])
                        (const_int 4 [0x4])
                        (const_int 5 [0x5])
                        (const_int 6 [0x6])
                        (const_int 7 [0x7])
                    ])))) "pr110206.c":12:42 7471 {sse4_1_zero_extendv8qiv8hi2}
     (expr_list:REG_EQUAL (const_vector:V8HI [
                (const_int 204 [0xcc]) repeated x8
            ])
        (expr_list:REG_DEAD (reg:V4QI 98)
            (nil))))

We rely on the "undefined" vals to have a specific value (from the earlier
REG_EQUAL note) but actual code generation doesn't ensure this (it doesn't
need to).  That said, the issue isn't the constant folding per-se but that
we do not actually constant fold but register an equality that doesn't hold.

    PR target/110206

gcc/ChangeLog:

    * fwprop.cc (contains_paradoxical_subreg_p): Move to ...
    * rtlanal.cc (contains_paradoxical_subreg_p): ... here.
    * rtlanal.h (contains_paradoxical_subreg_p): Add prototype.
    * cprop.cc (try_replace_reg): Do not set REG_EQUAL note
    when the original source contains a paradoxical subreg.

gcc/testsuite/ChangeLog:

    * gcc.dg/torture/pr110206.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

OK for mainline and backports?

Uros.

Comments

Richard Biener July 14, 2023, 8:31 a.m. UTC | #1
On Fri, 14 Jul 2023, Uros Bizjak wrote:

> cprop1 pass does not consider paradoxical subreg and for (insn 22) claims
> that it equals 8 elements of HImodeby setting REG_EQUAL note:
> 
> (insn 21 19 22 4 (set (reg:V4QI 98)
>         (mem/u/c:V4QI (symbol_ref/u:DI ("*.LC1") [flags 0x2]) [0  S4
> A32])) "pr110206.c":12:42 1530 {*movv4qi_internal}
>      (expr_list:REG_EQUAL (const_vector:V4QI [
>                 (const_int -52 [0xffffffffffffffcc]) repeated x4
>             ])
>         (nil)))
> (insn 22 21 23 4 (set (reg:V8HI 100)
>         (zero_extend:V8HI (vec_select:V8QI (subreg:V16QI (reg:V4QI 98) 0)
>                 (parallel [
>                         (const_int 0 [0])
>                         (const_int 1 [0x1])
>                         (const_int 2 [0x2])
>                         (const_int 3 [0x3])
>                         (const_int 4 [0x4])
>                         (const_int 5 [0x5])
>                         (const_int 6 [0x6])
>                         (const_int 7 [0x7])
>                     ])))) "pr110206.c":12:42 7471 {sse4_1_zero_extendv8qiv8hi2}
>      (expr_list:REG_EQUAL (const_vector:V8HI [
>                 (const_int 204 [0xcc]) repeated x8
>             ])
>         (expr_list:REG_DEAD (reg:V4QI 98)
>             (nil))))
> 
> We rely on the "undefined" vals to have a specific value (from the earlier
> REG_EQUAL note) but actual code generation doesn't ensure this (it doesn't
> need to).  That said, the issue isn't the constant folding per-se but that
> we do not actually constant fold but register an equality that doesn't hold.
> 
>     PR target/110206
> 
> gcc/ChangeLog:
> 
>     * fwprop.cc (contains_paradoxical_subreg_p): Move to ...
>     * rtlanal.cc (contains_paradoxical_subreg_p): ... here.
>     * rtlanal.h (contains_paradoxical_subreg_p): Add prototype.
>     * cprop.cc (try_replace_reg): Do not set REG_EQUAL note
>     when the original source contains a paradoxical subreg.
> 
> gcc/testsuite/ChangeLog:
> 
>     * gcc.dg/torture/pr110206.c: New test.
> 
> Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
> 
> OK for mainline and backports?

OK.

I think the testcase can also run on other targets if you add
dg-additional-options "-w -Wno-psabi", all generic vector ops
should be lowered if not supported.

Richard.
Uros Bizjak July 14, 2023, 8:43 a.m. UTC | #2
On Fri, Jul 14, 2023 at 10:31 AM Richard Biener <rguenther@suse.de> wrote:
>
> On Fri, 14 Jul 2023, Uros Bizjak wrote:
>
> > cprop1 pass does not consider paradoxical subreg and for (insn 22) claims
> > that it equals 8 elements of HImodeby setting REG_EQUAL note:
> >
> > (insn 21 19 22 4 (set (reg:V4QI 98)
> >         (mem/u/c:V4QI (symbol_ref/u:DI ("*.LC1") [flags 0x2]) [0  S4
> > A32])) "pr110206.c":12:42 1530 {*movv4qi_internal}
> >      (expr_list:REG_EQUAL (const_vector:V4QI [
> >                 (const_int -52 [0xffffffffffffffcc]) repeated x4
> >             ])
> >         (nil)))
> > (insn 22 21 23 4 (set (reg:V8HI 100)
> >         (zero_extend:V8HI (vec_select:V8QI (subreg:V16QI (reg:V4QI 98) 0)
> >                 (parallel [
> >                         (const_int 0 [0])
> >                         (const_int 1 [0x1])
> >                         (const_int 2 [0x2])
> >                         (const_int 3 [0x3])
> >                         (const_int 4 [0x4])
> >                         (const_int 5 [0x5])
> >                         (const_int 6 [0x6])
> >                         (const_int 7 [0x7])
> >                     ])))) "pr110206.c":12:42 7471 {sse4_1_zero_extendv8qiv8hi2}
> >      (expr_list:REG_EQUAL (const_vector:V8HI [
> >                 (const_int 204 [0xcc]) repeated x8
> >             ])
> >         (expr_list:REG_DEAD (reg:V4QI 98)
> >             (nil))))
> >
> > We rely on the "undefined" vals to have a specific value (from the earlier
> > REG_EQUAL note) but actual code generation doesn't ensure this (it doesn't
> > need to).  That said, the issue isn't the constant folding per-se but that
> > we do not actually constant fold but register an equality that doesn't hold.
> >
> >     PR target/110206
> >
> > gcc/ChangeLog:
> >
> >     * fwprop.cc (contains_paradoxical_subreg_p): Move to ...
> >     * rtlanal.cc (contains_paradoxical_subreg_p): ... here.
> >     * rtlanal.h (contains_paradoxical_subreg_p): Add prototype.
> >     * cprop.cc (try_replace_reg): Do not set REG_EQUAL note
> >     when the original source contains a paradoxical subreg.
> >
> > gcc/testsuite/ChangeLog:
> >
> >     * gcc.dg/torture/pr110206.c: New test.
> >
> > Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
> >
> > OK for mainline and backports?
>
> OK.
>
> I think the testcase can also run on other targets if you add
> dg-additional-options "-w -Wno-psabi", all generic vector ops
> should be lowered if not supported.

True, but with lowered vector ops, the test would not even come close
to the problem. The problem is specific to generic vector ops, and can
be triggered only when paradoxical subregs are used to implement
(partial) vector modes. This is the case on x86, where partial vectors
are now heavily used, and even there we need the latest vector ISA
enabled to trip the condition.

The above is the reason that dg-torture is used, with the hope that
the runtime failure will trip when testsuite is run with specific
target options.

Uros.
Richard Biener July 14, 2023, 8:53 a.m. UTC | #3
On Fri, 14 Jul 2023, Uros Bizjak wrote:

> On Fri, Jul 14, 2023 at 10:31?AM Richard Biener <rguenther@suse.de> wrote:
> >
> > On Fri, 14 Jul 2023, Uros Bizjak wrote:
> >
> > > cprop1 pass does not consider paradoxical subreg and for (insn 22) claims
> > > that it equals 8 elements of HImodeby setting REG_EQUAL note:
> > >
> > > (insn 21 19 22 4 (set (reg:V4QI 98)
> > >         (mem/u/c:V4QI (symbol_ref/u:DI ("*.LC1") [flags 0x2]) [0  S4
> > > A32])) "pr110206.c":12:42 1530 {*movv4qi_internal}
> > >      (expr_list:REG_EQUAL (const_vector:V4QI [
> > >                 (const_int -52 [0xffffffffffffffcc]) repeated x4
> > >             ])
> > >         (nil)))
> > > (insn 22 21 23 4 (set (reg:V8HI 100)
> > >         (zero_extend:V8HI (vec_select:V8QI (subreg:V16QI (reg:V4QI 98) 0)
> > >                 (parallel [
> > >                         (const_int 0 [0])
> > >                         (const_int 1 [0x1])
> > >                         (const_int 2 [0x2])
> > >                         (const_int 3 [0x3])
> > >                         (const_int 4 [0x4])
> > >                         (const_int 5 [0x5])
> > >                         (const_int 6 [0x6])
> > >                         (const_int 7 [0x7])
> > >                     ])))) "pr110206.c":12:42 7471 {sse4_1_zero_extendv8qiv8hi2}
> > >      (expr_list:REG_EQUAL (const_vector:V8HI [
> > >                 (const_int 204 [0xcc]) repeated x8
> > >             ])
> > >         (expr_list:REG_DEAD (reg:V4QI 98)
> > >             (nil))))
> > >
> > > We rely on the "undefined" vals to have a specific value (from the earlier
> > > REG_EQUAL note) but actual code generation doesn't ensure this (it doesn't
> > > need to).  That said, the issue isn't the constant folding per-se but that
> > > we do not actually constant fold but register an equality that doesn't hold.
> > >
> > >     PR target/110206
> > >
> > > gcc/ChangeLog:
> > >
> > >     * fwprop.cc (contains_paradoxical_subreg_p): Move to ...
> > >     * rtlanal.cc (contains_paradoxical_subreg_p): ... here.
> > >     * rtlanal.h (contains_paradoxical_subreg_p): Add prototype.
> > >     * cprop.cc (try_replace_reg): Do not set REG_EQUAL note
> > >     when the original source contains a paradoxical subreg.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >     * gcc.dg/torture/pr110206.c: New test.
> > >
> > > Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
> > >
> > > OK for mainline and backports?
> >
> > OK.
> >
> > I think the testcase can also run on other targets if you add
> > dg-additional-options "-w -Wno-psabi", all generic vector ops
> > should be lowered if not supported.
> 
> True, but with lowered vector ops, the test would not even come close
> to the problem. The problem is specific to generic vector ops, and can
> be triggered only when paradoxical subregs are used to implement
> (partial) vector modes. This is the case on x86, where partial vectors
> are now heavily used, and even there we need the latest vector ISA
> enabled to trip the condition.
> 
> The above is the reason that dg-torture is used, with the hope that
> the runtime failure will trip when testsuite is run with specific
> target options.

I see.  I'm fine with this then though moving to gcc.target/i386
with appropriate triggering options and a dg-require for runtime
support would also work.

Richard.
Uros Bizjak July 14, 2023, 9:44 a.m. UTC | #4
On Fri, Jul 14, 2023 at 10:53 AM Richard Biener <rguenther@suse.de> wrote:
>
> On Fri, 14 Jul 2023, Uros Bizjak wrote:
>
> > On Fri, Jul 14, 2023 at 10:31?AM Richard Biener <rguenther@suse.de> wrote:
> > >
> > > On Fri, 14 Jul 2023, Uros Bizjak wrote:
> > >
> > > > cprop1 pass does not consider paradoxical subreg and for (insn 22) claims
> > > > that it equals 8 elements of HImodeby setting REG_EQUAL note:
> > > >
> > > > (insn 21 19 22 4 (set (reg:V4QI 98)
> > > >         (mem/u/c:V4QI (symbol_ref/u:DI ("*.LC1") [flags 0x2]) [0  S4
> > > > A32])) "pr110206.c":12:42 1530 {*movv4qi_internal}
> > > >      (expr_list:REG_EQUAL (const_vector:V4QI [
> > > >                 (const_int -52 [0xffffffffffffffcc]) repeated x4
> > > >             ])
> > > >         (nil)))
> > > > (insn 22 21 23 4 (set (reg:V8HI 100)
> > > >         (zero_extend:V8HI (vec_select:V8QI (subreg:V16QI (reg:V4QI 98) 0)
> > > >                 (parallel [
> > > >                         (const_int 0 [0])
> > > >                         (const_int 1 [0x1])
> > > >                         (const_int 2 [0x2])
> > > >                         (const_int 3 [0x3])
> > > >                         (const_int 4 [0x4])
> > > >                         (const_int 5 [0x5])
> > > >                         (const_int 6 [0x6])
> > > >                         (const_int 7 [0x7])
> > > >                     ])))) "pr110206.c":12:42 7471 {sse4_1_zero_extendv8qiv8hi2}
> > > >      (expr_list:REG_EQUAL (const_vector:V8HI [
> > > >                 (const_int 204 [0xcc]) repeated x8
> > > >             ])
> > > >         (expr_list:REG_DEAD (reg:V4QI 98)
> > > >             (nil))))
> > > >
> > > > We rely on the "undefined" vals to have a specific value (from the earlier
> > > > REG_EQUAL note) but actual code generation doesn't ensure this (it doesn't
> > > > need to).  That said, the issue isn't the constant folding per-se but that
> > > > we do not actually constant fold but register an equality that doesn't hold.
> > > >
> > > >     PR target/110206
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > >     * fwprop.cc (contains_paradoxical_subreg_p): Move to ...
> > > >     * rtlanal.cc (contains_paradoxical_subreg_p): ... here.
> > > >     * rtlanal.h (contains_paradoxical_subreg_p): Add prototype.
> > > >     * cprop.cc (try_replace_reg): Do not set REG_EQUAL note
> > > >     when the original source contains a paradoxical subreg.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > >     * gcc.dg/torture/pr110206.c: New test.
> > > >
> > > > Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
> > > >
> > > > OK for mainline and backports?
> > >
> > > OK.
> > >
> > > I think the testcase can also run on other targets if you add
> > > dg-additional-options "-w -Wno-psabi", all generic vector ops
> > > should be lowered if not supported.
> >
> > True, but with lowered vector ops, the test would not even come close
> > to the problem. The problem is specific to generic vector ops, and can
> > be triggered only when paradoxical subregs are used to implement
> > (partial) vector modes. This is the case on x86, where partial vectors
> > are now heavily used, and even there we need the latest vector ISA
> > enabled to trip the condition.
> >
> > The above is the reason that dg-torture is used, with the hope that
> > the runtime failure will trip when testsuite is run with specific
> > target options.
>
> I see.  I'm fine with this then though moving to gcc.target/i386
> with appropriate triggering options and a dg-require for runtime
> support would also work.

You are right. I'll add the attached testcase to gcc.target/i386 instead.

Uros.
diff mbox series

Patch

diff --git a/gcc/cprop.cc b/gcc/cprop.cc
index b7400c9a421..cf6facaa8c4 100644
--- a/gcc/cprop.cc
+++ b/gcc/cprop.cc
@@ -22,6 +22,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "coretypes.h"
 #include "backend.h"
 #include "rtl.h"
+#include "rtlanal.h"
 #include "cfghooks.h"
 #include "df.h"
 #include "insn-config.h"
@@ -795,7 +796,8 @@  try_replace_reg (rtx from, rtx to, rtx_insn *insn)
       /* If we've failed perform the replacement, have a single SET to
 	 a REG destination and don't yet have a note, add a REG_EQUAL note
 	 to not lose information.  */
-      if (!success && note == 0 && set != 0 && REG_P (SET_DEST (set)))
+      if (!success && note == 0 && set != 0 && REG_P (SET_DEST (set))
+	  && !contains_paradoxical_subreg_p (SET_SRC (set)))
 	note = set_unique_reg_note (insn, REG_EQUAL, copy_rtx (src));
     }
 
diff --git a/gcc/fwprop.cc b/gcc/fwprop.cc
index ae342f59407..0707a234726 100644
--- a/gcc/fwprop.cc
+++ b/gcc/fwprop.cc
@@ -25,6 +25,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "coretypes.h"
 #include "backend.h"
 #include "rtl.h"
+#include "rtlanal.h"
 #include "df.h"
 #include "rtl-ssa.h"
 
@@ -353,21 +354,6 @@  reg_single_def_p (rtx x)
   return REG_P (x) && crtl->ssa->single_dominating_def (REGNO (x));
 }
 
-/* Return true if X contains a paradoxical subreg.  */
-
-static bool
-contains_paradoxical_subreg_p (rtx x)
-{
-  subrtx_var_iterator::array_type array;
-  FOR_EACH_SUBRTX_VAR (iter, array, x, NONCONST)
-    {
-      x = *iter;
-      if (SUBREG_P (x) && paradoxical_subreg_p (x))
-	return true;
-    }
-  return false;
-}
-
 /* Try to substitute (set DEST SRC), which defines DEF, into note NOTE of
    USE_INSN.  Return the number of substitutions on success, otherwise return
    -1 and leave USE_INSN unchanged.
diff --git a/gcc/rtlanal.cc b/gcc/rtlanal.cc
index 31707f3b90a..8b48fc243a1 100644
--- a/gcc/rtlanal.cc
+++ b/gcc/rtlanal.cc
@@ -6970,3 +6970,18 @@  vec_series_lowpart_p (machine_mode result_mode, machine_mode op_mode, rtx sel)
     }
   return false;
 }
+
+/* Return true if X contains a paradoxical subreg.  */
+
+bool
+contains_paradoxical_subreg_p (rtx x)
+{
+  subrtx_var_iterator::array_type array;
+  FOR_EACH_SUBRTX_VAR (iter, array, x, NONCONST)
+    {
+      x = *iter;
+      if (SUBREG_P (x) && paradoxical_subreg_p (x))
+	return true;
+    }
+  return false;
+}
diff --git a/gcc/rtlanal.h b/gcc/rtlanal.h
index 9013e75c04b..4f0dea8e99f 100644
--- a/gcc/rtlanal.h
+++ b/gcc/rtlanal.h
@@ -338,4 +338,6 @@  vec_series_highpart_p (machine_mode result_mode, machine_mode op_mode,
 bool
 vec_series_lowpart_p (machine_mode result_mode, machine_mode op_mode, rtx sel);
 
+bool
+contains_paradoxical_subreg_p (rtx x);
 #endif
diff --git a/gcc/testsuite/gcc.dg/torture/pr110206.c b/gcc/testsuite/gcc.dg/torture/pr110206.c
new file mode 100644
index 00000000000..3a4f221ef47
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr110206.c
@@ -0,0 +1,30 @@ 
+/* PR target/110206 */
+/* { dg-do run { target x86_64-*-* i?86-*-* } } */
+
+typedef unsigned char __attribute__((__vector_size__ (4))) U;
+typedef unsigned char __attribute__((__vector_size__ (8))) V;
+typedef unsigned short u16;
+
+V g;
+
+void
+__attribute__((noinline))
+foo (U u, u16 c, V *r)
+{
+  if (!c)
+    __builtin_abort ();
+  V x = __builtin_shufflevector (u, (204 >> u), 7, 0, 5, 1, 3, 5, 0, 2);
+  V y = __builtin_shufflevector (g, (V) { }, 7, 6, 6, 7, 2, 6, 3, 5);
+  V z = __builtin_shufflevector (y, 204 * x, 3, 9, 8, 1, 4, 6, 14, 5);
+  *r = z;
+}
+
+int
+main (void)
+{
+  V r;
+  foo ((U){4}, 5, &r);
+  if (r[6] != 0x30)
+    __builtin_abort();
+  return 0;
+}