rtl-ssa: Fix splitting of clobber groups [PR108508]

Message ID mptlelge6uc.fsf@arm.com
State New
Series rtl-ssa: Fix splitting of clobber groups [PR108508]

Commit Message

Richard Sandiford Feb. 2, 2023, 9:48 a.m. UTC
Since rtl-ssa isn't a real/native SSA representation, it has
to honour the constraints of the underlying rtl representation.
Part of this involves maintaining a reverse-postorder (RPO) list
of definitions for each rtl register, backed by a splay tree where
necessary for quick lookup and insertion.

However, clobbers of a register don't act as barriers to
other clobbers of the same register.  E.g. it's possible to move one
flag-clobbering instruction across an arbitrary number of other
flag-clobbering instructions.  In order to allow passes to do
that without quadratic complexity, the splay tree groups all
consecutive clobbers into groups, with only the group being
entered into the splay tree.  These groups in turn have an
internal splay tree of clobbers where necessary.

This means that, if we insert a new definition and use into
the middle of a sea of clobbers, we need to split the clobber
group into two groups.  This was quite a difficult condition
to trigger during development, and the PR shows that the code
to handle it had (at least) two bugs.

First, the process involves searching the clobber tree for
the split point.  This search can give either the previous
clobber (which will belong to the first of the split groups)
or the next clobber (which will belong to the second of the
split groups).  The code for the former case handled the
split correctly but the code for the latter case didn't.

Second, I'd forgotten to add the second clobber group to the
main splay tree. :-(

Tested on aarch64-linux-gnu & x86_64-linux-gnu.  OK for trunk
& GCC 12?  Although the testcase is "only" a regression from GCC 12,
I think the rtl-ssa patch should be backported to GCC 11 too.

Richard


gcc/
	PR rtl-optimization/108508
	* rtl-ssa/accesses.cc (function_info::split_clobber_group): When
	the splay tree search gives the first clobber in the second group,
	make sure that the root of the first clobber group is updated
	correctly.  Enter the new clobber group into the definition splay
	tree.

gcc/testsuite/
	PR rtl-optimization/108508
	* gcc.target/aarch64/pr108508.c: New test.
---
 gcc/rtl-ssa/accesses.cc                     | 14 ++++++++---
 gcc/testsuite/gcc.target/aarch64/pr108508.c | 28 +++++++++++++++++++++
 2 files changed, 38 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr108508.c

Comments

Richard Biener Feb. 2, 2023, 12:30 p.m. UTC | #1
On Thu, Feb 2, 2023 at 10:49 AM Richard Sandiford via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Since rtl-ssa isn't a real/native SSA representation, it has
> to honour the constraints of the underlying rtl representation.
> Part of this involves maintaining an rpo list of definitions
> for each rtl register, backed by a splay tree where necessary
> for quick lookup/insertion.
>
> However, clobbers of a register don't act as barriers to
> other clobbers of a register.  E.g. it's possible to move one
> flag-clobbering instruction across an arbitrary number of other
> flag-clobbering instructions.  In order to allow passes to do
> that without quadratic complexity, the splay tree groups all
> consecutive clobbers into groups, with only the group being
> entered into the splay tree.  These groups in turn have an
> internal splay tree of clobbers where necessary.
>
> This means that, if we insert a new definition and use into
> the middle of a sea of clobbers, we need to split the clobber
> group into two groups.  This was quite a difficult condition
> to trigger during development, and the PR shows that the code
> to handle it had (at least) two bugs.
>
> First, the process involves searching the clobber tree for
> the split point.  This search can give either the previous
> clobber (which will belong to the first of the split groups)
> or the next clobber (which will belong to the second of the
> split groups).  The code for the former case handled the
> split correctly but the code for the latter case didn't.
>
> Second, I'd forgotten to add the second clobber group to the
> main splay tree. :-(
>
> Tested on aarch64-linux-gnu & x86_64-linux-gnu.  OK for trunk
> & GCC 12?  Although the testcase is "only" a regression from GCC 12,
> I think the rtl-ssa patch should be backported to GCC 11 too.

OK.

Thanks,
Richard.

Patch

diff --git a/gcc/rtl-ssa/accesses.cc b/gcc/rtl-ssa/accesses.cc
index 03b9a475d3b..f12b5f4dd77 100644
--- a/gcc/rtl-ssa/accesses.cc
+++ b/gcc/rtl-ssa/accesses.cc
@@ -794,23 +794,26 @@  function_info::merge_clobber_groups (clobber_info *clobber1,
 // GROUP spans INSN, and INSN now sets the resource that GROUP clobbers.
 // Split GROUP around INSN and return the clobber that comes immediately
 // before INSN.
+//
+// The resource that GROUP clobbers is known to have an associated
+// splay tree.
 clobber_info *
 function_info::split_clobber_group (clobber_group *group, insn_info *insn)
 {
   // Search for either the previous or next clobber in the group.
   // The result is less than zero if CLOBBER should come before NEIGHBOR
   // or greater than zero if CLOBBER should come after NEIGHBOR.
-  int comparison = lookup_clobber (group->m_clobber_tree, insn);
+  clobber_tree &tree1 = group->m_clobber_tree;
+  int comparison = lookup_clobber (tree1, insn);
   gcc_checking_assert (comparison != 0);
-  clobber_info *neighbor = group->m_clobber_tree.root ();
+  clobber_info *neighbor = tree1.root ();
 
-  clobber_tree tree1, tree2;
+  clobber_tree tree2;
   clobber_info *prev;
   clobber_info *next;
   if (comparison > 0)
     {
       // NEIGHBOR is the last clobber in what will become the first group.
-      tree1 = neighbor;
       tree2 = tree1.split_after_root ();
       prev = neighbor;
       next = as_a<clobber_info *> (prev->next_def ());
@@ -843,6 +846,9 @@  function_info::split_clobber_group (clobber_group *group, insn_info *insn)
   tree2->set_group (group2);
   last_clobber->set_group (group2);
 
+  // Insert GROUP2 into the splay tree as an immediate successor of GROUP1.
+  def_splay_tree::insert_child (group1, 1, group2);
+
   return prev;
 }
 
diff --git a/gcc/testsuite/gcc.target/aarch64/pr108508.c b/gcc/testsuite/gcc.target/aarch64/pr108508.c
new file mode 100644
index 00000000000..e97896b6a1b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr108508.c
@@ -0,0 +1,28 @@ 
+/* { dg-options "-O3 -fharden-conditional-branches -fno-dce -fno-guess-branch-probability" } */
+
+#include <arm_neon.h>
+
+int
+test_vld3q_lane_f64 (void)
+{
+  float64x2x3_t vectors;
+  float64_t temp[2];
+  int i, j;
+
+  for (i = 0; i < 3; i++)
+  {
+    vst1q_f64 (temp, vectors.val[i]);
+    for (j = 0; j < 2; j++)
+      if (temp[j])
+        return 1;
+  }
+
+  return 0;
+}
+
+void
+foo (void)
+{
+  if (test_vld3q_lane_f64 () || test_vld3q_lane_f64 ())
+    __builtin_abort ();
+}