
[committed,PR,rtl-optimization/116039] Fix life computation for promoted subregs

Message ID 5e90a119-181a-4444-a6bd-9603fff13dda@gmail.com
State New

Commit Message

Jeff Law July 25, 2024, 6:37 p.m. UTC
So this turned out to be a neat little test and while the fuzzer found 
it on RISC-V, I wouldn't be surprised if the underlying issue is also 
the root cause of the loongarch issue with ext-dce.

The key issue arises when we have something like

(set (dest) (any_extend (subreg (source))))

If the subreg object is marked with SUBREG_PROMOTED and the 
sign/unsigned state matches the any_extend opcode, then combine (and I 
guess anything using simplify-rtx) may simplify that to

(set (dest) (source))

That implies that bits outside the mode of the subreg are actually live 
and valid.  This needs to be accounted for during liveness computation.
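To make the hazard concrete, here is a minimal C model of the two RTL forms, assuming a 64-bit register and an 8-bit subreg. The function names are hypothetical and only illustrate the semantics; this is not GCC code. The narrowing cast to int8_t relies on the usual two's complement wraparound, which GCC provides.

```c
#include <stdint.h>

/* Models (sign_extend:DI (subreg:QI (reg:DI) 0)): only the low 8 bits
   of the register are read, then sign extended to 64 bits.  */
static int64_t extend_low_byte(int64_t reg)
{
    return (int64_t)(int8_t)(uint8_t)reg;
}

/* Models the simplified form (set (dest) (source)): all 64 bits of the
   register are read, so the upper 56 bits must already be valid.  */
static int64_t plain_copy(int64_t reg)
{
    return reg;
}
```

When the register really holds a sign-extended byte (say -1) the two forms agree. If an earlier pass treated the upper bits as dead and left them as zero (say 0xff), the first form still yields -1 but the simplified copy yields 255.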

We have to be careful here though. If we're too conservative about 
setting additional bits live, then we'll inhibit the desired 
optimization in the coremark examples.  To do a good job we need to know 
the extension opcode.
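The tradeoff can be sketched as follows, assuming liveness is tracked in four bit groups per register (bits 0-7, 8-15, 16-31, 32-63), which is what the `4 * REGNO` indexing in the patch suggests. The enum, the function names and the boolean parameter are hypothetical, and the model collapses the patch's two code paths into one decision.

```c
#include <stdbool.h>

/* Hypothetical flags for the four liveness bit groups of one register.  */
enum {
    GROUP_7_0   = 1,
    GROUP_15_8  = 2,
    GROUP_31_16 = 4,
    GROUP_63_32 = 8
};

/* Bit groups covered by a value that is SIZE_BITS wide.  */
static unsigned groups_for_size(int size_bits)
{
    unsigned live = GROUP_7_0;
    if (size_bits > 8)
        live |= GROUP_15_8;
    if (size_bits > 16)
        live |= GROUP_31_16;
    if (size_bits > 32)
        live |= GROUP_63_32;
    return live;
}

/* Groups made live by a use of (subreg (reg)).  Normally the subreg's
   own width decides.  When the subreg is promoted and the enclosing
   extension matches its signedness, the inner register's full width
   is used instead.  */
static unsigned live_groups_for_subreg_use(int subreg_bits, int inner_bits,
                                           bool promoted_matches_extension)
{
    int size = promoted_matches_extension ? inner_bits : subreg_bits;
    return groups_for_size(size);
}
```

Widening unconditionally would mark all four groups live for every subreg use, which is the over-conservative behavior that blocks extension elimination. Keying the decision on the extension opcode keeps the narrow result for the common case.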

I'm extremely unhappy with how the use handling works in ext-dce.  It 
mixes different conceptual steps and has horribly complex control flow. 
It only handles a subset of the unary/binary opcodes, etc.  It's 
just a damn mess.  It's going to need some more noodling around.

In the meantime this is a bit hacky in that it depends on non-obvious 
behavior to know it can get the extension opcode, but I don't want to 
leave the trunk in a broken state while I figure out the refactoring 
problem.



Bootstrapped and regression tested on x86 and tested on the crosses. 
Pushing to the trunk.

Jeff
commit 34fb0feca71f763b2fbe832548749666d34a4a76
Author: Jeff Law <jlaw@ventanamicro.com>
Date:   Thu Jul 25 12:32:28 2024 -0600

    [PR rtl-optimization/116039] Fix life computation for promoted subregs
    
    So this turned out to be a neat little test and while the fuzzer found it on
    RISC-V, I wouldn't be surprised if the underlying issue is also the root cause
    of the loongarch issue with ext-dce.
    
    The key issue arises when we have something like
    
    (set (dest) (any_extend (subreg (source))))
    
    If the subreg object is marked with SUBREG_PROMOTED and the sign/unsigned state
    matches the any_extend opcode, then combine (and I guess anything using
    simplify-rtx) may simplify that to
    
    (set (dest) (source))
    
    That implies that bits outside the mode of the subreg are actually live and
    valid.  This needs to be accounted for during liveness computation.
    
    We have to be careful here though. If we're too conservative about setting
    additional bits live, then we'll inhibit the desired optimization in the
    coremark examples.  To do a good job we need to know the extension opcode.
    
    I'm extremely unhappy with how the use handling works in ext-dce.  It mixes
    different conceptual steps and has horribly complex control flow.  It only
    handles a subset of the unary/binary opcodes, etc.  It's just a damn mess.
    It's going to need some more noodling around.
    
    In the meantime this is a bit hacky in that it depends on non-obvious behavior
    to know it can get the extension opcode, but I don't want to leave the trunk in
    a broken state while I figure out the refactoring problem.
    
    Bootstrapped and regression tested on x86 and tested on the crosses.  Pushing to the trunk.
    
            PR rtl-optimization/116039
    gcc/
            * ext-dce.cc (ext_dce_process_uses): Add some comments about concerns
            with current code.  Mark additional bit groups as live when we have
            an extension of a suitably promoted subreg.
    
    gcc/testsuite
            * gcc.dg/torture/pr116039.c: New test.

Patch

diff --git a/gcc/ext-dce.cc b/gcc/ext-dce.cc
index c94d1fc3414..14f163a01d6 100644
--- a/gcc/ext-dce.cc
+++ b/gcc/ext-dce.cc
@@ -667,6 +667,12 @@  ext_dce_process_uses (rtx_insn *insn, rtx obj,
 		  if (modify && !skipped_dest && (dst_mask & ~src_mask) == 0)
 		    ext_dce_try_optimize_insn (insn, x);
 
+		  /* Stripping the extension here just seems wrong on multiple
+		     levels.  It's source side handling, so it seems like it
+		     belongs in the loop below.  Stripping here also makes it
+		     harder than necessary to properly handle live bit groups
+		     for (ANY_EXTEND (SUBREG)) where the SUBREG has
+		     SUBREG_PROMOTED state.  */
 		  dst_mask &= src_mask;
 		  src = XEXP (src, 0);
 		  code = GET_CODE (src);
@@ -674,8 +680,8 @@  ext_dce_process_uses (rtx_insn *insn, rtx obj,
 
 	      /* Optimization is done at this point.  We just want to make
 		 sure everything that should get marked as live is marked
-		 from here onward.  */
-
+		 from here onward.  Shouldn't the backpropagate step happen
+		 before optimization?  */
 	      dst_mask = carry_backpropagate (dst_mask, code, src);
 
 	      /* We will handle the other operand of a binary operator
@@ -688,7 +694,11 @@  ext_dce_process_uses (rtx_insn *insn, rtx obj,
 	      /* We're inside a SET and want to process the source operands
 		 making things live.  Breaking from this loop will cause
 		 the iterator to work on sub-rtxs, so it is safe to break
-		 if we see something we don't know how to handle.  */
+		 if we see something we don't know how to handle.
+
+		 This code is just hokey as it really just handles trivial
+		 unary and binary cases.  Otherwise the loop exits and we
+		 continue iterating on sub-rtxs, but outside the set context.  */
 	      unsigned HOST_WIDE_INT save_mask = dst_mask;
 	      for (;;)
 		{
@@ -704,10 +714,26 @@  ext_dce_process_uses (rtx_insn *insn, rtx obj,
 		    y = XEXP (y, 0);
 		  else if (SUBREG_P (y) && SUBREG_BYTE (y).is_constant ())
 		    {
-		      /* For anything but (subreg (reg)), break the inner loop
-			 and process normally (conservatively).  */
-		      if (!REG_P (SUBREG_REG (y)))
+		      /* We really want to know the outer code here, ie do we
+			 have (ANY_EXTEND (SUBREG ...)) as we need to know if
+			 the extension matches the SUBREG_PROMOTED state.  In
+			 that case optimizers can turn the extension into a
+			 simple copy.  Which means that bits outside the
+			 SUBREG's mode are actually live.
+
+			 We don't want to mark those bits live unnecessarily
+			 as that inhibits extension elimination in important
+			 cases such as those in Coremark.  So we need that
+			 outer code.  */
+		      if (!REG_P (SUBREG_REG (y))
+			  || (SUBREG_PROMOTED_VAR_P (y)
+			      && ((GET_CODE (SET_SRC (x)) == SIGN_EXTEND
+				   && SUBREG_PROMOTED_SIGNED_P (y))
+				  || (GET_CODE (SET_SRC (x)) == ZERO_EXTEND
+				      && SUBREG_PROMOTED_UNSIGNED_P (y)))))
 			break;
+
+		      /* The SUBREG's mode determine the live width.  */
 		      bit = subreg_lsb (y).to_constant ();
 		      if (dst_mask)
 			{
@@ -785,6 +811,11 @@  ext_dce_process_uses (rtx_insn *insn, rtx obj,
 	  HOST_WIDE_INT size = GET_MODE_BITSIZE (GET_MODE (x)).to_constant ();
 	  HOST_WIDE_INT rn = 4 * REGNO (SUBREG_REG (x));
 
+	  /* If this is a promoted subreg, then more of it may be live than
+	     is otherwise obvious.  */
+	  if (SUBREG_PROMOTED_VAR_P (x))
+	    size = GET_MODE_BITSIZE (GET_MODE (SUBREG_REG (x))).to_constant ();
+
 	  bitmap_set_bit (livenow, rn);
 	  if (size > 8)
 	    bitmap_set_bit (livenow, rn + 1);
diff --git a/gcc/testsuite/gcc.dg/torture/pr116039.c b/gcc/testsuite/gcc.dg/torture/pr116039.c
new file mode 100644
index 00000000000..d67b9326de7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr116039.c
@@ -0,0 +1,20 @@ 
+/* { dg-do run } */
+/* { dg-additional-options "-fsigned-char -fno-strict-aliasing -fwrapv" } */
+
+extern void abort (void);
+
+int c[12];
+char d[12];
+int *f = c;
+int *z = (int *)1;
+long long y;
+int main() {
+  c[9] = 0xff;
+  for (int i = 0; i < 12; i += 3)
+    d[9] = z ? f[i] : 0;
+  for (long i = 0; i < 12; ++i)
+    y ^= d[i];
+  if (y != -1)
+    abort ();
+  return 0;
+}
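
For readers checking why the test expects -1: with -fsigned-char, the last iteration of the first loop stores the low byte of c[9] (0xff) into d[9] as -1, and every other element of d stays 0. The second loop then XORs the sign-extended bytes into a 64-bit accumulator. A minimal sketch of that arithmetic, with hypothetical helper names that are not part of the test; the zero-extending variant is only an illustrative contrast, not a claim about what the miscompiled code actually computed.

```c
/* Mirrors the second loop of the test: each byte is sign extended
   before being XORed into the 64-bit accumulator.  */
static long long xor_signed_bytes(const signed char *d, int n)
{
    long long y = 0;
    for (int i = 0; i < n; ++i)
        y ^= d[i];
    return y;
}

/* Illustrative contrast: the result if each byte were zero extended.  */
static long long xor_zero_extended_bytes(const signed char *d, int n)
{
    long long y = 0;
    for (int i = 0; i < n; ++i)
        y ^= (unsigned char)d[i];
    return y;
}
```

With d[9] == -1 and all other elements zero, the sign-extending loop produces -1 and the zero-extending one produces 255, so the `y != -1` check catches a lost sign extension.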