diff mbox

Fix PR77881: combine improvement

Message ID alpine.LSU.2.20.1610201615340.5714@wotan.suse.de
State New
Headers show

Commit Message

Michael Matz Oct. 20, 2016, 2:20 p.m. UTC
Hello,

like analyzed in the PR, combine is able to remove outer subregs that 
don't do anything interesting in the context they are used 
(simplify_comparison).  But that currently happens outside of the loop 
that retries simplifications if changes occurred.

When we do that inside the loop as well we get secondary simplifications 
that currently only happen when calling the simplifiers multiple time, 
like when we start from three rather than from two instructions.  So 
right now we're in the curious position that more complicated code is 
optimized better than simpler code and the patch fixes this.

(FWIW: this replicates parts of rather than moves the responsible code, 
because between the loop and the original place of simplification other 
things happen that might itself generate subregs).

Regstrapping on x86-64, all languages in process.  Okay if that passes?


Ciao,
Michael.
	PR missed-optimization/77881
	* combine.c (simplify_comparison): Remove useless subregs
	also inside the loop, not just after it.

testsuite/
	* gcc.target/i386/pr77881.c: New test.

Comments

Jeff Law Oct. 20, 2016, 7:08 p.m. UTC | #1
On 10/20/2016 08:20 AM, Michael Matz wrote:
> Hello,
>
> like analyzed in the PR, combine is able to remove outer subregs that
> don't do anything interesting in the context they are used
> (simplify_comparison).  But that currently happens outside of the loop
> that retries simplifications if changes occurred.
>
> When we do that inside the loop as well we get secondary simplifications
> that currently only happen when calling the simplifiers multiple time,
> like when we start from three rather than from two instructions.  So
> right now we're in the curious position that more complicated code is
> optimized better than simpler code and the patch fixes this.
>
> (FWIW: this replicates parts of rather than moves the responsible code,
> because between the loop and the original place of simplification other
> things happen that might itself generate subregs).
>
> Regstrapping on x86-64, all languages in process.  Okay if that passes?
>
>
> Ciao,
> Michael.
> 	PR missed-optimization/77881
> 	* combine.c (simplify_comparison): Remove useless subregs
> 	also inside the loop, not just after it.
>
> testsuite/
> 	* gcc.target/i386/pr77881.c: New test.
LGTM.  I was a bit curious why you duplicated rather than factoring, but 
it looks like you simplified the copy a bit by not handling the 
paradoxical subreg case.


There's an outside chance this might help a couple BZs that I've poked 
at in the past.  I'll make a point to test them again this release cycle 
to see if your change improves them (I don't have the #s handy, I think 
they're P4/P5 regressions).

jeff
Segher Boessenkool Nov. 12, 2016, 11:48 a.m. UTC | #2
Hi Michael,

On Thu, Oct 20, 2016 at 04:20:09PM +0200, Michael Matz wrote:
> 	PR missed-optimization/77881
> 	* combine.c (simplify_comparison): Remove useless subregs
> 	also inside the loop, not just after it.
> 
> testsuite/
> 	* gcc.target/i386/pr77881.c: New test.

This isn't checked in yet, do you still want it?

Thanks,


Segher
diff mbox

Patch

diff --git a/gcc/combine.c b/gcc/combine.c
index 2727683..58351ff 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -11925,6 +11925,28 @@  simplify_comparison (enum rtx_code code, rtx *pop0, rtx *pop1)
 	  if (subreg_lowpart_p (op0)
 	      && GET_MODE_PRECISION (GET_MODE (SUBREG_REG (op0))) < mode_width)
 	    ;
+ 	  else if (subreg_lowpart_p (op0)
+ 		   && GET_MODE_CLASS (GET_MODE (op0)) == MODE_INT
+ 		   && GET_MODE_CLASS (GET_MODE (SUBREG_REG (op0))) == MODE_INT
+ 		   && (code == NE || code == EQ)
+ 		   && (GET_MODE_PRECISION (GET_MODE (SUBREG_REG (op0)))
+ 		       <= HOST_BITS_PER_WIDE_INT)
+ 		   && !paradoxical_subreg_p (op0)
+ 		   && (nonzero_bits (SUBREG_REG (op0),
+ 				     GET_MODE (SUBREG_REG (op0)))
+ 		       & ~GET_MODE_MASK (GET_MODE (op0))) == 0)
+ 	    {
+	      /* Remove outer subregs that don't do anything.  */
+ 	      tem = gen_lowpart (GET_MODE (SUBREG_REG (op0)), op1);
+ 
+ 	      if ((nonzero_bits (tem, GET_MODE (SUBREG_REG (op0)))
+ 		   & ~GET_MODE_MASK (GET_MODE (op0))) == 0)
+ 		{
+ 		  op0 = SUBREG_REG (op0), op1 = tem;
+ 		  continue;
+ 		}
+ 	      break;
+ 	    }
 	  else
 	    break;
 
diff --git a/gcc/testsuite/gcc.target/i386/pr77881.c b/gcc/testsuite/gcc.target/i386/pr77881.c
new file mode 100644
index 0000000..80d143f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr77881.c
@@ -0,0 +1,12 @@ 
+/* { dg-do compile } */
+/* { dg-require-effective-target pie } */
+/* { dg-options "-O2" } */
+extern void baz(void);
+int
+foo (long long int a, long long int a2, int b)
+{
+    if (a < 0 || b)
+          baz ();
+}
+/* { dg-final { scan-assembler "js\[ \t\]\.L" } } */
+/* { dg-final { scan-assembler "jne\[ \t\]\.L" } } */