diff mbox

match.pd: reassociate multiplications with constants

Message ID alpine.LNX.2.20.13.1707152000200.20069@monopod.intra.ispras.ru
State New
Headers show

Commit Message

Alexander Monakov July 15, 2017, 5:15 p.m. UTC
On Thu, 13 Jul 2017, Marc Glisse wrote:
> X*big*big where abs(big*big)>abs(INT_MIN) can be optimized to 0

I'm not sure that would be a win, eliminating X prevents the compiler from
deducing that X must be zero (if overflow invokes undefined behavior).

> the only hard case is when the product of the constants is -INT_MIN, which we
> could turn into X<<31 for instance (sadly loses range info), or (-X)*INT_MIN
> or whatever. That would make a nice follow-up, if you are interested.

Here's a patch that combines constants, but doesn't handle this case,
(I'm not sure we want to handle it, would it be useful in practice?)
and neither does substitute zero on overflow, per the above concern.

(slsr-4.c needs to be adjusted due to new simplifications)

	* match.pd ((X * CST1) * CST2): Simplify to X * (CST1 * CST2)
	if the product does not overflow.
testsuite:
	* gcc.dg/tree-ssa/assoc-2.c: Enhance.
	* gcc.dg/tree-ssa/slsr-4.c: Adjust.

Comments

Marc Glisse July 17, 2017, 8 p.m. UTC | #1
On Sat, 15 Jul 2017, Alexander Monakov wrote:

> On Thu, 13 Jul 2017, Marc Glisse wrote:
>> X*big*big where abs(big*big)>abs(INT_MIN) can be optimized to 0
>
> I'm not sure that would be a win, eliminating X prevents the compiler from
> deducing that X must be zero (if overflow invokes undefined behavior).

This issue is common to many simplifications. As far as I know, we almost 
never use an operation to deduce a range for its argument (the only case I 
can think of is dereferencing a pointer). Though this case is a bit 
special since we would determine a constant value, not just a range, so we 
could imagine walking through the uses of X that are dominated by the 
multiplication and replacing X by 0 there... but that's quite a bit of 
work for something hopefully rare. I wonder if the new VRP that's in the 
works changes anything there.

>> the only hard case is when the product of the constants is -INT_MIN, which we
>> could turn into X<<31 for instance (sadly loses range info), or (-X)*INT_MIN
>> or whatever. That would make a nice follow-up, if you are interested.
>
> Here's a patch that combines constants, but doesn't handle this case,
> (I'm not sure we want to handle it, would it be useful in practice?)

Probably not worth it indeed.

> and neither does substitute zero on overflow, per the above concern.
>
> (slsr-4.c needs to be adjusted due to new simplifications)
>
> 	* match.pd ((X * CST1) * CST2): Simplify to X * (CST1 * CST2)
> 	if the product does not overflow.
> testsuite:
> 	* gcc.dg/tree-ssa/assoc-2.c: Enhance.
> 	* gcc.dg/tree-ssa/slsr-4.c: Adjust.
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 36045f1..7f384db 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -283,6 +283,17 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>         || mul != wi::min_value (TYPE_PRECISION (type), SIGNED))
>      { build_zero_cst (type); })))))
>
> +/* Combine successive multiplications.  Similar to above, but handling
> +   overflow is different.  */
> +(simplify
> + (mult (mult @0 INTEGER_CST@1) INTEGER_CST@2)
> + (with {
> +   bool overflow_p;
> +   wide_int mul = wi::mul (@1, @2, TYPE_SIGN (type), &overflow_p);
> +  }
> +  (if (!overflow_p || TYPE_OVERFLOW_WRAPS (type))

I wonder if there are cases where this would cause trouble for saturating 
integers. The only case I can think of is when @2 is -1, but that's likely 
simplified to NEGATE_EXPR first.

> +   (mult @0 {wide_int_to_tree (type, mul); }))))
> +

There are a number of possible extensions (handle vectors, handle a 
conversion, etc) but I guess reviewers probably won't make those a 
requirement for this patch. Missing space before wide_int_to_tree.

> /* Optimize A / A to 1.0 if we don't care about
>    NaNs or Infinities.  */
> (simplify
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/assoc-2.c b/gcc/testsuite/gcc.dg/tree-ssa/assoc-2.c
> index a92c882..cc0e9d4 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/assoc-2.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/assoc-2.c
> @@ -5,4 +5,15 @@ int f0(int a, int b){
>   return a * 33 * b * 55;
> }
>
> -/* { dg-final { scan-tree-dump-times "mult_expr" 2 "gimple" } } */
> +int f1(int a){
> +  a *= 33;
> +  return a * 55;
> +}
> +
> +int f2(int a, int b){
> +  a *= 33;
> +  return a * b * 55;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "mult_expr" 7 "gimple" } } */
> +/* { dg-final { scan-tree-dump-times "mult_expr" 5 "optimized" } } */

Ah, so that's why you had -fdump-tree-optimized in the previous patch...

> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/slsr-4.c b/gcc/testsuite/gcc.dg/tree-ssa/slsr-4.c
> index 17d7b4c..1e943b7 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/slsr-4.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/slsr-4.c
> @@ -23,13 +23,9 @@ f (int i)
>   foo (y);
> }
>
> -/* { dg-final { scan-tree-dump-times "\\* 4" 1 "slsr" } } */
> -/* { dg-final { scan-tree-dump-times "\\* 10" 1 "slsr" } } */
> -/* { dg-final { scan-tree-dump-times "\\+ 20;" 1 "slsr" } } */
> +/* { dg-final { scan-tree-dump-times "\\* 40" 1 "slsr" } } */
> /* { dg-final { scan-tree-dump-times "\\+ 200" 1 "slsr" } } */
> -/* { dg-final { scan-tree-dump-times "\\- 16;" 1 "slsr" } } */
> /* { dg-final { scan-tree-dump-times "\\- 160" 1 "slsr" } } */
> -/* { dg-final { scan-tree-dump-times "\\* 4" 1 "optimized" } } */
> -/* { dg-final { scan-tree-dump-times "\\* 10" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "\\* 40" 1 "optimized" } } */
> /* { dg-final { scan-tree-dump-times "\\+ 200" 1 "optimized" } } */
> /* { dg-final { scan-tree-dump-times "\\+ 40" 1 "optimized" } } */
Alexander Monakov July 17, 2017, 8:49 p.m. UTC | #2
On Mon, 17 Jul 2017, Marc Glisse wrote:
> > +/* Combine successive multiplications.  Similar to above, but handling
> > +   overflow is different.  */
> > +(simplify
> > + (mult (mult @0 INTEGER_CST@1) INTEGER_CST@2)
> > + (with {
> > +   bool overflow_p;
> > +   wide_int mul = wi::mul (@1, @2, TYPE_SIGN (type), &overflow_p);
> > +  }
> > +  (if (!overflow_p || TYPE_OVERFLOW_WRAPS (type))
> 
> I wonder if there are cases where this would cause trouble for saturating
> integers. The only case I can think of is when @2 is -1, but that's likely
> simplified to NEGATE_EXPR first.

Ah, yes, I think if @2 is -1 or 0 then we should not attempt this transform for
either saturating or sanitized types, just like in the first patch. I think
wrapping the 'with' with 'if (!integer_minus_onep (@2) && !integer_zerop (@2))'
works, since as you say it should become a negate/zero anyway?

Alexander
diff mbox

Patch

diff --git a/gcc/match.pd b/gcc/match.pd
index 36045f1..7f384db 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -283,6 +283,17 @@  DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
         || mul != wi::min_value (TYPE_PRECISION (type), SIGNED))
      { build_zero_cst (type); })))))
 
+/* Combine successive multiplications.  Similar to above, but handling
+   overflow is different.  */
+(simplify
+ (mult (mult @0 INTEGER_CST@1) INTEGER_CST@2)
+ (with {
+   bool overflow_p;
+   wide_int mul = wi::mul (@1, @2, TYPE_SIGN (type), &overflow_p);
+  }
+  (if (!overflow_p || TYPE_OVERFLOW_WRAPS (type))
+   (mult @0 {wide_int_to_tree (type, mul); }))))
+
 /* Optimize A / A to 1.0 if we don't care about
    NaNs or Infinities.  */
 (simplify
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/assoc-2.c b/gcc/testsuite/gcc.dg/tree-ssa/assoc-2.c
index a92c882..cc0e9d4 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/assoc-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/assoc-2.c
@@ -5,4 +5,15 @@  int f0(int a, int b){
   return a * 33 * b * 55;
 }
 
-/* { dg-final { scan-tree-dump-times "mult_expr" 2 "gimple" } } */
+int f1(int a){
+  a *= 33;
+  return a * 55;
+}
+
+int f2(int a, int b){
+  a *= 33;
+  return a * b * 55;
+}
+
+/* { dg-final { scan-tree-dump-times "mult_expr" 7 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "mult_expr" 5 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/slsr-4.c b/gcc/testsuite/gcc.dg/tree-ssa/slsr-4.c
index 17d7b4c..1e943b7 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/slsr-4.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/slsr-4.c
@@ -23,13 +23,9 @@  f (int i)
   foo (y);
 }
 
-/* { dg-final { scan-tree-dump-times "\\* 4" 1 "slsr" } } */
-/* { dg-final { scan-tree-dump-times "\\* 10" 1 "slsr" } } */
-/* { dg-final { scan-tree-dump-times "\\+ 20;" 1 "slsr" } } */
+/* { dg-final { scan-tree-dump-times "\\* 40" 1 "slsr" } } */
 /* { dg-final { scan-tree-dump-times "\\+ 200" 1 "slsr" } } */
-/* { dg-final { scan-tree-dump-times "\\- 16;" 1 "slsr" } } */
 /* { dg-final { scan-tree-dump-times "\\- 160" 1 "slsr" } } */
-/* { dg-final { scan-tree-dump-times "\\* 4" 1 "optimized" } } */
-/* { dg-final { scan-tree-dump-times "\\* 10" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "\\* 40" 1 "optimized" } } */
 /* { dg-final { scan-tree-dump-times "\\+ 200" 1 "optimized" } } */
 /* { dg-final { scan-tree-dump-times "\\+ 40" 1 "optimized" } } */