diff mbox series

Improved constant folding for scalar evolution.

Message ID 00a701d82660$a1b8ac20$e52a0460$@nextmovesoftware.com
State New
Headers show
Series Improved constant folding for scalar evolution. | expand

Commit Message

Roger Sayle Feb. 20, 2022, 1:49 p.m. UTC
This patch adds a small (follow-up) optimization to chrec_apply for
linear chrecs to clean-up the final value expressions sometimes generated
by GCC's scalar evolution pass.  The transformation of A+(X-1)*A into
A*X is usually unsafe with respect to overflow (see PR92712), and so
can't be performed by match.pd (or fold-const).  However, during scalar
evolution's evaluation of recurrences it's known that X-1 can't be negative
(in fact X-1 is unsigned even when A is signed), hence this optimization
can be applied.  Interestingly, this expression does get simplified in
later passes once the range of X-1 is bounded by VRP, but that occurs
long after the decision of whether to perform final value replacement,
which is based on the complexity of this expression.

The motivating test case is the optimization of the loop (from comment
#7 of PR65855):

int square(int x) {
  int result = 0;
  for (int i = 0; i < x; ++i)
    result += x;
  return result;
}

which is currently optimized, with final value replacement to:

  final value replacement:
   with expr: (int) ((unsigned int) x_3(D) + 4294967295) * x_3(D) + x_3(D)

but with this patch it first gets simplified further:

  final value replacement:
   with expr: x_3(D) * x_3(D)


This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check with no new failures.  Ok for mainline?


2022-02-20  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	* tree-chrec.cc (chrec_apply): Attempt to fold the linear chrec
	"{a, +, a} (x-1)" as "a*x", as the number of loop iterations, x-1,
	can't be negative.

gcc/testsuite/ChangeLog
	* gcc.dg/tree-ssa/pr65855-2.c: New test case.


Roger
--

Comments

Richard Biener Feb. 21, 2022, 7:26 a.m. UTC | #1
On Sun, Feb 20, 2022 at 2:50 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> This patch adds a small (follow-up) optimization to chrec_apply for
> linear chrecs to clean-up the final value expressions sometimes generated
> by GCC's scalar evolution pass.  The transformation of A+(X-1)*A into
> A*X is usually unsafe with respect to overflow (see PR92712), and so
> can't be performed by match.pd (or fold-const).  However, during scalar
> evolution's evaluation of recurrences it's known that X-1 can't be negative
> (in fact X-1 is unsigned even when A is signed), hence this optimization
> can be applied.  Interestingly, this expression does get simplified in
> later passes once the range of X-1 is bounded by VRP, but that occurs
> long after the decision of whether to perform final value replacement,
> which is based on the complexity of this expression.

In principle A + (X-1)*A can be always simplified to (unsigned)A * (unsigned)X,
but at least fold-consts fold_plusminus_mult has

  /* Do not resort to unsigned multiplication because
     we lose the no-overflow property of the expression.  */
  return NULL_TREE;

we might want to heuristically do that anyway if the result is not a
multiplication
by a constant (I remember doing the above because of testsuite regressions).

It might be also interesting to see whether we change back
(int)((unsigned)A * (unsigned)X)
to A * X when we can constrain ranges.

In the specific case of the testcase below we of course only know overflow
doesn't happen because it would be undefined behavior.

> The motivating test case is the optimization of the loop (from comment
> #7 of PR65855):
>
> int square(int x) {
>   int result = 0;
>   for (int i = 0; i < x; ++i)
>     result += x;
>   return result;
> }
>
> which is currently optimized, with final value replacement to:
>
>   final value replacement:
>    with expr: (int) ((unsigned int) x_3(D) + 4294967295) * x_3(D) + x_3(D)
>
> but with this patch it first gets simplified further:
>
>   final value replacement:
>    with expr: x_3(D) * x_3(D)
>
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check with no new failures.  Ok for mainline?

OK once stage1 opens.

Thanks,
Richard.

>
> 2022-02-20  Roger Sayle  <roger@nextmovesoftware.com>
>
> gcc/ChangeLog
>         * tree-chrec.cc (chrec_apply): Attempt to fold the linear chrec
>         "{a, +, a} (x-1)" as "a*x", as the number of loop iterations, x-1,
>         can't be negative.
>
> gcc/testsuite/ChangeLog
>         * gcc.dg/tree-ssa/pr65855-2.c: New test case.
>
>
> Roger
> --
>
diff mbox series

Patch

diff --git a/gcc/tree-chrec.cc b/gcc/tree-chrec.cc
index c44cea7..7321fb9 100644
--- a/gcc/tree-chrec.cc
+++ b/gcc/tree-chrec.cc
@@ -612,16 +612,31 @@  chrec_apply (unsigned var,
     case POLYNOMIAL_CHREC:
       if (evolution_function_is_affine_p (chrec))
 	{
+	  tree chrecr = CHREC_RIGHT (chrec);
 	  if (CHREC_VARIABLE (chrec) != var)
-	    return build_polynomial_chrec
+	    res = build_polynomial_chrec
 	      (CHREC_VARIABLE (chrec),
 	       chrec_apply (var, CHREC_LEFT (chrec), x),
-	       chrec_apply (var, CHREC_RIGHT (chrec), x));
+	       chrec_apply (var, chrecr, x));
 
 	  /* "{a, +, b} (x)"  ->  "a + b*x".  */
-	  x = chrec_convert_rhs (type, x, NULL);
-	  res = chrec_fold_multiply (TREE_TYPE (x), CHREC_RIGHT (chrec), x);
-	  res = chrec_fold_plus (type, CHREC_LEFT (chrec), res);
+	  else if (operand_equal_p (CHREC_LEFT (chrec), chrecr)
+		   && TREE_CODE (x) == PLUS_EXPR
+		   && integer_all_onesp (TREE_OPERAND (x, 1)))
+	    {
+	      /* We know the number of iterations can't be negative.
+		 So {a, +, a} (x-1) -> "a*x".  */
+	      res = build_int_cst (TREE_TYPE (x), 1);
+	      res = chrec_fold_plus (TREE_TYPE (x), x, res);
+	      res = chrec_convert_rhs (type, res, NULL);
+	      res = chrec_fold_multiply (type, chrecr, res);
+	    }
+	  else
+	    {
+	      res = chrec_convert_rhs (TREE_TYPE (chrecr), x, NULL);
+	      res = chrec_fold_multiply (TREE_TYPE (chrecr), chrecr, res);
+	      res = chrec_fold_plus (type, CHREC_LEFT (chrec), res);
+	    }
 	}
       else if (TREE_CODE (x) == INTEGER_CST
 	       && tree_int_cst_sgn (x) == 1)
@@ -644,7 +659,7 @@  chrec_apply (unsigned var,
 
   if (dump_file && (dump_flags & TDF_SCEV))
     {
-      fprintf (dump_file, "  (varying_loop = %d\n", var);
+      fprintf (dump_file, "  (varying_loop = %d", var);
       fprintf (dump_file, ")\n  (chrec = ");
       print_generic_expr (dump_file, chrec);
       fprintf (dump_file, ")\n  (x = ");
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr65855-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr65855-2.c
new file mode 100644
index 0000000..d44ef51
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr65855-2.c
@@ -0,0 +1,11 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-sccp" } */
+
+int square(int x) {
+  int result = 0;
+  for (int i = 0; i < x; ++i)
+    result += x;
+  return result;
+}
+
+/* { dg-final { scan-tree-dump " with expr: x_\[0-9\]\\(D\\) \\* x_\[0-9\]\\(D\\)" "sccp" } } */