diff mbox series

match: Change (A * B) + (-C) to (B - C/A) * A, if C multiple of A [PR109393]

Message ID 20240906124406.3990973-1-konstantinos.eleftheriou@vrull.eu
State New
Headers show
Series match: Change (A * B) + (-C) to (B - C/A) * A, if C multiple of A [PR109393] | expand

Commit Message

Konstantinos Eleftheriou Sept. 6, 2024, 12:44 p.m. UTC
From: kelefth <konstantinos.eleftheriou@vrull.eu>

The following function:

int foo(int *a, int j)
{
  int k = j - 1;
  return a[j - 1] == a[k];
}

does not fold to `return 1;` using -O2 or higher. The cause of this is that
the expression `4 * j + (-4)` for the index computation is not folded to
`4 * (j - 1)`. Existing simplifications that handle similar cases are applied
when A == C, which is not the case in this instance.

A previous attempt to address this issue is
https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649896.html

This patch adds the following simplification in match.pd:
(A * B) + (-C) -> (B - C/A) * A, if C a multiple of A

which also handles cases where the index is j - 2, j - 3, etc.

Bootstrapped for all languages and regression tested on x86-64 and aarch64.

	PR tree-optimization/109393

gcc/ChangeLog:

	* match.pd: (A * B) + (-C) -> (B - C/A) * A, if C a multiple of A.

gcc/testsuite/ChangeLog:

	* gcc.dg/pr109393.c: New test.

Tested-by: Christoph Müllner <christoph.muellner@vrull.eu>
Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
Signed-off-by: Konstantinos Eleftheriou <konstantinos.eleftheriou@vrull.eu>
---
 gcc/match.pd                    | 15 ++++++++++++++-
 gcc/testsuite/gcc.dg/pr109393.c | 23 +++++++++++++++++++++++
 2 files changed, 37 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr109393.c

Comments

Richard Biener Sept. 10, 2024, 11:53 a.m. UTC | #1
On Fri, Sep 6, 2024 at 2:44 PM <konstantinos.eleftheriou@vrull.eu> wrote:
>
> From: kelefth <konstantinos.eleftheriou@vrull.eu>
>
> The following function:
>
> int foo(int *a, int j)
> {
>   int k = j - 1;
>   return a[j - 1] == a[k];
> }
>
> does not fold to `return 1;` using -O2 or higher. The cause of this is that
> the expression `4 * j + (-4)` for the index computation is not folded to
> `4 * (j - 1)`. Existing simplifications that handle similar cases are applied
> when A == C, which is not the case in this instance.
>
> A previous attempt to address this issue is
> https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649896.html
>
> This patch adds the following simplification in match.pd:
> (A * B) + (-C) -> (B - C/A) * A, if C a multiple of A
>
> which also handles cases where the index is j - 2, j - 3, etc.
>
> Bootstrapped for all languages and regression tested on x86-64 and aarch64.
>
>         PR tree-optimization/109393
>
> gcc/ChangeLog:
>
>         * match.pd: (A * B) + (-C) -> (B - C/A) * A, if C a multiple of A.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.dg/pr109393.c: New test.
>
> Tested-by: Christoph Müllner <christoph.muellner@vrull.eu>
> Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
> Signed-off-by: Konstantinos Eleftheriou <konstantinos.eleftheriou@vrull.eu>
> ---
>  gcc/match.pd                    | 15 ++++++++++++++-
>  gcc/testsuite/gcc.dg/pr109393.c | 23 +++++++++++++++++++++++
>  2 files changed, 37 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.dg/pr109393.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 621306213e4..9d971b663c6 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -4216,7 +4216,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>                          ? wi::max_value (TYPE_PRECISION (type), SIGNED)
>                          : wi::min_value (TYPE_PRECISION (type), SIGNED))))))
>          && single_use (@3))
> -     (mult (plusminus @2 { build_one_cst (type); }) @0))))))
> +     (mult (plusminus @2 { build_one_cst (type); }) @0)))

Please move the pattern outside of the enclosing (for plusminus (..) loop

> +   /* (A * B) + (-C) -> (B - C/A) * A, if C is a multiple of A.  */
> +   (simplify
> +    (plus (mult:cs@3 integer_nonzerop@0 @2) INTEGER_CST@4)
> +      (if (TREE_CODE (type) == INTEGER_TYPE
> +         && wi::neg_p (wi::to_wide (@4)))
> +       (with {
> +         wide_int c1 = wi::to_wide (@0);
> +         wide_int c2_abs = wi::abs (wi::to_wide (@4));

I think you need to exclude the most negative number for @4

> +         /* Calculate @4 / @0 in order to factorize the expression.  */
> +         wide_int div_res = wi::div_trunc (c2_abs, c1, TYPE_SIGN (type));
> +         tree div_cst = wide_int_to_tree (type, div_res); }

Please build the tree node (and perform the division) only when
wi::multiple_of_p.

> +       (if (wi::multiple_of_p (c2_abs, c1, TYPE_SIGN (type)))
> +         (mult (minus @2 { div_cst; }) @0))))))))

(minus @2 INTEGER_CST) isn't canonical, it's going to be
folded back to a plus with the negative constant.  Can't you
work with a signed division on the original possibly "signed" value?

You have to be careful to not introduce new undefined overflow
with the B - C/A expression which can happen for A == 1 and B == 0
for C == INT_MIN.  I think fold_plusminus_mult_expr documents all
problematical cases.

Thanks,
Richard.

>
>  #if GIMPLE
>  /* Canonicalize X + (X << C) into X * (1 + (1 << C)) and
> diff --git a/gcc/testsuite/gcc.dg/pr109393.c b/gcc/testsuite/gcc.dg/pr109393.c
> new file mode 100644
> index 00000000000..17bf9330796
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr109393.c
> @@ -0,0 +1,23 @@
> +/* PR tree-optimization/109393 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +
> +int foo(int *a, int j)
> +{
> +  int k = j - 1;
> +  return a[j - 1] == a[k];
> +}
> +
> +int foo2(int *a, int j)
> +{
> +  int k = j - 5;
> +  return a[j - 5] == a[k];
> +}
> +
> +int bar(int *a, int j)
> +{
> +  int k = j - 1;
> +  return (&a[j + 1] - 2) == &a[k];
> +}
> +
> +/* { dg-final { scan-tree-dump-times "return 1;" 3 "optimized" } } */
> \ No newline at end of file
> --
> 2.46.0
>
Konstantinos Eleftheriou Sept. 17, 2024, 7:55 a.m. UTC | #2
Thanks for the feedback.
I have sent a new version (
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/663104.html).

On Tue, Sep 10, 2024 at 2:53 PM Richard Biener <richard.guenther@gmail.com>
wrote:

> On Fri, Sep 6, 2024 at 2:44 PM <konstantinos.eleftheriou@vrull.eu> wrote:
> >
> > From: kelefth <konstantinos.eleftheriou@vrull.eu>
> >
> > The following function:
> >
> > int foo(int *a, int j)
> > {
> >   int k = j - 1;
> >   return a[j - 1] == a[k];
> > }
> >
> > does not fold to `return 1;` using -O2 or higher. The cause of this is
> that
> > the expression `4 * j + (-4)` for the index computation is not folded to
> > `4 * (j - 1)`. Existing simplifications that handle similar cases are
> applied
> > when A == C, which is not the case in this instance.
> >
> > A previous attempt to address this issue is
> > https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649896.html
> >
> > This patch adds the following simplification in match.pd:
> > (A * B) + (-C) -> (B - C/A) * A, if C a multiple of A
> >
> > which also handles cases where the index is j - 2, j - 3, etc.
> >
> > Bootstrapped for all languages and regression tested on x86-64 and
> aarch64.
> >
> >         PR tree-optimization/109393
> >
> > gcc/ChangeLog:
> >
> >         * match.pd: (A * B) + (-C) -> (B - C/A) * A, if C a multiple of
> A.
> >
> > gcc/testsuite/ChangeLog:
> >
> >         * gcc.dg/pr109393.c: New test.
> >
> > Tested-by: Christoph Müllner <christoph.muellner@vrull.eu>
> > Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
> > Signed-off-by: Konstantinos Eleftheriou <
> konstantinos.eleftheriou@vrull.eu>
> > ---
> >  gcc/match.pd                    | 15 ++++++++++++++-
> >  gcc/testsuite/gcc.dg/pr109393.c | 23 +++++++++++++++++++++++
> >  2 files changed, 37 insertions(+), 1 deletion(-)
> >  create mode 100644 gcc/testsuite/gcc.dg/pr109393.c
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index 621306213e4..9d971b663c6 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -4216,7 +4216,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >                          ? wi::max_value (TYPE_PRECISION (type), SIGNED)
> >                          : wi::min_value (TYPE_PRECISION (type),
> SIGNED))))))
> >          && single_use (@3))
> > -     (mult (plusminus @2 { build_one_cst (type); }) @0))))))
> > +     (mult (plusminus @2 { build_one_cst (type); }) @0)))
>
> Please move the pattern outside of the enclosing (for plusminus (..) loop
>
> > +   /* (A * B) + (-C) -> (B - C/A) * A, if C is a multiple of A.  */
> > +   (simplify
> > +    (plus (mult:cs@3 integer_nonzerop@0 @2) INTEGER_CST@4)
> > +      (if (TREE_CODE (type) == INTEGER_TYPE
> > +         && wi::neg_p (wi::to_wide (@4)))
> > +       (with {
> > +         wide_int c1 = wi::to_wide (@0);
> > +         wide_int c2_abs = wi::abs (wi::to_wide (@4));
>
> I think you need to exclude the most negative number for @4
>
> > +         /* Calculate @4 / @0 in order to factorize the expression.  */
> > +         wide_int div_res = wi::div_trunc (c2_abs, c1, TYPE_SIGN
> (type));
> > +         tree div_cst = wide_int_to_tree (type, div_res); }
>
> Please build the tree node (and perform the division) only when
> wi::multiple_of_p.
>
> > +       (if (wi::multiple_of_p (c2_abs, c1, TYPE_SIGN (type)))
> > +         (mult (minus @2 { div_cst; }) @0))))))))
>
> (minus @2 INTEGER_CST) isn't canonical, it's going to be
> folded back to a plus with the negative constant.  Can't you
> work with a signed division on the original possibly "signed" value?
>
> You have to be careful to not introduce new undefined overflow
> with the B - C/A expression which can happen for A == 1 and B == 0
> for C == INT_MIN.  I think fold_plusminus_mult_expr documents all
> problematical cases.
>
> Thanks,
> Richard.
>
> >
> >  #if GIMPLE
> >  /* Canonicalize X + (X << C) into X * (1 + (1 << C)) and
> > diff --git a/gcc/testsuite/gcc.dg/pr109393.c
> b/gcc/testsuite/gcc.dg/pr109393.c
> > new file mode 100644
> > index 00000000000..17bf9330796
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/pr109393.c
> > @@ -0,0 +1,23 @@
> > +/* PR tree-optimization/109393 */
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -fdump-tree-optimized" } */
> > +
> > +int foo(int *a, int j)
> > +{
> > +  int k = j - 1;
> > +  return a[j - 1] == a[k];
> > +}
> > +
> > +int foo2(int *a, int j)
> > +{
> > +  int k = j - 5;
> > +  return a[j - 5] == a[k];
> > +}
> > +
> > +int bar(int *a, int j)
> > +{
> > +  int k = j - 1;
> > +  return (&a[j + 1] - 2) == &a[k];
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-times "return 1;" 3 "optimized" } } */
> > \ No newline at end of file
> > --
> > 2.46.0
> >
>
diff mbox series

Patch

diff --git a/gcc/match.pd b/gcc/match.pd
index 621306213e4..9d971b663c6 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -4216,7 +4216,20 @@  DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 			 ? wi::max_value (TYPE_PRECISION (type), SIGNED)
 			 : wi::min_value (TYPE_PRECISION (type), SIGNED))))))
 	 && single_use (@3))
-     (mult (plusminus @2 { build_one_cst (type); }) @0))))))
+     (mult (plusminus @2 { build_one_cst (type); }) @0)))
+   /* (A * B) + (-C) -> (B - C/A) * A, if C is a multiple of A.  */
+   (simplify
+    (plus (mult:cs@3 integer_nonzerop@0 @2) INTEGER_CST@4)
+      (if (TREE_CODE (type) == INTEGER_TYPE
+	  && wi::neg_p (wi::to_wide (@4)))
+	(with {
+	  wide_int c1 = wi::to_wide (@0);
+	  wide_int c2_abs = wi::abs (wi::to_wide (@4));
+	  /* Calculate @4 / @0 in order to factorize the expression.  */
+	  wide_int div_res = wi::div_trunc (c2_abs, c1, TYPE_SIGN (type));
+	  tree div_cst = wide_int_to_tree (type, div_res); }
+	(if (wi::multiple_of_p (c2_abs, c1, TYPE_SIGN (type)))
+	  (mult (minus @2 { div_cst; }) @0))))))))
 
 #if GIMPLE
 /* Canonicalize X + (X << C) into X * (1 + (1 << C)) and
diff --git a/gcc/testsuite/gcc.dg/pr109393.c b/gcc/testsuite/gcc.dg/pr109393.c
new file mode 100644
index 00000000000..17bf9330796
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr109393.c
@@ -0,0 +1,23 @@ 
+/* PR tree-optimization/109393 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int foo(int *a, int j)
+{
+  int k = j - 1;
+  return a[j - 1] == a[k];
+}
+
+int foo2(int *a, int j)
+{
+  int k = j - 5;
+  return a[j - 5] == a[k];
+}
+
+int bar(int *a, int j)
+{
+  int k = j - 1;
+  return (&a[j + 1] - 2) == &a[k];
+}
+
+/* { dg-final { scan-tree-dump-times "return 1;" 3 "optimized" } } */
\ No newline at end of file