Message ID | 20240906124406.3990973-1-konstantinos.eleftheriou@vrull.eu |
---|---|
State | New |
Headers | show |
Series | match: Change (A * B) + (-C) to (B - C/A) * A, if C multiple of A [PR109393] | expand |
On Fri, Sep 6, 2024 at 2:44 PM <konstantinos.eleftheriou@vrull.eu> wrote: > > From: kelefth <konstantinos.eleftheriou@vrull.eu> > > The following function: > > int foo(int *a, int j) > { > int k = j - 1; > return a[j - 1] == a[k]; > } > > does not fold to `return 1;` using -O2 or higher. The cause of this is that > the expression `4 * j + (-4)` for the index computation is not folded to > `4 * (j - 1)`. Existing simplifications that handle similar cases are applied > when A == C, which is not the case in this instance. > > A previous attempt to address this issue is > https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649896.html > > This patch adds the following simplification in match.pd: > (A * B) + (-C) -> (B - C/A) * A, if C a multiple of A > > which also handles cases where the index is j - 2, j - 3, etc. > > Bootstrapped for all languages and regression tested on x86-64 and aarch64. > > PR tree-optimization/109393 > > gcc/ChangeLog: > > * match.pd: (A * B) + (-C) -> (B - C/A) * A, if C a multiple of A. > > gcc/testsuite/ChangeLog: > > * gcc.dg/pr109393.c: New test. > > Tested-by: Christoph Müllner <christoph.muellner@vrull.eu> > Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu> > Signed-off-by: Konstantinos Eleftheriou <konstantinos.eleftheriou@vrull.eu> > --- > gcc/match.pd | 15 ++++++++++++++- > gcc/testsuite/gcc.dg/pr109393.c | 23 +++++++++++++++++++++++ > 2 files changed, 37 insertions(+), 1 deletion(-) > create mode 100644 gcc/testsuite/gcc.dg/pr109393.c > > diff --git a/gcc/match.pd b/gcc/match.pd > index 621306213e4..9d971b663c6 100644 > --- a/gcc/match.pd > +++ b/gcc/match.pd > @@ -4216,7 +4216,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > ? wi::max_value (TYPE_PRECISION (type), SIGNED) > : wi::min_value (TYPE_PRECISION (type), SIGNED)))))) > && single_use (@3)) > - (mult (plusminus @2 { build_one_cst (type); }) @0)))))) > + (mult (plusminus @2 { build_one_cst (type); }) @0))) Please move the pattern outside of the enclosing (for plusminus (..) loop > + /* (A * B) + (-C) -> (B - C/A) * A, if C is a multiple of A. */ > + (simplify > + (plus (mult:cs@3 integer_nonzerop@0 @2) INTEGER_CST@4) > + (if (TREE_CODE (type) == INTEGER_TYPE > + && wi::neg_p (wi::to_wide (@4))) > + (with { > + wide_int c1 = wi::to_wide (@0); > + wide_int c2_abs = wi::abs (wi::to_wide (@4)); I think you need to exclude the most negative number for @4 > + /* Calculate @4 / @0 in order to factorize the expression. */ > + wide_int div_res = wi::div_trunc (c2_abs, c1, TYPE_SIGN (type)); > + tree div_cst = wide_int_to_tree (type, div_res); } Please build the tree node (and perform the division) only when wi::multiple_of_p. > + (if (wi::multiple_of_p (c2_abs, c1, TYPE_SIGN (type))) > + (mult (minus @2 { div_cst; }) @0)))))))) (minus @2 INTEGER_CST) isn't canonical, it's going to be folded back to a plus with the negative constant. Can't you work with a signed division on the original possibly "signed" value? You have to be careful to not introduce new undefined overflow with the B - C/A expression which can happen for A == 1 and B == 0 for C == INT_MIN. I think fold_plusminus_mult_expr documents all problematical cases. Thanks, Richard. > > #if GIMPLE > /* Canonicalize X + (X << C) into X * (1 + (1 << C)) and > diff --git a/gcc/testsuite/gcc.dg/pr109393.c b/gcc/testsuite/gcc.dg/pr109393.c > new file mode 100644 > index 00000000000..17bf9330796 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/pr109393.c > @@ -0,0 +1,23 @@ > +/* PR tree-optimization/109393 */ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fdump-tree-optimized" } */ > + > +int foo(int *a, int j) > +{ > + int k = j - 1; > + return a[j - 1] == a[k]; > +} > + > +int foo2(int *a, int j) > +{ > + int k = j - 5; > + return a[j - 5] == a[k]; > +} > + > +int bar(int *a, int j) > +{ > + int k = j - 1; > + return (&a[j + 1] - 2) == &a[k]; > +} > + > +/* { dg-final { scan-tree-dump-times "return 1;" 3 "optimized" } } */ > \ No newline at end of file > -- > 2.46.0 >
Thanks for the feedback. I have sent a new version ( https://gcc.gnu.org/pipermail/gcc-patches/2024-September/663104.html). On Tue, Sep 10, 2024 at 2:53 PM Richard Biener <richard.guenther@gmail.com> wrote: > On Fri, Sep 6, 2024 at 2:44 PM <konstantinos.eleftheriou@vrull.eu> wrote: > > > > From: kelefth <konstantinos.eleftheriou@vrull.eu> > > > > The following function: > > > > int foo(int *a, int j) > > { > > int k = j - 1; > > return a[j - 1] == a[k]; > > } > > > > does not fold to `return 1;` using -O2 or higher. The cause of this is > that > > the expression `4 * j + (-4)` for the index computation is not folded to > > `4 * (j - 1)`. Existing simplifications that handle similar cases are > applied > > when A == C, which is not the case in this instance. > > > > A previous attempt to address this issue is > > https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649896.html > > > > This patch adds the following simplification in match.pd: > > (A * B) + (-C) -> (B - C/A) * A, if C a multiple of A > > > > which also handles cases where the index is j - 2, j - 3, etc. > > > > Bootstrapped for all languages and regression tested on x86-64 and > aarch64. > > > > PR tree-optimization/109393 > > > > gcc/ChangeLog: > > > > * match.pd: (A * B) + (-C) -> (B - C/A) * A, if C a multiple of > A. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.dg/pr109393.c: New test. > > > > Tested-by: Christoph Müllner <christoph.muellner@vrull.eu> > > Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu> > > Signed-off-by: Konstantinos Eleftheriou < > konstantinos.eleftheriou@vrull.eu> > > --- > > gcc/match.pd | 15 ++++++++++++++- > > gcc/testsuite/gcc.dg/pr109393.c | 23 +++++++++++++++++++++++ > > 2 files changed, 37 insertions(+), 1 deletion(-) > > create mode 100644 gcc/testsuite/gcc.dg/pr109393.c > > > > diff --git a/gcc/match.pd b/gcc/match.pd > > index 621306213e4..9d971b663c6 100644 > > --- a/gcc/match.pd > > +++ b/gcc/match.pd > > @@ -4216,7 +4216,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > > ? wi::max_value (TYPE_PRECISION (type), SIGNED) > > : wi::min_value (TYPE_PRECISION (type), > SIGNED)))))) > > && single_use (@3)) > > - (mult (plusminus @2 { build_one_cst (type); }) @0)))))) > > + (mult (plusminus @2 { build_one_cst (type); }) @0))) > > Please move the pattern outside of the enclosing (for plusminus (..) loop > > > + /* (A * B) + (-C) -> (B - C/A) * A, if C is a multiple of A. */ > > + (simplify > > + (plus (mult:cs@3 integer_nonzerop@0 @2) INTEGER_CST@4) > > + (if (TREE_CODE (type) == INTEGER_TYPE > > + && wi::neg_p (wi::to_wide (@4))) > > + (with { > > + wide_int c1 = wi::to_wide (@0); > > + wide_int c2_abs = wi::abs (wi::to_wide (@4)); > > I think you need to exclude the most negative number for @4 > > > + /* Calculate @4 / @0 in order to factorize the expression. */ > > + wide_int div_res = wi::div_trunc (c2_abs, c1, TYPE_SIGN > (type)); > > + tree div_cst = wide_int_to_tree (type, div_res); } > > Please build the tree node (and perform the division) only when > wi::multiple_of_p. > > > + (if (wi::multiple_of_p (c2_abs, c1, TYPE_SIGN (type))) > > + (mult (minus @2 { div_cst; }) @0)))))))) > > (minus @2 INTEGER_CST) isn't canonical, it's going to be > folded back to a plus with the negative constant. Can't you > work with a signed division on the original possibly "signed" value? > > You have to be careful to not introduce new undefined overflow > with the B - C/A expression which can happen for A == 1 and B == 0 > for C == INT_MIN. I think fold_plusminus_mult_expr documents all > problematical cases. > > Thanks, > Richard. > > > > > #if GIMPLE > > /* Canonicalize X + (X << C) into X * (1 + (1 << C)) and > > diff --git a/gcc/testsuite/gcc.dg/pr109393.c > b/gcc/testsuite/gcc.dg/pr109393.c > > new file mode 100644 > > index 00000000000..17bf9330796 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.dg/pr109393.c > > @@ -0,0 +1,23 @@ > > +/* PR tree-optimization/109393 */ > > +/* { dg-do compile } */ > > +/* { dg-options "-O2 -fdump-tree-optimized" } */ > > + > > +int foo(int *a, int j) > > +{ > > + int k = j - 1; > > + return a[j - 1] == a[k]; > > +} > > + > > +int foo2(int *a, int j) > > +{ > > + int k = j - 5; > > + return a[j - 5] == a[k]; > > +} > > + > > +int bar(int *a, int j) > > +{ > > + int k = j - 1; > > + return (&a[j + 1] - 2) == &a[k]; > > +} > > + > > +/* { dg-final { scan-tree-dump-times "return 1;" 3 "optimized" } } */ > > \ No newline at end of file > > -- > > 2.46.0 > > >
diff --git a/gcc/match.pd b/gcc/match.pd index 621306213e4..9d971b663c6 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -4216,7 +4216,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) ? wi::max_value (TYPE_PRECISION (type), SIGNED) : wi::min_value (TYPE_PRECISION (type), SIGNED)))))) && single_use (@3)) - (mult (plusminus @2 { build_one_cst (type); }) @0)))))) + (mult (plusminus @2 { build_one_cst (type); }) @0))) + /* (A * B) + (-C) -> (B - C/A) * A, if C is a multiple of A. */ + (simplify + (plus (mult:cs@3 integer_nonzerop@0 @2) INTEGER_CST@4) + (if (TREE_CODE (type) == INTEGER_TYPE + && wi::neg_p (wi::to_wide (@4))) + (with { + wide_int c1 = wi::to_wide (@0); + wide_int c2_abs = wi::abs (wi::to_wide (@4)); + /* Calculate @4 / @0 in order to factorize the expression. */ + wide_int div_res = wi::div_trunc (c2_abs, c1, TYPE_SIGN (type)); + tree div_cst = wide_int_to_tree (type, div_res); } + (if (wi::multiple_of_p (c2_abs, c1, TYPE_SIGN (type))) + (mult (minus @2 { div_cst; }) @0)))))))) #if GIMPLE /* Canonicalize X + (X << C) into X * (1 + (1 << C)) and diff --git a/gcc/testsuite/gcc.dg/pr109393.c b/gcc/testsuite/gcc.dg/pr109393.c new file mode 100644 index 00000000000..17bf9330796 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr109393.c @@ -0,0 +1,23 @@ +/* PR tree-optimization/109393 */ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +int foo(int *a, int j) +{ + int k = j - 1; + return a[j - 1] == a[k]; +} + +int foo2(int *a, int j) +{ + int k = j - 5; + return a[j - 5] == a[k]; +} + +int bar(int *a, int j) +{ + int k = j - 1; + return (&a[j + 1] - 2) == &a[k]; +} + +/* { dg-final { scan-tree-dump-times "return 1;" 3 "optimized" } } */ \ No newline at end of file