diff mbox series

[match.pd] PR tree-optimization/114661: Generalize MULT_EXPR recognition (take #2)

Message ID 011101dad5ee$efb573f0$cf205bd0$@nextmovesoftware.com
State New
Headers show
Series [match.pd] PR tree-optimization/114661: Generalize MULT_EXPR recognition (take #2) | expand

Commit Message

Roger Sayle July 14, 2024, 1:08 p.m. UTC
Hi Richard,
Many thanks for the review and recommendation to use nop_convert?.
This revised patch implements that suggestion, which required a little
experimentation/tweaking as ranger/EVRP records the ranges on the
useless type conversions rather than the multiplications.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?


2024-07-14  Roger Sayle  <roger@nextmovesoftware.com>
            Richard Biener  <rguenther@suse.de>

gcc/ChangeLog
        PR tree-optimization/114661
        * match.pd ((X*C1)|(X*C2) to X*(C1+C2)): Allow optional useless
        type conversions around multiplicaitions, such as those inserted
        by this transformation.

gcc/testsuite/ChangeLog
        PR tree-optimization/114661
        * gcc.dg/pr114661.c: New test case.


Thanks again,
Roger
--

> -----Original Message-----
> From: Richard Biener <richard.guenther@gmail.com>
> Sent: 10 July 2024 12:34
> To: Roger Sayle <roger@nextmovesoftware.com>
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [match.pd PATCH] PR tree-optimization/114661: Generalize
> MULT_EXPR recognition.
> 
> On Wed, Jul 10, 2024 at 12:28 AM Roger Sayle <roger@nextmovesoftware.com>
> wrote:
> >
> > This patch resolves PR tree-optimization/114661, by generalizing the
> > set of expressions that we canonicalize to multiplication.  This
> > extends the
> > optimization(s) contributed (by me) back in July 2021.
> > https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575999.html
> >
> > The existing transformation folds (X*C1)^(X<<C2) into X*C3 when
> > allowed.  A subtlety is that for non-wrapping integer types, we
> > actually fold this into (int)((unsigned)X*C3) so that we don't
> > introduce an undefined overflow that wasn't in the original.
> > Unfortunately, this transformation confuses itself, as the type-safe
> > multiplication isn't recognized when further combining bit operations.
> > Fixed here by adding transforms to turn (int)((unsigned)X*C1)^(X<<C2)
> > into (int)((unsigned)X*C3) so that match.pd and EVRP can continue to
> > construct multiplications.
> >
> > For the example given in the PR:
> >
> > unsigned mul(unsigned char c) {
> >     if (c > 3) __builtin_unreachable();
> >     return c << 18 | c << 15 |
> >            c << 12 | c << 9 |
> >            c << 6 | c << 3 | c;
> > }
> >
> > GCC on x86_64 with -O2 previously generated:
> >
> > mul:    movzbl  %dil, %edi
> >         leal    (%rdi,%rdi,8), %edx
> >         leal    0(,%rdx,8), %eax
> >         movl    %edx, %ecx
> >         sall    $15, %edx
> >         orl     %edi, %eax
> >         sall    $9, %ecx
> >         orl     %ecx, %eax
> >         orl     %edx, %eax
> >         ret
> >
> > with this patch we now generate:
> >
> > mul:    movzbl  %dil, %eax
> >         imull   $299593, %eax, %eax
> >         ret
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32}
> > with no new failures.  Ok for mainline?
> 
> I'm looking at the difference between the existing
> 
>  (simplify
>   (op:c (mult:s@0 @1 INTEGER_CST@2)
>         (lshift:s@3 @1 INTEGER_CST@4))
>   (if (INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_WRAPS (type)
>        && tree_int_cst_sgn (@4) > 0
>        && (tree_nonzero_bits (@0) & tree_nonzero_bits (@3)) == 0)
>    (with { wide_int wone = wi::one (TYPE_PRECISION (type));
>            wide_int c = wi::add (wi::to_wide (@2),
>                                  wi::lshift (wone, wi::to_wide (@4))); }
>     (mult @1 { wide_int_to_tree (type, c); }))))
> 
> and
> 
> + (simplify
> +  (op:c (convert:s@0 (mult:s@1 (convert @2) INTEGER_CST@3))
> +       (lshift:s@4 @2 INTEGER_CST@5))
> +  (if (INTEGRAL_TYPE_P (type)
> +       && INTEGRAL_TYPE_P (TREE_TYPE (@1))
> +       && TREE_TYPE (@2) == type
> +       && TYPE_UNSIGNED (TREE_TYPE (@1))
> +       && TYPE_PRECISION (type) == TYPE_PRECISION (TREE_TYPE (@1))
> +       && tree_int_cst_sgn (@5) > 0
> +       && (tree_nonzero_bits (@0) & tree_nonzero_bits (@4)) == 0)
> +   (with { tree t = TREE_TYPE (@1);
> +          wide_int wone = wi::one (TYPE_PRECISION (t));
> +          wide_int c = wi::add (wi::to_wide (@3),
> +                                wi::lshift (wone, wi::to_wide (@5))); }
> +    (convert (mult:t (convert:t @2) { wide_int_to_tree (t, c); })))))
> 
> and wonder whether wrapping of the multiplication is required for correctness,
> specifically the former seems to allow signed types with -fwrapv while the latter
> won't.  It also looks the patterns could be merged doing
> 
>  (simplify
>   (op:c (nop_convert:s? (mult:s@0 (nop_convert? @1) INTEGER_CST@2)
>         (lshift:s@3 @1 INTEGER_CST@4))
> 
> and by using nop_convert instead of convert simplify the condition?
> 
> Richard.
> 
> >
> > 2024-07-09  Roger Sayle  <roger@nextmovesoftware.com>
> >
> > gcc/ChangeLog
> >         PR tree-optimization/114661
> >         * match.pd ((X*C1)|(X*C2) to X*(C1+C2)): Additionally recognize
> >         multiplications surrounded by casts to an unsigned type and back
> >         such as those generated by these transformations.
> >
> > gcc/testsuite/ChangeLog
> >         PR tree-optimization/114661
> >         * gcc.dg/pr114661.c: New test case.
> >
> >
> > Thanks in advance,
> > Roger
> > --
> >

Comments

Richard Biener July 15, 2024, 1:48 p.m. UTC | #1
On Sun, Jul 14, 2024 at 3:08 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> Hi Richard,
> Many thanks for the review and recommendation to use nop_convert?.
> This revised patch implements that suggestion, which required a little
> experimentation/tweaking as ranger/EVRP records the ranges on the
> useless type conversions rather than the multiplications.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?

OK.

Thanks,
Richard.

>
> 2024-07-14  Roger Sayle  <roger@nextmovesoftware.com>
>             Richard Biener  <rguenther@suse.de>
>
> gcc/ChangeLog
>         PR tree-optimization/114661
>         * match.pd ((X*C1)|(X*C2) to X*(C1+C2)): Allow optional useless
>         type conversions around multiplicaitions, such as those inserted
>         by this transformation.
>
> gcc/testsuite/ChangeLog
>         PR tree-optimization/114661
>         * gcc.dg/pr114661.c: New test case.
>
>
> Thanks again,
> Roger
> --
>
> > -----Original Message-----
> > From: Richard Biener <richard.guenther@gmail.com>
> > Sent: 10 July 2024 12:34
> > To: Roger Sayle <roger@nextmovesoftware.com>
> > Cc: gcc-patches@gcc.gnu.org
> > Subject: Re: [match.pd PATCH] PR tree-optimization/114661: Generalize
> > MULT_EXPR recognition.
> >
> > On Wed, Jul 10, 2024 at 12:28 AM Roger Sayle <roger@nextmovesoftware.com>
> > wrote:
> > >
> > > This patch resolves PR tree-optimization/114661, by generalizing the
> > > set of expressions that we canonicalize to multiplication.  This
> > > extends the
> > > optimization(s) contributed (by me) back in July 2021.
> > > https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575999.html
> > >
> > > The existing transformation folds (X*C1)^(X<<C2) into X*C3 when
> > > allowed.  A subtlety is that for non-wrapping integer types, we
> > > actually fold this into (int)((unsigned)X*C3) so that we don't
> > > introduce an undefined overflow that wasn't in the original.
> > > Unfortunately, this transformation confuses itself, as the type-safe
> > > multiplication isn't recognized when further combining bit operations.
> > > Fixed here by adding transforms to turn (int)((unsigned)X*C1)^(X<<C2)
> > > into (int)((unsigned)X*C3) so that match.pd and EVRP can continue to
> > > construct multiplications.
> > >
> > > For the example given in the PR:
> > >
> > > unsigned mul(unsigned char c) {
> > >     if (c > 3) __builtin_unreachable();
> > >     return c << 18 | c << 15 |
> > >            c << 12 | c << 9 |
> > >            c << 6 | c << 3 | c;
> > > }
> > >
> > > GCC on x86_64 with -O2 previously generated:
> > >
> > > mul:    movzbl  %dil, %edi
> > >         leal    (%rdi,%rdi,8), %edx
> > >         leal    0(,%rdx,8), %eax
> > >         movl    %edx, %ecx
> > >         sall    $15, %edx
> > >         orl     %edi, %eax
> > >         sall    $9, %ecx
> > >         orl     %ecx, %eax
> > >         orl     %edx, %eax
> > >         ret
> > >
> > > with this patch we now generate:
> > >
> > > mul:    movzbl  %dil, %eax
> > >         imull   $299593, %eax, %eax
> > >         ret
> > >
> > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > > and make -k check, both with and without --target_board=unix{-m32}
> > > with no new failures.  Ok for mainline?
> >
> > I'm looking at the difference between the existing
> >
> >  (simplify
> >   (op:c (mult:s@0 @1 INTEGER_CST@2)
> >         (lshift:s@3 @1 INTEGER_CST@4))
> >   (if (INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_WRAPS (type)
> >        && tree_int_cst_sgn (@4) > 0
> >        && (tree_nonzero_bits (@0) & tree_nonzero_bits (@3)) == 0)
> >    (with { wide_int wone = wi::one (TYPE_PRECISION (type));
> >            wide_int c = wi::add (wi::to_wide (@2),
> >                                  wi::lshift (wone, wi::to_wide (@4))); }
> >     (mult @1 { wide_int_to_tree (type, c); }))))
> >
> > and
> >
> > + (simplify
> > +  (op:c (convert:s@0 (mult:s@1 (convert @2) INTEGER_CST@3))
> > +       (lshift:s@4 @2 INTEGER_CST@5))
> > +  (if (INTEGRAL_TYPE_P (type)
> > +       && INTEGRAL_TYPE_P (TREE_TYPE (@1))
> > +       && TREE_TYPE (@2) == type
> > +       && TYPE_UNSIGNED (TREE_TYPE (@1))
> > +       && TYPE_PRECISION (type) == TYPE_PRECISION (TREE_TYPE (@1))
> > +       && tree_int_cst_sgn (@5) > 0
> > +       && (tree_nonzero_bits (@0) & tree_nonzero_bits (@4)) == 0)
> > +   (with { tree t = TREE_TYPE (@1);
> > +          wide_int wone = wi::one (TYPE_PRECISION (t));
> > +          wide_int c = wi::add (wi::to_wide (@3),
> > +                                wi::lshift (wone, wi::to_wide (@5))); }
> > +    (convert (mult:t (convert:t @2) { wide_int_to_tree (t, c); })))))
> >
> > and wonder whether wrapping of the multiplication is required for correctness,
> > specifically the former seems to allow signed types with -fwrapv while the latter
> > won't.  It also looks the patterns could be merged doing
> >
> >  (simplify
> >   (op:c (nop_convert:s? (mult:s@0 (nop_convert? @1) INTEGER_CST@2)
> >         (lshift:s@3 @1 INTEGER_CST@4))
> >
> > and by using nop_convert instead of convert simplify the condition?
> >
> > Richard.
> >
> > >
> > > 2024-07-09  Roger Sayle  <roger@nextmovesoftware.com>
> > >
> > > gcc/ChangeLog
> > >         PR tree-optimization/114661
> > >         * match.pd ((X*C1)|(X*C2) to X*(C1+C2)): Additionally recognize
> > >         multiplications surrounded by casts to an unsigned type and back
> > >         such as those generated by these transformations.
> > >
> > > gcc/testsuite/ChangeLog
> > >         PR tree-optimization/114661
> > >         * gcc.dg/pr114661.c: New test case.
> > >
> > >
> > > Thanks in advance,
> > > Roger
> > > --
> > >
diff mbox series

Patch

diff --git a/gcc/match.pd b/gcc/match.pd
index 4edfa2a..a66b00a 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -4156,30 +4156,39 @@  DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
    Likewise, handle (X<<C3) and X as legitimate variants of X*C.  */
 (for op (bit_ior bit_xor)
  (simplify
-  (op (mult:s@0 @1 INTEGER_CST@2)
-      (mult:s@3 @1 INTEGER_CST@4))
-  (if (INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_WRAPS (type)
-       && (tree_nonzero_bits (@0) & tree_nonzero_bits (@3)) == 0)
-   (mult @1
-	 { wide_int_to_tree (type, wi::to_wide (@2) + wi::to_wide (@4)); })))
+  (op (nop_convert?:s@6 (mult:s@0 (nop_convert? @1) INTEGER_CST@2))
+      (nop_convert?:s@7 (mult:s@3 (nop_convert? @5) INTEGER_CST@4)))
+  (if (INTEGRAL_TYPE_P (type)
+       && operand_equal_p (@1, @5, 0)
+       && (tree_nonzero_bits (@6) & tree_nonzero_bits (@7)) == 0)
+   (with { tree t = type;
+	   if (!TYPE_OVERFLOW_WRAPS (t))
+	     t = unsigned_type_for (t);
+	   wide_int c = wi::add (wi::to_wide (@2), wi::to_wide (@4)); }
+    (convert (mult:t (convert:t @1) { wide_int_to_tree (t, c); })))))
  (simplify
-  (op:c (mult:s@0 @1 INTEGER_CST@2)
+  (op:c (nop_convert?:s@5 (mult:s@0 (nop_convert? @1) INTEGER_CST@2))
 	(lshift:s@3 @1 INTEGER_CST@4))
-  (if (INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_WRAPS (type)
+  (if (INTEGRAL_TYPE_P (type)
        && tree_int_cst_sgn (@4) > 0
-       && (tree_nonzero_bits (@0) & tree_nonzero_bits (@3)) == 0)
-   (with { wide_int wone = wi::one (TYPE_PRECISION (type));
+       && (tree_nonzero_bits (@5) & tree_nonzero_bits (@3)) == 0)
+   (with { tree t = type;
+	   if (!TYPE_OVERFLOW_WRAPS (t))
+	     t = unsigned_type_for (t);
+	   wide_int wone = wi::one (TYPE_PRECISION (type));
 	   wide_int c = wi::add (wi::to_wide (@2),
 				 wi::lshift (wone, wi::to_wide (@4))); }
-    (mult @1 { wide_int_to_tree (type, c); }))))
+    (convert (mult:t (convert:t @1) { wide_int_to_tree (t, c); })))))
  (simplify
-  (op:c (mult:s@0 @1 INTEGER_CST@2)
+  (op:c (nop_convert?:s@3 (mult:s@0 (nop_convert? @1) INTEGER_CST@2))
 	@1)
-  (if (INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_WRAPS (type)
-       && (tree_nonzero_bits (@0) & tree_nonzero_bits (@1)) == 0)
-   (mult @1
-	 { wide_int_to_tree (type,
-			     wi::add (wi::to_wide (@2), 1)); })))
+  (if (INTEGRAL_TYPE_P (type)
+       && (tree_nonzero_bits (@3) & tree_nonzero_bits (@1)) == 0)
+   (with { tree t = type;
+	   if (!TYPE_OVERFLOW_WRAPS (t))
+	     t = unsigned_type_for (t);
+	   wide_int c = wi::add (wi::to_wide (@2), 1); }
+    (convert (mult:t (convert:t @1) { wide_int_to_tree (t, c); })))))
  (simplify
   (op (lshift:s@0 @1 INTEGER_CST@2)
       (lshift:s@3 @1 INTEGER_CST@4))
diff --git a/gcc/testsuite/gcc.dg/pr114661.c b/gcc/testsuite/gcc.dg/pr114661.c
new file mode 100644
index 0000000..e6b5c69
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr114661.c
@@ -0,0 +1,10 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-evrp" } */
+
+unsigned mul(unsigned char c) {
+    if (c > 3) __builtin_unreachable();
+    return c << 18 | c << 15 |
+        c << 12 | c << 9 |
+        c << 6 | c << 3 | c;
+}
+/* { dg-final { scan-tree-dump-times " \\* 299593" 1 "evrp" } } */