Message ID | 54DD35FF.3060108@redhat.com |
---|---|
State | New |
On Fri, Feb 13, 2015 at 12:23 AM, Jeff Law <law@redhat.com> wrote:
> On 02/12/15 06:02, Richard Biener wrote:
>
>>>
>>> No strong opinion on separate pattern or integrating into existing
>>> pattern.
>>
>>
>> Yeah, I'd leave it as you did it for now.  Btw, -ENOPATCH.
>
> This version should address all your comments.  A few notes.
>
> When in GENERIC, the inner operands can be a fairly arbitrary expression,
> including things like COMPONENT_REFs for bitfields.  We filter these out
> by ensuring the type's precision for both operands is a power of two and
> at least byte sized.
>
> It may not be strictly necessary, but the pattern now verifies that the
> final type, type of the inner conversion and type of the @0/@1 arguments
> are integral types.  Just seemed like the safe thing to do.
>
> I added the TYPE_OVERFLOW_WRAPS support into the new pattern rather than
> create a totally new pattern.
>
> Testcase name change to reflect it's actually from a PR.
>
> Bootstrapped and regression tested on x86_64-unknown-linux-gnu.  OK for
> the trunk?
>
> Jeff
>
> commit a0317dfd4a3f7fc907a84e6956a8c116bb941f52
> Author: Jeff Law <law@redhat.com>
> Date:   Wed Feb 11 16:19:05 2015 -0700
>
>     PR rtl-optimization/47477
>     * match.pd (convert (plus/minus (convert @0) (convert @1): New
>     simplifier to narrow arithmetic.
>
>     PR rtl-optimization/47477
>     * gcc.dg/tree-ssa/pr47477.c: New test.
>
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index c9ac045..1258a0a 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,5 +1,11 @@
>  2015-02-11  Jeff Law  <law@redhat.com>
>
> +        PR rtl-optimization/47477
> +        * match.pd (convert (plus/minus (convert @0) (convert @1): New
> +        simplifier to narrow arithmetic.
> +
> +2015-02-11  Jeff Law  <law@redhat.com>
> +
>          PR target/63347
>          * haifa-sched.c (prune_ready_list): If we have a SCHED_GROUP_P insn
>          that needs to be queued, just queue it for a single cycle.
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 81c4ee6..27e0dd2 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -1018,3 +1018,36 @@ along with GCC; see the file COPYING3.  If not see
>     (logs (pows @0 @1))
>     (mult @1 (logs @0)))))
>
> +/* Narrowing of arithmetic and logical operations.
> +
> +   These are conceptually similar to the transformations performed for
> +   the C/C++ front-ends by shorten_binary_op and shorten_compare.  Long
> +   term we want to move all that code out of the front-ends into here.  */
> +
> +/* If we have a narrowing conversion of an arithmetic operation where
> +   both operands are widening conversions from the same type as the outer
> +   narrowing conversion.  Then convert the innermost operands to a suitable
> +   unsigned type (to avoid introducing undefined behaviour), perform the
> +   operation and convert the result to the desired type.  */
> +(for op (plus minus)
> +  (simplify
> +    (convert (op (convert@2 @0) (convert@3 @1)))
> +    (if (((GENERIC
> +           && TYPE_MAIN_VARIANT (TREE_TYPE (@0)) == TYPE_MAIN_VARIANT (type)
> +           /* @0 and @1 could be arbitrary expressions here, including
> +              COMPONENT_REFs with inconvenient types.  For example, they
> +              might have a TYPE_PRECISION of 15 bits.  Ensure we are working
> +              with TYPE_PRECISION that are powers of two and at least byte
> +              sized.  */
> +           && exact_log2 (TYPE_PRECISION (TREE_TYPE (@0))) >= 3
> +           && exact_log2 (TYPE_PRECISION (TREE_TYPE (@1))) >= 3)

Note that this can happen for GIMPLE as well for

struct X { long long a : 33; } x;
long long b;
b = x.a + x.a;

where you'll see 33bit temporaries from the loads of x.a (because C only
promotes bitfields up to size long).

We already have a related pattern that constrains it with similar ideas
and it IMHO does better by not only requiring a power-of-two precision but
a precision matching the machine mode (thus at RTL expansion we don't need
any extra zero/sign extensions).  It's

/* Try to fold (type) X op CST -> (type) (X op ((type-x) CST))
   when profitable.
   For bitwise binary operations apply operand conversions to the
   binary operation result instead of to the operands.  This allows
   to combine successive conversions and bitwise binary operations.
   We combine the above two cases by using a conditional convert.  */
(for bitop (bit_and bit_ior bit_xor)
 (simplify
  (bitop (convert @0) (convert? @1))
  (if (((TREE_CODE (@1) == INTEGER_CST
         && INTEGRAL_TYPE_P (TREE_TYPE (@0))
         && int_fits_type_p (@1, TREE_TYPE (@0)))
        || (GIMPLE && types_compatible_p (TREE_TYPE (@0), TREE_TYPE (@1)))
        || (GENERIC && TREE_TYPE (@0) == TREE_TYPE (@1)))
       /* ???  This transform conflicts with fold-const.c doing
          Convert (T)(x & c) into (T)x & (T)c, if c is an integer
          constants (if x has signed type, the sign bit cannot be set
          in c).  This folds extension into the BIT_AND_EXPR.
          Restrict it to GIMPLE to avoid endless recursions.  */
       && (bitop != BIT_AND_EXPR || GIMPLE)
       && (/* That's a good idea if the conversion widens the operand, thus
              after hoisting the conversion the operation will be narrower.  */
           TYPE_PRECISION (TREE_TYPE (@0)) < TYPE_PRECISION (type)
           /* It's also a good idea if the conversion is to a non-integer
              mode.  */
           || GET_MODE_CLASS (TYPE_MODE (type)) != MODE_INT
           /* Or if the precision of TO is not the same as the precision
              of its mode.  */
           || TYPE_PRECISION (type) != GET_MODE_PRECISION (TYPE_MODE (type))))

so I'd say we simply want to generically constrain it as

  TYPE_PRECISION (TREE_TYPE (@0)) == GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (@0)))

thus we want to perform the arithmetic in a type whose precision matches
its mode.

> +          || (GIMPLE && types_compatible_p (TREE_TYPE (@0), TREE_TYPE (@1))))

hmm, here you compare type of @0 and @1 while for GENERIC you checked
'type' and type of @0.  I believe we want to compare @0 and @1 in both
cases.

> +         && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (type)
> +         && INTEGRAL_TYPE_P (type)
> +         && INTEGRAL_TYPE_P (TREE_TYPE (@0))
> +         && INTEGRAL_TYPE_P (TREE_TYPE (@2))

Better the INTEGRAL_TYPE_P checks as the very first thing.

Ok with these changes.

Thanks,
Richard.

> +         && TYPE_PRECISION (TREE_TYPE (@2)) > TYPE_PRECISION (TREE_TYPE (@0)))
> +      (if (TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0)))
> +        (convert (op @0 @1)))
> +        (with { tree utype = unsigned_type_for (TREE_TYPE (@0)); }
> +         (convert (op (convert:utype @0) (convert:utype @1)))))))
> diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
> index 8d5d3f3..2455c68 100644
> --- a/gcc/testsuite/ChangeLog
> +++ b/gcc/testsuite/ChangeLog
> @@ -1,3 +1,8 @@
> +2015-02-11  Jeff Law  <law@redhat.com>
> +
> +        PR rtl-optimization/47477
> +        * gcc.dg/tree-ssa/pr47477.c: New test.
> +
>  2015-02-11  Jerry DeLisle  <jvdelisle@gcc.gnu.org>
>
>          PR libgfortran/57822
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr47477.c
> b/gcc/testsuite/gcc.dg/tree-ssa/pr47477.c
> new file mode 100644
> index 0000000..104cb6f5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr47477.c
> @@ -0,0 +1,22 @@
> +/* PR tree-optimization/47477 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized -w" } */
> +/* { dg-require-effective-target ilp32 } */
> +
> +typedef int int64_t __attribute__ ((__mode__ (__DI__)));
> +typedef int * intptr_t;
> +
> +typedef struct toto_s *toto_t;
> +toto_t add (toto_t a, toto_t b) {
> +  int64_t tmp = (int64_t)(intptr_t)a + ((int64_t)(intptr_t)b&~1L);
> +  return (toto_t)(intptr_t) tmp;
> +}
> +
> +/* For an ILP32 target there'll be 6 casts when we start, but just 4
> +   if the match.pd pattern is successfully matched.  */
> +/* { dg-final { scan-tree-dump-times "= \\(int\\)" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "= \\(unsigned int\\)" 2 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "= \\(struct toto_s \\*\\)" 1 "optimized" } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
> +
> +
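The effect of the new pattern can be sketched with a small C example (illustrative only, not part of the patch; it assumes a target with 32-bit int and 64-bit long long, and the function name is made up):

  /* Source widens both operands, adds in 64 bits, then truncates the
     result back to 32 bits.  */
  int
  narrow_add (int a, int b)
  {
    long long tmp = (long long) a + (long long) b;
    return (int) tmp;
  }

  /* With the pattern applied, the addition can instead be done in 32
     bits, detouring through unsigned int so no signed overflow is
     introduced; roughly:
       return (int) ((unsigned int) a + (unsigned int) b);  */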
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index c9ac045..1258a0a 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,11 @@
 2015-02-11  Jeff Law  <law@redhat.com>
 
+        PR rtl-optimization/47477
+        * match.pd (convert (plus/minus (convert @0) (convert @1): New
+        simplifier to narrow arithmetic.
+
+2015-02-11  Jeff Law  <law@redhat.com>
+
         PR target/63347
         * haifa-sched.c (prune_ready_list): If we have a SCHED_GROUP_P insn
         that needs to be queued, just queue it for a single cycle.
diff --git a/gcc/match.pd b/gcc/match.pd
index 81c4ee6..27e0dd2 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1018,3 +1018,36 @@ along with GCC; see the file COPYING3.  If not see
    (logs (pows @0 @1))
    (mult @1 (logs @0)))))
 
+/* Narrowing of arithmetic and logical operations.
+
+   These are conceptually similar to the transformations performed for
+   the C/C++ front-ends by shorten_binary_op and shorten_compare.  Long
+   term we want to move all that code out of the front-ends into here.  */
+
+/* If we have a narrowing conversion of an arithmetic operation where
+   both operands are widening conversions from the same type as the outer
+   narrowing conversion.  Then convert the innermost operands to a suitable
+   unsigned type (to avoid introducing undefined behaviour), perform the
+   operation and convert the result to the desired type.  */
+(for op (plus minus)
+  (simplify
+    (convert (op (convert@2 @0) (convert@3 @1)))
+    (if (((GENERIC
+           && TYPE_MAIN_VARIANT (TREE_TYPE (@0)) == TYPE_MAIN_VARIANT (type)
+           /* @0 and @1 could be arbitrary expressions here, including
+              COMPONENT_REFs with inconvenient types.  For example, they
+              might have a TYPE_PRECISION of 15 bits.  Ensure we are working
+              with TYPE_PRECISION that are powers of two and at least byte
+              sized.  */
+           && exact_log2 (TYPE_PRECISION (TREE_TYPE (@0))) >= 3
+           && exact_log2 (TYPE_PRECISION (TREE_TYPE (@1))) >= 3)
+          || (GIMPLE && types_compatible_p (TREE_TYPE (@0), TREE_TYPE (@1))))
+         && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (type)
+         && INTEGRAL_TYPE_P (type)
+         && INTEGRAL_TYPE_P (TREE_TYPE (@0))
+         && INTEGRAL_TYPE_P (TREE_TYPE (@2))
+         && TYPE_PRECISION (TREE_TYPE (@2)) > TYPE_PRECISION (TREE_TYPE (@0)))
+      (if (TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0)))
+        (convert (op @0 @1)))
+        (with { tree utype = unsigned_type_for (TREE_TYPE (@0)); }
+         (convert (op (convert:utype @0) (convert:utype @1)))))))
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 8d5d3f3..2455c68 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2015-02-11  Jeff Law  <law@redhat.com>
+
+        PR rtl-optimization/47477
+        * gcc.dg/tree-ssa/pr47477.c: New test.
+
 2015-02-11  Jerry DeLisle  <jvdelisle@gcc.gnu.org>
 
         PR libgfortran/57822
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr47477.c b/gcc/testsuite/gcc.dg/tree-ssa/pr47477.c
new file mode 100644
index 0000000..104cb6f5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr47477.c
@@ -0,0 +1,22 @@
+/* PR tree-optimization/47477 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized -w" } */
+/* { dg-require-effective-target ilp32 } */
+
+typedef int int64_t __attribute__ ((__mode__ (__DI__)));
+typedef int * intptr_t;
+
+typedef struct toto_s *toto_t;
+toto_t add (toto_t a, toto_t b) {
+  int64_t tmp = (int64_t)(intptr_t)a + ((int64_t)(intptr_t)b&~1L);
+  return (toto_t)(intptr_t) tmp;
+}
+
+/* For an ILP32 target there'll be 6 casts when we start, but just 4
+   if the match.pd pattern is successfully matched.  */
+/* { dg-final { scan-tree-dump-times "= \\(int\\)" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "= \\(unsigned int\\)" 2 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "= \\(struct toto_s \\*\\)" 1 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
+
+
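For reference, a compilable form of the 33-bit bitfield case mentioned in the review (again purely illustrative and not part of the patch or the thread; the wrapper function is added only so the snippet compiles):

  /* The loads of x.a produce temporaries with TYPE_PRECISION 33 but
     DImode, so their precision does not match their mode's precision
     (64).  Narrowing arithmetic into such a type would need extra
     sign/zero extensions at RTL expansion, which is why checking
     TYPE_PRECISION against GET_MODE_PRECISION is preferable to the
     power-of-two test alone.  */
  struct X { long long a : 33; } x;
  long long b;

  void
  add_bitfield (void)
  {
    b = x.a + x.a;
  }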