Message ID | 56788DA6.80904@redhat.com |
---|---|
State | New |
On Tue, Dec 22, 2015 at 12:39 AM, Jeff Law <law@redhat.com> wrote:
>
> As outlined in the BZ, given
>
>   (x & y) & C
>
> reassociation will turn that into
>
>   (x & C) & y
>
> which inhibits bit operations on various targets.  Associating as
>
>   (x & y) & C
>
> is what we want.
>
> My original patch did this for all binary operations.  However, reviewing
> the assembly code & dump files before/after it was pretty clear that doing
> this for arithmetic was losing (mostly in that we were missing CSE
> opportunities).

The missed CSE opportunities really are a missed CSE optimization in
general with respect to two reassociatable chains with some (but not all)
common operands.

> Restricting to logicals means there are far fewer triggering cases, but in
> every one I looked at the resultant code was clearly better.
>
> The testcase is a bit unusual in that it's in tree-ssa.  It checks the
> output of reassociation.  However, on x86 and m68k it also checks the
> resulting assembly code, which I believe is unique in the tree-ssa
> directory.
>
> Bootstrapped and regression tested on x86_64-linux-gnu.  The test has been
> verified on x86_64-linux-gnu, i686-linux-gnu and m68k-linux-gnu.
>
> OK for the trunk?

So you are really trying to make RTL expansion see a different pattern?

It seems to me that in this case TER and get_def_for_expr /
get_gimple_for_ssa_name should be used there.  [Or, as a future
direction, instruction selection should be performed on GIMPLE, with
some match-and-simplify patterns creating IFNs matching optabs if
available.]

Richard.

> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index c7626ff..f5ca85e 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,9 @@
> +2015-12-20  Jeff Law  <law@redhat.com>
> +
> +	PR tree-optimization/64910
> +	* tree-ssa-reassoc.c (swap_ops_for_binary_stmt): Make sure constant
> +	is last for binary bit operations.
> +
>  2015-12-21  Pierre-Marie de Rodat  <derodat@adacore.com>
>
>  	* dwarf2out.c (add_data_member_location_attribute): Do not
> diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
> index bb2ed22..e2f55f3 100644
> --- a/gcc/testsuite/ChangeLog
> +++ b/gcc/testsuite/ChangeLog
> @@ -1,3 +1,8 @@
> +2015-12-20  Jeff Law  <law@redhat.com>
> +
> +	PR tree-optimization/64910
> +	* gcc.dg/tree-ssa/pr64910-2.c: New test.
> +
>  2015-12-21  Claudiu Zissulescu  <claziss@synopsys.com>
>
>  	* gcc.target/arc/builtin_general.c: New test.
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr64910-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr64910-2.c
> new file mode 100644
> index 0000000..2e3d679
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr64910-2.c
> @@ -0,0 +1,85 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-reassoc1" } */
> +
> +/* We want to make sure that we reassociate in a way that has the
> +   constant last.  With the constant last, it's more likely to result
> +   in a bitfield test on targets with such capabilities.  */
> +
> +extern void boo ();
> +
> +int b2b_uc (unsigned char u, unsigned char w)
> +{
> +  if ((u & w) & 0x20)
> +    boo ();
> +}
> +
> +int b2b_us (unsigned short u, unsigned short w)
> +{
> +  if ((u & w) & 0x20)
> +    boo ();
> +}
> +
> +int b2b_ui (unsigned int u, unsigned int w)
> +{
> +  if ((u & w) & 0x20)
> +    boo ();
> +}
> +int b2b_ul (unsigned long u, unsigned long w)
> +{
> +  if ((u & w) & 0x20)
> +    boo ();
> +}
> +int b2b_ull (unsigned long long u, unsigned long long w)
> +{
> +  if ((u & w) & 0x20)
> +    boo ();
> +}
> +
> +int b2b_sc (signed char u, signed char w)
> +{
> +  if ((u & w) & 0x20)
> +    boo ();
> +}
> +
> +int b2b_ss (signed short u, signed short w)
> +{
> +  if ((u & w) & 0x20)
> +    boo ();
> +}
> +
> +int b2b_si (signed int u, signed int w)
> +{
> +  if ((u & w) & 0x20)
> +    boo ();
> +}
> +int b2b_sl (signed long u, signed long w)
> +{
> +  if ((u & w) & 0x20)
> +    boo ();
> +}
> +int b2b_sll (signed long long u, signed long long w)
> +{
> +  if ((u & w) & 0x20)
> +    boo ();
> +}
> +
> +/* The AND of U & W should go into a temporary, which is then ANDed
> +   with the constant.
> +
> +   First verify that we have the right number of ANDs between U and W.  */
> +/* { dg-final { scan-tree-dump-times "\[uw\]_\[0-9\]+.D. \& \[uw\]_\[0-9\]+.D.;" 10 "reassoc1"} } */
> +
> +/* Then verify that we have the right number of ANDs between a temporary
> +   and the constant.  */
> +/* { dg-final { scan-tree-dump-times "_\[0-9]+ \& 32;" 10 "reassoc1"} } */
> +
> +/* Each function has one AND.  It will have either a second AND or TEST.  So
> +   we can count the number of AND and TEST instructions.  They must be 2X
> +   the number of test functions in this file.  */
> +/* { dg-final { scan-assembler-times "and|test" 20 { target { i?86-*-* x86_64-*-*} } } } */
> +
> +/* Similarly on the m68k.  The code for the long long tests is suboptimal,
> +   which we catch via the second pattern and xfail.  */
> +/* { dg-final { scan-assembler-times "and|btst" 20 { target { m68k-*-* } } } } */
> +/* { dg-final { scan-assembler-not "or" { target { m68k-*-* } xfail { *-*-* } } } } */
> diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
> index e54700e..1dcfce3 100644
> --- a/gcc/tree-ssa-reassoc.c
> +++ b/gcc/tree-ssa-reassoc.c
> @@ -3449,6 +3449,26 @@ swap_ops_for_binary_stmt (vec<operand_entry *> ops,
>        oe1->op = temp.op;
>        oe1->rank = temp.rank;
>      }
> +  /* For X OP Y OP C, associate as (X OP Y) OP C, but only for
> +     logicals, which encourages bit operations.  Experimentation
> +     has shown this generally isn't a win for arithmetic.  */
> +  else if (stmt)
> +    {
> +      enum tree_code opcode = gimple_assign_rhs_code (stmt);
> +      if ((opcode == BIT_AND_EXPR
> +	   || opcode == BIT_IOR_EXPR
> +	   || opcode == BIT_XOR_EXPR)
> +	  && TREE_CODE (oe1->op) != INTEGER_CST
> +	  && TREE_CODE (oe2->op) != INTEGER_CST
> +	  && TREE_CODE (oe3->op) == INTEGER_CST)
> +	{
> +	  operand_entry temp = *oe3;
> +	  oe3->op = oe1->op;
> +	  oe3->rank = oe1->rank;
> +	  oe1->op = temp.op;
> +	  oe1->rank = temp.rank;
> +	}
> +    }
>  }
>
>  /* If definition of RHS1 or RHS2 dominates STMT, return the later of those
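To make the shape concrete, here is a minimal standalone C example of the
two associations under discussion (the function names are illustrative,
not from the patch):

  /* Both functions compute the same value.  Only the constant-last form
     leaves a single-bit test visible once the variable AND is done; the
     constant-inner form buries the mask in the middle of the chain.  */
  unsigned char
  constant_last (unsigned char u, unsigned char w)
  {
    return (u & w) & 0x20;	/* (x & y) & C -- what we want.  */
  }

  unsigned char
  constant_inner (unsigned char u, unsigned char w)
  {
    return (u & 0x20) & w;	/* (x & C) & y -- what reassoc emitted.  */
  }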
On 01/08/2016 02:36 AM, Richard Biener wrote:
>> My original patch did this for all binary operations.  However, reviewing
>> the assembly code & dump files before/after it was pretty clear that doing
>> this for arithmetic was losing (mostly in that we were missing CSE
>> opportunities).
>
> The missed CSE opportunities really are a missed CSE optimization in
> general with respect to two reassociatable chains with some (but not all)
> common operands.
Agreed.  It was a choice between expanding the scope and tracking down the
missed CSEs or nailing down 64910 without regressing and putting the
missed CSEs on the TODO list.

How about I build some testcases for the missed CSEs, xfailed with a BZ
too, so that we at least don't lose track of 'em.

> So you are really trying to make RTL expansion see a different pattern?
Yes, absolutely.  The code we generate now does

  (x OP C) OP y

which makes it bloody hard for the RTL/target bits to recognize bit
operations.  After the patch we generate

  (x OP y) OP C

which the RTL & backends can easily see as bitwise operations.

> It seems to me that in this case TER and get_def_for_expr /
> get_gimple_for_ssa_name should be used there.  [Or, as a future
> direction, instruction selection should be performed on GIMPLE, with
> some match-and-simplify patterns creating IFNs matching optabs if
> available.]
That seems backwards to me -- we have all the infrastructure in place in
reassoc.c; we just make a terrible decision for the ordering of the
operands.  In fact, the code is fine going into reassoc; it's reassoc that
mucks things up.

Before reassoc we have:

;;   basic block 2, loop depth 0, count 0, freq 10000, maybe hot
;;    prev block 0, next block 3, flags: (NEW, REACHABLE)
;;    pred:       ENTRY [100.0%]  (FALLTHRU,EXECUTABLE)
  _4 = u_2(D) & w_3(D);
  _7 = _4 & 32;
  if (_7 != 0)
    goto <bb 3>;
  else
    goto <bb 4>;

Which looks just like a bit test.  Sadly reassoc comes around and turns it
into:

;;   basic block 2, loop depth 0, count 0, freq 10000, maybe hot
;;    prev block 0, next block 3, flags: (NEW, REACHABLE)
;;    pred:       ENTRY [100.0%]  (FALLTHRU,EXECUTABLE)
  _8 = w_3(D) & 32;
  _7 = _8 & u_2(D);
  if (_7 != 0)
    goto <bb 3>;
  else
    goto <bb 4>;

And we've lost.  Plus doing it in TER is almost certainly more complex
than getting it right in reassoc to begin with.

Jeff
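A hypothetical example of the missed-CSE situation both are describing --
two reassociatable chains with some, but not all, operands in common:

  /* If reassociation orders the two chains differently, the shared
     subexpression a + b can no longer be reused by CSE; a consistent
     operand ordering keeps it available.  Illustrative only.  */
  int
  shared_chains (int a, int b, int c, int d)
  {
    int x = a + b + c;	/* contains the partial sum a + b */
    int y = a + b + d;	/* reuses a + b only if both chains keep it intact */
    return x ^ y;
  }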
On Sat, Jan 9, 2016 at 12:20 AM, Jeff Law <law@redhat.com> wrote:
> On 01/08/2016 02:36 AM, Richard Biener wrote:
>> [...]
> [...]
>
> Which looks just like a bit test.  Sadly reassoc comes around and turns it
> into:
> [...]
> And we've lost.

Yeah, reassoc is largely about canonicalization.

> Plus doing it in TER is almost certainly more complex than getting it
> right in reassoc to begin with.

I guess canonicalizing differently is ok but you'll still create
((a & b) & 1) & c then if you only change the above place.

So I'm not sure what pattern the backend is looking for?

Richard.
On 01/11/2016 03:32 AM, Richard Biener wrote:
> Yeah, reassoc is largely about canonicalization.
>
>> Plus doing it in TER is almost certainly more complex than getting it
>> right in reassoc to begin with.
>
> I guess canonicalizing differently is ok but you'll still create
> ((a & b) & 1) & c then if you only change the above place.
What's best for that expression would depend on factors like whether or
not the target can exploit ILP.  I.e. (a & b) & (1 & c) exposes more
parallelism, while (((a & b) & c) & 1) is not good for parallelism, but
does expose the bit test.

reassoc currently generates ((a & 1) & b) & c, which is dreadful as
there's no ILP or chance of creating a bit test.  My patch shuffles things
around, but still doesn't expose the ILP or bit test in the 4 operand
case.  Based on the comments in reassoc, it didn't seem like the author
thought anything beyond the 3-operand case was worth handling.  So my
patch just handles the 3-operand case.

> So I'm not sure what pattern the backend is looking for?

It just wants the constant last in the sequence.  That exposes bit clear,
set, flip, test, etc. idioms.

Jeff
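Written out as C, the three associations of the 4-operand case described
above (function names are illustrative):

  /* All three compute the same value; they differ in what they expose.  */
  int
  ilp_form (int a, int b, int c)
  {
    return (a & b) & (1 & c);	/* two independent ANDs: exposes ILP */
  }

  int
  bittest_form (int a, int b, int c)
  {
    return ((a & b) & c) & 1;	/* serial, but ends in a bit test */
  }

  int
  current_form (int a, int b, int c)
  {
    return ((a & 1) & b) & c;	/* reassoc's output: neither ILP nor bit test */
  }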
On Tue, Jan 12, 2016 at 6:10 AM, Jeff Law <law@redhat.com> wrote:
> On 01/11/2016 03:32 AM, Richard Biener wrote:
>> [...]
>> So I'm not sure what pattern the backend is looking for?
>
> It just wants the constant last in the sequence.  That exposes bit clear,
> set, flip, test, etc. idioms.

But those don't feed another bit operation, right?  Thus we'd like to see
((a & b) & c) & 1, not ((a & b) & 1) & c?  It sounds like the instructions
are designed to feed conditionals (aka CC consuming ops)?

Richard.
On 01/12/2016 08:11 AM, Richard Biener wrote:
> On Tue, Jan 12, 2016 at 6:10 AM, Jeff Law <law@redhat.com> wrote:
>> [...]
>> It just wants the constant last in the sequence.  That exposes bit
>> clear, set, flip, test, etc. idioms.
>
> But those don't feed another bit operation, right?  Thus we'd like to see
> ((a & b) & c) & 1, not ((a & b) & 1) & c?  It sounds like the instructions
> are designed to feed conditionals (aka CC consuming ops)?
At the gimple level they could feed a conditional, or be part of a series
of ops on an SSA_NAME that eventually gets stored to memory, etc.  At the
RTL level they'll feed CC consumers and bit manipulations of pseudos or
memory.

For the 3-op case, we always want the constant last.  For the 4-op case
it's less clear.  Though ((a & b) & c) & 1 is certainly better than
((a & b) & 1) & c.

Jeff
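For reference, the constant-last idioms in question, as plain C -- each
maps to a single bit instruction on targets that have them (for example
bclr/bset/bchg/btst on the m68k):

  unsigned int bit_clear (unsigned int x) { return x & ~0x20u; }	/* bclr */
  unsigned int bit_set   (unsigned int x) { return x | 0x20u; }	/* bset */
  unsigned int bit_flip  (unsigned int x) { return x ^ 0x20u; }	/* bchg */
  int          bit_test  (unsigned int x) { return (x & 0x20u) != 0; }	/* btst */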
On Wed, Jan 13, 2016 at 7:39 AM, Jeff Law <law@redhat.com> wrote:
> On 01/12/2016 08:11 AM, Richard Biener wrote:
>> [...]
>> But those don't feed another bit operation, right?  Thus we'd like to
>> see ((a & b) & c) & 1, not ((a & b) & 1) & c?  It sounds like the
>> instructions are designed to feed conditionals (aka CC consuming ops)?
>
> At the gimple level they could feed a conditional, or be part of a series
> of ops on an SSA_NAME that eventually gets stored to memory, etc.  At the
> RTL level they'll feed CC consumers and bit manipulations of pseudos or
> memory.
>
> For the 3-op case, we always want the constant last.  For the 4-op case
> it's less clear.  Though ((a & b) & c) & 1 is certainly better than
> ((a & b) & 1) & c.

Ok, so handling it in swap_ops_for_binary_stmt is merely a convenient
place to special-case bitwise ops.  The "real" fix to the sorting
heuristic would be to sort constants at the opposite end.

That might be too invasive right now but there is another "convenient"
place:

          /* If the operand vector is now empty, all operands were
             consumed by the __builtin_powi optimization.  */
          ...
          else
            {
              machine_mode mode = TYPE_MODE (TREE_TYPE (lhs));
              int ops_num = ops.length ();
              int width = get_reassociation_width (ops_num, rhs_code, mode);
              tree new_lhs = lhs;

              if (dump_file && (dump_flags & TDF_DETAILS))
                fprintf (dump_file,
                         "Width = %d was chosen for reassociation\n", width);

at this point you can check rhs_code and move the (single) constant entry
in ops (if there is any constant entry) from .last () to the beginning.

That'll get the 4 operand case correct as well and properly models
"constant last" for the specified operation kind.

Richard.
On 01/13/2016 05:30 AM, Richard Biener wrote:
> On Wed, Jan 13, 2016 at 7:39 AM, Jeff Law <law@redhat.com> wrote:
>> [...]
>> For the 3-op case, we always want the constant last.  For the 4-op case
>> it's less clear.  Though ((a & b) & c) & 1 is certainly better than
>> ((a & b) & 1) & c.
>
> Ok, so handling it in swap_ops_for_binary_stmt is merely a convenient
> place to special-case bitwise ops.  The "real" fix to the sorting
> heuristic would be to sort constants at the opposite end.
>
> That might be too invasive right now but there is another "convenient"
> place:
> [...]
> at this point you can check rhs_code and move the (single) constant entry
> in ops (if there is any constant entry) from .last () to the beginning.
>
> That'll get the 4 operand case correct as well and properly models
> "constant last" for the specified operation kind.
Resurrecting an old thread...  Just now getting around to flushing this
out of the queue.

To recap, given something like (x & y) & C, reassociation will turn that
into (x & C) & y.  It's functionally equivalent, but it will inhibit
generation of bit test instructions.

I originally hacked up swap_ops_for_binary_stmt.  You requested that
change be made in reassociate_bb so that it would apply to cases where
there are more than 3 args.

So that's what this patch does.  OK for the trunk now?

Bootstrapped and regression tested on x86_64.  Also tested the new
testcase on m68k.

commit c10ae0339674c27c89a1fa1904217a55bf530cb3
Author: Jeff Law <law@torsion.usersys.redhat.com>
Date:   Sun Sep 3 10:42:30 2017 -0400

    2017-09-03  Jeff Law  <law@redhat.com>

            PR tree-optimization/64910
            * tree-ssa-reassoc.c (reassociate_bb): For bitwise binary ops,
            swap the first and last operand if the last is a constant.

            PR tree-optimization/64910
            * gcc.dg/tree-ssa/pr64910-2.c: New test.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 3f632ca31c2..2c9a8c8265a 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,9 @@
+2017-09-03  Jeff Law  <law@redhat.com>
+
+	PR tree-optimization/64910
+	* tree-ssa-reassoc.c (reassociate_bb): For bitwise binary ops,
+	swap the first and last operand if the last is a constant.
+
 2017-09-01  Segher Boessenkool  <segher@kernel.crashing.org>

 	PR rtl-optimization/82024
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 4ead57edfa2..766677c899b 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2017-09-03  Jeff Law  <law@redhat.com>
+
+	PR tree-optimization/64910
+	* gcc.dg/tree-ssa/pr64910-2.c: New test.
+
 2017-09-01  Jakub Jelinek  <jakub@redhat.com>

 	PR target/81766
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr64910-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr64910-2.c
new file mode 100644
index 00000000000..2e3d6790776
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr64910-2.c
@@ -0,0 +1,85 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-reassoc1" } */
+
+/* We want to make sure that we reassociate in a way that has the
+   constant last.  With the constant last, it's more likely to result
+   in a bitfield test on targets with such capabilities.  */
+
+extern void boo ();
+
+int b2b_uc (unsigned char u, unsigned char w)
+{
+  if ((u & w) & 0x20)
+    boo ();
+}
+
+int b2b_us (unsigned short u, unsigned short w)
+{
+  if ((u & w) & 0x20)
+    boo ();
+}
+
+int b2b_ui (unsigned int u, unsigned int w)
+{
+  if ((u & w) & 0x20)
+    boo ();
+}
+int b2b_ul (unsigned long u, unsigned long w)
+{
+  if ((u & w) & 0x20)
+    boo ();
+}
+int b2b_ull (unsigned long long u, unsigned long long w)
+{
+  if ((u & w) & 0x20)
+    boo ();
+}
+
+int b2b_sc (signed char u, signed char w)
+{
+  if ((u & w) & 0x20)
+    boo ();
+}
+
+int b2b_ss (signed short u, signed short w)
+{
+  if ((u & w) & 0x20)
+    boo ();
+}
+
+int b2b_si (signed int u, signed int w)
+{
+  if ((u & w) & 0x20)
+    boo ();
+}
+int b2b_sl (signed long u, signed long w)
+{
+  if ((u & w) & 0x20)
+    boo ();
+}
+int b2b_sll (signed long long u, signed long long w)
+{
+  if ((u & w) & 0x20)
+    boo ();
+}
+
+/* The AND of U & W should go into a temporary, which is then ANDed
+   with the constant.
+
+   First verify that we have the right number of ANDs between U and W.  */
+/* { dg-final { scan-tree-dump-times "\[uw\]_\[0-9\]+.D. \& \[uw\]_\[0-9\]+.D.;" 10 "reassoc1"} } */
+
+/* Then verify that we have the right number of ANDs between a temporary
+   and the constant.  */
+/* { dg-final { scan-tree-dump-times "_\[0-9]+ \& 32;" 10 "reassoc1"} } */
+
+/* Each function has one AND.  It will have either a second AND or TEST.  So
+   we can count the number of AND and TEST instructions.  They must be 2X
+   the number of test functions in this file.  */
+/* { dg-final { scan-assembler-times "and|test" 20 { target { i?86-*-* x86_64-*-*} } } } */
+
+/* Similarly on the m68k.  The code for the long long tests is suboptimal,
+   which we catch via the second pattern and xfail.  */
+/* { dg-final { scan-assembler-times "and|btst" 20 { target { m68k-*-* } } } } */
+/* { dg-final { scan-assembler-not "or" { target { m68k-*-* } xfail { *-*-* } } } } */
diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
index 561acea4dcc..76048196b27 100644
--- a/gcc/tree-ssa-reassoc.c
+++ b/gcc/tree-ssa-reassoc.c
@@ -5762,6 +5762,18 @@ reassociate_bb (basic_block bb)
 		     "Width = %d was chosen for reassociation\n", width);

+
+	  /* For binary bit operations, if the last operand in
+	     OPS is a constant, move it to the front.  This
+	     helps ensure that we generate (X & Y) & C rather
+	     than (X & C) & Y.  The former will often match
+	     a canonical bit test when we get to RTL.  */
+	  if ((rhs_code == BIT_AND_EXPR
+	       || rhs_code == BIT_IOR_EXPR
+	       || rhs_code == BIT_XOR_EXPR)
+	      && TREE_CODE (ops.last ()->op) == INTEGER_CST)
+	    std::swap (*ops[0], *ops[ops_num - 1]);
+
 	  if (width > 1
 	      && ops.length () > 3)
 	    rewrite_expr_tree_parallel (as_a <gassign *> (stmt),
On Sun, Sep 3, 2017 at 4:44 PM, Jeff Law <law@redhat.com> wrote:
> On 01/13/2016 05:30 AM, Richard Biener wrote:
>> [...]
>> That'll get the 4 operand case correct as well and properly models
>> "constant last" for the specified operation kind.
> Resurrecting an old thread...  Just now getting around to flushing this
> out of the queue.
>
> To recap, given something like (x & y) & C, reassociation will turn that
> into (x & C) & y.  It's functionally equivalent, but it will inhibit
> generation of bit test instructions.
>
> I originally hacked up swap_ops_for_binary_stmt.  You requested that
> change be made in reassociate_bb so that it would apply to cases where
> there are more than 3 args.
>
> So that's what this patch does.  OK for the trunk now?

Ok.

Thanks,
Richard.

> Bootstrapped and regression tested on x86_64.  Also tested the new
> testcase on m68k.
> [...]
Hi Jeff,

On 3 September 2017 at 16:44, Jeff Law <law@redhat.com> wrote:
> [...]
> So that's what this patch does.  OK for the trunk now?
>
> Bootstrapped and regression tested on x86_64.  Also tested the new
> testcase on m68k.
> [...]
> +/* Then verify that we have the right number of ANDs between a temporary
> +   and the constant.  */
> +/* { dg-final { scan-tree-dump-times "_\[0-9]+ \& 32;" 10 "reassoc1"} } */

This part of the new testcase fails on aarch64 & arm:
FAIL: gcc.dg/tree-ssa/pr64910-2.c scan-tree-dump-times reassoc1
"_[0-9]+ & 32;" 10

Christophe

> [...]
On 09/05/2017 12:38 AM, Christophe Lyon wrote:
> Hi Jeff,
>
> On 3 September 2017 at 16:44, Jeff Law <law@redhat.com> wrote:
>> [...]
>> +/* Then verify that we have the right number of ANDs between a temporary
>> +   and the constant.  */
>> +/* { dg-final { scan-tree-dump-times "_\[0-9]+ \& 32;" 10 "reassoc1"} } */
>
> This part of the new testcase fails on aarch64 & arm:
> FAIL: gcc.dg/tree-ssa/pr64910-2.c scan-tree-dump-times reassoc1
> "_[0-9]+ & 32;" 10
Thanks.  I'm investigating.

Jeff
On 09/05/2017 11:26 AM, Jeff Law wrote: > On 09/05/2017 12:38 AM, Christophe Lyon wrote: >> Hi Jeff, >> >> >> On 3 September 2017 at 16:44, Jeff Law <law@redhat.com> wrote: >>> On 01/13/2016 05:30 AM, Richard Biener wrote: >>>> On Wed, Jan 13, 2016 at 7:39 AM, Jeff Law <law@redhat.com> wrote: >>>>> On 01/12/2016 08:11 AM, Richard Biener wrote: >>>>>> >>>>>> On Tue, Jan 12, 2016 at 6:10 AM, Jeff Law <law@redhat.com> wrote: >>>>>>> >>>>>>> On 01/11/2016 03:32 AM, Richard Biener wrote: >>>>>>> >>>>>>>> >>>>>>>> Yeah, reassoc is largely about canonicalization. >>>>>>>> >>>>>>>>> Plus doing it in TER is almost certainly more complex than getting it >>>>>>>>> right >>>>>>>>> in reassoc to begin with. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> I guess canonicalizing differently is ok but you'll still create >>>>>>>> ((a & b) & 1) & c then if you only change the above place. >>>>>>> >>>>>>> >>>>>>> What's best for that expression would depend on factors like whether or >>>>>>> not >>>>>>> the target can exploit ILP. ie (a & b) & (1 & c) exposes more >>>>>>> parallelism >>>>>>> while (((a & b) & c) & 1) is not good for parallelism, but does expose >>>>>>> the >>>>>>> bit test. >>>>>>> >>>>>>> reassoc currently generates ((a & 1) & b) & c which is dreadful as >>>>>>> there's >>>>>>> no ILP or chance of creating a bit test. My patch shuffles things >>>>>>> around, >>>>>>> but still doesn't expose the ILP or bit test in the 4 operand case. >>>>>>> Based >>>>>>> on the comments in reassoc, it didn't seem like the author thought >>>>>>> anything >>>>>>> beyond the 3-operand case was worth handling. So my patch just handles >>>>>>> the >>>>>>> 3-operand case. >>>>>>> >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> So I'm not sure what pattern the backend is looking for? >>>>>>> >>>>>>> >>>>>>> It just wants the constant last in the sequence. That exposes bit clear, >>>>>>> set, flip, test, etc idioms. >>>>>> >>>>>> >>>>>> But those don't feed another bit operation, right? Thus we'd like to see >>>>>> ((a & b) & c) & 1, not ((a & b) & 1) & c? It sounds like the instructions >>>>>> are designed to feed conditionals (aka CC consuming ops)? >>>>> >>>>> At the gimple level they could feed a conditional, or be part of a series of >>>>> ops on an SSA_NAME that eventually gets stored to memory, etc. At the RTL >>>>> level they'll feed CC consumers and bit manipulations of pseudos or memory. >>>>> >>>>> For the 3-op case, we always want the constant last. For the 4-op case it's >>>>> less clear. Though ((a & b) & c) & 1 is certainly better than ((a & b) & 1) >>>>> & c. >>>> >>>> Ok, so handling it in swap_ops_for_binary_stmt is merely a convenient place >>>> to special-case bitwise ops. The "real" fix to the sorting heuristic would be >>>> to sort constants at the opposite end. >>>> >>>> That might be too invasive right now but there is another "convenient" place: >>>> >>>> /* If the operand vector is now empty, all operands were >>>> consumed by the __builtin_powi optimization. */ >>>> ... >>>> else >>>> { >>>> machine_mode mode = TYPE_MODE (TREE_TYPE (lhs)); >>>> int ops_num = ops.length (); >>>> int width = get_reassociation_width (ops_num, rhs_code, mode); >>>> tree new_lhs = lhs; >>>> >>>> if (dump_file && (dump_flags & TDF_DETAILS)) >>>> fprintf (dump_file, >>>> "Width = %d was chosen for >>>> reassociation\n", width); >>>> >>>> at this point you can check rhs_code and move the (single) constant >>>> entry in ops (if there is any constant entry) from .last () to the beginning. 
>>>> >>>> That'll get the 4 operand case correct as well and properly models >>>> "constant last" for the specified operation kind. >>> Resurrecting an old thread... Just now getting around to flushing this >>> out of the queue. >>> >>> To recap, given something like (x & y) & C reassociation will turn that >>> into (x & C) & y. It's functionally equivalent, but it will inhibit >>> generation of bit test instructions. >>> >>> I originally hacked up swap_ops_for_binary_stmt. You requested that >>> change be made in reassociate_bb so that it would apply to cases where >>> there are more than 3 args. >>> >>> So that's what this patch does. OK for the trunk now? >>> >>> Bootstrapped and regression tested on x86_64. Also tested the new >>> testcase on m68k. >>> >>> >>> commit c10ae0339674c27c89a1fa1904217a55bf530cb3 >>> Author: Jeff Law <law@torsion.usersys.redhat.com> >>> Date: Sun Sep 3 10:42:30 2017 -0400 >>> >>> 2017-09-03 Jeff Law <law@redhat.com> >>> >>> PR tree-optimization/64910 >>> * tree-ssa-reassoc.c (reassociate_bb): For bitwise binary ops, >>> swap the first and last operand if the last is a constant. >>> >>> PR tree-optimization/64910 >>> * gcc.dg/tree-ssa/pr64910-2.c: New test. >>> >>> diff --git a/gcc/ChangeLog b/gcc/ChangeLog >>> index 3f632ca31c2..2c9a8c8265a 100644 >>> --- a/gcc/ChangeLog >>> +++ b/gcc/ChangeLog >>> @@ -1,3 +1,9 @@ >>> +2017-09-03 Jeff Law <law@redhat.com> >>> + >>> + PR tree-optimization/64910 >>> + * tree-ssa-reassoc.c (reassociate_bb): For bitwise binary ops, >>> + swap the first and last operand if the last is a constant. >>> + >>> 2017-09-01 Segher Boessenkool <segher@kernel.crashing.org> >>> >>> PR rtl-optimization/82024 >>> diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog >>> index 4ead57edfa2..766677c899b 100644 >>> --- a/gcc/testsuite/ChangeLog >>> +++ b/gcc/testsuite/ChangeLog >>> @@ -1,3 +1,8 @@ >>> +2017-09-03 Jeff Law <law@redhat.com> >>> + >>> + PR tree-optimization/64910 >>> + * gcc.dg/tree-ssa/pr64910-2.c: New test. >>> + >>> 2017-09-01 Jakub Jelinek <jakub@redhat.com> >>> >>> PR target/81766 >>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr64910-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr64910-2.c >>> new file mode 100644 >>> index 00000000000..2e3d6790776 >>> --- /dev/null >>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr64910-2.c >>> @@ -0,0 +1,85 @@ >>> +/* { dg-do compile } */ >>> +/* { dg-options "-O2 -fdump-tree-reassoc1" } */ >>> + >>> +/* We want to make sure that we reassociate in a way that has the >>> + constant last. With the constant last, it's more likely to result >>> + in a bitfield test on targets with such capabilities. 
>>> +
>>> +extern void boo ();
>>> +
>>> +int b2b_uc (unsigned char u, unsigned char w)
>>> +{
>>> +  if ((u & w) & 0x20)
>>> +    boo ();
>>> +}
>>> +
>>> +int b2b_us (unsigned short u, unsigned short w)
>>> +{
>>> +  if ((u & w) & 0x20)
>>> +    boo ();
>>> +}
>>> +
>>> +int b2b_ui (unsigned int u, unsigned int w)
>>> +{
>>> +  if ((u & w) & 0x20)
>>> +    boo ();
>>> +}
>>> +int b2b_ul (unsigned long u, unsigned long w)
>>> +{
>>> +  if ((u & w) & 0x20)
>>> +    boo ();
>>> +}
>>> +int b2b_ull (unsigned long long u, unsigned long long w)
>>> +{
>>> +  if ((u & w) & 0x20)
>>> +    boo ();
>>> +}
>>> +
>>> +int b2b_sc (signed char u, signed char w)
>>> +{
>>> +  if ((u & w) & 0x20)
>>> +    boo ();
>>> +}
>>> +
>>> +int b2b_ss (signed short u, signed short w)
>>> +{
>>> +  if ((u & w) & 0x20)
>>> +    boo ();
>>> +}
>>> +
>>> +int b2b_si (signed int u, signed int w)
>>> +{
>>> +  if ((u & w) & 0x20)
>>> +    boo ();
>>> +}
>>> +int b2b_sl (signed long u, signed long w)
>>> +{
>>> +  if ((u & w) & 0x20)
>>> +    boo ();
>>> +}
>>> +int b2b_sll (signed long long u, signed long long w)
>>> +{
>>> +  if ((u & w) & 0x20)
>>> +    boo ();
>>> +}
>>> +
>>> +/* The AND of U & W should go into a temporary, which is then ANDed
>>> +   with the constant.
>>> +
>>> +   First verify that we have the right number of ANDs between U and W.  */
>>> +/* { dg-final { scan-tree-dump-times "\[uw\]_\[0-9\]+.D. \& \[uw\]_\[0-9\]+.D.;" 10 "reassoc1"} } */
>>> +
>>> +/* Then verify that we have the right number of ANDs between a temporary
>>> +   and the constant.  */
>>> +/* { dg-final { scan-tree-dump-times "_\[0-9]+ \& 32;" 10 "reassoc1"} } */
>>
>> This part of the new testcase fails on aarch64 & arm:
>> FAIL: gcc.dg/tree-ssa/pr64910-2.c scan-tree-dump-times reassoc1
>> "_[0-9]+ & 32;" 10
> Thanks.  I'm investigating.
So when there are precisely 2 operands, the most recent change will
create non-canonical gimple.  Specifically we can end up with

  cst OP ssa_name

on some targets (like aarch64, arm, likely others).

Nothing actually cared -- the form got rewritten at some point and
aarch64 at least generates the desired assembly code.  But since the
test was looking at the .reassoc dump, it failed.

The fix is easy.  Just restrict the operand swapping to cases where
there are 3 or more ops.

Bootstrapped and regression tested on aarch64.  Committing.

Jeff

commit 077cf883c3e983369e68b27307ec4944c7097393
Author: law <law@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Wed Sep 6 05:20:25 2017 +0000

            PR tree-optimization/64910
            * tree-ssa-reassoc.c (reassociate_bb): Restrict last change to
            cases where we have 3 or more operands.

    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@251751 138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 53637905722..1b3bddbb806 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,9 @@
+2017-09-05  Jeff Law  <law@redhat.com>
+
+	PR tree-optimization/64910
+	* tree-ssa-reassoc.c (reassociate_bb): Restrict last change to
+	cases where we have 3 or more operands.
+
 2017-09-05  Jakub Jelinek  <jakub@redhat.com>
 
 	PR middle-end/81768
diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
index 76048196b27..2fb6aef51d7 100644
--- a/gcc/tree-ssa-reassoc.c
+++ b/gcc/tree-ssa-reassoc.c
@@ -5763,14 +5763,15 @@ reassociate_bb (basic_block bb)
 		     "Width = %d was chosen for reassociation\n", width);
 
 
-	  /* For binary bit operations, if the last operand in
-	     OPS is a constant, move it to the front.  This
-	     helps ensure that we generate (X & Y) & C rather
-	     than (X & C) & Y.  The former will often match
-	     a canonical bit test when we get to RTL.  */
-	  if ((rhs_code == BIT_AND_EXPR
-	       || rhs_code == BIT_IOR_EXPR
-	       || rhs_code == BIT_XOR_EXPR)
+	  /* For binary bit operations, if there are at least 3
+	     operands and the last operand in OPS is a constant,
+	     move it to the front.  This helps ensure that we generate
+	     (X & Y) & C rather than (X & C) & Y.  The former will
+	     often match a canonical bit test when we get to RTL.  */
+	  if (ops.length () != 2
+	      && (rhs_code == BIT_AND_EXPR
+		  || rhs_code == BIT_IOR_EXPR
+		  || rhs_code == BIT_XOR_EXPR)
 	      && TREE_CODE (ops.last ()->op) == INTEGER_CST)
 	    std::swap (*ops[0], *ops[ops_num - 1]);
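
To make the effect of the reassociate_bb change concrete, here is a
minimal 4-operand example; the function and the dump shape below are the
editor's illustrative assumptions, not part of the patch or testsuite:

    /* Illustrative sketch: compiled with -O2 -fdump-tree-reassoc1, the
       reassoc1 dump would be expected to look roughly like

         _1 = a & b;
         _2 = _1 & c;
         _3 = _2 & 32;

       i.e. ((a & b) & c) & 0x20 rather than ((a & 0x20) & b) & c, so the
       final AND against the constant feeds the comparison directly and
       can become a bit-test instruction at RTL expansion.  */
    int
    f (unsigned int a, unsigned int b, unsigned int c)
    {
      return (a & b & c & 0x20) != 0;
    }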
On Tue, Sep 05, 2017 at 11:21:48PM -0600, Jeff Law wrote:
> --- a/gcc/tree-ssa-reassoc.c
> +++ b/gcc/tree-ssa-reassoc.c
> @@ -5763,14 +5763,15 @@ reassociate_bb (basic_block bb)
>  		     "Width = %d was chosen for reassociation\n", width);
>  
>  
> -	  /* For binary bit operations, if the last operand in
> -	     OPS is a constant, move it to the front.  This
> -	     helps ensure that we generate (X & Y) & C rather
> -	     than (X & C) & Y.  The former will often match
> -	     a canonical bit test when we get to RTL.  */
> -	  if ((rhs_code == BIT_AND_EXPR
> -	       || rhs_code == BIT_IOR_EXPR
> -	       || rhs_code == BIT_XOR_EXPR)
> +	  /* For binary bit operations, if there are at least 3
> +	     operands and the last operand in OPS is a constant,
> +	     move it to the front.  This helps ensure that we generate
> +	     (X & Y) & C rather than (X & C) & Y.  The former will
> +	     often match a canonical bit test when we get to RTL.  */
> +	  if (ops.length () != 2

So wouldn't it be clearer to write ops.length () > 2?  The
  if (ops.length () == 0)
and
  else if (ops.length () == 1)
cases come earlier, so it is the same thing, but it might help the reader.

> +	      && (rhs_code == BIT_AND_EXPR
> +		  || rhs_code == BIT_IOR_EXPR
> +		  || rhs_code == BIT_XOR_EXPR)
>  	      && TREE_CODE (ops.last ()->op) == INTEGER_CST)
>  	    std::swap (*ops[0], *ops[ops_num - 1]);

Don't you then want to put the constant as the second operand rather than
the first, i.e. swap with *ops[1]?  And doesn't swap_ops_for_binary_stmt
undo it again?

	Jakub
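
For context on the canonicalization point raised above, a hedged sketch
of the two-operand failure mode (the dump text is an illustration, not
copied from an actual run): GIMPLE canonically keeps a constant as the
second operand of a commutative binary statement, and the unrestricted
swap violated that when only one SSA name and one constant reached
reassociation:

    /* With narrow types, the (u & w) AND may be computed in a separate
       statement on some targets, so the final statement's operand
       vector is just { _1, 32 }.  Swapping there produced the
       non-canonical

         _2 = 32 & _1;

       instead of the canonical

         _2 = _1 & 32;

       which is the form the testcase's scan pattern expects.  */
    int
    h (unsigned char u, unsigned char w)
    {
      return (u & w) & 0x20;
    }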
On 09/06/2017 03:26 AM, Jakub Jelinek wrote:
> On Tue, Sep 05, 2017 at 11:21:48PM -0600, Jeff Law wrote:
>> --- a/gcc/tree-ssa-reassoc.c
>> +++ b/gcc/tree-ssa-reassoc.c
>> @@ -5763,14 +5763,15 @@ reassociate_bb (basic_block bb)
>>  		     "Width = %d was chosen for reassociation\n", width);
>>  
>>  
>> -	  /* For binary bit operations, if the last operand in
>> -	     OPS is a constant, move it to the front.  This
>> -	     helps ensure that we generate (X & Y) & C rather
>> -	     than (X & C) & Y.  The former will often match
>> -	     a canonical bit test when we get to RTL.  */
>> -	  if ((rhs_code == BIT_AND_EXPR
>> -	       || rhs_code == BIT_IOR_EXPR
>> -	       || rhs_code == BIT_XOR_EXPR)
>> +	  /* For binary bit operations, if there are at least 3
>> +	     operands and the last operand in OPS is a constant,
>> +	     move it to the front.  This helps ensure that we generate
>> +	     (X & Y) & C rather than (X & C) & Y.  The former will
>> +	     often match a canonical bit test when we get to RTL.  */
>> +	  if (ops.length () != 2
>
> So wouldn't it be clearer to write ops.length () > 2?  The
>   if (ops.length () == 0)
> and
>   else if (ops.length () == 1)
> cases come earlier, so it is the same thing, but it might help the reader.
Agreed.  It's a trivial change, but I'll spin it with some other testing
that's in progress.

>
>> +	      && (rhs_code == BIT_AND_EXPR
>> +		  || rhs_code == BIT_IOR_EXPR
>> +		  || rhs_code == BIT_XOR_EXPR)
>>  	      && TREE_CODE (ops.last ()->op) == INTEGER_CST)
>>  	    std::swap (*ops[0], *ops[ops_num - 1]);
>
> Don't you then want to put the constant as the second operand rather
> than the first, i.e. swap with *ops[1]?
I don't think it matters in practice.  I recall being concerned that we
could end up with non-canonical gimple, but we don't AFAICT.

> And doesn't swap_ops_for_binary_stmt undo it again?
No.  It doesn't.  The rank checks get in the way IIRC.  FWIW that's where
I'd originally put the transformation, but Richi wanted it in the caller.

Jeff
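
Putting the discussion together, the guard as finally agreed would read
as follows.  This is a sketch only, combining the committed code with
Jakub's clearer "> 2" spelling; the thread does not show whether a
follow-up commit used exactly this form:

	  /* For binary bit operations, if there are at least 3
	     operands and the last operand in OPS is a constant,
	     move it to the front.  This helps ensure that we generate
	     (X & Y) & C rather than (X & C) & Y.  The former will
	     often match a canonical bit test when we get to RTL.  */
	  if (ops.length () > 2
	      && (rhs_code == BIT_AND_EXPR
		  || rhs_code == BIT_IOR_EXPR
		  || rhs_code == BIT_XOR_EXPR)
	      && TREE_CODE (ops.last ()->op) == INTEGER_CST)
	    std::swap (*ops[0], *ops[ops_num - 1]);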