Message ID | 053901d88209$015b0b10$04112130$@nextmovesoftware.com |
---|---|
State | New |
Series | [rs6000] PR target/105991: Recognize PLUS and XOR forms of rldimi. |
Hi!

On Fri, Jun 17, 2022 at 07:13:37AM +0200, Roger Sayle wrote:
> This patch addresses PR target/105991 where a change to prefer representing
> shifts and adds at the tree-level as multiplications, causes problems for
> the rldimi patterns in the powerpc backend.

Because it now is converted to different RTL at expand time.  Which the
generic expand code does some premature optimisation on, which makes us
end up with the addition instead of data manipulation insns.  Oh well.

> The issue is that rs6000.md
> models this pattern using IOR, and some variants that have the equivalent
> PLUS or XOR in the RTL fail to match some *rotl<mode>4_insert patterns.
> This is fixed in this patch by adding a define_insn_and_split to locally
> canonicalize the PLUS and XOR forms to the backend's preferred IOR form.

Okay.

> An alternative fix might be for the RTL optimizers to define a canonical
> form for these plus_xor_ior equivalent expressions, but the logical
> choice might be plus (which may appear in an addressing mode), and such
> a change may require a number of tweaks to update various backends
> (i.e. a more intrusive change than the one proposed here).

This does not make sense in an address at all, thankfully :-)

The only sane canonicalisation for this is something like VEC_DUPLICATE
but for submodes of integer modes, instead of the component mode of a
vector mode.  I don't feel this is worth trying to handle in general
though.

> Many thanks for Marek Polacek for bootstrapping and regression testing
> this change without problems.

You have an account on the cfarm, it is quick and easy to test there :-)
I recommend gcc135, a 32 core p9, with oodles of disk space :-)

> +; Canonicalize the PLUS and XOR forms to IOR for rotl<mode>3_insert_3
> +(define_code_iterator plus_xor [plus xor])
> +
> +(define_insn_and_split "*rotl<mode>3_insert_3_<code>"
> +  [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
> +        (plus_xor:GPR
> +          (and:GPR (match_operand:GPR 3 "gpc_reg_operand" "0")
> +                   (match_operand:GPR 4 "const_int_operand" "n"))
> +          (ashift:GPR (match_operand:GPR 1 "gpc_reg_operand" "r")
> +                      (match_operand:SI 2 "const_int_operand" "n"))))]
> +  "INTVAL (operands[2]) == exact_log2 (UINTVAL (operands[4]) + 1)"

exact_log2 returns -1 if its argument is not a power of two.  Please
test it is > 0 explicitly here: I don't think this splitter will work
correctly otherwise.  There shouldn't really be a shift by 0 ever of
course, but it isn't invalid RTL.

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr105991.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +unsigned long long
> +foo (unsigned long long value)
> +{
> +  value &= 0xffffffff;
> +  value |= value << 32;
> +  return value;
> +}
> +/* { dg-final { scan-assembler "rldimi" } } */

Write
/* { dg-final { scan-assembler {\mrldimi\M} } } */
please.

Okay for trunk with those changes.  Thanks!


Segher
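The point about exact_log2 and the PLUS/XOR/IOR equivalence can be checked outside the compiler. The following is a minimal sketch in plain C, not GCC code: `exact_log2_sketch` is a stand-in written for this illustration, and `insert_condition_ok` and `forms_agree` are hypothetical helper names. It assumes the requested fix amounts to testing the log2 result is greater than zero before comparing it to the shift count.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for exact_log2 (an assumption, not GCC's implementation):
   returns log2 (x) when x is a power of two, and -1 otherwise.  */
static int
exact_log2_sketch (uint64_t x)
{
  if (x == 0 || (x & (x - 1)) != 0)
    return -1;
  int n = 0;
  while ((x >>= 1) != 0)
    n++;
  return n;
}

/* Hypothetical helper mirroring the insn condition with the explicit
   "> 0" test asked for in the review.  MASK plays the role of
   operands[4] and SHIFT the role of operands[2].  */
static int
insert_condition_ok (int64_t shift, uint64_t mask)
{
  int log = exact_log2_sketch (mask + 1);
  return log > 0 && shift == log;
}

/* When MASK covers exactly the low SHIFT bits, (a & mask) and
   (b << shift) have no set bits in common, so PLUS, XOR and IOR all
   produce the same value.  */
static int
forms_agree (uint64_t a, uint64_t b, uint64_t mask, int shift)
{
  uint64_t lo = a & mask;
  uint64_t hi = b << shift;
  assert ((lo & hi) == 0);
  return (lo + hi) == (lo | hi) && (lo ^ hi) == (lo | hi);
}
```

With a mask of 0xffffffff and a shift of 32 the condition holds; with a mask that is not of the form 2^n - 1, or with a mask of 0 (which would pair with a shift by 0), the guarded condition rejects the match.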
on 2022/6/21 06:10, Segher Boessenkool wrote:
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/pr105991.c
>> @@ -0,0 +1,11 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2" } */
>> +unsigned long long
>> +foo (unsigned long long value)
>> +{
>> +  value &= 0xffffffff;
>> +  value |= value << 32;
>> +  return value;
>> +}
>> +/* { dg-final { scan-assembler "rldimi" } } */
>
> Write
> /* { dg-final { scan-assembler {\mrldimi\M} } } */
> please.
>

This case also needs effective-target keyword lp64,
that is /* { dg-require-effective-target lp64 } */

since with -m32, it gets:
        mr 3,4

with -m32 -mpowerpc64, it gets:
        rldicl 3,4,0,32

BR,
Kewen
On Tue, Jun 21, 2022 at 10:03:18AM +0800, Kewen.Lin wrote:
> This case also needs effective-target keyword lp64,
> that is /* { dg-require-effective-target lp64 } */

Good point.  Yes.  It would be nice to have just has_arch_ppc64 really.

> since with -m32, it gets:
>         mr 3,4
>
> with -m32 -mpowerpc64, it gets:
>         rldicl 3,4,0,32

Yes, and that is not lp64 -- both longs and pointers are 32 bits when
you have -m32.

You get different code because parameter passing is different.  The
usual way to sidestep is to have the data in memory instead:

unsigned long long x;
void goo (void)
{
  unsigned long long value = x;
  value &= 0xffffffff;
  value |= value << 32;
  x = value;
}

but then the compiler tries to be smart and do code like

        addis 10,2,.LANCHOR0+4@toc@ha
        lwz 10,.LANCHOR0+4@toc@l(10)
        sldi 9,10,32
        add 9,9,10
        addis 10,2,.LANCHOR0@toc@ha
        std 9,.LANCHOR0@toc@l(10)
        blr

for -m64, and

        lis 9,x@ha
        la 10,x@l(9)
        lwz 10,4(10)
        stw 10,x@l(9)
        blr

for just -m32, but

        lis 10,x@ha
        la 9,x@l(10)
        la 10,x@l(10)
        ld 9,0(9)
        rldicl 8,9,0,32
        sldi 9,9,32
        add 9,9,8
        std 9,0(10)
        blr

for -m32 -mpowerpc64 (note it has not managed to do the splitter here;
it gets

Failed to match this instruction:
(set (reg:DI 128)
    (plus:DI (ashift:DI (reg/v:DI 117 [ value ])
            (const_int 32 [0x20]))
        (zero_extend:DI (subreg:SI (reg/v:DI 117 [ value ]) 4))))

and then

Failed to match this instruction:
(set (reg:DI 128)
    (plus:DI (and:DI (reg/v:DI 117 [ value ])
            (const_int 4294967295 [0xffffffff]))
        (ashift:DI (reg/v:DI 117 [ value ])
            (const_int 32 [0x20]))))

but that is not enough).

So let's just do lp64, at least for now :-)


Segher
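Putting the two review comments on the testcase together, the file might end up looking roughly like the sketch below. This is an illustration assembled from the directives quoted in the thread, not the committed version: the placement of the lp64 directive is an assumption. The `\m` and `\M` anchors are Tcl word-boundary escapes, so the scan matches the `rldimi` mnemonic as a whole word only.

```c
/* { dg-do compile } */
/* { dg-options "-O2" } */
/* { dg-require-effective-target lp64 } */
unsigned long long
foo (unsigned long long value)
{
  value &= 0xffffffff;
  value |= value << 32;
  return value;
}
/* { dg-final { scan-assembler {\mrldimi\M} } } */
```

Independently of the instruction selected, `foo` copies the low 32 bits of its argument into the high 32 bits, which is the insert operation rldimi is expected to implement here.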
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index c55ee7e..695ec33 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -4188,6 +4188,23 @@
 }
   [(set_attr "type" "insert")])
 
+; Canonicalize the PLUS and XOR forms to IOR for rotl<mode>3_insert_3
+(define_code_iterator plus_xor [plus xor])
+
+(define_insn_and_split "*rotl<mode>3_insert_3_<code>"
+  [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
+        (plus_xor:GPR
+          (and:GPR (match_operand:GPR 3 "gpc_reg_operand" "0")
+                   (match_operand:GPR 4 "const_int_operand" "n"))
+          (ashift:GPR (match_operand:GPR 1 "gpc_reg_operand" "r")
+                      (match_operand:SI 2 "const_int_operand" "n"))))]
+  "INTVAL (operands[2]) == exact_log2 (UINTVAL (operands[4]) + 1)"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+        (ior:GPR (and:GPR (match_dup 3) (match_dup 4))
+                 (ashift:GPR (match_dup 1) (match_dup 2))))])
+
 (define_code_iterator plus_ior_xor [plus ior xor])
 
 (define_split
diff --git a/gcc/testsuite/gcc.target/powerpc/pr105991.c b/gcc/testsuite/gcc.target/powerpc/pr105991.c
new file mode 100644
index 0000000..e853e53
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr105991.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+unsigned long long
+foo (unsigned long long value)
+{
+  value &= 0xffffffff;
+  value |= value << 32;
+  return value;
+}
+/* { dg-final { scan-assembler "rldimi" } } */
+