Message ID | 4c85b45c-fbaa-5509-2344-91113478e2d1@linux.ibm.com |
---|---|
State | New |
Headers | show |
Series | rs6000: Use rldimi for vec init instead of shift + ior | expand |
Hi, I'd like to gentle ping this: https://gcc.gnu.org/pipermail/gcc-patches/2021-February/564758.html BR, Kewen on 2021/2/3 下午2:37, Kewen.Lin via Gcc-patches wrote: > Hi, > > This patch merges the previously approved one[1] and its relied patch > made by Segher here[2], it's to make unsigned int vector init go with > rldimi to merge two integers instead of shift and ior. > > Segher's patch in [2] is required to make the test case pass, > otherwise the costing for new pseudo-to-pseudo copies and the folding > with nonzero_bits in combine will make the rl*imi pattern become > compact and split into ior and shift unexpectedly. > > The commit log of Segher's patch describes it in more details: > > "An rl*imi is usually written as an IOR of an ASHIFT or similar, and an > AND of a register with a constant mask. In some cases combine knows > that that AND doesn't do anything (because all zero bits in that mask > correspond to bits known to be already zero), and then no pattern > matches. This patch adds a define_split for such cases. It uses > nonzero_bits in the condition of the splitter, but does not need it > afterwards for the instruction to be recognised. This is necessary > because later passes can see fewer nonzero_bits. > > Because it is a splitter, combine will only use it when starting with > three insns (or more), even though the result is just one. This isn't > a huge problem in practice, but some possible combinations still won't > happen." > > Bootstrapped/regtested on powerpc64le-linux-gnu P9 and > powerpc64-linux-gnu P8, also SPEC2017 build/run passed on P9. > > Is it ok for trunk? > > BR, > Kewen > > [1] https://gcc.gnu.org/pipermail/gcc-patches/2020-December/562407.html > [2] https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563526.html > > > gcc/ChangeLog: > > 2020-02-03 Segher Boessenkool <segher@kernel.crashing.org> > Kewen Lin <linkw@gcc.gnu.org> > > * config/rs6000/rs6000.md (*rotl<mode>3_insert_3): Renamed to... > (rotl<mode>3_insert_3): ...this. > (plus_ior_xor): New code_iterator. > (define_split for GPR rl*imi): New splitter. > * config/rs6000/vsx.md (vsx_init_v4si): Use gen_rotldi3_insert_3 > for integer merging. > > gcc/testsuite/ChangeLog: > > * gcc.target/powerpc/vec-init-10.c: New test. > > ----- >
On Wed, 2021-02-03 at 14:37 +0800, Kewen.Lin via Gcc-patches wrote: > Hi, > Hi, > This patch merges the previously approved one[1] and its relied patch I don't see the review for [1] in the archives. > made by Segher here[2], it's to make unsigned int vector init go with > rldimi to merge two integers instead of shift and ior. > > Segher's patch in [2] is required to make the test case pass, > otherwise the costing for new pseudo-to-pseudo copies and the folding > with nonzero_bits in combine will make the rl*imi pattern become > compact and split into ior and shift unexpectedly. > > The commit log of Segher's patch describes it in more details: > > "An rl*imi is usually written as an IOR of an ASHIFT or similar, and an > AND of a register with a constant mask. In some cases combine knows > that that AND doesn't do anything (because all zero bits in that mask > correspond to bits known to be already zero), and then no pattern > matches. This patch adds a define_split for such cases. It uses > nonzero_bits in the condition of the splitter, but does not need it > afterwards for the instruction to be recognised. This is necessary > because later passes can see fewer nonzero_bits. > > Because it is a splitter, combine will only use it when starting with > three insns (or more), even though the result is just one. This isn't > a huge problem in practice, but some possible combinations still won't > happen." > > Bootstrapped/regtested on powerpc64le-linux-gnu P9 and > powerpc64-linux-gnu P8, also SPEC2017 build/run passed on P9. > > Is it ok for trunk? > > BR, > Kewen > > [1] https://gcc.gnu.org/pipermail/gcc-patches/2020-December/562407.html > [2] https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563526.html > > > gcc/ChangeLog: > > 2020-02-03 Segher Boessenkool <segher@kernel.crashing.org> > Kewen Lin <linkw@gcc.gnu.org> > > * config/rs6000/rs6000.md (*rotl<mode>3_insert_3): Renamed to... > (rotl<mode>3_insert_3): ...this. ok > (plus_ior_xor): New code_iterator. > (define_split for GPR rl*imi): New splitter. As described above, these two changes appear to identical to what was posted by Segher in [1]. ( > * config/rs6000/vsx.md (vsx_init_v4si): Use gen_rotldi3_insert_3 > for integer merging. > > gcc/testsuite/ChangeLog: > > * gcc.target/powerpc/vec-init-10.c: New test. +/* { dg-final { scan-assembler-not "or" } } */ As is, it looks OK to me. Per other reviews i've gotten, you may get a request to wrap the "or" with \m \M . Some existing cases with scan-assembler-not have a leading whitespace/tab qualifier too. i.e. /* { dg-final { scan-assembler-not "\[ \t\]or " } } */ Thanks, -Will > > ----- >
On Thu, Feb 18, 2021 at 11:58:59AM -0600, will schmidt wrote: > On Wed, 2021-02-03 at 14:37 +0800, Kewen.Lin via Gcc-patches wrote: > > This patch merges the previously approved one[1] and its relied patch > > I don't see the review for [1] in the archives. I said: Can you make a different testcase perhaps? This patch looks fine otherwise. > > made by Segher here[2], it's to make unsigned int vector init go with > > rldimi to merge two integers instead of shift and ior. > > > > Segher's patch in [2] is required to make the test case pass, > > otherwise the costing for new pseudo-to-pseudo copies and the folding > > with nonzero_bits in combine will make the rl*imi pattern become > > compact and split into ior and shift unexpectedly. > > > > The commit log of Segher's patch describes it in more details: > > > > "An rl*imi is usually written as an IOR of an ASHIFT or similar, and an > > AND of a register with a constant mask. In some cases combine knows > > that that AND doesn't do anything (because all zero bits in that mask > > correspond to bits known to be already zero), and then no pattern > > matches. This patch adds a define_split for such cases. It uses > > nonzero_bits in the condition of the splitter, but does not need it > > afterwards for the instruction to be recognised. This is necessary > > because later passes can see fewer nonzero_bits. > > > > Because it is a splitter, combine will only use it when starting with > > three insns (or more), even though the result is just one. This isn't > > a huge problem in practice, but some possible combinations still won't > > happen." > > > > Bootstrapped/regtested on powerpc64le-linux-gnu P9 and > > powerpc64-linux-gnu P8, also SPEC2017 build/run passed on P9. > > > > Is it ok for trunk? > > > > BR, > > Kewen > > > > [1] https://gcc.gnu.org/pipermail/gcc-patches/2020-December/562407.html > > [2] https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563526.html > > > > > > gcc/ChangeLog: > > > > 2020-02-03 Segher Boessenkool <segher@kernel.crashing.org> > > Kewen Lin <linkw@gcc.gnu.org> > > > > * config/rs6000/rs6000.md (*rotl<mode>3_insert_3): Renamed to... > > (rotl<mode>3_insert_3): ...this. > > ok > > > (plus_ior_xor): New code_iterator. > > (define_split for GPR rl*imi): New splitter. > > As described above, these two changes appear to identical to what was > posted by Segher in [1]. But with testcase :-) > +/* { dg-final { scan-assembler-not "or" } } */ > > As is, it looks OK to me. Per other reviews i've gotten, you may > get a request to wrap the "or" with \m \M . Right, because it is very easy to end up with "or" as a substring of something else. > Some existing cases with > scan-assembler-not have a leading whitespace/tab qualifier too. > i.e. > /* { dg-final { scan-assembler-not "\[ \t\]or " } } */ Yeah, but the \m in {\mor\M} will do the same already. Thanks, Segher
Hi! On Wed, Feb 03, 2021 at 02:37:05PM +0800, Kewen.Lin wrote: > This patch merges the previously approved one[1] and its relied patch > made by Segher here[2], it's to make unsigned int vector init go with > rldimi to merge two integers instead of shift and ior. > gcc/ChangeLog: > > 2020-02-03 Segher Boessenkool <segher@kernel.crashing.org> > Kewen Lin <linkw@gcc.gnu.org> > > * config/rs6000/rs6000.md (*rotl<mode>3_insert_3): Renamed to... > (rotl<mode>3_insert_3): ...this. > (plus_ior_xor): New code_iterator. > (define_split for GPR rl*imi): New splitter. > * config/rs6000/vsx.md (vsx_init_v4si): Use gen_rotldi3_insert_3 > for integer merging. > > gcc/testsuite/ChangeLog: > > * gcc.target/powerpc/vec-init-10.c: New test. Is there a PR you should mention here? > +/* { dg-final { scan-assembler-not "sldi" } } */ > +/* { dg-final { scan-assembler-not "or" } } */ > +/* { dg-final { scan-assembler-times {\mrldimi\M} 4 } } */ /* { dg-final { scan-assembler-not {\msldi\M} } } */ /* { dg-final { scan-assembler-not {\mor\M} } } */ /* { dg-final { scan-assembler-times {\mrldimi\M} 4 } } */ Okay for trunk with that tweak. Thanks! Segher
Hi Segher & Will, Thanks for your review comments! on 2021/2/19 上午2:33, Segher Boessenkool wrote: > Hi! > > On Wed, Feb 03, 2021 at 02:37:05PM +0800, Kewen.Lin wrote: >> This patch merges the previously approved one[1] and its relied patch >> made by Segher here[2], it's to make unsigned int vector init go with >> rldimi to merge two integers instead of shift and ior. > >> gcc/ChangeLog: >> >> 2020-02-03 Segher Boessenkool <segher@kernel.crashing.org> >> Kewen Lin <linkw@gcc.gnu.org> >> >> * config/rs6000/rs6000.md (*rotl<mode>3_insert_3): Renamed to... >> (rotl<mode>3_insert_3): ...this. >> (plus_ior_xor): New code_iterator. >> (define_split for GPR rl*imi): New splitter. >> * config/rs6000/vsx.md (vsx_init_v4si): Use gen_rotldi3_insert_3 >> for integer merging. >> >> gcc/testsuite/ChangeLog: >> >> * gcc.target/powerpc/vec-init-10.c: New test. > > Is there a PR you should mention here? I thought this is trivial so didn't file one upstream PR. I did a searching in upstream bugzilla, PR93453 looks similar but different. I confirmed that the current patch can't make it pass (just two insns instead of three insns). Do you happen to have one related in mind? If no, I will commit it without PR. > >> +/* { dg-final { scan-assembler-not "sldi" } } */ >> +/* { dg-final { scan-assembler-not "or" } } */ >> +/* { dg-final { scan-assembler-times {\mrldimi\M} 4 } } */ > > /* { dg-final { scan-assembler-not {\msldi\M} } } */ > /* { dg-final { scan-assembler-not {\mor\M} } } */ > /* { dg-final { scan-assembler-times {\mrldimi\M} 4 } } */ > > Okay for trunk with that tweak. Thanks! > Will fix it, thanks! BR, Kewen
Hi! On Fri, Feb 19, 2021 at 11:06:16AM +0800, Kewen.Lin wrote: > on 2021/2/19 上午2:33, Segher Boessenkool wrote: > > Is there a PR you should mention here? > > I thought this is trivial so didn't file one upstream PR. I did a > searching in upstream bugzilla, PR93453 looks similar but different. > I confirmed that the current patch can't make it pass (just two insns > instead of three insns). > > Do you happen to have one related in mind? If no, I will commit it > without PR. That is fine. I just asked if perhaps you forgot to mention a PR; I do that all the time myself! Filing a new PR if a patch is ready and approved is just bureaucracy, so let's not (it is useful if it needs backporting though, to have a place to track that). Thanks, Segher
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index bb9fb42f82a..dca311ebc80 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -4067,7 +4067,7 @@ [(set_attr "type" "insert")]) ; There are also some forms without one of the ANDs. -(define_insn "*rotl<mode>3_insert_3" +(define_insn "rotl<mode>3_insert_3" [(set (match_operand:GPR 0 "gpc_reg_operand" "=r") (ior:GPR (and:GPR (match_operand:GPR 3 "gpc_reg_operand" "0") (match_operand:GPR 4 "const_int_operand" "n")) @@ -4082,6 +4082,24 @@ } [(set_attr "type" "insert")]) +(define_code_iterator plus_ior_xor [plus ior xor]) + +(define_split + [(set (match_operand:GPR 0 "gpc_reg_operand") + (plus_ior_xor:GPR (ashift:GPR (match_operand:GPR 1 "gpc_reg_operand") + (match_operand:SI 2 "const_int_operand")) + (match_operand:GPR 3 "gpc_reg_operand")))] + "nonzero_bits (operands[3], <MODE>mode) + < HOST_WIDE_INT_1U << INTVAL (operands[2])" + [(set (match_dup 0) + (ior:GPR (and:GPR (match_dup 3) + (match_dup 4)) + (ashift:GPR (match_dup 1) + (match_dup 2))))] +{ + operands[4] = GEN_INT ((HOST_WIDE_INT_1U << INTVAL (operands[2])) - 1); +}) + (define_insn "*rotl<mode>3_insert_4" [(set (match_operand:GPR 0 "gpc_reg_operand" "=r") (ior:GPR (and:GPR (match_operand:GPR 3 "gpc_reg_operand" "0") diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md index 0c1bda522a9..07c2f7ffa6e 100644 --- a/gcc/config/rs6000/vsx.md +++ b/gcc/config/rs6000/vsx.md @@ -3008,28 +3008,22 @@ (use (match_operand:SI 4 "gpc_reg_operand"))] "VECTOR_MEM_VSX_P (V4SImode) && TARGET_DIRECT_MOVE_64BIT" { - rtx a = gen_reg_rtx (DImode); - rtx b = gen_reg_rtx (DImode); - rtx c = gen_reg_rtx (DImode); - rtx d = gen_reg_rtx (DImode); - emit_insn (gen_zero_extendsidi2 (a, operands[1])); - emit_insn (gen_zero_extendsidi2 (b, operands[2])); - emit_insn (gen_zero_extendsidi2 (c, operands[3])); - emit_insn (gen_zero_extendsidi2 (d, operands[4])); + rtx a = gen_lowpart_SUBREG (DImode, operands[1]); + rtx b = gen_lowpart_SUBREG (DImode, operands[2]); + rtx c = gen_lowpart_SUBREG (DImode, operands[3]); + rtx d = gen_lowpart_SUBREG (DImode, operands[4]); if (!BYTES_BIG_ENDIAN) { std::swap (a, b); std::swap (c, d); } - rtx aa = gen_reg_rtx (DImode); rtx ab = gen_reg_rtx (DImode); - rtx cc = gen_reg_rtx (DImode); rtx cd = gen_reg_rtx (DImode); - emit_insn (gen_ashldi3 (aa, a, GEN_INT (32))); - emit_insn (gen_ashldi3 (cc, c, GEN_INT (32))); - emit_insn (gen_iordi3 (ab, aa, b)); - emit_insn (gen_iordi3 (cd, cc, d)); + emit_insn (gen_rotldi3_insert_3 (ab, a, GEN_INT (32), b, + GEN_INT (0xffffffff))); + emit_insn (gen_rotldi3_insert_3 (cd, c, GEN_INT (32), d, + GEN_INT (0xffffffff))); rtx abcd = gen_reg_rtx (V2DImode); emit_insn (gen_vsx_concat_v2di (abcd, ab, cd)); diff --git a/gcc/testsuite/gcc.target/powerpc/vec-init-10.c b/gcc/testsuite/gcc.target/powerpc/vec-init-10.c new file mode 100644 index 00000000000..680538e67f3 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/vec-init-10.c @@ -0,0 +1,24 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target lp64 } */ +/* { dg-require-effective-target powerpc_p8vector_ok } */ +/* { dg-options "-O2 -mdejagnu-cpu=power8" } */ + +/* Check that we can optimize sldi + or to rldimi for vector int init. */ + +vector unsigned int +testu (unsigned int i1, unsigned int i2, unsigned int i3, unsigned int i4) +{ + vector unsigned int v = {i1, i2, i3, i4}; + return v; +} + +vector signed int +tests (signed int i1, signed int i2, signed int i3, signed int i4) +{ + vector signed int v = {i1, i2, i3, i4}; + return v; +} + +/* { dg-final { scan-assembler-not "sldi" } } */ +/* { dg-final { scan-assembler-not "or" } } */ +/* { dg-final { scan-assembler-times {\mrldimi\M} 4 } } */