Message ID | abe9b8be-8a30-9421-2ea4-b18f454ec711@linux.ibm.com |
---|---|
State | New |
Headers | show |
Series | rs6000: Use rldimi for vec init instead of shift + ior | expand |
Hi, I'd like to gentle ping this: https://gcc.gnu.org/pipermail/gcc-patches/2020-December/562407.html BR, Kewen on 2020/12/22 下午4:08, Kewen.Lin via Gcc-patches wrote: > Hi, > > This patch is to make unsigned int vector init go with > rldimi to merge two integers instead of shift and ior. > > I tried to use nonzero_bits in md file to make it more > general, but the testing shows it isn't doable. The > reason is that some passes would replace some pseudos > with other pseudos and do the recog again, but at that > time the nonzero_bits could get rough information and > lead the recog fails unexpectedly. > > btw, the test case would reply on the combine patch[1]. > > Bootstrapped/regtested on powerpc64le-linux-gnu P9. > > BR, > Kewen > > [1] https://gcc.gnu.org/pipermail/gcc-patches/2020-December/561413.html > > gcc/ChangeLog: > > * config/rs6000/rs6000.md (*rotl<mode>3_insert_3): Renamed to... > (rotl<mode>3_insert_3): ...this. > * config/rs6000/vsx.md (vsx_init_v4si): Use gen_rotldi3_insert_3 > for integer merging. > > gcc/testsuite/ChangeLog: > > * gcc.target/powerpc/vec-init-10.c: New test. > > ----- >
Hi! On Tue, Dec 22, 2020 at 04:08:26PM +0800, Kewen.Lin wrote: > This patch is to make unsigned int vector init go with > rldimi to merge two integers instead of shift and ior. > > I tried to use nonzero_bits in md file to make it more > general, but the testing shows it isn't doable. The > reason is that some passes would replace some pseudos > with other pseudos and do the recog again, but at that > time the nonzero_bits could get rough information and > lead the recog fails unexpectedly. Aha. So nonzero_bits is unusable for most purposes as well. Great :-/ > btw, the test case would reply on the combine patch[1]. Can you make a different testcase perhaps? This patch looks fine otherwise. (Ideally the compiler would just form those insert insns itself, of course. Why doesn't that work? This is important for quite a lot of code, we really can advantageously use the rl*imi insns often!) Segher
Hi Segher, on 2021/1/15 上午8:50, Segher Boessenkool wrote: > Hi! > > On Tue, Dec 22, 2020 at 04:08:26PM +0800, Kewen.Lin wrote: >> This patch is to make unsigned int vector init go with >> rldimi to merge two integers instead of shift and ior. >> >> I tried to use nonzero_bits in md file to make it more >> general, but the testing shows it isn't doable. The >> reason is that some passes would replace some pseudos >> with other pseudos and do the recog again, but at that >> time the nonzero_bits could get rough information and >> lead the recog fails unexpectedly. > > Aha. So nonzero_bits is unusable for most purposes as well. Great :-/ > >> btw, the test case would reply on the combine patch[1]. > > Can you make a different testcase perhaps? This patch looks fine > otherwise. > Thanks! But I'm sorry that there is a typo, it should be "rely on" rather than "reply on", without the patch[1] "combine: zeroing cost for new copies", this test case doesn't get the desirable result. I'll explain it more in that patch's thread. If your uncommitted patch there with define_split and nonzero_bits works, this patch can be discarded totally. [1] https://gcc.gnu.org/pipermail/gcc-patches/2020-December/561413.html > (Ideally the compiler would just form those insert insns itself, of > course. Why doesn't that work? This is important for quite a lot of > code, we really can advantageously use the rl*imi insns often!) > The reason why it doesn't work is that the *rotl<mode>3_insert_3 pattern requires one explicit AND (with some masks) there but in this case the AND is optimized away due to that nonzero_bits indicates the AND is redundant. BR, Kewen
on 2021/1/15 下午2:40, Kewen.Lin via Gcc-patches wrote: > Hi Segher, > > on 2021/1/15 上午8:50, Segher Boessenkool wrote: >> Hi! >> >> On Tue, Dec 22, 2020 at 04:08:26PM +0800, Kewen.Lin wrote: >>> This patch is to make unsigned int vector init go with >>> rldimi to merge two integers instead of shift and ior. >>> >>> I tried to use nonzero_bits in md file to make it more >>> general, but the testing shows it isn't doable. The >>> reason is that some passes would replace some pseudos >>> with other pseudos and do the recog again, but at that >>> time the nonzero_bits could get rough information and >>> lead the recog fails unexpectedly. >> >> Aha. So nonzero_bits is unusable for most purposes as well. Great :-/ >> >>> btw, the test case would reply on the combine patch[1]. >> >> Can you make a different testcase perhaps? This patch looks fine >> otherwise. >> > > Thanks! But I'm sorry that there is a typo, it should be "rely on" > rather than "reply on", without the patch[1] "combine: zeroing cost > for new copies", this test case doesn't get the desirable result. > I'll explain it more in that patch's thread. If your uncommitted > patch there with define_split and nonzero_bits works, this patch > can be discarded totally. After the testing and the commit log shows, I realized that that patch can only work for three instruction combination, so this patch is still needed. BR, Kewen
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index 0e799198a50..3529b79d35d 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -4067,7 +4067,7 @@ [(set_attr "type" "insert")]) ; There are also some forms without one of the ANDs. -(define_insn "*rotl<mode>3_insert_3" +(define_insn "rotl<mode>3_insert_3" [(set (match_operand:GPR 0 "gpc_reg_operand" "=r") (ior:GPR (and:GPR (match_operand:GPR 3 "gpc_reg_operand" "0") (match_operand:GPR 4 "const_int_operand" "n")) diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md index 947631d83ee..37105a5aabf 100644 --- a/gcc/config/rs6000/vsx.md +++ b/gcc/config/rs6000/vsx.md @@ -3008,28 +3008,22 @@ (use (match_operand:SI 4 "gpc_reg_operand"))] "VECTOR_MEM_VSX_P (V4SImode) && TARGET_DIRECT_MOVE_64BIT" { - rtx a = gen_reg_rtx (DImode); - rtx b = gen_reg_rtx (DImode); - rtx c = gen_reg_rtx (DImode); - rtx d = gen_reg_rtx (DImode); - emit_insn (gen_zero_extendsidi2 (a, operands[1])); - emit_insn (gen_zero_extendsidi2 (b, operands[2])); - emit_insn (gen_zero_extendsidi2 (c, operands[3])); - emit_insn (gen_zero_extendsidi2 (d, operands[4])); + rtx a = gen_lowpart_SUBREG (DImode, operands[1]); + rtx b = gen_lowpart_SUBREG (DImode, operands[2]); + rtx c = gen_lowpart_SUBREG (DImode, operands[3]); + rtx d = gen_lowpart_SUBREG (DImode, operands[4]); if (!BYTES_BIG_ENDIAN) { std::swap (a, b); std::swap (c, d); } - rtx aa = gen_reg_rtx (DImode); rtx ab = gen_reg_rtx (DImode); - rtx cc = gen_reg_rtx (DImode); rtx cd = gen_reg_rtx (DImode); - emit_insn (gen_ashldi3 (aa, a, GEN_INT (32))); - emit_insn (gen_ashldi3 (cc, c, GEN_INT (32))); - emit_insn (gen_iordi3 (ab, aa, b)); - emit_insn (gen_iordi3 (cd, cc, d)); + emit_insn (gen_rotldi3_insert_3 (ab, a, GEN_INT (32), b, + GEN_INT (0xffffffff))); + emit_insn (gen_rotldi3_insert_3 (cd, c, GEN_INT (32), d, + GEN_INT (0xffffffff))); rtx abcd = gen_reg_rtx (V2DImode); emit_insn (gen_vsx_concat_v2di (abcd, ab, cd)); diff --git a/gcc/testsuite/gcc.target/powerpc/vec-init-10.c b/gcc/testsuite/gcc.target/powerpc/vec-init-10.c new file mode 100644 index 00000000000..680538e67f3 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/vec-init-10.c @@ -0,0 +1,24 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target lp64 } */ +/* { dg-require-effective-target powerpc_p8vector_ok } */ +/* { dg-options "-O2 -mdejagnu-cpu=power8" } */ + +/* Check that we can optimize sldi + or to rldimi for vector int init. */ + +vector unsigned int +testu (unsigned int i1, unsigned int i2, unsigned int i3, unsigned int i4) +{ + vector unsigned int v = {i1, i2, i3, i4}; + return v; +} + +vector signed int +tests (signed int i1, signed int i2, signed int i3, signed int i4) +{ + vector signed int v = {i1, i2, i3, i4}; + return v; +} + +/* { dg-final { scan-assembler-not "sldi" } } */ +/* { dg-final { scan-assembler-not "or" } } */ +/* { dg-final { scan-assembler-times {\mrldimi\M} 4 } } */