Message ID | 20180201193116.GA15164@ibm-tiger.the-meissners.org |
---|---|
State | New |
Headers | show |
Series | PowerPC PR target/84154, fix floating point to small integer conversion regression | expand |
On Thu, Feb 01, 2018 at 02:31:17PM -0500, Michael Meissner wrote: > This patch fixes the optimization regression that occurred on GCC 7 where > conversions from the various floating point types to small integers would at > times generate a store and a load. [ snip big explanation; thanks for that! ] Could you merge the signed and unsigned patterns, using any_fix? Or is there a reason that cannot work (other than that <su> unsigned_fix seems buggy, it should say "u")? Okay for trunk even without that (but please try). Also okay for 7 after looking for fallout. Thanks! Segher
On Mon, Feb 05, 2018 at 05:57:25AM -0600, Segher Boessenkool wrote: > On Thu, Feb 01, 2018 at 02:31:17PM -0500, Michael Meissner wrote: > > This patch fixes the optimization regression that occurred on GCC 7 where > > conversions from the various floating point types to small integers would at > > times generate a store and a load. > > [ snip big explanation; thanks for that! ] > > Could you merge the signed and unsigned patterns, using any_fix? Or is > there a reason that cannot work (other than that <su> unsigned_fix seems > buggy, it should say "u")? Well that's the way the instructions are. For traditional FPR instructions we have FCTIWZ vs. FCTIWUZ, while on the VSX side we have XSCVDPSXWS vs. XSCVDPUXWS. If you mean the name of the insn, I can change it if desired, but originally it was based on the FPR insn name. > Okay for trunk even without that (but please try). Also okay for 7 after > looking for fallout. In the past, I have found that combining code iterators with two mode iterators has a bug where it would use the wrong code iterator, so I just avoided doing that. I'll see if it is still a bug.
On Mon, Feb 05, 2018 at 07:54:58AM -0500, Michael Meissner wrote: > On Mon, Feb 05, 2018 at 05:57:25AM -0600, Segher Boessenkool wrote: > > On Thu, Feb 01, 2018 at 02:31:17PM -0500, Michael Meissner wrote: > > > This patch fixes the optimization regression that occurred on GCC 7 where > > > conversions from the various floating point types to small integers would at > > > times generate a store and a load. > > > > [ snip big explanation; thanks for that! ] > > > > Could you merge the signed and unsigned patterns, using any_fix? Or is > > there a reason that cannot work (other than that <su> unsigned_fix seems > > buggy, it should say "u")? > > Well that's the way the instructions are. For traditional FPR instructions we > have FCTIWZ vs. FCTIWUZ, while on the VSX side we have XSCVDPSXWS > vs. XSCVDPUXWS. If you mean the name of the insn, I can change it if desired, > but originally it was based on the FPR insn name. We have (define_code_attr su [(sign_extend "s") (zero_extend "u") (fix "s") (unsigned_fix "s") (float "s") (unsigned_float "u")]) and "s" for unsigned_fix seems like it should be "u". Very surprising otherwise (if this is needed in some case, it should just write "s" there instead of "<su>"). > > Okay for trunk even without that (but please try). Also okay for 7 after > > looking for fallout. > > In the past, I have found that combining code iterators with two mode iterators > has a bug where it would use the wrong code iterator, so I just avoided doing > that. I'll see if it is still a bug. Hrm. If you have multiple iterators you often need to use ":" syntax, and you might want that anyway because the precedence rules are non-obvious; but you are hitting something else? Please open a PR if so :-) Segher
On Mon, Feb 05, 2018 at 08:01:32AM -0600, Segher Boessenkool wrote: > On Mon, Feb 05, 2018 at 07:54:58AM -0500, Michael Meissner wrote: > > On Mon, Feb 05, 2018 at 05:57:25AM -0600, Segher Boessenkool wrote: > > > On Thu, Feb 01, 2018 at 02:31:17PM -0500, Michael Meissner wrote: > > > > This patch fixes the optimization regression that occurred on GCC 7 where > > > > conversions from the various floating point types to small integers would at > > > > times generate a store and a load. > > > > > > [ snip big explanation; thanks for that! ] > > > > > > Could you merge the signed and unsigned patterns, using any_fix? Or is > > > there a reason that cannot work (other than that <su> unsigned_fix seems > > > buggy, it should say "u")? > > > > Well that's the way the instructions are. For traditional FPR instructions we > > have FCTIWZ vs. FCTIWUZ, while on the VSX side we have XSCVDPSXWS > > vs. XSCVDPUXWS. If you mean the name of the insn, I can change it if desired, > > but originally it was based on the FPR insn name. > > We have > > (define_code_attr su [(sign_extend "s") > (zero_extend "u") > (fix "s") > (unsigned_fix "s") > (float "s") > (unsigned_float "u")]) > > and "s" for unsigned_fix seems like it should be "u". Very surprising > otherwise (if this is needed in some case, it should just write "s" there > instead of "<su>"). > > > > Okay for trunk even without that (but please try). Also okay for 7 after > > > looking for fallout. > > > > In the past, I have found that combining code iterators with two mode iterators > > has a bug where it would use the wrong code iterator, so I just avoided doing > > that. I'll see if it is still a bug. > > Hrm. If you have multiple iterators you often need to use ":" syntax, > and you might want that anyway because the precedence rules are non-obvious; > but you are hitting something else? Please open a PR if so :-) I already do use the : syntax. I found as long as I avoid putting the <su> or <u> in the output template (i.e. use an output statement instead of a template) it works. It only seems to fail in the IEEE128 case, and not in the SFDF case. I will submit a bug report on it after this gets checked in, as it will be simpler to provide a patch that people can test. This patch cleans up all of the {,unsigned_}fix patterns that convert to QImode or HImode. It adds the memory combiner to include SImode to the mix. While I was at it, I combined the IEEE insns to use any_fix. I have checked this patch on both little endian and big endian power8 systems, and I have have hand checked the code for power7 and power9 systems. There were no regressions in the test suite, the new tests were verified to run, and I did bootstrap builds on both systems. The big endian power8 system also 32-bit tests as well as 64-bit tests. Can I apply this patch to the trunk and after a waiting period, apply the patch to the GCC 7 branch? [gcc] 2018-02-05 Michael Meissner <meissner@linux.vnet.ibm.com> PR target/84154 * config/rs6000/rs6000.md (fix_trunc<SFDF:mode><QHI:mode>2): Convert from define_expand to be define_insn_and_split. Rework float/double/_Float128 conversions to QI/HI/SImode to work with both ISA 2.07 (power8) or ISA 3.0 (power9). Fix regression where conversions to QI/HImode types did a store and then a load to truncate the value. For conversions to VSX registers, don't split the insn, instead emit the code directly. Use the code iterator any_fix to combine signed and unsigned conversions. (fix<uns>_trunc<SFDF:mode>si2_p8): Likewise. (fixuns_trunc<SFDF:mode><QHI:mode>2): Likewise. (fix_trunc<IEEE128:mode><QHI:mode>2): Likewise. (fix<uns>_trunc<SFDF:mode><QHI:mode>2): Likewise. (fix_<mode>di2_hw): Likewise. (fixuns_<mode>di2_hw): Likewise. (fix_<mode>si2_hw): Likewise. (fixuns_<mode>si2_hw): Likewise. (fix<uns>_<IEEE128:mode><SDI:mode>2_hw): Likewise. (fix<uns>_trunc<IEEE128:mode><QHI:mode>2): Likewise. (fctiw<u>z_<mode>_smallint): Rename fctiw<u>z_<mode>_smallint to fix<uns>_trunc<SFDF:mode>si2_p8. (fix_trunc<SFDF:mode><QHI:mode>2_internal): Delete, no longer used. (fixuns_trunc<SFDF:mode><QHI:mode>2_internal): Likewise. (fix<uns>_<mode>_mem): Likewise. (fctiw<u>z_<mode>_mem): Likewise. (fix<uns>_<mode>_mem): Likewise. (fix<uns>_trunc<SFDF:mode><QHSI:mode>2_mem): On ISA 3.0, prevent the register allocator from doing a direct move to the GPRs to do a store, and instead use the ISA 3.0 store byte/half-word from vector register instruction. For IEEE 128-bit floating point, also optimize stores of 32-bit ints. (fix<uns>_trunc<IEEE128:mode><QHSI:mode>2_mem): Likewise. [gcc/testsuite] 2018-02-05 Michael Meissner <meissner@linux.vnet.ibm.com> PR target/84154 * gcc.target/powerpc/pr84154-1.c: New tests. * gcc.target/powerpc/pr84154-2.c: Likewise. * gcc.target/powerpc/pr84154-3.c: Likewise.
Hi! On Mon, Feb 05, 2018 at 05:57:51PM -0500, Michael Meissner wrote: > > We have > > > > (define_code_attr su [(sign_extend "s") > > (zero_extend "u") > > (fix "s") > > (unsigned_fix "s") > > (float "s") > > (unsigned_float "u")]) > > > > and "s" for unsigned_fix seems like it should be "u". Very surprising > > otherwise (if this is needed in some case, it should just write "s" there > > instead of "<su>"). What about this? > I found as long as I avoid putting the <su> or <u> in the output template > (i.e. use an output statement instead of a template) it works. It only > seems to fail in the IEEE128 case, and not in the SFDF case. I will submit a > bug report on it after this gets checked in, as it will be simpler to provide > a patch that people can test. Thanks. > +(define_insn_and_split "fix<uns>_trunc<SFDF:mode><QHI:mode>2" > + [(set (match_operand:<QHI:MODE> 0 "gpc_reg_operand" "=wJ,wJwK,r") > + (any_fix:QHI (match_operand:SFDF 1 "gpc_reg_operand" "wJ,wJwK,wa"))) > + (clobber (match_scratch:SI 2 "=X,X,wi"))] > + "TARGET_DIRECT_MOVE" > + "@ > + fctiw<u>z %0,%1 > + xscvdp<su>xws %x0,%x1 This one cannot work with the <su> definition above, for example. > +(define_insn "fix<uns>_<IEEE128:mode><SDI:mode>2_hw" > + [(set (match_operand:SDI 0 "altivec_register_operand" "=v") > + (any_fix:SDI (match_operand:IEEE128 1 "altivec_register_operand" "v")))] > + "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<IEEE128:MODE>mode)" > +{ > + return (<CODE> == UNSIGNED_FIX) ? "xscvqpu<wd>z %0,%1" : "xscvqps<wd>z %0,%1"; > +} And it may well be why you need this. Rest looks good :-) Segher
On Tue, Feb 06, 2018 at 11:15:21AM -0600, Segher Boessenkool wrote: > Hi! > > On Mon, Feb 05, 2018 at 05:57:51PM -0500, Michael Meissner wrote: > > > We have > > > > > > (define_code_attr su [(sign_extend "s") > > > (zero_extend "u") > > > (fix "s") > > > (unsigned_fix "s") > > > (float "s") > > > (unsigned_float "u")]) > > > > > > and "s" for unsigned_fix seems like it should be "u". Very surprising > > > otherwise (if this is needed in some case, it should just write "s" there > > > instead of "<su>"). > > What about this? You were right. I kept missing this. Sorry. Here is the patch for just this fix. As we've discussed, I'll apply this patch directly as being obvious, while waiting for the other builds. [gcc] 2018-02-06 Michael Meissner <meissner@linux.vnet.ibm.com> PR target/84154 * config/rs6000/rs6000.md (su code attribute): Use "u" for unsigned_fix, not "s". Index: gcc/config/rs6000/rs6000.md =================================================================== --- gcc/config/rs6000/rs6000.md (revision 257269) +++ gcc/config/rs6000/rs6000.md (working copy) @@ -550,7 +550,7 @@ (define_code_attr u [(sign_extend "") (define_code_attr su [(sign_extend "s") (zero_extend "u") (fix "s") - (unsigned_fix "s") + (unsigned_fix "u") (float "s") (unsigned_float "u")])
On Tue, Feb 06, 2018 at 11:15:21AM -0600, Segher Boessenkool wrote: > Hi! > > On Mon, Feb 05, 2018 at 05:57:51PM -0500, Michael Meissner wrote: > > > We have > > > > > > (define_code_attr su [(sign_extend "s") > > > (zero_extend "u") > > > (fix "s") > > > (unsigned_fix "s") > > > (float "s") > > > (unsigned_float "u")]) > > > > > > and "s" for unsigned_fix seems like it should be "u". Very surprising > > > otherwise (if this is needed in some case, it should just write "s" there > > > instead of "<su>"). > > What about this? As we discussed before, this was a bug, and I've already committed the fix separately. > > I found as long as I avoid putting the <su> or <u> in the output template > > (i.e. use an output statement instead of a template) it works. It only > > seems to fail in the IEEE128 case, and not in the SFDF case. I will submit a > > bug report on it after this gets checked in, as it will be simpler to provide > > a patch that people can test. > > Thanks. > > > +(define_insn_and_split "fix<uns>_trunc<SFDF:mode><QHI:mode>2" > > + [(set (match_operand:<QHI:MODE> 0 "gpc_reg_operand" "=wJ,wJwK,r") > > + (any_fix:QHI (match_operand:SFDF 1 "gpc_reg_operand" "wJ,wJwK,wa"))) > > + (clobber (match_scratch:SI 2 "=X,X,wi"))] > > + "TARGET_DIRECT_MOVE" > > + "@ > > + fctiw<u>z %0,%1 > > + xscvdp<su>xws %x0,%x1 > > This one cannot work with the <su> definition above, for example. > > > +(define_insn "fix<uns>_<IEEE128:mode><SDI:mode>2_hw" > > + [(set (match_operand:SDI 0 "altivec_register_operand" "=v") > > + (any_fix:SDI (match_operand:IEEE128 1 "altivec_register_operand" "v")))] > > + "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<IEEE128:MODE>mode)" > > +{ > > + return (<CODE> == UNSIGNED_FIX) ? "xscvqpu<wd>z %0,%1" : "xscvqps<wd>z %0,%1"; > > +} > > And it may well be why you need this. > > Rest looks good :-) Here is the patch reworked. It bootstraps on both little/big endian power8, and all of the tests run. Can I install this into trunk now, and into GCC 7 after a soak period (along with the previous patch)? [gcc] 2018-02-06 Michael Meissner <meissner@linux.vnet.ibm.com> PR target/84154 * config/rs6000/rs6000.md (fix_trunc<SFDF:mode><QHI:mode>2): Convert from define_expand to be define_insn_and_split. Rework float/double/_Float128 conversions to QI/HI/SImode to work with both ISA 2.07 (power8) or ISA 3.0 (power9). Fix regression where conversions to QI/HImode types did a store and then a load to truncate the value. For conversions to VSX registers, don't split the insn, instead emit the code directly. Use the code iterator any_fix to combine signed and unsigned conversions. (fix<uns>_trunc<SFDF:mode>si2_p8): Likewise. (fixuns_trunc<SFDF:mode><QHI:mode>2): Likewise. (fix_trunc<IEEE128:mode><QHI:mode>2): Likewise. (fix<uns>_trunc<SFDF:mode><QHI:mode>2): Likewise. (fix_<mode>di2_hw): Likewise. (fixuns_<mode>di2_hw): Likewise. (fix_<mode>si2_hw): Likewise. (fixuns_<mode>si2_hw): Likewise. (fix<uns>_<IEEE128:mode><SDI:mode>2_hw): Likewise. (fix<uns>_trunc<IEEE128:mode><QHI:mode>2): Likewise. (fctiw<u>z_<mode>_smallint): Rename fctiw<u>z_<mode>_smallint to fix<uns>_trunc<SFDF:mode>si2_p8. (fix_trunc<SFDF:mode><QHI:mode>2_internal): Delete, no longer used. (fixuns_trunc<SFDF:mode><QHI:mode>2_internal): Likewise. (fix<uns>_<mode>_mem): Likewise. (fctiw<u>z_<mode>_mem): Likewise. (fix<uns>_<mode>_mem): Likewise. (fix<uns>_trunc<SFDF:mode><QHSI:mode>2_mem): On ISA 3.0, prevent the register allocator from doing a direct move to the GPRs to do a store, and instead use the ISA 3.0 store byte/half-word from vector register instruction. For IEEE 128-bit floating point, also optimize stores of 32-bit ints. (fix<uns>_trunc<IEEE128:mode><QHSI:mode>2_mem): Likewise. [gcc/testsuite] 2018-02-06 Michael Meissner <meissner@linux.vnet.ibm.com> PR target/84154 * gcc.target/powerpc/pr84154-1.c: New tests. * gcc.target/powerpc/pr84154-2.c: Likewise. * gcc.target/powerpc/pr84154-3.c: Likewise.
Hi Mike, On Tue, Feb 06, 2018 at 04:34:08PM -0500, Michael Meissner wrote: > Here is the patch reworked. It bootstraps on both little/big endian power8, > and all of the tests run. Can I install this into trunk now, and into GCC 7 > after a soak period (along with the previous patch)? > +;; If have ISA 3.0, QI/HImode values can go in both VSX registers and GPR "If we have"? > + [(set (match_operand:QHSI 0 "memory_operand" "=Z") > + (any_fix:QHSI (match_operand:SFDF 1 "gpc_reg_operand" "wa"))) > + (clobber (match_scratch:SI 2 "=wa"))] > + "((<QHSI:MODE>mode == SImode && TARGET_P8_VECTOR) > + || (<QHSI:MODE>mode != SImode && TARGET_P9_VECTOR))" This is the same as "(<QHSI:MODE>mode == SImode && TARGET_P8_VECTOR) || TARGET_P9_VECTOR" which is a bit easier to understand I think. Okay (for trunk as well as 7) with those trivialities improved. Thanks! Segher
On Wed, 7 Feb 2018, Segher Boessenkool wrote: > Hi Mike, > > On Tue, Feb 06, 2018 at 04:34:08PM -0500, Michael Meissner wrote: > > Here is the patch reworked. It bootstraps on both little/big endian power8, > > and all of the tests run. Can I install this into trunk now, and into GCC 7 > > after a soak period (along with the previous patch)? > > > +;; If have ISA 3.0, QI/HImode values can go in both VSX registers and GPR > > "If we have"? > > > + [(set (match_operand:QHSI 0 "memory_operand" "=Z") > > + (any_fix:QHSI (match_operand:SFDF 1 "gpc_reg_operand" "wa"))) > > + (clobber (match_scratch:SI 2 "=wa"))] > > + "((<QHSI:MODE>mode == SImode && TARGET_P8_VECTOR) > > + || (<QHSI:MODE>mode != SImode && TARGET_P9_VECTOR))" > > This is the same as > > "(<QHSI:MODE>mode == SImode && TARGET_P8_VECTOR) || TARGET_P9_VECTOR" Umm, sorry for chiming in here with zero rs6000 knowledge and I might be missing something trivial but...what wouldn't that misfire for "<QHSI:MODE>mode == SImode && ! TARGET_P8_VECTOR && TARGET_P9_VECTOR" ? (Is that invalid or not applicable or don't care or something?) brgds, H-P
On Thu, Feb 08, 2018 at 06:10:31PM -0500, Hans-Peter Nilsson wrote: > On Wed, 7 Feb 2018, Segher Boessenkool wrote: > > Hi Mike, > > > > On Tue, Feb 06, 2018 at 04:34:08PM -0500, Michael Meissner wrote: > > > Here is the patch reworked. It bootstraps on both little/big endian power8, > > > and all of the tests run. Can I install this into trunk now, and into GCC 7 > > > after a soak period (along with the previous patch)? > > > > > +;; If have ISA 3.0, QI/HImode values can go in both VSX registers and GPR > > > > "If we have"? > > > > > + [(set (match_operand:QHSI 0 "memory_operand" "=Z") > > > + (any_fix:QHSI (match_operand:SFDF 1 "gpc_reg_operand" "wa"))) > > > + (clobber (match_scratch:SI 2 "=wa"))] > > > + "((<QHSI:MODE>mode == SImode && TARGET_P8_VECTOR) > > > + || (<QHSI:MODE>mode != SImode && TARGET_P9_VECTOR))" > > > > This is the same as > > > > "(<QHSI:MODE>mode == SImode && TARGET_P8_VECTOR) || TARGET_P9_VECTOR" > > Umm, sorry for chiming in here with zero rs6000 knowledge and I > might be missing something trivial but...what wouldn't that misfire for > "<QHSI:MODE>mode == SImode && ! TARGET_P8_VECTOR && TARGET_P9_VECTOR" ? > > (Is that invalid or not applicable or don't care or something?) TARGET_P9_VECTOR requires TARGET_P8_VECTOR. Basically when we are converting SF/DFmode to SImode, we want to allow it on ISA 2.07 (-mcpu=power8). If we are converting to SF/DFmode to HI/QImode, we require ISA 3.0 (-mcpu=power9). The reason is that we don't have the instructions to do 32-bit integer store and 32-bit integer sign/zero extended load instructions to all of the vector and floating point registers until ISA 2.07. Because of that, we don't allow SImode in the vector and floating point registers until ISA 2.07. In processors before power8, we had to do a store 64-bit integer on the stack and then load up the 32-bit value into the GPR registers. However, ISA 2.07 does not have instructions to store or load 8/16-bit values that can be conveniently used. ISA 3.0 added 8/16-bit store, 8/16-bit zero extended load, and 8/16-bit sign extend instructions. So in ISA 3.0, we allow QI/HImode to go in vector registers. In ISA 2.06 (-mcpu=power7) we had to use UNSPECs to hide the convert floating point scalar to 32-bit signed/unsigned instruction instructions because we didn't allow the base type into the register.
Index: gcc/config/rs6000/rs6000.md =================================================================== --- gcc/config/rs6000/rs6000.md (revision 257269) +++ gcc/config/rs6000/rs6000.md (working copy) @@ -5700,43 +5700,48 @@ (define_insn "*fix_trunc<mode>di2_fctidz xscvdpsxds %x0,%x1" [(set_attr "type" "fp")]) -(define_expand "fix_trunc<SFDF:mode><QHI:mode>2" - [(parallel [(set (match_operand:<QHI:MODE> 0 "nonimmediate_operand") - (fix:QHI (match_operand:SFDF 1 "gpc_reg_operand"))) - (clobber (match_scratch:DI 2))])] - "TARGET_P9_VECTOR && TARGET_DIRECT_MOVE_64BIT" +;; If have ISA 3.0, QI/HImode values can go in both VSX registers and GPR +;; registers. If we have ISA 2.07, we don't allow QI/HImode values in the +;; vector registers, so we need to do direct moves to the GPRs, but SImode +;; values can go in VSX registers. Keeping the direct move part through +;; register allocation prevents the register allocator from doing a direct of +;; the SImode value to a GPR, and then a store/load. +(define_insn_and_split "fix_trunc<SFDF:mode><QHI:mode>2" + [(set (match_operand:<QHI:MODE> 0 "gpc_reg_operand" "=wJ,wJwK,r") + (fix:QHI (match_operand:SFDF 1 "gpc_reg_operand" "wJ,wJwK,wa"))) + (clobber (match_scratch:SI 2 "=X,X,wi"))] + "TARGET_DIRECT_MOVE" + "@ + fctiwz %0,%1 + xscvdpsxws %x0,%x1 + #" + "&& reload_completed && int_reg_operand (operands[0], <QHI:MODE>mode)" + [(set (match_dup 2) + (fix:SI (match_dup 1))) + (set (match_dup 3) + (match_dup 2))] { - if (MEM_P (operands[0])) - operands[0] = rs6000_address_for_fpconvert (operands[0]); -}) + operands[3] = gen_rtx_REG (SImode, REGNO (operands[0])); +} + [(set_attr "length" "4,4,8") + (set_attr "type" "fp")]) -(define_insn_and_split "*fix_trunc<SFDF:mode><QHI:mode>2_internal" - [(set (match_operand:<QHI:MODE> 0 "reg_or_indexed_operand" "=wIwJ,rZ") - (fix:QHI - (match_operand:SFDF 1 "gpc_reg_operand" "<SFDF:Fv>,<SFDF:Fv>"))) - (clobber (match_scratch:DI 2 "=X,wi"))] - "TARGET_P9_VECTOR && TARGET_DIRECT_MOVE_64BIT" +;; Keep the convert and store together through register allocation to prevent +;; the register allocator from getting clever and doing a direct move to a GPR +;; and then store for reg+offset stores. +(define_insn_and_split "*fix_trunc<SFDF:mode><QHI:mode>2_mem" + [(set (match_operand:QHI 0 "memory_operand" "=Z") + (fix:QHI (match_operand:SFDF 1 "gpc_reg_operand" "wa"))) + (clobber (match_scratch:SI 2 "=wa"))] + "TARGET_P9_VECTOR" "#" "&& reload_completed" - [(const_int 0)] + [(set (match_dup 2) + (fix:SI (match_dup 1))) + (set (match_dup 0) + (match_dup 3))] { - rtx dest = operands[0]; - rtx src = operands[1]; - - if (vsx_register_operand (dest, <QHI:MODE>mode)) - { - rtx di_dest = gen_rtx_REG (DImode, REGNO (dest)); - emit_insn (gen_fix_trunc<SFDF:mode>di2 (di_dest, src)); - } - else - { - rtx tmp = operands[2]; - rtx tmp2 = gen_rtx_REG (<QHI:MODE>mode, REGNO (tmp)); - - emit_insn (gen_fix_trunc<SFDF:mode>di2 (tmp, src)); - emit_move_insn (dest, tmp2); - } - DONE; + operands[3] = gen_rtx_REG (<QHI:MODE>mode, REGNO (operands[2])); }) (define_expand "fixuns_trunc<mode>si2" @@ -5803,48 +5808,51 @@ (define_insn "fixuns_trunc<mode>di2" xscvdpuxds %x0,%x1" [(set_attr "type" "fp")]) -(define_expand "fixuns_trunc<SFDF:mode><QHI:mode>2" - [(parallel [(set (match_operand:<QHI:MODE> 0 "nonimmediate_operand") - (unsigned_fix:QHI (match_operand:SFDF 1 "gpc_reg_operand"))) - (clobber (match_scratch:DI 2))])] - "TARGET_P9_VECTOR && TARGET_DIRECT_MOVE_64BIT" +;; If have ISA 3.0, QI/HImode values can go in both VSX registers and GPR +;; registers. If we have ISA 2.07, we don't allow QI/HImode values in the +;; vector registers, so we need to do direct moves to the GPRs, but SImode +;; values can go in VSX registers. Keeping the direct move part through +;; register allocation prevents the register allocator from doing a direct of +;; the SImode value to a GPR, and then a store/load. +(define_insn_and_split "fixuns_trunc<SFDF:mode><QHI:mode>2" + [(set (match_operand:<QHI:MODE> 0 "gpc_reg_operand" "=wJ,wJwK,r") + (unsigned_fix:QHI + (match_operand:SFDF 1 "gpc_reg_operand" "wJ,wJwK,wa"))) + (clobber (match_scratch:SI 2 "=X,X,wi"))] + "TARGET_DIRECT_MOVE" + "@ + fctiwuz %0,%1 + xscvdpuxws %x0,%x1 + #" + "&& reload_completed && int_reg_operand (operands[0], <QHI:MODE>mode)" + [(set (match_dup 2) + (unsigned_fix:SI (match_dup 1))) + (set (match_dup 3) + (match_dup 2))] { - if (MEM_P (operands[0])) - operands[0] = rs6000_address_for_fpconvert (operands[0]); -}) + operands[3] = gen_rtx_REG (SImode, REGNO (operands[0])); +} + [(set_attr "length" "4,4,8") + (set_attr "type" "fp")]) -(define_insn_and_split "*fixuns_trunc<SFDF:mode><QHI:mode>2_internal" - [(set (match_operand:<QHI:MODE> 0 "reg_or_indexed_operand" "=wIwJ,rZ") - (unsigned_fix:QHI - (match_operand:SFDF 1 "gpc_reg_operand" "<SFDF:Fv>,<SFDF:Fv>"))) - (clobber (match_scratch:DI 2 "=X,wi"))] - "TARGET_P9_VECTOR && TARGET_DIRECT_MOVE_64BIT" +;; Keep the convert and store together through register allocation to prevent +;; the register allocator from getting clever and doing a direct move to a GPR +;; and then store for reg+offset stores. +(define_insn_and_split "*fixuns_trunc<SFDF:mode><QHI:mode>2_mem" + [(set (match_operand:QHI 0 "memory_operand" "=Z") + (unsigned_fix:QHI (match_operand:SFDF 1 "gpc_reg_operand" "wa"))) + (clobber (match_scratch:SI 2 "=wa"))] + "TARGET_P9_VECTOR" "#" "&& reload_completed" - [(const_int 0)] + [(set (match_dup 2) + (unsigned_fix:SI (match_dup 1))) + (set (match_dup 0) + (match_dup 3))] { - rtx dest = operands[0]; - rtx src = operands[1]; - - if (vsx_register_operand (dest, <QHI:MODE>mode)) - { - rtx di_dest = gen_rtx_REG (DImode, REGNO (dest)); - emit_insn (gen_fixuns_trunc<SFDF:mode>di2 (di_dest, src)); - } - else - { - rtx tmp = operands[2]; - rtx tmp2 = gen_rtx_REG (<QHI:MODE>mode, REGNO (tmp)); - - emit_insn (gen_fixuns_trunc<SFDF:mode>di2 (tmp, src)); - emit_move_insn (dest, tmp2); - } - DONE; + operands[3] = gen_rtx_REG (<QHI:MODE>mode, REGNO (operands[2])); }) -;; If -mvsx-small-integer, we can represent the FIX operation directly. On -;; older machines, we have to use an UNSPEC to produce a SImode and move it -;; to another location, since SImode is not allowed in vector registers. (define_insn "*fctiw<u>z_<mode>_smallint" [(set (match_operand:SI 0 "vsx_register_operand" "=d,wi") (any_fix:SI (match_operand:SFDF 1 "gpc_reg_operand" "<Ff>,<Fv>")))] @@ -14386,6 +14394,15 @@ (define_insn "fix_<mode>si2_hw" [(set_attr "type" "vecfloat") (set_attr "size" "128")]) +(define_insn "fix_trunc<IEEE128:mode><QHI:mode>2" + [(set (match_operand:<QHI:MODE> 0 "altivec_register_operand" "=v") + (fix:QHI + (match_operand:IEEE128 1 "altivec_register_operand" "v")))] + "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<IEEE128:MODE>mode)" + "xscvqpswz %0,%1" + [(set_attr "type" "vecfloat") + (set_attr "size" "128")]) + (define_insn "fixuns_<mode>si2_hw" [(set (match_operand:SI 0 "altivec_register_operand" "=v") (unsigned_fix:SI (match_operand:IEEE128 1 "altivec_register_operand" "v")))] @@ -14394,17 +14411,40 @@ (define_insn "fixuns_<mode>si2_hw" [(set_attr "type" "vecfloat") (set_attr "size" "128")]) -;; Combiner pattern to prevent moving the result of converting an IEEE 128-bit -;; floating point value to 32-bit integer to GPR in order to save it. -(define_insn_and_split "*fix<uns>_<mode>_mem" - [(set (match_operand:SI 0 "memory_operand" "=Z") - (any_fix:SI (match_operand:IEEE128 1 "altivec_register_operand" "v"))) - (clobber (match_scratch:SI 2 "=v"))] +(define_insn "fixuns_trunc<IEEE128:mode><QHI:mode>2" + [(set (match_operand:<QHI:MODE> 0 "altivec_register_operand" "=v") + (unsigned_fix:QHI + (match_operand:IEEE128 1 "altivec_register_operand" "v")))] + "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<IEEE128:MODE>mode)" + "xscvqpuwz %0,%1" + [(set_attr "type" "vecfloat") + (set_attr "size" "128")]) + +;; Combiner patterns to prevent moving the result of converting an IEEE 128-bit +;; floating point value to 8/16/32-bit integer to GPR in order to save it. +(define_insn_and_split "*fix_trunc<IEEE128:mode><QHSI:mode>2_mem" + [(set (match_operand:QHSI 0 "memory_operand" "=Z") + (fix:QHSI + (match_operand:IEEE128 1 "altivec_register_operand" "v"))) + (clobber (match_scratch:QHSI 2 "=v"))] "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)" "#" "&& reload_completed" [(set (match_dup 2) - (any_fix:SI (match_dup 1))) + (fix:QHSI (match_dup 1))) + (set (match_dup 0) + (match_dup 2))]) + +(define_insn_and_split "*fixuns_trunc<IEEE128:mode><QHSI:mode>2_mem" + [(set (match_operand:QHSI 0 "memory_operand" "=Z") + (unsigned_fix:QHSI + (match_operand:IEEE128 1 "altivec_register_operand" "v"))) + (clobber (match_scratch:QHSI 2 "=v"))] + "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)" + "#" + "&& reload_completed" + [(set (match_dup 2) + (unsigned_fix:QHSI (match_dup 1))) (set (match_dup 0) (match_dup 2))]) Index: gcc/testsuite/gcc.target/powerpc/pr84154-1.c =================================================================== --- gcc/testsuite/gcc.target/powerpc/pr84154-1.c (revision 0) +++ gcc/testsuite/gcc.target/powerpc/pr84154-1.c (revision 0) @@ -0,0 +1,55 @@ +/* { dg-do compile { target { powerpc*-*-* } } } */ +/* { dg-skip-if "" { powerpc*-*-darwin* } } */ +/* { dg-require-effective-target powerpc_p8vector_ok } */ +/* { dg-options "-mpower8-vector -O2" } */ + +/* PR target/84154. Make sure conversion to char/short does not generate a + store and a load on ISA 2.07 and newer systems. */ + +unsigned char +double_to_uc (double x) +{ + return x; +} + +signed char +double_to_sc (double x) +{ + return x; +} + +unsigned short +double_to_us (double x) +{ + return x; +} + +short +double_to_ss (double x) +{ + return x; +} + +unsigned int +double_to_ui (double x) +{ + return x; +} + +int +double_to_si (double x) +{ + return x; +} + +/* { dg-final { scan-assembler-times {\mextsb\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mextsh\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mfctiwuz\M|\mxscvdpuxws\M} 3 } } */ +/* { dg-final { scan-assembler-times {\mfctiwz\M|\mxscvdpsxws\M} 3 } } */ +/* { dg-final { scan-assembler-times {\mmfvsrwz\M} 6 } } */ +/* { dg-final { scan-assembler-times {\mrlwinm\M} 2 } } */ +/* { dg-final { scan-assembler-not {\mlbz\M} } } */ +/* { dg-final { scan-assembler-not {\mlhz\M} } } */ +/* { dg-final { scan-assembler-not {\mlha\M} } } */ +/* { dg-final { scan-assembler-not {\mmfvsrd\M} } } */ +/* { dg-final { scan-assembler-not {\mstw\M} } } */ Index: gcc/testsuite/gcc.target/powerpc/pr84154-2.c =================================================================== --- gcc/testsuite/gcc.target/powerpc/pr84154-2.c (revision 0) +++ gcc/testsuite/gcc.target/powerpc/pr84154-2.c (revision 0) @@ -0,0 +1,58 @@ +/* { dg-do compile { target { powerpc*-*-* } } } */ +/* { dg-skip-if "" { powerpc*-*-darwin* } } */ +/* { dg-require-effective-target powerpc_p8vector_ok } */ +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */ +/* { dg-options "-mcpu=power8 -O2" } */ + +/* PR target/84154. Make sure on ISA 2.07 (power8) that we store the result of + a conversion to char/short using an offsettable address does not generate + direct moves for storing 32-bit integers, but does do a direct move for + 8/16-bit integers. */ + +void +double_to_uc (double x, unsigned char *p) +{ + p[3] = x; +} + +void +double_to_sc (double x, signed char *p) +{ + p[3] = x; +} + +void +double_to_us (double x, unsigned short *p) +{ + p[3] = x; +} + +void +double_to_ss (double x, short *p) +{ + p[3] = x; +} + +void +double_to_ui (double x, unsigned int *p) +{ + p[3] = x; +} + +void +double_to_si (double x, int *p) +{ + p[3] = x; +} + +/* { dg-final { scan-assembler-times {\mfctiwuz\M|\mxscvdpuxws\M} 3 } } */ +/* { dg-final { scan-assembler-times {\mfctiwz\M|\mxscvdpsxws\M} 3 } } */ +/* { dg-final { scan-assembler-times {\mmfvsrwz\M} 4 } } */ +/* { dg-final { scan-assembler-times {\mstfiwx\M|\mstxsiwx\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mstb\M} 2 } } */ +/* { dg-final { scan-assembler-times {\msth\M} 2 } } */ +/* { dg-final { scan-assembler-not {\mlbz\M} } } */ +/* { dg-final { scan-assembler-not {\mlhz\M} } } */ +/* { dg-final { scan-assembler-not {\mlha\M} } } */ +/* { dg-final { scan-assembler-not {\mmfvsrd\M} } } */ +/* { dg-final { scan-assembler-not {\mstw\M} } } */ Index: gcc/testsuite/gcc.target/powerpc/pr84154-3.c =================================================================== --- gcc/testsuite/gcc.target/powerpc/pr84154-3.c (revision 0) +++ gcc/testsuite/gcc.target/powerpc/pr84154-3.c (revision 0) @@ -0,0 +1,60 @@ +/* { dg-do compile { target { powerpc*-*-* } } } */ +/* { dg-skip-if "" { powerpc*-*-darwin* } } */ +/* { dg-require-effective-target powerpc_p9vector_ok } */ +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */ +/* { dg-options "-mcpu=power9 -O2" } */ + +/* PR target/84154. Make sure on ISA 3.0 we store the result of a conversion + to char/short using an offsettable address does not generate direct moves + for storing 8/16/32-bit integers. */ + +void +double_to_uc (double x, unsigned char *p) +{ + p[3] = x; +} + +void +double_to_sc (double x, signed char *p) +{ + p[3] = x; +} + +void +double_to_us (double x, unsigned short *p) +{ + p[3] = x; +} + +void +double_to_ss (double x, short *p) +{ + p[3] = x; +} + +void +double_to_ui (double x, unsigned int *p) +{ + p[3] = x; +} + +void +double_to_si (double x, int *p) +{ + p[3] = x; +} + +/* { dg-final { scan-assembler-times {\maddi\M} 6 } } */ +/* { dg-final { scan-assembler-times {\mfctiwuz\M|\mxscvdpuxws\M} 3 } } */ +/* { dg-final { scan-assembler-times {\mfctiwz\M|\mxscvdpsxws\M} 3 } } */ +/* { dg-final { scan-assembler-times {\mstfiwx\M|\mstxsiwx\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mstxsibx\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mstxsihx\M} 2 } } */ +/* { dg-final { scan-assembler-not {\mlbz\M} } } */ +/* { dg-final { scan-assembler-not {\mlhz\M} } } */ +/* { dg-final { scan-assembler-not {\mlha\M} } } */ +/* { dg-final { scan-assembler-not {\mmfvsrwz\M} } } */ +/* { dg-final { scan-assembler-not {\mmfvsrd\M} } } */ +/* { dg-final { scan-assembler-not {\mstw\M} } } */ +/* { dg-final { scan-assembler-not {\mstb\M} } } */ +/* { dg-final { scan-assembler-not {\msth\M} } } */