Message ID | 20161118203838.GA16377@ibm-tiger.the-meissners.org |
---|---|
State | New |
Headers | show |
On Fri, Nov 18, 2016 at 03:38:38PM -0500, Michael Meissner wrote: > This patch tweaks the movdi constraints for the PowerPC to use "^" or "$" > constraints instead of "?*". This allows the register allocator to more often > allocate DImode to a floating point/vector register when it is desirable to do > so. It also changes some plain "?" to "^" or "$" or even "*" (which cannot work for multi-character constraints, it skips one character, not one constraint). Wrong version of the patch? > I built bootstrap compilers and did make check with no regressions on: > 1) Little endian power8, --with-cpu=power8 > 2) Big endian power8, --with-cpu=power8 (no 32-bit support) > 3) Big endian power7, --with-cpu=power7 (both 32/64-bit support) Could you also test with reload please? Just LE is enough I guess. We'd like to keep reload working for GCC 7 at least, and these cost prefixes tend to break mov patterns :-/ Segher
On Fri, Nov 18, 2016 at 04:43:40PM -0600, Segher Boessenkool wrote: > On Fri, Nov 18, 2016 at 03:38:38PM -0500, Michael Meissner wrote: > > This patch tweaks the movdi constraints for the PowerPC to use "^" or "$" > > constraints instead of "?*". This allows the register allocator to more often > > allocate DImode to a floating point/vector register when it is desirable to do > > so. > > It also changes some plain "?" to "^" or "$" or even "*" (which cannot > work for multi-character constraints, it skips one character, not one > constraint). Wrong version of the patch? Note, if '*' does not work with multi-character prefixes, that is a bug. All of rs6000.md assumes that ?*wa means that the register allocator will not consider VSX vector registers for when calculating register preferences. > > I built bootstrap compilers and did make check with no regressions on: > > 1) Little endian power8, --with-cpu=power8 > > 2) Big endian power8, --with-cpu=power8 (no 32-bit support) > > 3) Big endian power7, --with-cpu=power7 (both 32/64-bit support) > > Could you also test with reload please? Just LE is enough I guess. > We'd like to keep reload working for GCC 7 at least, and these cost > prefixes tend to break mov patterns :-/ Argh, I guess you are right, but then if reload doesn't work, I will likely submit the patch where there are three different movdi's (one for 32-bit without the change, one for 64-bit with reload, and one for 64-bit with lra). I would prefer not to do that.
On Fri, Nov 18, 2016 at 05:52:12PM -0500, Michael Meissner wrote: > On Fri, Nov 18, 2016 at 04:43:40PM -0600, Segher Boessenkool wrote: > > On Fri, Nov 18, 2016 at 03:38:38PM -0500, Michael Meissner wrote: > > > This patch tweaks the movdi constraints for the PowerPC to use "^" or "$" > > > constraints instead of "?*". This allows the register allocator to more often > > > allocate DImode to a floating point/vector register when it is desirable to do > > > so. > > > > It also changes some plain "?" to "^" or "$" or even "*" (which cannot > > work for multi-character constraints, it skips one character, not one > > constraint). Wrong version of the patch? > > Note, if '*' does not work with multi-character prefixes, that is a bug. All > of rs6000.md assumes that ?*wa means that the register allocator will not > consider VSX vector registers for when calculating register preferences. The documentation is out of date. From ira-costs.c: /* Scan all the constraint letters. See if the operand matches any of the constraints. Collect the valid register classes and see if this operand accepts memory. */ while ((c = *p)) { switch (c) { case '*': /* Ignore the next letter for this pass. */ c = *++p; break; and then 83 lines later: p += CONSTRAINT_LEN (c, p); if (c == ',') break; } so it does in fact work. Neither the patch description nor the changelog says you are doing these changes though. > > > I built bootstrap compilers and did make check with no regressions on: > > > 1) Little endian power8, --with-cpu=power8 > > > 2) Big endian power8, --with-cpu=power8 (no 32-bit support) > > > 3) Big endian power7, --with-cpu=power7 (both 32/64-bit support) > > > > Could you also test with reload please? Just LE is enough I guess. > > We'd like to keep reload working for GCC 7 at least, and these cost > > prefixes tend to break mov patterns :-/ > > Argh, I guess you are right, but then if reload doesn't work, I will likely > submit the patch where there are three different movdi's (one for 32-bit > without the change, one for 64-bit with reload, and one for 64-bit with lra). > I would prefer not to do that. Let's hope it just works :-) Segher
On Fri, Nov 18, 2016 at 05:07:21PM -0600, Segher Boessenkool wrote: > On Fri, Nov 18, 2016 at 05:52:12PM -0500, Michael Meissner wrote: > > On Fri, Nov 18, 2016 at 04:43:40PM -0600, Segher Boessenkool wrote: > > > Could you also test with reload please? Just LE is enough I guess. > > > We'd like to keep reload working for GCC 7 at least, and these cost > > > prefixes tend to break mov patterns :-/ > > > > Argh, I guess you are right, but then if reload doesn't work, I will likely > > submit the patch where there are three different movdi's (one for 32-bit > > without the change, one for 64-bit with reload, and one for 64-bit with lra). > > I would prefer not to do that. > > Let's hope it just works :-) I did test it over the weekend. 29 of the 30 spec 2006 benchmarks currently build with reload (gamess fails). The same 29 build and run with the new patch. Like the patch under LRA, there are no regressions in performance, and one FP benchmark is faster. Under LRA, sphinx3 is 2.5% faster (compared to LRA without the patch). Under reload, sphinx3 is roughly the same performance, but calculix is 3.8% faster. Can I apply the patch?
On Mon, Nov 21, 2016 at 01:27:59PM -0500, Michael Meissner wrote: > On Fri, Nov 18, 2016 at 05:07:21PM -0600, Segher Boessenkool wrote: > > On Fri, Nov 18, 2016 at 05:52:12PM -0500, Michael Meissner wrote: > > > On Fri, Nov 18, 2016 at 04:43:40PM -0600, Segher Boessenkool wrote: > > > > Could you also test with reload please? Just LE is enough I guess. > > > > We'd like to keep reload working for GCC 7 at least, and these cost > > > > prefixes tend to break mov patterns :-/ > > > > > > Argh, I guess you are right, but then if reload doesn't work, I will likely > > > submit the patch where there are three different movdi's (one for 32-bit > > > without the change, one for 64-bit with reload, and one for 64-bit with lra). > > > I would prefer not to do that. > > > > Let's hope it just works :-) > > I did test it over the weekend. > > 29 of the 30 spec 2006 benchmarks currently build with reload (gamess fails). > The same 29 build and run with the new patch. Like the patch under LRA, there > are no regressions in performance, and one FP benchmark is faster. > > Under LRA, sphinx3 is 2.5% faster (compared to LRA without the patch). > > Under reload, sphinx3 is roughly the same performance, but calculix is 3.8% > faster. Great, thanks for testing. > Can I apply the patch? Okay, if you change the changelog to say what the patch actually does ;-) And please watch for fallout. Segher
On Mon, Nov 21, 2016 at 12:51:38PM -0600, Segher Boessenkool wrote: > > Okay, if you change the changelog to say what the patch actually does ;-) > And please watch for fallout. This is the ChangeLog entry I checked in. 2016-11-21 Michael Meissner <meissner@linux.vnet.ibm.com> * config/rs6000/rs6000.md (movdi_internal32): Change constraints so that DImode can be allocated to FP/vector registers in more cases, and we can avoid direct move operations. If the register needs reloading, prefer GPRs over FP/vector registers. In the case of FPR vs. Altivec registers, prefer FPR registers unless we have the ISA 3.0 reg+offset scalar instructions. (movdi_internal64): Likewise.
Index: gcc/config/rs6000/rs6000.md =================================================================== --- gcc/config/rs6000/rs6000.md (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000) (revision 242556) +++ gcc/config/rs6000/rs6000.md (.../gcc/config/rs6000) (working copy) @@ -8118,10 +8118,10 @@ (define_insn "p8_mfvsrd_4_disf" (define_insn "*movdi_internal32" [(set (match_operand:DI 0 "rs6000_nonimmediate_operand" - "=Y, r, r, ?m, ?*d, ?*d, - r, ?wY, ?Z, ?*wb, ?*wv, ?wi, - ?wo, ?wo, ?wv, ?wi, ?wi, ?wv, - ?wv") + "=Y, r, r, ^m, ^d, ^d, + r, ^wY, $Z, ^wb, $wv, ^wi, + *wo, *wo, *wv, *wi, *wi, *wv, + *wv") (match_operand:DI 1 "input_operand" "r, Y, r, d, m, d, @@ -8195,9 +8195,9 @@ (define_split (define_insn "*movdi_internal64" [(set (match_operand:DI 0 "nonimmediate_operand" "=Y, r, r, r, r, r, - ?m, ?*d, ?*d, ?wY, ?Z, ?*wb, - ?*wv, ?wi, ?wo, ?wo, ?wv, ?wi, - ?wi, ?wv, ?wv, r, *h, *h, + ^m, ^d, ^d, ^Y, $Z, $wb, + $wv, ^wi, *wo, *wo, *wv, *wi, + *wi, *wv, *wv, r, *h, *h, ?*r, ?*wg, ?*r, ?*wj") (match_operand:DI 1 "input_operand" Index: gcc/testsuite/gcc.target/powerpc/ppc-round2.c =================================================================== --- gcc/testsuite/gcc.target/powerpc/ppc-round2.c (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc) (revision 242556) +++ gcc/testsuite/gcc.target/powerpc/ppc-round2.c (.../gcc/testsuite/gcc.target/powerpc) (working copy) @@ -5,8 +5,8 @@ /* { dg-options "-O2 -mcpu=power8" } */ /* { dg-final { scan-assembler-times "fcfid " 2 } } */ /* { dg-final { scan-assembler-times "fcfids " 2 } } */ -/* { dg-final { scan-assembler-times "fctiwuz " 2 } } */ -/* { dg-final { scan-assembler-times "fctiwz " 2 } } */ +/* { dg-final { scan-assembler-times "fctiwuz \|xscvdpuxws " 2 } } */ +/* { dg-final { scan-assembler-times "fctiwz \|xscvdpsxws " 2 } } */ /* { dg-final { scan-assembler-times "mfvsrd " 4 } } */ /* { dg-final { scan-assembler-times "mtvsrwa " 2 } } */ /* { dg-final { scan-assembler-times "mtvsrwz " 2 } } */