Message ID | 524DDB71.6040703@redhat.com |
---|---|
State | New |
On Thu, Oct 3, 2013 at 5:02 PM, Vladimir Makarov <vmakarov@redhat.com> wrote:
> The following patch permits today trunk to use LRA for ppc by default.
> To switch it off -mno-lra can be used.
>
> The patch was bootstrapped on ppc64. GCC testsuite does not have
> regressions too (in comparison with reload). The change in rs6000.md is
> for fix LRA failure on a recently added ppc test.

Vlad,

I have not forgotten this patch. We are trying to figure out the right timeframe to make this change. The patch does affect performance -- both positively and negatively; most are in the noise but not all. And there still are some SPEC benchmarks that fail to build with the patch, at least in Mike's tests. And Mike is implementing some patches to utilize reload to improve use of VSX registers, which would need to be mirrored in LRA for the equivalent functionality.

Thanks, David
On 13-10-18 11:26 AM, David Edelsohn wrote:
> On Thu, Oct 3, 2013 at 5:02 PM, Vladimir Makarov <vmakarov@redhat.com> wrote:
>> The following patch permits today trunk to use LRA for ppc by default.
>> To switch it off -mno-lra can be used.
>>
>> The patch was bootstrapped on ppc64. GCC testsuite does not have
>> regressions too (in comparison with reload). The change in rs6000.md is
>> for fix LRA failure on a recently added ppc test.
> Vlad,
>
> I have not forgotten this patch. We are trying to figure out the right
> timeframe to make this change. The patch does affect performance --
> both positively and negatively; most are in the noise but not all. And
> there still are some SPEC benchmarks that fail to build with the
> patch, at least in Mike's tests. And Mike is implementing some patches
> to utilize reload to improve use of VSX registers, which would need to
> be mirrored in LRA for the equivalent functionality.

Thanks for informing me, David.

I am ready to work on any LRA ppc issues when it will be in the trunk. It would be easier for me to work on LRA ppc if the patch is committed to the trunk and of course LRA is used as non-default local RA.

I don't know what Mike is doing on reload to use VSX registers. I guess it is usage of VSX regs as spilled locations for GENERAL regs instead of memory. If it is so, it is 2 day work to add this functionality in LRA (as it already has analogous functionality for Intel processors and that gave a nice SPECFP2000 improvement for them) and probably more work on resolving issues especially as I have no power8.
On Sun, Oct 20, 2013 at 10:48:08PM -0400, Vladimir Makarov wrote:
> On 13-10-18 11:26 AM, David Edelsohn wrote:
> >On Thu, Oct 3, 2013 at 5:02 PM, Vladimir Makarov <vmakarov@redhat.com> wrote:
> >>The following patch permits today trunk to use LRA for ppc by default.
> >>To switch it off -mno-lra can be used.
> >>
> >>The patch was bootstrapped on ppc64. GCC testsuite does not have
> >>regressions too (in comparison with reload). The change in rs6000.md is
> >>for fix LRA failure on a recently added ppc test.
> >Vlad,
> >
> >I have not forgotten this patch. We are trying to figure out the right
> >timeframe to make this change. The patch does affect performance --
> >both positively and negatively; most are in the noise but not all. And
> >there still are some SPEC benchmarks that fail to build with the
> >patch, at least in Mike's tests. And Mike is implementing some patches
> >to utilize reload to improve use of VSX registers, which would need to
> >be mirrored in LRA for the equivalent functionality.
> Thanks for informing me, David.
>
> I am ready to work on any LRA ppc issues when it will be in the
> trunk. It would be easier for me to work on LRA ppc if the patch is
> committed to the trunk and of course LRA is used as non-default
> local RA.
>
> I don't know what Mike is doing on reload to use VSX registers. I
> guess it is usage of VSX regs as spilled locations for GENERAL regs
> instead of memory. If it is so, it is 2 day work to add this
> functionality in LRA (as it already has analogous functionality for
> Intel processors and that gave a nice SPECFP2000 improvement for
> them) and probably more work on resolving issues especially as I
> have no power8.

I would say let's add -mlra, but make the default OFF for the time being. We can always switch the default later.

Vladimir, I thought I included you in the list when I gave status. The big thing is several of the Spec 2006 benchmarks don't work in 32-bit mode, and I get a lot of Fortran errors, again in 32-bit. I also saw some decimal floating point problems.

What I'm doing is adding secondary reload support so that up until reload time, we can represent VSX addresses as reg+offset, and in secondary reload, create the addition instructions to put the offset in a base register. I haven't made any changes to the machine independent portions of the compiler. As long as IRA uses the secondary reload interface, it should be ok. However, right now, I need to focus most of my attention on getting the secondary reload support to work.

One thing that I've asked for before, but to remind you, is I really, really wish secondary reload could allocate two scratch registers if it is given an insn that takes 4 arguments. Right now, I'm allocating a TFmode scratch, since that gives 2 registers, but future changes will want TFmode to go into a single vector register, and I will need to create another type, like V4DI that does take 2 registers. The case that this is needed for is moving an item from GPRs to VSX registers that takes 2 GPR registers, such as moving 128-bit items in 64-bit mode, or 64-bit items in 32-bit mode. I need two registers to do the move into, and then I will do the combine operation.
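The two cases named in the last paragraph (128-bit items in 64-bit mode, 64-bit items in 32-bit mode) both come down to ceiling division of the value size by the word size. A minimal sketch of that arithmetic follows; the helper name `gprs_needed` is made up for illustration and is not a GCC function.

```python
def gprs_needed(value_bits: int, word_bits: int) -> int:
    """Return how many word-sized GPRs a value of value_bits occupies.

    Hypothetical helper for illustration only; not part of GCC.
    """
    if value_bits <= 0 or word_bits <= 0:
        raise ValueError("sizes must be positive")
    # Ceiling division without going through floating point.
    return (value_bits + word_bits - 1) // word_bits


if __name__ == "__main__":
    # The two cases from the message, plus a single-register case.
    for value_bits, word_bits in [(128, 64), (64, 32), (64, 64)]:
        count = gprs_needed(value_bits, word_bits)
        print(f"{value_bits}-bit item in {word_bits}-bit mode: {count} GPR(s)")
```

Both cases from the message come out at two registers, which matches the request for two scratch registers in the secondary reload pattern.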
On Mon, Oct 21, 2013 at 5:51 PM, Michael Meissner wrote:
> What I'm doing is adding secondary reload support so that up until reload time,
> we can represent VSX addresses as reg+offset, and in secondary reload, create
> the addition instructions to put the offset in a base register. I haven't made
> any changes to the machine independent portions of the compiler. As long as
> IRA uses the secondary reload interface, it should be ok. However, right now,
> I need to focus most of my attention on getting the secondary reload support to
> work.
>
> One thing that I've asked for before, but to remind you, is I really, really
> wish secondary reload could allocate two scratch registers if it is given an
> insn that takes 4 arguments. Right now, I'm allocating a TFmode scratch, since
> that gives 2 registers, but future changes will want TFmode to go into a single
> vector register, and I will need to create another type, like V4DI that does
> take 2 registers. The case that this is needed for is moving an item from GPRs
> to VSX registers that takes 2 GPR registers, such as moving 128-bit items in
> 64-bit mode, or 64-bit items in 32-bit mode. I need two registers to do the
> move into, and then I will do the combine operation.

Eh, perhaps I'm missing something, but...

Isn't one of the great advantages of LRA over reload, that LRA allows you to create new pseudos so that you shouldn't ever need secondary reloads??

Ciao!
Steven
On Mon, Oct 21, 2013 at 08:18:22PM +0200, Steven Bosscher wrote:
> On Mon, Oct 21, 2013 at 5:51 PM, Michael Meissner wrote:
> > What I'm doing is adding secondary reload support so that up until reload time,
> > we can represent VSX addresses as reg+offset, and in secondary reload, create
> > the addition instructions to put the offset in a base register. I haven't made
> > any changes to the machine independent portions of the compiler. As long as
> > IRA uses the secondary reload interface, it should be ok. However, right now,
> > I need to focus most of my attention on getting the secondary reload support to
> > work.
> >
> > One thing that I've asked for before, but to remind you, is I really, really
> > wish secondary reload could allocate two scratch registers if it is given an
> > insn that takes 4 arguments. Right now, I'm allocating a TFmode scratch, since
> > that gives 2 registers, but future changes will want TFmode to go into a single
> > vector register, and I will need to create another type, like V4DI that does
> > take 2 registers. The case that this is needed for is moving an item from GPRs
> > to VSX registers that takes 2 GPR registers, such as moving 128-bit items in
> > 64-bit mode, or 64-bit items in 32-bit mode. I need two registers to do the
> > move into, and then I will do the combine operation.
>
>
> Eh, perhaps I'm missing something, but...
>
> Isn't one of the great advantages of LRA over reload, that LRA allows
> you to create new pseudos so that you shouldn't ever need secondary
> reloads??

You still need secondary reload.

For example, on the powerpc, you have 5 addressing modes for GPRs and FPRS:

1) base register
2) base register + index register
3) base register + offset
4) auto-update for base register + index register
5) auto-update for base register + offset register

Now for VSX registers you only have:

1) base register
2) base register + index register

So, in the work I'm doing right now, I want to allow reg + offset addressing, but if the register being loaded is an Altivec register (high part of the VSX registers), I need to create a secondary reload to load the offset into and then convert the address to indirect or indexed addressing. You don't want to always disallow offset based addressing, but you want to create the secondary reload when you need to.

Similarly, vector types can do indexed addressing (reg+reg) but if you are loading or storing the value into GPR registers, you can't do reg+reg addressing on multi-word items.

Finally, one of the features of ISA 2.07 is the notion of load fusion, where the hardware will fuse together a load immediate with an adjacent load (for GPRs, you need the load immediate shifted to be the register that is being loaded, for VSX registers, you just need the instructions adjacent). In this case, before reload we will want to pretend that the machine has addressing to include the fusion forms, and in secondary reload, you will generate the combined insn that will become the fusion instruction.
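The decision described in this message (keep reg+offset where the register class supports it, otherwise move the offset into a scratch register and switch to indexed addressing) can be sketched as a small table lookup. This is a simplified toy model, not GCC's secondary_reload hook: every name below (`RegClass`, `AddrForm`, `LEGAL_FORMS`, `needs_secondary_reload`, `lower_base_offset`) is hypothetical, and the model ignores the Altivec versus low-VSX distinction, the multi-word GPR restriction, and load fusion.

```python
from enum import Enum


class RegClass(Enum):
    GPR = "gpr"
    FPR = "fpr"
    VSX = "vsx"


class AddrForm(Enum):
    BASE = "base"
    BASE_INDEX = "base+index"
    BASE_OFFSET = "base+offset"
    UPDATE_INDEX = "update base+index"
    UPDATE_OFFSET = "update base+offset"


# Address forms per register class, transcribed from the two lists in the
# message (item 5 is read here as the auto-update base+offset form).
ALL_FORMS = frozenset(AddrForm)
LEGAL_FORMS = {
    RegClass.GPR: ALL_FORMS,
    RegClass.FPR: ALL_FORMS,
    RegClass.VSX: frozenset({AddrForm.BASE, AddrForm.BASE_INDEX}),
}


def needs_secondary_reload(reg_class: RegClass, form: AddrForm) -> bool:
    """True when the class cannot use this address form directly."""
    return form not in LEGAL_FORMS[reg_class]


def lower_base_offset(reg_class, base, offset, scratch="scratch"):
    """Return (setup_steps, address) for a base+offset memory access.

    setup_steps lists the extra work a secondary reload would add;
    address is the form finally used. Illustrative only.
    """
    if not needs_secondary_reload(reg_class, AddrForm.BASE_OFFSET):
        return [], ("base+offset", base, offset)
    if offset == 0:
        # A zero offset degenerates to plain indirect addressing.
        return [], ("base", base)
    # Put the offset in a scratch register and use indexed addressing.
    return [("load_offset", scratch, offset)], ("base+index", base, scratch)


if __name__ == "__main__":
    for reg_class in RegClass:
        print(reg_class.name, lower_base_offset(reg_class, "r3", 16))
```

The point of the sketch is the asymmetry: the same reg+offset address is left alone for GPRs and FPRs but costs one extra scratch register for VSX, so offset addressing is only rewritten when the destination class requires it.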
On 13-10-21 11:51 AM, Michael Meissner wrote:
> On Sun, Oct 20, 2013 at 10:48:08PM -0400, Vladimir Makarov wrote:
>> On 13-10-18 11:26 AM, David Edelsohn wrote:
>>> On Thu, Oct 3, 2013 at 5:02 PM, Vladimir Makarov <vmakarov@redhat.com> wrote:
>>>> The following patch permits today trunk to use LRA for ppc by default.
>>>> To switch it off -mno-lra can be used.
>>>>
>>>> The patch was bootstrapped on ppc64. GCC testsuite does not have
>>>> regressions too (in comparison with reload). The change in rs6000.md is
>>>> for fix LRA failure on a recently added ppc test.
>>> Vlad,
>>>
>>> I have not forgotten this patch. We are trying to figure out the right
>>> timeframe to make this change. The patch does affect performance --
>>> both positively and negatively; most are in the noise but not all. And
>>> there still are some SPEC benchmarks that fail to build with the
>>> patch, at least in Mike's tests. And Mike is implementing some patches
>>> to utilize reload to improve use of VSX registers, which would need to
>>> be mirrored in LRA for the equivalent functionality.
>> Thanks for informing me, David.
>>
>> I am ready to work on any LRA ppc issues when it will be in the
>> trunk. It would be easier for me to work on LRA ppc if the patch is
>> committed to the trunk and of course LRA is used as non-default
>> local RA.
>>
>> I don't know what Mike is doing on reload to use VSX registers. I
>> guess it is usage of VSX regs as spilled locations for GENERAL regs
>> instead of memory. If it is so, it is 2 day work to add this
>> functionality in LRA (as it already has analogous functionality for
>> Intel processors and that gave a nice SPECFP2000 improvement for
>> them) and probably more work on resolving issues especially as I
>> have no power8.
> I would say let's add -mlra, but make the default OFF for the time being. We
> can always switch the default later.

Sure, if you know some LRA problems it should not be on default. Moreover, if we still have the problems when releasing gcc4.9, I think we should exclude any possibility for a user to use LRA for ppc. I don't want to have GCC-4.9 users blaming LRA.

But adding LRA to PPC on the trunk (switched OFF by default) earlier could help me a lot to work on the issues.

> Vladimir, I thought I included you in the list when I gave status. The big
> thing is several of the Spec 2006 benchmarks don't work in 32-bit mode, and I
> get a lot of Fortran errors, again in 32-bit. I also saw some decimal floating
> point problems.

No, I did not see the message (or maybe I missed it). I need to check.

> What I'm doing is adding secondary reload support so that up until reload time,
> we can represent VSX addresses as reg+offset, and in secondary reload, create
> the addition instructions to put the offset in a base register. I haven't made
> any changes to the machine independent portions of the compiler. As long as
> IRA uses the secondary reload interface, it should be ok. However, right now,
> I need to focus most of my attention on getting the secondary reload support to
> work.

I completely understand. You are quite busy at this time, as I am, rushing some stuff into gcc-4.9.

> One thing that I've asked for before, but to remind you, is I really, really
> wish secondary reload could allocate two scratch registers if it is given an
> insn that takes 4 arguments. Right now, I'm allocating a TFmode scratch, since
> that gives 2 registers, but future changes will want TFmode to go into a single
> vector register, and I will need to create another type, like V4DI that does
> take 2 registers. The case that this is needed for is moving an item from GPRs
> to VSX registers that takes 2 GPR registers, such as moving 128-bit items in
> 64-bit mode, or 64-bit items in 32-bit mode. I need two registers to do the
> move into, and then I will do the combine operation.
>

Ok. I guess LRA can be adapted to some new secondary_reload hook returning two scratch registers.
On 13-10-21 2:55 PM, Michael Meissner wrote:
> On Mon, Oct 21, 2013 at 08:18:22PM +0200, Steven Bosscher wrote:
>> On Mon, Oct 21, 2013 at 5:51 PM, Michael Meissner wrote:
>>> What I'm doing is adding secondary reload support so that up until reload time,
>>> we can represent VSX addresses as reg+offset, and in secondary reload, create
>>> the addition instructions to put the offset in a base register. I haven't made
>>> any changes to the machine independent portions of the compiler. As long as
>>> IRA uses the secondary reload interface, it should be ok. However, right now,
>>> I need to focus most of my attention on getting the secondary reload support to
>>> work.
>>>
>>> One thing that I've asked for before, but to remind you, is I really, really
>>> wish secondary reload could allocate two scratch registers if it is given an
>>> insn that takes 4 arguments. Right now, I'm allocating a TFmode scratch, since
>>> that gives 2 registers, but future changes will want TFmode to go into a single
>>> vector register, and I will need to create another type, like V4DI that does
>>> take 2 registers. The case that this is needed for is moving an item from GPRs
>>> to VSX registers that takes 2 GPR registers, such as moving 128-bit items in
>>> 64-bit mode, or 64-bit items in 32-bit mode. I need two registers to do the
>>> move into, and then I will do the combine operation.
>>
>> Eh, perhaps I'm missing something, but...
>>
>> Isn't one of the great advantages of LRA over reload, that LRA allows
>> you to create new pseudos so that you shouldn't ever need secondary
>> reloads??
> You still need secondary reload.

As I understand, Mike is talking about the secondary_reload hook. LRA can generate a chain of reloads as long as it is needed. It is achieved by subsequent processing of generated reload insns on one or more lra-constraints subpasses.

Porting LRA frequently consists of removing the secondary reload hook, as in many cases LRA is smart enough to find the necessary reloads just from insn constraints. But there are still really complicated situations when LRA cannot do this and it still needs directions from the secondary_reload hook. I am sure PPC has real needs to use this hook even for LRA.

> For example, on the powerpc, you have 5 addressing modes for GPRs and FPRS:
>
> 1) base register
> 2) base register + index register
> 3) base register + offset
> 4) auto-update for base register + index register
> 5) auto-update for base register + offset register
>
> Now for VSX registers you only have:
>
> 1) base register
> 2) base register + index register
>
> So, in the work I'm doing right now, I want to allow reg + offset addressing,
> but if the register being loaded is an Altivec register (high part of the VSX
> registers), I need to create a secondary reload to load the offset into and
> then convert the address to indirect or indexed addressing. You don't want to
> always disallow offset based addressing, but you want to create the secondary
> reload when you need to.
>
> Similarly, vector types can do indexed addressing (reg+reg) but if you are
> loading or storing the value into GPR registers, you can't do reg+reg
> addressing on multi-word items.
>
> Finally, one of the features of ISA 2.07 is the notion of load fusion, where
> the hardware will fuse together a load immediate with an adjacent load (for
> GPRs, you need the load immediate shifted to be the register that is being
> loaded, for VSX registers, you just need the instructions adjacent). In this
> case, before reload we will want to pretend that the machine has addressing to
> include the fusion forms, and in secondary reload, you will generate the
> combined insn that will become the fusion instruction.
>
On Mon, Oct 21, 2013 at 10:42 PM, Vladimir Makarov <vmakarov@redhat.com> wrote:
>> I would say let's add -mlra, but make the default OFF for the time being. We
>> can always switch the default later.
>
> Sure, if you know some LRA problems it should not be on default. Moreover,
> if we still have the problems when releasing gcc4.9, I think we should
> exclude any possibility for a user to use LRA for ppc. I don't want to have
> GCC-4.9 users blaming LRA.
>
> But adding LRA to PPC on the trunk (switched OFF by default) earlier could
> help me a lot to work on the issues.

My main concern was disrupting Mike. If Mike is comfortable with adding LRA disabled by default, it is okay with me.

The patch mostly adds lra_in_progress, which will not have any effect while LRA remains disabled.

My one question about the patch is:

    - [(set (match_operand:DI 0 "reg_or_mem_operand" "=&r,Z,??&r")
    + [(set (match_operand:DI 0 "reg_or_mem_operand" "=&r,Z,&r")

which may cause register preferencing problems for bswap when LRA is not used.

The rest of the patch is okay.

Thanks, David
On Tue, Oct 22, 2013 at 10:21:32AM -0400, David Edelsohn wrote:
> On Mon, Oct 21, 2013 at 10:42 PM, Vladimir Makarov <vmakarov@redhat.com> wrote:
>
> >> I would say let's add -mlra, but make the default OFF for the time being. We
> >> can always switch the default later.
> >
> > Sure, if you know some LRA problems it should not be on default. Moreover,
> > if we still have the problems when releasing gcc4.9, I think we should
> > exclude any possibility for a user to use LRA for ppc. I don't want to have
> > GCC-4.9 users blaming LRA.
> >
> > But adding LRA to PPC on the trunk (switched OFF by default) earlier could
> > help me a lot to work on the issues.
>
> My main concern was disrupting Mike. If Mike is comfortable with
> adding LRA disabled by default, it is okay with me.
>
> The patch mostly adds lra_in_progress, which will not have any effect
> while LRA remains disabled.
>
> My one question about the patch is:
>
> - [(set (match_operand:DI 0 "reg_or_mem_operand" "=&r,Z,??&r")
> + [(set (match_operand:DI 0 "reg_or_mem_operand" "=&r,Z,&r")
>
> which may cause register preferencing problems for bswap when LRA is not used.
>
> The rest of the patch is okay.
>
> Thanks, David

Yeah, I can see a whole round of tuning issues, and everywhere reload_in_progress is used, add lra_in_progress. Because of the Advance Toolchain, RHEL, and SLES, we will need to still deal with the original register allocator.

Vlad, this is part of a message I had sent David, and I thought you were on the CC list about LRA.

I haven't looked in detail what the changes are at this point. I did do some builds and comparisons. It looks like there are definitely problems with 32-bit fortran and decimal floating point (and likely long double using IBM's double double format). If somebody has some cycles, it may be useful digging into why we get these failures.
Note, I have some sort of configuration problem in running dealII, so it isn't run right now:

Spec 2006, 64-bit, 3 runs, picking the middle, power7 options:

Benchmark | Type | Percent |
---|---|---|
400.perlbench | int | 96.74% |
401.bzip2 | int | 100.09% |
403.gcc | int | 99.94% |
429.mcf | int | 99.21% |
445.gobmk | int | 99.33% |
456.hmmer | int | 98.34% |
458.sjeng | int | 99.68% |
462.libquantum | int | 101.48% |
464.h264ref | int | 101.40% |
471.omnetpp | int | 100.28% |
473.astar | int | 100.09% |
483.xalancbmk | int | 98.28% |
410.bwaves | fp | 98.11% |
416.gamess | fp | 101.31% |
433.milc | fp | 99.43% |
434.zeusmp | fp | 103.53% |
435.gromacs | fp | 109.63% |
436.cactusADM | fp | 99.53% |
437.leslie3d | fp | 101.23% |
444.namd | fp | 103.42% |
447.dealII | fp | ------ |
450.soplex | fp | 99.14% |
453.povray | fp | 99.66% |
454.calculix | fp | 97.17% |
459.GemsFDTD | fp | 100.88% |
465.tonto | fp | 101.18% |
470.lbm | fp | 99.83% |
481.wrf | fp | 93.38% |
482.sphinx3 | fp | 100.82% |
Spec INT | int | 99.57% |
Spec FP except 447.dealII | fp | 100.43% |

Perlbench, calculix, and wrf are slower. Zeusmp, gromacs, and Namd are faster.

Unfortunately, the profiling tools on my system seem to abort when I run 32-bit benchmarks, so I haven't gotten the numbers recently (nor had time to get the tools team to look at it).

In terms of building 32-bit, 3 benchmarks don't build with LRA: gamess, dealII (note in 64-bit dealII builds, it just doesn't run correctly), and wrf.

Let's see.
In gamess, I see:

/home/meissner/fsf-install-ppc64/gcc-4_9-lra/bin/gfortran -c -o ormas1.fppized.o -g -save-temps=obj -ffast-math -Ofast -mveclibabi=mass -mcpu=power7 -mrecip=rsqrt -fpeel-loops -funroll-loops -ftree-vectorize -fvect-cost-model -fno-aggressive-loop-optimizations -mlra -m32 ormas1.fppized.f

ormas1.fppized.f: In function 'maktabs':
ormas1.fppized.f:2281:0: internal compiler error: in check_rtl, at lra.c:2036
END
^
0x105a08ef check_rtl
        /home/meissner/fsf-src/gcc-4_9-lra/gcc/lra.c:2036
0x105a2bcb lra(_IO_FILE*)
        /home/meissner/fsf-src/gcc-4_9-lra/gcc/lra.c:2432
0x10552933 do_reload
        /home/meissner/fsf-src/gcc-4_9-lra/gcc/ira.c:4686
0x10552933 rest_of_handle_reload
        /home/meissner/fsf-src/gcc-4_9-lra/gcc/ira.c:4815
0x10552933 execute
        /home/meissner/fsf-src/gcc-4_9-lra/gcc/ira.c:4844
Please submit a full bug report,

In dealII we see:

/home/meissner/fsf-install-ppc64/gcc-4_9-lra/bin/g++ -c -o sparse_matrix_ez.float.o -DSPEC_CPU -DNDEBUG -Iinclude -DBOOST_DISABLE_THREADS -Ddeal_II_dimension=3 -g -save-temps=obj -ffast-math -Ofast -mveclibabi=mass -mcpu=power7 -mrecip=rsqrt -fpeel-loops -funroll-loops -ftree-vectorize -fvect-cost-model -fno-aggressive-loop-optimizations -mlra -m32 -DSPEC_CPU_LINUX -include cstddef sparse_matrix_ez.float.cc

quadrature_lib.cc: In constructor 'QGauss<dim>::QGauss(unsigned int) [with int dim = 1]':
quadrature_lib.cc:95:1: internal compiler error: in check_rtl, at lra.c:2036
}
^
0x1073272f check_rtl
        /home/meissner/fsf-src/gcc-4_9-lra/gcc/lra.c:2036
0x10734a0b lra(_IO_FILE*)
        /home/meissner/fsf-src/gcc-4_9-lra/gcc/lra.c:2432
0x106e4773 do_reload
        /home/meissner/fsf-src/gcc-4_9-lra/gcc/ira.c:4686
0x106e4773 rest_of_handle_reload
        /home/meissner/fsf-src/gcc-4_9-lra/gcc/ira.c:4815
0x106e4773 execute
        /home/meissner/fsf-src/gcc-4_9-lra/gcc/ira.c:4844
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <http://gcc.gnu.org/bugs.html> for instructions.
specmake: *** [quadrature_lib.o] Error 1
specmake: *** Waiting for unfinished jobs....
polynomial.cc: In member function 'Polynomials::Polynomial<number> Polynomials::Polynomial<number>::derivative() const [with number = long double]':
polynomial.cc:282:3: internal compiler error: in check_rtl, at lra.c:2036
}
^
0x1073272f check_rtl
        /home/meissner/fsf-src/gcc-4_9-lra/gcc/lra.c:2036
0x10734a0b lra(_IO_FILE*)
        /home/meissner/fsf-src/gcc-4_9-lra/gcc/lra.c:2432
0x106e4773 do_reload
        /home/meissner/fsf-src/gcc-4_9-lra/gcc/ira.c:4686
0x106e4773 rest_of_handle_reload
        /home/meissner/fsf-src/gcc-4_9-lra/gcc/ira.c:4815
0x106e4773 execute
        /home/meissner/fsf-src/gcc-4_9-lra/gcc/ira.c:4844
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <http://gcc.gnu.org/bugs.html> for instructions.
specmake: *** [polynomial.o] Error 1

In wrf, we get:

/home/meissner/fsf-install-ppc64/gcc-4_9-lra/bin/gfortran -c -o module_radiation_driver.fppized.o -I. -I./netcdf/include -g -save-temps=obj -ffast-math -Ofast -mveclibabi=mass -mcpu=power7 -mrecip=rsqrt -fpeel-loops -funroll-loops -ftree-vectorize -fvect-cost-model -fno-aggressive-loop-optimizations -mlra -m32 module_radiation_driver.fppized.f90

module_diffusion_em.fppized.f90: In function 'cal_deform_and_div':
module_diffusion_em.fppized.f90:829:0: internal compiler error: in check_rtl, at lra.c:2036
END SUBROUTINE cal_deform_and_div
^
0x105a08ef check_rtl
        /home/meissner/fsf-src/gcc-4_9-lra/gcc/lra.c:2036
0x105a2bcb lra(_IO_FILE*)
        /home/meissner/fsf-src/gcc-4_9-lra/gcc/lra.c:2432
0x10552933 do_reload
        /home/meissner/fsf-src/gcc-4_9-lra/gcc/ira.c:4686
0x10552933 rest_of_handle_reload
        /home/meissner/fsf-src/gcc-4_9-lra/gcc/ira.c:4815
0x10552933 execute
        /home/meissner/fsf-src/gcc-4_9-lra/gcc/ira.c:4844
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <http://gcc.gnu.org/bugs.html> for instructions.
specmake: *** [module_diffusion_em.fppized.o] Error 1
specmake: *** Waiting for unfinished jobs....
Error with make 'specmake -j40 build': check file '/home/meissner/spec-build/spec-2006-base-dev49-power7-vsx-svn203459-lra-shared-at7.0-32bit/benchspec/CPU2006/481.wrf/build/build_base_dev49-power7-vsx-32bit.0000/make.err'
Command returned exit code 2
Error with make!
*** Error building 481.wrf

I checked the LRA changes into a branch, and it is based off of subversion id 203569.

svn+ssh://gcc.gnu.org/svn/gcc/branches/ibm/gcc-4_9-lra

Let's see, in terms of make check regressions:

Unexpected tests for gcc -m64:

Test | gcc-4 #1 | trunk #2 |
---|---|---|
gcc.target/powerpc/p8vector-ldst.c | fail | --- |
gcc.target/powerpc/pr57744.c | fail | --- |

Unexpected tests for gcc -m32:

Test | gcc-4 #1 | trunk #2 |
---|---|---|
c-c++-common/dfp/cast.c | fail | --- |
c-c++-common/dfp/convert-bfp-10.c | fail | --- |
c-c++-common/dfp/convert-bfp-11.c | fail | --- |
c-c++-common/dfp/convert-bfp-2.c | fail | --- |
c-c++-common/dfp/convert-bfp-3.c | fail | --- |
c-c++-common/dfp/convert-bfp-4.c | fail | --- |
c-c++-common/dfp/convert-bfp-5.c | fail | --- |
c-c++-common/dfp/convert-bfp-6.c | fail | --- |
c-c++-common/dfp/convert-bfp-7.c | fail | --- |
c-c++-common/dfp/convert-bfp.c | fail | --- |
c-c++-common/dfp/inf-1.c | fail | --- |
gcc.target/powerpc/bswap64-4.c | fail | --- |
gcc.target/powerpc/p8vector-ldst.c | fail | --- |
gcc.target/powerpc/pr53199.c | fail | --- |

Unexpected tests for g++ -m32:

Test | gcc-4 #1 | trunk #2 |
---|---|---|
c-c++-common/dfp/cast.c | fail | --- |
c-c++-common/dfp/convert-bfp-10.c | fail | --- |
c-c++-common/dfp/convert-bfp-11.c | fail | --- |
c-c++-common/dfp/convert-bfp-2.c | fail | --- |
c-c++-common/dfp/convert-bfp-3.c | fail | --- |
c-c++-common/dfp/convert-bfp-4.c | fail | --- |
c-c++-common/dfp/convert-bfp-5.c | fail | --- |
c-c++-common/dfp/convert-bfp-6.c | fail | --- |
c-c++-common/dfp/convert-bfp-7.c | fail | --- |
c-c++-common/dfp/convert-bfp.c | fail | --- |
c-c++-common/dfp/inf-1.c | fail | --- |

Unexpected tests for gfortran -m32:

Test | gcc-4 #1 | trunk #2 |
---|---|---|
gfortran.dg/PR19872.f | fail | --- |
gfortran.dg/advance_1.f90 | fail | --- |
gfortran.dg/advance_4.f90 | fail | --- |
gfortran.dg/advance_5.f90 | fail | --- |
gfortran.dg/advance_6.f90 | fail | --- |
gfortran.dg/append_1.f90 | fail | --- |
gfortran.dg/associated_2.f90 | fail | --- |
gfortran.dg/assumed_rank_1.f90 | fail | --- |
gfortran.dg/assumed_rank_2.f90 | fail | --- |
gfortran.dg/assumed_rank_7.f90 | fail | --- |
gfortran.dg/assumed_type_2.f90 | fail | --- |
gfortran.dg/backspace_10.f90 | fail | --- |
gfortran.dg/backspace_2.f | fail | --- |
gfortran.dg/backspace_8.f | fail | --- |
gfortran.dg/backspace_9.f | fail | --- |
gfortran.dg/bound_2.f90 | fail | --- |
gfortran.dg/bound_7.f90 | fail | --- |
gfortran.dg/char_cshift_1.f90 | fail | --- |
gfortran.dg/char_cshift_2.f90 | fail | --- |
gfortran.dg/char_cshift_3.f90 | fail | --- |
gfortran.dg/char_eoshift_1.f90 | fail | --- |
gfortran.dg/char_eoshift_2.f90 | fail | --- |
gfortran.dg/char_eoshift_3.f90 | fail | --- |
gfortran.dg/char_eoshift_4.f90 | fail | --- |
gfortran.dg/char_eoshift_5.f90 | fail | --- |
gfortran.dg/char_length_8.f90 | fail | --- |
gfortran.dg/chmod_1.f90 | fail | --- |
gfortran.dg/chmod_2.f90 | fail | --- |
gfortran.dg/chmod_3.f90 | fail | --- |
gfortran.dg/comma.f | fail | --- |
gfortran.dg/convert_2.f90 | fail | --- |
gfortran.dg/convert_implied_open.f90 | fail | --- |
gfortran.dg/cr_lf.f90 | fail | --- |
gfortran.dg/cshift_bounds_1.f90 | fail | --- |
gfortran.dg/cshift_bounds_2.f90 | fail | --- |
gfortran.dg/cshift_bounds_3.f90 | fail | --- |
gfortran.dg/cshift_bounds_4.f90 | fail | --- |
gfortran.dg/cshift_nan_1.f90 | fail | --- |
gfortran.dg/defined_assignment_9.f90 | fail | --- |
gfortran.dg/dev_null.F90 | fail | --- |
gfortran.dg/direct_io_1.f90 | fail | --- |
gfortran.dg/direct_io_11.f90 | fail | --- |
gfortran.dg/direct_io_12.f90 | fail | --- |
gfortran.dg/direct_io_2.f90 | fail | --- |
gfortran.dg/direct_io_3.f90 | fail | --- |
gfortran.dg/direct_io_5.f90 | fail | --- |
gfortran.dg/direct_io_8.f90 | fail | --- |
gfortran.dg/endfile.f90 | fail | --- |
gfortran.dg/endfile_2.f90 | fail | --- |
gfortran.dg/eof_4.f90 | fail | --- |
gfortran.dg/eoshift.f90 | fail | --- |
gfortran.dg/eoshift_bounds_1.f90 | fail | --- |
gfortran.dg/error_format.f90 | fail | --- |
gfortran.dg/f2003_inquire_1.f03 | fail | --- |
gfortran.dg/f2003_io_1.f03 | fail | --- |
gfortran.dg/f2003_io_5.f03 | fail | --- |
gfortran.dg/f2003_io_7.f03 | fail | --- |
gfortran.dg/fmt_cache_1.f | fail | --- |
gfortran.dg/fmt_error_4.f90 | fail | --- |
gfortran.dg/fmt_error_5.f90 | fail | --- |
gfortran.dg/fmt_t_5.f90 | fail | --- |
gfortran.dg/fmt_t_7.f | fail | --- |
gfortran.dg/ftell_3.f90 | fail | --- |
gfortran.dg/hollerith4.f90 | fail | --- |
gfortran.dg/inquire_10.f90 | fail | --- |
gfortran.dg/inquire_13.f90 | fail | --- |
gfortran.dg/inquire_15.f90 | fail | --- |
gfortran.dg/inquire_9.f90 | fail | --- |
gfortran.dg/inquire_size.f90 | fail | --- |
gfortran.dg/iomsg_1.f90 | fail | --- |
gfortran.dg/iostat_2.f90 | fail | --- |
gfortran.dg/list_read_10.f90 | fail | --- |
gfortran.dg/list_read_11.f90 | fail | --- |
gfortran.dg/list_read_6.f90 | fail | --- |
gfortran.dg/list_read_7.f90 | fail | --- |
gfortran.dg/list_read_9.f90 | fail | --- |
gfortran.dg/matmul_1.f90 | fail | --- |
gfortran.dg/matmul_5.f90 | fail | --- |
gfortran.dg/maxloc_bounds_1.f90 | fail | --- |
gfortran.dg/maxloc_bounds_2.f90 | fail | --- |
gfortran.dg/maxloc_bounds_3.f90 | fail | --- |
gfortran.dg/maxloc_bounds_6.f90 | fail | --- |
gfortran.dg/maxloc_bounds_8.f90 | fail | --- |
gfortran.dg/namelist_44.f90 | fail | --- |
gfortran.dg/namelist_45.f90 | fail | --- |
gfortran.dg/namelist_46.f90 | fail | --- |
gfortran.dg/namelist_66.f90 | fail | --- |
gfortran.dg/namelist_72.f | fail | --- |
gfortran.dg/namelist_82.f90 | fail | ---
gfortran.dg/negative_automatic_size.f90 | fail | ---
gfortran.dg/negative_unit.f | fail | ---
gfortran.dg/negative_unit_int8.f | fail | ---
gfortran.dg/newunit_1.f90 | fail | ---
gfortran.dg/newunit_3.f90 | fail | ---
gfortran.dg/open_access_append_1.f90 | fail | ---
gfortran.dg/open_errors.f90 | fail | ---
gfortran.dg/open_negative_unit_1.f90 | fail | ---
gfortran.dg/open_new.f90 | fail | ---
gfortran.dg/open_readonly_1.f90 | fail | ---
gfortran.dg/open_status_1.f90 | fail | ---
gfortran.dg/open_status_2.f90 | fail | ---
gfortran.dg/open_status_3.f90 | fail | ---
gfortran.dg/optional_dim_2.f90 | fail | ---
gfortran.dg/optional_dim_3.f90 | fail | ---
gfortran.dg/overwrite_1.f | fail | ---
gfortran.dg/pointer_assign_8.f90 | fail | ---
gfortran.dg/pr16597.f90 | fail | ---
gfortran.dg/pr16935.f90 | fail | ---
gfortran.dg/pr20954.f | fail | ---
gfortran.dg/pr39865.f90 | fail | ---
gfortran.dg/pr46804.f90 | fail | ---
gfortran.dg/pr47878.f90 | fail | ---
gfortran.dg/read_comma.f | fail | ---
gfortran.dg/read_eof_4.f90 | fail | ---
gfortran.dg/read_eof_8.f90 | fail | ---
gfortran.dg/read_eof_all.f90 | fail | ---
gfortran.dg/read_list_eof_1.f90 | fail | ---
gfortran.dg/read_many_1.f | fail | ---
gfortran.dg/read_no_eor.f90 | fail | ---
gfortran.dg/readwrite_unf_direct_eor_1.f90 | fail | ---
gfortran.dg/realloc_on_assign_11.f90 | fail | ---
gfortran.dg/realloc_on_assign_7.f03 | fail | ---
gfortran.dg/record_marker_1.f90 | fail | ---
gfortran.dg/record_marker_3.f90 | fail | ---
gfortran.dg/runtime_warning_1.f90 | fail | ---
gfortran.dg/selected_char_kind_1.f90 | fail | ---
gfortran.dg/selected_char_kind_4.f90 | fail | ---
gfortran.dg/shift-alloc.f90 | fail | ---
gfortran.dg/shift-kind_2.f90 | fail | ---
gfortran.dg/stat_1.f90 | fail | ---
gfortran.dg/stat_2.f90 | fail | ---
gfortran.dg/streamio_1.f90 | fail | ---
gfortran.dg/streamio_10.f90 | fail | ---
gfortran.dg/streamio_12.f90 | fail | ---
gfortran.dg/streamio_14.f90 | fail | ---
gfortran.dg/streamio_15.f90 | fail | ---
gfortran.dg/streamio_16.f90 | fail | ---
gfortran.dg/streamio_2.f90 | fail | ---
gfortran.dg/streamio_3.f90 | fail | ---
gfortran.dg/streamio_4.f90 | fail | ---
gfortran.dg/streamio_5.f90 | fail | ---
gfortran.dg/streamio_6.f90 | fail | ---
gfortran.dg/streamio_7.f90 | fail | ---
gfortran.dg/streamio_8.f90 | fail | ---
gfortran.dg/streamio_9.f90 | fail | ---
gfortran.dg/tl_editing.f90 | fail | ---
gfortran.dg/unf_io_convert_1.f90 | fail | ---
gfortran.dg/unf_io_convert_2.f90 | fail | ---
gfortran.dg/unf_io_convert_3.f90 | fail | ---
gfortran.dg/unf_io_convert_4.f90 | fail | ---
gfortran.dg/unf_read_corrupted_1.f90 | fail | ---
gfortran.dg/unf_short_record_1.f90 | fail | ---
gfortran.dg/unformatted_subrecord_1.f90 | fail | ---
gfortran.dg/unpack_bounds_1.f90 | fail | ---
gfortran.dg/unpack_bounds_2.f90 | fail | ---
gfortran.dg/unpack_bounds_3.f90 | fail | ---
gfortran.dg/widechar_intrinsics_10.f90 | fail | ---
gfortran.dg/widechar_intrinsics_5.f90 | fail | ---
gfortran.dg/write_back.f | fail | ---
gfortran.dg/write_check.f90 | fail | ---
gfortran.dg/write_check3.f90 | fail | ---
gfortran.dg/write_direct_eor.f90 | fail | ---
gfortran.dg/write_rewind_1.f | fail | ---
gfortran.dg/write_rewind_2.f | fail | ---
gfortran.dg/write_to_null.F90 | fail | ---
gfortran.dg/x_slash_2.f | fail | ---
gfortran.dg/zero_sized_1.f90 | fail | ---
gfortran.fortran-torture/execute/backspace.f90 | fail | ---
gfortran.fortran-torture/execute/direct_io.f90 | fail | ---
gfortran.fortran-torture/execute/inquire_1.f90 | fail | ---
gfortran.fortran-torture/execute/inquire_2.f90 | fail | ---
gfortran.fortran-torture/execute/inquire_3.f90 | fail | ---
gfortran.fortran-torture/execute/inquire_4.f90 | fail | ---
gfortran.fortran-torture/execute/inquire_5.f90 | fail | ---
gfortran.fortran-torture/execute/intrinsic_associated.f90 | fail | ---
gfortran.fortran-torture/execute/intrinsic_associated_2.f90 | fail | ---
gfortran.fortran-torture/execute/intrinsic_cshift.f90 | fail | ---
gfortran.fortran-torture/execute/intrinsic_eoshift.f90 | fail | ---
gfortran.fortran-torture/execute/intrinsic_size.f90 | fail | ---
gfortran.fortran-torture/execute/list_read_1.f90 | fail | ---
gfortran.fortran-torture/execute/open_replace.f90 | fail | ---
gfortran.fortran-torture/execute/seq_io.f90 | fail | ---
gfortran.fortran-torture/execute/slash_edit.f90 | fail | ---
gfortran.fortran-torture/execute/unopened_unit_1.f90 | fail | ---
On 13-10-22 10:21 AM, David Edelsohn wrote:
> On Mon, Oct 21, 2013 at 10:42 PM, Vladimir Makarov <vmakarov@redhat.com> wrote:
>
>>> I would say let's add -mlra, but make the default OFF for the time being.
>>> We can always switch the default later.
>> Sure, if you know of some LRA problems it should not be on by default.
>> Moreover, if we still have the problems when releasing gcc4.9, I think we
>> should exclude any possibility for a user to use LRA for ppc. I don't want
>> to have GCC-4.9 users blaming LRA.
>>
>> But adding LRA to PPC on the trunk (switched OFF by default) earlier could
>> help me a lot to work on the issues.
> My main concern was disrupting Mike. If Mike is comfortable with
> adding LRA disabled by default, it is okay with me.
>
> The patch mostly adds lra_in_progress, which will not have any effect
> while LRA remains disabled.
>
> My one question about the patch is:
>
> - [(set (match_operand:DI 0 "reg_or_mem_operand" "=&r,Z,??&r")
> + [(set (match_operand:DI 0 "reg_or_mem_operand" "=&r,Z,&r")
>
> which may cause register preferencing problems for bswap when LRA is not used.
>
> The rest of the patch is okay.
>
>
Thanks, David. I'll commit the patch this week without this change (and with LRA active only when -mlra is given).

The change was meant to fix a testsuite failure caused by bad code generation. It can be fixed in another way that does not affect reload: add a modified copy of the insn definition that is active only when LRA is used, and make the original definition active only when reload is used. But I'll do that later.
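As an illustration of the arrangement discussed above (LRA present in the port but selected only when -mlra is given), here is a minimal standalone C sketch. It is not GCC code: `lra_flag`, `handle_option`, `use_lra_p` and `local_allocator_name` are hypothetical names that only model an option variable and a TARGET_LRA_P-style hook, and the default of 0 is an assumption reflecting the "OFF by default" plan rather than the Init(1) in the posted patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the variable set by -mlra.
   0 models "LRA disabled by default" (an assumption of this sketch).  */
static int lra_flag = 0;

/* Toy option handler: -mlra switches LRA on, -mno-lra switches it off.
   Returns false for options this sketch does not know about.  */
static bool
handle_option (const char *opt)
{
  assert (opt != NULL);
  if (strcmp (opt, "-mlra") == 0)
    lra_flag = 1;
  else if (strcmp (opt, "-mno-lra") == 0)
    lra_flag = 0;
  else
    return false;
  return true;
}

/* Model of a TARGET_LRA_P-style hook: the register allocator asks the
   target whether to run LRA or the old reload pass.  */
static bool
use_lra_p (void)
{
  return lra_flag != 0;
}

/* Name of the local allocator that would run in this model.  */
static const char *
local_allocator_name (void)
{
  return use_lra_p () ? "LRA" : "reload";
}
```

In this model a compilation that never sees -mlra keeps using reload, so code that tests an LRA-only condition stays inactive, which matches the point that the lra_in_progress checks have no effect while LRA remains disabled.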
2013-10-03  Vladimir Makarov  <vmakarov@redhat.com>

    * config/rs6000/rs6000-protos.h (rs6000_secondary_memory_needed_mode):
    New prototype.
    * config/rs6000/rs6000.c: Include ira.h.
    (TARGET_LRA_P): Redefine.
    (rs6000_legitimate_offset_address_p): Call
    legitimate_constant_pool_address_p in strict mode for LRA.
    (rs6000_legitimate_address_p): Ditto.
    (legitimate_lo_sum_address_p): Add code for LRA.  Use lra_in_progress.
    (rs6000_emit_move): Add LRA version of code to generate load/store
    of SDmode values.
    (rs6000_secondary_memory_needed_mode): New.
    (rs6000_alloc_sdmode_stack_slot): Do nothing for LRA.
    (rs6000_secondary_reload_class): Return NO_REGS for LRA for
    constants, memory, and FP registers.
    (rs6000_lra_p): New.
    * config/rs6000/rs6000.h (SECONDARY_MEMORY_NEEDED_MODE): New macro.
    * config/rs6000/rs6000.md (*bswapdi2_64bit): Remove ?? from 3rd
    alternative.
    * config/rs6000/rs6000.opt (mlra): New option.

Index: config/rs6000/rs6000-protos.h
===================================================================
--- config/rs6000/rs6000-protos.h (revision 203164)
+++ config/rs6000/rs6000-protos.h (working copy)
@@ -124,6 +124,8 @@ extern rtx create_TOC_reference (rtx, rt
 extern void rs6000_split_multireg_move (rtx, rtx);
 extern void rs6000_emit_move (rtx, rtx, enum machine_mode);
 extern rtx rs6000_secondary_memory_needed_rtx (enum machine_mode);
+extern enum machine_mode rs6000_secondary_memory_needed_mode (enum
+                                                              machine_mode);
 extern rtx (*rs6000_legitimize_reload_address_ptr) (rtx, enum machine_mode,
                                                     int, int, int, int *);
 extern bool rs6000_legitimate_offset_address_p (enum machine_mode, rtx,
Index: config/rs6000/rs6000.c
===================================================================
--- config/rs6000/rs6000.c (revision 203164)
+++ config/rs6000/rs6000.c (working copy)
@@ -56,6 +56,7 @@
 #include "intl.h"
 #include "params.h"
 #include "tm-constrs.h"
+#include "ira.h"
 #include "opts.h"
 #include "tree-vectorizer.h"
 #include "dumpfile.h"
@@ -1493,6 +1494,9 @@ static const struct attribute_spec rs600
 #undef TARGET_MODE_DEPENDENT_ADDRESS_P
 #define TARGET_MODE_DEPENDENT_ADDRESS_P rs6000_mode_dependent_address_p
 
+#undef TARGET_LRA_P
+#define TARGET_LRA_P rs6000_lra_p
+
 #undef TARGET_CAN_ELIMINATE
 #define TARGET_CAN_ELIMINATE rs6000_can_eliminate
 
@@ -6030,7 +6034,7 @@ rs6000_legitimate_offset_address_p (enum
     return false;
   if (!reg_offset_addressing_ok_p (mode))
     return virtual_stack_registers_memory_p (x);
-  if (legitimate_constant_pool_address_p (x, mode, strict))
+  if (legitimate_constant_pool_address_p (x, mode, strict || lra_in_progress))
     return true;
   if (GET_CODE (XEXP (x, 1)) != CONST_INT)
     return false;
@@ -6170,19 +6174,31 @@ legitimate_lo_sum_address_p (enum machin
 
   if (TARGET_ELF || TARGET_MACHO)
     {
+      bool large_toc_ok;
+
       if (DEFAULT_ABI != ABI_AIX && DEFAULT_ABI != ABI_DARWIN && flag_pic)
         return false;
-      if (TARGET_TOC)
+      /* LRA don't use LEGITIMIZE_RELOAD_ADDRESS as it usually calls
+         push_reload from reload pass code.  LEGITIMIZE_RELOAD_ADDRESS
+         recognizes some LO_SUM addresses as valid although this
+         function says opposite.  In most cases, LRA through different
+         transformations can generate correct code for address reloads.
+         It can not manage only some LO_SUM cases.  So we need to add
+         code analogous to one in rs6000_legitimize_reload_address for
+         LOW_SUM here saying that some addresses are still valid.  */
+      large_toc_ok = (lra_in_progress && TARGET_CMODEL != CMODEL_SMALL
+                      && small_toc_ref (x, VOIDmode));
+      if (TARGET_TOC && ! large_toc_ok)
         return false;
       if (GET_MODE_NUNITS (mode) != 1)
         return false;
-      if (GET_MODE_SIZE (mode) > UNITS_PER_WORD
+      if (! lra_in_progress && GET_MODE_SIZE (mode) > UNITS_PER_WORD
           && !(/* ??? Assume floating point reg based on mode?  */
                TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_DOUBLE_FLOAT
                && (mode == DFmode || mode == DDmode)))
         return false;
 
-      return CONSTANT_P (x);
+      return CONSTANT_P (x) || large_toc_ok;
     }
 
   return false;
@@ -7180,7 +7196,8 @@ rs6000_legitimate_address_p (enum machin
   if (reg_offset_p && legitimate_small_data_p (mode, x))
     return 1;
   if (reg_offset_p
-      && legitimate_constant_pool_address_p (x, mode, reg_ok_strict))
+      && legitimate_constant_pool_address_p (x, mode,
+                                             reg_ok_strict || lra_in_progress))
     return 1;
   /* For TImode, if we have load/store quad, only allow register indirect
      addresses.  This will allow the values to go in either GPRs or VSX
@@ -7479,6 +7496,7 @@ rs6000_conditional_register_usage (void)
         fixed_regs[i] = call_used_regs[i] = call_really_used_regs[i] = 1;
     }
 }
+
 
 /* Try to output insns to set TARGET equal to the constant C if it can be
    done in less than N insns.  Do all computations in MODE.
@@ -7783,6 +7801,68 @@ rs6000_emit_move (rtx dest, rtx source,
     cfun->machine->sdmode_stack_slot =
       eliminate_regs (cfun->machine->sdmode_stack_slot, VOIDmode, NULL_RTX);
 
+
+  if (lra_in_progress
+      && mode == SDmode
+      && REG_P (operands[0]) && REGNO (operands[0]) >= FIRST_PSEUDO_REGISTER
+      && reg_preferred_class (REGNO (operands[0])) == NO_REGS
+      && (REG_P (operands[1])
+          || (GET_CODE (operands[1]) == SUBREG
+              && REG_P (SUBREG_REG (operands[1])))))
+    {
+      int regno = REGNO (GET_CODE (operands[1]) == SUBREG
+                         ? SUBREG_REG (operands[1]) : operands[1]);
+      enum reg_class cl;
+
+      if (regno >= FIRST_PSEUDO_REGISTER)
+        {
+          cl = reg_preferred_class (regno);
+          gcc_assert (cl != NO_REGS);
+          regno = ira_class_hard_regs[cl][0];
+        }
+      if (FP_REGNO_P (regno))
+        {
+          if (GET_MODE (operands[0]) != DDmode)
+            operands[0] = gen_rtx_SUBREG (DDmode, operands[0], 0);
+          emit_insn (gen_movsd_store (operands[0], operands[1]));
+        }
+      else if (INT_REGNO_P (regno))
+        emit_insn (gen_movsd_hardfloat (operands[0], operands[1]));
+      else
+        gcc_unreachable();
+      return;
+    }
+  if (lra_in_progress
+      && mode == SDmode
+      && (REG_P (operands[0])
+          || (GET_CODE (operands[0]) == SUBREG
+              && REG_P (SUBREG_REG (operands[0]))))
+      && REG_P (operands[1]) && REGNO (operands[1]) >= FIRST_PSEUDO_REGISTER
+      && reg_preferred_class (REGNO (operands[1])) == NO_REGS)
+    {
+      int regno = REGNO (GET_CODE (operands[0]) == SUBREG
+                         ? SUBREG_REG (operands[0]) : operands[0]);
+      enum reg_class cl;
+
+      if (regno >= FIRST_PSEUDO_REGISTER)
+        {
+          cl = reg_preferred_class (regno);
+          gcc_assert (cl != NO_REGS);
+          regno = ira_class_hard_regs[cl][0];
+        }
+      if (FP_REGNO_P (regno))
+        {
+          if (GET_MODE (operands[1]) != DDmode)
+            operands[1] = gen_rtx_SUBREG (DDmode, operands[1], 0);
+          emit_insn (gen_movsd_load (operands[0], operands[1]));
+        }
+      else if (INT_REGNO_P (regno))
+        emit_insn (gen_movsd_hardfloat (operands[0], operands[1]));
+      else
+        gcc_unreachable();
+      return;
+    }
+
   if (reload_in_progress
       && mode == SDmode
       && cfun->machine->sdmode_stack_slot != NULL_RTX
@@ -14630,6 +14710,17 @@ rs6000_secondary_memory_needed_rtx (enum
   return ret;
 }
 
+/* Return the mode to be used for memory when a secondary memory
+   location is needed.  For SDmode values we need to use DDmode, in
+   all other cases we can use the same mode.  */
+enum machine_mode
+rs6000_secondary_memory_needed_mode (enum machine_mode mode)
+{
+  if (mode == SDmode)
+    return DDmode;
+  return mode;
+}
+
 static tree
 rs6000_check_sdmode (tree *tp, int *walk_subtrees, void *data ATTRIBUTE_UNUSED)
 {
@@ -15523,6 +15614,10 @@ rs6000_alloc_sdmode_stack_slot (void)
   gimple_stmt_iterator gsi;
 
   gcc_assert (cfun->machine->sdmode_stack_slot == NULL_RTX);
+  /* We use a different approach for dealing with the secondary
+     memory in LRA.  */
+  if (ira_use_lra_p)
+    return;
 
   if (TARGET_NO_SDMODE_STACK)
     return;
@@ -15744,7 +15839,7 @@ rs6000_secondary_reload_class (enum reg_
   /* Constants, memory, and FP registers can go into FP registers.  */
   if ((regno == -1 || FP_REGNO_P (regno))
       && (rclass == FLOAT_REGS || rclass == NON_SPECIAL_REGS))
-    return (mode != SDmode) ? NO_REGS : GENERAL_REGS;
+    return (mode != SDmode || lra_in_progress) ? NO_REGS : GENERAL_REGS;
 
   /* Memory, and FP/altivec registers can go into fp/altivec registers under
      VSX.  However, for scalar variables, use the traditional floating point
@@ -28936,6 +29031,13 @@ rs6000_libcall_value (enum machine_mode
 }
 
 
+/* Return true if we use LRA instead of reload pass.  */
+static bool
+rs6000_lra_p (void)
+{
+  return rs6000_lra_flag;
+}
+
 /* Given FROM and TO register numbers, say whether this elimination is allowed.
    Frame pointer elimination is automatically handled.
 
Index: config/rs6000/rs6000.h
===================================================================
--- config/rs6000/rs6000.h (revision 203164)
+++ config/rs6000/rs6000.h (working copy)
@@ -1491,6 +1491,13 @@ extern enum reg_class rs6000_constraints
 #define SECONDARY_MEMORY_NEEDED_RTX(MODE) \
   rs6000_secondary_memory_needed_rtx (MODE)
 
+/* Specify the mode to be used for memory when a secondary memory
+   location is needed.  For cpus that cannot load/store SDmode values
+   from the 64-bit FP registers without using a full 64-bit
+   load/store, we need a wider mode.  */
+#define SECONDARY_MEMORY_NEEDED_MODE(MODE) \
+  rs6000_secondary_memory_needed_mode (MODE)
+
 /* Return the maximum number of consecutive registers
    needed to represent mode MODE in a register of class CLASS.
 
Index: config/rs6000/rs6000.md
===================================================================
--- config/rs6000/rs6000.md (revision 203164)
+++ config/rs6000/rs6000.md (working copy)
@@ -2391,7 +2391,7 @@
 
 ;; Non-power7/cell, fall back to use lwbrx/stwbrx
 (define_insn "*bswapdi2_64bit"
-  [(set (match_operand:DI 0 "reg_or_mem_operand" "=&r,Z,??&r")
+  [(set (match_operand:DI 0 "reg_or_mem_operand" "=&r,Z,&r")
         (bswap:DI (match_operand:DI 1 "reg_or_mem_operand" "Z,r,r")))
    (clobber (match_scratch:DI 2 "=&b,&b,&r"))
    (clobber (match_scratch:DI 3 "=&r,&r,&r"))
Index: config/rs6000/rs6000.opt
===================================================================
--- config/rs6000/rs6000.opt (revision 203164)
+++ config/rs6000/rs6000.opt (working copy)
@@ -446,6 +446,10 @@ mlong-double-
 Target RejectNegative Joined UInteger Var(rs6000_long_double_type_size) Save
 -mlong-double-<n> Specify size of long double (64 or 128 bits)
 
+mlra
+Target Report Var(rs6000_lra_flag) Init(1) Save
+Use LRA instead of reload
+
 msched-costly-dep=
 Target RejectNegative Joined Var(rs6000_sched_costly_dep_str)
 Determine which dependences between insns are considered costly
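
For readers who want to see the SDmode handling from the patch in isolation, the following standalone C sketch mirrors two of its small pieces: the mode returned for secondary memory, and the changed return expression in the FP-register case of rs6000_secondary_reload_class. It is only a model under stated assumptions: the `mode` and `reg_class` enums, `mode_size`, and the function names are simplified stand-ins invented for this sketch, not GCC's real types, and the byte sizes assume SDmode is a 4-byte and DDmode an 8-byte decimal float mode.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for a few GCC machine modes (sketch only).  */
enum mode { SImode, DImode, SFmode, DFmode, SDmode, DDmode };

/* Simplified stand-ins for register classes (sketch only).  */
enum reg_class { NO_REGS, GENERAL_REGS, FLOAT_REGS };

/* Size in bytes of each mode in this model.  */
static int
mode_size (enum mode m)
{
  switch (m)
    {
    case SImode: case SFmode: case SDmode:
      return 4;
    case DImode: case DFmode: case DDmode:
      return 8;
    }
  assert (0 && "unknown mode");
  return 0;
}

/* Mirrors rs6000_secondary_memory_needed_mode from the patch: an SDmode
   value gets a DDmode (full 64-bit) secondary memory slot, every other
   mode keeps its own mode.  */
static enum mode
secondary_memory_needed_mode (enum mode m)
{
  return m == SDmode ? DDmode : m;
}

/* Mirrors the changed return expression for the FP-register case:
   under reload an SDmode value needs a GENERAL_REGS intermediate,
   while under LRA no secondary reload class is requested.  */
static enum reg_class
fp_secondary_reload_class (enum mode m, bool lra_in_progress)
{
  return (m != SDmode || lra_in_progress) ? NO_REGS : GENERAL_REGS;
}
```

In this model `secondary_memory_needed_mode (SDmode)` yields a mode with an 8-byte size, which is the "wider mode" the rs6000.h comment refers to, and the reload-class function only asks for GENERAL_REGS when the value is SDmode and LRA is not running.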