Message ID: 1572025151-22783-2-git-send-email-craig.blackmore@embecosm.com
State: New
Series: RISC-V: Allow more load/stores to be compressed
On 10/25/19 11:39 AM, Craig Blackmore wrote:
> This patch aims to allow more load/store instructions to be compressed by
> replacing a load/store of 'base register + large offset' with a new load/store
> of 'new base + small offset'. If the new base gets stored in a compressed
> register, then the new load/store can be compressed. Since there is an overhead
> in creating the new base, this change is only attempted when 'base register' is
> referenced in at least 4 load/stores in a basic block.
>
> The optimization is implemented in a new RISC-V specific pass called
> shorten_memrefs which is enabled for RVC targets. It has been developed for the
> 32-bit lw/sw instructions but could also be extended to 64-bit ld/sd in future.
>
> Tested on bare metal rv32i, rv32iac, rv32im, rv32imac, rv32imafc, rv64imac,
> rv64imafdc via QEMU. No regressions.
>
> gcc/ChangeLog:
>
> 	* config.gcc: Add riscv-shorten-memrefs.o to extra_objs for riscv.
> 	* config/riscv/riscv-passes.def: New file.
> 	* config/riscv/riscv-protos.h (make_pass_shorten_memrefs): Declare.
> 	* config/riscv/riscv-shorten-memrefs.c: New file.
> 	* config/riscv/riscv.c (tree-pass.h): New include.
> 	(riscv_compressed_reg_p): New function.
> 	(riscv_compressed_lw_offset_p): Likewise.
> 	(riscv_compressed_lw_address_p): Likewise.
> 	(riscv_shorten_lw_offset): Likewise.
> 	(riscv_legitimize_address): Attempt to convert base + large_offset
> 	to compressible new_base + small_offset.
> 	(riscv_address_cost): Make anticipated compressed load/stores
> 	cheaper for code size than uncompressed load/stores.
> 	(riscv_register_priority): Move compressed register check to
> 	riscv_compressed_reg_p.
> 	* config/riscv/riscv.h (RISCV_MAX_COMPRESSED_LW_OFFSET): Define.
> 	* config/riscv/riscv.opt (mshorten-memrefs): New option.
> 	* config/riscv/t-riscv (riscv-shorten-memrefs.o): New rule.
> 	(PASSES_EXTRA): Add riscv-passes.def.
> 	* doc/invoke.texi: Document -mshorten-memrefs.

This has traditionally been done via the legitimize_address hook.
Is there some reason that hook is insufficient for this case?

The hook, IIRC, is called out of explow.c.

Jeff
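The arithmetic behind the rewrite Craig describes can be sketched in plain C. This is an illustration of the idea only, not the patch's code; the constant 124 is the largest offset c.lw/c.sw can encode (a zero-extended 5-bit field scaled by 4), which the patch names RISCV_MAX_COMPRESSED_LW_OFFSET.

```c
#include <assert.h>
#include <stdint.h>

/* Largest offset encodable by c.lw/c.sw: uimm[6:2], i.e. 0..124 in
   steps of 4 (the patch's RISCV_MAX_COMPRESSED_LW_OFFSET).  */
#define MAX_CLW_OFFSET 124

/* Does OFF fit the compressed load/store-word encoding?  */
static int compressed_lw_offset_p (int32_t off)
{
  return off >= 0 && off <= MAX_CLW_OFFSET && (off & 3) == 0;
}

/* Split OFF into HIGH + LOW so that LOW is compressible.  Word-aligned
   offsets that share the same HIGH can then share one materialized
   base register holding base + HIGH.  */
static void split_offset (int32_t off, int32_t *high, int32_t *low)
{
  *low = off & MAX_CLW_OFFSET;  /* keep bits 2..6 */
  *high = off - *low;
}
```

For the array[200..203] accesses used as examples later in the thread (byte offsets 800..812), every offset splits into a shared high part of 768 plus low parts 32..44, all of which fit c.lw.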
On Sat, 2019-10-26 at 12:21 -0600, Jeff Law wrote:
> On 10/25/19 11:39 AM, Craig Blackmore wrote:
> > This patch aims to allow more load/store instructions to be
> > compressed by replacing a load/store of 'base register + large
> > offset' with a new load/store of 'new base + small offset'.
> > [...]
>
> This has traditionally been done via the legitimize_address hook.
> Is there some reason that hook is insufficient for this case?
>
> The hook, IIRC, is called out of explow.c.

This sounds like some of my addressing mode selection (AMS) attempts on SH. I haven't looked at the patch (sorry), but I'm sure the problem is pretty much the same.

On SH, legitimize_address is used to do ... "something" ... to the address in order to make the displacement fit. The issue is that legitimize_address doesn't get any context, so it can't even try to find a locally optimal base address or something like that.

Cheers,
Oleg
I don't know enough to say whether the legitimize_address hook is sufficient for the intended purpose, but I am sure that RISC-V's concerns are not unique: other GCC targets have to cope with offset-size constraints that are coupled to register-allocation constraints.

On Sat, Oct 26, 2019 at 11:21 AM Jeff Law <law@redhat.com> wrote:
>
> [...]
>
> This has traditionally been done via the legitimize_address hook.
> Is there some reason that hook is insufficient for this case?
>
> The hook, IIRC, is called out of explow.c.
>
> Jeff
On 10/26/19 1:10 PM, Oleg Endo wrote:
> On Sat, 2019-10-26 at 12:21 -0600, Jeff Law wrote:
>> [...]
>> This has traditionally been done via the legitimize_address hook.
>> Is there some reason that hook is insufficient for this case?
>
> This sounds like some of my addressing mode selection (AMS) attempts on
> SH. Haven't looked at the patch (sorry), but I'm sure the problem is
> pretty much the same.
>
> On SH legitimize_address is used to do ... "something" ... to the
> address in order to make the displacement fit. The issue is,
> legitimize_address doesn't get any context so it can't even try to find
> a local optimal base address or something like that.

True, it's not given any context. I think most ports try to form the address in such a way that the 'base + offset' is likely to be a common subexpression.

There was a similar hook for reload which I didn't mention as I wasn't sure it was used in the LRA world.

SH is likely more complex than most. Joern tried really hard to get good code on the SH and as a result the implementation is sometimes tough to follow.

Jeff
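A source-level illustration of that common-subexpression point (an assumed example, not code from the patch): hoisting the large part of the offset into a pointer turns it into a shared computation, leaving small offsets that fit the compressed encoding.

```c
/* Four accesses to array[200..203].  Written this way, the single
   "array + 200" (byte offset 800) is a common subexpression, and the
   remaining byte offsets 0, 4, 8 and 12 all fit c.lw's 5-bit scaled
   field.  */
int sum4 (int *array)
{
  int *p = array + 200;               /* one shared address computation */
  return p[0] + p[1] + p[2] + p[3];   /* four compressible loads */
}
```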
On 10/26/19 1:33 PM, Andrew Waterman wrote:
> I don't know enough to say whether the legitimize_address hook is
> sufficient for the intended purpose, but I am sure that RISC-V's
> concerns are not unique: other GCC targets have to cope with
> offset-size constraints that are coupled to register-allocation
> constraints.

Yup. I think every RISC port in the 90s faced this problem. I always wished for a generic mechanism for ports to handle this problem.

Regardless, it's probably worth investigating.

Jeff
On Sat, 2019-10-26 at 14:04 -0600, Jeff Law wrote:
> On 10/26/19 1:33 PM, Andrew Waterman wrote:
> > I don't know enough to say whether the legitimize_address hook is
> > sufficient for the intended purpose, but I am sure that RISC-V's
> > concerns are not unique: other GCC targets have to cope with
> > offset-size constraints that are coupled to register-allocation
> > constraints.
>
> Yup. I think every RISC port in the 90s faced this problem. I always
> wished for a generic mechanism for ports to handle this problem.
>
> Regardless, it's probably worth investigating.

What we tried to do with the address mode selection (AMS) optimization some time ago was the following:

- Extract memory accesses from the insn stream and put them in "access sequences". Also analyze the address expressions and try to find effective base addresses by tracing back address calculations.

- For each memory access, get a set of address mode alternatives and the corresponding costs from the backend. The full context of each access is provided, so the backend can detect things like "in this access sequence, FP loads dominate" and use this information to tune the alternative costs.

- Given the alternatives and costs for each memory access, the pass would then try to minimize the costs of the whole memory access sequence, taking the costs of address modification insns into account.

I think this is quite generic, but of course also complex. The optimization problem itself is hard. There was some research done by others using CPLEX or PBQP solvers. To keep things simple we used a backtracking algorithm and handled only a limited set of scenarios. For example, I think we could not get loop constructs to work nicely to improve post-inc address mode utilization.

The SH ISA has very similar properties to ARM Thumb and RVC, and perhaps others. The advantages would not be limited to RISC only; even CISC ISAs like M68K, RX, ... can benefit from it, as the "proper optimization" can reduce the instruction sizes by shortening the addresses in the instruction stream.

If anyone is interested, here is the AMS optimization pass class:
https://github.com/erikvarga/gcc/blob/master/gcc/ams.h
https://github.com/erikvarga/gcc/blob/master/gcc/ams.cc

It uses a different style of callback into the backend code: not GCC's "hooks" but a delegate pattern. The SH backend's delegate implementation is here:
https://github.com/erikvarga/gcc/blob/master/gcc/config/sh/sh.c#L11897

We were getting some improvements in the generated code, but it started putting pressure on register-allocation related issues in the SH backend (the R0 problem), so we could not do more thorough testing. Maybe somebody can get some ideas out of it.

Cheers,
Oleg
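The block-level context Oleg says legitimize_address lacks is exactly what the new pass gathers before rewriting: a per-basic-block count of lw/sw references per base register. A minimal sketch of that bookkeeping follows; the data structures are invented for illustration, and only the threshold of 4 comes from the patch description.

```c
#include <assert.h>
#include <stdint.h>

#define MIN_REFS 4  /* rewrite only when a base has at least 4 uses */

struct mem_access
{
  int base_reg;     /* register number of the base */
  int32_t offset;   /* byte offset from the base   */
};

/* Count how many accesses in one basic block use BASE.  */
static int
count_base_refs (const struct mem_access *acc, int n, int base)
{
  int refs = 0;
  for (int i = 0; i < n; i++)
    if (acc[i].base_reg == base)
      refs++;
  return refs;
}

/* Is it worth materializing a new base?  Only if enough accesses
   share BASE to amortize the extra addi instruction.  */
static int
worth_new_base (const struct mem_access *acc, int n, int base)
{
  return count_base_refs (acc, n, base) >= MIN_REFS;
}
```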
On Fri, Oct 25, 2019 at 10:40 AM Craig Blackmore <craig.blackmore@embecosm.com> wrote: > This patch aims to allow more load/store instructions to be compressed by > replacing a load/store of 'base register + large offset' with a new load/store > of 'new base + small offset'. If the new base gets stored in a compressed > register, then the new load/store can be compressed. Since there is an overhead > in creating the new base, this change is only attempted when 'base register' is > referenced in at least 4 load/stores in a basic block. > > The optimization is implemented in a new RISC-V specific pass called > shorten_memrefs which is enabled for RVC targets. It has been developed for the > 32-bit lw/sw instructions but could also be extended to 64-bit ld/sd in future. The fact that this needs 4 load/stores in a block with the same base address means it won't trigger very often. But it does seem to work. I see about a 0.05% code size reduction for a rv32gc newlib nano libc.a. There might be other codes that benefit more. I'm concerned about the number of RISC-V specific optimization passes people are writing. I've seen at least 3 so far. This one is at least small and simple enough to understand that it doesn't look like it will cause maintenance problems. The config.gcc change conflicts with the save-restore optimization pass that Andrew Burgess added, but that is a trivial fix. The code only works for 32-bit load/stores. rv64 has compressed 64-bit load/stores, and the F and D extensions have float and double compressed loads and stores. The patch would be more useful if it handled all of these. The patch doesn't check the other operand, it only looks at the memory operand. This results in some cases where the code rewrites instructions that can never be compressed. 
For instance, given

void store1z (int *array)
{
  array[200] = 0;
  array[201] = 0;
  array[202] = 0;
  array[203] = 0;
}

the patch increases code size by 4 bytes because it rewrites the stores, but we don't have a compressed store of 0, and x0 is not a compressed reg, so there is no point in rewriting the stores. I can also create examples that show a size increase with loads, but it is a little trickier.

extern int sub1 (int, int, int, int, int, int, int);

int load1a (int a0, int a1, int a2, int a3, int a4, int *array)
{
  int a = 0;
  a += array[200];
  a += array[201];
  a += array[202];
  a += array[203];
  return sub1 (a0, a1, a2, a3, a4, 0, a);
}

The register allocator will pass a0-a4 directly through from args to the call, leaving only a5-a7 for the load base address and dest, and only one of those regs is a compressed reg, but we need two, so these loads can't be compressed. The code still gets rewritten, resulting in a size increase for the extra add. Not sure if you can do anything about that though, since you are doing this before register allocation.

It isn't clear what the change to riscv_address_cost is for. That should be commented.

I'd suggest parens in the riscv_compressed_lw_offset_p return statement; that makes it easier to align the code correctly (in emacs at least).

The RISCV_MAX_COMPRESSED_LW_OFFSET define should be moved down to the "ISA constants" section of riscv.h, and maybe renamed similar to the other constants.

There is a REG_P check in get_si_mem_reg. That should probably handle SUBREGs too.

A testcase to verify the patch would be nice. I have one I wrote for testing that shows some tests that work, some tests that don't work, and some that work but maybe shouldn't.

I vaguely remember Michael Meissner talking about doing something similar for PPC at a GNU Cauldron maybe 2 years ago. But I don't know if he tried, or how far he got if he did try. It might be useful to ask him about that work.

Otherwise this looks OK to me.

Jim
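The register constraint behind Jim's store1z/load1a observations comes down to one predicate. RVC instructions encode registers in 3-bit fields, which can only name x8..x15 (s0, s1, a0..a5); that numbering is architectural fact, while the function itself is an illustrative sketch rather than the patch's riscv_compressed_reg_p.

```c
#include <assert.h>

/* RVC's 3-bit register fields reach only x8..x15 (s0, s1, a0..a5).
   x0 (the register used when storing zero) and a6/a7 (x16/x17) fall
   outside this range, so a store of zero or a load into a6/a7 can
   never become c.sw/c.lw.  */
static int compressed_reg_p (int regno)
{
  return regno >= 8 && regno <= 15;
}
```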
FYI the testcase I'm using to test the patch. Some of the functions get smaller, some of them get bigger, and some don't change in size but should when compiled for an rv64 target. Jim
A test case for this patch has been written and pushed to the WD GitHub repository at the following link:
https://github.com/westerndigitalcorporation/riscv32-Code-density-test-bench/tree/master/common_load_store_base_address_offset

The test case itself has been written based on a real product scenario. We got a saving of about 10% in the code size of the test itself.

Best Regards,

Nidal Faour
Staff Engineer, R&D Engineering – Firmware & Toolchain, CTO Group
Western Digital®
Migdal Tefen 24959, P.O Box 3
Email: nidal.faour@wdc.com
Office: +972-4-9078756
Mobile: +972-50-8867756

>-----Original Message-----
>From: Jim Wilson <jimw@sifive.com>
>Sent: Thursday, 31 October 2019 1:57
>To: Craig Blackmore <craig.blackmore@embecosm.com>
>Cc: GCC Patches <gcc-patches@gcc.gnu.org>; Ofer Shinaar
><Ofer.Shinaar@wdc.com>; Nidal Faour <Nidal.Faour@wdc.com>; Kito Cheng
><kito.cheng@gmail.com>; Jeff Law <law@redhat.com>
>Subject: Re: [PATCH v2 1/2] RISC-V: Add shorten_memrefs pass
>
>[...]
Nevertheless, as Jim observed, it's a great burden on the RISC-V backend maintainers to support all these passes. Are you saying WD is willing to put its money where its mouth is and will provide active support for these passes?

On Thu, Oct 31, 2019 at 2:42 AM Nidal Faour <Nidal.Faour@wdc.com> wrote:
>
> A test case for this patch has been written and pushed to the WD GitHub repository at the following link:
> https://github.com/westerndigitalcorporation/riscv32-Code-density-test-bench/tree/master/common_load_store_base_address_offset
>
> The test case itself has been written based on a real product scenario.
> We got a saving of about 10% in the code size of the test itself.
>
> [...]
The test case present a real embedded case, yes it is not functional benchmark, it is a test case, but on our days, most of the MCU's for IoT devices acts as controllers and less as computing/algorithm units. This days, functional algorithms are done on HW and managed by SW with snippet as the giving test-case. This means that SW that accessing the memory on the SoC, for managing some HW feature, is commonly used. Therefor the benefit in this pass will serve everyone. We (WD) will obviously support it as we are part of this joint effort to drive better density. As a next phase, we can improve this, taking Jim comments among others. Also, as Jim said this, this is a simple case and should not cause maintenance problems. On 31/10/2019 12:18, Andrew Waterman wrote: > Nevertheless, as Jim observed, it's a great burden on the RISC-V > backend maintainers to support all these passes. Are you saying WD is > willing to put its money where its mouth is and will provide active > support for these passes? > > On Thu, Oct 31, 2019 at 2:42 AM Nidal Faour <Nidal.Faour@wdc.com> wrote: >> A tests case for this patch has been written and pushed to WD github repository at the following link: >> https://github.com/westerndigitalcorporation/riscv32-Code-density-test-bench/tree/master/common_load_store_base_address_offset >> >> the test case itself has been written based on a real product scenario. >> We got a saving of about 10% in code size of the test itself. 
>> >> Best Regards, >> >> Nidal Faour >> Staff Engineer, R&D Engineering – Firmware & Toolchain, CTO Group >> >> Western Digital® >> Migdal Tefen 24959, P.O Box 3 >> Email: nidal.faour@wdc.com >> Office: +972-4-9078756 >> Mobile: +972-50-8867756 >> >>> -----Original Message----- >>> From: Jim Wilson <jimw@sifive.com> >>> Sent: Thursday, 31 October 2019 1:57 >>> To: Craig Blackmore <craig.blackmore@embecosm.com> >>> Cc: GCC Patches <gcc-patches@gcc.gnu.org>; Ofer Shinaar >>> <Ofer.Shinaar@wdc.com>; Nidal Faour <Nidal.Faour@wdc.com>; Kito Cheng >>> <kito.cheng@gmail.com>; Jeff Law <law@redhat.com> >>> Subject: Re: [PATCH v2 1/2] RISC-V: Add shorten_memrefs pass >>> >>> CAUTION: This email originated from outside of Western Digital. Do not click >>> on links or open attachments unless you recognize the sender and know that >>> the content is safe. >>> >>> >>> On Fri, Oct 25, 2019 at 10:40 AM Craig Blackmore >>> <craig.blackmore@embecosm.com> wrote: >>>> This patch aims to allow more load/store instructions to be compressed >>>> by replacing a load/store of 'base register + large offset' with a new >>>> load/store of 'new base + small offset'. If the new base gets stored >>>> in a compressed register, then the new load/store can be compressed. >>>> Since there is an overhead in creating the new base, this change is >>>> only attempted when 'base register' is referenced in at least 4 load/stores in >>> a basic block. >>>> The optimization is implemented in a new RISC-V specific pass called >>>> shorten_memrefs which is enabled for RVC targets. It has been >>>> developed for the 32-bit lw/sw instructions but could also be extended to >>> 64-bit ld/sd in future. >>> >>> The fact that this needs 4 load/stores in a block with the same base address >>> means it won't trigger very often. But it does seem to work. >>> I see about a 0.05% code size reduction for a rv32gc newlib nano libc.a. There >>> might be other codes that benefit more. 
>>> >>> I'm concerned about the number of RISC-V specific optimization passes >>> people are writing. I've seen at least 3 so far. This one is at least small and >>> simple enough to understand that it doesn't look like it will cause maintenance >>> problems. >>> >>> The config.gcc change conflicts with the save-restore optimization pass that >>> Andrew Burgess added, but that is a trivial fix. >>> >>> The code only works for 32-bit load/stores. rv64 has compressed 64-bit >>> load/stores, and the F and D extensions have float and double compressed >>> loads and stores. The patch would be more useful if it handled all of these. >>> >>> The patch doesn't check the other operand, it only looks at the memory >>> operand. This results in some cases where the code rewrites instructions that >>> can never be compressed. For instance, given void store1z (int *array) { >>> array[200] = 0; >>> array[201] = 0; >>> array[202] = 0; >>> array[203] = 0; >>> } >>> the patch increases code size by 4 bytes because it rewrites the stores, but we >>> don't have compressed store 0, and x0 is not a compressed reg, so there is no >>> point in rewriting the stores. I can also create examples that show a size >>> increase with loads but it is a little trickier. >>> extern int sub1 (int, int, int, int, int, int, int); int load1a (int a0, int a1, int a2, int >>> a3, int a4, int *array) { >>> int a = 0; >>> a += array[200]; >>> a += array[201]; >>> a += array[202]; >>> a += array[203]; >>> return sub1 (a0, a1, a2, a3, a4, 0, a); } The register allocator will pass a0-a4 >>> directly through from args to the call, leaving only a5-a7 for the load base >>> address and dest, and only one of those regs is a compressed reg, but we >>> need two, so these loads can't be compressed. The code still gets rewritten, >>> resulting in a size increase for the extra add. Not sure if you can do anything >>> about that though, since you are doing this before register allocation. 
>>>
>>> It isn't clear what the change to riscv_address_cost is for.  That
>>> should be commented.
>>>
>>> I'd suggest parens in the riscv_compressed_lw_offset_p return statement,
>>> that makes it easier to align the code correctly (in emacs at least).
>>>
>>> The RISCV_MAX_COMPRESSED_LW_OFFSET should be moved down to the
>>> "ISA constants" section of riscv.h, and maybe renamed similar to the
>>> other constants.
>>>
>>> There is a REG_P check in get_si_mem_reg.  That should probably handle
>>> SUBREGs too.
>>>
>>> A testcase to verify the patch would be nice.  I have one I wrote for
>>> testing that shows some tests that work, some tests that don't work, and
>>> some that work but maybe shouldn't.
>>>
>>> I vaguely remember Michael Meissner talking about doing something
>>> similar for PPC in a GNU Cauldron maybe 2 years ago.  But I don't know
>>> if he tried, or how far he got if he did try.  It might be useful to ask
>>> him about that work.
>>>
>>> Otherwise this looks OK to me.
>>>
>>> Jim
Hi Jim,

Thank you for your review. I have posted an updated patch below which I
think addresses your comments.

On 30/10/2019 23:57, Jim Wilson wrote:
> On Fri, Oct 25, 2019 at 10:40 AM Craig Blackmore
> <craig.blackmore@embecosm.com> wrote:
>> This patch aims to allow more load/store instructions to be compressed by
>> replacing a load/store of 'base register + large offset' with a new
>> load/store of 'new base + small offset'. If the new base gets stored in a
>> compressed register, then the new load/store can be compressed. Since
>> there is an overhead in creating the new base, this change is only
>> attempted when 'base register' is referenced in at least 4 load/stores in
>> a basic block.
>>
>> The optimization is implemented in a new RISC-V specific pass called
>> shorten_memrefs which is enabled for RVC targets. It has been developed
>> for the 32-bit lw/sw instructions but could also be extended to 64-bit
>> ld/sd in future.
>
> The fact that this needs 4 load/stores in a block with the same base
> address means it won't trigger very often.  But it does seem to work.
> I see about a 0.05% code size reduction for a rv32gc newlib nano
> libc.a.  There might be other codes that benefit more.
>
> I'm concerned about the number of RISC-V specific optimization passes
> people are writing.  I've seen at least 3 so far.  This one is at least
> small and simple enough to understand that it doesn't look like it will
> cause maintenance problems.
>
> The config.gcc change conflicts with the save-restore optimization
> pass that Andrew Burgess added, but that is a trivial fix.

I have rebased onto latest trunk and fixed this.

> The code only works for 32-bit load/stores.  rv64 has compressed
> 64-bit load/stores, and the F and D extensions have float and double
> compressed loads and stores.  The patch would be more useful if it
> handled all of these.

We plan to add support for these in the future.
> The patch doesn't check the other operand, it only looks at the memory
> operand.  This results in some cases where the code rewrites
> instructions that can never be compressed.  For instance, given
>   void
>   store1z (int *array)
>   {
>     array[200] = 0;
>     array[201] = 0;
>     array[202] = 0;
>     array[203] = 0;
>   }
> the patch increases code size by 4 bytes because it rewrites the
> stores, but we don't have compressed store 0, and x0 is not a
> compressed reg, so there is no point in rewriting the stores.

I have now excluded store zero from being counted and rewritten.

> I can also create examples that show a size increase with loads but it
> is a little trickier.
>   extern int sub1 (int, int, int, int, int, int, int);
>   int
>   load1a (int a0, int a1, int a2, int a3, int a4, int *array)
>   {
>     int a = 0;
>     a += array[200];
>     a += array[201];
>     a += array[202];
>     a += array[203];
>     return sub1 (a0, a1, a2, a3, a4, 0, a);
>   }
> The register allocator will pass a0-a4 directly through from args to
> the call, leaving only a5-a7 for the load base address and dest, and
> only one of those regs is a compressed reg, but we need two, so these
> loads can't be compressed.  The code still gets rewritten, resulting
> in a size increase for the extra add.  Not sure if you can do anything
> about that though, since you are doing this before register
> allocation.

I haven't got a way of avoiding this at present, so I have added this case
as an xfail.

> It isn't clear what the change to riscv_address_cost is for.  That
> should be commented.

This is now commented.

> I'd suggest parens in the riscv_compressed_lw_offset_p return
> statement, that makes it easier to align the code correctly (in emacs
> at least).

Parens added.

> The RISCV_MAX_COMPRESSED_LW_OFFSET should be moved down to the "ISA
> constants" section of riscv.h, and maybe renamed similar to the other
> constants.

Done.

> There is a REG_P check in get_si_mem_reg.  That should probably handle
> SUBREGs too.
>

I couldn't produce a case where the address is 'subreg + offset', so I have
not added handling for SUBREGs for now.

> A testcase to verify the patch would be nice.  I have one I wrote for
> testing that shows some tests that work, some tests that don't work,
> and some that work but maybe shouldn't.

Thank you for the test file. I have added some testcases to the patch based
on this.

> I vaguely remember Michael Meissner talking about doing something
> similar for PPC in a GNU Cauldron maybe 2 years ago.  But I don't know
> if he tried, or how far he got if he did try.  It might be useful to
> ask him about that work.
>
> Otherwise this looks OK to me.
>
> Jim

Thanks,
Craig

---

gcc/ChangeLog:

	* config.gcc: Add riscv-shorten-memrefs.o to extra_objs for riscv.
	* config/riscv/riscv-passes.def: New file.
	* config/riscv/riscv-protos.h (make_pass_shorten_memrefs): Declare.
	* config/riscv/riscv-shorten-memrefs.c: New file.
	* config/riscv/riscv.c (tree-pass.h): New include.
	(riscv_compressed_reg_p): New function.
	(riscv_compressed_lw_offset_p): Likewise.
	(riscv_compressed_lw_address_p): Likewise.
	(riscv_shorten_lw_offset): Likewise.
	(riscv_legitimize_address): Attempt to convert base + large_offset
	to compressible new_base + small_offset.
	(riscv_address_cost): Make anticipated compressed load/stores
	cheaper for code size than uncompressed load/stores.
	(riscv_register_priority): Move compressed register check to
	riscv_compressed_reg_p.
	(riscv_new_address_profitable_p): New function.
	(TARGET_NEW_ADDRESS_PROFITABLE_P): Define.
	* config/riscv/riscv.h (C_S_BITS): Define.
	(CSW_MAX_OFFSET): Define.
	* config/riscv/riscv.opt (mshorten-memrefs): New option.
	* config/riscv/t-riscv (riscv-shorten-memrefs.o): New rule.
	(PASSES_EXTRA): Add riscv-passes.def.
	* doc/invoke.texi: Document -mshorten-memrefs.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/shorten-memrefs-1.c: New test.
	* gcc.target/riscv/shorten-memrefs-2.c: New test.
	* gcc.target/riscv/shorten-memrefs-3.c: New test.
* gcc.target/riscv/shorten-memrefs-4.c: New test. * gcc.target/riscv/shorten-memrefs-5.c: New test. * gcc.target/riscv/shorten-memrefs-6.c: New test. * gcc.target/riscv/shorten-memrefs-7.c: New test. --- gcc/config.gcc | 2 +- gcc/config/riscv/riscv-passes.def | 20 ++ gcc/config/riscv/riscv-protos.h | 2 + gcc/config/riscv/riscv-shorten-memrefs.c | 200 ++++++++++++++++++ gcc/config/riscv/riscv.c | 88 +++++++- gcc/config/riscv/riscv.h | 5 + gcc/config/riscv/riscv.opt | 6 + gcc/config/riscv/t-riscv | 5 + gcc/doc/invoke.texi | 10 + .../gcc.target/riscv/shorten-memrefs-1.c | 26 +++ .../gcc.target/riscv/shorten-memrefs-2.c | 51 +++++ .../gcc.target/riscv/shorten-memrefs-3.c | 39 ++++ .../gcc.target/riscv/shorten-memrefs-4.c | 26 +++ .../gcc.target/riscv/shorten-memrefs-5.c | 53 +++++ .../gcc.target/riscv/shorten-memrefs-6.c | 39 ++++ .../gcc.target/riscv/shorten-memrefs-7.c | 46 ++++ 16 files changed, 612 insertions(+), 6 deletions(-) create mode 100644 gcc/config/riscv/riscv-passes.def create mode 100644 gcc/config/riscv/riscv-shorten-memrefs.c create mode 100644 gcc/testsuite/gcc.target/riscv/shorten-memrefs-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/shorten-memrefs-2.c create mode 100644 gcc/testsuite/gcc.target/riscv/shorten-memrefs-3.c create mode 100644 gcc/testsuite/gcc.target/riscv/shorten-memrefs-4.c create mode 100644 gcc/testsuite/gcc.target/riscv/shorten-memrefs-5.c create mode 100644 gcc/testsuite/gcc.target/riscv/shorten-memrefs-6.c create mode 100644 gcc/testsuite/gcc.target/riscv/shorten-memrefs-7.c diff --git a/gcc/config.gcc b/gcc/config.gcc index 5aa0130135f..29840752544 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -523,7 +523,7 @@ pru-*-*) ;; riscv*) cpu_type=riscv - extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o" + extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o riscv-shorten-memrefs.o" d_target_objs="riscv-d.o" ;; rs6000*-*-*) diff --git a/gcc/config/riscv/riscv-passes.def b/gcc/config/riscv/riscv-passes.def new file mode 
100644 index 00000000000..8a4ea0918db --- /dev/null +++ b/gcc/config/riscv/riscv-passes.def @@ -0,0 +1,20 @@ +/* Declaration of target-specific passes for RISC-V. + Copyright (C) 2019 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + <http://www.gnu.org/licenses/>. */ + +INSERT_PASS_AFTER (pass_rtl_store_motion, 1, pass_shorten_memrefs); diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h index 5092294803c..78008c28b75 100644 --- a/gcc/config/riscv/riscv-protos.h +++ b/gcc/config/riscv/riscv-protos.h @@ -89,4 +89,6 @@ extern void riscv_init_builtins (void); /* Routines implemented in riscv-common.c. */ extern std::string riscv_arch_str (); +rtl_opt_pass * make_pass_shorten_memrefs (gcc::context *ctxt); + #endif /* ! GCC_RISCV_PROTOS_H */ diff --git a/gcc/config/riscv/riscv-shorten-memrefs.c b/gcc/config/riscv/riscv-shorten-memrefs.c new file mode 100644 index 00000000000..3686005fe2e --- /dev/null +++ b/gcc/config/riscv/riscv-shorten-memrefs.c @@ -0,0 +1,200 @@ +/* Shorten memrefs pass for RISC-V. + Copyright (C) 2018-2019 Free Software Foundation, Inc. + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 3, or (at your option) +any later version. 
+ +GCC is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +GNU General Public License for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +<http://www.gnu.org/licenses/>. */ + +#define IN_TARGET_CODE 1 + +#include "config.h" +#include "system.h" +#include "coretypes.h" +#include "tm.h" +#include "rtl.h" +#include "backend.h" +#include "regs.h" +#include "target.h" +#include "memmodel.h" +#include "emit-rtl.h" +#include "df.h" +#include "predict.h" +#include "tree-pass.h" + +/* Try to make more use of compressed load and store instructions by replacing + a load/store at address BASE + LARGE_OFFSET with a new load/store at address + NEW BASE + SMALL OFFSET. If NEW BASE is stored in a compressed register, the + load/store can be compressed. Since creating NEW BASE incurs an overhead, + the change is only attempted when BASE is referenced by at least four + load/stores in the same basic block. 
*/ + +namespace { + +const pass_data pass_data_shorten_memrefs = +{ + RTL_PASS, /* type */ + "shorten_memrefs", /* name */ + OPTGROUP_NONE, /* optinfo_flags */ + TV_NONE, /* tv_id */ + 0, /* properties_required */ + 0, /* properties_provided */ + 0, /* properties_destroyed */ + 0, /* todo_flags_start */ + 0, /* todo_flags_finish */ +}; + +class pass_shorten_memrefs : public rtl_opt_pass +{ +public: + pass_shorten_memrefs (gcc::context *ctxt) + : rtl_opt_pass (pass_data_shorten_memrefs, ctxt) + {} + + /* opt_pass methods: */ + virtual bool gate (function *) + { + return TARGET_RVC && riscv_mshorten_memrefs && optimize > 0; + } + virtual unsigned int execute (function *); + +private: + typedef int_hash <HOST_WIDE_INT, 0> regno_hash; + typedef hash_map <regno_hash, int> regno_map; + + regno_map * analyze (basic_block bb); + void transform (regno_map *m, basic_block bb); + bool get_si_mem_base_reg (rtx mem, rtx *addr); +}; // class pass_shorten_memrefs + +bool +pass_shorten_memrefs::get_si_mem_base_reg (rtx mem, rtx *addr) +{ + if (!MEM_P (mem) || GET_MODE (mem) != SImode) + return false; + *addr = XEXP (mem, 0); + return GET_CODE (*addr) == PLUS && REG_P (XEXP (*addr, 0)); +} + +/* Count how many times each regno is referenced as base address for a memory + access. */ + +pass_shorten_memrefs::regno_map * +pass_shorten_memrefs::analyze (basic_block bb) +{ + regno_map *m = hash_map<regno_hash, int>::create_ggc (10); + rtx_insn *insn; + + regstat_init_n_sets_and_refs (); + + FOR_BB_INSNS (bb, insn) + { + if (!NONJUMP_INSN_P (insn)) + continue; + rtx pat = PATTERN (insn); + if (GET_CODE (pat) != SET) + continue; + /* Analyze stores first then loads. */ + for (int i = 0; i < 2; i++) + { + rtx mem = XEXP (pat, i); + rtx addr; + if (get_si_mem_base_reg (mem, &addr)) + { + HOST_WIDE_INT regno = REGNO (XEXP (addr, 0)); + /* Do not count store zero as these cannot be compressed. 
*/ + if (i == 0) + { + if (XEXP (pat, 1) == CONST0_RTX (GET_MODE (XEXP (pat, 1)))) + continue; + } + if (REG_N_REFS (regno) < 4) + continue; + m->get_or_insert (regno)++; + } + } + } + regstat_free_n_sets_and_refs (); + + return m; +} + +/* Convert BASE + LARGE_OFFSET to NEW_BASE + SMALL_OFFSET for each load/store + with a base reg referenced at least 4 times. */ + +void +pass_shorten_memrefs::transform (regno_map *m, basic_block bb) +{ + rtx_insn *insn; + FOR_BB_INSNS (bb, insn) + { + if (!NONJUMP_INSN_P (insn)) + continue; + rtx pat = PATTERN (insn); + if (GET_CODE (pat) != SET) + continue; + start_sequence (); + /* Transform stores first then loads. */ + for (int i = 0; i < 2; i++) + { + rtx mem = XEXP (pat, i); + rtx addr; + if (get_si_mem_base_reg (mem, &addr)) + { + HOST_WIDE_INT regno = REGNO (XEXP (addr, 0)); + /* Do not transform store zero as these cannot be compressed. */ + if (i == 0) + { + if (XEXP (pat, 1) == CONST0_RTX (GET_MODE (XEXP (pat, 1)))) + continue; + } + if (m->get_or_insert (regno) > 3) + { + addr + = targetm.legitimize_address (addr, addr, GET_MODE (mem)); + XEXP (pat, i) = replace_equiv_address (mem, addr); + df_insn_rescan (insn); + } + } + } + rtx_insn *seq = get_insns (); + end_sequence (); + emit_insn_before (seq, insn); + } +} + +unsigned int +pass_shorten_memrefs::execute (function *fn) +{ + basic_block bb; + + FOR_ALL_BB_FN (bb, fn) + { + regno_map *m; + if (optimize_bb_for_speed_p (bb)) + continue; + m = analyze (bb); + transform (m, bb); + } + + return 0; +} + +} // anon namespace + +rtl_opt_pass * +make_pass_shorten_memrefs (gcc::context *ctxt) +{ + return new pass_shorten_memrefs (ctxt); +} diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c index 9aa4d266e6b..1d6d2f89f7d 100644 --- a/gcc/config/riscv/riscv.c +++ b/gcc/config/riscv/riscv.c @@ -55,6 +55,7 @@ along with GCC; see the file COPYING3. 
If not see #include "diagnostic.h" #include "builtins.h" #include "predict.h" +#include "tree-pass.h" /* True if X is an UNSPEC wrapper around a SYMBOL_REF or LABEL_REF. */ #define UNSPEC_ADDRESS_P(X) \ @@ -848,6 +849,52 @@ riscv_legitimate_address_p (machine_mode mode, rtx x, bool strict_p) return riscv_classify_address (&addr, x, mode, strict_p); } +/* Return true if hard reg REGNO can be used in compressed instructions. */ + +static bool +riscv_compressed_reg_p (int regno) +{ + /* x8-x15/f8-f15 are compressible registers. */ + return (TARGET_RVC && (IN_RANGE (regno, GP_REG_FIRST + 8, GP_REG_FIRST + 15) + || IN_RANGE (regno, FP_REG_FIRST + 8, FP_REG_FIRST + 15))); +} + +/* Return true if x is an unsigned 5-bit immediate scaled by 4. */ + +static bool +riscv_compressed_lw_offset_p (rtx x) +{ + return (CONST_INT_P (x) + && (INTVAL (x) & 3) == 0 + && IN_RANGE (INTVAL (x), 0, CSW_MAX_OFFSET)); +} + +/* Return true if load/store from/to address x can be compressed. */ + +static bool +riscv_compressed_lw_address_p (rtx x) +{ + struct riscv_address_info addr; + bool result = riscv_classify_address (&addr, x, GET_MODE (x), + reload_completed); + + /* Before reload, assuming all load/stores of valid addresses get compressed + gives better code size than checking if the address is reg + small_offset + early on. */ + if (result && !reload_completed) + return true; + + /* Return false if address is not compressed_reg + small_offset. */ + if (!result + || addr.type != ADDRESS_REG + || (!riscv_compressed_reg_p (REGNO (addr.reg)) + && addr.reg != stack_pointer_rtx) + || !riscv_compressed_lw_offset_p (addr.offset)) + return false; + + return result; +} + /* Return the number of instructions needed to load or store a value of mode MODE at address X. Return 0 if X isn't valid for MODE. 
Assume that multiword moves may need to be split into word moves @@ -1305,6 +1352,24 @@ riscv_force_address (rtx x, machine_mode mode) return x; } +/* Modify base + offset so that offset fits within a compressed load/store insn + and the excess is added to base. */ + +static rtx +riscv_shorten_lw_offset (rtx base, HOST_WIDE_INT offset) +{ + rtx addr, high; + /* Leave OFFSET as an unsigned 5-bit offset scaled by 4 and put the excess + into HIGH. */ + high = GEN_INT (offset & ~CSW_MAX_OFFSET); + offset &= CSW_MAX_OFFSET; + if (!SMALL_OPERAND (INTVAL (high))) + high = force_reg (Pmode, high); + base = force_reg (Pmode, gen_rtx_PLUS (Pmode, high, base)); + addr = plus_constant (Pmode, base, offset); + return addr; +} + /* This function is used to implement LEGITIMIZE_ADDRESS. If X can be legitimized in a way that the generic machinery might not expect, return a new address, otherwise return NULL. MODE is the mode of @@ -1323,7 +1388,7 @@ riscv_legitimize_address (rtx x, rtx oldx ATTRIBUTE_UNUSED, if (riscv_split_symbol (NULL, x, mode, &addr, FALSE)) return riscv_force_address (addr, mode); - /* Handle BASE + OFFSET using riscv_add_offset. */ + /* Handle BASE + OFFSET. */ if (GET_CODE (x) == PLUS && CONST_INT_P (XEXP (x, 1)) && INTVAL (XEXP (x, 1)) != 0) { @@ -1332,7 +1397,14 @@ riscv_legitimize_address (rtx x, rtx oldx ATTRIBUTE_UNUSED, if (!riscv_valid_base_register_p (base, mode, false)) base = copy_to_mode_reg (Pmode, base); - addr = riscv_add_offset (NULL, base, offset); + if (optimize_function_for_size_p (cfun) + && (strcmp (current_pass->name, "shorten_memrefs") == 0) + && mode == SImode) + /* Convert BASE + LARGE_OFFSET into NEW_BASE + SMALL_OFFSET to allow + possible compressed load/store. 
*/ + addr = riscv_shorten_lw_offset (base, offset); + else + addr = riscv_add_offset (NULL, base, offset); return riscv_force_address (addr, mode); } @@ -1822,6 +1894,11 @@ riscv_address_cost (rtx addr, machine_mode mode, addr_space_t as ATTRIBUTE_UNUSED, bool speed ATTRIBUTE_UNUSED) { + /* When optimizing for size, make uncompressible 32-bit addresses more + * expensive so that compressible 32-bit addresses are preferred. */ + if (!speed && riscv_mshorten_memrefs && mode == SImode + && !riscv_compressed_lw_address_p (addr)) + return riscv_address_insns (addr, mode, false) + 1; return riscv_address_insns (addr, mode, false); } @@ -4647,6 +4724,7 @@ riscv_option_override (void) error ("%<-mriscv-attribute%> RISC-V ELF attribute requires GNU as 2.32" " [%<-mriscv-attribute%>]"); #endif + } /* Implement TARGET_CONDITIONAL_REGISTER_USAGE. */ @@ -4686,9 +4764,9 @@ riscv_conditional_register_usage (void) static int riscv_register_priority (int regno) { - /* Favor x8-x15/f8-f15 to improve the odds of RVC instruction selection. */ - if (TARGET_RVC && (IN_RANGE (regno, GP_REG_FIRST + 8, GP_REG_FIRST + 15) - || IN_RANGE (regno, FP_REG_FIRST + 8, FP_REG_FIRST + 15))) + /* Favor compressed registers to improve the odds of RVC instruction + selection. */ + if (riscv_compressed_reg_p (regno)) return 1; return 0; diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h index b1e3403f8a7..a0d5354eae9 100644 --- a/gcc/config/riscv/riscv.h +++ b/gcc/config/riscv/riscv.h @@ -913,6 +913,7 @@ extern unsigned riscv_stack_boundary; #define SHIFT_RS1 15 #define SHIFT_IMM 20 #define IMM_BITS 12 +#define C_S_BITS 5 #define C_SxSP_BITS 6 #define IMM_REACH (1LL << IMM_BITS) @@ -922,6 +923,10 @@ extern unsigned riscv_stack_boundary; #define SWSP_REACH (4LL << C_SxSP_BITS) #define SDSP_REACH (8LL << C_SxSP_BITS) +/* This is the maximum value that can be represented in a compressed load/store + offset (an unsigned 5-bit value scaled by 4). 
*/ +#define CSW_MAX_OFFSET ((4LL << C_S_BITS) - 1) & ~3 + /* Called from RISCV_REORG, this is defined in riscv-sr.c. */ extern void riscv_remove_unneeded_save_restore_calls (void); diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt index 7f0c35e9e9c..138bddd369f 100644 --- a/gcc/config/riscv/riscv.opt +++ b/gcc/config/riscv/riscv.opt @@ -87,6 +87,12 @@ msave-restore Target Report Mask(SAVE_RESTORE) Use smaller but slower prologue and epilogue code. +mshorten-memrefs +Target Bool Var(riscv_mshorten_memrefs) Init(1) +Convert BASE + LARGE_OFFSET addresses to NEW_BASE + SMALL_OFFSET to allow more +memory accesses to be generated as compressed instructions. Currently targets +32-bit integer load/stores. + mcmodel= Target Report RejectNegative Joined Enum(code_model) Var(riscv_cmodel) Init(TARGET_DEFAULT_CMODEL) Specify the code model. diff --git a/gcc/config/riscv/t-riscv b/gcc/config/riscv/t-riscv index 5ecb3c160a6..4820fb35d31 100644 --- a/gcc/config/riscv/t-riscv +++ b/gcc/config/riscv/t-riscv @@ -19,3 +19,8 @@ riscv-d.o: $(srcdir)/config/riscv/riscv-d.c $(COMPILE) $< $(POSTCOMPILE) +riscv-shorten-memrefs.o: $(srcdir)/config/riscv/riscv-shorten-memrefs.c + $(COMPILE) $< + $(POSTCOMPILE) + +PASSES_EXTRA += $(srcdir)/config/riscv/riscv-passes.def diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index af3c7f2b910..329083fb9b9 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -1079,6 +1079,7 @@ See RS/6000 and PowerPC Options. -mpreferred-stack-boundary=@var{num} @gol -msmall-data-limit=@var{N-bytes} @gol -msave-restore -mno-save-restore @gol +-mshorten-memrefs -mno-shorten-memrefs @gol -mstrict-align -mno-strict-align @gol -mcmodel=medlow -mcmodel=medany @gol -mexplicit-relocs -mno-explicit-relocs @gol @@ -24334,6 +24335,15 @@ Do or don't use smaller but slower prologue and epilogue code that uses library function calls. The default is to use fast inline prologues and epilogues. 
+@item -mshorten-memrefs +@itemx -mno-shorten-memrefs +@opindex mshorten-memrefs +Do or do not attempt to make more use of compressed load/store instructions by +replacing a load/store of 'base register + large offset' with a new load/store +of 'new base + small offset'. If the new base gets stored in a compressed +register, then the new load/store can be compressed. Currently targets 32-bit +integer load/stores only. + @item -mstrict-align @itemx -mno-strict-align @opindex mstrict-align diff --git a/gcc/testsuite/gcc.target/riscv/shorten-memrefs-1.c b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-1.c new file mode 100644 index 00000000000..958942a6f7f --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-1.c @@ -0,0 +1,26 @@ +/* { dg-options "-Os -march=rv32imc -mabi=ilp32" } */ + +/* These stores cannot be compressed because x0 is not a compressed reg. + Therefore the shorten_memrefs pass should not attempt to rewrite them into a + compressible format. */ + +void +store1z (int *array) +{ + array[200] = 0; + array[201] = 0; + array[202] = 0; + array[203] = 0; +} + +void +store2z (long long *array) +{ + array[200] = 0; + array[201] = 0; + array[202] = 0; + array[203] = 0; +} + +/* { dg-final { scan-assembler-not "store1z:\n\taddi" } } */ +/* { dg-final { scan-assembler-not "store2z:\n\taddi" } } */ diff --git a/gcc/testsuite/gcc.target/riscv/shorten-memrefs-2.c b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-2.c new file mode 100644 index 00000000000..2c2f41548c6 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-2.c @@ -0,0 +1,51 @@ +/* { dg-options "-Os -march=rv32imc -mabi=ilp32" } */ + +/* shorten_memrefs should rewrite these load/stores into a compressible + format. 
 */

+void
+store1a (int *array, int a)
+{
+  array[200] = a;
+  array[201] = a;
+  array[202] = a;
+  array[203] = a;
+}
+
+void
+store2a (long long *array, long long a)
+{
+  array[200] = a;
+  array[201] = a;
+  array[202] = a;
+  array[203] = a;
+}
+
+int
+load1r (int *array)
+{
+  int a = 0;
+  a += array[200];
+  a += array[201];
+  a += array[202];
+  a += array[203];
+  return a;
+}
+
+long long
+load2r (long long *array)
+{
+  int a = 0;
+  a += array[200];
+  a += array[201];
+  a += array[202];
+  a += array[203];
+  return a;
+}
+
+/* { dg-final { scan-assembler "store1a:\n\taddi" } } */
+/* The sd insns in store2a are not rewritten because shorten_memrefs
+   currently only optimizes lw and sw.  */
+/* { dg-final { scan-assembler "store2a:\n\taddi" { xfail riscv*-*-* } } } */
+/* { dg-final { scan-assembler "load1r:\n\taddi" } } */
+/* { dg-final { scan-assembler "load2r:\n\taddi" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/shorten-memrefs-3.c b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-3.c
new file mode 100644
index 00000000000..2001fe871ee
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-3.c
@@ -0,0 +1,39 @@
+/* { dg-options "-Os -march=rv32imc -mabi=ilp32" } */
+
+/* These loads cannot be compressed because only one compressed reg is
+   available (since args are passed in a0-a4, that leaves a5-a7 available,
+   of which only a5 is a compressed reg).  Therefore the shorten_memrefs
+   pass should not attempt to rewrite these loads into a compressible
+   format.  It may not be possible to avoid this because shorten_memrefs
+   happens before reg alloc.
+*/ + +extern int sub1 (int, int, int, int, int, int, int); + +int +load1a (int a0, int a1, int a2, int a3, int a4, int *array) +{ + int a = 0; + a += array[200]; + a += array[201]; + a += array[202]; + a += array[203]; + return sub1 (a0, a1, a2, a3, a4, 0, a); +} + +extern long long sub2 (long long, long long, long long, long long, long long, + long long, long long); + +long long +load2a (long long a0, long long a1, long long a2, long long a3, long long a4, + long long *array) +{ + int a = 0; + a += array[200]; + a += array[201]; + a += array[202]; + a += array[203]; + return sub2 (a0, a1, a2, a3, a4, 0, a); +} + +/* { dg-final { scan-assembler-not "load1a:\n\taddi" { xfail riscv*-*-* } } } */ +/* { dg-final { scan-assembler-not "load2a:\n.*addi\[ \t\]*\[at\]\[0-9\],\[at\]\[0-9\],\[0-9\]*" { xfail riscv*-*-* } } } */ diff --git a/gcc/testsuite/gcc.target/riscv/shorten-memrefs-4.c b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-4.c new file mode 100644 index 00000000000..cd4784913e4 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-4.c @@ -0,0 +1,26 @@ +/* { dg-options "-Os -march=rv64imc -mabi=lp64" } */ + +/* These stores cannot be compressed because x0 is not a compressed reg. + Therefore the shorten_memrefs pass should not attempt to rewrite them into a + compressible format. 
 */

+void
+store1z (int *array)
+{
+  array[200] = 0;
+  array[201] = 0;
+  array[202] = 0;
+  array[203] = 0;
+}
+
+void
+store2z (long long *array)
+{
+  array[200] = 0;
+  array[201] = 0;
+  array[202] = 0;
+  array[203] = 0;
+}
+
+/* { dg-final { scan-assembler-not "store1z:\n\taddi" } } */
+/* { dg-final { scan-assembler-not "store2z:\n\taddi" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/shorten-memrefs-5.c b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-5.c
new file mode 100644
index 00000000000..80b3897e4da
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-5.c
@@ -0,0 +1,53 @@
+/* { dg-options "-Os -march=rv64imc -mabi=lp64" } */
+
+/* shorten_memrefs should rewrite these load/stores into a compressible
+   format.  */
+
+void
+store1a (int *array, int a)
+{
+  array[200] = a;
+  array[201] = a;
+  array[202] = a;
+  array[203] = a;
+}
+
+void
+store2a (long long *array, long long a)
+{
+  array[200] = a;
+  array[201] = a;
+  array[202] = a;
+  array[203] = a;
+}
+
+int
+load1r (int *array)
+{
+  int a = 0;
+  a += array[200];
+  a += array[201];
+  a += array[202];
+  a += array[203];
+  return a;
+}
+
+long long
+load2r (long long *array)
+{
+  int a = 0;
+  a += array[200];
+  a += array[201];
+  a += array[202];
+  a += array[203];
+  return a;
+}
+
+/* { dg-final { scan-assembler "store1a:\n\taddi" } } */
+/* The sd insns in store2a are not rewritten because shorten_memrefs
+   currently only optimizes lw and sw.  */
+/* { dg-final { scan-assembler "store2a:\n\taddi" { xfail riscv*-*-* } } } */
+/* { dg-final { scan-assembler "load1r:\n\taddi" } } */
+/* The ld insns in load2r are not rewritten because shorten_memrefs
+   currently only optimizes lw and sw.  */
+/* { dg-final { scan-assembler "load2r:\n\taddi" { xfail riscv*-*-* } } } */ diff --git a/gcc/testsuite/gcc.target/riscv/shorten-memrefs-6.c b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-6.c new file mode 100644 index 00000000000..3403c7044df --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-6.c @@ -0,0 +1,39 @@ +/* { dg-options "-Os -march=rv64imc -mabi=lp64" } */ + +/* These loads cannot be compressed because only one compressed reg is + available (since args are passed in a0-a4, that leaves a5-a7 available, of + which only a5 is a compressed reg). Therefore the shorten_memrefs pass should + not attempt to rewrite these loads into a compressible format. It may not + be possible to avoid this because shorten_memrefs happens before reg alloc. +*/ + +extern int sub1 (int, int, int, int, int, int, int); + +int +load1a (int a0, int a1, int a2, int a3, int a4, int *array) +{ + int a = 0; + a += array[200]; + a += array[201]; + a += array[202]; + a += array[203]; + return sub1 (a0, a1, a2, a3, a4, 0, a); +} + +extern long long sub2 (long long, long long, long long, long long, long long, + long long, long long); + +long long +load2a (long long a0, long long a1, long long a2, long long a3, long long a4, + long long *array) +{ + int a = 0; + a += array[200]; + a += array[201]; + a += array[202]; + a += array[203]; + return sub2 (a0, a1, a2, a3, a4, 0, a); +} + +/* { dg-final { scan-assembler-not "load1a:\n\taddi" { xfail riscv*-*-* } } } */ +/* { dg-final { scan-assembler-not "load2a:\n.*addi\[ \t\]*\[at\]\[0-9\],\[at\]\[0-9\],\[0-9\]*" } } */ diff --git a/gcc/testsuite/gcc.target/riscv/shorten-memrefs-7.c b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-7.c new file mode 100644 index 00000000000..a5833fd356d --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-7.c @@ -0,0 +1,46 @@ +/* { dg-options "-Os -march=rv32imc -mabi=ilp32 -mno-shorten-memrefs" } */ + +/* Check that these load/stores do not get rewritten into a compressible format 
+ when shorten_memrefs is disabled. */ + +void +store1a (int *array, int a) +{ + array[200] = a; + array[201] = a; + array[202] = a; + array[203] = a; +} + +void +store2a (long long *array, long long a) +{ + array[200] = a; + array[201] = a; + array[202] = a; + array[203] = a; +} + +int +load1r (int *array) +{ + int a = 0; + a += array[200]; + a += array[201]; + a += array[202]; + a += array[203]; + return a; +} + +long long +load2r (long long *array) +{ + int a = 0; + a += array[200]; + a += array[201]; + a += array[202]; + a += array[203]; + return a; +} + +/* { dg-final { scan-assembler-not "addi" } } */
On 10/12/2019 18:28, Craig Blackmore wrote: > > Hi Jim, > > Thank you for your review. I have posted an updated patch below which I think > addresses your comments. > Ping https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00712.html https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00713.html Craig
On Wed, Feb 19, 2020 at 3:40 AM Craig Blackmore <craig.blackmore@embecosm.com> wrote: > On 10/12/2019 18:28, Craig Blackmore wrote: > Thank you for your review. I have posted an updated patch below which I think > addresses your comments. > > Ping > > https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00712.html This looks OK. There are some minor issues. (riscv_new_address_profitable_p): New function. (TARGET_NEW_ADDRESS_PROFITABLE_P): Define. These are actually in part2, and part2 already has changelog entries for them, so these can just be dropped. + /* When optimizing for size, make uncompressible 32-bit addresses more + * expensive so that compressible 32-bit addresses are preferred. */ + if (!speed && riscv_mshorten_memrefs && mode == SImode + && !riscv_compressed_lw_address_p (addr)) + return riscv_address_insns (addr, mode, false) + 1; I think that there should be a TARGET_RVC check here, just like in the gate function for the new pass. But I also suspect that this probably doesn't matter much. > https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00713.html This also looks OK to me, but adding a new target hook is something that should really wait for stage1 which should only be a few weeks away now. That will also give us more time to test the patch before it appears in a gcc release. So this is OK once stage1 opens. Jim
On 08/04/2020 17:04, Jim Wilson wrote: > On Wed, Feb 19, 2020 at 3:40 AM Craig Blackmore > <craig.blackmore@embecosm.com> wrote: >> On 10/12/2019 18:28, Craig Blackmore wrote: >> Thank you for your review. I have posted an updated patch below which I think >> addresses your comments. >> >> Ping >> >> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00712.html > This looks OK. There are some minor issues. > > (riscv_new_address_profitable_p): New function. > (TARGET_NEW_ADDRESS_PROFITABLE_P): Define. > > These are actually in part2, and part2 already has changelog entries > for them, so these can just be dropped. > > + /* When optimizing for size, make uncompressible 32-bit addresses more > + * expensive so that compressible 32-bit addresses are preferred. */ > + if (!speed && riscv_mshorten_memrefs && mode == SImode > + && !riscv_compressed_lw_address_p (addr)) > + return riscv_address_insns (addr, mode, false) + 1; > > I think that there should be a TARGET_RVC check here, just like in the gate > function for the new pass. But I also suspect that this probably > doesn't matter much. Hi Jim, Thanks for the review. I have updated the following patch with those changes. Craig --- gcc/ChangeLog: * config.gcc: Add riscv-shorten-memrefs.o to extra_objs for riscv. * config/riscv/riscv-passes.def: New file. * config/riscv/riscv-protos.h (make_pass_shorten_memrefs): Declare. * config/riscv/riscv-shorten-memrefs.c: New file. * config/riscv/riscv.c (tree-pass.h): New include. (riscv_compressed_reg_p): New Function (riscv_compressed_lw_offset_p): Likewise. (riscv_compressed_lw_address_p): Likewise. (riscv_shorten_lw_offset): Likewise. (riscv_legitimize_address): Attempt to convert base + large_offset to compressible new_base + small_offset. (riscv_address_cost): Make anticipated compressed load/stores cheaper for code size than uncompressed load/stores. (riscv_register_priority): Move compressed register check to riscv_compressed_reg_p. * config/riscv/riscv.h (C_S_BITS): Define. 
(CSW_MAX_OFFSET): Define. * config/riscv/riscv.opt (mshorten-memrefs): New option. * config/riscv/t-riscv (riscv-shorten-memrefs.o): New rule. (PASSES_EXTRA): Add riscv-passes.def. * doc/invoke.texi: Document -mshorten-memrefs. gcc/testsuite/ChangeLog: * gcc.target/riscv/shorten-memrefs-1.c: New test. * gcc.target/riscv/shorten-memrefs-2.c: New test. * gcc.target/riscv/shorten-memrefs-3.c: New test. * gcc.target/riscv/shorten-memrefs-4.c: New test. * gcc.target/riscv/shorten-memrefs-5.c: New test. * gcc.target/riscv/shorten-memrefs-6.c: New test. * gcc.target/riscv/shorten-memrefs-7.c: New test. --- gcc/config.gcc | 2 +- gcc/config/riscv/riscv-passes.def | 20 ++ gcc/config/riscv/riscv-protos.h | 2 + gcc/config/riscv/riscv-shorten-memrefs.c | 200 ++++++++++++++++++ gcc/config/riscv/riscv.c | 88 +++++++- gcc/config/riscv/riscv.h | 5 + gcc/config/riscv/riscv.opt | 6 + gcc/config/riscv/t-riscv | 5 + gcc/doc/invoke.texi | 10 + .../gcc.target/riscv/shorten-memrefs-1.c | 26 +++ .../gcc.target/riscv/shorten-memrefs-2.c | 51 +++++ .../gcc.target/riscv/shorten-memrefs-3.c | 39 ++++ .../gcc.target/riscv/shorten-memrefs-4.c | 26 +++ .../gcc.target/riscv/shorten-memrefs-5.c | 53 +++++ .../gcc.target/riscv/shorten-memrefs-6.c | 39 ++++ .../gcc.target/riscv/shorten-memrefs-7.c | 46 ++++ 16 files changed, 612 insertions(+), 6 deletions(-) create mode 100644 gcc/config/riscv/riscv-passes.def create mode 100644 gcc/config/riscv/riscv-shorten-memrefs.c create mode 100644 gcc/testsuite/gcc.target/riscv/shorten-memrefs-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/shorten-memrefs-2.c create mode 100644 gcc/testsuite/gcc.target/riscv/shorten-memrefs-3.c create mode 100644 gcc/testsuite/gcc.target/riscv/shorten-memrefs-4.c create mode 100644 gcc/testsuite/gcc.target/riscv/shorten-memrefs-5.c create mode 100644 gcc/testsuite/gcc.target/riscv/shorten-memrefs-6.c create mode 100644 gcc/testsuite/gcc.target/riscv/shorten-memrefs-7.c diff --git a/gcc/config.gcc b/gcc/config.gcc index
cf1a87e2efd..3c2a0389b98 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -525,7 +525,7 @@ pru-*-*) ;; riscv*) cpu_type=riscv - extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o" + extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o riscv-shorten-memrefs.o" d_target_objs="riscv-d.o" ;; rs6000*-*-*) diff --git a/gcc/config/riscv/riscv-passes.def b/gcc/config/riscv/riscv-passes.def new file mode 100644 index 00000000000..8a4ea0918db --- /dev/null +++ b/gcc/config/riscv/riscv-passes.def @@ -0,0 +1,20 @@ +/* Declaration of target-specific passes for RISC-V. + Copyright (C) 2019 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + <http://www.gnu.org/licenses/>. */ + +INSERT_PASS_AFTER (pass_rtl_store_motion, 1, pass_shorten_memrefs); diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h index 8cf9137b5e7..72280ec1c76 100644 --- a/gcc/config/riscv/riscv-protos.h +++ b/gcc/config/riscv/riscv-protos.h @@ -91,4 +91,6 @@ extern std::string riscv_arch_str (); extern bool riscv_hard_regno_rename_ok (unsigned, unsigned); +rtl_opt_pass * make_pass_shorten_memrefs (gcc::context *ctxt); + #endif /* ! 
GCC_RISCV_PROTOS_H */ diff --git a/gcc/config/riscv/riscv-shorten-memrefs.c b/gcc/config/riscv/riscv-shorten-memrefs.c new file mode 100644 index 00000000000..3686005fe2e --- /dev/null +++ b/gcc/config/riscv/riscv-shorten-memrefs.c @@ -0,0 +1,200 @@ +/* Shorten memrefs pass for RISC-V. + Copyright (C) 2018-2019 Free Software Foundation, Inc. + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 3, or (at your option) +any later version. + +GCC is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +GNU General Public License for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +<http://www.gnu.org/licenses/>. */ + +#define IN_TARGET_CODE 1 + +#include "config.h" +#include "system.h" +#include "coretypes.h" +#include "tm.h" +#include "rtl.h" +#include "backend.h" +#include "regs.h" +#include "target.h" +#include "memmodel.h" +#include "emit-rtl.h" +#include "df.h" +#include "predict.h" +#include "tree-pass.h" + +/* Try to make more use of compressed load and store instructions by replacing + a load/store at address BASE + LARGE_OFFSET with a new load/store at address + NEW BASE + SMALL OFFSET. If NEW BASE is stored in a compressed register, the + load/store can be compressed. Since creating NEW BASE incurs an overhead, + the change is only attempted when BASE is referenced by at least four + load/stores in the same basic block. 
*/ + +namespace { + +const pass_data pass_data_shorten_memrefs = +{ + RTL_PASS, /* type */ + "shorten_memrefs", /* name */ + OPTGROUP_NONE, /* optinfo_flags */ + TV_NONE, /* tv_id */ + 0, /* properties_required */ + 0, /* properties_provided */ + 0, /* properties_destroyed */ + 0, /* todo_flags_start */ + 0, /* todo_flags_finish */ +}; + +class pass_shorten_memrefs : public rtl_opt_pass +{ +public: + pass_shorten_memrefs (gcc::context *ctxt) + : rtl_opt_pass (pass_data_shorten_memrefs, ctxt) + {} + + /* opt_pass methods: */ + virtual bool gate (function *) + { + return TARGET_RVC && riscv_mshorten_memrefs && optimize > 0; + } + virtual unsigned int execute (function *); + +private: + typedef int_hash <HOST_WIDE_INT, 0> regno_hash; + typedef hash_map <regno_hash, int> regno_map; + + regno_map * analyze (basic_block bb); + void transform (regno_map *m, basic_block bb); + bool get_si_mem_base_reg (rtx mem, rtx *addr); +}; // class pass_shorten_memrefs + +bool +pass_shorten_memrefs::get_si_mem_base_reg (rtx mem, rtx *addr) +{ + if (!MEM_P (mem) || GET_MODE (mem) != SImode) + return false; + *addr = XEXP (mem, 0); + return GET_CODE (*addr) == PLUS && REG_P (XEXP (*addr, 0)); +} + +/* Count how many times each regno is referenced as base address for a memory + access. */ + +pass_shorten_memrefs::regno_map * +pass_shorten_memrefs::analyze (basic_block bb) +{ + regno_map *m = hash_map<regno_hash, int>::create_ggc (10); + rtx_insn *insn; + + regstat_init_n_sets_and_refs (); + + FOR_BB_INSNS (bb, insn) + { + if (!NONJUMP_INSN_P (insn)) + continue; + rtx pat = PATTERN (insn); + if (GET_CODE (pat) != SET) + continue; + /* Analyze stores first then loads. */ + for (int i = 0; i < 2; i++) + { + rtx mem = XEXP (pat, i); + rtx addr; + if (get_si_mem_base_reg (mem, &addr)) + { + HOST_WIDE_INT regno = REGNO (XEXP (addr, 0)); + /* Do not count store zero as these cannot be compressed. 
*/ + if (i == 0) + { + if (XEXP (pat, 1) == CONST0_RTX (GET_MODE (XEXP (pat, 1)))) + continue; + } + if (REG_N_REFS (regno) < 4) + continue; + m->get_or_insert (regno)++; + } + } + } + regstat_free_n_sets_and_refs (); + + return m; +} + +/* Convert BASE + LARGE_OFFSET to NEW_BASE + SMALL_OFFSET for each load/store + with a base reg referenced at least 4 times. */ + +void +pass_shorten_memrefs::transform (regno_map *m, basic_block bb) +{ + rtx_insn *insn; + FOR_BB_INSNS (bb, insn) + { + if (!NONJUMP_INSN_P (insn)) + continue; + rtx pat = PATTERN (insn); + if (GET_CODE (pat) != SET) + continue; + start_sequence (); + /* Transform stores first then loads. */ + for (int i = 0; i < 2; i++) + { + rtx mem = XEXP (pat, i); + rtx addr; + if (get_si_mem_base_reg (mem, &addr)) + { + HOST_WIDE_INT regno = REGNO (XEXP (addr, 0)); + /* Do not transform store zero as these cannot be compressed. */ + if (i == 0) + { + if (XEXP (pat, 1) == CONST0_RTX (GET_MODE (XEXP (pat, 1)))) + continue; + } + if (m->get_or_insert (regno) > 3) + { + addr + = targetm.legitimize_address (addr, addr, GET_MODE (mem)); + XEXP (pat, i) = replace_equiv_address (mem, addr); + df_insn_rescan (insn); + } + } + } + rtx_insn *seq = get_insns (); + end_sequence (); + emit_insn_before (seq, insn); + } +} + +unsigned int +pass_shorten_memrefs::execute (function *fn) +{ + basic_block bb; + + FOR_ALL_BB_FN (bb, fn) + { + regno_map *m; + if (optimize_bb_for_speed_p (bb)) + continue; + m = analyze (bb); + transform (m, bb); + } + + return 0; +} + +} // anon namespace + +rtl_opt_pass * +make_pass_shorten_memrefs (gcc::context *ctxt) +{ + return new pass_shorten_memrefs (ctxt); +} diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c index 94b5ac01762..0c0879c8aee 100644 --- a/gcc/config/riscv/riscv.c +++ b/gcc/config/riscv/riscv.c @@ -55,6 +55,7 @@ along with GCC; see the file COPYING3. 
If not see #include "diagnostic.h" #include "builtins.h" #include "predict.h" +#include "tree-pass.h" /* True if X is an UNSPEC wrapper around a SYMBOL_REF or LABEL_REF. */ #define UNSPEC_ADDRESS_P(X) \ @@ -848,6 +849,52 @@ riscv_legitimate_address_p (machine_mode mode, rtx x, bool strict_p) return riscv_classify_address (&addr, x, mode, strict_p); } +/* Return true if hard reg REGNO can be used in compressed instructions. */ + +static bool +riscv_compressed_reg_p (int regno) +{ + /* x8-x15/f8-f15 are compressible registers. */ + return (TARGET_RVC && (IN_RANGE (regno, GP_REG_FIRST + 8, GP_REG_FIRST + 15) + || IN_RANGE (regno, FP_REG_FIRST + 8, FP_REG_FIRST + 15))); +} + +/* Return true if x is an unsigned 5-bit immediate scaled by 4. */ + +static bool +riscv_compressed_lw_offset_p (rtx x) +{ + return (CONST_INT_P (x) + && (INTVAL (x) & 3) == 0 + && IN_RANGE (INTVAL (x), 0, CSW_MAX_OFFSET)); +} + +/* Return true if load/store from/to address x can be compressed. */ + +static bool +riscv_compressed_lw_address_p (rtx x) +{ + struct riscv_address_info addr; + bool result = riscv_classify_address (&addr, x, GET_MODE (x), + reload_completed); + + /* Before reload, assuming all load/stores of valid addresses get compressed + gives better code size than checking if the address is reg + small_offset + early on. */ + if (result && !reload_completed) + return true; + + /* Return false if address is not compressed_reg + small_offset. */ + if (!result + || addr.type != ADDRESS_REG + || (!riscv_compressed_reg_p (REGNO (addr.reg)) + && addr.reg != stack_pointer_rtx) + || !riscv_compressed_lw_offset_p (addr.offset)) + return false; + + return result; +} + /* Return the number of instructions needed to load or store a value of mode MODE at address X. Return 0 if X isn't valid for MODE. 
Assume that multiword moves may need to be split into word moves @@ -1308,6 +1355,24 @@ riscv_force_address (rtx x, machine_mode mode) return x; } +/* Modify base + offset so that offset fits within a compressed load/store insn + and the excess is added to base. */ + +static rtx +riscv_shorten_lw_offset (rtx base, HOST_WIDE_INT offset) +{ + rtx addr, high; + /* Leave OFFSET as an unsigned 5-bit offset scaled by 4 and put the excess + into HIGH. */ + high = GEN_INT (offset & ~CSW_MAX_OFFSET); + offset &= CSW_MAX_OFFSET; + if (!SMALL_OPERAND (INTVAL (high))) + high = force_reg (Pmode, high); + base = force_reg (Pmode, gen_rtx_PLUS (Pmode, high, base)); + addr = plus_constant (Pmode, base, offset); + return addr; +} + /* This function is used to implement LEGITIMIZE_ADDRESS. If X can be legitimized in a way that the generic machinery might not expect, return a new address, otherwise return NULL. MODE is the mode of @@ -1326,7 +1391,7 @@ riscv_legitimize_address (rtx x, rtx oldx ATTRIBUTE_UNUSED, if (riscv_split_symbol (NULL, x, mode, &addr, FALSE)) return riscv_force_address (addr, mode); - /* Handle BASE + OFFSET using riscv_add_offset. */ + /* Handle BASE + OFFSET. */ if (GET_CODE (x) == PLUS && CONST_INT_P (XEXP (x, 1)) && INTVAL (XEXP (x, 1)) != 0) { @@ -1335,7 +1400,14 @@ riscv_legitimize_address (rtx x, rtx oldx ATTRIBUTE_UNUSED, if (!riscv_valid_base_register_p (base, mode, false)) base = copy_to_mode_reg (Pmode, base); - addr = riscv_add_offset (NULL, base, offset); + if (optimize_function_for_size_p (cfun) + && (strcmp (current_pass->name, "shorten_memrefs") == 0) + && mode == SImode) + /* Convert BASE + LARGE_OFFSET into NEW_BASE + SMALL_OFFSET to allow + possible compressed load/store. 
*/ + addr = riscv_shorten_lw_offset (base, offset); + else + addr = riscv_add_offset (NULL, base, offset); return riscv_force_address (addr, mode); } @@ -1833,6 +1905,11 @@ riscv_address_cost (rtx addr, machine_mode mode, addr_space_t as ATTRIBUTE_UNUSED, bool speed ATTRIBUTE_UNUSED) { + /* When optimizing for size, make uncompressible 32-bit addresses more + * expensive so that compressible 32-bit addresses are preferred. */ + if (TARGET_RVC && !speed && riscv_mshorten_memrefs && mode == SImode + && !riscv_compressed_lw_address_p (addr)) + return riscv_address_insns (addr, mode, false) + 1; return riscv_address_insns (addr, mode, false); } @@ -4666,6 +4743,7 @@ riscv_option_override (void) error ("%<-mriscv-attribute%> RISC-V ELF attribute requires GNU as 2.32" " [%<-mriscv-attribute%>]"); #endif + } /* Implement TARGET_CONDITIONAL_REGISTER_USAGE. */ @@ -4705,9 +4783,9 @@ riscv_conditional_register_usage (void) static int riscv_register_priority (int regno) { - /* Favor x8-x15/f8-f15 to improve the odds of RVC instruction selection. */ - if (TARGET_RVC && (IN_RANGE (regno, GP_REG_FIRST + 8, GP_REG_FIRST + 15) - || IN_RANGE (regno, FP_REG_FIRST + 8, FP_REG_FIRST + 15))) + /* Favor compressed registers to improve the odds of RVC instruction + selection. */ + if (riscv_compressed_reg_p (regno)) return 1; return 0; diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h index 567c23380fe..e6209ede9d6 100644 --- a/gcc/config/riscv/riscv.h +++ b/gcc/config/riscv/riscv.h @@ -920,6 +920,7 @@ extern unsigned riscv_stack_boundary; #define SHIFT_RS1 15 #define SHIFT_IMM 20 #define IMM_BITS 12 +#define C_S_BITS 5 #define C_SxSP_BITS 6 #define IMM_REACH (1LL << IMM_BITS) @@ -929,6 +930,10 @@ extern unsigned riscv_stack_boundary; #define SWSP_REACH (4LL << C_SxSP_BITS) #define SDSP_REACH (8LL << C_SxSP_BITS) +/* This is the maximum value that can be represented in a compressed load/store + offset (an unsigned 5-bit value scaled by 4). 
*/ +#define CSW_MAX_OFFSET ((4LL << C_S_BITS) - 1) & ~3 + /* Called from RISCV_REORG, this is defined in riscv-sr.c. */ extern void riscv_remove_unneeded_save_restore_calls (void); diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt index 29de246759e..e4bfcb86f51 100644 --- a/gcc/config/riscv/riscv.opt +++ b/gcc/config/riscv/riscv.opt @@ -87,6 +87,12 @@ msave-restore Target Report Mask(SAVE_RESTORE) Use smaller but slower prologue and epilogue code. +mshorten-memrefs +Target Bool Var(riscv_mshorten_memrefs) Init(1) +Convert BASE + LARGE_OFFSET addresses to NEW_BASE + SMALL_OFFSET to allow more +memory accesses to be generated as compressed instructions. Currently targets +32-bit integer load/stores. + mcmodel= Target Report RejectNegative Joined Enum(code_model) Var(riscv_cmodel) Init(TARGET_DEFAULT_CMODEL) Specify the code model. diff --git a/gcc/config/riscv/t-riscv b/gcc/config/riscv/t-riscv index 5ecb3c160a6..4820fb35d31 100644 --- a/gcc/config/riscv/t-riscv +++ b/gcc/config/riscv/t-riscv @@ -19,3 +19,8 @@ riscv-d.o: $(srcdir)/config/riscv/riscv-d.c $(COMPILE) $< $(POSTCOMPILE) +riscv-shorten-memrefs.o: $(srcdir)/config/riscv/riscv-shorten-memrefs.c + $(COMPILE) $< + $(POSTCOMPILE) + +PASSES_EXTRA += $(srcdir)/config/riscv/riscv-passes.def diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index a37a2ee9c19..ad4c6d94f82 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -1129,6 +1129,7 @@ See RS/6000 and PowerPC Options. -mpreferred-stack-boundary=@var{num} @gol -msmall-data-limit=@var{N-bytes} @gol -msave-restore -mno-save-restore @gol +-mshorten-memrefs -mno-shorten-memrefs @gol -mstrict-align -mno-strict-align @gol -mcmodel=medlow -mcmodel=medany @gol -mexplicit-relocs -mno-explicit-relocs @gol @@ -25321,6 +25322,15 @@ Do or don't use smaller but slower prologue and epilogue code that uses library function calls. The default is to use fast inline prologues and epilogues. 
+@item -mshorten-memrefs +@itemx -mno-shorten-memrefs +@opindex mshorten-memrefs +Do or do not attempt to make more use of compressed load/store instructions by +replacing a load/store of 'base register + large offset' with a new load/store +of 'new base + small offset'. If the new base gets stored in a compressed +register, then the new load/store can be compressed. Currently targets 32-bit +integer load/stores only. + @item -mstrict-align @itemx -mno-strict-align @opindex mstrict-align diff --git a/gcc/testsuite/gcc.target/riscv/shorten-memrefs-1.c b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-1.c new file mode 100644 index 00000000000..958942a6f7f --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-1.c @@ -0,0 +1,26 @@ +/* { dg-options "-Os -march=rv32imc -mabi=ilp32" } */ + +/* These stores cannot be compressed because x0 is not a compressed reg. + Therefore the shorten_memrefs pass should not attempt to rewrite them into a + compressible format. */ + +void +store1z (int *array) +{ + array[200] = 0; + array[201] = 0; + array[202] = 0; + array[203] = 0; +} + +void +store2z (long long *array) +{ + array[200] = 0; + array[201] = 0; + array[202] = 0; + array[203] = 0; +} + +/* { dg-final { scan-assembler-not "store1z:\n\taddi" } } */ +/* { dg-final { scan-assembler-not "store2z:\n\taddi" } } */ diff --git a/gcc/testsuite/gcc.target/riscv/shorten-memrefs-2.c b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-2.c new file mode 100644 index 00000000000..2c2f41548c6 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-2.c @@ -0,0 +1,51 @@ +/* { dg-options "-Os -march=rv32imc -mabi=ilp32" } */ + +/* shorten_memrefs should rewrite these load/stores into a compressible + format. 
*/ + +void +store1a (int *array, int a) +{ + array[200] = a; + array[201] = a; + array[202] = a; + array[203] = a; +} + +void +store2a (long long *array, long long a) +{ + array[200] = a; + array[201] = a; + array[202] = a; + array[203] = a; +} + +int +load1r (int *array) +{ + int a = 0; + a += array[200]; + a += array[201]; + a += array[202]; + a += array[203]; + return a; +} + +long long +load2r (long long *array) +{ + int a = 0; + a += array[200]; + a += array[201]; + a += array[202]; + a += array[203]; + return a; +} + +/* { dg-final { scan-assembler "store1a:\n\taddi" } } */ +/* The sd insns in store2a are not rewritten because shorten_memrefs currently + only optimizes lw and sw. */ +/* { dg-final { scan-assembler "store2a:\n\taddi" { xfail riscv*-*-* } } } */ +/* { dg-final { scan-assembler "load1r:\n\taddi" } } */ +/* { dg-final { scan-assembler "load2r:\n\taddi" } } */ diff --git a/gcc/testsuite/gcc.target/riscv/shorten-memrefs-3.c b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-3.c new file mode 100644 index 00000000000..2001fe871ee --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-3.c @@ -0,0 +1,39 @@ +/* { dg-options "-Os -march=rv32imc -mabi=ilp32" } */ + +/* These loads cannot be compressed because only one compressed reg is + available (since args are passed in a0-a4, that leaves a5-a7 available, of + which only a5 is a compressed reg). Therefore the shorten_memrefs pass should + not attempt to rewrite these loads into a compressible format. It may not + be possible to avoid this because shorten_memrefs happens before reg alloc.
+*/ + +extern int sub1 (int, int, int, int, int, int, int); + +int +load1a (int a0, int a1, int a2, int a3, int a4, int *array) +{ + int a = 0; + a += array[200]; + a += array[201]; + a += array[202]; + a += array[203]; + return sub1 (a0, a1, a2, a3, a4, 0, a); +} + +extern long long sub2 (long long, long long, long long, long long, long long, + long long, long long); + +long long +load2a (long long a0, long long a1, long long a2, long long a3, long long a4, + long long *array) +{ + int a = 0; + a += array[200]; + a += array[201]; + a += array[202]; + a += array[203]; + return sub2 (a0, a1, a2, a3, a4, 0, a); +} + +/* { dg-final { scan-assembler-not "load1a:\n\taddi" { xfail riscv*-*-* } } } */ +/* { dg-final { scan-assembler-not "load2a:\n.*addi\[ \t\]*\[at\]\[0-9\],\[at\]\[0-9\],\[0-9\]*" { xfail riscv*-*-* } } } */ diff --git a/gcc/testsuite/gcc.target/riscv/shorten-memrefs-4.c b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-4.c new file mode 100644 index 00000000000..cd4784913e4 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-4.c @@ -0,0 +1,26 @@ +/* { dg-options "-Os -march=rv64imc -mabi=lp64" } */ + +/* These stores cannot be compressed because x0 is not a compressed reg. + Therefore the shorten_memrefs pass should not attempt to rewrite them into a + compressible format. 
*/ + +void +store1z (int *array) +{ + array[200] = 0; + array[201] = 0; + array[202] = 0; + array[203] = 0; +} + +void +store2z (long long *array) +{ + array[200] = 0; + array[201] = 0; + array[202] = 0; + array[203] = 0; +} + +/* { dg-final { scan-assembler-not "store1z:\n\taddi" } } */ +/* { dg-final { scan-assembler-not "store2z:\n\taddi" } } */ diff --git a/gcc/testsuite/gcc.target/riscv/shorten-memrefs-5.c b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-5.c new file mode 100644 index 00000000000..80b3897e4da --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-5.c @@ -0,0 +1,53 @@ +/* { dg-options "-Os -march=rv64imc -mabi=lp64" } */ + +/* shorten_memrefs should rewrite these load/stores into a compressible + format. */ + +void +store1a (int *array, int a) +{ + array[200] = a; + array[201] = a; + array[202] = a; + array[203] = a; +} + +void +store2a (long long *array, long long a) +{ + array[200] = a; + array[201] = a; + array[202] = a; + array[203] = a; +} + +int +load1r (int *array) +{ + int a = 0; + a += array[200]; + a += array[201]; + a += array[202]; + a += array[203]; + return a; +} + +long long +load2r (long long *array) +{ + int a = 0; + a += array[200]; + a += array[201]; + a += array[202]; + a += array[203]; + return a; +} + +/* { dg-final { scan-assembler "store1a:\n\taddi" } } */ +/* The sd insns in store2a are not rewritten because shorten_memrefs currently + only optimizes lw and sw. */ +/* { dg-final { scan-assembler "store2a:\n\taddi" { xfail riscv*-*-* } } } */ +/* { dg-final { scan-assembler "load1r:\n\taddi" } } */ +/* The ld insns in load2r are not rewritten because shorten_memrefs currently + only optimizes lw and sw. */
+/* { dg-final { scan-assembler "load2r:\n\taddi" { xfail riscv*-*-* } } } */ diff --git a/gcc/testsuite/gcc.target/riscv/shorten-memrefs-6.c b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-6.c new file mode 100644 index 00000000000..3403c7044df --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-6.c @@ -0,0 +1,39 @@ +/* { dg-options "-Os -march=rv64imc -mabi=lp64" } */ + +/* These loads cannot be compressed because only one compressed reg is + available (since args are passed in a0-a4, that leaves a5-a7 available, of + which only a5 is a compressed reg). Therefore the shorten_memrefs pass should + not attempt to rewrite these loads into a compressible format. It may not + be possible to avoid this because shorten_memrefs happens before reg alloc. +*/ + +extern int sub1 (int, int, int, int, int, int, int); + +int +load1a (int a0, int a1, int a2, int a3, int a4, int *array) +{ + int a = 0; + a += array[200]; + a += array[201]; + a += array[202]; + a += array[203]; + return sub1 (a0, a1, a2, a3, a4, 0, a); +} + +extern long long sub2 (long long, long long, long long, long long, long long, + long long, long long); + +long long +load2a (long long a0, long long a1, long long a2, long long a3, long long a4, + long long *array) +{ + int a = 0; + a += array[200]; + a += array[201]; + a += array[202]; + a += array[203]; + return sub2 (a0, a1, a2, a3, a4, 0, a); +} + +/* { dg-final { scan-assembler-not "load1a:\n\taddi" { xfail riscv*-*-* } } } */ +/* { dg-final { scan-assembler-not "load2a:\n.*addi\[ \t\]*\[at\]\[0-9\],\[at\]\[0-9\],\[0-9\]*" } } */ diff --git a/gcc/testsuite/gcc.target/riscv/shorten-memrefs-7.c b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-7.c new file mode 100644 index 00000000000..a5833fd356d --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-7.c @@ -0,0 +1,46 @@ +/* { dg-options "-Os -march=rv32imc -mabi=ilp32 -mno-shorten-memrefs" } */ + +/* Check that these load/stores do not get rewritten into a compressible format 
+ when shorten_memrefs is disabled. */ + +void +store1a (int *array, int a) +{ + array[200] = a; + array[201] = a; + array[202] = a; + array[203] = a; +} + +void +store2a (long long *array, long long a) +{ + array[200] = a; + array[201] = a; + array[202] = a; + array[203] = a; +} + +int +load1r (int *array) +{ + int a = 0; + a += array[200]; + a += array[201]; + a += array[202]; + a += array[203]; + return a; +} + +long long +load2r (long long *array) +{ + int a = 0; + a += array[200]; + a += array[201]; + a += array[202]; + a += array[203]; + return a; +} + +/* { dg-final { scan-assembler-not "addi" } } */
On Mon, Apr 27, 2020 at 10:08 AM Craig Blackmore
<craig.blackmore@embecosm.com> wrote:
> Thanks for the review. I have updated the following patch with those changes.
This looks good, and the tree is open for development work again, so I
committed both parts 1 and 2 and pushed it.
One weird thing is that while the patch program accepted the patch
fine, git am would not, and kept giving me an error that didn't make
any sense, pointing at a line that didn't have any visible problem.
So I had to commit it by hand and then use git commit --amend to fix
the authorship before pushing it. The end result looks OK to me
though.
Jim
On 12/05/2020 23:33, Jim Wilson wrote: > On Mon, Apr 27, 2020 at 10:08 AM Craig Blackmore > <craig.blackmore@embecosm.com> wrote: >> Thanks for the review. I have updated the following patch with those changes. > This looks good, and the tree is open for development work again, so I > committed both parts 1 and 2 and pushed it. > > One weird thing is that while the patch program accepted the patch > fine, git am would not, and kept giving me an error that didn't make > any sense, pointing at a line that didn't have any visible problem. > So I had to commit it by hand and then use git commit --amend to fix > the authorship before pushing it. The end result looks OK to me > though. > > Jim Hi Jim, Many thanks for your help in reviewing this patch. I am not sure what happened with git am, thanks for working around it, the end result looks good to me. Craig
diff --git a/gcc/config.gcc b/gcc/config.gcc
index bdc2253f8ef..e617215314b 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -523,7 +523,7 @@ pru-*-*)
 	;;
 riscv*)
 	cpu_type=riscv
-	extra_objs="riscv-builtins.o riscv-c.o"
+	extra_objs="riscv-builtins.o riscv-c.o riscv-shorten-memrefs.o"
 	d_target_objs="riscv-d.o"
 	;;
 rs6000*-*-*)
diff --git a/gcc/config/riscv/riscv-passes.def b/gcc/config/riscv/riscv-passes.def
new file mode 100644
index 00000000000..8a4ea0918db
--- /dev/null
+++ b/gcc/config/riscv/riscv-passes.def
@@ -0,0 +1,20 @@
+/* Declaration of target-specific passes for RISC-V.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+INSERT_PASS_AFTER (pass_rtl_store_motion, 1, pass_shorten_memrefs);
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 5092294803c..78008c28b75 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -89,4 +89,6 @@ extern void riscv_init_builtins (void);
 /* Routines implemented in riscv-common.c.  */
 extern std::string riscv_arch_str ();
 
+rtl_opt_pass * make_pass_shorten_memrefs (gcc::context *ctxt);
+
 #endif /* ! GCC_RISCV_PROTOS_H */
diff --git a/gcc/config/riscv/riscv-shorten-memrefs.c b/gcc/config/riscv/riscv-shorten-memrefs.c
new file mode 100644
index 00000000000..aed7ddb792e
--- /dev/null
+++ b/gcc/config/riscv/riscv-shorten-memrefs.c
@@ -0,0 +1,188 @@
+/* Shorten memrefs pass for RISC-V.
+   Copyright (C) 2018-2019 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#define IN_TARGET_CODE 1
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "rtl.h"
+#include "backend.h"
+#include "regs.h"
+#include "target.h"
+#include "memmodel.h"
+#include "emit-rtl.h"
+#include "df.h"
+#include "predict.h"
+#include "tree-pass.h"
+
+/* Try to make more use of compressed load and store instructions by replacing
+   a load/store at address BASE + LARGE_OFFSET with a new load/store at address
+   NEW BASE + SMALL OFFSET.  If NEW BASE is stored in a compressed register, the
+   load/store can be compressed.  Since creating NEW BASE incurs an overhead,
+   the change is only attempted when BASE is referenced by at least four
+   load/stores in the same basic block.  */
+
+namespace {
+
+const pass_data pass_data_shorten_memrefs =
+{
+  RTL_PASS, /* type */
+  "shorten_memrefs", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_NONE, /* tv_id */
+  0, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_shorten_memrefs : public rtl_opt_pass
+{
+public:
+  pass_shorten_memrefs (gcc::context *ctxt)
+    : rtl_opt_pass (pass_data_shorten_memrefs, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *)
+    {
+      return TARGET_RVC && riscv_mshorten_memrefs && optimize > 0;
+    }
+  virtual unsigned int execute (function *);
+
+private:
+  typedef int_hash <HOST_WIDE_INT, 0> regno_hash;
+  typedef hash_map <regno_hash, int> regno_map;
+
+  regno_map * analyze (basic_block bb);
+  void transform (regno_map *m, basic_block bb);
+  bool get_si_mem_base_reg (rtx mem, rtx *addr);
+}; // class pass_shorten_memrefs
+
+bool
+pass_shorten_memrefs::get_si_mem_base_reg (rtx mem, rtx *addr)
+{
+  if (!MEM_P (mem) || GET_MODE (mem) != SImode)
+    return false;
+  *addr = XEXP (mem, 0);
+  return GET_CODE (*addr) == PLUS && REG_P (XEXP (*addr, 0));
+}
+
+/* Count how many times each regno is referenced as base address for a memory
+   access.  */
+
+pass_shorten_memrefs::regno_map *
+pass_shorten_memrefs::analyze (basic_block bb)
+{
+  regno_map *m = hash_map<regno_hash, int>::create_ggc (10);
+  rtx_insn *insn;
+
+  regstat_init_n_sets_and_refs ();
+
+  FOR_BB_INSNS (bb, insn)
+    {
+      if (!NONJUMP_INSN_P (insn))
+	continue;
+      rtx pat = PATTERN (insn);
+      if (GET_CODE (pat) != SET)
+	continue;
+      /* Analyze stores first then loads.  */
+      for (int i = 0; i < 2; i++)
+	{
+	  rtx mem = XEXP (pat, i);
+	  rtx addr;
+	  if (get_si_mem_base_reg (mem, &addr))
+	    {
+	      HOST_WIDE_INT regno = REGNO (XEXP (addr, 0));
+	      if (REG_N_REFS (regno) < 4)
+		continue;
+	      m->get_or_insert (regno)++;
+	    }
+	}
+    }
+  regstat_free_n_sets_and_refs ();
+
+  return m;
+}
+
+/* Convert BASE + LARGE_OFFSET to NEW_BASE + SMALL_OFFSET for each load/store
+   with a base reg referenced at least 4 times.  */
+
+void
+pass_shorten_memrefs::transform (regno_map *m, basic_block bb)
+{
+  rtx_insn *insn;
+  FOR_BB_INSNS (bb, insn)
+    {
+      if (!NONJUMP_INSN_P (insn))
+	continue;
+      rtx pat = PATTERN (insn);
+      if (GET_CODE (pat) != SET)
+	continue;
+      start_sequence ();
+      /* Transform stores first then loads.  */
+      for (int i = 0; i < 2; i++)
+	{
+	  rtx mem = XEXP (pat, i);
+	  rtx addr;
+	  if (get_si_mem_base_reg (mem, &addr))
+	    {
+	      HOST_WIDE_INT regno = REGNO (XEXP (addr, 0));
+	      if (m->get_or_insert (regno) > 3)
+		{
+		  addr
+		    = targetm.legitimize_address (addr, addr, GET_MODE (mem));
+		  XEXP (pat, i) = replace_equiv_address (mem, addr);
+		  df_insn_rescan (insn);
+		}
+	    }
+	}
+      rtx_insn *seq = get_insns ();
+      end_sequence ();
+      emit_insn_before (seq, insn);
+    }
+}
+
+unsigned int
+pass_shorten_memrefs::execute (function *fn)
+{
+  basic_block bb;
+
+  FOR_ALL_BB_FN (bb, fn)
+    {
+      regno_map *m;
+      if (optimize_bb_for_speed_p (bb))
+	continue;
+      m = analyze (bb);
+      transform (m, bb);
+    }
+
+  return 0;
+}
+
+} // anon namespace
+
+rtl_opt_pass *
+make_pass_shorten_memrefs (gcc::context *ctxt)
+{
+  return new pass_shorten_memrefs (ctxt);
+}
diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
index 77a3ad94aa8..532eaa92632 100644
--- a/gcc/config/riscv/riscv.c
+++ b/gcc/config/riscv/riscv.c
@@ -55,6 +55,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "diagnostic.h"
 #include "builtins.h"
 #include "predict.h"
+#include "tree-pass.h"
 
 /* True if X is an UNSPEC wrapper around a SYMBOL_REF or LABEL_REF.  */
 #define UNSPEC_ADDRESS_P(X) \
@@ -848,6 +849,52 @@ riscv_legitimate_address_p (machine_mode mode, rtx x, bool strict_p)
   return riscv_classify_address (&addr, x, mode, strict_p);
 }
 
+/* Return true if hard reg REGNO can be used in compressed instructions.  */
+
+static bool
+riscv_compressed_reg_p (int regno)
+{
+  /* x8-x15/f8-f15 are compressible registers.  */
+  return (TARGET_RVC && (IN_RANGE (regno, GP_REG_FIRST + 8, GP_REG_FIRST + 15)
+	  || IN_RANGE (regno, FP_REG_FIRST + 8, FP_REG_FIRST + 15)));
+}
+
+/* Return true if x is an unsigned 5-bit immediate scaled by 4.  */
+
+static bool
+riscv_compressed_lw_offset_p (rtx x)
+{
+  return (CONST_INT_P (x)
+	  && (INTVAL (x) & 3) == 0
+	  && IN_RANGE (INTVAL (x), 0, RISCV_MAX_COMPRESSED_LW_OFFSET));
+}
+
+/* Return true if load/store from/to address x can be compressed.  */
+
+static bool
+riscv_compressed_lw_address_p (rtx x)
+{
+  struct riscv_address_info addr;
+  bool result = riscv_classify_address (&addr, x, GET_MODE (x),
+					reload_completed);
+
+  /* Before reload, assuming all load/stores of valid addresses get compressed
+     gives better code size than checking if the address is reg + small_offset
+     early on.  */
+  if (result && !reload_completed)
+    return true;
+
+  /* Return false if address is not compressed_reg + small_offset.  */
+  if (!result
+      || addr.type != ADDRESS_REG
+      || (!riscv_compressed_reg_p (REGNO (addr.reg))
+	  && addr.reg != stack_pointer_rtx)
+      || !riscv_compressed_lw_offset_p (addr.offset))
+    return false;
+
+  return result;
+}
+
 /* Return the number of instructions needed to load or store a value
    of mode MODE at address X.  Return 0 if X isn't valid for MODE.
    Assume that multiword moves may need to be split into word moves
@@ -1305,6 +1352,24 @@ riscv_force_address (rtx x, machine_mode mode)
   return x;
 }
 
+/* Modify base + offset so that offset fits within a compressed load/store insn
+   and the excess is added to base.  */
+
+static rtx
+riscv_shorten_lw_offset (rtx base, HOST_WIDE_INT offset)
+{
+  rtx addr, high;
+  /* Leave OFFSET as an unsigned 5-bit offset scaled by 4 and put the excess
+     into HIGH.  */
+  high = GEN_INT (offset & ~RISCV_MAX_COMPRESSED_LW_OFFSET);
+  offset &= RISCV_MAX_COMPRESSED_LW_OFFSET;
+  if (!SMALL_OPERAND (INTVAL (high)))
+    high = force_reg (Pmode, high);
+  base = force_reg (Pmode, gen_rtx_PLUS (Pmode, high, base));
+  addr = plus_constant (Pmode, base, offset);
+  return addr;
+}
+
 /* This function is used to implement LEGITIMIZE_ADDRESS.  If X can
    be legitimized in a way that the generic machinery might not expect,
    return a new address, otherwise return NULL.  MODE is the mode of
@@ -1323,7 +1388,7 @@ riscv_legitimize_address (rtx x, rtx oldx ATTRIBUTE_UNUSED,
   if (riscv_split_symbol (NULL, x, mode, &addr, FALSE))
     return riscv_force_address (addr, mode);
 
-  /* Handle BASE + OFFSET using riscv_add_offset.  */
+  /* Handle BASE + OFFSET.  */
   if (GET_CODE (x) == PLUS && CONST_INT_P (XEXP (x, 1))
       && INTVAL (XEXP (x, 1)) != 0)
     {
@@ -1332,7 +1397,14 @@ riscv_legitimize_address (rtx x, rtx oldx ATTRIBUTE_UNUSED,
 
       if (!riscv_valid_base_register_p (base, mode, false))
	base = copy_to_mode_reg (Pmode, base);
-      addr = riscv_add_offset (NULL, base, offset);
+      if (optimize_function_for_size_p (cfun)
+	  && (strcmp (current_pass->name, "shorten_memrefs") == 0)
+	  && mode == SImode)
+	/* Convert BASE + LARGE_OFFSET into NEW_BASE + SMALL_OFFSET to allow
+	   possible compressed load/store.  */
+	addr = riscv_shorten_lw_offset (base, offset);
+      else
+	addr = riscv_add_offset (NULL, base, offset);
       return riscv_force_address (addr, mode);
     }
 
@@ -1822,6 +1894,9 @@ riscv_address_cost (rtx addr, machine_mode mode,
		    addr_space_t as ATTRIBUTE_UNUSED,
		    bool speed ATTRIBUTE_UNUSED)
 {
+  if (!speed && riscv_mshorten_memrefs && mode == SImode
+      && !riscv_compressed_lw_address_p (addr))
+    return riscv_address_insns (addr, mode, false) + 1;
   return riscv_address_insns (addr, mode, false);
 }
 
@@ -4647,6 +4722,7 @@ riscv_option_override (void)
     error ("%<-mriscv-attribute%> RISC-V ELF attribute requires GNU as 2.32"
	   " [%<-mriscv-attribute%>]");
 #endif
+
 }
 
 /* Implement TARGET_CONDITIONAL_REGISTER_USAGE.  */
@@ -4686,9 +4762,9 @@ riscv_conditional_register_usage (void)
 static int
 riscv_register_priority (int regno)
 {
-  /* Favor x8-x15/f8-f15 to improve the odds of RVC instruction selection.  */
-  if (TARGET_RVC && (IN_RANGE (regno, GP_REG_FIRST + 8, GP_REG_FIRST + 15)
-		     || IN_RANGE (regno, FP_REG_FIRST + 8, FP_REG_FIRST + 15)))
+  /* Favor compressed registers to improve the odds of RVC instruction
+     selection.  */
+  if (riscv_compressed_reg_p (regno))
     return 1;
 
   return 0;
diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index 246494663f6..36b84c7e8ef 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -876,6 +876,11 @@ while (0)
 
 #define SET_RATIO(speed) (CLEAR_RATIO (speed) - ((speed) ? 0 : 2))
 
+/* This is the maximum value that can be represented in a compressed load/store
+   offset (an unsigned 5-bit value scaled by 4).  */
+
+#define RISCV_MAX_COMPRESSED_LW_OFFSET 124
+
 #ifndef USED_FOR_TARGET
 extern const enum reg_class riscv_regno_to_class[];
 extern bool riscv_slow_unaligned_access_p;
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index 7f0c35e9e9c..138bddd369f 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -87,6 +87,12 @@ msave-restore
 Target Report Mask(SAVE_RESTORE)
 Use smaller but slower prologue and epilogue code.
 
+mshorten-memrefs
+Target Bool Var(riscv_mshorten_memrefs) Init(1)
+Convert BASE + LARGE_OFFSET addresses to NEW_BASE + SMALL_OFFSET to allow more
+memory accesses to be generated as compressed instructions.  Currently targets
+32-bit integer load/stores.
+
 mcmodel=
 Target Report RejectNegative Joined Enum(code_model) Var(riscv_cmodel) Init(TARGET_DEFAULT_CMODEL)
 Specify the code model.
diff --git a/gcc/config/riscv/t-riscv b/gcc/config/riscv/t-riscv
index ece3a75d512..1e31a49c9c5 100644
--- a/gcc/config/riscv/t-riscv
+++ b/gcc/config/riscv/t-riscv
@@ -14,3 +14,8 @@ riscv-d.o: $(srcdir)/config/riscv/riscv-d.c
 	$(COMPILE) $<
 	$(POSTCOMPILE)
 
+riscv-shorten-memrefs.o: $(srcdir)/config/riscv/riscv-shorten-memrefs.c
+	$(COMPILE) $<
+	$(POSTCOMPILE)
+
+PASSES_EXTRA += $(srcdir)/config/riscv/riscv-passes.def
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 1407d019d14..6de3e032447 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1075,6 +1075,7 @@ See RS/6000 and PowerPC Options.
 -mpreferred-stack-boundary=@var{num} @gol
 -msmall-data-limit=@var{N-bytes} @gol
 -msave-restore  -mno-save-restore @gol
+-mshorten-memrefs  -mno-shorten-memrefs @gol
 -mstrict-align  -mno-strict-align @gol
 -mcmodel=medlow  -mcmodel=medany @gol
 -mexplicit-relocs  -mno-explicit-relocs @gol
@@ -24210,6 +24211,15 @@ Do or don't use smaller but slower prologue and epilogue code that uses
 library function calls.  The default is to use fast inline prologues and
 epilogues.
 
+@item -mshorten-memrefs
+@itemx -mno-shorten-memrefs
+@opindex mshorten-memrefs
+Do or do not attempt to make more use of compressed load/store instructions by
+replacing a load/store of 'base register + large offset' with a new load/store
+of 'new base + small offset'.  If the new base gets stored in a compressed
+register, then the new load/store can be compressed.  Currently targets 32-bit
+integer load/stores only.
+
 @item -mstrict-align
 @itemx -mno-strict-align
 @opindex mstrict-align