
[v2,1/2] RISC-V: Add shorten_memrefs pass

Message ID: 1572025151-22783-2-git-send-email-craig.blackmore@embecosm.com
State: New
Series: RISC-V: Allow more load/stores to be compressed

Commit Message

Craig Blackmore Oct. 25, 2019, 5:39 p.m. UTC
This patch aims to allow more load/store instructions to be compressed by
replacing a load/store of 'base register + large offset' with a new load/store
of 'new base + small offset'. If the new base gets stored in a compressed
register, then the new load/store can be compressed. Since there is an overhead
in creating the new base, this change is only attempted when 'base register' is
referenced in at least 4 load/stores in a basic block.
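
For illustration (a hypothetical example, not taken from the patch; the
exact code generation depends on register allocation), consider

/* Hypothetical example.  Without the transformation each access needs a
   full 4-byte lw, because the offsets 800..812 exceed the 0..124 offset
   range of c.lw.  With it, the base is adjusted once (addi tmp,base,768)
   and the accesses use offsets 32..44, which fit the compressed encoding
   provided tmp and the destination registers land in x8-x15.  */
int
sum4 (int *array)
{
  return array[200] + array[201] + array[202] + array[203];
}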

The optimization is implemented in a new RISC-V specific pass called
shorten_memrefs which is enabled for RVC targets. It has been developed for the
32-bit lw/sw instructions but could also be extended to 64-bit ld/sd in future.

Tested on bare metal rv32i, rv32iac, rv32im, rv32imac, rv32imafc, rv64imac,
rv64imafdc via QEMU. No regressions.

gcc/ChangeLog:

	* config.gcc: Add riscv-shorten-memrefs.o to extra_objs for riscv.
	* config/riscv/riscv-passes.def: New file.
	* config/riscv/riscv-protos.h (make_pass_shorten_memrefs): Declare.
	* config/riscv/riscv-shorten-memrefs.c: New file.
	* config/riscv/riscv.c (tree-pass.h): New include.
	(riscv_compressed_reg_p): New function.
	(riscv_compressed_lw_offset_p): Likewise.
	(riscv_compressed_lw_address_p): Likewise.
	(riscv_shorten_lw_offset): Likewise.
	(riscv_legitimize_address): Attempt to convert base + large_offset
        to compressible new_base + small_offset.
	(riscv_address_cost): Make anticipated compressed load/stores
        cheaper for code size than uncompressed load/stores.
	(riscv_register_priority): Move compressed register check to
        riscv_compressed_reg_p.
	* config/riscv/riscv.h (RISCV_MAX_COMPRESSED_LW_OFFSET): Define.
	* config/riscv/riscv.opt (mshorten-memrefs): New option.
	* config/riscv/t-riscv (riscv-shorten-memrefs.o): New rule.
	(PASSES_EXTRA): Add riscv-passes.def.
	* doc/invoke.texi: Document -mshorten-memrefs.
---
 gcc/config.gcc                           |   2 +-
 gcc/config/riscv/riscv-passes.def        |  20 +++
 gcc/config/riscv/riscv-protos.h          |   2 +
 gcc/config/riscv/riscv-shorten-memrefs.c | 188 +++++++++++++++++++++++
 gcc/config/riscv/riscv.c                 |  86 ++++++++++-
 gcc/config/riscv/riscv.h                 |   5 +
 gcc/config/riscv/riscv.opt               |   6 +
 gcc/config/riscv/t-riscv                 |   5 +
 gcc/doc/invoke.texi                      |  10 ++
 9 files changed, 318 insertions(+), 6 deletions(-)
 create mode 100644 gcc/config/riscv/riscv-passes.def
 create mode 100644 gcc/config/riscv/riscv-shorten-memrefs.c

Comments

Jeff Law Oct. 26, 2019, 6:21 p.m. UTC | #1
On 10/25/19 11:39 AM, Craig Blackmore wrote:
> This patch aims to allow more load/store instructions to be compressed by
> replacing a load/store of 'base register + large offset' with a new load/store
> of 'new base + small offset'. If the new base gets stored in a compressed
> register, then the new load/store can be compressed. Since there is an overhead
> in creating the new base, this change is only attempted when 'base register' is
> referenced in at least 4 load/stores in a basic block.
> 
> The optimization is implemented in a new RISC-V specific pass called
> shorten_memrefs which is enabled for RVC targets. It has been developed for the
> 32-bit lw/sw instructions but could also be extended to 64-bit ld/sd in future.
> 
> Tested on bare metal rv32i, rv32iac, rv32im, rv32imac, rv32imafc, rv64imac,
> rv64imafdc via QEMU. No regressions.
> 
> gcc/ChangeLog:
> 
> 	* config.gcc: Add riscv-shorten-memrefs.o to extra_objs for riscv.
> 	* config/riscv/riscv-passes.def: New file.
> 	* config/riscv/riscv-protos.h (make_pass_shorten_memrefs): Declare.
> 	* config/riscv/riscv-shorten-memrefs.c: New file.
> 	* config/riscv/riscv.c (tree-pass.h): New include.
> 	(riscv_compressed_reg_p): New function.
> 	(riscv_compressed_lw_offset_p): Likewise.
> 	(riscv_compressed_lw_address_p): Likewise.
> 	(riscv_shorten_lw_offset): Likewise.
> 	(riscv_legitimize_address): Attempt to convert base + large_offset
>         to compressible new_base + small_offset.
> 	(riscv_address_cost): Make anticipated compressed load/stores
>         cheaper for code size than uncompressed load/stores.
> 	(riscv_register_priority): Move compressed register check to
>         riscv_compressed_reg_p.
> 	* config/riscv/riscv.h (RISCV_MAX_COMPRESSED_LW_OFFSET): Define.
> 	* config/riscv/riscv.opt (mshorten-memrefs): New option.
> 	* config/riscv/t-riscv (riscv-shorten-memrefs.o): New rule.
> 	(PASSES_EXTRA): Add riscv-passes.def.
> 	* doc/invoke.texi: Document -mshorten-memrefs.
This has traditionally been done via the legitimize_address hook.
Is there some reason that hook is insufficient for this case?

The hook, IIRC, is called from explow.c.

Jeff
Oleg Endo Oct. 26, 2019, 7:10 p.m. UTC | #2
On Sat, 2019-10-26 at 12:21 -0600, Jeff Law wrote:
> On 10/25/19 11:39 AM, Craig Blackmore wrote:
> > This patch aims to allow more load/store instructions to be
> > compressed by
> > replacing a load/store of 'base register + large offset' with a new
> > load/store
> > of 'new base + small offset'. If the new base gets stored in a
> > compressed
> > register, then the new load/store can be compressed. Since there is
> > an overhead
> > in creating the new base, this change is only attempted when 'base
> > register' is
> > referenced in at least 4 load/stores in a basic block.
> > 
> > The optimization is implemented in a new RISC-V specific pass
> > called
> > shorten_memrefs which is enabled for RVC targets. It has been
> > developed for the
> > 32-bit lw/sw instructions but could also be extended to 64-bit
> > ld/sd in future.
> > 
> > Tested on bare metal rv32i, rv32iac, rv32im, rv32imac, rv32imafc,
> > rv64imac,
> > rv64imafdc via QEMU. No regressions.
> > 
> > gcc/ChangeLog:
> > 
> > 	* config.gcc: Add riscv-shorten-memrefs.o to extra_objs for
> > riscv.
> > 	* config/riscv/riscv-passes.def: New file.
> > 	* config/riscv/riscv-protos.h (make_pass_shorten_memrefs):
> > Declare.
> > 	* config/riscv/riscv-shorten-memrefs.c: New file.
> > 	* config/riscv/riscv.c (tree-pass.h): New include.
> > 	(riscv_compressed_reg_p): New function.
> > 	(riscv_compressed_lw_offset_p): Likewise.
> > 	(riscv_compressed_lw_address_p): Likewise.
> > 	(riscv_shorten_lw_offset): Likewise.
> > 	(riscv_legitimize_address): Attempt to convert base +
> > large_offset
> >         to compressible new_base + small_offset.
> > 	(riscv_address_cost): Make anticipated compressed load/stores
> >         cheaper for code size than uncompressed load/stores.
> > 	(riscv_register_priority): Move compressed register check to
> >         riscv_compressed_reg_p.
> > 	* config/riscv/riscv.h (RISCV_MAX_COMPRESSED_LW_OFFSET):
> > Define.
> > 	* config/riscv/riscv.opt (mshorten-memrefs): New option.
> > 	* config/riscv/t-riscv (riscv-shorten-memrefs.o): New rule.
> > 	(PASSES_EXTRA): Add riscv-passes.def.
> > 	* doc/invoke.texi: Document -mshorten-memrefs.
> 
> This has traditionally been done via the legitimize_address hook.
> Is there some reason that hook is insufficient for this case?
> 
> The hook, IIRC, is called from explow.c.
> 

This sounds like some of my addressing mode selection (AMS) attempts on
SH.  Haven't looked at the patch (sorry), but I'm sure the problem is
pretty much the same.

On SH legitimize_address is used to do ... "something" ... to the
address in order to make the displacement fit.  The issue is,
legitimize_address doesn't get any context so it can't even try to find
a local optimal base address or something like that.

Cheers,
Oleg
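
(For reference: the target hook Oleg refers to, as documented in the GCC
internals manual, receives only the address and the access mode.  The
declaration below is quoted for context and is not part of the patch.)

/* rtx TARGET_LEGITIMIZE_ADDRESS (rtx x, rtx oldx, machine_mode mode);

   X is the address to legitimize, OLDX is the address as it was before
   any earlier transformation, and MODE is the mode of the memory access.
   Nothing about the containing insn, basic block, or neighbouring
   accesses is passed in -- hence the missing context described above.  */

The patch below invokes it as targetm.legitimize_address (addr, addr,
GET_MODE (mem)).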
Andrew Waterman Oct. 26, 2019, 7:33 p.m. UTC | #3
I don't know enough to say whether the legitimize_address hook is
sufficient for the intended purpose, but I am sure that RISC-V's
concerns are not unique: other GCC targets have to cope with
offset-size constraints that are coupled to register-allocation
constraints.


On Sat, Oct 26, 2019 at 11:21 AM Jeff Law <law@redhat.com> wrote:
>
> On 10/25/19 11:39 AM, Craig Blackmore wrote:
> > This patch aims to allow more load/store instructions to be compressed by
> > replacing a load/store of 'base register + large offset' with a new load/store
> > of 'new base + small offset'. If the new base gets stored in a compressed
> > register, then the new load/store can be compressed. Since there is an overhead
> > in creating the new base, this change is only attempted when 'base register' is
> > referenced in at least 4 load/stores in a basic block.
> >
> > The optimization is implemented in a new RISC-V specific pass called
> > shorten_memrefs which is enabled for RVC targets. It has been developed for the
> > 32-bit lw/sw instructions but could also be extended to 64-bit ld/sd in future.
> >
> > Tested on bare metal rv32i, rv32iac, rv32im, rv32imac, rv32imafc, rv64imac,
> > rv64imafdc via QEMU. No regressions.
> >
> > gcc/ChangeLog:
> >
> >       * config.gcc: Add riscv-shorten-memrefs.o to extra_objs for riscv.
> >       * config/riscv/riscv-passes.def: New file.
> >       * config/riscv/riscv-protos.h (make_pass_shorten_memrefs): Declare.
> >       * config/riscv/riscv-shorten-memrefs.c: New file.
> >       * config/riscv/riscv.c (tree-pass.h): New include.
> >       (riscv_compressed_reg_p): New function.
> >       (riscv_compressed_lw_offset_p): Likewise.
> >       (riscv_compressed_lw_address_p): Likewise.
> >       (riscv_shorten_lw_offset): Likewise.
> >       (riscv_legitimize_address): Attempt to convert base + large_offset
> >         to compressible new_base + small_offset.
> >       (riscv_address_cost): Make anticipated compressed load/stores
> >         cheaper for code size than uncompressed load/stores.
> >       (riscv_register_priority): Move compressed register check to
> >         riscv_compressed_reg_p.
> >       * config/riscv/riscv.h (RISCV_MAX_COMPRESSED_LW_OFFSET): Define.
> >       * config/riscv/riscv.opt (mshorten-memrefs): New option.
> >       * config/riscv/t-riscv (riscv-shorten-memrefs.o): New rule.
> >       (PASSES_EXTRA): Add riscv-passes.def.
> >       * doc/invoke.texi: Document -mshorten-memrefs.
> This has traditionally been done via the legitimize_address hook.
> Is there some reason that hook is insufficient for this case?
>
> The hook, IIRC, is called from explow.c.
>
> Jeff
>
Jeff Law Oct. 26, 2019, 8:02 p.m. UTC | #4
On 10/26/19 1:10 PM, Oleg Endo wrote:
> On Sat, 2019-10-26 at 12:21 -0600, Jeff Law wrote:
>> On 10/25/19 11:39 AM, Craig Blackmore wrote:
>>> This patch aims to allow more load/store instructions to be
>>> compressed by
>>> replacing a load/store of 'base register + large offset' with a new
>>> load/store
>>> of 'new base + small offset'. If the new base gets stored in a
>>> compressed
>>> register, then the new load/store can be compressed. Since there is
>>> an overhead
>>> in creating the new base, this change is only attempted when 'base
>>> register' is
>>> referenced in at least 4 load/stores in a basic block.
>>>
>>> The optimization is implemented in a new RISC-V specific pass
>>> called
>>> shorten_memrefs which is enabled for RVC targets. It has been
>>> developed for the
>>> 32-bit lw/sw instructions but could also be extended to 64-bit
>>> ld/sd in future.
>>>
>>> Tested on bare metal rv32i, rv32iac, rv32im, rv32imac, rv32imafc,
>>> rv64imac,
>>> rv64imafdc via QEMU. No regressions.
>>>
>>> gcc/ChangeLog:
>>>
>>> 	* config.gcc: Add riscv-shorten-memrefs.o to extra_objs for
>>> riscv.
>>> 	* config/riscv/riscv-passes.def: New file.
>>> 	* config/riscv/riscv-protos.h (make_pass_shorten_memrefs):
>>> Declare.
>>> 	* config/riscv/riscv-shorten-memrefs.c: New file.
>>> 	* config/riscv/riscv.c (tree-pass.h): New include.
>>> 	(riscv_compressed_reg_p): New function.
>>> 	(riscv_compressed_lw_offset_p): Likewise.
>>> 	(riscv_compressed_lw_address_p): Likewise.
>>> 	(riscv_shorten_lw_offset): Likewise.
>>> 	(riscv_legitimize_address): Attempt to convert base +
>>> large_offset
>>>         to compressible new_base + small_offset.
>>> 	(riscv_address_cost): Make anticipated compressed load/stores
>>>         cheaper for code size than uncompressed load/stores.
>>> 	(riscv_register_priority): Move compressed register check to
>>>         riscv_compressed_reg_p.
>>> 	* config/riscv/riscv.h (RISCV_MAX_COMPRESSED_LW_OFFSET):
>>> Define.
>>> 	* config/riscv/riscv.opt (mshorten-memrefs): New option.
>>> 	* config/riscv/t-riscv (riscv-shorten-memrefs.o): New rule.
>>> 	(PASSES_EXTRA): Add riscv-passes.def.
>>> 	* doc/invoke.texi: Document -mshorten-memrefs.
>>
>> This has traditionally been done via the legitimize_address hook.
>> Is there some reason that hook is insufficient for this case?
>>
>> The hook, IIRC, is called from explow.c.
>>
> 
> This sounds like some of my addressing mode selection (AMS) attempts on
> SH.  Haven't looked at the patch (sorry), but I'm sure the problem is
> pretty much the same.
> 
> On SH legitimize_address is used to do ... "something" ... to the
> address in order to make the displacement fit.  The issue is,
> legitimize_address doesn't get any context so it can't even try to find
> a local optimal base address or something like that.
True, it's not given any context.  I think most ports try to form the
address in such a way that the base+offset is likely to be a common
subexpression.

There was a similar hook for reload which I didn't mention as I wasn't
sure it was used in the LRA world.

SH is likely more complex than most.  Joern tried really hard to get
good code on the SH, and as a result the implementation is sometimes
tough to follow.

Jeff
Jeff Law Oct. 26, 2019, 8:04 p.m. UTC | #5
On 10/26/19 1:33 PM, Andrew Waterman wrote:
> I don't know enough to say whether the legitimize_address hook is
> sufficient for the intended purpose, but I am sure that RISC-V's
> concerns are not unique: other GCC targets have to cope with
> offset-size constraints that are coupled to register-allocation
> constraints.
Yup.  I think every RISC port in the 90s faced this problem.  I always
wished for a generic mechanism for ports to handle it.

Regardless, it's probably worth investigating.

jeff
Oleg Endo Oct. 27, 2019, 5:58 a.m. UTC | #6
On Sat, 2019-10-26 at 14:04 -0600, Jeff Law wrote:
> On 10/26/19 1:33 PM, Andrew Waterman wrote:
> > I don't know enough to say whether the legitimize_address hook is
> > sufficient for the intended purpose, but I am sure that RISC-V's
> > concerns are not unique: other GCC targets have to cope with
> > offset-size constraints that are coupled to register-allocation
> > constraints.
> 
> Yup.  I think every RISC port in the 90s faced this problem.  I always
> wished for a generic mechanism for ports to handle it.
> 
> Regardless, it's probably worth investigating.
> 

What we tried to do with the address mode selection (AMS) optimization
some time ago was the following:

  - Extract memory accesses from the insn stream and put them in
    "access sequences".  Also analyze the address expression and try
    to find effective base addresses by tracing back address
    calculations.

  - For each memory access, get a set of address mode alternatives and 
    the corresponding costs from the backend.  The full context of each
    access is provided, so the backend can detect things like 
    "in this access sequence, FP loads dominate" and use this 
    information to tune the alternative costs.

  - Given the alternatives and costs for each memory access, the pass
    would then try to minimize the cost of the whole memory access
    sequence, taking the costs of address-modification insns into account.

I think this is quite generic, but of course also complex.  The
optimization problem itself is hard.  There was some research done by
others using  CPLEX or PBQP solvers.  To keep things simple we used a 
backtracking algorithm and handled only a limited set of scenarios.
For example, I think we could not get loop constructs to work nicely to
improve post-inc address mode utilization.
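
(Aside, for illustration only: the minimization step can be sketched as a
tiny backtracking search.  This is not the AMS code; the cost numbers and
the transition-cost model below are invented.)

#include <limits.h>
#include <stdio.h>

#define N_ACC 3   /* accesses in the sequence */
#define N_ALT 2   /* addressing alternatives per access */

/* Invented per-access costs of each alternative.  */
static const int alt_cost[N_ACC][N_ALT] = { {2, 1}, {2, 1}, {2, 1} };

/* Invented transition cost: switching alternatives between neighbouring
   accesses models an extra address-modification insn.  */
static int
mod_cost (int prev_alt, int alt)
{
  return prev_alt == alt ? 0 : 2;
}

/* Backtrack over the choices for accesses I..N_ACC-1, returning the
   minimal total cost given the previous choice PREV_ALT (-1 if none).  */
static int
best_cost (int i, int prev_alt)
{
  if (i == N_ACC)
    return 0;
  int best = INT_MAX;
  for (int a = 0; a < N_ALT; a++)
    {
      int c = alt_cost[i][a]
	      + (prev_alt >= 0 ? mod_cost (prev_alt, a) : 0)
	      + best_cost (i + 1, a);
      if (c < best)
	best = c;
    }
  return best;
}

int
main (void)
{
  printf ("minimal sequence cost: %d\n", best_cost (0, -1));
  return 0;
}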

The SH ISA has very similar properties to ARM Thumb and RVC, and
perhaps others.  The advantages would not be limited to RISC: even
CISC ISAs like M68K, RX, ... can benefit from it, as the "proper
optimization" can reduce instruction sizes by shortening the
addresses in the instruction stream.

If anyone is interested, here is the AMS optimization pass class:

https://github.com/erikvarga/gcc/blob/master/gcc/ams.h
https://github.com/erikvarga/gcc/blob/master/gcc/ams.cc

It's using a different style of callback into the backend code: not
GCC's "hooks" but a delegate pattern.  The SH backend's delegate
implementation is here:

https://github.com/erikvarga/gcc/blob/master/gcc/config/sh/sh.c#L11897

We were getting some improvements in the generated code, but it started
putting pressure on register-allocation-related issues in the SH
backend (the R0 problem), so we could not do more thorough testing.

Maybe somebody can get some ideas out of it.

Cheers,
Oleg
Jim Wilson Oct. 30, 2019, 11:57 p.m. UTC | #7
On Fri, Oct 25, 2019 at 10:40 AM Craig Blackmore
<craig.blackmore@embecosm.com> wrote:
> This patch aims to allow more load/store instructions to be compressed by
> replacing a load/store of 'base register + large offset' with a new load/store
> of 'new base + small offset'. If the new base gets stored in a compressed
> register, then the new load/store can be compressed. Since there is an overhead
> in creating the new base, this change is only attempted when 'base register' is
> referenced in at least 4 load/stores in a basic block.
>
> The optimization is implemented in a new RISC-V specific pass called
> shorten_memrefs which is enabled for RVC targets. It has been developed for the
> 32-bit lw/sw instructions but could also be extended to 64-bit ld/sd in future.

The fact that this needs 4 load/stores in a block with the same base
address means it won't trigger very often.  But it does seem to work.
I see about a 0.05% code size reduction for a rv32gc newlib nano
libc.a.  There might be other codes that benefit more.

I'm concerned about the number of RISC-V specific optimization passes
people are writing.  I've seen at least 3 so far.  This one is at
least small and simple enough to understand that it doesn't look like
it will cause maintenance problems.

The config.gcc change conflicts with the save-restore optimization
pass that Andrew Burgess added, but that is a trivial fix.

The code only works for 32-bit load/stores.  rv64 has compressed
64-bit load/stores, and the F and D extensions have float and double
compressed loads and stores.  The patch would be more useful if it
handled all of these.
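
(For reference, summarizing the RVC load/store encodings Jim lists; this
is from the RISC-V C extension spec, not from the patch.)

/* Register-based compressed loads/stores and their offset ranges
   (unsigned immediates, scaled by the access size):

     c.lw  / c.sw    32-bit int             uimm[6:2] x4  ->  0..124
     c.ld  / c.sd    64-bit int (RV64)      uimm[7:3] x8  ->  0..248
     c.flw / c.fsw   32-bit float (RV32+F)  uimm[6:2] x4  ->  0..124
     c.fld / c.fsd   64-bit float (D ext)   uimm[7:3] x8  ->  0..248

   All require both the base register and the data register to be in the
   compressed set x8-x15 / f8-f15.  */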

The patch doesn't check the other operand, it only looks at the memory
operand.  This results in some cases where the code rewrites
instructions that can never be compressed.  For instance, given
void
store1z (int *array)
{
  array[200] = 0;
  array[201] = 0;
  array[202] = 0;
  array[203] = 0;
}
the patch increases code size by 4 bytes because it rewrites the
stores, but we don't have compressed store 0, and x0 is not a
compressed reg, so there is no point in rewriting the stores.  I can
also create examples that show a size increase with loads but it is a
little trickier.
extern int sub1 (int, int, int, int, int, int, int);
int
load1a (int a0, int a1, int a2, int a3, int a4, int *array)
{
  int a = 0;
  a += array[200];
  a += array[201];
  a += array[202];
  a += array[203];
  return sub1 (a0, a1, a2, a3, a4, 0, a);
}
The register allocator will pass a0-a4 directly through from args to
the call, leaving only a5-a7 for the load base address and dest, and
only one of those regs is a compressed reg, but we need two, so these
loads can't be compressed.  The code still gets rewritten, resulting
in a size increase for the extra add.  Not sure if you can do anything
about that though, since you are doing this before register
allocation.

It isn't clear what the change to riscv_address_cost is for.  That
should be commented.

I'd suggest parens in the riscv_compressed_lw_offset_p return
statement, that makes it easier to align the code correctly (in emacs
at least).

The RISCV_MAX_COMPRESSED_LW_OFFSET should be moved down to the "ISA
constants" section of riscv.h, and maybe renamed similar to the other
constants.

There is a REG_P check in get_si_mem_base_reg.  That should probably handle
SUBREGs too.
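
(A possible shape for that, as an untested sketch against the patch's
get_si_mem_base_reg below; the callers would also need to look through
the subreg before taking REGNO.)

/* Hypothetical variant that accepts a subreg of the base register.  */
bool
pass_shorten_memrefs::get_si_mem_base_reg (rtx mem, rtx *addr)
{
  if (!MEM_P (mem) || GET_MODE (mem) != SImode)
    return false;
  *addr = XEXP (mem, 0);
  if (GET_CODE (*addr) != PLUS)
    return false;
  rtx base = XEXP (*addr, 0);
  if (SUBREG_P (base))
    base = SUBREG_REG (base);
  return REG_P (base);
}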

A testcase to verify the patch would be nice.  I have one I wrote for
testing that shows some tests that work, some tests that don't work,
and some that work but maybe shouldn't.

I vaguely remember Michael Meissner talking about doing something
similar for PPC at a GNU Cauldron maybe 2 years ago.  But I don't know
if he tried, or how far he got if he did try.  It might be useful to
ask him about that work.

Otherwise this looks OK to me.

Jim
Jim Wilson Oct. 31, 2019, midnight UTC | #8
FYI the testcase I'm using to test the patch.  Some of the functions
get smaller, some of them get bigger, and some don't change in size
but should when compiled for an rv64 target.

Jim
Nidal Faour Oct. 31, 2019, 9:41 a.m. UTC | #9
A test case for this patch has been written and pushed to the WD GitHub repository at the following link:
https://github.com/westerndigitalcorporation/riscv32-Code-density-test-bench/tree/master/common_load_store_base_address_offset

The test case itself has been written based on a real product scenario.
We got a saving of about 10% in the code size of the test itself.

Best Regards,

Nidal Faour
Staff Engineer, R&D Engineering – Firmware & Toolchain, CTO Group

Western Digital®
Migdal Tefen 24959, P.O Box 3
Email: nidal.faour@wdc.com
Office: +972-4-9078756
Mobile: +972-50-8867756

>-----Original Message-----
>From: Jim Wilson <jimw@sifive.com>
>Sent: Thursday, 31 October 2019 1:57
>To: Craig Blackmore <craig.blackmore@embecosm.com>
>Cc: GCC Patches <gcc-patches@gcc.gnu.org>; Ofer Shinaar
><Ofer.Shinaar@wdc.com>; Nidal Faour <Nidal.Faour@wdc.com>; Kito Cheng
><kito.cheng@gmail.com>; Jeff Law <law@redhat.com>
>Subject: Re: [PATCH v2 1/2] RISC-V: Add shorten_memrefs pass
>
>On Fri, Oct 25, 2019 at 10:40 AM Craig Blackmore
><craig.blackmore@embecosm.com> wrote:
>> This patch aims to allow more load/store instructions to be compressed
>> by replacing a load/store of 'base register + large offset' with a new
>> load/store of 'new base + small offset'. If the new base gets stored
>> in a compressed register, then the new load/store can be compressed.
>> Since there is an overhead in creating the new base, this change is
>> only attempted when 'base register' is referenced in at least 4 load/stores in
>a basic block.
>>
>> The optimization is implemented in a new RISC-V specific pass called
>> shorten_memrefs which is enabled for RVC targets. It has been
>> developed for the 32-bit lw/sw instructions but could also be extended to
>64-bit ld/sd in future.
>
>The fact that this needs 4 load/stores in a block with the same base address
>means it won't trigger very often.  But it does seem to work.
>I see about a 0.05% code size reduction for a rv32gc newlib nano libc.a.  There
>might be other codes that benefit more.
>
>I'm concerned about the number of RISC-V specific optimization passes
>people are writing.  I've seen at least 3 so far.  This one is at least small and
>simple enough to understand that it doesn't look like it will cause maintenance
>problems.
>
>The config.gcc change conflicts with the save-restore optimization pass that
>Andrew Burgess added, but that is a trivial fix.
>
>The code only works for 32-bit load/stores.  rv64 has compressed 64-bit
>load/stores, and the F and D extensions have float and double compressed
>loads and stores.  The patch would be more useful if it handled all of these.
>
>The patch doesn't check the other operand, it only looks at the memory
>operand.  This results in some cases where the code rewrites instructions that
>can never be compressed.  For instance, given
>void
>store1z (int *array)
>{
>  array[200] = 0;
>  array[201] = 0;
>  array[202] = 0;
>  array[203] = 0;
>}
>the patch increases code size by 4 bytes because it rewrites the stores, but we
>don't have compressed store 0, and x0 is not a compressed reg, so there is no
>point in rewriting the stores.  I can also create examples that show a size
>increase with loads but it is a little trickier.
>extern int sub1 (int, int, int, int, int, int, int);
>int
>load1a (int a0, int a1, int a2, int a3, int a4, int *array)
>{
>  int a = 0;
>  a += array[200];
>  a += array[201];
>  a += array[202];
>  a += array[203];
>  return sub1 (a0, a1, a2, a3, a4, 0, a);
>}
>The register allocator will pass a0-a4
>directly through from args to the call, leaving only a5-a7 for the load base
>address and dest, and only one of those regs is a compressed reg, but we
>need two, so these loads can't be compressed.  The code still gets rewritten,
>resulting in a size increase for the extra add.  Not sure if you can do anything
>about that though, since you are doing this before register allocation.
>
>It isn't clear what the change to riscv_address_cost is for.  That should be
>commented.
>
>I'd suggest parens in the riscv_compressed_lw_offset_p return statement,
>that makes it easier to align the code correctly (in emacs at least).
>
>The RISCV_MAX_COMPRESSED_LW_OFFSET should be moved down to the
>"ISA constants" section of riscv.h, and maybe renamed similar to the other
>constants.
>
>There is a REG_P check in get_si_mem_base_reg.  That should probably handle
>SUBREGs too.
>
>A testcase to verify the patch would be nice.  I have one I wrote for testing
>that shows some tests that work, some tests that don't work, and some that
>work but maybe shouldn't.
>
>I vaguely remember Michael Meissner talking about doing something similar
>for PPC at a GNU Cauldron maybe 2 years ago.  But I don't know if he tried, or
>how far he got if he did try.  It might be useful to ask him about that work.
>
>Otherwise this looks OK to me.
>
>Jim
Andrew Waterman Oct. 31, 2019, 10:18 a.m. UTC | #10
Nevertheless, as Jim observed, it's a great burden on the RISC-V
backend maintainers to support all these passes.  Are you saying WD is
willing to put its money where its mouth is and will provide active
support for these passes?

On Thu, Oct 31, 2019 at 2:42 AM Nidal Faour <Nidal.Faour@wdc.com> wrote:
>
> A test case for this patch has been written and pushed to the WD GitHub repository at the following link:
> https://github.com/westerndigitalcorporation/riscv32-Code-density-test-bench/tree/master/common_load_store_base_address_offset
>
> The test case itself has been written based on a real product scenario.
> We got a saving of about 10% in the code size of the test itself.
>
> Best Regards,
>
> Nidal Faour
> Staff Engineer, R&D Engineering – Firmware & Toolchain, CTO Group
>
> Western Digital®
> Migdal Tefen 24959, P.O Box 3
> Email: nidal.faour@wdc.com
> Office: +972-4-9078756
> Mobile: +972-50-8867756
>
> >-----Original Message-----
> >From: Jim Wilson <jimw@sifive.com>
> >Sent: Thursday, 31 October 2019 1:57
> >To: Craig Blackmore <craig.blackmore@embecosm.com>
> >Cc: GCC Patches <gcc-patches@gcc.gnu.org>; Ofer Shinaar
> ><Ofer.Shinaar@wdc.com>; Nidal Faour <Nidal.Faour@wdc.com>; Kito Cheng
> ><kito.cheng@gmail.com>; Jeff Law <law@redhat.com>
> >Subject: Re: [PATCH v2 1/2] RISC-V: Add shorten_memrefs pass
> >
> >On Fri, Oct 25, 2019 at 10:40 AM Craig Blackmore
> ><craig.blackmore@embecosm.com> wrote:
> >> This patch aims to allow more load/store instructions to be compressed
> >> by replacing a load/store of 'base register + large offset' with a new
> >> load/store of 'new base + small offset'. If the new base gets stored
> >> in a compressed register, then the new load/store can be compressed.
> >> Since there is an overhead in creating the new base, this change is
> >> only attempted when 'base register' is referenced in at least 4 load/stores in
> >a basic block.
> >>
> >> The optimization is implemented in a new RISC-V specific pass called
> >> shorten_memrefs which is enabled for RVC targets. It has been
> >> developed for the 32-bit lw/sw instructions but could also be extended to
> >64-bit ld/sd in future.
> >
> >The fact that this needs 4 load/stores in a block with the same base address
> >means it won't trigger very often.  But it does seem to work.
> >I see about a 0.05% code size reduction for a rv32gc newlib nano libc.a.  There
> >might be other codes that benefit more.
> >
> >I'm concerned about the number of RISC-V specific optimization passes
> >people are writing.  I've seen at least 3 so far.  This one is at least small and
> >simple enough to understand that it doesn't look like it will cause maintenance
> >problems.
> >
> >The config.gcc change conflicts with the save-restore optimization pass that
> >Andrew Burgess added, but that is a trivial fix.
> >
> >The code only works for 32-bit load/stores.  rv64 has compressed 64-bit
> >load/stores, and the F and D extensions have float and double compressed
> >loads and stores.  The patch would be more useful if it handled all of these.
> >
> >The patch doesn't check the other operand, it only looks at the memory
> >operand.  This results in some cases where the code rewrites instructions that
> >can never be compressed.  For instance, given
> >void
> >store1z (int *array)
> >{
> >  array[200] = 0;
> >  array[201] = 0;
> >  array[202] = 0;
> >  array[203] = 0;
> >}
> >the patch increases code size by 4 bytes because it rewrites the stores, but we
> >don't have compressed store 0, and x0 is not a compressed reg, so there is no
> >point in rewriting the stores.  I can also create examples that show a size
> >increase with loads but it is a little trickier.
> >extern int sub1 (int, int, int, int, int, int, int);
> >int
> >load1a (int a0, int a1, int a2, int a3, int a4, int *array)
> >{
> >  int a = 0;
> >  a += array[200];
> >  a += array[201];
> >  a += array[202];
> >  a += array[203];
> >  return sub1 (a0, a1, a2, a3, a4, 0, a);
> >}
> >The register allocator will pass a0-a4
> >directly through from args to the call, leaving only a5-a7 for the load base
> >address and dest, and only one of those regs is a compressed reg, but we
> >need two, so these loads can't be compressed.  The code still gets rewritten,
> >resulting in a size increase for the extra add.  Not sure if you can do anything
> >about that though, since you are doing this before register allocation.
> >
> >It isn't clear what the change to riscv_address_cost is for.  That should be
> >commented.
> >
> >I'd suggest parens in the riscv_compressed_lw_offset_p return statement,
> >that makes it easier to align the code correctly (in emacs at least).
> >
> >The RISCV_MAX_COMPRESSED_LW_OFFSET should be moved down to the
> >"ISA constants" section of riscv.h, and maybe renamed similar to the other
> >constants.
> >
> >There is a REG_P check in get_si_mem_base_reg.  That should probably handle
> >SUBREGs too.
> >
> >A testcase to verify the patch would be nice.  I have one I wrote for testing
> >that shows some tests that work, some tests that don't work, and some that
> >work but maybe shouldn't.
> >
> >I vaguely remember Michael Meissner talking about doing something similar
> >for PPC at a GNU Cauldron maybe 2 years ago.  But I don't know if he tried, or
> >how far he got if he did try.  It might be useful to ask him about that work.
> >
> >Otherwise this looks OK to me.
> >
> >Jim
Ofer Shinaar Oct. 31, 2019, 3:56 p.m. UTC | #11
The test case presents a real embedded case.  Yes, it is not a
functional benchmark, it is a test case, but these days most MCUs for
IoT devices act as controllers and less as computing/algorithm units.
These days, functional algorithms are done in HW and managed by SW with
snippets like the given test case.
This means that SW that accesses memory on the SoC, for managing
some HW feature, is commonly used.  Therefore the benefit of this pass
will serve everyone.

We (WD) will obviously support it, as we are part of this joint effort to
drive better density.  As a next phase, we can improve this, taking Jim's
comments among others.
Also, as Jim said, this is a simple case and should not cause
maintenance problems.

On 31/10/2019 12:18, Andrew Waterman wrote:
> Nevertheless, as Jim observed, it's a great burden on the RISC-V
> backend maintainers to support all these passes.  Are you saying WD is
> willing to put its money where its mouth is and will provide active
> support for these passes?
>
> On Thu, Oct 31, 2019 at 2:42 AM Nidal Faour <Nidal.Faour@wdc.com> wrote:
>> A test case for this patch has been written and pushed to the WD GitHub repository at the following link:
>> https://github.com/westerndigitalcorporation/riscv32-Code-density-test-bench/tree/master/common_load_store_base_address_offset
>>
>> The test case itself has been written based on a real product scenario.
>> We got a saving of about 10% in the code size of the test itself.
>>
>> Best Regards,
>>
>> Nidal Faour
>> Staff Engineer, R&D Engineering – Firmware & Toolchain, CTO Group
>>
>> Western Digital®
>> Migdal Tefen 24959, P.O Box 3
>> Email: nidal.faour@wdc.com
>> Office: +972-4-9078756
>> Mobile: +972-50-8867756
>>
>>> -----Original Message-----
>>> From: Jim Wilson <jimw@sifive.com>
>>> Sent: Thursday, 31 October 2019 1:57
>>> To: Craig Blackmore <craig.blackmore@embecosm.com>
>>> Cc: GCC Patches <gcc-patches@gcc.gnu.org>; Ofer Shinaar
>>> <Ofer.Shinaar@wdc.com>; Nidal Faour <Nidal.Faour@wdc.com>; Kito Cheng
>>> <kito.cheng@gmail.com>; Jeff Law <law@redhat.com>
>>> Subject: Re: [PATCH v2 1/2] RISC-V: Add shorten_memrefs pass
>>>
>>> On Fri, Oct 25, 2019 at 10:40 AM Craig Blackmore
>>> <craig.blackmore@embecosm.com> wrote:
>>>> This patch aims to allow more load/store instructions to be compressed
>>>> by replacing a load/store of 'base register + large offset' with a new
>>>> load/store of 'new base + small offset'. If the new base gets stored
>>>> in a compressed register, then the new load/store can be compressed.
>>>> Since there is an overhead in creating the new base, this change is
>>>> only attempted when 'base register' is referenced in at least 4 load/stores in
>>> a basic block.
>>>> The optimization is implemented in a new RISC-V specific pass called
>>>> shorten_memrefs which is enabled for RVC targets. It has been
>>>> developed for the 32-bit lw/sw instructions but could also be extended to
>>> 64-bit ld/sd in future.
>>>
>>> The fact that this needs 4 load/stores in a block with the same base address
>>> means it won't trigger very often.  But it does seem to work.
>>> I see about a 0.05% code size reduction for a rv32gc newlib nano libc.a.  There
>>> might be other codes that benefit more.
>>>
>>> I'm concerned about the number of RISC-V specific optimization passes
>>> people are writing.  I've seen at least 3 so far.  This one is at least small and
>>> simple enough to understand that it doesn't look like it will cause maintenance
>>> problems.
>>>
>>> The config.gcc change conflicts with the save-restore optimization pass that
>>> Andrew Burgess added, but that is a trivial fix.
>>>
>>> The code only works for 32-bit load/stores.  rv64 has compressed 64-bit
>>> load/stores, and the F and D extensions have float and double compressed
>>> loads and stores.  The patch would be more useful if it handled all of these.
>>>
>>> The patch doesn't check the other operand, it only looks at the memory
>>> operand.  This results in some cases where the code rewrites instructions that
>>> can never be compressed.  For instance, given
>>> void
>>> store1z (int *array)
>>> {
>>>   array[200] = 0;
>>>   array[201] = 0;
>>>   array[202] = 0;
>>>   array[203] = 0;
>>> }
>>> the patch increases code size by 4 bytes because it rewrites the stores, but we
>>> don't have compressed store 0, and x0 is not a compressed reg, so there is no
>>> point in rewriting the stores.  I can also create examples that show a size
>>> increase with loads but it is a little trickier.
>>> extern int sub1 (int, int, int, int, int, int, int);
>>> int
>>> load1a (int a0, int a1, int a2, int a3, int a4, int *array)
>>> {
>>>   int a = 0;
>>>   a += array[200];
>>>   a += array[201];
>>>   a += array[202];
>>>   a += array[203];
>>>   return sub1 (a0, a1, a2, a3, a4, 0, a);
>>> }
>>> The register allocator will pass a0-a4
>>> directly through from args to the call, leaving only a5-a7 for the load base
>>> address and dest, and only one of those regs is a compressed reg, but we
>>> need two, so these loads can't be compressed.  The code still gets rewritten,
>>> resulting in a size increase for the extra add.  Not sure if you can do anything
>>> about that though, since you are doing this before register allocation.
>>>
>>> It isn't clear what the change to riscv_address_cost is for.  That should be
>>> commented.
>>>
>>> I'd suggest parens in the riscv_compressed_lw_offset_p return statement,
>>> that makes it easier to align the code correctly (in emacs at least).
>>>
>>> The RISCV_MAX_COMPRESSED_LW_OFFSET should be moved down to the
>>> "ISA constants" section of riscv.h, and maybe renamed similar to the other
>>> constants.
>>>
>>> There is a REG_P check in get_si_mem_base_reg.  That should probably handle
>>> SUBREGs too.
>>>
>>> A testcase to verify the patch would be nice.  I have one I wrote for testing
>>> that shows some tests that work, some tests that don't work, and some that
>>> work but maybe shouldn't.
>>>
>>> I vaguely remember Michael Meissner talking about doing something similar
>>> for PPC at a GNU Cauldron maybe 2 years ago.  But I don't know if he tried, or
>>> how far he got if he did try.  It might be useful to ask him about that work.
>>>
>>> Otherwise this looks OK to me.
>>>
>>> Jim
Craig Blackmore Dec. 10, 2019, 6:28 p.m. UTC | #12
Hi Jim,

Thank you for your review. I have posted an updated patch below which I think
addresses your comments.

On 30/10/2019 23:57, Jim Wilson wrote:
> On Fri, Oct 25, 2019 at 10:40 AM Craig Blackmore
> <craig.blackmore@embecosm.com> wrote:
>> This patch aims to allow more load/store instructions to be compressed by
>> replacing a load/store of 'base register + large offset' with a new load/store
>> of 'new base + small offset'. If the new base gets stored in a compressed
>> register, then the new load/store can be compressed. Since there is an overhead
>> in creating the new base, this change is only attempted when 'base register' is
>> referenced in at least 4 load/stores in a basic block.
>>
>> The optimization is implemented in a new RISC-V specific pass called
>> shorten_memrefs which is enabled for RVC targets. It has been developed for the
>> 32-bit lw/sw instructions but could also be extended to 64-bit ld/sd in future.
>
> The fact that this needs 4 load/stores in a block with the same base
> address means it won't trigger very often.  But it does seem to work.
> I see about a 0.05% code size reduction for a rv32gc newlib nano
> libc.a.  There might be other codes that benefit more.
>
> I'm concerned about the number of RISC-V specific optimization passes
> people are writing.  I've seen at least 3 so far.  This one is at
> least small and simple enough to understand that it doesn't look like
> it will cause maintenance problems.
>
> The config.gcc change conflicts with the save-restore optimization
> pass that Andrew Burgess added, but that is a trivial fix.
>

I have rebased onto latest trunk and fixed this.

> The code only works for 32-bit load/stores.  rv64 has compressed
> 64-bit load/stores, and the F and D extensions have float and double
> compressed loads and stores.  The patch would be more useful if it
> handled all of these.
>

We plan to add support for these in the future.

> The patch doesn't check the other operand, it only looks at the memory
> operand.  This results in some cases where the code rewrites
> instructions that can never be compressed.  For instance, given
> void
> store1z (int *array)
> {
>   array[200] = 0;
>   array[201] = 0;
>   array[202] = 0;
>   array[203] = 0;
> }
> the patch increases code size by 4 bytes because it rewrites the
> stores, but we don't have compressed store 0, and x0 is not a
> compressed reg, so there is no point in rewriting the stores.

I have now excluded store zero from being counted and rewritten.

>  I can
> also create examples that show a size increase with loads but it is a
> little trickier.
> extern int sub1 (int, int, int, int, int, int, int);
> int
> load1a (int a0, int a1, int a2, int a3, int a4, int *array)
> {
>   int a = 0;
>   a += array[200];
>   a += array[201];
>   a += array[202];
>   a += array[203];
>   return sub1 (a0, a1, a2, a3, a4, 0, a);
> }
> The register allocator will pass a0-a4 directly through from args to
> the call, leaving only a5-a7 for the load base address and dest, and
> only one of those regs is a compressed reg, but we need two, so these
> loads can't be compressed.  The code still gets rewritten, resulting
> in a size increase for the extra add.  Not sure if you can do anything
> about that though, since you are doing this before register
> allocation.

I haven't got a way of avoiding this at present, so I have added this
case as an xfail.

>
> It isn't clear what the change to riscv_address_cost is for.  That
> should be commented.
>

This is now commented.

> I'd suggest parens in the riscv_compressed_lw_offset_p return
> statement, that makes it easier to align the code correctly (in emacs
> at least).
>

Parens added.

> The RISCV_MAX_COMPRESSED_LW_OFFSET should be moved down to the "ISA
> constants" section of riscv.h, and maybe renamed similar to the other
> constants.
>

Done.

> There is a REG_P check in get_si_mem_base_reg.  That should probably handle
> SUBREGs too.
>

I couldn't produce a case where the address is 'subreg + offset', so I have
not added handling for SUBREGs for now.

> A testcase to verify the patch would be nice.  I have one I wrote for
> testing that shows some tests that work, some tests that don't work,
> and some that work but maybe shouldn't.
>

Thank you for the test file. I have added some testcases to the patch
based on this. 

> I vaguely remember Michael Meissner talking about doing something
> similar for PPC at a GNU Cauldron maybe 2 years ago.  But I don't know
> if he tried, or how far he got if he did try.  It might be useful to
> ask him about that work.
>
> Otherwise this looks OK to me.
>
> Jim

Thanks,
Craig

---

gcc/ChangeLog:

	* config.gcc: Add riscv-shorten-memrefs.o to extra_objs for riscv.
	* config/riscv/riscv-passes.def: New file.
	* config/riscv/riscv-protos.h (make_pass_shorten_memrefs): Declare.
	* config/riscv/riscv-shorten-memrefs.c: New file.
	* config/riscv/riscv.c (tree-pass.h): New include.
	(riscv_compressed_reg_p): New function.
	(riscv_compressed_lw_offset_p): Likewise.
	(riscv_compressed_lw_address_p): Likewise.
	(riscv_shorten_lw_offset): Likewise.
	(riscv_legitimize_address): Attempt to convert base + large_offset
	to compressible new_base + small_offset.
	(riscv_address_cost): Make anticipated compressed load/stores
	cheaper for code size than uncompressed load/stores.
	(riscv_register_priority): Move compressed register check to
	riscv_compressed_reg_p.
	(riscv_new_address_profitable_p): New function.
	(TARGET_NEW_ADDRESS_PROFITABLE_P): Define.
	* config/riscv/riscv.h (C_S_BITS): Define.
	(CSW_MAX_OFFSET): Define.
	* config/riscv/riscv.opt (mshorten-memrefs): New option.
	* config/riscv/t-riscv (riscv-shorten-memrefs.o): New rule.
	(PASSES_EXTRA): Add riscv-passes.def.
	* doc/invoke.texi: Document -mshorten-memrefs.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/shorten-memrefs-1.c: New test.
	* gcc.target/riscv/shorten-memrefs-2.c: New test.
	* gcc.target/riscv/shorten-memrefs-3.c: New test.
	* gcc.target/riscv/shorten-memrefs-4.c: New test.
	* gcc.target/riscv/shorten-memrefs-5.c: New test.
	* gcc.target/riscv/shorten-memrefs-6.c: New test.
	* gcc.target/riscv/shorten-memrefs-7.c: New test.
---
 gcc/config.gcc                                |   2 +-
 gcc/config/riscv/riscv-passes.def             |  20 ++
 gcc/config/riscv/riscv-protos.h               |   2 +
 gcc/config/riscv/riscv-shorten-memrefs.c      | 200 ++++++++++++++++++
 gcc/config/riscv/riscv.c                      |  88 +++++++-
 gcc/config/riscv/riscv.h                      |   5 +
 gcc/config/riscv/riscv.opt                    |   6 +
 gcc/config/riscv/t-riscv                      |   5 +
 gcc/doc/invoke.texi                           |  10 +
 .../gcc.target/riscv/shorten-memrefs-1.c      |  26 +++
 .../gcc.target/riscv/shorten-memrefs-2.c      |  51 +++++
 .../gcc.target/riscv/shorten-memrefs-3.c      |  39 ++++
 .../gcc.target/riscv/shorten-memrefs-4.c      |  26 +++
 .../gcc.target/riscv/shorten-memrefs-5.c      |  53 +++++
 .../gcc.target/riscv/shorten-memrefs-6.c      |  39 ++++
 .../gcc.target/riscv/shorten-memrefs-7.c      |  46 ++++
 16 files changed, 612 insertions(+), 6 deletions(-)
 create mode 100644 gcc/config/riscv/riscv-passes.def
 create mode 100644 gcc/config/riscv/riscv-shorten-memrefs.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/shorten-memrefs-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/shorten-memrefs-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/shorten-memrefs-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/shorten-memrefs-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/shorten-memrefs-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/shorten-memrefs-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/shorten-memrefs-7.c

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 5aa0130135f..29840752544 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -523,7 +523,7 @@ pru-*-*)
 	;;
 riscv*)
 	cpu_type=riscv
-	extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o"
+	extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o riscv-shorten-memrefs.o"
 	d_target_objs="riscv-d.o"
 	;;
 rs6000*-*-*)
diff --git a/gcc/config/riscv/riscv-passes.def b/gcc/config/riscv/riscv-passes.def
new file mode 100644
index 00000000000..8a4ea0918db
--- /dev/null
+++ b/gcc/config/riscv/riscv-passes.def
@@ -0,0 +1,20 @@
+/* Declaration of target-specific passes for RISC-V.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+INSERT_PASS_AFTER (pass_rtl_store_motion, 1, pass_shorten_memrefs);
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 5092294803c..78008c28b75 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -89,4 +89,6 @@ extern void riscv_init_builtins (void);
 /* Routines implemented in riscv-common.c.  */
 extern std::string riscv_arch_str ();
 
+rtl_opt_pass * make_pass_shorten_memrefs (gcc::context *ctxt);
+
 #endif /* ! GCC_RISCV_PROTOS_H */
diff --git a/gcc/config/riscv/riscv-shorten-memrefs.c b/gcc/config/riscv/riscv-shorten-memrefs.c
new file mode 100644
index 00000000000..3686005fe2e
--- /dev/null
+++ b/gcc/config/riscv/riscv-shorten-memrefs.c
@@ -0,0 +1,200 @@
+/* Shorten memrefs pass for RISC-V.
+   Copyright (C) 2018-2019 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#define IN_TARGET_CODE 1
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "rtl.h"
+#include "backend.h"
+#include "regs.h"
+#include "target.h"
+#include "memmodel.h"
+#include "emit-rtl.h"
+#include "df.h"
+#include "predict.h"
+#include "tree-pass.h"
+
+/* Try to make more use of compressed load and store instructions by replacing
+   a load/store at address BASE + LARGE_OFFSET with a new load/store at address
+   NEW BASE + SMALL OFFSET.  If NEW BASE is stored in a compressed register, the
+   load/store can be compressed.  Since creating NEW BASE incurs an overhead,
+   the change is only attempted when BASE is referenced by at least four
+   load/stores in the same basic block.  */
+
+namespace {
+
+const pass_data pass_data_shorten_memrefs =
+{
+  RTL_PASS, /* type */
+  "shorten_memrefs", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_NONE, /* tv_id */
+  0, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_shorten_memrefs : public rtl_opt_pass
+{
+public:
+  pass_shorten_memrefs (gcc::context *ctxt)
+    : rtl_opt_pass (pass_data_shorten_memrefs, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *)
+    {
+      return TARGET_RVC && riscv_mshorten_memrefs && optimize > 0;
+    }
+  virtual unsigned int execute (function *);
+
+private:
+  typedef int_hash <HOST_WIDE_INT, 0> regno_hash;
+  typedef hash_map <regno_hash, int> regno_map;
+
+  regno_map * analyze (basic_block bb);
+  void transform (regno_map *m, basic_block bb);
+  bool get_si_mem_base_reg (rtx mem, rtx *addr);
+}; // class pass_shorten_memrefs
+
+bool
+pass_shorten_memrefs::get_si_mem_base_reg (rtx mem, rtx *addr)
+{
+  if (!MEM_P (mem) || GET_MODE (mem) != SImode)
+    return false;
+  *addr = XEXP (mem, 0);
+  return GET_CODE (*addr) == PLUS && REG_P (XEXP (*addr, 0));
+}
+
+/* Count how many times each regno is referenced as base address for a memory
+   access.  */
+
+pass_shorten_memrefs::regno_map *
+pass_shorten_memrefs::analyze (basic_block bb)
+{
+  regno_map *m = hash_map<regno_hash, int>::create_ggc (10);
+  rtx_insn *insn;
+
+  regstat_init_n_sets_and_refs ();
+
+  FOR_BB_INSNS (bb, insn)
+    {
+      if (!NONJUMP_INSN_P (insn))
+	continue;
+      rtx pat = PATTERN (insn);
+      if (GET_CODE (pat) != SET)
+	continue;
+      /* Analyze stores first then loads.  */
+      for (int i = 0; i < 2; i++)
+	{
+	  rtx mem = XEXP (pat, i);
+	  rtx addr;
+	  if (get_si_mem_base_reg (mem, &addr))
+	    {
+	      HOST_WIDE_INT regno = REGNO (XEXP (addr, 0));
+	      /* Do not count store zero as these cannot be compressed.  */
+	      if (i == 0)
+		{
+		  if (XEXP (pat, 1) == CONST0_RTX (GET_MODE (XEXP (pat, 1))))
+		    continue;
+		}
+	      if (REG_N_REFS (regno) < 4)
+		continue;
+	      m->get_or_insert (regno)++;
+	    }
+	  }
+    }
+  regstat_free_n_sets_and_refs ();
+
+  return m;
+}
+
+/* Convert BASE + LARGE_OFFSET to NEW_BASE + SMALL_OFFSET for each load/store
+   with a base reg referenced at least 4 times.  */
+
+void
+pass_shorten_memrefs::transform (regno_map *m, basic_block bb)
+{
+  rtx_insn *insn;
+  FOR_BB_INSNS (bb, insn)
+    {
+      if (!NONJUMP_INSN_P (insn))
+	continue;
+      rtx pat = PATTERN (insn);
+      if (GET_CODE (pat) != SET)
+	continue;
+      start_sequence ();
+      /* Transform stores first then loads.  */
+      for (int i = 0; i < 2; i++)
+	{
+	  rtx mem = XEXP (pat, i);
+	  rtx addr;
+	  if (get_si_mem_base_reg (mem, &addr))
+	    {
+	      HOST_WIDE_INT regno = REGNO (XEXP (addr, 0));
+	      /* Do not transform store zero as these cannot be compressed.  */
+	      if (i == 0)
+		{
+		  if (XEXP (pat, 1) == CONST0_RTX (GET_MODE (XEXP (pat, 1))))
+		    continue;
+		}
+	      if (m->get_or_insert (regno) > 3)
+		{
+		  addr
+		    = targetm.legitimize_address (addr, addr, GET_MODE (mem));
+		  XEXP (pat, i) = replace_equiv_address (mem, addr);
+		  df_insn_rescan (insn);
+		}
+	    }
+	}
+      rtx_insn *seq = get_insns ();
+      end_sequence ();
+      emit_insn_before (seq, insn);
+    }
+}
+
+unsigned int
+pass_shorten_memrefs::execute (function *fn)
+{
+  basic_block bb;
+
+  FOR_ALL_BB_FN (bb, fn)
+  {
+    regno_map *m;
+    if (optimize_bb_for_speed_p (bb))
+      continue;
+    m = analyze (bb);
+    transform (m, bb);
+  }
+
+  return 0;
+}
+
+} // anon namespace
+
+rtl_opt_pass *
+make_pass_shorten_memrefs (gcc::context *ctxt)
+{
+  return new pass_shorten_memrefs (ctxt);
+}
diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
index 9aa4d266e6b..1d6d2f89f7d 100644
--- a/gcc/config/riscv/riscv.c
+++ b/gcc/config/riscv/riscv.c
@@ -55,6 +55,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "diagnostic.h"
 #include "builtins.h"
 #include "predict.h"
+#include "tree-pass.h"
 
 /* True if X is an UNSPEC wrapper around a SYMBOL_REF or LABEL_REF.  */
 #define UNSPEC_ADDRESS_P(X)					\
@@ -848,6 +849,52 @@ riscv_legitimate_address_p (machine_mode mode, rtx x, bool strict_p)
   return riscv_classify_address (&addr, x, mode, strict_p);
 }
 
+/* Return true if hard reg REGNO can be used in compressed instructions.  */
+
+static bool
+riscv_compressed_reg_p (int regno)
+{
+  /* x8-x15/f8-f15 are compressible registers.  */
+  return (TARGET_RVC && (IN_RANGE (regno, GP_REG_FIRST + 8, GP_REG_FIRST + 15)
+	  || IN_RANGE (regno, FP_REG_FIRST + 8, FP_REG_FIRST + 15)));
+}
+
+/* Return true if X is an unsigned 5-bit immediate scaled by 4.  */
+
+static bool
+riscv_compressed_lw_offset_p (rtx x)
+{
+  return (CONST_INT_P (x)
+	  && (INTVAL (x) & 3) == 0
+	  && IN_RANGE (INTVAL (x), 0, CSW_MAX_OFFSET));
+}
+
+/* Return true if load/store from/to address X can be compressed.  */
+
+static bool
+riscv_compressed_lw_address_p (rtx x)
+{
+  struct riscv_address_info addr;
+  bool result = riscv_classify_address (&addr, x, GET_MODE (x),
+					reload_completed);
+
+  /* Before reload, assuming all load/stores of valid addresses get compressed
+     gives better code size than checking if the address is reg + small_offset
+     early on.  */
+  if (result && !reload_completed)
+    return true;
+
+  /* Return false if address is not compressed_reg + small_offset.  */
+  if (!result
+      || addr.type != ADDRESS_REG
+      || (!riscv_compressed_reg_p (REGNO (addr.reg))
+	    && addr.reg != stack_pointer_rtx)
+      || !riscv_compressed_lw_offset_p (addr.offset))
+    return false;
+
+  return result;
+}
+
 /* Return the number of instructions needed to load or store a value
    of mode MODE at address X.  Return 0 if X isn't valid for MODE.
    Assume that multiword moves may need to be split into word moves
@@ -1305,6 +1352,24 @@ riscv_force_address (rtx x, machine_mode mode)
   return x;
 }
 
+/* Modify base + offset so that offset fits within a compressed load/store insn
+   and the excess is added to base.  */
+
+static rtx
+riscv_shorten_lw_offset (rtx base, HOST_WIDE_INT offset)
+{
+  rtx addr, high;
+  /* Leave OFFSET as an unsigned 5-bit offset scaled by 4 and put the excess
+     into HIGH.  */
+  high = GEN_INT (offset & ~CSW_MAX_OFFSET);
+  offset &= CSW_MAX_OFFSET;
+  if (!SMALL_OPERAND (INTVAL (high)))
+    high = force_reg (Pmode, high);
+  base = force_reg (Pmode, gen_rtx_PLUS (Pmode, high, base));
+  addr = plus_constant (Pmode, base, offset);
+  return addr;
+}
+
 /* This function is used to implement LEGITIMIZE_ADDRESS.  If X can
    be legitimized in a way that the generic machinery might not expect,
    return a new address, otherwise return NULL.  MODE is the mode of
@@ -1323,7 +1388,7 @@ riscv_legitimize_address (rtx x, rtx oldx ATTRIBUTE_UNUSED,
   if (riscv_split_symbol (NULL, x, mode, &addr, FALSE))
     return riscv_force_address (addr, mode);
 
-  /* Handle BASE + OFFSET using riscv_add_offset.  */
+  /* Handle BASE + OFFSET.  */
   if (GET_CODE (x) == PLUS && CONST_INT_P (XEXP (x, 1))
       && INTVAL (XEXP (x, 1)) != 0)
     {
@@ -1332,7 +1397,14 @@ riscv_legitimize_address (rtx x, rtx oldx ATTRIBUTE_UNUSED,
 
       if (!riscv_valid_base_register_p (base, mode, false))
 	base = copy_to_mode_reg (Pmode, base);
-      addr = riscv_add_offset (NULL, base, offset);
+      if (optimize_function_for_size_p (cfun)
+	  && (strcmp (current_pass->name, "shorten_memrefs") == 0)
+	  && mode == SImode)
+	/* Convert BASE + LARGE_OFFSET into NEW_BASE + SMALL_OFFSET to allow
+	   possible compressed load/store.  */
+	addr = riscv_shorten_lw_offset (base, offset);
+      else
+	addr = riscv_add_offset (NULL, base, offset);
       return riscv_force_address (addr, mode);
     }
 
@@ -1822,6 +1894,11 @@ riscv_address_cost (rtx addr, machine_mode mode,
 		    addr_space_t as ATTRIBUTE_UNUSED,
 		    bool speed ATTRIBUTE_UNUSED)
 {
+  /* When optimizing for size, make uncompressible 32-bit addresses more
+     expensive so that compressible 32-bit addresses are preferred.  */
+  if (!speed && riscv_mshorten_memrefs && mode == SImode
+      && !riscv_compressed_lw_address_p (addr))
+    return riscv_address_insns (addr, mode, false) + 1;
   return riscv_address_insns (addr, mode, false);
 }
 
@@ -4647,6 +4724,7 @@ riscv_option_override (void)
     error ("%<-mriscv-attribute%> RISC-V ELF attribute requires GNU as 2.32"
 	   " [%<-mriscv-attribute%>]");
 #endif
+
 }
 
 /* Implement TARGET_CONDITIONAL_REGISTER_USAGE.  */
@@ -4686,9 +4764,9 @@ riscv_conditional_register_usage (void)
 static int
 riscv_register_priority (int regno)
 {
-  /* Favor x8-x15/f8-f15 to improve the odds of RVC instruction selection.  */
-  if (TARGET_RVC && (IN_RANGE (regno, GP_REG_FIRST + 8, GP_REG_FIRST + 15)
-		     || IN_RANGE (regno, FP_REG_FIRST + 8, FP_REG_FIRST + 15)))
+  /* Favor compressed registers to improve the odds of RVC instruction
+     selection.  */
+  if (riscv_compressed_reg_p (regno))
     return 1;
 
   return 0;
diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index b1e3403f8a7..a0d5354eae9 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -913,6 +913,7 @@ extern unsigned riscv_stack_boundary;
 #define SHIFT_RS1 15
 #define SHIFT_IMM 20
 #define IMM_BITS 12
+#define C_S_BITS 5
 #define C_SxSP_BITS 6
 
 #define IMM_REACH (1LL << IMM_BITS)
@@ -922,6 +923,10 @@ extern unsigned riscv_stack_boundary;
 #define SWSP_REACH (4LL << C_SxSP_BITS)
 #define SDSP_REACH (8LL << C_SxSP_BITS)
 
+/* This is the maximum value that can be represented in a compressed load/store
+   offset (an unsigned 5-bit value scaled by 4).  */
+#define CSW_MAX_OFFSET (((4LL << C_S_BITS) - 1) & ~3)
+
 /* Called from RISCV_REORG, this is defined in riscv-sr.c.  */
 
 extern void riscv_remove_unneeded_save_restore_calls (void);
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index 7f0c35e9e9c..138bddd369f 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -87,6 +87,12 @@ msave-restore
 Target Report Mask(SAVE_RESTORE)
 Use smaller but slower prologue and epilogue code.
 
+mshorten-memrefs
+Target Bool Var(riscv_mshorten_memrefs) Init(1)
+Convert BASE + LARGE_OFFSET addresses to NEW_BASE + SMALL_OFFSET to allow more
+memory accesses to be generated as compressed instructions.  Currently targets
+32-bit integer load/stores.
+
 mcmodel=
 Target Report RejectNegative Joined Enum(code_model) Var(riscv_cmodel) Init(TARGET_DEFAULT_CMODEL)
 Specify the code model.
diff --git a/gcc/config/riscv/t-riscv b/gcc/config/riscv/t-riscv
index 5ecb3c160a6..4820fb35d31 100644
--- a/gcc/config/riscv/t-riscv
+++ b/gcc/config/riscv/t-riscv
@@ -19,3 +19,8 @@ riscv-d.o: $(srcdir)/config/riscv/riscv-d.c
 	$(COMPILE) $<
 	$(POSTCOMPILE)
 
+riscv-shorten-memrefs.o: $(srcdir)/config/riscv/riscv-shorten-memrefs.c
+	$(COMPILE) $<
+	$(POSTCOMPILE)
+
+PASSES_EXTRA += $(srcdir)/config/riscv/riscv-passes.def
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index af3c7f2b910..329083fb9b9 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1079,6 +1079,7 @@ See RS/6000 and PowerPC Options.
 -mpreferred-stack-boundary=@var{num} @gol
 -msmall-data-limit=@var{N-bytes} @gol
 -msave-restore  -mno-save-restore @gol
+-mshorten-memrefs  -mno-shorten-memrefs @gol
 -mstrict-align  -mno-strict-align @gol
 -mcmodel=medlow  -mcmodel=medany @gol
 -mexplicit-relocs  -mno-explicit-relocs @gol
@@ -24334,6 +24335,15 @@ Do or don't use smaller but slower prologue and epilogue code that uses
 library function calls.  The default is to use fast inline prologues and
 epilogues.
 
+@item -mshorten-memrefs
+@itemx -mno-shorten-memrefs
+@opindex mshorten-memrefs
+Do or do not attempt to make more use of compressed load/store instructions by
+replacing a load/store of 'base register + large offset' with a new load/store
+of 'new base + small offset'.  If the new base gets stored in a compressed
+register, then the new load/store can be compressed.  Currently targets 32-bit
+integer load/stores only.
+
 @item -mstrict-align
 @itemx -mno-strict-align
 @opindex mstrict-align
diff --git a/gcc/testsuite/gcc.target/riscv/shorten-memrefs-1.c b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-1.c
new file mode 100644
index 00000000000..958942a6f7f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-1.c
@@ -0,0 +1,26 @@
+/* { dg-options "-Os -march=rv32imc -mabi=ilp32" } */
+
+/* These stores cannot be compressed because x0 is not a compressed reg.
+   Therefore the shorten_memrefs pass should not attempt to rewrite them into a
+   compressible format.  */
+
+void
+store1z (int *array)
+{
+  array[200] = 0;
+  array[201] = 0;
+  array[202] = 0;
+  array[203] = 0;
+}
+
+void
+store2z (long long *array)
+{
+  array[200] = 0;
+  array[201] = 0;
+  array[202] = 0;
+  array[203] = 0;
+}
+
+/* { dg-final { scan-assembler-not "store1z:\n\taddi" } } */
+/* { dg-final { scan-assembler-not "store2z:\n\taddi" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/shorten-memrefs-2.c b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-2.c
new file mode 100644
index 00000000000..2c2f41548c6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-2.c
@@ -0,0 +1,51 @@
+/* { dg-options "-Os -march=rv32imc -mabi=ilp32" } */
+
+/* shorten_memrefs should rewrite these load/stores into a compressible
+   format.  */
+
+void
+store1a (int *array, int a)
+{
+  array[200] = a;
+  array[201] = a;
+  array[202] = a;
+  array[203] = a;
+}
+
+void
+store2a (long long *array, long long a)
+{
+  array[200] = a;
+  array[201] = a;
+  array[202] = a;
+  array[203] = a;
+}
+
+int
+load1r (int *array)
+{
+  int a = 0;
+  a += array[200];
+  a += array[201];
+  a += array[202];
+  a += array[203];
+  return a;
+}
+
+long long
+load2r (long long *array)
+{
+  int a = 0;
+  a += array[200];
+  a += array[201];
+  a += array[202];
+  a += array[203];
+  return a;
+}
+
+/* { dg-final { scan-assembler "store1a:\n\taddi" } } */
+/* The sd insns in store2a are not rewritten because shorten_memrefs currently
+   only optimizes lw and sw.  */
+/* { dg-final { scan-assembler "store2a:\n\taddi" { xfail riscv*-*-*  } } } */
+/* { dg-final { scan-assembler "load1r:\n\taddi" } } */
+/* { dg-final { scan-assembler "load2r:\n\taddi" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/shorten-memrefs-3.c b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-3.c
new file mode 100644
index 00000000000..2001fe871ee
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-3.c
@@ -0,0 +1,39 @@
+/* { dg-options "-Os -march=rv32imc -mabi=ilp32" } */
+
+/* These loads cannot be compressed because only one compressed reg is
+   available (since args are passed in a0-a4, that leaves a5-a7 available, of
+   which only a5 is a compressed reg). Therefore the shorten_memrefs pass should
+   not attempt to rewrite these loads into a compressible format. It may not
+   be possible to avoid this because shorten_memrefs happens before reg alloc.
+*/
+
+extern int sub1 (int, int, int, int, int, int, int);
+
+int
+load1a (int a0, int a1, int a2, int a3, int a4, int *array)
+{
+  int a = 0;
+  a += array[200];
+  a += array[201];
+  a += array[202];
+  a += array[203];
+  return sub1 (a0, a1, a2, a3, a4, 0, a);
+}
+
+extern long long sub2 (long long, long long, long long, long long, long long,
+           long long, long long);
+
+long long
+load2a (long long a0, long long a1, long long a2, long long a3, long long a4,
+  long long *array)
+{
+  int a = 0;
+  a += array[200];
+  a += array[201];
+  a += array[202];
+  a += array[203];
+  return sub2 (a0, a1, a2, a3, a4, 0, a);
+}
+
+/* { dg-final { scan-assembler-not "load1a:\n\taddi" { xfail riscv*-*-* } } } */
+/* { dg-final { scan-assembler-not "load2a:\n.*addi\[ \t\]*\[at\]\[0-9\],\[at\]\[0-9\],\[0-9\]*" { xfail riscv*-*-*  } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/shorten-memrefs-4.c b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-4.c
new file mode 100644
index 00000000000..cd4784913e4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-4.c
@@ -0,0 +1,26 @@
+/* { dg-options "-Os -march=rv64imc -mabi=lp64" } */
+
+/* These stores cannot be compressed because x0 is not a compressed reg.
+   Therefore the shorten_memrefs pass should not attempt to rewrite them into a
+   compressible format.  */
+
+void
+store1z (int *array)
+{
+  array[200] = 0;
+  array[201] = 0;
+  array[202] = 0;
+  array[203] = 0;
+}
+
+void
+store2z (long long *array)
+{
+  array[200] = 0;
+  array[201] = 0;
+  array[202] = 0;
+  array[203] = 0;
+}
+
+/* { dg-final { scan-assembler-not "store1z:\n\taddi" } } */
+/* { dg-final { scan-assembler-not "store2z:\n\taddi" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/shorten-memrefs-5.c b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-5.c
new file mode 100644
index 00000000000..80b3897e4da
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-5.c
@@ -0,0 +1,53 @@
+/* { dg-options "-Os -march=rv64imc -mabi=lp64" } */
+
+/* shorten_memrefs should rewrite these load/stores into a compressible
+   format.  */
+
+void
+store1a (int *array, int a)
+{
+  array[200] = a;
+  array[201] = a;
+  array[202] = a;
+  array[203] = a;
+}
+
+void
+store2a (long long *array, long long a)
+{
+  array[200] = a;
+  array[201] = a;
+  array[202] = a;
+  array[203] = a;
+}
+
+int
+load1r (int *array)
+{
+  int a = 0;
+  a += array[200];
+  a += array[201];
+  a += array[202];
+  a += array[203];
+  return a;
+}
+
+long long
+load2r (long long *array)
+{
+  int a = 0;
+  a += array[200];
+  a += array[201];
+  a += array[202];
+  a += array[203];
+  return a;
+}
+
+/* { dg-final { scan-assembler "store1a:\n\taddi" } } */
+/* The sd insns in store2a are not rewritten because shorten_memrefs currently
+   only optimizes lw and sw.  */
+/* { dg-final { scan-assembler "store2a:\n\taddi" { xfail riscv*-*-* } } } */
+/* { dg-final { scan-assembler "load1r:\n\taddi" } } */
+/* The ld insns in load2r are not rewritten because shorten_memrefs currently
+   only optimizes lw and sw.  */
+/* { dg-final { scan-assembler "load2r:\n\taddi" { xfail riscv*-*-* } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/shorten-memrefs-6.c b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-6.c
new file mode 100644
index 00000000000..3403c7044df
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-6.c
@@ -0,0 +1,39 @@
+/* { dg-options "-Os -march=rv64imc -mabi=lp64" } */
+
+/* These loads cannot be compressed because only one compressed reg is
+   available (since args are passed in a0-a4, that leaves a5-a7 available, of
+   which only a5 is a compressed reg). Therefore the shorten_memrefs pass should
+   not attempt to rewrite these loads into a compressible format. It may not
+   be possible to avoid this because shorten_memrefs happens before reg alloc.
+*/
+
+extern int sub1 (int, int, int, int, int, int, int);
+
+int
+load1a (int a0, int a1, int a2, int a3, int a4, int *array)
+{
+  int a = 0;
+  a += array[200];
+  a += array[201];
+  a += array[202];
+  a += array[203];
+  return sub1 (a0, a1, a2, a3, a4, 0, a);
+}
+
+extern long long sub2 (long long, long long, long long, long long, long long,
+           long long, long long);
+
+long long
+load2a (long long a0, long long a1, long long a2, long long a3, long long a4,
+  long long *array)
+{
+  int a = 0;
+  a += array[200];
+  a += array[201];
+  a += array[202];
+  a += array[203];
+  return sub2 (a0, a1, a2, a3, a4, 0, a);
+}
+
+/* { dg-final { scan-assembler-not "load1a:\n\taddi" { xfail riscv*-*-* } } } */
+/* { dg-final { scan-assembler-not "load2a:\n.*addi\[ \t\]*\[at\]\[0-9\],\[at\]\[0-9\],\[0-9\]*" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/shorten-memrefs-7.c b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-7.c
new file mode 100644
index 00000000000..a5833fd356d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-7.c
@@ -0,0 +1,46 @@
+/* { dg-options "-Os -march=rv32imc -mabi=ilp32 -mno-shorten-memrefs" } */
+
+/* Check that these load/stores do not get rewritten into a compressible format
+   when shorten_memrefs is disabled.  */
+
+void
+store1a (int *array, int a)
+{
+  array[200] = a;
+  array[201] = a;
+  array[202] = a;
+  array[203] = a;
+}
+
+void
+store2a (long long *array, long long a)
+{
+  array[200] = a;
+  array[201] = a;
+  array[202] = a;
+  array[203] = a;
+}
+
+int
+load1r (int *array)
+{
+  int a = 0;
+  a += array[200];
+  a += array[201];
+  a += array[202];
+  a += array[203];
+  return a;
+}
+
+long long
+load2r (long long *array)
+{
+  int a = 0;
+  a += array[200];
+  a += array[201];
+  a += array[202];
+  a += array[203];
+  return a;
+}
+
+/* { dg-final { scan-assembler-not "addi" } } */
Craig Blackmore Feb. 19, 2020, 11:40 a.m. UTC | #13
On 10/12/2019 18:28, Craig Blackmore wrote:
>
> Hi Jim,
>
> Thank you for your review. I have posted an updated patch below which I think
> addresses your comments.
>
Ping

https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00712.html

https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00713.html

Craig
Jim Wilson April 8, 2020, 4:04 p.m. UTC | #14
On Wed, Feb 19, 2020 at 3:40 AM Craig Blackmore
<craig.blackmore@embecosm.com> wrote:
> On 10/12/2019 18:28, Craig Blackmore wrote:
> Thank you for your review. I have posted an updated patch below which I think
> addresses your comments.
>
> Ping
>
> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00712.html

This looks OK.  There are some minor issues.

        (riscv_new_address_profitable_p): New function.
        (TARGET_NEW_ADDRESS_PROFITABLE_P): Define.

These are actually in part2, and part2 already has changelog entries
for them, so these can just be dropped.

+  /* When optimizing for size, make uncompressible 32-bit addresses more
+     expensive so that compressible 32-bit addresses are preferred.  */
+  if (!speed && riscv_mshorten_memrefs && mode == SImode
+      && !riscv_compressed_lw_address_p (addr))
+    return riscv_address_insns (addr, mode, false) + 1;

I think that there should be a TARGET_RVC check here, just like in the gate
function for the new pass.  But I also suspect that this probably
doesn't matter much.
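
Concretely, the guarded form would look something like this (just a
sketch of what I mean, mirroring the gate of the new pass):

  if (TARGET_RVC && !speed && riscv_mshorten_memrefs && mode == SImode
      && !riscv_compressed_lw_address_p (addr))
    return riscv_address_insns (addr, mode, false) + 1;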

> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00713.html

This also looks OK to me, but adding a new target hook is something
that should really wait for stage1, which should only be a few weeks
away now.  That will also give us more time to test the patch before
it appears in a gcc release.  So this is OK once stage1 opens.
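
For anyone following along, the new hook in part2 is roughly of the
following shape (paraphrased from the linked patch, so treat this as a
sketch rather than the final code):

  /* Implement TARGET_NEW_ADDRESS_PROFITABLE_P.  Accept the new address
     only if it is no more expensive than the old one.  */
  static bool
  riscv_new_address_profitable_p (rtx memref, rtx_insn *insn, rtx new_addr)
  {
    /* Sketch based on the part2 posting; details may differ.  */
    addr_space_t as = MEM_ADDR_SPACE (memref);
    bool speed = optimize_bb_for_speed_p (BLOCK_FOR_INSN (insn));
    int old_cost = address_cost (XEXP (memref, 0), GET_MODE (memref), as,
				 speed);
    int new_cost = address_cost (new_addr, GET_MODE (memref), as, speed);
    return new_cost <= old_cost;
  }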

Jim
Craig Blackmore April 27, 2020, 5:08 p.m. UTC | #15
On 08/04/2020 17:04, Jim Wilson wrote:

> On Wed, Feb 19, 2020 at 3:40 AM Craig Blackmore
> <craig.blackmore@embecosm.com> wrote:
>> On 10/12/2019 18:28, Craig Blackmore wrote:
>> Thank you for your review. I have posted an updated patch below which I think
>> addresses your comments.
>>
>> Ping
>>
>> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00712.html
> This looks OK.  There are some minor issues.
>
>         (riscv_new_address_profitable_p): New function.
>         (TARGET_NEW_ADDRESS_PROFITABLE_P): Define.
>
> These are actually in part2, and part2 already has changelog entries
> for them, so these can just be dropped.
>
> +  /* When optimizing for size, make uncompressible 32-bit addresses more
> +     expensive so that compressible 32-bit addresses are preferred.  */
> +  if (!speed && riscv_mshorten_memrefs && mode == SImode
> +      && !riscv_compressed_lw_address_p (addr))
> +    return riscv_address_insns (addr, mode, false) + 1;
>
> I think that there should be a TARGET_RVC check here, just like in the gate
> function for the new pass.  But I also suspect that this probably
> doesn't matter much.

Hi Jim,

Thanks for the review. I have updated the following patch with those changes.
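
To illustrate what the pass does, here is roughly the effect on store1a
from shorten-memrefs-2.c at -Os on rv32imc (a hand-written sketch, not
verbatim compiler output):

Before (each offset is greater than 124, so no sw can become c.sw):

	store1a:
		sw	a1,800(a0)
		sw	a1,804(a0)
		sw	a1,808(a0)
		sw	a1,812(a0)
		ret

After (one addi makes a new base in a5, a compressed reg, and the
remaining offsets 32..44 fit the unsigned 5-bit scaled-by-4 range):

	store1a:
		addi	a5,a0,768
		sw	a1,32(a5)
		sw	a1,36(a5)
		sw	a1,40(a5)
		sw	a1,44(a5)
		ret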

Craig

---

gcc/ChangeLog:

	* config.gcc: Add riscv-shorten-memrefs.o to extra_objs for riscv.
	* config/riscv/riscv-passes.def: New file.
	* config/riscv/riscv-protos.h (make_pass_shorten_memrefs): Declare.
	* config/riscv/riscv-shorten-memrefs.c: New file.
	* config/riscv/riscv.c (tree-pass.h): New include.
	(riscv_compressed_reg_p): New Function
	(riscv_compressed_lw_offset_p): Likewise.
	(riscv_compressed_lw_address_p): Likewise.
	(riscv_shorten_lw_offset): Likewise.
	(riscv_legitimize_address): Attempt to convert base + large_offset
	to compressible new_base + small_offset.
	(riscv_address_cost): Make anticipated compressed load/stores
	cheaper for code size than uncompressed load/stores.
	(riscv_register_priority): Move compressed register check to
	riscv_compressed_reg_p.
	* config/riscv/riscv.h (C_S_BITS): Define.
	(CSW_MAX_OFFSET): Define.
	* config/riscv/riscv.opt (mshorten-memrefs): New option.
	* config/riscv/t-riscv (riscv-shorten-memrefs.o): New rule.
	(PASSES_EXTRA): Add riscv-passes.def.
	* doc/invoke.texi: Document -mshorten-memrefs.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/shorten-memrefs-1.c: New test.
	* gcc.target/riscv/shorten-memrefs-2.c: New test.
	* gcc.target/riscv/shorten-memrefs-3.c: New test.
	* gcc.target/riscv/shorten-memrefs-4.c: New test.
	* gcc.target/riscv/shorten-memrefs-5.c: New test.
	* gcc.target/riscv/shorten-memrefs-6.c: New test.
	* gcc.target/riscv/shorten-memrefs-7.c: New test.
---
 gcc/config.gcc                                |   2 +-
 gcc/config/riscv/riscv-passes.def             |  20 ++
 gcc/config/riscv/riscv-protos.h               |   2 +
 gcc/config/riscv/riscv-shorten-memrefs.c      | 200 ++++++++++++++++++
 gcc/config/riscv/riscv.c                      |  88 +++++++-
 gcc/config/riscv/riscv.h                      |   5 +
 gcc/config/riscv/riscv.opt                    |   6 +
 gcc/config/riscv/t-riscv                      |   5 +
 gcc/doc/invoke.texi                           |  10 +
 .../gcc.target/riscv/shorten-memrefs-1.c      |  26 +++
 .../gcc.target/riscv/shorten-memrefs-2.c      |  51 +++++
 .../gcc.target/riscv/shorten-memrefs-3.c      |  39 ++++
 .../gcc.target/riscv/shorten-memrefs-4.c      |  26 +++
 .../gcc.target/riscv/shorten-memrefs-5.c      |  53 +++++
 .../gcc.target/riscv/shorten-memrefs-6.c      |  39 ++++
 .../gcc.target/riscv/shorten-memrefs-7.c      |  46 ++++
 16 files changed, 612 insertions(+), 6 deletions(-)
 create mode 100644 gcc/config/riscv/riscv-passes.def
 create mode 100644 gcc/config/riscv/riscv-shorten-memrefs.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/shorten-memrefs-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/shorten-memrefs-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/shorten-memrefs-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/shorten-memrefs-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/shorten-memrefs-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/shorten-memrefs-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/shorten-memrefs-7.c

diff --git a/gcc/config.gcc b/gcc/config.gcc
index cf1a87e2efd..3c2a0389b98 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -525,7 +525,7 @@ pru-*-*)
 	;;
 riscv*)
 	cpu_type=riscv
-	extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o"
+	extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o riscv-shorten-memrefs.o"
 	d_target_objs="riscv-d.o"
 	;;
 rs6000*-*-*)
diff --git a/gcc/config/riscv/riscv-passes.def b/gcc/config/riscv/riscv-passes.def
new file mode 100644
index 00000000000..8a4ea0918db
--- /dev/null
+++ b/gcc/config/riscv/riscv-passes.def
@@ -0,0 +1,20 @@
+/* Declaration of target-specific passes for RISC-V.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+INSERT_PASS_AFTER (pass_rtl_store_motion, 1, pass_shorten_memrefs);
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 8cf9137b5e7..72280ec1c76 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -91,4 +91,6 @@ extern std::string riscv_arch_str ();
 
 extern bool riscv_hard_regno_rename_ok (unsigned, unsigned);
 
+rtl_opt_pass * make_pass_shorten_memrefs (gcc::context *ctxt);
+
 #endif /* ! GCC_RISCV_PROTOS_H */
diff --git a/gcc/config/riscv/riscv-shorten-memrefs.c b/gcc/config/riscv/riscv-shorten-memrefs.c
new file mode 100644
index 00000000000..3686005fe2e
--- /dev/null
+++ b/gcc/config/riscv/riscv-shorten-memrefs.c
@@ -0,0 +1,200 @@
+/* Shorten memrefs pass for RISC-V.
+   Copyright (C) 2018-2019 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#define IN_TARGET_CODE 1
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "rtl.h"
+#include "backend.h"
+#include "regs.h"
+#include "target.h"
+#include "memmodel.h"
+#include "emit-rtl.h"
+#include "df.h"
+#include "predict.h"
+#include "tree-pass.h"
+
+/* Try to make more use of compressed load and store instructions by replacing
+   a load/store at address BASE + LARGE_OFFSET with a new load/store at address
+   NEW BASE + SMALL OFFSET.  If NEW BASE is stored in a compressed register, the
+   load/store can be compressed.  Since creating NEW BASE incurs an overhead,
+   the change is only attempted when BASE is referenced by at least four
+   load/stores in the same basic block.  */
+
+namespace {
+
+const pass_data pass_data_shorten_memrefs =
+{
+  RTL_PASS, /* type */
+  "shorten_memrefs", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_NONE, /* tv_id */
+  0, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_shorten_memrefs : public rtl_opt_pass
+{
+public:
+  pass_shorten_memrefs (gcc::context *ctxt)
+    : rtl_opt_pass (pass_data_shorten_memrefs, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *)
+    {
+      return TARGET_RVC && riscv_mshorten_memrefs && optimize > 0;
+    }
+  virtual unsigned int execute (function *);
+
+private:
+  typedef int_hash <HOST_WIDE_INT, 0> regno_hash;
+  typedef hash_map <regno_hash, int> regno_map;
+
+  regno_map * analyze (basic_block bb);
+  void transform (regno_map *m, basic_block bb);
+  bool get_si_mem_base_reg (rtx mem, rtx *addr);
+}; // class pass_shorten_memrefs
+
+bool
+pass_shorten_memrefs::get_si_mem_base_reg (rtx mem, rtx *addr)
+{
+  if (!MEM_P (mem) || GET_MODE (mem) != SImode)
+    return false;
+  *addr = XEXP (mem, 0);
+  return GET_CODE (*addr) == PLUS && REG_P (XEXP (*addr, 0));
+}
+
+/* Count how many times each regno is referenced as base address for a memory
+   access.  */
+
+pass_shorten_memrefs::regno_map *
+pass_shorten_memrefs::analyze (basic_block bb)
+{
+  regno_map *m = hash_map<regno_hash, int>::create_ggc (10);
+  rtx_insn *insn;
+
+  regstat_init_n_sets_and_refs ();
+
+  FOR_BB_INSNS (bb, insn)
+    {
+      if (!NONJUMP_INSN_P (insn))
+	continue;
+      rtx pat = PATTERN (insn);
+      if (GET_CODE (pat) != SET)
+	continue;
+      /* Analyze stores first then loads.  */
+      for (int i = 0; i < 2; i++)
+	{
+	  rtx mem = XEXP (pat, i);
+	  rtx addr;
+	  if (get_si_mem_base_reg (mem, &addr))
+	    {
+	      HOST_WIDE_INT regno = REGNO (XEXP (addr, 0));
+	      /* Do not count store zero as these cannot be compressed.  */
+	      if (i == 0)
+		{
+		  if (XEXP (pat, 1) == CONST0_RTX (GET_MODE (XEXP (pat, 1))))
+		    continue;
+		}
+	      if (REG_N_REFS (regno) < 4)
+		continue;
+	      m->get_or_insert (regno)++;
+	    }
+	  }
+    }
+  regstat_free_n_sets_and_refs ();
+
+  return m;
+}
+
+/* Convert BASE + LARGE_OFFSET to NEW_BASE + SMALL_OFFSET for each load/store
+   with a base reg referenced at least 4 times.  */
+
+void
+pass_shorten_memrefs::transform (regno_map *m, basic_block bb)
+{
+  rtx_insn *insn;
+  FOR_BB_INSNS (bb, insn)
+    {
+      if (!NONJUMP_INSN_P (insn))
+	continue;
+      rtx pat = PATTERN (insn);
+      if (GET_CODE (pat) != SET)
+	continue;
+      start_sequence ();
+      /* Transform stores first then loads.  */
+      for (int i = 0; i < 2; i++)
+	{
+	  rtx mem = XEXP (pat, i);
+	  rtx addr;
+	  if (get_si_mem_base_reg (mem, &addr))
+	    {
+	      HOST_WIDE_INT regno = REGNO (XEXP (addr, 0));
+	      /* Do not transform store zero as these cannot be compressed.  */
+	      if (i == 0)
+		{
+		  if (XEXP (pat, 1) == CONST0_RTX (GET_MODE (XEXP (pat, 1))))
+		    continue;
+		}
+	      if (m->get_or_insert (regno) > 3)
+		{
+		  addr
+		    = targetm.legitimize_address (addr, addr, GET_MODE (mem));
+		  XEXP (pat, i) = replace_equiv_address (mem, addr);
+		  df_insn_rescan (insn);
+		}
+	    }
+	}
+      rtx_insn *seq = get_insns ();
+      end_sequence ();
+      emit_insn_before (seq, insn);
+    }
+}
+
+unsigned int
+pass_shorten_memrefs::execute (function *fn)
+{
+  basic_block bb;
+
+  FOR_ALL_BB_FN (bb, fn)
+  {
+    regno_map *m;
+    if (optimize_bb_for_speed_p (bb))
+      continue;
+    m = analyze (bb);
+    transform (m, bb);
+  }
+
+  return 0;
+}
+
+} // anon namespace
+
+rtl_opt_pass *
+make_pass_shorten_memrefs (gcc::context *ctxt)
+{
+  return new pass_shorten_memrefs (ctxt);
+}
diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
index 94b5ac01762..0c0879c8aee 100644
--- a/gcc/config/riscv/riscv.c
+++ b/gcc/config/riscv/riscv.c
@@ -55,6 +55,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "diagnostic.h"
 #include "builtins.h"
 #include "predict.h"
+#include "tree-pass.h"
 
 /* True if X is an UNSPEC wrapper around a SYMBOL_REF or LABEL_REF.  */
 #define UNSPEC_ADDRESS_P(X)					\
@@ -848,6 +849,52 @@ riscv_legitimate_address_p (machine_mode mode, rtx x, bool strict_p)
   return riscv_classify_address (&addr, x, mode, strict_p);
 }
 
+/* Return true if hard reg REGNO can be used in compressed instructions.  */
+
+static bool
+riscv_compressed_reg_p (int regno)
+{
+  /* x8-x15/f8-f15 are compressible registers.  */
+  return (TARGET_RVC && (IN_RANGE (regno, GP_REG_FIRST + 8, GP_REG_FIRST + 15)
+	  || IN_RANGE (regno, FP_REG_FIRST + 8, FP_REG_FIRST + 15)));
+}
+
+/* Return true if X is an unsigned 5-bit immediate scaled by 4.  */
+
+static bool
+riscv_compressed_lw_offset_p (rtx x)
+{
+  return (CONST_INT_P (x)
+	  && (INTVAL (x) & 3) == 0
+	  && IN_RANGE (INTVAL (x), 0, CSW_MAX_OFFSET));
+}
+
+/* Return true if load/store from/to address X can be compressed.  */
+
+static bool
+riscv_compressed_lw_address_p (rtx x)
+{
+  struct riscv_address_info addr;
+  bool result = riscv_classify_address (&addr, x, GET_MODE (x),
+					reload_completed);
+
+  /* Before reload, assuming all load/stores of valid addresses get compressed
+     gives better code size than checking if the address is reg + small_offset
+     early on.  */
+  if (result && !reload_completed)
+    return true;
+
+  /* Return false if address is not compressed_reg + small_offset.  */
+  if (!result
+      || addr.type != ADDRESS_REG
+      || (!riscv_compressed_reg_p (REGNO (addr.reg))
+	    && addr.reg != stack_pointer_rtx)
+      || !riscv_compressed_lw_offset_p (addr.offset))
+    return false;
+
+  return result;
+}
+
 /* Return the number of instructions needed to load or store a value
    of mode MODE at address X.  Return 0 if X isn't valid for MODE.
    Assume that multiword moves may need to be split into word moves
@@ -1308,6 +1355,24 @@ riscv_force_address (rtx x, machine_mode mode)
   return x;
 }
 
+/* Modify base + offset so that offset fits within a compressed load/store insn
+   and the excess is added to base.  */
+
+static rtx
+riscv_shorten_lw_offset (rtx base, HOST_WIDE_INT offset)
+{
+  rtx addr, high;
+  /* Leave OFFSET as an unsigned 5-bit offset scaled by 4 and put the excess
+     into HIGH.  */
+  high = GEN_INT (offset & ~CSW_MAX_OFFSET);
+  offset &= CSW_MAX_OFFSET;
+  if (!SMALL_OPERAND (INTVAL (high)))
+    high = force_reg (Pmode, high);
+  base = force_reg (Pmode, gen_rtx_PLUS (Pmode, high, base));
+  addr = plus_constant (Pmode, base, offset);
+  return addr;
+}
+
 /* This function is used to implement LEGITIMIZE_ADDRESS.  If X can
    be legitimized in a way that the generic machinery might not expect,
    return a new address, otherwise return NULL.  MODE is the mode of
@@ -1326,7 +1391,7 @@ riscv_legitimize_address (rtx x, rtx oldx ATTRIBUTE_UNUSED,
   if (riscv_split_symbol (NULL, x, mode, &addr, FALSE))
     return riscv_force_address (addr, mode);
 
-  /* Handle BASE + OFFSET using riscv_add_offset.  */
+  /* Handle BASE + OFFSET.  */
   if (GET_CODE (x) == PLUS && CONST_INT_P (XEXP (x, 1))
       && INTVAL (XEXP (x, 1)) != 0)
     {
@@ -1335,7 +1400,14 @@ riscv_legitimize_address (rtx x, rtx oldx ATTRIBUTE_UNUSED,
 
       if (!riscv_valid_base_register_p (base, mode, false))
 	base = copy_to_mode_reg (Pmode, base);
-      addr = riscv_add_offset (NULL, base, offset);
+      if (optimize_function_for_size_p (cfun)
+	  && (strcmp (current_pass->name, "shorten_memrefs") == 0)
+	  && mode == SImode)
+	/* Convert BASE + LARGE_OFFSET into NEW_BASE + SMALL_OFFSET to allow
+	   possible compressed load/store.  */
+	addr = riscv_shorten_lw_offset (base, offset);
+      else
+	addr = riscv_add_offset (NULL, base, offset);
       return riscv_force_address (addr, mode);
     }
 
@@ -1833,6 +1905,11 @@ riscv_address_cost (rtx addr, machine_mode mode,
 		    addr_space_t as ATTRIBUTE_UNUSED,
 		    bool speed ATTRIBUTE_UNUSED)
 {
+  /* When optimizing for size, make uncompressible 32-bit addresses more
+     expensive so that compressible 32-bit addresses are preferred.  */
+  if (TARGET_RVC && !speed && riscv_mshorten_memrefs && mode == SImode
+      && !riscv_compressed_lw_address_p (addr))
+    return riscv_address_insns (addr, mode, false) + 1;
   return riscv_address_insns (addr, mode, false);
 }
 
@@ -4666,6 +4743,7 @@ riscv_option_override (void)
     error ("%<-mriscv-attribute%> RISC-V ELF attribute requires GNU as 2.32"
 	   " [%<-mriscv-attribute%>]");
 #endif
+
 }
 
 /* Implement TARGET_CONDITIONAL_REGISTER_USAGE.  */
@@ -4705,9 +4783,9 @@ riscv_conditional_register_usage (void)
 static int
 riscv_register_priority (int regno)
 {
-  /* Favor x8-x15/f8-f15 to improve the odds of RVC instruction selection.  */
-  if (TARGET_RVC && (IN_RANGE (regno, GP_REG_FIRST + 8, GP_REG_FIRST + 15)
-		     || IN_RANGE (regno, FP_REG_FIRST + 8, FP_REG_FIRST + 15)))
+  /* Favor compressed registers to improve the odds of RVC instruction
+     selection.  */
+  if (riscv_compressed_reg_p (regno))
     return 1;
 
   return 0;
diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index 567c23380fe..e6209ede9d6 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -920,6 +920,7 @@ extern unsigned riscv_stack_boundary;
 #define SHIFT_RS1 15
 #define SHIFT_IMM 20
 #define IMM_BITS 12
+#define C_S_BITS 5
 #define C_SxSP_BITS 6
 
 #define IMM_REACH (1LL << IMM_BITS)
@@ -929,6 +930,10 @@ extern unsigned riscv_stack_boundary;
 #define SWSP_REACH (4LL << C_SxSP_BITS)
 #define SDSP_REACH (8LL << C_SxSP_BITS)
 
+/* This is the maximum value that can be represented in a compressed load/store
+   offset (an unsigned 5-bit value scaled by 4).  */
+#define CSW_MAX_OFFSET (((4LL << C_S_BITS) - 1) & ~3)
+
 /* Called from RISCV_REORG, this is defined in riscv-sr.c.  */
 
 extern void riscv_remove_unneeded_save_restore_calls (void);
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index 29de246759e..e4bfcb86f51 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -87,6 +87,12 @@ msave-restore
 Target Report Mask(SAVE_RESTORE)
 Use smaller but slower prologue and epilogue code.
 
+mshorten-memrefs
+Target Bool Var(riscv_mshorten_memrefs) Init(1)
+Convert BASE + LARGE_OFFSET addresses to NEW_BASE + SMALL_OFFSET to allow more
+memory accesses to be generated as compressed instructions.  Currently targets
+32-bit integer load/stores.
+
 mcmodel=
 Target Report RejectNegative Joined Enum(code_model) Var(riscv_cmodel) Init(TARGET_DEFAULT_CMODEL)
 Specify the code model.
diff --git a/gcc/config/riscv/t-riscv b/gcc/config/riscv/t-riscv
index 5ecb3c160a6..4820fb35d31 100644
--- a/gcc/config/riscv/t-riscv
+++ b/gcc/config/riscv/t-riscv
@@ -19,3 +19,8 @@ riscv-d.o: $(srcdir)/config/riscv/riscv-d.c
 	$(COMPILE) $<
 	$(POSTCOMPILE)
 
+riscv-shorten-memrefs.o: $(srcdir)/config/riscv/riscv-shorten-memrefs.c
+	$(COMPILE) $<
+	$(POSTCOMPILE)
+
+PASSES_EXTRA += $(srcdir)/config/riscv/riscv-passes.def
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index a37a2ee9c19..ad4c6d94f82 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1129,6 +1129,7 @@ See RS/6000 and PowerPC Options.
 -mpreferred-stack-boundary=@var{num} @gol
 -msmall-data-limit=@var{N-bytes} @gol
 -msave-restore  -mno-save-restore @gol
+-mshorten-memrefs  -mno-shorten-memrefs @gol
 -mstrict-align  -mno-strict-align @gol
 -mcmodel=medlow  -mcmodel=medany @gol
 -mexplicit-relocs  -mno-explicit-relocs @gol
@@ -25321,6 +25322,15 @@ Do or don't use smaller but slower prologue and epilogue code that uses
 library function calls.  The default is to use fast inline prologues and
 epilogues.
 
+@item -mshorten-memrefs
+@itemx -mno-shorten-memrefs
+@opindex mshorten-memrefs
+Do or do not attempt to make more use of compressed load/store instructions by
+replacing a load/store of 'base register + large offset' with a new load/store
+of 'new base + small offset'.  If the new base gets stored in a compressed
+register, then the new load/store can be compressed.  Currently targets 32-bit
+integer load/stores only.
+
 @item -mstrict-align
 @itemx -mno-strict-align
 @opindex mstrict-align
diff --git a/gcc/testsuite/gcc.target/riscv/shorten-memrefs-1.c b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-1.c
new file mode 100644
index 00000000000..958942a6f7f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-1.c
@@ -0,0 +1,26 @@
+/* { dg-options "-Os -march=rv32imc -mabi=ilp32" } */
+
+/* These stores cannot be compressed because x0 is not a compressed reg.
+   Therefore the shorten_memrefs pass should not attempt to rewrite them into a
+   compressible format.  */
+
+void
+store1z (int *array)
+{
+  array[200] = 0;
+  array[201] = 0;
+  array[202] = 0;
+  array[203] = 0;
+}
+
+void
+store2z (long long *array)
+{
+  array[200] = 0;
+  array[201] = 0;
+  array[202] = 0;
+  array[203] = 0;
+}
+
+/* { dg-final { scan-assembler-not "store1z:\n\taddi" } } */
+/* { dg-final { scan-assembler-not "store2z:\n\taddi" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/shorten-memrefs-2.c b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-2.c
new file mode 100644
index 00000000000..2c2f41548c6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-2.c
@@ -0,0 +1,51 @@
+/* { dg-options "-Os -march=rv32imc -mabi=ilp32" } */
+
+/* shorten_memrefs should rewrite these load/stores into a compressible
+   format.  */
+
+void
+store1a (int *array, int a)
+{
+  array[200] = a;
+  array[201] = a;
+  array[202] = a;
+  array[203] = a;
+}
+
+void
+store2a (long long *array, long long a)
+{
+  array[200] = a;
+  array[201] = a;
+  array[202] = a;
+  array[203] = a;
+}
+
+int
+load1r (int *array)
+{
+  int a = 0;
+  a += array[200];
+  a += array[201];
+  a += array[202];
+  a += array[203];
+  return a;
+}
+
+long long
+load2r (long long *array)
+{
+  int a = 0;
+  a += array[200];
+  a += array[201];
+  a += array[202];
+  a += array[203];
+  return a;
+}
+
+/* { dg-final { scan-assembler "store1a:\n\taddi" } } */
+/* The sd insns in store2a are not rewritten because shorten_memrefs currently
+   only optimizes lw and sw.  */
+/* { dg-final { scan-assembler "store2a:\n\taddi" { xfail riscv*-*-*  } } } */
+/* { dg-final { scan-assembler "load1r:\n\taddi" } } */
+/* { dg-final { scan-assembler "load2r:\n\taddi" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/shorten-memrefs-3.c b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-3.c
new file mode 100644
index 00000000000..2001fe871ee
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-3.c
@@ -0,0 +1,39 @@
+/* { dg-options "-Os -march=rv32imc -mabi=ilp32" } */
+
+/* These loads cannot be compressed because only one compressed reg is
+   available (since args are passed in a0-a4, that leaves a5-a7 available, of
+   which only a5 is a compressed reg). Therefore the shorten_memrefs pass should
+   not attempt to rewrite these loads into a compressible format. It may not
+   be possible to avoid this because shorten_memrefs happens before reg alloc.
+*/
+
+extern int sub1 (int, int, int, int, int, int, int);
+
+int
+load1a (int a0, int a1, int a2, int a3, int a4, int *array)
+{
+  int a = 0;
+  a += array[200];
+  a += array[201];
+  a += array[202];
+  a += array[203];
+  return sub1 (a0, a1, a2, a3, a4, 0, a);
+}
+
+extern long long sub2 (long long, long long, long long, long long, long long,
+           long long, long long);
+
+long long
+load2a (long long a0, long long a1, long long a2, long long a3, long long a4,
+  long long *array)
+{
+  int a = 0;
+  a += array[200];
+  a += array[201];
+  a += array[202];
+  a += array[203];
+  return sub2 (a0, a1, a2, a3, a4, 0, a);
+}
+
+/* { dg-final { scan-assembler-not "load1a:\n\taddi" { xfail riscv*-*-* } } } */
+/* { dg-final { scan-assembler-not "load2a:\n.*addi\[ \t\]*\[at\]\[0-9\],\[at\]\[0-9\],\[0-9\]*" { xfail riscv*-*-*  } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/shorten-memrefs-4.c b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-4.c
new file mode 100644
index 00000000000..cd4784913e4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-4.c
@@ -0,0 +1,26 @@
+/* { dg-options "-Os -march=rv64imc -mabi=lp64" } */
+
+/* These stores cannot be compressed because x0 is not a compressed reg.
+   Therefore the shorten_memrefs pass should not attempt to rewrite them into a
+   compressible format.  */
+
+void
+store1z (int *array)
+{
+  array[200] = 0;
+  array[201] = 0;
+  array[202] = 0;
+  array[203] = 0;
+}
+
+void
+store2z (long long *array)
+{
+  array[200] = 0;
+  array[201] = 0;
+  array[202] = 0;
+  array[203] = 0;
+}
+
+/* { dg-final { scan-assembler-not "store1z:\n\taddi" } } */
+/* { dg-final { scan-assembler-not "store2z:\n\taddi" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/shorten-memrefs-5.c b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-5.c
new file mode 100644
index 00000000000..80b3897e4da
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-5.c
@@ -0,0 +1,53 @@
+/* { dg-options "-Os -march=rv64imc -mabi=lp64" } */
+
+/* shorten_memrefs should rewrite these load/stores into a compressible
+   format.  */
+
+void
+store1a (int *array, int a)
+{
+  array[200] = a;
+  array[201] = a;
+  array[202] = a;
+  array[203] = a;
+}
+
+void
+store2a (long long *array, long long a)
+{
+  array[200] = a;
+  array[201] = a;
+  array[202] = a;
+  array[203] = a;
+}
+
+int
+load1r (int *array)
+{
+  int a = 0;
+  a += array[200];
+  a += array[201];
+  a += array[202];
+  a += array[203];
+  return a;
+}
+
+long long
+load2r (long long *array)
+{
+  int a = 0;
+  a += array[200];
+  a += array[201];
+  a += array[202];
+  a += array[203];
+  return a;
+}
+
+/* { dg-final { scan-assembler "store1a:\n\taddi" } } */
+/* The sd insns in store2a are not rewritten because shorten_memrefs currently
+   only optimizes lw and sw.  */
+/* { dg-final { scan-assembler "store2a:\n\taddi" { xfail riscv*-*-* } } } */
+/* { dg-final { scan-assembler "load1r:\n\taddi" } } */
+/* The ld insns in load2r are not rewritten because shorten_memrefs currently
+   only optimizes lw and sw.  */
+/* { dg-final { scan-assembler "load2r:\n\taddi" { xfail riscv*-*-* } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/shorten-memrefs-6.c b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-6.c
new file mode 100644
index 00000000000..3403c7044df
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-6.c
@@ -0,0 +1,39 @@
+/* { dg-options "-Os -march=rv64imc -mabi=lp64" } */
+
+/* These loads cannot be compressed because only one compressed reg is
+   available (since args are passed in a0-a4, that leaves a5-a7 available, of
+   which only a5 is a compressed reg). Therefore the shorten_memrefs pass should
+   not attempt to rewrite these loads into a compressible format. It may not
+   be possible to avoid this because shorten_memrefs happens before reg alloc.
+*/
+
+extern int sub1 (int, int, int, int, int, int, int);
+
+int
+load1a (int a0, int a1, int a2, int a3, int a4, int *array)
+{
+  int a = 0;
+  a += array[200];
+  a += array[201];
+  a += array[202];
+  a += array[203];
+  return sub1 (a0, a1, a2, a3, a4, 0, a);
+}
+
+extern long long sub2 (long long, long long, long long, long long, long long,
+           long long, long long);
+
+long long
+load2a (long long a0, long long a1, long long a2, long long a3, long long a4,
+  long long *array)
+{
+  int a = 0;
+  a += array[200];
+  a += array[201];
+  a += array[202];
+  a += array[203];
+  return sub2 (a0, a1, a2, a3, a4, 0, a);
+}
+
+/* { dg-final { scan-assembler-not "load1a:\n\taddi" { xfail riscv*-*-* } } } */
+/* { dg-final { scan-assembler-not "load2a:\n.*addi\[ \t\]*\[at\]\[0-9\],\[at\]\[0-9\],\[0-9\]*" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/shorten-memrefs-7.c b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-7.c
new file mode 100644
index 00000000000..a5833fd356d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-7.c
@@ -0,0 +1,46 @@
+/* { dg-options "-Os -march=rv32imc -mabi=ilp32 -mno-shorten-memrefs" } */
+
+/* Check that these load/stores do not get rewritten into a compressible format
+   when shorten_memrefs is disabled.  */
+
+void
+store1a (int *array, int a)
+{
+  array[200] = a;
+  array[201] = a;
+  array[202] = a;
+  array[203] = a;
+}
+
+void
+store2a (long long *array, long long a)
+{
+  array[200] = a;
+  array[201] = a;
+  array[202] = a;
+  array[203] = a;
+}
+
+int
+load1r (int *array)
+{
+  int a = 0;
+  a += array[200];
+  a += array[201];
+  a += array[202];
+  a += array[203];
+  return a;
+}
+
+long long
+load2r (long long *array)
+{
+  int a = 0;
+  a += array[200];
+  a += array[201];
+  a += array[202];
+  a += array[203];
+  return a;
+}
+
+/* { dg-final { scan-assembler-not "addi" } } */
Jim Wilson May 12, 2020, 10:33 p.m. UTC | #16
On Mon, Apr 27, 2020 at 10:08 AM Craig Blackmore
<craig.blackmore@embecosm.com> wrote:
> Thanks for the review. I have updated the following patch with those changes.

This looks good, and the tree is open for development work again, so I
committed both parts 1 and 2 and pushed it.

One weird thing is that while the patch program accepted the patch
fine, git am would not, and kept giving me an error that didn't make
any sense, pointing at a line that didn't have any visible problem.
So I had to commit it by hand and then use git commit --amend to fix
the authorship before pushing it.  The end result looks OK to me
though.

Jim
Craig Blackmore May 13, 2020, 5:51 p.m. UTC | #17
On 12/05/2020 23:33, Jim Wilson wrote:
> On Mon, Apr 27, 2020 at 10:08 AM Craig Blackmore
> <craig.blackmore@embecosm.com> wrote:
>> Thanks for the review. I have updated the following patch with those changes.
> This looks good, and the tree is open for development work again, so I
> committed both parts 1 and 2 and pushed it.
>
> One weird thing is that while the patch program accepted the patch
> fine, git am would not, and kept giving me an error that didn't make
> any sense, pointing at a line that didn't have any visible problem.
> So I had to commit it by hand and then use git commit --amend to fix
> the authorship before pushing it.  The end result looks OK to me
> though.
>
> Jim

Hi Jim,

Many thanks for your help in reviewing this patch.

I am not sure what happened with git am, but thanks for working around
it; the end result looks good to me.

Craig
Patch

diff --git a/gcc/config.gcc b/gcc/config.gcc
index bdc2253f8ef..e617215314b 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -523,7 +523,7 @@  pru-*-*)
 	;;
 riscv*)
 	cpu_type=riscv
-	extra_objs="riscv-builtins.o riscv-c.o"
+	extra_objs="riscv-builtins.o riscv-c.o riscv-shorten-memrefs.o"
 	d_target_objs="riscv-d.o"
 	;;
 rs6000*-*-*)
diff --git a/gcc/config/riscv/riscv-passes.def b/gcc/config/riscv/riscv-passes.def
new file mode 100644
index 00000000000..8a4ea0918db
--- /dev/null
+++ b/gcc/config/riscv/riscv-passes.def
@@ -0,0 +1,20 @@ 
+/* Declaration of target-specific passes for RISC-V.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+INSERT_PASS_AFTER (pass_rtl_store_motion, 1, pass_shorten_memrefs);
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 5092294803c..78008c28b75 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -89,4 +89,6 @@  extern void riscv_init_builtins (void);
 /* Routines implemented in riscv-common.c.  */
 extern std::string riscv_arch_str ();
 
+rtl_opt_pass * make_pass_shorten_memrefs (gcc::context *ctxt);
+
 #endif /* ! GCC_RISCV_PROTOS_H */
diff --git a/gcc/config/riscv/riscv-shorten-memrefs.c b/gcc/config/riscv/riscv-shorten-memrefs.c
new file mode 100644
index 00000000000..aed7ddb792e
--- /dev/null
+++ b/gcc/config/riscv/riscv-shorten-memrefs.c
@@ -0,0 +1,188 @@ 
+/* Shorten memrefs pass for RISC-V.
+   Copyright (C) 2018-2019 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#define IN_TARGET_CODE 1
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "rtl.h"
+#include "backend.h"
+#include "regs.h"
+#include "target.h"
+#include "memmodel.h"
+#include "emit-rtl.h"
+#include "df.h"
+#include "predict.h"
+#include "tree-pass.h"
+
+/* Try to make more use of compressed load and store instructions by replacing
+   a load/store at address BASE + LARGE_OFFSET with a new load/store at address
+   NEW BASE + SMALL OFFSET.  If NEW BASE is stored in a compressed register, the
+   load/store can be compressed.  Since creating NEW BASE incurs an overhead,
+   the change is only attempted when BASE is referenced by at least four
+   load/stores in the same basic block.  */
+
+namespace {
+
+const pass_data pass_data_shorten_memrefs =
+{
+  RTL_PASS, /* type */
+  "shorten_memrefs", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_NONE, /* tv_id */
+  0, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_shorten_memrefs : public rtl_opt_pass
+{
+public:
+  pass_shorten_memrefs (gcc::context *ctxt)
+    : rtl_opt_pass (pass_data_shorten_memrefs, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *)
+    {
+      return TARGET_RVC && riscv_mshorten_memrefs && optimize > 0;
+    }
+  virtual unsigned int execute (function *);
+
+private:
+  typedef int_hash <HOST_WIDE_INT, 0> regno_hash;
+  typedef hash_map <regno_hash, int> regno_map;
+
+  regno_map * analyze (basic_block bb);
+  void transform (regno_map *m, basic_block bb);
+  bool get_si_mem_base_reg (rtx mem, rtx *addr);
+}; // class pass_shorten_memrefs
+
+bool
+pass_shorten_memrefs::get_si_mem_base_reg (rtx mem, rtx *addr)
+{
+  if (!MEM_P (mem) || GET_MODE (mem) != SImode)
+    return false;
+  *addr = XEXP (mem, 0);
+  return GET_CODE (*addr) == PLUS && REG_P (XEXP (*addr, 0));
+}
+
+/* Count how many times each regno is referenced as base address for a memory
+   access.  */
+
+pass_shorten_memrefs::regno_map *
+pass_shorten_memrefs::analyze (basic_block bb)
+{
+  regno_map *m = hash_map<regno_hash, int>::create_ggc (10);
+  rtx_insn *insn;
+
+  regstat_init_n_sets_and_refs ();
+
+  FOR_BB_INSNS (bb, insn)
+    {
+      if (!NONJUMP_INSN_P (insn))
+	continue;
+      rtx pat = PATTERN (insn);
+      if (GET_CODE (pat) != SET)
+	continue;
+      /* Analyze stores first then loads.  */
+      for (int i = 0; i < 2; i++)
+	{
+	  rtx mem = XEXP (pat, i);
+	  rtx addr;
+	  if (get_si_mem_base_reg (mem, &addr))
+	    {
+	      HOST_WIDE_INT regno = REGNO (XEXP (addr, 0));
+	      if (REG_N_REFS (regno) < 4)
+		continue;
+	      m->get_or_insert (regno)++;
+	    }
+	  }
+    }
+  regstat_free_n_sets_and_refs ();
+
+  return m;
+}
+
+/* Convert BASE + LARGE_OFFSET to NEW_BASE + SMALL_OFFSET for each load/store
+   with a base reg referenced at least 4 times.  */
+
+void
+pass_shorten_memrefs::transform (regno_map *m, basic_block bb)
+{
+  rtx_insn *insn;
+  FOR_BB_INSNS (bb, insn)
+    {
+      if (!NONJUMP_INSN_P (insn))
+	continue;
+      rtx pat = PATTERN (insn);
+      if (GET_CODE (pat) != SET)
+	continue;
+      start_sequence ();
+      /* Transform stores first then loads.  */
+      for (int i = 0; i < 2; i++)
+	{
+	  rtx mem = XEXP (pat, i);
+	  rtx addr;
+	  if (get_si_mem_base_reg (mem, &addr))
+	    {
+	      HOST_WIDE_INT regno = REGNO (XEXP (addr, 0));
+	      if (m->get_or_insert (regno) > 3)
+		{
+		  addr
+		    = targetm.legitimize_address (addr, addr, GET_MODE (mem));
+		  XEXP (pat, i) = replace_equiv_address (mem, addr);
+		  df_insn_rescan (insn);
+		}
+	    }
+	}
+      rtx_insn *seq = get_insns ();
+      end_sequence ();
+      emit_insn_before (seq, insn);
+    }
+}
+
+unsigned int
+pass_shorten_memrefs::execute (function *fn)
+{
+  basic_block bb;
+
+  FOR_ALL_BB_FN (bb, fn)
+  {
+    regno_map *m;
+    if (optimize_bb_for_speed_p (bb))
+      continue;
+    m = analyze (bb);
+    transform (m, bb);
+  }
+
+  return 0;
+}
+
+} // anon namespace
+
+rtl_opt_pass *
+make_pass_shorten_memrefs (gcc::context *ctxt)
+{
+  return new pass_shorten_memrefs (ctxt);
+}
diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
index 77a3ad94aa8..532eaa92632 100644
--- a/gcc/config/riscv/riscv.c
+++ b/gcc/config/riscv/riscv.c
@@ -55,6 +55,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "diagnostic.h"
 #include "builtins.h"
 #include "predict.h"
+#include "tree-pass.h"
 
 /* True if X is an UNSPEC wrapper around a SYMBOL_REF or LABEL_REF.  */
 #define UNSPEC_ADDRESS_P(X)					\
@@ -848,6 +849,52 @@  riscv_legitimate_address_p (machine_mode mode, rtx x, bool strict_p)
   return riscv_classify_address (&addr, x, mode, strict_p);
 }
 
+/* Return true if hard reg REGNO can be used in compressed instructions.  */
+
+static bool
+riscv_compressed_reg_p (int regno)
+{
+  /* x8-x15/f8-f15 are compressible registers.  */
+  return (TARGET_RVC && (IN_RANGE (regno, GP_REG_FIRST + 8, GP_REG_FIRST + 15)
+	  || IN_RANGE (regno, FP_REG_FIRST + 8, FP_REG_FIRST + 15)));
+}
+
+/* Return true if X is an unsigned 5-bit immediate scaled by 4.  */
+
+static bool
+riscv_compressed_lw_offset_p (rtx x)
+{
+  return CONST_INT_P (x)
+	 && (INTVAL (x) & 3) == 0
+	 && IN_RANGE (INTVAL (x), 0, RISCV_MAX_COMPRESSED_LW_OFFSET);
+}
+
+/* Return true if load/store from/to address X can be compressed.  */
+
+static bool
+riscv_compressed_lw_address_p (rtx x)
+{
+  struct riscv_address_info addr;
+  bool result = riscv_classify_address (&addr, x, GET_MODE (x),
+					reload_completed);
+
+  /* Before reload, assuming all load/stores of valid addresses get compressed
+     gives better code size than checking if the address is reg + small_offset
+     early on.  */
+  if (result && !reload_completed)
+    return true;
+
+  /* Return false if address is not compressed_reg + small_offset.  */
+  if (!result
+      || addr.type != ADDRESS_REG
+      || (!riscv_compressed_reg_p (REGNO (addr.reg))
+	    && addr.reg != stack_pointer_rtx)
+      || !riscv_compressed_lw_offset_p (addr.offset))
+    return false;
+
+  return result;
+}
+
 /* Return the number of instructions needed to load or store a value
    of mode MODE at address X.  Return 0 if X isn't valid for MODE.
    Assume that multiword moves may need to be split into word moves
@@ -1305,6 +1352,24 @@  riscv_force_address (rtx x, machine_mode mode)
   return x;
 }
 
+/* Modify base + offset so that offset fits within a compressed load/store insn
+   and the excess is added to base.  */
+
+static rtx
+riscv_shorten_lw_offset (rtx base, HOST_WIDE_INT offset)
+{
+  rtx addr, high;
+  /* Leave OFFSET as an unsigned 5-bit offset scaled by 4 and put the excess
+     into HIGH.  */
+  high = GEN_INT (offset & ~RISCV_MAX_COMPRESSED_LW_OFFSET);
+  offset &= RISCV_MAX_COMPRESSED_LW_OFFSET;
+  if (!SMALL_OPERAND (INTVAL (high)))
+    high = force_reg (Pmode, high);
+  base = force_reg (Pmode, gen_rtx_PLUS (Pmode, high, base));
+  addr = plus_constant (Pmode, base, offset);
+  return addr;
+}
+
 /* This function is used to implement LEGITIMIZE_ADDRESS.  If X can
    be legitimized in a way that the generic machinery might not expect,
    return a new address, otherwise return NULL.  MODE is the mode of
@@ -1323,7 +1388,7 @@  riscv_legitimize_address (rtx x, rtx oldx ATTRIBUTE_UNUSED,
   if (riscv_split_symbol (NULL, x, mode, &addr, FALSE))
     return riscv_force_address (addr, mode);
 
-  /* Handle BASE + OFFSET using riscv_add_offset.  */
+  /* Handle BASE + OFFSET.  */
   if (GET_CODE (x) == PLUS && CONST_INT_P (XEXP (x, 1))
       && INTVAL (XEXP (x, 1)) != 0)
     {
@@ -1332,7 +1397,14 @@  riscv_legitimize_address (rtx x, rtx oldx ATTRIBUTE_UNUSED,
 
       if (!riscv_valid_base_register_p (base, mode, false))
 	base = copy_to_mode_reg (Pmode, base);
-      addr = riscv_add_offset (NULL, base, offset);
+      if (optimize_function_for_size_p (cfun)
+	  && (strcmp (current_pass->name, "shorten_memrefs") == 0)
+	  && mode == SImode)
+	/* Convert BASE + LARGE_OFFSET into NEW_BASE + SMALL_OFFSET to allow
+	   possible compressed load/store.  */
+	addr = riscv_shorten_lw_offset (base, offset);
+      else
+	addr = riscv_add_offset (NULL, base, offset);
       return riscv_force_address (addr, mode);
     }
 
@@ -1822,6 +1894,9 @@  riscv_address_cost (rtx addr, machine_mode mode,
 		    addr_space_t as ATTRIBUTE_UNUSED,
 		    bool speed ATTRIBUTE_UNUSED)
 {
+  if (!speed && riscv_mshorten_memrefs && mode == SImode
+      && !riscv_compressed_lw_address_p (addr))
+    return riscv_address_insns (addr, mode, false) + 1;
   return riscv_address_insns (addr, mode, false);
 }
 
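
The intent of the cost change above: when optimizing for size, a valid but
non-compressible SImode address is reported one unit more expensive than a
compressible one, nudging CSE and address selection toward reg + small_offset
forms.  A toy model (illustrative only; the real hook delegates to
riscv_address_insns and riscv_classify_address):

#include <stdbool.h>
#include <stdio.h>

/* Toy model: penalize SImode addresses that no compressed load/store can
   encode when optimizing for size.  */
static int
toy_address_cost (long offset, bool compressed_base_p, bool speed)
{
  int insns = 1;	/* Assume a valid reg + offset address.  */
  bool compressible = compressed_base_p && (offset & 3) == 0
		      && offset >= 0 && offset <= 124;
  return (!speed && !compressible) ? insns + 1 : insns;
}

int
main (void)
{
  printf ("a0 + 400 at -Os: %d\n", toy_address_cost (400, false, false)); /* 2 */
  printf ("x8 + 16  at -Os: %d\n", toy_address_cost (16, true, false));   /* 1 */
  return 0;
}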
@@ -4647,6 +4722,7 @@  riscv_option_override (void)
     error ("%<-mriscv-attribute%> RISC-V ELF attribute requires GNU as 2.32"
 	   " [%<-mriscv-attribute%>]");
 #endif
+
 }
 
 /* Implement TARGET_CONDITIONAL_REGISTER_USAGE.  */
@@ -4686,9 +4762,9 @@  riscv_conditional_register_usage (void)
 static int
 riscv_register_priority (int regno)
 {
-  /* Favor x8-x15/f8-f15 to improve the odds of RVC instruction selection.  */
-  if (TARGET_RVC && (IN_RANGE (regno, GP_REG_FIRST + 8, GP_REG_FIRST + 15)
-		     || IN_RANGE (regno, FP_REG_FIRST + 8, FP_REG_FIRST + 15)))
+  /* Favor compressed registers to improve the odds of RVC instruction
+     selection.  */
+  if (riscv_compressed_reg_p (regno))
     return 1;
 
   return 0;
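
For background on why the x8-x15/f8-f15 range keeps appearing: the RVC
load/store formats encode their registers in 3-bit fields, which can only
designate x8 + field.  This is a property of the ISA, not something
introduced by the patch; a one-line illustration:

/* Illustrative: the 3-bit rd'/rs1' fields of c.lw/c.sw select x8 + field,
   so only x8..x15 (f8..f15 for the FP forms) are reachable.  */
static inline int
rvc_reg_from_field (int field3)		/* FIELD3 is in [0, 7].  */
{
  return 8 + field3;			/* x8 .. x15 */
}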
diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index 246494663f6..36b84c7e8ef 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -876,6 +876,11 @@  while (0)
 
 #define SET_RATIO(speed) (CLEAR_RATIO (speed) - ((speed) ? 0 : 2))
 
+/* This is the maximum value that can be represented in a compressed load/store
+   offset (an unsigned 5-bit value scaled by 4).  */
+
+#define RISCV_MAX_COMPRESSED_LW_OFFSET 124
+
 #ifndef USED_FOR_TARGET
 extern const enum reg_class riscv_regno_to_class[];
 extern bool riscv_slow_unaligned_access_p;
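
A short note on where 124 comes from: c.lw/c.sw have a 5-bit unsigned offset
field whose value is scaled by 4, so the largest encodable byte offset is
31 * 4.  A compile-time check (illustrative):

/* Illustrative: 5-bit unsigned field, scaled by 4.  */
_Static_assert ((31 << 2) == 124, "RISCV_MAX_COMPRESSED_LW_OFFSET == 31 * 4");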
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index 7f0c35e9e9c..138bddd369f 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -87,6 +87,12 @@  msave-restore
 Target Report Mask(SAVE_RESTORE)
 Use smaller but slower prologue and epilogue code.
 
+mshorten-memrefs
+Target Bool Var(riscv_mshorten_memrefs) Init(1)
+Convert BASE + LARGE_OFFSET addresses to NEW_BASE + SMALL_OFFSET to allow more
+memory accesses to be generated as compressed instructions.  Currently targets
+32-bit integer load/stores.
+
 mcmodel=
 Target Report RejectNegative Joined Enum(code_model) Var(riscv_cmodel) Init(TARGET_DEFAULT_CMODEL)
 Specify the code model.
diff --git a/gcc/config/riscv/t-riscv b/gcc/config/riscv/t-riscv
index ece3a75d512..1e31a49c9c5 100644
--- a/gcc/config/riscv/t-riscv
+++ b/gcc/config/riscv/t-riscv
@@ -14,3 +14,8 @@  riscv-d.o: $(srcdir)/config/riscv/riscv-d.c
 	$(COMPILE) $<
 	$(POSTCOMPILE)
 
+riscv-shorten-memrefs.o: $(srcdir)/config/riscv/riscv-shorten-memrefs.c
+	$(COMPILE) $<
+	$(POSTCOMPILE)
+
+PASSES_EXTRA += $(srcdir)/config/riscv/riscv-passes.def
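
For readers unfamiliar with PASSES_EXTRA: files listed there are fed to the
pass manager alongside the generic passes.def.  The riscv-passes.def added by
this patch registers the new pass with a directive of roughly this shape
(sketch only; see the actual file earlier in the patch for the exact
placement):

INSERT_PASS_AFTER (pass_rtl_store_motion, 1, pass_shorten_memrefs);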
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 1407d019d14..6de3e032447 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1075,6 +1075,7 @@  See RS/6000 and PowerPC Options.
 -mpreferred-stack-boundary=@var{num} @gol
 -msmall-data-limit=@var{N-bytes} @gol
 -msave-restore  -mno-save-restore @gol
+-mshorten-memrefs  -mno-shorten-memrefs @gol
 -mstrict-align  -mno-strict-align @gol
 -mcmodel=medlow  -mcmodel=medany @gol
 -mexplicit-relocs  -mno-explicit-relocs @gol
@@ -24210,6 +24211,15 @@  Do or don't use smaller but slower prologue and epilogue code that uses
 library function calls.  The default is to use fast inline prologues and
 epilogues.
 
+@item -mshorten-memrefs
+@itemx -mno-shorten-memrefs
+@opindex mshorten-memrefs
+Do or do not attempt to make more use of compressed load/store instructions by
+replacing a load/store of 'base register + large offset' with a new load/store
+of 'new base + small offset'.  If the new base gets stored in a compressed
+register, then the new load/store can be compressed.  Currently targets 32-bit
+integer load/stores only.
+
 @item -mstrict-align
 @itemx -mno-strict-align
 @opindex mstrict-align
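
Finally, as a concrete illustration of the kind of source that benefits (a
hypothetical example, not taken from the patch or its testsuite): the four
field accesses below sit at byte offsets 400..412 from the incoming pointer,
past the 124-byte compressed-offset limit, and they share a single base:

/* Hypothetical example: p->a..p->d live at offsets 400, 404, 408 and 412.
   With -Os -mshorten-memrefs on an RVC target these can be rewritten as
   (p + 384) + 16/20/24/28, so each load may be emitted as c.lw if the new
   base lands in x8-x15; with -mno-shorten-memrefs each needs a full lw.  */
struct big
{
  int pad[100];
  int a, b, c, d;
};

int
sum (struct big *p)
{
  return p->a + p->b + p->c + p->d;
}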