[v2,0/2] RISC-V: Allow more load/stores to be compressed

Message ID 1572025151-22783-1-git-send-email-craig.blackmore@embecosm.com

Craig Blackmore Oct. 25, 2019, 5:39 p.m. UTC
Hi Kito,

Thank you for taking the time to review my patch. I am posting an updated
patchset taking into account your comments.

On 18/09/2019 11:01, Kito Cheng wrote:
> Hi Craig:
>
> Some general review comments:
> - Split new pass into new file.
> - Add new option to enable/disable this pass.
> - Could you extend this patch to support lw/sw/ld/sd/flw/fsw/fld/fsd?
>   I think there is a lot of common logic shared between the compressed
>   load/store instruction types, so I would like to see them all
>   supported at once.

I agree the patch could be extended to other load/store instructions but
unfortunately I don't have the time to do this at the moment. Can the lw/sw
support be merged and the others added later?

> - Do you have experimental data about doing that after register
> allocation/reload,

I don't think it is feasible to move the pass after reload, because the pass
requires a new register to be allocated for the new base.

>   I'd prefer doing such an optimization after RA, because then we can
>   accurately estimate how many bytes we gain.  I guess the code size
>   increase happens because RA did not assign suitable src/dest or base
>   registers?
>

Before reload, we do not know whether the base reg will be a compressed register
or not.
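To spell out the constraint (my illustration, not code from the patch): c.lw/c.sw can only address memory through one of the eight registers x8-x15, with a zero-extended offset that is a multiple of 4 in the range 0..124 (a 5-bit field scaled by 4). Whether the base register lands in that range is only known once RA has run. A minimal sketch of the reachability check, with helper names of my own choosing:

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch of the RVC addressing rules for c.lw/c.sw (helper names are
   mine, not from the patch).  */

static bool rvc_reg_p (unsigned regno)
{
  /* Only x8-x15 are encodable in the 3-bit compressed register field.  */
  return regno >= 8 && regno <= 15;
}

static bool rvc_lw_offset_p (int32_t offset)
{
  /* 5-bit unsigned field, scaled by 4: offsets 0, 4, ..., 124.  */
  return offset >= 0 && offset <= 124 && (offset & 3) == 0;
}

static bool rvc_lw_address_p (unsigned base_regno, int32_t offset)
{
  return rvc_reg_p (base_regno) && rvc_lw_offset_p (offset);
}
```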


>
> On Fri, Sep 13, 2019 at 12:20 AM Craig Blackmore
> <craig.blackmore@embecosm.com> wrote:
>>
>> This patch aims to allow more load/store instructions to be compressed by
>> replacing a load/store of 'base register + large offset' with a new load/store
>> of 'new base + small offset'. If the new base gets stored in a compressed
>> register, then the new load/store can be compressed. Since there is an overhead
>> in creating the new base, this change is only attempted when 'base register' is
>> referenced in at least 4 load/stores in a basic block.
>>
>> The optimization is implemented in a new RISC-V specific pass called
>> shorten_memrefs which is enabled for RVC targets. It has been developed for the
>> 32-bit lw/sw instructions but could also be extended to 64-bit ld/sd in future.
>>
>> The patch saves 164 bytes (0.3%) on a proprietary application (59450 bytes
>> compared to 59286 bytes) compiled for rv32imc bare metal with -Os. On the
>> Embench benchmark suite (https://www.embench.org/) we see code size reductions
>> of up to 18 bytes (0.7%) and only two cases where code size is increased
>> slightly, by 2 bytes each:
>>
>> Embench results (.text size in bytes, excluding .rodata)
>>
>> Benchmark       Without patch  With patch  Diff
>> aha-mont64      1052           1052        0
>> crc32           232            232         0
>> cubic           2446           2448        2
>> edn             1454           1450        -4
>> huffbench       1642           1642        0
>> matmult-int     420            420         0
>> minver          1056           1056        0
>> nbody           714            714         0
>> nettle-aes      2888           2884        -4
>> nettle-sha256   5566           5564        -2
>> nsichneu        15052          15052       0
>> picojpeg        8078           8078        0
>> qrduino         6140           6140        0
>> sglib-combined  2444           2444        0
>> slre            2438           2420        -18
>> st              880            880         0
>> statemate       3842           3842        0
>> ud              702            702         0
>> wikisort        4278           4280        2
>> -------------------------------------------------
>> Total           61324          61300       -24
>>
>> The patch has been tested on the following bare metal targets using QEMU
>> and there were no regressions:
>>
>>   rv32i
>>   rv32iac
>>   rv32im
>>   rv32imac
>>   rv32imafc
>>   rv64imac
>>   rv64imafdc
>>
>> We noticed that sched2 undoes some of the addresses generated by this
>> optimization and consequently increases code size, therefore this patch adds a
>> check in sched-deps.c to avoid changes that are expected to increase code size
>> when not optimizing for speed. Since this change touches target-independent
>> code, the patch has been bootstrapped and tested on x86 with no regressions.
>>


>> diff --git a/gcc/sched-deps.c b/gcc/sched-deps.c
>> index 52db3cc..92a0893 100644
>> --- a/gcc/sched-deps.c
>> +++ b/gcc/sched-deps.c
>> @@ -38,6 +38,7 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "sched-int.h"
>>  #include "params.h"
>>  #include "cselib.h"
>> +#include "predict.h"
>>
>>  #ifdef INSN_SCHEDULING
>>
>> @@ -4707,6 +4708,15 @@ attempt_change (struct mem_inc_info *mii, rtx new_addr)
>>    rtx mem = *mii->mem_loc;
>>    rtx new_mem;
>>
>> +  /* When not optimizing for speed, avoid changes that are expected to make code
>> +     size larger.  */
>> +  addr_space_t as = MEM_ADDR_SPACE (mem);
>> +  bool speed = optimize_bb_for_speed_p (BLOCK_FOR_INSN (mii->mem_insn));
>> +  int old_cost = address_cost (XEXP (mem, 0), GET_MODE (mem), as, speed);
>> +  int new_cost = address_cost (new_addr, GET_MODE (mem), as, speed);
>> +  if (new_cost > old_cost && !speed)
>
> I think !speed should not be needed here; needing it would mean
> address_cost is incorrect when it leads to worse code.  But this
> change will affect all other targets, so I think it would be better
> to split it into another patch and CC the related reviewers.
>
>

I have removed !speed in the updated patch and CC'd Jeff Law.
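To make the behavioural change concrete (a toy model of mine, not GCC code): the v1 guard only rejected a costlier replacement address when optimizing for size, whereas the revised guard rejects it unconditionally, trusting address_cost in both modes.

```c
#include <stdbool.h>

/* v1: a costlier new address was tolerated when optimizing for speed.  */
static bool v1_rejects (int old_cost, int new_cost, bool speed)
{
  return new_cost > old_cost && !speed;
}

/* v2: a costlier new address is always rejected.  */
static bool v2_rejects (int old_cost, int new_cost, bool speed)
{
  (void) speed;
  return new_cost > old_cost;
}
```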

Jeff - please could you review my change to sched-deps.c in patch 2/2?

Thanks,
Craig
--

Craig Blackmore (2):
  RISC-V: Add shorten_memrefs pass
  sched-deps.c: Avoid replacing address if it increases address cost

 gcc/config.gcc                           |   2 +-
 gcc/config/riscv/riscv-passes.def        |  20 +++
 gcc/config/riscv/riscv-protos.h          |   2 +
 gcc/config/riscv/riscv-shorten-memrefs.c | 188 +++++++++++++++++++++++
 gcc/config/riscv/riscv.c                 |  86 ++++++++++-
 gcc/config/riscv/riscv.h                 |   5 +
 gcc/config/riscv/riscv.opt               |   6 +
 gcc/config/riscv/t-riscv                 |   5 +
 gcc/doc/invoke.texi                      |  10 ++
 gcc/sched-deps.c                         |   9 ++
 10 files changed, 327 insertions(+), 6 deletions(-)
 create mode 100644 gcc/config/riscv/riscv-passes.def
 create mode 100644 gcc/config/riscv/riscv-shorten-memrefs.c