From patchwork Fri Oct 25 17:39:09 2019
X-Patchwork-Submitter: Craig Blackmore
X-Patchwork-Id: 1184353
From: Craig Blackmore
To: gcc-patches@gcc.gnu.org
Cc: jimw@sifive.com, Ofer.Shinaar@wdc.com, Nidal.Faour@wdc.com,
 kito.cheng@gmail.com, law@redhat.com, Craig Blackmore
Subject: [PATCH v2 0/2] RISC-V: Allow more load/stores to be compressed
Date: Fri, 25 Oct 2019 18:39:09 +0100
Message-Id: <1572025151-22783-1-git-send-email-craig.blackmore@embecosm.com>

Hi Kito,

Thank you for taking the time to review my patch. I am posting an updated
patchset that takes your comments into account.

On 18/09/2019 11:01, Kito Cheng wrote:
> Hi Craig:
>
> Some general review comments:
> - Split the new pass into a new file.
> - Add a new option to enable/disable this pass.
> - Could you extend this patch to support lw/sw/ld/sd/flw/fsw/fld/fsd?
>   I think there is a lot of logic in common across the compressed
>   load/store instructions, and I would like to see them all supported
>   at once.

I agree the patch could be extended to the other load/store instructions,
but unfortunately I don't have the time to do this at the moment. Can the
lw/sw support be merged and the others added later?

> - Do you have experimental data about doing this after register
>   allocation/reload? I'd prefer doing such an optimization after RA,
>   because we can accurately estimate how many bytes we gain. I guess
>   the problem is that RA didn't assign a suitable src/dest reg or base
>   reg, and that is what increases code size?

I don't think it is feasible to move the pass after reload, because the
pass requires a new register to be allocated for the new base. Before
reload, we do not know whether the base reg will be a compressed register
or not.
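To make the constraint concrete, here is a small hand-written example
(hypothetical - the type and function names are made up, and this is not
code taken from the benchmarks) of the pattern the pass targets:

    struct state
    {
      int pad[64];   /* Puts a[] at byte offset 256, beyond the 0..124
                        offset range addressable by c.lw/c.sw.  */
      int a[4];
    };

    int
    sum4 (struct state *s)
    {
      /* Four loads off the same base register at offsets 256..268, so
         each one needs the 4-byte lw encoding.  The pass rewrites them
         as 'new base = s + 256' plus offsets 0..12; each load can then
         use the 2-byte c.lw encoding, provided register allocation
         places the new base and the destinations in compressed
         registers (x8-x15).  */
      return s->a[0] + s->a[1] + s->a[2] + s->a[3];
    }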
> On Fri, Sep 13, 2019 at 12:20 AM Craig Blackmore wrote:
>>
>> This patch aims to allow more load/store instructions to be compressed
>> by replacing a load/store of 'base register + large offset' with a new
>> load/store of 'new base + small offset'. If the new base gets stored in
>> a compressed register, then the new load/store can be compressed. Since
>> there is an overhead in creating the new base, this change is only
>> attempted when 'base register' is referenced in at least 4 load/stores
>> in a basic block.
>>
>> The optimization is implemented in a new RISC-V specific pass called
>> shorten_memrefs which is enabled for RVC targets. It has been developed
>> for the 32-bit lw/sw instructions but could also be extended to 64-bit
>> ld/sd in future.
>>
>> The patch saves 164 bytes (0.3%) on a proprietary application, which
>> shrinks from 59450 bytes to 59286 bytes when compiled for rv32imc bare
>> metal with -Os. On the Embench benchmark suite
>> (https://www.embench.org/) we see code size reductions of up to
>> 18 bytes (0.7%), and only two cases where code size increases slightly,
>> by 2 bytes each:
>>
>> Embench results (.text size in bytes, excluding .rodata)
>>
>> Benchmark      Without patch  With patch  Diff
>> aha-mont64              1052        1052     0
>> crc32                    232         232     0
>> cubic                   2446        2448     2
>> edn                     1454        1450    -4
>> huffbench               1642        1642     0
>> matmult-int              420         420     0
>> minver                  1056        1056     0
>> nbody                    714         714     0
>> nettle-aes              2888        2884    -4
>> nettle-sha256           5566        5564    -2
>> nsichneu               15052       15052     0
>> picojpeg                8078        8078     0
>> qrduino                 6140        6140     0
>> sglib-combined          2444        2444     0
>> slre                    2438        2420   -18
>> st                       880         880     0
>> statemate               3842        3842     0
>> ud                       702         702     0
>> wikisort                4278        4280     2
>> ----------------------------------------------
>> Total                  61324       61300   -24
>>
>> The patch has been tested on the following bare metal targets using
>> QEMU and there were no regressions:
>>
>>   rv32i
>>   rv32iac
>>   rv32im
>>   rv32imac
>>   rv32imafc
>>   rv64imac
>>   rv64imafdc
>>
>> We noticed that sched2 undoes some of the address changes made by this
>> optimization and consequently increases code size, so this patch adds a
>> check in sched-deps.c to avoid replacements that are expected to
>> increase code size when not optimizing for speed. Since this change
>> touches target-independent code, the patch has been bootstrapped and
>> tested on x86 with no regressions.
>>
>> diff --git a/gcc/sched-deps.c b/gcc/sched-deps.c
>> index 52db3cc..92a0893 100644
>> --- a/gcc/sched-deps.c
>> +++ b/gcc/sched-deps.c
>> @@ -38,6 +38,7 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "sched-int.h"
>>  #include "params.h"
>>  #include "cselib.h"
>> +#include "predict.h"
>>
>>  #ifdef INSN_SCHEDULING
>>
>> @@ -4707,6 +4708,15 @@ attempt_change (struct mem_inc_info *mii, rtx new_addr)
>>    rtx mem = *mii->mem_loc;
>>    rtx new_mem;
>>
>> +  /* When not optimizing for speed, avoid changes that are expected to
>> +     make code size larger.  */
>> +  addr_space_t as = MEM_ADDR_SPACE (mem);
>> +  bool speed = optimize_bb_for_speed_p (BLOCK_FOR_INSN (mii->mem_insn));
>> +  int old_cost = address_cost (XEXP (mem, 0), GET_MODE (mem), as, speed);
>> +  int new_cost = address_cost (new_addr, GET_MODE (mem), as, speed);
>> +  if (new_cost > old_cost && !speed)

> I think !speed should not be needed here - it would mean address_cost is
> incorrect if worse code is generated. But this change will affect all
> other targets, so I think it would be better to split it into another
> patch and CC the related reviewers.

I have removed !speed in the updated patch and CC'd Jeff Law. Jeff - please
could you review my change to sched-deps.c in patch 2/2?

Thanks,
Craig

---

Craig Blackmore (2):
  RISC-V: Add shorten_memrefs pass
  sched-deps.c: Avoid replacing address if it increases address cost

 gcc/config.gcc                           |   2 +-
 gcc/config/riscv/riscv-passes.def        |  20 +++
 gcc/config/riscv/riscv-protos.h          |   2 +
 gcc/config/riscv/riscv-shorten-memrefs.c | 188 +++++++++++++++++++++++
 gcc/config/riscv/riscv.c                 |  86 ++++++++++-
 gcc/config/riscv/riscv.h                 |   5 +
 gcc/config/riscv/riscv.opt               |   6 +
 gcc/config/riscv/t-riscv                 |   5 +
 gcc/doc/invoke.texi                      |  10 ++
 gcc/sched-deps.c                         |   9 ++
 10 files changed, 327 insertions(+), 6 deletions(-)
 create mode 100644 gcc/config/riscv/riscv-passes.def
 create mode 100644 gcc/config/riscv/riscv-shorten-memrefs.c
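P.S. For reference, here is what the guard in attempt_change reduces to
once !speed is dropped - a sketch reconstructed for this cover letter;
patch 2/2 has the authoritative version:

    addr_space_t as = MEM_ADDR_SPACE (mem);
    bool speed = optimize_bb_for_speed_p (BLOCK_FOR_INSN (mii->mem_insn));
    int old_cost = address_cost (XEXP (mem, 0), GET_MODE (mem), as, speed);
    int new_cost = address_cost (new_addr, GET_MODE (mem), as, speed);
    /* Reject the replacement whenever the target's cost model rates the
       new address as strictly more expensive than the old one.  */
    if (new_cost > old_cost)
      return NULL_RTX;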