From patchwork Wed Nov 19 02:43:00 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Joey Ye <joey.ye@arm.com>
X-Patchwork-Id: 412233
Return-Path: 
 <gcc-patches-return-385005-incoming=patchwork.ozlabs.org@gcc.gnu.org>
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Received: from sourceware.org (server1.sourceware.org [209.132.180.131])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256
	bits)) (No client certificate requested)
	by ozlabs.org (Postfix) with ESMTPS id CC725140139
	for <incoming@patchwork.ozlabs.org>;
	Wed, 19 Nov 2014 13:43:31 +1100 (AEDT)
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender:from
	:to:subject:date:message-id:mime-version:content-type
	:content-transfer-encoding; q=dns; s=default; b=yHzCgwXfqddtb1M2
	9qwEVXNZMPYJV9T8MVwhQGCXGP3R/HJtZAqrwPdYkheRceBeaYkBiiHBhM0oSJLS
	siuKakLcbkP/DDCX92NhOZg/8oVGLQJ0GEKBAzFkJq3hLTo4QYeEYxAMYyGnqXav
	FjkRcGXAma8KtP5QKf3UuAYaRPA=
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender:from
	:to:subject:date:message-id:mime-version:content-type
	:content-transfer-encoding; s=default; bh=JZkZKlF2IPgaR8pflt42My
	+jCNU=; b=pa+6EX3ayfH531h3UJQ5qN1ZOfKBNZhtFw7+kDQBQSQ0m0BidV+kb9
	nSqBlKxDwEBVfOQRUSHAB55bLUd5t0jtrVkZYOLNQzSGq44HjMnldQGdjYrG69GP
	7ZZ2GZqbmaT+KXdKOevMPzJKidmIrfei5rTRep7czGflDkzGh8u6A=
Received: (qmail 14920 invoked by alias); 19 Nov 2014 02:43:23 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Unsubscribe: 
 <mailto:gcc-patches-unsubscribe-incoming=patchwork.ozlabs.org@gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org
Received: (qmail 14904 invoked by uid 89); 19 Nov 2014 02:43:21 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-1.7 required=5.0 tests=AWL, BAYES_00,
	SPF_PASS autolearn=ham version=3.3.2
X-HELO: service87.mimecast.com
Received: from service87.mimecast.com (HELO service87.mimecast.com)
	(91.220.42.44) by sourceware.org
	(qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP;
	Wed, 19 Nov 2014 02:43:19 +0000
Received: from cam-owa1.Emea.Arm.com (fw-tnat.cambridge.arm.com
	[217.140.96.21]) by service87.mimecast.com;
	Wed, 19 Nov 2014 02:43:17 +0000
Received: from SHAWIN205 ([10.1.255.212]) by cam-owa1.Emea.Arm.com with
	Microsoft SMTPSVC(6.0.3790.3959); Wed, 19 Nov 2014 02:43:15 +0000
From: "Joey Ye" <joey.ye@arm.com>
To: <gcc-patches@gcc.gnu.org>
Subject: [patch, arm] Minor optimization on thumb2 tail call
Date: Wed, 19 Nov 2014 10:43:00 +0800
Message-ID: <000001d003a2$8b5b8300$a2128900$@arm.com>
MIME-Version: 1.0
X-MC-Unique: 114111902431701901

Current thumb2 -Os generates suboptimal code for the following tail call case:

int f4(int b, int a, int c, int d);
int g(int a, int b, int c, int d)
{ return f4(b, a, c, d); }

arm-none-eabi-gcc -Os -mthumb -mcpu=cortex-m3 test.c

push {r4, lr}
mov r4, r1
mov r1, r0
mov r0, r4
pop {r4, lr}
b f4

There are two issues. First, saving and restoring lr is unnecessary, as there
is no return via pop pc. Second, even if the lr push/pop is avoided,
ldmia.w sp!, {r4} is still emitted because there is no pattern for popping a
single register, so the code size is still not optimal.

This patch fixes both issues and introduces a shared test case. CSiBE
thumb2 -Os shows an across-the-board code size reduction, except for one case
that regresses by 4 bytes. The case looks like:

void f ()
{
   if ()
     ...
   else if ()
     ...
   else g();
}

There are N=2 non-sibcall returns and S=1 sibcall return. Originally the
non-sibcall returns were just pop {r4, r5, pc}; now they become
  b.n  .Lreturn

.Lreturn:
  pop {r4, r5}
  bx lr

The saving on the sibcall return does not win back the regression on the
non-sibcall returns. In the general scenario, the N non-sibcall returns use
b.n to branch to the merged tail, and each of the S sibcalls saves 2 bytes by
not popping lr, which results in a regression of 4-2*S bytes. In the worst
scenario, each non-sibcall return has to use b.w to branch to the merged tail,
resulting in a regression of (N-S)*2 bytes. According to CSiBE the worst
scenario is rare, and the general scenario can regress by at most 2 bytes, so
I would rather not introduce additional complexity to handle the regression
case.
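
To make it easier to plug in other values of N and S, the two estimates can
be written down as small C helpers. This is only a sketch that takes the
formulas in the paragraph above at face value; the function names are
hypothetical and nothing here is measured:

```c
/* Estimated size change in bytes for a function with n non-sibcall
   returns and s sibcall returns.  A positive result means the code grew.
   Both formulas are copied from the estimates in the text above.  */

/* General scenario: non-sibcall returns reach the merged tail via b.n.  */
int
regression_general (int s)
{
  return 4 - 2 * s;
}

/* Worst scenario: every non-sibcall return needs b.w.  */
int
regression_worst (int n, int s)
{
  return (n - s) * 2;
}
```

With two sibcalls, for instance, the general-case estimate breaks even
(4 - 2*2 = 0).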

Make check cortex-m3: pass
thumb2 bootstrap (O2/Os): pass

    * config/arm/arm.c (arm_compute_save_reg_mask):
    Do not save lr in case of tail call.
    * config/arm/thumb2.md (*thumb2_pop_single): New pattern.

    * gcc.target/arm/thumb2-pop-single.c: New test.
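
The new test is not reproduced in this message. A minimal sketch of what such
a DejaGnu test could look like, reusing the example from the top of this
mail, is given here. The directives are ordinary GCC testsuite ones, but the
scan pattern and the noinline definition of f4 (added only so the file also
links standalone) are assumptions, not the contents of the actual
thumb2-pop-single.c:

```c
/* { dg-do compile } */
/* { dg-require-effective-target arm_thumb2_ok } */
/* { dg-options "-mthumb -Os" } */

/* Hypothetical body; the original example only declares f4.  noinline
   keeps the call in g from being inlined away.  */
__attribute__ ((noinline)) int
f4 (int b, int a, int c, int d)
{
  return ((b * 10 + a) * 10 + c) * 10 + d;
}

/* Swapping the first two arguments needs a scratch register (r4 in the
   listing above), so g has a callee-saved register to restore before
   the tail call.  */
int
g (int a, int b, int c, int d)
{
  return f4 (b, a, c, d);
}

/* Assumed check: the single-register restore should not be emitted as a
   load-multiple.  */
/* { dg-final { scan-assembler-not "ldmia" } } */
```

Outside the testsuite the dg- comments are inert, so the same file can be
compiled with the command line shown earlier to inspect the generated code.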

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 4f04707..20d0b9e 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -19190,6 +19190,7 @@ arm_compute_save_reg_mask (void)
       || (save_reg_mask
 	  && optimize_size
 	  && ARM_FUNC_TYPE (func_type) == ARM_FT_NORMAL
+	  && !crtl->tail_call_emit
 	  && !crtl->calls_eh_return))
     save_reg_mask |= 1 << LR_REGNUM;
 
diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
index 64acfea..29cfb17 100644
--- a/gcc/config/arm/thumb2.md
+++ b/gcc/config/arm/thumb2.md
@@ -267,6 +267,17 @@
    (set_attr "type" "multiple")]
 )
 
+;; Pop a single register as its size is preferred over a post-incremental load
+(define_insn "*thumb2_pop_single"
+  [(set (match_operand:SI 0 "low_register_operand" "=r")
+        (mem:SI (post_inc:SI (reg:SI SP_REGNUM))))]
+  "TARGET_THUMB2 && (reload_in_progress || reload_completed)"
+  "pop\t{%0}"
+  [(set_attr "type" "load1")
+   (set_attr "length" "2")
+   (set_attr "predicable" "yes")]
+)
+
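 ;; We have two alternatives here for memory loads (and similarly for stores)
 ;; to reflect the fact that the permissible constant pool ranges differ
 ;; between ldr instructions taking low regs and ldr instructions taking high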