From patchwork Thu May 22 10:08:34 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Charles Baylis <charles.baylis@linaro.org>
X-Patchwork-Id: 351417
Return-Path: 
 <gcc-patches-return-368236-incoming=patchwork.ozlabs.org@gcc.gnu.org>
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Received: from sourceware.org (server1.sourceware.org [209.132.180.131])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256
	bits)) (No client certificate requested)
	by ozlabs.org (Postfix) with ESMTPS id A5AE2140088
	for <incoming@patchwork.ozlabs.org>;
	Thu, 22 May 2014 20:08:55 +1000 (EST)
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender
	:mime-version:date:message-id:subject:from:to:cc:content-type;
	q=dns; s=default; b=UfN887XdcqoZta3cuin8u8Kh6aatzwZ+mysgZHOVyaD
	lUAuHnmEk130If+3y0oQ05nYsh9T+ssxuNlc9rDY5ovULqZXFWLZPasMHtndTo00
	BlA+PfQ9qLaFE/bmMhnPZ+8VZ7bBhMQt3e0Me/RXbOTObn4LimgWkrFycU/IPg9o
	=
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender
	:mime-version:date:message-id:subject:from:to:cc:content-type;
	s=default; bh=KOV1B7mqvtl62drWmBdjdGzYhaQ=; b=ZpVymiC4sC7NhrnpQ
	wB1+vLuLoW2+b8dLZYukhAREhk7W8FMhkOW7vVEnEqAo6nPHspsUYq9Fa/h9cCjJ
	hZ5itdugJShCKZMlj6Nc8JIt4vahOS++yqyYD4txJ2Ld/nh2uYuyFsi+hHQ/PV1e
	/ZBJdu5OxgAEpmJN6LYGL9+cec=
Received: (qmail 29221 invoked by alias); 22 May 2014 10:08:47 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Unsubscribe: 
 <mailto:gcc-patches-unsubscribe-incoming=patchwork.ozlabs.org@gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org
Received: (qmail 29196 invoked by uid 89); 22 May 2014 10:08:45 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-2.4 required=5.0 tests=AWL, BAYES_00,
	RCVD_IN_DNSWL_LOW, SPF_PASS autolearn=ham version=3.3.2
X-HELO: mail-lb0-f169.google.com
Received: from mail-lb0-f169.google.com (HELO mail-lb0-f169.google.com)
	(209.85.217.169) by sourceware.org
	(qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-SHA encrypted)
	ESMTPS; Thu, 22 May 2014 10:08:40 +0000
Received: by mail-lb0-f169.google.com with SMTP id s7so2472342lbd.0 for
	<gcc-patches@gcc.gnu.org>; Thu, 22 May 2014 03:08:35 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net;
	s=20130820;
	h=x-gm-message-state:mime-version:date:message-id:subject:from:to:cc
	:content-type; bh=yL0hPOk+OwF7FHXIN4/aLVBK3MCupIMVk8NYqh2Z87I=;
	b=gJ6EMT3JusxSOu/AE7bBvp8441bOS1y6BkQzc11XG9aFM6ZuKBB27mdEfhKe8l26ZT
	Stze//JPI28U3Eym4XzIfUC6iOSLVlXc8UlE6PFwHoBfT2fhaWHe/mdn02WAlRKZfgME
	uwilqbijKEb4pNgKeXgiEk6PMpNK46OhIgsxiB4ij0g5KUYNxhmEEj2C9ZHQHnzkXiAK
	Zb8qBGDPMKaP7NJ2znJZybROzEt4pa0M+RTC4L1oxtcTF0FgVhYylcvvGoP2kAtoRJlp
	D4+Y8tscIFVlf+Kkr95z94LDomfaRGJv52/lj4gwfuUQ9CF6vA4rhfsI9d8cG3SeEW39
	FfHQ==
X-Gm-Message-State: 
 ALoCoQlT2JM7urcIrk5rNQhF2UARiACAh919fNuDk9sG1M5BnMx6pGA7cCfdEN7d1g69VVInxLs7
MIME-Version: 1.0
X-Received: by 10.152.234.229 with SMTP id uh5mr1278755lac.56.1400753315684;
	Thu, 22 May 2014 03:08:35 -0700 (PDT)
Received: by 10.112.142.227 with HTTP; Thu, 22 May 2014 03:08:34 -0700 (PDT)
Date: Thu, 22 May 2014 11:08:34 +0100
Message-ID: 
 <CADnVucB50UQqXTqn+ODx0QQ-5rFY15tM_iSart4nWXbbrW2zGQ@mail.gmail.com>
Subject: [PATCH, ARM, v2] Improve 64 bit division performance
From: Charles Baylis <charles.baylis@linaro.org>
To: Richard Earnshaw <rearnsha@arm.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>,
	Ramana Radhakrishnan <Ramana.Radhakrishnan@arm.com>
X-IsSubscribed: yes

On 1 May 2014 16:41, Richard Earnshaw <rearnsha@arm.com> wrote:
> I think really, you've got three independent changes here:

Version 2 of this patch series is now a 9 patch series which addresses
most of the following. Exceptions discussed below.

> 1) Optimize the prologue/epilogue sequences when ldrd is available.
> 2) Replace the call to __gnu_ldivmod_helper with __udivmoddi4

I assume you mean __gnu_uldivmod_helper here, as __gnu_ldivmod_helper
performs signed division and can't be directly replaced with the
unsigned division performed by __udivmoddi4.

> 3) Optimize the code to __aeabi_ldivmod.

Converting to call __udivmoddi4, fixing up signedness of operands and
results and optimisation are all one change.

> Ideally, therefore, this is a three patch series, but it's then missing
> a few bits.
>
> 4) Step 2 can also be trivially applied to bpabi-v6m.S as well, since
> it's a direct swap of one function for another (unless I've misread the
> changes, I think the ABI of the two helper functions are the same).

For __aeabi_uldivmod this is true. For __aeabi_ldivmod this is not
trivial as the signedness fix-ups must be written.

> 5) Step 4 then makes __gnu_ldivmod_helper in bpabi.c a dead function
> which can be deleted.  This is good because currently pulling in either
> 64-bit division function causes both these helper functions to be pulled
> in and thus the whole of the 64-bit div-mod code for both signed and
> unsigned values.   That's particularly unfortunate for ARMv6m class
> devices as that's potentially a lot of redundant code.

Similarly, __gnu_uldivmod_helper not __gnu_ldivmod_helper.

I've included two patches which do the trivial steps for the unsigned case.

>
> Finally, I know this was the original code, but the complete lack of
> comments in this code made reviewing even the trivial parts a complete
> nightmare -- it took me half an hour before I remembered that
> __udivmoddi4 took three parameters, the third of which was on the stack:
> thus the messing around with sp/ip in the prologue wasn't just trivial
> padding but a necessary part of the function.  Please could you add, at
> least some short comments clarifying the register disposition on input
> and what that prologue code is up to...

Done.

> Finally, how was this code tested?

It has been built and "make check" has been run with no regressions on:
arm-unknown-linux-gnueabihf --with-mode=thumb --with-arch=armv7-a
arm-unknown-linux-gnueabihf --with-mode=arm --with-arch=armv7-a
arm-unknown-linux-gnueabi --with-mode=arm --with-arch=armv5te
arm-unknown-linux-gnueabi --with-mode=arm --with-arch=armv4t

I have also run a simple test harness which checks the result of
several 64 bit division operations where gcc has been built with the
above configurations.

I am not currently set up with a way to test v6M, so those parts aren't tested.

> Anyway, some additional comments below:
>
> Don't repeat the function name for multiple tweaks to the same function;
> as mentioned above, if these are really separate changes they should be
> in separate submissions.  Mixing unrelated changes just makes the
> reviewing step that much harder.

Done.


>> +     strd ip,lr, [sp, #-16]!
>
> Space after comma.

Done

> Also, since you've essentially rewritten the entire function, can you
> please also reformat them to follow the coding style of the rest of the
> file: namely "<tab>OP<tab>operands".

Done

>>  #else
>> +     sub sp, sp, #8
>>       do_push {sp, lr}
>>  #endif
>
> Please add a comment that the value at *sp is the address of the the
> slot for the remainder.

Done
>> +#if defined(__thumb2__) && CAN_USE_LDRD
>> +     sub ip, sp, #8
>> +     strd ip,lr, [sp, #-16]!
>
> Space after comma.

Done

>>  #else
>> +     sub sp, sp, #8
>>       do_push {sp, lr}
>>  #endif
>> +     cmp xxh, #0
>> +     blt 1f
>> +     cmp yyh, #0
>> +     blt 2f
>> +
>> +98:  cfi_push 98b - __aeabi_ldivmod, 0xe, -0xc, 0x10
>
> The CFI push should really precede your conditional tests, it relates to
> the do_push expression.

Done.

>> +     bl SYM(__udivmoddi4) __PLT__
>> +     ldr lr, [sp, #4]
>> +#if CAN_USE_LDRD
>> +     ldrd r2, r3, [sp, #8]
>> +     add sp, sp, #16
>> +#else
>> +     add sp, sp, #8
>> +     do_pop {r2, r3}
>> +#endif
>
> You're missing a CFI pop, which is needed when the values on the stack
> go out of scope.

The existing code doesn't do this. Since there are multiple exit
points from the optimised function the existing cfi_* macros aren't
sufficient (there is no cfi_save_state/cfi_restore_state), so I have
included a patch which uses the gas .cfi_* directives. This may be
interesting on non-DWARF or non-ELF platforms, if any are still
supported .

>> +     RET
>> +1: /* xxh:xxl is negative */
>> +     rsbs xxl, xxl, #0
>
> We're using unified syntax, so NEGS is preferable.

Done

>> +     sbc xxh, xxh, xxh, lsl #1
>
> Worthy of a comment, Thumb2 has no RSC instruction, so use X - 2X.

Done

>> +     cmp yyh, #0
>> +     blt 3f
>> +98:  cfi_push 98b - __aeabi_ldivmod, 0xe, -0xc, 0x10
>
> This CFI push looks wrong.  You've already pushed things earlier.  On
> the other hand, you should save the state before the CFI pop above, so
> that you can restore the state again for the next (ie this) block of code.

Done (see above)

>> +98:  cfi_push 98b - __aeabi_ldivmod, 0xe, -0xc, 0x10
>> +     bl SYM(__udivmoddi4) __PLT__
>> +     ldr lr, [sp, #4]
>> +#if CAN_USE_LDRD
>> +     ldrd r2, r3, [sp, #8]
>> +     add sp, sp, #16
>> +#else
>> +     add sp, sp, #8
>> +     do_pop {r2, r3}
>> +#endif
>> +     rsbs yyl, yyl, #0
>> +     sbc yyh, yyh, yyh, lsl #1
>> +     RET
>>
>>  #endif /* L_aeabi_ldivmod */
>>
>
> You use the LDRD vs do_pop sequence identically several times.  To avoid
> a lot of ifdefs, it might be worth considering a macro for this idiom to
> reduce the overall amount of conditionalized code.

Done.


The updated patch series is attached. Hopefully, patches 1 through 6
are now ready. Patches 7 through 9 can be dropped if necessary.


0001-Whitespace.patch

2014-05-22  Charles Baylis  <charles.baylis@linaro.org>

        * config/arm/bpabi.S (__aeabi_uldivmod): Fix whitespace.
        (__aeabi_ldivmod): Fix whitespace.


0002-Add-comments.patch

2014-05-22  Charles Baylis  <charles.baylis@linaro.org>

        * config/arm/bpabi.S (__aeabi_uldivmod, __aeabi_ldivmod): Add comment
        describing register usage on function entry and exit.


0003-Optimise-__aeabi_uldivmod-stack-manipulation.patch

2014-05-22  Charles Baylis  <charles.baylis@linaro.org>

        * config/arm/bpabi.S (__aeabi_uldivmod): Optimise stack pointer
        manipulation.


0004-Optimise-__aeabi_uldivmod.patch

2014-05-22  Charles Baylis  <charles.baylis@linaro.org>

        * config/arm/bpabi.S (__aeabi_uldivmod): Perform division using call
        to __udivmoddi4.


0005-Optimise-__aeabi_ldivmod-stack-manipulation.patch

2014-05-22  Charles Baylis  <charles.baylis@linaro.org>

        * config/arm/bpabi.S (__aeabi_ldivmod): Optimise stack manipulation.


0006-Optimise-__aeabi_ldivmod.patch

2014-05-22  Charles Baylis  <charles.baylis@linaro.org>

        * config/arm/bpabi.S (__aeabi_ldivmod): Perform division using
        __udivmoddi4, and fixups for negative operands.


0007-Fix-cfi-annotations.patch

2014-05-22  Charles Baylis  <charles.baylis@linaro.org>

        * config/arm/bpabi.S (__aeabi_ldivmod, __aeabi_uldivmod,
        push_for_divide, pop_for_divide): Use .cfi_* directives for DWARF
        annotations. Fix DWARF information.


0008-Use-__udivmoddi4-for-v6M-aeabi_uldivmod.patch

2014-05-22  Charles Baylis  <charles.baylis@linaro.org>

        * config/arm/bpabi-v6m.S (__aeabi_uldivmod): Perform division using
        __udivmoddi4.


0009-Remove-__gnu_uldivmod_helper.patch

2014-05-22  Charles Baylis  <charles.baylis@linaro.org>

        * config/arm/bpabi.c (__gnu_uldivmod_helper): Remove.

From 540a4016a32953cb37dbe2f7ea1ec3e1cf0bfdcf Mon Sep 17 00:00:00 2001
From: Charles Baylis <charles.baylis@linaro.org>
Date: Mon, 12 May 2014 18:58:04 +0100
Subject: [PATCH 9/9] Remove __gnu_uldivmod_helper

2014-05-12  Charles Baylis  <charles.baylis@linaro.org>

	* config/arm/bpabi.c (__gnu_uldivmod_helper): Remove.
---
 libgcc/config/arm/bpabi.c | 14 --------------
 1 file changed, 14 deletions(-)

diff --git a/libgcc/config/arm/bpabi.c b/libgcc/config/arm/bpabi.c
index 7b155cc..e90d044 100644
--- a/libgcc/config/arm/bpabi.c
+++ b/libgcc/config/arm/bpabi.c
@@ -26,9 +26,6 @@ extern long long __divdi3 (long long, long long);
 extern unsigned long long __udivdi3 (unsigned long long, 
 				     unsigned long long);
 extern long long __gnu_ldivmod_helper (long long, long long, long long *);
-extern unsigned long long __gnu_uldivmod_helper (unsigned long long, 
-						 unsigned long long, 
-						 unsigned long long *);
 
 
 long long
@@ -43,14 +40,3 @@ __gnu_ldivmod_helper (long long a,
   return quotient;
 }
 
-unsigned long long
-__gnu_uldivmod_helper (unsigned long long a, 
-		       unsigned long long b,
-		       unsigned long long *remainder)
-{
-  unsigned long long quotient;
-
-  quotient = __udivdi3 (a, b);
-  *remainder = a - b * quotient;
-  return quotient;
-}
-- 
1.9.1