From patchwork Thu May 22 10:08:34 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Charles Baylis X-Patchwork-Id: 351417 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id A5AE2140088 for ; Thu, 22 May 2014 20:08:55 +1000 (EST) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :mime-version:date:message-id:subject:from:to:cc:content-type; q=dns; s=default; b=UfN887XdcqoZta3cuin8u8Kh6aatzwZ+mysgZHOVyaD lUAuHnmEk130If+3y0oQ05nYsh9T+ssxuNlc9rDY5ovULqZXFWLZPasMHtndTo00 BlA+PfQ9qLaFE/bmMhnPZ+8VZ7bBhMQt3e0Me/RXbOTObn4LimgWkrFycU/IPg9o = DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :mime-version:date:message-id:subject:from:to:cc:content-type; s=default; bh=KOV1B7mqvtl62drWmBdjdGzYhaQ=; b=ZpVymiC4sC7NhrnpQ wB1+vLuLoW2+b8dLZYukhAREhk7W8FMhkOW7vVEnEqAo6nPHspsUYq9Fa/h9cCjJ hZ5itdugJShCKZMlj6Nc8JIt4vahOS++yqyYD4txJ2Ld/nh2uYuyFsi+hHQ/PV1e /ZBJdu5OxgAEpmJN6LYGL9+cec= Received: (qmail 29221 invoked by alias); 22 May 2014 10:08:47 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 29196 invoked by uid 89); 22 May 2014 10:08:45 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.4 required=5.0 tests=AWL, BAYES_00, RCVD_IN_DNSWL_LOW, SPF_PASS autolearn=ham version=3.3.2 X-HELO: mail-lb0-f169.google.com Received: from mail-lb0-f169.google.com (HELO mail-lb0-f169.google.com) (209.85.217.169) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-SHA encrypted) ESMTPS; Thu, 22 May 2014 10:08:40 +0000 Received: by mail-lb0-f169.google.com with SMTP id s7so2472342lbd.0 for ; Thu, 22 May 2014 03:08:35 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:date:message-id:subject:from:to:cc :content-type; bh=yL0hPOk+OwF7FHXIN4/aLVBK3MCupIMVk8NYqh2Z87I=; b=gJ6EMT3JusxSOu/AE7bBvp8441bOS1y6BkQzc11XG9aFM6ZuKBB27mdEfhKe8l26ZT Stze//JPI28U3Eym4XzIfUC6iOSLVlXc8UlE6PFwHoBfT2fhaWHe/mdn02WAlRKZfgME uwilqbijKEb4pNgKeXgiEk6PMpNK46OhIgsxiB4ij0g5KUYNxhmEEj2C9ZHQHnzkXiAK Zb8qBGDPMKaP7NJ2znJZybROzEt4pa0M+RTC4L1oxtcTF0FgVhYylcvvGoP2kAtoRJlp D4+Y8tscIFVlf+Kkr95z94LDomfaRGJv52/lj4gwfuUQ9CF6vA4rhfsI9d8cG3SeEW39 FfHQ== X-Gm-Message-State: ALoCoQlT2JM7urcIrk5rNQhF2UARiACAh919fNuDk9sG1M5BnMx6pGA7cCfdEN7d1g69VVInxLs7 MIME-Version: 1.0 X-Received: by 10.152.234.229 with SMTP id uh5mr1278755lac.56.1400753315684; Thu, 22 May 2014 03:08:35 -0700 (PDT) Received: by 10.112.142.227 with HTTP; Thu, 22 May 2014 03:08:34 -0700 (PDT) Date: Thu, 22 May 2014 11:08:34 +0100 Message-ID: Subject: [PATCH, ARM, v2] Improve 64 bit division performance From: Charles Baylis To: Richard Earnshaw Cc: GCC Patches , Ramana Radhakrishnan X-IsSubscribed: yes On 1 May 2014 16:41, Richard Earnshaw wrote: > I think really, you've got three independent changes here: Version 2 of this patch series is now a 9 patch series which addresses most of the following. Exceptions discussed below. > 1) Optimize the prologue/epilogue sequences when ldrd is available. > 2) Replace the call to __gnu_ldivmod_helper with __udivmoddi4 I assume you mean __gnu_uldivmod_helper here, as __gnu_ldivmod_helper performs signed division and can't be directly replaced with the unsigned division performed by __udivmoddi4. > 3) Optimize the code to __aeabi_ldivmod. Converting to call __udivmoddi4, fixing up signedness of operands and results and optimisation are all one change. > Ideally, therefore, this is a three patch series, but it's then missing > a few bits. > > 4) Step 2 can also be trivially applied to bpabi-v6m.S as well, since > it's a direct swap of one function for another (unless I've misread the > changes, I think the ABI of the two helper functions are the same). For __aeabi_uldivmod this is true. For __aeabi_ldivmod this is not trivial as the signedness fix-ups must be written. > 5) Step 4 then makes __gnu_ldivmod_helper in bpabi.c a dead function > which can be deleted. This is good because currently pulling in either > 64-bit division function causes both these helper functions to be pulled > in and thus the whole of the 64-bit div-mod code for both signed and > unsigned values. That's particularly unfortunate for ARMv6m class > devices as that's potentially a lot of redundant code. Similarly, __gnu_uldivmod_helper not __gnu_ldivmod_helper. I've included two patches which do the trivial steps for the unsigned case. > > Finally, I know this was the original code, but the complete lack of > comments in this code made reviewing even the trivial parts a complete > nightmare -- it took me half an hour before I remembered that > __udivmoddi4 took three parameters, the third of which was on the stack: > thus the messing around with sp/ip in the prologue wasn't just trivial > padding but a necessary part of the function. Please could you add, at > least some short comments clarifying the register disposition on input > and what that prologue code is up to... Done. > Finally, how was this code tested? It has been built and "make check" has been run with no regressions on: arm-unknown-linux-gnueabihf --with-mode=thumb --with-arch=armv7-a arm-unknown-linux-gnueabihf --with-mode=arm --with-arch=armv7-a arm-unknown-linux-gnueabi --with-mode=arm --with-arch=armv5te arm-unknown-linux-gnueabi --with-mode=arm --with-arch=armv4t I have also run a simple test harness which checks the result of several 64 bit division operations where gcc has been built with the above configurations. I am not currently set up with a way to test v6M, so those parts aren't tested. > Anyway, some additional comments below: > > Don't repeat the function name for multiple tweaks to the same function; > as mentioned above, if these are really separate changes they should be > in separate submissions. Mixing unrelated changes just makes the > reviewing step that much harder. Done. >> + strd ip,lr, [sp, #-16]! > > Space after comma. Done > Also, since you've essentially rewritten the entire function, can you > please also reformat them to follow the coding style of the rest of the > file: namely "OPoperands". Done >> #else >> + sub sp, sp, #8 >> do_push {sp, lr} >> #endif > > Please add a comment that the value at *sp is the address of the the > slot for the remainder. Done >> +#if defined(__thumb2__) && CAN_USE_LDRD >> + sub ip, sp, #8 >> + strd ip,lr, [sp, #-16]! > > Space after comma. Done >> #else >> + sub sp, sp, #8 >> do_push {sp, lr} >> #endif >> + cmp xxh, #0 >> + blt 1f >> + cmp yyh, #0 >> + blt 2f >> + >> +98: cfi_push 98b - __aeabi_ldivmod, 0xe, -0xc, 0x10 > > The CFI push should really precede your conditional tests, it relates to > the do_push expression. Done. >> + bl SYM(__udivmoddi4) __PLT__ >> + ldr lr, [sp, #4] >> +#if CAN_USE_LDRD >> + ldrd r2, r3, [sp, #8] >> + add sp, sp, #16 >> +#else >> + add sp, sp, #8 >> + do_pop {r2, r3} >> +#endif > > You're missing a CFI pop, which is needed when the values on the stack > go out of scope. The existing code doesn't do this. Since there are multiple exit points from the optimised function the existing cfi_* macros aren't sufficient (there is no cfi_save_state/cfi_restore_state), so I have included a patch which uses the gas .cfi_* directives. This may be interesting on non-DWARF or non-ELF platforms, if any are still supported . >> + RET >> +1: /* xxh:xxl is negative */ >> + rsbs xxl, xxl, #0 > > We're using unified syntax, so NEGS is preferable. Done >> + sbc xxh, xxh, xxh, lsl #1 > > Worthy of a comment, Thumb2 has no RSC instruction, so use X - 2X. Done >> + cmp yyh, #0 >> + blt 3f >> +98: cfi_push 98b - __aeabi_ldivmod, 0xe, -0xc, 0x10 > > This CFI push looks wrong. You've already pushed things earlier. On > the other hand, you should save the state before the CFI pop above, so > that you can restore the state again for the next (ie this) block of code. Done (see above) >> +98: cfi_push 98b - __aeabi_ldivmod, 0xe, -0xc, 0x10 >> + bl SYM(__udivmoddi4) __PLT__ >> + ldr lr, [sp, #4] >> +#if CAN_USE_LDRD >> + ldrd r2, r3, [sp, #8] >> + add sp, sp, #16 >> +#else >> + add sp, sp, #8 >> + do_pop {r2, r3} >> +#endif >> + rsbs yyl, yyl, #0 >> + sbc yyh, yyh, yyh, lsl #1 >> + RET >> >> #endif /* L_aeabi_ldivmod */ >> > > You use the LDRD vs do_pop sequence identically several times. To avoid > a lot of ifdefs, it might be worth considering a macro for this idiom to > reduce the overall amount of conditionalized code. Done. The updated patch series is attached. Hopefully, patches 1 through 6 are now ready. Patches 7 through 9 can be dropped if necessary. 0001-Whitespace.patch 2014-05-22 Charles Baylis * config/arm/bpabi.S (__aeabi_uldivmod): Fix whitespace. (__aeabi_ldivmod): Fix whitespace. 0002-Add-comments.patch 2014-05-22 Charles Baylis * config/arm/bpabi.S (__aeabi_uldivmod, __aeabi_ldivmod): Add comment describing register usage on function entry and exit. 0003-Optimise-__aeabi_uldivmod-stack-manipulation.patch 2014-05-22 Charles Baylis * config/arm/bpabi.S (__aeabi_uldivmod): Optimise stack pointer manipulation. 0004-Optimise-__aeabi_uldivmod.patch 2014-05-22 Charles Baylis * config/arm/bpabi.S (__aeabi_uldivmod): Perform division using call to __udivmoddi4. 0005-Optimise-__aeabi_ldivmod-stack-manipulation.patch 2014-05-22 Charles Baylis * config/arm/bpabi.S (__aeabi_ldivmod): Optimise stack manipulation. 0006-Optimise-__aeabi_ldivmod.patch 2014-05-22 Charles Baylis * config/arm/bpabi.S (__aeabi_ldivmod): Perform division using __udivmoddi4, and fixups for negative operands. 0007-Fix-cfi-annotations.patch 2014-05-22 Charles Baylis * config/arm/bpabi.S (__aeabi_ldivmod, __aeabi_uldivmod, push_for_divide, pop_for_divide): Use .cfi_* directives for DWARF annotations. Fix DWARF information. 0008-Use-__udivmoddi4-for-v6M-aeabi_uldivmod.patch 2014-05-22 Charles Baylis * config/arm/bpabi-v6m.S (__aeabi_uldivmod): Perform division using __udivmoddi4. 0009-Remove-__gnu_uldivmod_helper.patch 2014-05-22 Charles Baylis * config/arm/bpabi.c (__gnu_uldivmod_helper): Remove. From 540a4016a32953cb37dbe2f7ea1ec3e1cf0bfdcf Mon Sep 17 00:00:00 2001 From: Charles Baylis Date: Mon, 12 May 2014 18:58:04 +0100 Subject: [PATCH 9/9] Remove __gnu_uldivmod_helper 2014-05-12 Charles Baylis * config/arm/bpabi.c (__gnu_uldivmod_helper): Remove. --- libgcc/config/arm/bpabi.c | 14 -------------- 1 file changed, 14 deletions(-) diff --git a/libgcc/config/arm/bpabi.c b/libgcc/config/arm/bpabi.c index 7b155cc..e90d044 100644 --- a/libgcc/config/arm/bpabi.c +++ b/libgcc/config/arm/bpabi.c @@ -26,9 +26,6 @@ extern long long __divdi3 (long long, long long); extern unsigned long long __udivdi3 (unsigned long long, unsigned long long); extern long long __gnu_ldivmod_helper (long long, long long, long long *); -extern unsigned long long __gnu_uldivmod_helper (unsigned long long, - unsigned long long, - unsigned long long *); long long @@ -43,14 +40,3 @@ __gnu_ldivmod_helper (long long a, return quotient; } -unsigned long long -__gnu_uldivmod_helper (unsigned long long a, - unsigned long long b, - unsigned long long *remainder) -{ - unsigned long long quotient; - - quotient = __udivdi3 (a, b); - *remainder = a - b * quotient; - return quotient; -} -- 1.9.1