From: Charles Baylis
To: GCC Patches, Ramana Radhakrishnan, Richard Earnshaw
Date: Thu, 27 Feb 2014 16:38:40 +0000
Subject: Fwd: [PATCH, ARM] Improve 64 bit division performance

[resending as text/plain]

Hi

These patches optimise 64 bit division by removing the use of the
__gnu_[u]ldivmod_helper functions and hence avoiding the redundant
calculation of the remainder in those functions.

Bootstrapped, tested and checked for arm-unknown-linux-gnueabihf.

Benchmarked on Chromebook and Raspberry Pi using the attached divbench3.c.
Loop1 varies the divisor and loop2 varies the dividend.
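The first paragraph can be made concrete with a minimal C sketch of the two shapes of computation. It assumes, as the description implies, that the old helpers obtained the quotient first and then derived the remainder in a separate step. Both function names are hypothetical. The / operator stands in for the old division routine, and the bit-at-a-time loop only illustrates a routine that produces quotient and remainder together. It is not libgcc's actual __udivmoddi4 algorithm.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the old shape: get the quotient from a division
   routine (here the '/' operator stands in for it), then derive the
   remainder in a separate multiply-and-subtract step. */
static uint64_t helper_style_udivmod(uint64_t n, uint64_t d, uint64_t *rem)
{
    assert(d != 0);               /* the real entry points test for zero first */
    uint64_t q = n / d;
    *rem = n - d * q;             /* extra work once the quotient is known */
    return q;
}

/* Hypothetical model of a routine that, like __udivmoddi4, returns the
   quotient and stores the remainder through a pointer. This is a simple
   restoring (shift-and-subtract) division chosen for illustration: the
   remainder is simply what is left in r when the loop ends. */
static uint64_t combined_udivmod(uint64_t n, uint64_t d, uint64_t *rem)
{
    assert(d != 0);
    uint64_t q = 0, r = 0;
    for (int i = 63; i >= 0; i--) {
        /* Before this step r <= (n >> (i + 1)) < 2^63, so the shift
           below cannot overflow. */
        r = (r << 1) | ((n >> i) & 1);   /* bring down the next bit of n */
        if (r >= d) {
            r -= d;
            q |= (uint64_t)1 << i;       /* record a quotient bit */
        }
    }
    *rem = r;
    return q;
}
```

For example, combined_udivmod(100, 7, &r) returns 14 and stores 2 in r. The helper-style version reaches the same answer but pays for an extra multiply and subtract.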

Chromebook:
                    before      after
loop1 unsigned    3.474419   2.781364
loop2 unsigned    6.564871   6.166478
loop1 signed      4.127967   2.800974
loop2 signed      6.071490   6.129588

Raspberry Pi:
                    before      after
loop1 unsigned   28.881753  24.893846
loop2 unsigned   19.876385  19.537562
loop1 signed     32.074941  25.334509
loop2 signed     20.594860  19.615088

Any comments? OK for stage 1?

Patch 1:

2014-02-27  Charles Baylis

	* config/arm/bpabi.S (__aeabi_uldivmod): Perform division using call
	to __udivmoddi4.

Patch 2:

2014-02-27  Charles Baylis

	* config/arm/bpabi.S (__aeabi_ldivmod): Perform signed division via
	call to __udivmoddi4 and fixing up for negative operands.

From 975d9c624e77ee00476e6866250b0e2e31461fca Mon Sep 17 00:00:00 2001
From: Charles Baylis
Date: Tue, 25 Feb 2014 16:27:59 +0000
Subject: [PATCH 2/2] Optimise __aeabi_ldivmod

2014-02-25  Charles Baylis

	* config/arm/bpabi.S (__aeabi_ldivmod): Perform signed division
	using unsigned division via call to __udivmoddi4 and additional
	logic.
---
 libgcc/config/arm/bpabi.S | 74 +++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 69 insertions(+), 5 deletions(-)

diff --git a/libgcc/config/arm/bpabi.S b/libgcc/config/arm/bpabi.S
index e020af5..8b75a28 100644
--- a/libgcc/config/arm/bpabi.S
+++ b/libgcc/config/arm/bpabi.S
@@ -136,20 +136,84 @@ ARM_FUNC_START aeabi_ldivmod
 	cfi_start __aeabi_ldivmod, LSYM(Lend_aeabi_ldivmod)
 	test_div_by_zero signed
 
-	sub sp, sp, #8
-#if defined(__thumb2__)
-	mov ip, sp
-	push {ip, lr}
+#if defined(__thumb2__) && CAN_USE_LDRD
+	sub ip, sp, #8
+	strd ip,lr, [sp, #-16]!
 #else
+	sub sp, sp, #8
 	do_push {sp, lr}
 #endif
+	cmp xxh, #0
+	blt 1f
+	cmp yyh, #0
+	blt 2f
+
+98:	cfi_push 98b - __aeabi_ldivmod, 0xe, -0xc, 0x10
+	bl SYM(__udivmoddi4) __PLT__
+	ldr lr, [sp, #4]
+#if CAN_USE_LDRD
+	ldrd r2, r3, [sp, #8]
+	add sp, sp, #16
+#else
+	add sp, sp, #8
+	do_pop {r2, r3}
+#endif
+	RET
+1: /* xxh:xxl is negative */
+	rsbs xxl, xxl, #0
+	sbc xxh, xxh, xxh, lsl #1
+	cmp yyh, #0
+	blt 3f
+98:	cfi_push 98b - __aeabi_ldivmod, 0xe, -0xc, 0x10
+	bl SYM(__udivmoddi4) __PLT__
+	ldr lr, [sp, #4]
+#if CAN_USE_LDRD
+	ldrd r2, r3, [sp, #8]
+	add sp, sp, #16
+#else
+	add sp, sp, #8
+	do_pop {r2, r3}
+#endif
+	rsbs xxl, xxl, #0
+	sbc xxh, xxh, xxh, lsl #1
+	rsbs yyl, yyl, #0
+	sbc yyh, yyh, yyh, lsl #1
+	RET
+
+2: /* only yyh:yyl is negative */
+	rsbs yyl, yyl, #0
+	sbc yyh, yyh, yyh, lsl #1
 98:	cfi_push 98b - __aeabi_ldivmod, 0xe, -0xc, 0x10
-	bl SYM(__gnu_ldivmod_helper) __PLT__
+	bl SYM(__udivmoddi4) __PLT__
 	ldr lr, [sp, #4]
+#if CAN_USE_LDRD
+	ldrd r2, r3, [sp, #8]
+	add sp, sp, #16
+#else
 	add sp, sp, #8
 	do_pop {r2, r3}
+#endif
+	rsbs xxl, xxl, #0
+	sbc xxh, xxh, xxh, lsl #1
 	RET
+
+3: /* both xxh:xxl and yyh:yyl are negative */
+	rsbs yyl, yyl, #0
+	sbc yyh, yyh, yyh, lsl #1
 	cfi_end LSYM(Lend_aeabi_ldivmod)
+98:	cfi_push 98b - __aeabi_ldivmod, 0xe, -0xc, 0x10
+	bl SYM(__udivmoddi4) __PLT__
+	ldr lr, [sp, #4]
+#if CAN_USE_LDRD
+	ldrd r2, r3, [sp, #8]
+	add sp, sp, #16
+#else
+	add sp, sp, #8
+	do_pop {r2, r3}
+#endif
+	rsbs yyl, yyl, #0
+	sbc yyh, yyh, yyh, lsl #1
+	RET
 
 #endif /* L_aeabi_ldivmod */
 
-- 
1.8.3.2
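The signed entry point in the patch divides magnitudes with __udivmoddi4. It then fixes up the signs separately on each path: the fall-through, label 1, label 2 and label 3. The following C model of that control flow is a sketch, not the patch itself. It assumes that __aeabi_ldivmod should return a quotient truncated toward zero and a remainder with the sign of the dividend, which is what the fix-ups in the diff produce. The names neg64_halves, udivmod and ldivmod_model are hypothetical, and udivmod simply uses the C operators in place of __udivmoddi4.

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit negation on 32-bit halves, mirroring the pair used in the patch:
       rsbs lo, lo, #0          lo = 0 - lo; C is set only if lo was 0
       sbc  hi, hi, hi, lsl #1  hi = hi - 2*hi - !C = -hi - borrow
   It works on raw bits, so the magnitude of INT64_MIN comes out as 2^63. */
static uint64_t neg64_halves(uint64_t x)
{
    uint32_t lo = (uint32_t)x;
    uint32_t hi = (uint32_t)(x >> 32);
    uint32_t borrow = (lo != 0);                    /* the !C input to sbc */
    uint32_t new_lo = (uint32_t)(0u - lo);
    uint32_t new_hi = (uint32_t)(hi - (hi << 1) - borrow);
    return ((uint64_t)new_hi << 32) | new_lo;
}

/* Stand-in for __udivmoddi4 in this sketch: the quotient is returned and
   the remainder is stored through rem. */
static uint64_t udivmod(uint64_t n, uint64_t d, uint64_t *rem)
{
    assert(d != 0);               /* the real code runs test_div_by_zero first */
    *rem = n % d;
    return n / d;
}

/* Hypothetical C model of the patched __aeabi_ldivmod control flow. */
static int64_t ldivmod_model(int64_t x, int64_t y, int64_t *rem)
{
    uint64_t ux = (uint64_t)x, uy = (uint64_t)y, q, r;
    int xneg = x < 0, yneg = y < 0;

    if (xneg)
        ux = neg64_halves(ux);    /* label 1 */
    if (yneg)
        uy = neg64_halves(uy);    /* labels 2 and 3 */

    q = udivmod(ux, uy, &r);

    if (xneg != yneg)
        q = neg64_halves(q);      /* paths 1 and 2: negate the quotient */
    if (xneg)
        r = neg64_halves(r);      /* paths 1 and 3: negate the remainder */

    /* Converting back relies on two's-complement wraparound, which GCC
       defines for out-of-range unsigned-to-signed conversions. */
    *rem = (int64_t)r;
    return (int64_t)q;
}
```

For example, ldivmod_model(-7, 2, &r) returns -3 and sets r to -1, matching C's -7 / 2 and -7 % 2. Splitting the negation into halves follows the rsbs/sbc pair in the diff, where the shifted operand lets sbc compute hi - 2*hi, that is -hi, minus the borrow.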