From patchwork Fri May 6 23:00:41 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 619500 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from bombadil.infradead.org (bombadil.infradead.org [IPv6:2001:1868:205::9]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3r1nMH5WN3z9snm for ; Sat, 7 May 2016 09:01:15 +1000 (AEST) Received: from localhost ([127.0.0.1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.80.1 #2 (Red Hat Linux)) id 1ayokD-0005IU-Vn; Fri, 06 May 2016 23:01:13 +0000 Received: from mail.linuxfoundation.org ([140.211.169.12]) by bombadil.infradead.org with esmtps (Exim 4.80.1 #2 (Red Hat Linux)) id 1ayok5-0004c1-SV; Fri, 06 May 2016 23:01:06 +0000 Received: from akpm3.mtv.corp.google.com (unknown [104.132.1.65]) by mail.linuxfoundation.org (Postfix) with ESMTPSA id 3A071307; Fri, 6 May 2016 23:00:42 +0000 (UTC) Date: Fri, 6 May 2016 16:00:41 -0700 From: Andrew Morton To: zengzhaoxiu@163.com Subject: Re: [patch V4] lib: GCD: Use binary GCD algorithm instead of Euclidean Message-Id: <20160506160041.2b5e47757329c288efaed4fb@linux-foundation.org> In-Reply-To: <1462527763-15301-1-git-send-email-zengzhaoxiu@163.com> References: <1462527763-15301-1-git-send-email-zengzhaoxiu@163.com> X-Mailer: Sylpheed 3.4.1 (GTK+ 2.24.23; x86_64-pc-linux-gnu) Mime-Version: 1.0 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20160506_160106_013195_A5E9BD61 X-CRM114-Status: GOOD ( 13.67 ) X-Spam-Score: -4.2 (----) X-Spam-Report: SpamAssassin version 3.4.0 on bombadil.infradead.org summary: Content analysis details: (-4.2 points) pts rule name description ---- ---------------------- -------------------------------------------------- -2.3 RCVD_IN_DNSWL_MED RBL: Sender listed at http://www.dnswl.org/, medium trust [140.211.169.12 listed in list.dnswl.org] -0.0 SPF_HELO_PASS SPF: HELO matches SPF record -0.0 RCVD_IN_MSPIKE_H3 RBL: Good reputation (+3) [140.211.169.12 listed in wl.mailspike.net] -0.0 SPF_PASS SPF: sender matches SPF record -1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1% [score: 0.0000] -0.0 RCVD_IN_MSPIKE_WL Mailspike good senders X-BeenThere: linux-snps-arc@lists.infradead.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Linux on Synopsys ARC Processors List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: linux-mips@linux-mips.org, linux-m68k@vger.kernel.org, dalias@libc.org, linux-sh@vger.kernel.org, peterz@infradead.org, Heiko Carstens , James.Bottomley@HansenPartnership.com, linux@lists.openrisc.net, sparclinux@vger.kernel.org, sam@ravnborg.org, Lennox Wu , Jonas Bonn , Chen Liqin , Russell King , Yoshinori Sato , Helge Deller , "James E.J. Bottomley" , jjuran@gmail.com, geert@linux-m68k.org, Matt Turner , linux-snps-arc@lists.infradead.org, uclinux-h8-devel@lists.sourceforge.jp, James Hogan , linux-s390@vger.kernel.org, nios2-dev@lists.rocketboards.org, Ivan Kokshaysky , Zhaoxiu Zeng , linux@horizon.com, linux-metag@vger.kernel.org, linux-arm-kernel@lists.infradead.org, Richard Henderson , Michal Simek , linux-parisc@vger.kernel.org, Vineet Gupta , linux-kernel@vger.kernel.org, Ralf Baechle , linux-alpha@vger.kernel.org, Martin Schwidefsky , Ley Foon Tan , davem@davemloft.net Sender: "linux-snps-arc" Errors-To: linux-snps-arc-bounces+incoming=patchwork.ozlabs.org@lists.infradead.org On Fri, 6 May 2016 17:42:42 +0800 zengzhaoxiu@163.com wrote: > From: Zhaoxiu Zeng > > The binary GCD algorithm is based on the following facts: > 1. If a and b are all evens, then gcd(a,b) = 2 * gcd(a/2, b/2) > 2. If a is even and b is odd, then gcd(a,b) = gcd(a/2, b) > 3. If a and b are all odds, then gcd(a,b) = gcd((a-b)/2, b) = gcd((a+b)/2, b) > > Even on x86 machines with reasonable division hardware, the binary > algorithm runs about 25% faster (80% the execution time) than the > division-based Euclidian algorithm. > > On platforms like Alpha and ARMv6 where division is a function call to > emulation code, it's even more significant. > > There are two variants of the code here, depending on whether a > fast __ffs (find least significant set bit) instruction is available. > This allows the unpredictable branches in the bit-at-a-time shifting > loop to be eliminated. > > ... > > --- a/arch/alpha/Kconfig > +++ b/arch/alpha/Kconfig > @@ -27,6 +27,7 @@ config ALPHA > select MODULES_USE_ELF_RELA > select ODD_RT_SIGACTION > select OLD_SIGSUSPEND > + select CPU_NO_EFFICIENT_FFS if !ALPHA_EV67 > help argh. Please don't always put new items at end-of-list. That's the perfect way of maximizing the number of patch collisions. Insert it at a random position. avoiding the end (if the list isn't alpha-sorted, which it should be). > > ... > > --- a/arch/mips/include/asm/cpu-features.h > +++ b/arch/mips/include/asm/cpu-features.h > @@ -180,6 +180,16 @@ > #endif > #endif > > +/* __builtin_constant_p(cpu_has_mips_r) && cpu_has_mips_r */ > +#if !((defined(cpu_has_mips32r1) && cpu_has_mips32r1) || \ > + (defined(cpu_has_mips32r2) && cpu_has_mips32r2) || \ > + (defined(cpu_has_mips32r6) && cpu_has_mips32r6) || \ > + (defined(cpu_has_mips64r1) && cpu_has_mips64r1) || \ > + (defined(cpu_has_mips64r2) && cpu_has_mips64r2) || \ > + (defined(cpu_has_mips64r6) && cpu_has_mips64r6)) > +#define CONFIG_CPU_NO_EFFICIENT_FFS 1 > +#endif #defining a CONFIG_ variable is pretty rude - defining these is the role of the Kconfig system, not of header files macros. This was easy: --- a/arch/mips/include/asm/cpu-features.h~lib-gcd-use-binary-gcd-algorithm-instead-of-euclidean-fix +++ a/arch/mips/include/asm/cpu-features.h @@ -187,7 +187,7 @@ (defined(cpu_has_mips64r1) && cpu_has_mips64r1) || \ (defined(cpu_has_mips64r2) && cpu_has_mips64r2) || \ (defined(cpu_has_mips64r6) && cpu_has_mips64r6)) -#define CONFIG_CPU_NO_EFFICIENT_FFS 1 +#define CPU_NO_EFFICIENT_FFS 1 #endif #ifndef cpu_has_mips_1 --- a/lib/gcd.c~lib-gcd-use-binary-gcd-algorithm-instead-of-euclidean-fix +++ a/lib/gcd.c @@ -10,7 +10,7 @@ * has decent hardware division. */ -#if !defined(CONFIG_CPU_NO_EFFICIENT_FFS) +#if !defined(CONFIG_CPU_NO_EFFICIENT_FFS) && !defined(CPU_NO_EFFICIENT_FFS) /* If __ffs is available, the even/odd algorithm benchmarks slower. */ unsigned long gcd(unsigned long a, unsigned long b)