From patchwork Mon Oct 25 10:15:35 2010
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: =?utf-8?b?RG91ZyBLd2FuICjpl5zmjK/lvrcp?=
 <dougkwan@google.com>
X-Patchwork-Id: 69083
Return-Path: 
 <gcc-patches-return-276310-incoming=patchwork.ozlabs.org@gcc.gnu.org>
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Received: from sourceware.org (server1.sourceware.org [209.132.180.131])
	by ozlabs.org (Postfix) with SMTP id D0B56B70A8
	for <incoming@patchwork.ozlabs.org>;
	Mon, 25 Oct 2010 21:15:53 +1100 (EST)
Received: (qmail 2628 invoked by alias); 25 Oct 2010 10:15:50 -0000
Received: (qmail 2620 invoked by uid 22791); 25 Oct 2010 10:15:48 -0000
X-SWARE-Spam-Status: No, hits=3.3 required=5.0	tests=AWL, BAYES_00,
	CHARSET_FARAWAY_HEADER, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU,
	MIME_CHARSET_FARAWAY, SPF_HELO_PASS, TW_CL, TW_LN, TW_LZ,
	TW_VS, TW_XF, TW_XS, TW_XX, TW_YY, T_RP_MATCHES_RCVD
X-Spam-Check-By: sourceware.org
Received: from smtp-out.google.com (HELO smtp-out.google.com)
	(216.239.44.51) by sourceware.org (qpsmtpd/0.43rc1) with
	ESMTP; Mon, 25 Oct 2010 10:15:42 +0000
Received: from hpaq5.eem.corp.google.com (hpaq5.eem.corp.google.com
	[172.25.149.5])	by smtp-out.google.com with ESMTP id
	o9PAFcsY025788	for <gcc-patches@gcc.gnu.org>;
	Mon, 25 Oct 2010 03:15:38 -0700
Received: from pxi12 (pxi12.prod.google.com [10.243.27.12])	by
	hpaq5.eem.corp.google.com with ESMTP id o9PAFas4024229	for
	<gcc-patches@gcc.gnu.org>; Mon, 25 Oct 2010 03:15:36 -0700
Received: by pxi12 with SMTP id 12so995704pxi.0 for
	<gcc-patches@gcc.gnu.org>; Mon, 25 Oct 2010 03:15:35 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.142.179.7 with SMTP id b7mr5319166wff.147.1288001735364;
	Mon, 25 Oct 2010 03:15:35 -0700 (PDT)
Received: by 10.142.238.12 with HTTP; Mon, 25 Oct 2010 03:15:35 -0700 (PDT)
In-Reply-To: <201010221920.37094.paul@codesourcery.com>
References: <AANLkTi=A-ZuBARrxXJaJC4hz=J-oL4_5zYjbJ1w2d_nR@mail.gmail.com>
	<201010221920.37094.paul@codesourcery.com>
Date: Mon, 25 Oct 2010 18:15:35 +0800
Message-ID: <AANLkTi=BQk4TdhxW6vUcGzk_YkjMZE8iEZM02cZ7GJSt@mail.gmail.com>
Subject: Re: [PATCH][ARM] Optimized 64-bit multiplication for THUMB-1
From: =?Big5?B?RG91ZyBLd2FuICjD9q62vHcp?= <dougkwan@google.com>
To: Paul Brook <paul@codesourcery.com>
Cc: gcc-patches <gcc-patches@gcc.gnu.org>, Nick Clifton <nickc@redhat.com>,
	Richard Earnshaw <rearnsha@arm.com>
X-System-Of-Record: true
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Unsubscribe: 
 <mailto:gcc-patches-unsubscribe-incoming=patchwork.ozlabs.org@gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org

Hi Paul,

   Thank you very much for your review and comments.  I have fixed the
push/pop and the use of the 2-argument form in 32-bit code.  I am not
quite sure what the problem with the __thumb2__ test is.  I built
arm-eabi-gcc with arches armv4, armv5te, armv7-a and no-arch, and all
builds were successful.  I did change the test so that ARM mode is
forced only if all of the following hold:

- ARM mode has the UMULL instruction
- we are compiling for THUMB-1
- interworking is enabled

Attached is the updated patch.
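
For anyone following the arithmetic, the two code paths can be modelled
roughly in C as below.  This is only an illustrative sketch and not part of
the patch: the names muldi3_fast_model, muldi3_slow_model and check_model
are made up for this example, and the slow model assumes a multiply that
keeps only the low 32 bits of the product, as on THUMB-1.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the fast path: one widening multiply (UMULL)
   for AL*BL, plus the two cross terms added into the high word.  */
static uint64_t muldi3_fast_model(uint64_t a, uint64_t b)
{
    uint32_t al = (uint32_t)a, ah = (uint32_t)(a >> 32);
    uint32_t bl = (uint32_t)b, bh = (uint32_t)(b >> 32);

    uint32_t cross = ah * bl + al * bh;   /* MUL + MLA, low 32 bits only */
    uint64_t low = (uint64_t)al * bl;     /* UMULL: full 64-bit product */
    return low + ((uint64_t)cross << 32);
}

/* Hypothetical model of the slow path: AL*BL is built from four
   16x16 partial products because no high-half multiply is available.  */
static uint64_t muldi3_slow_model(uint64_t a, uint64_t b)
{
    uint32_t al = (uint32_t)a, ah = (uint32_t)(a >> 32);
    uint32_t bl = (uint32_t)b, bh = (uint32_t)(b >> 32);

    uint32_t hi = ah * bl + al * bh;      /* cross terms, high word only */

    uint32_t al_lo = al & 0xffff, al_hi = al >> 16;
    uint32_t bl_lo = bl & 0xffff, bl_hi = bl >> 16;

    hi += al_hi * bl_hi;                  /* lands in result[63:32] */
    uint32_t lo = al_lo * bl_lo;          /* lands in result[31:0] */

    /* The two middle products land in result[47:16]; their 32-bit sum
       can overflow, and that carry belongs at result[48].  */
    uint32_t m1 = al_lo * bl_hi;
    uint32_t mid = m1 + al_hi * bl_lo;
    uint32_t carry = mid < m1;

    hi += carry << 16;
    hi += mid >> 16;
    uint32_t lo_sum = lo + (mid << 16);
    hi += lo_sum < lo;                    /* carry out of the low word */

    return ((uint64_t)hi << 32) | lo_sum;
}

/* Compare both models against the compiler's native 64-bit multiply.  */
static void check_model(uint64_t a, uint64_t b)
{
    assert(muldi3_fast_model(a, b) == a * b);
    assert(muldi3_slow_model(a, b) == a * b);
}
```

Calling check_model with a few operands whose middle products overflow
(for example both operands all-ones) exercises the carry handling, which is
the part of the slow path that is easiest to get wrong.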

-Doug


On 23 October 2010 at 2:20 AM, Paul Brook <paul@codesourcery.com> wrote:
>> +/* Force using ARM code if it is possible except for THUMB2 target. */
>> +#if defined(USE_FAST_MULDI3) && !defined(__thumb2__)
>> +     ARM_FUNC_START muldi3
>
> The !__thumb2__ test is wrong. I'm surprised this even compiles.
>
>>+      mul     xxh, yyl
>>...
>>+      add     xxh, yyh
>
> Please use the proper 3-argument form in 32-bit code.
>
>>+      push    {r4, r5, r6, r7}
>
> Older assemblers do not support push/pop in ARM mode.
> Use do_push/do_pop.
>
> Paul
>

Index: gcc/config/arm/lib1funcs.asm
===================================================================
--- gcc/config/arm/lib1funcs.asm	(revision 165462)
+++ gcc/config/arm/lib1funcs.asm	(working copy)
@@ -1274,6 +1274,90 @@ LSYM(Lover12):
 #endif
 	
 #endif /* L_dvmd_lnx */
+
+#ifdef L_muldi3
+
+/* ------------------------------------------------------------------------ */
+/* Dword multiplication operation.
+
+   The THUMB ISA lacks an instruction to compute the higher half of the
+   64-bit result from a 32-bit by 32-bit multiplication.  This makes 64-bit
+   multiplication difficult to implement efficiently.  The ARM ISAs since V3M
+   have UMULL and MLA which can be used to implement 64-bit multiplication
+   efficiently.  On a target that supports both ARM V3M+ and THUMB ISAs (but
+   not THUMB2), we want to use the ARM version of _muldi3 in the THUMB libgcc.
+
+   We do not need to use the ARM version for THUMB2 targets as the THUMB2
+   targets also support MLA and UMULL. */
+
+/* We cannot use the faster version in the following situations:
+
+   - ARM architectures older than V3M lack the UMULL instruction.
+   - The target is ARMV6M, which does not run ARM code.  */
+
+#undef USE_FAST_MULDI3
+#if (__ARM_ARCH__ > 3 || defined(__ARM_ARCH_3M__)) && !defined(__ARM_ARCH_6M__)
+#define USE_FAST_MULDI3
+#endif
+
+/* Force using ARM code if:
+   1. ARM mode has UMULL (i.e. USE_FAST_MULDI3 is defined) and
+   2. This is THUMB-1 mode and
+   3. INTERWORKING is enabled.  */
+
+#if defined(USE_FAST_MULDI3) \
+    && (defined(__thumb__) && !defined(__thumb2__)) \
+    && defined(__THUMB_INTERWORK__)
+	ARM_FUNC_START muldi3
+	ARM_FUNC_ALIAS aeabi_lmul muldi3
+#else
+	FUNC_START muldi3
+	FUNC_ALIAS aeabi_lmul muldi3
+#endif
+
+#if defined(USE_FAST_MULDI3)
+	/* Fast version for ARM with umull and THUMB2.  */
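+	mul	xxh, xxh, yyl		/* xxh = AH*BL */
+	mla	yyh, xxl, yyh, xxh	/* yyh = AL*BH + AH*BL */
+	umull	xxl, xxh, yyl, xxl	/* xxh:xxl = AL*BL (64-bit) */
+	add	xxh, xxh, yyh		/* xxh += AL*BH + AH*BL */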
+	RET
+#else
+	/* Slow version for both THUMB and older ARMs lacking umull. */
+	mul	xxh, yyl		/* xxh = AH*BL */
+	do_push	{r4, r5, r6, r7}
+	mul	yyh, xxl		/* yyh = AL*BH */
+	ldr	r4, .L_mask
+	lsr	r5, xxl, #16		/* r5 = (AL>>16) */
+	lsr	r6, yyl, #16		/* r6 = (BL>>16) */
+	lsr	r7, xxl, #16		/* r7 = (AL>>16) */
+	mul	r5, r6			/* r5 = (AL>>16) * (BL>>16) */
+	and	xxl, r4			/* xxl = AL & 0xffff */
+	and	yyl, r4			/* yyl = BL & 0xffff */
+	add	xxh, yyh		/* xxh = AH*BL+AL*BH */
+	mul	r6, xxl			/* r6 = (AL&0xffff) * (BL>>16) */
+	mul	r7, yyl			/* r7 = (AL>>16) * (BL&0xffff) */
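+	add	xxh, r5			/* xxh += (AL>>16) * (BL>>16) */
+	mul	xxl, yyl		/* xxl = (AL&0xffff) * (BL&0xffff) */
+	mov	r4, #0			/* r4 = 0, ready to capture the carry. */
+	adds	r6, r7			/* partial sum to result[47:16]. */
+	adc	r4, r4			/* carry to result[48]. */
+	lsr	yyh, r6, #16		/* yyh = partial sum >> 16 */
+	lsl	r4, r4, #16		/* r4 = carry << 16 */
+	lsl	yyl, r6, #16		/* yyl = partial sum << 16 */
+	add	xxh, r4			/* xxh += carry << 16 */
+	adds	xxl, yyl		/* xxl += partial sum << 16 */
+	adc	xxh, yyh		/* xxh += (partial sum >> 16) + C */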
+	do_pop	{r4, r5, r6, r7}
+	RET
+	.align	2
+.L_mask:
+	.word	65535
+#endif
+
+	FUNC_END muldi3
+#endif
+
 #ifdef L_clear_cache
 #if defined __ARM_EABI__ && defined __linux__
 @ EABI GNU/Linux call to cacheflush syscall.
Index: gcc/config/arm/t-strongarm-elf
===================================================================
--- gcc/config/arm/t-strongarm-elf	(revision 165462)
+++ gcc/config/arm/t-strongarm-elf	(working copy)
@@ -16,7 +16,8 @@
 # along with GCC; see the file COPYING3.  If not see
 # <http://www.gnu.org/licenses/>.
 
-LIB1ASMFUNCS += _udivsi3 _divsi3 _umodsi3 _modsi3 _dvmd_tls _bb_init_func _clzsi2 _clzdi2
+LIB1ASMFUNCS += _udivsi3 _divsi3 _umodsi3 _modsi3 _dvmd_tls _bb_init_func \
+	_clzsi2 _clzdi2 _muldi3
 
 # We want fine grained libraries, so use the new code to build the
 # floating point emulation libraries.
Index: gcc/config/arm/t-vxworks
===================================================================
--- gcc/config/arm/t-vxworks	(revision 165462)
+++ gcc/config/arm/t-vxworks	(working copy)
@@ -16,7 +16,8 @@
 # along with GCC; see the file COPYING3.  If not see
 # <http://www.gnu.org/licenses/>.
 
-LIB1ASMFUNCS += _udivsi3 _divsi3 _umodsi3 _modsi3 _dvmd_tls _bb_init_func _call_via_rX _interwork_call_via_rX _clzsi2 _clzdi2
+LIB1ASMFUNCS += _udivsi3 _divsi3 _umodsi3 _modsi3 _dvmd_tls _bb_init_func \
+	_call_via_rX _interwork_call_via_rX _clzsi2 _clzdi2 _muldi3
 
 # We want fine grained libraries, so use the new code to build the
 # floating point emulation libraries.
Index: gcc/config/arm/t-pe
===================================================================
--- gcc/config/arm/t-pe	(revision 165462)
+++ gcc/config/arm/t-pe	(working copy)
@@ -17,7 +17,7 @@
 # along with GCC; see the file COPYING3.  If not see
 # <http://www.gnu.org/licenses/>.
 
-LIB1ASMFUNCS += _udivsi3 _divsi3 _umodsi3 _modsi3 _dvmd_tls _call_via_rX _interwork_call_via_rX _clzsi2 _clzdi2
+LIB1ASMFUNCS += _udivsi3 _divsi3 _umodsi3 _modsi3 _dvmd_tls _call_via_rX _interwork_call_via_rX _clzsi2 _clzdi2 _muldi3
 
 # We want fine grained libraries, so use the new code to build the
 # floating point emulation libraries.
Index: gcc/config/arm/t-arm-elf
===================================================================
--- gcc/config/arm/t-arm-elf	(revision 165462)
+++ gcc/config/arm/t-arm-elf	(working copy)
@@ -29,7 +29,7 @@ LIB1ASMFUNCS += _udivsi3 _divsi3 _umodsi3 _modsi3
 	_arm_truncdfsf2 _arm_negsf2 _arm_addsubsf3 _arm_muldivsf3 \
 	_arm_cmpsf2 _arm_unordsf2 _arm_fixsfsi _arm_fixunssfsi \
 	_arm_floatdidf _arm_floatdisf _arm_floatundidf _arm_floatundisf \
-	_clzsi2 _clzdi2 
+	_clzsi2 _clzdi2 _muldi3
 
 MULTILIB_OPTIONS     = marm/mthumb
 MULTILIB_DIRNAMES    = arm thumb
Index: gcc/config/arm/t-linux
===================================================================
--- gcc/config/arm/t-linux	(revision 165462)
+++ gcc/config/arm/t-linux	(working copy)
@@ -23,7 +23,7 @@ TARGET_LIBGCC2_CFLAGS = -fomit-frame-pointer -fPIC
 
 LIB1ASMSRC = arm/lib1funcs.asm
 LIB1ASMFUNCS = _udivsi3 _divsi3 _umodsi3 _modsi3 _dvmd_lnx _clzsi2 _clzdi2 \
-	_arm_addsubdf3 _arm_addsubsf3
+	_arm_addsubdf3 _arm_addsubsf3 _muldi3
 
 # MULTILIB_OPTIONS = mhard-float/msoft-float
 # MULTILIB_DIRNAMES = hard-float soft-float
Index: gcc/config/arm/t-symbian
===================================================================
--- gcc/config/arm/t-symbian	(revision 165462)
+++ gcc/config/arm/t-symbian	(working copy)
@@ -16,7 +16,8 @@
 # along with GCC; see the file COPYING3.  If not see
 # <http://www.gnu.org/licenses/>.
 
-LIB1ASMFUNCS += _bb_init_func _call_via_rX _interwork_call_via_rX _clzsi2 _clzdi2
+LIB1ASMFUNCS += _bb_init_func _call_via_rX _interwork_call_via_rX _clzsi2 \
+	_clzdi2 _muldi3
 
 # These functions have __aeabi equivalents and will never be called by GCC.  
 # By putting them in LIB1ASMFUNCS, we avoid the standard libgcc2.c code being
Index: gcc/config/arm/t-wince-pe
===================================================================
--- gcc/config/arm/t-wince-pe	(revision 165462)
+++ gcc/config/arm/t-wince-pe	(working copy)
@@ -16,7 +16,8 @@
 # along with GCC; see the file COPYING3.  If not see
 # <http://www.gnu.org/licenses/>.
 
-LIB1ASMFUNCS += _udivsi3 _divsi3 _umodsi3 _modsi3 _dvmd_tls _call_via_rX _interwork_call_via_rX _clzsi2 _clzdi2
+LIB1ASMFUNCS += _udivsi3 _divsi3 _umodsi3 _modsi3 _dvmd_tls _call_via_rX \
+	_interwork_call_via_rX _clzsi2 _clzdi2 _muldi3
 
 # We want fine grained libraries, so use the new code to build the
 # floating point emulation libraries.