From patchwork Mon Jun 15 14:30:20 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
X-Patchwork-Id: 484319
Return-Path: 
 <gcc-patches-return-400476-incoming=patchwork.ozlabs.org@gcc.gnu.org>
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Received: from sourceware.org (server1.sourceware.org [209.132.180.131])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256
	bits)) (No client certificate requested)
	by ozlabs.org (Postfix) with ESMTPS id 2BDFF1402A2
	for <incoming@patchwork.ozlabs.org>;
	Tue, 16 Jun 2015 00:30:34 +1000 (AEST)
Authentication-Results: ozlabs.org; dkim=pass (1024-bit key;
	unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org
	header.b=G//pUXuJ; dkim-atps=neutral
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender
	:message-id:date:from:mime-version:to:subject:references
	:in-reply-to:content-type; q=dns; s=default; b=vbkIPaXflgkSmt4XM
	8Y67ms8NTq0y3wQctl6zpXpNPVWF6xqIidjMlvRRLcf7koA6/C3gPtWAFhhO279f
	s/iTey7T3bhYVECXNEGwNFJa8O6CHykBuw9phKXH4Lhz+rcApBJUZ89YNFzUkNcn
	aV9FuU5TfE9EnhPPSAYBhMwS2Q=
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender
	:message-id:date:from:mime-version:to:subject:references
	:in-reply-to:content-type; s=default; bh=veur8+xUS7BBnCiPB1K0Ahw
	X5V4=; b=G//pUXuJqDpERldvI1sbYg8UbwFXcmG7ahrfPpTZ5gi1cit48uQRfqn
	WiryjOwXuXAT73C1NEX9QelRHyRSmqivBhgtF+h+HN+il1k7GUS0r1yP+Eb4w3qc
	7zA5LoJGWxhxq9Qd/Hlo99S9VFTvarhfFclJ65fjiDQ9Unu4ztn4=
Received: (qmail 50048 invoked by alias); 15 Jun 2015 14:30:27 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Unsubscribe: 
 <mailto:gcc-patches-unsubscribe-incoming=patchwork.ozlabs.org@gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org
Received: (qmail 50022 invoked by uid 89); 15 Jun 2015 14:30:26 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-0.6 required=5.0 tests=AWL, BAYES_00,
	SPF_PASS autolearn=ham version=3.3.2
X-HELO: eu-smtp-delivery-143.mimecast.com
Received: from eu-smtp-delivery-143.mimecast.com (HELO
	eu-smtp-delivery-143.mimecast.com) (146.101.78.143) by
	sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP;
	Mon, 15 Jun 2015 14:30:24 +0000
Received: from cam-owa2.Emea.Arm.com (fw-tnat.cambridge.arm.com
	[217.140.96.140]) by uk-mta-15.uk.mimecast.lan;
	Mon, 15 Jun 2015 15:30:20 +0100
Received: from [10.2.207.50] ([10.1.2.79]) by cam-owa2.Emea.Arm.com with
	Microsoft SMTPSVC(6.0.3790.3959); Mon, 15 Jun 2015 15:30:20 +0100
Message-ID: <557EE17C.5000008@arm.com>
Date: Mon, 15 Jun 2015 15:30:20 +0100
From: Kyrill Tkachov <kyrylo.tkachov@arm.com>
User-Agent: Mozilla/5.0 (X11; Linux x86_64;
	rv:31.0) Gecko/20100101 Thunderbird/31.2.0
MIME-Version: 1.0
To: Mike Stump <mikestump@comcast.net>, gcc-patches <gcc-patches@gcc.gnu.org>
Subject: Re: arm memcpy of aligned data
References: <9FF08D9A-529E-4FDE-9193-4BD81EDD89E3@comcast.net>
	<55682C81.9040709@arm.com> <55683C58.20701@arm.com>
In-Reply-To: <55683C58.20701@arm.com>
X-MC-Unique: ZqE-_3sTRd2SckxMZKh-xw-1
X-IsSubscribed: yes

On 29/05/15 11:15, Kyrill Tkachov wrote:
> On 29/05/15 10:08, Kyrill Tkachov wrote:
>> Hi Mike,
>>
>> On 28/05/15 22:15, Mike Stump wrote:
>>> So, the arm memcpy code of aligned data isn’t as good as it can be.
>>>
>>> void *memcpy(void *dest, const void *src, unsigned int n);
>>>
>>> void foo(char *dst, int i) {
>>>      memcpy (dst, &i, sizeof (i));
>>> }
>>>
>>> generates horrible code, but, it we are willing to notice the src or the destination are aligned, we can do much better:
>>>
>>> $ ./cc1 -fschedule-fusion -fdump-tree-all-all -da -march=armv7ve -mcpu=cortex-m4 -fomit-frame-pointer -quiet -O2 /tmp/t.c -o t.s
>>> $ cat t.s
>>> [ … ]
>>> foo:
>>> 	@ args = 0, pretend = 0, frame = 4
>>> 	@ frame_needed = 0, uses_anonymous_args = 0
>>> 	@ link register save eliminated.
>>> 	sub	sp, sp, #4
>>> 	str	r1, [r0]	@ unaligned
>>> 	add	sp, sp, #4
>> I think there's something to do with cpu tuning here as well.
> That being said, I do think this is a good idea.
> I'll give it a test.

The patch passes bootstrap and testing ok and I've seen it
improve codegen in a few places in SPEC.
I've added a testcase all marked up.

Mike, I'll commit the attached patch in 24 hours unless somebody objects.

Thanks,
Kyrill

2015-06-15  Mike Stump  <mikestump@comcast.net>

     * config/arm/arm.c (arm_block_move_unaligned_straight):
     Emit normal move instead of unaligned load when source or destination
     are appropriately aligned.

2015-06-15 Mike Stump  <mikestump@comcast.net>
            Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * gcc.target/arm/memcpy-aligned-1.c: New test.

>
> Kyrill
>
>> For the code you've given compiled with -O2 -mcpu=cortex-a53 I get:
>>            sub     sp, sp, #8
>>            mov     r2, r0
>>            add     r3, sp, #8
>>            str     r1, [r3, #-4]!
>>            ldr     r0, [r3]        @ unaligned
>>            str     r0, [r2]        @ unaligned
>>            add     sp, sp, #8
>>            @ sp needed
>>            bx      lr
>>
>> whereas for -O2 -mcpu=cortex-a57 I get the much better:
>>            sub     sp, sp, #8
>>            str     r1, [r0]        @ unaligned
>>            add     sp, sp, #8
>>            @ sp needed
>>            bx      lr
>>
>> Kyrill
>>
>>
>>> Index: gcc/config/arm/arm.c
>>> ===================================================================
>>> --- gcc/config/arm/arm.c	(revision 223842)
>>> +++ gcc/config/arm/arm.c	(working copy)
>>> @@ -14376,7 +14376,10 @@ arm_block_move_unaligned_straight (rtx d
>>>     				srcoffset + j * UNITS_PER_WORD - src_autoinc);
>>>     	  mem = adjust_automodify_address (srcbase, SImode, addr,
>>>     					   srcoffset + j * UNITS_PER_WORD);
>>> -	  emit_insn (gen_unaligned_loadsi (regs[j], mem));
>>> +	  if (src_aligned)
>>> +	    emit_move_insn (regs[j], mem);
>>> +	  else
>>> +	    emit_insn (gen_unaligned_loadsi (regs[j], mem));
>>>     	}
>>>           srcoffset += words * UNITS_PER_WORD;
>>>         }
>>> @@ -14395,7 +14398,10 @@ arm_block_move_unaligned_straight (rtx d
>>>     				dstoffset + j * UNITS_PER_WORD - dst_autoinc);
>>>     	  mem = adjust_automodify_address (dstbase, SImode, addr,
>>>     					   dstoffset + j * UNITS_PER_WORD);
>>> -	  emit_insn (gen_unaligned_storesi (mem, regs[j]));
>>> +	  if (dst_aligned)
>>> +	    emit_move_insn (mem, regs[j]);
>>> +	  else
>>> +	    emit_insn (gen_unaligned_storesi (mem, regs[j]));
>>>     	}
>>>           dstoffset += words * UNITS_PER_WORD;
>>>         }
>>>
>>>
>>> Ok?
>>>
>>> Can someone spin this through an arm test suite run for me, I was doing this by inspection and cross compile on a system with no arm bits.  Bonus points if you can check it in with the test case above marked up as appropriate.
>>>

commit 77191f4224c8729d014a9150bd9364f95ff704b0
Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
Date:   Fri May 29 10:44:21 2015 +0100

    [ARM] arm memcpy of aligned data

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 638d659..3a33c26 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -14283,7 +14283,10 @@ arm_block_move_unaligned_straight (rtx dstbase, rtx srcbase,
 				srcoffset + j * UNITS_PER_WORD - src_autoinc);
 	  mem = adjust_automodify_address (srcbase, SImode, addr,
 					   srcoffset + j * UNITS_PER_WORD);
-	  emit_insn (gen_unaligned_loadsi (regs[j], mem));
+	  if (src_aligned)
+	    emit_move_insn (regs[j], mem);
+	  else
+	    emit_insn (gen_unaligned_loadsi (regs[j], mem));
 	}
       srcoffset += words * UNITS_PER_WORD;
     }
@@ -14302,7 +14305,10 @@ arm_block_move_unaligned_straight (rtx dstbase, rtx srcbase,
 				dstoffset + j * UNITS_PER_WORD - dst_autoinc);
 	  mem = adjust_automodify_address (dstbase, SImode, addr,
 					   dstoffset + j * UNITS_PER_WORD);
-	  emit_insn (gen_unaligned_storesi (mem, regs[j]));
+	  if (dst_aligned)
+	    emit_move_insn (mem, regs[j]);
+	  else
+	    emit_insn (gen_unaligned_storesi (mem, regs[j]));
 	}
       dstoffset += words * UNITS_PER_WORD;
     }
diff --git a/gcc/testsuite/gcc.target/arm/memcpy-aligned-1.c b/gcc/testsuite/gcc.target/arm/memcpy-aligned-1.c
new file mode 100644
index 0000000..852b391
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/memcpy-aligned-1.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -save-temps" } */
+
+void *memcpy (void *dest, const void *src, unsigned int n);
+
+void foo (char *dst, int i)
+{
+  memcpy (dst, &i, sizeof (i));
+}
+
+/* { dg-final { scan-assembler-times "str\t" 1 } } */
+/* { dg-final { scan-assembler-not "ldr\t" } } */