From patchwork Mon Jun 15 14:30:20 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Kyrylo Tkachov X-Patchwork-Id: 484319 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 2BDFF1402A2 for ; Tue, 16 Jun 2015 00:30:34 +1000 (AEST) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b=G//pUXuJ; dkim-atps=neutral DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :message-id:date:from:mime-version:to:subject:references :in-reply-to:content-type; q=dns; s=default; b=vbkIPaXflgkSmt4XM 8Y67ms8NTq0y3wQctl6zpXpNPVWF6xqIidjMlvRRLcf7koA6/C3gPtWAFhhO279f s/iTey7T3bhYVECXNEGwNFJa8O6CHykBuw9phKXH4Lhz+rcApBJUZ89YNFzUkNcn aV9FuU5TfE9EnhPPSAYBhMwS2Q= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :message-id:date:from:mime-version:to:subject:references :in-reply-to:content-type; s=default; bh=veur8+xUS7BBnCiPB1K0Ahw X5V4=; b=G//pUXuJqDpERldvI1sbYg8UbwFXcmG7ahrfPpTZ5gi1cit48uQRfqn WiryjOwXuXAT73C1NEX9QelRHyRSmqivBhgtF+h+HN+il1k7GUS0r1yP+Eb4w3qc 7zA5LoJGWxhxq9Qd/Hlo99S9VFTvarhfFclJ65fjiDQ9Unu4ztn4= Received: (qmail 50048 invoked by alias); 15 Jun 2015 14:30:27 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 50022 invoked by uid 89); 15 Jun 2015 14:30:26 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-0.6 required=5.0 tests=AWL, BAYES_00, SPF_PASS autolearn=ham version=3.3.2 X-HELO: eu-smtp-delivery-143.mimecast.com Received: from eu-smtp-delivery-143.mimecast.com (HELO eu-smtp-delivery-143.mimecast.com) (146.101.78.143) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Mon, 15 Jun 2015 14:30:24 +0000 Received: from cam-owa2.Emea.Arm.com (fw-tnat.cambridge.arm.com [217.140.96.140]) by uk-mta-15.uk.mimecast.lan; Mon, 15 Jun 2015 15:30:20 +0100 Received: from [10.2.207.50] ([10.1.2.79]) by cam-owa2.Emea.Arm.com with Microsoft SMTPSVC(6.0.3790.3959); Mon, 15 Jun 2015 15:30:20 +0100 Message-ID: <557EE17C.5000008@arm.com> Date: Mon, 15 Jun 2015 15:30:20 +0100 From: Kyrill Tkachov User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0 MIME-Version: 1.0 To: Mike Stump , gcc-patches Subject: Re: arm memcpy of aligned data References: <9FF08D9A-529E-4FDE-9193-4BD81EDD89E3@comcast.net> <55682C81.9040709@arm.com> <55683C58.20701@arm.com> In-Reply-To: <55683C58.20701@arm.com> X-MC-Unique: ZqE-_3sTRd2SckxMZKh-xw-1 X-IsSubscribed: yes On 29/05/15 11:15, Kyrill Tkachov wrote: > On 29/05/15 10:08, Kyrill Tkachov wrote: >> Hi Mike, >> >> On 28/05/15 22:15, Mike Stump wrote: >>> So, the arm memcpy code of aligned data isn’t as good as it can be. >>> >>> void *memcpy(void *dest, const void *src, unsigned int n); >>> >>> void foo(char *dst, int i) { >>> memcpy (dst, &i, sizeof (i)); >>> } >>> >>> generates horrible code, but, it we are willing to notice the src or the destination are aligned, we can do much better: >>> >>> $ ./cc1 -fschedule-fusion -fdump-tree-all-all -da -march=armv7ve -mcpu=cortex-m4 -fomit-frame-pointer -quiet -O2 /tmp/t.c -o t.s >>> $ cat t.s >>> [ … ] >>> foo: >>> @ args = 0, pretend = 0, frame = 4 >>> @ frame_needed = 0, uses_anonymous_args = 0 >>> @ link register save eliminated. >>> sub sp, sp, #4 >>> str r1, [r0] @ unaligned >>> add sp, sp, #4 >> I think there's something to do with cpu tuning here as well. > That being said, I do think this is a good idea. > I'll give it a test. The patch passes bootstrap and testing ok and I've seen it improve codegen in a few places in SPEC. I've added a testcase all marked up. Mike, I'll commit the attached patch in 24 hours unless somebody objects. Thanks, Kyrill 2015-06-15 Mike Stump * config/arm/arm.c (arm_block_move_unaligned_straight): Emit normal move instead of unaligned load when source or destination are appropriately aligned. 2015-06-15 Mike Stump Kyrylo Tkachov * gcc.target/arm/memcpy-aligned-1.c: New test. > > Kyrill > >> For the code you've given compiled with -O2 -mcpu=cortex-a53 I get: >> sub sp, sp, #8 >> mov r2, r0 >> add r3, sp, #8 >> str r1, [r3, #-4]! >> ldr r0, [r3] @ unaligned >> str r0, [r2] @ unaligned >> add sp, sp, #8 >> @ sp needed >> bx lr >> >> whereas for -O2 -mcpu=cortex-a57 I get the much better: >> sub sp, sp, #8 >> str r1, [r0] @ unaligned >> add sp, sp, #8 >> @ sp needed >> bx lr >> >> Kyrill >> >> >>> Index: gcc/config/arm/arm.c >>> =================================================================== >>> --- gcc/config/arm/arm.c (revision 223842) >>> +++ gcc/config/arm/arm.c (working copy) >>> @@ -14376,7 +14376,10 @@ arm_block_move_unaligned_straight (rtx d >>> srcoffset + j * UNITS_PER_WORD - src_autoinc); >>> mem = adjust_automodify_address (srcbase, SImode, addr, >>> srcoffset + j * UNITS_PER_WORD); >>> - emit_insn (gen_unaligned_loadsi (regs[j], mem)); >>> + if (src_aligned) >>> + emit_move_insn (regs[j], mem); >>> + else >>> + emit_insn (gen_unaligned_loadsi (regs[j], mem)); >>> } >>> srcoffset += words * UNITS_PER_WORD; >>> } >>> @@ -14395,7 +14398,10 @@ arm_block_move_unaligned_straight (rtx d >>> dstoffset + j * UNITS_PER_WORD - dst_autoinc); >>> mem = adjust_automodify_address (dstbase, SImode, addr, >>> dstoffset + j * UNITS_PER_WORD); >>> - emit_insn (gen_unaligned_storesi (mem, regs[j])); >>> + if (dst_aligned) >>> + emit_move_insn (mem, regs[j]); >>> + else >>> + emit_insn (gen_unaligned_storesi (mem, regs[j])); >>> } >>> dstoffset += words * UNITS_PER_WORD; >>> } >>> >>> >>> Ok? >>> >>> Can someone spin this through an arm test suite run for me, I was doing this by inspection and cross compile on a system with no arm bits. Bonus points if you can check it in with the test case above marked up as appropriate. >>> commit 77191f4224c8729d014a9150bd9364f95ff704b0 Author: Kyrylo Tkachov Date: Fri May 29 10:44:21 2015 +0100 [ARM] arm memcpy of aligned data diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 638d659..3a33c26 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -14283,7 +14283,10 @@ arm_block_move_unaligned_straight (rtx dstbase, rtx srcbase, srcoffset + j * UNITS_PER_WORD - src_autoinc); mem = adjust_automodify_address (srcbase, SImode, addr, srcoffset + j * UNITS_PER_WORD); - emit_insn (gen_unaligned_loadsi (regs[j], mem)); + if (src_aligned) + emit_move_insn (regs[j], mem); + else + emit_insn (gen_unaligned_loadsi (regs[j], mem)); } srcoffset += words * UNITS_PER_WORD; } @@ -14302,7 +14305,10 @@ arm_block_move_unaligned_straight (rtx dstbase, rtx srcbase, dstoffset + j * UNITS_PER_WORD - dst_autoinc); mem = adjust_automodify_address (dstbase, SImode, addr, dstoffset + j * UNITS_PER_WORD); - emit_insn (gen_unaligned_storesi (mem, regs[j])); + if (dst_aligned) + emit_move_insn (mem, regs[j]); + else + emit_insn (gen_unaligned_storesi (mem, regs[j])); } dstoffset += words * UNITS_PER_WORD; } diff --git a/gcc/testsuite/gcc.target/arm/memcpy-aligned-1.c b/gcc/testsuite/gcc.target/arm/memcpy-aligned-1.c new file mode 100644 index 0000000..852b391 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/memcpy-aligned-1.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -save-temps" } */ + +void *memcpy (void *dest, const void *src, unsigned int n); + +void foo (char *dst, int i) +{ + memcpy (dst, &i, sizeof (i)); +} + +/* { dg-final { scan-assembler-times "str\t" 1 } } */ +/* { dg-final { scan-assembler-not "ldr\t" } } */