From patchwork Wed Jul 13 16:14:53 2016
X-Patchwork-Submitter: Kyrill Tkachov
X-Patchwork-Id: 647971
Message-ID: <578668FD.1020706@foss.arm.com>
Date: Wed, 13 Jul 2016 17:14:53 +0100
From: Kyrill Tkachov
To: GCC Patches
CC: Marcus Shawcroft, Richard Earnshaw, James Greenhalgh
Subject: [PATCH][AArch64] Allow multiple-of-8 immediate offsets for TImode LDP/STP

Hi all,

The most common way to load and store a TImode value on aarch64 is to
perform an LDP/STP of two X-registers.  This is the *movti_aarch64 pattern
in aarch64.md.  There is a bug in the logic in aarch64_classify_address
where it validates the offset in the address used to load a TImode value.
It passes TImode down to the aarch64_offset_7bit_signed_scaled_p check,
which rejects offsets that are not a multiple of the mode size of
TImode (16).
However, this is too conservative, as X-reg LDP/STP instructions accept
immediate offsets that are a multiple of 8.

Also, considering that the definition of aarch64_offset_7bit_signed_scaled_p
is:

  return (offset >= -64 * GET_MODE_SIZE (mode)
	  && offset < 64 * GET_MODE_SIZE (mode)
	  && offset % GET_MODE_SIZE (mode) == 0);

I think the range check may even be wrong for TImode, as this will accept
offsets in the range [-1024, 1024) (as long as they are a multiple of 16),
whereas X-reg LDP/STP instructions only accept offsets in the range
[-512, 512).  So, since the check is for an X-reg LDP/STP address, we
should be passing down DImode.

This patch does that and enables more aggressive generation of REG+IMM
addressing modes for 64-bit aligned TImode values, eliminating many
address calculation instructions.

For the testcase in the patch we currently generate:

bar:
	add	x1, x1, 8
	add	x0, x0, 8
	ldp	x2, x3, [x1]
	stp	x2, x3, [x0]
	ret

whereas with this patch we generate:

bar:
	ldp	x2, x3, [x1, 8]
	stp	x2, x3, [x0, 8]
	ret

Bootstrapped and tested on aarch64-none-linux-gnu.

Ok for trunk?

Thanks,
Kyrill

2016-07-13  Kyrylo Tkachov

	* config/aarch64/aarch64.c (aarch64_classify_address):
	Use DImode when performing aarch64_offset_7bit_signed_scaled_p
	check for TImode LDP/STP addresses.

2016-07-13  Kyrylo Tkachov

	* gcc.target/aarch64/ldp_stp_unaligned_1.c: New test.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index bea67f88b900be39b6f1ae002353b44c5a4a9f7d..8fd93a54c54ab86c6e600afba48fa441101b57c7 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4033,9 +4033,11 @@ aarch64_classify_address (struct aarch64_address_info *info,
 	     X,X: 7-bit signed scaled offset
 	     Q: 9-bit signed offset
 	     We conservatively require an offset representable in either mode.
-	   */
+	     When performing the check for pairs of X registers i.e. LDP/STP
+	     pass down DImode since that is the natural size of the LDP/STP
+	     instruction memory accesses.  */
 	  if (mode == TImode || mode == TFmode)
-	    return (aarch64_offset_7bit_signed_scaled_p (mode, offset)
+	    return (aarch64_offset_7bit_signed_scaled_p (DImode, offset)
 		    && offset_9bit_signed_unscaled_p (mode, offset));
 
 	  /* A 7bit offset check because OImode will emit a ldp/stp
diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_stp_unaligned_1.c b/gcc/testsuite/gcc.target/aarch64/ldp_stp_unaligned_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..a70f92100fb91bcfdcfd4af1cab6f58915038568
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ldp_stp_unaligned_1.c
@@ -0,0 +1,20 @@
+/* { dg-options "-O2" } */
+
+/* Check that we can use a REG + IMM addressing mode when moving an unaligned
+   TImode value to and from memory.  */
+
+struct foo
+{
+  long long b;
+  __int128 a;
+} __attribute__ ((packed));
+
+void
+bar (struct foo *p, struct foo *q)
+{
+  p->a = q->a;
+}
+
+/* { dg-final { scan-assembler-not "add\tx\[0-9\]+, x\[0-9\]+" } } */
+/* { dg-final { scan-assembler-times "ldp\tx\[0-9\]+, x\[0-9\], .*8" 1 } } */
+/* { dg-final { scan-assembler-times "stp\tx\[0-9\]+, x\[0-9\], .*8" 1 } } */
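
P.S. For illustration only (not part of the patch), the numeric effect of
passing DImode instead of TImode can be seen with a reduced, standalone
model of the check quoted above.  The helper below is my own sketch, with
GET_MODE_SIZE replaced by an explicit size argument, so its name and the
chosen offsets are just assumptions for demonstration, not code from
aarch64.c:

/* Illustration only: reduced model of the 7-bit signed scaled offset
   check, taking the mode size as a plain number.  */
#include <stdbool.h>
#include <stdio.h>

static bool
offset_7bit_signed_scaled_p (long long offset, long long mode_size)
{
  return (offset >= -64 * mode_size
	  && offset < 64 * mode_size
	  && offset % mode_size == 0);
}

int
main (void)
{
  /* Offset 8: rejected with the TImode size (16) because 8 % 16 != 0,
     accepted with the DImode size (8), matching what an X-reg LDP/STP
     can encode.  */
  printf ("offset 8:   size 16 -> %d, size 8 -> %d\n",
	  offset_7bit_signed_scaled_p (8, 16),	/* prints 0 */
	  offset_7bit_signed_scaled_p (8, 8));	/* prints 1 */

  /* Offset 528: accepted with the TImode size even though it lies outside
     the [-512, 512) range an X-reg LDP/STP can reach; rejected with the
     DImode size.  */
  printf ("offset 528: size 16 -> %d, size 8 -> %d\n",
	  offset_7bit_signed_scaled_p (528, 16),	/* prints 1 */
	  offset_7bit_signed_scaled_p (528, 8));	/* prints 0 */
  return 0;
}

Both cases line up with the ranges described above, which is why simply
passing DImode for the X-reg LDP/STP case is enough.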