From patchwork Fri Apr 21 08:39:29 2017
X-Patchwork-Submitter: Kyrill Tkachov
X-Patchwork-Id: 753205
Message-ID: <58F9C541.2090900@foss.arm.com>
Date: Fri, 21 Apr 2017 09:39:29 +0100
From: Kyrill Tkachov
To: GCC Patches
CC: Marcus Shawcroft, Richard Earnshaw, James Greenhalgh
Subject: [PATCH][AArch64] Add combine pattern for storing lane zero of a vector

Hi all,

Consider the code:

typedef long long v2di __attribute__ ((vector_size (16)));

void
store_laned (v2di x, long long *y)
{
  y[0] = x[1];
  y[3] = x[0];
}

AArch64 GCC will generate:

store_laned:
        umov    x1, v0.d[0]
        st1     {v0.d}[1], [x0]
        str     x1, [x0, 24]
        ret

It moves the zero lane into a core register and does a scalar store, when instead it could have used a scalar FP store that supports the required addressing mode:

store_laned:
        st1     {v0.d}[1], [x0]
        str     d0, [x0, 24]
        ret

Combine already tries to match this pattern:

Trying 10 -> 11:
Failed to match this instruction:
(set (mem:DI (plus:DI (reg/v/f:DI 76 [ y ])
            (const_int 24 [0x18])) [1 MEM[(long long int *)y_4(D) + 24B]+0 S8 A64])
    (vec_select:DI (reg/v:V2DI 75 [ x ])
        (parallel [
                (const_int 0 [0])
            ])))

but we don't match it in the backend. It is not hard to add, so this patch does that for all the relevant vector modes.
With this patch we generate the second sequence above, and in SPEC2006 we eliminate some address computation instructions, because we can use the more expressive STR instead of ST1, or we eliminate moves to the integer registers, because we can just store the D-reg directly.

Bootstrapped and tested on aarch64-none-linux-gnu.

Ok for trunk?

Thanks,
Kyrill

2017-04-21  Kyrylo Tkachov

    * config/aarch64/aarch64-simd.md (aarch64_store_lane0<mode>): New
    pattern.

2017-04-21  Kyrylo Tkachov

    * gcc.target/aarch64/store_lane0_str_1.c: New test.

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index f36665e27bb66e0f1fb42443ce7b506bd2bf6914..bf13f0753a856a13ae92ceeb44291df3dc379a13 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -153,6 +153,19 @@ (define_insn "*aarch64_simd_mov<mode>"
    (set_attr "length" "4,4,4,8,8,8,4")]
 )
 
+;; When storing lane zero we can use the normal STR and its more permissive
+;; addressing modes.
+
+(define_insn "aarch64_store_lane0<mode>"
+  [(set (match_operand:<VEL> 0 "memory_operand" "=m")
+	(vec_select:<VEL> (match_operand:VALL_F16 1 "register_operand" "w")
+			(parallel [(match_operand 2 "const_int_operand" "n")])))]
+  "TARGET_SIMD
+   && ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[2])) == 0"
+  "str\\t%<Vetype>1, %0"
+  [(set_attr "type" "neon_store1_1reg<q>")]
+)
+
 (define_insn "load_pair<mode>"
   [(set (match_operand:VD 0 "register_operand" "=w")
 	(match_operand:VD 1 "aarch64_mem_pair_operand" "Ump"))
diff --git a/gcc/testsuite/gcc.target/aarch64/store_lane0_str_1.c b/gcc/testsuite/gcc.target/aarch64/store_lane0_str_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..4464fec2c1f24c212be4fc6c94b509843fd0058e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/store_lane0_str_1.c
@@ -0,0 +1,54 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+typedef int v2si __attribute__ ((vector_size (8)));
+typedef float v2sf __attribute__ ((vector_size (8)));
+typedef short v4hi __attribute__ ((vector_size (8)));
+typedef __fp16 v4hf __attribute__ ((vector_size (8)));
+typedef char v8qi __attribute__ ((vector_size (8)));
+
+typedef int v4si __attribute__ ((vector_size (16)));
+typedef float v4sf __attribute__ ((vector_size (16)));
+typedef short v8hi __attribute__ ((vector_size (16)));
+typedef __fp16 v8hf __attribute__ ((vector_size (16)));
+typedef char v16qi __attribute__ ((vector_size (16)));
+typedef long long v2di __attribute__ ((vector_size (16)));
+typedef double v2df __attribute__ ((vector_size (16)));
+
+#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+#define LANE(N) (N - 1)
+#else
+#define LANE(N) 0
+#endif
+
+#define FUNC(T, E, N)				\
+void						\
+store_lane_##T (T x, E *y)			\
+{						\
+  y[0] = x[N - 1 - LANE (N)];			\
+  y[3] = x[LANE (N)];				\
+}
+
+FUNC (v2si, int, 2)
+FUNC (v2sf, float, 2)
+FUNC (v4hi, short, 4)
+FUNC (v4hf, __fp16, 4)
+FUNC (v8qi, char, 8)
+
+FUNC (v4si, int, 4)
+FUNC (v4sf, float, 4)
+FUNC (v8hi, short, 8)
+FUNC (v8hf, __fp16, 8)
+FUNC (v16qi, char, 16)
+FUNC (v2di, long long, 2)
+FUNC (v2df, double, 2)
+
+/* When storing lane zero of a vector we can use the scalar STR instruction
+   that supports more addressing modes.  */
+
+/* { dg-final { scan-assembler-times "str\ts\[0-9\]+" 4 } } */
+/* { dg-final { scan-assembler-times "str\tb\[0-9\]+" 2 } } */
+/* { dg-final { scan-assembler-times "str\th\[0-9\]+" 4 } } */
+/* { dg-final { scan-assembler-times "str\td\[0-9\]+" 2 } } */
+/* { dg-final { scan-assembler-not "umov" } } */
+/* { dg-final { scan-assembler-not "dup" } } */