From patchwork Tue Jun 6 08:35:10 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kyrill Tkachov
X-Patchwork-Id: 771699
Message-ID: <5936693E.2050005@foss.arm.com>
Date: Tue, 06 Jun 2017 09:35:10 +0100
From: Kyrill Tkachov
To: GCC Patches
Subject: [PATCH] vec_merge + vec_duplicate + vec_concat simplification

Hi all,

Another vec_merge simplification that's missing is transforming:

(vec_merge (vec_duplicate x) (vec_concat (y) (z)) (const_int N))

into

(vec_concat x z) if N == 1 (0b01) or
(vec_concat y x) if N == 2 (0b10)

For the testcase in this patch on aarch64 this allows us to try matching
during combine the pattern:

(set (reg:V2DI 78 [ x ])
    (vec_concat:V2DI
        (mem:DI (reg/v/f:DI 76 [ y ]) [1 *y_4(D)+0 S8 A64])
        (mem:DI (plus:DI (reg/v/f:DI 76 [ y ])
                (const_int 8 [0x8])) [1 MEM[(long long int *)y_4(D) + 8B]+0 S8 A64])))

rather than the more complex:

(set (reg:V2DI 78 [ x ])
    (vec_merge:V2DI (vec_duplicate:V2DI (mem:DI (plus:DI (reg/v/f:DI 76 [ y ])
                    (const_int 8 [0x8])) [1 MEM[(long long int *)y_4(D) + 8B]+0 S8 A64]))
        (vec_duplicate:V2DI (mem:DI (reg/v/f:DI 76 [ y ]) [1 *y_4(D)+0 S8 A64]))
        (const_int 2 [0x2])))

We don't actually have an aarch64 pattern for the simplified version above,
but it's a simple enough form to add, so this patch adds such a pattern that
performs a concatenated load of two 64-bit vectors in adjacent memory
locations as a single Q-register LDR. The new aarch64 pattern is needed to
demonstrate the effectiveness of the simplify-rtx change, so I've kept them
together as one patch.

Now for the testcase in the patch we can generate:

construct_lanedi:
        ldr     q0, [x0]
        ret

construct_lanedf:
        ldr     q0, [x0]
        ret

instead of:

construct_lanedi:
        ld1r    {v0.2d}, [x0]
        ldr     x0, [x0, 8]
        ins     v0.d[1], x0
        ret

construct_lanedf:
        ld1r    {v0.2d}, [x0]
        ldr     d1, [x0, 8]
        ins     v0.d[1], v1.d[0]
        ret

The new memory constraint Utq is needed because we need to allow only the
Q-register addressing modes but the MEM expressions in the RTL pattern have
64-bit vector modes, and if we don't constrain them they will allow the
D-register addressing modes during register allocation/address mode
selection, which will produce invalid assembly.

Bootstrapped and tested on aarch64-none-linux-gnu.
Ok for trunk?

Thanks,
Kyrill

2017-06-06  Kyrylo Tkachov

    * simplify-rtx.c (simplify_ternary_operation, VEC_MERGE):
    Simplify vec_merge of vec_duplicate and vec_concat.
    * config/aarch64/constraints.md (Utq): New constraint.
    * config/aarch64/aarch64-simd.md (load_pair_lanes): New
    define_insn.

2017-06-06  Kyrylo Tkachov

    * gcc.target/aarch64/load_v2vec_lanes_1.c: New test.

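To make the selector arithmetic in the cover letter concrete, here is a minimal C sketch that models the three RTL codes on a two-element vector and checks that the proposed rewrite gives the same result as the original vec_merge for N == 1 and N == 2. It is only a toy model: the `vec2` struct and the helpers `simplified`, `vec2_equal` and `check_simplification` are hypothetical names introduced for illustration and are not part of GCC or of this patch. It assumes the usual RTL convention that bit i of the vec_merge selector picks element i from the first operand.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of a two-element vector; this is not GCC's RTL representation.  */
typedef struct { int64_t elt[2]; } vec2;

/* (vec_duplicate x): both elements are x.  */
static vec2 vec_duplicate (int64_t x)
{
  vec2 v = { { x, x } };
  return v;
}

/* (vec_concat y z): element 0 is y, element 1 is z.  */
static vec2 vec_concat (int64_t y, int64_t z)
{
  vec2 v = { { y, z } };
  return v;
}

/* (vec_merge a b sel): element i comes from a if bit i of sel is set,
   otherwise from b.  */
static vec2 vec_merge (vec2 a, vec2 b, unsigned sel)
{
  vec2 v;
  for (int i = 0; i < 2; i++)
    v.elt[i] = ((sel >> i) & 1) ? a.elt[i] : b.elt[i];
  return v;
}

/* Hypothetical helper: the simplified form from the cover letter.
   Only meaningful for sel == 1 or sel == 2.  */
static vec2 simplified (int64_t x, int64_t y, int64_t z, unsigned sel)
{
  return sel == 1 ? vec_concat (x, z) : vec_concat (y, x);
}

static int vec2_equal (vec2 a, vec2 b)
{
  return a.elt[0] == b.elt[0] && a.elt[1] == b.elt[1];
}

/* Check both selector values against the unsimplified vec_merge.  */
static void check_simplification (void)
{
  for (unsigned sel = 1; sel <= 2; sel++)
    {
      vec2 merged = vec_merge (vec_duplicate (7), vec_concat (1, 2), sel);
      assert (vec2_equal (merged, simplified (7, 1, 2, sel)));
    }
}
```

Calling `check_simplification ()` from a test driver exercises both cases. Selector values 0 and 3 are deliberately left out, since they select all of one operand and are outside the range this simplification handles.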
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 77a3a7d6534e5fd3575e33d5a7c607713abd614b..b78affe9b06ffc973888822a4fcf1ec8e80ecdf6 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -2803,6 +2803,20 @@ (define_insn "aarch64_get_lane"
   [(set_attr "type" "neon_to_gp, neon_dup, neon_store1_one_lane")]
 )
 
+(define_insn "load_pair_lanes"
+  [(set (match_operand: 0 "register_operand" "=w")
+        (vec_concat:
+           (match_operand:VDC 1 "memory_operand" "Utq")
+           (match_operand:VDC 2 "memory_operand" "m")))]
+  "TARGET_SIMD && !STRICT_ALIGNMENT
+   && rtx_equal_p (XEXP (operands[2], 0),
+                   plus_constant (Pmode,
+                                  XEXP (operands[1], 0),
+                                  GET_MODE_SIZE (mode)))"
+  "ldr\\t%q0, %1"
+  [(set_attr "type" "neon_load1_1reg_q")]
+)
+
 ;; In this insn, operand 1 should be low, and operand 2 the high part of the
 ;; dest vector.
 
diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md
index b8293376fde7e03c4cfc2a6ad6268201f487eb92..ab607b9f7488e903a14fe93e88d4c4e1fad762b3 100644
--- a/gcc/config/aarch64/constraints.md
+++ b/gcc/config/aarch64/constraints.md
@@ -161,6 +161,13 @@ (define_memory_constraint "Utv"
   (and (match_code "mem")
        (match_test "aarch64_simd_mem_operand_p (op)")))
 
+(define_memory_constraint "Utq"
+  "@internal
+   An address valid for loading or storing a 128-bit AdvSIMD register"
+  (and (match_code "mem")
+       (match_test "aarch64_legitimate_address_p (V2DImode, XEXP (op, 0),
+                                                  MEM, 1)")))
+
 (define_constraint "Ufc"
   "A floating point constant which can be used with an\
    FMOV immediate operation."
diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index 42824b6c61af37f6b005de75bd1e5ebe7522bdba..a4aebae68afc14a69870e1fd280d28251aa5f398 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -5701,6 +5701,25 @@ simplify_ternary_operation (enum rtx_code code, machine_mode mode,
                 std::swap (newop0, newop1);
               return simplify_gen_binary (VEC_CONCAT, mode, newop0, newop1);
             }
+          /* Replace (vec_merge (vec_duplicate x) (vec_concat (y) (z)) (const_int N))
+             with (vec_concat x z) if N == 1, or (vec_concat y x) if N == 2.
+             Only applies for vectors of two elements.  */
+          if (GET_CODE (op0) == VEC_DUPLICATE
+              && GET_CODE (op1) == VEC_CONCAT
+              && GET_MODE_NUNITS (GET_MODE (op0)) == 2
+              && GET_MODE_NUNITS (GET_MODE (op1)) == 2
+              && IN_RANGE (sel, 1, 2))
+            {
+              rtx newop0 = XEXP (op0, 0);
+              rtx newop1 = XEXP (op1, 2 - sel);
+              rtx otherop = XEXP (op1, sel - 1);
+              if (sel == 2)
+                std::swap (newop0, newop1);
+              /* Don't want to throw away the other part of the vec_concat if
+                 it has side-effects.  */
+              if (!side_effects_p (otherop))
+                return simplify_gen_binary (VEC_CONCAT, mode, newop0, newop1);
+            }
         }
 
       if (rtx_equal_p (op0, op1)
diff --git a/gcc/testsuite/gcc.target/aarch64/load_v2vec_lanes_1.c b/gcc/testsuite/gcc.target/aarch64/load_v2vec_lanes_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..3c31b340154b5469fca858a579e9a6ab90ee0d22
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/load_v2vec_lanes_1.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+typedef long long v2di __attribute__ ((vector_size (16)));
+typedef double v2df __attribute__ ((vector_size (16)));
+
+v2di
+construct_lanedi (long long *y)
+{
+  v2di x = { y[0], y[1] };
+  return x;
+}
+
+v2df
+construct_lanedf (double *y)
+{
+  v2df x = { y[0], y[1] };
+  return x;
+}
+
+/* We can use the load_pair_lanes pattern to vec_concat two DI/DF
+   values from consecutive memory into a 2-element vector by using
+   a Q-reg LDR.  */
+
+/* { dg-final { scan-assembler-times "ldr\tq\[0-9\]+" 2 } } */
+/* { dg-final { scan-assembler-not "ins\t" } } */
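
The test in the patch is compile-only and checks the generated assembly. As a rough complement, the sketch below reuses the same two functions and checks at run time that each lane holds the expected value. It is not part of the patch: `check_lanes` is a hypothetical helper, and the code assumes a compiler that supports the GNU `vector_size` attribute and subscripting of vector values (GCC and Clang both do).

```c
#include <assert.h>

typedef long long v2di __attribute__ ((vector_size (16)));
typedef double v2df __attribute__ ((vector_size (16)));

/* Same bodies as in load_v2vec_lanes_1.c.  */
v2di
construct_lanedi (long long *y)
{
  v2di x = { y[0], y[1] };
  return x;
}

v2df
construct_lanedf (double *y)
{
  v2df x = { y[0], y[1] };
  return x;
}

/* Hypothetical helper: returns 1 if both lanes of each vector match the
   two consecutive memory elements they were built from.  */
static int
check_lanes (void)
{
  long long a[2] = { 11, 22 };
  double b[2] = { 1.5, 2.5 };
  v2di vi = construct_lanedi (a);
  v2df vf = construct_lanedf (b);
  return vi[0] == 11 && vi[1] == 22 && vf[0] == 1.5 && vf[1] == 2.5;
}
```

A check like this only confirms that the lane order is preserved. Whether a single Q-register LDR is actually emitted still has to be verified from the assembly, as the dg-final directives do.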