From patchwork Tue May 29 13:23:44 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kyrill Tkachov X-Patchwork-Id: 922119 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=gcc.gnu.org (client-ip=209.132.180.131; helo=sourceware.org; envelope-from=gcc-patches-return-478663-incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=foss.arm.com Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b="JG0lEz7R"; dkim-atps=neutral Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 40wDvg4xbQz9s0W for ; Tue, 29 May 2018 23:23:58 +1000 (AEST) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :message-id:date:from:mime-version:to:cc:subject:content-type; q=dns; s=default; b=oDm6tGcFRRi7HMyygCaT23meVjXlfGjTBDcIdfCEiP2 UOTUF79zcQRpeCCuZPfaq3Zx6h97ydYRDQfwB2Fl5VWU9vW270bnfGk6nz1rjm7y xLJ3cbRl9MIG8USyw1gYXVIUVJ6slUmzqbD6GDhXQoQYiTfNF9nJvG0/hskoJobs = DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :message-id:date:from:mime-version:to:cc:subject:content-type; s=default; bh=kvkSJR5IYLvWcLEH/RsML8LPM3I=; b=JG0lEz7ROTS+LK7ev xJ0yyfAb08nOnEwYKXpQKVR0wmOZcr8AayDEF43U8l0NkU7VClMPYig00cO5SmkD mqY694uAIbLkyjFsa4kzCBPkrW0Bgh6j2sVvJodb65bqn+MmN93TlvkR4XFBR58O oEMDf54awYljvmfVC7sa4njlZI= Received: (qmail 68538 invoked by alias); 29 May 2018 13:23:50 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 67500 invoked by uid 89); 29 May 2018 13:23:49 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-25.6 required=5.0 tests=BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, KAM_LAZY_DOMAIN_SECURITY, KAM_LOTSOFHASH autolearn=ham version=3.3.2 spammy=paradoxical, tkachov, Tkachov, lane X-HELO: foss.arm.com Received: from foss.arm.com (HELO foss.arm.com) (217.140.101.70) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 29 May 2018 13:23:48 +0000 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.72.51.249]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 713E180D; Tue, 29 May 2018 06:23:46 -0700 (PDT) Received: from [10.2.207.77] (e100706-lin.cambridge.arm.com [10.2.207.77]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 913963F24A; Tue, 29 May 2018 06:23:45 -0700 (PDT) Message-ID: <5B0D5460.7030603@foss.arm.com> Date: Tue, 29 May 2018 14:23:44 +0100 From: Kyrill Tkachov User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0 MIME-Version: 1.0 To: "gcc-patches@gcc.gnu.org" CC: Marcus Shawcroft , "Richard Earnshaw (lists)" , James Greenhalgh Subject: [PATCH][AArch64] Avoid paradoxical subregs for vector initialisation Hi all, The recent changes to aarch64_expand_vector_init cause an ICE in the attached testcase. The register allocator "ICEs with Max. number of generated reload insns per insn is achieved (90)" That is because aarch64_expand_vector_init creates a paradoxical subreg to move a DImode value into a V2DI vector: (insn 74 72 76 8 (set (reg:V2DI 287 [ _166 ]) (subreg:V2DI (reg/v/f:DI 112 [ d ]) 0)) 1050 {*aarch64_simd_movv2di} This is done because we want to express that the whole of the V2DI vector will be written so that init-regs doesn't try to zero-initialise it before we overwrite each lane individually anyway. This can go bad for because if the DImode value is allocated in, say, x30: the last register in that register class, the V2DI subreg of that isn't valid or meaningful and that seems to cause the trouble. It's kinda hard to know what the right solution for this is. We could emit a duplicate of the value into all the lanes of the vector, but we have tests that test against that (we're trying to avoid unnecessary duplicates) What this patch does is it defines a pattern for moving a scalar into lane 0 of a vector using a simple FMOV or LDR and represents that as a merging with a vector of zeroes. That way, the instruction represents a full write of the destination vector but doesn't "read" more bits from the source than necessary. The zeroing effect is also a more accurate representation of the semantics of FMOV. Bootstrapped and tested on aarch64-none-linux-gnu and tested also on aarch64_be-none-elf. Ok for trunk? Thanks, Kyrill 2018-05-29 Kyrylo Tkachov * config/aarch64/aarch64-simd.md (aarch64_simd_vec_mov): New define_insn. * config/aarch64/aarch64.c (aarch64_get_aarch64_simd_vec_mov_code): New function. (aarch64_expand_vector_init): Avoid creating paradoxical subregs of integer modes. 2018-05-29 Kyrylo Tkachov * c-c++-common/torture/aarch64-vect-init-1.c: New test. diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 9f4cb72f5c321136cfcc263bcb7c956365dbc512..292848984081e1fb517b85fc10816e50f5d0c841 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -808,6 +808,25 @@ (define_insn "aarch64_simd_vec_set" [(set_attr "type" "neon_ins, neon_from_gp, neon_load1_one_lane")] ) + +;; Helper for aarch64_expand_vector_init. + +(define_insn "aarch64_simd_vec_mov" + [(set (match_operand:VALL_F16 0 "register_operand" "=w,w,w") + (vec_merge:VALL_F16 + (vec_duplicate:VALL_F16 + (match_operand: 1 "general_operand" "w,?r,m")) + (match_operand:VALL_F16 2 "aarch64_simd_imm_zero" ) + (match_operand:SI 3 "immediate_operand")))] + "TARGET_SIMD + && ENDIAN_LANE_N (, exact_log2 (INTVAL (operands[3]))) == 0" + "@ + fmov\\t%d0, %d1 + fmov\\t%d0, %1 + ldr\\t%0, %1" + [(set_attr "type" "fmov,f_mcr,neon_load1_1reg")] +) + (define_insn "*aarch64_simd_vec_copy_lane" [(set (match_operand:VALL_F16 0 "register_operand" "=w") (vec_merge:VALL_F16 diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index b9759c638b010567f5047ff5129912986a30b49a..fdd8d2f587087180c71ce71003ec074dceeb6242 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -13879,6 +13879,44 @@ aarch64_simd_make_constant (rtx vals) return NULL_RTX; } +/* Return the insn_code to use for MODE from the aarch64_simd_vec_mov + family of patterns. */ + +insn_code +aarch64_get_aarch64_simd_vec_mov_code (machine_mode mode) +{ + switch (mode) + { + case E_V8QImode: + return CODE_FOR_aarch64_simd_vec_movv8qi; + case E_V16QImode: + return CODE_FOR_aarch64_simd_vec_movv16qi; + case E_V4HImode: + return CODE_FOR_aarch64_simd_vec_movv4hi; + case E_V8HImode: + return CODE_FOR_aarch64_simd_vec_movv8hi; + case E_V2SImode: + return CODE_FOR_aarch64_simd_vec_movv2si; + case E_V4SImode: + return CODE_FOR_aarch64_simd_vec_movv4si; + case E_V2DImode: + return CODE_FOR_aarch64_simd_vec_movv2di; + case E_V4HFmode: + return CODE_FOR_aarch64_simd_vec_movv4hf; + case E_V8HFmode: + return CODE_FOR_aarch64_simd_vec_movv8hf; + case E_V2SFmode: + return CODE_FOR_aarch64_simd_vec_movv2sf; + case E_V4SFmode: + return CODE_FOR_aarch64_simd_vec_movv4sf; + case E_V2DFmode: + return CODE_FOR_aarch64_simd_vec_movv2df; + default: + gcc_unreachable (); + } + return CODE_FOR_nothing; +} + /* Expand a vector initialisation sequence, such that TARGET is initialised to contain VALS. */ @@ -13967,7 +14005,7 @@ aarch64_expand_vector_init (rtx target, rtx vals) are equally useless to us, in which case just immediately set the vector register using the first element. */ - if (maxv == 1) + if (maxv == 1 && GET_MODE_NUNITS (mode).is_constant ()) { /* For vectors of two 64-bit elements, we can do even better. */ if (n_elts == 2 @@ -13999,12 +14037,24 @@ aarch64_expand_vector_init (rtx target, rtx vals) return; } } - /* The subreg-move sequence below will move into lane zero of the + /* The move sequence below will move into lane zero of the vector register. For big-endian we want that position to hold the last element of VALS. */ maxelement = BYTES_BIG_ENDIAN ? n_elts - 1 : 0; rtx x = copy_to_mode_reg (inner_mode, XVECEXP (vals, 0, maxelement)); - aarch64_emit_move (target, lowpart_subreg (mode, x, inner_mode)); + + /* We will be writing all the lanes in the vector but the init-regs + machinery doesn't recognise the lane-by-lane initialisation + sequence of the whole vector and will emit an explicit zeroing + of TARGET. We want to avoid that. + We describe the first lane insert as a scalar value concatenated + with a vector of zeroes. */ + insn_code fmov_code = aarch64_get_aarch64_simd_vec_mov_code (mode); + unsigned HOST_WIDE_INT merge_imm + = HOST_WIDE_INT_1U + << ENDIAN_LANE_N (GET_MODE_NUNITS (mode).to_constant (), 0); + emit_insn (GEN_FCN (fmov_code) (target, x, CONST0_RTX (mode), + gen_int_mode (merge_imm, SImode))); } else { diff --git a/gcc/testsuite/g++.dg/torture/aarch64-vect-init-1.c b/gcc/testsuite/g++.dg/torture/aarch64-vect-init-1.c new file mode 100644 index 0000000000000000000000000000000000000000..07892018a7331a819d14ae72e9ca549533092483 --- /dev/null +++ b/gcc/testsuite/g++.dg/torture/aarch64-vect-init-1.c @@ -0,0 +1,28 @@ +class A { +public: + unsigned char *fn1(); + int fn2(); +}; + +class B { + A fld1; + int fld2; + void fn3(); + unsigned char fld3; +}; + +int a; + +void +B::fn3() { + int b = fld1.fn2() / 8; + unsigned char *c = fld1.fn1(), *d = &fld3, *e = c; + for (; a < fld2;) + for (int j = 0; j < b; j++) + *d++ = e[j]; + for (; 0 < fld2;) + for (int j = 0; j < b; j++) + e[j] = *d++; + for (; fld2;) + ; +}