From patchwork Tue Jun 10 08:53:06 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kyrylo Tkachov X-Patchwork-Id: 357794 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id D7E65140087 for ; Tue, 10 Jun 2014 18:53:20 +1000 (EST) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :message-id:date:from:mime-version:to:cc:subject:content-type; q=dns; s=default; b=rO9orWbNZ4vWuAnaSNyB1GwsfvMR3Amv45ujJDhi3Kv pt9qTS3u8Xm3pGrLvt1Lv1ge41KbF+ISP+FosjCbQUlFCVNTTTQvG0KO/4PgUiK4 H2Lwg8QVHVrT3324tdsD7YCWiLJGPlgxoiaUalIFrN7AlgFyAaxRkTToBBku1fnE = DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :message-id:date:from:mime-version:to:cc:subject:content-type; s=default; bh=vDRc22z3RfLTACQMb2CYUE3nKlQ=; b=b0sI0EHTyl76a/Kgi R+4qnNT7rtrA0kBLPG3NI3jM213ZerPppBnnrY71f+gYej7NE+EwRzYpAjxeTzSV ze+hGfxQpcKxSfv/PEkddpOySW70olW7/oVm40NfHHo1Axmh5u4RYICXy6FZtW4j aiT4Rk3cIw2zMGA0fTK7UN2moo= Received: (qmail 8184 invoked by alias); 10 Jun 2014 08:53:13 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 8170 invoked by uid 89); 10 Jun 2014 08:53:13 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.1 required=5.0 tests=AWL, BAYES_00, RCVD_IN_DNSWL_LOW, SPF_PASS autolearn=ham version=3.3.2 X-HELO: service87.mimecast.com Received: from service87.mimecast.com (HELO service87.mimecast.com) (91.220.42.44) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 10 Jun 2014 08:53:09 +0000 Received: from cam-owa2.Emea.Arm.com (fw-tnat.cambridge.arm.com [217.140.96.21]) by service87.mimecast.com; Tue, 10 Jun 2014 09:53:07 +0100 Received: from [10.1.208.24] ([10.1.255.212]) by cam-owa2.Emea.Arm.com with Microsoft SMTPSVC(6.0.3790.3959); Tue, 10 Jun 2014 09:53:02 +0100 Message-ID: <5396C772.3040405@arm.com> Date: Tue, 10 Jun 2014 09:53:06 +0100 From: Kyrill Tkachov User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.5.0 MIME-Version: 1.0 To: GCC Patches CC: Marcus Shawcroft , Richard Earnshaw Subject: [PATCH][AArch64] Add a big-endian lane flip at expand-time in saturating math patterns X-MC-Unique: 114061009530702801 X-IsSubscribed: yes Hi all, On some of the saturating math expanders we need to perform a lane flip on big-endian when expanding to RTL so that we keep consistent with GCCs' view of lane numbering. During assembly emission the pattern will perform another lane flip to translate from GCCs' numbering to the architectural lane number. To do this a few of the patterns were renamed to *_internal and given an expander that will perform that first lane flip while the existing expanders get a lane flip added to them. The tests for these patterns will come soon in a separate patch. With this patch, when the user uses something like vqdmlal_laneq_s16 (a, b, c, 0) from arm_neon.h in big endian the resulting instruction will access lane 0 of c now, whereas before it would access lane 7. Tested and bootstrapped aarch64-none-linux-gnu and aarch64_be-none-elf. Ok for trunk? Thanks, Kyrill 2014-06-10 Kyrylo Tkachov * config/aarch64/aarch64-simd.md (aarch64_sqdmulh_lane): New expander. (aarch64_sqrdmulh_lane): Likewise. (aarch64_sqdmulh_lane): Rename to... (aarch64_sqdmulh_lane_interna): ...this. (aarch64_sqdmulh_laneq): New expander. (aarch64_sqrdmulh_laneq): Likewise. (aarch64_sqdmulh_laneq): Rename to... (aarch64_sqdmulh_laneq_internal): ...this. (aarch64_sqdmulh_lane): New expander. (aarch64_sqrdmulh_lane): Likewise. (aarch64_sqdmulh_lane): Rename to... (aarch64_sqdmulh_lane_internal): ...this. (aarch64_sqdmlal_lane): Add lane flip for big-endian. (aarch64_sqdmlal_laneq): Likewise. (aarch64_sqdmlsl_lane): Likewise. (aarch64_sqdmlsl_laneq): Likewise. (aarch64_sqdmlal2_lane): Likewise. (aarch64_sqdmlal2_laneq): Likewise. (aarch64_sqdmlsl2_lane): Likewise. (aarch64_sqdmlsl2_laneq): Likewise. (aarch64_sqdmull_lane): Likewise. (aarch64_sqdmull_laneq): Likewise. (aarch64_sqdmull2_lane): Likewise. (aarch64_sqdmull2_laneq): Likewise. commit 18ed07903bb21e7dea185a1618a130cd88ed9de7 Author: Kyrylo Tkachov Date: Tue Jun 3 15:27:09 2014 +0100 [AArch64] Saturating math lane fixes diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 108bc8d..fc028f5 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -2650,7 +2650,41 @@ ;; sqdmulh_lane -(define_insn "aarch64_sqdmulh_lane" +(define_expand "aarch64_sqdmulh_lane" + [(match_operand:VDQHS 0 "register_operand" "") + (match_operand:VDQHS 1 "register_operand" "") + (match_operand: 2 "register_operand" "") + (match_operand:SI 3 "immediate_operand" "")] + "TARGET_SIMD" + { + aarch64_simd_lane_bounds (operands[3], 0, GET_MODE_NUNITS (mode)); + operands[3] = GEN_INT (ENDIAN_LANE_N (mode, INTVAL (operands[3]))); + emit_insn (gen_aarch64_sqdmulh_lane_internal (operands[0], + operands[1], + operands[2], + operands[3])); + DONE; + } +) + +(define_expand "aarch64_sqrdmulh_lane" + [(match_operand:VDQHS 0 "register_operand" "") + (match_operand:VDQHS 1 "register_operand" "") + (match_operand: 2 "register_operand" "") + (match_operand:SI 3 "immediate_operand" "")] + "TARGET_SIMD" + { + aarch64_simd_lane_bounds (operands[3], 0, GET_MODE_NUNITS (mode)); + operands[3] = GEN_INT (ENDIAN_LANE_N (mode, INTVAL (operands[3]))); + emit_insn (gen_aarch64_sqrdmulh_lane_internal (operands[0], + operands[1], + operands[2], + operands[3])); + DONE; + } +) + +(define_insn "aarch64_sqdmulh_lane_internal" [(set (match_operand:VDQHS 0 "register_operand" "=w") (unspec:VDQHS [(match_operand:VDQHS 1 "register_operand" "w") @@ -2666,7 +2700,41 @@ [(set_attr "type" "neon_sat_mul__scalar")] ) -(define_insn "aarch64_sqdmulh_laneq" +(define_expand "aarch64_sqdmulh_laneq" + [(match_operand:VDQHS 0 "register_operand" "") + (match_operand:VDQHS 1 "register_operand" "") + (match_operand: 2 "register_operand" "") + (match_operand:SI 3 "immediate_operand" "")] + "TARGET_SIMD" + { + aarch64_simd_lane_bounds (operands[3], 0, GET_MODE_NUNITS (mode)); + operands[3] = GEN_INT (ENDIAN_LANE_N (mode, INTVAL (operands[3]))); + emit_insn (gen_aarch64_sqdmulh_laneq_internal (operands[0], + operands[1], + operands[2], + operands[3])); + DONE; + } +) + +(define_expand "aarch64_sqrdmulh_laneq" + [(match_operand:VDQHS 0 "register_operand" "") + (match_operand:VDQHS 1 "register_operand" "") + (match_operand: 2 "register_operand" "") + (match_operand:SI 3 "immediate_operand" "")] + "TARGET_SIMD" + { + aarch64_simd_lane_bounds (operands[3], 0, GET_MODE_NUNITS (mode)); + operands[3] = GEN_INT (ENDIAN_LANE_N (mode, INTVAL (operands[3]))); + emit_insn (gen_aarch64_sqrdmulh_laneq_internal (operands[0], + operands[1], + operands[2], + operands[3])); + DONE; + } +) + +(define_insn "aarch64_sqdmulh_laneq_internal" [(set (match_operand:VDQHS 0 "register_operand" "=w") (unspec:VDQHS [(match_operand:VDQHS 1 "register_operand" "w") @@ -2676,13 +2744,46 @@ VQDMULH))] "TARGET_SIMD" "* - aarch64_simd_lane_bounds (operands[3], 0, GET_MODE_NUNITS (mode)); operands[3] = GEN_INT (ENDIAN_LANE_N (mode, INTVAL (operands[3]))); return \"sqdmulh\\t%0., %1., %2.[%3]\";" [(set_attr "type" "neon_sat_mul__scalar")] ) -(define_insn "aarch64_sqdmulh_lane" +(define_expand "aarch64_sqdmulh_lane" + [(match_operand:SD_HSI 0 "register_operand" "") + (match_operand:SD_HSI 1 "register_operand" "") + (match_operand: 2 "register_operand" "") + (match_operand:SI 3 "immediate_operand" "")] + "TARGET_SIMD" + { + aarch64_simd_lane_bounds (operands[3], 0, GET_MODE_NUNITS (mode)); + operands[3] = GEN_INT (ENDIAN_LANE_N (mode, INTVAL (operands[3]))); + emit_insn (gen_aarch64_sqdmulh_lane_internal (operands[0], + operands[1], + operands[2], + operands[3])); + DONE; + } +) + +(define_expand "aarch64_sqrdmulh_lane" + [(match_operand:SD_HSI 0 "register_operand" "") + (match_operand:SD_HSI 1 "register_operand" "") + (match_operand: 2 "register_operand" "") + (match_operand:SI 3 "immediate_operand" "")] + "TARGET_SIMD" + { + aarch64_simd_lane_bounds (operands[3], 0, GET_MODE_NUNITS (mode)); + operands[3] = GEN_INT (ENDIAN_LANE_N (mode, INTVAL (operands[3]))); + emit_insn (gen_aarch64_sqrdmulh_lane_internal (operands[0], + operands[1], + operands[2], + operands[3])); + DONE; + } +) + +(define_insn "aarch64_sqdmulh_lane_internal" [(set (match_operand:SD_HSI 0 "register_operand" "=w") (unspec:SD_HSI [(match_operand:SD_HSI 1 "register_operand" "w") @@ -2692,7 +2793,6 @@ VQDMULH))] "TARGET_SIMD" "* - aarch64_simd_lane_bounds (operands[3], 0, GET_MODE_NUNITS (mode)); operands[3] = GEN_INT (ENDIAN_LANE_N (mode, INTVAL (operands[3]))); return \"sqdmulh\\t%0, %1, %2.[%3]\";" [(set_attr "type" "neon_sat_mul__scalar")] @@ -2774,6 +2874,7 @@ "TARGET_SIMD" { aarch64_simd_lane_bounds (operands[4], 0, GET_MODE_NUNITS (mode) / 2); + operands[4] = GEN_INT (ENDIAN_LANE_N (mode, INTVAL (operands[4]))); emit_insn (gen_aarch64_sqdmlal_lane_internal (operands[0], operands[1], operands[2], operands[3], operands[4])); @@ -2789,6 +2890,7 @@ "TARGET_SIMD" { aarch64_simd_lane_bounds (operands[4], 0, GET_MODE_NUNITS (mode)); + operands[4] = GEN_INT (ENDIAN_LANE_N (mode, INTVAL (operands[4]))); emit_insn (gen_aarch64_sqdmlal_lane_internal (operands[0], operands[1], operands[2], operands[3], operands[4])); @@ -2804,6 +2906,7 @@ "TARGET_SIMD" { aarch64_simd_lane_bounds (operands[4], 0, GET_MODE_NUNITS (mode) / 2); + operands[4] = GEN_INT (ENDIAN_LANE_N (mode, INTVAL (operands[4]))); emit_insn (gen_aarch64_sqdmlsl_lane_internal (operands[0], operands[1], operands[2], operands[3], operands[4])); @@ -2819,6 +2922,7 @@ "TARGET_SIMD" { aarch64_simd_lane_bounds (operands[4], 0, GET_MODE_NUNITS (mode)); + operands[4] = GEN_INT (ENDIAN_LANE_N (mode, INTVAL (operands[4]))); emit_insn (gen_aarch64_sqdmlsl_lane_internal (operands[0], operands[1], operands[2], operands[3], operands[4])); @@ -2930,6 +3034,7 @@ { rtx p = aarch64_simd_vect_par_cnst_half (mode, true); aarch64_simd_lane_bounds (operands[4], 0, GET_MODE_NUNITS (mode) / 2); + operands[4] = GEN_INT (ENDIAN_LANE_N (mode, INTVAL (operands[4]))); emit_insn (gen_aarch64_sqdmlal2_lane_internal (operands[0], operands[1], operands[2], operands[3], operands[4], p)); @@ -2946,6 +3051,7 @@ { rtx p = aarch64_simd_vect_par_cnst_half (mode, true); aarch64_simd_lane_bounds (operands[4], 0, GET_MODE_NUNITS (mode)); + operands[4] = GEN_INT (ENDIAN_LANE_N (mode, INTVAL (operands[4]))); emit_insn (gen_aarch64_sqdmlal2_lane_internal (operands[0], operands[1], operands[2], operands[3], operands[4], p)); @@ -2962,6 +3068,7 @@ { rtx p = aarch64_simd_vect_par_cnst_half (mode, true); aarch64_simd_lane_bounds (operands[4], 0, GET_MODE_NUNITS (mode) / 2); + operands[4] = GEN_INT (ENDIAN_LANE_N (mode, INTVAL (operands[4]))); emit_insn (gen_aarch64_sqdmlsl2_lane_internal (operands[0], operands[1], operands[2], operands[3], operands[4], p)); @@ -2978,6 +3085,7 @@ { rtx p = aarch64_simd_vect_par_cnst_half (mode, true); aarch64_simd_lane_bounds (operands[4], 0, GET_MODE_NUNITS (mode)); + operands[4] = GEN_INT (ENDIAN_LANE_N (mode, INTVAL (operands[4]))); emit_insn (gen_aarch64_sqdmlsl2_lane_internal (operands[0], operands[1], operands[2], operands[3], operands[4], p)); @@ -3098,6 +3206,7 @@ "TARGET_SIMD" { aarch64_simd_lane_bounds (operands[3], 0, GET_MODE_NUNITS (mode) / 2); + operands[3] = GEN_INT (ENDIAN_LANE_N (mode, INTVAL (operands[3]))); emit_insn (gen_aarch64_sqdmull_lane_internal (operands[0], operands[1], operands[2], operands[3])); DONE; @@ -3111,6 +3220,7 @@ "TARGET_SIMD" { aarch64_simd_lane_bounds (operands[3], 0, GET_MODE_NUNITS (mode)); + operands[3] = GEN_INT (ENDIAN_LANE_N (mode, INTVAL (operands[3]))); emit_insn (gen_aarch64_sqdmull_lane_internal (operands[0], operands[1], operands[2], operands[3])); DONE; @@ -3203,6 +3313,7 @@ { rtx p = aarch64_simd_vect_par_cnst_half (mode, true); aarch64_simd_lane_bounds (operands[3], 0, GET_MODE_NUNITS (mode) / 2); + operands[3] = GEN_INT (ENDIAN_LANE_N (mode, INTVAL (operands[3]))); emit_insn (gen_aarch64_sqdmull2_lane_internal (operands[0], operands[1], operands[2], operands[3], p)); @@ -3218,6 +3329,7 @@ { rtx p = aarch64_simd_vect_par_cnst_half (mode, true); aarch64_simd_lane_bounds (operands[3], 0, GET_MODE_NUNITS (mode)); + operands[3] = GEN_INT (ENDIAN_LANE_N (mode, INTVAL (operands[3]))); emit_insn (gen_aarch64_sqdmull2_lane_internal (operands[0], operands[1], operands[2], operands[3], p));