From patchwork Thu Aug 19 14:14:58 2010
X-Patchwork-Submitter: Tejas Belagod
X-Patchwork-Id: 62160
Subject: [Patch, ARM] Implement widening vector moves and mults.
From: Tejas Belagod
To: gcc-patches@gcc.gnu.org
Date: Thu, 19 Aug 2010 15:14:58 +0100
Message-Id: <1282227298.30429.50.camel@e102484-lin.cambridge.arm.com>

Hi,

Take 2 with the patch!

This patch implements support for widening signed and unsigned vector
moves and multiplications, viz. the VMOVL and VMULL NEON instructions.
This helps vectorize loops whose bodies contain widening moves or
multiplications when compiled for NEON.
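For illustration only (this example is not part of the patch, and the function names are invented), these are the kinds of scalar loops that the widening patterns target: a widening multiply that can map to VMULL, and a widening move that can map to VMOVL, when auto-vectorization is enabled for a NEON target:

```c
#include <stdint.h>

/* Widening multiply: 16-bit inputs, 32-bit products.  With widening
   multiply support, the vectorizer can use vmull.s16 for this loop
   body instead of unpacking and multiplying separately.  */
void widen_mult (int32_t *out, const int16_t *a, const int16_t *b, int n)
{
  for (int i = 0; i < n; i++)
    out[i] = (int32_t) a[i] * (int32_t) b[i];
}

/* Widening move: zero-extend 8-bit elements to 16 bits (vmovl.u8).  */
void widen_move (uint16_t *out, const uint8_t *in, int n)
{
  for (int i = 0; i < n; i++)
    out[i] = in[i];
}
```

Whether these loops actually vectorize depends on the usual flags (optimization level, -mfpu=neon, and the vectorizer's cost model); the C semantics above are what the new vec_unpack/vec_widen_mult patterns implement.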
The patch supports vectorizing both with and without
-mvectorize-with-neon-quad.

Regression tested on trunk. OK for trunk?

---
Tejas Belagod
ARM.

gcc/testsuite
2010-08-19  Tejas Belagod

	* lib/target-supports.exp (check_effective_target_vect_unpack): Set
	vect_unpack supported flag to true for neon.

gcc/
2010-08-19  Tejas Belagod

	* config/arm/iterators.md (VU, SE, V_widen_l): New.
	(V_unpack, US): New.
	* config/arm/neon.md (vec_unpack<US>_hi_<mode>): Expansion for vmovl.
	(vec_unpack<US>_lo_<mode>): Likewise.
	(neon_vec_unpack<US>_hi_<mode>): Instruction pattern for vmovl.
	(neon_vec_unpack<US>_lo_<mode>): Likewise.
	(vec_widen_<US>mult_lo_<mode>): Expansion for vmull.
	(vec_widen_<US>mult_hi_<mode>): Likewise.
	(neon_vec_<US>mult_lo_<mode>): Instruction pattern for vmull.
	(neon_vec_<US>mult_hi_<mode>): Likewise.
	(neon_unpack<US>_<mode>): Widening move intermediate step for
	vectorizing without -mvectorize-with-neon-quad.
	(neon_vec_<US>mult_<mode>): Widening multiply intermediate step for
	vectorizing without -mvectorize-with-neon-quad.
	* config/arm/predicates.md (vect_par_constant_high): Check for
	high-half lanes of a vector.
	(vect_par_constant_low): Check for low-half lanes of a vector.

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index ee04aab..3717f6b 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -136,7 +136,9 @@
 ;; Modes with 32-bit elements only.
 (define_mode_iterator V32 [V2SI V2SF V4SI V4SF])
 
-
+;; Modes with 8-bit, 16-bit and 32-bit elements.
+(define_mode_iterator VU [V16QI V8HI V4SI])
+
 ;;----------------------------------------------------------------------------
 ;; Code iterators
 ;;----------------------------------------------------------------------------
@@ -156,6 +158,8 @@
 ;; without unsigned variants (for use with *SFmode pattern).
 (define_code_iterator vqhs_ops [plus smin smax])
 
+;; A list of widening operators
+(define_code_iterator SE [sign_extend zero_extend])
+
 ;;----------------------------------------------------------------------------
 ;; Mode attributes
 ;;----------------------------------------------------------------------------
@@ -360,6 +364,11 @@
    (V2SF "2") (V4SF "4") (DI "1") (V2DI "2")])
 
+;; Same as V_widen, but lower-case.
+(define_mode_attr V_widen_l [(V8QI "v8hi") (V4HI "v4si") (V2SI "v2di")])
+
+;; Widen. Result is half the number of elements, but widened to double-width.
+(define_mode_attr V_unpack [(V16QI "V8HI") (V8HI "V4SI") (V4SI "V2DI")])
+
 ;;----------------------------------------------------------------------------
 ;; Code attributes
 ;;----------------------------------------------------------------------------
@@ -375,3 +384,6 @@
 
 (define_code_attr cnb [(ltu "CC_C") (geu "CC")])
 (define_code_attr optab [(ltu "ltu") (geu "geu")])
+
+;; Assembler mnemonics for signedness of widening operations
+(define_code_attr US [(sign_extend "s") (zero_extend "u")])
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index bdc279a..3b75a18 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -4977,3 +4977,205 @@
   emit_insn (gen_orn<mode>3_neon (operands[0], operands[1], operands[2]));
   DONE;
 })
+
+(define_insn "neon_vec_unpack<US>_lo_<mode>"
+  [(set (match_operand:<V_unpack> 0 "register_operand" "=w")
+        (SE:<V_unpack> (vec_select:<V_HALF>
+                          (match_operand:VU 1 "register_operand" "w")
+                          (match_operand:VU 2 "vect_par_constant_low" ""))))]
+  "TARGET_NEON"
+  "vmovl.<US><V_sz_elem> %q0, %e1"
+  [(set_attr "neon_type" "neon_shift_1")]
+)
+
+(define_insn "neon_vec_unpack<US>_hi_<mode>"
+  [(set (match_operand:<V_unpack> 0 "register_operand" "=w")
+        (SE:<V_unpack> (vec_select:<V_HALF>
+                          (match_operand:VU 1 "register_operand" "w")
+                          (match_operand:VU 2 "vect_par_constant_high" ""))))]
+  "TARGET_NEON"
+  "vmovl.<US><V_sz_elem> %q0, %f1"
+  [(set_attr "neon_type" "neon_shift_1")]
+)
+
+(define_expand "vec_unpack<US>_hi_<mode>"
+  [(match_operand:<V_unpack> 0 "register_operand" "")
+   (SE:<V_unpack> (match_operand:VU 1 "register_operand"))]
+  "TARGET_NEON"
+  {
+   rtvec v = rtvec_alloc (<V_mode_nunits>/2);
+   rtx t1;
+   int i;
+   for (i = 0; i < (<V_mode_nunits>/2); i++)
+     RTVEC_ELT (v, i) = GEN_INT ((<V_mode_nunits>/2) + i);
+
+   t1 = gen_rtx_PARALLEL (<MODE>mode, v);
+   emit_insn (gen_neon_vec_unpack<US>_hi_<mode> (operands[0],
+                                                 operands[1],
+                                                 t1));
+   DONE;
+  }
+)
+
+(define_expand "vec_unpack<US>_lo_<mode>"
+  [(match_operand:<V_unpack> 0 "register_operand" "")
+   (SE:<V_unpack> (match_operand:VU 1 "register_operand" ""))]
+  "TARGET_NEON"
+  {
+   rtvec v = rtvec_alloc (<V_mode_nunits>/2);
+   rtx t1;
+   int i;
+   for (i = 0; i < (<V_mode_nunits>/2); i++)
+     RTVEC_ELT (v, i) = GEN_INT (i);
+   t1 = gen_rtx_PARALLEL (<MODE>mode, v);
+   emit_insn (gen_neon_vec_unpack<US>_lo_<mode> (operands[0],
+                                                 operands[1],
+                                                 t1));
+   DONE;
+  }
+)
+
+(define_insn "neon_vec_<US>mult_lo_<mode>"
+  [(set (match_operand:<V_unpack> 0 "register_operand" "=w")
+        (mult:<V_unpack> (SE:<V_unpack> (vec_select:<V_HALF>
+                           (match_operand:VU 1 "register_operand" "w")
+                           (match_operand:VU 2 "vect_par_constant_low" "")))
+                         (SE:<V_unpack> (vec_select:<V_HALF>
+                           (match_operand:VU 3 "register_operand" "w")
+                           (match_dup 2)))))]
+  "TARGET_NEON"
+  "vmull.<US><V_sz_elem> %q0, %e1, %e3"
+  [(set_attr "neon_type" "neon_shift_1")]
+)
+
+(define_expand "vec_widen_<US>mult_lo_<mode>"
+  [(match_operand:<V_unpack> 0 "register_operand" "")
+   (SE:<V_unpack> (match_operand:VU 1 "register_operand" ""))
+   (SE:<V_unpack> (match_operand:VU 2 "register_operand" ""))]
+  "TARGET_NEON"
+  {
+   rtvec v = rtvec_alloc (<V_mode_nunits>/2);
+   rtx t1;
+   int i;
+   for (i = 0; i < (<V_mode_nunits>/2); i++)
+     RTVEC_ELT (v, i) = GEN_INT (i);
+   t1 = gen_rtx_PARALLEL (<MODE>mode, v);
+
+   emit_insn (gen_neon_vec_<US>mult_lo_<mode> (operands[0],
+                                               operands[1],
+                                               t1,
+                                               operands[2]));
+   DONE;
+  }
+)
+
+(define_insn "neon_vec_<US>mult_hi_<mode>"
+  [(set (match_operand:<V_unpack> 0 "register_operand" "=w")
+        (mult:<V_unpack> (SE:<V_unpack> (vec_select:<V_HALF>
+                           (match_operand:VU 1 "register_operand" "w")
+                           (match_operand:VU 2 "vect_par_constant_high" "")))
+                         (SE:<V_unpack> (vec_select:<V_HALF>
+                           (match_operand:VU 3 "register_operand" "w")
+                           (match_dup 2)))))]
+  "TARGET_NEON"
+  "vmull.<US><V_sz_elem> %q0, %f1, %f3"
+  [(set_attr "neon_type" "neon_shift_1")]
+)
+
+(define_expand "vec_widen_<US>mult_hi_<mode>"
+  [(match_operand:<V_unpack> 0 "register_operand" "")
+   (SE:<V_unpack> (match_operand:VU 1 "register_operand" ""))
+   (SE:<V_unpack> (match_operand:VU 2 "register_operand" ""))]
+  "TARGET_NEON"
+  {
+   rtvec v = rtvec_alloc (<V_mode_nunits>/2);
+   rtx t1;
+   int i;
+   for (i = 0; i < (<V_mode_nunits>/2); i++)
+     RTVEC_ELT (v, i) = GEN_INT (<V_mode_nunits>/2 + i);
+   t1 = gen_rtx_PARALLEL (<MODE>mode, v);
+
+   emit_insn (gen_neon_vec_<US>mult_hi_<mode> (operands[0],
+                                               operands[1],
+                                               t1,
+                                               operands[2]));
+   DONE;
+  }
+)
+
+;; Vectorize for the non-neon-quad case.
+(define_insn "neon_unpack<US>_<mode>"
+  [(set (match_operand:<V_widen> 0 "register_operand" "=w")
+        (SE:<V_widen> (match_operand:VDI 1 "register_operand" "")))]
+  "TARGET_NEON"
+  "vmovl.<US><V_sz_elem> %q0, %1"
+  [(set_attr "neon_type" "neon_shift_1")]
+)
+
+(define_expand "vec_unpack<US>_lo_<mode>"
+  [(match_operand:<V_double_width> 0 "register_operand" "")
+   (SE:<V_double_width> (match_operand:VDI 1 "register_operand"))]
+  "TARGET_NEON"
+{
+  rtx tmpreg = gen_reg_rtx (<V_widen>mode);
+  emit_insn (gen_neon_unpack<US>_<mode> (tmpreg, operands[1]));
+  emit_insn (gen_neon_vget_low<V_widen_l> (operands[0], tmpreg));
+
+  DONE;
+}
+)
+
+(define_expand "vec_unpack<US>_hi_<mode>"
+  [(match_operand:<V_double_width> 0 "register_operand" "")
+   (SE:<V_double_width> (match_operand:VDI 1 "register_operand"))]
+  "TARGET_NEON"
+{
+  rtx tmpreg = gen_reg_rtx (<V_widen>mode);
+  emit_insn (gen_neon_unpack<US>_<mode> (tmpreg, operands[1]));
+  emit_insn (gen_neon_vget_high<V_widen_l> (operands[0], tmpreg));
+
+  DONE;
+}
+)
+
+(define_insn "neon_vec_<US>mult_<mode>"
+  [(set (match_operand:<V_widen> 0 "register_operand" "=w")
+        (mult:<V_widen> (SE:<V_widen>
+                          (match_operand:VDI 1 "register_operand" "w"))
+                        (SE:<V_widen>
+                          (match_operand:VDI 2 "register_operand" "w"))))]
+  "TARGET_NEON"
+  "vmull.<US><V_sz_elem> %q0, %1, %2"
+  [(set_attr "neon_type" "neon_shift_1")]
+)
+
+(define_expand "vec_widen_<US>mult_hi_<mode>"
+  [(match_operand:<V_double_width> 0 "register_operand" "")
+   (SE:<V_double_width> (match_operand:VDI 1 "register_operand" ""))
+   (SE:<V_double_width> (match_operand:VDI 2 "register_operand" ""))]
+  "TARGET_NEON"
+  {
+   rtx tmpreg = gen_reg_rtx (<V_widen>mode);
+   emit_insn (gen_neon_vec_<US>mult_<mode> (tmpreg, operands[1], operands[2]));
+   emit_insn (gen_neon_vget_high<V_widen_l> (operands[0], tmpreg));
+
+   DONE;
+  }
+)
+
+(define_expand "vec_widen_<US>mult_lo_<mode>"
+  [(match_operand:<V_double_width> 0 "register_operand" "")
+   (SE:<V_double_width> (match_operand:VDI 1 "register_operand" ""))
+   (SE:<V_double_width> (match_operand:VDI 2 "register_operand" ""))]
+  "TARGET_NEON"
+  {
+   rtx tmpreg = gen_reg_rtx (<V_widen>mode);
+   emit_insn (gen_neon_vec_<US>mult_<mode> (tmpreg, operands[1], operands[2]));
+   emit_insn (gen_neon_vget_low<V_widen_l> (operands[0], tmpreg));
+
+   DONE;
+  }
+)
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index da3b6dc..4e12670 100644
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -619,3 +619,52 @@
   (and (match_test "TARGET_32BIT")
        (match_operand 0 "arm_di_operand"))))
+
+;; Predicates for parallel expanders based on mode.
+(define_special_predicate "vect_par_constant_high"
+  (match_code "parallel")
+{
+  HOST_WIDE_INT count = XVECLEN (op, 0);
+  int i;
+  int base = GET_MODE_NUNITS (mode);
+
+  if ((count < 1)
+      || (count != base/2))
+    return false;
+
+  if (!VECTOR_MODE_P (mode))
+    return false;
+
+  for (i = 0; i < count; i++)
+    {
+      rtx elt = XVECEXP (op, 0, i);
+      int val = INTVAL (elt);
+      if (val != (base/2) + i)
+        return false;
+    }
+  return true;
+})
+
+(define_special_predicate "vect_par_constant_low"
+  (match_code "parallel")
+{
+  HOST_WIDE_INT count = XVECLEN (op, 0);
+  int i;
+  int base = GET_MODE_NUNITS (mode);
+
+  if ((count < 1)
+      || (count != base/2))
+    return false;
+
+  if (!VECTOR_MODE_P (mode))
+    return false;
+
+  for (i = 0; i < count; i++)
+    {
+      rtx elt = XVECEXP (op, 0, i);
+      int val = INTVAL (elt);
+      if (val != i)
+        return false;
+    }
+  return true;
+})
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 1682d58..4b95323 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -2640,7 +2640,8 @@ proc check_effective_target_vect_unpack { } {
     if { ([istarget powerpc*-*-*] && ![istarget powerpc-*paired*])
          || [istarget i?86-*-*]
          || [istarget x86_64-*-*]
-         || [istarget spu-*-*] } {
+         || [istarget spu-*-*]
+         || ([istarget arm*-*-*] && [check_effective_target_arm_neon]) } {
         set et_vect_unpack_saved 1
     }
 }
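As a reference for reviewers (a standalone model, not GCC code; the function name is invented): the new vect_par_constant_high/vect_par_constant_low predicates accept a PARALLEL of exactly nunits/2 constants that are consecutive lane indices, starting at nunits/2 for the high half or at 0 for the low half. The same check in plain C:

```c
#include <stdbool.h>

/* Model of the lane checks done by vect_par_constant_high/low.
   'nunits' is the element count of the full vector mode; 'sel'
   holds the 'count' selector constants from the PARALLEL.  */
bool par_constant_half (const int *sel, int count, int nunits, bool high)
{
  /* The selector must name exactly half the lanes.  */
  if (count < 1 || count != nunits / 2)
    return false;

  /* Lanes must be consecutive, starting at the half boundary
     (high) or at lane 0 (low).  */
  int first = high ? nunits / 2 : 0;
  for (int i = 0; i < count; i++)
    if (sel[i] != first + i)
      return false;
  return true;
}
```

For V8HI (8 elements), a selector of {4, 5, 6, 7} matches the high-half predicate and {0, 1, 2, 3} the low-half one; anything else is rejected, so the vmovl/vmull insn patterns only match vec_selects of a contiguous half-register.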