From patchwork Thu Sep 18 12:02:10 2014
X-Patchwork-Submitter: Alan Lawrence
X-Patchwork-Id: 390775
Message-ID: <541AC9C2.1000402@arm.com>
Date: Thu, 18 Sep 2014 13:02:10 +0100
From: Alan Lawrence
To: "gcc-patches@gcc.gnu.org"
Subject: [PATCH 5/14][AArch64] Use new reduc_[us](min|max)_scal optabs, inc. for builtins
References: <541AC4D2.9040901@arm.com>
In-Reply-To: <541AC4D2.9040901@arm.com>

Similarly to the previous patch (r/2205), this migrates AArch64 to the new
reduce-to-scalar optabs for min and max.  For consistency we apply the same
treatment to the smax_nan and smin_nan patterns (used for __builtins), even
though reduc_smin_nan_scal (etc.) is not a standard name.

Tested: check-gcc on aarch64-none-elf and aarch64_be-none-elf.
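As a before/after illustration (drawn from the arm_neon.h hunks below), an
intrinsic such as vmaxv_s8 previously performed the reduction into a vector
and then extracted lane 0; with the new _scal_ builtin it returns the scalar
result directly:

/* Before: the builtin returns a vector, so lane 0 must be extracted.  */
__extension__ static __inline int8_t __attribute__ ((__always_inline__))
vmaxv_s8 (int8x8_t __a)
{
  return vget_lane_s8 (__builtin_aarch64_reduc_smax_v8qi (__a), 0);
}

/* After: the reduc_..._scal_ builtin yields the scalar directly.  */
__extension__ static __inline int8_t __attribute__ ((__always_inline__))
vmaxv_s8 (int8x8_t __a)
{
  return __builtin_aarch64_reduc_smax_scal_v8qi (__a);
}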
gcc/ChangeLog:

	* config/aarch64/aarch64-simd-builtins.def (reduc_smax_,
	reduc_smin_, reduc_umax_, reduc_umin_, reduc_smax_nan_,
	reduc_smin_nan_): Remove.
	(reduc_smax_scal_, reduc_smin_scal_, reduc_umax_scal_,
	reduc_umin_scal_, reduc_smax_nan_scal_, reduc_smin_nan_scal_): New.

	* config/aarch64/aarch64-simd.md (reduc_<maxmin_uns>_<mode>): Rename
	VDQV_S variant to...
	(aarch64_reduc_<maxmin_uns>_internal<mode>): ...this.
	(reduc_<maxmin_uns>_<mode>): New (VDQ_BHSI).
	(reduc_<maxmin_uns>_scal_<mode>): New (*2).
	(reduc_<maxmin_uns>_v2si): Combine with below, renaming...
	(reduc_<maxmin_uns>_<mode>): Combine V2F with above, renaming...
	(aarch64_reduc_<maxmin_uns>_internal<mode>): ...to this (VDQF).

	* config/aarch64/arm_neon.h (vmaxv_f32, vmaxv_s8, vmaxv_s16,
	vmaxv_s32, vmaxv_u8, vmaxv_u16, vmaxv_u32, vmaxvq_f32, vmaxvq_f64,
	vmaxvq_s8, vmaxvq_s16, vmaxvq_s32, vmaxvq_u8, vmaxvq_u16,
	vmaxvq_u32, vmaxnmv_f32, vmaxnmvq_f32, vmaxnmvq_f64, vminv_f32,
	vminv_s8, vminv_s16, vminv_s32, vminv_u8, vminv_u16, vminv_u32,
	vminvq_f32, vminvq_f64, vminvq_s8, vminvq_s16, vminvq_s32,
	vminvq_u8, vminvq_u16, vminvq_u32, vminnmv_f32, vminnmvq_f32,
	vminnmvq_f64): Update to use __builtin_aarch64_reduc_..._scal;
	remove vget_lane wrapper.

diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index ae4ab42e3e3df7de4e4b2c5e46a1476a2ed64175..e213b9ce3adfc0c4c50b4dc34f4f1b995d5e8042 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -251,13 +251,13 @@
   /* Implemented by aarch64_reduc_plus_<mode>.  */
   BUILTIN_VALL (UNOP, reduc_plus_scal_, 10)
 
-  /* Implemented by reduc_<maxmin_uns>_<mode>.  */
-  BUILTIN_VDQIF (UNOP, reduc_smax_, 10)
-  BUILTIN_VDQIF (UNOP, reduc_smin_, 10)
-  BUILTIN_VDQ_BHSI (UNOP, reduc_umax_, 10)
-  BUILTIN_VDQ_BHSI (UNOP, reduc_umin_, 10)
-  BUILTIN_VDQF (UNOP, reduc_smax_nan_, 10)
-  BUILTIN_VDQF (UNOP, reduc_smin_nan_, 10)
+  /* Implemented by reduc_<maxmin_uns>_scal_<mode> (producing scalar).  */
+  BUILTIN_VDQIF (UNOP, reduc_smax_scal_, 10)
+  BUILTIN_VDQIF (UNOP, reduc_smin_scal_, 10)
+  BUILTIN_VDQ_BHSI (UNOPU, reduc_umax_scal_, 10)
+  BUILTIN_VDQ_BHSI (UNOPU, reduc_umin_scal_, 10)
+  BUILTIN_VDQF (UNOP, reduc_smax_nan_scal_, 10)
+  BUILTIN_VDQF (UNOP, reduc_smin_nan_scal_, 10)
 
   /* Implemented by <maxmin_uns><mode>3.
      smax variants map to fmaxnm,
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 23b89584d9ba1d88ff49bfa28d210b325e7dea7f..d4a745be59897b4cb2a0de23adb56b5d79203592 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1828,7 +1828,64 @@
 
 ;; 'across lanes' max and min ops.
 
-(define_insn "reduc_<maxmin_uns>_<mode>"
+(define_expand "reduc_<maxmin_uns>_<mode>"
+  [(match_operand:VDQ_BHSI 0 "register_operand")
+   (unspec:VDQ_BHSI [(match_operand:VDQ_BHSI 1 "register_operand")]
+		    MAXMINV)]
+  "TARGET_SIMD"
+  {
+    /* Old optab/standard name, should not be used since we are providing
+       newer reduc_..._scal_<mode>.  */
+    gcc_unreachable ();
+  }
+)
+
+(define_expand "reduc_<maxmin_uns>_<mode>"
+  [(match_operand:VDQF 0 "register_operand")
+   (unspec:VDQF [(match_operand:VDQF 1 "register_operand")]
+		FMAXMINV)]
+  "TARGET_SIMD"
+  {
+    /* Old optab/standard name, should not be used since we are providing
+       newer reduc_..._scal_<mode>.  */
+    gcc_unreachable ();
+  }
+)
+
+;; Template for outputting a scalar, so we can create __builtins which can be
+;; gimple_fold'd to the REDUC_(MAX|MIN)_EXPR tree code.  (This is FP smax/smin).
+(define_expand "reduc_<maxmin_uns>_scal_<mode>"
+  [(match_operand:<VEL> 0 "register_operand")
+   (unspec:VDQF [(match_operand:VDQF 1 "register_operand")]
+		FMAXMINV)]
+  "TARGET_SIMD"
+  {
+    rtx elt = GEN_INT (ENDIAN_LANE_N (<MODE>mode, 0));
+    rtx scratch = gen_reg_rtx (<MODE>mode);
+    emit_insn (gen_aarch64_reduc_<maxmin_uns>_internal<mode> (scratch,
+							      operands[1]));
+    emit_insn (gen_aarch64_get_lane<mode> (operands[0], scratch, elt));
+    DONE;
+  }
+)
+
+;; Likewise for integer cases, signed and unsigned.
+(define_expand "reduc__scal_" + [(match_operand: 0 "register_operand") + (unspec:VDQ_BHSI [(match_operand:VDQ_BHSI 1 "register_operand")] + MAXMINV)] + "TARGET_SIMD" + { + rtx elt = GEN_INT (ENDIAN_LANE_N (mode, 0)); + rtx scratch = gen_reg_rtx (mode); + emit_insn (gen_aarch64_reduc__internal (scratch, + operands[1])); + emit_insn (gen_aarch64_get_lane (operands[0], scratch, elt)); + DONE; + } +) + +(define_insn "aarch64_reduc__internal" [(set (match_operand:VDQV_S 0 "register_operand" "=w") (unspec:VDQV_S [(match_operand:VDQV_S 1 "register_operand" "w")] MAXMINV))] @@ -1837,7 +1894,7 @@ [(set_attr "type" "neon_reduc_minmax")] ) -(define_insn "reduc__v2si" +(define_insn "aarch64_reduc__internalv2si" [(set (match_operand:V2SI 0 "register_operand" "=w") (unspec:V2SI [(match_operand:V2SI 1 "register_operand" "w")] MAXMINV))] @@ -1846,24 +1903,15 @@ [(set_attr "type" "neon_reduc_minmax")] ) -(define_insn "reduc__" - [(set (match_operand:V2F 0 "register_operand" "=w") - (unspec:V2F [(match_operand:V2F 1 "register_operand" "w")] +(define_insn "aarch64_reduc__internal" + [(set (match_operand:VDQF 0 "register_operand" "=w") + (unspec:VDQF [(match_operand:VDQF 1 "register_operand" "w")] FMAXMINV))] "TARGET_SIMD" - "p\\t%0, %1." + "\\t%0, %1." [(set_attr "type" "neon_fp_reduc_minmax_")] ) -(define_insn "reduc__v4sf" - [(set (match_operand:V4SF 0 "register_operand" "=w") - (unspec:V4SF [(match_operand:V4SF 1 "register_operand" "w")] - FMAXMINV))] - "TARGET_SIMD" - "v\\t%s0, %1.4s" - [(set_attr "type" "neon_fp_reduc_minmax_s_q")] -) - ;; aarch64_simd_bsl may compile to any of bsl/bif/bit depending on register ;; allocation. ;; Operand 1 is the mask, operands 2 and 3 are the bitfields from which diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index 734788e1c0fc81f6bf7efc126b357a74c22692f5..35be8a0ba913461552e9cc1e740dffb6f6c95bd4 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -18047,106 +18047,91 @@ vmaxnmq_f64 (float64x2_t __a, float64x2_t __b) __extension__ static __inline float32_t __attribute__ ((__always_inline__)) vmaxv_f32 (float32x2_t __a) { - return vget_lane_f32 (__builtin_aarch64_reduc_smax_nan_v2sf (__a), - 0); + return __builtin_aarch64_reduc_smax_nan_scal_v2sf (__a); } __extension__ static __inline int8_t __attribute__ ((__always_inline__)) vmaxv_s8 (int8x8_t __a) { - return vget_lane_s8 (__builtin_aarch64_reduc_smax_v8qi (__a), 0); + return __builtin_aarch64_reduc_smax_scal_v8qi (__a); } __extension__ static __inline int16_t __attribute__ ((__always_inline__)) vmaxv_s16 (int16x4_t __a) { - return vget_lane_s16 (__builtin_aarch64_reduc_smax_v4hi (__a), 0); + return __builtin_aarch64_reduc_smax_scal_v4hi (__a); } __extension__ static __inline int32_t __attribute__ ((__always_inline__)) vmaxv_s32 (int32x2_t __a) { - return vget_lane_s32 (__builtin_aarch64_reduc_smax_v2si (__a), 0); + return __builtin_aarch64_reduc_smax_scal_v2si (__a); } __extension__ static __inline uint8_t __attribute__ ((__always_inline__)) vmaxv_u8 (uint8x8_t __a) { - return vget_lane_u8 ((uint8x8_t) - __builtin_aarch64_reduc_umax_v8qi ((int8x8_t) __a), - 0); + return __builtin_aarch64_reduc_umax_scal_v8qi_uu (__a); } __extension__ static __inline uint16_t __attribute__ ((__always_inline__)) vmaxv_u16 (uint16x4_t __a) { - return vget_lane_u16 ((uint16x4_t) - __builtin_aarch64_reduc_umax_v4hi ((int16x4_t) __a), - 0); + return __builtin_aarch64_reduc_umax_scal_v4hi_uu (__a); } __extension__ static __inline uint32_t __attribute__ ((__always_inline__)) vmaxv_u32 
 {
-  return vget_lane_u32 ((uint32x2_t)
-			__builtin_aarch64_reduc_umax_v2si ((int32x2_t) __a),
-			0);
+  return __builtin_aarch64_reduc_umax_scal_v2si_uu (__a);
 }
 
 __extension__ static __inline float32_t __attribute__ ((__always_inline__))
 vmaxvq_f32 (float32x4_t __a)
 {
-  return vgetq_lane_f32 (__builtin_aarch64_reduc_smax_nan_v4sf (__a),
-			 0);
+  return __builtin_aarch64_reduc_smax_nan_scal_v4sf (__a);
 }
 
 __extension__ static __inline float64_t __attribute__ ((__always_inline__))
 vmaxvq_f64 (float64x2_t __a)
 {
-  return vgetq_lane_f64 (__builtin_aarch64_reduc_smax_nan_v2df (__a),
-			 0);
+  return __builtin_aarch64_reduc_smax_nan_scal_v2df (__a);
 }
 
 __extension__ static __inline int8_t __attribute__ ((__always_inline__))
 vmaxvq_s8 (int8x16_t __a)
 {
-  return vgetq_lane_s8 (__builtin_aarch64_reduc_smax_v16qi (__a), 0);
+  return __builtin_aarch64_reduc_smax_scal_v16qi (__a);
 }
 
 __extension__ static __inline int16_t __attribute__ ((__always_inline__))
 vmaxvq_s16 (int16x8_t __a)
 {
-  return vgetq_lane_s16 (__builtin_aarch64_reduc_smax_v8hi (__a), 0);
+  return __builtin_aarch64_reduc_smax_scal_v8hi (__a);
 }
 
 __extension__ static __inline int32_t __attribute__ ((__always_inline__))
 vmaxvq_s32 (int32x4_t __a)
 {
-  return vgetq_lane_s32 (__builtin_aarch64_reduc_smax_v4si (__a), 0);
+  return __builtin_aarch64_reduc_smax_scal_v4si (__a);
 }
 
 __extension__ static __inline uint8_t __attribute__ ((__always_inline__))
 vmaxvq_u8 (uint8x16_t __a)
 {
-  return vgetq_lane_u8 ((uint8x16_t)
-			__builtin_aarch64_reduc_umax_v16qi ((int8x16_t) __a),
-			0);
+  return __builtin_aarch64_reduc_umax_scal_v16qi_uu (__a);
 }
 
 __extension__ static __inline uint16_t __attribute__ ((__always_inline__))
 vmaxvq_u16 (uint16x8_t __a)
 {
-  return vgetq_lane_u16 ((uint16x8_t)
-			 __builtin_aarch64_reduc_umax_v8hi ((int16x8_t) __a),
-			 0);
+  return __builtin_aarch64_reduc_umax_scal_v8hi_uu (__a);
 }
 
 __extension__ static __inline uint32_t __attribute__ ((__always_inline__))
 vmaxvq_u32 (uint32x4_t __a)
 {
-  return vgetq_lane_u32 ((uint32x4_t)
-			 __builtin_aarch64_reduc_umax_v4si ((int32x4_t) __a),
-			 0);
+  return __builtin_aarch64_reduc_umax_scal_v4si_uu (__a);
 }
 
 /* vmaxnmv */
@@ -18154,20 +18139,19 @@ vmaxvq_u32 (uint32x4_t __a)
 __extension__ static __inline float32_t __attribute__ ((__always_inline__))
 vmaxnmv_f32 (float32x2_t __a)
 {
-  return vget_lane_f32 (__builtin_aarch64_reduc_smax_v2sf (__a),
-			0);
+  return __builtin_aarch64_reduc_smax_scal_v2sf (__a);
 }
 
 __extension__ static __inline float32_t __attribute__ ((__always_inline__))
 vmaxnmvq_f32 (float32x4_t __a)
 {
-  return vgetq_lane_f32 (__builtin_aarch64_reduc_smax_v4sf (__a), 0);
+  return __builtin_aarch64_reduc_smax_scal_v4sf (__a);
 }
 
 __extension__ static __inline float64_t __attribute__ ((__always_inline__))
 vmaxnmvq_f64 (float64x2_t __a)
 {
-  return vgetq_lane_f64 (__builtin_aarch64_reduc_smax_v2df (__a), 0);
+  return __builtin_aarch64_reduc_smax_scal_v2df (__a);
 }
 
 /* vmin */
@@ -18293,107 +18277,91 @@ vminnmq_f64 (float64x2_t __a, float64x2_t __b)
 __extension__ static __inline float32_t __attribute__ ((__always_inline__))
 vminv_f32 (float32x2_t __a)
 {
-  return vget_lane_f32 (__builtin_aarch64_reduc_smin_nan_v2sf (__a),
-			0);
+  return __builtin_aarch64_reduc_smin_nan_scal_v2sf (__a);
 }
 
 __extension__ static __inline int8_t __attribute__ ((__always_inline__))
 vminv_s8 (int8x8_t __a)
 {
-  return vget_lane_s8 (__builtin_aarch64_reduc_smin_v8qi (__a),
-		       0);
+  return __builtin_aarch64_reduc_smin_scal_v8qi (__a);
 }
 
 __extension__ static __inline int16_t __attribute__ ((__always_inline__))
 vminv_s16 (int16x4_t __a)
 {
-  return vget_lane_s16 (__builtin_aarch64_reduc_smin_v4hi (__a), 0);
+  return __builtin_aarch64_reduc_smin_scal_v4hi (__a);
 }
 
 __extension__ static __inline int32_t __attribute__ ((__always_inline__))
 vminv_s32 (int32x2_t __a)
 {
-  return vget_lane_s32 (__builtin_aarch64_reduc_smin_v2si (__a), 0);
+  return __builtin_aarch64_reduc_smin_scal_v2si (__a);
 }
 
 __extension__ static __inline uint8_t __attribute__ ((__always_inline__))
 vminv_u8 (uint8x8_t __a)
 {
-  return vget_lane_u8 ((uint8x8_t)
-		       __builtin_aarch64_reduc_umin_v8qi ((int8x8_t) __a),
-		       0);
+  return __builtin_aarch64_reduc_umin_scal_v8qi_uu (__a);
 }
 
 __extension__ static __inline uint16_t __attribute__ ((__always_inline__))
 vminv_u16 (uint16x4_t __a)
 {
-  return vget_lane_u16 ((uint16x4_t)
-			__builtin_aarch64_reduc_umin_v4hi ((int16x4_t) __a),
-			0);
+  return __builtin_aarch64_reduc_umin_scal_v4hi_uu (__a);
 }
 
 __extension__ static __inline uint32_t __attribute__ ((__always_inline__))
 vminv_u32 (uint32x2_t __a)
 {
-  return vget_lane_u32 ((uint32x2_t)
-			__builtin_aarch64_reduc_umin_v2si ((int32x2_t) __a),
-			0);
+  return __builtin_aarch64_reduc_umin_scal_v2si_uu (__a);
 }
 
 __extension__ static __inline float32_t __attribute__ ((__always_inline__))
 vminvq_f32 (float32x4_t __a)
 {
-  return vgetq_lane_f32 (__builtin_aarch64_reduc_smin_nan_v4sf (__a),
-			 0);
+  return __builtin_aarch64_reduc_smin_nan_scal_v4sf (__a);
 }
 
 __extension__ static __inline float64_t __attribute__ ((__always_inline__))
 vminvq_f64 (float64x2_t __a)
 {
-  return vgetq_lane_f64 (__builtin_aarch64_reduc_smin_nan_v2df (__a),
-			 0);
+  return __builtin_aarch64_reduc_smin_nan_scal_v2df (__a);
 }
 
 __extension__ static __inline int8_t __attribute__ ((__always_inline__))
 vminvq_s8 (int8x16_t __a)
 {
-  return vgetq_lane_s8 (__builtin_aarch64_reduc_smin_v16qi (__a), 0);
+  return __builtin_aarch64_reduc_smin_scal_v16qi (__a);
 }
 
 __extension__ static __inline int16_t __attribute__ ((__always_inline__))
 vminvq_s16 (int16x8_t __a)
 {
-  return vgetq_lane_s16 (__builtin_aarch64_reduc_smin_v8hi (__a), 0);
+  return __builtin_aarch64_reduc_smin_scal_v8hi (__a);
 }
 
 __extension__ static __inline int32_t __attribute__ ((__always_inline__))
 vminvq_s32 (int32x4_t __a)
 {
-  return vgetq_lane_s32 (__builtin_aarch64_reduc_smin_v4si (__a), 0);
+  return __builtin_aarch64_reduc_smin_scal_v4si (__a);
 }
 
 __extension__ static __inline uint8_t __attribute__ ((__always_inline__))
 vminvq_u8 (uint8x16_t __a)
 {
-  return vgetq_lane_u8 ((uint8x16_t)
-			__builtin_aarch64_reduc_umin_v16qi ((int8x16_t) __a),
-			0);
+  return __builtin_aarch64_reduc_umin_scal_v16qi_uu (__a);
 }
 
 __extension__ static __inline uint16_t __attribute__ ((__always_inline__))
 vminvq_u16 (uint16x8_t __a)
 {
-  return vgetq_lane_u16 ((uint16x8_t)
-			 __builtin_aarch64_reduc_umin_v8hi ((int16x8_t) __a),
-			 0);
+  return __builtin_aarch64_reduc_umin_scal_v8hi_uu (__a);
 }
 
 __extension__ static __inline uint32_t __attribute__ ((__always_inline__))
 vminvq_u32 (uint32x4_t __a)
 {
-  return vgetq_lane_u32 ((uint32x4_t)
-			 __builtin_aarch64_reduc_umin_v4si ((int32x4_t) __a),
-			 0);
+  return __builtin_aarch64_reduc_umin_scal_v4si_uu (__a);
 }
 
 /* vminnmv */
@@ -18401,19 +18369,19 @@ vminvq_u32 (uint32x4_t __a)
 __extension__ static __inline float32_t __attribute__ ((__always_inline__))
 vminnmv_f32 (float32x2_t __a)
 {
-  return vget_lane_f32 (__builtin_aarch64_reduc_smin_v2sf (__a), 0);
+  return __builtin_aarch64_reduc_smin_scal_v2sf (__a);
 }
 
 __extension__ static __inline float32_t __attribute__ ((__always_inline__))
 vminnmvq_f32 (float32x4_t __a)
 {
-  return vgetq_lane_f32 (__builtin_aarch64_reduc_smin_v4sf (__a), 0);
+  return __builtin_aarch64_reduc_smin_scal_v4sf (__a);
 }
 
 __extension__ static __inline float64_t __attribute__ ((__always_inline__))
 vminnmvq_f64 (float64x2_t __a)
 {
-  return vgetq_lane_f64 (__builtin_aarch64_reduc_smin_v2df (__a), 0);
+  return __builtin_aarch64_reduc_smin_scal_v2df (__a);
 }
 
 /* vmla */