From patchwork Mon Apr 28 10:44:01 2014
X-Patchwork-Submitter: Ramana Radhakrishnan
X-Patchwork-Id: 343327
Message-ID: <535E30F1.4020902@arm.com>
Date: Mon, 28 Apr 2014 11:44:01 +0100
From: Ramana Radhakrishnan
To: Ramana Radhakrishnan
CC: gcc-patches@gcc.gnu.org, Christophe Lyon
Subject: [Patch ARM 1/3] Neon intrinsics TLC : Replace intrinsics with GNU C implementations where possible.
References: <535E304D.7000800@arm.com>
In-Reply-To: <535E304D.7000800@arm.com>

I've special-cased the -ffast-math case for the _f32 intrinsics to
prevent the auto-vectorizer from coming along and vectorizing addv2sf
and addv4sf type operations, which we don't want to happen by default.

Patch 1/3 causes apparent "regressions" in the rather ineffective Neon
intrinsics tests that we currently carry, soon (hopefully) to be
replaced by Christophe Lyon's rewrite that is currently under review.
On the whole I deem this patch stack safe to go in if necessary. These
"regressions" show up at -O0 with the vbic and vorn intrinsics, which
no longer get combined; so be it. Given that we're in stage 1 and that
I think we're getting somewhere with clyon's testsuite, I feel it is
reasonably practical to just carry the noise of these extra failures.

Christophe and I will test-drive his testsuite work in this space with
these patches, to see how the conversion process works and whether
there are any issues with these patches.
Ramana Radhakrishnan

	* config/arm/arm_neon.h (vadd_s8): GNU C implementation.
	(vadd_s16): Likewise.
	(vadd_s32): Likewise.
	(vadd_f32): Likewise.
	(vadd_u8): Likewise.
	(vadd_u16): Likewise.
	(vadd_u32): Likewise.
	(vadd_s64): Likewise.
	(vadd_u64): Likewise.
	(vaddq_s8): Likewise.
	(vaddq_s16): Likewise.
	(vaddq_s32): Likewise.
	(vaddq_s64): Likewise.
	(vaddq_f32): Likewise.
	(vaddq_u8): Likewise.
	(vaddq_u16): Likewise.
	(vaddq_u32): Likewise.
	(vaddq_u64): Likewise.
	(vmul_s8): Likewise.
	(vmul_s16): Likewise.
	(vmul_s32): Likewise.
	(vmul_f32): Likewise.
	(vmul_u8): Likewise.
	(vmul_u16): Likewise.
	(vmul_u32): Likewise.
	(vmul_p8): Likewise.
	(vmulq_s8): Likewise.
	(vmulq_s16): Likewise.
	(vmulq_s32): Likewise.
	(vmulq_f32): Likewise.
	(vmulq_u8): Likewise.
	(vmulq_u16): Likewise.
	(vmulq_u32): Likewise.
	(vsub_s8): Likewise.
	(vsub_s16): Likewise.
	(vsub_s32): Likewise.
	(vsub_f32): Likewise.
	(vsub_u8): Likewise.
	(vsub_u16): Likewise.
	(vsub_u32): Likewise.
	(vsub_s64): Likewise.
	(vsub_u64): Likewise.
	(vsubq_s8): Likewise.
	(vsubq_s16): Likewise.
	(vsubq_s32): Likewise.
	(vsubq_s64): Likewise.
	(vsubq_f32): Likewise.
	(vsubq_u8): Likewise.
	(vsubq_u16): Likewise.
	(vsubq_u32): Likewise.
	(vsubq_u64): Likewise.
	(vand_s8): Likewise.
	(vand_s16): Likewise.
	(vand_s32): Likewise.
	(vand_u8): Likewise.
	(vand_u16): Likewise.
	(vand_u32): Likewise.
	(vand_s64): Likewise.
	(vand_u64): Likewise.
	(vandq_s8): Likewise.
	(vandq_s16): Likewise.
	(vandq_s32): Likewise.
	(vandq_s64): Likewise.
	(vandq_u8): Likewise.
	(vandq_u16): Likewise.
	(vandq_u32): Likewise.
	(vandq_u64): Likewise.
	(vorr_s8): Likewise.
	(vorr_s16): Likewise.
	(vorr_s32): Likewise.
	(vorr_u8): Likewise.
	(vorr_u16): Likewise.
	(vorr_u32): Likewise.
	(vorr_s64): Likewise.
	(vorr_u64): Likewise.
	(vorrq_s8): Likewise.
	(vorrq_s16): Likewise.
	(vorrq_s32): Likewise.
	(vorrq_s64): Likewise.
	(vorrq_u8): Likewise.
	(vorrq_u16): Likewise.
	(vorrq_u32): Likewise.
	(vorrq_u64): Likewise.
	(veor_s8): Likewise.
	(veor_s16): Likewise.
	(veor_s32): Likewise.
	(veor_u8): Likewise.
	(veor_u16): Likewise.
	(veor_u32): Likewise.
	(veor_s64): Likewise.
	(veor_u64): Likewise.
	(veorq_s8): Likewise.
	(veorq_s16): Likewise.
	(veorq_s32): Likewise.
	(veorq_s64): Likewise.
	(veorq_u8): Likewise.
	(veorq_u16): Likewise.
	(veorq_u32): Likewise.
	(veorq_u64): Likewise.
	(vbic_s8): Likewise.
	(vbic_s16): Likewise.
	(vbic_s32): Likewise.
	(vbic_u8): Likewise.
	(vbic_u16): Likewise.
	(vbic_u32): Likewise.
	(vbic_s64): Likewise.
	(vbic_u64): Likewise.
	(vbicq_s8): Likewise.
	(vbicq_s16): Likewise.
	(vbicq_s32): Likewise.
	(vbicq_s64): Likewise.
	(vbicq_u8): Likewise.
	(vbicq_u16): Likewise.
	(vbicq_u32): Likewise.
	(vbicq_u64): Likewise.
	(vorn_s8): Likewise.
	(vorn_s16): Likewise.
	(vorn_s32): Likewise.
	(vorn_u8): Likewise.
	(vorn_u16): Likewise.
	(vorn_u32): Likewise.
	(vorn_s64): Likewise.
	(vorn_u64): Likewise.
	(vornq_s8): Likewise.
	(vornq_s16): Likewise.
	(vornq_s32): Likewise.
	(vornq_s64): Likewise.
	(vornq_u8): Likewise.
	(vornq_u16): Likewise.
	(vornq_u32): Likewise.
	(vornq_u64): Likewise.

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 37a6e61..479ec2c 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -453,114 +453,121 @@ typedef struct poly64x2x4_t
 } poly64x2x4_t;
 #endif
 
-
-
+/* vadd */
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
 vadd_s8 (int8x8_t __a, int8x8_t __b)
 {
-  return (int8x8_t)__builtin_neon_vaddv8qi (__a, __b, 1);
+  return __a + __b;
 }
 
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vadd_s16 (int16x4_t __a, int16x4_t __b)
 {
-  return (int16x4_t)__builtin_neon_vaddv4hi (__a, __b, 1);
+  return __a + __b;
 }
 
 __extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
 vadd_s32 (int32x2_t __a, int32x2_t __b)
 {
-  return (int32x2_t)__builtin_neon_vaddv2si (__a, __b, 1);
+  return __a + __b;
 }
 
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vadd_f32 (float32x2_t __a, float32x2_t __b)
 {
-  return (float32x2_t)__builtin_neon_vaddv2sf (__a, __b, 3);
+#ifdef __FAST_MATH__
+  return __a + __b;
+#else
+  return (float32x2_t) __builtin_neon_vaddv2sf (__a, __b, 3);
+#endif
 }
 
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vadd_u8 (uint8x8_t __a, uint8x8_t __b)
 {
-  return (uint8x8_t)__builtin_neon_vaddv8qi ((int8x8_t) __a, (int8x8_t) __b, 0);
+  return __a + __b;
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vadd_u16 (uint16x4_t __a, uint16x4_t __b)
 {
-  return (uint16x4_t)__builtin_neon_vaddv4hi ((int16x4_t) __a, (int16x4_t) __b, 0);
+  return __a + __b;
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vadd_u32 (uint32x2_t __a, uint32x2_t __b)
 {
-  return (uint32x2_t)__builtin_neon_vaddv2si ((int32x2_t) __a, (int32x2_t) __b, 0);
+  return __a + __b;
 }
 
 __extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
 vadd_s64 (int64x1_t __a, int64x1_t __b)
 {
-  return (int64x1_t)__builtin_neon_vadddi (__a, __b, 1);
+  return __a + __b;
 }
 
 __extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
 vadd_u64 (uint64x1_t __a, uint64x1_t __b)
 {
-  return (uint64x1_t)__builtin_neon_vadddi ((int64x1_t) __a, (int64x1_t) __b, 0);
+  return __a + __b;
 }
 
 __extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
 vaddq_s8 (int8x16_t __a, int8x16_t __b)
 {
-  return (int8x16_t)__builtin_neon_vaddv16qi (__a, __b, 1);
+  return __a + __b;
 }
 
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vaddq_s16 (int16x8_t __a, int16x8_t __b)
 {
-  return (int16x8_t)__builtin_neon_vaddv8hi (__a, __b, 1);
+  return __a + __b;
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vaddq_s32 (int32x4_t __a, int32x4_t __b)
 {
-  return (int32x4_t)__builtin_neon_vaddv4si (__a, __b, 1);
+  return __a + __b;
 }
 
 __extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
 vaddq_s64 (int64x2_t __a, int64x2_t __b)
 {
-  return (int64x2_t)__builtin_neon_vaddv2di (__a, __b, 1);
+  return __a + __b;
 }
 
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vaddq_f32 (float32x4_t __a, float32x4_t __b)
 {
-  return (float32x4_t)__builtin_neon_vaddv4sf (__a, __b, 3);
+#ifdef __FAST_MATH__
+  return __a + __b;
+#else
+  return (float32x4_t) __builtin_neon_vaddv4sf (__a, __b, 3);
+#endif
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vaddq_u8 (uint8x16_t __a, uint8x16_t __b)
 {
-  return (uint8x16_t)__builtin_neon_vaddv16qi ((int8x16_t) __a, (int8x16_t) __b, 0);
+  return __a + __b;
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vaddq_u16 (uint16x8_t __a, uint16x8_t __b)
 {
-  return (uint16x8_t)__builtin_neon_vaddv8hi ((int16x8_t) __a, (int16x8_t) __b, 0);
+  return __a + __b;
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vaddq_u32 (uint32x4_t __a, uint32x4_t __b)
 {
-  return (uint32x4_t)__builtin_neon_vaddv4si ((int32x4_t) __a, (int32x4_t) __b, 0);
+  return __a + __b;
 }
 
 __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
 vaddq_u64 (uint64x2_t __a, uint64x2_t __b)
 {
-  return (uint64x2_t)__builtin_neon_vaddv2di ((int64x2_t) __a, (int64x2_t) __b, 0);
+  return __a + __b;
 }
 
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
@@ -950,91 +957,100 @@ vraddhn_u64 (uint64x2_t __a, uint64x2_t __b)
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
 vmul_s8 (int8x8_t __a, int8x8_t __b)
 {
-  return (int8x8_t)__builtin_neon_vmulv8qi (__a, __b, 1);
+  return __a * __b;
 }
 
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vmul_s16 (int16x4_t __a, int16x4_t __b)
 {
-  return (int16x4_t)__builtin_neon_vmulv4hi (__a, __b, 1);
+  return __a * __b;
 }
 
 __extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
 vmul_s32 (int32x2_t __a, int32x2_t __b)
 {
-  return (int32x2_t)__builtin_neon_vmulv2si (__a, __b, 1);
+  return __a * __b;
 }
 
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vmul_f32 (float32x2_t __a, float32x2_t __b)
 {
-  return (float32x2_t)__builtin_neon_vmulv2sf (__a, __b, 3);
+#ifdef __FAST_MATH__
+  return __a * __b;
+#else
+  return (float32x2_t) __builtin_neon_vmulv2sf (__a, __b, 3);
+#endif
+
 }
 
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vmul_u8 (uint8x8_t __a, uint8x8_t __b)
 {
-  return (uint8x8_t)__builtin_neon_vmulv8qi ((int8x8_t) __a, (int8x8_t) __b, 0);
+  return __a * __b;
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vmul_u16 (uint16x4_t __a, uint16x4_t __b)
 {
-  return (uint16x4_t)__builtin_neon_vmulv4hi ((int16x4_t) __a, (int16x4_t) __b, 0);
+  return __a * __b;
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vmul_u32 (uint32x2_t __a, uint32x2_t __b)
 {
-  return (uint32x2_t)__builtin_neon_vmulv2si ((int32x2_t) __a, (int32x2_t) __b, 0);
-}
-
-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
-vmul_p8 (poly8x8_t __a, poly8x8_t __b)
-{
-  return (poly8x8_t)__builtin_neon_vmulv8qi ((int8x8_t) __a, (int8x8_t) __b, 2);
+  return __a * __b;
 }
 
 __extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
 vmulq_s8 (int8x16_t __a, int8x16_t __b)
 {
-  return (int8x16_t)__builtin_neon_vmulv16qi (__a, __b, 1);
+  return __a * __b;
 }
 
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vmulq_s16 (int16x8_t __a, int16x8_t __b)
 {
-  return (int16x8_t)__builtin_neon_vmulv8hi (__a, __b, 1);
+  return __a * __b;
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vmulq_s32 (int32x4_t __a, int32x4_t __b)
 {
-  return (int32x4_t)__builtin_neon_vmulv4si (__a, __b, 1);
+  return __a * __b;
 }
 
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vmulq_f32 (float32x4_t __a, float32x4_t __b)
 {
-  return (float32x4_t)__builtin_neon_vmulv4sf (__a, __b, 3);
+#ifdef __FAST_MATH__
+  return __a * __b;
+#else
+  return (float32x4_t) __builtin_neon_vmulv4sf (__a, __b, 3);
+#endif
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vmulq_u8 (uint8x16_t __a, uint8x16_t __b)
 {
-  return (uint8x16_t)__builtin_neon_vmulv16qi ((int8x16_t) __a, (int8x16_t) __b, 0);
+  return __a * __b;
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vmulq_u16 (uint16x8_t __a, uint16x8_t __b)
 {
-  return (uint16x8_t)__builtin_neon_vmulv8hi ((int16x8_t) __a, (int16x8_t) __b, 0);
+  return __a * __b;
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vmulq_u32 (uint32x4_t __a, uint32x4_t __b)
 {
-  return (uint32x4_t)__builtin_neon_vmulv4si ((int32x4_t) __a, (int32x4_t) __b, 0);
+  return __a * __b;
+}
+
+__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+vmul_p8 (poly8x8_t __a, poly8x8_t __b)
+{
+  return (poly8x8_t)__builtin_neon_vmulv8qi ((int8x8_t) __a, (int8x8_t) __b, 2);
 }
 
 __extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
@@ -1521,112 +1537,121 @@ vrndq_f32 (float32x4_t __a)
 }
 #endif
 
+
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
 vsub_s8 (int8x8_t __a, int8x8_t __b)
 {
-  return (int8x8_t)__builtin_neon_vsubv8qi (__a, __b, 1);
+  return __a - __b;
 }
 
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vsub_s16 (int16x4_t __a, int16x4_t __b)
 {
-  return (int16x4_t)__builtin_neon_vsubv4hi (__a, __b, 1);
+  return __a - __b;
 }
 
 __extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
 vsub_s32 (int32x2_t __a, int32x2_t __b)
 {
-  return (int32x2_t)__builtin_neon_vsubv2si (__a, __b, 1);
+  return __a - __b;
 }
 
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vsub_f32 (float32x2_t __a, float32x2_t __b)
 {
-  return (float32x2_t)__builtin_neon_vsubv2sf (__a, __b, 3);
+#ifdef __FAST_MATH__
+  return __a - __b;
+#else
+  return (float32x2_t) __builtin_neon_vsubv2sf (__a, __b, 3);
+#endif
 }
 
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vsub_u8 (uint8x8_t __a, uint8x8_t __b)
 {
-  return (uint8x8_t)__builtin_neon_vsubv8qi ((int8x8_t) __a, (int8x8_t) __b, 0);
+  return __a - __b;
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vsub_u16 (uint16x4_t __a, uint16x4_t __b)
 {
-  return (uint16x4_t)__builtin_neon_vsubv4hi ((int16x4_t) __a, (int16x4_t) __b, 0);
+  return __a - __b;
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vsub_u32 (uint32x2_t __a, uint32x2_t __b)
 {
-  return (uint32x2_t)__builtin_neon_vsubv2si ((int32x2_t) __a, (int32x2_t) __b, 0);
+  return __a - __b;
 }
 
 __extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
 vsub_s64 (int64x1_t __a, int64x1_t __b)
 {
-  return (int64x1_t)__builtin_neon_vsubdi (__a, __b, 1);
+  return __a - __b;
 }
 
 __extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
 vsub_u64 (uint64x1_t __a, uint64x1_t __b)
 {
-  return (uint64x1_t)__builtin_neon_vsubdi ((int64x1_t) __a, (int64x1_t) __b, 0);
+  return __a - __b;
 }
 
 __extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
 vsubq_s8 (int8x16_t __a, int8x16_t __b)
 {
-  return (int8x16_t)__builtin_neon_vsubv16qi (__a, __b, 1);
+  return __a - __b;
 }
 
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vsubq_s16 (int16x8_t __a, int16x8_t __b)
 {
-  return (int16x8_t)__builtin_neon_vsubv8hi (__a, __b, 1);
+  return __a - __b;
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vsubq_s32 (int32x4_t __a, int32x4_t __b)
 {
-  return (int32x4_t)__builtin_neon_vsubv4si (__a, __b, 1);
+  return __a - __b;
 }
 
 __extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
 vsubq_s64 (int64x2_t __a, int64x2_t __b)
 {
-  return (int64x2_t)__builtin_neon_vsubv2di (__a, __b, 1);
+  return __a - __b;
 }
 
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vsubq_f32 (float32x4_t __a, float32x4_t __b)
 {
-  return (float32x4_t)__builtin_neon_vsubv4sf (__a, __b, 3);
+#ifdef __FAST_MATH__
+  return __a - __b;
+#else
+  return (float32x4_t) __builtin_neon_vsubv4sf (__a, __b, 3);
+#endif
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vsubq_u8 (uint8x16_t __a, uint8x16_t __b)
 {
-  return (uint8x16_t)__builtin_neon_vsubv16qi ((int8x16_t) __a, (int8x16_t) __b, 0);
+  return __a - __b;
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vsubq_u16 (uint16x8_t __a, uint16x8_t __b)
 {
-  return (uint16x8_t)__builtin_neon_vsubv8hi ((int16x8_t) __a, (int16x8_t) __b, 0);
+  return __a - __b;
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vsubq_u32 (uint32x4_t __a, uint32x4_t __b)
 {
-  return (uint32x4_t)__builtin_neon_vsubv4si ((int32x4_t) __a, (int32x4_t) __b, 0);
+  return __a - __b;
 }
 
 __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
 vsubq_u64 (uint64x2_t __a, uint64x2_t __b)
 {
-  return (uint64x2_t)__builtin_neon_vsubv2di ((int64x2_t) __a, (int64x2_t) __b, 0);
+  return __a - __b;
 }
 
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
@@ -10907,484 +10932,483 @@ vst4q_lane_p16 (poly16_t * __a, poly16x8x4_t __b, const int __c)
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
 vand_s8 (int8x8_t __a, int8x8_t __b)
 {
-  return (int8x8_t)__builtin_neon_vandv8qi (__a, __b, 1);
+  return __a & __b;
 }
 
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vand_s16 (int16x4_t __a, int16x4_t __b)
 {
-  return (int16x4_t)__builtin_neon_vandv4hi (__a, __b, 1);
+  return __a & __b;
 }
 
 __extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
 vand_s32 (int32x2_t __a, int32x2_t __b)
 {
-  return (int32x2_t)__builtin_neon_vandv2si (__a, __b, 1);
+  return __a & __b;
 }
 
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vand_u8 (uint8x8_t __a,
uint8x8_t __b) { - return (uint8x8_t)__builtin_neon_vandv8qi ((int8x8_t) __a, (int8x8_t) __b, 0); + return __a & __b; } __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) vand_u16 (uint16x4_t __a, uint16x4_t __b) { - return (uint16x4_t)__builtin_neon_vandv4hi ((int16x4_t) __a, (int16x4_t) __b, 0); + return __a & __b; } __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) vand_u32 (uint32x2_t __a, uint32x2_t __b) { - return (uint32x2_t)__builtin_neon_vandv2si ((int32x2_t) __a, (int32x2_t) __b, 0); + return __a & __b; } __extension__ static __inline int64x1_t __attribute__ ((__always_inline__)) vand_s64 (int64x1_t __a, int64x1_t __b) { - return (int64x1_t)__builtin_neon_vanddi (__a, __b, 1); + return __a & __b; } __extension__ static __inline uint64x1_t __attribute__ ((__always_inline__)) vand_u64 (uint64x1_t __a, uint64x1_t __b) { - return (uint64x1_t)__builtin_neon_vanddi ((int64x1_t) __a, (int64x1_t) __b, 0); + return __a & __b; } __extension__ static __inline int8x16_t __attribute__ ((__always_inline__)) vandq_s8 (int8x16_t __a, int8x16_t __b) { - return (int8x16_t)__builtin_neon_vandv16qi (__a, __b, 1); + return __a & __b; } __extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) vandq_s16 (int16x8_t __a, int16x8_t __b) { - return (int16x8_t)__builtin_neon_vandv8hi (__a, __b, 1); + return __a & __b; } __extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) vandq_s32 (int32x4_t __a, int32x4_t __b) { - return (int32x4_t)__builtin_neon_vandv4si (__a, __b, 1); + return __a & __b; } __extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) vandq_s64 (int64x2_t __a, int64x2_t __b) { - return (int64x2_t)__builtin_neon_vandv2di (__a, __b, 1); + return __a & __b; } __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) vandq_u8 (uint8x16_t __a, uint8x16_t __b) { - return (uint8x16_t)__builtin_neon_vandv16qi ((int8x16_t) __a, (int8x16_t) __b, 
0); + return __a & __b; } __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) vandq_u16 (uint16x8_t __a, uint16x8_t __b) { - return (uint16x8_t)__builtin_neon_vandv8hi ((int16x8_t) __a, (int16x8_t) __b, 0); + return __a & __b; } __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) vandq_u32 (uint32x4_t __a, uint32x4_t __b) { - return (uint32x4_t)__builtin_neon_vandv4si ((int32x4_t) __a, (int32x4_t) __b, 0); + return __a & __b; } __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__)) vandq_u64 (uint64x2_t __a, uint64x2_t __b) { - return (uint64x2_t)__builtin_neon_vandv2di ((int64x2_t) __a, (int64x2_t) __b, 0); + return __a & __b; } __extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) vorr_s8 (int8x8_t __a, int8x8_t __b) { - return (int8x8_t)__builtin_neon_vorrv8qi (__a, __b, 1); + return __a | __b; } __extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) vorr_s16 (int16x4_t __a, int16x4_t __b) { - return (int16x4_t)__builtin_neon_vorrv4hi (__a, __b, 1); + return __a | __b; } __extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) vorr_s32 (int32x2_t __a, int32x2_t __b) { - return (int32x2_t)__builtin_neon_vorrv2si (__a, __b, 1); + return __a | __b; } __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) vorr_u8 (uint8x8_t __a, uint8x8_t __b) { - return (uint8x8_t)__builtin_neon_vorrv8qi ((int8x8_t) __a, (int8x8_t) __b, 0); + return __a | __b; } __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) vorr_u16 (uint16x4_t __a, uint16x4_t __b) { - return (uint16x4_t)__builtin_neon_vorrv4hi ((int16x4_t) __a, (int16x4_t) __b, 0); + return __a | __b; } __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) vorr_u32 (uint32x2_t __a, uint32x2_t __b) { - return (uint32x2_t)__builtin_neon_vorrv2si ((int32x2_t) __a, (int32x2_t) __b, 0); + return __a | __b; } __extension__ static 
__inline int64x1_t __attribute__ ((__always_inline__)) vorr_s64 (int64x1_t __a, int64x1_t __b) { - return (int64x1_t)__builtin_neon_vorrdi (__a, __b, 1); + return __a | __b; } __extension__ static __inline uint64x1_t __attribute__ ((__always_inline__)) vorr_u64 (uint64x1_t __a, uint64x1_t __b) { - return (uint64x1_t)__builtin_neon_vorrdi ((int64x1_t) __a, (int64x1_t) __b, 0); + return __a | __b; } __extension__ static __inline int8x16_t __attribute__ ((__always_inline__)) vorrq_s8 (int8x16_t __a, int8x16_t __b) { - return (int8x16_t)__builtin_neon_vorrv16qi (__a, __b, 1); + return __a | __b; } __extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) vorrq_s16 (int16x8_t __a, int16x8_t __b) { - return (int16x8_t)__builtin_neon_vorrv8hi (__a, __b, 1); + return __a | __b; } __extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) vorrq_s32 (int32x4_t __a, int32x4_t __b) { - return (int32x4_t)__builtin_neon_vorrv4si (__a, __b, 1); + return __a | __b; } __extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) vorrq_s64 (int64x2_t __a, int64x2_t __b) { - return (int64x2_t)__builtin_neon_vorrv2di (__a, __b, 1); + return __a | __b; } __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) vorrq_u8 (uint8x16_t __a, uint8x16_t __b) { - return (uint8x16_t)__builtin_neon_vorrv16qi ((int8x16_t) __a, (int8x16_t) __b, 0); + return __a | __b; } __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) vorrq_u16 (uint16x8_t __a, uint16x8_t __b) { - return (uint16x8_t)__builtin_neon_vorrv8hi ((int16x8_t) __a, (int16x8_t) __b, 0); + return __a | __b; } __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) vorrq_u32 (uint32x4_t __a, uint32x4_t __b) { - return (uint32x4_t)__builtin_neon_vorrv4si ((int32x4_t) __a, (int32x4_t) __b, 0); + return __a | __b; } __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__)) vorrq_u64 (uint64x2_t __a, uint64x2_t 
__b) { - return (uint64x2_t)__builtin_neon_vorrv2di ((int64x2_t) __a, (int64x2_t) __b, 0); + return __a | __b; } __extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) veor_s8 (int8x8_t __a, int8x8_t __b) { - return (int8x8_t)__builtin_neon_veorv8qi (__a, __b, 1); + return __a ^ __b; } __extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) veor_s16 (int16x4_t __a, int16x4_t __b) { - return (int16x4_t)__builtin_neon_veorv4hi (__a, __b, 1); + return __a ^ __b; } __extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) veor_s32 (int32x2_t __a, int32x2_t __b) { - return (int32x2_t)__builtin_neon_veorv2si (__a, __b, 1); + return __a ^ __b; } __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) veor_u8 (uint8x8_t __a, uint8x8_t __b) { - return (uint8x8_t)__builtin_neon_veorv8qi ((int8x8_t) __a, (int8x8_t) __b, 0); + return __a ^ __b; } __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) veor_u16 (uint16x4_t __a, uint16x4_t __b) { - return (uint16x4_t)__builtin_neon_veorv4hi ((int16x4_t) __a, (int16x4_t) __b, 0); + return __a ^ __b; } __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) veor_u32 (uint32x2_t __a, uint32x2_t __b) { - return (uint32x2_t)__builtin_neon_veorv2si ((int32x2_t) __a, (int32x2_t) __b, 0); + return __a ^ __b; } __extension__ static __inline int64x1_t __attribute__ ((__always_inline__)) veor_s64 (int64x1_t __a, int64x1_t __b) { - return (int64x1_t)__builtin_neon_veordi (__a, __b, 1); + return __a ^ __b; } __extension__ static __inline uint64x1_t __attribute__ ((__always_inline__)) veor_u64 (uint64x1_t __a, uint64x1_t __b) { - return (uint64x1_t)__builtin_neon_veordi ((int64x1_t) __a, (int64x1_t) __b, 0); + return __a ^ __b; } __extension__ static __inline int8x16_t __attribute__ ((__always_inline__)) veorq_s8 (int8x16_t __a, int8x16_t __b) { - return (int8x16_t)__builtin_neon_veorv16qi (__a, __b, 1); + return __a ^ __b; 
} __extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) veorq_s16 (int16x8_t __a, int16x8_t __b) { - return (int16x8_t)__builtin_neon_veorv8hi (__a, __b, 1); + return __a ^ __b; } __extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) veorq_s32 (int32x4_t __a, int32x4_t __b) { - return (int32x4_t)__builtin_neon_veorv4si (__a, __b, 1); + return __a ^ __b; } __extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) veorq_s64 (int64x2_t __a, int64x2_t __b) { - return (int64x2_t)__builtin_neon_veorv2di (__a, __b, 1); + return __a ^ __b; } __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) veorq_u8 (uint8x16_t __a, uint8x16_t __b) { - return (uint8x16_t)__builtin_neon_veorv16qi ((int8x16_t) __a, (int8x16_t) __b, 0); + return __a ^ __b; } __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) veorq_u16 (uint16x8_t __a, uint16x8_t __b) { - return (uint16x8_t)__builtin_neon_veorv8hi ((int16x8_t) __a, (int16x8_t) __b, 0); + return __a ^ __b; } __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) veorq_u32 (uint32x4_t __a, uint32x4_t __b) { - return (uint32x4_t)__builtin_neon_veorv4si ((int32x4_t) __a, (int32x4_t) __b, 0); + return __a ^ __b; } __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__)) veorq_u64 (uint64x2_t __a, uint64x2_t __b) { - return (uint64x2_t)__builtin_neon_veorv2di ((int64x2_t) __a, (int64x2_t) __b, 0); + return __a ^ __b; } __extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) vbic_s8 (int8x8_t __a, int8x8_t __b) { - return (int8x8_t)__builtin_neon_vbicv8qi (__a, __b, 1); + return __a & ~__b; } __extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) vbic_s16 (int16x4_t __a, int16x4_t __b) { - return (int16x4_t)__builtin_neon_vbicv4hi (__a, __b, 1); + return __a & ~__b; } __extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) vbic_s32 
 (int32x2_t __a, int32x2_t __b)
 {
-  return (int32x2_t)__builtin_neon_vbicv2si (__a, __b, 1);
+  return __a & ~__b;
 }
 
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vbic_u8 (uint8x8_t __a, uint8x8_t __b)
 {
-  return (uint8x8_t)__builtin_neon_vbicv8qi ((int8x8_t) __a, (int8x8_t) __b, 0);
+  return __a & ~__b;
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vbic_u16 (uint16x4_t __a, uint16x4_t __b)
 {
-  return (uint16x4_t)__builtin_neon_vbicv4hi ((int16x4_t) __a, (int16x4_t) __b, 0);
+  return __a & ~__b;
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vbic_u32 (uint32x2_t __a, uint32x2_t __b)
 {
-  return (uint32x2_t)__builtin_neon_vbicv2si ((int32x2_t) __a, (int32x2_t) __b, 0);
+  return __a & ~__b;
 }
 
 __extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
 vbic_s64 (int64x1_t __a, int64x1_t __b)
 {
-  return (int64x1_t)__builtin_neon_vbicdi (__a, __b, 1);
+  return __a & ~__b;
 }
 
 __extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
 vbic_u64 (uint64x1_t __a, uint64x1_t __b)
 {
-  return (uint64x1_t)__builtin_neon_vbicdi ((int64x1_t) __a, (int64x1_t) __b, 0);
+  return __a & ~__b;
 }
 
 __extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
 vbicq_s8 (int8x16_t __a, int8x16_t __b)
 {
-  return (int8x16_t)__builtin_neon_vbicv16qi (__a, __b, 1);
+  return __a & ~__b;
 }
 
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vbicq_s16 (int16x8_t __a, int16x8_t __b)
 {
-  return (int16x8_t)__builtin_neon_vbicv8hi (__a, __b, 1);
+  return __a & ~__b;
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vbicq_s32 (int32x4_t __a, int32x4_t __b)
 {
-  return (int32x4_t)__builtin_neon_vbicv4si (__a, __b, 1);
+  return __a & ~__b;
 }
 
 __extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
 vbicq_s64 (int64x2_t __a, int64x2_t __b)
 {
-  return (int64x2_t)__builtin_neon_vbicv2di (__a, __b, 1);
+  return __a & ~__b;
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vbicq_u8 (uint8x16_t __a, uint8x16_t __b)
 {
-  return (uint8x16_t)__builtin_neon_vbicv16qi ((int8x16_t) __a, (int8x16_t) __b, 0);
+  return __a & ~__b;
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vbicq_u16 (uint16x8_t __a, uint16x8_t __b)
 {
-  return (uint16x8_t)__builtin_neon_vbicv8hi ((int16x8_t) __a, (int16x8_t) __b, 0);
+  return __a & ~__b;
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vbicq_u32 (uint32x4_t __a, uint32x4_t __b)
 {
-  return (uint32x4_t)__builtin_neon_vbicv4si ((int32x4_t) __a, (int32x4_t) __b, 0);
+  return __a & ~__b;
 }
 
 __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
 vbicq_u64 (uint64x2_t __a, uint64x2_t __b)
 {
-  return (uint64x2_t)__builtin_neon_vbicv2di ((int64x2_t) __a, (int64x2_t) __b, 0);
+  return __a & ~__b;
 }
 
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
 vorn_s8 (int8x8_t __a, int8x8_t __b)
 {
-  return (int8x8_t)__builtin_neon_vornv8qi (__a, __b, 1);
+  return __a | ~__b;
 }
 
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vorn_s16 (int16x4_t __a, int16x4_t __b)
 {
-  return (int16x4_t)__builtin_neon_vornv4hi (__a, __b, 1);
+  return __a | ~__b;
 }
 
 __extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
 vorn_s32 (int32x2_t __a, int32x2_t __b)
 {
-  return (int32x2_t)__builtin_neon_vornv2si (__a, __b, 1);
+  return __a | ~__b;
 }
 
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vorn_u8 (uint8x8_t __a, uint8x8_t __b)
 {
-  return (uint8x8_t)__builtin_neon_vornv8qi ((int8x8_t) __a, (int8x8_t) __b, 0);
+  return __a | ~__b;
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vorn_u16 (uint16x4_t __a, uint16x4_t __b)
 {
-  return (uint16x4_t)__builtin_neon_vornv4hi ((int16x4_t) __a, (int16x4_t) __b, 0);
+  return __a | ~__b;
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vorn_u32 (uint32x2_t __a, uint32x2_t __b)
 {
-  return (uint32x2_t)__builtin_neon_vornv2si ((int32x2_t) __a, (int32x2_t) __b, 0);
+  return __a | ~__b;
 }
 
 __extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
 vorn_s64 (int64x1_t __a, int64x1_t __b)
 {
-  return (int64x1_t)__builtin_neon_vorndi (__a, __b, 1);
+  return __a | ~__b;
 }
 
 __extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
 vorn_u64 (uint64x1_t __a, uint64x1_t __b)
 {
-  return (uint64x1_t)__builtin_neon_vorndi ((int64x1_t) __a, (int64x1_t) __b, 0);
+  return __a | ~__b;
 }
 
 __extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
 vornq_s8 (int8x16_t __a, int8x16_t __b)
 {
-  return (int8x16_t)__builtin_neon_vornv16qi (__a, __b, 1);
+  return __a | ~__b;
 }
 
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vornq_s16 (int16x8_t __a, int16x8_t __b)
 {
-  return (int16x8_t)__builtin_neon_vornv8hi (__a, __b, 1);
+  return __a | ~__b;
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vornq_s32 (int32x4_t __a, int32x4_t __b)
 {
-  return (int32x4_t)__builtin_neon_vornv4si (__a, __b, 1);
+  return __a | ~__b;
 }
 
 __extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
 vornq_s64 (int64x2_t __a, int64x2_t __b)
 {
-  return (int64x2_t)__builtin_neon_vornv2di (__a, __b, 1);
+  return __a | ~__b;
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vornq_u8 (uint8x16_t __a, uint8x16_t __b)
 {
-  return (uint8x16_t)__builtin_neon_vornv16qi ((int8x16_t) __a, (int8x16_t) __b, 0);
+  return __a | ~__b;
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vornq_u16 (uint16x8_t __a, uint16x8_t __b)
 {
-  return (uint16x8_t)__builtin_neon_vornv8hi ((int16x8_t) __a, (int16x8_t) __b, 0);
+  return __a | ~__b;
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vornq_u32 (uint32x4_t __a, uint32x4_t __b)
 {
-  return (uint32x4_t)__builtin_neon_vornv4si ((int32x4_t) __a, (int32x4_t) __b, 0);
+  return __a | ~__b;
 }
 
 __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
 vornq_u64 (uint64x2_t __a, uint64x2_t __b)
 {
-  return (uint64x2_t)__builtin_neon_vornv2di ((int64x2_t) __a, (int64x2_t) __b, 0);
+  return __a | ~__b;
 }
-
 __extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
 vreinterpret_p8_p16 (poly16x4_t __a)
 {