From patchwork Wed Apr 23 19:44:21 2014
X-Patchwork-Id: 341968
Message-ID: <53581815.8020407@arm.com>
Date: Wed, 23 Apr 2014 20:44:21 +0100
From: Alan Lawrence
To: gcc-patches@gcc.gnu.org
Subject: [AArch64/ARM 2/3] Recognize shuffle patterns for REV instructions on AArch64, rewrite intrinsics.
References: <53580E36.2060105@arm.com>
In-Reply-To: <53580E36.2060105@arm.com>

This patch (borrowing heavily from the ARM backend) makes
aarch64_expand_vec_perm_const output REV instructions when appropriate, and
then implements the vrev_XXX intrinsics in terms of __builtin_shuffle (which
now produces the same assembly instructions).

No regressions on aarch64-none-elf (the tests added in the previous patch,
http://gcc.gnu.org/ml/gcc-patches/2014-04/msg01468.html, still pass). There are
also no regressions on aarch64_be-none-elf given the testsuite config changes
in http://gcc.gnu.org/ml/gcc-patches/2014-04/msg00579.html; without that patch
there is some "noise" from unexpected vectorization successes.
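As a quick illustration of the intended effect (this snippet is mine, not part
of the patch, and the function names are invented): with the change applied,
the intrinsic and the equivalent raw __builtin_shuffle below should both
compile to a single REV64 instruction (e.g. "rev64 v0.4s, v0.4s") instead of
going through the old inline-asm implementation or a TBL-based permute.

#include <arm_neon.h>

/* Reverse the two 32-bit elements within each 64-bit half.  */
int32x4_t
swap_pairs_intrinsic (int32x4_t x)
{
  /* Now implemented in arm_neon.h via __builtin_shuffle.  */
  return vrev64q_s32 (x);
}

int32x4_t
swap_pairs_shuffle (int32x4_t x)
{
  /* Selector { 1, 0, 3, 2 } is the pattern aarch64_evpc_rev matches
     to REV64 for V4SI.  */
  return __builtin_shuffle (x, (uint32x4_t) { 1, 0, 3, 2 });
}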
gcc/ChangeLog:

2014-04-23  Alan Lawrence

	* config/aarch64/iterators.md: Add a REVERSE iterator and rev_op
	attribute for REV64/32/16 insns.
	* config/aarch64/aarch64-simd.md: Add corresponding define_insn
	parameterized by the REVERSE iterator.
	* config/aarch64/aarch64.c (aarch64_evpc_rev): Recognize REVnn
	patterns.
	(aarch64_expand_vec_perm_const_1): Call aarch64_evpc_rev also.
	* config/aarch64/arm_neon.h (vrev{16,32,64}[q]_{s,p,u,f}{8,16,32}):
	Rewrite to use __builtin_shuffle.

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 4dffb59..d499e86 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -4032,6 +4032,15 @@
   [(set_attr "type" "neon_permute")]
 )
 
+(define_insn "aarch64_rev<rev_op><mode>"
+  [(set (match_operand:VALL 0 "register_operand" "=w")
+        (unspec:VALL [(match_operand:VALL 1 "register_operand" "w")]
+                     REVERSE))]
+  "TARGET_SIMD"
+  "rev<rev_op>\\t%0.<Vtype>, %1.<Vtype>"
+  [(set_attr "type" "neon_rev")]
+)
+
 (define_insn "aarch64_st2<mode>_dreg"
   [(set (match_operand:TI 0 "aarch64_simd_struct_operand" "=Utv")
         (unspec:TI [(match_operand:OI 1 "register_operand" "w")
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 16c51a8..5bb10a2 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -8047,6 +8047,80 @@ aarch64_evpc_zip (struct expand_vec_perm_d *d)
   return true;
 }
 
+/* Recognize patterns for the REV insns.  */
+
+static bool
+aarch64_evpc_rev (struct expand_vec_perm_d *d)
+{
+  unsigned int i, j, diff, nelt = d->nelt;
+  rtx (*gen) (rtx, rtx);
+
+  if (!d->one_vector_p)
+    return false;
+
+  diff = d->perm[0];
+  switch (diff)
+    {
+    case 7:
+      switch (d->vmode)
+        {
+        case V16QImode: gen = gen_aarch64_rev64v16qi; break;
+        case V8QImode: gen = gen_aarch64_rev64v8qi; break;
+        default:
+          return false;
+        }
+      break;
+    case 3:
+      switch (d->vmode)
+        {
+        case V16QImode: gen = gen_aarch64_rev32v16qi; break;
+        case V8QImode: gen = gen_aarch64_rev32v8qi; break;
+        case V8HImode: gen = gen_aarch64_rev64v8hi; break;
+        case V4HImode: gen = gen_aarch64_rev64v4hi; break;
+        default:
+          return false;
+        }
+      break;
+    case 1:
+      switch (d->vmode)
+        {
+        case V16QImode: gen = gen_aarch64_rev16v16qi; break;
+        case V8QImode: gen = gen_aarch64_rev16v8qi; break;
+        case V8HImode: gen = gen_aarch64_rev32v8hi; break;
+        case V4HImode: gen = gen_aarch64_rev32v4hi; break;
+        case V4SImode: gen = gen_aarch64_rev64v4si; break;
+        case V2SImode: gen = gen_aarch64_rev64v2si; break;
+        case V4SFmode: gen = gen_aarch64_rev64v4sf; break;
+        case V2SFmode: gen = gen_aarch64_rev64v2sf; break;
+        default:
+          return false;
+        }
+      break;
+    default:
+      return false;
+    }
+
+  for (i = 0; i < nelt; i += diff + 1)
+    for (j = 0; j <= diff; j += 1)
+      {
+        /* This is guaranteed to be true as the value of diff
+           is 7, 3, 1 and we should have enough elements in the
+           queue to generate this.  Getting a vector mask with a
+           value of diff other than these values implies that
+           something is wrong by the time we get here.  */
+        gcc_assert (i + j < nelt);
+        if (d->perm[i + j] != i + diff - j)
+          return false;
+      }
+
+  /* Success! */
+  if (d->testing_p)
+    return true;
+
+  emit_insn (gen (d->target, d->op0));
+  return true;
+}
+
 static bool
 aarch64_evpc_dup (struct expand_vec_perm_d *d)
 {
@@ -8153,6 +8227,8 @@ aarch64_expand_vec_perm_const_1 (struct expand_vec_perm_d *d)
 	return true;
       else if (aarch64_evpc_trn (d))
 	return true;
+      else if (aarch64_evpc_rev (d))
+	return true;
       else if (aarch64_evpc_dup (d))
 	return true;
       return aarch64_evpc_tbl (d);
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 6af99361..383ed56 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -10628,402 +10628,6 @@ vrecpeq_u32 (uint32x4_t a)
   return result;
 }
 
-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
-vrev16_p8 (poly8x8_t a)
-{
-  poly8x8_t result;
-  __asm__ ("rev16 %0.8b,%1.8b"
-           : "=w"(result)
-           : "w"(a)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
-vrev16_s8 (int8x8_t a)
-{
-  int8x8_t result;
-  __asm__ ("rev16 %0.8b,%1.8b"
-           : "=w"(result)
-           : "w"(a)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
-vrev16_u8 (uint8x8_t a)
-{
-  uint8x8_t result;
-  __asm__ ("rev16 %0.8b,%1.8b"
-           : "=w"(result)
-           : "w"(a)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
-vrev16q_p8 (poly8x16_t a)
-{
-  poly8x16_t result;
-  __asm__ ("rev16 %0.16b,%1.16b"
-           : "=w"(result)
-           : "w"(a)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
-vrev16q_s8 (int8x16_t a)
-{
-  int8x16_t result;
-  __asm__ ("rev16 %0.16b,%1.16b"
-           : "=w"(result)
-           : "w"(a)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
-vrev16q_u8 (uint8x16_t a)
-{
-  uint8x16_t result;
-  __asm__ ("rev16 %0.16b,%1.16b"
-           : "=w"(result)
-           : "w"(a)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
-vrev32_p8 (poly8x8_t a)
-{
-  poly8x8_t result;
-  __asm__ ("rev32 %0.8b,%1.8b"
-           : "=w"(result)
-           : "w"(a)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
-vrev32_p16 (poly16x4_t a)
-{
-  poly16x4_t result;
-  __asm__ ("rev32 %0.4h,%1.4h"
-           : "=w"(result)
-           : "w"(a)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
-vrev32_s8 (int8x8_t a)
-{
-  int8x8_t result;
-  __asm__ ("rev32 %0.8b,%1.8b"
-           : "=w"(result)
-           : "w"(a)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
-vrev32_s16 (int16x4_t a)
-{
-  int16x4_t result;
-  __asm__ ("rev32 %0.4h,%1.4h"
-           : "=w"(result)
-           : "w"(a)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
-vrev32_u8 (uint8x8_t a)
-{
-  uint8x8_t result;
-  __asm__ ("rev32 %0.8b,%1.8b"
-           : "=w"(result)
-           : "w"(a)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
-vrev32_u16 (uint16x4_t a)
-{
-  uint16x4_t result;
-  __asm__ ("rev32 %0.4h,%1.4h"
-           : "=w"(result)
-           : "w"(a)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
-vrev32q_p8 (poly8x16_t a)
-{
-  poly8x16_t result;
-  __asm__ ("rev32 %0.16b,%1.16b"
-           : "=w"(result)
-           : "w"(a)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
-vrev32q_p16 (poly16x8_t a)
-{
-  poly16x8_t result;
-  __asm__ ("rev32 %0.8h,%1.8h"
-           : "=w"(result)
-           : "w"(a)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
-vrev32q_s8 (int8x16_t a)
-{
-  int8x16_t result;
-  __asm__ ("rev32 %0.16b,%1.16b"
-           : "=w"(result)
-           : "w"(a)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
-vrev32q_s16 (int16x8_t a)
-{
-  int16x8_t result;
-  __asm__ ("rev32 %0.8h,%1.8h"
-           : "=w"(result)
-           : "w"(a)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
-vrev32q_u8 (uint8x16_t a)
-{
-  uint8x16_t result;
-  __asm__ ("rev32 %0.16b,%1.16b"
-           : "=w"(result)
-           : "w"(a)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
-vrev32q_u16 (uint16x8_t a)
-{
-  uint16x8_t result;
-  __asm__ ("rev32 %0.8h,%1.8h"
-           : "=w"(result)
-           : "w"(a)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
-vrev64_f32 (float32x2_t a)
-{
-  float32x2_t result;
-  __asm__ ("rev64 %0.2s,%1.2s"
-           : "=w"(result)
-           : "w"(a)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
-vrev64_p8 (poly8x8_t a)
-{
-  poly8x8_t result;
-  __asm__ ("rev64 %0.8b,%1.8b"
-           : "=w"(result)
-           : "w"(a)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
-vrev64_p16 (poly16x4_t a)
-{
-  poly16x4_t result;
-  __asm__ ("rev64 %0.4h,%1.4h"
-           : "=w"(result)
-           : "w"(a)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
-vrev64_s8 (int8x8_t a)
-{
-  int8x8_t result;
-  __asm__ ("rev64 %0.8b,%1.8b"
-           : "=w"(result)
-           : "w"(a)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
-vrev64_s16 (int16x4_t a)
-{
-  int16x4_t result;
-  __asm__ ("rev64 %0.4h,%1.4h"
-           : "=w"(result)
-           : "w"(a)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
-vrev64_s32 (int32x2_t a)
-{
-  int32x2_t result;
-  __asm__ ("rev64 %0.2s,%1.2s"
-           : "=w"(result)
-           : "w"(a)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
-vrev64_u8 (uint8x8_t a)
-{
-  uint8x8_t result;
-  __asm__ ("rev64 %0.8b,%1.8b"
-           : "=w"(result)
-           : "w"(a)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
-vrev64_u16 (uint16x4_t a)
-{
-  uint16x4_t result;
-  __asm__ ("rev64 %0.4h,%1.4h"
-           : "=w"(result)
-           : "w"(a)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
-vrev64_u32 (uint32x2_t a)
-{
-  uint32x2_t result;
-  __asm__ ("rev64 %0.2s,%1.2s"
-           : "=w"(result)
-           : "w"(a)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
-vrev64q_f32 (float32x4_t a)
-{
-  float32x4_t result;
-  __asm__ ("rev64 %0.4s,%1.4s"
-           : "=w"(result)
-           : "w"(a)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
-vrev64q_p8 (poly8x16_t a)
-{
-  poly8x16_t result;
-  __asm__ ("rev64 %0.16b,%1.16b"
-           : "=w"(result)
-           : "w"(a)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
-vrev64q_p16 (poly16x8_t a)
-{
-  poly16x8_t result;
-  __asm__ ("rev64 %0.8h,%1.8h"
-           : "=w"(result)
-           : "w"(a)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
-vrev64q_s8 (int8x16_t a)
-{
-  int8x16_t result;
-  __asm__ ("rev64 %0.16b,%1.16b"
-           : "=w"(result)
-           : "w"(a)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
-vrev64q_s16 (int16x8_t a)
-{
-  int16x8_t result;
-  __asm__ ("rev64 %0.8h,%1.8h"
-           : "=w"(result)
-           : "w"(a)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
-vrev64q_s32 (int32x4_t a)
-{
-  int32x4_t result;
-  __asm__ ("rev64 %0.4s,%1.4s"
-           : "=w"(result)
-           : "w"(a)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
-vrev64q_u8 (uint8x16_t a)
-{
-  uint8x16_t result;
-  __asm__ ("rev64 %0.16b,%1.16b"
-           : "=w"(result)
-           : "w"(a)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
-vrev64q_u16 (uint16x8_t a)
-{
-  uint16x8_t result;
-  __asm__ ("rev64 %0.8h,%1.8h"
-           : "=w"(result)
-           : "w"(a)
-           : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
-vrev64q_u32 (uint32x4_t a)
-{
-  uint32x4_t result;
-  __asm__ ("rev64 %0.4s,%1.4s"
-           : "=w"(result)
-           : "w"(a)
-           : /* No clobbers */);
-  return result;
-}
-
 #define vrshrn_high_n_s16(a, b, c) \
 __extension__ \
 ({ \
@@ -22473,6 +22077,234 @@ vrecpxd_f64 (float64_t __a)
   return __builtin_aarch64_frecpxdf (__a);
 }
 
+
+/* vrev */
+
+__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+vrev16_p8 (poly8x8_t a)
+{
+  return __builtin_shuffle (a, (uint8x8_t) { 1, 0, 3, 2, 5, 4, 7, 6 });
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+vrev16_s8 (int8x8_t a)
+{
+  return __builtin_shuffle (a, (uint8x8_t) { 1, 0, 3, 2, 5, 4, 7, 6 });
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+vrev16_u8 (uint8x8_t a)
+{
+  return __builtin_shuffle (a, (uint8x8_t) { 1, 0, 3, 2, 5, 4, 7, 6 });
+}
+
+__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+vrev16q_p8 (poly8x16_t a)
+{
+  return __builtin_shuffle (a,
+      (uint8x16_t) { 1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14 });
+}
+
+__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+vrev16q_s8 (int8x16_t a)
+{
+  return __builtin_shuffle (a,
+      (uint8x16_t) { 1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14 });
+}
+
+__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+vrev16q_u8 (uint8x16_t a)
+{
+  return __builtin_shuffle (a,
+      (uint8x16_t) { 1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14 });
+}
+
+__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+vrev32_p8 (poly8x8_t a)
+{
+  return __builtin_shuffle (a, (uint8x8_t) { 3, 2, 1, 0, 7, 6, 5, 4 });
+}
+
+__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
+vrev32_p16 (poly16x4_t a)
+{
+  return __builtin_shuffle (a, (uint16x4_t) { 1, 0, 3, 2 });
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+vrev32_s8 (int8x8_t a)
+{
+  return __builtin_shuffle (a, (uint8x8_t) { 3, 2, 1, 0, 7, 6, 5, 4 });
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vrev32_s16 (int16x4_t a)
+{
+  return __builtin_shuffle (a, (uint16x4_t) { 1, 0, 3, 2 });
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+vrev32_u8 (uint8x8_t a)
+{
+  return __builtin_shuffle (a, (uint8x8_t) { 3, 2, 1, 0, 7, 6, 5, 4 });
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vrev32_u16 (uint16x4_t a)
+{
+  return __builtin_shuffle (a, (uint16x4_t) { 1, 0, 3, 2 });
+}
+
+__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+vrev32q_p8 (poly8x16_t a)
+{
+  return __builtin_shuffle (a,
+      (uint8x16_t) { 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12 });
+}
+
+__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
+vrev32q_p16 (poly16x8_t a)
+{
+  return __builtin_shuffle (a, (uint16x8_t) { 1, 0, 3, 2, 5, 4, 7, 6 });
+}
+
+__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+vrev32q_s8 (int8x16_t a)
+{
+  return __builtin_shuffle (a,
+      (uint8x16_t) { 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12 });
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vrev32q_s16 (int16x8_t a)
+{
+  return __builtin_shuffle (a, (uint16x8_t) { 1, 0, 3, 2, 5, 4, 7, 6 });
+}
+
+__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+vrev32q_u8 (uint8x16_t a)
+{
+  return __builtin_shuffle (a,
+      (uint8x16_t) { 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12 });
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vrev32q_u16 (uint16x8_t a)
+{
+  return __builtin_shuffle (a, (uint16x8_t) { 1, 0, 3, 2, 5, 4, 7, 6 });
+}
+
+__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+vrev64_f32 (float32x2_t a)
+{
+  return __builtin_shuffle (a, (uint32x2_t) { 1, 0 });
+}
+
+__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+vrev64_p8 (poly8x8_t a)
+{
+  return __builtin_shuffle (a, (uint8x8_t) { 7, 6, 5, 4, 3, 2, 1, 0 });
+}
+
+__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
+vrev64_p16 (poly16x4_t a)
+{
+  return __builtin_shuffle (a, (uint16x4_t) { 3, 2, 1, 0 });
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+vrev64_s8 (int8x8_t a)
+{
+  return __builtin_shuffle (a, (uint8x8_t) { 7, 6, 5, 4, 3, 2, 1, 0 });
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vrev64_s16 (int16x4_t a)
+{
+  return __builtin_shuffle (a, (uint16x4_t) { 3, 2, 1, 0 });
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vrev64_s32 (int32x2_t a)
+{
+  return __builtin_shuffle (a, (uint32x2_t) { 1, 0 });
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+vrev64_u8 (uint8x8_t a)
+{
+  return __builtin_shuffle (a, (uint8x8_t) { 7, 6, 5, 4, 3, 2, 1, 0 });
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vrev64_u16 (uint16x4_t a)
+{
+  return __builtin_shuffle (a, (uint16x4_t) { 3, 2, 1, 0 });
+}
+
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+vrev64_u32 (uint32x2_t a)
+{
+  return __builtin_shuffle (a, (uint32x2_t) { 1, 0 });
+}
+
+__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+vrev64q_f32 (float32x4_t a)
+{
+  return __builtin_shuffle (a, (uint32x4_t) { 1, 0, 3, 2 });
+}
+
+__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+vrev64q_p8 (poly8x16_t a)
+{
+  return __builtin_shuffle (a,
+      (uint8x16_t) { 7, 6, 5, 4, 3, 2, 1, 0, 15, 14, 13, 12, 11, 10, 9, 8 });
+}
+
+__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
+vrev64q_p16 (poly16x8_t a)
+{
+  return __builtin_shuffle (a, (uint16x8_t) { 3, 2, 1, 0, 7, 6, 5, 4 });
+}
+
+__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+vrev64q_s8 (int8x16_t a)
+{
+  return __builtin_shuffle (a,
+      (uint8x16_t) { 7, 6, 5, 4, 3, 2, 1, 0, 15, 14, 13, 12, 11, 10, 9, 8 });
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vrev64q_s16 (int16x8_t a)
+{
+  return __builtin_shuffle (a, (uint16x8_t) { 3, 2, 1, 0, 7, 6, 5, 4 });
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vrev64q_s32 (int32x4_t a)
+{
+  return __builtin_shuffle (a, (uint32x4_t) { 1, 0, 3, 2 });
+}
+
+__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+vrev64q_u8 (uint8x16_t a)
+{
+  return __builtin_shuffle (a,
+      (uint8x16_t) { 7, 6, 5, 4, 3, 2, 1, 0, 15, 14, 13, 12, 11, 10, 9, 8 });
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vrev64q_u16 (uint16x8_t a)
+{
+  return __builtin_shuffle (a, (uint16x8_t) { 3, 2, 1, 0, 7, 6, 5, 4 });
+}
+
+__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+vrev64q_u32 (uint32x4_t a)
+{
+  return __builtin_shuffle (a, (uint32x4_t) { 1, 0, 3, 2 });
+}
+
 /* vrnd */
 
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index f1339b8..c1f5544 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -267,6 +267,9 @@
     UNSPEC_UZP2		; Used in vector permute patterns.
     UNSPEC_TRN1		; Used in vector permute patterns.
     UNSPEC_TRN2		; Used in vector permute patterns.
+    UNSPEC_REV64	; Used in vector reverse patterns (permute).
+    UNSPEC_REV32	; Used in vector reverse patterns (permute).
+    UNSPEC_REV16	; Used in vector reverse patterns (permute).
     UNSPEC_AESE		; Used in aarch64-simd.md.
     UNSPEC_AESD		; Used in aarch64-simd.md.
     UNSPEC_AESMC	; Used in aarch64-simd.md.
@@ -855,6 +858,8 @@
					UNSPEC_TRN1 UNSPEC_TRN2
					UNSPEC_UZP1 UNSPEC_UZP2])
 
+(define_int_iterator REVERSE [UNSPEC_REV64 UNSPEC_REV32 UNSPEC_REV16])
+
 (define_int_iterator FRINT [UNSPEC_FRINTZ UNSPEC_FRINTP UNSPEC_FRINTM
			     UNSPEC_FRINTN UNSPEC_FRINTI UNSPEC_FRINTX
			     UNSPEC_FRINTA])
@@ -982,6 +987,10 @@
			 (UNSPEC_TRN1 "trn") (UNSPEC_TRN2 "trn")
			 (UNSPEC_UZP1 "uzp") (UNSPEC_UZP2 "uzp")])
 
+; op code for REV instructions (size within which elements are reversed).
+(define_int_attr rev_op [(UNSPEC_REV64 "64") (UNSPEC_REV32 "32")
+			 (UNSPEC_REV16 "16")])
+
 (define_int_attr perm_hilo [(UNSPEC_ZIP1 "1") (UNSPEC_ZIP2 "2")
			    (UNSPEC_TRN1 "1") (UNSPEC_TRN2 "2")
			    (UNSPEC_UZP1 "1") (UNSPEC_UZP2 "2")])
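For anyone cross-checking the new selector constants in arm_neon.h against the
recognizer, the following standalone sketch (again my own illustration, not
part of the patch) regenerates the index pattern that aarch64_evpc_rev
accepts: diff is perm[0] (1, 3 or 7), and indices are reversed within each
group of diff + 1 elements.

#include <stdio.h>

/* Print the permutation selector accepted by aarch64_evpc_rev for a
   vector of NELT elements and a given DIFF; mirrors the nested loop
   check perm[i + j] == i + diff - j.  */
static void
print_rev_selector (unsigned nelt, unsigned diff)
{
  printf ("nelt=%2u diff=%u:", nelt, diff);
  for (unsigned i = 0; i < nelt; i += diff + 1)
    for (unsigned j = 0; j <= diff; j++)
      printf (" %u", i + diff - j);
  printf ("\n");
}

int
main (void)
{
  print_rev_selector (16, 1);  /* REV16 on 16 bytes.  */
  print_rev_selector (16, 3);  /* REV32 on 16 bytes.  */
  print_rev_selector (16, 7);  /* REV64 on 16 bytes.  */
  print_rev_selector (4, 1);   /* REV64 on 4 words: { 1, 0, 3, 2 }.  */
  return 0;
}

The output for these cases matches the masks used by the vrev16q, vrev32q and
vrev64q implementations above.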