From patchwork Wed Jun 25 09:21:31 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alan Lawrence
X-Patchwork-Id: 363918
Return-Path:
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Received: from sourceware.org (server1.sourceware.org [209.132.180.131])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by ozlabs.org (Postfix) with ESMTPS id 034C31400A7
 for ; Wed, 25 Jun 2014 19:21:44 +1000 (EST)
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id
 :list-unsubscribe:list-archive:list-post:list-help:sender
 :message-id:date:from:mime-version:to:subject:references
 :in-reply-to:content-type:content-transfer-encoding; q=dns; s=
 default; b=HNjNIOs2EK4OiYKLgFO16Yljn90fsA0N8qrqpvzC8TgkoKwU8EcuY
 82UQbMzV6HwS+y3UIDTwtZSqY4fN8GluI82OeTD3ptdhoXas86b1iAT5q+sxWzk3
 Ifr6N27bE79yecHsQrLDXjXRTujuDIvlRQuSFoA47xPS13gFFuQVRg=
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id
 :list-unsubscribe:list-archive:list-post:list-help:sender
 :message-id:date:from:mime-version:to:subject:references
 :in-reply-to:content-type:content-transfer-encoding; s=default;
 bh=LHLVNzYvUSFt7ohsE9ZvcLpoe4A=; b=uFG0/NBP98SAkXe1jDvKDeZ1QnhD
 NeO/b3M71SVuxzU2jv/M14ypTrC0bp3fqbKBXHtJkIPuXoAhglizcCOXJUg+5Tp7
 OK9A0s9MXZTAdDui/mkdyrt39akFcXI+72s3clRrUdFxtRx+kghPpb9oog9IaXFk
 R/UTuwb7MwX/49A=
Received: (qmail 19411 invoked by alias); 25 Jun 2014 09:21:38 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Unsubscribe:
List-Archive:
List-Post:
List-Help:
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org
Received: (qmail 19371 invoked by uid 89); 25 Jun 2014 09:21:37 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-2.3 required=5.0 tests=AWL, BAYES_00, RCVD_IN_DNSWL_LOW,
 SPF_PASS autolearn=ham version=3.3.2
X-HELO: service87.mimecast.com
Received: from service87.mimecast.com (HELO service87.mimecast.com) (91.220.42.44)
 by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP;
 Wed, 25 Jun 2014 09:21:35 +0000
Received: from cam-owa2.Emea.Arm.com (fw-tnat.cambridge.arm.com [217.140.96.21])
 by service87.mimecast.com; Wed, 25 Jun 2014 10:21:32 +0100
Received: from [10.1.209.51] ([10.1.255.212]) by cam-owa2.Emea.Arm.com
 with Microsoft SMTPSVC(6.0.3790.3959); Wed, 25 Jun 2014 10:21:23 +0100
Message-ID: <53AA949B.4030709@arm.com>
Date: Wed, 25 Jun 2014 10:21:31 +0100
From: Alan Lawrence
User-Agent: Thunderbird 2.0.0.24 (X11/20101213)
MIME-Version: 1.0
To: "gcc-patches@gcc.gnu.org"
Subject: Re: [PATCH AARCH64] fix and enable non-const shuffle for bigendian
 using TBL instruction
References: <5357DC70.5080907@arm.com>
In-Reply-To: <5357DC70.5080907@arm.com>
X-MC-Unique: 114062510213210501

This one seems to have slipped under the radar. I've just rebased and run the
regression tests on aarch64_be-none-elf, with no issues; ping?

(patch applied straightforwardly, but rebased version below)

--Alan

Alan Lawrence wrote:
> At present vec_perm with non-const indices is not handled on bigendian, so gcc
> generates generic, slow, code. This patch fixes up TBL to reverse the indices
> within each input vector (following Richard Henderson's suggestion of using an
> XOR with (nelts - 1) rather than a complicated mask/add/subtract,
> http://gcc.gnu.org/ml/gcc-patches/2014-03/msg01285.html), and enables the code
> for bigendian.
>
> Regressed on aarch64_be-none-elf with no changes. (This is as expected: in all
> affected cases, gcc was already producing correct non-arch-specific code using
> scalar op. However, I have manually verified for various tests in
> c-c++-common/torture/vshuf-v* that (a) TBL instructions are now produced, (b) a
> version of the compiler that produces TBLs without the index correction, fails
> tests).
>
> Note tests c-c++-common/torture/vshuf-{v16hi,v4df,v4di,v8si} (i.e. the 32-byte
> vectors) were broken prior to this patch and are not affected.
>
> gcc/ChangeLog:
> 2014-04-23  Alan Lawrence
>
>     * config/aarch64/aarch64-simd.md (vec_perm): Enable for bigendian.
>     * config/aarch64/aarch64.c (aarch64_expand_vec_perm): Remove assert
>     against bigendian and adjust indices.
>

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-sim
index 42bfd3e..08eb6b3 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -4224,7 +4224,7 @@
    (match_operand:VB 1 "register_operand")
    (match_operand:VB 2 "register_operand")
    (match_operand:VB 3 "register_operand")]
-  "TARGET_SIMD && !BYTES_BIG_ENDIAN"
+  "TARGET_SIMD"
 {
   aarch64_expand_vec_perm (operands[0], operands[1],
                            operands[2], operands[3]);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index b2d005b..0ea277a 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -8730,18 +8730,24 @@ aarch64_expand_vec_perm (rtx target, rtx op0, rtx op1, r
   enum machine_mode vmode = GET_MODE (target);
   unsigned int i, nelt = GET_MODE_NUNITS (vmode);
   bool one_vector_p = rtx_equal_p (op0, op1);
-  rtx rmask[MAX_VECT_LEN], mask;
-
-  gcc_checking_assert (!BYTES_BIG_ENDIAN);
+  rtx mask;
 
   /* The TBL instruction does not use a modulo index, so we must take care
      of that ourselves.  */
-  mask = GEN_INT (one_vector_p ? nelt - 1 : 2 * nelt - 1);
-  for (i = 0; i < nelt; ++i)
-    rmask[i] = mask;
-  mask = gen_rtx_CONST_VECTOR (vmode, gen_rtvec_v (nelt, rmask));
+  mask = aarch64_simd_gen_const_vector_dup (vmode,
+      one_vector_p ? nelt - 1 : 2 * nelt - 1);
   sel = expand_simple_binop (vmode, AND, sel, mask,
                              NULL, 0, OPTAB_LIB_WIDEN);
 
+  /* For big-endian, we also need to reverse the index within the vector
+     (but not which vector).  */
+  if (BYTES_BIG_ENDIAN)
+    {
+      /* If one_vector_p, mask is a vector of (nelt - 1)'s already.  */
+      if (!one_vector_p)
+        mask = aarch64_simd_gen_const_vector_dup (vmode, nelt - 1);
+      sel = expand_simple_binop (vmode, XOR, sel, mask,
+                                 NULL, 0, OPTAB_LIB_WIDEN);
+    }
   aarch64_expand_vec_perm_1 (target, op0, op1, sel);
 }