From patchwork Wed May 30 02:03:45 2012
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Ramana Radhakrishnan <ramana.radhakrishnan@linaro.org>
X-Patchwork-Id: 161831
Return-Path: 
 <gcc-patches-return-319819-incoming=patchwork.ozlabs.org@gcc.gnu.org>
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Received: from sourceware.org (server1.sourceware.org [209.132.180.131])
	by ozlabs.org (Postfix) with SMTP id AED1AB6FD1
	for <incoming@patchwork.ozlabs.org>;
	Wed, 30 May 2012 12:04:19 +1000 (EST)
Comment: DKIM? See http://www.dkim.org
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed;
	d=gcc.gnu.org; s=default; x=1338948261; h=Comment:
	DomainKey-Signature:Received:Received:Received:Received:
	MIME-Version:Received:Received:In-Reply-To:References:Date:
	Message-ID:Subject:From:To:Cc:Content-Type:Mailing-List:
	Precedence:List-Id:List-Unsubscribe:List-Archive:List-Post:
	List-Help:Sender:Delivered-To; bh=naNZBx5qJeDtsDuoi3ZVG+5ZFcA=;
	b=ECvao19GK1fmVuxCTgCrWd4IFvb/HarZX3g/wMR78TLFGHBcGS993ItD+hCuJ6
	wAq+zRjxOgE7fxsi8thGwvWv59Qa3SYjd/nrEKSML2qvWW2Z21IMR0O/wQPpGBSv
	QX3/I8rlGmEX015HbaFD5FrDKDSrzNxSQOxTAeSF6qh8U=
Comment: DomainKeys? See http://antispam.yahoo.com/domainkeys
DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=default; d=gcc.gnu.org;
	h=Received:Received:X-SWARE-Spam-Status:X-Spam-Check-By:Received:Received:X-Google-DKIM-Signature:MIME-Version:Received:Received:In-Reply-To:References:Date:Message-ID:Subject:From:To:Cc:Content-Type:X-Gm-Message-State:X-IsSubscribed:Mailing-List:Precedence:List-Id:List-Unsubscribe:List-Archive:List-Post:List-Help:Sender:Delivered-To;
	b=kpQzesoqyKv5t/gQFajA3AynYloEouIKlh1hsPYJPInUtyweaqX3rQxlMUzolk
	OwSEaEJLYfTgK9lkr5cfO3yqY0zqaPMnQs+0T+7spKDJndpz+Ph1Sr34OapjVJfI
	VJ1ed+R7/tyfmzwF1j15ZcFuGscUPou7E6XP18x3bu9PI=;
Received: (qmail 25649 invoked by alias); 30 May 2012 02:04:10 -0000
Received: (qmail 25619 invoked by uid 22791); 30 May 2012 02:04:08 -0000
X-SWARE-Spam-Status: No, hits=-2.8 required=5.0	tests=AWL, BAYES_00,
	KHOP_RCVD_UNTRUST, KHOP_THREADED, RCVD_IN_DNSWL_LOW,
	RCVD_IN_HOSTKARMA_YE
X-Spam-Check-By: sourceware.org
Received: from mail-qc0-f175.google.com (HELO mail-qc0-f175.google.com)
	(209.85.216.175) by sourceware.org (qpsmtpd/0.43rc1) with
	ESMTP; Wed, 30 May 2012 02:03:46 +0000
Received: by qcso7 with SMTP id o7so2540260qcs.20 for
	<gcc-patches@gcc.gnu.org>; Tue, 29 May 2012 19:03:45 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
	s=20120113;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type:x-gm-message-state;
	bh=O5c3Yof6V532v7N/xib7z9YheeN1BFk5LhGtYcn6SAc=;
	b=VRocF9ppnkXC+1yGtPpi5jCVZukq/b1vSXcdTVSk+8DXa1VWvKQLP4fDEsb1wbEWqr
	A70cZbwAynHsGHy7siDuRhY3pA8vamg6I5P1596/YAuQ2OXLxkGWZszMMwyqwGPYZX1w
	G50gm0fOGsjGOVJo+Uw3W2aXgw3Y6RaxYm6HdIkU9jtwuufjSudZNfBwKH4bamUb81YE
	D1KbMwkdXu1ZBegsZdPIQGzlYISkyKXGJM6WgQwN2gVK4wL/Z9exjguKNNs8V6okhWRa
	Cd9NVQAltj0WxxD3ZN7+IQG2xURKKPHtdB5hPocrYtQ9q2Tmj3oZIIIWrxlZqO5ow8eT
	2sXw==
MIME-Version: 1.0
Received: by 10.224.39.68 with SMTP id f4mr284137qae.3.1338343425288;
	Tue, 29 May 2012 19:03:45 -0700 (PDT)
Received: by 10.224.137.73 with HTTP; Tue, 29 May 2012 19:03:45 -0700 (PDT)
In-Reply-To: <4FC5079D.2010904@redhat.com>
References: 
 <CACUk7=WZMsuDyRAxhipc=bUcfACRCM1SPhV=mSuPWcXP4XON2Q@mail.gmail.com>
	<4FC5079D.2010904@redhat.com>
Date: Wed, 30 May 2012 03:03:45 +0100
Message-ID: 
 <CACUk7=V9zDvb0em-67zrK04ZrCYWNp50YmQabzgS0y+gCB_HBg@mail.gmail.com>
Subject: Re: [Patch ARM] Fix off by one error in neon_evpc_vrev.
From: Ramana Radhakrishnan <ramana.radhakrishnan@linaro.org>
To: Richard Henderson <rth@redhat.com>
Cc: gcc-patches@gcc.gnu.org, Patch Tracking <patches@linaro.org>,
	Richard Earnshaw <rearnsha@arm.com>,
	Richard Guenther <richard.guenther@gmail.com>
X-Gm-Message-State: 
 ALoCoQnFyGWPzFcyWd4RX0iptvl9lUGXxylS9ppJIdkbTNuXY4zC1X+A7xqt4hX8ewUOIB3np7XV
X-IsSubscribed: yes
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Unsubscribe: 
 <mailto:gcc-patches-unsubscribe-incoming=patchwork.ozlabs.org@gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org

On 29 May 2012 18:30, Richard Henderson <rth@redhat.com> wrote:
> On 05/26/2012 01:27 AM, Ramana Radhakrishnan wrote:
>>
>> -  for (i = 0; i<  nelt; i += diff)
>> +  for (i = 0; i<  nelt ; i += (diff + 1))
>>      for (j = 0; j<= diff; j += 1)
>> -      if (d->perm[i + j] != i + diff - j)
>> -       return false;
>> +      {
>> +       /* This is guaranteed to be true as the value of diff
>> +          is 7, 3, 1 and we should have enough elements in the
>> +          queue to generate this. Getting a vector mask with a
>> +          value of diff other than these values implies that
>> +          something is wrong by the time we get here.  */
>> +       gcc_assert ((i + j)<  nelt);
>
>
> Yep, that all looks correct.  Unnecessary () in both lines though.

Bah - Thanks - don't know why I put those in :(  .Committed to trunk
with those changes and I would like to backport this to the 4.7 branch
after a couple of weeks to allow the auto-testers to pick this up as
it really turns on this functionality in this particular case if the
release managers don't object. This is a significant performance issue
in 4.7 for cases where we reverse vectors and would be nice to fix
there. ( 2 loads + 2 generic permutes vs a single reverse instruciton)

regards,
Ramana


2012-05-30  Ramana Radhakrishnan  <ramana.radhakrishnan@linaro.org>

	* config/arm/arm.c (arm_evpc_neon_vrev): Adjust off by one error.
        * gcc.target/arm/neon-vrev..c: New.


>
>
> r~

Index: gcc/testsuite/gcc.target/arm/neon-vrev.c
===================================================================
--- gcc/testsuite/gcc.target/arm/neon-vrev.c	(revision 0)
+++ gcc/testsuite/gcc.target/arm/neon-vrev.c	(revision 187999)
@@ -0,0 +1,105 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-O2" } */
+/* { dg-add-options arm_neon } */
+
+#include <arm_neon.h>
+
+uint16x4_t
+tst_vrev642_u16 (uint16x4_t __a)
+{
+  uint16x4_t __rv;
+  uint16x4_t __mask1 = { 3, 2, 1, 0};
+  return __builtin_shuffle ( __a, __mask1) ;
+}
+
+uint16x8_t
+tst_vrev64q2_u16 (uint16x8_t __a)
+{
+  uint16x8_t __rv;
+  uint16x8_t __mask1 = {3, 2, 1, 0, 7, 6, 5, 4 };
+  return __builtin_shuffle ( __a, __mask1) ;
+}
+
+uint8x8_t
+tst_vrev642_u8 (uint8x8_t __a)
+{
+  uint8x8_t __rv;
+  uint8x8_t __mask1 = { 7, 6, 5, 4, 3, 2, 1, 0};
+  return __builtin_shuffle ( __a, __mask1) ;
+}
+
+uint8x16_t
+tst_vrev64q2_u8 (uint8x16_t __a)
+{
+  uint8x16_t __rv;
+  uint8x16_t __mask1 = {7, 6, 5, 4, 3, 2, 1, 0, 15, 14, 13, 12, 11, 10, 9, 8};
+  return __builtin_shuffle ( __a, __mask1) ;
+
+}
+
+uint32x2_t
+tst_vrev642_u32 (uint32x2_t __a)
+{
+  uint32x2_t __rv;
+  uint32x2_t __mask1 = {1, 0};
+  return __builtin_shuffle ( __a, __mask1) ;
+
+}
+
+uint32x4_t
+tst_vrev64q2_u32 (uint32x4_t __a)
+{
+  uint32x4_t __rv;
+  uint32x4_t __mask1 = {1, 0, 3, 2};
+  return __builtin_shuffle ( __a, __mask1) ;
+}
+
+uint16x4_t
+tst_vrev322_u16 (uint16x4_t __a)
+{
+  uint16x4_t __mask1 = { 1, 0, 3, 2 };
+  return __builtin_shuffle (__a, __mask1);
+}
+
+uint16x8_t
+tst_vrev32q2_u16 (uint16x8_t __a)
+{
+  uint16x8_t __mask1 = { 1, 0, 3, 2, 5, 4, 7, 6 }; 
+  return __builtin_shuffle (__a, __mask1);
+}
+
+uint8x8_t
+tst_vrev322_u8 (uint8x8_t __a)
+{
+  uint8x8_t __mask1 = { 3, 2, 1, 0, 7, 6, 5, 4};
+  return __builtin_shuffle (__a, __mask1);
+}
+
+uint8x16_t
+tst_vrev32q2_u8 (uint8x16_t __a)
+{
+  uint8x16_t __mask1 = { 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12};
+  return __builtin_shuffle (__a, __mask1);
+}
+
+uint8x8_t
+tst_vrev162_u8 (uint8x8_t __a)
+{
+  uint8x8_t __mask = { 1, 0, 3, 2, 5, 4, 7, 6};
+  return __builtin_shuffle (__a, __mask);
+}
+
+uint8x16_t
+tst_vrev16q2_u8 (uint8x16_t __a)
+{
+  uint8x16_t __mask = { 1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14};
+  return __builtin_shuffle (__a, __mask);
+}
+
+/* { dg-final {scan-assembler-times "vrev32\.16\\t" 2} }  */
+/* { dg-final {scan-assembler-times "vrev32\.8\\t" 2} }  */ 
+/* { dg-final {scan-assembler-times "vrev16\.8\\t" 2} }  */
+/* { dg-final {scan-assembler-times "vrev64\.8\\t" 2} }  */
+/* { dg-final {scan-assembler-times "vrev64\.32\\t" 2} }  */
+/* { dg-final {scan-assembler-times "vrev64\.16\\t" 2} }  */
Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c	(revision 187998)
+++ gcc/config/arm/arm.c	(revision 187999)
@@ -25637,10 +25637,18 @@
       return false;
     }
 
-  for (i = 0; i < nelt; i += diff)
+  for (i = 0; i < nelt ; i += diff + 1)
     for (j = 0; j <= diff; j += 1)
-      if (d->perm[i + j] != i + diff - j)
-	return false;
+      {
+	/* This is guaranteed to be true as the value of diff
+	   is 7, 3, 1 and we should have enough elements in the
+	   queue to generate this. Getting a vector mask with a
+	   value of diff other than these values implies that
+	   something is wrong by the time we get here.  */
+	gcc_assert (i + j < nelt);
+	if (d->perm[i + j] != i + diff - j)
+	  return false;
+      }
 
   /* Success! */
   if (d->testing_p)