From patchwork Wed May 30 02:03:45 2012 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Ramana Radhakrishnan X-Patchwork-Id: 161831 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) by ozlabs.org (Postfix) with SMTP id AED1AB6FD1 for ; Wed, 30 May 2012 12:04:19 +1000 (EST) Comment: DKIM? See http://www.dkim.org DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed; d=gcc.gnu.org; s=default; x=1338948261; h=Comment: DomainKey-Signature:Received:Received:Received:Received: MIME-Version:Received:Received:In-Reply-To:References:Date: Message-ID:Subject:From:To:Cc:Content-Type:Mailing-List: Precedence:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:Sender:Delivered-To; bh=naNZBx5qJeDtsDuoi3ZVG+5ZFcA=; b=ECvao19GK1fmVuxCTgCrWd4IFvb/HarZX3g/wMR78TLFGHBcGS993ItD+hCuJ6 wAq+zRjxOgE7fxsi8thGwvWv59Qa3SYjd/nrEKSML2qvWW2Z21IMR0O/wQPpGBSv QX3/I8rlGmEX015HbaFD5FrDKDSrzNxSQOxTAeSF6qh8U= Comment: DomainKeys? See http://antispam.yahoo.com/domainkeys DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=default; d=gcc.gnu.org; h=Received:Received:X-SWARE-Spam-Status:X-Spam-Check-By:Received:Received:X-Google-DKIM-Signature:MIME-Version:Received:Received:In-Reply-To:References:Date:Message-ID:Subject:From:To:Cc:Content-Type:X-Gm-Message-State:X-IsSubscribed:Mailing-List:Precedence:List-Id:List-Unsubscribe:List-Archive:List-Post:List-Help:Sender:Delivered-To; b=kpQzesoqyKv5t/gQFajA3AynYloEouIKlh1hsPYJPInUtyweaqX3rQxlMUzolk OwSEaEJLYfTgK9lkr5cfO3yqY0zqaPMnQs+0T+7spKDJndpz+Ph1Sr34OapjVJfI VJ1ed+R7/tyfmzwF1j15ZcFuGscUPou7E6XP18x3bu9PI=; Received: (qmail 25649 invoked by alias); 30 May 2012 02:04:10 -0000 Received: (qmail 25619 invoked by uid 22791); 30 May 2012 02:04:08 -0000 X-SWARE-Spam-Status: No, hits=-2.8 required=5.0 tests=AWL, BAYES_00, KHOP_RCVD_UNTRUST, KHOP_THREADED, RCVD_IN_DNSWL_LOW, RCVD_IN_HOSTKARMA_YE X-Spam-Check-By: sourceware.org Received: from mail-qc0-f175.google.com (HELO mail-qc0-f175.google.com) (209.85.216.175) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Wed, 30 May 2012 02:03:46 +0000 Received: by qcso7 with SMTP id o7so2540260qcs.20 for ; Tue, 29 May 2012 19:03:45 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:x-gm-message-state; bh=O5c3Yof6V532v7N/xib7z9YheeN1BFk5LhGtYcn6SAc=; b=VRocF9ppnkXC+1yGtPpi5jCVZukq/b1vSXcdTVSk+8DXa1VWvKQLP4fDEsb1wbEWqr A70cZbwAynHsGHy7siDuRhY3pA8vamg6I5P1596/YAuQ2OXLxkGWZszMMwyqwGPYZX1w G50gm0fOGsjGOVJo+Uw3W2aXgw3Y6RaxYm6HdIkU9jtwuufjSudZNfBwKH4bamUb81YE D1KbMwkdXu1ZBegsZdPIQGzlYISkyKXGJM6WgQwN2gVK4wL/Z9exjguKNNs8V6okhWRa Cd9NVQAltj0WxxD3ZN7+IQG2xURKKPHtdB5hPocrYtQ9q2Tmj3oZIIIWrxlZqO5ow8eT 2sXw== MIME-Version: 1.0 Received: by 10.224.39.68 with SMTP id f4mr284137qae.3.1338343425288; Tue, 29 May 2012 19:03:45 -0700 (PDT) Received: by 10.224.137.73 with HTTP; Tue, 29 May 2012 19:03:45 -0700 (PDT) In-Reply-To: <4FC5079D.2010904@redhat.com> References: <4FC5079D.2010904@redhat.com> Date: Wed, 30 May 2012 03:03:45 +0100 Message-ID: Subject: Re: [Patch ARM] Fix off by one error in neon_evpc_vrev. From: Ramana Radhakrishnan To: Richard Henderson Cc: gcc-patches@gcc.gnu.org, Patch Tracking , Richard Earnshaw , Richard Guenther X-Gm-Message-State: ALoCoQnFyGWPzFcyWd4RX0iptvl9lUGXxylS9ppJIdkbTNuXY4zC1X+A7xqt4hX8ewUOIB3np7XV X-IsSubscribed: yes Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org On 29 May 2012 18:30, Richard Henderson wrote: > On 05/26/2012 01:27 AM, Ramana Radhakrishnan wrote: >> >> -  for (i = 0; i<  nelt; i += diff) >> +  for (i = 0; i<  nelt ; i += (diff + 1)) >>      for (j = 0; j<= diff; j += 1) >> -      if (d->perm[i + j] != i + diff - j) >> -       return false; >> +      { >> +       /* This is guaranteed to be true as the value of diff >> +          is 7, 3, 1 and we should have enough elements in the >> +          queue to generate this. Getting a vector mask with a >> +          value of diff other than these values implies that >> +          something is wrong by the time we get here.  */ >> +       gcc_assert ((i + j)<  nelt); > > > Yep, that all looks correct.  Unnecessary () in both lines though. Bah - Thanks - don't know why I put those in :( .Committed to trunk with those changes and I would like to backport this to the 4.7 branch after a couple of weeks to allow the auto-testers to pick this up as it really turns on this functionality in this particular case if the release managers don't object. This is a significant performance issue in 4.7 for cases where we reverse vectors and would be nice to fix there. ( 2 loads + 2 generic permutes vs a single reverse instruciton) regards, Ramana 2012-05-30 Ramana Radhakrishnan * config/arm/arm.c (arm_evpc_neon_vrev): Adjust off by one error. * gcc.target/arm/neon-vrev..c: New. > > > r~ Index: gcc/testsuite/gcc.target/arm/neon-vrev.c =================================================================== --- gcc/testsuite/gcc.target/arm/neon-vrev.c (revision 0) +++ gcc/testsuite/gcc.target/arm/neon-vrev.c (revision 187999) @@ -0,0 +1,105 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target arm_neon_ok } */ +/* { dg-options "-O2" } */ +/* { dg-add-options arm_neon } */ + +#include + +uint16x4_t +tst_vrev642_u16 (uint16x4_t __a) +{ + uint16x4_t __rv; + uint16x4_t __mask1 = { 3, 2, 1, 0}; + return __builtin_shuffle ( __a, __mask1) ; +} + +uint16x8_t +tst_vrev64q2_u16 (uint16x8_t __a) +{ + uint16x8_t __rv; + uint16x8_t __mask1 = {3, 2, 1, 0, 7, 6, 5, 4 }; + return __builtin_shuffle ( __a, __mask1) ; +} + +uint8x8_t +tst_vrev642_u8 (uint8x8_t __a) +{ + uint8x8_t __rv; + uint8x8_t __mask1 = { 7, 6, 5, 4, 3, 2, 1, 0}; + return __builtin_shuffle ( __a, __mask1) ; +} + +uint8x16_t +tst_vrev64q2_u8 (uint8x16_t __a) +{ + uint8x16_t __rv; + uint8x16_t __mask1 = {7, 6, 5, 4, 3, 2, 1, 0, 15, 14, 13, 12, 11, 10, 9, 8}; + return __builtin_shuffle ( __a, __mask1) ; + +} + +uint32x2_t +tst_vrev642_u32 (uint32x2_t __a) +{ + uint32x2_t __rv; + uint32x2_t __mask1 = {1, 0}; + return __builtin_shuffle ( __a, __mask1) ; + +} + +uint32x4_t +tst_vrev64q2_u32 (uint32x4_t __a) +{ + uint32x4_t __rv; + uint32x4_t __mask1 = {1, 0, 3, 2}; + return __builtin_shuffle ( __a, __mask1) ; +} + +uint16x4_t +tst_vrev322_u16 (uint16x4_t __a) +{ + uint16x4_t __mask1 = { 1, 0, 3, 2 }; + return __builtin_shuffle (__a, __mask1); +} + +uint16x8_t +tst_vrev32q2_u16 (uint16x8_t __a) +{ + uint16x8_t __mask1 = { 1, 0, 3, 2, 5, 4, 7, 6 }; + return __builtin_shuffle (__a, __mask1); +} + +uint8x8_t +tst_vrev322_u8 (uint8x8_t __a) +{ + uint8x8_t __mask1 = { 3, 2, 1, 0, 7, 6, 5, 4}; + return __builtin_shuffle (__a, __mask1); +} + +uint8x16_t +tst_vrev32q2_u8 (uint8x16_t __a) +{ + uint8x16_t __mask1 = { 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12}; + return __builtin_shuffle (__a, __mask1); +} + +uint8x8_t +tst_vrev162_u8 (uint8x8_t __a) +{ + uint8x8_t __mask = { 1, 0, 3, 2, 5, 4, 7, 6}; + return __builtin_shuffle (__a, __mask); +} + +uint8x16_t +tst_vrev16q2_u8 (uint8x16_t __a) +{ + uint8x16_t __mask = { 1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14}; + return __builtin_shuffle (__a, __mask); +} + +/* { dg-final {scan-assembler-times "vrev32\.16\\t" 2} } */ +/* { dg-final {scan-assembler-times "vrev32\.8\\t" 2} } */ +/* { dg-final {scan-assembler-times "vrev16\.8\\t" 2} } */ +/* { dg-final {scan-assembler-times "vrev64\.8\\t" 2} } */ +/* { dg-final {scan-assembler-times "vrev64\.32\\t" 2} } */ +/* { dg-final {scan-assembler-times "vrev64\.16\\t" 2} } */ Index: gcc/config/arm/arm.c =================================================================== --- gcc/config/arm/arm.c (revision 187998) +++ gcc/config/arm/arm.c (revision 187999) @@ -25637,10 +25637,18 @@ return false; } - for (i = 0; i < nelt; i += diff) + for (i = 0; i < nelt ; i += diff + 1) for (j = 0; j <= diff; j += 1) - if (d->perm[i + j] != i + diff - j) - return false; + { + /* This is guaranteed to be true as the value of diff + is 7, 3, 1 and we should have enough elements in the + queue to generate this. Getting a vector mask with a + value of diff other than these values implies that + something is wrong by the time we get here. */ + gcc_assert (i + j < nelt); + if (d->perm[i + j] != i + diff - j) + return false; + } /* Success! */ if (d->testing_p)