From patchwork Wed Nov 27 17:09:07 2013
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tejas Belagod <tbelagod@arm.com>
X-Patchwork-Id: 294627
Return-Path: 
 <gcc-patches-return-356440-incoming=patchwork.ozlabs.org@gcc.gnu.org>
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Received: from sourceware.org (server1.sourceware.org [209.132.180.131])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256
	bits)) (Client did not present a certificate)
	by ozlabs.org (Postfix) with ESMTPS id 655902C007C
	for <incoming@patchwork.ozlabs.org>;
	Thu, 28 Nov 2013 05:09:23 +1100 (EST)
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender
	:message-id:date:from:mime-version:to:subject:references
	:in-reply-to:content-type; q=dns; s=default; b=m3TqdjL7vdITwNT8t
	La2cN4HaKm4CVuBIl6wBcBFElzn7wxX7QNnRTOj8qn5nKSqImQ8YU9hyjs6O4zXJ
	za5WsWwTrtgjkFDEx2zny0yiSkxJCqkIUzfcNnGEQyGSQzM2IvSXnMfuZLH7RPhR
	P/e+wQVNpYpRGxDSt4yCgAO8Qo=
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender
	:message-id:date:from:mime-version:to:subject:references
	:in-reply-to:content-type; s=default; bh=l43fkExGoVlU1RG3QhN1cbG
	xAu4=; b=CEhw8YswJ9Vv31tU+h2vcA4qu3Y/F1JExuk2VihPcokZSTeF11/6Zkw
	Itqh7YlZAAS0svjv9WG5h3tgZ02RhY39sRM10f2b6HBUOui6m9e9CDi43h/duy5Q
	4wXflwayQvEt9Y65mImzv4gNT0vsNcgox0ET0pcgkeXkb4vzjeL8=
Received: (qmail 6692 invoked by alias); 27 Nov 2013 17:09:20 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <mailto:gcc-patches-unsubscribe-##L=##H@gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org
Received: (qmail 6674 invoked by uid 89); 27 Nov 2013 17:09:19 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-0.3 required=5.0 tests=AWL, BAYES_50,
	RDNS_NONE, SPF_PASS, URIBL_BLOCKED autolearn=no version=3.3.2
X-HELO: service87.mimecast.com
Received: from Unknown (HELO service87.mimecast.com) (91.220.42.44) by
	sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP;
	Wed, 27 Nov 2013 17:09:18 +0000
Received: from cam-owa1.Emea.Arm.com (fw-tnat.cambridge.arm.com
	[217.140.96.21]) by service87.mimecast.com;
	Wed, 27 Nov 2013 17:09:09 +0000
Received: from [10.1.203.80] ([10.1.255.212]) by cam-owa1.Emea.Arm.com with
	Microsoft SMTPSVC(6.0.3790.3959); Wed, 27 Nov 2013 17:09:08 +0000
Message-ID: <52962733.7030005@arm.com>
Date: Wed, 27 Nov 2013 17:09:07 +0000
From: Tejas Belagod <tbelagod@arm.com>
User-Agent: Thunderbird 2.0.0.18 (X11/20081120)
MIME-Version: 1.0
To: Bill Schmidt <wschmidt@linux.vnet.ibm.com>,
	"gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
	rdsandiford@googlemail.com
Subject: Re: [Patch, RTL] Eliminate redundant vec_select moves.
References: <527A4309.70209@arm.com>
	<8738n9sj8o.fsf@talisman.default>	<527A5EF4.5090505@arm.com>
	<87y551r01p.fsf@talisman.default>	<527A7612.2080406@arm.com>
	<877gcll9ht.fsf@talisman.default>	<527BA073.30900@arm.com>
	<87zjpg1d5p.fsf@sandifor-thinkpad.stglab.manchester.uk.ibm.com>
	<527BD411.6060300@arm.com>
	<878uwwdnx0.fsf@talisman.default>
In-Reply-To: <878uwwdnx0.fsf@talisman.default>
X-MC-Unique: 113112717090905801
X-IsSubscribed: yes

Richard Sandiford wrote:
> Tejas Belagod <tbelagod@arm.com> writes:
>>> The problem is that one reg rtx can span several hard registers.
>>> E.g. (reg:V4SI 32) might represent one 64-bit register (no. 32),
>>> but it might instead represent two 32-bit registers (nos. 32 and 33).
>>> Obviously the latter's not very likely for vectors this small,
>>> but more likely for larger ones (including on NEON IIRC).
>>>
>>> So if we had 2 32-bit registers being treated as a V4HI, it would be:
>>>
>>>    <--32--><--33-->
>>>    msb          lsb
>>>    0000111122223333
>>>    VVVVVVVV
>>>    00001111
>>>    msb  lsb
>>>    <--32-->
>>>
>>> for big endian and:
>>>
>>>    <--33--><--32-->
>>>    msb          lsb
>>>    3333222211110000
>>>            VVVVVVVV
>>>            11110000
>>>            msb  lsb
>>>            <--32-->
>>>
>>> for little endian.
>> Ah, ok, that makes things clearer. Thanks for that.
>>
>> I can't find any helper function that figures out if we're writing partial or 
>> full result regs. Would something like
>>
>>      REGNO (src) == REGNO (dst) &&
>>      HARD_REGNO_NREGS (src) == HARD_REGNO_NREGS (dst) == 1
>>
>> be a sane check for partial result regs?
> 
> Yeah, that should work.  I think a more general alternative would be:
> 
>   simplify_subreg_regno (REGNO (src), GET_MODE (src),
>                          offset, GET_MODE (dst)) == (int) REGNO (dst)
> 
> where:
> 
>   offset = GET_MODE_UNIT_SIZE (GET_MODE (src)) * INTVAL (XVECEXP (sel, 0))
> 
> That offset is the byte offset of the first selected element from the
> start of a vector in memory, which is also the way that SUBREG_BYTEs
> are counted.  For little-endian it gives the offset of the lsb of the
> slice, while for big-endian it gives the offset of the msb (which is
> also how SUBREG_BYTEs work).
> 
> The simplify_subreg_regno should cope with both single-register vectors
> and multi-register vectors.
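
To check my understanding of the offset arithmetic above, here is a small
standalone toy model. It is not GCC code and not how simplify_subreg_regno ()
is actually implemented: toy_subreg_regno, toy_vec_select_offset and
toy_vec_select_is_noop are made-up names. The model assumes that all hard
registers have the same width, that the registers holding a value are numbered
consecutively in memory order, and that a piece narrower than one register is
only addressable when it is that register's low-order part.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for simplify_subreg_regno.  Return the first hard
   register holding a SLICE_BYTES-wide piece that starts OFFSET bytes (in
   memory order) into a VALUE_BYTES-wide value whose first hard register is
   REGNO, or -1 if the piece cannot be named as a register on its own.
   Simplifying assumption: every hard register is REG_BYTES wide.  */
int
toy_subreg_regno (int regno, int value_bytes, int reg_bytes,
                  int offset, int slice_bytes, bool big_endian)
{
  assert (reg_bytes > 0 && slice_bytes > 0);

  if (offset < 0 || offset + slice_bytes > value_bytes)
    return -1;

  int within = offset % reg_bytes;
  if (slice_bytes >= reg_bytes)
    {
      /* The piece covers whole registers, so it must start on a register
         boundary and span a whole number of registers.  */
      if (within != 0 || slice_bytes % reg_bytes != 0)
        return -1;
    }
  else
    {
      /* The piece is part of one register, so it must be the low-order
         part: byte offset 0 on little endian, the last SLICE_BYTES bytes
         on big endian.  */
      int lowpart = big_endian ? reg_bytes - slice_bytes : 0;
      if (within != lowpart)
        return -1;
    }

  return regno + offset / reg_bytes;
}

/* Byte offset of the first selected element, as in the formula quoted
   above: element size times index of the first selected element.  */
int
toy_vec_select_offset (int unit_bytes, int first_index)
{
  return unit_bytes * first_index;
}

/* Model of the proposed check: the vec_select is a no-op move when the
   selected piece already lives in the destination register.  */
bool
toy_vec_select_is_noop (int src_regno, int src_bytes, int unit_bytes,
                        int first_index, int dst_regno, int dst_bytes,
                        int reg_bytes, bool big_endian)
{
  int offset = toy_vec_select_offset (unit_bytes, first_index);
  return toy_subreg_regno (src_regno, src_bytes, reg_bytes,
                           offset, dst_bytes, big_endian) == dst_regno;
}
```

Under those assumptions, a V4HI held in two 4-byte registers 32 and 33 maps
its first two elements to register 32 and its last two to register 33 for both
endiannesses, matching the diagrams above. For a 16-byte vector of four 4-byte
elements held in a single register, only element 0 maps back to the same
register on little endian and only element 3 on big endian.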

Sorry for the delayed response to this.

Thanks for the tip. Here's an improved patch that implements the 
simplify_subreg_regno () method of eliminating redundant moves. Regarding the 
test case, I failed to get the ppc back-end to generate the RTL pattern that this 
patch checks for. I can easily write a test case for aarch64 (big and little 
endian) along these lines:

typedef float float32x4_t __attribute__ ((__vector_size__ (16)));

float foo_be (float32x4_t x)
{
   return x[3];
}

float foo_le (float32x4_t x)
{
   return x[0];
}

where I know that the vector indexing will generate a vec_select with the same 
src and dst regs, which could be optimized away and so can be used to test the 
patch. But I'm struggling to write a test case for which the ppc altivec 
back-end generates such a vec_select. I see that altivec does not define 
vec_extract, so simple indexing like this seems to go via memory. Also, I don't 
know enough about the ppc PCS or architecture to write a test that checks for 
this optimization opportunity on the same src and dst hard registers. Any hints?

This patch has been bootstrapped on x86_64 and regression-tested on 
aarch64-none-elf and aarch64_be-none-elf.

Thanks for your patience,
Tejas.

diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index 0cd0c7e..ca25ce5 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -1180,6 +1180,22 @@ set_noop_p (const_rtx set)
       dst = SUBREG_REG (dst);
     }
 
+  /* It is a NOOP if destination overlaps with selected src vector
+     elements.  */
+  if (GET_CODE (src) == VEC_SELECT
+      && REG_P (XEXP (src, 0)) && REG_P (dst)
+      && HARD_REGISTER_P (XEXP (src, 0))
+      && HARD_REGISTER_P (dst))
+    {
+      rtx par = XEXP (src, 1);
+      rtx src0 = XEXP (src, 0);
+      HOST_WIDE_INT offset =
+	GET_MODE_UNIT_SIZE (GET_MODE (src0)) * INTVAL (XVECEXP (par, 0, 0));
+
+      return simplify_subreg_regno (REGNO (src0), GET_MODE (src0),
+				    offset, GET_MODE (dst)) == (int)REGNO (dst);
+    }
+
   return (REG_P (src) && REG_P (dst)
 	  && REGNO (src) == REGNO (dst));
 }