From patchwork Tue Jan 26 13:20:09 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kyrill Tkachov <kyrylo.tkachov@foss.arm.com>
X-Patchwork-Id: 573238
Return-Path: 
 <gcc-patches-return-420043-incoming=patchwork.ozlabs.org@gcc.gnu.org>
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Received: from sourceware.org (server1.sourceware.org [209.132.180.131])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256
	bits)) (No client certificate requested)
	by ozlabs.org (Postfix) with ESMTPS id 16E8E1402C0
	for <incoming@patchwork.ozlabs.org>;
	Wed, 27 Jan 2016 00:20:24 +1100 (AEDT)
Authentication-Results: ozlabs.org; dkim=pass (1024-bit key;
	unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org
	header.b=kZOIcHFH; dkim-atps=neutral
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender
	:message-id:date:from:mime-version:to:cc:subject:references
	:in-reply-to:content-type:content-transfer-encoding; q=dns; s=
	default; b=omKK3WY06apmo2yAUqSQYOjzqrqcPQQVYhXg8b9uwXjf2Dl1x7OGY
	iv5edECopL0dDZLBr7G2HR7X4nMRWd0u+B00qnApGd6e9JQtSbyhIHNGU5dyLCMw
	TXibhfjZtymLeUux+rUJjpREfIVyobxPVtuPe6jQJMQAPpjPKH8+6k=
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender
	:message-id:date:from:mime-version:to:cc:subject:references
	:in-reply-to:content-type:content-transfer-encoding; s=default;
	bh=hbz/DK4dLg8J7dcMzv7B+oWEN7k=; b=kZOIcHFHJfvFOYbgFIRDDdjsvI7D
	jO7Q8Jp+ceqinMYMCDasCZm+0xMSY1iq1sLNYYB+aKxyAQ6//wrCbGhaMtTX0Is5
	5Gkom7HNXBZawUjiJTLxlEMAluqsWiNNmkXyWv83flBcY+bCkDk3BVR6We2wpOte
	Y/VqguA6tIbrasg=
Received: (qmail 94278 invoked by alias); 26 Jan 2016 13:20:17 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Unsubscribe: 
 <mailto:gcc-patches-unsubscribe-incoming=patchwork.ozlabs.org@gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org
Received: (qmail 94255 invoked by uid 89); 26 Jan 2016 13:20:14 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-0.9 required=5.0 tests=BAYES_00,
	KAM_LAZY_DOMAIN_SECURITY,
	RP_MATCHES_RCVD autolearn=no version=3.3.2 spammy=me!
X-HELO: foss.arm.com
Received: from foss.arm.com (HELO foss.arm.com) (217.140.101.70) by
	sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP;
	Tue, 26 Jan 2016 13:20:13 +0000
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.72.51.249])	by
	usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id D671649;
	Tue, 26 Jan 2016 05:19:30 -0800 (PST)
Received: from [10.2.206.200] (e100706-lin.cambridge.arm.com
	[10.2.206.200])	by usa-sjc-imap-foss1.foss.arm.com (Postfix)
	with ESMTPSA id 161143F246; Tue, 26 Jan 2016 05:20:10 -0800 (PST)
Message-ID: <56A77289.30308@foss.arm.com>
Date: Tue, 26 Jan 2016 13:20:09 +0000
From: Kyrill Tkachov <kyrylo.tkachov@foss.arm.com>
User-Agent: Mozilla/5.0 (X11; Linux x86_64;
	rv:31.0) Gecko/20100101 Thunderbird/31.2.0
MIME-Version: 1.0
To: Christophe Lyon <christophe.lyon@linaro.org>,
	Alan Lawrence <alan.lawrence@foss.arm.com>
CC: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] ARM PR68620 (ICE with FP16 on armeb)
References: 
 <CAKdteOZbgTw44UarkL9zmmZBd3_gGPof9v0m8mPH-xu9=8yqqQ@mail.gmail.com>
	<1453143711-21320-1-git-send-email-alan.lawrence@arm.com>
	<CAKdteOaNmfhgeFVeaEgddvQ-FntWudtQSowYgPtW=xi8Ert60Q@mail.gmail.com>
	<569E4D77.6000809@foss.arm.com>
	<CAKdteOa+mqbODEs7jWb-Cj4H+OX93jDPwH1uauY2=0Tn-S=LpA@mail.gmail.com>
In-Reply-To: 
 <CAKdteOa+mqbODEs7jWb-Cj4H+OX93jDPwH1uauY2=0Tn-S=LpA@mail.gmail.com>

Hi Christophe,

On 20/01/16 21:10, Christophe Lyon wrote:
> On 19 January 2016 at 15:51, Alan Lawrence <alan.lawrence@foss.arm.com> wrote:
>> On 19/01/16 11:15, Christophe Lyon wrote:
>>
>>>>> For neon_vdupn, I chose to implement neon_vdup_nv4hf and
>>>>> neon_vdup_nv8hf instead of updating the VX iterator because I thought
>>>>> it was not desirable to impact neon_vrev32<mode>.
>>>>
>>>> Well, the same instruction will suffice for vrev32'ing vectors of HF just
>>>> as
>>>> well as vectors of HI, so I think I'd argue that's harmless enough. To
>>>> gain the
>>>> benefit, we'd need to update arm_evpc_neon_vrev with a few new cases,
>>>> though.
>>>>
>>> Since this is more intrusive, I'd rather leave that part for later. OK?
>>
>> Sure.
>>
>>>>> +#ifdef __ARM_BIG_ENDIAN
>>>>> +  /* Here, 3 is (4-1) where 4 is the number of lanes. This is also the
>>>>> +     right value for vectors with 8 lanes.  */
>>>>> +#define __arm_lane(__vec, __idx) (__idx ^ 3)
>>>>> +#else
>>>>> +#define __arm_lane(__vec, __idx) __idx
>>>>> +#endif
>>>>> +
>>>>
>>>> Looks right, but sounds... my concern here is that I'm hoping at some
>>>> point we
>>>> will move the *other* vget/set_lane intrinsics to use GCC vector
>>>> extensions
>>>> too. At which time (unlike __aarch64_lane which can be used everywhere)
>>>> this
>>>> will be the wrong formula. Can we name (and/or comment) it to avoid
>>>> misleading
>>>> anyone? The key characteristic seems to be that it is for vectors of
>>>> 16-bit
>>>> elements only.
>>>>
>>> I'm not to follow, here. Looking at the patterns for
>>> neon_vget_lane<mode>_*internal in neon.md,
>>> I can see 2 flavours: one for VD, one for VQ2. The latter uses "halfelts".
>>>
>>> Do you prefer that I create 2 macros (say __arm_lane and __arm_laneq),
>>> that would be similar to the aarch64 ones (by computing the number of
>>> lanes of the input vector), but the "q" one would use half the total
>>> number of lanes instead?
>>
>> That works for me! Sthg like:
>>
>> #define __arm_lane(__vec, __idx) NUM_LANES(__vec) - __idx
>> #define __arm_laneq(__vec, __idx) (__idx & (NUM_LANES(__vec)/2)) +
>> (NUM_LANES(__vec)/2 - __idx)
>> //or similarly
>> #define __arm_laneq(__vec, __idx) (__idx ^ (NUM_LANES(__vec)/2 - 1))
>>
>> Alternatively I'd been thinking
>>
>> #define __arm_lane_32xN(__idx) __idx ^ 1
>> #define __arm_lane_16xN(__idx) __idx ^ 3
>> #define __arm_lane_8xN(__idx) __idx ^ 7
>>
>> Bear in mind PR64893 that we had on AArch64 :-(
>>
> Here is a new version, based on the comments above.
> I've also removed the addition of arm_fp_ok effective target since I
> added that in my other testsuite patch.
>
> OK now?
>
> Thanks,
>
> Christophe
>

I think it'd be clearer to write "if (GET_MODE_INNER (mode) == HFmode)"

+(define_expand "movv4hf"
+  [(set (match_operand:V4HF 0 "s_register_operand")
+	(match_operand:V4HF 1 "s_register_operand"))]
+  "TARGET_NEON && TARGET_FP16"
+{
+  if (can_create_pseudo_p ())
+    {
+      if (!REG_P (operands[0]))
+	operands[1] = force_reg (V4HFmode, operands[1]);
+    }
+})

Can you please add a comment saying why you need the force_reg here?
IIRC it's because of CANNOT_CHANGE_MODE_CLASS on big-endian that causes an
ICE during expand with subregs.

I've tried this patch out and it does indeed fix the ICE on armeb.
So ok for trunk with the changes above.
Thanks,
Kyrill

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 3588b83..b1f408c 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -12370,6 +12370,10 @@ neon_valid_immediate (rtx op, machine_mode mode, int inverse,
        if (!vfp3_const_double_rtx (el0) && el0 != CONST0_RTX (GET_MODE (el0)))
          return -1;
  
+      /* FP16 vectors cannot be represented.  */
+      if (innersize == 2)
+	return -1;
+
        r0 = CONST_DOUBLE_REAL_VALUE (el0);