From patchwork Mon Jun  9 10:13:06 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Evgeny Stupachenko <evstupac@gmail.com>
X-Patchwork-Id: 357378
Return-Path: 
 <gcc-patches-return-369785-incoming=patchwork.ozlabs.org@gcc.gnu.org>
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Received: from sourceware.org (server1.sourceware.org [209.132.180.131])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256
	bits)) (No client certificate requested)
	by ozlabs.org (Postfix) with ESMTPS id 269AA140092
	for <incoming@patchwork.ozlabs.org>;
	Mon,  9 Jun 2014 20:13:20 +1000 (EST)
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender
	:mime-version:in-reply-to:references:date:message-id:subject
	:from:to:cc:content-type; q=dns; s=default; b=YPxXd8fEayoUtBOAEr
	2qHSWTlKUZpAHzKpTfj/Lrt6rUwNeBV2MSCdPZGB8Q0L2UheNe0buYjpfYsouPsL
	Jp28AF5M4g5XpZT/zEEn0IUgA2zoAnsVhJyZlkeRa/c47JP3h8Zu30T+vAYgUsr5
	jGf336zNSN8vSnV9/rxYRwOWw=
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender
	:mime-version:in-reply-to:references:date:message-id:subject
	:from:to:cc:content-type; s=default; bh=wzsN1/axhmVpxuiTl3gNnAg5
	hPs=; b=CZ1oHfJDoeX9+Gxl9L6GTH4r/AOmbkqqS0IElNosT91wO5x0ZHVIVpcj
	VGRL3qPAQURULVHwFW0HYz2FJDcoiD1DuXp2lF80OfQ38RzWvckFj16EXufz5gCU
	5aroKHgcmmDZSq8XWHi35IwdSySTY271HYWH7k45b47t61kmwr0=
Received: (qmail 26050 invoked by alias); 9 Jun 2014 10:13:13 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Unsubscribe: 
 <mailto:gcc-patches-unsubscribe-incoming=patchwork.ozlabs.org@gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org
Received: (qmail 26039 invoked by uid 89); 9 Jun 2014 10:13:11 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-2.4 required=5.0 tests=AWL, BAYES_00,
	FREEMAIL_FROM, RCVD_IN_DNSWL_LOW,
	SPF_PASS autolearn=ham version=3.3.2
X-HELO: mail-oa0-f54.google.com
Received: from mail-oa0-f54.google.com (HELO mail-oa0-f54.google.com)
	(209.85.219.54) by sourceware.org
	(qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-SHA encrypted)
	ESMTPS; Mon, 09 Jun 2014 10:13:08 +0000
Received: by mail-oa0-f54.google.com with SMTP id eb12so3052356oac.41 for
	<gcc-patches@gcc.gnu.org>; Mon, 09 Jun 2014 03:13:06 -0700 (PDT)
MIME-Version: 1.0
X-Received: by 10.182.245.164 with SMTP id xp4mr23577279obc.23.1402308786285;
	Mon, 09 Jun 2014 03:13:06 -0700 (PDT)
Received: by 10.76.18.209 with HTTP; Mon, 9 Jun 2014 03:13:06 -0700 (PDT)
In-Reply-To: <53909F88.7070009@redhat.com>
References: 
 <CAOvf_xwh02FHmgQDe+7Qj0Dec0c8LuqBjP5UaMkc4TwrdP11nA@mail.gmail.com>
	<53909F88.7070009@redhat.com>
Date: Mon, 9 Jun 2014 14:13:06 +0400
Message-ID: 
 <CAOvf_xwdO2rn+GWiBNF9=9LJBCepfttc5UYEct10aXmT6aCvjQ@mail.gmail.com>
Subject: Re: [PATCH, x86] Improves x86 permutation expand
From: Evgeny Stupachenko <evstupac@gmail.com>
To: Richard Henderson <rth@redhat.com>
Cc: Uros Bizjak <ubizjak@gmail.com>, GCC Patches <gcc-patches@gcc.gnu.org>
X-IsSubscribed: yes

Right now we need to cover permutations coming from 3 loads/stores
group. My case covers them. I agree that another order of pblend and
pshufb covers additional cases. Good point. We can cover this in a
separate patch.

Asserts are ok if we exclude AVX2 in ISA checks.

Is the following patch ok?

   if (expand_vec_perm_2vperm2f128_vshuf (d))


On Thu, Jun 5, 2014 at 8:49 PM, Richard Henderson <rth@redhat.com> wrote:
> On 06/05/2014 08:29 AM, Evgeny Stupachenko wrote:
>> +  /* Figure out where permutation elements stay not in their
>> +     respective lanes.  */
>> +  for (i = 0, which = 0; i < nelt; ++i)
>> +    {
>> +      unsigned e = d->perm[i];
>> +      if (e != i)
>> +       which |= (e < nelt ? 1 : 2);
>> +    }
>> +  /* We can pblend the part where elements stay not in their
>> +     respective lanes only when these elements are all in one
>> +     half of a permutation.
>> +     {0 1 8 3 4 5 9 7} is ok as 8, 9 are not at their respective
>> +     lanes, but both 8 and 9 >= 8
>> +     {0 1 8 3 4 5 2 7} is not ok as 2 and 8 are not at their
>> +     respective lanes and 8 >= 8, but 2 not.  */
>> +  if (which != 1 && which != 2)
>> +    return false;
>
> I was about to suggest that you'd get more success by putting the blend first,
> and do the shuffle second.  But I suppose it does cover a few cases that the
> other way would miss, e.g.
>
>   { 0 4 7 3 }
>
> because we can't blend 0 and 4 (or 3 and 7) into the same vector.  Whereas the
> direction you're trying can't handle
>
>   { 0 6 6 1 }
>
> But that can be implemented with
>
>   { 0 1 2 3 }
>   { 4 5 6 7 }
>   -----------
>   { 0 1 6 3 } (pblend)
>   -----------
>   { 0 6 6 1 } (pshufb)
>
> So I guess we should cover these two cases in successive patches.
>
>> +  if (!expand_vec_perm_blend (&dcopy1))
>> +    return false;
>> +
>> +  return true;
>
> You should avoid doing any work in this function if the ISA isn't enabled.
> Don't wait until the last test for blend to fail.  Separate that out from the
> start of expand_vec_perm_blend as a subroutine, perhaps.
>
> We should be able to prove that we've got a valid blend as input here, so I'd
> be more inclined to write
>
>   ok = expand_vec_perm_blend (&dcopy1);
>   gcc_assert (ok);
>   return true;
>
>> +  if (!expand_vec_perm_1 (&dcopy))
>> +    return false;
>
> If we know we have pblend, then we know we have pshufb, so again I don't see
> how expand_vec_perm_1 can fail.  Another assert would be good.
>
> There is a point, earlier in the function, where we know whether we're going to
> succeed or not.  I believe just after
>
>> +  if (which != 1 && which != 2)
>> +    return false;
>
> You should add a
>
>   if (d->testing_p)
>     return true;
>
> at that point.
>
>
> r~

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 8827256..1fe2398 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -43185,6 +43185,90 @@ expand_vec_perm_palignr (struct expand_vec_perm_d *d)
   return ok;
 }

+/* A subroutine of ix86_expand_vec_perm_const_1.  Try to simplify
+   the permutation using the SSE4_1 pblendv instruction.  Potentially
+   reduces permutaion from 2 pshufb and or to 1 pshufb and pblendv.  */
+
+static bool
+expand_vec_perm_pblendv (struct expand_vec_perm_d *d)
+{
+  unsigned i, which, nelt = d->nelt;
+  struct expand_vec_perm_d dcopy, dcopy1;
+  enum machine_mode vmode = d->vmode;
+  bool ok;
+
+  /* Use the same checks as in expand_vec_perm_blend, but skipping
+     AVX2 as it requires more than 2 instructions for general case.  */
+  if (d->one_operand_p)
+    return false;
+  if (TARGET_AVX && (vmode == V4DFmode || vmode == V8SFmode))
+    ;
+  else if (TARGET_SSE4_1 && GET_MODE_SIZE (vmode) == 16)
+    ;
+  else
+    return false;
+
+  /* Figure out where permutation elements stay not in their
+     respective lanes.  */
+  for (i = 0, which = 0; i < nelt; ++i)
+    {
+      unsigned e = d->perm[i];
+      if (e != i)
+       which |= (e < nelt ? 1 : 2);
+    }
+  /* We can pblend the part where elements stay not in their
+     respective lanes only when these elements are all in one
+     half of a permutation.
+     {0 1 8 3 4 5 9 7} is ok as 8, 9 are at not at their respective
+     lanes, but both 8 and 9 >= 8
+     {0 1 8 3 4 5 2 7} is not ok as 2 and 8 are not at their
+     respective lanes and 8 >= 8, but 2 not.  */
+  if (which != 1 && which != 2)
+    return false;
+  if (d->testing_p)
+    return true;
+
+  /* First we apply one operand permutation to the part where
+     elements stay not in their respective lanes.  */
+  dcopy = *d;
+  if (which == 2)
+    dcopy.op0 = dcopy.op1 = d->op1;
+  else
+    dcopy.op0 = dcopy.op1 = d->op0;
+  dcopy.one_operand_p = true;
+
+  for (i = 0; i < nelt; ++i)
+    {
+      unsigned e = d->perm[i];
+      if (which == 2)
+       dcopy.perm[i] = ((e >= nelt) ? (e - nelt) : e);
+    }
+
+  ok = expand_vec_perm_1 (&dcopy);
+  gcc_assert (ok);
+
+  /* Next we put permuted elements into thier positions.  */
+  dcopy1 = *d;
+  if (which == 2)
+    dcopy1.op1 = dcopy.target;
+  else
+    dcopy1.op0 = dcopy.target;
+
+  for (i = 0; i < nelt; ++i)
+    {
+      unsigned e = d->perm[i];
+      if (which == 2)
+       dcopy1.perm[i] = ((e >= nelt) ? (nelt + i) : e);
+      else
+       dcopy1.perm[i] = ((e < nelt) ? i : e);
+    }
+
+  ok = expand_vec_perm_blend (&dcopy1);
+  gcc_assert (ok);
+
+  return true;
+}
+
 static bool expand_vec_perm_interleave3 (struct expand_vec_perm_d *d);

 /* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to simplify
@@ -44557,6 +44641,9 @@ ix86_expand_vec_perm_const_1 (struct
expand_vec_perm_d *d)
   if (expand_vec_perm_vperm2f128 (d))
     return true;

+  if (expand_vec_perm_pblendv (d))
+    return true;
+
   /* Try sequences of three instructions.  */