From patchwork Mon Apr 28 16:48:09 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Evgeny Stupachenko <evstupac@gmail.com>
X-Patchwork-Id: 343498
Return-Path: 
 <gcc-patches-return-366184-incoming=patchwork.ozlabs.org@gcc.gnu.org>
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Received: from sourceware.org (server1.sourceware.org [209.132.180.131])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256
	bits)) (No client certificate requested)
	by ozlabs.org (Postfix) with ESMTPS id B8A9D14007C
	for <incoming@patchwork.ozlabs.org>;
	Tue, 29 Apr 2014 02:48:21 +1000 (EST)
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender
	:mime-version:date:message-id:subject:from:to:content-type; q=
	dns; s=default; b=Byf0GJQhWa6Jg3Xtk5gWwUn1tq67y7v2DvVvqLbjznX0bC
	aTumwGdJbXYNU6IFnwDLYDobE7+eRgRK3MW/XECPyGVQJMFtZPuc+FGwMOkVIwVX
	+RzBpmrT4LklTdU81PTZmgqS023n7ElG7TsLjL8UmMLZo8G2belTeSHfN5BqY=
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender
	:mime-version:date:message-id:subject:from:to:content-type; s=
	default; bh=qRswT2tbYan+pkIGAHU1+/Aq/bA=; b=BKNax6OqJS8Sc1lLFAyp
	0CBeUSo+XSUm8v6elfRsAtAb6EUQ63kdUuiYkohJBpfHF2Hn4HvbwZEv3l1Rz9Sq
	aZS5fJBOO1HaFHnU+AXSYPY9DgU3/b+nHKWDWFeAFejXdahL+8wrygMQijINFy3H
	0MWImv6zl4aGuFbhKTXUETg=
Received: (qmail 13423 invoked by alias); 28 Apr 2014 16:48:13 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Unsubscribe: 
 <mailto:gcc-patches-unsubscribe-incoming=patchwork.ozlabs.org@gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org
Received: (qmail 13407 invoked by uid 89); 28 Apr 2014 16:48:12 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-1.8 required=5.0 tests=AWL, BAYES_00,
	FREEMAIL_FROM, RCVD_IN_DNSWL_LOW,
	SPF_PASS autolearn=ham version=3.3.2
X-HELO: mail-oa0-f46.google.com
Received: from mail-oa0-f46.google.com (HELO mail-oa0-f46.google.com)
	(209.85.219.46) by sourceware.org
	(qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-SHA encrypted)
	ESMTPS; Mon, 28 Apr 2014 16:48:11 +0000
Received: by mail-oa0-f46.google.com with SMTP id m1so7610023oag.33 for
	<gcc-patches@gcc.gnu.org>; Mon, 28 Apr 2014 09:48:09 -0700 (PDT)
X-Received: by 10.60.33.229 with SMTP id u5mr1602499oei.73.1398703689485;
	Mon, 28 Apr 2014 09:48:09 -0700 (PDT)
Received: by 10.76.170.39 with HTTP; Mon, 28 Apr 2014 09:48:09 -0700 (PDT)
Date: Mon, 28 Apr 2014 20:48:09 +0400
Message-ID: 
 <CAOvf_xx3-VpgN8YDxJBPvzzNGNykUPoLdU6xThW_QBN7byy5rw@mail.gmail.com>
Subject: [PATCH 1/2, x86] Add palignr support for AVX2.
From: Evgeny Stupachenko <evstupac@gmail.com>
To: GCC Patches <gcc-patches@gcc.gnu.org>, Richard Biener <rguenther@suse.de>,
	Uros Bizjak <ubizjak@gmail.com>
X-IsSubscribed: yes

Hi,

This patch lets AVX2 use "perm + palignr" instead of "2 pshufb + or + perm"
for permutations whose elements all fit within one vector-wide window of the
two operands.
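
As a rough illustration of why the two-instruction sequence is enough, here is
a scalar C model (not GCC code) of the 256-bit path. It treats the operands as
a 64-byte concatenation op1:op0 and checks that a lane permute with selector
0x21 (the decimal 33 in the patch) followed by a per-lane PALIGNR yields the
32-byte window starting at byte offset min. The v256 type and every model_*
or window_* name are hypothetical, and offsets are in bytes, not elements.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 256-bit register: two 16-byte lanes. */
typedef struct { uint8_t b[32]; } v256;

/* Model of a two-source 128-bit lane permute.  Only the lane-select
   bits are modelled; the zeroing bits (3 and 7) are assumed clear,
   which holds for selector 0x21.  */
static v256
model_perm2x128 (v256 src1, v256 src2, unsigned imm)
{
  const uint8_t *lanes[4] = { src1.b, src1.b + 16, src2.b, src2.b + 16 };
  v256 r;
  memcpy (r.b, lanes[imm & 3], 16);
  memcpy (r.b + 16, lanes[(imm >> 4) & 3], 16);
  return r;
}

/* Model of 256-bit PALIGNR: in each lane, concatenate hi:lo, shift
   right by NBYTES and keep the low 16 bytes.  Limited to NBYTES <= 16.  */
static v256
model_palignr (v256 hi, v256 lo, unsigned nbytes)
{
  v256 r;
  assert (nbytes <= 16);
  for (unsigned lane = 0; lane < 2; lane++)
    {
      uint8_t cat[32];
      memcpy (cat, lo.b + 16 * lane, 16);
      memcpy (cat + 16, hi.b + 16 * lane, 16);
      memcpy (r.b + 16 * lane, cat + nbytes, 16);
    }
  return r;
}

/* The two-instruction sequence, with MIN_BYTES as the window start.  */
static v256
model_window (v256 op0, v256 op1, unsigned min_bytes)
{
  /* tmp = { high lane of op0, low lane of op1 }.  */
  v256 tmp = model_perm2x128 (op0, op1, 0x21);
  assert (min_bytes <= 32);
  if (min_bytes < 16)
    return model_palignr (tmp, op0, min_bytes);
  return model_palignr (op1, tmp, min_bytes - 16);
}

/* Return nonzero if the model reproduces bytes MIN_BYTES..MIN_BYTES+31
   of the 64-byte concatenation op1:op0.  */
static int
window_matches (unsigned min_bytes)
{
  v256 op0, op1, got;
  uint8_t cat[64];
  for (unsigned i = 0; i < 32; i++)
    {
      op0.b[i] = (uint8_t) i;
      op1.b[i] = (uint8_t) (32 + i);
    }
  memcpy (cat, op0.b, 32);
  memcpy (cat + 32, op1.b, 32);
  got = model_window (op0, op1, min_bytes);
  return memcmp (got.b, cat + min_bytes, 32) == 0;
}
```

Calling window_matches for each offset from 0 to 32 exercises both branches of
the sequence; the model says nothing about instruction cost.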

Bootstrapped and passes make check on x86.

Is it ok?

2014-04-28  Evgeny Stupachenko  <evstupac@gmail.com>

        * config/i386/i386.c (expand_vec_perm_1): Try AVX2 vpshufb.
        * config/i386/i386.c (expand_vec_perm_palignr): Extend to use AVX2
        PALIGNR instruction.
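
The applicability test in expand_vec_perm_palignr can be sketched in
isolation. The helper below is hypothetical, not a GCC function: it scans the
permutation indices for their minimum and maximum and derives the shift in
bits. The rejection rule (min == 0, or a span of nelt or more) is an
assumption, since the hunks in this mail do not show that part of the function.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: PERM holds NELT indices into the 2 * NELT
   elements of op1:op0.  On success, store the palignr shift in bits.  */
static bool
palignr_window (const unsigned char *perm, unsigned nelt,
                unsigned elt_bits, unsigned *shift_bits)
{
  unsigned i, min = 2 * nelt, max = 0;

  for (i = 0; i < nelt; ++i)
    {
      unsigned e = perm[i];
      assert (e < 2 * nelt);
      if (e < min)
        min = e;
      if (e > max)
        max = e;
    }

  /* Assumed rule: nothing to shift when min is 0, and the selected
     elements must fit inside one vector-wide window.  */
  if (min == 0 || max - min >= nelt)
    return false;

  *shift_bits = min * elt_bits;
  return true;
}
```

For eight 32-bit elements, the indices 3..10 would be accepted with a shift of
96 bits, while a set spanning 1..15 would be rejected.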

    chance to succeed.  */
@@ -43015,14 +43021,20 @@ expand_vec_perm_palignr (struct expand_vec_perm_d *d)
   unsigned i, nelt = d->nelt;
   unsigned min, max;
   bool in_order, ok;
-  rtx shift, target;
+  rtx shift, shift1, target, tmp;
   struct expand_vec_perm_d dcopy;

-  /* Even with AVX, palignr only operates on 128-bit vectors.  */
-  if (!TARGET_SSSE3 || GET_MODE_SIZE (d->vmode) != 16)
+  /* PALIGNR of two 256-bit registers on AVX2 costs only 2 instructions:
+     PERM and PALIGNR.  It is more profitable than 2 PSHUFB and PERM.
+     PALIGNR of two 128-bit registers takes only 1 instruction.  */
+  if (!TARGET_SSSE3 || (GET_MODE_SIZE (d->vmode) != 16
+                        && GET_MODE_SIZE (d->vmode) != 32))
+    return false;
+  /* Only AVX2 or higher supports PALIGNR on 256-bit registers.  */
+  if (!TARGET_AVX2 && GET_MODE_SIZE (d->vmode) == 32)
     return false;

-  min = nelt, max = 0;
+  min = 2 * nelt, max = 0;
   for (i = 0; i < nelt; ++i)
     {
       unsigned e = d->perm[i];
@@ -43041,9 +43053,34 @@ expand_vec_perm_palignr (struct expand_vec_perm_d *d)

   dcopy = *d;
   shift = GEN_INT (min * GET_MODE_BITSIZE (GET_MODE_INNER (d->vmode)));
-  target = gen_reg_rtx (TImode);
-  emit_insn (gen_ssse3_palignrti (target, gen_lowpart (TImode, d->op1),
-                                 gen_lowpart (TImode, d->op0), shift));
+  shift1 = GEN_INT ((min - nelt / 2)
+                    * GET_MODE_BITSIZE (GET_MODE_INNER (d->vmode)));
+
+  if (GET_MODE_SIZE (d->vmode) != 32)
+    {
+      target = gen_reg_rtx (TImode);
+      emit_insn (gen_ssse3_palignrti (target, gen_lowpart (TImode, d->op1),
+                                     gen_lowpart (TImode, d->op0), shift));
+    }
+  else
+    {
+      target = gen_reg_rtx (V2TImode);
+      tmp = gen_reg_rtx (V4DImode);
+      emit_insn (gen_avx2_permv2ti (tmp,
+                                   gen_lowpart (V4DImode, d->op0),
+                                   gen_lowpart (V4DImode, d->op1),
+                                   GEN_INT (33)));
+      if (min < nelt / 2)
+        emit_insn (gen_avx2_palignrv2ti (target,
+                                        gen_lowpart (V2TImode, tmp),
+                                        gen_lowpart (V2TImode, d->op0),
+                                        shift));
+      else
+        emit_insn (gen_avx2_palignrv2ti (target,
+                                        gen_lowpart (V2TImode, d->op1),
+                                        gen_lowpart (V2TImode, tmp),
+                                        shift1));
+    }

   dcopy.op0 = dcopy.op1 = gen_lowpart (d->vmode, target);
   dcopy.one_operand_p = true;


Evgeny

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 88142a8..ae80477 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -42807,6 +42807,8 @@ expand_vec_perm_pshufb (struct expand_vec_perm_d *d)
   return true;
 }

+static bool expand_vec_perm_vpshufb2_vpermq (struct expand_vec_perm_d *d);
+
 /* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to instantiate D
    in a single instruction.  */

@@ -42946,6 +42948,10 @@ expand_vec_perm_1 (struct expand_vec_perm_d *d)
   if (expand_vec_perm_pshufb (d))
     return true;

+  /* Try the AVX2 vpshufb.  */
+  if (expand_vec_perm_vpshufb2_vpermq (d))
+    return true;
+
   /* Try the AVX512F vpermi2 instructions.  */
   rtx vec[64];
   enum machine_mode mode = d->vmode;
@@ -43004,7 +43010,7 @@ expand_vec_perm_pshuflw_pshufhw (struct expand_vec_perm_d *d)
 }

 /* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to simplify
-   the permutation using the SSSE3 palignr instruction.  This succeeds
+   the permutation using the SSSE3/AVX2 palignr instruction.  This succeeds
    when all of the elements in PERM fit within one vector and we merely
    need to shift them down so that a single vector permutation has a