From patchwork Mon Jun 9 10:13:06 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Evgeny Stupachenko X-Patchwork-Id: 357378 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 269AA140092 for ; Mon, 9 Jun 2014 20:13:20 +1000 (EST) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :mime-version:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; q=dns; s=default; b=YPxXd8fEayoUtBOAEr 2qHSWTlKUZpAHzKpTfj/Lrt6rUwNeBV2MSCdPZGB8Q0L2UheNe0buYjpfYsouPsL Jp28AF5M4g5XpZT/zEEn0IUgA2zoAnsVhJyZlkeRa/c47JP3h8Zu30T+vAYgUsr5 jGf336zNSN8vSnV9/rxYRwOWw= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :mime-version:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; s=default; bh=wzsN1/axhmVpxuiTl3gNnAg5 hPs=; b=CZ1oHfJDoeX9+Gxl9L6GTH4r/AOmbkqqS0IElNosT91wO5x0ZHVIVpcj VGRL3qPAQURULVHwFW0HYz2FJDcoiD1DuXp2lF80OfQ38RzWvckFj16EXufz5gCU 5aroKHgcmmDZSq8XWHi35IwdSySTY271HYWH7k45b47t61kmwr0= Received: (qmail 26050 invoked by alias); 9 Jun 2014 10:13:13 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 26039 invoked by uid 89); 9 Jun 2014 10:13:11 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.4 required=5.0 tests=AWL, BAYES_00, FREEMAIL_FROM, RCVD_IN_DNSWL_LOW, SPF_PASS autolearn=ham version=3.3.2 X-HELO: mail-oa0-f54.google.com Received: from mail-oa0-f54.google.com (HELO mail-oa0-f54.google.com) (209.85.219.54) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-SHA encrypted) ESMTPS; Mon, 09 Jun 2014 10:13:08 +0000 Received: by mail-oa0-f54.google.com with SMTP id eb12so3052356oac.41 for ; Mon, 09 Jun 2014 03:13:06 -0700 (PDT) MIME-Version: 1.0 X-Received: by 10.182.245.164 with SMTP id xp4mr23577279obc.23.1402308786285; Mon, 09 Jun 2014 03:13:06 -0700 (PDT) Received: by 10.76.18.209 with HTTP; Mon, 9 Jun 2014 03:13:06 -0700 (PDT) In-Reply-To: <53909F88.7070009@redhat.com> References: <53909F88.7070009@redhat.com> Date: Mon, 9 Jun 2014 14:13:06 +0400 Message-ID: Subject: Re: [PATCH, x86] Improves x86 permutation expand From: Evgeny Stupachenko To: Richard Henderson Cc: Uros Bizjak , GCC Patches X-IsSubscribed: yes Right now we need to cover permutations coming from 3 loads/stores group. My case covers them. I agree that another order of pblend and pshufb covers additional cases. Good point. We can cover this in a separate patch. Asserts are ok if we exclude AVX2 in ISA checks. Is the following patch ok? if (expand_vec_perm_2vperm2f128_vshuf (d)) On Thu, Jun 5, 2014 at 8:49 PM, Richard Henderson wrote: > On 06/05/2014 08:29 AM, Evgeny Stupachenko wrote: >> + /* Figure out where permutation elements stay not in their >> + respective lanes. */ >> + for (i = 0, which = 0; i < nelt; ++i) >> + { >> + unsigned e = d->perm[i]; >> + if (e != i) >> + which |= (e < nelt ? 1 : 2); >> + } >> + /* We can pblend the part where elements stay not in their >> + respective lanes only when these elements are all in one >> + half of a permutation. >> + {0 1 8 3 4 5 9 7} is ok as 8, 9 are not at their respective >> + lanes, but both 8 and 9 >= 8 >> + {0 1 8 3 4 5 2 7} is not ok as 2 and 8 are not at their >> + respective lanes and 8 >= 8, but 2 not. */ >> + if (which != 1 && which != 2) >> + return false; > > I was about to suggest that you'd get more success by putting the blend first, > and do the shuffle second. But I suppose it does cover a few cases that the > other way would miss, e.g. > > { 0 4 7 3 } > > because we can't blend 0 and 4 (or 3 and 7) into the same vector. Whereas the > direction you're trying can't handle > > { 0 6 6 1 } > > But that can be implemented with > > { 0 1 2 3 } > { 4 5 6 7 } > ----------- > { 0 1 6 3 } (pblend) > ----------- > { 0 6 6 1 } (pshufb) > > So I guess we should cover these two cases in successive patches. > >> + if (!expand_vec_perm_blend (&dcopy1)) >> + return false; >> + >> + return true; > > You should avoid doing any work in this function if the ISA isn't enabled. > Don't wait until the last test for blend to fail. Separate that out from the > start of expand_vec_perm_blend as a subroutine, perhaps. > > We should be able to prove that we've got a valid blend as input here, so I'd > be more inclined to write > > ok = expand_vec_perm_blend (&dcopy1); > gcc_assert (ok); > return true; > >> + if (!expand_vec_perm_1 (&dcopy)) >> + return false; > > If we know we have pblend, then we know we have pshufb, so again I don't see > how expand_vec_perm_1 can fail. Another assert would be good. > > There is a point, earlier in the function, where we know whether we're going to > succeed or not. I believe just after > >> + if (which != 1 && which != 2) >> + return false; > > You should add a > > if (d->testing_p) > return true; > > at that point. > > > r~ diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 8827256..1fe2398 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -43185,6 +43185,90 @@ expand_vec_perm_palignr (struct expand_vec_perm_d *d) return ok; } +/* A subroutine of ix86_expand_vec_perm_const_1. Try to simplify + the permutation using the SSE4_1 pblendv instruction. Potentially + reduces permutaion from 2 pshufb and or to 1 pshufb and pblendv. */ + +static bool +expand_vec_perm_pblendv (struct expand_vec_perm_d *d) +{ + unsigned i, which, nelt = d->nelt; + struct expand_vec_perm_d dcopy, dcopy1; + enum machine_mode vmode = d->vmode; + bool ok; + + /* Use the same checks as in expand_vec_perm_blend, but skipping + AVX2 as it requires more than 2 instructions for general case. */ + if (d->one_operand_p) + return false; + if (TARGET_AVX && (vmode == V4DFmode || vmode == V8SFmode)) + ; + else if (TARGET_SSE4_1 && GET_MODE_SIZE (vmode) == 16) + ; + else + return false; + + /* Figure out where permutation elements stay not in their + respective lanes. */ + for (i = 0, which = 0; i < nelt; ++i) + { + unsigned e = d->perm[i]; + if (e != i) + which |= (e < nelt ? 1 : 2); + } + /* We can pblend the part where elements stay not in their + respective lanes only when these elements are all in one + half of a permutation. + {0 1 8 3 4 5 9 7} is ok as 8, 9 are at not at their respective + lanes, but both 8 and 9 >= 8 + {0 1 8 3 4 5 2 7} is not ok as 2 and 8 are not at their + respective lanes and 8 >= 8, but 2 not. */ + if (which != 1 && which != 2) + return false; + if (d->testing_p) + return true; + + /* First we apply one operand permutation to the part where + elements stay not in their respective lanes. */ + dcopy = *d; + if (which == 2) + dcopy.op0 = dcopy.op1 = d->op1; + else + dcopy.op0 = dcopy.op1 = d->op0; + dcopy.one_operand_p = true; + + for (i = 0; i < nelt; ++i) + { + unsigned e = d->perm[i]; + if (which == 2) + dcopy.perm[i] = ((e >= nelt) ? (e - nelt) : e); + } + + ok = expand_vec_perm_1 (&dcopy); + gcc_assert (ok); + + /* Next we put permuted elements into thier positions. */ + dcopy1 = *d; + if (which == 2) + dcopy1.op1 = dcopy.target; + else + dcopy1.op0 = dcopy.target; + + for (i = 0; i < nelt; ++i) + { + unsigned e = d->perm[i]; + if (which == 2) + dcopy1.perm[i] = ((e >= nelt) ? (nelt + i) : e); + else + dcopy1.perm[i] = ((e < nelt) ? i : e); + } + + ok = expand_vec_perm_blend (&dcopy1); + gcc_assert (ok); + + return true; +} + static bool expand_vec_perm_interleave3 (struct expand_vec_perm_d *d); /* A subroutine of ix86_expand_vec_perm_builtin_1. Try to simplify @@ -44557,6 +44641,9 @@ ix86_expand_vec_perm_const_1 (struct expand_vec_perm_d *d) if (expand_vec_perm_vperm2f128 (d)) return true; + if (expand_vec_perm_pblendv (d)) + return true; + /* Try sequences of three instructions. */