From patchwork Wed Nov 12 11:15:19 2014
X-Patchwork-Submitter: Evgeny Stupachenko
X-Patchwork-Id: 409944
Date: Wed, 12 Nov 2014 14:15:19 +0300
Subject: Re: [PATCH] Extend shift permutations on power of 2 cases
From: Evgeny Stupachenko
To: Richard Biener
Cc: GCC Patches, Richard Henderson, Uros Bizjak

Committed r217359.
However, it turned out that AVX2 uses vperm2i128 for the shift here
(instead of palignr for SSSE3/AVX).
To handle the AVX2 case we need to modify the test case:

On Tue, Nov 11, 2014 at 5:28 PM, Richard Biener wrote:
> On Tue, Nov 11, 2014 at 3:21 PM, Evgeny Stupachenko wrote:
>> Hi,
>>
>> The patch extends the shift permutations technique to power of 2 cases
>> (previously even/odd transformations were used unconditionally).
>> Basically the patch just adds a loop around the code for a load group of
>> length 2, like it is done in the "vect_permute_load_chain" function.
>>
>> For Silvermont it reduces the insn sequence for a load group of length 4
>> from 31 to 20 insns.
>> Performance for the test in the patch improved by ~20%.
>>
>> Bootstrap passed.
>> Make check in progress.
>>
>> Is it ok?
>
> Ok.
>
> Thanks,
> Richard.
>
>> 2014-11-11  Evgeny Stupachenko
>>
>> gcc/testsuite
>>         * gcc.target/i386/pr52252-atom-1.c: New.
>>
>> gcc/
>>         * tree-vect-data-refs.c (vect_shift_permute_load_chain): Extend shift
>>         permutations on power of 2 cases.
>>
>> diff --git a/gcc/testsuite/gcc.target/i386/pr52252-atom-1.c
>> b/gcc/testsuite/gcc.target/i386/pr52252-atom-1.c
>> new file mode 100644
>> index 0000000..1fbd258
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/i386/pr52252-atom-1.c
>> @@ -0,0 +1,22 @@
>> +/* { dg-do compile } */
>> +/* { dg-require-effective-target ssse3 } */
>> +/* { dg-options "-O2 -ftree-vectorize -mssse3 -mtune=slm" } */
>> +#define byte unsigned char
>> +
>> +void
>> +pair_mul_sum(byte *in, byte *out, int size)
>> +{
>> +  int j;
>> +  for(j = 0; j < size; j++)
>> +    {
>> +      byte a = in[0];
>> +      byte b = in[1];
>> +      byte c = in[2];
>> +      byte d = in[3];
>> +      out[0] = (byte)(a * b) + (byte)(b * c) + (byte)(c * d) + (byte)(d * a);
>> +      in += 4;
>> +      out += 1;
>> +    }
>> +}
>> +
>> +/* { dg-final { scan-assembler "palignr" } } */
>> diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
>> index 0bc0356..d2e0e93 100644
>> --- a/gcc/tree-vect-data-refs.c
>> +++ b/gcc/tree-vect-data-refs.c
>> @@ -5379,8 +5379,9 @@ vect_shift_permute_load_chain (vec<tree> dr_chain,
>>    memcpy (result_chain->address (), dr_chain.address (),
>>            length * sizeof (tree));
>>
>> -  if (length == 2 && LOOP_VINFO_VECT_FACTOR (loop_vinfo) > 4)
>> +  if (exact_log2 (length) != -1 && LOOP_VINFO_VECT_FACTOR (loop_vinfo) > 4)
>>      {
>> +      unsigned int j, log_length = exact_log2 (length);
>>        for (i = 0; i < nelt / 2; ++i)
>>          sel[i] = i * 2;
>>        for (i = 0; i < nelt / 2; ++i)
>> @@ -5441,37 +5442,44 @@ vect_shift_permute_load_chain (vec<tree> dr_chain,
>>        select_mask = vect_gen_perm_mask (vectype, sel);
>>        gcc_assert (select_mask != NULL);
>>
>> -      first_vect = dr_chain[0];
>> -      second_vect = dr_chain[1];
>> -
>> -      data_ref = make_temp_ssa_name (vectype, NULL, "vect_shuffle2");
>> -      perm_stmt = gimple_build_assign_with_ops (VEC_PERM_EXPR, data_ref,
>> -                                                first_vect, first_vect,
>> -                                                perm2_mask1);
>> -      vect_finish_stmt_generation (stmt, perm_stmt, gsi);
>> -      vect[0] = data_ref;
>> +      for (i = 0; i < log_length; i++)
>> +        {
>> +          for (j = 0; j < length; j += 2)
>> +            {
>> +              first_vect = dr_chain[j];
>> +              second_vect = dr_chain[j + 1];
>>
>> -      data_ref = make_temp_ssa_name (vectype, NULL, "vect_shuffle2");
>> -      perm_stmt = gimple_build_assign_with_ops (VEC_PERM_EXPR, data_ref,
>> -                                                second_vect, second_vect,
>> -                                                perm2_mask2);
>> -      vect_finish_stmt_generation (stmt, perm_stmt, gsi);
>> -      vect[1] = data_ref;
>> +              data_ref = make_temp_ssa_name (vectype, NULL, "vect_shuffle2");
>> +              perm_stmt = gimple_build_assign_with_ops (VEC_PERM_EXPR, data_ref,
>> +                                                        first_vect, first_vect,
>> +                                                        perm2_mask1);
>> +              vect_finish_stmt_generation (stmt, perm_stmt, gsi);
>> +              vect[0] = data_ref;
>>
>> -      data_ref = make_temp_ssa_name (vectype, NULL, "vect_shift");
>> -      perm_stmt = gimple_build_assign_with_ops (VEC_PERM_EXPR, data_ref,
>> -                                                vect[0], vect[1],
>> -                                                shift1_mask);
>> -      vect_finish_stmt_generation (stmt, perm_stmt, gsi);
>> -      (*result_chain)[1] = data_ref;
>> +              data_ref = make_temp_ssa_name (vectype, NULL, "vect_shuffle2");
>> +              perm_stmt = gimple_build_assign_with_ops (VEC_PERM_EXPR, data_ref,
>> +                                                        second_vect, second_vect,
>> +                                                        perm2_mask2);
>> +              vect_finish_stmt_generation (stmt, perm_stmt, gsi);
>> +              vect[1] = data_ref;
>>
>> -      data_ref = make_temp_ssa_name (vectype, NULL, "vect_select");
>> -      perm_stmt = gimple_build_assign_with_ops (VEC_PERM_EXPR, data_ref,
>> -                                                vect[0], vect[1],
>> -                                                select_mask);
>> -      vect_finish_stmt_generation (stmt, perm_stmt, gsi);
>> -      (*result_chain)[0] = data_ref;
>> +              data_ref = make_temp_ssa_name (vectype, NULL, "vect_shift");
>> +              perm_stmt = gimple_build_assign_with_ops (VEC_PERM_EXPR, data_ref,
>> +                                                        vect[0], vect[1],
>> +                                                        shift1_mask);
>> +              vect_finish_stmt_generation (stmt, perm_stmt, gsi);
>> +              (*result_chain)[j/2 + length/2] = data_ref;
>>
>> +              data_ref = make_temp_ssa_name (vectype, NULL, "vect_select");
>> +              perm_stmt = gimple_build_assign_with_ops (VEC_PERM_EXPR, data_ref,
>> +                                                        vect[0], vect[1],
>> +                                                        select_mask);
>> +              vect_finish_stmt_generation (stmt, perm_stmt, gsi);
>> +              (*result_chain)[j/2] = data_ref;
>> +            }
>> +          memcpy (dr_chain.address (), result_chain->address (),
>> +                  length * sizeof (tree));
>> +        }
>>        return true;
>>      }
>>    if (length == 3 && LOOP_VINFO_VECT_FACTOR (loop_vinfo) > 2)

diff --git a/gcc/testsuite/gcc.target/i386/pr52252-atom-1.c b/gcc/testsuite/gcc.target/i386/pr52252-atom-1.c
index 1fbd258..020e983 100644
--- a/gcc/testsuite/gcc.target/i386/pr52252-atom-1.c
+++ b/gcc/testsuite/gcc.target/i386/pr52252-atom-1.c
@@ -19,4 +19,4 @@ pair_mul_sum(byte *in, byte *out, int size)
     }
 }
 
-/* { dg-final { scan-assembler "perm2i128|palignr" } } */
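
For readers following the index bookkeeping in the new loop, here is a small standalone sketch of it on plain int arrays. It is not the GCC code: NELT, MAX_LEN, split_even_odd and permute_chain are hypothetical names, and the four VEC_PERM_EXPRs of each pair step are modeled only by their assumed net effect (even elements of the pair go to slot j/2, odd elements to slot j/2 + length/2), as in the even/odd scheme the patch description refers to.

```c
#include <string.h>

#define NELT 8     /* elements per vector; an assumption for this sketch */
#define MAX_LEN 8  /* largest load group the sketch handles */

/* Hypothetical helper: split the concatenation of A and B into its
   even-indexed and odd-indexed elements.  */
static void
split_even_odd (const int *a, const int *b, int *even, int *odd)
{
  int cat[2 * NELT];
  memcpy (cat, a, NELT * sizeof (int));
  memcpy (cat + NELT, b, NELT * sizeof (int));
  for (int i = 0; i < NELT; i++)
    {
      even[i] = cat[2 * i];
      odd[i] = cat[2 * i + 1];
    }
}

/* Model of the patched loop: log2 (length) stages, each processing the
   chain in pairs.  Returns 0 on success, -1 if LENGTH is not a power of
   two in the range [2, MAX_LEN].  */
static int
permute_chain (int chain[][NELT], unsigned int length)
{
  int result[MAX_LEN][NELT];
  unsigned int i, j, log_length = 0;

  if (length < 2 || length > MAX_LEN || (length & (length - 1)) != 0)
    return -1;
  while ((1u << log_length) < length)
    log_length++;

  for (i = 0; i < log_length; i++)
    {
      for (j = 0; j < length; j += 2)
        split_even_odd (chain[j], chain[j + 1],
                        result[j / 2], result[j / 2 + length / 2]);
      /* Feed this stage's output into the next stage, like the memcpy
         from result_chain back into dr_chain in the patch.  */
      memcpy (chain, result, length * sizeof (chain[0]));
    }
  return 0;
}
```

With four vectors holding 0..31 in order (a load group of length 4), two stages leave vector k holding k, k + 4, k + 8 and so on, i.e. the four interleaved streams separated.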