From patchwork Thu Oct 16 10:23:16 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ilya Tocar
X-Patchwork-Id: 400251
Delivered-To: mailing list gcc-patches@gcc.gnu.org
Date: Thu, 16 Oct 2014 14:23:16 +0400
From: Ilya Tocar
To: Uros Bizjak
Cc: Jakub Jelinek, Kirill Yukhin, Richard Henderson, GCC Patches
Subject: Re: [PATCH i386 AVX512] [63.1/n] Add vpshufb, perm autogen (except for v64qi).
Message-ID: <20141016102316.GA30455@msticlxl7.ims.intel.com>
References: <20141006125527.GC13369@msticlxl57.ims.intel.com> <20141006141035.GZ1986@tucnak.redhat.com> <20141009121523.GB81768@msticlxl7.ims.intel.com> <20141009185105.GM1986@tucnak.redhat.com> <20141010154719.GA121201@msticlxl7.ims.intel.com>

On 10 Oct 18:37, Uros Bizjak wrote:
> On Fri, Oct 10, 2014 at 5:47 PM, Ilya Tocar wrote:
> >
> Please recode that horrible first switch statement to:
>
> --cut here--
>   rtx (*gen) (rtx, rtx, rtx, rtx) = NULL;
>
>   switch (mode)
>     {
>     case V8HImode:
>       if (TARGET_AVX512VL && TARGET_AVX512BW)
>         gen = gen_avx512vl_vpermi2varv8hi3;
>       break;
>
> ...
>
>     case V2DFmode:
>       if (TARGET_AVX512VL)
>         {
>           gen = gen_avx512vl_vpermi2varv2df3;
>           maskmode = V2DImode;
>
> The patch is OK with the above improvement.
>
> Thanks,
> Uros.
>
I will commit the version below if there are no objections within 24 hours.

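For readers following the thread, the shape of the recode requested in the review (pick a generator through a function pointer inside the switch, record the mask mode alongside it, then emit once at the end) can be shown in isolation. The following is only a minimal standalone sketch in C, not GCC code: the toy_* names, the three-flag target struct, and the integer "instruction ids" are hypothetical stand-ins. The actual change is the diff after the `---` separator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for machine modes; not GCC's enum machine_mode.  */
enum toy_mode { TOY_V8HI, TOY_V16SI, TOY_V16SF, TOY_V64QI };

/* Hypothetical stand-in for the TARGET_AVX512* flags.  */
struct toy_target { bool avx512f, avx512vl, avx512bw; };

/* Hypothetical "generators": each returns an id naming the insn it
   would emit.  */
static int toy_gen_v8hi (void)  { return 1; }
static int toy_gen_v16si (void) { return 2; }
static int toy_gen_v16sf (void) { return 3; }

/* Select a generator for MODE if the target supports it.  On success,
   store the emitted id and the mode the mask must have, and return
   true.  Return false when no single-insn expansion is available.  */
static bool
toy_expand (const struct toy_target *t, enum toy_mode mode,
            int *emitted, enum toy_mode *maskmode_out)
{
  int (*gen) (void) = NULL;
  enum toy_mode maskmode = mode;

  assert (t != NULL && emitted != NULL && maskmode_out != NULL);

  switch (mode)
    {
    case TOY_V8HI:
      if (t->avx512vl && t->avx512bw)
        gen = toy_gen_v8hi;
      break;
    case TOY_V16SI:
      if (t->avx512f)
        gen = toy_gen_v16si;
      break;
    case TOY_V16SF:
      if (t->avx512f)
        {
          gen = toy_gen_v16sf;
          /* Float data is permuted with an integer mask of the same
             shape.  */
          maskmode = TOY_V16SI;
        }
      break;
    default:
      break;
    }

  /* A single check replaces the per-case "return false".  */
  if (gen == NULL)
    return false;

  *emitted = gen ();
  *maskmode_out = maskmode;
  return true;
}
```

Keeping the ISA test next to each case means the caller no longer needs its own target guard, and an unsupported mode falls through to the single NULL check.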
--- gcc/config/i386/i386.c | 292 ++++++++++++++++++++++++++++++++++++++----------- gcc/config/i386/sse.md | 45 ++++---- 2 files changed, 255 insertions(+), 82 deletions(-) diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index aedac19..e1228e3 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -21411,35 +21411,132 @@ ix86_expand_int_vcond (rtx operands[]) return true; } +/* AVX512F does support 64-byte integer vector operations, + thus the longest vector we are faced with is V64QImode. */ +#define MAX_VECT_LEN 64 + +struct expand_vec_perm_d +{ + rtx target, op0, op1; + unsigned char perm[MAX_VECT_LEN]; + enum machine_mode vmode; + unsigned char nelt; + bool one_operand_p; + bool testing_p; +}; + static bool -ix86_expand_vec_perm_vpermi2 (rtx target, rtx op0, rtx mask, rtx op1) +ix86_expand_vec_perm_vpermi2 (rtx target, rtx op0, rtx mask, rtx op1, + struct expand_vec_perm_d *d) { - enum machine_mode mode = GET_MODE (op0); + /* ix86_expand_vec_perm_vpermi2 is called from both const and non-const + expander, so args are either in d, or in op0, op1 etc. */ + enum machine_mode mode = GET_MODE (d ? 
d->op0 : op0); + enum machine_mode maskmode = mode; + rtx (*gen) (rtx, rtx, rtx, rtx) = NULL; + switch (mode) { + case V8HImode: + if (TARGET_AVX512VL && TARGET_AVX512BW) + gen = gen_avx512vl_vpermi2varv8hi3; + break; + case V16HImode: + if (TARGET_AVX512VL && TARGET_AVX512BW) + gen = gen_avx512vl_vpermi2varv16hi3; + break; + case V32HImode: + if (TARGET_AVX512BW) + gen = gen_avx512bw_vpermi2varv32hi3; + break; + case V4SImode: + if (TARGET_AVX512VL) + gen = gen_avx512vl_vpermi2varv4si3; + break; + case V8SImode: + if (TARGET_AVX512VL) + gen = gen_avx512vl_vpermi2varv8si3; + break; case V16SImode: - emit_insn (gen_avx512f_vpermi2varv16si3 (target, op0, - force_reg (V16SImode, mask), - op1)); - return true; + if (TARGET_AVX512F) + gen = gen_avx512f_vpermi2varv16si3; + break; + case V4SFmode: + if (TARGET_AVX512VL) + { + gen = gen_avx512vl_vpermi2varv4sf3; + maskmode = V4SImode; + } + break; + case V8SFmode: + if (TARGET_AVX512VL) + { + gen = gen_avx512vl_vpermi2varv8sf3; + maskmode = V8SImode; + } + break; case V16SFmode: - emit_insn (gen_avx512f_vpermi2varv16sf3 (target, op0, - force_reg (V16SImode, mask), - op1)); - return true; + if (TARGET_AVX512F) + { + gen = gen_avx512f_vpermi2varv16sf3; + maskmode = V16SImode; + } + break; + case V2DImode: + if (TARGET_AVX512VL) + gen = gen_avx512vl_vpermi2varv2di3; + break; + case V4DImode: + if (TARGET_AVX512VL) + gen = gen_avx512vl_vpermi2varv4di3; + break; case V8DImode: - emit_insn (gen_avx512f_vpermi2varv8di3 (target, op0, - force_reg (V8DImode, mask), - op1)); - return true; + if (TARGET_AVX512F) + gen = gen_avx512f_vpermi2varv8di3; + break; + case V2DFmode: + if (TARGET_AVX512VL) + { + gen = gen_avx512vl_vpermi2varv2df3; + maskmode = V2DImode; + } + break; + case V4DFmode: + if (TARGET_AVX512VL) + { + gen = gen_avx512vl_vpermi2varv4df3; + maskmode = V4DImode; + } + break; case V8DFmode: - emit_insn (gen_avx512f_vpermi2varv8df3 (target, op0, - force_reg (V8DImode, mask), - op1)); - return true; + if (TARGET_AVX512F) + 
{ + gen = gen_avx512f_vpermi2varv8df3; + maskmode = V8DImode; + } + break; default: - return false; + break; } + + if (gen == NULL) + return false; + + /* ix86_expand_vec_perm_vpermi2 is called from both const and non-const + expander, so args are either in d, or in op0, op1 etc. */ + if (d) + { + rtx vec[64]; + target = d->target; + op0 = d->op0; + op1 = d->op1; + for (int i = 0; i < d->nelt; ++i) + vec[i] = GEN_INT (d->perm[i]); + mask = gen_rtx_CONST_VECTOR (maskmode, gen_rtvec_v (d->nelt, vec)); + } + + emit_insn (gen (target, op0, force_reg (maskmode, mask), op1)); + return true; } /* Expand a variable vector permutation. */ @@ -21462,8 +21559,7 @@ ix86_expand_vec_perm (rtx operands[]) e = GET_MODE_UNIT_SIZE (mode); gcc_assert (w <= 64); - if (TARGET_AVX512F - && ix86_expand_vec_perm_vpermi2 (target, op0, mask, op1)) + if (ix86_expand_vec_perm_vpermi2 (target, op0, mask, op1, NULL)) return; if (TARGET_AVX2) @@ -21835,6 +21931,15 @@ ix86_expand_sse_unpack (rtx dest, rtx src, bool unsigned_p, bool high_p) switch (imode) { + case V64QImode: + if (unsigned_p) + unpack = gen_avx512bw_zero_extendv32qiv32hi2; + else + unpack = gen_avx512bw_sign_extendv32qiv32hi2; + halfmode = V32QImode; + extract + = high_p ? gen_vec_extract_hi_v64qi : gen_vec_extract_lo_v64qi; + break; case V32QImode: if (unsigned_p) unpack = gen_avx2_zero_extendv16qiv16hi2; @@ -39683,20 +39788,6 @@ x86_emit_floatuns (rtx operands[2]) emit_label (donelab); } -/* AVX512F does support 64-byte integer vector operations, - thus the longest vector we are faced with is V64QImode. 
*/ -#define MAX_VECT_LEN 64 - -struct expand_vec_perm_d -{ - rtx target, op0, op1; - unsigned char perm[MAX_VECT_LEN]; - enum machine_mode vmode; - unsigned char nelt; - bool one_operand_p; - bool testing_p; -}; - static bool canonicalize_perm (struct expand_vec_perm_d *d); static bool expand_vec_perm_1 (struct expand_vec_perm_d *d); static bool expand_vec_perm_broadcast_1 (struct expand_vec_perm_d *d); @@ -42745,7 +42836,10 @@ expand_vec_perm_blend (struct expand_vec_perm_d *d) if (d->one_operand_p) return false; - if (TARGET_AVX2 && GET_MODE_SIZE (vmode) == 32) + if (TARGET_AVX512F && GET_MODE_SIZE (vmode) == 64 + && GET_MODE_SIZE (GET_MODE_INNER (vmode)) >= 4) + ; + else if (TARGET_AVX2 && GET_MODE_SIZE (vmode) == 32) ; else if (TARGET_AVX && (vmode == V4DFmode || vmode == V8SFmode)) ; @@ -42776,12 +42870,18 @@ expand_vec_perm_blend (struct expand_vec_perm_d *d) switch (vmode) { + case V8DFmode: + case V16SFmode: case V4DFmode: case V8SFmode: case V2DFmode: case V4SFmode: case V8HImode: case V8SImode: + case V32HImode: + case V64QImode: + case V16SImode: + case V8DImode: for (i = 0; i < nelt; ++i) mask |= (d->perm[i] >= nelt) << i; break; @@ -43004,9 +43104,9 @@ static bool expand_vec_perm_pshufb (struct expand_vec_perm_d *d) { unsigned i, nelt, eltsz, mask; - unsigned char perm[32]; + unsigned char perm[64]; enum machine_mode vmode = V16QImode; - rtx rperm[32], vperm, target, op0, op1; + rtx rperm[64], vperm, target, op0, op1; nelt = d->nelt; @@ -43095,6 +43195,19 @@ expand_vec_perm_pshufb (struct expand_vec_perm_d *d) return false; } } + else if (GET_MODE_SIZE (d->vmode) == 64) + { + if (!TARGET_AVX512BW) + return false; + if (vmode == V64QImode) + { + /* vpshufb only works intra lanes, it is not + possible to shuffle bytes in between the lanes. 
*/ + for (i = 0; i < nelt; ++i) + if ((d->perm[i] ^ i) & (nelt / 4)) + return false; + } + } else return false; } @@ -43112,6 +43225,8 @@ expand_vec_perm_pshufb (struct expand_vec_perm_d *d) mask = 2 * nelt - 1; else if (vmode == V16QImode) mask = nelt - 1; + else if (vmode == V64QImode) + mask = nelt / 4 - 1; else mask = nelt / 2 - 1; @@ -43137,6 +43252,8 @@ expand_vec_perm_pshufb (struct expand_vec_perm_d *d) emit_insn (gen_ssse3_pshufbv16qi3 (target, op0, vperm)); else if (vmode == V32QImode) emit_insn (gen_avx2_pshufbv32qi3 (target, op0, vperm)); + else if (vmode == V64QImode) + emit_insn (gen_avx512bw_pshufbv64qi3 (target, op0, vperm)); else if (vmode == V8SFmode) emit_insn (gen_avx2_permvarv8sf (target, op0, vperm)); else @@ -43192,12 +43309,24 @@ expand_vec_perm_1 (struct expand_vec_perm_d *d) rtx (*gen) (rtx, rtx) = NULL; switch (d->vmode) { + case V64QImode: + if (TARGET_AVX512BW) + gen = gen_avx512bw_vec_dupv64qi; + break; case V32QImode: gen = gen_avx2_pbroadcastv32qi_1; break; + case V32HImode: + if (TARGET_AVX512BW) + gen = gen_avx512bw_vec_dupv32hi; + break; case V16HImode: gen = gen_avx2_pbroadcastv16hi_1; break; + case V16SImode: + if (TARGET_AVX512F) + gen = gen_avx512f_vec_dupv16si; + break; case V8SImode: gen = gen_avx2_pbroadcastv8si_1; break; @@ -43207,9 +43336,21 @@ expand_vec_perm_1 (struct expand_vec_perm_d *d) case V8HImode: gen = gen_avx2_pbroadcastv8hi; break; + case V16SFmode: + if (TARGET_AVX512F) + gen = gen_avx512f_vec_dupv16sf; + break; case V8SFmode: gen = gen_avx2_vec_dupv8sf_1; break; + case V8DFmode: + if (TARGET_AVX512F) + gen = gen_avx512f_vec_dupv8df; + break; + case V8DImode: + if (TARGET_AVX512F) + gen = gen_avx512f_vec_dupv8di; + break; /* For other modes prefer other shuffles this function creates. */ default: break; } @@ -43294,23 +43435,10 @@ expand_vec_perm_1 (struct expand_vec_perm_d *d) /* Try the AVX2 vpalignr instruction. 
*/ if (expand_vec_perm_palignr (d, true)) - return true; /* Try the AVX512F vpermi2 instructions. */ - if (TARGET_AVX512F) - { - rtx vec[64]; - enum machine_mode mode = d->vmode; - if (mode == V8DFmode) - mode = V8DImode; - else if (mode == V16SFmode) - mode = V16SImode; - for (i = 0; i < nelt; ++i) - vec[i] = GEN_INT (d->perm[i]); - rtx mask = gen_rtx_CONST_VECTOR (mode, gen_rtvec_v (nelt, vec)); - if (ix86_expand_vec_perm_vpermi2 (d->target, d->op0, mask, d->op1)) - return true; - } + if (ix86_expand_vec_perm_vpermi2 (NULL_RTX, NULL_RTX, NULL_RTX, NULL_RTX, d)) + return true; return false; } @@ -45097,21 +45225,56 @@ ix86_vectorize_vec_perm_const_ok (enum machine_mode vmode, /* Given sufficient ISA support we can just return true here for selected vector modes. */ - if (d.vmode == V16SImode || d.vmode == V16SFmode - || d.vmode == V8DFmode || d.vmode == V8DImode) - /* All implementable with a single vpermi2 insn. */ - return true; - if (GET_MODE_SIZE (d.vmode) == 16) + switch (d.vmode) { + case V16SFmode: + case V16SImode: + case V8DImode: + case V8DFmode: + if (TARGET_AVX512F) + /* All implementable with a single vpermi2 insn. */ + return true; + break; + case V32HImode: + if (TARGET_AVX512BW) + /* All implementable with a single vpermi2 insn. */ + return true; + break; + case V8SImode: + case V8SFmode: + case V4DFmode: + case V4DImode: + if (TARGET_AVX512VL) + /* All implementable with a single vpermi2 insn. */ + return true; + break; + case V16HImode: + if (TARGET_AVX2) + /* Implementable with 4 vpshufb insns, 2 vpermq and 3 vpor insns. */ + return true; + break; + case V32QImode: + if (TARGET_AVX2) + /* Implementable with 4 vpshufb insns, 2 vpermq and 3 vpor insns. */ + return true; + break; + case V4SImode: + case V4SFmode: + case V8HImode: + case V16QImode: /* All implementable with a single vpperm insn. */ if (TARGET_XOP) return true; /* All implementable with 2 pshufb + 1 ior. 
*/ if (TARGET_SSSE3) return true; + break; + case V2DImode: + case V2DFmode: /* All implementable with shufpd or unpck[lh]pd. */ - if (d.nelt == 2) - return true; + return true; + default: + return false; } /* Extract the values from the vector CST into the permutation @@ -45231,6 +45394,11 @@ ix86_expand_vecop_qihi (enum rtx_code code, rtx dest, rtx op1, rtx op2) gen_il = gen_avx2_interleave_lowv32qi; gen_ih = gen_avx2_interleave_highv32qi; break; + case V64QImode: + himode = V32HImode; + gen_il = gen_avx512bw_interleave_lowv64qi; + gen_ih = gen_avx512bw_interleave_highv64qi; + break; default: gcc_unreachable (); } @@ -45291,7 +45459,7 @@ ix86_expand_vecop_qihi (enum rtx_code code, rtx dest, rtx op1, rtx op2) { /* For SSE2, we used an full interleave, so the desired results are in the even elements. */ - for (i = 0; i < 32; ++i) + for (i = 0; i < 64; ++i) d.perm[i] = i * 2; } else @@ -45299,7 +45467,7 @@ ix86_expand_vecop_qihi (enum rtx_code code, rtx dest, rtx op1, rtx op2) /* For AVX, the interleave used above was not cross-lane. So the extraction is evens but with the second and third quarter swapped. Happily, that is even one insn shorter than even extraction. */ - for (i = 0; i < 32; ++i) + for (i = 0; i < 64; ++i) d.perm[i] = i * 2 + ((i & 24) == 8 ? 16 : (i & 24) == 16 ? 
-16 : 0); } diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index a6cf363..d78194f 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -301,6 +301,9 @@ (define_mode_iterator VI1_AVX2 [(V32QI "TARGET_AVX2") V16QI]) +(define_mode_iterator VI1_AVX512 + [(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX2") V16QI]) + (define_mode_iterator VI2_AVX2 [(V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX2") V8HI]) @@ -9237,9 +9240,9 @@ (set_attr "mode" "TI")]) (define_expand "mul3" - [(set (match_operand:VI1_AVX2 0 "register_operand") - (mult:VI1_AVX2 (match_operand:VI1_AVX2 1 "register_operand") - (match_operand:VI1_AVX2 2 "register_operand")))] + [(set (match_operand:VI1_AVX512 0 "register_operand") + (mult:VI1_AVX512 (match_operand:VI1_AVX512 1 "register_operand") + (match_operand:VI1_AVX512 2 "register_operand")))] "TARGET_SSE2 && && " { ix86_expand_vecop_qihi (MULT, operands[0], operands[1], operands[2]); @@ -10643,7 +10646,8 @@ (V8SI "TARGET_AVX2") (V4DI "TARGET_AVX2") (V8SF "TARGET_AVX2") (V4DF "TARGET_AVX2") (V16SF "TARGET_AVX512F") (V8DF "TARGET_AVX512F") - (V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F")]) + (V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F") + (V32HI "TARGET_AVX512BW")]) (define_expand "vec_perm" [(match_operand:VEC_PERM_AVX2 0 "register_operand") @@ -10664,7 +10668,8 @@ (V8SI "TARGET_AVX") (V4DI "TARGET_AVX") (V32QI "TARGET_AVX2") (V16HI "TARGET_AVX2") (V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F") - (V16SF "TARGET_AVX512F") (V8DF "TARGET_AVX512F")]) + (V16SF "TARGET_AVX512F") (V8DF "TARGET_AVX512F") + (V32HI "TARGET_AVX512BW")]) (define_expand "vec_perm_const" [(match_operand:VEC_PERM_CONST 0 "register_operand") @@ -11028,8 +11033,8 @@ }) (define_insn "_packsswb" - [(set (match_operand:VI1_AVX2 0 "register_operand" "=x,v") - (vec_concat:VI1_AVX2 + [(set (match_operand:VI1_AVX512 0 "register_operand" "=x,x") + (vec_concat:VI1_AVX512 (ss_truncate: (match_operand: 1 "register_operand" "0,v")) (ss_truncate: @@ -11062,8 +11067,8 @@ 
(set_attr "mode" "")]) (define_insn "_packuswb" - [(set (match_operand:VI1_AVX2 0 "register_operand" "=x,v") - (vec_concat:VI1_AVX2 + [(set (match_operand:VI1_AVX512 0 "register_operand" "=x,x") + (vec_concat:VI1_AVX512 (us_truncate: (match_operand: 1 "register_operand" "0,v")) (us_truncate: @@ -13641,21 +13646,21 @@ (set (attr "prefix_rex") (symbol_ref "x86_extended_reg_mentioned_p (insn)")) (set_attr "mode" "DI")]) -(define_insn "_pshufb3" - [(set (match_operand:VI1_AVX2 0 "register_operand" "=x,x") - (unspec:VI1_AVX2 - [(match_operand:VI1_AVX2 1 "register_operand" "0,x") - (match_operand:VI1_AVX2 2 "nonimmediate_operand" "xm,xm")] +(define_insn "_pshufb3" + [(set (match_operand:VI1_AVX512 0 "register_operand" "=x,v") + (unspec:VI1_AVX512 + [(match_operand:VI1_AVX512 1 "register_operand" "0,v") + (match_operand:VI1_AVX512 2 "nonimmediate_operand" "xm,vm")] UNSPEC_PSHUFB))] - "TARGET_SSSE3" + "TARGET_SSSE3 && && " "@ pshufb\t{%2, %0|%0, %2} - vpshufb\t{%2, %1, %0|%0, %1, %2}" + vpshufb\t{%2, %1, %0|%0, %1, %2}" [(set_attr "isa" "noavx,avx") (set_attr "type" "sselog1") (set_attr "prefix_data16" "1,*") (set_attr "prefix_extra" "1") - (set_attr "prefix" "orig,vex") + (set_attr "prefix" "orig,maybe_evex") (set_attr "btver2_decode" "vector,vector") (set_attr "mode" "")]) @@ -16038,9 +16043,9 @@ (set_attr "mode" "TI")]) (define_expand "3" - [(set (match_operand:VI1_AVX2 0 "register_operand") - (any_shift:VI1_AVX2 - (match_operand:VI1_AVX2 1 "register_operand") + [(set (match_operand:VI1_AVX512 0 "register_operand") + (any_shift:VI1_AVX512 + (match_operand:VI1_AVX512 1 "register_operand") (match_operand:SI 2 "nonmemory_operand")))] "TARGET_SSE2" {