Message ID: 20141204094959.GA67582@msticlxl7.ims.intel.com
State: New
On Thu, Dec 4, 2014 at 1:49 AM, Ilya Tocar <tocarip.intel@gmail.com> wrote:
> Hi,
>
> As discussed in https://gcc.gnu.org/ml/gcc-patches/2014-10/msg00473.html
> This patch enables v64qi permutations.
> I've checked vshuf* tests from dg-torture.exp,
> with avx512* options on sde and generated permutations are correct.
>
> OK for trunk?

Can you add a few testcases?
On Thu, Dec 04, 2014 at 03:54:25AM -0800, H.J. Lu wrote:
> On Thu, Dec 4, 2014 at 1:49 AM, Ilya Tocar <tocarip.intel@gmail.com> wrote:
> > Hi,
> >
> > As discussed in https://gcc.gnu.org/ml/gcc-patches/2014-10/msg00473.html
> > This patch enables v64qi permutations.
> > I've checked vshuf* tests from dg-torture.exp,
> > with avx512* options on sde and generated permutations are correct.
> >
> > OK for trunk?
>
> Can you add a few testcases?

Isn't it already covered by gcc.dg/torture/vshuf* ?

	Jakub
On Thu, Dec 4, 2014 at 3:57 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Thu, Dec 04, 2014 at 03:54:25AM -0800, H.J. Lu wrote:
>> On Thu, Dec 4, 2014 at 1:49 AM, Ilya Tocar <tocarip.intel@gmail.com> wrote:
>> > Hi,
>> >
>> > As discussed in https://gcc.gnu.org/ml/gcc-patches/2014-10/msg00473.html
>> > This patch enables v64qi permutations.
>> > I've checked vshuf* tests from dg-torture.exp,
>> > with avx512* options on sde and generated permutations are correct.
>> >
>> > OK for trunk?
>> >
>> Can you add a few testcases?
>
> Isn't it already covered by gcc.dg/torture/vshuf* ?

I didn't see them fail on my machines today.
On Thu, Dec 04, 2014 at 04:00:27AM -0800, H.J. Lu wrote:
> >> Can you add a few testcases?
> >
> > Isn't it already covered by gcc.dg/torture/vshuf* ?
> >
> I didn't see them fail on my machines today.

Those are executable testcases; those had better not fail.  The patch
just improved code generation, and the testcases check whether the
improved code generation works well.  Did you mean some scan-assembler
test that verifies the better code generation?  I guess that is
possible, though fragile.

	Jakub
On Thu, Dec 4, 2014 at 1:04 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Thu, Dec 04, 2014 at 04:00:27AM -0800, H.J. Lu wrote:
>> >> Can you add a few testcases?
>> >
>> > Isn't it already covered by gcc.dg/torture/vshuf* ?
>> >
>> I didn't see them fail on my machines today.
>
> Those are executable testcases; those had better not fail.  The patch
> just improved code generation, and the testcases check whether the
> improved code generation works well.  Did you mean some scan-assembler
> test that verifies the better code generation?  I guess that is
> possible, though fragile.

I think the existing executable testcases adequately cover the
functionality of the patch.

The patch is OK.

Thanks,
Uros.
On Thu, Dec 4, 2014 at 4:04 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Thu, Dec 04, 2014 at 04:00:27AM -0800, H.J. Lu wrote:
>> >> Can you add a few testcases?
>> >
>> > Isn't it already covered by gcc.dg/torture/vshuf* ?
>> >
>> I didn't see them fail on my machines today.
>
> Those are executable testcases; those had better not fail.  The patch
> just improved code generation, and the testcases check whether the
> improved code generation works well.  Did you mean some scan-assembler
> test that verifies the better code generation?  I guess that is
> possible, though fragile.

Well, we will never be sure that the better code is really generated
unless we visually examine the assembly code.  Any change in the future
may disable the better code generation, and we won't notice it until
much later.
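[Editor's note: a sketch of the kind of fragile scan-assembler test being discussed. The file placement, the choice of a byte-reversal shuffle, and the scanned mnemonics (register allocation may emit vpermt2w instead of vpermi2w) are all assumptions, not part of the submitted patch.]

```c
/* { dg-do compile } */
/* { dg-options "-O2 -mavx512bw" } */

typedef char v64qi __attribute__ ((vector_size (64)));

/* Byte-reverse the second operand: a cross-128-bit-lane V64QI
   permutation that, without AVX512VBMI, should be expanded via the
   vpermi2w + vpshufb + vpor sequence added by the patch.  */
v64qi
f (v64qi x, v64qi y)
{
  return __builtin_shuffle (x, y, (v64qi) {
      127, 126, 125, 124, 123, 122, 121, 120,
      119, 118, 117, 116, 115, 114, 113, 112,
      111, 110, 109, 108, 107, 106, 105, 104,
      103, 102, 101, 100,  99,  98,  97,  96,
       95,  94,  93,  92,  91,  90,  89,  88,
       87,  86,  85,  84,  83,  82,  81,  80,
       79,  78,  77,  76,  75,  74,  73,  72,
       71,  70,  69,  68,  67,  66,  65,  64 });
}

/* { dg-final { scan-assembler "vperm\[it\]2w" } } */
/* { dg-final { scan-assembler "vpshufb" } } */
/* { dg-final { scan-assembler "vpor" } } */
```

As Jakub notes, such a test pins the expander's current output and would need updating whenever a shorter sequence becomes available.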
On 12/04/2014 01:49 AM, Ilya Tocar wrote:
> + if (!TARGET_AVX512BW || !(d->vmode == V64QImode))
Please don't over-complicate the expression.
Use x != y instead of !(x == y).
r~
On 12/10/2014 11:49 AM, Richard Henderson wrote:
> On 12/04/2014 01:49 AM, Ilya Tocar wrote:
>> +  if (!TARGET_AVX512BW || !(d->vmode == V64QImode))
>
> Please don't over-complicate the expression.
> Use x != y instead of !(x == y).

To me the original reads more clearly, since it is of the parallel
form !X || !Y; I don't see it as more complicated.

> r~
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index eafc15a..f29f8ce 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -21831,6 +21831,10 @@ ix86_expand_vec_perm_vpermi2 (rtx target, rtx op0, rtx mask, rtx op1,
       if (TARGET_AVX512VL && TARGET_AVX512BW)
 	gen = gen_avx512vl_vpermi2varv16hi3;
       break;
+    case V64QImode:
+      if (TARGET_AVX512VBMI)
+	gen = gen_avx512bw_vpermi2varv64qi3;
+      break;
     case V32HImode:
       if (TARGET_AVX512BW)
 	gen = gen_avx512bw_vpermi2varv32hi3;
@@ -48872,6 +48876,7 @@ expand_vec_perm_broadcast_1 (struct expand_vec_perm_d *d)
       emit_move_insn (d->target, gen_lowpart (d->vmode, dest));
       return true;
 
+    case V64QImode:
     case V32QImode:
     case V16HImode:
     case V8SImode:
@@ -48905,6 +48910,78 @@ expand_vec_perm_broadcast (struct expand_vec_perm_d *d)
   return expand_vec_perm_broadcast_1 (d);
 }
 
+/* Implement arbitrary permutations of two V64QImode operands
+   with 2 vpermi2w, 2 vpshufb and one vpor instruction.  */
+static bool
+expand_vec_perm_vpermi2_vpshub2 (struct expand_vec_perm_d *d)
+{
+  if (!TARGET_AVX512BW || !(d->vmode == V64QImode))
+    return false;
+
+  if (d->testing_p)
+    return true;
+
+  struct expand_vec_perm_d ds[2];
+  rtx rperm[128], vperm, target0, target1;
+  unsigned int i, nelt;
+  machine_mode vmode;
+
+  nelt = d->nelt;
+  vmode = V64QImode;
+
+  for (i = 0; i < 2; i++)
+    {
+      ds[i] = *d;
+      ds[i].vmode = V32HImode;
+      ds[i].nelt = 32;
+      ds[i].target = gen_reg_rtx (V32HImode);
+      ds[i].op0 = gen_lowpart (V32HImode, d->op0);
+      ds[i].op1 = gen_lowpart (V32HImode, d->op1);
+    }
+
+  /* Prepare permutations such that the first one takes care of
+     putting the even bytes into the right positions or one higher
+     positions (ds[0]) and the second one takes care of
+     putting the odd bytes into the right positions or one below
+     (ds[1]).  */
+
+  for (i = 0; i < nelt; i++)
+    {
+      ds[i & 1].perm[i / 2] = d->perm[i] / 2;
+      if (i & 1)
+	{
+	  rperm[i] = constm1_rtx;
+	  rperm[i + 64] = GEN_INT ((i & 14) + (d->perm[i] & 1));
+	}
+      else
+	{
+	  rperm[i] = GEN_INT ((i & 14) + (d->perm[i] & 1));
+	  rperm[i + 64] = constm1_rtx;
+	}
+    }
+
+  bool ok = expand_vec_perm_1 (&ds[0]);
+  gcc_assert (ok);
+  ds[0].target = gen_lowpart (V64QImode, ds[0].target);
+
+  ok = expand_vec_perm_1 (&ds[1]);
+  gcc_assert (ok);
+  ds[1].target = gen_lowpart (V64QImode, ds[1].target);
+
+  vperm = gen_rtx_CONST_VECTOR (V64QImode, gen_rtvec_v (64, rperm));
+  vperm = force_reg (vmode, vperm);
+  target0 = gen_reg_rtx (V64QImode);
+  emit_insn (gen_avx512bw_pshufbv64qi3 (target0, ds[0].target, vperm));
+
+  vperm = gen_rtx_CONST_VECTOR (V64QImode, gen_rtvec_v (64, rperm + 64));
+  vperm = force_reg (vmode, vperm);
+  target1 = gen_reg_rtx (V64QImode);
+  emit_insn (gen_avx512bw_pshufbv64qi3 (target1, ds[1].target, vperm));
+
+  emit_insn (gen_iorv64qi3 (d->target, target0, target1));
+  return true;
+}
+
 /* Implement arbitrary permutation of two V32QImode and V16QImode operands
    with 4 vpshufb insns, 2 vpermq and 3 vpor.  We should have already
    failed all the shorter instruction sequences.  */
@@ -49079,6 +49156,9 @@ ix86_expand_vec_perm_const_1 (struct expand_vec_perm_d *d)
       if (expand_vec_perm_vpshufb2_vpermq_even_odd (d))
 	return true;
 
+      if (expand_vec_perm_vpermi2_vpshub2 (d))
+	return true;
+
       /* ??? Look for narrow permutations whose element orderings would
 	 allow the promotion to a wider mode.  */
@@ -49223,6 +49303,11 @@ ix86_vectorize_vec_perm_const_ok (machine_mode vmode,
       /* All implementable with a single vpermi2 insn.  */
       return true;
       break;
+    case V64QImode:
+      if (TARGET_AVX512BW)
+	/* Implementable with 2 vpermi2, 2 vpshufb and 1 or insn.  */
+	return true;
+      break;
     case V8SImode:
     case V8SFmode:
     case V4DFmode:
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index ca5d720..6252e7e 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -10678,7 +10678,7 @@
    (V8SF "TARGET_AVX2") (V4DF "TARGET_AVX2")
    (V16SF "TARGET_AVX512F") (V8DF "TARGET_AVX512F")
    (V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F")
-   (V32HI "TARGET_AVX512BW")])
+   (V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512VBMI")])
 
 (define_expand "vec_perm<mode>"
   [(match_operand:VEC_PERM_AVX2 0 "register_operand")
@@ -10700,7 +10700,7 @@
    (V32QI "TARGET_AVX2") (V16HI "TARGET_AVX2")
    (V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F")
    (V16SF "TARGET_AVX512F") (V8DF "TARGET_AVX512F")
-   (V32HI "TARGET_AVX512BW")])
+   (V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512BW")])
 
 (define_expand "vec_perm_const<mode>"
   [(match_operand:VEC_PERM_CONST 0 "register_operand")