Message ID | CAOvf_xwh+0j-9YAgJCxKQSoNNJfq9KYr_iSLw=43JhFxAuOfPg@mail.gmail.com |
---|---|
State | New |
Headers | show |
On 08/26/2014 05:59 AM, Evgeny Stupachenko wrote: > +(define_insn_and_split "avx2_rotate<mode>_perm" > + [(set (match_operand:V_256 0 "register_operand" "=&x") > + (vec_select:V_256 > + (match_operand:V_256 1 "register_operand" "x") > + (match_parallel 2 "palignr_operand" > + [(match_operand 3 "const_int_operand" "n")])))] > + "TARGET_AVX2" > + "#" > + "&& reload_completed" > + [(const_int 0)] Why are you waiting until after reload to expand this? It's only the vec_select parallel that determines which direction the palignr should be done. This seems like something you could do during permutation expansion. r~
The rotate insn appeared right after expand. I've done it similar to define_insn_and_split "*avx_vperm_broadcast_<mode>". I don't see any potential losses on splitting that after reload. On Tue, Aug 26, 2014 at 8:29 PM, Richard Henderson <rth@redhat.com> wrote: > On 08/26/2014 05:59 AM, Evgeny Stupachenko wrote: >> +(define_insn_and_split "avx2_rotate<mode>_perm" >> + [(set (match_operand:V_256 0 "register_operand" "=&x") >> + (vec_select:V_256 >> + (match_operand:V_256 1 "register_operand" "x") >> + (match_parallel 2 "palignr_operand" >> + [(match_operand 3 "const_int_operand" "n")])))] >> + "TARGET_AVX2" >> + "#" >> + "&& reload_completed" >> + [(const_int 0)] > > Why are you waiting until after reload to expand this? It's only the > vec_select parallel that determines which direction the palignr should be done. > > This seems like something you could do during permutation expansion. > > > r~ > >
PING On Wed, Aug 27, 2014 at 7:50 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote: > The rotate insn appeared right after expand. > I've done it similar to define_insn_and_split "*avx_vperm_broadcast_<mode>". > I don't see any potential losses on splitting that after reload. > > On Tue, Aug 26, 2014 at 8:29 PM, Richard Henderson <rth@redhat.com> wrote: >> On 08/26/2014 05:59 AM, Evgeny Stupachenko wrote: >>> +(define_insn_and_split "avx2_rotate<mode>_perm" >>> + [(set (match_operand:V_256 0 "register_operand" "=&x") >>> + (vec_select:V_256 >>> + (match_operand:V_256 1 "register_operand" "x") >>> + (match_parallel 2 "palignr_operand" >>> + [(match_operand 3 "const_int_operand" "n")])))] >>> + "TARGET_AVX2" >>> + "#" >>> + "&& reload_completed" >>> + [(const_int 0)] >> >> Why are you waiting until after reload to expand this? It's only the >> vec_select parallel that determines which direction the palignr should be done. >> >> This seems like something you could do during permutation expansion. >> >> >> r~ >> >>
PING 2 On Mon, Sep 8, 2014 at 2:03 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote: > PING > > On Wed, Aug 27, 2014 at 7:50 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote: >> The rotate insn appeared right after expand. >> I've done it similar to define_insn_and_split "*avx_vperm_broadcast_<mode>". >> I don't see any potential losses on splitting that after reload. >> >> On Tue, Aug 26, 2014 at 8:29 PM, Richard Henderson <rth@redhat.com> wrote: >>> On 08/26/2014 05:59 AM, Evgeny Stupachenko wrote: >>>> +(define_insn_and_split "avx2_rotate<mode>_perm" >>>> + [(set (match_operand:V_256 0 "register_operand" "=&x") >>>> + (vec_select:V_256 >>>> + (match_operand:V_256 1 "register_operand" "x") >>>> + (match_parallel 2 "palignr_operand" >>>> + [(match_operand 3 "const_int_operand" "n")])))] >>>> + "TARGET_AVX2" >>>> + "#" >>>> + "&& reload_completed" >>>> + [(const_int 0)] >>> >>> Why are you waiting until after reload to expand this? It's only the >>> vec_select parallel that determines which direction the palignr should be done. >>> >>> This seems like something you could do during permutation expansion. >>> >>> >>> r~ >>> >>>
On Tue, Sep 16, 2014 at 6:15 AM, Evgeny Stupachenko <evstupac@gmail.com> wrote: > PING 2 > > On Mon, Sep 8, 2014 at 2:03 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote: >> PING >> >> On Wed, Aug 27, 2014 at 7:50 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote: >>> The rotate insn appeared right after expand. >>> I've done it similar to define_insn_and_split "*avx_vperm_broadcast_<mode>". >>> I don't see any potential losses on splitting that after reload. >>> >>> On Tue, Aug 26, 2014 at 8:29 PM, Richard Henderson <rth@redhat.com> wrote: >>>> On 08/26/2014 05:59 AM, Evgeny Stupachenko wrote: >>>>> +(define_insn_and_split "avx2_rotate<mode>_perm" >>>>> + [(set (match_operand:V_256 0 "register_operand" "=&x") >>>>> + (vec_select:V_256 >>>>> + (match_operand:V_256 1 "register_operand" "x") >>>>> + (match_parallel 2 "palignr_operand" >>>>> + [(match_operand 3 "const_int_operand" "n")])))] >>>>> + "TARGET_AVX2" >>>>> + "#" >>>>> + "&& reload_completed" >>>>> + [(const_int 0)] >>>> >>>> Why are you waiting until after reload to expand this? It's only the >>>> vec_select parallel that determines which direction the palignr should be done. >>>> >>>> This seems like something you could do during permutation expansion. >>>> >>>> Assuming your change is triggered today without any additional changes you should include some testcases. For now, it doesn't show if it does anything useful.
It fix "gcc.target/i386/pr52252-atom.c" in core-avx2 make check and pr62128. On Tue, Sep 16, 2014 at 6:51 PM, H.J. Lu <hjl.tools@gmail.com> wrote: > On Tue, Sep 16, 2014 at 6:15 AM, Evgeny Stupachenko <evstupac@gmail.com> wrote: >> PING 2 >> >> On Mon, Sep 8, 2014 at 2:03 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote: >>> PING >>> >>> On Wed, Aug 27, 2014 at 7:50 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote: >>>> The rotate insn appeared right after expand. >>>> I've done it similar to define_insn_and_split "*avx_vperm_broadcast_<mode>". >>>> I don't see any potential losses on splitting that after reload. >>>> >>>> On Tue, Aug 26, 2014 at 8:29 PM, Richard Henderson <rth@redhat.com> wrote: >>>>> On 08/26/2014 05:59 AM, Evgeny Stupachenko wrote: >>>>>> +(define_insn_and_split "avx2_rotate<mode>_perm" >>>>>> + [(set (match_operand:V_256 0 "register_operand" "=&x") >>>>>> + (vec_select:V_256 >>>>>> + (match_operand:V_256 1 "register_operand" "x") >>>>>> + (match_parallel 2 "palignr_operand" >>>>>> + [(match_operand 3 "const_int_operand" "n")])))] >>>>>> + "TARGET_AVX2" >>>>>> + "#" >>>>>> + "&& reload_completed" >>>>>> + [(const_int 0)] >>>>> >>>>> Why are you waiting until after reload to expand this? It's only the >>>>> vec_select parallel that determines which direction the palignr should be done. >>>>> >>>>> This seems like something you could do during permutation expansion. >>>>> >>>>> > > Assuming your change is triggered today without any additional changes > you should include some testcases. For now, it doesn't show if it does > anything useful. > > -- > H.J.
On Wed, Sep 17, 2014 at 6:01 AM, Evgeny Stupachenko <evstupac@gmail.com> wrote: > It fix "gcc.target/i386/pr52252-atom.c" in core-avx2 make check and pr62128. > I suggest you resubmit the patch as a bug fix for pr62128 with testcases from pr62128 as well as gcc.target/i386/pr52252-atom.c.
The test in pr62128 is exactly TEST 22 from gcc.dg/torture/vshuf-v32qi.c. It will check if the pattern is correct or not. Resubmitting patch looks good as current mail thread is already too complicated. On Wed, Sep 17, 2014 at 6:49 PM, H.J. Lu <hjl.tools@gmail.com> wrote: > On Wed, Sep 17, 2014 at 6:01 AM, Evgeny Stupachenko <evstupac@gmail.com> wrote: >> It fix "gcc.target/i386/pr52252-atom.c" in core-avx2 make check and pr62128. >> > > I suggest you resubmit the patch as a bug fix for pr62128 with > testcases from pr62128 as well as gcc.target/i386/pr52252-atom.c. > > > -- > H.J.
Getting back to initial patch, is it ok? It fixes gcc.target/i386/pr52252-atom.c for AVX2 make check. X86 bootstrap is also ok. 2014-10-01 Evgeny Stupachenko <evstupac@gmail.com> * config/i386/i386.c (expand_vec_perm_palignr): Extend to use AVX2 PALINGR instruction. * config/i386/i386.c (ix86_expand_vec_perm_const_1): Add palignr try for AVX2. On Wed, Sep 17, 2014 at 9:26 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote: > The test in pr62128 is exactly TEST 22 from > gcc.dg/torture/vshuf-v32qi.c. It will check if the pattern is correct > or not. > Resubmitting patch looks good as current mail thread is already too complicated. > > On Wed, Sep 17, 2014 at 6:49 PM, H.J. Lu <hjl.tools@gmail.com> wrote: >> On Wed, Sep 17, 2014 at 6:01 AM, Evgeny Stupachenko <evstupac@gmail.com> wrote: >>> It fix "gcc.target/i386/pr52252-atom.c" in core-avx2 make check and pr62128. >>> >> >> I suggest you resubmit the patch as a bug fix for pr62128 with >> testcases from pr62128 as well as gcc.target/i386/pr52252-atom.c. >> >> >> -- >> H.J.
On Wed, Oct 1, 2014 at 12:16 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote: > Getting back to initial patch, is it ok? IMO, we should start with Jakub's proposed patch [1] [1] https://gcc.gnu.org/ml/gcc-patches/2014-10/msg00010.html Uros. > It fixes gcc.target/i386/pr52252-atom.c for AVX2 make check. > X86 bootstrap is also ok. > > 2014-10-01 Evgeny Stupachenko <evstupac@gmail.com> > > * config/i386/i386.c (expand_vec_perm_palignr): Extend to use AVX2 > PALINGR instruction. > * config/i386/i386.c (ix86_expand_vec_perm_const_1): Add palignr try > for AVX2. > > On Wed, Sep 17, 2014 at 9:26 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote: >> The test in pr62128 is exactly TEST 22 from >> gcc.dg/torture/vshuf-v32qi.c. It will check if the pattern is correct >> or not. >> Resubmitting patch looks good as current mail thread is already too complicated. >> >> On Wed, Sep 17, 2014 at 6:49 PM, H.J. Lu <hjl.tools@gmail.com> wrote: >>> On Wed, Sep 17, 2014 at 6:01 AM, Evgeny Stupachenko <evstupac@gmail.com> wrote: >>>> It fix "gcc.target/i386/pr52252-atom.c" in core-avx2 make check and pr62128. >>>> >>> >>> I suggest you resubmit the patch as a bug fix for pr62128 with >>> testcases from pr62128 as well as gcc.target/i386/pr52252-atom.c. >>> >>> >>> -- >>> H.J.
On Wed, Oct 01, 2014 at 12:28:51PM +0200, Uros Bizjak wrote: > On Wed, Oct 1, 2014 at 12:16 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote: > > Getting back to initial patch, is it ok? > > IMO, we should start with Jakub's proposed patch [1] > > [1] https://gcc.gnu.org/ml/gcc-patches/2014-10/msg00010.html That doesn't compile, will post a new version; got interrupted when I found that in GCC_TEST_RUN_EXPENSIVE=1 make check-gcc RUNTESTFLAGS='--target_board=unix/-mavx2 dg-torture.exp=vshuf*.c' one test is miscompiled even with unpatched compiler, debugging that now. That said, my patch will not do anything about the case Mark mentioned { 1, 2, 3, ..., 31, 0 } permutation, for that we can't do vpalignr followed by vpshufb or similar, but need to do some permutation first and then vpalignr on the result. So it would need a new routine. It is still a 2 insn permutation, not 6, and needs different algorithm, so sharing the same routine for that is undesirable. Jakub
This is not a fix for the case { 1, 2, 3, ..., 31, 0 }. This patch is an extension of expand_vec_perm_palignr on AVX2 case. For the case { 1, 2, 3, ..., 31, 0 } we should use separate function/pattern. I like split as it is similar to already handled SSE byte rotate {1,2,3,.....,15, 0}: ssse3_palignr<mode>_perm and AVX2 split: *avx_vperm_broadcast_<mode>. On Wed, Oct 1, 2014 at 2:35 PM, Jakub Jelinek <jakub@redhat.com> wrote: > On Wed, Oct 01, 2014 at 12:28:51PM +0200, Uros Bizjak wrote: >> On Wed, Oct 1, 2014 at 12:16 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote: >> > Getting back to initial patch, is it ok? >> >> IMO, we should start with Jakub's proposed patch [1] >> >> [1] https://gcc.gnu.org/ml/gcc-patches/2014-10/msg00010.html > > That doesn't compile, will post a new version; got interrupted when > I found that in > GCC_TEST_RUN_EXPENSIVE=1 make check-gcc RUNTESTFLAGS='--target_board=unix/-mavx2 dg-torture.exp=vshuf*.c' > one test is miscompiled even with unpatched compiler, debugging that now. > > That said, my patch will not do anything about the > case Mark mentioned { 1, 2, 3, ..., 31, 0 } permutation, > for that we can't do vpalignr followed by vpshufb or similar, > but need to do some permutation first and then vpalignr on > the result. So it would need a new routine. It is still a 2 > insn permutation, not 6, and needs different algorithm, so sharing > the same routine for that is undesirable. > > Jakub
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index d6155cf..68ee65a 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -81,6 +81,7 @@ ;; For AVX2 support UNSPEC_VPERMVAR UNSPEC_VPERMTI + UNSPEC_VPALIGNRDI UNSPEC_GATHER UNSPEC_VSIBADDR @@ -14167,6 +14168,19 @@ (set_attr "prefix" "vex") (set_attr "mode" "OI")]) +(define_insn "avx2_palignrv4di" + [(set (match_operand:V4DI 0 "register_operand" "=x") + (unspec:V4DI + [(match_operand:V4DI 1 "register_operand" "x") + (match_operand:V4DI 2 "nonimmediate_operand" "xm") + (match_operand:SI 3 "const_0_to_255_operand" "n")] + UNSPEC_VPALIGNRDI))] + "TARGET_AVX2" + "vpalignr\t{%3, %2, %1, %0|%0, %1, %2, %3}" + [(set_attr "type" "sselog") + (set_attr "prefix" "vex") + (set_attr "mode" "OI")]) + (define_insn "avx2_vec_dupv4df" [(set (match_operand:V4DF 0 "register_operand" "=x") (vec_duplicate:V4DF @@ -14658,6 +14672,49 @@ (set_attr "length_immediate" "1") (set_attr "prefix" "orig,vex")]) +(define_insn_and_split "avx2_rotate<mode>_perm" + [(set (match_operand:V_256 0 "register_operand" "=&x") + (vec_select:V_256 + (match_operand:V_256 1 "register_operand" "x") + (match_parallel 2 "palignr_operand" + [(match_operand 3 "const_int_operand" "n")])))] + "TARGET_AVX2" + "#" + "&& reload_completed" + [(const_int 0)] +{ + enum machine_mode imode = GET_MODE_INNER (<MODE>mode); + int shift = INTVAL (operands[3]) * GET_MODE_SIZE (imode); + rtx op0 = gen_rtx_REG (V4DImode, REGNO (operands[0])); + rtx op1 = gen_rtx_REG (V4DImode, REGNO (operands[1])); + + if (shift == 0) + emit_move_insn (operands[0], operands[1]); + else + { + emit_insn (gen_avx2_permv2ti (op0, + op1, + op1, + GEN_INT (33))); + if (shift < 16) + emit_insn (gen_avx2_palignrv4di (op0, + op0, + op1, + GEN_INT (shift))); + else if (shift > 16) + emit_insn (gen_avx2_palignrv4di (op0, + op1, + op0, + GEN_INT (shift - 16))); + } + DONE; +} + [(set_attr "type" "sseishft") + (set_attr "prefix_extra" "1") + (set_attr "length_immediate" "1") + (set_attr "prefix" "vex")]) + + (define_expand "avx_vinsertf128<mode>" [(match_operand:V_256 0 "register_operand") (match_operand:V_256 1 "register_operand")