Message ID | 1287320125.2288.8.camel@localhost.localdomain |
---|---|
State | New |
Headers | show |
On Sun, Oct 17, 2010 at 5:55 AM, Uros Bizjak <ubizjak@gmail.com> wrote: > On Sun, 2010-10-17 at 10:34 +0200, Uros Bizjak wrote: >> On Sat, 2010-10-16 at 16:07 -0700, H.J. Lu wrote: >> > On Tue, Oct 12, 2010 at 9:57 AM, Richard Henderson <rth@redhat.com> wrote: >> > > On 10/11/2010 06:49 PM, H.J. Lu wrote: >> > >> (define_expand "storent<mode>" >> > >> + [(set (match_operand:AVX256MODEF2P 0 "memory_operand" "") >> > >> + (unspec:AVX256MODEF2P >> > >> + [(match_operand:AVX256MODEF2P 1 "register_operand" "")] >> > >> + UNSPEC_MOVNT))] >> > >> + "AVX256_VEC_FLOAT_MODE_P (<MODE>mode)" >> > >> + "") >> > >> + >> > >> +(define_expand "storent<mode>" >> > >> [(set (match_operand:SSEMODEF2P 0 "memory_operand" "") >> > >> (unspec:SSEMODEF2P >> > >> [(match_operand:SSEMODEF2P 1 "register_operand" "")] >> > > >> > > Do not duplicate all of these patterns. Instead create a >> > > mode iterator that only supplies the AVX modes when AVX is >> > > enabled. >> > > >> > >> +(define_insn "*vec_concat_lo_<mode>_avx" >> > >> +(define_insn "*vec_concat_hi_<mode>_avx" >> > >> +(define_insn "*vec_concat_lo_<mode>_avx" >> > >> +(define_insn "*vec_concat_hi_<mode>_avx" >> > > >> > > Why are you doing this as separate patterns like this? >> > > Canonically, the select goes on the outside of the concat, >> > > >> > > (vec_select (vec_concat x y) (parallel ...)) >> > > >> > > instead of how you're doing it here. >> > > >> > > Indeed, putting the vec_select on the outside would allow >> > > a match_parallel to describe the full permutation of this >> > > insn. You're only considering half of the cases here. >> >> In the attached patch, you still use vec_concat before vec_select. >> However, I think that new insn patterns are not needed at all, since >> *avx_vperm2f128<mode>_nozero should handle all these RTXes. >> >> Please fix the expanders that generate wrong patterns instead, i.e. >> vec_interleave_lowv4df. > > Something like attached, but there are probably more instances to fix. > > PR target/46051 > * config/i386/sse.md (vec_interleave_highv4df): Fix third member of > generated sequence to match *avx_vperm2f128<mode>_nozero. > (vec_interleave_lowv4df): Ditto. > It works. Thanks.
Index: sse.md =================================================================== --- sse.md (revision 165547) +++ sse.md (working copy) @@ -4583,13 +4583,12 @@ (parallel [(const_int 1) (const_int 5) (const_int 3) (const_int 7)]))) (set (match_operand:V4DF 0 "register_operand" "") - (vec_concat:V4DF - (vec_select:V2DF + (vec_select:V4DF + (vec_concat:V8DF (match_dup 3) - (parallel [(const_int 2) (const_int 3)])) - (vec_select:V2DF - (match_dup 4) - (parallel [(const_int 2) (const_int 3)]))))] + (match_dup 4)) + (parallel [(const_int 2) (const_int 3) + (const_int 6) (const_int 7)])))] "TARGET_AVX" { operands[3] = gen_reg_rtx (V4DFmode); @@ -4718,13 +4717,12 @@ (parallel [(const_int 1) (const_int 5) (const_int 3) (const_int 7)]))) (set (match_operand:V4DF 0 "register_operand" "") - (vec_concat:V4DF - (vec_select:V2DF + (vec_select:V4DF + (vec_concat:V8DF (match_dup 3) - (parallel [(const_int 0) (const_int 1)])) - (vec_select:V2DF - (match_dup 4) - (parallel [(const_int 0) (const_int 1)]))))] + (match_dup 4)) + (parallel [(const_int 0) (const_int 1) + (const_int 4) (const_int 5)])))] "TARGET_AVX" { operands[3] = gen_reg_rtx (V4DFmode);