Message ID | alpine.DEB.2.02.1303191627200.4515@stedding.saclay.inria.fr |
---|---|
State | New |
Headers | show |
On 03/19/2013 08:47 AM, Marc Glisse wrote: > (define_insn_and_split "avx_<castmode><avxsizesuffix>_<castmode>" > [(set (match_operand:AVX256MODE2P 0 "nonimmediate_operand" "=x,m") > - (unspec:AVX256MODE2P > - [(match_operand:<ssehalfvecmode> 1 "nonimmediate_operand" "xm,x")] > - UNSPEC_CAST))] > + (subreg:AVX256MODE2P > + (match_operand:<ssehalfvecmode> 1 "nonimmediate_operand" "xm,x") 0))] > "TARGET_AVX" > "#" > "&& reload_completed" > [(const_int 0)] I'm not fond of this, primarily because I believe the pattern should not exist at all. One of the following is true: (1) reload needs working around (thus all the reload_completed nonsense) or (2) the entire pattern is useless and would be subsumed by mov<mode> or (3) the entire pattern is useless and is *already* subsumed by mov<mode>, since mov is earlier in the md file, making this pattern dead code. r~
On Tue, 19 Mar 2013, Richard Henderson wrote: > On 03/19/2013 08:47 AM, Marc Glisse wrote: >> (define_insn_and_split "avx_<castmode><avxsizesuffix>_<castmode>" >> [(set (match_operand:AVX256MODE2P 0 "nonimmediate_operand" "=x,m") >> - (unspec:AVX256MODE2P >> - [(match_operand:<ssehalfvecmode> 1 "nonimmediate_operand" "xm,x")] >> - UNSPEC_CAST))] >> + (subreg:AVX256MODE2P >> + (match_operand:<ssehalfvecmode> 1 "nonimmediate_operand" "xm,x") 0))] >> "TARGET_AVX" >> "#" >> "&& reload_completed" >> [(const_int 0)] > > I'm not fond of this, primarily because I believe the pattern should > not exist at all. Sure, removing it would be even better. > One of the following is true: > > (1) reload needs working around (thus all the reload_completed nonsense) > or > (2) the entire pattern is useless and would be subsumed by mov<mode> > or > (3) the entire pattern is useless and is *already* subsumed by > mov<mode>, since mov is earlier in the md file, making this > pattern dead code. We need something to expand _mm256_castpd128_pd256 to. I tried making it a define_expand (with the subreg pattern, and keeping the {} part intact), but that gives check_rtl errors in lra. I then tried to remove the REG_P condition and use simplify_gen_subreg or gen_lowpart, but the first one gives unrecognizable insn at -O0 (same as removing the {} part completely) (it seems happier at -O1), while the second ICEs (gen_lowpart_common returns 0) for any -Ox except -O0. As must be obvious from this paragraph, I just tried a few random bad ideas... and when none worked I posted the minimal patch that worked. Do you at least agree that vector-vector subregs make sense, or is that part wrong as well?
On 03/20/2013 08:00 AM, Marc Glisse wrote: > Do you at least agree that vector-vector subregs make sense, or is that part > wrong as well? You mean a V4SImode subreg of a V8SImode register, not just same-size casting? It makes logical sense, but I'm fairly sure you'll need a lot more surgery throughout the compiler to make that happen. I'm curious how a define_expand can fail in LRA, but your define_insn succeeds? Is the failure because of ix86_cannot_change_mode_class? Because that hook fairly well defines what subregs are valid. And if that says it isn't valid, then even having a define_insn that uses such is wrong. r~
On Wed, Mar 20, 2013 at 4:13 PM, Richard Henderson <rth@redhat.com> wrote: > On 03/20/2013 08:00 AM, Marc Glisse wrote: >> Do you at least agree that vector-vector subregs make sense, or is that part >> wrong as well? > > You mean a V4SImode subreg of a V8SImode register, not just same-size casting? > It makes logical sense, but I'm fairly sure you'll need a lot more surgery > throughout the compiler to make that happen. > > I'm curious how a define_expand can fail in LRA, but your define_insn succeeds? > Is the failure because of ix86_cannot_change_mode_class? Because that hook > fairly well defines what subregs are valid. And if that says it isn't valid, > then even having a define_insn that uses such is wrong. Don't we have vec_select to get a V4SImode out of a V8SImode? So you only need a define_insn that special-cases the subreg-like ones? Richard. > > r~
On Wed, 20 Mar 2013, Richard Henderson wrote: > On 03/20/2013 08:00 AM, Marc Glisse wrote: >> Do you at least agree that vector-vector subregs make sense, or is that part >> wrong as well? > > You mean a V4SImode subreg of a V8SImode register, not just same-size casting? I am mostly interested in the reverse, a paradoxical subreg, since vec_select can only model one direction (and only rvalues, but that's a different question). > It makes logical sense, but I'm fairly sure you'll need a lot more surgery > throughout the compiler to make that happen. > > I'm curious how a define_expand can fail in LRA, but your define_insn succeeds? Total guesswork: I think it is related to that REG_P protected code, and the reload_complete test. With the define_insn_and_split, we keep the insn until after reload and only do the subreg magic then. With a define_expand, we end up writing to reg 60 as a V2DF and reading it as a V4DF, and since it isn't a hard register, that causes a problem. > Is the failure because of ix86_cannot_change_mode_class? Because that hook > fairly well defines what subregs are valid. And if that says it isn't valid, > then even having a define_insn that uses such is wrong. A quick look at ix86_cannot_change_mode_class seems to indicate that it does not mind such paradoxical subregs.
On Wed, Mar 20, 2013 at 4:29 PM, Marc Glisse <marc.glisse@inria.fr> wrote: > On Wed, 20 Mar 2013, Richard Henderson wrote: > >> On 03/20/2013 08:00 AM, Marc Glisse wrote: >>> >>> Do you at least agree that vector-vector subregs make sense, or is that >>> part >>> wrong as well? >> >> >> You mean a V4SImode subreg of a V8SImode register, not just same-size >> casting? > > > I am mostly interested in the reverse, a paradoxical subreg, since > vec_select can only model one direction (and only rvalues, but that's a > different question). vec_duplicate? Honestly, what semantics should _mm256_castpd128_pd256 have if it is supposed to cast a v2df to a v4df? Or what use? Richard.
On Wed, 20 Mar 2013, Richard Biener wrote: > On Wed, Mar 20, 2013 at 4:29 PM, Marc Glisse <marc.glisse@inria.fr> wrote: >> On Wed, 20 Mar 2013, Richard Henderson wrote: >> >>> On 03/20/2013 08:00 AM, Marc Glisse wrote: >>>> >>>> Do you at least agree that vector-vector subregs make sense, or is that >>>> part >>>> wrong as well? >>> >>> >>> You mean a V4SImode subreg of a V8SImode register, not just same-size >>> casting? >> >> >> I am mostly interested in the reverse, a paradoxical subreg, since >> vec_select can only model one direction (and only rvalues, but that's a >> different question). > > vec_duplicate? There is already some of that in various places, and there may be even more vec_merge+vec_duplicate patterns soon, but you want to make sure you don't actually do the duplication. > Honestly, what semantics should _mm256_castpd128_pd256 have if > it is supposed to cast a v2df to a v4df? NOP. We don't care what is in the high part of the vector. > Or what use? Many vector operations are defined as taking 2 vectors and merging them somehow. I didn't check if this case works, but for instance if you want to copy a V2DF to the bottom part of a V4DF using Intel's intrinsics, you will probably have to cast the V2DF to a V4DF and then use an intrinsic that takes 2 V4DF. (there are many issues with those intrinsics, but we don't control them)
On Wed, Mar 20, 2013 at 4:54 PM, Marc Glisse <marc.glisse@inria.fr> wrote: > On Wed, 20 Mar 2013, Richard Biener wrote: > >> On Wed, Mar 20, 2013 at 4:29 PM, Marc Glisse <marc.glisse@inria.fr> wrote: >>> >>> On Wed, 20 Mar 2013, Richard Henderson wrote: >>> >>>> On 03/20/2013 08:00 AM, Marc Glisse wrote: >>>>> >>>>> >>>>> Do you at least agree that vector-vector subregs make sense, or is that >>>>> part >>>>> wrong as well? >>>> >>>> >>>> >>>> You mean a V4SImode subreg of a V8SImode register, not just same-size >>>> casting? >>> >>> >>> >>> I am mostly interested in the reverse, a paradoxical subreg, since >>> vec_select can only model one direction (and only rvalues, but that's a >>> different question). >> >> >> vec_duplicate? > > > There is already some of that in various places, and there may be even more > vec_merge+vec_duplicate patterns soon, but you want to make sure you don't > actually do the duplication. > > >> Honestly, what semantics should _mm256_castpd128_pd256 have if >> it is supposed to cast a v2df to a v4df? > > > NOP. We don't care what is in the high part of the vector. > >> Or what use? > > > Many vector operations are defined as taking 2 vectors and merging them > somehow. I didn't check if this case works, but for instance if you want to > copy a V2DF to the bottom part of a V4DF using Intel's intrinsics, you will > probably have to cast the V2DF to a V4DF and then use an intrinsic that > takes 2 V4DF. (there are many issues with those intrinsics, but we don't > control them) Hmm, I see. I still think that we should expose most of the intrinsics and builtins implementation details earlier, at the GIMPLE level. This one would be an awkward one there, too. You'd need sth like v4df_3 = CONSTRUCTOR { v2df_2, v2df_1(D) }; thus, make that "uninitialized" explicit by using a default def. I think we don't support generating the above from C/C++ source with GNU extensions as vector type casts are quite restricted at the moment, so there you'd have to write sth like double uninit; v4df res = { v2dfv[0], v2dfv[1], uninit, uninit }; which would get you D.1723 = BIT_FIELD_REF <x, 64, 0>; D.1724 = BIT_FIELD_REF <x, 64, 64>; D.1725 = {D.1723, D.1724, uninit, uninit}; at the moment. And of course awkward code in the end ;) Which leaves the other option of folding the __builtin_ia32_ps256_ps in the target (and most other builtins). Just side-tracking from the RTL issue of course ... Richard. > -- > Marc Glisse
Index: gcc/testsuite/gcc.target/i386/pr50829.c =================================================================== --- gcc/testsuite/gcc.target/i386/pr50829.c (revision 0) +++ gcc/testsuite/gcc.target/i386/pr50829.c (revision 0) @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-options "-O1 -mavx" } */ + +#include <x86intrin.h> + +__m256d +concat (__m128d x) +{ + __m256d z = _mm256_castpd128_pd256 (x); + return _mm256_insertf128_pd (z, x, 1); +} + +/* { dg-final { scan-assembler-not "vmov" } } */ Property changes on: gcc/testsuite/gcc.target/i386/pr50829.c ___________________________________________________________________ Added: svn:keywords + Author Date Id Revision URL Added: svn:eol-style + native Index: gcc/config/i386/sse.md =================================================================== --- gcc/config/i386/sse.md (revision 196633) +++ gcc/config/i386/sse.md (working copy) @@ -66,21 +66,20 @@ UNSPEC_AESKEYGENASSIST ;; For PCLMUL support UNSPEC_PCLMUL ;; For AVX support UNSPEC_PCMP UNSPEC_VPERMIL UNSPEC_VPERMIL2 UNSPEC_VPERMIL2F128 - UNSPEC_CAST UNSPEC_VTESTP UNSPEC_VCVTPH2PS UNSPEC_VCVTPS2PH ;; For AVX2 support UNSPEC_VPERMVAR UNSPEC_VPERMTI UNSPEC_GATHER UNSPEC_VSIBADDR ]) @@ -11089,23 +11088,22 @@ "TARGET_AVX" "v<sseintprefix>maskmov<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}" [(set_attr "type" "sselog1") (set_attr "prefix_extra" "1") (set_attr "prefix" "vex") (set_attr "btver2_decode" "vector") (set_attr "mode" "<sseinsnmode>")]) (define_insn_and_split "avx_<castmode><avxsizesuffix>_<castmode>" [(set (match_operand:AVX256MODE2P 0 "nonimmediate_operand" "=x,m") - (unspec:AVX256MODE2P - [(match_operand:<ssehalfvecmode> 1 "nonimmediate_operand" "xm,x")] - UNSPEC_CAST))] + (subreg:AVX256MODE2P + (match_operand:<ssehalfvecmode> 1 "nonimmediate_operand" "xm,x") 0))] "TARGET_AVX" "#" "&& reload_completed" [(const_int 0)] { rtx op0 = operands[0]; rtx op1 = operands[1]; if (REG_P (op0)) op0 = gen_rtx_REG (<ssehalfvecmode>mode, REGNO (op0)); else Index: gcc/emit-rtl.c =================================================================== --- gcc/emit-rtl.c (revision 196633) +++ gcc/emit-rtl.c (working copy) @@ -707,20 +707,23 @@ validate_subreg (enum machine_mode omode else if ((COMPLEX_MODE_P (imode) || VECTOR_MODE_P (imode)) && GET_MODE_INNER (imode) == omode) ; /* ??? x86 sse code makes heavy use of *paradoxical* vector subregs, i.e. (subreg:V4SF (reg:SF) 0). This surely isn't the cleanest way to represent this. It's questionable if this ought to be represented at all -- why can't this all be hidden in post-reload splitters that make arbitrarily mode changes to the registers themselves. */ else if (VECTOR_MODE_P (omode) && GET_MODE_INNER (omode) == imode) ; + else if (VECTOR_MODE_P (omode) && VECTOR_MODE_P (imode) + && GET_MODE_INNER (omode) == GET_MODE_INNER (imode)) + ; /* Subregs involving floating point modes are not allowed to change size. Therefore (subreg:DI (reg:DF) 0) is fine, but (subreg:SI (reg:DF) 0) isn't. */ else if (FLOAT_MODE_P (imode) || FLOAT_MODE_P (omode)) { if (! (isize == osize /* LRA can use subreg to store a floating point value in an integer mode. Although the floating point and the integer modes need the same number of hard registers, the size of floating point mode can be less than the