Message ID: alpine.DEB.2.02.1210131032460.9651@stedding.saclay.inria.fr
State: New
On Sat, Oct 13, 2012 at 10:52 AM, Marc Glisse <marc.glisse@inria.fr> wrote:

> Hello,
>
> this patch provides an alternate pattern to let combine recognize scalar
> operations that preserve the high part of a vector. If the strategy is
> all right, I could do the same for more operations (mul, div, ...).
> Something similar is also possible for V4SF (different pattern though),
> but probably not as useful.

But we _do_ have a vec_merge pattern that describes the operation. Adding
another one to each operation just to satisfy combine is IMO not the
correct approach. I'd rather see a generic RTX simplification that
simplifies your proposed pattern to the vec_merge pattern.

Also, as you mention in PR54855, comment #5, the approach is too
fragile...

Uros.
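To make the thread concrete: in source form, the kind of operation under
discussion looks like the sketch below (modelled on the PR54855 testcase
included at the end of the thread; the function name is illustrative).
Only element 0 of the vector changes, so the whole function should compile
to a single addsd.

typedef double vec __attribute__ ((vector_size (16)));

vec
f (vec x, double y)
{
  x[0] += y;   /* scalar add on element 0; x[1] passes through unchanged */
  return x;
}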
On Sun, 14 Oct 2012, Uros Bizjak wrote:

> On Sat, Oct 13, 2012 at 10:52 AM, Marc Glisse <marc.glisse@inria.fr> wrote:
>> Hello,
>>
>> this patch provides an alternate pattern to let combine recognize scalar
>> operations that preserve the high part of a vector. If the strategy is
>> all right, I could do the same for more operations (mul, div, ...).
>> Something similar is also possible for V4SF (different pattern though),
>> but probably not as useful.
>
> But we _do_ have a vec_merge pattern that describes the operation.
> Adding another one to each operation just to satisfy combine is IMO not
> the correct approach.

At some point I wondered about _replacing_ the existing pattern, so there
would only be one ;-)

The vec_merge pattern takes as arguments two vectors instead of a vector
and a scalar, and describes the operation as a vector operation where we
drop half of the result, instead of a scalar operation where we re-add
the top half of the vector. I don't know if that's the most convenient
choice. Adding code in simplify-rtx to replace vec_merge with
vec_concat / vec_select might be easier than the other way around.

If the middle-end somehow gave us:

  (plus X (vec_concat Y 0))

it would seem a bit strange to add an optimization that turns it into:

  (vec_merge (plus X (subreg:V2DF Y)) X 1)

but then producing:

  (vec_concat (plus (vec_select X 0) Y) (vec_select X 1))

would be strange as well (ignoring the signed-zero issues here).

> I'd rather see a generic RTX simplification that simplifies your
> proposed pattern to the vec_merge pattern.

Ok, I'll see what I can do.

> Also, as you mention in PR54855, comment #5, the approach is too
> fragile...

I am not sure I can make the RTX simplification much less fragile...
Whenever I see (vec_concat X (vec_select Y 1)), I would have to check
whether X is some (possibly large) tree of scalar computations involving
Y[0], move it all to vec_merge computations, and fix other users of some
of those scalars to now use S[0]. That seems too hard, so I would stop at
a single-operation X that is used only once. Besides, the gain is
proportionally larger when there is a single operation :-)

Thank you for your comments,
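A sketch of the harder case described in the last paragraph, with
illustrative names (not code from the thread): several dependent scalar
operations feed the low element, and the intermediate scalar may have
other users.

typedef double vec __attribute__ ((vector_size (16)));

vec
two_ops (vec x, double a, double b)
{
  /* Two dependent scalar operations feed x[0].  Rewriting this whole
     tree as vec_merge operations would require fixing every other user
     of the intermediate scalar x[0] * a, which is why the
     simplification stops at a single operation used only once.  */
  x[0] = x[0] * a + b;
  return x;
}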
On Sun, 14 Oct 2012, Marc Glisse wrote:

> On Sun, 14 Oct 2012, Uros Bizjak wrote:
>> I'd rather see a generic RTX simplification that simplifies your
>> proposed pattern to the vec_merge pattern.
>
> Ok, I'll see what I can do.
> [...]

Hello,

I experimented with the simplify-rtx transformation you suggested, see:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54855

It works when the argument is a register, but not for memory (which is
where the constant is in the testcase). And the description of the
operation in sse.md does seem problematic. It says the second argument
is:

  (match_operand:VF_128 2 "nonimmediate_operand" "xm,xm"))

but Intel's documentation says "The source operand can be an XMM register
or a 64-bit memory location", which is not quite the same.

Do you think the .md description should really stay this way, or could we
change it to something that better reflects "64-bit memory location"?
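A sketch of the register-versus-memory distinction, using standard SSE2
intrinsics (the function and variable names are illustrative):

#include <emmintrin.h>

static const double two = 2.0;

__m128d
add_const (__m128d x)
{
  /* _mm_load_sd reads only 64 bits.  A pattern whose second operand is
     DF-mode memory can fold this load into "addsd two(%rip), %xmm0";
     a V2DF memory operand does not match the 64-bit reference.  */
  return _mm_add_sd (x, _mm_load_sd (&two));
}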
On Fri, Nov 30, 2012 at 1:34 PM, Marc Glisse <marc.glisse@inria.fr> wrote:

> Hello,
>
> I experimented with the simplify-rtx transformation you suggested, see:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54855
>
> It works when the argument is a register, but not for memory (which is
> where the constant is in the testcase). And the description of the
> operation in sse.md does seem problematic. It says the second argument
> is:
>
>   (match_operand:VF_128 2 "nonimmediate_operand" "xm,xm"))
>
> but Intel's documentation says "The source operand can be an XMM
> register or a 64-bit memory location", which is not quite the same.
>
> Do you think the .md description should really stay this way, or could
> we change it to something that better reflects "64-bit memory location"?

For reference, we are talking about:

(define_insn "<sse>_vm<plusminus_insn><mode>3"
  [(set (match_operand:VF_128 0 "register_operand" "=x,x")
        (vec_merge:VF_128
          (plusminus:VF_128
            (match_operand:VF_128 1 "register_operand" "0,x")
            (match_operand:VF_128 2 "nonimmediate_operand" "xm,xm"))
          (match_dup 1)
          (const_int 1)))]
  "TARGET_SSE"
  "@
   <plusminus_mnemonic><ssescalarmodesuffix>\t{%2, %0|%0, %2}
   v<plusminus_mnemonic><ssescalarmodesuffix>\t{%2, %1, %0|%0, %1, %2}"
  [(set_attr "isa" "noavx,avx")
   (set_attr "type" "sseadd")
   (set_attr "prefix" "orig,vex")
   (set_attr "mode" "<ssescalarmode>")])

No, looking at your description, operand 2 should be a scalar operand (we
use a scalar _s{s,d} instruction here), and for doubles it should refer
to a 64-bit memory location. I don't remember all the details about
vec_merge scalar instructions, but it looks to me that the canonical
representation should be more like your proposal:

+(define_insn "*sse2_vm<plusminus_insn>v2df3"
+  [(set (match_operand:V2DF 0 "register_operand" "=x,x")
+        (vec_concat:V2DF
+          (plusminus:DF
+            (vec_select:DF
+              (match_operand:V2DF 1 "register_operand" "0,x")
+              (parallel [(const_int 0)]))
+            (match_operand:DF 2 "nonimmediate_operand" "xm,xm"))
+          (vec_select:DF (match_dup 1) (parallel [(const_int 1)]))))]
+  "TARGET_SSE2"

Uros.
On Fri, 30 Nov 2012, Uros Bizjak wrote:

> For reference, we are talking about:
>
> [...]
>
> No, looking at your description, operand 2 should be a scalar operand
> (we use a scalar _s{s,d} instruction here), and for doubles it should
> refer to a 64-bit memory location. I don't remember all the details
> about vec_merge scalar instructions, but it looks to me that the
> canonical representation should be more like your proposal:
>
> [...]

Thank you. Among the following possible patterns, my choice (if nobody
objects) is to use 4) for V2DF and 3) (rewritten without iterators) for
V4SF.

The question is then what should be done about the builtins and
intrinsics. _mm_add_sd takes two __m128d arguments. If I change the
signature of __builtin_ia32_addsd, I can make _mm_add_sd pass __B[0] as
the second argument, but I don't know if I am allowed to change that
signature. Otherwise I guess I'll need to keep a separate expander for it
(I'd rather not). And then there are several operations other than +/- to
handle.

1) Current pattern:

  [(set (match_operand:VF_128 0 "register_operand" "=x,x")
        (vec_merge:VF_128
          (plusminus:VF_128
            (match_operand:VF_128 1 "register_operand" "0,x")
            (match_operand:VF_128 2 "nonimmediate_operand" "xm,xm"))
          (match_dup 1)
          (const_int 1)))]

2) Minimal fix:

  [(set (match_operand:VF_128 0 "register_operand" "=x,x")
        (vec_merge:VF_128
          (plusminus:VF_128
            (match_operand:VF_128 1 "register_operand" "0,x")
            (vec_duplicate:VF_128
              (match_operand:<ssescalarmode> 2 "nonimmediate_operand" "xm,xm")))
          (match_dup 1)
          (const_int 1)))]

3) With the operation in scalar mode:

  [(set (match_operand:VF_128 0 "register_operand" "=x,x")
        (vec_merge:VF_128
          (vec_duplicate:VF_128
            (plusminus:<ssescalarmode>
              (vec_select:<ssescalarmode>
                (match_operand:VF_128 1 "register_operand" "0,x")
                (parallel [(const_int 0)]))
              (match_operand:<ssescalarmode> 2 "nonimmediate_operand" "xm,xm")))
          (match_dup 1)
          (const_int 1)))]

4) Special version which only makes sense for vectors of 2 elements:

  [(set (match_operand:V2DF 0 "register_operand" "=x,x")
        (vec_concat:V2DF
          (plusminus:DF
            (vec_select:DF
              (match_operand:V2DF 1 "register_operand" "0,x")
              (parallel [(const_int 0)]))
            (match_operand:DF 2 "nonimmediate_operand" "xm,xm"))
          (vec_select:DF (match_dup 1) (parallel [(const_int 1)]))))]
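For reference, a sketch of the two intrinsic shapes being weighed. The
first mirrors how GCC's emmintrin.h defines _mm_add_sd via
__builtin_ia32_addsd; the second avoids spelling out a changed (and
therefore hypothetical) builtin signature by using the vector extension
directly, which is semantically what passing __B[0] would mean.

#include <emmintrin.h>

/* Current form: the builtin takes two full __m128d vectors, matching
   the VF_128 operand 2 in pattern 1).  */
__m128d
add_sd_current (__m128d __A, __m128d __B)
{
  return (__m128d) __builtin_ia32_addsd ((__v2df) __A, (__v2df) __B);
}

/* If the builtin's second argument became a scalar, the intrinsic could
   pass __B[0]; written here with plain vector-extension code instead of
   a hypothetical changed builtin.  */
__m128d
add_sd_scalar (__m128d __A, __m128d __B)
{
  __A[0] += __B[0];
  return __A;
}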
Index: config/i386/sse.md
===================================================================
--- config/i386/sse.md	(revision 192420)
+++ config/i386/sse.md	(working copy)
@@ -812,20 +812,38 @@
 	  (const_int 1)))]
   "TARGET_SSE"
   "@
    <plusminus_mnemonic><ssescalarmodesuffix>\t{%2, %0|%0, %2}
    v<plusminus_mnemonic><ssescalarmodesuffix>\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sseadd")
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "<ssescalarmode>")])
 
+(define_insn "*sse2_vm<plusminus_insn>v2df3"
+  [(set (match_operand:V2DF 0 "register_operand" "=x,x")
+	(vec_concat:V2DF
+	  (plusminus:DF
+	    (vec_select:DF
+	      (match_operand:V2DF 1 "register_operand" "0,x")
+	      (parallel [(const_int 0)]))
+	    (match_operand:DF 2 "nonimmediate_operand" "xm,xm"))
+	  (vec_select:DF (match_dup 1) (parallel [(const_int 1)]))))]
+  "TARGET_SSE2"
+  "@
+   <plusminus_mnemonic>sd\t{%2, %0|%0, %2}
+   v<plusminus_mnemonic>sd\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "sseadd")
+   (set_attr "prefix" "orig,vex")
+   (set_attr "mode" "DF")])
+
 (define_expand "mul<mode>3"
   [(set (match_operand:VF 0 "register_operand")
 	(mult:VF
 	  (match_operand:VF 1 "nonimmediate_operand")
 	  (match_operand:VF 2 "nonimmediate_operand")))]
   "TARGET_SSE"
   "ix86_fixup_binary_operands_no_copy (MULT, <MODE>mode, operands);")
 
 (define_insn "*mul<mode>3"
   [(set (match_operand:VF 0 "register_operand" "=x,x")
Index: testsuite/gcc.target/i386/pr54855.c
===================================================================
--- testsuite/gcc.target/i386/pr54855.c	(revision 0)
+++ testsuite/gcc.target/i386/pr54855.c	(revision 0)
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O -msse2" } */
+
+typedef double vec __attribute__((vector_size(16)));
+
+vec f (vec x)
+{
+  x[0] += 2;
+  return x;
+}
+
+vec g (vec x)
+{
+  x[0] -= 1;
+  return x;
+}
+
+/* { dg-final { scan-assembler-not "mov" } } */
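For the V4SF direction mentioned above (pattern 3 rewritten without
iterators), the analogous testcase would be along these lines (a sketch,
not part of the posted patch); here the scalar add on element 0 is
expected to become a single addss:

/* { dg-do compile } */
/* { dg-options "-O -msse" } */

typedef float vecf __attribute__ ((vector_size (16)));

vecf
h (vecf x)
{
  x[0] += 2;  /* expected to compile to a single addss */
  return x;
}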