Message ID | 20140715110155.GA48158@msticlxl57.ims.intel.com |
---|---|
State | New |
Headers | show |
On Tue, Jul 15, 2014 at 1:01 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote: > On 15 Jul 10:42, Uros Bizjak wrote: >> On Tue, Jul 15, 2014 at 10:25 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote: >> >> >>>>> Also fully restrict xmm8-15 does not seem right. It is just costly >> >>>>> but not fully disallowed. >> >>>> >> >>>> As said earlier, you can try "Ya*x" as a constraint. >> >>> >> >>> I tried it. It does not seem to affect allocation much. I do not see >> >>> any gain on targeted tests. >> >> >> >> Strange, because the documentation claims: >> >> >> >> '*' >> >> Says that the following character should be ignored when choosing >> >> register preferences. '*' has no effect on the meaning of the >> >> constraint as a constraint, and no effect on reloading. For LRA >> >> '*' additionally disparages slightly the alternative if the >> >> following character matches the operand. >> >> >> >> Let me rethink this a bit. Prehaps we could reconsider Jakub's >> >> proposal with "Ya,!x" (with two alternatives). IIRC this approach was >> >> needed for some MMX alternatives, where we didn't want RA to allocate >> >> a MMX register when the value could be passed in integer regs, but the >> >> value was still allowed in MMX register. >> > >> > That's is what my patch already does, but with '?' instead of '!'. >> >> Yes, I know. The problem is, that Ya*x type conditional allocation >> worked OK in the past for "not preferred, but still alowed regclass" >> registers, There are several patterns in i386.md that live by this >> premise, including movsf_internal and movdf_internal. If this approach >> doesn't work anymore, then we have to either figure out what is the >> reason, or invent a new strategy that will be applicable to all cases. >> >> Can you please post a small test that illustrates the case where Ya,!x >> works, but Ya*x doesn't? > > It's hard to compose a small testcase which will have SSE4 instructions generated with required register usage. I use tcpjumbo test from TCPmark for initial check of how my patch works. This test has a lot of pmovzxwd instructions generated and many of them use xmm8-15. I tried two versions of a simple patch which modifies only pmovzxwd instruction. > > Patch1: > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md > index d907353..6b03b72 100644 > --- a/gcc/config/i386/sse.md > +++ b/gcc/config/i386/sse.md > @@ -11852,10 +11852,10 @@ > (set_attr "mode" "OI")]) > > (define_insn "sse4_1_<code>v4hiv4si2" > - [(set (match_operand:V4SI 0 "register_operand" "=x") > + [(set (match_operand:V4SI 0 "register_operand" "=Yr,!x") > (any_extend:V4SI > (vec_select:V4HI > - (match_operand:V8HI 1 "nonimmediate_operand" "xm") > + (match_operand:V8HI 1 "nonimmediate_operand" "Yr,!xm") > (parallel [(const_int 0) (const_int 1) > (const_int 2) (const_int 3)]))))] > "TARGET_SSE4_1" > > Patch2: > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md > index d907353..b3721c4 100644 > --- a/gcc/config/i386/sse.md > +++ b/gcc/config/i386/sse.md > @@ -11852,10 +11852,10 @@ > (set_attr "mode" "OI")]) > > (define_insn "sse4_1_<code>v4hiv4si2" > - [(set (match_operand:V4SI 0 "register_operand" "=x") > + [(set (match_operand:V4SI 0 "register_operand" "=Yr*x") > (any_extend:V4SI > (vec_select:V4HI > - (match_operand:V8HI 1 "nonimmediate_operand" "xm") > + (match_operand:V8HI 1 "nonimmediate_operand" "Yr*xm") > (parallel [(const_int 0) (const_int 1) > (const_int 2) (const_int 3)]))))] > "TARGET_SSE4_1" > > > Here are results of looking for pmovzxwd in resulting binaries: > #objdump -d tcpjumbo-orig | grep pmovzxwd | grep "xmm8\|xmm9\|xmm10\|xmm11\|xmm12\|xmm13\|xmm14\|xmm15" | wc -l > 76 > #objdump -d tcpjumbo-patch1 | grep pmovzxwd | grep "xmm8\|xmm9\|xmm10\|xmm11\|xmm12\|xmm13\|xmm14\|xmm15" | wc -l > 0 > #objdump -d tcpjumbo-patch2 | grep pmovzxwd | grep "xmm8\|xmm9\|xmm10\|xmm11\|xmm12\|xmm13\|xmm14\|xmm15" | wc -l > 76 > > Therefore I make a conclusion that Yr*x does not really differ much from x. Just FTR: Using "Yr,*x" is also a viable option: #objdump -d tcpjumbo-patch3 | grep pmovzxwd | grep "xmm8\|xmm9\|xmm10\|xmm11\|xmm12\|xmm13\|xmm14\|xmm15" | wc -l 0 I believe that the above is the way to go with LRA. Vladimir, what do you think? Uros.
Hi, Having stage1 close to end, may we make some decision regarding this patch? Having a couple of working variants, may we choose and use one of them? Thanks, Ilya 2014-07-15 17:38 GMT+04:00 Uros Bizjak <ubizjak@gmail.com>: > On Tue, Jul 15, 2014 at 1:01 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote: >> On 15 Jul 10:42, Uros Bizjak wrote: >>> On Tue, Jul 15, 2014 at 10:25 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote: >>> >>> >>>>> Also fully restrict xmm8-15 does not seem right. It is just costly >>> >>>>> but not fully disallowed. >>> >>>> >>> >>>> As said earlier, you can try "Ya*x" as a constraint. >>> >>> >>> >>> I tried it. It does not seem to affect allocation much. I do not see >>> >>> any gain on targeted tests. >>> >> >>> >> Strange, because the documentation claims: >>> >> >>> >> '*' >>> >> Says that the following character should be ignored when choosing >>> >> register preferences. '*' has no effect on the meaning of the >>> >> constraint as a constraint, and no effect on reloading. For LRA >>> >> '*' additionally disparages slightly the alternative if the >>> >> following character matches the operand. >>> >> >>> >> Let me rethink this a bit. Prehaps we could reconsider Jakub's >>> >> proposal with "Ya,!x" (with two alternatives). IIRC this approach was >>> >> needed for some MMX alternatives, where we didn't want RA to allocate >>> >> a MMX register when the value could be passed in integer regs, but the >>> >> value was still allowed in MMX register. >>> > >>> > That's is what my patch already does, but with '?' instead of '!'. >>> >>> Yes, I know. The problem is, that Ya*x type conditional allocation >>> worked OK in the past for "not preferred, but still alowed regclass" >>> registers, There are several patterns in i386.md that live by this >>> premise, including movsf_internal and movdf_internal. If this approach >>> doesn't work anymore, then we have to either figure out what is the >>> reason, or invent a new strategy that will be applicable to all cases. >>> >>> Can you please post a small test that illustrates the case where Ya,!x >>> works, but Ya*x doesn't? >> >> It's hard to compose a small testcase which will have SSE4 instructions generated with required register usage. I use tcpjumbo test from TCPmark for initial check of how my patch works. This test has a lot of pmovzxwd instructions generated and many of them use xmm8-15. I tried two versions of a simple patch which modifies only pmovzxwd instruction. >> >> Patch1: >> >> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md >> index d907353..6b03b72 100644 >> --- a/gcc/config/i386/sse.md >> +++ b/gcc/config/i386/sse.md >> @@ -11852,10 +11852,10 @@ >> (set_attr "mode" "OI")]) >> >> (define_insn "sse4_1_<code>v4hiv4si2" >> - [(set (match_operand:V4SI 0 "register_operand" "=x") >> + [(set (match_operand:V4SI 0 "register_operand" "=Yr,!x") >> (any_extend:V4SI >> (vec_select:V4HI >> - (match_operand:V8HI 1 "nonimmediate_operand" "xm") >> + (match_operand:V8HI 1 "nonimmediate_operand" "Yr,!xm") >> (parallel [(const_int 0) (const_int 1) >> (const_int 2) (const_int 3)]))))] >> "TARGET_SSE4_1" >> >> Patch2: >> >> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md >> index d907353..b3721c4 100644 >> --- a/gcc/config/i386/sse.md >> +++ b/gcc/config/i386/sse.md >> @@ -11852,10 +11852,10 @@ >> (set_attr "mode" "OI")]) >> >> (define_insn "sse4_1_<code>v4hiv4si2" >> - [(set (match_operand:V4SI 0 "register_operand" "=x") >> + [(set (match_operand:V4SI 0 "register_operand" "=Yr*x") >> (any_extend:V4SI >> (vec_select:V4HI >> - (match_operand:V8HI 1 "nonimmediate_operand" "xm") >> + (match_operand:V8HI 1 "nonimmediate_operand" "Yr*xm") >> (parallel [(const_int 0) (const_int 1) >> (const_int 2) (const_int 3)]))))] >> "TARGET_SSE4_1" >> >> >> Here are results of looking for pmovzxwd in resulting binaries: >> #objdump -d tcpjumbo-orig | grep pmovzxwd | grep "xmm8\|xmm9\|xmm10\|xmm11\|xmm12\|xmm13\|xmm14\|xmm15" | wc -l >> 76 >> #objdump -d tcpjumbo-patch1 | grep pmovzxwd | grep "xmm8\|xmm9\|xmm10\|xmm11\|xmm12\|xmm13\|xmm14\|xmm15" | wc -l >> 0 >> #objdump -d tcpjumbo-patch2 | grep pmovzxwd | grep "xmm8\|xmm9\|xmm10\|xmm11\|xmm12\|xmm13\|xmm14\|xmm15" | wc -l >> 76 >> >> Therefore I make a conclusion that Yr*x does not really differ much from x. > > Just FTR: > > Using "Yr,*x" is also a viable option: > > #objdump -d tcpjumbo-patch3 | grep pmovzxwd | grep > "xmm8\|xmm9\|xmm10\|xmm11\|xmm12\|xmm13\|xmm14\|xmm15" | wc -l > 0 > > I believe that the above is the way to go with LRA. Vladimir, what do you think? > > Uros.
On Wed, Nov 5, 2014 at 10:35 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote: > Hi, > > Having stage1 close to end, may we make some decision regarding this > patch? Having a couple of working variants, may we choose and use one > of them? I propose to wait for Vlad for an update about his plans on register preference algorythm that would fix this (and other "Ya*r"-type issues). In the absence of the fix, we'll go with "Yr,*x". Uros. > > Thanks, > Ilya > > 2014-07-15 17:38 GMT+04:00 Uros Bizjak <ubizjak@gmail.com>: >> On Tue, Jul 15, 2014 at 1:01 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote: >>> On 15 Jul 10:42, Uros Bizjak wrote: >>>> On Tue, Jul 15, 2014 at 10:25 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote: >>>> >>>> >>>>> Also fully restrict xmm8-15 does not seem right. It is just costly >>>> >>>>> but not fully disallowed. >>>> >>>> >>>> >>>> As said earlier, you can try "Ya*x" as a constraint. >>>> >>> >>>> >>> I tried it. It does not seem to affect allocation much. I do not see >>>> >>> any gain on targeted tests. >>>> >> >>>> >> Strange, because the documentation claims: >>>> >> >>>> >> '*' >>>> >> Says that the following character should be ignored when choosing >>>> >> register preferences. '*' has no effect on the meaning of the >>>> >> constraint as a constraint, and no effect on reloading. For LRA >>>> >> '*' additionally disparages slightly the alternative if the >>>> >> following character matches the operand. >>>> >> >>>> >> Let me rethink this a bit. Prehaps we could reconsider Jakub's >>>> >> proposal with "Ya,!x" (with two alternatives). IIRC this approach was >>>> >> needed for some MMX alternatives, where we didn't want RA to allocate >>>> >> a MMX register when the value could be passed in integer regs, but the >>>> >> value was still allowed in MMX register. >>>> > >>>> > That's is what my patch already does, but with '?' instead of '!'. >>>> >>>> Yes, I know. The problem is, that Ya*x type conditional allocation >>>> worked OK in the past for "not preferred, but still alowed regclass" >>>> registers, There are several patterns in i386.md that live by this >>>> premise, including movsf_internal and movdf_internal. If this approach >>>> doesn't work anymore, then we have to either figure out what is the >>>> reason, or invent a new strategy that will be applicable to all cases. >>>> >>>> Can you please post a small test that illustrates the case where Ya,!x >>>> works, but Ya*x doesn't? >>> >>> It's hard to compose a small testcase which will have SSE4 instructions generated with required register usage. I use tcpjumbo test from TCPmark for initial check of how my patch works. This test has a lot of pmovzxwd instructions generated and many of them use xmm8-15. I tried two versions of a simple patch which modifies only pmovzxwd instruction. >>> >>> Patch1: >>> >>> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md >>> index d907353..6b03b72 100644 >>> --- a/gcc/config/i386/sse.md >>> +++ b/gcc/config/i386/sse.md >>> @@ -11852,10 +11852,10 @@ >>> (set_attr "mode" "OI")]) >>> >>> (define_insn "sse4_1_<code>v4hiv4si2" >>> - [(set (match_operand:V4SI 0 "register_operand" "=x") >>> + [(set (match_operand:V4SI 0 "register_operand" "=Yr,!x") >>> (any_extend:V4SI >>> (vec_select:V4HI >>> - (match_operand:V8HI 1 "nonimmediate_operand" "xm") >>> + (match_operand:V8HI 1 "nonimmediate_operand" "Yr,!xm") >>> (parallel [(const_int 0) (const_int 1) >>> (const_int 2) (const_int 3)]))))] >>> "TARGET_SSE4_1" >>> >>> Patch2: >>> >>> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md >>> index d907353..b3721c4 100644 >>> --- a/gcc/config/i386/sse.md >>> +++ b/gcc/config/i386/sse.md >>> @@ -11852,10 +11852,10 @@ >>> (set_attr "mode" "OI")]) >>> >>> (define_insn "sse4_1_<code>v4hiv4si2" >>> - [(set (match_operand:V4SI 0 "register_operand" "=x") >>> + [(set (match_operand:V4SI 0 "register_operand" "=Yr*x") >>> (any_extend:V4SI >>> (vec_select:V4HI >>> - (match_operand:V8HI 1 "nonimmediate_operand" "xm") >>> + (match_operand:V8HI 1 "nonimmediate_operand" "Yr*xm") >>> (parallel [(const_int 0) (const_int 1) >>> (const_int 2) (const_int 3)]))))] >>> "TARGET_SSE4_1" >>> >>> >>> Here are results of looking for pmovzxwd in resulting binaries: >>> #objdump -d tcpjumbo-orig | grep pmovzxwd | grep "xmm8\|xmm9\|xmm10\|xmm11\|xmm12\|xmm13\|xmm14\|xmm15" | wc -l >>> 76 >>> #objdump -d tcpjumbo-patch1 | grep pmovzxwd | grep "xmm8\|xmm9\|xmm10\|xmm11\|xmm12\|xmm13\|xmm14\|xmm15" | wc -l >>> 0 >>> #objdump -d tcpjumbo-patch2 | grep pmovzxwd | grep "xmm8\|xmm9\|xmm10\|xmm11\|xmm12\|xmm13\|xmm14\|xmm15" | wc -l >>> 76 >>> >>> Therefore I make a conclusion that Yr*x does not really differ much from x. >> >> Just FTR: >> >> Using "Yr,*x" is also a viable option: >> >> #objdump -d tcpjumbo-patch3 | grep pmovzxwd | grep >> "xmm8\|xmm9\|xmm10\|xmm11\|xmm12\|xmm13\|xmm14\|xmm15" | wc -l >> 0 >> >> I believe that the above is the way to go with LRA. Vladimir, what do you think? >> >> Uros.
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index d907353..6b03b72 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -11852,10 +11852,10 @@ (set_attr "mode" "OI")]) (define_insn "sse4_1_<code>v4hiv4si2" - [(set (match_operand:V4SI 0 "register_operand" "=x") + [(set (match_operand:V4SI 0 "register_operand" "=Yr,!x") (any_extend:V4SI (vec_select:V4HI - (match_operand:V8HI 1 "nonimmediate_operand" "xm") + (match_operand:V8HI 1 "nonimmediate_operand" "Yr,!xm") (parallel [(const_int 0) (const_int 1) (const_int 2) (const_int 3)]))))] "TARGET_SSE4_1" Patch2: diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index d907353..b3721c4 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -11852,10 +11852,10 @@ (set_attr "mode" "OI")]) (define_insn "sse4_1_<code>v4hiv4si2" - [(set (match_operand:V4SI 0 "register_operand" "=x") + [(set (match_operand:V4SI 0 "register_operand" "=Yr*x") (any_extend:V4SI (vec_select:V4HI - (match_operand:V8HI 1 "nonimmediate_operand" "xm") + (match_operand:V8HI 1 "nonimmediate_operand" "Yr*xm") (parallel [(const_int 0) (const_int 1) (const_int 2) (const_int 3)]))))] "TARGET_SSE4_1"