Message ID | 54218928.4040709@redhat.com |
---|---|
State | New |
Headers | show |
On Tue, Sep 23, 2014 at 4:52 PM, Vladimir Makarov <vmakarov@redhat.com> wrote: > On 09/23/2014 02:07 AM, Uros Bizjak wrote: >> On Tue, Sep 23, 2014 at 3:26 AM, Vladimir Makarov <vmakarov@redhat.com> wrote: >>> The previous patch to solve PR61360 fixed the problem in IRA (it was >>> easier for me to do as I know the code well) >>> >>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61360 >>> >>> Although imo it was an ok fix, Richard expressed concerns with the patch >>> and the practice to have different enable attribute values depending on the >>> current pass. >>> >>> I don't understand why "x,m" alternative is better to "x,r" and "x,r" >>> should be disabled. Even if the path from general regs to sse regs is slow >>> (usually such slow path is implemented internally by micro-architecture >>> through cache). "x,r" alternative results in only smaller insns (including >>> number of insns) with probably the same time for the movement. So "x,r" >>> should be at least no slower, insn cache should have more locality, and less >>> overhead for decoding/translating insns. >>> >>> Here I propose another solution avoiding to have different enable >>> attribute values. >>> >>> The patch was successfully bootstrapped on x86/x86-64 and tested with and >>> without -march=amdfam10 (actually the patch results in 2 less failures when >>> -march=amdfam10 were used). >>> >>> Uros, is i386.md change ok for the trunk? >> I don't think so. This would be a regression, since 4.8 (and later >> versions until Richard's patch) were able to handle this functionality >> just fine. Please also note that there are a couple of other patterns >> with the same problem, that is using ("nonimmediate_operand" "m") >> constraint. >> >> Please see PR 60704, comment 7 [1]. If LRA is able to fixup >> ("nonimmediate_operand" "m") to a memory from a register, then other >> RTL infrastructure should also be updated for this functionality. IMO, >> recog should be fixed/enhanced, so pseudo registers will also satisfy >> ("nonimmediate_operand" "m") constraint before LRA. >> >> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60704#c7 >> >> > Uros, my patch does not result in PR60704 (I tested it before submitting > the patch). No, we didn't understand each other. The fix for PR60704 (enablement of register alternative before LRA) caused PR61360. As I argued in the referred comment of PR61360, the original fix was wrong, and should be reverted. So, the condition should simply read: > (set (attr "enabled") > (cond [(eq_attr "alternative" "0") > (symbol_ref "TARGET_MIX_SSE_I387 > && X87_ENABLE_FLOAT (<MODEF:MODE>mode, > <SWI48:MODE>mode)") > (eq_attr "alternative" "1") > (symbol_ref "TARGET_INTER_UNIT_CONVERSIONS > || optimize_function_for_size_p (cfun)") > ] > (symbol_ref "true"))) > > Also I don't understand why you are mentioning only "m" alternative as I > enabled another alternative "x,r" in original pattern (alternative "1" > in other words alternative in the middle). This part of the comment refers to: (define_insn "*float<SWI48x:mode><MODEF:mode>2_i387" (as mention in the #c7 comment of PR60704). > You are right constrain_operands is not upto LRA possibilities and we should make the following change: > > Index: recog.c > =================================================================== > --- recog.c (revision 215337) > +++ recog.c (working copy) > @@ -2639,7 +2639,10 @@ constrain_operands (int strict) > || (strict < 0 && CONSTANT_P (op)) > /* During reload, accept a pseudo */ > || (reload_in_progress && REG_P (op) > - && REGNO (op) >= FIRST_PSEUDO_REGISTER))) > + && REGNO (op) >= FIRST_PSEUDO_REGISTER) > + /* LRA can put reg value into memory if > + it is necessary. */ > + || (strict <= 0 && targetm.lra_p () && REG_P (op))) > win = 1; > else if (insn_extra_address_constraint (cn) > /* Every address operand can be reloaded to fit. */ > > But that is a different story (for insns with single alternative containing only "m"). > > I guess I should submit such change for recog.c as a separate patch. I think that the above is the right approach to fix PR60704, so the current PR60704 fix [1] should be reverted. [1] https://gcc.gnu.org/viewcvs/gcc/trunk/gcc/config/i386/i386.md?r1=208989&r2=208988&pathrev=208989 Uros.
On 09/23/2014 11:02 AM, Uros Bizjak wrote: > On Tue, Sep 23, 2014 at 4:52 PM, Vladimir Makarov <vmakarov@redhat.com> wrote: >> On 09/23/2014 02:07 AM, Uros Bizjak wrote: >>> >> Uros, my patch does not result in PR60704 (I tested it before submitting >> the patch). > No, we didn't understand each other. The fix for PR60704 (enablement > of register alternative before LRA) caused PR61360. As I argued in the > referred comment of PR61360, the original fix was wrong, and should be > reverted. So, the condition should simply read: > >> (set (attr "enabled") >> (cond [(eq_attr "alternative" "0") >> (symbol_ref "TARGET_MIX_SSE_I387 >> && X87_ENABLE_FLOAT (<MODEF:MODE>mode, >> <SWI48:MODE>mode)") >> (eq_attr "alternative" "1") >> (symbol_ref "TARGET_INTER_UNIT_CONVERSIONS >> || optimize_function_for_size_p (cfun)") >> ] >> (symbol_ref "true"))) >> >> Also I don't understand why you are mentioning only "m" alternative as I >> enabled another alternative "x,r" in original pattern (alternative "1" >> in other words alternative in the middle). > This part of the comment refers to: > > (define_insn "*float<SWI48x:mode><MODEF:mode>2_i387" > > (as mention in the #c7 comment of PR60704). Ok, I see. >> You are right constrain_operands is not upto LRA possibilities and we should make the following change: >> >> Index: recog.c >> =================================================================== >> --- recog.c (revision 215337) >> +++ recog.c (working copy) >> @@ -2639,7 +2639,10 @@ constrain_operands (int strict) >> || (strict < 0 && CONSTANT_P (op)) >> /* During reload, accept a pseudo */ >> || (reload_in_progress && REG_P (op) >> - && REGNO (op) >= FIRST_PSEUDO_REGISTER))) >> + && REGNO (op) >= FIRST_PSEUDO_REGISTER) >> + /* LRA can put reg value into memory if >> + it is necessary. */ >> + || (strict <= 0 && targetm.lra_p () && REG_P (op))) >> win = 1; >> else if (insn_extra_address_constraint (cn) >> /* Every address operand can be reloaded to fit. */ >> >> But that is a different story (for insns with single alternative containing only "m"). >> >> I guess I should submit such change for recog.c as a separate patch. > I think that the above is the right approach to fix PR60704, so the > current PR60704 fix [1] should be reverted. > > [1] https://gcc.gnu.org/viewcvs/gcc/trunk/gcc/config/i386/i386.md?r1=208989&r2=208988&pathrev=208989 > > Ok. I can submit patch reverting it + the change in recog.c. I have still a question: do we really need (eq_attr "alternative" "1") (symbol_ref "TARGET_INTER_UNIT_CONVERSIONS || optimize_function_for_size_p (cfun)") As I wrote I'd always enable the alternative. I don't expect performance improvement in disabling this alternative when path r->x is slow (as I heard it is implemented internally by moving through cache anyway). Even it is slow I believe it is still not faster than r->m->x. What do you think?
On Tue, Sep 23, 2014 at 5:02 PM, Uros Bizjak <ubizjak@gmail.com> wrote: >>> On Tue, Sep 23, 2014 at 3:26 AM, Vladimir Makarov <vmakarov@redhat.com> wrote: >>>> The previous patch to solve PR61360 fixed the problem in IRA (it was >>>> easier for me to do as I know the code well) >>>> >>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61360 >>>> >>>> Although imo it was an ok fix, Richard expressed concerns with the patch >>>> and the practice to have different enable attribute values depending on the >>>> current pass. >>>> >>>> I don't understand why "x,m" alternative is better to "x,r" and "x,r" >>>> should be disabled. Even if the path from general regs to sse regs is slow >>>> (usually such slow path is implemented internally by micro-architecture >>>> through cache). "x,r" alternative results in only smaller insns (including >>>> number of insns) with probably the same time for the movement. So "x,r" >>>> should be at least no slower, insn cache should have more locality, and less >>>> overhead for decoding/translating insns. >>>> >>>> Here I propose another solution avoiding to have different enable >>>> attribute values. >>>> >>>> The patch was successfully bootstrapped on x86/x86-64 and tested with and >>>> without -march=amdfam10 (actually the patch results in 2 less failures when >>>> -march=amdfam10 were used). >>>> >>>> Uros, is i386.md change ok for the trunk? >>> I don't think so. This would be a regression, since 4.8 (and later >>> versions until Richard's patch) were able to handle this functionality >>> just fine. Please also note that there are a couple of other patterns >>> with the same problem, that is using ("nonimmediate_operand" "m") >>> constraint. >>> >>> Please see PR 60704, comment 7 [1]. If LRA is able to fixup >>> ("nonimmediate_operand" "m") to a memory from a register, then other >>> RTL infrastructure should also be updated for this functionality. IMO, >>> recog should be fixed/enhanced, so pseudo registers will also satisfy >>> ("nonimmediate_operand" "m") constraint before LRA. >>> >>> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60704#c7 >>> >>> >> Uros, my patch does not result in PR60704 (I tested it before submitting >> the patch). > > No, we didn't understand each other. The fix for PR60704 (enablement > of register alternative before LRA) caused PR61360. As I argued in the > referred comment of PR61360, the original fix was wrong, and should be > reverted. So, the condition should simply read: > >> (set (attr "enabled") >> (cond [(eq_attr "alternative" "0") >> (symbol_ref "TARGET_MIX_SSE_I387 >> && X87_ENABLE_FLOAT (<MODEF:MODE>mode, >> <SWI48:MODE>mode)") >> (eq_attr "alternative" "1") >> (symbol_ref "TARGET_INTER_UNIT_CONVERSIONS >> || optimize_function_for_size_p (cfun)") >> ] >> (symbol_ref "true"))) >> >> Also I don't understand why you are mentioning only "m" alternative as I >> enabled another alternative "x,r" in original pattern (alternative "1" >> in other words alternative in the middle). > > This part of the comment refers to: > > (define_insn "*float<SWI48x:mode><MODEF:mode>2_i387" > > (as mention in the #c7 comment of PR60704). > >> You are right constrain_operands is not upto LRA possibilities and we should make the following change: >> >> Index: recog.c >> =================================================================== >> --- recog.c (revision 215337) >> +++ recog.c (working copy) >> @@ -2639,7 +2639,10 @@ constrain_operands (int strict) >> || (strict < 0 && CONSTANT_P (op)) >> /* During reload, accept a pseudo */ >> || (reload_in_progress && REG_P (op) >> - && REGNO (op) >= FIRST_PSEUDO_REGISTER))) >> + && REGNO (op) >= FIRST_PSEUDO_REGISTER) >> + /* LRA can put reg value into memory if >> + it is necessary. */ >> + || (strict <= 0 && targetm.lra_p () && REG_P (op))) >> win = 1; >> else if (insn_extra_address_constraint (cn) >> /* Every address operand can be reloaded to fit. */ >> >> But that is a different story (for insns with single alternative containing only "m"). This is also the situation when the fix for PR60704 is reverted. The "r" alternative in "*float<SWI48:mode><MODEF: mode>2_sse" is unconditionally disabled and all remainting alternatives are "m". Uros.
On Tue, Sep 23, 2014 at 5:22 PM, Vladimir Makarov <vmakarov@redhat.com> wrote: >>> You are right constrain_operands is not upto LRA possibilities and we should make the following change: >>> >>> Index: recog.c >>> =================================================================== >>> --- recog.c (revision 215337) >>> +++ recog.c (working copy) >>> @@ -2639,7 +2639,10 @@ constrain_operands (int strict) >>> || (strict < 0 && CONSTANT_P (op)) >>> /* During reload, accept a pseudo */ >>> || (reload_in_progress && REG_P (op) >>> - && REGNO (op) >= FIRST_PSEUDO_REGISTER))) >>> + && REGNO (op) >= FIRST_PSEUDO_REGISTER) >>> + /* LRA can put reg value into memory if >>> + it is necessary. */ >>> + || (strict <= 0 && targetm.lra_p () && REG_P (op))) >>> win = 1; >>> else if (insn_extra_address_constraint (cn) >>> /* Every address operand can be reloaded to fit. */ >>> >>> But that is a different story (for insns with single alternative containing only "m"). >>> >>> I guess I should submit such change for recog.c as a separate patch. >> I think that the above is the right approach to fix PR60704, so the >> current PR60704 fix [1] should be reverted. >> >> [1] https://gcc.gnu.org/viewcvs/gcc/trunk/gcc/config/i386/i386.md?r1=208989&r2=208988&pathrev=208989 >> >> > Ok. I can submit patch reverting it + the change in recog.c. > > I have still a question: do we really need > > (eq_attr "alternative" "1") > (symbol_ref "TARGET_INTER_UNIT_CONVERSIONS > || optimize_function_for_size_p (cfun)") > > As I wrote I'd always enable the alternative. I don't expect performance improvement in disabling this alternative when path r->x is slow (as I heard it is implemented internally by moving through cache anyway). Even it is slow I believe it is still not faster than r->m->x. What do you think? The "r->x" alternative results in "vector" decoding on amdfam10. This is AMD-speak for microcoded instructions, and AMD optimization manual strongly recommends avoiding them. I have CC'd Ganesh, maybe he can provide more relevant data on the performance impact. Thanks, Uros.
On September 23, 2014 5:33:35 PM CEST, Uros Bizjak <ubizjak@gmail.com> wrote: >On Tue, Sep 23, 2014 at 5:22 PM, Vladimir Makarov <vmakarov@redhat.com> >wrote: > >>>> You are right constrain_operands is not upto LRA possibilities and >we should make the following change: >>>> >>>> Index: recog.c >>>> =================================================================== >>>> --- recog.c (revision 215337) >>>> +++ recog.c (working copy) >>>> @@ -2639,7 +2639,10 @@ constrain_operands (int strict) >>>> || (strict < 0 && CONSTANT_P (op)) >>>> /* During reload, accept a pseudo >*/ >>>> || (reload_in_progress && REG_P (op) >>>> - && REGNO (op) >= >FIRST_PSEUDO_REGISTER))) >>>> + && REGNO (op) >= >FIRST_PSEUDO_REGISTER) >>>> + /* LRA can put reg value into memory >if >>>> + it is necessary. */ >>>> + || (strict <= 0 && targetm.lra_p () >&& REG_P (op))) >>>> win = 1; >>>> else if (insn_extra_address_constraint (cn) >>>> /* Every address operand can be reloaded >to fit. */ >>>> >>>> But that is a different story (for insns with single alternative >containing only "m"). >>>> >>>> I guess I should submit such change for recog.c as a separate >patch. >>> I think that the above is the right approach to fix PR60704, so the >>> current PR60704 fix [1] should be reverted. >>> >>> [1] >https://gcc.gnu.org/viewcvs/gcc/trunk/gcc/config/i386/i386.md?r1=208989&r2=208988&pathrev=208989 >>> >>> >> Ok. I can submit patch reverting it + the change in recog.c. >> >> I have still a question: do we really need >> >> (eq_attr "alternative" "1") >> (symbol_ref "TARGET_INTER_UNIT_CONVERSIONS >> || optimize_function_for_size_p (cfun)") >> >> As I wrote I'd always enable the alternative. I don't expect >performance improvement in disabling this alternative when path r->x is >slow (as I heard it is implemented internally by moving through cache >anyway). Even it is slow I believe it is still not faster than >r->m->x. What do you think? > >The "r->x" alternative results in "vector" decoding on amdfam10. This >is AMD-speak for microcoded instructions, and AMD optimization manual >strongly recommends avoiding them. I have CC'd Ganesh, maybe he can >provide more relevant data on the performance impact. IIRC a vector decoded instruction merely limits the frontend which can at most decode and dispatch one such insn at a time. So the performance impact depends on the context. Richard. >Thanks, >Uros.
Index: recog.c =================================================================== --- recog.c (revision 215337) +++ recog.c (working copy) @@ -2639,7 +2639,10 @@ constrain_operands (int strict) || (strict < 0 && CONSTANT_P (op)) /* During reload, accept a pseudo */ || (reload_in_progress && REG_P (op) - && REGNO (op) >= FIRST_PSEUDO_REGISTER))) + && REGNO (op) >= FIRST_PSEUDO_REGISTER) + /* LRA can put reg value into memory if + it is necessary. */ + || (strict <= 0 && targetm.lra_p () && REG_P (op))) win = 1; else if (insn_extra_address_constraint (cn) /* Every address operand can be reloaded to fit. */