Message ID | CA+4CFy73kNt8naM0Ds_8q7cpCn77aTtVRgCHUBiSFacJMQ+L7g@mail.gmail.com |
---|---|
State | New |
Headers | show |
On 09/24/2013 07:57 PM, Wei Mi wrote: > Hi, > > This patch is to address the problem described here: > http://gcc.gnu.org/ml/gcc/2013-09/msg00187.html > > The patch changes ALLOCNO_MODE of a pseudo reg to be outermode if the > pseudo reg is used in a paradoxical subreg, so IRA will not mistakenly > assign an operand with a bigger mode to a smaller hardreg which > couldn't find a pair register. > > No test is added because I cannot create a small testcase to reproduce > the problem on trunk, the difficulty of which was described in the > above post. > > bootstrap and regression pass. ok for trunk? > > Thanks, > Wei Mi. > > 2013-09-24 Wei Mi <wmi@google.com> > > * ira-build.c (create_insn_allocnos): Fix ALLOCNO_MODE > in the case of paradoxical subreg. > > Index: ira-build.c > =================================================================== > --- ira-build.c (revision 201963) > +++ ira-build.c (working copy) > @@ -1688,6 +1688,30 @@ create_insn_allocnos (rtx x, bool output > } > return; > } > + else if (code == SUBREG) > + { > + int regno; > + rtx subreg_reg = SUBREG_REG (x); > + enum machine_mode outermode, innermode; > + > + create_insn_allocnos (subreg_reg, output_p); > + /* For paradoxical subreg, set allocno's mode to be > + the outermode. */ > + outermode = GET_MODE (x); > + innermode = GET_MODE (subreg_reg); > + if (REG_P (subreg_reg) > + && (GET_MODE_SIZE (outermode) > GET_MODE_SIZE (innermode))) > + { > + regno = REGNO (subreg_reg); > + if (regno >= FIRST_PSEUDO_REGISTER) > + { > + ira_allocno_t a = ira_curr_regno_allocno_map[regno]; > + ira_assert (a != NULL); > + ALLOCNO_MODE (a) = outermode; > + } > + } > + return; > + } > else if (code == SET) > { > create_insn_allocnos (SET_DEST (x), true); Wei Mi, sorry for misleading you. I thought more about the problem. It is not a trivial one. The patch is ok for the code correctness but it might hurt code performance. For example, we have code ... (reg:DI) ... ... ... (subreg:TI (reg:DI)) ... ...(reg:DI) We need two hard regs only for the second place by transforming p = (reg:DI) ...(subreg:TI p) With this patch we requires two hard regs for the all live range of the original pseudo (which can be quite long). It might considerably worsen code performance. So the problem could be fixed in LRA which can make this transformation or even in IRA (that would be even better as we put pseudo P into the global picture vs. probably spilling some pseudos for P in LRA). I need some time to think what is better (e.g. I don't know how to implement it in IRA without big compilation slow down). In any case, the solution for the problem will be not that easy as in the patch. Sorry again.
> performance. For example, we have code > > ... (reg:DI) ... > ... > ... (subreg:TI (reg:DI)) > ... > ...(reg:DI) > > We need two hard regs only for the second place by transforming > > p = (reg:DI) > > ...(subreg:TI p) > > With this patch we requires two hard regs for the all live range of the > original pseudo (which can be quite long). It might considerably worsen > code performance. > > So the problem could be fixed in LRA which can make this transformation > or even in IRA (that would be even better as we put pseudo P into the > global picture vs. probably spilling some pseudos for P in LRA). > Thanks for your detailed explanation. Now I understand what you concern here. > I need some time to think what is better (e.g. I don't know how to > implement it in IRA without big compilation slow down). In any case, > the solution for the problem will be not that easy as in the patch. To fix it in IRA, it looks like we want a live range splitting pass for pseudos used in paradoxical subreg here. Is the potential compilation slow down you mention here caused by more allocnos introduced by the live range splitting, or something else? Thanks, Wei Mi.
On 09/25/2013 12:42 PM, Wei Mi wrote: >> performance. For example, we have code >> >> ... (reg:DI) ... >> ... >> ... (subreg:TI (reg:DI)) >> ... >> ...(reg:DI) >> >> We need two hard regs only for the second place by transforming >> >> p = (reg:DI) >> >> ...(subreg:TI p) >> >> With this patch we requires two hard regs for the all live range of the >> original pseudo (which can be quite long). It might considerably worsen >> code performance. >> >> So the problem could be fixed in LRA which can make this transformation >> or even in IRA (that would be even better as we put pseudo P into the >> global picture vs. probably spilling some pseudos for P in LRA). >> > Thanks for your detailed explanation. Now I understand what you concern here. > >> I need some time to think what is better (e.g. I don't know how to >> implement it in IRA without big compilation slow down). In any case, >> the solution for the problem will be not that easy as in the patch. > To fix it in IRA, it looks like we want a live range splitting pass > for pseudos used in paradoxical subreg here. Is the potential > compilation slow down you mention here caused by more allocnos > introduced by the live range splitting, or something else? > > To define for what occurrence of the pseudo we should do the transformation, we need to create allocnos and calculate reg classes to know what paradoxical subreg needs more hard regs (the transformations can not be done for all paradoxical subregs as my experience shows many RTL changes result in worse RA even if we have heuristics to remove the generated changes as in this case would be trying to assign the same hard reg for the original and the new pseudo). After doing the transformations, we need to recalculate reg classes and rebuild allocnos (both are expensive). To speed up the process it could be implemented as some kind of update of already existing data but it will complicate code much. So right now I think implementing this in LRA would be easier Still LRA has a pretty good register (re-)allocation (although it is worse than in IRA).
> To define for what occurrence of the pseudo we should do the > transformation, we need to create allocnos and calculate reg classes to > know what paradoxical subreg needs more hard regs (the transformations > can not be done for all paradoxical subregs as my experience shows many > RTL changes result in worse RA even if we have heuristics to remove the > generated changes as in this case would be trying to assign the same > hard reg for the original and the new pseudo). > After doing the transformations, we need to recalculate reg classes > and rebuild allocnos (both are expensive). To speed up the process it > could be implemented as some kind of update of already existing data but > it will complicate code much. > I see, thanks! > So right now I think implementing this in LRA would be easier Still LRA > has a pretty good register (re-)allocation (although it is worse than in > IRA). > When you get an idea how to fix it in LRA, if you are still busy, I would be happy to do the implementation if you could brief your idea. Thanks, Wei Mi.
On 09/25/2013 02:00 PM, Wei Mi wrote: >> To define for what occurrence of the pseudo we should do the >> transformation, we need to create allocnos and calculate reg classes to >> know what paradoxical subreg needs more hard regs (the transformations >> can not be done for all paradoxical subregs as my experience shows many >> RTL changes result in worse RA even if we have heuristics to remove the >> generated changes as in this case would be trying to assign the same >> hard reg for the original and the new pseudo). >> After doing the transformations, we need to recalculate reg classes >> and rebuild allocnos (both are expensive). To speed up the process it >> could be implemented as some kind of update of already existing data but >> it will complicate code much. >> > I see, thanks! > >> So right now I think implementing this in LRA would be easier Still LRA >> has a pretty good register (re-)allocation (although it is worse than in >> IRA). >> > When you get an idea how to fix it in LRA, if you are still busy, I > would be happy to do the implementation if you could brief your idea. > Probably the best place to add a code for this is in lra-constraints.c::simplify_operand_subreg by permitting subreg reload for paradoxical subregs whose hard regs are not fully in allocno class of the inner pseudo. It needs a good testing (i'd check that the generated code is not changed on variety benchmarks to see that the change has no impact on the most programs performance) and you need to add a good comment describing why this change is needed.
Index: ira-build.c =================================================================== --- ira-build.c (revision 201963) +++ ira-build.c (working copy) @@ -1688,6 +1688,30 @@ create_insn_allocnos (rtx x, bool output } return; } + else if (code == SUBREG) + { + int regno; + rtx subreg_reg = SUBREG_REG (x); + enum machine_mode outermode, innermode; + + create_insn_allocnos (subreg_reg, output_p); + /* For paradoxical subreg, set allocno's mode to be + the outermode. */ + outermode = GET_MODE (x); + innermode = GET_MODE (subreg_reg); + if (REG_P (subreg_reg) + && (GET_MODE_SIZE (outermode) > GET_MODE_SIZE (innermode))) + { + regno = REGNO (subreg_reg); + if (regno >= FIRST_PSEUDO_REGISTER) + { + ira_allocno_t a = ira_curr_regno_allocno_map[regno]; + ira_assert (a != NULL); + ALLOCNO_MODE (a) = outermode; + } + } + return; + } else if (code == SET) { create_insn_allocnos (SET_DEST (x), true);