Message ID | 4E54329B.5040701@redhat.com |
---|---|
State | New |
Headers | show |
Vladimir Makarov <vmakarov@redhat.com> writes: > Instead of using explicitly necessary number of registers, I used > contains_reg_of_mode which also checks the number of necessary registers > but also it checks that the register class can hold value of given > mode. This resulted in different register pressure classes (before the > patch, they were GENERAL_REGS and FLOAT_REGS for x86. They became only > INT_FLOAT_REGS) because it became not costly to hold integer mode value > in FLOAT_REGS. The new register pressure class in own turn resulted in > low register pressure and one region allocation in most cases instead of > multiple region RA. As a consequence, we got a big degradation on Intel > 32 bit targets. Sorry, I know I should be able to work this out, but could you explain in a bit more detail why contains_reg_of_mode (CL1, MODE) was wrong? The loop is calculating costs for moving values of mode MODE into and out of CL1, and I wouldn't have expected those costs to have any meaning if CL1 can't in fact store anything of mode MODE. It just looked at first glance as though: > /* Some subclasses are to small to have enough registers to hold > a value of MODE. Just ignore them. */ > - if (! contains_reg_of_mode[cl1][mode]) > + if (ira_reg_class_max_nregs[cl1][mode] > ira_available_class_regs[cl1]) > continue; > COPY_HARD_REG_SET (temp_hard_regset, reg_class_contents[cl1]); > AND_COMPL_HARD_REG_SET (temp_hard_regset, no_unit_alloc_regs); expects CLASS_MAX_NREGS (CL1, MODE) to have a certain meaning even if CL1 can't store values of mode MODE, whereas I'd assumed it was undefined in that case. Richard
On 08/25/2011 05:57 AM, Richard Sandiford wrote: > Vladimir Makarov<vmakarov@redhat.com> writes: >> Instead of using explicitly necessary number of registers, I used >> contains_reg_of_mode which also checks the number of necessary registers >> but also it checks that the register class can hold value of given >> mode. This resulted in different register pressure classes (before the >> patch, they were GENERAL_REGS and FLOAT_REGS for x86. They became only >> INT_FLOAT_REGS) because it became not costly to hold integer mode value >> in FLOAT_REGS. The new register pressure class in own turn resulted in >> low register pressure and one region allocation in most cases instead of >> multiple region RA. As a consequence, we got a big degradation on Intel >> 32 bit targets. > Sorry, I know I should be able to work this out, but could you explain > in a bit more detail why contains_reg_of_mode (CL1, MODE) was wrong? > The loop is calculating costs for moving values of mode MODE into and > out of CL1, and I wouldn't have expected those costs to have any meaning > if CL1 can't in fact store anything of mode MODE. > Here is x86 example. For an integer mode it excludes FLOAT_REGS from updating max register move cost for two FLOAT_INT_REGS and the integer mode in the loop. At the end of function in another loop where ira_register_move_cost is defined more accurately from ira_max_register_move_cost, it results in smaller ira_register_move_cost for two FLOAT_INT_REGS and integer modes than ira_memory_move_cost for FLOAT_INT_REGS and integer mode. And the last results in one pressure class FLOAT_INT_REGS instead of GENERAL_REGS and FLOAT_REGS as it should be. It is a very complicated area. In previous versions of IRA, I tried to calculate cover classes. It never worked because the right cover classes was too critical for a good code and therefore I introduced a new macro for definition of cover classes. After introducing IRA without cover classes, I decided to calculate pressure classes because their accuracy were not critical (especially for targets with moderate or large size) and because the algorithm worked as I wanted (I checked a lot of targets but not all). If we have more troubles with pressure classes calculation, I think we could reconsider the approach and define pressure classes through a macro/hook. > It just looked at first glance as though: > >> /* Some subclasses are to small to have enough registers to hold >> a value of MODE. Just ignore them. */ >> - if (! contains_reg_of_mode[cl1][mode]) >> + if (ira_reg_class_max_nregs[cl1][mode]> ira_available_class_regs[cl1]) >> continue; >> COPY_HARD_REG_SET (temp_hard_regset, reg_class_contents[cl1]); >> AND_COMPL_HARD_REG_SET (temp_hard_regset, no_unit_alloc_regs); > expects CLASS_MAX_NREGS (CL1, MODE) to have a certain meaning even > if CL1 can't store values of mode MODE, whereas I'd assumed it > was undefined in that case. > It is a bit ambiguous area. The documentation said that it must be not zero even if the register can not hold the value. The usual (and recommended) definition is based on relation of #bit in value of given mode and #bits in register.
Vladimir Makarov <vmakarov@redhat.com> writes: > On 08/25/2011 05:57 AM, Richard Sandiford wrote: >> Vladimir Makarov<vmakarov@redhat.com> writes: >>> Instead of using explicitly necessary number of registers, I used >>> contains_reg_of_mode which also checks the number of necessary registers >>> but also it checks that the register class can hold value of given >>> mode. This resulted in different register pressure classes (before the >>> patch, they were GENERAL_REGS and FLOAT_REGS for x86. They became only >>> INT_FLOAT_REGS) because it became not costly to hold integer mode value >>> in FLOAT_REGS. The new register pressure class in own turn resulted in >>> low register pressure and one region allocation in most cases instead of >>> multiple region RA. As a consequence, we got a big degradation on Intel >>> 32 bit targets. >> Sorry, I know I should be able to work this out, but could you explain >> in a bit more detail why contains_reg_of_mode (CL1, MODE) was wrong? >> The loop is calculating costs for moving values of mode MODE into and >> out of CL1, and I wouldn't have expected those costs to have any meaning >> if CL1 can't in fact store anything of mode MODE. >> > Here is x86 example. For an integer mode it excludes FLOAT_REGS from > updating max register move cost for two FLOAT_INT_REGS and the integer > mode in the loop. At the end of function in another loop where > ira_register_move_cost is defined more accurately from > ira_max_register_move_cost, it results in smaller ira_register_move_cost > for two FLOAT_INT_REGS and integer modes than ira_memory_move_cost for > FLOAT_INT_REGS and integer mode. But isn't that correct though? If FLOAT_REGS can't store integer modes, and if FLOAT_INT_REGS is the union of FLOAT_REGS and INT_REGS, then doesn't it follow that the move cost for FLOAT_INT_REGS should be the same as for INT_REGS? To be clear, I'm not disputing that the pressure class changes were undesirable. I'm just not sure why changing the move costs in this way is the right way to get back to the desired pressure classes. If FLOAT_REGS can't store integer modes, then it doesn't seem like we should be taking the reported move cost for FLOAT_REGS and integer modes into account. I'm just not sure that those costs (or CLASS_MAX_NREGS) are meaningful in this case. Richard
On 08/25/2011 02:07 PM, Richard Sandiford wrote: > Vladimir Makarov<vmakarov@redhat.com> writes: >> On 08/25/2011 05:57 AM, Richard Sandiford wrote: >>> Vladimir Makarov<vmakarov@redhat.com> writes: >>>> Instead of using explicitly necessary number of registers, I used >>>> contains_reg_of_mode which also checks the number of necessary registers >>>> but also it checks that the register class can hold value of given >>>> mode. This resulted in different register pressure classes (before the >>>> patch, they were GENERAL_REGS and FLOAT_REGS for x86. They became only >>>> INT_FLOAT_REGS) because it became not costly to hold integer mode value >>>> in FLOAT_REGS. The new register pressure class in own turn resulted in >>>> low register pressure and one region allocation in most cases instead of >>>> multiple region RA. As a consequence, we got a big degradation on Intel >>>> 32 bit targets. >>> Sorry, I know I should be able to work this out, but could you explain >>> in a bit more detail why contains_reg_of_mode (CL1, MODE) was wrong? >>> The loop is calculating costs for moving values of mode MODE into and >>> out of CL1, and I wouldn't have expected those costs to have any meaning >>> if CL1 can't in fact store anything of mode MODE. >>> >> Here is x86 example. For an integer mode it excludes FLOAT_REGS from >> updating max register move cost for two FLOAT_INT_REGS and the integer >> mode in the loop. At the end of function in another loop where >> ira_register_move_cost is defined more accurately from >> ira_max_register_move_cost, it results in smaller ira_register_move_cost >> for two FLOAT_INT_REGS and integer modes than ira_memory_move_cost for >> FLOAT_INT_REGS and integer mode. > But isn't that correct though? If FLOAT_REGS can't store integer modes, > and if FLOAT_INT_REGS is the union of FLOAT_REGS and INT_REGS, then doesn't > it follow that the move cost for FLOAT_INT_REGS should be the same > as for INT_REGS? > I don't think it should be the same for the current algorithm of register pressure classes calculation, otherwise we will have FLOAT_INT_REGS as a single pressure class. Although the algorithm could be changed to follow your logic. It might be not an easy task. I spent a lot of time to write this code (and even much more time when I tried to write a code for cover classes calculation), so I'd rather avoid to this again. > To be clear, I'm not disputing that the pressure class changes were > undesirable. I'm just not sure why changing the move costs in this > way is the right way to get back to the desired pressure classes. > > If FLOAT_REGS can't store integer modes, then it doesn't seem like we > should be taking the reported move cost for FLOAT_REGS and integer modes > into account. I'm just not sure that those costs (or CLASS_MAX_NREGS) > are meaningful in this case. > There is a certain logic in your speculations. But for some reasons, they are defined. May be reload/old RA are/were based on these not meaningful costs. Now I think I should try to redesign this pressure reg class calculation. Again it will be not an easy task because, unfortunately, target definitions for costs are a real mess. They usually are not correctly defined for all classes and (even really possible) modes. Richard, thanks for the discussion. As usual it was always useful for me.
Index: ira.c =================================================================== --- ira.c (revision 177968) +++ ira.c (working copy) @@ -1503,7 +1503,7 @@ ira_init_register_move_cost (enum machin { /* Some subclasses are to small to have enough registers to hold a value of MODE. Just ignore them. */ - if (! contains_reg_of_mode[cl1][mode]) + if (ira_reg_class_max_nregs[cl1][mode] > ira_available_class_regs[cl1]) continue; COPY_HARD_REG_SET (temp_hard_regset, reg_class_contents[cl1]); AND_COMPL_HARD_REG_SET (temp_hard_regset, no_unit_alloc_regs);