diff mbox

[IRA] Fix ALLOCNO_MODE in the case of paradoxical subreg.

Message ID CA+4CFy73kNt8naM0Ds_8q7cpCn77aTtVRgCHUBiSFacJMQ+L7g@mail.gmail.com
State New
Headers show

Commit Message

Wei Mi Sept. 24, 2013, 11:57 p.m. UTC
Hi,

This patch is to address the problem described here:
http://gcc.gnu.org/ml/gcc/2013-09/msg00187.html

The patch changes ALLOCNO_MODE of a pseudo reg to be outermode if the
pseudo reg is used in a paradoxical subreg, so IRA will not mistakenly
assign an operand with a bigger mode to a smaller hardreg which
couldn't find a pair register.

No test is added because I cannot create a small testcase to reproduce
the problem on trunk, the difficulty of which was described in the
above post.

bootstrap and regression pass. ok for trunk?

Thanks,
Wei Mi.

2013-09-24  Wei Mi  <wmi@google.com>

        * ira-build.c (create_insn_allocnos): Fix ALLOCNO_MODE
        in the case of paradoxical subreg.

Comments

Vladimir Makarov Sept. 25, 2013, 3:44 p.m. UTC | #1
On 09/24/2013 07:57 PM, Wei Mi wrote:
> Hi,
>
> This patch is to address the problem described here:
> http://gcc.gnu.org/ml/gcc/2013-09/msg00187.html
>
> The patch changes ALLOCNO_MODE of a pseudo reg to be outermode if the
> pseudo reg is used in a paradoxical subreg, so IRA will not mistakenly
> assign an operand with a bigger mode to a smaller hardreg which
> couldn't find a pair register.
>
> No test is added because I cannot create a small testcase to reproduce
> the problem on trunk, the difficulty of which was described in the
> above post.
>
> bootstrap and regression pass. ok for trunk?
>
> Thanks,
> Wei Mi.
>
> 2013-09-24  Wei Mi  <wmi@google.com>
>
>         * ira-build.c (create_insn_allocnos): Fix ALLOCNO_MODE
>         in the case of paradoxical subreg.
>
> Index: ira-build.c
> ===================================================================
> --- ira-build.c (revision 201963)
> +++ ira-build.c (working copy)
> @@ -1688,6 +1688,30 @@ create_insn_allocnos (rtx x, bool output
>         }
>        return;
>      }
> +  else if (code == SUBREG)
> +    {
> +      int regno;
> +      rtx subreg_reg = SUBREG_REG (x);
> +      enum machine_mode outermode, innermode;
> +
> +      create_insn_allocnos (subreg_reg, output_p);
> +      /* For paradoxical subreg, set allocno's mode to be
> +        the outermode.  */
> +      outermode = GET_MODE (x);
> +      innermode = GET_MODE (subreg_reg);
> +      if (REG_P (subreg_reg)
> +         && (GET_MODE_SIZE (outermode) > GET_MODE_SIZE (innermode)))
> +       {
> +         regno = REGNO (subreg_reg);
> +         if (regno >= FIRST_PSEUDO_REGISTER)
> +           {
> +             ira_allocno_t a = ira_curr_regno_allocno_map[regno];
> +             ira_assert (a != NULL);
> +             ALLOCNO_MODE (a) = outermode;
> +           }
> +       }
> +      return;
> +    }
>    else if (code == SET)
>      {
>        create_insn_allocnos (SET_DEST (x), true);
Wei Mi, sorry for misleading you.  I thought more about the problem.  It
is not a trivial one.

The patch is ok for the code correctness but it might hurt code
performance. For example, we have code

... (reg:DI) ...
...
... (subreg:TI (reg:DI))
...
...(reg:DI)

We need two hard regs only for the second place by transforming

p = (reg:DI)

...(subreg:TI p)

With this patch we requires two hard regs for the all live range of the
original pseudo (which can be quite long).  It might considerably worsen
code performance.

So the problem could be fixed in LRA which can make this transformation
or even in IRA (that would be even better as we put pseudo P into the
global picture vs. probably spilling some pseudos for P in LRA).

I need some time to think what is better (e.g. I don't know how to
implement it in IRA without big compilation slow down).  In any case,
the solution for the problem will be not that easy as in the patch.

Sorry again.
Wei Mi Sept. 25, 2013, 4:42 p.m. UTC | #2
> performance. For example, we have code
>
> ... (reg:DI) ...
> ...
> ... (subreg:TI (reg:DI))
> ...
> ...(reg:DI)
>
> We need two hard regs only for the second place by transforming
>
> p = (reg:DI)
>
> ...(subreg:TI p)
>
> With this patch we requires two hard regs for the all live range of the
> original pseudo (which can be quite long).  It might considerably worsen
> code performance.
>
> So the problem could be fixed in LRA which can make this transformation
> or even in IRA (that would be even better as we put pseudo P into the
> global picture vs. probably spilling some pseudos for P in LRA).
>

Thanks for your detailed explanation. Now I understand what you concern here.

> I need some time to think what is better (e.g. I don't know how to
> implement it in IRA without big compilation slow down).  In any case,
> the solution for the problem will be not that easy as in the patch.

To fix it in IRA, it looks like we want a live range splitting pass
for pseudos used in paradoxical subreg here. Is the potential
compilation slow down you mention here caused by more allocnos
introduced by the live range splitting, or something else?

Thanks,
Wei Mi.
Vladimir Makarov Sept. 25, 2013, 5:19 p.m. UTC | #3
On 09/25/2013 12:42 PM, Wei Mi wrote:
>> performance. For example, we have code
>>
>> ... (reg:DI) ...
>> ...
>> ... (subreg:TI (reg:DI))
>> ...
>> ...(reg:DI)
>>
>> We need two hard regs only for the second place by transforming
>>
>> p = (reg:DI)
>>
>> ...(subreg:TI p)
>>
>> With this patch we requires two hard regs for the all live range of the
>> original pseudo (which can be quite long).  It might considerably worsen
>> code performance.
>>
>> So the problem could be fixed in LRA which can make this transformation
>> or even in IRA (that would be even better as we put pseudo P into the
>> global picture vs. probably spilling some pseudos for P in LRA).
>>
> Thanks for your detailed explanation. Now I understand what you concern here.
>
>> I need some time to think what is better (e.g. I don't know how to
>> implement it in IRA without big compilation slow down).  In any case,
>> the solution for the problem will be not that easy as in the patch.
> To fix it in IRA, it looks like we want a live range splitting pass
> for pseudos used in paradoxical subreg here. Is the potential
> compilation slow down you mention here caused by more allocnos
> introduced by the live range splitting, or something else?
>
>
 To define for what occurrence of the pseudo we should do the
transformation, we need to create allocnos and calculate reg classes to
know what paradoxical subreg needs more hard regs (the transformations
can not be done for all paradoxical subregs as my experience shows many
RTL changes result in worse RA even if we have heuristics to remove the
generated changes as in this case would be trying to assign the same
hard reg for the original and the new pseudo).
  After doing the transformations, we need to recalculate reg classes
and rebuild allocnos (both are expensive).  To speed up the process it
could be implemented as some kind of update of already existing data but
it will complicate code much.

So right now I think implementing this in LRA would be easier  Still LRA
has a pretty good register (re-)allocation (although it is worse than in
IRA).
Wei Mi Sept. 25, 2013, 6 p.m. UTC | #4
>  To define for what occurrence of the pseudo we should do the
> transformation, we need to create allocnos and calculate reg classes to
> know what paradoxical subreg needs more hard regs (the transformations
> can not be done for all paradoxical subregs as my experience shows many
> RTL changes result in worse RA even if we have heuristics to remove the
> generated changes as in this case would be trying to assign the same
> hard reg for the original and the new pseudo).
>   After doing the transformations, we need to recalculate reg classes
> and rebuild allocnos (both are expensive).  To speed up the process it
> could be implemented as some kind of update of already existing data but
> it will complicate code much.
>

I see, thanks!

> So right now I think implementing this in LRA would be easier  Still LRA
> has a pretty good register (re-)allocation (although it is worse than in
> IRA).
>

When you get an idea how to fix it in LRA, if you are still busy, I
would be happy to do the implementation if you could brief your idea.

Thanks,
Wei Mi.
Vladimir Makarov Sept. 25, 2013, 7:30 p.m. UTC | #5
On 09/25/2013 02:00 PM, Wei Mi wrote:
>>  To define for what occurrence of the pseudo we should do the
>> transformation, we need to create allocnos and calculate reg classes to
>> know what paradoxical subreg needs more hard regs (the transformations
>> can not be done for all paradoxical subregs as my experience shows many
>> RTL changes result in worse RA even if we have heuristics to remove the
>> generated changes as in this case would be trying to assign the same
>> hard reg for the original and the new pseudo).
>>   After doing the transformations, we need to recalculate reg classes
>> and rebuild allocnos (both are expensive).  To speed up the process it
>> could be implemented as some kind of update of already existing data but
>> it will complicate code much.
>>
> I see, thanks!
>
>> So right now I think implementing this in LRA would be easier  Still LRA
>> has a pretty good register (re-)allocation (although it is worse than in
>> IRA).
>>
> When you get an idea how to fix it in LRA, if you are still busy, I
> would be happy to do the implementation if you could brief your idea.
>

Probably the best place to add a code for this is in
lra-constraints.c::simplify_operand_subreg by permitting subreg reload
for paradoxical subregs whose hard regs are not fully in allocno class
of the inner pseudo.

It needs a good testing (i'd check that the generated code is not
changed on variety benchmarks to see that the change has no impact on
the most programs performance) and you need to add a good comment
describing why this change is needed.
diff mbox

Patch

Index: ira-build.c
===================================================================
--- ira-build.c (revision 201963)
+++ ira-build.c (working copy)
@@ -1688,6 +1688,30 @@  create_insn_allocnos (rtx x, bool output
        }
       return;
     }
+  else if (code == SUBREG)
+    {
+      int regno;
+      rtx subreg_reg = SUBREG_REG (x);
+      enum machine_mode outermode, innermode;
+
+      create_insn_allocnos (subreg_reg, output_p);
+      /* For paradoxical subreg, set allocno's mode to be
+        the outermode.  */
+      outermode = GET_MODE (x);
+      innermode = GET_MODE (subreg_reg);
+      if (REG_P (subreg_reg)
+         && (GET_MODE_SIZE (outermode) > GET_MODE_SIZE (innermode)))
+       {
+         regno = REGNO (subreg_reg);
+         if (regno >= FIRST_PSEUDO_REGISTER)
+           {
+             ira_allocno_t a = ira_curr_regno_allocno_map[regno];
+             ira_assert (a != NULL);
+             ALLOCNO_MODE (a) = outermode;
+           }
+       }
+      return;
+    }
   else if (code == SET)
     {
       create_insn_allocnos (SET_DEST (x), true);