diff mbox

RFA: patch to fix PR59466 (inefficient address generation on ppc )

Message ID 52A89786.20605@redhat.com
State New
Headers show

Commit Message

Vladimir Makarov Dec. 11, 2013, 4:49 p.m. UTC
The following patch fixes PR59466

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59466

   LRA on PPC sometimes generates inefficient code

addis 9,2,.LC77@toc@ha
addi 9,9,.LC77@toc@l
ld 9,0(9)

   instead of

addis 9,2,.LC77@toc@ha
ld 9,.LC77@toc@l(9)

   I can not create a small test for this.  The smallest file when I
found is bzip2.c from SPEC2000.

   LRA generates move insn with invalid memory [unspec[`*.LC29',%2:DI] 
45] but it can handle it (make it valid) very efficiently after that
trying different numerous transformations.  PPC target machinary
through validize_mem just put all address in a pseudo.

   I could prevent use validize_mem in rs6000.c but I prefer to do it
in change_addres_1 as other targets might have the same problem and it
is better to have one for all solution.

   Still it does not fully solve the problem as insn
r257:DI=[unspec[`*.LC29',%2:DI] 45] cant be recognized as
*movdi... pattern has operand predicates rejecting memory because of
invalid address.  To fix this a change in general_operand is done.  As
LRA can not work properly with regular insn recognition, I added an
assert for this in lra_set_insn_recog_data to figure out this
situation earlier.

   Again, LRA has a very good code for legitimize address by itself and
it is better to use it.

   After applying patch I see code size reduction on SPEC2000.

Before the patch (this is relative reload generated code):

----------------CFP2000-----------------
-0.471%          27171          27043 168.wupwise
-0.457%          14006          13942 171.swim
-0.392%          24515          24419 172.mgrid
  0.226%          85079          85271 173.applu
  0.751%         728891         734363 177.mesa
  0.194%         214357         214773 178.galgel
-0.295%          21683          21619 179.art
  0.412%          31089          31217 183.equake
-0.520%          79976          79560 187.facerec
  0.000%         152504         152504 188.ammp
  0.000%          43758          43758 189.lucas
-0.181%        1062265        1060337 191.fma3d
  1.035%        1041684        1052468 200.sixtrack
-0.105%         151944         151784 301.apsi
Average = 0.0139775%
----------------CINT2000-----------------
  0.252%          76242          76434 164.gzip
  0.172%         186152         186472 175.vpr
-0.215%        2084612        2080132 176.gcc
  0.000%          16716          16716 181.mcf
  0.085%         225316         225508 186.crafty
-0.015%         210100         210068 197.parser
  0.622%         433635         436332 252.eon
-0.298%         762014         759742 253.perlbmk
=0.000%         904784         904784 254.gap
-0.285%         706432         704416 255.vortex
  0.220%          58297          58425 256.bzip2
  0.314%         265334         266166 300.twolf
Average = 0.070863%

After the patch:
----------------CFP2000-----------------
-0.589%          27171          27011 168.wupwise
-0.457%          14006          13942 171.swim
-0.392%          24515          24419 172.mgrid
-0.113%          85079          84983 173.applu
  0.654%         728891         733659 177.mesa
  0.060%         214357         214485 178.galgel
-0.295%          21683          21619 179.art
-0.412%          31089          30961 183.equake
-0.520%          79976          79560 187.facerec
  0.000%         152504         152504 188.ammp
  0.000%          43758          43758 189.lucas
-0.317%        1062265        1058897 191.fma3d
  0.356%        1041684        1045396 200.sixtrack
-0.105%         151944         151784 301.apsi
Average = -0.152103%
----------------CINT2000-----------------
  0.084%          76242          76306 164.gzip
-0.052%         186152         186056 175.vpr
-0.284%        2084612        2078692 176.gcc
  0.000%          16716          16716 181.mcf
-0.312%         225316         224612 186.crafty
-0.091%         210100         209908 197.parser
  0.622%         433635         436332 252.eon
-0.340%         762014         759422 253.perlbmk
  0.000%         904784         904784 254.gap
-0.335%         706432         704064 255.vortex
  0.110%          58297          58361 256.bzip2
-0.241%         265334         264694 300.twolf
Average = -0.070023%

Code size reduction for PPC in this case means also faster code
generation.  I see it but cannot provide reliable SPEC2000 rate
change.


The patch was successfully bootstrapped and tested on i686, x86_64,
and PPC64.

Ok to commit?


2013-12-11  Vladimir Makarov  <vmakarov@redhat.com>

         PR rtl-optimization/59466
         * emit-rtl.c (change_address_1): Don't validate address for LRA.
         * recog.c (general_operand): Accept any memory for LRA.
         * lra.c (lra_set_insn_recog_data): Add an assert.

Comments

Jeff Law Dec. 16, 2013, 4:44 p.m. UTC | #1
On 12/11/13 09:49, Vladimir Makarov wrote:
>
>    I could prevent use validize_mem in rs6000.c but I prefer to do it
> in change_addres_1 as other targets might have the same problem and it
> is better to have one for all solution.
It's certainly been speculated that a fair amount of the "target 
dependent" stuff done in LEGITIMIZE_ADDRESS and 
RELOAD_LEGITIMIZE_ADDRESS could be factored out, parameterized and done 
in a target independent way.


>
>    Still it does not fully solve the problem as insn
> r257:DI=[unspec[`*.LC29',%2:DI] 45] cant be recognized as
> *movdi... pattern has operand predicates rejecting memory because of
> invalid address.  To fix this a change in general_operand is done.  As
> LRA can not work properly with regular insn recognition, I added an
> assert for this in lra_set_insn_recog_data to figure out this
> situation earlier.
I'm getting a bit concerned about the number of places outside LRA that 
test lra_in_progress.  In theory, other code shouldn't want/need to care 
about LRA vs reload.  Both serve the same purpose, lra just does it 
better :-)

Right now the numbers are pretty comparable, but let's try and avoid 
sprinkling this stuff too far and wide.


I can see how these might be needed from time to time

>
>    Again, LRA has a very good code for legitimize address by itself and
> it is better to use it.
I have no trouble believing that. 
LEGITIMIZE_ADDRESS/LEGITIMIZE_RELOAD_ADDRESS were really hacks to avoid 
generating stupid code without having to do any serious surgery, 
particularly in reload.

> The patch was successfully bootstrapped and tested on i686, x86_64,
> and PPC64.
>
> Ok to commit?
>
>
> 2013-12-11  Vladimir Makarov  <vmakarov@redhat.com>
>
>          PR rtl-optimization/59466
>          * emit-rtl.c (change_address_1): Don't validate address for LRA.
>          * recog.c (general_operand): Accept any memory for LRA.
>          * lra.c (lra_set_insn_recog_data): Add an assert.
>
> Index: emit-rtl.c
> ===================================================================
> --- emit-rtl.c    (revision 205870)
> +++ emit-rtl.c    (working copy)
> @@ -1951,7 +1951,9 @@ change_address_1 (rtx memref, enum machi
>         && (!validate || memory_address_addr_space_p (mode, addr, as)))
>       return memref;
>
> -  if (validate)
> +  /* Don't validate address for LRA.  LRA can make the address valid
> +     by itself in most efficient way.  */
> +  if (validate && !lra_in_progress)
Presumably if LRA can't find a reasonable way to make the address valid, 
it falls back to copying the address into a register?


OK for the trunk.  Thanks,

jeff
Vladimir Makarov Dec. 16, 2013, 5:44 p.m. UTC | #2
On 12/16/2013, 11:44 AM, Jeff Law wrote:
> On 12/11/13 09:49, Vladimir Makarov wrote:
>>
>>    I could prevent use validize_mem in rs6000.c but I prefer to do it
>> in change_addres_1 as other targets might have the same problem and it
>> is better to have one for all solution.
> It's certainly been speculated that a fair amount of the "target
> dependent" stuff done in LEGITIMIZE_ADDRESS and
> RELOAD_LEGITIMIZE_ADDRESS could be factored out, parameterized and done
> in a target independent way.
>
>
>>
>>    Still it does not fully solve the problem as insn
>> r257:DI=[unspec[`*.LC29',%2:DI] 45] cant be recognized as
>> *movdi... pattern has operand predicates rejecting memory because of
>> invalid address.  To fix this a change in general_operand is done.  As
>> LRA can not work properly with regular insn recognition, I added an
>> assert for this in lra_set_insn_recog_data to figure out this
>> situation earlier.
> I'm getting a bit concerned about the number of places outside LRA that
> test lra_in_progress.  In theory, other code shouldn't want/need to care
> about LRA vs reload.  Both serve the same purpose, lra just does it
> better :-)
>
> Right now the numbers are pretty comparable, but let's try and avoid
> sprinkling this stuff too far and wide.
>

   I'd like to keep it under control too.  For me it would be 
uncomfortable to have more lra_in_progress occurrences than 
reload_in_progress ones.  The numbers (with taking this patch into 
account) are very close but lra_in_progress places are still less than 
reload_in_progress.

>
> I can see how these might be needed from time to time
>
>>
>>    Again, LRA has a very good code for legitimize address by itself and
>> it is better to use it.
> I have no trouble believing that.
> LEGITIMIZE_ADDRESS/LEGITIMIZE_RELOAD_ADDRESS were really hacks to avoid
> generating stupid code without having to do any serious surgery,
> particularly in reload.
>
>> The patch was successfully bootstrapped and tested on i686, x86_64,
>> and PPC64.
>>
>> Ok to commit?
>>
>>
>> 2013-12-11  Vladimir Makarov  <vmakarov@redhat.com>
>>
>>          PR rtl-optimization/59466
>>          * emit-rtl.c (change_address_1): Don't validate address for LRA.
>>          * recog.c (general_operand): Accept any memory for LRA.
>>          * lra.c (lra_set_insn_recog_data): Add an assert.
>>
>> Index: emit-rtl.c
>> ===================================================================
>> --- emit-rtl.c    (revision 205870)
>> +++ emit-rtl.c    (working copy)
>> @@ -1951,7 +1951,9 @@ change_address_1 (rtx memref, enum machi
>>         && (!validate || memory_address_addr_space_p (mode, addr, as)))
>>       return memref;
>>
>> -  if (validate)
>> +  /* Don't validate address for LRA.  LRA can make the address valid
>> +     by itself in most efficient way.  */
>> +  if (validate && !lra_in_progress)
> Presumably if LRA can't find a reasonable way to make the address valid,
> it falls back to copying the address into a register?
>
>
Yes, it can always fall back to putting address into a pseudo if it can 
not generate a better code.

Thanks, Jeff.
diff mbox

Patch

Index: emit-rtl.c
===================================================================
--- emit-rtl.c	(revision 205870)
+++ emit-rtl.c	(working copy)
@@ -1951,7 +1951,9 @@  change_address_1 (rtx memref, enum machi
        && (!validate || memory_address_addr_space_p (mode, addr, as)))
      return memref;

-  if (validate)
+  /* Don't validate address for LRA.  LRA can make the address valid
+     by itself in most efficient way.  */
+  if (validate && !lra_in_progress)
      {
        if (reload_in_progress || reload_completed)
  	gcc_assert (memory_address_addr_space_p (mode, addr, as));
Index: lra.c
===================================================================
--- lra.c	(revision 205870)
+++ lra.c	(working copy)
@@ -1072,9 +1072,16 @@  lra_set_insn_recog_data (rtx insn)
        nop = asm_noperands (PATTERN (insn));
        data->operand_loc = data->dup_loc = NULL;
        if (nop < 0)
-	/* Its is a special insn like USE or CLOBBER.  */
-	data->insn_static_data = insn_static_data
-	  = get_static_insn_data (-1, 0, 0, 1);
+	{
+	  /* Its is a special insn like USE or CLOBBER.  We should
+	     recognize any regular insn otherwise LRA can do nothing
+	     with this insn.  */
+	  gcc_assert (GET_CODE (PATTERN (insn)) == USE
+		      || GET_CODE (PATTERN (insn)) == CLOBBER
+		      || GET_CODE (PATTERN (insn)) == ASM_INPUT);
+	  data->insn_static_data = insn_static_data
+	    = get_static_insn_data (-1, 0, 0, 1);
+	}
        else
  	{
  	  /* expand_asm_operands makes sure there aren't too many
Index: recog.c
===================================================================
--- recog.c	(revision 205870)
+++ recog.c	(working copy)
@@ -1,3 +1,4 @@ 
+
  /* Subroutines used by or related to instruction recognition.
     Copyright (C) 1987-2013 Free Software Foundation, Inc.

@@ -1021,8 +1022,12 @@  general_operand (rtx op, enum machine_mo
        if (! volatile_ok && MEM_VOLATILE_P (op))
  	return 0;

-      /* Use the mem's mode, since it will be reloaded thus.  */
-      if (memory_address_addr_space_p (GET_MODE (op), y, MEM_ADDR_SPACE 
(op)))
+      /* Use the mem's mode, since it will be reloaded thus.  LRA can
+	 generate move insn with invalid addresses which is made valid
+	 and efficiently calculated by LRA through further numerous
+	 transformations.  */
+      if (lra_in_progress
+	  || memory_address_addr_space_p (GET_MODE (op), y, MEM_ADDR_SPACE (op)))
  	return 1;
      }