RFA: enable LRA for rs6000 [patch for WRF]

Message ID	5175D919.1070704@redhat.com
State	New
Headers	show Return-Path: <gcc-patches-return-340314-incoming=patchwork.ozlabs.org@gcc.gnu.org> DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :message-id:date:from:mime-version:to:subject:references :in-reply-to:content-type; q=dns; s=default; b=Ix9v/STyK3dVhieWi sm9lZyR+xGm1yfXD4/adAu+FpT+oOp/4avI4HMD+rZ+sRmWmFI0+EDtWzGBDzw1q hsBqiG4VcQl+iyyTqwRyoEZ6DUQCimnBLmHvJ38R3XirQYR/dhBkGnhxmobp9SHL LzNzUGeBN4QxHcXWFQvFpk6XLo= Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk Sender: gcc-patches-owner@gcc.gnu.org Message-ID: <5175D919.1070704@redhat.com> Date: Mon, 22 Apr 2013 20:43:05 -0400 From: Vladimir Makarov <vmakarov@redhat.com> User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:17.0) Gecko/20130328 Thunderbird/17.0.5 MIME-Version: 1.0 To: Michael Meissner <meissner@linux.vnet.ibm.com>, gcc-patches <gcc-patches@gcc.gnu.org>, David Edelsohn <dje.gcc@gmail.com>, "Bergner, Peter" <bergner@vnet.ibm.com>, aavrunin@redhat.com Subject: Re: RFA: enable LRA for rs6000 [patch for WRF] References: <5166F34C.30901@redhat.com> <20130415224853.GA17643@ibm-tiger.the-meissners.org> <20130416225639.GA16621@ibm-tiger.the-meissners.org> <516EAE5D.4080601@redhat.com> <20130417161042.GA22186@ibm-tiger.the-meissners.org> <51705B25.6020402@redhat.com> <5171B43B.5070400@redhat.com> <20130422043535.GC22536@bubble.grove.modra.org> <51758EF5.8050209@redhat.com> <20130422193149.GA5792@ibm-tiger.the-meissners.org> In-Reply-To: <20130422193149.GA5792@ibm-tiger.the-meissners.org> Content-Type: multipart/mixed; boundary="------------080002010706080304090806"

Vladimir Makarov April 23, 2013, 12:43 a.m. UTC

On 13-04-22 3:31 PM, Michael Meissner wrote:
> On Mon, Apr 22, 2013 at 03:26:45PM -0400, Vladimir Makarov wrote:
>> On 13-04-22 12:35 AM, Alan Modra wrote:
>>> On Fri, Apr 19, 2013 at 05:16:43PM -0400, Vladimir Makarov wrote:
>>>>    I don't understand what this check means and what comments ??? means too.
>>> A lo_sum mem is only valid if you know it won't be offset (or that
>>> offsetting will never cross a 64k+32k boundary).  If the access is
>>> smaller than a word then the load or store can be done in one insn.
>>> No offset required.  If the access is a DFmode *and* you are loading
>>> or storing a floating point reg, then the access is also done in one
>>> insn.  The ??? comment is referring to the fact that you don't know
>>> for sure that the DFmode is in a floating point reg.  It usually is,
>>> but may be in two general purpose regs.  Which then need an offset to
>>> load/store the second reg.
>>>
>> Alan, thanks for the explanation. I'll search for another solution.
> I'm suspecting secondary_reload needs more tuning for TF/TD modes.
>
I've fixed dealII crash and commited the patch into the branch as rev. 
198169.

The dealII crash itself can be cured by treatment of lo_sum for LRA the 
same way as for reload (please see code checking modes for addressing 
more one word).  But in this case a few tests fail which is cured in LRA 
itself by trying to load address into pseudo using lo_sum.

The patch was successfully bootstrapped (--with-cpu=power7) and tested 
on GCC testsuite.

2013-04-22  Vladimir Makarov <vmakarov@redhat.com>

         * lra-constraints.c (process_address): Try to put lo_sum into
         register.
         * config/rs6000/rs6000.c (legitimate_lo_sum_address_p): Remove
         lra_in_progress guard for addressing something bigger than word.

David Edelsohn April 23, 2013, 1:55 a.m. UTC | #1

On Mon, Apr 22, 2013 at 8:43 PM, Vladimir Makarov <vmakarov@redhat.com> wrote:

>         * config/rs6000/rs6000.c (legitimate_lo_sum_address_p): Remove
>         lra_in_progress guard for addressing something bigger than word.

That will work, but I'm worried that it is too fragile.  Previously
the code uniformly returned consistent values from the various
legitimate address functions.  Changing the response based on
lra_in_progress for various modes seems like a problem waiting to
happen.

Thanks, David

Vladimir Makarov April 23, 2013, 3:03 p.m. UTC | #2

On 04/22/2013 09:55 PM, David Edelsohn wrote:
> On Mon, Apr 22, 2013 at 8:43 PM, Vladimir Makarov <vmakarov@redhat.com> wrote:
>
>>          * config/rs6000/rs6000.c (legitimate_lo_sum_address_p): Remove
>>          lra_in_progress guard for addressing something bigger than word.
> That will work, but I'm worried that it is too fragile.  Previously
> the code uniformly returned consistent values from the various
> legitimate address functions.  Changing the response based on
> lra_in_progress for various modes seems like a problem waiting to
> happen.
>
   LRA approach is different from reload one.  First of all LRA can 
create pseudos (not assigned yet to anything) and secondly LRA makes 
changes incrementally opposite to reload which makes all final rewriting 
code when it decides that a final solution is found.  The difference in 
approaches means that LRA can not use all reload macros and hooks 
completely without changes.  Although I tried to minimize the changes, 
they are still present.  For example, LRA will never use hooks which 
call push_reload which is completely oriented to reload pass.

   I'd not say that "the code uniformly returned consistent values". One 
place where lra_in_progress is needed exactly because of discrepancy 
between legitimize reload address hook (which can call push_reload and 
can not be used by LRA) and legitimate address hook.  Two other places 
in rs6000_emit_move for SDmode where lra_in_progress is used are because 
LRA can create and use spilled pseudos instead of using concrete memory 
slot as reload does.  And two the rest places are in calls of 
legitimate_const_pool_address_p where LRA actually querying in strict 
sense.  So I think the changes are minimal.

David, thanks for expressing the concern.  I hope I answered it. The 
future will show the reality not just our speculations.

David Edelsohn April 23, 2013, 3:33 p.m. UTC | #3

On Tue, Apr 23, 2013 at 11:03 AM, Vladimir Makarov <vmakarov@redhat.com> wrote:

>   LRA approach is different from reload one.  First of all LRA can create
> pseudos (not assigned yet to anything) and secondly LRA makes changes
> incrementally opposite to reload which makes all final rewriting code when
> it decides that a final solution is found.  The difference in approaches
> means that LRA can not use all reload macros and hooks completely without
> changes.  Although I tried to minimize the changes, they are still present.
> For example, LRA will never use hooks which call push_reload which is
> completely oriented to reload pass.
>
>   I'd not say that "the code uniformly returned consistent values". One
> place where lra_in_progress is needed exactly because of discrepancy between
> legitimize reload address hook (which can call push_reload and can not be
> used by LRA) and legitimate address hook.  Two other places in
> rs6000_emit_move for SDmode where lra_in_progress is used are because LRA
> can create and use spilled pseudos instead of using concrete memory slot as
> reload does.  And two the rest places are in calls of
> legitimate_const_pool_address_p where LRA actually querying in strict sense.
> So I think the changes are minimal.
>
> David, thanks for expressing the concern.  I hope I answered it. The future
> will show the reality not just our speculations.


Vlad,

I don't think that you understood my concern.  I am not concerned
about adding lra_in_progress.  I am concerned about combining
lra_in_progress with the mode.  I would like an explanation why it is
correct for the result of legitimate address functions to vary with
lra_in_progress combined with the mode.  Not simply that it works.

The issue should be if an offsettable address is valid and the size of
the offset.

Thanks, David

Vladimir Makarov April 23, 2013, 3:37 p.m. UTC | #4

On 04/23/2013 11:33 AM, David Edelsohn wrote:
> On Tue, Apr 23, 2013 at 11:03 AM, Vladimir Makarov <vmakarov@redhat.com> wrote:
>
>>    LRA approach is different from reload one.  First of all LRA can create
>> pseudos (not assigned yet to anything) and secondly LRA makes changes
>> incrementally opposite to reload which makes all final rewriting code when
>> it decides that a final solution is found.  The difference in approaches
>> means that LRA can not use all reload macros and hooks completely without
>> changes.  Although I tried to minimize the changes, they are still present.
>> For example, LRA will never use hooks which call push_reload which is
>> completely oriented to reload pass.
>>
>>    I'd not say that "the code uniformly returned consistent values". One
>> place where lra_in_progress is needed exactly because of discrepancy between
>> legitimize reload address hook (which can call push_reload and can not be
>> used by LRA) and legitimate address hook.  Two other places in
>> rs6000_emit_move for SDmode where lra_in_progress is used are because LRA
>> can create and use spilled pseudos instead of using concrete memory slot as
>> reload does.  And two the rest places are in calls of
>> legitimate_const_pool_address_p where LRA actually querying in strict sense.
>> So I think the changes are minimal.
>>
>> David, thanks for expressing the concern.  I hope I answered it. The future
>> will show the reality not just our speculations.
>
> Vlad,
>
> I don't think that you understood my concern.  I am not concerned
> about adding lra_in_progress.  I am concerned about combining
> lra_in_progress with the mode.  I would like an explanation why it is
> correct for the result of legitimate address functions to vary with
> lra_in_progress combined with the mode.  Not simply that it works.
>
> The issue should be if an offsettable address is valid and the size of
> the offset.
>
Sorry, may be I missed some other places but
could you be more specific about the place where combining 
lra_in_progress and mode happens now for legitimate address querying.

I guess I explained in my previous emails why the following change is 
necessary in legitimate_lo_sum_address_p:

       toc_ok_p = (lra_in_progress && TARGET_CMODEL != CMODEL_SMALL
                   && small_toc_ref (x, VOIDmode));

lra_in_progress was removed in my latest change which now looks like the 
original code:

       if (GET_MODE_SIZE (mode) > UNITS_PER_WORD
           && !(/* ??? Assume floating point reg based on mode?  */
                TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_DOUBLE_FLOAT
                && (mode == DFmode || mode == DDmode)))
         return false;

So I can not see other places.

David Edelsohn April 23, 2013, 3:57 p.m. UTC | #5

On Tue, Apr 23, 2013 at 11:37 AM, Vladimir Makarov <vmakarov@redhat.com> wrote:

> Sorry, may be I missed some other places but
> could you be more specific about the place where combining lra_in_progress
> and mode happens now for legitimate address querying.
>
> I guess I explained in my previous emails why the following change is
> necessary in legitimate_lo_sum_address_p:
>
>       toc_ok_p = (lra_in_progress && TARGET_CMODEL != CMODEL_SMALL
>                   && small_toc_ref (x, VOIDmode));
>
> lra_in_progress was removed in my latest change which now looks like the
> original code:
>
>       if (GET_MODE_SIZE (mode) > UNITS_PER_WORD
>
>           && !(/* ??? Assume floating point reg based on mode?  */
>                TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_DOUBLE_FLOAT
>                && (mode == DFmode || mode == DDmode)))
>         return false;
>
> So I can not see other places.

Okay. I misunderstood the change. I thought that you were adding
lra_in_progress tied to the mode.

Thanks, David

Michael Meissner April 24, 2013, 7:42 p.m. UTC | #6

I'm seeing a lot of failures with these changes in make check.

The first two that I noticed on a build that did not use --with-cpu=power7:

1) c-c++-common/dfp/call-by-value.c (and others in the directory) fails with -O0
for all targets before power7 because it can't spill SDmode.  Note, in the
earlier targets, we need to have a wider spill slot because we don't have
32-bit integer load/store to the FPR registers.

FAIL: c-c++-common/dfp/call-by-value.c (internal compiler error)
FAIL: c-c++-common/dfp/call-by-value.c (test for excess errors)
Excess errors:
/home/meissner/fsf-src/meissner-lra/gcc/testsuite/c-c++-common/dfp/call-by-value.c:43:1: internal compiler error: in assign_by_spills, at lra-assigns.c:1268
0x104ceff7 assign_by_spills
        /home/meissner/fsf-src/meissner-lra/gcc/lra-assigns.c:1268
0x104cfe43 lra_assign()
        /home/meissner/fsf-src/meissner-lra/gcc/lra-assigns.c:1425
0x104ca837 lra(_IO_FILE*)
        /home/meissner/fsf-src/meissner-lra/gcc/lra.c:2309
0x1047d6eb do_reload
        /home/meissner/fsf-src/meissner-lra/gcc/ira.c:4619
0x1047d6eb rest_of_handle_reload
        /home/meissner/fsf-src/meissner-lra/gcc/ira.c:4731

2) A lot of fortran tests are failing for all optimization levels due to
segmentation violations at runtime:

AIL: gfortran.dg/advance_1.f90  -O0  execution test
Executing on host: /home/meissner/fsf-build-ppc64/meissner-lra/gcc/testsuite/gfortran1/../../gfortran -B/home/meissner/fsf-build-ppc64/meissner-lra/gcc/testsuite/gfortran1/../../ -B/home/meissner/fsf-build-ppc64/meissner-lra/powerpc64-unkno
wn-linux-gnu/./libgfortran/ /home/meissner/fsf-src/meissner-lra/gcc/testsuite/gfortran.dg/advance_1.f90  -fno-diagnostics-show-caret   -O1   -pedantic-errors  -B/home/meissner/fsf-build-ppc64/meissner-lra/powerpc64-unknown-linux-gnu/32/libg
fortran/.libs -L/home/meissner/fsf-build-ppc64/meissner-lra/powerpc64-unknown-linux-gnu/32/libgfortran/.libs -L/home/meissner/fsf-build-ppc64/meissner-lra/powerpc64-unknown-linux-gnu/32/libgfortran/.libs  -lm   -m32 -o ./advance_1.exe    (t
imeout = 300)
spawn /home/meissner/fsf-build-ppc64/meissner-lra/gcc/testsuite/gfortran1/../../gfortran -B/home/meissner/fsf-build-ppc64/meissner-lra/gcc/testsuite/gfortran1/../../ -B/home/meissner/fsf-build-ppc64/meissner-lra/powerpc64-unknown-linux-gnu/
./libgfortran/ /home/meissner/fsf-src/meissner-lra/gcc/testsuite/gfortran.dg/advance_1.f90 -fno-diagnostics-show-caret -O1 -pedantic-errors -B/home/meissner/fsf-build-ppc64/meissner-lra/powerpc64-unknown-linux-gnu/32/libgfortran/.libs -L/ho
me/meissner/fsf-build-ppc64/meissner-lra/powerpc64-unknown-linux-gnu/32/libgfortran/.libs -L/home/meissner/fsf-build-ppc64/meissner-lra/powerpc64-unknown-linux-gnu/32/libgfortran/.libs -lm -m32 -o ./advance_1.exe
PASS: gfortran.dg/advance_1.f90  -O1  (test for excess errors)
Setting LD_LIBRARY_PATH to .:/home/meissner/fsf-build-ppc64/meissner-lra/powerpc64-unknown-linux-gnu/32/libgfortran/.libs:/home/meissner/fsf-build-ppc64/meissner-lra/powerpc64-unknown-linux-gnu/32/libgfortran/.libs:/home/meissner/fsf-build-
ppc64/meissner-lra/gcc:/home/meissner/fsf-build-ppc64/meissner-lra/gcc/32:.:/home/meissner/fsf-build-ppc64/meissner-lra/powerpc64-unknown-linux-gnu/32/libgfortran/.libs:/home/meissner/fsf-build-ppc64/meissner-lra/powerpc64-unknown-linux-gnu
/32/libgfortran/.libs:/home/meissner/fsf-build-ppc64/meissner-lra/gcc:/home/meissner/fsf-build-ppc64/meissner-lra/gcc/32:/home/meissner/tools/ppc64/lib:/home/meissner/tools/ppc32/lib:/home/meissner/tools-binutils/ppc64/lib:/home/meissner/to
ols-binutils/ppc32/lib
spawn [open ...]

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Vladimir Makarov April 25, 2013, 3:29 p.m. UTC | #7

On 04/24/2013 03:42 PM, Michael Meissner wrote:
> I'm seeing a lot of failures with these changes in make check.
>
I've reproduced it, Mike.  I'll work on it.  Thanks.

RFA: enable LRA for rs6000 [patch for WRF]

Commit Message

Comments

Patch