regrename: Skip renaming register pairs [PR115860]

Message ID	20240722064605.2060454-1-stefansf@gcc.gnu.org
State	New
Headers	show Return-Path: <gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org> DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 56EBC3858C53 From: Stefan Schulze Frielinghaus <stefansf@gcc.gnu.org> To: gcc-patches@gcc.gnu.org Cc: Stefan Schulze Frielinghaus <stefansf@gcc.gnu.org> Subject: [PATCH] regrename: Skip renaming register pairs [PR115860] Date: Mon, 22 Jul 2024 08:46:05 +0200 Message-ID: <20240722064605.2060454-1-stefansf@gcc.gnu.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: list Errors-To: gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org
Series	regrename: Skip renaming register pairs [PR115860] \| expand regrename: Skip renaming register pairs [PR115860]

Stefan Schulze Frielinghaus July 22, 2024, 6:46 a.m. UTC

It is not trivial to decide when a write of a register pair terminates
or starts a new chain.  For example, prior regrename we have

(insn 91 38 36 5 (set (reg:FPRX2 16 %f0 [orig:76 x ] [76])
        (const_double:FPRX2 0.0 [0x0.0p+0])) "float-cast-overflow-7-reduced.c":5:55 discrim 2 1507 {*movfprx2_64}
     (expr_list:REG_EQUAL (const_double:FPRX2 0.0 [0x0.0p+0])
        (nil)))
(insn 36 91 37 5 (set (subreg:DF (reg:FPRX2 16 %f0 [orig:76 x ] [76]) 0)
        (mem/c:DF (plus:DI (reg/f:DI 15 %r15)
                (const_int 160 [0xa0])) [7 %sfp+-32 S8 A64])) "float-cast-overflow-7-reduced.c":5:55 discrim 2 1512 {*movdf_64dfp}
     (nil))
(insn 37 36 43 5 (set (subreg:DF (reg:FPRX2 16 %f0 [orig:76 x ] [76]) 8)
        (mem/c:DF (plus:DI (reg/f:DI 15 %r15)
                (const_int 168 [0xa8])) [7 %sfp+-24 S8 A64])) "float-cast-overflow-7-reduced.c":5:55 discrim 2 1512 {*movdf_64dfp}
     (nil))

where insn 91 writes both registers of a register pair and it is clear
that an existing chain must be terminated and a new started.  Insn 36
and 37 write only into one register of a corresponding register pair.
For each write on its own it is not obvious when to terminate an
existing chain and to start a new one.  In other words, once insn 36
materializes and 37 didn't we are kind of in a limbo state.  Tracking
this correctly is inherently hard and I'm not entirely sure whether
optimizations could even lead to more complicated cases where it is even
less clear when a chain terminates and a new has to be started.
Therefore, skip renaming of register pairs.

Bootstrapped and regtested on x86_64, aarch64, powerpc64le, and s390.
Ok for mainline?

This fixes on s390:
FAIL: g++.dg/cpp23/ext-floating14.C  -std=gnu++23 execution test
FAIL: g++.dg/cpp23/ext-floating14.C  -std=gnu++26 execution test
FAIL: c-c++-common/ubsan/float-cast-overflow-7.c   -O2  execution test
FAIL: c-c++-common/ubsan/float-cast-overflow-7.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
FAIL: c-c++-common/ubsan/float-cast-overflow-7.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test
FAIL: gcc.dg/torture/fp-int-convert-float128-ieee-timode.c   -O0  execution test
FAIL: gcc.dg/torture/fp-int-convert-float128-ieee-timode.c   -O1  execution test
FAIL: gcc.dg/torture/fp-int-convert-float128-ieee-timode.c   -O2  execution test
FAIL: gcc.dg/torture/fp-int-convert-float128-ieee-timode.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
FAIL: gcc.dg/torture/fp-int-convert-float128-ieee-timode.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test
FAIL: gcc.dg/torture/fp-int-convert-float128-ieee-timode.c   -O3 -g  execution test
FAIL: gcc.dg/torture/fp-int-convert-float128-ieee-timode.c   -Os  execution test
FAIL: gcc.dg/torture/fp-int-convert-float64x-timode.c   -O0  execution test
FAIL: gcc.dg/torture/fp-int-convert-float64x-timode.c   -O1  execution test
FAIL: gcc.dg/torture/fp-int-convert-float64x-timode.c   -O2  execution test
FAIL: gcc.dg/torture/fp-int-convert-float64x-timode.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
FAIL: gcc.dg/torture/fp-int-convert-float64x-timode.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test
FAIL: gcc.dg/torture/fp-int-convert-float64x-timode.c   -O3 -g  execution test
FAIL: gcc.dg/torture/fp-int-convert-float64x-timode.c   -Os  execution test
FAIL: gcc.dg/torture/fp-int-convert-timode.c   -O0  execution test
FAIL: gcc.dg/torture/fp-int-convert-timode.c   -O1  execution test
FAIL: gcc.dg/torture/fp-int-convert-timode.c   -O2  execution test
FAIL: gcc.dg/torture/fp-int-convert-timode.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
FAIL: gcc.dg/torture/fp-int-convert-timode.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test
FAIL: gcc.dg/torture/fp-int-convert-timode.c   -O3 -g  execution test
FAIL: gcc.dg/torture/fp-int-convert-timode.c   -Os  execution test
FAIL: gfortran.dg/pr96711.f90   -O0  execution test
FAIL: TestSignalForwardingExternal
FAIL: go test misc/cgo/testcarchive
FAIL: libffi.closures/nested_struct5.c -W -Wall -Wno-psabi -O2 output pattern test
FAIL: libphobos.phobos/std/algorithm/mutation.d execution test
FAIL: libphobos.phobos/std/conv.d execution test
FAIL: libphobos.phobos/std/internal/math/errorfunction.d execution test
FAIL: libphobos.phobos/std/variant.d execution test
FAIL: libphobos.phobos_shared/std/algorithm/mutation.d execution test
FAIL: libphobos.phobos_shared/std/conv.d execution test
FAIL: libphobos.phobos_shared/std/internal/math/errorfunction.d execution test
FAIL: libphobos.phobos_shared/std/variant.d execution test

gcc/ChangeLog:

	PR rtl-optimiztion/115860
	* regrename.cc (scan_rtx_reg): Do not try to rename register
	pairs.
---
 gcc/regrename.cc | 4 ++++
 1 file changed, 4 insertions(+)

Andrew Pinski July 23, 2024, 3:16 a.m. UTC | #1

On Sun, Jul 21, 2024 at 11:47 PM Stefan Schulze Frielinghaus
<stefansf@gcc.gnu.org> wrote:
>
> It is not trivial to decide when a write of a register pair terminates
> or starts a new chain.  For example, prior regrename we have
>
> (insn 91 38 36 5 (set (reg:FPRX2 16 %f0 [orig:76 x ] [76])
>         (const_double:FPRX2 0.0 [0x0.0p+0])) "float-cast-overflow-7-reduced.c":5:55 discrim 2 1507 {*movfprx2_64}
>      (expr_list:REG_EQUAL (const_double:FPRX2 0.0 [0x0.0p+0])
>         (nil)))
> (insn 36 91 37 5 (set (subreg:DF (reg:FPRX2 16 %f0 [orig:76 x ] [76]) 0)
>         (mem/c:DF (plus:DI (reg/f:DI 15 %r15)
>                 (const_int 160 [0xa0])) [7 %sfp+-32 S8 A64])) "float-cast-overflow-7-reduced.c":5:55 discrim 2 1512 {*movdf_64dfp}
>      (nil))
> (insn 37 36 43 5 (set (subreg:DF (reg:FPRX2 16 %f0 [orig:76 x ] [76]) 8)
>         (mem/c:DF (plus:DI (reg/f:DI 15 %r15)
>                 (const_int 168 [0xa8])) [7 %sfp+-24 S8 A64])) "float-cast-overflow-7-reduced.c":5:55 discrim 2 1512 {*movdf_64dfp}
>      (nil))
>
> where insn 91 writes both registers of a register pair and it is clear
> that an existing chain must be terminated and a new started.  Insn 36
> and 37 write only into one register of a corresponding register pair.
> For each write on its own it is not obvious when to terminate an
> existing chain and to start a new one.  In other words, once insn 36
> materializes and 37 didn't we are kind of in a limbo state.  Tracking
> this correctly is inherently hard and I'm not entirely sure whether
> optimizations could even lead to more complicated cases where it is even
> less clear when a chain terminates and a new has to be started.
> Therefore, skip renaming of register pairs.
>
> Bootstrapped and regtested on x86_64, aarch64, powerpc64le, and s390.
> Ok for mainline?
>
> This fixes on s390:
> FAIL: g++.dg/cpp23/ext-floating14.C  -std=gnu++23 execution test
> FAIL: g++.dg/cpp23/ext-floating14.C  -std=gnu++26 execution test
> FAIL: c-c++-common/ubsan/float-cast-overflow-7.c   -O2  execution test
> FAIL: c-c++-common/ubsan/float-cast-overflow-7.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
> FAIL: c-c++-common/ubsan/float-cast-overflow-7.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test
> FAIL: gcc.dg/torture/fp-int-convert-float128-ieee-timode.c   -O0  execution test
> FAIL: gcc.dg/torture/fp-int-convert-float128-ieee-timode.c   -O1  execution test
> FAIL: gcc.dg/torture/fp-int-convert-float128-ieee-timode.c   -O2  execution test
> FAIL: gcc.dg/torture/fp-int-convert-float128-ieee-timode.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
> FAIL: gcc.dg/torture/fp-int-convert-float128-ieee-timode.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test
> FAIL: gcc.dg/torture/fp-int-convert-float128-ieee-timode.c   -O3 -g  execution test
> FAIL: gcc.dg/torture/fp-int-convert-float128-ieee-timode.c   -Os  execution test
> FAIL: gcc.dg/torture/fp-int-convert-float64x-timode.c   -O0  execution test
> FAIL: gcc.dg/torture/fp-int-convert-float64x-timode.c   -O1  execution test
> FAIL: gcc.dg/torture/fp-int-convert-float64x-timode.c   -O2  execution test
> FAIL: gcc.dg/torture/fp-int-convert-float64x-timode.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
> FAIL: gcc.dg/torture/fp-int-convert-float64x-timode.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test
> FAIL: gcc.dg/torture/fp-int-convert-float64x-timode.c   -O3 -g  execution test
> FAIL: gcc.dg/torture/fp-int-convert-float64x-timode.c   -Os  execution test
> FAIL: gcc.dg/torture/fp-int-convert-timode.c   -O0  execution test
> FAIL: gcc.dg/torture/fp-int-convert-timode.c   -O1  execution test
> FAIL: gcc.dg/torture/fp-int-convert-timode.c   -O2  execution test
> FAIL: gcc.dg/torture/fp-int-convert-timode.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
> FAIL: gcc.dg/torture/fp-int-convert-timode.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test
> FAIL: gcc.dg/torture/fp-int-convert-timode.c   -O3 -g  execution test
> FAIL: gcc.dg/torture/fp-int-convert-timode.c   -Os  execution test
> FAIL: gfortran.dg/pr96711.f90   -O0  execution test
> FAIL: TestSignalForwardingExternal
> FAIL: go test misc/cgo/testcarchive
> FAIL: libffi.closures/nested_struct5.c -W -Wall -Wno-psabi -O2 output pattern test
> FAIL: libphobos.phobos/std/algorithm/mutation.d execution test
> FAIL: libphobos.phobos/std/conv.d execution test
> FAIL: libphobos.phobos/std/internal/math/errorfunction.d execution test
> FAIL: libphobos.phobos/std/variant.d execution test
> FAIL: libphobos.phobos_shared/std/algorithm/mutation.d execution test
> FAIL: libphobos.phobos_shared/std/conv.d execution test
> FAIL: libphobos.phobos_shared/std/internal/math/errorfunction.d execution test
> FAIL: libphobos.phobos_shared/std/variant.d execution test
>
> gcc/ChangeLog:
>
>         PR rtl-optimiztion/115860
>         * regrename.cc (scan_rtx_reg): Do not try to rename register
>         pairs.
> ---
>  gcc/regrename.cc | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/gcc/regrename.cc b/gcc/regrename.cc
> index 054e601740b..6ae5a2309d0 100644
> --- a/gcc/regrename.cc
> +++ b/gcc/regrename.cc
> @@ -1113,6 +1113,10 @@ scan_rtx_reg (rtx_insn *insn, rtx *loc, enum reg_class cl, enum scan_actions act
>
>           c = create_new_chain (this_regno, this_nregs, loc, insn, cl);
>
> +         /* Give up early in case of register pairs.  */
> +         if (this_nregs != 1)
> +           c->cannot_rename = 1;


I am a bit worried this will make TImode (and DImode for 32bit targets) worse.
And it might make aarch64's vector struct types much worse than they
are currently.
It is interesting how there is a subreg of a hardregister after reload
showing up here. Is that on purpose?
They come from:
```
(define_insn "*tf_to_fprx2_0"
  [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "+f") 0)
        (subreg:DF (match_operand:TF    1 "general_operand"       "v") 0))]
...
(define_insn "*tf_to_fprx2_1"
  [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "+f") 8)
        (subreg:DF (match_operand:TF    1 "general_operand"       "v") 8))]

```

I am not sure if that is a valid thing to do. s390 backend is the only
one that has insn patterns like this. all that uses "+" use either
strict_lowpart of zero_extract for the lhs or just a pure set.
Maybe there is a better way of representing this. Maybe using unspec here?

Thanks,
Andrew Pinski





> +
>           /* We try to tie chains in a move instruction for
>              a single output.  */
>           if (recog_data.n_operands == 2
> --
> 2.45.2
>

Jeff Law July 23, 2024, 1:28 p.m. UTC | #2

On 7/22/24 9:16 PM, Andrew Pinski wrote:

> It is interesting how there is a subreg of a hardregister after reload
> showing up here. Is that on purpose?
In general subregs of hard regs shouldn't exist after allocation.  There 
are just a few exceptions to that rule.  I don't remember where the code 
is, but there's a pass over all the insns after reloading which should 
have removed them.


> They come from:
> ```
> (define_insn "*tf_to_fprx2_0"
>    [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "+f") 0)
>          (subreg:DF (match_operand:TF    1 "general_operand"       "v") 0))]
> ...
> (define_insn "*tf_to_fprx2_1"
>    [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "+f") 8)
>          (subreg:DF (match_operand:TF    1 "general_operand"       "v") 8))]
This kind of stuff may inhibit the elimination of hard register subregs 
since after removing the subreg these patterns probably won't match anymore.

Jeff

Stefan Schulze Frielinghaus July 23, 2024, 3:45 p.m. UTC | #3

On Mon, Jul 22, 2024 at 08:16:16PM -0700, Andrew Pinski wrote:
> On Sun, Jul 21, 2024 at 11:47 PM Stefan Schulze Frielinghaus
> > diff --git a/gcc/regrename.cc b/gcc/regrename.cc
> > index 054e601740b..6ae5a2309d0 100644
> > --- a/gcc/regrename.cc
> > +++ b/gcc/regrename.cc
> > @@ -1113,6 +1113,10 @@ scan_rtx_reg (rtx_insn *insn, rtx *loc, enum reg_class cl, enum scan_actions act
> >
> >           c = create_new_chain (this_regno, this_nregs, loc, insn, cl);
> >
> > +         /* Give up early in case of register pairs.  */
> > +         if (this_nregs != 1)
> > +           c->cannot_rename = 1;
> 
> 
> I am a bit worried this will make TImode (and DImode for 32bit targets) worse.
> And it might make aarch64's vector struct types much worse than they
> are currently.
> It is interesting how there is a subreg of a hardregister after reload
> showing up here. Is that on purpose?

Good catch.  I don't think this was on purpose.  When looking at the
dump I rather thought this is valid RTL and didn't question it since
subregs for register pairs got "expanded" during final.

> They come from:
> ```
> (define_insn "*tf_to_fprx2_0"
>   [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "+f") 0)
>         (subreg:DF (match_operand:TF    1 "general_operand"       "v") 0))]
> ...
> (define_insn "*tf_to_fprx2_1"
>   [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "+f") 8)
>         (subreg:DF (match_operand:TF    1 "general_operand"       "v") 8))]
> 
> ```
> 
> I am not sure if that is a valid thing to do. s390 backend is the only
> one that has insn patterns like this. all that uses "+" use either
> strict_lowpart of zero_extract for the lhs or just a pure set.
> Maybe there is a better way of representing this. Maybe using unspec here?

I gave unspec a try and came up with

(define_insn "*tf_to_fprx2_0"
  [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "+f") 0)
        (unspec:DF [(match_operand:TF    1 "general_operand"       "v")] UNSPEC_TF_TO_FPRX2_0))]
  "TARGET_VXE"
  ; M4 == 1 corresponds to %v0[0] = %v1[0]; %v0[1] = %v0[1];
  "vpdi\t%v0,%v1,%v0,1"
  [(set_attr "op_type" "VRR")])

(define_insn "*tf_to_fprx2_1"
  [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "+f") 8)
        (unspec:DF [(match_operand:TF    1 "general_operand"       "v")] UNSPEC_TF_TO_FPRX2_1))]
  "TARGET_VXE"
  ; M4 == 5 corresponds to %V0[0] = %v1[1]; %V0[1] = %V0[1];
  "vpdi\t%V0,%v1,%V0,5"
  [(set_attr "op_type" "VRR")])

which seems to work.  However, I'm still getting subregs at final:

(insn 3 18 7 (set (reg/v:TF 18 %f4 [orig:62 x ] [62])
        (mem/c:TF (reg:DI 2 %r2 [65]) [1 x+0 S16 A64])) "t.c":3:1 421 {movtf_vr}
     (expr_list:REG_DEAD (reg:DI 2 %r2 [65])
        (nil)))
(insn 7 3 8 (set (subreg:DF (reg:FPRX2 16 %f0 [64]) 0)
        (unspec:DF [
                (reg/v:TF 18 %f4 [orig:62 x ] [62])
            ] UNSPEC_TF_TO_FPRX2_0)) "t.c":4:10 569 {*tf_to_fprx2_0}
     (nil))
(insn 8 7 14 (set (subreg:DF (reg:FPRX2 16 %f0 [64]) 8)
        (unspec:DF [
                (reg/v:TF 18 %f4 [orig:62 x ] [62])
            ] UNSPEC_TF_TO_FPRX2_1)) "t.c":4:10 570 {*tf_to_fprx2_1}
     (expr_list:REG_DEAD (reg/v:TF 18 %f4 [orig:62 x ] [62])
        (nil)))

Thus, I'm not sure whether this really solves the problem or rather
shifts around it.  I'm still a bit puzzled why the initial RTL is
invalid.  If I understood you correctly Jeff, then we are missing a
pattern which would match once the subregs are eliminated.  Since none
exists the subregs survive and regrename gets confused.  This basically
means that subregs of register pairs must not survive RA and the unspec
solution from above is no real solution.

Since the only purpose of tf_to_fprx2_0 and tf_to_fprx2_1 are to move a
long double from a vector register into a FP register pair one could
also merge both insn into one and emit two instructions in the assembler
template.  This would at least circumvent the subreg issue.

(define_insn "tf_to_fprx2"
  [(set (match_operand:FPRX2             0 "nonimmediate_operand" "=f")
        (unspec:FPRX2 [(match_operand:TF 1 "general_operand"       "v")] UNSPEC_TF_TO_FPRX2))]
  "TARGET_VXE"
  "vpdi\t%v0,%v1,%v0,1;vpdi\t%V0,%v1,%V0,5"
  [(set_attr "length" "12")
   (set_attr "op_type" "VRR")])

I will give this a try tomorrow.

Thanks,
Stefan

Jeff Law July 23, 2024, 5:40 p.m. UTC | #4

On 7/23/24 9:45 AM, Stefan Schulze Frielinghaus wrote:

> 
>> They come from:
>> ```
>> (define_insn "*tf_to_fprx2_0"
>>    [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "+f") 0)
>>          (subreg:DF (match_operand:TF    1 "general_operand"       "v") 0))]
>> ...
>> (define_insn "*tf_to_fprx2_1"
>>    [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "+f") 8)
>>          (subreg:DF (match_operand:TF    1 "general_operand"       "v") 8))]
>>
>> ```
>>
>> I am not sure if that is a valid thing to do. s390 backend is the only
>> one that has insn patterns like this. all that uses "+" use either
>> strict_lowpart of zero_extract for the lhs or just a pure set.
>> Maybe there is a better way of representing this. Maybe using unspec here?
> 
> I gave unspec a try and came up with
> 
> (define_insn "*tf_to_fprx2_0"
>    [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "+f") 0)
>          (unspec:DF [(match_operand:TF    1 "general_operand"       "v")] UNSPEC_TF_TO_FPRX2_0))]
>    "TARGET_VXE"
>    ; M4 == 1 corresponds to %v0[0] = %v1[0]; %v0[1] = %v0[1];
>    "vpdi\t%v0,%v1,%v0,1"
>    [(set_attr "op_type" "VRR")])
> 
> (define_insn "*tf_to_fprx2_1"
>    [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "+f") 8)
>          (unspec:DF [(match_operand:TF    1 "general_operand"       "v")] UNSPEC_TF_TO_FPRX2_1))]
>    "TARGET_VXE"
>    ; M4 == 5 corresponds to %V0[0] = %v1[1]; %V0[1] = %V0[1];
>    "vpdi\t%V0,%v1,%V0,5"
>    [(set_attr "op_type" "VRR")])
> 
> which seems to work.  However, I'm still getting subregs at final:
> 
> (insn 3 18 7 (set (reg/v:TF 18 %f4 [orig:62 x ] [62])
>          (mem/c:TF (reg:DI 2 %r2 [65]) [1 x+0 S16 A64])) "t.c":3:1 421 {movtf_vr}
>       (expr_list:REG_DEAD (reg:DI 2 %r2 [65])
>          (nil)))
> (insn 7 3 8 (set (subreg:DF (reg:FPRX2 16 %f0 [64]) 0)
>          (unspec:DF [
>                  (reg/v:TF 18 %f4 [orig:62 x ] [62])
>              ] UNSPEC_TF_TO_FPRX2_0)) "t.c":4:10 569 {*tf_to_fprx2_0}
>       (nil))
> (insn 8 7 14 (set (subreg:DF (reg:FPRX2 16 %f0 [64]) 8)
>          (unspec:DF [
>                  (reg/v:TF 18 %f4 [orig:62 x ] [62])
>              ] UNSPEC_TF_TO_FPRX2_1)) "t.c":4:10 570 {*tf_to_fprx2_1}
>       (expr_list:REG_DEAD (reg/v:TF 18 %f4 [orig:62 x ] [62])
>          (nil)))
> 
> Thus, I'm not sure whether this really solves the problem or rather
> shifts around it.  I'm still a bit puzzled why the initial RTL is
> invalid.  If I understood you correctly Jeff, then we are missing a
> pattern which would match once the subregs are eliminated.  Since none
> exists the subregs survive and regrename gets confused.  This basically
> means that subregs of register pairs must not survive RA and the unspec
> solution from above is no real solution.
I'd tend to agree.  The routine in question is cleanup_subreg_operands 
and from a quick looksie it's not going to work for the insn in question 
because cleanup_subreg_operands actually looks down into the recog data 
structures for each operand.  In the case above the subreg is explicit 
in the RTL rather than matched by the operand predicate.

Jeff

Stefan Schulze Frielinghaus July 24, 2024, 7:14 a.m. UTC | #5

On Tue, Jul 23, 2024 at 11:40:00AM -0600, Jeff Law wrote:
> 
> 
> On 7/23/24 9:45 AM, Stefan Schulze Frielinghaus wrote:
> 
> > 
> > > They come from:
> > > ```
> > > (define_insn "*tf_to_fprx2_0"
> > >    [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "+f") 0)
> > >          (subreg:DF (match_operand:TF    1 "general_operand"       "v") 0))]
> > > ...
> > > (define_insn "*tf_to_fprx2_1"
> > >    [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "+f") 8)
> > >          (subreg:DF (match_operand:TF    1 "general_operand"       "v") 8))]
> > > 
> > > ```
> > > 
> > > I am not sure if that is a valid thing to do. s390 backend is the only
> > > one that has insn patterns like this. all that uses "+" use either
> > > strict_lowpart of zero_extract for the lhs or just a pure set.
> > > Maybe there is a better way of representing this. Maybe using unspec here?
> > 
> > I gave unspec a try and came up with
> > 
> > (define_insn "*tf_to_fprx2_0"
> >    [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "+f") 0)
> >          (unspec:DF [(match_operand:TF    1 "general_operand"       "v")] UNSPEC_TF_TO_FPRX2_0))]
> >    "TARGET_VXE"
> >    ; M4 == 1 corresponds to %v0[0] = %v1[0]; %v0[1] = %v0[1];
> >    "vpdi\t%v0,%v1,%v0,1"
> >    [(set_attr "op_type" "VRR")])
> > 
> > (define_insn "*tf_to_fprx2_1"
> >    [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "+f") 8)
> >          (unspec:DF [(match_operand:TF    1 "general_operand"       "v")] UNSPEC_TF_TO_FPRX2_1))]
> >    "TARGET_VXE"
> >    ; M4 == 5 corresponds to %V0[0] = %v1[1]; %V0[1] = %V0[1];
> >    "vpdi\t%V0,%v1,%V0,5"
> >    [(set_attr "op_type" "VRR")])
> > 
> > which seems to work.  However, I'm still getting subregs at final:
> > 
> > (insn 3 18 7 (set (reg/v:TF 18 %f4 [orig:62 x ] [62])
> >          (mem/c:TF (reg:DI 2 %r2 [65]) [1 x+0 S16 A64])) "t.c":3:1 421 {movtf_vr}
> >       (expr_list:REG_DEAD (reg:DI 2 %r2 [65])
> >          (nil)))
> > (insn 7 3 8 (set (subreg:DF (reg:FPRX2 16 %f0 [64]) 0)
> >          (unspec:DF [
> >                  (reg/v:TF 18 %f4 [orig:62 x ] [62])
> >              ] UNSPEC_TF_TO_FPRX2_0)) "t.c":4:10 569 {*tf_to_fprx2_0}
> >       (nil))
> > (insn 8 7 14 (set (subreg:DF (reg:FPRX2 16 %f0 [64]) 8)
> >          (unspec:DF [
> >                  (reg/v:TF 18 %f4 [orig:62 x ] [62])
> >              ] UNSPEC_TF_TO_FPRX2_1)) "t.c":4:10 570 {*tf_to_fprx2_1}
> >       (expr_list:REG_DEAD (reg/v:TF 18 %f4 [orig:62 x ] [62])
> >          (nil)))
> > 
> > Thus, I'm not sure whether this really solves the problem or rather
> > shifts around it.  I'm still a bit puzzled why the initial RTL is
> > invalid.  If I understood you correctly Jeff, then we are missing a
> > pattern which would match once the subregs are eliminated.  Since none
> > exists the subregs survive and regrename gets confused.  This basically
> > means that subregs of register pairs must not survive RA and the unspec
> > solution from above is no real solution.
> I'd tend to agree.  The routine in question is cleanup_subreg_operands and
> from a quick looksie it's not going to work for the insn in question because
> cleanup_subreg_operands actually looks down into the recog data structures
> for each operand.  In the case above the subreg is explicit in the RTL
> rather than matched by the operand predicate.

Right, I did some further tests over night where I also added patterns
in order to match variants where the subregs are eliminated and that
seems to work.  I still haven't made up my mind which route would be
best.  Anyhow, it is clear that this patch should be dropped and I will
come up with a solution for the target.

Thank you Andrew and Jeff for pointing this out.  Some myths about
subregs have been revealed for me :)

Cheers,
Stefan

regrename: Skip renaming register pairs [PR115860]

Commit Message

Comments

Patch