diff mbox series

[PATCH/RFC] LRA: Don't emit move for substituted CONSTATNT_P operand [PR116170]

Message ID 55fe76af-a0c5-1066-7b94-022cfcf46556@linux.ibm.com
State New
Headers show
Series [PATCH/RFC] LRA: Don't emit move for substituted CONSTATNT_P operand [PR116170] | expand

Commit Message

Kewen.Lin Aug. 9, 2024, 9:49 a.m. UTC
Hi,

Commit r15-2084 exposes one ICE in LRA.  Firstly, before
r15-2084 KFmode has 126 bit precision while V1TImode has 128
bit precision, so the subreg (subreg:V1TI (reg:KF 131) 0) is
paradoxical_subreg_p, which stops some passes from doing
some optimization.  After r15-2084, KFmode has the same mode
precision as V1TImode, passes are able to optimize more, but
it causes this ICE in LRA as described below:

For insn 106 (set (mem:V1TI ...) (subreg:V1TI (reg:KF 133) 0)),
which matches pattern

(define_insn "*vsx_le_perm_store_<mode>"
  [(set (match_operand:VSX_LE_128 0 "memory_operand" "=Z,Q")
        (match_operand:VSX_LE_128 1 "vsx_register_operand" "+wa,r"))]
  "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR
   && !altivec_indexed_or_indirect_operand (operands[0], <MODE>mode)"
  "@
   #
   #"
  [(set_attr "type" "vecstore,store")
   (set_attr "length" "12,8")
   (set_attr "isa" "<VSisa>,*")])

LRA makes equivalence substitution on r133 with const double
(const_double:KF 0.0), selects alternative 0 and fixes up
operand 1 for constraint "wa", because operand 1 is OP_INOUT,
so it considers assigning back to it as well, that is:

  lra_emit_move (type == OP_INOUT ? copy_rtx (old) : old, new_reg);

But because old has been changed to const_double in equivalence
substitution, the move is actually assigning to const_double,
which is invalid and cause ICE.

Considering reg:KF 133 is equivalent with (const_double:KF 0.0)
even though this operand is OP_INOUT, IMHO there should not be
any following uses of reg:KF 133, otherwise it doesn't have the
chance to be equivalent to (const_double:KF 0.0).  From this
perspective, I think we can guard the lra_emit_move with
nonimmediate_operand to exclude such case.

Does it sound reasonable?

btw, I also tried with disallowing equivalence substitution with
CONSTANT_P value if the corresponding operand is OP_INOUT or
OP_OUT, it can also fix this issue, but with more thinking it
seems not necessary to stop such substitution if we can handle it
later as above.

Bootstrapped and regtested on x86_64-redhat-linux and
powerpc64{,le}-linux-gnu.

BR,
Kewen
-----
	PR rtl-optimization/116170

gcc/ChangeLog:

	* lra-constraints.cc (curr_insn_transform): Don't emit move back to
	old operand if it's nonimmediate_operand.

gcc/testsuite/ChangeLog:

	* gcc.target/powerpc/pr116170.c: New test.
---
 gcc/lra-constraints.cc                      |  3 ++-
 gcc/testsuite/gcc.target/powerpc/pr116170.c | 18 ++++++++++++++++++
 2 files changed, 20 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr116170.c

--
2.43.5

Comments

Vladimir Makarov Aug. 9, 2024, 5:47 p.m. UTC | #1
On 8/9/24 05:49, Kewen.Lin wrote:
> Hi,
>
> Commit r15-2084 exposes one ICE in LRA.  Firstly, before
> r15-2084 KFmode has 126 bit precision while V1TImode has 128
> bit precision, so the subreg (subreg:V1TI (reg:KF 131) 0) is
> paradoxical_subreg_p, which stops some passes from doing
> some optimization.  After r15-2084, KFmode has the same mode
> precision as V1TImode, passes are able to optimize more, but
> it causes this ICE in LRA as described below:
>
> For insn 106 (set (mem:V1TI ...) (subreg:V1TI (reg:KF 133) 0)),
> which matches pattern
>
> (define_insn "*vsx_le_perm_store_<mode>"
>    [(set (match_operand:VSX_LE_128 0 "memory_operand" "=Z,Q")
>          (match_operand:VSX_LE_128 1 "vsx_register_operand" "+wa,r"))]
>    "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR
>     && !altivec_indexed_or_indirect_operand (operands[0], <MODE>mode)"
>    "@
>     #
>     #"
>    [(set_attr "type" "vecstore,store")
>     (set_attr "length" "12,8")
>     (set_attr "isa" "<VSisa>,*")])
> LRA makes equivalence substitution on r133 with const double
> (const_double:KF 0.0), selects alternative 0 and fixes up
> operand 1 for constraint "wa", because operand 1 is OP_INOUT,
> so it considers assigning back to it as well, that is:
>
>    lra_emit_move (type == OP_INOUT ? copy_rtx (old) : old, new_reg);
>
> But because old has been changed to const_double in equivalence
> substitution, the move is actually assigning to const_double,
> which is invalid and cause ICE.
>
> Considering reg:KF 133 is equivalent with (const_double:KF 0.0)
> even though this operand is OP_INOUT, IMHO there should not be
> any following uses of reg:KF 133, otherwise it doesn't have the
> chance to be equivalent to (const_double:KF 0.0).  From this
> perspective, I think we can guard the lra_emit_move with
> nonimmediate_operand to exclude such case.
>
> Does it sound reasonable?
Yes.
> btw, I also tried with disallowing equivalence substitution with
> CONSTANT_P value if the corresponding operand is OP_INOUT or
> OP_OUT, it can also fix this issue, but with more thinking it
> seems not necessary to stop such substitution if we can handle it
> later as above.
>
> Bootstrapped and regtested on x86_64-redhat-linux and
> powerpc64{,le}-linux-gnu.
>
Thank you for the good explanation of the problem.  The patch is ok for 
me.  It would be nice to add a comment before `nonimmediate_operand` 
that `old` can be an equivalent constant and we chose insn alternative 
before the equivalent substitution.

Thank you for fixing the PR.

> 	PR rtl-optimization/116170
>
> gcc/ChangeLog:
>
> 	* lra-constraints.cc (curr_insn_transform): Don't emit move back to
> 	old operand if it's nonimmediate_operand.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/powerpc/pr116170.c: New test.
> ---
>   gcc/lra-constraints.cc                      |  3 ++-
>   gcc/testsuite/gcc.target/powerpc/pr116170.c | 18 ++++++++++++++++++
>   2 files changed, 20 insertions(+), 1 deletion(-)
>   create mode 100644 gcc/testsuite/gcc.target/powerpc/pr116170.c
>
> diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
> index 92b343fa99a..024c85c87d9 100644
> --- a/gcc/lra-constraints.cc
> +++ b/gcc/lra-constraints.cc
> @@ -4742,7 +4742,8 @@ curr_insn_transform (bool check_only_p)
>   	    }
>   	  *loc = new_reg;
>   	  if (type != OP_IN
> -	      && find_reg_note (curr_insn, REG_UNUSED, old) == NULL_RTX)
> +	      && find_reg_note (curr_insn, REG_UNUSED, old) == NULL_RTX
> +	      && nonimmediate_operand (old, GET_MODE (old)))
>   	    {
>   	      start_sequence ();
>   	      lra_emit_move (type == OP_INOUT ? copy_rtx (old) : old, new_reg);
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr116170.c b/gcc/testsuite/gcc.target/powerpc/pr116170.c
> new file mode 100644
> index 00000000000..6f6ca0f1ae9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr116170.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target ppc_float128_sw } */
> +/* { dg-options "-mdejagnu-cpu=power8 -O2 -fstack-protector-strong -ffloat-store" } */
> +
> +/* Verify there is no ICE.  */
> +
> +int a, d;
> +_Float128 b, c;
> +void
> +e ()
> +{
> +  int f = 0;
> +  if (a)
> +    if (b || c)
> +      f = 1;
> +  if (d)
> +    e (f ? 0 : b);
> +}
> --
> 2.43.5
>
Kewen.Lin Aug. 12, 2024, 1:50 a.m. UTC | #2
Hi Vladimir,

on 2024/8/10 01:47, Vladimir Makarov wrote:
> 
> On 8/9/24 05:49, Kewen.Lin wrote:
>> Hi,
>>
>> Commit r15-2084 exposes one ICE in LRA.  Firstly, before
>> r15-2084 KFmode has 126 bit precision while V1TImode has 128
>> bit precision, so the subreg (subreg:V1TI (reg:KF 131) 0) is
>> paradoxical_subreg_p, which stops some passes from doing
>> some optimization.  After r15-2084, KFmode has the same mode
>> precision as V1TImode, passes are able to optimize more, but
>> it causes this ICE in LRA as described below:
>>
>> For insn 106 (set (mem:V1TI ...) (subreg:V1TI (reg:KF 133) 0)),
>> which matches pattern
>>
>> (define_insn "*vsx_le_perm_store_<mode>"
>>    [(set (match_operand:VSX_LE_128 0 "memory_operand" "=Z,Q")
>>          (match_operand:VSX_LE_128 1 "vsx_register_operand" "+wa,r"))]
>>    "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR
>>     && !altivec_indexed_or_indirect_operand (operands[0], <MODE>mode)"
>>    "@
>>     #
>>     #"
>>    [(set_attr "type" "vecstore,store")
>>     (set_attr "length" "12,8")
>>     (set_attr "isa" "<VSisa>,*")])
>> LRA makes equivalence substitution on r133 with const double
>> (const_double:KF 0.0), selects alternative 0 and fixes up
>> operand 1 for constraint "wa", because operand 1 is OP_INOUT,
>> so it considers assigning back to it as well, that is:
>>
>>    lra_emit_move (type == OP_INOUT ? copy_rtx (old) : old, new_reg);
>>
>> But because old has been changed to const_double in equivalence
>> substitution, the move is actually assigning to const_double,
>> which is invalid and cause ICE.
>>
>> Considering reg:KF 133 is equivalent with (const_double:KF 0.0)
>> even though this operand is OP_INOUT, IMHO there should not be
>> any following uses of reg:KF 133, otherwise it doesn't have the
>> chance to be equivalent to (const_double:KF 0.0).  From this
>> perspective, I think we can guard the lra_emit_move with
>> nonimmediate_operand to exclude such case.
>>
>> Does it sound reasonable?
> Yes.
>> btw, I also tried with disallowing equivalence substitution with
>> CONSTANT_P value if the corresponding operand is OP_INOUT or
>> OP_OUT, it can also fix this issue, but with more thinking it
>> seems not necessary to stop such substitution if we can handle it
>> later as above.
>>
>> Bootstrapped and regtested on x86_64-redhat-linux and
>> powerpc64{,le}-linux-gnu.
>>
> Thank you for the good explanation of the problem.  The patch is ok for me.  It would be nice to add a comment before `nonimmediate_operand` that `old` can be an equivalent constant and we chose insn alternative before the equivalent substitution.

Thanks for your comments!!  If I read the code right, the function chooses
insn alternative after equivalence substitution, the related code is:

4207             if (lra_dump_file != NULL)
4208               {
4209                 fprintf (lra_dump_file,
4210                          "Changing pseudo %d in operand %i of insn %u on equiv ",
4211                          REGNO (old), i, INSN_UID (curr_insn));
4212                 dump_value_slim (lra_dump_file, subst, 1);
4213                 fprintf (lra_dump_file, "\n");
4214               }
4215             op_change_p = change_p = true;
4216           }
4217         if (simplify_operand_subreg (i, GET_MODE (old)) || op_change_p)
4218           {
4219             change_p = true;
4220             lra_update_dup (curr_id, i);
4221           }

4411       fprintf (lra_dump_file, "      Choosing alt %d in insn %u:",
4412                goal_alt_number, INSN_UID (curr_insn));
4413       print_curr_insn_alt (goal_alt_number);

so I just added a comment as you suggested but stripping "and ...":

diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index 92b343fa99a..f355c6c6168 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -4742,7 +4742,9 @@ curr_insn_transform (bool check_only_p)
             }
           *loc = new_reg;
           if (type != OP_IN
-              && find_reg_note (curr_insn, REG_UNUSED, old) == NULL_RTX)
+              && find_reg_note (curr_insn, REG_UNUSED, old) == NULL_RTX
+              /* OLD can be an equivalent constant here.  */
+              && nonimmediate_operand (old, GET_MODE (old)))
             {
               start_sequence ();
               lra_emit_move (type == OP_INOUT ? copy_rtx (old) : old, new_reg);

Does it look good to you?  Or did I miss something here?

Thanks again!

BR,
Kewen

> 
> Thank you for fixing the PR.
> 
>>     PR rtl-optimization/116170
>>
>> gcc/ChangeLog:
>>
>>     * lra-constraints.cc (curr_insn_transform): Don't emit move back to
>>     old operand if it's nonimmediate_operand.
>>
>> gcc/testsuite/ChangeLog:
>>
>>     * gcc.target/powerpc/pr116170.c: New test.
>> ---
>>   gcc/lra-constraints.cc                      |  3 ++-
>>   gcc/testsuite/gcc.target/powerpc/pr116170.c | 18 ++++++++++++++++++
>>   2 files changed, 20 insertions(+), 1 deletion(-)
>>   create mode 100644 gcc/testsuite/gcc.target/powerpc/pr116170.c
>>
>> diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
>> index 92b343fa99a..024c85c87d9 100644
>> --- a/gcc/lra-constraints.cc
>> +++ b/gcc/lra-constraints.cc
>> @@ -4742,7 +4742,8 @@ curr_insn_transform (bool check_only_p)
>>           }
>>         *loc = new_reg;
>>         if (type != OP_IN
>> -          && find_reg_note (curr_insn, REG_UNUSED, old) == NULL_RTX)
>> +          && find_reg_note (curr_insn, REG_UNUSED, old) == NULL_RTX
>> +          && nonimmediate_operand (old, GET_MODE (old)))
>>           {
>>             start_sequence ();
>>             lra_emit_move (type == OP_INOUT ? copy_rtx (old) : old, new_reg);
>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr116170.c b/gcc/testsuite/gcc.target/powerpc/pr116170.c
>> new file mode 100644
>> index 00000000000..6f6ca0f1ae9
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/pr116170.c
>> @@ -0,0 +1,18 @@
>> +/* { dg-do compile } */
>> +/* { dg-require-effective-target ppc_float128_sw } */
>> +/* { dg-options "-mdejagnu-cpu=power8 -O2 -fstack-protector-strong -ffloat-store" } */
>> +
>> +/* Verify there is no ICE.  */
>> +
>> +int a, d;
>> +_Float128 b, c;
>> +void
>> +e ()
>> +{
>> +  int f = 0;
>> +  if (a)
>> +    if (b || c)
>> +      f = 1;
>> +  if (d)
>> +    e (f ? 0 : b);
>> +}
>> -- 
>> 2.43.5
>>
>
Richard Sandiford Aug. 12, 2024, 8:55 a.m. UTC | #3
"Kewen.Lin" <linkw@linux.ibm.com> writes:
> Hi,
>
> Commit r15-2084 exposes one ICE in LRA.  Firstly, before
> r15-2084 KFmode has 126 bit precision while V1TImode has 128
> bit precision, so the subreg (subreg:V1TI (reg:KF 131) 0) is
> paradoxical_subreg_p, which stops some passes from doing
> some optimization.  After r15-2084, KFmode has the same mode
> precision as V1TImode, passes are able to optimize more, but
> it causes this ICE in LRA as described below:
>
> For insn 106 (set (mem:V1TI ...) (subreg:V1TI (reg:KF 133) 0)),
> which matches pattern
>
> (define_insn "*vsx_le_perm_store_<mode>"
>   [(set (match_operand:VSX_LE_128 0 "memory_operand" "=Z,Q")
>         (match_operand:VSX_LE_128 1 "vsx_register_operand" "+wa,r"))]

Is it well-formed to have "+" on an operand that is only semantically
an input?  df and most passes would consider it to be a pure input
and so would disregard any write that is intended to take place.

E.g. if we have:

  (set (reg:M R2) (reg:M R1)) ;; R1 dead
  (set (reg:M R3) (reg:M R2))

then it would be valid to change that to:

  (set (reg:M R2) (reg:M R1))
  (set (reg:M R3) (reg:M R1))

without considering the "+" on the input to the first instruction.

But if we do stick with the patch...

>   "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR
>    && !altivec_indexed_or_indirect_operand (operands[0], <MODE>mode)"
>   "@
>    #
>    #"
>   [(set_attr "type" "vecstore,store")
>    (set_attr "length" "12,8")
>    (set_attr "isa" "<VSisa>,*")])
>
> LRA makes equivalence substitution on r133 with const double
> (const_double:KF 0.0), selects alternative 0 and fixes up
> operand 1 for constraint "wa", because operand 1 is OP_INOUT,
> so it considers assigning back to it as well, that is:
>
>   lra_emit_move (type == OP_INOUT ? copy_rtx (old) : old, new_reg);
>
> But because old has been changed to const_double in equivalence
> substitution, the move is actually assigning to const_double,
> which is invalid and cause ICE.
>
> Considering reg:KF 133 is equivalent with (const_double:KF 0.0)
> even though this operand is OP_INOUT, IMHO there should not be
> any following uses of reg:KF 133, otherwise it doesn't have the
> chance to be equivalent to (const_double:KF 0.0).  From this
> perspective, I think we can guard the lra_emit_move with
> nonimmediate_operand to exclude such case.
>
> Does it sound reasonable?
>
> btw, I also tried with disallowing equivalence substitution with
> CONSTANT_P value if the corresponding operand is OP_INOUT or
> OP_OUT, it can also fix this issue, but with more thinking it
> seems not necessary to stop such substitution if we can handle it
> later as above.
>
> Bootstrapped and regtested on x86_64-redhat-linux and
> powerpc64{,le}-linux-gnu.
>
> BR,
> Kewen
> -----
> 	PR rtl-optimization/116170
>
> gcc/ChangeLog:
>
> 	* lra-constraints.cc (curr_insn_transform): Don't emit move back to
> 	old operand if it's nonimmediate_operand.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/powerpc/pr116170.c: New test.
> ---
>  gcc/lra-constraints.cc                      |  3 ++-
>  gcc/testsuite/gcc.target/powerpc/pr116170.c | 18 ++++++++++++++++++
>  2 files changed, 20 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr116170.c
>
> diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
> index 92b343fa99a..024c85c87d9 100644
> --- a/gcc/lra-constraints.cc
> +++ b/gcc/lra-constraints.cc
> @@ -4742,7 +4742,8 @@ curr_insn_transform (bool check_only_p)
>  	    }
>  	  *loc = new_reg;
>  	  if (type != OP_IN
> -	      && find_reg_note (curr_insn, REG_UNUSED, old) == NULL_RTX)
> +	      && find_reg_note (curr_insn, REG_UNUSED, old) == NULL_RTX
> +	      && nonimmediate_operand (old, GET_MODE (old)))

...IMO it would be better to use CONSTANT_P instead.  It's sometimes
useful to accept constants in specific instructions only, meaning that
they aren't general enough to be immediate_operands.

Thanks,
Richard

>  	    {
>  	      start_sequence ();
>  	      lra_emit_move (type == OP_INOUT ? copy_rtx (old) : old, new_reg);
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr116170.c b/gcc/testsuite/gcc.target/powerpc/pr116170.c
> new file mode 100644
> index 00000000000..6f6ca0f1ae9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr116170.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target ppc_float128_sw } */
> +/* { dg-options "-mdejagnu-cpu=power8 -O2 -fstack-protector-strong -ffloat-store" } */
> +
> +/* Verify there is no ICE.  */
> +
> +int a, d;
> +_Float128 b, c;
> +void
> +e ()
> +{
> +  int f = 0;
> +  if (a)
> +    if (b || c)
> +      f = 1;
> +  if (d)
> +    e (f ? 0 : b);
> +}
> --
> 2.43.5
Kewen.Lin Aug. 12, 2024, 10:33 a.m. UTC | #4
Hi Richard,

Thanks for the comments!

on 2024/8/12 16:55, Richard Sandiford wrote:
> "Kewen.Lin" <linkw@linux.ibm.com> writes:
>> Hi,
>>
>> Commit r15-2084 exposes one ICE in LRA.  Firstly, before
>> r15-2084 KFmode has 126 bit precision while V1TImode has 128
>> bit precision, so the subreg (subreg:V1TI (reg:KF 131) 0) is
>> paradoxical_subreg_p, which stops some passes from doing
>> some optimization.  After r15-2084, KFmode has the same mode
>> precision as V1TImode, passes are able to optimize more, but
>> it causes this ICE in LRA as described below:
>>
>> For insn 106 (set (mem:V1TI ...) (subreg:V1TI (reg:KF 133) 0)),
>> which matches pattern
>>
>> (define_insn "*vsx_le_perm_store_<mode>"
>>   [(set (match_operand:VSX_LE_128 0 "memory_operand" "=Z,Q")
>>         (match_operand:VSX_LE_128 1 "vsx_register_operand" "+wa,r"))]
> 
> Is it well-formed to have "+" on an operand that is only semantically
> an input?  df and most passes would consider it to be a pure input
> and so would disregard any write that is intended to take place.
> 
> E.g. if we have:
> 
>   (set (reg:M R2) (reg:M R1)) ;; R1 dead
>   (set (reg:M R3) (reg:M R2))
> 
> then it would be valid to change that to:
> 
>   (set (reg:M R2) (reg:M R1))
>   (set (reg:M R3) (reg:M R1))
> 
> without considering the "+" on the input to the first instruction.

Good question, I guess the "+" is to match the fact that operand[1]
can be both read and written by this insn, operand[1] has to be
re-used as the dest register of vector rotate 64bit (doubleword swap).
but it'll get swapped back so it's safe if the register is still live
(like the above example).  For this case that writing to but later
restoring, I'm not sure if we can take it as "no write" (strip "+").

;; The post-reload split requires that we re-permute the source
;; register in case it is still live.
(define_split
  [(set (match_operand:VSX_LE_128 0 "memory_operand")
        (match_operand:VSX_LE_128 1 "vsx_register_operand"))]
  "!BYTES_BIG_ENDIAN && TARGET_VSX && reload_completed && !TARGET_P9_VECTOR
   && !altivec_indexed_or_indirect_operand (operands[0], <MODE>mode)"
  [(const_int 0)]
{
  rs6000_emit_le_vsx_permute (operands[1], operands[1], <MODE>mode);
  rs6000_emit_le_vsx_permute (operands[0], operands[1], <MODE>mode);
  rs6000_emit_le_vsx_permute (operands[1], operands[1], <MODE>mode);
  DONE;
})

CC the author Mike as well.

> 
> But if we do stick with the patch...
> 
>>   "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR
>>    && !altivec_indexed_or_indirect_operand (operands[0], <MODE>mode)"
>>   "@
>>    #
>>    #"
>>   [(set_attr "type" "vecstore,store")
>>    (set_attr "length" "12,8")
>>    (set_attr "isa" "<VSisa>,*")])
>>
>> LRA makes equivalence substitution on r133 with const double
>> (const_double:KF 0.0), selects alternative 0 and fixes up
>> operand 1 for constraint "wa", because operand 1 is OP_INOUT,
>> so it considers assigning back to it as well, that is:
>>
>>   lra_emit_move (type == OP_INOUT ? copy_rtx (old) : old, new_reg);
>>
>> But because old has been changed to const_double in equivalence
>> substitution, the move is actually assigning to const_double,
>> which is invalid and cause ICE.
>>
>> Considering reg:KF 133 is equivalent with (const_double:KF 0.0)
>> even though this operand is OP_INOUT, IMHO there should not be
>> any following uses of reg:KF 133, otherwise it doesn't have the
>> chance to be equivalent to (const_double:KF 0.0).  From this
>> perspective, I think we can guard the lra_emit_move with
>> nonimmediate_operand to exclude such case.
>>
>> Does it sound reasonable?
>>
>> btw, I also tried with disallowing equivalence substitution with
>> CONSTANT_P value if the corresponding operand is OP_INOUT or
>> OP_OUT, it can also fix this issue, but with more thinking it
>> seems not necessary to stop such substitution if we can handle it
>> later as above.
>>
>> Bootstrapped and regtested on x86_64-redhat-linux and
>> powerpc64{,le}-linux-gnu.
>>
>> BR,
>> Kewen
>> -----
>> 	PR rtl-optimization/116170
>>
>> gcc/ChangeLog:
>>
>> 	* lra-constraints.cc (curr_insn_transform): Don't emit move back to
>> 	old operand if it's nonimmediate_operand.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 	* gcc.target/powerpc/pr116170.c: New test.
>> ---
>>  gcc/lra-constraints.cc                      |  3 ++-
>>  gcc/testsuite/gcc.target/powerpc/pr116170.c | 18 ++++++++++++++++++
>>  2 files changed, 20 insertions(+), 1 deletion(-)
>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr116170.c
>>
>> diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
>> index 92b343fa99a..024c85c87d9 100644
>> --- a/gcc/lra-constraints.cc
>> +++ b/gcc/lra-constraints.cc
>> @@ -4742,7 +4742,8 @@ curr_insn_transform (bool check_only_p)
>>  	    }
>>  	  *loc = new_reg;
>>  	  if (type != OP_IN
>> -	      && find_reg_note (curr_insn, REG_UNUSED, old) == NULL_RTX)
>> +	      && find_reg_note (curr_insn, REG_UNUSED, old) == NULL_RTX
>> +	      && nonimmediate_operand (old, GET_MODE (old)))
> 
> ...IMO it would be better to use CONSTANT_P instead.  It's sometimes
> useful to accept constants in specific instructions only, meaning that
> they aren't general enough to be immediate_operands.

OK, will do.  I used "CONSTANT_P" in the 1st draft, but considered it's
guarding a move, by checking many existing mov optab predicates (they
only accept general operands) so felt nonimmediate_operand seems to be a
better fit here.

BR,
Kewen
Richard Sandiford Aug. 12, 2024, 1:02 p.m. UTC | #5
"Kewen.Lin" <linkw@linux.ibm.com> writes:
> Hi Richard,
>
> Thanks for the comments!
>
> on 2024/8/12 16:55, Richard Sandiford wrote:
>> "Kewen.Lin" <linkw@linux.ibm.com> writes:
>>> Hi,
>>>
>>> Commit r15-2084 exposes one ICE in LRA.  Firstly, before
>>> r15-2084 KFmode has 126 bit precision while V1TImode has 128
>>> bit precision, so the subreg (subreg:V1TI (reg:KF 131) 0) is
>>> paradoxical_subreg_p, which stops some passes from doing
>>> some optimization.  After r15-2084, KFmode has the same mode
>>> precision as V1TImode, passes are able to optimize more, but
>>> it causes this ICE in LRA as described below:
>>>
>>> For insn 106 (set (mem:V1TI ...) (subreg:V1TI (reg:KF 133) 0)),
>>> which matches pattern
>>>
>>> (define_insn "*vsx_le_perm_store_<mode>"
>>>   [(set (match_operand:VSX_LE_128 0 "memory_operand" "=Z,Q")
>>>         (match_operand:VSX_LE_128 1 "vsx_register_operand" "+wa,r"))]
>> 
>> Is it well-formed to have "+" on an operand that is only semantically
>> an input?  df and most passes would consider it to be a pure input
>> and so would disregard any write that is intended to take place.
>> 
>> E.g. if we have:
>> 
>>   (set (reg:M R2) (reg:M R1)) ;; R1 dead
>>   (set (reg:M R3) (reg:M R2))
>> 
>> then it would be valid to change that to:
>> 
>>   (set (reg:M R2) (reg:M R1))
>>   (set (reg:M R3) (reg:M R1))
>> 
>> without considering the "+" on the input to the first instruction.
>
> Good question, I guess the "+" is to match the fact that operand[1]
> can be both read and written by this insn, operand[1] has to be
> re-used as the dest register of vector rotate 64bit (doubleword swap).
> but it'll get swapped back so it's safe if the register is still live
> (like the above example).  For this case that writing to but later
> restoring, I'm not sure if we can take it as "no write" (strip "+").

Yeah.  AIUI, there is no way of satisfying the constraints in a way
that makes operand 0 overlap operand 1, and:

>
> ;; The post-reload split requires that we re-permute the source
> ;; register in case it is still live.
> (define_split
>   [(set (match_operand:VSX_LE_128 0 "memory_operand")
>         (match_operand:VSX_LE_128 1 "vsx_register_operand"))]
>   "!BYTES_BIG_ENDIAN && TARGET_VSX && reload_completed && !TARGET_P9_VECTOR
>    && !altivec_indexed_or_indirect_operand (operands[0], <MODE>mode)"
>   [(const_int 0)]
> {
>   rs6000_emit_le_vsx_permute (operands[1], operands[1], <MODE>mode);
>   rs6000_emit_le_vsx_permute (operands[0], operands[1], <MODE>mode);
>   rs6000_emit_le_vsx_permute (operands[1], operands[1], <MODE>mode);

would not work correctly for that kind of overlap in any case.
 
So it looks on the face of it like the pattern would be (more) correct
without the "+".

Thanks,
Richard

>   DONE;
> })
>
> CC the author Mike as well.
>
>> 
>> But if we do stick with the patch...
>> 
>>>   "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR
>>>    && !altivec_indexed_or_indirect_operand (operands[0], <MODE>mode)"
>>>   "@
>>>    #
>>>    #"
>>>   [(set_attr "type" "vecstore,store")
>>>    (set_attr "length" "12,8")
>>>    (set_attr "isa" "<VSisa>,*")])
>>>
>>> LRA makes equivalence substitution on r133 with const double
>>> (const_double:KF 0.0), selects alternative 0 and fixes up
>>> operand 1 for constraint "wa", because operand 1 is OP_INOUT,
>>> so it considers assigning back to it as well, that is:
>>>
>>>   lra_emit_move (type == OP_INOUT ? copy_rtx (old) : old, new_reg);
>>>
>>> But because old has been changed to const_double in equivalence
>>> substitution, the move is actually assigning to const_double,
>>> which is invalid and cause ICE.
>>>
>>> Considering reg:KF 133 is equivalent with (const_double:KF 0.0)
>>> even though this operand is OP_INOUT, IMHO there should not be
>>> any following uses of reg:KF 133, otherwise it doesn't have the
>>> chance to be equivalent to (const_double:KF 0.0).  From this
>>> perspective, I think we can guard the lra_emit_move with
>>> nonimmediate_operand to exclude such case.
>>>
>>> Does it sound reasonable?
>>>
>>> btw, I also tried with disallowing equivalence substitution with
>>> CONSTANT_P value if the corresponding operand is OP_INOUT or
>>> OP_OUT, it can also fix this issue, but with more thinking it
>>> seems not necessary to stop such substitution if we can handle it
>>> later as above.
>>>
>>> Bootstrapped and regtested on x86_64-redhat-linux and
>>> powerpc64{,le}-linux-gnu.
>>>
>>> BR,
>>> Kewen
>>> -----
>>> 	PR rtl-optimization/116170
>>>
>>> gcc/ChangeLog:
>>>
>>> 	* lra-constraints.cc (curr_insn_transform): Don't emit move back to
>>> 	old operand if it's nonimmediate_operand.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> 	* gcc.target/powerpc/pr116170.c: New test.
>>> ---
>>>  gcc/lra-constraints.cc                      |  3 ++-
>>>  gcc/testsuite/gcc.target/powerpc/pr116170.c | 18 ++++++++++++++++++
>>>  2 files changed, 20 insertions(+), 1 deletion(-)
>>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr116170.c
>>>
>>> diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
>>> index 92b343fa99a..024c85c87d9 100644
>>> --- a/gcc/lra-constraints.cc
>>> +++ b/gcc/lra-constraints.cc
>>> @@ -4742,7 +4742,8 @@ curr_insn_transform (bool check_only_p)
>>>  	    }
>>>  	  *loc = new_reg;
>>>  	  if (type != OP_IN
>>> -	      && find_reg_note (curr_insn, REG_UNUSED, old) == NULL_RTX)
>>> +	      && find_reg_note (curr_insn, REG_UNUSED, old) == NULL_RTX
>>> +	      && nonimmediate_operand (old, GET_MODE (old)))
>> 
>> ...IMO it would be better to use CONSTANT_P instead.  It's sometimes
>> useful to accept constants in specific instructions only, meaning that
>> they aren't general enough to be immediate_operands.
>
> OK, will do.  I used "CONSTANT_P" in the 1st draft, but considered it's
> guarding a move, by checking many existing mov optab predicates (they
> only accept general operands) so felt nonimmediate_operand seems to be a
> better fit here.
>
> BR,
> Kewen
Vladimir Makarov Aug. 12, 2024, 1:05 p.m. UTC | #6
On 8/11/24 21:50, Kewen.Lin wrote:
> diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
> index 92b343fa99a..f355c6c6168 100644
> --- a/gcc/lra-constraints.cc
> +++ b/gcc/lra-constraints.cc
> @@ -4742,7 +4742,9 @@ curr_insn_transform (bool check_only_p)
>               }
>             *loc = new_reg;
>             if (type != OP_IN
> -              && find_reg_note (curr_insn, REG_UNUSED, old) == NULL_RTX)
> +              && find_reg_note (curr_insn, REG_UNUSED, old) == NULL_RTX
> +              /* OLD can be an equivalent constant here.  */
> +              && nonimmediate_operand (old, GET_MODE (old)))
>               {
>                 start_sequence ();
>                 lra_emit_move (type == OP_INOUT ? copy_rtx (old) : old, new_reg);
>
> Does it look good to you?  Or did I miss something here?
>
The comment looks ok to me.

Also I am agree with Richard Sandiford's remark that CONSTANT_P instead 
of nonimmediate_operand would be a better condition.  It is more safe.  
With CONSTANT_P change, the patch can be already pushed into the trunk.

Thank you.
Segher Boessenkool Aug. 12, 2024, 2:40 p.m. UTC | #7
Hi!

On Mon, Aug 12, 2024 at 09:55:07AM +0100, Richard Sandiford wrote:
> "Kewen.Lin" <linkw@linux.ibm.com> writes:
> > (define_insn "*vsx_le_perm_store_<mode>"
> >   [(set (match_operand:VSX_LE_128 0 "memory_operand" "=Z,Q")
> >         (match_operand:VSX_LE_128 1 "vsx_register_operand" "+wa,r"))]
> 
> Is it well-formed to have "+" on an operand that is only semantically
> an input?  df and most passes would consider it to be a pure input
> and so would disregard any write that is intended to take place.

It is well-formed, but it doesn't make sense, indeed :-)  Named patterns
have requirements on their arguments, but everything else is whatever
the target wants :-)

Hopefully *no* passes will consider this a pure input, we have that "+"
after all!  It would be better if it wasn't there, sure.


Segher
Kewen.Lin Aug. 13, 2024, 9:32 a.m. UTC | #8
on 2024/8/12 21:05, Vladimir Makarov wrote:
> On 8/11/24 21: 50, Kewen. Lin wrote: diff --git a/gcc/lra-constraints. cc b/gcc/lra-constraints. cc index 92b343fa99a. . f355c6c6168 100644 --- a/gcc/lra-constraints. cc +++ b/gcc/lra-constraints. cc @@ -4742,7 +4742,9 @@ curr_insn_transform (bool check_only_p)
> 
> 
> 
> On 8/11/24 21:50, Kewen.Lin wrote:
>> diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
>> index 92b343fa99a..f355c6c6168 100644
>> --- a/gcc/lra-constraints.cc
>> +++ b/gcc/lra-constraints.cc
>> @@ -4742,7 +4742,9 @@ curr_insn_transform (bool check_only_p)
>>              }
>>            *loc = new_reg;
>>            if (type != OP_IN
>> -              && find_reg_note (curr_insn, REG_UNUSED, old) == NULL_RTX)
>> +              && find_reg_note (curr_insn, REG_UNUSED, old) == NULL_RTX
>> +              /* OLD can be an equivalent constant here.  */
>> +              && nonimmediate_operand (old, GET_MODE (old)))
>>              {
>>                start_sequence ();
>>                lra_emit_move (type == OP_INOUT ? copy_rtx (old) : old, new_reg);
>>
>> Does it look good to you?  Or did I miss something here?
>>
> The comment looks ok to me.
> 
> Also I am agree with Richard Sandiford's remark that CONSTANT_P instead of nonimmediate_operand would be a better condition.  It is more safe.  With CONSTANT_P change, the patch can be already pushed into the trunk.

Thanks Vladimir!  Pushed with CONSTANT_P change as r15-2899-g49d5e21d41aed8.

BR,
Kewen
Kewen.Lin Aug. 13, 2024, 9:50 a.m. UTC | #9
on 2024/8/12 21:02, Richard Sandiford wrote:
> "Kewen.Lin" <linkw@linux.ibm.com> writes:
>> Hi Richard,
>>
>> Thanks for the comments!
>>
>> on 2024/8/12 16:55, Richard Sandiford wrote:
>>> "Kewen.Lin" <linkw@linux.ibm.com> writes:
>>>> Hi,
>>>>
>>>> Commit r15-2084 exposes one ICE in LRA.  Firstly, before
>>>> r15-2084 KFmode has 126 bit precision while V1TImode has 128
>>>> bit precision, so the subreg (subreg:V1TI (reg:KF 131) 0) is
>>>> paradoxical_subreg_p, which stops some passes from doing
>>>> some optimization.  After r15-2084, KFmode has the same mode
>>>> precision as V1TImode, passes are able to optimize more, but
>>>> it causes this ICE in LRA as described below:
>>>>
>>>> For insn 106 (set (mem:V1TI ...) (subreg:V1TI (reg:KF 133) 0)),
>>>> which matches pattern
>>>>
>>>> (define_insn "*vsx_le_perm_store_<mode>"
>>>>   [(set (match_operand:VSX_LE_128 0 "memory_operand" "=Z,Q")
>>>>         (match_operand:VSX_LE_128 1 "vsx_register_operand" "+wa,r"))]
>>>
>>> Is it well-formed to have "+" on an operand that is only semantically
>>> an input?  df and most passes would consider it to be a pure input
>>> and so would disregard any write that is intended to take place.
>>>
>>> E.g. if we have:
>>>
>>>   (set (reg:M R2) (reg:M R1)) ;; R1 dead
>>>   (set (reg:M R3) (reg:M R2))
>>>
>>> then it would be valid to change that to:
>>>
>>>   (set (reg:M R2) (reg:M R1))
>>>   (set (reg:M R3) (reg:M R1))
>>>
>>> without considering the "+" on the input to the first instruction.
>>
>> Good question, I guess the "+" is to match the fact that operand[1]
>> can be both read and written by this insn, operand[1] has to be
>> re-used as the dest register of vector rotate 64bit (doubleword swap).
>> but it'll get swapped back so it's safe if the register is still live
>> (like the above example).  For this case that writing to but later
>> restoring, I'm not sure if we can take it as "no write" (strip "+").
> 
> Yeah.  AIUI, there is no way of satisfying the constraints in a way
> that makes operand 0 overlap operand 1, and:
> 
>>
>> ;; The post-reload split requires that we re-permute the source
>> ;; register in case it is still live.
>> (define_split
>>   [(set (match_operand:VSX_LE_128 0 "memory_operand")
>>         (match_operand:VSX_LE_128 1 "vsx_register_operand"))]
>>   "!BYTES_BIG_ENDIAN && TARGET_VSX && reload_completed && !TARGET_P9_VECTOR
>>    && !altivec_indexed_or_indirect_operand (operands[0], <MODE>mode)"
>>   [(const_int 0)]
>> {
>>   rs6000_emit_le_vsx_permute (operands[1], operands[1], <MODE>mode);
>>   rs6000_emit_le_vsx_permute (operands[0], operands[1], <MODE>mode);
>>   rs6000_emit_le_vsx_permute (operands[1], operands[1], <MODE>mode);
> 
> would not work correctly for that kind of overlap in any case.

Guessing you meant the case that both operands are REGs?  I may miss
something, but the case here op0 is MEM and op1 is REG, is it still
possible to overlap?

>  
> So it looks on the face of it like the pattern would be (more) correct
> without the "+".
> 

I just posted a patch to get rid of the "+" at [1], thanks for your
insightful comments!!

[1] https://gcc.gnu.org/pipermail/gcc-patches/2024-August/660238.html

BR,
Kewen
Richard Sandiford Aug. 13, 2024, 10:02 a.m. UTC | #10
"Kewen.Lin" <linkw@linux.ibm.com> writes:
> on 2024/8/12 21:02, Richard Sandiford wrote:
>> "Kewen.Lin" <linkw@linux.ibm.com> writes:
>>> Hi Richard,
>>>
>>> Thanks for the comments!
>>>
>>> on 2024/8/12 16:55, Richard Sandiford wrote:
>>>> "Kewen.Lin" <linkw@linux.ibm.com> writes:
>>>>> Hi,
>>>>>
>>>>> Commit r15-2084 exposes one ICE in LRA.  Firstly, before
>>>>> r15-2084 KFmode has 126 bit precision while V1TImode has 128
>>>>> bit precision, so the subreg (subreg:V1TI (reg:KF 131) 0) is
>>>>> paradoxical_subreg_p, which stops some passes from doing
>>>>> some optimization.  After r15-2084, KFmode has the same mode
>>>>> precision as V1TImode, passes are able to optimize more, but
>>>>> it causes this ICE in LRA as described below:
>>>>>
>>>>> For insn 106 (set (mem:V1TI ...) (subreg:V1TI (reg:KF 133) 0)),
>>>>> which matches pattern
>>>>>
>>>>> (define_insn "*vsx_le_perm_store_<mode>"
>>>>>   [(set (match_operand:VSX_LE_128 0 "memory_operand" "=Z,Q")
>>>>>         (match_operand:VSX_LE_128 1 "vsx_register_operand" "+wa,r"))]
>>>>
>>>> Is it well-formed to have "+" on an operand that is only semantically
>>>> an input?  df and most passes would consider it to be a pure input
>>>> and so would disregard any write that is intended to take place.
>>>>
>>>> E.g. if we have:
>>>>
>>>>   (set (reg:M R2) (reg:M R1)) ;; R1 dead
>>>>   (set (reg:M R3) (reg:M R2))
>>>>
>>>> then it would be valid to change that to:
>>>>
>>>>   (set (reg:M R2) (reg:M R1))
>>>>   (set (reg:M R3) (reg:M R1))
>>>>
>>>> without considering the "+" on the input to the first instruction.
>>>
>>> Good question, I guess the "+" is to match the fact that operand[1]
>>> can be both read and written by this insn, operand[1] has to be
>>> re-used as the dest register of vector rotate 64bit (doubleword swap).
>>> but it'll get swapped back so it's safe if the register is still live
>>> (like the above example).  For this case that writing to but later
>>> restoring, I'm not sure if we can take it as "no write" (strip "+").
>> 
>> Yeah.  AIUI, there is no way of satisfying the constraints in a way
>> that makes operand 0 overlap operand 1, and:
>> 
>>>
>>> ;; The post-reload split requires that we re-permute the source
>>> ;; register in case it is still live.
>>> (define_split
>>>   [(set (match_operand:VSX_LE_128 0 "memory_operand")
>>>         (match_operand:VSX_LE_128 1 "vsx_register_operand"))]
>>>   "!BYTES_BIG_ENDIAN && TARGET_VSX && reload_completed && !TARGET_P9_VECTOR
>>>    && !altivec_indexed_or_indirect_operand (operands[0], <MODE>mode)"
>>>   [(const_int 0)]
>>> {
>>>   rs6000_emit_le_vsx_permute (operands[1], operands[1], <MODE>mode);
>>>   rs6000_emit_le_vsx_permute (operands[0], operands[1], <MODE>mode);
>>>   rs6000_emit_le_vsx_permute (operands[1], operands[1], <MODE>mode);
>> 
>> would not work correctly for that kind of overlap in any case.
>
> Guessing you meant the case that both operands are REGs?  I may miss
> something, but the case here op0 is MEM and op1 is REG, is it still
> possible to overlap?

Not unless the source register can be used in addresses, which I assume
can't happen for vector registers.

That's why I said "AIUI, there is no way of satisfying the constraints
in a way that makes operand 0 overlap operand 1".

Thanks,
Richard
Kewen.Lin Aug. 14, 2024, 2:03 a.m. UTC | #11
on 2024/8/13 18:02, Richard Sandiford wrote:
> "Kewen.Lin" <linkw@linux.ibm.com> writes:
>> on 2024/8/12 21:02, Richard Sandiford wrote:
>>> "Kewen.Lin" <linkw@linux.ibm.com> writes:
>>>> Hi Richard,
>>>>
>>>> Thanks for the comments!
>>>>
>>>> on 2024/8/12 16:55, Richard Sandiford wrote:
>>>>> "Kewen.Lin" <linkw@linux.ibm.com> writes:
>>>>>> Hi,
>>>>>>
>>>>>> Commit r15-2084 exposes one ICE in LRA.  Firstly, before
>>>>>> r15-2084 KFmode has 126 bit precision while V1TImode has 128
>>>>>> bit precision, so the subreg (subreg:V1TI (reg:KF 131) 0) is
>>>>>> paradoxical_subreg_p, which stops some passes from doing
>>>>>> some optimization.  After r15-2084, KFmode has the same mode
>>>>>> precision as V1TImode, passes are able to optimize more, but
>>>>>> it causes this ICE in LRA as described below:
>>>>>>
>>>>>> For insn 106 (set (mem:V1TI ...) (subreg:V1TI (reg:KF 133) 0)),
>>>>>> which matches pattern
>>>>>>
>>>>>> (define_insn "*vsx_le_perm_store_<mode>"
>>>>>>   [(set (match_operand:VSX_LE_128 0 "memory_operand" "=Z,Q")
>>>>>>         (match_operand:VSX_LE_128 1 "vsx_register_operand" "+wa,r"))]
>>>>>
>>>>> Is it well-formed to have "+" on an operand that is only semantically
>>>>> an input?  df and most passes would consider it to be a pure input
>>>>> and so would disregard any write that is intended to take place.
>>>>>
>>>>> E.g. if we have:
>>>>>
>>>>>   (set (reg:M R2) (reg:M R1)) ;; R1 dead
>>>>>   (set (reg:M R3) (reg:M R2))
>>>>>
>>>>> then it would be valid to change that to:
>>>>>
>>>>>   (set (reg:M R2) (reg:M R1))
>>>>>   (set (reg:M R3) (reg:M R1))
>>>>>
>>>>> without considering the "+" on the input to the first instruction.
>>>>
>>>> Good question, I guess the "+" is to match the fact that operand[1]
>>>> can be both read and written by this insn, operand[1] has to be
>>>> re-used as the dest register of vector rotate 64bit (doubleword swap).
>>>> but it'll get swapped back so it's safe if the register is still live
>>>> (like the above example).  For this case that writing to but later
>>>> restoring, I'm not sure if we can take it as "no write" (strip "+").
>>>
>>> Yeah.  AIUI, there is no way of satisfying the constraints in a way
>>> that makes operand 0 overlap operand 1, and:
>>>
>>>>
>>>> ;; The post-reload split requires that we re-permute the source
>>>> ;; register in case it is still live.
>>>> (define_split
>>>>   [(set (match_operand:VSX_LE_128 0 "memory_operand")
>>>>         (match_operand:VSX_LE_128 1 "vsx_register_operand"))]
>>>>   "!BYTES_BIG_ENDIAN && TARGET_VSX && reload_completed && !TARGET_P9_VECTOR
>>>>    && !altivec_indexed_or_indirect_operand (operands[0], <MODE>mode)"
>>>>   [(const_int 0)]
>>>> {
>>>>   rs6000_emit_le_vsx_permute (operands[1], operands[1], <MODE>mode);
>>>>   rs6000_emit_le_vsx_permute (operands[0], operands[1], <MODE>mode);
>>>>   rs6000_emit_le_vsx_permute (operands[1], operands[1], <MODE>mode);
>>>
>>> would not work correctly for that kind of overlap in any case.
>>
>> Guessing you meant the case that both operands are REGs?  I may miss
>> something, but the case here op0 is MEM and op1 is REG, is it still
>> possible to overlap?
> 
> Not unless the source register can be used in addresses, which I assume
> can't happen for vector registers.
> 
> That's why I said "AIUI, there is no way of satisfying the constraints
> in a way that makes operand 0 overlap operand 1".

Got it, thanks for clarifying!

BR,
Kewen
diff mbox series

Patch

diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index 92b343fa99a..024c85c87d9 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -4742,7 +4742,8 @@  curr_insn_transform (bool check_only_p)
 	    }
 	  *loc = new_reg;
 	  if (type != OP_IN
-	      && find_reg_note (curr_insn, REG_UNUSED, old) == NULL_RTX)
+	      && find_reg_note (curr_insn, REG_UNUSED, old) == NULL_RTX
+	      && nonimmediate_operand (old, GET_MODE (old)))
 	    {
 	      start_sequence ();
 	      lra_emit_move (type == OP_INOUT ? copy_rtx (old) : old, new_reg);
diff --git a/gcc/testsuite/gcc.target/powerpc/pr116170.c b/gcc/testsuite/gcc.target/powerpc/pr116170.c
new file mode 100644
index 00000000000..6f6ca0f1ae9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr116170.c
@@ -0,0 +1,18 @@ 
+/* { dg-do compile } */
+/* { dg-require-effective-target ppc_float128_sw } */
+/* { dg-options "-mdejagnu-cpu=power8 -O2 -fstack-protector-strong -ffloat-store" } */
+
+/* Verify there is no ICE.  */
+
+int a, d;
+_Float128 b, c;
+void
+e ()
+{
+  int f = 0;
+  if (a)
+    if (b || c)
+      f = 1;
+  if (d)
+    e (f ? 0 : b);
+}