Message ID | 55fe76af-a0c5-1066-7b94-022cfcf46556@linux.ibm.com |
---|---|
State | New |
Headers | show |
Series | [PATCH/RFC] LRA: Don't emit move for substituted CONSTATNT_P operand [PR116170] | expand |
On 8/9/24 05:49, Kewen.Lin wrote: > Hi, > > Commit r15-2084 exposes one ICE in LRA. Firstly, before > r15-2084 KFmode has 126 bit precision while V1TImode has 128 > bit precision, so the subreg (subreg:V1TI (reg:KF 131) 0) is > paradoxical_subreg_p, which stops some passes from doing > some optimization. After r15-2084, KFmode has the same mode > precision as V1TImode, passes are able to optimize more, but > it causes this ICE in LRA as described below: > > For insn 106 (set (mem:V1TI ...) (subreg:V1TI (reg:KF 133) 0)), > which matches pattern > > (define_insn "*vsx_le_perm_store_<mode>" > [(set (match_operand:VSX_LE_128 0 "memory_operand" "=Z,Q") > (match_operand:VSX_LE_128 1 "vsx_register_operand" "+wa,r"))] > "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR > && !altivec_indexed_or_indirect_operand (operands[0], <MODE>mode)" > "@ > # > #" > [(set_attr "type" "vecstore,store") > (set_attr "length" "12,8") > (set_attr "isa" "<VSisa>,*")]) > LRA makes equivalence substitution on r133 with const double > (const_double:KF 0.0), selects alternative 0 and fixes up > operand 1 for constraint "wa", because operand 1 is OP_INOUT, > so it considers assigning back to it as well, that is: > > lra_emit_move (type == OP_INOUT ? copy_rtx (old) : old, new_reg); > > But because old has been changed to const_double in equivalence > substitution, the move is actually assigning to const_double, > which is invalid and cause ICE. > > Considering reg:KF 133 is equivalent with (const_double:KF 0.0) > even though this operand is OP_INOUT, IMHO there should not be > any following uses of reg:KF 133, otherwise it doesn't have the > chance to be equivalent to (const_double:KF 0.0). From this > perspective, I think we can guard the lra_emit_move with > nonimmediate_operand to exclude such case. > > Does it sound reasonable? Yes. > btw, I also tried with disallowing equivalence substitution with > CONSTANT_P value if the corresponding operand is OP_INOUT or > OP_OUT, it can also fix this issue, but with more thinking it > seems not necessary to stop such substitution if we can handle it > later as above. > > Bootstrapped and regtested on x86_64-redhat-linux and > powerpc64{,le}-linux-gnu. > Thank you for the good explanation of the problem. The patch is ok for me. It would be nice to add a comment before `nonimmediate_operand` that `old` can be an equivalent constant and we chose insn alternative before the equivalent substitution. Thank you for fixing the PR. > PR rtl-optimization/116170 > > gcc/ChangeLog: > > * lra-constraints.cc (curr_insn_transform): Don't emit move back to > old operand if it's nonimmediate_operand. > > gcc/testsuite/ChangeLog: > > * gcc.target/powerpc/pr116170.c: New test. > --- > gcc/lra-constraints.cc | 3 ++- > gcc/testsuite/gcc.target/powerpc/pr116170.c | 18 ++++++++++++++++++ > 2 files changed, 20 insertions(+), 1 deletion(-) > create mode 100644 gcc/testsuite/gcc.target/powerpc/pr116170.c > > diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc > index 92b343fa99a..024c85c87d9 100644 > --- a/gcc/lra-constraints.cc > +++ b/gcc/lra-constraints.cc > @@ -4742,7 +4742,8 @@ curr_insn_transform (bool check_only_p) > } > *loc = new_reg; > if (type != OP_IN > - && find_reg_note (curr_insn, REG_UNUSED, old) == NULL_RTX) > + && find_reg_note (curr_insn, REG_UNUSED, old) == NULL_RTX > + && nonimmediate_operand (old, GET_MODE (old))) > { > start_sequence (); > lra_emit_move (type == OP_INOUT ? copy_rtx (old) : old, new_reg); > diff --git a/gcc/testsuite/gcc.target/powerpc/pr116170.c b/gcc/testsuite/gcc.target/powerpc/pr116170.c > new file mode 100644 > index 00000000000..6f6ca0f1ae9 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/pr116170.c > @@ -0,0 +1,18 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target ppc_float128_sw } */ > +/* { dg-options "-mdejagnu-cpu=power8 -O2 -fstack-protector-strong -ffloat-store" } */ > + > +/* Verify there is no ICE. */ > + > +int a, d; > +_Float128 b, c; > +void > +e () > +{ > + int f = 0; > + if (a) > + if (b || c) > + f = 1; > + if (d) > + e (f ? 0 : b); > +} > -- > 2.43.5 >
Hi Vladimir, on 2024/8/10 01:47, Vladimir Makarov wrote: > > On 8/9/24 05:49, Kewen.Lin wrote: >> Hi, >> >> Commit r15-2084 exposes one ICE in LRA. Firstly, before >> r15-2084 KFmode has 126 bit precision while V1TImode has 128 >> bit precision, so the subreg (subreg:V1TI (reg:KF 131) 0) is >> paradoxical_subreg_p, which stops some passes from doing >> some optimization. After r15-2084, KFmode has the same mode >> precision as V1TImode, passes are able to optimize more, but >> it causes this ICE in LRA as described below: >> >> For insn 106 (set (mem:V1TI ...) (subreg:V1TI (reg:KF 133) 0)), >> which matches pattern >> >> (define_insn "*vsx_le_perm_store_<mode>" >> [(set (match_operand:VSX_LE_128 0 "memory_operand" "=Z,Q") >> (match_operand:VSX_LE_128 1 "vsx_register_operand" "+wa,r"))] >> "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR >> && !altivec_indexed_or_indirect_operand (operands[0], <MODE>mode)" >> "@ >> # >> #" >> [(set_attr "type" "vecstore,store") >> (set_attr "length" "12,8") >> (set_attr "isa" "<VSisa>,*")]) >> LRA makes equivalence substitution on r133 with const double >> (const_double:KF 0.0), selects alternative 0 and fixes up >> operand 1 for constraint "wa", because operand 1 is OP_INOUT, >> so it considers assigning back to it as well, that is: >> >> lra_emit_move (type == OP_INOUT ? copy_rtx (old) : old, new_reg); >> >> But because old has been changed to const_double in equivalence >> substitution, the move is actually assigning to const_double, >> which is invalid and cause ICE. >> >> Considering reg:KF 133 is equivalent with (const_double:KF 0.0) >> even though this operand is OP_INOUT, IMHO there should not be >> any following uses of reg:KF 133, otherwise it doesn't have the >> chance to be equivalent to (const_double:KF 0.0). From this >> perspective, I think we can guard the lra_emit_move with >> nonimmediate_operand to exclude such case. >> >> Does it sound reasonable? > Yes. >> btw, I also tried with disallowing equivalence substitution with >> CONSTANT_P value if the corresponding operand is OP_INOUT or >> OP_OUT, it can also fix this issue, but with more thinking it >> seems not necessary to stop such substitution if we can handle it >> later as above. >> >> Bootstrapped and regtested on x86_64-redhat-linux and >> powerpc64{,le}-linux-gnu. >> > Thank you for the good explanation of the problem. The patch is ok for me. It would be nice to add a comment before `nonimmediate_operand` that `old` can be an equivalent constant and we chose insn alternative before the equivalent substitution. Thanks for your comments!! If I read the code right, the function chooses insn alternative after equivalence substitution, the related code is: 4207 if (lra_dump_file != NULL) 4208 { 4209 fprintf (lra_dump_file, 4210 "Changing pseudo %d in operand %i of insn %u on equiv ", 4211 REGNO (old), i, INSN_UID (curr_insn)); 4212 dump_value_slim (lra_dump_file, subst, 1); 4213 fprintf (lra_dump_file, "\n"); 4214 } 4215 op_change_p = change_p = true; 4216 } 4217 if (simplify_operand_subreg (i, GET_MODE (old)) || op_change_p) 4218 { 4219 change_p = true; 4220 lra_update_dup (curr_id, i); 4221 } 4411 fprintf (lra_dump_file, " Choosing alt %d in insn %u:", 4412 goal_alt_number, INSN_UID (curr_insn)); 4413 print_curr_insn_alt (goal_alt_number); so I just added a comment as you suggested but stripping "and ...": diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc index 92b343fa99a..f355c6c6168 100644 --- a/gcc/lra-constraints.cc +++ b/gcc/lra-constraints.cc @@ -4742,7 +4742,9 @@ curr_insn_transform (bool check_only_p) } *loc = new_reg; if (type != OP_IN - && find_reg_note (curr_insn, REG_UNUSED, old) == NULL_RTX) + && find_reg_note (curr_insn, REG_UNUSED, old) == NULL_RTX + /* OLD can be an equivalent constant here. */ + && nonimmediate_operand (old, GET_MODE (old))) { start_sequence (); lra_emit_move (type == OP_INOUT ? copy_rtx (old) : old, new_reg); Does it look good to you? Or did I miss something here? Thanks again! BR, Kewen > > Thank you for fixing the PR. > >> PR rtl-optimization/116170 >> >> gcc/ChangeLog: >> >> * lra-constraints.cc (curr_insn_transform): Don't emit move back to >> old operand if it's nonimmediate_operand. >> >> gcc/testsuite/ChangeLog: >> >> * gcc.target/powerpc/pr116170.c: New test. >> --- >> gcc/lra-constraints.cc | 3 ++- >> gcc/testsuite/gcc.target/powerpc/pr116170.c | 18 ++++++++++++++++++ >> 2 files changed, 20 insertions(+), 1 deletion(-) >> create mode 100644 gcc/testsuite/gcc.target/powerpc/pr116170.c >> >> diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc >> index 92b343fa99a..024c85c87d9 100644 >> --- a/gcc/lra-constraints.cc >> +++ b/gcc/lra-constraints.cc >> @@ -4742,7 +4742,8 @@ curr_insn_transform (bool check_only_p) >> } >> *loc = new_reg; >> if (type != OP_IN >> - && find_reg_note (curr_insn, REG_UNUSED, old) == NULL_RTX) >> + && find_reg_note (curr_insn, REG_UNUSED, old) == NULL_RTX >> + && nonimmediate_operand (old, GET_MODE (old))) >> { >> start_sequence (); >> lra_emit_move (type == OP_INOUT ? copy_rtx (old) : old, new_reg); >> diff --git a/gcc/testsuite/gcc.target/powerpc/pr116170.c b/gcc/testsuite/gcc.target/powerpc/pr116170.c >> new file mode 100644 >> index 00000000000..6f6ca0f1ae9 >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/powerpc/pr116170.c >> @@ -0,0 +1,18 @@ >> +/* { dg-do compile } */ >> +/* { dg-require-effective-target ppc_float128_sw } */ >> +/* { dg-options "-mdejagnu-cpu=power8 -O2 -fstack-protector-strong -ffloat-store" } */ >> + >> +/* Verify there is no ICE. */ >> + >> +int a, d; >> +_Float128 b, c; >> +void >> +e () >> +{ >> + int f = 0; >> + if (a) >> + if (b || c) >> + f = 1; >> + if (d) >> + e (f ? 0 : b); >> +} >> -- >> 2.43.5 >> >
"Kewen.Lin" <linkw@linux.ibm.com> writes: > Hi, > > Commit r15-2084 exposes one ICE in LRA. Firstly, before > r15-2084 KFmode has 126 bit precision while V1TImode has 128 > bit precision, so the subreg (subreg:V1TI (reg:KF 131) 0) is > paradoxical_subreg_p, which stops some passes from doing > some optimization. After r15-2084, KFmode has the same mode > precision as V1TImode, passes are able to optimize more, but > it causes this ICE in LRA as described below: > > For insn 106 (set (mem:V1TI ...) (subreg:V1TI (reg:KF 133) 0)), > which matches pattern > > (define_insn "*vsx_le_perm_store_<mode>" > [(set (match_operand:VSX_LE_128 0 "memory_operand" "=Z,Q") > (match_operand:VSX_LE_128 1 "vsx_register_operand" "+wa,r"))] Is it well-formed to have "+" on an operand that is only semantically an input? df and most passes would consider it to be a pure input and so would disregard any write that is intended to take place. E.g. if we have: (set (reg:M R2) (reg:M R1)) ;; R1 dead (set (reg:M R3) (reg:M R2)) then it would be valid to change that to: (set (reg:M R2) (reg:M R1)) (set (reg:M R3) (reg:M R1)) without considering the "+" on the input to the first instruction. But if we do stick with the patch... > "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR > && !altivec_indexed_or_indirect_operand (operands[0], <MODE>mode)" > "@ > # > #" > [(set_attr "type" "vecstore,store") > (set_attr "length" "12,8") > (set_attr "isa" "<VSisa>,*")]) > > LRA makes equivalence substitution on r133 with const double > (const_double:KF 0.0), selects alternative 0 and fixes up > operand 1 for constraint "wa", because operand 1 is OP_INOUT, > so it considers assigning back to it as well, that is: > > lra_emit_move (type == OP_INOUT ? copy_rtx (old) : old, new_reg); > > But because old has been changed to const_double in equivalence > substitution, the move is actually assigning to const_double, > which is invalid and cause ICE. > > Considering reg:KF 133 is equivalent with (const_double:KF 0.0) > even though this operand is OP_INOUT, IMHO there should not be > any following uses of reg:KF 133, otherwise it doesn't have the > chance to be equivalent to (const_double:KF 0.0). From this > perspective, I think we can guard the lra_emit_move with > nonimmediate_operand to exclude such case. > > Does it sound reasonable? > > btw, I also tried with disallowing equivalence substitution with > CONSTANT_P value if the corresponding operand is OP_INOUT or > OP_OUT, it can also fix this issue, but with more thinking it > seems not necessary to stop such substitution if we can handle it > later as above. > > Bootstrapped and regtested on x86_64-redhat-linux and > powerpc64{,le}-linux-gnu. > > BR, > Kewen > ----- > PR rtl-optimization/116170 > > gcc/ChangeLog: > > * lra-constraints.cc (curr_insn_transform): Don't emit move back to > old operand if it's nonimmediate_operand. > > gcc/testsuite/ChangeLog: > > * gcc.target/powerpc/pr116170.c: New test. > --- > gcc/lra-constraints.cc | 3 ++- > gcc/testsuite/gcc.target/powerpc/pr116170.c | 18 ++++++++++++++++++ > 2 files changed, 20 insertions(+), 1 deletion(-) > create mode 100644 gcc/testsuite/gcc.target/powerpc/pr116170.c > > diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc > index 92b343fa99a..024c85c87d9 100644 > --- a/gcc/lra-constraints.cc > +++ b/gcc/lra-constraints.cc > @@ -4742,7 +4742,8 @@ curr_insn_transform (bool check_only_p) > } > *loc = new_reg; > if (type != OP_IN > - && find_reg_note (curr_insn, REG_UNUSED, old) == NULL_RTX) > + && find_reg_note (curr_insn, REG_UNUSED, old) == NULL_RTX > + && nonimmediate_operand (old, GET_MODE (old))) ...IMO it would be better to use CONSTANT_P instead. It's sometimes useful to accept constants in specific instructions only, meaning that they aren't general enough to be immediate_operands. Thanks, Richard > { > start_sequence (); > lra_emit_move (type == OP_INOUT ? copy_rtx (old) : old, new_reg); > diff --git a/gcc/testsuite/gcc.target/powerpc/pr116170.c b/gcc/testsuite/gcc.target/powerpc/pr116170.c > new file mode 100644 > index 00000000000..6f6ca0f1ae9 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/pr116170.c > @@ -0,0 +1,18 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target ppc_float128_sw } */ > +/* { dg-options "-mdejagnu-cpu=power8 -O2 -fstack-protector-strong -ffloat-store" } */ > + > +/* Verify there is no ICE. */ > + > +int a, d; > +_Float128 b, c; > +void > +e () > +{ > + int f = 0; > + if (a) > + if (b || c) > + f = 1; > + if (d) > + e (f ? 0 : b); > +} > -- > 2.43.5
Hi Richard, Thanks for the comments! on 2024/8/12 16:55, Richard Sandiford wrote: > "Kewen.Lin" <linkw@linux.ibm.com> writes: >> Hi, >> >> Commit r15-2084 exposes one ICE in LRA. Firstly, before >> r15-2084 KFmode has 126 bit precision while V1TImode has 128 >> bit precision, so the subreg (subreg:V1TI (reg:KF 131) 0) is >> paradoxical_subreg_p, which stops some passes from doing >> some optimization. After r15-2084, KFmode has the same mode >> precision as V1TImode, passes are able to optimize more, but >> it causes this ICE in LRA as described below: >> >> For insn 106 (set (mem:V1TI ...) (subreg:V1TI (reg:KF 133) 0)), >> which matches pattern >> >> (define_insn "*vsx_le_perm_store_<mode>" >> [(set (match_operand:VSX_LE_128 0 "memory_operand" "=Z,Q") >> (match_operand:VSX_LE_128 1 "vsx_register_operand" "+wa,r"))] > > Is it well-formed to have "+" on an operand that is only semantically > an input? df and most passes would consider it to be a pure input > and so would disregard any write that is intended to take place. > > E.g. if we have: > > (set (reg:M R2) (reg:M R1)) ;; R1 dead > (set (reg:M R3) (reg:M R2)) > > then it would be valid to change that to: > > (set (reg:M R2) (reg:M R1)) > (set (reg:M R3) (reg:M R1)) > > without considering the "+" on the input to the first instruction. Good question, I guess the "+" is to match the fact that operand[1] can be both read and written by this insn, operand[1] has to be re-used as the dest register of vector rotate 64bit (doubleword swap). but it'll get swapped back so it's safe if the register is still live (like the above example). For this case that writing to but later restoring, I'm not sure if we can take it as "no write" (strip "+"). ;; The post-reload split requires that we re-permute the source ;; register in case it is still live. (define_split [(set (match_operand:VSX_LE_128 0 "memory_operand") (match_operand:VSX_LE_128 1 "vsx_register_operand"))] "!BYTES_BIG_ENDIAN && TARGET_VSX && reload_completed && !TARGET_P9_VECTOR && !altivec_indexed_or_indirect_operand (operands[0], <MODE>mode)" [(const_int 0)] { rs6000_emit_le_vsx_permute (operands[1], operands[1], <MODE>mode); rs6000_emit_le_vsx_permute (operands[0], operands[1], <MODE>mode); rs6000_emit_le_vsx_permute (operands[1], operands[1], <MODE>mode); DONE; }) CC the author Mike as well. > > But if we do stick with the patch... > >> "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR >> && !altivec_indexed_or_indirect_operand (operands[0], <MODE>mode)" >> "@ >> # >> #" >> [(set_attr "type" "vecstore,store") >> (set_attr "length" "12,8") >> (set_attr "isa" "<VSisa>,*")]) >> >> LRA makes equivalence substitution on r133 with const double >> (const_double:KF 0.0), selects alternative 0 and fixes up >> operand 1 for constraint "wa", because operand 1 is OP_INOUT, >> so it considers assigning back to it as well, that is: >> >> lra_emit_move (type == OP_INOUT ? copy_rtx (old) : old, new_reg); >> >> But because old has been changed to const_double in equivalence >> substitution, the move is actually assigning to const_double, >> which is invalid and cause ICE. >> >> Considering reg:KF 133 is equivalent with (const_double:KF 0.0) >> even though this operand is OP_INOUT, IMHO there should not be >> any following uses of reg:KF 133, otherwise it doesn't have the >> chance to be equivalent to (const_double:KF 0.0). From this >> perspective, I think we can guard the lra_emit_move with >> nonimmediate_operand to exclude such case. >> >> Does it sound reasonable? >> >> btw, I also tried with disallowing equivalence substitution with >> CONSTANT_P value if the corresponding operand is OP_INOUT or >> OP_OUT, it can also fix this issue, but with more thinking it >> seems not necessary to stop such substitution if we can handle it >> later as above. >> >> Bootstrapped and regtested on x86_64-redhat-linux and >> powerpc64{,le}-linux-gnu. >> >> BR, >> Kewen >> ----- >> PR rtl-optimization/116170 >> >> gcc/ChangeLog: >> >> * lra-constraints.cc (curr_insn_transform): Don't emit move back to >> old operand if it's nonimmediate_operand. >> >> gcc/testsuite/ChangeLog: >> >> * gcc.target/powerpc/pr116170.c: New test. >> --- >> gcc/lra-constraints.cc | 3 ++- >> gcc/testsuite/gcc.target/powerpc/pr116170.c | 18 ++++++++++++++++++ >> 2 files changed, 20 insertions(+), 1 deletion(-) >> create mode 100644 gcc/testsuite/gcc.target/powerpc/pr116170.c >> >> diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc >> index 92b343fa99a..024c85c87d9 100644 >> --- a/gcc/lra-constraints.cc >> +++ b/gcc/lra-constraints.cc >> @@ -4742,7 +4742,8 @@ curr_insn_transform (bool check_only_p) >> } >> *loc = new_reg; >> if (type != OP_IN >> - && find_reg_note (curr_insn, REG_UNUSED, old) == NULL_RTX) >> + && find_reg_note (curr_insn, REG_UNUSED, old) == NULL_RTX >> + && nonimmediate_operand (old, GET_MODE (old))) > > ...IMO it would be better to use CONSTANT_P instead. It's sometimes > useful to accept constants in specific instructions only, meaning that > they aren't general enough to be immediate_operands. OK, will do. I used "CONSTANT_P" in the 1st draft, but considered it's guarding a move, by checking many existing mov optab predicates (they only accept general operands) so felt nonimmediate_operand seems to be a better fit here. BR, Kewen
"Kewen.Lin" <linkw@linux.ibm.com> writes: > Hi Richard, > > Thanks for the comments! > > on 2024/8/12 16:55, Richard Sandiford wrote: >> "Kewen.Lin" <linkw@linux.ibm.com> writes: >>> Hi, >>> >>> Commit r15-2084 exposes one ICE in LRA. Firstly, before >>> r15-2084 KFmode has 126 bit precision while V1TImode has 128 >>> bit precision, so the subreg (subreg:V1TI (reg:KF 131) 0) is >>> paradoxical_subreg_p, which stops some passes from doing >>> some optimization. After r15-2084, KFmode has the same mode >>> precision as V1TImode, passes are able to optimize more, but >>> it causes this ICE in LRA as described below: >>> >>> For insn 106 (set (mem:V1TI ...) (subreg:V1TI (reg:KF 133) 0)), >>> which matches pattern >>> >>> (define_insn "*vsx_le_perm_store_<mode>" >>> [(set (match_operand:VSX_LE_128 0 "memory_operand" "=Z,Q") >>> (match_operand:VSX_LE_128 1 "vsx_register_operand" "+wa,r"))] >> >> Is it well-formed to have "+" on an operand that is only semantically >> an input? df and most passes would consider it to be a pure input >> and so would disregard any write that is intended to take place. >> >> E.g. if we have: >> >> (set (reg:M R2) (reg:M R1)) ;; R1 dead >> (set (reg:M R3) (reg:M R2)) >> >> then it would be valid to change that to: >> >> (set (reg:M R2) (reg:M R1)) >> (set (reg:M R3) (reg:M R1)) >> >> without considering the "+" on the input to the first instruction. > > Good question, I guess the "+" is to match the fact that operand[1] > can be both read and written by this insn, operand[1] has to be > re-used as the dest register of vector rotate 64bit (doubleword swap). > but it'll get swapped back so it's safe if the register is still live > (like the above example). For this case that writing to but later > restoring, I'm not sure if we can take it as "no write" (strip "+"). Yeah. AIUI, there is no way of satisfying the constraints in a way that makes operand 0 overlap operand 1, and: > > ;; The post-reload split requires that we re-permute the source > ;; register in case it is still live. > (define_split > [(set (match_operand:VSX_LE_128 0 "memory_operand") > (match_operand:VSX_LE_128 1 "vsx_register_operand"))] > "!BYTES_BIG_ENDIAN && TARGET_VSX && reload_completed && !TARGET_P9_VECTOR > && !altivec_indexed_or_indirect_operand (operands[0], <MODE>mode)" > [(const_int 0)] > { > rs6000_emit_le_vsx_permute (operands[1], operands[1], <MODE>mode); > rs6000_emit_le_vsx_permute (operands[0], operands[1], <MODE>mode); > rs6000_emit_le_vsx_permute (operands[1], operands[1], <MODE>mode); would not work correctly for that kind of overlap in any case. So it looks on the face of it like the pattern would be (more) correct without the "+". Thanks, Richard > DONE; > }) > > CC the author Mike as well. > >> >> But if we do stick with the patch... >> >>> "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR >>> && !altivec_indexed_or_indirect_operand (operands[0], <MODE>mode)" >>> "@ >>> # >>> #" >>> [(set_attr "type" "vecstore,store") >>> (set_attr "length" "12,8") >>> (set_attr "isa" "<VSisa>,*")]) >>> >>> LRA makes equivalence substitution on r133 with const double >>> (const_double:KF 0.0), selects alternative 0 and fixes up >>> operand 1 for constraint "wa", because operand 1 is OP_INOUT, >>> so it considers assigning back to it as well, that is: >>> >>> lra_emit_move (type == OP_INOUT ? copy_rtx (old) : old, new_reg); >>> >>> But because old has been changed to const_double in equivalence >>> substitution, the move is actually assigning to const_double, >>> which is invalid and cause ICE. >>> >>> Considering reg:KF 133 is equivalent with (const_double:KF 0.0) >>> even though this operand is OP_INOUT, IMHO there should not be >>> any following uses of reg:KF 133, otherwise it doesn't have the >>> chance to be equivalent to (const_double:KF 0.0). From this >>> perspective, I think we can guard the lra_emit_move with >>> nonimmediate_operand to exclude such case. >>> >>> Does it sound reasonable? >>> >>> btw, I also tried with disallowing equivalence substitution with >>> CONSTANT_P value if the corresponding operand is OP_INOUT or >>> OP_OUT, it can also fix this issue, but with more thinking it >>> seems not necessary to stop such substitution if we can handle it >>> later as above. >>> >>> Bootstrapped and regtested on x86_64-redhat-linux and >>> powerpc64{,le}-linux-gnu. >>> >>> BR, >>> Kewen >>> ----- >>> PR rtl-optimization/116170 >>> >>> gcc/ChangeLog: >>> >>> * lra-constraints.cc (curr_insn_transform): Don't emit move back to >>> old operand if it's nonimmediate_operand. >>> >>> gcc/testsuite/ChangeLog: >>> >>> * gcc.target/powerpc/pr116170.c: New test. >>> --- >>> gcc/lra-constraints.cc | 3 ++- >>> gcc/testsuite/gcc.target/powerpc/pr116170.c | 18 ++++++++++++++++++ >>> 2 files changed, 20 insertions(+), 1 deletion(-) >>> create mode 100644 gcc/testsuite/gcc.target/powerpc/pr116170.c >>> >>> diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc >>> index 92b343fa99a..024c85c87d9 100644 >>> --- a/gcc/lra-constraints.cc >>> +++ b/gcc/lra-constraints.cc >>> @@ -4742,7 +4742,8 @@ curr_insn_transform (bool check_only_p) >>> } >>> *loc = new_reg; >>> if (type != OP_IN >>> - && find_reg_note (curr_insn, REG_UNUSED, old) == NULL_RTX) >>> + && find_reg_note (curr_insn, REG_UNUSED, old) == NULL_RTX >>> + && nonimmediate_operand (old, GET_MODE (old))) >> >> ...IMO it would be better to use CONSTANT_P instead. It's sometimes >> useful to accept constants in specific instructions only, meaning that >> they aren't general enough to be immediate_operands. > > OK, will do. I used "CONSTANT_P" in the 1st draft, but considered it's > guarding a move, by checking many existing mov optab predicates (they > only accept general operands) so felt nonimmediate_operand seems to be a > better fit here. > > BR, > Kewen
On 8/11/24 21:50, Kewen.Lin wrote: > diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc > index 92b343fa99a..f355c6c6168 100644 > --- a/gcc/lra-constraints.cc > +++ b/gcc/lra-constraints.cc > @@ -4742,7 +4742,9 @@ curr_insn_transform (bool check_only_p) > } > *loc = new_reg; > if (type != OP_IN > - && find_reg_note (curr_insn, REG_UNUSED, old) == NULL_RTX) > + && find_reg_note (curr_insn, REG_UNUSED, old) == NULL_RTX > + /* OLD can be an equivalent constant here. */ > + && nonimmediate_operand (old, GET_MODE (old))) > { > start_sequence (); > lra_emit_move (type == OP_INOUT ? copy_rtx (old) : old, new_reg); > > Does it look good to you? Or did I miss something here? > The comment looks ok to me. Also I am agree with Richard Sandiford's remark that CONSTANT_P instead of nonimmediate_operand would be a better condition. It is more safe. With CONSTANT_P change, the patch can be already pushed into the trunk. Thank you.
Hi! On Mon, Aug 12, 2024 at 09:55:07AM +0100, Richard Sandiford wrote: > "Kewen.Lin" <linkw@linux.ibm.com> writes: > > (define_insn "*vsx_le_perm_store_<mode>" > > [(set (match_operand:VSX_LE_128 0 "memory_operand" "=Z,Q") > > (match_operand:VSX_LE_128 1 "vsx_register_operand" "+wa,r"))] > > Is it well-formed to have "+" on an operand that is only semantically > an input? df and most passes would consider it to be a pure input > and so would disregard any write that is intended to take place. It is well-formed, but it doesn't make sense, indeed :-) Named patterns have requirements on their arguments, but everything else is whatever the target wants :-) Hopefully *no* passes will consider this a pure input, we have that "+" after all! It would be better if it wasn't there, sure. Segher
on 2024/8/12 21:05, Vladimir Makarov wrote: > On 8/11/24 21: 50, Kewen. Lin wrote: diff --git a/gcc/lra-constraints. cc b/gcc/lra-constraints. cc index 92b343fa99a. . f355c6c6168 100644 --- a/gcc/lra-constraints. cc +++ b/gcc/lra-constraints. cc @@ -4742,7 +4742,9 @@ curr_insn_transform (bool check_only_p) > > > > On 8/11/24 21:50, Kewen.Lin wrote: >> diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc >> index 92b343fa99a..f355c6c6168 100644 >> --- a/gcc/lra-constraints.cc >> +++ b/gcc/lra-constraints.cc >> @@ -4742,7 +4742,9 @@ curr_insn_transform (bool check_only_p) >> } >> *loc = new_reg; >> if (type != OP_IN >> - && find_reg_note (curr_insn, REG_UNUSED, old) == NULL_RTX) >> + && find_reg_note (curr_insn, REG_UNUSED, old) == NULL_RTX >> + /* OLD can be an equivalent constant here. */ >> + && nonimmediate_operand (old, GET_MODE (old))) >> { >> start_sequence (); >> lra_emit_move (type == OP_INOUT ? copy_rtx (old) : old, new_reg); >> >> Does it look good to you? Or did I miss something here? >> > The comment looks ok to me. > > Also I am agree with Richard Sandiford's remark that CONSTANT_P instead of nonimmediate_operand would be a better condition. It is more safe. With CONSTANT_P change, the patch can be already pushed into the trunk. Thanks Vladimir! Pushed with CONSTANT_P change as r15-2899-g49d5e21d41aed8. BR, Kewen
on 2024/8/12 21:02, Richard Sandiford wrote: > "Kewen.Lin" <linkw@linux.ibm.com> writes: >> Hi Richard, >> >> Thanks for the comments! >> >> on 2024/8/12 16:55, Richard Sandiford wrote: >>> "Kewen.Lin" <linkw@linux.ibm.com> writes: >>>> Hi, >>>> >>>> Commit r15-2084 exposes one ICE in LRA. Firstly, before >>>> r15-2084 KFmode has 126 bit precision while V1TImode has 128 >>>> bit precision, so the subreg (subreg:V1TI (reg:KF 131) 0) is >>>> paradoxical_subreg_p, which stops some passes from doing >>>> some optimization. After r15-2084, KFmode has the same mode >>>> precision as V1TImode, passes are able to optimize more, but >>>> it causes this ICE in LRA as described below: >>>> >>>> For insn 106 (set (mem:V1TI ...) (subreg:V1TI (reg:KF 133) 0)), >>>> which matches pattern >>>> >>>> (define_insn "*vsx_le_perm_store_<mode>" >>>> [(set (match_operand:VSX_LE_128 0 "memory_operand" "=Z,Q") >>>> (match_operand:VSX_LE_128 1 "vsx_register_operand" "+wa,r"))] >>> >>> Is it well-formed to have "+" on an operand that is only semantically >>> an input? df and most passes would consider it to be a pure input >>> and so would disregard any write that is intended to take place. >>> >>> E.g. if we have: >>> >>> (set (reg:M R2) (reg:M R1)) ;; R1 dead >>> (set (reg:M R3) (reg:M R2)) >>> >>> then it would be valid to change that to: >>> >>> (set (reg:M R2) (reg:M R1)) >>> (set (reg:M R3) (reg:M R1)) >>> >>> without considering the "+" on the input to the first instruction. >> >> Good question, I guess the "+" is to match the fact that operand[1] >> can be both read and written by this insn, operand[1] has to be >> re-used as the dest register of vector rotate 64bit (doubleword swap). >> but it'll get swapped back so it's safe if the register is still live >> (like the above example). For this case that writing to but later >> restoring, I'm not sure if we can take it as "no write" (strip "+"). > > Yeah. AIUI, there is no way of satisfying the constraints in a way > that makes operand 0 overlap operand 1, and: > >> >> ;; The post-reload split requires that we re-permute the source >> ;; register in case it is still live. >> (define_split >> [(set (match_operand:VSX_LE_128 0 "memory_operand") >> (match_operand:VSX_LE_128 1 "vsx_register_operand"))] >> "!BYTES_BIG_ENDIAN && TARGET_VSX && reload_completed && !TARGET_P9_VECTOR >> && !altivec_indexed_or_indirect_operand (operands[0], <MODE>mode)" >> [(const_int 0)] >> { >> rs6000_emit_le_vsx_permute (operands[1], operands[1], <MODE>mode); >> rs6000_emit_le_vsx_permute (operands[0], operands[1], <MODE>mode); >> rs6000_emit_le_vsx_permute (operands[1], operands[1], <MODE>mode); > > would not work correctly for that kind of overlap in any case. Guessing you meant the case that both operands are REGs? I may miss something, but the case here op0 is MEM and op1 is REG, is it still possible to overlap? > > So it looks on the face of it like the pattern would be (more) correct > without the "+". > I just posted a patch to get rid of the "+" at [1], thanks for your insightful comments!! [1] https://gcc.gnu.org/pipermail/gcc-patches/2024-August/660238.html BR, Kewen
"Kewen.Lin" <linkw@linux.ibm.com> writes: > on 2024/8/12 21:02, Richard Sandiford wrote: >> "Kewen.Lin" <linkw@linux.ibm.com> writes: >>> Hi Richard, >>> >>> Thanks for the comments! >>> >>> on 2024/8/12 16:55, Richard Sandiford wrote: >>>> "Kewen.Lin" <linkw@linux.ibm.com> writes: >>>>> Hi, >>>>> >>>>> Commit r15-2084 exposes one ICE in LRA. Firstly, before >>>>> r15-2084 KFmode has 126 bit precision while V1TImode has 128 >>>>> bit precision, so the subreg (subreg:V1TI (reg:KF 131) 0) is >>>>> paradoxical_subreg_p, which stops some passes from doing >>>>> some optimization. After r15-2084, KFmode has the same mode >>>>> precision as V1TImode, passes are able to optimize more, but >>>>> it causes this ICE in LRA as described below: >>>>> >>>>> For insn 106 (set (mem:V1TI ...) (subreg:V1TI (reg:KF 133) 0)), >>>>> which matches pattern >>>>> >>>>> (define_insn "*vsx_le_perm_store_<mode>" >>>>> [(set (match_operand:VSX_LE_128 0 "memory_operand" "=Z,Q") >>>>> (match_operand:VSX_LE_128 1 "vsx_register_operand" "+wa,r"))] >>>> >>>> Is it well-formed to have "+" on an operand that is only semantically >>>> an input? df and most passes would consider it to be a pure input >>>> and so would disregard any write that is intended to take place. >>>> >>>> E.g. if we have: >>>> >>>> (set (reg:M R2) (reg:M R1)) ;; R1 dead >>>> (set (reg:M R3) (reg:M R2)) >>>> >>>> then it would be valid to change that to: >>>> >>>> (set (reg:M R2) (reg:M R1)) >>>> (set (reg:M R3) (reg:M R1)) >>>> >>>> without considering the "+" on the input to the first instruction. >>> >>> Good question, I guess the "+" is to match the fact that operand[1] >>> can be both read and written by this insn, operand[1] has to be >>> re-used as the dest register of vector rotate 64bit (doubleword swap). >>> but it'll get swapped back so it's safe if the register is still live >>> (like the above example). For this case that writing to but later >>> restoring, I'm not sure if we can take it as "no write" (strip "+"). >> >> Yeah. AIUI, there is no way of satisfying the constraints in a way >> that makes operand 0 overlap operand 1, and: >> >>> >>> ;; The post-reload split requires that we re-permute the source >>> ;; register in case it is still live. >>> (define_split >>> [(set (match_operand:VSX_LE_128 0 "memory_operand") >>> (match_operand:VSX_LE_128 1 "vsx_register_operand"))] >>> "!BYTES_BIG_ENDIAN && TARGET_VSX && reload_completed && !TARGET_P9_VECTOR >>> && !altivec_indexed_or_indirect_operand (operands[0], <MODE>mode)" >>> [(const_int 0)] >>> { >>> rs6000_emit_le_vsx_permute (operands[1], operands[1], <MODE>mode); >>> rs6000_emit_le_vsx_permute (operands[0], operands[1], <MODE>mode); >>> rs6000_emit_le_vsx_permute (operands[1], operands[1], <MODE>mode); >> >> would not work correctly for that kind of overlap in any case. > > Guessing you meant the case that both operands are REGs? I may miss > something, but the case here op0 is MEM and op1 is REG, is it still > possible to overlap? Not unless the source register can be used in addresses, which I assume can't happen for vector registers. That's why I said "AIUI, there is no way of satisfying the constraints in a way that makes operand 0 overlap operand 1". Thanks, Richard
on 2024/8/13 18:02, Richard Sandiford wrote: > "Kewen.Lin" <linkw@linux.ibm.com> writes: >> on 2024/8/12 21:02, Richard Sandiford wrote: >>> "Kewen.Lin" <linkw@linux.ibm.com> writes: >>>> Hi Richard, >>>> >>>> Thanks for the comments! >>>> >>>> on 2024/8/12 16:55, Richard Sandiford wrote: >>>>> "Kewen.Lin" <linkw@linux.ibm.com> writes: >>>>>> Hi, >>>>>> >>>>>> Commit r15-2084 exposes one ICE in LRA. Firstly, before >>>>>> r15-2084 KFmode has 126 bit precision while V1TImode has 128 >>>>>> bit precision, so the subreg (subreg:V1TI (reg:KF 131) 0) is >>>>>> paradoxical_subreg_p, which stops some passes from doing >>>>>> some optimization. After r15-2084, KFmode has the same mode >>>>>> precision as V1TImode, passes are able to optimize more, but >>>>>> it causes this ICE in LRA as described below: >>>>>> >>>>>> For insn 106 (set (mem:V1TI ...) (subreg:V1TI (reg:KF 133) 0)), >>>>>> which matches pattern >>>>>> >>>>>> (define_insn "*vsx_le_perm_store_<mode>" >>>>>> [(set (match_operand:VSX_LE_128 0 "memory_operand" "=Z,Q") >>>>>> (match_operand:VSX_LE_128 1 "vsx_register_operand" "+wa,r"))] >>>>> >>>>> Is it well-formed to have "+" on an operand that is only semantically >>>>> an input? df and most passes would consider it to be a pure input >>>>> and so would disregard any write that is intended to take place. >>>>> >>>>> E.g. if we have: >>>>> >>>>> (set (reg:M R2) (reg:M R1)) ;; R1 dead >>>>> (set (reg:M R3) (reg:M R2)) >>>>> >>>>> then it would be valid to change that to: >>>>> >>>>> (set (reg:M R2) (reg:M R1)) >>>>> (set (reg:M R3) (reg:M R1)) >>>>> >>>>> without considering the "+" on the input to the first instruction. >>>> >>>> Good question, I guess the "+" is to match the fact that operand[1] >>>> can be both read and written by this insn, operand[1] has to be >>>> re-used as the dest register of vector rotate 64bit (doubleword swap). >>>> but it'll get swapped back so it's safe if the register is still live >>>> (like the above example). For this case that writing to but later >>>> restoring, I'm not sure if we can take it as "no write" (strip "+"). >>> >>> Yeah. AIUI, there is no way of satisfying the constraints in a way >>> that makes operand 0 overlap operand 1, and: >>> >>>> >>>> ;; The post-reload split requires that we re-permute the source >>>> ;; register in case it is still live. >>>> (define_split >>>> [(set (match_operand:VSX_LE_128 0 "memory_operand") >>>> (match_operand:VSX_LE_128 1 "vsx_register_operand"))] >>>> "!BYTES_BIG_ENDIAN && TARGET_VSX && reload_completed && !TARGET_P9_VECTOR >>>> && !altivec_indexed_or_indirect_operand (operands[0], <MODE>mode)" >>>> [(const_int 0)] >>>> { >>>> rs6000_emit_le_vsx_permute (operands[1], operands[1], <MODE>mode); >>>> rs6000_emit_le_vsx_permute (operands[0], operands[1], <MODE>mode); >>>> rs6000_emit_le_vsx_permute (operands[1], operands[1], <MODE>mode); >>> >>> would not work correctly for that kind of overlap in any case. >> >> Guessing you meant the case that both operands are REGs? I may miss >> something, but the case here op0 is MEM and op1 is REG, is it still >> possible to overlap? > > Not unless the source register can be used in addresses, which I assume > can't happen for vector registers. > > That's why I said "AIUI, there is no way of satisfying the constraints > in a way that makes operand 0 overlap operand 1". Got it, thanks for clarifying! BR, Kewen
diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc index 92b343fa99a..024c85c87d9 100644 --- a/gcc/lra-constraints.cc +++ b/gcc/lra-constraints.cc @@ -4742,7 +4742,8 @@ curr_insn_transform (bool check_only_p) } *loc = new_reg; if (type != OP_IN - && find_reg_note (curr_insn, REG_UNUSED, old) == NULL_RTX) + && find_reg_note (curr_insn, REG_UNUSED, old) == NULL_RTX + && nonimmediate_operand (old, GET_MODE (old))) { start_sequence (); lra_emit_move (type == OP_INOUT ? copy_rtx (old) : old, new_reg); diff --git a/gcc/testsuite/gcc.target/powerpc/pr116170.c b/gcc/testsuite/gcc.target/powerpc/pr116170.c new file mode 100644 index 00000000000..6f6ca0f1ae9 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr116170.c @@ -0,0 +1,18 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target ppc_float128_sw } */ +/* { dg-options "-mdejagnu-cpu=power8 -O2 -fstack-protector-strong -ffloat-store" } */ + +/* Verify there is no ICE. */ + +int a, d; +_Float128 b, c; +void +e () +{ + int f = 0; + if (a) + if (b || c) + f = 1; + if (d) + e (f ? 0 : b); +}