Message ID | DB5PR08MB10304DAA91A309666E817B6483940@DB5PR08MB1030.eurprd08.prod.outlook.com |
---|---|
State | New |
Headers | show |
Series | [AArch64] Fix aarch64_ira_change_pseudo_allocno_class | expand |
Wilco Dijkstra <Wilco.Dijkstra@arm.com> writes: > A recent commit removing '*' from the md files caused a large regression > in h264ref. > It turns out aarch64_ira_change_pseudo_allocno_class is no longer > effective after the > SVE changes, and the combination results in the regression. This patch > fixes it by > using the new POINTER_AND_FP_REGS register class which is now used > instead of ALL_REGS. > Add a missing ? to aarch64_get_lane to fix a failure in the testsuite. > > Passes regress, OK for commit? > > Since it is a regression introduced in GCC8, OK to backport to GCC8? > > ChangeLog: > 2018-05-22 Wilco Dijkstra <wdijkstr@arm.com> > > * config/aarch64/aarch64.c (aarch64_ira_change_pseudo_allocno_class): > Use POINTER_AND_FP_REGSinstead of ALL_REGS. > * config/aarch64/aarch64-simd.md (aarch64_get_lane): Increase > cost of r=w alternative. > -- > > diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md > index 2ebd256329c1a6a6b790d16955cbcee3feca456c..3d5fe44b53198a92afb726712c6e9dee890afe38 100644 > --- a/gcc/config/aarch64/aarch64-simd.md > +++ b/gcc/config/aarch64/aarch64-simd.md > @@ -2961,7 +2961,7 @@ (define_insn "*aarch64_get_lane_zero_extendsi<mode>" > ;; is guaranteed so upper bits should be considered undefined. > ;; RTL uses GCC vector extension indices throughout so flip only for assembly. > (define_insn "aarch64_get_lane<mode>" > - [(set (match_operand:<VEL> 0 "aarch64_simd_nonimmediate_operand" "=r, w, Utv") > + [(set (match_operand:<VEL> 0 "aarch64_simd_nonimmediate_operand" "=?r, w, Utv") > (vec_select:<VEL> > (match_operand:VALL_F16 1 "register_operand" "w, w, w") > (parallel [(match_operand:SI 2 "immediate_operand" "i, i, i")])))] > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c > index 47d98dfd095cdcd15908a86091cf2f8a4d6137b1..a119760c7f332aded200fa1b5bcfb1bbac7b6420 100644 > --- a/gcc/config/aarch64/aarch64.c > +++ b/gcc/config/aarch64/aarch64.c > @@ -1059,16 +1059,17 @@ aarch64_err_no_fpadvsimd (machine_mode mode, const char *msg) > } > > /* Implement TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS. > - The register allocator chooses ALL_REGS if FP_REGS and GENERAL_REGS have > - the same cost even if ALL_REGS has a much larger cost. ALL_REGS is also > - used if the cost of both FP_REGS and GENERAL_REGS is lower than the memory > - cost (in this case the best class is the lowest cost one). Using ALL_REGS > - irrespectively of its cost results in bad allocations with many redundant > - int<->FP moves which are expensive on various cores. > - To avoid this we don't allow ALL_REGS as the allocno class, but force a > - decision between FP_REGS and GENERAL_REGS. We use the allocno class if it > - isn't ALL_REGS. Similarly, use the best class if it isn't ALL_REGS. > - Otherwise set the allocno class depending on the mode. > + The register allocator chooses POINTER_AND_FP_REGS if FP_REGS and > + GENERAL_REGS have the same cost - even if POINTER_AND_FP_REGS has a much > + higher cost. POINTER_AND_FP_REGS is also used if the cost of both FP_REGS > + and GENERAL_REGS is lower than the memory cost (in this case the best class > + is the lowest cost one). Using POINTER_AND_FP_REGS irrespectively of its > + cost results in bad allocations with many redundant int<->FP moves which > + are expensive on various cores. > + To avoid this we don't allow POINTER_AND_FP_REGS as the allocno class, but > + force a decision between FP_REGS and GENERAL_REGS. We use the allocno class > + if it isn't POINTER_AND_FP_REGS. Similarly, use the best class if it isn't > + POINTER_AND_FP_REGS. Otherwise set the allocno class depending on the mode. > The result of this is that it is no longer inefficient to have a higher > memory move cost than the register move cost. > */ > @@ -1079,10 +1080,10 @@ aarch64_ira_change_pseudo_allocno_class (int regno, reg_class_t allocno_class, > { > machine_mode mode; > > - if (allocno_class != ALL_REGS) > + if (allocno_class != POINTER_AND_FP_REGS) > return allocno_class; > > - if (best_class != ALL_REGS) > + if (best_class != POINTER_AND_FP_REGS) > return best_class; > > mode = PSEUDO_REGNO_MODE (regno); I think it'd be better to use !reg_class_subset_p (POINTER_AND_FP_REGS, ...) instead of ... != POINTER_AND_FP_REGS, since this in principle still applies to ALL_REGS too. FWIW, the patch looks good to me with that change. Thanks, Richard
Richard Sandiford wrote: > - if (allocno_class != ALL_REGS) > + if (allocno_class != POINTER_AND_FP_REGS) > return allocno_class; > > - if (best_class != ALL_REGS) > + if (best_class != POINTER_AND_FP_REGS) > return best_class; > > mode = PSEUDO_REGNO_MODE (regno); > I think it'd be better to use !reg_class_subset_p (POINTER_AND_FP_REGS, ...) > instead of ... != POINTER_AND_FP_REGS, since this in principle still applies > to ALL_REGS too. > > FWIW, the patch looks good to me with that change. How does reg_class_subset_p help? In my testing I didn't see ALL_REGS ever used (and I don't believe it's possible to get it with SVE either). And it's not obvious without looking at the implementation whether subset here means strict subset or not, so it would obfuscate the clear meaning of the existing patch. Wilco
Wilco Dijkstra <Wilco.Dijkstra@arm.com> writes: > Richard Sandiford wrote: >> - if (allocno_class != ALL_REGS) >> + if (allocno_class != POINTER_AND_FP_REGS) >> return allocno_class; >> >> - if (best_class != ALL_REGS) >> + if (best_class != POINTER_AND_FP_REGS) >> return best_class; >> >> mode = PSEUDO_REGNO_MODE (regno); > >> I think it'd be better to use !reg_class_subset_p (POINTER_AND_FP_REGS, ...) >> instead of ... != POINTER_AND_FP_REGS, since this in principle still applies >> to ALL_REGS too. >> >> FWIW, the patch looks good to me with that change. > > How does reg_class_subset_p help? In my testing I didn't see ALL_REGS ever > used (and I don't believe it's possible to get it with SVE either). And > it's not obvious > without looking at the implementation whether subset here means strict > subset or not, > so it would obfuscate the clear meaning of the existing patch. But I think the fact that we need this patch shows why hard-coding the names of union classes is dangerous. IMO the question isn't whether we see ALL_REGS used but whether there's a reason in principle why it wouldn't be used. E.g. ALL_REGS is the starting class for the best_class calculation, and LRA uses ALL_REGS as the default choice for scratch reload registers. It's not like we can claim that the testsuite will flag up if this goes wrong again, since AIUI there are no tests that show the reason we need to make this change. (I realise the patch includes an md change to keep the testsuite happy, but that's not the same thing. I mean more a test that shows why removing the '*'s made things worse, through no fault of its own.) Conceptually what we're saying here is that if the given classes include both GENERAL_REGS and FP_REGS, we'll choose between them based on the mode of the register. And that makes sense for any class that includes both GENERAL_REGS and FP_REGS. We could write it that way if it seems better, i.e.: if (!reg_class_subset_p (GENERAL_REGS, ...) || !reg_class_subset_p (FP_REGS, ...)) ... That way we don't mention any union classes, and I think the meaning is clear in the context of eventually returning GENERAL_REGS or FP_REGS. reg_class_subset_p tests for the normal inclusive subset relation rather than "strict subset". Thanks, Richard
Richard Sandiford wrote: > Conceptually what we're saying here is that if the given classes > include both GENERAL_REGS and FP_REGS, we'll choose between them > based on the mode of the register. And that makes sense for any > class that includes both GENERAL_REGS and FP_REGS. We could write > it that way if it seems better, i.e.: > > if (!reg_class_subset_p (GENERAL_REGS, ...) > || !reg_class_subset_p (FP_REGS, ...)) > ... > > That way we don't mention any union classes, and I think the meaning > is clear in the context of eventually returning GENERAL_REGS or FP_REGS. > > reg_class_subset_p tests for the normal inclusive subset relation > rather than "strict subset". Right, checking for a subset of GENERAL_REGS and FP_REGS does make sense and is more clear as well. It appears to behave identically, so here is the new version: A recent commit removing '*' from the md files caused a large regression in h264ref. It turns out aarch64_ira_change_pseudo_allocno_class is no longer effective after the SVE changes, and the combination results in the regression. This patch fixes it by explicitly checking for a subset of GENERAL_REGS and FP_REGS. Add a missing ? to aarch64_get_lane to fix a failure in the testsuite. Passes regress, OK for commit? Since it is a regression introduced in GCC8, OK to backport to GCC8? ChangeLog: 2018-05-25 Wilco Dijkstra <wdijkstr@arm.com> * config/aarch64/aarch64.c (aarch64_ira_change_pseudo_allocno_class): Check for subset of GENERAL_REGS and FP_REGS. * config/aarch64/aarch64-simd.md (aarch64_get_lane): Increase cost of r=w alternative. -- diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 2ebd256329c1a6a6b790d16955cbcee3feca456c..3d5fe44b53198a92afb726712c6e9dee890afe38 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -2961,7 +2961,7 @@ (define_insn "*aarch64_get_lane_zero_extendsi<mode>" ;; is guaranteed so upper bits should be considered undefined. ;; RTL uses GCC vector extension indices throughout so flip only for assembly. (define_insn "aarch64_get_lane<mode>" - [(set (match_operand:<VEL> 0 "aarch64_simd_nonimmediate_operand" "=r, w, Utv") + [(set (match_operand:<VEL> 0 "aarch64_simd_nonimmediate_operand" "=?r, w, Utv") (vec_select:<VEL> (match_operand:VALL_F16 1 "register_operand" "w, w, w") (parallel [(match_operand:SI 2 "immediate_operand" "i, i, i")])))] diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 47d98dfd095cdcd15908a86091cf2f8a4d6137b1..6e7722187f0f79195c8b6c43f463a3ac9aa61742 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -1059,16 +1059,17 @@ aarch64_err_no_fpadvsimd (machine_mode mode, const char *msg) } /* Implement TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS. - The register allocator chooses ALL_REGS if FP_REGS and GENERAL_REGS have - the same cost even if ALL_REGS has a much larger cost. ALL_REGS is also - used if the cost of both FP_REGS and GENERAL_REGS is lower than the memory - cost (in this case the best class is the lowest cost one). Using ALL_REGS - irrespectively of its cost results in bad allocations with many redundant - int<->FP moves which are expensive on various cores. - To avoid this we don't allow ALL_REGS as the allocno class, but force a - decision between FP_REGS and GENERAL_REGS. We use the allocno class if it - isn't ALL_REGS. Similarly, use the best class if it isn't ALL_REGS. - Otherwise set the allocno class depending on the mode. + The register allocator chooses POINTER_AND_FP_REGS if FP_REGS and + GENERAL_REGS have the same cost - even if POINTER_AND_FP_REGS has a much + higher cost. POINTER_AND_FP_REGS is also used if the cost of both FP_REGS + and GENERAL_REGS is lower than the memory cost (in this case the best class + is the lowest cost one). Using POINTER_AND_FP_REGS irrespectively of its + cost results in bad allocations with many redundant int<->FP moves which + are expensive on various cores. + To avoid this we don't allow POINTER_AND_FP_REGS as the allocno class, but + force a decision between FP_REGS and GENERAL_REGS. We use the allocno class + if it isn't POINTER_AND_FP_REGS. Similarly, use the best class if it isn't + POINTER_AND_FP_REGS. Otherwise set the allocno class depending on the mode. The result of this is that it is no longer inefficient to have a higher memory move cost than the register move cost. */ @@ -1079,10 +1080,12 @@ aarch64_ira_change_pseudo_allocno_class (int regno, reg_class_t allocno_class, { machine_mode mode; - if (allocno_class != ALL_REGS) + if (reg_class_subset_p (allocno_class, GENERAL_REGS) + || reg_class_subset_p (allocno_class, FP_REGS)) return allocno_class; - if (best_class != ALL_REGS) + if (reg_class_subset_p (best_class, GENERAL_REGS) + || reg_class_subset_p (best_class, FP_REGS)) return best_class; mode = PSEUDO_REGNO_MODE (regno);
On Fri, May 25, 2018 at 08:16:03AM -0500, Wilco Dijkstra wrote: > Richard Sandiford wrote: > > > Conceptually what we're saying here is that if the given classes > > include both GENERAL_REGS and FP_REGS, we'll choose between them > > based on the mode of the register. And that makes sense for any > > class that includes both GENERAL_REGS and FP_REGS. We could write > > it that way if it seems better, i.e.: > > > > if (!reg_class_subset_p (GENERAL_REGS, ...) > > || !reg_class_subset_p (FP_REGS, ...)) > > ... > > > > That way we don't mention any union classes, and I think the meaning > > is clear in the context of eventually returning GENERAL_REGS or FP_REGS. > > > > reg_class_subset_p tests for the normal inclusive subset relation > > rather than "strict subset". > > Right, checking for a subset of GENERAL_REGS and FP_REGS does make sense > and is more clear as well. It appears to behave identically, so here is the new version: > > > A recent commit removing '*' from the md files caused a large regression in h264ref. > It turns out aarch64_ira_change_pseudo_allocno_class is no longer effective after the > SVE changes, and the combination results in the regression. This patch fixes it by > explicitly checking for a subset of GENERAL_REGS and FP_REGS. OK for trunk. > Add a missing ? to aarch64_get_lane to fix a failure in the testsuite. I'd prefer more detail than this for a workaround; which test, why did it start to fail, why is this the right solution, etc. Thanks, James > ChangeLog: > 2018-05-25 Wilco Dijkstra <wdijkstr@arm.com> > > * config/aarch64/aarch64.c (aarch64_ira_change_pseudo_allocno_class): > Check for subset of GENERAL_REGS and FP_REGS. > * config/aarch64/aarch64-simd.md (aarch64_get_lane): Increase cost of r=w alternative.
James Greenhalgh wrote: > > Add a missing ? to aarch64_get_lane to fix a failure in the testsuite. > > > I'd prefer more detail than this for a workaround; which test, why did it > > start to fail, why is this the right solution, etc. It was gcc.target/aarch64/vect_copy_lane_1.c generating: test_copy_laneq_f64: umov x0, v1.d[1] fmov d0, x0 ret For some reason returning a double uses DImode temporaries, so it's essential to prefer FP_REGS here and mark the lane copy correctly. Wilco
Wilco Dijkstra <Wilco.Dijkstra@arm.com> writes: > James Greenhalgh wrote: > >> > Add a missing ? to aarch64_get_lane to fix a failure in the testsuite. >> >> > I'd prefer more detail than this for a workaround; which test, why did it >> > start to fail, why is this the right solution, etc. > > It was gcc.target/aarch64/vect_copy_lane_1.c generating: > > test_copy_laneq_f64: > umov x0, v1.d[1] > fmov d0, x0 > ret > > For some reason returning a double uses DImode temporaries, so it's essential > to prefer FP_REGS here and mark the lane copy correctly. The "?" change seems to make intrinsic sense given the extra cost of the GPR alternative. But I think the real reason for this failure is that we define no V1DF patterns, and target-independent code falls back to using moves in the corresponding *integer* mode. So for that function we generate the rather ugly code: (note 6 1 3 2 [bb 2] NOTE_INSN_BASIC_BLOCK) (insn 3 6 2 2 (clobber (reg/v:V1DF 92 [ aD.21157 ])) "vect_copy_lane_1.c":45 -1 (nil)) (insn 2 3 4 2 (set (subreg:DI (reg/v:V1DF 92 [ aD.21157 ]) 0) (reg:DI 32 v0 [ aD.21157 ])) "vect_copy_lane_1.c":45 47 {*movdi_aarch64} (nil)) (insn 4 2 5 2 (set (reg/v:V2DF 93 [ bD.21158 ]) (reg:V2DF 33 v1 [ bD.21158 ])) "vect_copy_lane_1.c":45 1063 {*aarch64_simd_movv2df} (nil)) (note 5 4 8 2 NOTE_INSN_FUNCTION_BEG) (insn 8 5 9 2 (set (reg:DF 95) (vec_select:DF (reg/v:V2DF 93 [ bD.21158 ]) (parallel [ (const_int 1 [0x1]) ]))) "./include/arm_neon.h":14441 1993 {aarch64_get_lanev2df} (nil)) (insn 9 8 11 2 (set (reg:DI 96) (subreg:DI (reg:DF 95) 0)) "vect_copy_lane_1.c":45 47 {*movdi_aarch64} (nil)) (insn 11 9 10 2 (clobber (reg:V1DF 91 [ <retval> ])) "vect_copy_lane_1.c":45 -1 (nil)) (insn 10 11 15 2 (set (subreg:DI (reg:V1DF 91 [ <retval> ]) 0) (reg:DI 96)) "vect_copy_lane_1.c":45 47 {*movdi_aarch64} (nil)) (insn 15 10 16 2 (set (reg:DI 32 v0) (subreg:DI (reg:V1DF 91 [ <retval> ]) 0)) "vect_copy_lane_1.c":45 47 {*movdi_aarch64} (nil)) (insn 16 15 0 2 (use (reg/i:V1DF 32 v0)) "vect_copy_lane_1.c":45 -1 (nil)) which by IRA gets optimised to: (insn 9 8 15 2 (set (subreg:DF (reg:DI 96) 0) (vec_select:DF (reg:V2DF 33 v1 [ bD.21158 ]) (parallel [ (const_int 1 [0x1]) ]))) "vect_copy_lane_1.c":45 1993 {aarch64_get_lanev2df} (expr_list:REG_DEAD (reg:V2DF 33 v1 [ bD.21158 ]) (nil))) (insn 15 9 16 2 (set (reg:DI 32 v0) (reg:DI 96)) "vect_copy_lane_1.c":45 47 {*movdi_aarch64} (expr_list:REG_DEAD (reg:DI 96) (nil))) (insn 16 15 18 2 (use (reg/i:V1DF 32 v0)) "vect_copy_lane_1.c":45 -1 (nil)) with the move now being done purely in DImode. This defeats the heuristic in aarch64_ira_change_pseudo_allocno_class because the pseudo appears to be a normal integer rather than a (float) vector. Although the "?" fixes this particular instance, I think more complicated V1DF code would still regress by being forced to use GENERAL_REGS. Of course, the fix is to add the move pattern rather than disable the heuristic... Thanks, Richard
Richard Sandiford <richard.sandiford@linaro.org> > The "?" change seems to make intrinsic sense given the extra cost of the > GPR alternative. But I think the real reason for this failure is that > we define no V1DF patterns, and target-independent code falls back to > using moves in the corresponding *integer* mode. So for that function > we generate the rather ugly code: This: typedef struct { double x; } X; X f2(X *p) { return *p; } emits at expand: (insn 6 3 7 2 (set (reg:DF 90 [ D.21009 ]) (mem:DF (reg/v/f:DI 92 [ p ]) [2 *p_2(D)+0 S8 A64])) "vect_copy_lane_1.c":26 -1 (nil)) (insn 7 6 8 2 (set (subreg:DF (reg:DI 94) 0) (reg:DF 90 [ D.21009 ])) "vect_copy_lane_1.c":26 -1 (nil)) (insn 8 7 9 2 (set (reg:DI 95) (reg:DI 94)) "vect_copy_lane_1.c":26 -1 (nil)) (insn 9 8 13 2 (set (reg:DF 91 [ <retval> ]) (subreg:DF (reg:DI 95) 0)) "vect_copy_lane_1.c":26 -1 (nil)) So the underlying cause is the structure passing code. Things get worse when you return 2 doubles and it really becomes horrific at 3... Wilco
Wilco Dijkstra <Wilco.Dijkstra@arm.com> writes: > Richard Sandiford <richard.sandiford@linaro.org> >> The "?" change seems to make intrinsic sense given the extra cost of the >> GPR alternative. But I think the real reason for this failure is that >> we define no V1DF patterns, and target-independent code falls back to >> using moves in the corresponding *integer* mode. So for that function >> we generate the rather ugly code: > > This: > > typedef struct { double x; } X; > X f2(X *p) > { > return *p; > } > > emits at expand: > > (insn 6 3 7 2 (set (reg:DF 90 [ D.21009 ]) > (mem:DF (reg/v/f:DI 92 [ p ]) [2 *p_2(D)+0 S8 A64])) "vect_copy_lane_1.c":26 -1 > (nil)) > (insn 7 6 8 2 (set (subreg:DF (reg:DI 94) 0) > (reg:DF 90 [ D.21009 ])) "vect_copy_lane_1.c":26 -1 > (nil)) > (insn 8 7 9 2 (set (reg:DI 95) > (reg:DI 94)) "vect_copy_lane_1.c":26 -1 > (nil)) > (insn 9 8 13 2 (set (reg:DF 91 [ <retval> ]) > (subreg:DF (reg:DI 95) 0)) "vect_copy_lane_1.c":26 -1 > (nil)) > > So the underlying cause is the structure passing code. Things get > worse when you return 2 doubles and it really becomes horrific at 3... Yeah, the handling of structures can also be poor, but float64x1_t is a vector type rather than a structure, so I don't think the above is the problem in the specific case of test_copy_laneq_f64. float64x1_t has the TYPE_MODE we want (V1DF). But because we have no V1DF move pattern, it ends up being moved as a DI instead. Thanks, Richard
On 29 May 2018 at 19:34, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote: > James Greenhalgh wrote: > >> > Add a missing ? to aarch64_get_lane to fix a failure in the testsuite. >> >> > I'd prefer more detail than this for a workaround; which test, why did it >> > start to fail, why is this the right solution, etc. > > It was gcc.target/aarch64/vect_copy_lane_1.c generating: > > test_copy_laneq_f64: > umov x0, v1.d[1] > fmov d0, x0 > ret > > For some reason returning a double uses DImode temporaries, so it's essential > to prefer FP_REGS here and mark the lane copy correctly. > > Wilco > Hi Wilco, This has probably been reported elsewhere already but I can't find such a report, so sorry for possible duplicate, but this patch is causing ICEs on aarch64 FAIL: gcc.target/aarch64/sve/reduc_1.c -march=armv8.2-a+sve (internal compiler error) FAIL: gcc.target/aarch64/sve/reduc_5.c -march=armv8.2-a+sve (internal compiler error) and also many scan-assembler regressions: http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/260951/report-build-info.html Can you check? Thanks Christophe
Christophe Lyon <christophe.lyon@linaro.org> writes: > On 29 May 2018 at 19:34, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote: >> James Greenhalgh wrote: >> >>> > Add a missing ? to aarch64_get_lane to fix a failure in the testsuite. >>> >>> > I'd prefer more detail than this for a workaround; which test, why did it >>> > start to fail, why is this the right solution, etc. >> >> It was gcc.target/aarch64/vect_copy_lane_1.c generating: >> >> test_copy_laneq_f64: >> umov x0, v1.d[1] >> fmov d0, x0 >> ret >> >> For some reason returning a double uses DImode temporaries, so it's essential >> to prefer FP_REGS here and mark the lane copy correctly. >> >> Wilco >> > > Hi Wilco, > > This has probably been reported elsewhere already but I can't find > such a report, so sorry for possible duplicate, > but this patch is causing ICEs on aarch64 > FAIL: gcc.target/aarch64/sve/reduc_1.c -march=armv8.2-a+sve > (internal compiler error) > FAIL: gcc.target/aarch64/sve/reduc_5.c -march=armv8.2-a+sve > (internal compiler error) > > and also many scan-assembler regressions: > > http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/260951/report-build-info.html Thanks for the heads-up. Looks like they're all SVE, so I'll take this. Richard
Richard Sandiford wrote: >> This has probably been reported elsewhere already but I can't find >> such a report, so sorry for possible duplicate, >> but this patch is causing ICEs on aarch64 >> FAIL: gcc.target/aarch64/sve/reduc_1.c -march=armv8.2-a+sve >> (internal compiler error) >> FAIL: gcc.target/aarch64/sve/reduc_5.c -march=armv8.2-a+sve >> (internal compiler error) >> >> and also many scan-assembler regressions: >> >> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/260951/report-build-info.html > > Thanks for the heads-up. Looks like they're all SVE, so I'll take this. It seems this is due to unnecessary spills of PR_REGS - the subset doesn't work for those. The original proposal doing: if (allocno_class != POINTER_AND_FP_REGS) return allocno_class; doesn't appear to affect SVE. However the question is whether the register allocator can get confused about PR_REGS and end up with POINTER_AND_FP_REGS for both the allocno_class and best_class? If so then the return needs to support predicate modes too. Wilco
Wilco Dijkstra <Wilco.Dijkstra@arm.com> writes: > Richard Sandiford wrote: > >>> This has probably been reported elsewhere already but I can't find >>> such a report, so sorry for possible duplicate, >>> but this patch is causing ICEs on aarch64 >>> FAIL: gcc.target/aarch64/sve/reduc_1.c -march=armv8.2-a+sve >>> (internal compiler error) >>> FAIL: gcc.target/aarch64/sve/reduc_5.c -march=armv8.2-a+sve >>> (internal compiler error) >>> >>> and also many scan-assembler regressions: >>> >>> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/260951/report-build-info.html >> >> Thanks for the heads-up. Looks like they're all SVE, so I'll take this. > > It seems this is due to unnecessary spills of PR_REGS - the subset doesn't work for those. It does, but I'd originally suggested: if (!reg_class_subset_p (GENERAL_REGS, ...) || !reg_class_subset_p (FP_REGS, ...)) ...bail out... whereas the committed patch had: if (reg_class_subset_p (..., GENERAL_REGS) || reg_class_subset_p (..., FP_REGS)) ...bail out... That's an important difference. The idea with the first was that we should only make a choice between GENERAL_REGS and FP_REGS if the original classes included both of them. And that's what we want because the new class has to be a refinement of the original: it shouldn't include entirely new registers. The committed version instead says that we won't make a choice between GENERAL_REGS and FP_REGS if one of the classes is already specific to one of them. I think this would also lead to us changing POINTER_REGS to GENERAL_REGS, although I don't know how much that matters in practice. > The original proposal doing: > > if (allocno_class != POINTER_AND_FP_REGS) > return allocno_class; > > doesn't appear to affect SVE. However the question is whether the > register allocator can get confused about PR_REGS and end up with > POINTER_AND_FP_REGS for both the allocno_class and best_class? If so > then the return needs to support predicate modes too. Yeah, that shouldn't happen, since predicate modes are only allowed in predicate registers. I think the reduc_1 ICE is a separate bug that I'll post a patch for, but it goes latent again after the patch below. Tested on aarch64-linux-gnu. I don't think it can be called obvious given the above, and it's only SVE-specifc by chance, so: OK to install? Thanks, Richard 2018-05-31 Richard Sandiford <richard.sandiford@linaro.org> gcc/ * config/aarch64/aarch64.c (aarch64_ira_change_pseudo_allocno_class): Fix subreg tests so that we only return a choice between GENERAL_REGS and FP_REGS if the original classes included both. Index: gcc/config/aarch64/aarch64.c =================================================================== --- gcc/config/aarch64/aarch64.c 2018-05-30 19:31:14.212387813 +0100 +++ gcc/config/aarch64/aarch64.c 2018-05-31 13:12:56.836974021 +0100 @@ -1108,12 +1108,12 @@ aarch64_ira_change_pseudo_allocno_class { machine_mode mode; - if (reg_class_subset_p (allocno_class, GENERAL_REGS) - || reg_class_subset_p (allocno_class, FP_REGS)) + if (!reg_class_subset_p (GENERAL_REGS, allocno_class) + || !reg_class_subset_p (FP_REGS, allocno_class)) return allocno_class; - if (reg_class_subset_p (best_class, GENERAL_REGS) - || reg_class_subset_p (best_class, FP_REGS)) + if (!reg_class_subset_p (GENERAL_REGS, best_class) + || !reg_class_subset_p (FP_REGS, best_class)) return best_class; mode = PSEUDO_REGNO_MODE (regno);
On Thu, May 31, 2018 at 07:23:29AM -0500, Richard Sandiford wrote: > Wilco Dijkstra <Wilco.Dijkstra@arm.com> writes: > > Richard Sandiford wrote: > > > >>> This has probably been reported elsewhere already but I can't find > >>> such a report, so sorry for possible duplicate, > >>> but this patch is causing ICEs on aarch64 > >>> FAIL: gcc.target/aarch64/sve/reduc_1.c -march=armv8.2-a+sve > >>> (internal compiler error) > >>> FAIL: gcc.target/aarch64/sve/reduc_5.c -march=armv8.2-a+sve > >>> (internal compiler error) > >>> > >>> and also many scan-assembler regressions: > >>> > >>> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/260951/report-build-info.html > >> > >> Thanks for the heads-up. Looks like they're all SVE, so I'll take this. > > > > It seems this is due to unnecessary spills of PR_REGS - the subset doesn't work for those. > > It does, but I'd originally suggested: > > if (!reg_class_subset_p (GENERAL_REGS, ...) > || !reg_class_subset_p (FP_REGS, ...)) > ...bail out... > > whereas the committed patch had: > > if (reg_class_subset_p (..., GENERAL_REGS) > || reg_class_subset_p (..., FP_REGS)) > ...bail out... > > That's an important difference. The idea with the first was that > we should only make a choice between GENERAL_REGS and FP_REGS > if the original classes included both of them. And that's what > we want because the new class has to be a refinement of the > original: it shouldn't include entirely new registers. > > The committed version instead says that we won't make a choice > between GENERAL_REGS and FP_REGS if one of the classes is already > specific to one of them. I think this would also lead to us changing > POINTER_REGS to GENERAL_REGS, although I don't know how much that > matters in practice. Sorry to have missed this detail in review. > > The original proposal doing: > > > > if (allocno_class != POINTER_AND_FP_REGS) > > return allocno_class; > > > > doesn't appear to affect SVE. However the question is whether the > > register allocator can get confused about PR_REGS and end up with > > POINTER_AND_FP_REGS for both the allocno_class and best_class? If so > > then the return needs to support predicate modes too. > > Yeah, that shouldn't happen, since predicate modes are only allowed in > predicate registers. > > I think the reduc_1 ICE is a separate bug that I'll post a patch for, > but it goes latent again after the patch below. > > Tested on aarch64-linux-gnu. I don't think it can be called obvious > given the above, and it's only SVE-specifc by chance, so: OK to install? This is OK for trunk. Thanks, James > 2018-05-31 Richard Sandiford <richard.sandiford@linaro.org> > > gcc/ > * config/aarch64/aarch64.c (aarch64_ira_change_pseudo_allocno_class): > Fix subreg tests so that we only return a choice between > GENERAL_REGS and FP_REGS if the original classes included both.
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 2ebd256329c1a6a6b790d16955cbcee3feca456c..3d5fe44b53198a92afb726712c6e9dee890afe38 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -2961,7 +2961,7 @@ (define_insn "*aarch64_get_lane_zero_extendsi<mode>" ;; is guaranteed so upper bits should be considered undefined. ;; RTL uses GCC vector extension indices throughout so flip only for assembly. (define_insn "aarch64_get_lane<mode>" - [(set (match_operand:<VEL> 0 "aarch64_simd_nonimmediate_operand" "=r, w, Utv") + [(set (match_operand:<VEL> 0 "aarch64_simd_nonimmediate_operand" "=?r, w, Utv") (vec_select:<VEL> (match_operand:VALL_F16 1 "register_operand" "w, w, w") (parallel [(match_operand:SI 2 "immediate_operand" "i, i, i")])))] diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 47d98dfd095cdcd15908a86091cf2f8a4d6137b1..a119760c7f332aded200fa1b5bcfb1bbac7b6420 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -1059,16 +1059,17 @@ aarch64_err_no_fpadvsimd (machine_mode mode, const char *msg) } /* Implement TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS. - The register allocator chooses ALL_REGS if FP_REGS and GENERAL_REGS have - the same cost even if ALL_REGS has a much larger cost. ALL_REGS is also - used if the cost of both FP_REGS and GENERAL_REGS is lower than the memory - cost (in this case the best class is the lowest cost one). Using ALL_REGS - irrespectively of its cost results in bad allocations with many redundant - int<->FP moves which are expensive on various cores. - To avoid this we don't allow ALL_REGS as the allocno class, but force a - decision between FP_REGS and GENERAL_REGS. We use the allocno class if it - isn't ALL_REGS. Similarly, use the best class if it isn't ALL_REGS. - Otherwise set the allocno class depending on the mode. + The register allocator chooses POINTER_AND_FP_REGS if FP_REGS and + GENERAL_REGS have the same cost - even if POINTER_AND_FP_REGS has a much + higher cost. POINTER_AND_FP_REGS is also used if the cost of both FP_REGS + and GENERAL_REGS is lower than the memory cost (in this case the best class + is the lowest cost one). Using POINTER_AND_FP_REGS irrespectively of its + cost results in bad allocations with many redundant int<->FP moves which + are expensive on various cores. + To avoid this we don't allow POINTER_AND_FP_REGS as the allocno class, but + force a decision between FP_REGS and GENERAL_REGS. We use the allocno class + if it isn't POINTER_AND_FP_REGS. Similarly, use the best class if it isn't + POINTER_AND_FP_REGS. Otherwise set the allocno class depending on the mode. The result of this is that it is no longer inefficient to have a higher memory move cost than the register move cost. */ @@ -1079,10 +1080,10 @@ aarch64_ira_change_pseudo_allocno_class (int regno, reg_class_t allocno_class, { machine_mode mode; - if (allocno_class != ALL_REGS) + if (allocno_class != POINTER_AND_FP_REGS) return allocno_class; - if (best_class != ALL_REGS) + if (best_class != POINTER_AND_FP_REGS) return best_class; mode = PSEUDO_REGNO_MODE (regno);