Message ID | 54635096.1040508@arm.com |
---|---|
State | New |
Headers | show |
Have run check-gcc on gcc110.fsffrance.org (powerpc64-unknown-linux-gnu) using this snippet on top of original patch; no regressions. Alan Lawrence wrote: > So I'm no expert on RS6000 here, but following on from Segher's observation > about the change in pattern...so the difference in 'expand' is exactly that, a > vsx_reduc_splus_v2df followed by a vec_extract to DF, becomes a > vsx_reduc_splus_v2df_scalar - as I expected the combiner to produce by combining > the two previous insns. > > However, inspecting the logs from -fdump-rtl-combine-all, *without* my patch, > when the combiner tries to put those two together, I see: > > Trying 30 -> 31: > Failed to match this instruction: > (set (reg:DF 179 [ stmp_s_5.7D.2196 ]) > (vec_select:DF (plus:V2DF (vec_select:V2DF (reg:V2DF 173 [ vect_s_5.6D.2195 ]) > (parallel [ > (const_int 1 [0x1]) > (const_int 0 [0]) > ])) > (reg:V2DF 173 [ vect_s_5.6D.2195 ])) > (parallel [ > (const_int 1 [0x1]) > ]))) > > That is, it looks like combine_simplify_rtx has transformed the (vec_concat > (vec_select ... 1) (vec_select ... 0)) from the vsx_reduc_plus_v2df insn, into a > single vec_select, which does not match the vsx_reduc_plus_v2df_scalar insn. > > So despite the comment (in vsx.md): > > ;; Combiner patterns with the vector reduction patterns that knows we can get > ;; to the top element of the V2DF array without doing an extract. > > It looks like the code generation prior to my patch, considered better, was > because the combiner didn't actually use the pattern? > > In that case whilst you may want to dig into register allocation, > cannot_change_mode_class, etc., for other reasons, I think the best fix for > migrating to reduc_plus_scal... is simply to avoid using the "Combiner" patterns > and just emit two insns, the old pattern followed by a vec_extract. The attached > snippet does this (I won't call it a patch yet, and it applies on top of the > previous patch - I went the route of calling the two gen functions rather than > copying their RTL sequences, but could do the latter if that were > preferable???), and restores code generation to the original form on your > example above; it bootstraps OK but I'm still running check-gcc on the Compile > Farm... > > However, again on your example above, I note that if I *remove* the > reduc_plus_scal_v2df pattern altogether, I get: > > .sum: > li 10,512 # 52 *movdi_internal64/4 [length = 4] > ld 9,.LC2@toc(2) # 20 *movdi_internal64/2 [length = 4] > xxlxor 0,0,0 # 17 *vsx_movv2df/12 [length = 4] > mtctr 10 # 48 *movdi_internal64/11 [length = 4] > .align 4 > .L2: > lxvd2x 12,0,9 # 23 *vsx_movv2df/2 [length = 4] > addi 9,9,16 # 25 *adddi3_internal1/2 [length = 4] > xvadddp 0,0,12 # 24 *vsx_addv2df3/1 [length = 4] > bdnz .L2 # 47 *ctrdi_internal1/1 [length = 4] > xxsldwi 12,0,0,2 # 30 vsx_xxsldwi_v2df [length = 4] > xvadddp 1,0,12 # 31 *vsx_addv2df3/1 [length = 4] > nop # 37 *vsx_extract_v2df_internal2/1 [length = 4] > blr # 55 return [length = 4] > > this is presumably using gcc's scalar reduction code, but (to my untrained eye > on powerpc!) it looks even better than the first form above (the same in the > loop, and in the reduction, an xxpermdi is replaced by a nop !)... > > --Alan > > > Segher Boessenkool wrote: >> On Mon, Nov 10, 2014 at 05:36:24PM -0500, Michael Meissner wrote: >>> However, the double pattern is completely broken. This cannot go in. >> [snip] >> >>> It is unacceptable to have to do the inner loop doing a load, vector add, and >>> store in the loop. >> Before the patch, the final reduction used *vsx_reduc_splus_v2df; after >> the patch, it is *vsx_reduc_plus_v2df_scalar. The former does a vector >> add, the latter a float add. And it uses the same pseudoregister for the >> accumulator throughout. IRA decides a register is more expensive than >> memory for this, I suppose because it wants both V2DF and DF? It doesn't >> seem to like the subreg very much. >> >> The new code does look nicer otherwise :-) >> >> >> Segher
diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md index 54b18aa..18a7f08 100644 --- a/gcc/config/rs6000/vector.md +++ b/gcc/config/rs6000/vector.md @@ -1078,21 +1078,15 @@ ;; Vector reduction expanders for VSX (define_expand "reduc_<VEC_reduc_name>_scal_v2df" - [(parallel [(set (match_operand:DF 0 "vfloat_operand" "") - (vec_select:DF - (VEC_reduc:V2DF - (vec_concat:V2DF - (vec_select:DF - (match_operand:V2DF 1 "vfloat_operand" "") - (parallel [(const_int 1)])) - (vec_select:DF - (match_dup 1) - (parallel [(const_int 0)]))) - (match_dup 1)) - (parallel [(const_int 1)]))) - (clobber (match_scratch:DF 2 ""))])] + [(match_operand:DF 0 "register_operand" "") + (VEC_reduc:V2DF (match_operand:V2DF 1 "vfloat_operand" "") (const_int 0))] "VECTOR_UNIT_VSX_P (V2DFmode)" - "") + { + rtx vec = gen_reg_rtx (V2DFmode); + emit_insn (gen_vsx_reduc_<VEC_reduc_name>_v2df (vec, operand1)); + emit_insn (gen_vsx_extract_v2df (operand0, vec, const1_rtx)); + DONE; + }) ; The (VEC_reduc:V4SF ; (op1) diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md index 7aa0f12..8df6a45 100644 --- a/gcc/config/rs6000/vsx.md +++ b/gcc/config/rs6000/vsx.md @@ -2150,7 +2150,7 @@ ;; Vector reduction insns and splitters -(define_insn_and_split "*vsx_reduc_<VEC_reduc_name>_v2df" +(define_insn_and_split "vsx_reduc_<VEC_reduc_name>_v2df" [(set (match_operand:V2DF 0 "vfloat_operand" "=&wd,&?wa,wd,?wa") (VEC_reduc:V2DF (vec_concat:V2DF