Message ID | or7flxhw2r.fsf@livre.home |
---|---|
State | New |
Headers | show |
On Thu, Nov 5, 2015 at 6:08 AM, Alexandre Oliva <aoliva@redhat.com> wrote: > On Sep 23, 2015, Alexandre Oliva <aoliva@redhat.com> wrote: > >> @@ -2982,38 +2887,39 @@ assign_parm_setup_block (struct assign_parm_data_all *all, > [snip] >> + if (GET_CODE (reg) != CONCAT) >> + stack_parm = reg; >> + else >> + /* This will use or allocate a stack slot that we'd rather >> + avoid. FIXME: Could we avoid it in more cases? */ >> + target_reg = reg; > > It turns out that we can, and that helps fixing PR67753. In the end, I > ended up using the ABI-reserved stack slot if there is one, but just > allocating an unsplit complex pseudo fixes all remaining cases that used > to require the allocation of a stack slot. Yay! > > As for pr67753 proper, we emitted the store of the PARALLEL entry_parm > into the stack parm only in the conversion seq, which was ultimately > emitted after the copy from stack_parm to target_reg that was supposed > to copy the value originally in entry_parm. So we copied an > uninitialized stack slot, and the subsequent store in the conversion seq > was optimized out as dead. > > This caused a number of regressions on hppa-linux-gnu. The fix for this > is to arrange for the copy to target_reg to be emitted in the conversion > seq if the copy to stack_parm was. I can't determine whether this fix > all reported regressions, but from visual inspection of the generated > code I'm pretty sure it fixes at least gcc.c-torture/execute/pr38969.c. > > > When we do NOT have an ABI-reserved stack slot, the store of the > PARALLEL entry_parm into the intermediate pseudo doesn't need to go in > the conversion seq (emit_group_store from a PARALLEL to a pseudo only > uses registers, according to another comment in function.c), so I've > simplified that case. > > > This was regstrapped on x86_64-linux-gnu, i686-linux-gnu, > ppc64-linux-gnu, ppc64el-linux-gnu, and cross-build-tested for all > targets for which I've tested the earlier patches in the patchset. > Ok to install? Ok. Thanks, Richard. > > > [PR67753] fix copy of PARALLEL entry_parm to CONCAT target_reg > > From: Alexandre Oliva <aoliva@redhat.com> > > In assign_parms_setup_block, the copy of args in PARALLELs from > entry_parm to stack_parm is deferred to the parm conversion insn seq, > but the copy from stack_parm to target_reg was inserted in the normal > copy seq, that is executed before the conversion insn seq. Oops. > > We could do away with the need for an actual stack_parm in general, > which would have avoided the need for emitting the copy to target_reg > in the conversion seq, but at least on pa, due to the need for stack > to copy between SI and SF modes, it seems like using the reserved > stack slot is beneficial, so I put in logic to use a pre-reserved > stack slot when there is one, and emit the copy to target_reg in the > conversion seq if stack_parm was set up there. > > for gcc/ChangeLog > > PR rtl-optimization/67753 > PR rtl-optimization/64164 > * function.c (assign_parm_setup_block): Avoid allocating a > stack slot if we don't have an ABI-reserved one. Emit the > copy to target_reg in the conversion seq if the copy from > entry_parm is in it too. Don't use the conversion seq to copy > a PARALLEL to a REG or a CONCAT. > --- > gcc/function.c | 39 ++++++++++++++++++++++++++++++++++----- > 1 file changed, 34 insertions(+), 5 deletions(-) > > diff --git a/gcc/function.c b/gcc/function.c > index aaf49a4..156c72b 100644 > --- a/gcc/function.c > +++ b/gcc/function.c > @@ -2879,6 +2879,7 @@ assign_parm_setup_block (struct assign_parm_data_all *all, > rtx entry_parm = data->entry_parm; > rtx stack_parm = data->stack_parm; > rtx target_reg = NULL_RTX; > + bool in_conversion_seq = false; > HOST_WIDE_INT size; > HOST_WIDE_INT size_stored; > > @@ -2895,9 +2896,23 @@ assign_parm_setup_block (struct assign_parm_data_all *all, > if (GET_CODE (reg) != CONCAT) > stack_parm = reg; > else > - /* This will use or allocate a stack slot that we'd rather > - avoid. FIXME: Could we avoid it in more cases? */ > - target_reg = reg; > + { > + target_reg = reg; > + /* Avoid allocating a stack slot, if there isn't one > + preallocated by the ABI. It might seem like we should > + always prefer a pseudo, but converting between > + floating-point and integer modes goes through the stack > + on various machines, so it's better to use the reserved > + stack slot than to risk wasting it and allocating more > + for the conversion. */ > + if (stack_parm == NULL_RTX) > + { > + int save = generating_concat_p; > + generating_concat_p = 0; > + stack_parm = gen_reg_rtx (mode); > + generating_concat_p = save; > + } > + } > data->stack_parm = NULL; > } > > @@ -2938,7 +2953,9 @@ assign_parm_setup_block (struct assign_parm_data_all *all, > mem = validize_mem (copy_rtx (stack_parm)); > > /* Handle values in multiple non-contiguous locations. */ > - if (GET_CODE (entry_parm) == PARALLEL) > + if (GET_CODE (entry_parm) == PARALLEL && !MEM_P (mem)) > + emit_group_store (mem, entry_parm, data->passed_type, size); > + else if (GET_CODE (entry_parm) == PARALLEL) > { > push_to_sequence2 (all->first_conversion_insn, > all->last_conversion_insn); > @@ -2946,6 +2963,7 @@ assign_parm_setup_block (struct assign_parm_data_all *all, > all->first_conversion_insn = get_insns (); > all->last_conversion_insn = get_last_insn (); > end_sequence (); > + in_conversion_seq = true; > } > > else if (size == 0) > @@ -3025,11 +3043,22 @@ assign_parm_setup_block (struct assign_parm_data_all *all, > all->first_conversion_insn = get_insns (); > all->last_conversion_insn = get_last_insn (); > end_sequence (); > + in_conversion_seq = true; > } > > if (target_reg) > { > - emit_move_insn (target_reg, stack_parm); > + if (!in_conversion_seq) > + emit_move_insn (target_reg, stack_parm); > + else > + { > + push_to_sequence2 (all->first_conversion_insn, > + all->last_conversion_insn); > + emit_move_insn (target_reg, stack_parm); > + all->first_conversion_insn = get_insns (); > + all->last_conversion_insn = get_last_insn (); > + end_sequence (); > + } > stack_parm = target_reg; > } > > > > -- > Alexandre Oliva, freedom fighter http://FSFLA.org/~lxoliva/ > You must be the change you wish to see in the world. -- Gandhi > Be Free! -- http://FSFLA.org/ FSF Latin America board member > Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer
On 05/11/15 05:08, Alexandre Oliva wrote: > [PR67753] fix copy of PARALLEL entry_parm to CONCAT target_reg > for gcc/ChangeLog > > PR rtl-optimization/67753 > PR rtl-optimization/64164 > * function.c (assign_parm_setup_block): Avoid allocating a > stack slot if we don't have an ABI-reserved one. Emit the > copy to target_reg in the conversion seq if the copy from > entry_parm is in it too. Don't use the conversion seq to copy > a PARALLEL to a REG or a CONCAT. Since this change, we have on aarch64_be: FAIL: gcc.target/aarch64/aapcs64/func-ret-4.c execution, -O1 FAIL: gcc.target/aarch64/aapcs64/func-ret-4.c execution, -O2 FAIL: gcc.target/aarch64/aapcs64/func-ret-4.c execution, -O3 -g FAIL: gcc.target/aarch64/aapcs64/func-ret-4.c execution, -Os FAIL: gcc.target/aarch64/aapcs64/func-ret-4.c execution, -Og -g The difference in the assembler looks as follows (this is at -Og): func_return_val_10: - sub sp, sp, #16 - lsr x2, x1, 48 - lsr x1, x1, 32 + ubfx x2, x1, 16, 16 fmov x3, d0 // Start of user assembly // 23 "func-ret-4.c" 1 mov x0, x30 // 0 "" 2 // End of user assembly adrp x3, saved_return_address str x0, [x3, #:lo12:saved_return_address] adrp x0, myfunc add x0, x0, :lo12:myfunc // Start of user assembly // 23 "func-ret-4.c" 1 mov x30, x0 // 0 "" 2 // End of user assembly bfi w0, w2, 16, 16 bfi w0, w1, 0, 16 lsl x0, x0, 32 - add sp, sp, 16 (ubfx is a bitfield extract, the first immediate is the lsbit, the second the width. lsr = logical shift right.) And in the RTL dump, this (before the patch): (insn 4 3 5 2 (set (mem/c:DI (plus:DI (reg/f:DI 68 virtual-stack-vars) (const_int -8 [0xfffffffffffffff8])) [0 t+0 S8 A64]) (reg:DI 1 x1)) func-ret-4.c:23 -1 (nil)) (insn 5 4 6 2 (set (reg:HI 78 [ t ]) (mem/c:HI (plus:DI (reg/f:DI 68 virtual-stack-vars) (const_int -8 [0xfffffffffffffff8])) [0 t+0 S2 A64])) func-ret-4.c:23 -1 (nil)) (insn 6 5 7 2 (set (reg:HI 79 [ t+2 ]) (mem/c:HI (plus:DI (reg/f:DI 68 virtual-stack-vars) (const_int -6 [0xfffffffffffffffa])) [0 t+2 S2 A16])) func-ret-4.c:23 -1 (nil)) becomes (after the patch): (insn 4 3 5 2 (set (subreg:SI (reg:CHI 80) 0) (reg:SI 1 x1 [ t ])) func-ret-4.c:23 -1 (nil)) (insn 5 4 6 2 (set (reg:SI 81) (subreg:SI (reg:CHI 80) 0)) func-ret-4.c:23 -1 (nil)) (insn 6 5 7 2 (set (subreg:DI (reg:HI 82) 0) (zero_extract:DI (subreg:DI (reg:SI 81) 0) (const_int 16 [0x10]) (const_int 16 [0x10]))) func-ret-4.c:23 -1 (nil)) (insn 7 6 8 2 (set (reg:HI 78 [ t ]) (reg:HI 82)) func-ret-4.c:23 -1 (nil)) (insn 8 7 9 2 (set (reg:SI 83) (subreg:SI (reg:CHI 80) 0)) func-ret-4.c:23 -1 (nil)) (insn 9 8 10 2 (set (reg:HI 79 [ t+2 ]) (subreg:HI (reg:SI 83) 2)) func-ret-4.c:23 -1 (nil)) --Alan
diff --git a/gcc/function.c b/gcc/function.c index aaf49a4..156c72b 100644 --- a/gcc/function.c +++ b/gcc/function.c @@ -2879,6 +2879,7 @@ assign_parm_setup_block (struct assign_parm_data_all *all, rtx entry_parm = data->entry_parm; rtx stack_parm = data->stack_parm; rtx target_reg = NULL_RTX; + bool in_conversion_seq = false; HOST_WIDE_INT size; HOST_WIDE_INT size_stored; @@ -2895,9 +2896,23 @@ assign_parm_setup_block (struct assign_parm_data_all *all, if (GET_CODE (reg) != CONCAT) stack_parm = reg; else - /* This will use or allocate a stack slot that we'd rather - avoid. FIXME: Could we avoid it in more cases? */ - target_reg = reg; + { + target_reg = reg; + /* Avoid allocating a stack slot, if there isn't one + preallocated by the ABI. It might seem like we should + always prefer a pseudo, but converting between + floating-point and integer modes goes through the stack + on various machines, so it's better to use the reserved + stack slot than to risk wasting it and allocating more + for the conversion. */ + if (stack_parm == NULL_RTX) + { + int save = generating_concat_p; + generating_concat_p = 0; + stack_parm = gen_reg_rtx (mode); + generating_concat_p = save; + } + } data->stack_parm = NULL; } @@ -2938,7 +2953,9 @@ assign_parm_setup_block (struct assign_parm_data_all *all, mem = validize_mem (copy_rtx (stack_parm)); /* Handle values in multiple non-contiguous locations. */ - if (GET_CODE (entry_parm) == PARALLEL) + if (GET_CODE (entry_parm) == PARALLEL && !MEM_P (mem)) + emit_group_store (mem, entry_parm, data->passed_type, size); + else if (GET_CODE (entry_parm) == PARALLEL) { push_to_sequence2 (all->first_conversion_insn, all->last_conversion_insn); @@ -2946,6 +2963,7 @@ assign_parm_setup_block (struct assign_parm_data_all *all, all->first_conversion_insn = get_insns (); all->last_conversion_insn = get_last_insn (); end_sequence (); + in_conversion_seq = true; } else if (size == 0) @@ -3025,11 +3043,22 @@ assign_parm_setup_block (struct assign_parm_data_all *all, all->first_conversion_insn = get_insns (); all->last_conversion_insn = get_last_insn (); end_sequence (); + in_conversion_seq = true; } if (target_reg) { - emit_move_insn (target_reg, stack_parm); + if (!in_conversion_seq) + emit_move_insn (target_reg, stack_parm); + else + { + push_to_sequence2 (all->first_conversion_insn, + all->last_conversion_insn); + emit_move_insn (target_reg, stack_parm); + all->first_conversion_insn = get_insns (); + all->last_conversion_insn = get_last_insn (); + end_sequence (); + } stack_parm = target_reg; }