Message ID | 20180801222347.18903-2-rth@twiddle.net |
---|---|
State | New |
Headers | show |
Series | aarch64: Update ld.so for vector abi | expand |
On 01/08/18 23:23, rth@twiddle.net wrote: > From: Richard Henderson <richard.henderson@linaro.org> > > * sysdeps/aarch64/dl-trampoline.S (_dl_runtime_resolve): > Do not record unwind info for arguments; this is unneeded; > do not save x9 just to have a register to pair with x8; > properly include the 16 bytes of PLT stack into the unwind; > create a frame pointer with the spare stack slot; > rearrange the exit to only adjust the stack once. i thought the cfi annotations were needed for all registers in case the debugger wants to investigate register content across a _dl_runtime_resolve frame (possibly several frames up in the call stack), this may not be a common use case though and i don't know what's the convention in glibc asm, the compiler seems to emit annotation for all spilled registers with -g. > --- a/sysdeps/aarch64/dl-trampoline.S > +++ b/sysdeps/aarch64/dl-trampoline.S > @@ -32,7 +32,6 @@ > .text > .globl _dl_runtime_resolve > .type _dl_runtime_resolve, #function > - cfi_startproc > .align 2 > _dl_runtime_resolve: > /* AArch64 we get called with: > @@ -41,46 +40,24 @@ _dl_runtime_resolve: > [sp, #8] lr > [sp, #0] &PLTGOT[n] > */ > - > + cfi_startproc is there a problem keeping it at its original place above? the tlsdesc asm has cfi_startproc at the same place. otherwise the patch looks ok.
On 08/02/2018 12:03 PM, Szabolcs Nagy wrote: > On 01/08/18 23:23, rth@twiddle.net wrote: >> From: Richard Henderson <richard.henderson@linaro.org> >> >> * sysdeps/aarch64/dl-trampoline.S (_dl_runtime_resolve): >> Do not record unwind info for arguments; this is unneeded; >> do not save x9 just to have a register to pair with x8; >> properly include the 16 bytes of PLT stack into the unwind; >> create a frame pointer with the spare stack slot; >> rearrange the exit to only adjust the stack once. > > i thought the cfi annotations were needed for all registers > in case the debugger wants to investigate register content > across a _dl_runtime_resolve frame (possibly several frames > up in the call stack), However that's typically for the call-saved registers, where the compiler might save data in that register across the call. These are not call-saved registers. They are argument registers. There will not be any debug info that refers to them. > this may not be a common use case though > and i don't know what's the convention in glibc asm, the compiler > seems to emit annotation for all spilled registers with -g. Sure, because the compiler is spilling call-saved registers. The others it just clobbers with no annotation. >> - cfi_startproc >> .align 2 >> _dl_runtime_resolve: >> /* AArch64 we get called with: >> @@ -41,46 +40,24 @@ _dl_runtime_resolve: >> [sp, #8] lr >> [sp, #0] &PLTGOT[n] >> */ >> - >> + cfi_startproc > > is there a problem keeping it at its original place above? > the tlsdesc asm has cfi_startproc at the same place. It could stay where it is, but I thought it clearer to place the following two annotations immediately adjacent (because it's state incoming, not anything we are doing here, and should not be separated from the start). Further, to place all of the annotations immediately after the comment that describes why. r~
On 02/08/18 18:04, Richard Henderson wrote: > On 08/02/2018 12:03 PM, Szabolcs Nagy wrote: >> On 01/08/18 23:23, rth@twiddle.net wrote: >>> From: Richard Henderson <richard.henderson@linaro.org> >>> >>> * sysdeps/aarch64/dl-trampoline.S (_dl_runtime_resolve): >>> Do not record unwind info for arguments; this is unneeded; >>> do not save x9 just to have a register to pair with x8; >>> properly include the 16 bytes of PLT stack into the unwind; >>> create a frame pointer with the spare stack slot; >>> rearrange the exit to only adjust the stack once. >> >> i thought the cfi annotations were needed for all registers >> in case the debugger wants to investigate register content >> across a _dl_runtime_resolve frame (possibly several frames >> up in the call stack), > > However that's typically for the call-saved registers, where the compiler might > save data in that register across the call. These are not call-saved > registers. They are argument registers. There will not be any debug info that > refers to them. > >> this may not be a common use case though >> and i don't know what's the convention in glibc asm, the compiler >> seems to emit annotation for all spilled registers with -g. > > Sure, because the compiler is spilling call-saved registers. > The others it just clobbers with no annotation. > >>> - cfi_startproc >>> .align 2 >>> _dl_runtime_resolve: >>> /* AArch64 we get called with: >>> @@ -41,46 +40,24 @@ _dl_runtime_resolve: >>> [sp, #8] lr >>> [sp, #0] &PLTGOT[n] >>> */ >>> - >>> + cfi_startproc >> >> is there a problem keeping it at its original place above? >> the tlsdesc asm has cfi_startproc at the same place. > > It could stay where it is, but I thought it clearer to place the following two > annotations immediately adjacent (because it's state incoming, not anything we > are doing here, and should not be separated from the start). Further, to place > all of the annotations immediately after the comment that describes why. > i see, then this patch is OK to commit, thanks.
diff --git a/sysdeps/aarch64/dl-trampoline.S b/sysdeps/aarch64/dl-trampoline.S index a86d0722d4..e8e2af485a 100644 --- a/sysdeps/aarch64/dl-trampoline.S +++ b/sysdeps/aarch64/dl-trampoline.S @@ -32,7 +32,6 @@ .text .globl _dl_runtime_resolve .type _dl_runtime_resolve, #function - cfi_startproc .align 2 _dl_runtime_resolve: /* AArch64 we get called with: @@ -41,46 +40,24 @@ _dl_runtime_resolve: [sp, #8] lr [sp, #0] &PLTGOT[n] */ - + cfi_startproc + cfi_adjust_cfa_offset(16) /* Incorporate PLT */ cfi_rel_offset (lr, 8) /* Save arguments. */ - stp x8, x9, [sp, #-(80+8*16)]! + stp x29, x8, [sp, #-(80+8*16)]! cfi_adjust_cfa_offset (80+8*16) - cfi_rel_offset (x8, 0) - cfi_rel_offset (x9, 8) + cfi_rel_offset (x29, 0) + mov x29, sp stp x6, x7, [sp, #16] - cfi_rel_offset (x6, 16) - cfi_rel_offset (x7, 24) - stp x4, x5, [sp, #32] - cfi_rel_offset (x4, 32) - cfi_rel_offset (x5, 40) - stp x2, x3, [sp, #48] - cfi_rel_offset (x2, 48) - cfi_rel_offset (x3, 56) - stp x0, x1, [sp, #64] - cfi_rel_offset (x0, 64) - cfi_rel_offset (x1, 72) - stp q0, q1, [sp, #(80+0*16)] - cfi_rel_offset (q0, 80+0*16) - cfi_rel_offset (q1, 80+1*16) - stp q2, q3, [sp, #(80+2*16)] - cfi_rel_offset (q0, 80+2*16) - cfi_rel_offset (q1, 80+3*16) - stp q4, q5, [sp, #(80+4*16)] - cfi_rel_offset (q0, 80+4*16) - cfi_rel_offset (q1, 80+5*16) - stp q6, q7, [sp, #(80+6*16)] - cfi_rel_offset (q0, 80+6*16) - cfi_rel_offset (q1, 80+7*16) /* Get pointer to linker struct. */ ldr PTR_REG (0), [ip0, #-PTR_SIZE] @@ -101,25 +78,26 @@ _dl_runtime_resolve: mov ip0, x0 /* Get arguments and return address back. */ - ldp q0, q1, [sp, #(80+0*16)] - ldp q2, q3, [sp, #(80+2*16)] - ldp q4, q5, [sp, #(80+4*16)] + ldr lr, [sp, #80+8*16+8] ldp q6, q7, [sp, #(80+6*16)] + ldp q4, q5, [sp, #(80+4*16)] + ldp q2, q3, [sp, #(80+2*16)] + ldp q0, q1, [sp, #(80+0*16)] ldp x0, x1, [sp, #64] ldp x2, x3, [sp, #48] ldp x4, x5, [sp, #32] ldp x6, x7, [sp, #16] - ldp x8, x9, [sp], #(80+8*16) - cfi_adjust_cfa_offset (-(80+8*16)) - - ldp ip1, lr, [sp], #16 - cfi_adjust_cfa_offset (-16) + ldp x29, x8, [sp], 80+8*16+16 + cfi_adjust_cfa_offset (-(80+8*16+16)) + cfi_restore (lr) + cfi_restore (x29) /* Jump to the newly found address. */ br ip0 cfi_endproc .size _dl_runtime_resolve, .-_dl_runtime_resolve + #ifndef PROF .globl _dl_runtime_profile .type _dl_runtime_profile, #function
From: Richard Henderson <richard.henderson@linaro.org> * sysdeps/aarch64/dl-trampoline.S (_dl_runtime_resolve): Do not record unwind info for arguments; this is unneeded; do not save x9 just to have a register to pair with x8; properly include the 16 bytes of PLT stack into the unwind; create a frame pointer with the spare stack slot; rearrange the exit to only adjust the stack once. --- sysdeps/aarch64/dl-trampoline.S | 50 +++++++++------------------------ 1 file changed, 14 insertions(+), 36 deletions(-)