Message ID | 55E4C300.9080800@linaro.org |
---|---|
State | New |
Headers | show |
On Mon, 2015-08-31 at 18:11 -0300, Adhemerval Zanella wrote: > --- /dev/null > +++ b/sysdeps/unix/sysv/linux/arm/syscall_cancel.S > @@ -0,0 +1,69 @@ > +/* Cancellable syscall wrapper - aarch64 version. I guess this should say "arm version". > + .thumb > + .syntax unified I think this (at least the .thumb part) should be inside #ifdef __thumb2__. Otherwise this will break compiling on ARMv4 which I think otherwise still works. > + > +ENTRY (__syscall_cancel_arch) > + .fnstart > + mov ip,sp > + stmfd sp!,{r4,r5,r6,r7,lr} > + .save {r4,r5,r6,r7,lr} Why are you stacking r7 and lr here? p.
Hi, Thanks for the review. On 01-09-2015 20:08, Phil Blundell wrote: > On Mon, 2015-08-31 at 18:11 -0300, Adhemerval Zanella wrote: >> --- /dev/null >> +++ b/sysdeps/unix/sysv/linux/arm/syscall_cancel.S >> @@ -0,0 +1,69 @@ >> +/* Cancellable syscall wrapper - aarch64 version. > > I guess this should say "arm version". Indeed, I will change that. > >> + .thumb >> + .syntax unified > > I think this (at least the .thumb part) should be inside #ifdef > __thumb2__. Otherwise this will break compiling on ARMv4 which I think > otherwise still works. Based on other ARM code I think only .thumb part, I will change that. > >> + >> +ENTRY (__syscall_cancel_arch) >> + .fnstart >> + mov ip,sp >> + stmfd sp!,{r4,r5,r6,r7,lr} >> + .save {r4,r5,r6,r7,lr} > > Why are you stacking r7 and lr here? Rechecking and for 'r7' it is not really required. However afaik to use the glibc cancellation mechanism based on libgcc unwind for arm it requires to at least save the 'lr'. I have not debug in details, but omitting lr save I see failures with this mechanism for tst-cancel{4,5,17,18}x. > > p. > >
On Wed, 2015-09-02 at 18:23 -0300, Adhemerval Zanella wrote: > Rechecking and for 'r7' it is not really required. However afaik to use > the glibc cancellation mechanism based on libgcc unwind for arm it > requires to at least save the 'lr'. I have not debug in details, but > omitting lr save I see failures with this mechanism for > tst-cancel{4,5,17,18}x. Ah right, fair enough. As a general rule we want to stack an even number of registers so that the stack pointer remains 64-bit aligned. If you just drop the pushing of r7 then you'd have four here. Also. some more questions: > + bl __syscall_do_cancel It might be nice to make it more obvious that this isn't expected to return. Right now it looks as though it will just fall off the end of __syscall_cancel_arch() which is a bit disconcerting. Either change "bl" to "b", or at a minimum add a comment about it. Is __syscall_cancel_arch() ever called from anywhere other than PSEUDO_CANCEL_AFTER()? If not, is there a reason that its contents couldn't just be inlined into PSEUDO? The amount of argument shuffling that you're having to do in order to make an ABI-conformant call to __syscall_cancel_arch is substantial and it seems like a lot of this could be eliminated. In fact, it looks to me as though almost all of PSEUDO_CANCEL_BEFORE could be eliminated even if __syscall_cancel_arch stays, and replaced with simply: push {r3} mov r3, r2 mov r2, r1 mov r1, r0 This would also allow the stacking of r4/r5 in PSEUDO() to be eliminated. I can't see any obvious reason PSEUDO_CANCEL_BEFORE needs to reserve 20 bytes of stack space but perhaps I'm overlooking something there. Similarly, the PSEUDO_CANCEL_BEFORE/PSEUDO_CANCEL_AFTER thing seems a bit of an artificial distinction since all that happens between them is that you load a constant into a register. If these macros aren't used anywhere else then it seems like the code could be somewhat simplified. p.
On 03-09-2015 07:26, Phil Blundell wrote: > On Wed, 2015-09-02 at 18:23 -0300, Adhemerval Zanella wrote: >> Rechecking and for 'r7' it is not really required. However afaik to use >> the glibc cancellation mechanism based on libgcc unwind for arm it >> requires to at least save the 'lr'. I have not debug in details, but >> omitting lr save I see failures with this mechanism for >> tst-cancel{4,5,17,18}x. > > Ah right, fair enough. As a general rule we want to stack an even > number of registers so that the stack pointer remains 64-bit aligned. > If you just drop the pushing of r7 then you'd have four here. > > Also. some more questions: > >> + bl __syscall_do_cancel > > It might be nice to make it more obvious that this isn't expected to > return. Right now it looks as though it will just fall off the end of > __syscall_cancel_arch() which is a bit disconcerting. Either change > "bl" to "b", or at a minimum add a comment about it. I can change to: mov lr, pc b __syscall_do_cancel Which explicit state it is a tail cail that do not return. > > Is __syscall_cancel_arch() ever called from anywhere other than > PSEUDO_CANCEL_AFTER()? If not, is there a reason that its contents > couldn't just be inlined into PSEUDO? The amount of argument shuffling > that you're having to do in order to make an ABI-conformant call to > __syscall_cancel_arch is substantial and it seems like a lot of this > could be eliminated. In fact, it looks to me as though almost all of > PSEUDO_CANCEL_BEFORE could be eliminated even if __syscall_cancel_arch > stays, and replaced with simply: > > push {r3} > mov r3, r2 > mov r2, r1 > mov r1, r0 > > This would also allow the stacking of r4/r5 in PSEUDO() to be > eliminated. I can't see any obvious reason PSEUDO_CANCEL_BEFORE needs > to reserve 20 bytes of stack space but perhaps I'm overlooking something > there. > > Similarly, the PSEUDO_CANCEL_BEFORE/PSEUDO_CANCEL_AFTER thing seems a > bit of an artificial distinction since all that happens between them is > that you load a constant into a register. If these macros aren't used > anywhere else then it seems like the code could be somewhat simplified. If you check my first message (00/08: nptl: Fix Race conditions in pthread cancellation (BZ#12683)) the idea of this modification it exactly to route *all* cancellable syscall to the __syscall_cancel_arch. It is required because the __syscall_cancel_arch have the global marks the signal cancel handler (sigcancel_handle in nptl/nptl-init.c) will use to check if the instruction pointer falls within the cancellable syscall code. > > p. > >
On Thu, 2015-09-03 at 12:52 -0300, Adhemerval Zanella wrote: > I can change to: > > mov lr, pc > b __syscall_do_cancel > > Which explicit state it is a tail cail that do not return. If you set lr like that then the implication is that it will return (since otherwise lr would be unnecessary). A regular tail call would just do the branch without changing lr at all. > If you check my first message (00/08: nptl: Fix Race conditions in pthread > cancellation (BZ#12683)) the idea of this modification it exactly to route > *all* cancellable syscall to the __syscall_cancel_arch. It is required > because the __syscall_cancel_arch have the global marks the signal cancel > handler (sigcancel_handle in nptl/nptl-init.c) will use to check if the > instruction pointer falls within the cancellable syscall code. Ah, I see. Your original message didn't actually say that, but now I understand how it's supposed to work. However, I think the amount of stack shuffling that you're doing is still rather excessive. If __syscall_cancel_arch simply needs to be a delineated block of code and isn't called from anywhere except PSEUDO then it doesn't necessarily need to obey the normal ABI calling conventions. But even if it does, I think it should be possible to achieve this result with rather less complexity than you seem to have at the moment. p.
On 03-09-2015 13:19, Phil Blundell wrote: > On Thu, 2015-09-03 at 12:52 -0300, Adhemerval Zanella wrote: >> I can change to: >> >> mov lr, pc >> b __syscall_do_cancel >> >> Which explicit state it is a tail cail that do not return. > > If you set lr like that then the implication is that it will return > (since otherwise lr would be unnecessary). A regular tail call would > just do the branch without changing lr at all. My first approach was to use compiler output to check how it handle tail call for ARM. For instance, the code: -- void __do_cancel () __attribute__ ((noreturn)); long int foo (long int nr); long int __syscall (long int *d, long int nr) { if (*d & 0x1) __do_cancel (); return foo (nr); } -- generates the 'bl' for __do_cancel with both GCC 4.9 and GCC 6.0. I did try to use just use a 'b' instruction, but again the tst-cancelx{4,5,17,18} fails with some early cancellation related to pause and aio functions. If you can make the tests pass using a 'b' function call please let me know. > >> If you check my first message (00/08: nptl: Fix Race conditions in pthread >> cancellation (BZ#12683)) the idea of this modification it exactly to route >> *all* cancellable syscall to the __syscall_cancel_arch. It is required >> because the __syscall_cancel_arch have the global marks the signal cancel >> handler (sigcancel_handle in nptl/nptl-init.c) will use to check if the >> instruction pointer falls within the cancellable syscall code. > > Ah, I see. Your original message didn't actually say that, but now I > understand how it's supposed to work. However, I think the amount of > stack shuffling that you're doing is still rather excessive. > > If __syscall_cancel_arch simply needs to be a delineated block of code > and isn't called from anywhere except PSEUDO then it doesn't necessarily > need to obey the normal ABI calling conventions. But even if it does, I > think it should be possible to achieve this result with rather less > complexity than you seem to have at the moment. Correct calling sequence should be something like: long int syscall (...) if SINGLE_THREAD_P INLINE_SYSCALL (...) else syscall_cancel (...) I am far from an ARM assembly expert, so if you any suggestion on how to accomplish with less instruction that current one please let me know. Right now my approach is to use the current assembly hooks to generate the cancellable syscalls. My idea for future patches is to just get rid of this assembly macros and code it with C code instead, auto-generating the above code. > > p. > >
On Thu, 3 Sep 2015, Adhemerval Zanella wrote: > On 03-09-2015 13:19, Phil Blundell wrote: > > On Thu, 2015-09-03 at 12:52 -0300, Adhemerval Zanella wrote: > >> I can change to: > >> > >> mov lr, pc > >> b __syscall_do_cancel > >> > >> Which explicit state it is a tail cail that do not return. > > > > If you set lr like that then the implication is that it will return > > (since otherwise lr would be unnecessary). A regular tail call would > > just do the branch without changing lr at all. > > My first approach was to use compiler output to check how it handle tail > call for ARM. For instance, the code: > > -- > > void __do_cancel () __attribute__ ((noreturn)); > > long int foo (long int nr); > > long int __syscall (long int *d, long int nr) > { > if (*d & 0x1) > __do_cancel (); > > return foo (nr); > } > > -- > > generates the 'bl' for __do_cancel with both GCC 4.9 and GCC 6.0. To provide a bit of (potentially non-obvious) context: GCC deliberately suppresses tail call optimization when the call target is a noreturn function, with the intent of providing full backtraces if the noreturn function terminates the program. In the example above, call to 'foo' would be tail-call-optimized, while the call to '__do_cancel' wouldn't. Alexander
On 03-09-2015 14:48, Alexander Monakov wrote: > On Thu, 3 Sep 2015, Adhemerval Zanella wrote: >> On 03-09-2015 13:19, Phil Blundell wrote: >>> On Thu, 2015-09-03 at 12:52 -0300, Adhemerval Zanella wrote: >>>> I can change to: >>>> >>>> mov lr, pc >>>> b __syscall_do_cancel >>>> >>>> Which explicit state it is a tail cail that do not return. >>> >>> If you set lr like that then the implication is that it will return >>> (since otherwise lr would be unnecessary). A regular tail call would >>> just do the branch without changing lr at all. >> >> My first approach was to use compiler output to check how it handle tail >> call for ARM. For instance, the code: >> >> -- >> >> void __do_cancel () __attribute__ ((noreturn)); >> >> long int foo (long int nr); >> >> long int __syscall (long int *d, long int nr) >> { >> if (*d & 0x1) >> __do_cancel (); >> >> return foo (nr); >> } >> >> -- >> >> generates the 'bl' for __do_cancel with both GCC 4.9 and GCC 6.0. > > To provide a bit of (potentially non-obvious) context: GCC deliberately > suppresses tail call optimization when the call target is a noreturn function, > with the intent of providing full backtraces if the noreturn function > terminates the program. > > In the example above, call to 'foo' would be tail-call-optimized, while the > call to '__do_cancel' wouldn't. And my understanding is it requires because it is based on unwind information. I have not dig on why exactly using 'b' shows some failures in tst-cancex{4,5,17,18}, but my wild guess is some backtracing information missing. > > Alexander >
diff --git a/sysdeps/unix/sysv/linux/arm/syscall_cancel.S b/sysdeps/unix/sysv/linux/arm/syscall_cancel.S new file mode 100644 index 0000000..dc56de1 --- /dev/null +++ b/sysdeps/unix/sysv/linux/arm/syscall_cancel.S @@ -0,0 +1,69 @@ +/* Cancellable syscall wrapper - aarch64 version. + Copyright (C) 2015 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <http://www.gnu.org/licenses/>. */ + +#include <sysdep.h> + +/* long int [r0] __syscall_cancel_arch (int *cancelhandling [r0], + long int nr [r1], + long int arg1 [r2], + long int arg2 [r3], + long int arg3 [SP], + long int arg4 [SP+4], + long int arg5 [SP+8], + long int arg6 [SP+12]) */ + + .thumb + .syntax unified + +ENTRY (__syscall_cancel_arch) + .fnstart + mov ip,sp + stmfd sp!,{r4,r5,r6,r7,lr} + .save {r4,r5,r6,r7,lr} + + cfi_adjust_cfa_offset (20) + cfi_rel_offset (lr, 16) + + .globl __syscall_cancel_arch_start +__syscall_cancel_arch_start: + + /* if (*cancelhandling & CANCELED_BITMASK) + __syscall_do_cancel() */ + ldr r0,[r0] + tst r0, #4 + bne 1f + + /* Issue a 6 argument syscall, the nr [r1] being the syscall + number. */ + mov r7,r1 + mov r0,r2 + mov r1,r3 + ldmfd ip,{r2,r3,r4,r5,r6} + svc 0x0 + + .globl __syscall_cancel_arch_end +__syscall_cancel_arch_end: + ldmfd sp!,{r4,r5,r6,r7,lr} + cfi_adjust_cfa_offset (-16); + bx lr + +1: + bl __syscall_do_cancel + .fnend +END (__syscall_cancel_arch) +libc_hidden_def (__syscall_cancel_arch) diff --git a/sysdeps/unix/sysv/linux/arm/sysdep-cancel.h b/sysdeps/unix/sysv/linux/arm/sysdep-cancel.h index bdefa80..9f03bb5 100644 --- a/sysdeps/unix/sysv/linux/arm/sysdep-cancel.h +++ b/sysdeps/unix/sysv/linux/arm/sysdep-cancel.h @@ -19,10 +19,17 @@ #include <tls.h> #ifndef __ASSEMBLER__ # include <nptl/pthreadP.h> +# include <sys/ucontext.h> #endif #if IS_IN (libc) || IS_IN (libpthread) || IS_IN (librt) +# if IS_IN (libc) +# define JMP_SYSCALL_CANCEL HIDDEN_JUMPTARGET(__syscall_cancel) +# else +# define JMP_SYSCALL_CANCEL __syscall_cancel(PLT) +# endif + /* NOTE: We do mark syscalls with unwind annotations, for the benefit of cancellation; but they're really only accurate at the point of the syscall. The ARM unwind directives are not rich enough without adding @@ -31,16 +38,10 @@ # undef PSEUDO # define PSEUDO(name, syscall_name, args) \ .text; \ - ENTRY (__##syscall_name##_nocancel); \ - CFI_SECTIONS; \ - DO_CALL (syscall_name, args); \ - cmn r0, $4096; \ - PSEUDO_RET; \ - END (__##syscall_name##_nocancel); \ ENTRY (name); \ SINGLE_THREAD_P; \ - DOARGS_##args; \ bne .Lpseudo_cancel; \ + DOARGS_##args; \ cfi_remember_state; \ ldr r7, =SYS_ify (syscall_name); \ swi 0x0; \ @@ -50,20 +51,31 @@ cfi_restore_state; \ .Lpseudo_cancel: \ .fnstart; /* matched by the .fnend in UNDOARGS below. */ \ - DOCARGS_##args; /* save syscall args etc. around CENABLE. */ \ - CENABLE; \ - mov ip, r0; /* put mask in safe place. */ \ - UNDOCARGS_##args; /* restore syscall args. */ \ - ldr r7, =SYS_ify (syscall_name); \ - swi 0x0; /* do the call. */ \ - mov r7, r0; /* save syscall return value. */ \ - mov r0, ip; /* get mask back. */ \ - CDISABLE; \ - mov r0, r7; /* retrieve return value. */ \ - RESTORE_LR_##args; \ - UNDOARGS_##args; \ + push {r4, r5, lr}; \ + .save {r4, r5, lr}; \ + PSEUDO_CANCEL_BEFORE; \ + movw r0, SYS_ify (syscall_name); \ + PSEUDO_CANCEL_AFTER; \ + pop {r4, r5, pc}; \ + .fnend; \ cmn r0, $4096 +# define PSEUDO_CANCEL_BEFORE \ + .pad #20; \ + sub sp, sp, #20; \ + ldr r5, [sp, #32]; \ + ldr r4, [sp, #36]; \ + str r3, [sp]; \ + mov r3, r2; \ + str r5, [sp, #4]; \ + mov r2, r1; \ + str r4, [sp, #8]; \ + mov r1, r0 + +# define PSEUDO_CANCEL_AFTER \ + bl JMP_SYSCALL_CANCEL; \ + add sp, sp, #20 + /* DOARGS pushes eight bytes on the stack for five arguments, twelve bytes for six arguments, and four bytes for fewer. In order to preserve doubleword alignment, sometimes we must save an extra register. */ @@ -182,18 +194,9 @@ RESTORE_LR_0 # if IS_IN (libpthread) -# define CENABLE bl PLTJMP(__pthread_enable_asynccancel) -# define CDISABLE bl PLTJMP(__pthread_disable_asynccancel) # define __local_multiple_threads __pthread_multiple_threads # elif IS_IN (libc) -# define CENABLE bl PLTJMP(__libc_enable_asynccancel) -# define CDISABLE bl PLTJMP(__libc_disable_asynccancel) # define __local_multiple_threads __libc_multiple_threads -# elif IS_IN (librt) -# define CENABLE bl PLTJMP(__librt_enable_asynccancel) -# define CDISABLE bl PLTJMP(__librt_disable_asynccancel) -# else -# error Unsupported library # endif # if IS_IN (libpthread) || IS_IN (libc) @@ -238,4 +241,10 @@ extern int __local_multiple_threads attribute_hidden; # define RTLD_SINGLE_THREAD_P \ __builtin_expect (THREAD_GETMEM (THREAD_SELF, \ header.multiple_threads) == 0, 1) + +static inline +long int __pthread_get_ip (const struct ucontext *uc) +{ + return uc->uc_mcontext.arm_pc; +} #endif diff --git a/sysdeps/unix/sysv/linux/arm/sysdep.h b/sysdeps/unix/sysv/linux/arm/sysdep.h index 200f77a..6b18e34 100644 --- a/sysdeps/unix/sysv/linux/arm/sysdep.h +++ b/sysdeps/unix/sysv/linux/arm/sysdep.h @@ -387,6 +387,14 @@ __local_syscall_error: \ #undef INTERNAL_SYSCALL_ERRNO #define INTERNAL_SYSCALL_ERRNO(val, err) (-(val)) +#undef SYSCALL_CANCEL_ERROR +#define SYSCALL_CANCEL_ERROR(__val) \ + ((unsigned int) (__val) >= 0xfffff001u) + +#undef SYSCALL_CANCEL_ERRNO +#define SYSCALL_CANCEL_ERRNO(__val) \ + (-(__val)) + /* List of system calls which are supported as vsyscalls. */ #define HAVE_CLOCK_GETTIME_VSYSCALL 1 #define HAVE_GETTIMEOFDAY_VSYSCALL 1