Message ID | 1456770820-21341-4-git-send-email-adhemerval.zanella@linaro.org |
---|---|
State | New |
Headers | show |
On 29/02/16 18:33, Adhemerval Zanella wrote: > This patch implements a new posix_spawn{p} implementation for Linux. The main > difference is it uses the clone syscall directly with CLONE_VM and CLONE_VFORK > flags and a direct allocated stack. The new stack and start function solves > most the vfork limitation (possible parent clobber due stack spilling). The > remaning issue are related to signal handling: > > 1. That no signal handlers must run in child context, to avoid corrupt > parent's state. > 2. Child must synchronize with parent to enforce stack deallocation and > to possible return execv issues. > > The first one is solved by blocking all signals in child, even NPTL-internal > ones (SIGCANCEL and SIGSETXID). The second issue is done by a stack allocation > in parent and a synchronization with using a pipe or waitpid (in case or error). > The pipe has the advantage of allowing the child signal an exec error (checked > with new tst-spawn2 test). > > There is an inherent race condition in pipe2 usage for architectures that do not > support the syscall directly. In such cases the a pipe plus fctnl is used > instead and it may lead to file descriptor leak in parent (as decribed by fcntl > documentation). > > The child process stack is allocate with a mmap with MAP_STACK flag using > default architecture stack size. Although it is slower than use a stack buffer > from parent, it allows some slack for the compatibility code to run scripts > with no shebang (which may use a buffer with size depending of argument list > count). > > Performance should be similar to the vfork default posix implementation and > way faster than fork path (vfork on mostly linux ports are basically > clone with CLONE_VM plus CLONE_VFORK). The only difference is the syscalls > required for the stack allocation/deallocation. > > It fixes BZ#10354, BZ#14750, and BZ#18433. > > Tested on i386, x86_64, powerpc64le, and aarch64. > on aarch64 this caused FAIL: nptl/tst-exec1 with error msg "join in thread in parent returned!?" i think the cause is that glibc clone sets tcb.tid = tcb.pid = -1 if CLONE_VM|CLONE_VFORK is used, which means the vfork child clobbers the parent tcb.tid. (and pthread_join tests tcb.tid and thus fails in tst-exec1.) i believe all targets are affected.
On 12-04-2016 15:06, Szabolcs Nagy wrote: > On 29/02/16 18:33, Adhemerval Zanella wrote: >> This patch implements a new posix_spawn{p} implementation for Linux. The main >> difference is it uses the clone syscall directly with CLONE_VM and CLONE_VFORK >> flags and a direct allocated stack. The new stack and start function solves >> most the vfork limitation (possible parent clobber due stack spilling). The >> remaning issue are related to signal handling: >> >> 1. That no signal handlers must run in child context, to avoid corrupt >> parent's state. >> 2. Child must synchronize with parent to enforce stack deallocation and >> to possible return execv issues. >> >> The first one is solved by blocking all signals in child, even NPTL-internal >> ones (SIGCANCEL and SIGSETXID). The second issue is done by a stack allocation >> in parent and a synchronization with using a pipe or waitpid (in case or error). >> The pipe has the advantage of allowing the child signal an exec error (checked >> with new tst-spawn2 test). >> >> There is an inherent race condition in pipe2 usage for architectures that do not >> support the syscall directly. In such cases the a pipe plus fctnl is used >> instead and it may lead to file descriptor leak in parent (as decribed by fcntl >> documentation). >> >> The child process stack is allocate with a mmap with MAP_STACK flag using >> default architecture stack size. Although it is slower than use a stack buffer >> from parent, it allows some slack for the compatibility code to run scripts >> with no shebang (which may use a buffer with size depending of argument list >> count). >> >> Performance should be similar to the vfork default posix implementation and >> way faster than fork path (vfork on mostly linux ports are basically >> clone with CLONE_VM plus CLONE_VFORK). The only difference is the syscalls >> required for the stack allocation/deallocation. >> >> It fixes BZ#10354, BZ#14750, and BZ#18433. >> >> Tested on i386, x86_64, powerpc64le, and aarch64. >> > > on aarch64 this caused > > FAIL: nptl/tst-exec1 > > with error msg "join in thread in parent returned!?" > > i think the cause is that glibc clone sets > tcb.tid = tcb.pid = -1 if CLONE_VM|CLONE_VFORK > is used, which means the vfork child clobbers > the parent tcb.tid. (and pthread_join tests > tcb.tid and thus fails in tst-exec1.) > i believe all targets are affected. I think the problem is glibc's vfork only changes the TCB's pid fields while fork changes both pid and tid in case on CLONE_VM. Now I am trying to found out why exactly this difference handling of internal fields...
On 12-04-2016 15:15, Adhemerval Zanella wrote: > > > On 12-04-2016 15:06, Szabolcs Nagy wrote: >> On 29/02/16 18:33, Adhemerval Zanella wrote: >>> This patch implements a new posix_spawn{p} implementation for Linux. The main >>> difference is it uses the clone syscall directly with CLONE_VM and CLONE_VFORK >>> flags and a direct allocated stack. The new stack and start function solves >>> most the vfork limitation (possible parent clobber due stack spilling). The >>> remaning issue are related to signal handling: >>> >>> 1. That no signal handlers must run in child context, to avoid corrupt >>> parent's state. >>> 2. Child must synchronize with parent to enforce stack deallocation and >>> to possible return execv issues. >>> >>> The first one is solved by blocking all signals in child, even NPTL-internal >>> ones (SIGCANCEL and SIGSETXID). The second issue is done by a stack allocation >>> in parent and a synchronization with using a pipe or waitpid (in case or error). >>> The pipe has the advantage of allowing the child signal an exec error (checked >>> with new tst-spawn2 test). >>> >>> There is an inherent race condition in pipe2 usage for architectures that do not >>> support the syscall directly. In such cases the a pipe plus fctnl is used >>> instead and it may lead to file descriptor leak in parent (as decribed by fcntl >>> documentation). >>> >>> The child process stack is allocate with a mmap with MAP_STACK flag using >>> default architecture stack size. Although it is slower than use a stack buffer >>> from parent, it allows some slack for the compatibility code to run scripts >>> with no shebang (which may use a buffer with size depending of argument list >>> count). >>> >>> Performance should be similar to the vfork default posix implementation and >>> way faster than fork path (vfork on mostly linux ports are basically >>> clone with CLONE_VM plus CLONE_VFORK). The only difference is the syscalls >>> required for the stack allocation/deallocation. >>> >>> It fixes BZ#10354, BZ#14750, and BZ#18433. >>> >>> Tested on i386, x86_64, powerpc64le, and aarch64. >>> >> >> on aarch64 this caused >> >> FAIL: nptl/tst-exec1 >> >> with error msg "join in thread in parent returned!?" >> >> i think the cause is that glibc clone sets >> tcb.tid = tcb.pid = -1 if CLONE_VM|CLONE_VFORK >> is used, which means the vfork child clobbers >> the parent tcb.tid. (and pthread_join tests >> tcb.tid and thus fails in tst-exec1.) >> i believe all targets are affected. > > I think the problem is glibc's vfork only changes the TCB's pid fields > while fork changes both pid and tid in case on CLONE_VM. Now I am trying > to found out why exactly this difference handling of internal fields... In fact I think clone.S should not reset the tid member in TCB for fork(CLONE_VM|...) and it is what vfork does internally. It is used on INVALID_TD_P and INVALID_NOT_TERMINATED_TD_P macros used on pthread_join and pthread_cancel. A value equal to -1 will issue an error for such case, which mean that if a multithread program issues a clone(CLONE_VM|...) from a posix_spawn, a subsequent pthread_join might fail due invalid tid value. And I think it is what is happening in your system on tst-exec1: although pthread_create is issue before posix_spawn, I think posix_spawn clone might be executing before the pthread_join in some cases leading to this issue. I will propose a patch to modify clone to not reset the tid value.
On 12-04-2016 17:20, Adhemerval Zanella wrote: > > > On 12-04-2016 15:15, Adhemerval Zanella wrote: >> >> >> On 12-04-2016 15:06, Szabolcs Nagy wrote: >>> On 29/02/16 18:33, Adhemerval Zanella wrote: >>>> This patch implements a new posix_spawn{p} implementation for Linux. The main >>>> difference is it uses the clone syscall directly with CLONE_VM and CLONE_VFORK >>>> flags and a direct allocated stack. The new stack and start function solves >>>> most the vfork limitation (possible parent clobber due stack spilling). The >>>> remaning issue are related to signal handling: >>>> >>>> 1. That no signal handlers must run in child context, to avoid corrupt >>>> parent's state. >>>> 2. Child must synchronize with parent to enforce stack deallocation and >>>> to possible return execv issues. >>>> >>>> The first one is solved by blocking all signals in child, even NPTL-internal >>>> ones (SIGCANCEL and SIGSETXID). The second issue is done by a stack allocation >>>> in parent and a synchronization with using a pipe or waitpid (in case or error). >>>> The pipe has the advantage of allowing the child signal an exec error (checked >>>> with new tst-spawn2 test). >>>> >>>> There is an inherent race condition in pipe2 usage for architectures that do not >>>> support the syscall directly. In such cases the a pipe plus fctnl is used >>>> instead and it may lead to file descriptor leak in parent (as decribed by fcntl >>>> documentation). >>>> >>>> The child process stack is allocate with a mmap with MAP_STACK flag using >>>> default architecture stack size. Although it is slower than use a stack buffer >>>> from parent, it allows some slack for the compatibility code to run scripts >>>> with no shebang (which may use a buffer with size depending of argument list >>>> count). >>>> >>>> Performance should be similar to the vfork default posix implementation and >>>> way faster than fork path (vfork on mostly linux ports are basically >>>> clone with CLONE_VM plus CLONE_VFORK). The only difference is the syscalls >>>> required for the stack allocation/deallocation. >>>> >>>> It fixes BZ#10354, BZ#14750, and BZ#18433. >>>> >>>> Tested on i386, x86_64, powerpc64le, and aarch64. >>>> >>> >>> on aarch64 this caused >>> >>> FAIL: nptl/tst-exec1 >>> >>> with error msg "join in thread in parent returned!?" >>> >>> i think the cause is that glibc clone sets >>> tcb.tid = tcb.pid = -1 if CLONE_VM|CLONE_VFORK >>> is used, which means the vfork child clobbers >>> the parent tcb.tid. (and pthread_join tests >>> tcb.tid and thus fails in tst-exec1.) >>> i believe all targets are affected. >> >> I think the problem is glibc's vfork only changes the TCB's pid fields >> while fork changes both pid and tid in case on CLONE_VM. Now I am trying >> to found out why exactly this difference handling of internal fields... > > In fact I think clone.S should not reset the tid member in TCB for > fork(CLONE_VM|...) and it is what vfork does internally. It is > used on INVALID_TD_P and INVALID_NOT_TERMINATED_TD_P macros used on > pthread_join and pthread_cancel. A value equal to -1 will issue > an error for such case, which mean that if a multithread program > issues a clone(CLONE_VM|...) from a posix_spawn, a subsequent > pthread_join might fail due invalid tid value. > > And I think it is what is happening in your system on tst-exec1: > although pthread_create is issue before posix_spawn, I think > posix_spawn clone might be executing before the pthread_join > in some cases leading to this issue. > > I will propose a patch to modify clone to not reset the tid value. > But I am not sure if we just should avoid touch any internal TCB headers if CLONE_VFORK is set, thoughts?
Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote: > +static int > +__spawni_child (void *arguments) <snip> > +fail: > + /* Since sizeof errno < PIPE_BUF, the write is atomic. */ > + ret = -ret; > + if (ret) > + while (write_not_cancel (p, &ret, sizeof ret) < 0) > + continue; > + exit (SPAWN_ERROR); > +} Sorry for not noticing earlier, but shouldn't it be "_exit" instead of "exit" when the child fails? At least that's what CLONE_VFORK section of the clone(2) manpage documents, and what normal vfork requires. > +static int > +__spawnix (pid_t * pid, const char *file, And the comment in __spawnix mentions "_exit", too: > + /* The clone flags used will create a new child that will run in the same > + memory space (CLONE_VM) and the execution of calling thread will be > + suspend until the child calls execve or _exit. These condition as > + signal below either by pipe write (_exit with SPAWN_ERROR) or > + a successful execve. > + Also since the calling thread execution will be suspend, there is not > + need for CLONE_SETTLS. Although parent and child share the same TLS > + namespace, there will be no concurrent access for TLS variables (errno > + for instance). */ > + new_pid = CLONE (__spawni_child, STACK (stack, stack_size), stack_size, > + CLONE_VM | CLONE_VFORK | SIGCHLD, &args); Anyways thanks for implementing this so I can look forward to using this faster version :)
On 05/29/2016 11:54 AM, Eric Wong wrote: > Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote: >> +static int >> +__spawni_child (void *arguments) > > <snip> > >> +fail: >> + /* Since sizeof errno < PIPE_BUF, the write is atomic. */ >> + ret = -ret; >> + if (ret) >> + while (write_not_cancel (p, &ret, sizeof ret) < 0) >> + continue; >> + exit (SPAWN_ERROR); >> +} > > Sorry for not noticing earlier, but shouldn't it be > "_exit" instead of "exit" when the child fails? Good point. Would you file a bug for this, please? Thanks, Florian
On 30/05/2016 08:02, Florian Weimer wrote: > On 05/29/2016 11:54 AM, Eric Wong wrote: >> Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote: >>> +static int >>> +__spawni_child (void *arguments) >> >> <snip> >> >>> +fail: >>> + /* Since sizeof errno < PIPE_BUF, the write is atomic. */ >>> + ret = -ret; >>> + if (ret) >>> + while (write_not_cancel (p, &ret, sizeof ret) < 0) >>> + continue; >>> + exit (SPAWN_ERROR); >>> +} >> >> Sorry for not noticing earlier, but shouldn't it be >> "_exit" instead of "exit" when the child fails? > > Good point. Would you file a bug for this, please? > > Thanks, > Florian Thanks for reporting, I opened BZ#20178. [1] https://sourceware.org/bugzilla/show_bug.cgi?id=20178
Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote: > Thanks for reporting, I opened BZ#20178. > > [1] https://sourceware.org/bugzilla/show_bug.cgi?id=20178 Thanks for the quick fix! I also suspect the old/non-Linux sysdeps/posix/spawni.c should be handling POSIX_SPAWN_SETSIGMASK after POSIX_SPAWN_SETSIGDEF like the new Linux one does. This probably won't affect a lot of people, anymore, though.
On 30/05/2016 17:37, Eric Wong wrote: > Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote: >> Thanks for reporting, I opened BZ#20178. >> >> [1] https://sourceware.org/bugzilla/show_bug.cgi?id=20178 > > Thanks for the quick fix! > > I also suspect the old/non-Linux sysdeps/posix/spawni.c > should be handling POSIX_SPAWN_SETSIGMASK after POSIX_SPAWN_SETSIGDEF > like the new Linux one does. > > This probably won't affect a lot of people, anymore, though. > I think it indeed make sense, although I would not bother on fixing it now. I will add on my backlog.
diff --git a/include/sched.h b/include/sched.h index 4f59397..b4d7406 100644 --- a/include/sched.h +++ b/include/sched.h @@ -19,7 +19,9 @@ extern int __sched_rr_get_interval (__pid_t __pid, struct timespec *__t); /* These are Linux specific. */ extern int __clone (int (*__fn) (void *__arg), void *__child_stack, int __flags, void *__arg, ...); +libc_hidden_proto (__clone) extern int __clone2 (int (*__fn) (void *__arg), void *__child_stack_base, size_t __child_stack_size, int __flags, void *__arg, ...); +libc_hidden_proto (__clone2) #endif #endif diff --git a/include/unistd.h b/include/unistd.h index 5152f64..d2802b2 100644 --- a/include/unistd.h +++ b/include/unistd.h @@ -81,6 +81,7 @@ char *__canonicalize_directory_name_internal (const char *__thisdir, size_t __size) attribute_hidden; extern int __dup (int __fd); +libc_hidden_proto (__dup) extern int __dup2 (int __fd, int __fd2); libc_hidden_proto (__dup2) extern int __dup3 (int __fd, int __fd2, int flags); diff --git a/posix/Makefile b/posix/Makefile index 7ec110e..5b0e298 100644 --- a/posix/Makefile +++ b/posix/Makefile @@ -94,7 +94,7 @@ tests := tstgetopt testfnm runtests runptests \ xtests := bug-ga2 ifeq (yes,$(build-shared)) test-srcs := globtest -tests += wordexp-test tst-exec tst-spawn +tests += wordexp-test tst-exec tst-spawn tst-spawn2 endif tests-static = tst-exec-static tst-spawn-static tests += $(tests-static) diff --git a/posix/tst-spawn2.c b/posix/tst-spawn2.c new file mode 100644 index 0000000..41bfde8 --- /dev/null +++ b/posix/tst-spawn2.c @@ -0,0 +1,73 @@ +/* Further tests for spawn in case of invalid binary paths. + Copyright (C) 2016 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <http://www.gnu.org/licenses/>. */ + +#include <spawn.h> +#include <error.h> +#include <errno.h> +#include <stdlib.h> +#include <string.h> +#include <unistd.h> +#include <sys/wait.h> + +#include <stdio.h> + +int +posix_spawn_test (void) +{ + /* Check if posix_spawn correctly returns an error and an invalid pid + by trying to spawn an invalid binary. */ + + const char *program = "/path/to/invalid/binary"; + char * const args[] = { 0 }; + pid_t pid = -1; + + int ret = posix_spawn (&pid, program, 0, 0, args, environ); + if (ret != ENOENT) + error (EXIT_FAILURE, errno, "posix_spawn"); + + /* POSIX states the value returned on pid variable in case of an error + is not specified. GLIBC will update the value iff the child + execution is successful. */ + if (pid != -1) + error (EXIT_FAILURE, errno, "posix_spawn returned pid != -1"); + + /* Check if no child is actually created. */ + ret = waitpid (-1, NULL, 0); + if (ret != -1 || errno != ECHILD) + error (EXIT_FAILURE, errno, "waitpid"); + + /* Same as before, but with posix_spawnp. */ + char *args2[] = { (char*) program, 0 }; + + ret = posix_spawnp (&pid, args2[0], 0, 0, args2, environ); + if (ret != ENOENT) + error (EXIT_FAILURE, errno, "posix_spawnp"); + + if (pid != -1) + error (EXIT_FAILURE, errno, "posix_spawnp returned pid != -1"); + + ret = waitpid (-1, NULL, 0); + if (ret != -1 || errno != ECHILD) + error (EXIT_FAILURE, errno, "waitpid"); + + return 0; +} + +#define TEST_FUNCTION posix_spawn_test () +#include "../test-skeleton.c" + diff --git a/sysdeps/posix/dup.c b/sysdeps/posix/dup.c index 9525e76..abf3ff5 100644 --- a/sysdeps/posix/dup.c +++ b/sysdeps/posix/dup.c @@ -26,5 +26,5 @@ __dup (int fd) { return fcntl (fd, F_DUPFD, 0); } - +libc_hidden_def (__dup) weak_alias (__dup, dup) diff --git a/sysdeps/unix/sysv/linux/aarch64/clone.S b/sysdeps/unix/sysv/linux/aarch64/clone.S index 596fb9c..8f3ab40 100644 --- a/sysdeps/unix/sysv/linux/aarch64/clone.S +++ b/sysdeps/unix/sysv/linux/aarch64/clone.S @@ -93,4 +93,5 @@ thread_start: cfi_endproc .size thread_start, .-thread_start +libc_hidden_def (__clone) weak_alias (__clone, clone) diff --git a/sysdeps/unix/sysv/linux/alpha/clone.S b/sysdeps/unix/sysv/linux/alpha/clone.S index ec9c06d..556aa63 100644 --- a/sysdeps/unix/sysv/linux/alpha/clone.S +++ b/sysdeps/unix/sysv/linux/alpha/clone.S @@ -137,4 +137,5 @@ thread_start: cfi_endproc .end thread_start +libc_hidden_def (__clone) weak_alias (__clone, clone) diff --git a/sysdeps/unix/sysv/linux/arm/clone.S b/sysdeps/unix/sysv/linux/arm/clone.S index 460a8f5..c103123 100644 --- a/sysdeps/unix/sysv/linux/arm/clone.S +++ b/sysdeps/unix/sysv/linux/arm/clone.S @@ -93,4 +93,5 @@ PSEUDO_END (__clone) .fnend +libc_hidden_def (__clone) weak_alias (__clone, clone) diff --git a/sysdeps/unix/sysv/linux/hppa/clone.S b/sysdeps/unix/sysv/linux/hppa/clone.S index e80fd8d..0a18137 100644 --- a/sysdeps/unix/sysv/linux/hppa/clone.S +++ b/sysdeps/unix/sysv/linux/hppa/clone.S @@ -175,4 +175,5 @@ ENTRY(__clone) PSEUDO_END(__clone) +libc_hidden_def (__clone) weak_alias (__clone, clone) diff --git a/sysdeps/unix/sysv/linux/i386/clone.S b/sysdeps/unix/sysv/linux/i386/clone.S index 7d818c1..ef447d1 100644 --- a/sysdeps/unix/sysv/linux/i386/clone.S +++ b/sysdeps/unix/sysv/linux/i386/clone.S @@ -139,4 +139,5 @@ L(nomoregetpid): cfi_startproc PSEUDO_END (__clone) +libc_hidden_def (__clone) weak_alias (__clone, clone) diff --git a/sysdeps/unix/sysv/linux/ia64/clone2.S b/sysdeps/unix/sysv/linux/ia64/clone2.S index 9b257b3..fc046eb 100644 --- a/sysdeps/unix/sysv/linux/ia64/clone2.S +++ b/sysdeps/unix/sysv/linux/ia64/clone2.S @@ -96,6 +96,8 @@ ENTRY(__clone2) ret /* Not reached. */ PSEUDO_END(__clone2) +libc_hidden_def (__clone2) + /* For now we leave __clone undefined. This is unlikely to be a */ /* problem, since at least the i386 __clone in glibc always failed */ /* with a 0 sp (eventhough the kernel explicitly handled it). */ diff --git a/sysdeps/unix/sysv/linux/m68k/clone.S b/sysdeps/unix/sysv/linux/m68k/clone.S index 8b40df2..33474cc 100644 --- a/sysdeps/unix/sysv/linux/m68k/clone.S +++ b/sysdeps/unix/sysv/linux/m68k/clone.S @@ -127,4 +127,5 @@ donepid: cfi_startproc PSEUDO_END (__clone) +libc_hidden_def (__clone) weak_alias (__clone, clone) diff --git a/sysdeps/unix/sysv/linux/microblaze/clone.S b/sysdeps/unix/sysv/linux/microblaze/clone.S index 035d88b..cbc95ce 100644 --- a/sysdeps/unix/sysv/linux/microblaze/clone.S +++ b/sysdeps/unix/sysv/linux/microblaze/clone.S @@ -69,4 +69,5 @@ L(thread_start): nop PSEUDO_END(__clone) +libc_hidden_def (__clone) weak_alias (__clone,clone) diff --git a/sysdeps/unix/sysv/linux/mips/clone.S b/sysdeps/unix/sysv/linux/mips/clone.S index 755e8cc..feb8250 100644 --- a/sysdeps/unix/sysv/linux/mips/clone.S +++ b/sysdeps/unix/sysv/linux/mips/clone.S @@ -166,4 +166,5 @@ L(gotpid): END(__thread_start) +libc_hidden_def (__clone) weak_alias (__clone, clone) diff --git a/sysdeps/unix/sysv/linux/nios2/clone.S b/sysdeps/unix/sysv/linux/nios2/clone.S index 4da5c19..e4d3970 100644 --- a/sysdeps/unix/sysv/linux/nios2/clone.S +++ b/sysdeps/unix/sysv/linux/nios2/clone.S @@ -104,4 +104,5 @@ thread_start: cfi_startproc PSEUDO_END (__clone) +libc_hidden_def (__clone) weak_alias (__clone, clone) diff --git a/sysdeps/unix/sysv/linux/nptl-signals.h b/sysdeps/unix/sysv/linux/nptl-signals.h index 7560a21..01f34c2 100644 --- a/sysdeps/unix/sysv/linux/nptl-signals.h +++ b/sysdeps/unix/sysv/linux/nptl-signals.h @@ -16,6 +16,8 @@ License along with the GNU C Library; if not, see <http://www.gnu.org/licenses/>. */ +#include <signal.h> + /* The signal used for asynchronous cancelation. */ #define SIGCANCEL __SIGRTMIN @@ -29,5 +31,13 @@ /* Signal used to implement the setuid et.al. functions. */ #define SIGSETXID (__SIGRTMIN + 1) + +/* Return is sig is used internally. */ +static inline int +__nptl_is_internal_signal (int sig) +{ + return (sig == SIGCANCEL) || (sig == SIGTIMER) || (sig == SIGSETXID); +} + /* Used to communicate with signal handler. */ extern struct xid_command *__xidcmd attribute_hidden; diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/clone.S b/sysdeps/unix/sysv/linux/powerpc/powerpc32/clone.S index 28948ea..eb973db 100644 --- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/clone.S +++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/clone.S @@ -108,4 +108,5 @@ L(badargs): cfi_startproc END (__clone) +libc_hidden_def (__clone) weak_alias (__clone, clone) diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S b/sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S index c8c6de8..959fdb7 100644 --- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S +++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S @@ -126,4 +126,5 @@ L(parent): END (__clone) +libc_hidden_def (__clone) weak_alias (__clone, clone) diff --git a/sysdeps/unix/sysv/linux/s390/s390-32/clone.S b/sysdeps/unix/sysv/linux/s390/s390-32/clone.S index cb2afbb..f774e90 100644 --- a/sysdeps/unix/sysv/linux/s390/s390-32/clone.S +++ b/sysdeps/unix/sysv/linux/s390/s390-32/clone.S @@ -70,4 +70,6 @@ thread_start: xc 0(4,%r15),0(%r15) basr %r14,%r1 /* jump to fn */ DO_CALL (exit, 1) + +libc_hidden_def (__clone) weak_alias (__clone, clone) diff --git a/sysdeps/unix/sysv/linux/s390/s390-64/clone.S b/sysdeps/unix/sysv/linux/s390/s390-64/clone.S index eddab35..928a881 100644 --- a/sysdeps/unix/sysv/linux/s390/s390-64/clone.S +++ b/sysdeps/unix/sysv/linux/s390/s390-64/clone.S @@ -73,4 +73,6 @@ thread_start: xc 0(8,%r15),0(%r15) basr %r14,%r1 /* jump to fn */ DO_CALL (exit, 1) + +libc_hidden_def (__clone) weak_alias (__clone, clone) diff --git a/sysdeps/unix/sysv/linux/sh/clone.S b/sysdeps/unix/sysv/linux/sh/clone.S index 53cc08b..a73dd7d 100644 --- a/sysdeps/unix/sysv/linux/sh/clone.S +++ b/sysdeps/unix/sysv/linux/sh/clone.S @@ -123,4 +123,5 @@ ENTRY(__clone) .word TID - TLS_PRE_TCB_SIZE PSEUDO_END (__clone) +libc_hidden_def (__clone) weak_alias (__clone, clone) diff --git a/sysdeps/unix/sysv/linux/sparc/sparc32/clone.S b/sysdeps/unix/sysv/linux/sparc/sparc32/clone.S index 6f41f0f..68f8202 100644 --- a/sysdeps/unix/sysv/linux/sparc/sparc32/clone.S +++ b/sysdeps/unix/sysv/linux/sparc/sparc32/clone.S @@ -100,4 +100,5 @@ __thread_start: .size __thread_start, .-__thread_start +libc_hidden_def (__clone) weak_alias (__clone, clone) diff --git a/sysdeps/unix/sysv/linux/sparc/sparc64/clone.S b/sysdeps/unix/sysv/linux/sparc/sparc64/clone.S index e2d3914..cecffa7 100644 --- a/sysdeps/unix/sysv/linux/sparc/sparc64/clone.S +++ b/sysdeps/unix/sysv/linux/sparc/sparc64/clone.S @@ -96,4 +96,5 @@ __thread_start: .size __thread_start, .-__thread_start +libc_hidden_def (__clone) weak_alias (__clone, clone) diff --git a/sysdeps/unix/sysv/linux/spawni.c b/sysdeps/unix/sysv/linux/spawni.c new file mode 100644 index 0000000..906cda2 --- /dev/null +++ b/sysdeps/unix/sysv/linux/spawni.c @@ -0,0 +1,401 @@ +/* POSIX spawn interface. Linux version. + Copyright (C) 2016 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <http://www.gnu.org/licenses/>. */ + +#include <spawn.h> +#include <fcntl.h> +#include <paths.h> +#include <string.h> +#include <sys/resource.h> +#include <sys/wait.h> +#include <sys/param.h> +#include <sys/mman.h> +#include <not-cancel.h> +#include <local-setxid.h> +#include <shlib-compat.h> +#include <nptl/pthreadP.h> +#include <dl-sysdep.h> +#include <ldsodefs.h> +#include <libc-internal.h> +#include "spawn_int.h" + +/* The Linux implementation of posix_spawn{p} uses the clone syscall directly + with CLONE_VM and CLONE_VFORK flags and an allocated stack. The new stack + and start function solves most the vfork limitation (possible parent + clobber due stack spilling). The remaining issue are: + + 1. That no signal handlers must run in child context, to avoid corrupting + parent's state. + 2. The parent must ensure child's stack freeing. + 3. Child must synchronize with parent to enforce 2. and to possible + return execv issues. + + The first issue is solved by blocking all signals in child, even the + NPTL-internal ones (SIGCANCEL and SIGSETXID). The second and third issue + is done by a stack allocation in parent and a synchronization with using + a pipe or waitpid (in case or error). The pipe has the advantage of + allowing the child the communicate an exec error. */ + + +/* The Unix standard contains a long explanation of the way to signal + an error after the fork() was successful. Since no new wait status + was wanted there is no way to signal an error using one of the + available methods. The committee chose to signal an error by a + normal program exit with the exit code 127. */ +#define SPAWN_ERROR 127 + +/* We need to block both SIGCANCEL and SIGSETXID. */ +#define SIGALL_SET \ + ((__sigset_t) { .__val = {[0 ... _SIGSET_NWORDS-1 ] = -1 } }) + +#ifdef __ia64__ +# define CLONE(__fn, __stack, __stacksize, __flags, __args) \ + __clone2 (__fn, __stack, __stacksize, __flags, __args, 0, 0, 0) +#else +# define CLONE(__fn, __stack, __stacksize, __flags, __args) \ + __clone (__fn, __stack, __flags, __args) +#endif + +#if _STACK_GROWS_DOWN +# define STACK(__stack, __stack_size) (__stack + __stack_size) +#elif _STACK_GROWS_UP +# define STACK(__stack, __stack_size) (__stack) +#endif + + +struct posix_spawn_args +{ + int pipe[2]; + sigset_t oldmask; + const char *file; + int (*exec) (const char *, char *const *, char *const *); + const posix_spawn_file_actions_t *fa; + const posix_spawnattr_t *restrict attr; + char *const *argv; + ptrdiff_t argc; + char *const *envp; + int xflags; +}; + +/* Older version requires that shell script without shebang definition + to be called explicitly using /bin/sh (_PATH_BSHELL). */ +static void +maybe_script_execute (struct posix_spawn_args *args) +{ + if (SHLIB_COMPAT (libc, GLIBC_2_2, GLIBC_2_15) + && (args->xflags & SPAWN_XFLAGS_TRY_SHELL) && errno == ENOEXEC) + { + char *const *argv = args->argv; + ptrdiff_t argc = args->argc; + + /* Construct an argument list for the shell. */ + char *new_argv[argc + 1]; + new_argv[0] = (char *) _PATH_BSHELL; + new_argv[1] = (char *) args->file; + memcpy (new_argv + 2, argv + 1, argc * sizeof(char *)); + + /* Execute the shell. */ + args->exec (new_argv[0], new_argv, args->envp); + } +} + +/* Function used in the clone call to setup the signals mask, posix_spawn + attributes, and file actions. It run on its own stack (provided by the + posix_spawn call). */ +static int +__spawni_child (void *arguments) +{ + struct posix_spawn_args *args = arguments; + const posix_spawnattr_t *restrict attr = args->attr; + const posix_spawn_file_actions_t *file_actions = args->fa; + int p = args->pipe[1]; + int ret; + + close_not_cancel (args->pipe[0]); + + /* The child must ensure that no signal handler are enabled because it shared + memory with parent, so the signal disposition must be either SIG_DFL or + SIG_IGN. It does by iterating over all signals and although it could + possibly be more optimized (by tracking which signal potentially have a + signal handler), it might requires system specific solutions (since the + sigset_t data type can be very different on different architectures). */ + struct sigaction sa; + memset (&sa, '\0', sizeof (sa)); + + sigset_t hset; + __sigprocmask (SIG_BLOCK, 0, &hset); + for (int sig = 1; sig < _NSIG; ++sig) + { + if ((attr->__flags & POSIX_SPAWN_SETSIGDEF) + && sigismember (&attr->__sd, sig)) + { + sa.sa_handler = SIG_DFL; + } + else if (sigismember (&hset, sig)) + { + if (__nptl_is_internal_signal (sig)) + sa.sa_handler = SIG_IGN; + else + { + __libc_sigaction (sig, 0, &sa); + if (sa.sa_handler == SIG_IGN) + continue; + sa.sa_handler = SIG_DFL; + } + } + else + continue; + + __libc_sigaction (sig, &sa, 0); + } + +#ifdef _POSIX_PRIORITY_SCHEDULING + /* Set the scheduling algorithm and parameters. */ + if ((attr->__flags & (POSIX_SPAWN_SETSCHEDPARAM | POSIX_SPAWN_SETSCHEDULER)) + == POSIX_SPAWN_SETSCHEDPARAM) + { + if ((ret = __sched_setparam (0, &attr->__sp)) == -1) + goto fail; + } + else if ((attr->__flags & POSIX_SPAWN_SETSCHEDULER) != 0) + { + if ((ret = __sched_setscheduler (0, attr->__policy, &attr->__sp)) == -1) + goto fail; + } +#endif + + /* Set the process group ID. */ + if ((attr->__flags & POSIX_SPAWN_SETPGROUP) != 0 + && (ret = __setpgid (0, attr->__pgrp)) != 0) + goto fail; + + /* Set the effective user and group IDs. */ + if ((attr->__flags & POSIX_SPAWN_RESETIDS) != 0 + && ((ret = local_seteuid (__getuid ())) != 0 + || (ret = local_setegid (__getgid ())) != 0)) + goto fail; + + /* Execute the file actions. */ + if (file_actions != 0) + { + int cnt; + struct rlimit64 fdlimit; + bool have_fdlimit = false; + + for (cnt = 0; cnt < file_actions->__used; ++cnt) + { + struct __spawn_action *action = &file_actions->__actions[cnt]; + + /* Dup the pipe fd onto an unoccupied one to avoid any file + operation to clobber it. */ + if ((action->action.close_action.fd == p) + || (action->action.open_action.fd == p) + || (action->action.dup2_action.fd == p)) + { + if ((ret = __dup (p)) < 0) + goto fail; + p = ret; + } + + switch (action->tag) + { + case spawn_do_close: + if ((ret = + close_not_cancel (action->action.close_action.fd)) != 0) + { + if (!have_fdlimit) + { + __getrlimit64 (RLIMIT_NOFILE, &fdlimit); + have_fdlimit = true; + } + + /* Signal errors only for file descriptors out of range. */ + if (action->action.close_action.fd < 0 + || action->action.close_action.fd >= fdlimit.rlim_cur) + goto fail; + } + break; + + case spawn_do_open: + { + ret = open_not_cancel (action->action.open_action.path, + action->action. + open_action.oflag | O_LARGEFILE, + action->action.open_action.mode); + + if (ret == -1) + goto fail; + + int new_fd = ret; + + /* Make sure the desired file descriptor is used. */ + if (ret != action->action.open_action.fd) + { + if ((ret = __dup2 (new_fd, action->action.open_action.fd)) + != action->action.open_action.fd) + goto fail; + + if ((ret = close_not_cancel (new_fd)) != 0) + goto fail; + } + } + break; + + case spawn_do_dup2: + if ((ret = __dup2 (action->action.dup2_action.fd, + action->action.dup2_action.newfd)) + != action->action.dup2_action.newfd) + goto fail; + break; + } + } + } + + /* Set the initial signal mask of the child if POSIX_SPAWN_SETSIGMASK + is set, otherwise restore the previous one. */ + __sigprocmask (SIG_SETMASK, (attr->__flags & POSIX_SPAWN_SETSIGMASK) + ? &attr->__ss : &args->oldmask, 0); + + args->exec (args->file, args->argv, args->envp); + + /* This is compatibility function required to enable posix_spawn run + script without shebang definition for older posix_spawn versions + (2.15). */ + maybe_script_execute (args); + + ret = -errno; + +fail: + /* Since sizeof errno < PIPE_BUF, the write is atomic. */ + ret = -ret; + if (ret) + while (write_not_cancel (p, &ret, sizeof ret) < 0) + continue; + exit (SPAWN_ERROR); +} + +/* Spawn a new process executing PATH with the attributes describes in *ATTRP. + Before running the process perform the actions described in FILE-ACTIONS. */ +static int +__spawnix (pid_t * pid, const char *file, + const posix_spawn_file_actions_t * file_actions, + const posix_spawnattr_t * attrp, char *const argv[], + char *const envp[], int xflags, + int (*exec) (const char *, char *const *, char *const *)) +{ + pid_t new_pid; + struct posix_spawn_args args; + int ec; + + if (__pipe2 (args.pipe, O_CLOEXEC)) + return errno; + + /* To avoid imposing hard limits on posix_spawn{p} the total number of + arguments is first calculated to allocate a mmap to hold all possible + values. */ + ptrdiff_t argc = 0; + /* Linux allows at most max (0x7FFFFFFF, 1/4 stack size) arguments + to be used in a execve call. We limit to INT_MAX minus one due the + compatiblity code that may execute a shell script (maybe_script_execute) + where it will construct another argument list with an additional + argument. */ + ptrdiff_t limit = INT_MAX - 1; + while (argv[argc++] != NULL) + if (argc == limit) + { + errno = E2BIG; + return errno; + } + + int prot = (PROT_READ | PROT_WRITE + | ((GL (dl_stack_flags) & PF_X) ? PROT_EXEC : 0)); + + /* Add a slack area for child's stack. */ + size_t argv_size = (argc * sizeof (void *)) + 512; + size_t stack_size = ALIGN_UP (argv_size, GLRO(dl_pagesize)); + void *stack = __mmap (NULL, stack_size, prot, + MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0); + if (__glibc_unlikely (stack == MAP_FAILED)) + { + close_not_cancel (args.pipe[0]); + close_not_cancel (args.pipe[1]); + return errno; + } + + /* Disable asynchronous cancellation. */ + int cs = LIBC_CANCEL_ASYNC (); + + args.file = file; + args.exec = exec; + args.fa = file_actions; + args.attr = attrp ? attrp : &(const posix_spawnattr_t) { 0 }; + args.argv = argv; + args.argc = argc; + args.envp = envp; + args.xflags = xflags; + + __sigprocmask (SIG_BLOCK, &SIGALL_SET, &args.oldmask); + + /* The clone flags used will create a new child that will run in the same + memory space (CLONE_VM) and the execution of calling thread will be + suspend until the child calls execve or _exit. These condition as + signal below either by pipe write (_exit with SPAWN_ERROR) or + a successful execve. + Also since the calling thread execution will be suspend, there is not + need for CLONE_SETTLS. Although parent and child share the same TLS + namespace, there will be no concurrent access for TLS variables (errno + for instance). */ + new_pid = CLONE (__spawni_child, STACK (stack, stack_size), stack_size, + CLONE_VM | CLONE_VFORK | SIGCHLD, &args); + + close_not_cancel (args.pipe[1]); + + if (new_pid > 0) + { + if (__read (args.pipe[0], &ec, sizeof ec) != sizeof ec) + ec = 0; + else + __waitpid (new_pid, NULL, 0); + } + else + ec = -new_pid; + + __munmap (stack, stack_size); + + close_not_cancel (args.pipe[0]); + + if (!ec && new_pid) + *pid = new_pid; + + __sigprocmask (SIG_SETMASK, &args.oldmask, 0); + + LIBC_CANCEL_RESET (cs); + + return ec; +} + +/* Spawn a new process executing PATH with the attributes describes in *ATTRP. + Before running the process perform the actions described in FILE-ACTIONS. */ +int +__spawni (pid_t * pid, const char *file, + const posix_spawn_file_actions_t * acts, + const posix_spawnattr_t * attrp, char *const argv[], + char *const envp[], int xflags) +{ + return __spawnix (pid, file, acts, attrp, argv, envp, xflags, + xflags & SPAWN_XFLAGS_USE_PATH ? __execvpe : __execve); +} diff --git a/sysdeps/unix/sysv/linux/tile/clone.S b/sysdeps/unix/sysv/linux/tile/clone.S index 02fe3a8..a300eb5 100644 --- a/sysdeps/unix/sysv/linux/tile/clone.S +++ b/sysdeps/unix/sysv/linux/tile/clone.S @@ -219,4 +219,5 @@ ENTRY (__clone) } PSEUDO_END (__clone) +libc_hidden_def (__clone) weak_alias (__clone, clone) diff --git a/sysdeps/unix/sysv/linux/x86_64/clone.S b/sysdeps/unix/sysv/linux/x86_64/clone.S index 382568a..1a50957 100644 --- a/sysdeps/unix/sysv/linux/x86_64/clone.S +++ b/sysdeps/unix/sysv/linux/x86_64/clone.S @@ -115,4 +115,5 @@ L(thread_start): cfi_startproc; PSEUDO_END (__clone) +libc_hidden_def (__clone) weak_alias (__clone, clone)