Message ID | 5640A89D.80804@linaro.org |
---|---|
State | New |
Headers | show |
On 11/09/2015 03:07 PM, Adhemerval Zanella wrote: >> Could you write a glibc test case for this? I don't feel comfortable >> with removing the workaround without a test, and I can't find an >> existing one. > > What about: > > [BZ #16364] > * sysdeps/unix/sysv/linux/sleep.c (__sleep): Simplify code to use > nanosleep without requiring to handle spurious wakeups. > * posix/tst-nanosleep-signal.c: New file. > * posix/Makefile (test): Add tst-nanosleep-signal. Thanks, but see below for details. > + int ret; > + pid_t f = fork (); > + if (f == 0) > + { > + // child > + struct timespec tv = { .tv_sec = 0, .tv_nsec = 250000000UL }; > + nanosleep (&tv, &tv); > + exit (0); This needs to be “_exit (0);”, to avoid running termination handlers twice. > + } > + else > + { > + // parent > + struct timespec tv = { .tv_sec = 0, .tv_nsec = 500000000UL }; > + ret = nanosleep (&tv, &tv); > + } > + return ret; > +} The parent needs to check for fork errors. > + > +void > +check_nanosleep_sigdfl(void) Space before paren. > +{ > + /* For SIG_DFL nanosleep should not be interrupted by SIGCHLD. */ > + signal(SIGCHLD, SIG_DFL); Space before paren. > + if (check_nanosleep_base ()) > + { > + puts ("error: nanosleep was interrupted with SIG_DFL"); > + errors = 1; > + } > +} This should check that errno is EINTR. > + > +void > +check_nanosleep_dummy(void) Space before paren. > +{ > + /* With a dummy handler nanosleep should be interrupted. */ > + signal(SIGCHLD, dummy); Space before paren. > + int ret = check_nanosleep_base (); > + if (ret == 0) > + { > + puts ("error: nanosleep was not interrupted with dummy handler"); > + errors = 1; > + } > +} Missing EINTR check. > + > +void > +check_nanosleep_sigign(void) Space before paren. > +{ > + /* For SIG_IGN nanosleep should not be interrupted by SIGCHLD. */ > + signal(SIGCHLD, SIG_IGN); Space before paren. The test is racy in two ways: the child could exit before nanosleep in the parent starts, or the child exit could be delayed after nanosleep in the parent ends. I'm not sure if there is way to make this more reliable. The larger question is whether the EINTR check is sufficient, or if a time-based check is needed as well. That is, if the kernel bug consistent of silent early termination of nanosleep. Florian
<https://sourceware.org/ml/libc-hacker/1998-11/msg00210.html> has what was claimed to be a test for the kernel bug (albeit one with "I believe", not asserted to have actually been tested). I found some claims that nanosleep restarting in some cases was fixed in 2.6(.0), e.g. <https://lkml.org/lkml/2004/11/25/5> <https://lkml.org/lkml/2003/11/8/50>, so before the git history.
On 09-11-2015 12:28, Florian Weimer wrote: > On 11/09/2015 03:07 PM, Adhemerval Zanella wrote: > >>> Could you write a glibc test case for this? I don't feel comfortable >>> with removing the workaround without a test, and I can't find an >>> existing one. >> >> What about: >> >> [BZ #16364] >> * sysdeps/unix/sysv/linux/sleep.c (__sleep): Simplify code to use >> nanosleep without requiring to handle spurious wakeups. >> * posix/tst-nanosleep-signal.c: New file. >> * posix/Makefile (test): Add tst-nanosleep-signal. > > Thanks, but see below for details. > >> + int ret; >> + pid_t f = fork (); >> + if (f == 0) >> + { >> + // child >> + struct timespec tv = { .tv_sec = 0, .tv_nsec = 250000000UL }; >> + nanosleep (&tv, &tv); >> + exit (0); > > > This needs to be “_exit (0);”, to avoid running termination handlers twice. > I will change it. >> + } >> + else >> + { >> + // parent >> + struct timespec tv = { .tv_sec = 0, .tv_nsec = 500000000UL }; >> + ret = nanosleep (&tv, &tv); >> + } >> + return ret; >> +} > > The parent needs to check for fork errors. > Right, I will change it. >> + >> +void >> +check_nanosleep_sigdfl(void) > > Space before paren. > > >> +{ >> + /* For SIG_DFL nanosleep should not be interrupted by SIGCHLD. */ >> + signal(SIGCHLD, SIG_DFL); > > Space before paren. > >> + if (check_nanosleep_base ()) >> + { >> + puts ("error: nanosleep was interrupted with SIG_DFL"); >> + errors = 1; >> + } >> +} > > This should check that errno is EINTR. I do not think this check is strictly required: the other failures (EFAULT and EINVAL) should also indicate something very flaky on the system. > >> + >> +void >> +check_nanosleep_dummy(void) > > Space before paren. > >> +{ >> + /* With a dummy handler nanosleep should be interrupted. */ >> + signal(SIGCHLD, dummy); > > Space before paren. > >> + int ret = check_nanosleep_base (); >> + if (ret == 0) >> + { >> + puts ("error: nanosleep was not interrupted with dummy handler"); >> + errors = 1; >> + } >> +} > > Missing EINTR check. Also the idea here is to indicate any failure for nanosleep (same as before). > >> + >> +void >> +check_nanosleep_sigign(void) > > Space before paren. > >> +{ >> + /* For SIG_IGN nanosleep should not be interrupted by SIGCHLD. */ >> + signal(SIGCHLD, SIG_IGN); > > Space before paren. > > The test is racy in two ways: the child could exit before nanosleep in > the parent starts, or the child exit could be delayed after nanosleep in > the parent ends. I'm not sure if there is way to make this more reliable. I am aware of that, that's why I try to mitigate this by making the time on parent twice for the child and some magnitude higher than the syscalls itself. However I also can't see a way to make this test entirely reliable: even by doing some synchronization (either by shared semaphores or pthread barrier) and/or sending the SIGCHLD directly using signal the two race scenarios you describe will still have a small window to occur. I am not sure which strategy will be better and I think we should not rely or add hacks to try to mitigate for such kernel failures. > > The larger question is whether the EINTR check is sufficient, or if a > time-based check is needed as well. That is, if the kernel bug > consistent of silent early termination of nanosleep. My understanding is on old kernels the nanosleep calls was not restarted in a nanosleep call (with the restart_syscall), so it nanosleep will early terminate. > > Florian >
On 11/09/2015 05:30 PM, Adhemerval Zanella wrote: > I do not think this check is strictly required: the other failures (EFAULT and > EINVAL) should also indicate something very flaky on the system. We should always check for unexpected errors. We might have been overlooking something. >> The test is racy in two ways: the child could exit before nanosleep in >> the parent starts, or the child exit could be delayed after nanosleep in >> the parent ends. I'm not sure if there is way to make this more reliable. > > I am aware of that, that's why I try to mitigate this by making the time on > parent twice for the child and some magnitude higher than the syscalls itself. > However I also can't see a way to make this test entirely reliable: even by > doing some synchronization (either by shared semaphores or pthread barrier) > and/or sending the SIGCHLD directly using signal the two race scenarios you > describe will still have a small window to occur. I am not sure which > strategy will be better and I think we should not rely or add hacks to try > to mitigate for such kernel failures. I think you could make it more likely to hit the window if you forked several child processes. >> The larger question is whether the EINTR check is sufficient, or if a >> time-based check is needed as well. That is, if the kernel bug >> consistent of silent early termination of nanosleep. > > My understanding is on old kernels the nanosleep calls was not restarted > in a nanosleep call (with the restart_syscall), so it nanosleep will > early terminate. To be sure, you should check how much time has elapsed in the nanosleep calls, with clock_gettime(CLOCK_MONOTONIC). This would cover both the -1/EINTR case and the return value 0 case. The existing code does not look at EINTR, so if there was a bug, it was the return value 0 scenario. Florian
On 09-11-2015 15:18, Florian Weimer wrote: > On 11/09/2015 05:30 PM, Adhemerval Zanella wrote: > >> I do not think this check is strictly required: the other failures (EFAULT and >> EINVAL) should also indicate something very flaky on the system. > > We should always check for unexpected errors. We might have been > overlooking something. > Fair enough, I will change that. >>> The test is racy in two ways: the child could exit before nanosleep in >>> the parent starts, or the child exit could be delayed after nanosleep in >>> the parent ends. I'm not sure if there is way to make this more reliable. >> >> I am aware of that, that's why I try to mitigate this by making the time on >> parent twice for the child and some magnitude higher than the syscalls itself. >> However I also can't see a way to make this test entirely reliable: even by >> doing some synchronization (either by shared semaphores or pthread barrier) >> and/or sending the SIGCHLD directly using signal the two race scenarios you >> describe will still have a small window to occur. I am not sure which >> strategy will be better and I think we should not rely or add hacks to try >> to mitigate for such kernel failures. > > I think you could make it more likely to hit the window if you forked > several child processes. Or if the system load is high enough. My question is if we should push or not this test even with this racy condition and if so which strategy would be better to mitigate it. > >>> The larger question is whether the EINTR check is sufficient, or if a >>> time-based check is needed as well. That is, if the kernel bug >>> consistent of silent early termination of nanosleep. >> >> My understanding is on old kernels the nanosleep calls was not restarted >> in a nanosleep call (with the restart_syscall), so it nanosleep will >> early terminate. > > To be sure, you should check how much time has elapsed in the nanosleep > calls, with clock_gettime(CLOCK_MONOTONIC). This would cover both the > -1/EINTR case and the return value 0 case. The existing code does not > look at EINTR, so if there was a bug, it was the return value 0 scenario. For nanosleep check itself, glibc already have posix/tst-nanosleep.c (which uses gettimeofday.c). However my main questioning and the reason of this bikeshedding regarding is if GLIBC should continue to provide a hack for non-compliant kernels and if we should really check for this issue with a possible racy testcase. > > Florian > >
On 09-11-2015 14:19, Joseph Myers wrote: > <https://sourceware.org/ml/libc-hacker/1998-11/msg00210.html> has what was > claimed to be a test for the kernel bug (albeit one with "I believe", not > asserted to have actually been tested). This test is essentially what I have added, it just spawn more children with different sleep periods. > > I found some claims that nanosleep restarting in some cases was fixed in > 2.6(.0), e.g. <https://lkml.org/lkml/2004/11/25/5> > <https://lkml.org/lkml/2003/11/8/50>, so before the git history. > Thanks, I tried to track down the kernel source which handles the nanosleep (kernel/posix-cpu-timers.c), but found it has been fixed also before git history. I will resend the patch with this information.
diff --git a/posix/Makefile b/posix/Makefile index aeb9890..cec06a6 100644 --- a/posix/Makefile +++ b/posix/Makefile @@ -74,8 +74,8 @@ tests := tstgetopt testfnm runtests runptests \ bug-regex21 bug-regex22 bug-regex23 bug-regex24 \ bug-regex25 bug-regex26 bug-regex27 bug-regex28 \ bug-regex29 bug-regex30 bug-regex31 bug-regex32 \ - bug-regex33 tst-nice tst-nanosleep tst-regex2 \ - transbug tst-rxspencer tst-pcre tst-boost \ + bug-regex33 tst-nice tst-nanosleep tst-nanosleep-signal \ + tst-regex2 transbug tst-rxspencer tst-pcre tst-boost \ bug-ga1 tst-vfork1 tst-vfork2 tst-vfork3 tst-waitid \ tst-getaddrinfo2 bug-glob1 bug-glob2 bug-glob3 tst-sysconf \ tst-execvp1 tst-execvp2 tst-execlp1 tst-execlp2 \ diff --git a/posix/tst-nanosleep-signal.c b/posix/tst-nanosleep-signal.c new file mode 100644 index 0000000..e6abab4 --- /dev/null +++ b/posix/tst-nanosleep-signal.c @@ -0,0 +1,105 @@ +/* Check if nanosleep handles correctly SIGCHLD. + Copyright (C) 2015 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <http://www.gnu.org/licenses/>. */ + +#include <stdio.h> +#include <stdlib.h> +#include <unistd.h> +#include <time.h> +#include <signal.h> +#include <sys/time.h> + +static int errors = 0; + +static void +dummy (int sig) +{ +} + +/* Fork and terminate the child in 0.25s while parent waits on a nanosleep + for 0.5s. */ +static int +check_nanosleep_base (void) +{ + int ret; + pid_t f = fork (); + if (f == 0) + { + // child + struct timespec tv = { .tv_sec = 0, .tv_nsec = 250000000UL }; + nanosleep (&tv, &tv); + exit (0); + } + else + { + // parent + struct timespec tv = { .tv_sec = 0, .tv_nsec = 500000000UL }; + ret = nanosleep (&tv, &tv); + } + return ret; +} + +void +check_nanosleep_sigdfl(void) +{ + /* For SIG_DFL nanosleep should not be interrupted by SIGCHLD. */ + signal(SIGCHLD, SIG_DFL); + if (check_nanosleep_base ()) + { + puts ("error: nanosleep was interrupted with SIG_DFL"); + errors = 1; + } +} + +void +check_nanosleep_dummy(void) +{ + /* With a dummy handler nanosleep should be interrupted. */ + signal(SIGCHLD, dummy); + int ret = check_nanosleep_base (); + if (ret == 0) + { + puts ("error: nanosleep was not interrupted with dummy handler"); + errors = 1; + } +} + +void +check_nanosleep_sigign(void) +{ + /* For SIG_IGN nanosleep should not be interrupted by SIGCHLD. */ + signal(SIGCHLD, SIG_IGN); + if (check_nanosleep_base ()) + { + puts ("error: nanosleep was interrupted with SIG_IGN"); + errors = 1; + } +} + + +static int +do_test (void) +{ + check_nanosleep_sigdfl (); + check_nanosleep_dummy (); + check_nanosleep_sigign (); + + return errors; +} + +#define TEST_FUNCTION do_test () +#include "../test-skeleton.c" diff --git a/sysdeps/unix/sysv/linux/sleep.c b/sysdeps/unix/sysv/linux/sleep.c index 2a06d5a..3885a34 100644 --- a/sysdeps/unix/sysv/linux/sleep.c +++ b/sysdeps/unix/sysv/linux/sleep.c @@ -17,133 +17,16 @@ License along with the GNU C Library; if not, see <http://www.gnu.org/licenses/>. */ -#include <errno.h> #include <time.h> -#include <signal.h> -#include <string.h> /* For the real memset prototype. */ #include <unistd.h> -#include <sys/param.h> -#include <nptl/pthreadP.h> +#include <sysdep-cancel.h> - -#if 0 -static void -cl (void *arg) -{ - (void) __sigprocmask (SIG_SETMASK, arg, (sigset_t *) NULL); -} -#endif - - -/* We are going to use the `nanosleep' syscall of the kernel. But the - kernel does not implement the stupid SysV SIGCHLD vs. SIG_IGN - behaviour for this syscall. Therefore we have to emulate it here. */ unsigned int __sleep (unsigned int seconds) { - const unsigned int max - = (unsigned int) (((unsigned long int) (~((time_t) 0))) >> 1); - struct timespec ts; - sigset_t set, oset; - unsigned int result; - - /* This is not necessary but some buggy programs depend on this. */ - if (__glibc_unlikely (seconds == 0)) - { -#ifdef CANCELLATION_P - CANCELLATION_P (THREAD_SELF); -#endif - return 0; - } - - ts.tv_sec = 0; - ts.tv_nsec = 0; - again: - if (sizeof (ts.tv_sec) <= sizeof (seconds)) - { - /* Since SECONDS is unsigned assigning the value to .tv_sec can - overflow it. In this case we have to wait in steps. */ - ts.tv_sec += MIN (seconds, max); - seconds -= (unsigned int) ts.tv_sec; - } - else - { - ts.tv_sec = (time_t) seconds; - seconds = 0; - } - - /* Linux will wake up the system call, nanosleep, when SIGCHLD - arrives even if SIGCHLD is ignored. We have to deal with it - in libc. We block SIGCHLD first. */ - __sigemptyset (&set); - __sigaddset (&set, SIGCHLD); - if (__sigprocmask (SIG_BLOCK, &set, &oset)) - return -1; - - /* If SIGCHLD is already blocked, we don't have to do anything. */ - if (!__sigismember (&oset, SIGCHLD)) - { - int saved_errno; - struct sigaction oact; - - __sigemptyset (&set); - __sigaddset (&set, SIGCHLD); - - /* We get the signal handler for SIGCHLD. */ - if (__sigaction (SIGCHLD, (struct sigaction *) NULL, &oact) < 0) - { - saved_errno = errno; - /* Restore the original signal mask. */ - (void) __sigprocmask (SIG_SETMASK, &oset, (sigset_t *) NULL); - __set_errno (saved_errno); - return -1; - } - - /* Note the sleep() is a cancellation point. But since we call - nanosleep() which itself is a cancellation point we do not - have to do anything here. */ - if (oact.sa_handler == SIG_IGN) - { - //__libc_cleanup_push (cl, &oset); - - /* We should leave SIGCHLD blocked. */ - while (1) - { - result = __nanosleep (&ts, &ts); - - if (result != 0 || seconds == 0) - break; - - if (sizeof (ts.tv_sec) <= sizeof (seconds)) - { - ts.tv_sec = MIN (seconds, max); - seconds -= (unsigned int) ts.tv_nsec; - } - } - - //__libc_cleanup_pop (0); - - saved_errno = errno; - /* Restore the original signal mask. */ - (void) __sigprocmask (SIG_SETMASK, &oset, (sigset_t *) NULL); - __set_errno (saved_errno); - - goto out; - } - - /* We should unblock SIGCHLD. Restore the original signal mask. */ - (void) __sigprocmask (SIG_SETMASK, &oset, (sigset_t *) NULL); - } - - result = __nanosleep (&ts, &ts); - if (result == 0 && seconds != 0) - goto again; - - out: - if (result != 0) - /* Round remaining time. */ - result = seconds + (unsigned int) ts.tv_sec + (ts.tv_nsec >= 500000000L); - - return result; + struct timespec ts = { .tv_sec = seconds, .tv_nsec = 0 }; + if (__nanosleep (&ts, &ts)) + return ts.tv_sec; + return 0; } weak_alias (__sleep, sleep)