Message ID | 1444849395-18800-1-git-send-email-adhemerval.zanella@linaro.com |
---|---|
State | New |
Headers | show |
On Wed, Oct 14, 2015 at 04:03:15PM -0300, Adhemerval Zanella wrote: > The tst-cancel20 open two pipes and creates a thread which blocks > reading the first pipe. It then issues a signal to activate an > handler which also blocks reading the second pipe. Finally the > cancellation cleanup-up handlers are tested by first closing the > all the pipe ends and issuing a pthread_cancel. The tst-cancel21 > have a similar behavior, but use an extra fork after the test itself. > > The race condition occurs if the cancellation handling acts after the > pipe close: in this case read will return EOF (indicating side-effects) > and thus the cancellation must not act. However current GLIBC > cancellation behavior acts regardless the syscalls returns with > sid-effects. > > This patch adjust the test by moving the pipe closing after the > cancellation handling. This avoid spurious cancellation for the > case described. > > Checked on x86_64 and i386. I was involved in the discussion of this and believe that the fix is correct. The only reason the tests "worked" before was that cancellation was wrongly being acted upon after read succeeded in reading EOF. Note that, with this change, the tests will now timeout if read fails to act on cancellation, rather than exiting with a reportable error. This could be fixed with some very complicated machinery involving an additional signal handler and AS-safe synchronization mechanisms to control the ordering of close with respect to interruption of read, but as long as timeout is an acceptable way of detecting test failure, I see no reason to complicate the test logic like that. Rich
Ping. On 14-10-2015 16:58, Rich Felker wrote: > On Wed, Oct 14, 2015 at 04:03:15PM -0300, Adhemerval Zanella wrote: >> The tst-cancel20 open two pipes and creates a thread which blocks >> reading the first pipe. It then issues a signal to activate an >> handler which also blocks reading the second pipe. Finally the >> cancellation cleanup-up handlers are tested by first closing the >> all the pipe ends and issuing a pthread_cancel. The tst-cancel21 >> have a similar behavior, but use an extra fork after the test itself. >> >> The race condition occurs if the cancellation handling acts after the >> pipe close: in this case read will return EOF (indicating side-effects) >> and thus the cancellation must not act. However current GLIBC >> cancellation behavior acts regardless the syscalls returns with >> sid-effects. >> >> This patch adjust the test by moving the pipe closing after the >> cancellation handling. This avoid spurious cancellation for the >> case described. >> >> Checked on x86_64 and i386. > > I was involved in the discussion of this and believe that the fix is > correct. The only reason the tests "worked" before was that > cancellation was wrongly being acted upon after read succeeded in > reading EOF. > > Note that, with this change, the tests will now timeout if read fails > to act on cancellation, rather than exiting with a reportable error. > This could be fixed with some very complicated machinery involving an > additional signal handler and AS-safe synchronization mechanisms to > control the ordering of close with respect to interruption of read, > but as long as timeout is an acceptable way of detecting test failure, > I see no reason to complicate the test logic like that. > > Rich >
Ping. On 19-10-2015 17:01, Adhemerval Zanella wrote: > Ping. > > On 14-10-2015 16:58, Rich Felker wrote: >> On Wed, Oct 14, 2015 at 04:03:15PM -0300, Adhemerval Zanella wrote: >>> The tst-cancel20 open two pipes and creates a thread which blocks >>> reading the first pipe. It then issues a signal to activate an >>> handler which also blocks reading the second pipe. Finally the >>> cancellation cleanup-up handlers are tested by first closing the >>> all the pipe ends and issuing a pthread_cancel. The tst-cancel21 >>> have a similar behavior, but use an extra fork after the test itself. >>> >>> The race condition occurs if the cancellation handling acts after the >>> pipe close: in this case read will return EOF (indicating side-effects) >>> and thus the cancellation must not act. However current GLIBC >>> cancellation behavior acts regardless the syscalls returns with >>> sid-effects. >>> >>> This patch adjust the test by moving the pipe closing after the >>> cancellation handling. This avoid spurious cancellation for the >>> case described. >>> >>> Checked on x86_64 and i386. >> >> I was involved in the discussion of this and believe that the fix is >> correct. The only reason the tests "worked" before was that >> cancellation was wrongly being acted upon after read succeeded in >> reading EOF. >> >> Note that, with this change, the tests will now timeout if read fails >> to act on cancellation, rather than exiting with a reportable error. >> This could be fixed with some very complicated machinery involving an >> additional signal handler and AS-safe synchronization mechanisms to >> control the ordering of close with respect to interruption of read, >> but as long as timeout is an acceptable way of detecting test failure, >> I see no reason to complicate the test logic like that. >> >> Rich >>
This looks ok to me. Failing with a timeout seems better than incorrectly reporting a pass. Though, a 40 second timeout seems a bit high to me. Tested on ppc64. Paul On 10/29/2015 12:28 PM, Adhemerval Zanella wrote: > Ping. > > On 19-10-2015 17:01, Adhemerval Zanella wrote: >> Ping. >> >> On 14-10-2015 16:58, Rich Felker wrote: >>> On Wed, Oct 14, 2015 at 04:03:15PM -0300, Adhemerval Zanella wrote: >>>> The tst-cancel20 open two pipes and creates a thread which blocks >>>> reading the first pipe. It then issues a signal to activate an >>>> handler which also blocks reading the second pipe. Finally the >>>> cancellation cleanup-up handlers are tested by first closing the >>>> all the pipe ends and issuing a pthread_cancel. The tst-cancel21 >>>> have a similar behavior, but use an extra fork after the test itself. >>>> >>>> The race condition occurs if the cancellation handling acts after the >>>> pipe close: in this case read will return EOF (indicating side-effects) >>>> and thus the cancellation must not act. However current GLIBC >>>> cancellation behavior acts regardless the syscalls returns with >>>> sid-effects. >>>> >>>> This patch adjust the test by moving the pipe closing after the >>>> cancellation handling. This avoid spurious cancellation for the >>>> case described. >>>> >>>> Checked on x86_64 and i386. >>> >>> I was involved in the discussion of this and believe that the fix is >>> correct. The only reason the tests "worked" before was that >>> cancellation was wrongly being acted upon after read succeeded in >>> reading EOF. >>> >>> Note that, with this change, the tests will now timeout if read fails >>> to act on cancellation, rather than exiting with a reportable error. >>> This could be fixed with some very complicated machinery involving an >>> additional signal handler and AS-safe synchronization mechanisms to >>> control the ordering of close with respect to interruption of read, >>> but as long as timeout is an acceptable way of detecting test failure, >>> I see no reason to complicate the test logic like that. >>> >>> Rich >>> >
Thanks for checking this out. Anyone more have any more comments on that? On 29-10-2015 17:59, Paul E. Murphy wrote: > This looks ok to me. Failing with a timeout seems better than > incorrectly reporting a pass. Though, a 40 second timeout seems > a bit high to me. > > Tested on ppc64. > > Paul > > On 10/29/2015 12:28 PM, Adhemerval Zanella wrote: >> Ping. >> >> On 19-10-2015 17:01, Adhemerval Zanella wrote: >>> Ping. >>> >>> On 14-10-2015 16:58, Rich Felker wrote: >>>> On Wed, Oct 14, 2015 at 04:03:15PM -0300, Adhemerval Zanella wrote: >>>>> The tst-cancel20 open two pipes and creates a thread which blocks >>>>> reading the first pipe. It then issues a signal to activate an >>>>> handler which also blocks reading the second pipe. Finally the >>>>> cancellation cleanup-up handlers are tested by first closing the >>>>> all the pipe ends and issuing a pthread_cancel. The tst-cancel21 >>>>> have a similar behavior, but use an extra fork after the test itself. >>>>> >>>>> The race condition occurs if the cancellation handling acts after the >>>>> pipe close: in this case read will return EOF (indicating side-effects) >>>>> and thus the cancellation must not act. However current GLIBC >>>>> cancellation behavior acts regardless the syscalls returns with >>>>> sid-effects. >>>>> >>>>> This patch adjust the test by moving the pipe closing after the >>>>> cancellation handling. This avoid spurious cancellation for the >>>>> case described. >>>>> >>>>> Checked on x86_64 and i386. >>>> >>>> I was involved in the discussion of this and believe that the fix is >>>> correct. The only reason the tests "worked" before was that >>>> cancellation was wrongly being acted upon after read succeeded in >>>> reading EOF. >>>> >>>> Note that, with this change, the tests will now timeout if read fails >>>> to act on cancellation, rather than exiting with a reportable error. >>>> This could be fixed with some very complicated machinery involving an >>>> additional signal handler and AS-safe synchronization mechanisms to >>>> control the ordering of close with respect to interruption of read, >>>> but as long as timeout is an acceptable way of detecting test failure, >>>> I see no reason to complicate the test logic like that. >>>> >>>> Rich >>>> >> >
If noone opposes I will commit this shortly. On 09-11-2015 10:21, Adhemerval Zanella wrote: > Thanks for checking this out. Anyone more have any more comments on that? > > On 29-10-2015 17:59, Paul E. Murphy wrote: >> This looks ok to me. Failing with a timeout seems better than >> incorrectly reporting a pass. Though, a 40 second timeout seems >> a bit high to me. >> >> Tested on ppc64. >> >> Paul >> >> On 10/29/2015 12:28 PM, Adhemerval Zanella wrote: >>> Ping. >>> >>> On 19-10-2015 17:01, Adhemerval Zanella wrote: >>>> Ping. >>>> >>>> On 14-10-2015 16:58, Rich Felker wrote: >>>>> On Wed, Oct 14, 2015 at 04:03:15PM -0300, Adhemerval Zanella wrote: >>>>>> The tst-cancel20 open two pipes and creates a thread which blocks >>>>>> reading the first pipe. It then issues a signal to activate an >>>>>> handler which also blocks reading the second pipe. Finally the >>>>>> cancellation cleanup-up handlers are tested by first closing the >>>>>> all the pipe ends and issuing a pthread_cancel. The tst-cancel21 >>>>>> have a similar behavior, but use an extra fork after the test itself. >>>>>> >>>>>> The race condition occurs if the cancellation handling acts after the >>>>>> pipe close: in this case read will return EOF (indicating side-effects) >>>>>> and thus the cancellation must not act. However current GLIBC >>>>>> cancellation behavior acts regardless the syscalls returns with >>>>>> sid-effects. >>>>>> >>>>>> This patch adjust the test by moving the pipe closing after the >>>>>> cancellation handling. This avoid spurious cancellation for the >>>>>> case described. >>>>>> >>>>>> Checked on x86_64 and i386. >>>>> >>>>> I was involved in the discussion of this and believe that the fix is >>>>> correct. The only reason the tests "worked" before was that >>>>> cancellation was wrongly being acted upon after read succeeded in >>>>> reading EOF. >>>>> >>>>> Note that, with this change, the tests will now timeout if read fails >>>>> to act on cancellation, rather than exiting with a reportable error. >>>>> This could be fixed with some very complicated machinery involving an >>>>> additional signal handler and AS-safe synchronization mechanisms to >>>>> control the ordering of close with respect to interruption of read, >>>>> but as long as timeout is an acceptable way of detecting test failure, >>>>> I see no reason to complicate the test logic like that. >>>>> >>>>> Rich >>>>> >>> >>
On 10/14/2015 09:03 PM, Adhemerval Zanella wrote: > + /* The pipe closing must be issued after the cancellation handling to avoid > + a race condition where the cancellation runs after both pipe ends are > + closed. In this case the read syscall returns EOF and the cancellation > + must not act. */ I find this comment confusing. What is actually happening during a spurious failure? The thread which is supposed to be canceled sees a failed read from the pipe instead? In this case, the return value from the read should be checked in a more stringent fashion. Or is the read not supposed to return at all? Then why have two read calls instead of one? Florian
On 02-12-2015 17:00, Florian Weimer wrote: > On 10/14/2015 09:03 PM, Adhemerval Zanella wrote: >> + /* The pipe closing must be issued after the cancellation handling to avoid >> + a race condition where the cancellation runs after both pipe ends are >> + closed. In this case the read syscall returns EOF and the cancellation >> + must not act. */ > > I find this comment confusing. What is actually happening during a > spurious failure? The thread which is supposed to be canceled sees a > failed read from the pipe instead? For current GLIBC cancellation mechanism it does not matter if the read have side-effects or not, the cancellation will act regardless. However if we change the cancellation to *not* work if side-effects are returned from the syscall (which is the case for new cancellation mechanism I am currently working and the musl libc), then we must not cancel if the 'read' syscall returns EOF. This is the case of this testcase, where there is no synchronization between the read in the thread and close in the main thread. If the cancellation signal handler acts *after* close syscalls, the read on the other side of the pipe will return with side-effects (EOF). This does not fail for GLIBC, even with the racy condition. > > In this case, the return value from the read should be checked in a more > stringent fashion. Or is the read not supposed to return at all? Then > why have two read calls instead of one? My understanding of the testcase if read to act as a blocking point and to check if it is cancelled correctly. As I put before, if the syscall returns of not with side-effects, it does not matter to *GLIBC*. However, since we are moving to a more robust cancellation mechanism we must not cancel in this case. Thus why I moved the closed after the thread is finished, if the cancellation is not triggered, the testcase will still fail with a timeout. > > Florian >
diff --git a/nptl/tst-cancel20.c b/nptl/tst-cancel20.c index 51b558e..91452fb 100644 --- a/nptl/tst-cancel20.c +++ b/nptl/tst-cancel20.c @@ -145,12 +145,6 @@ do_one_test (void) return 1; } - /* This will cause the read in the child to return. */ - close (fd[0]); - close (fd[1]); - close (fd[2]); - close (fd[3]); - void *ret; if (pthread_join (th, &ret) != 0) { @@ -170,6 +164,15 @@ do_one_test (void) return 1; } + /* The pipe closing must be issued after the cancellation handling to avoid + a race condition where the cancellation runs after both pipe ends are + closed. In this case the read syscall returns EOF and the cancellation + must not act. */ + close (fd[0]); + close (fd[1]); + close (fd[2]); + close (fd[3]); + return 0; } diff --git a/nptl/tst-cancel21.c b/nptl/tst-cancel21.c index b54f236..d082776 100644 --- a/nptl/tst-cancel21.c +++ b/nptl/tst-cancel21.c @@ -123,12 +123,6 @@ tf (void *arg) exit (1); } - /* This will cause the read in the initial thread to return. */ - close (fd[0]); - close (fd[1]); - close (fd[2]); - close (fd[3]); - void *ret; if (pthread_join (th, &ret) != 0) { @@ -154,6 +148,15 @@ tf (void *arg) exit (1); } + /* The pipe closing must be issued after the cancellation handling to avoid + a race condition where the cancellation runs after both pipe ends are + closed. In this case the read syscall returns EOF and the cancellation + must not act. */ + close (fd[0]); + close (fd[1]); + close (fd[2]); + close (fd[3]); + exit (0); }