Message ID | 448cd43d-8f24-7d02-e6dc-b6fa8e0dc44e@redhat.com |
---|---|
State | New |
Headers | show |
Series | nptl: Fix join/tryjoin comments. | expand |
* Carlos O'Donell: > This commit fixes several inaccurate comments in the implementation of > __pthread_timedjoin_ex(), pthread_tryjoin_np, and lll_wait_tid, all of > which are used to implement POSIX thread joining. GNU style is not to use () after function names. > Firstly, in pthread_tryjoin_np() we only attempt the join if a read of > pd->tid == 0, because that means the thread has already been reaped by > the kernel and we can safely join it without blocking. > > Secondly, all join implementations call the common > __pthread_timedjoin_ex and only if abstime is NULL (block) do we > actually need to use cancellation (to cancel the potentially infinite > wait). I think the change regarding abstime == NULL is wrong. This is about pthread_join/pthread_timedjoin_np (blocking) on the one hand and pthread_tryjoin_np on the other (not blocking). Thanks, Florian
On 2/11/19 4:37 PM, Florian Weimer wrote: > * Carlos O'Donell: > >> This commit fixes several inaccurate comments in the implementation of >> __pthread_timedjoin_ex(), pthread_tryjoin_np, and lll_wait_tid, all of >> which are used to implement POSIX thread joining. > > GNU style is not to use () after function names. Commit message fixed (by removing ()). >> Firstly, in pthread_tryjoin_np() we only attempt the join if a read of >> pd->tid == 0, because that means the thread has already been reaped by >> the kernel and we can safely join it without blocking. >> >> Secondly, all join implementations call the common >> __pthread_timedjoin_ex and only if abstime is NULL (block) do we >> actually need to use cancellation (to cancel the potentially infinite >> wait). > > I think the change regarding abstime == NULL is wrong. This is about > pthread_join/pthread_timedjoin_np (blocking) on the one hand and > pthread_tryjoin_np on the other (not blocking). Why is the comment wrong? Or do you mean to say the comment could be made more accurate? The same core __pthread_timedjoin_ex is used for everything, so it handles 3 cases: * join * tryjoin * timedjoin In a join with abstime NULL we switch to asynchronous cancellation, and this is done behind the scenes by lll_wait_tid calling lll_futex_wait_cancel when abstime is NULL. In a tryjoin we only call __pthread_timedjoin_ex when __tid is 0, and we optimize away the lll_wait_tid by setting block to true. It doesn't change the fact that if we had called lll_wait_tid, we would not have blocked anyway, we would have stopped at the first atomic_load_acquire and never waited. Only in timedjoin, with abstime non-NULL, do we ever reach the call to lll_futex_wait_cancel. It's the only time we switch to asychronous ancellation because it's what we do around the *_cancel variants (until we have a better method for doing cancellation). Given this information: (a) For join, block == true, and abstime == NULL, so we do use async cancel for the futex wait. And the old comment was fine, but the new comment is clearer. The old comment also was for when we *unconditionally* called CANCEL_ASYNC() (which means "turn on async cancel"), but this code was removed in 2018 by ce7eb0e9031, making this not something we do unconditionally anymore. (b) For tryjoin, block == false, and abstime == NULL, in theory the comment in pthread_tryjoin should be deleted because block == false will mean we *never* call lll_wait_tid... 29 /* If pd->tid != 0 then lll_wait_tid will not block on futex 30 operation. */ ^^^^ With the block == false optimization we never call lll_wait_tid! But the comment is wrong, probably just a typo in 2017 by 4735850f7a4. 31 return __pthread_timedjoin_ex (threadid, thread_return, NULL, false); but I like the comment because it holds regardless of the optimization. (c) For timedjoin, block == true, and abstime != NULL, and the comment is wrong, there is no async cancel involved. We call lll_wait_tid and the abstime != NULL case calls __lll_timedwait_tid which is not a cancellation point. No cancellation for timedjoin. So in (c) we show that it's not about block vs non-block, it's about making only pthread_join a cancellation point. Is this a bug? Should __lll_timedwait_tid for very long timeouts have used cancellation? No. Since the timedjoin is a GNU addition it is ours to define as a cancellation point or not, and we choose not to. While join is mandated by POSIX to be a cancellation point and it is (the only one of the joins to be so).
* Carlos O'Donell: > Why is the comment wrong? Or do you mean to say the comment could be > made more accurate? Let's go back a step. My understanding is that pthread_join and pthread_timedjoin_np are both cancellation points, but pthread_tryjoin_np is not. Are you saying that pthread_timedjoin_np is not a cancellation point? I assume that would be a bug. Thanks, Florian
On 12/02/2019 07:55, Florian Weimer wrote: > * Carlos O'Donell: > >> Why is the comment wrong? Or do you mean to say the comment could be >> made more accurate? > > Let's go back a step. My understanding is that pthread_join and > pthread_timedjoin_np are both cancellation points, but > pthread_tryjoin_np is not. Are you saying that pthread_timedjoin_np is > not a cancellation point? > > I assume that would be a bug. This is my mistake done by ce7eb0e90315eb1939ac29f656d39b3db858c092, I refactored lll_wait_tid and wrongly assumed __lll_timedwait_tid is a cancellation point. The lll_wait_tid is only used on __pthread_timedjoin_ex and replicated on arch lowlevellock.h implementation, I think it would be simpler to just move it to nptl/pthread_join_common.c.
On 12/02/2019 09:39, Adhemerval Zanella wrote: > > > On 12/02/2019 07:55, Florian Weimer wrote: >> * Carlos O'Donell: >> >>> Why is the comment wrong? Or do you mean to say the comment could be >>> made more accurate? >> >> Let's go back a step. My understanding is that pthread_join and >> pthread_timedjoin_np are both cancellation points, but >> pthread_tryjoin_np is not. Are you saying that pthread_timedjoin_np is >> not a cancellation point? >> >> I assume that would be a bug. > > This is my mistake done by ce7eb0e90315eb1939ac29f656d39b3db858c092, > I refactored lll_wait_tid and wrongly assumed __lll_timedwait_tid is > a cancellation point. The lll_wait_tid is only used on > __pthread_timedjoin_ex and replicated on arch lowlevellock.h > implementation, I think it would be simpler to just move it to > nptl/pthread_join_common.c. > > I am working on fixing it.
* Adhemerval Zanella: > On 12/02/2019 09:39, Adhemerval Zanella wrote: >> >> >> On 12/02/2019 07:55, Florian Weimer wrote: >>> * Carlos O'Donell: >>> >>>> Why is the comment wrong? Or do you mean to say the comment could be >>>> made more accurate? >>> >>> Let's go back a step. My understanding is that pthread_join and >>> pthread_timedjoin_np are both cancellation points, but >>> pthread_tryjoin_np is not. Are you saying that pthread_timedjoin_np is >>> not a cancellation point? >>> >>> I assume that would be a bug. >> >> This is my mistake done by ce7eb0e90315eb1939ac29f656d39b3db858c092, >> I refactored lll_wait_tid and wrongly assumed __lll_timedwait_tid is >> a cancellation point. The lll_wait_tid is only used on >> __pthread_timedjoin_ex and replicated on arch lowlevellock.h >> implementation, I think it would be simpler to just move it to >> nptl/pthread_join_common.c. >> >> > > I am working on fixing it. Thanks. Meanwhile, I filed <https://sourceware.org/bugzilla/show_bug.cgi?id=24215> with an out-of-tree test case. Florian
diff --git a/ChangeLog b/ChangeLog index 6f1d967f62..37e7babf75 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,3 +1,10 @@ +2019-02-11 Carlos O'Donell <carlos@redhat.com> + + * nptl/lll_timedwait_tid.c: Use GNU-style comment. + * nptl/pthread_join_common.c: Fix comment. + * nptl/pthread_tryjoin.c: Likewise. + * sysdeps/nptl/lowlevellock.h: Likewise. + 2019-02-11 Paul A. Clarke <pc@us.ibm.com> * sysdeps/powerpc/fpu/e_sqrt.c (__slow_ieee754_sqrtf): diff --git a/nptl/lll_timedwait_tid.c b/nptl/lll_timedwait_tid.c index 59ecbb4446..8f22da9b05 100644 --- a/nptl/lll_timedwait_tid.c +++ b/nptl/lll_timedwait_tid.c @@ -60,8 +60,7 @@ __lll_timedwait_tid (int *tidp, const struct timespec *abstime) /* If *tidp == tid, wait until thread terminates or the wait times out. The kernel up to version 3.16.3 does not use the private futex - operations for futex wake-up when the clone terminates. - */ + operations for futex wake-up when the clone terminates. */ if (lll_futex_timed_wait (tidp, tid, &rt, LLL_SHARED) == -ETIMEDOUT) return ETIMEDOUT; } diff --git a/nptl/pthread_join_common.c b/nptl/pthread_join_common.c index ecb78ffba5..e99854ac39 100644 --- a/nptl/pthread_join_common.c +++ b/nptl/pthread_join_common.c @@ -76,8 +76,8 @@ __pthread_timedjoin_ex (pthread_t threadid, void **thread_return, if (block) { - /* During the wait we change to asynchronous cancellation. If we - are cancelled the thread we are waiting for must be marked as + /* If abstime is NULL we switch to asynchronous cancellation. If we + are cancelled then the thread we are waiting for must be marked as un-wait-ed for again. */ pthread_cleanup_push (cleanup, &pd->joinid); diff --git a/nptl/pthread_tryjoin.c b/nptl/pthread_tryjoin.c index aa4fe07af5..7dbf22868d 100644 --- a/nptl/pthread_tryjoin.c +++ b/nptl/pthread_tryjoin.c @@ -26,7 +26,7 @@ pthread_tryjoin_np (pthread_t threadid, void **thread_return) if (pd->tid != 0) return EBUSY; - /* If pd->tid != 0 then lll_wait_tid will not block on futex + /* If pd->tid == 0 then lll_wait_tid will not block on futex operation. */ return __pthread_timedjoin_ex (threadid, thread_return, NULL, false); } diff --git a/sysdeps/nptl/lowlevellock.h b/sysdeps/nptl/lowlevellock.h index 36bbc472e7..2581ed5ca9 100644 --- a/sysdeps/nptl/lowlevellock.h +++ b/sysdeps/nptl/lowlevellock.h @@ -185,7 +185,7 @@ extern int __lll_timedwait_tid (int *, const struct timespec *) operations for futex wake-up when the clone terminates. If ABSTIME is not NULL, is used a timeout for futex call. If the timeout occurs then return ETIMEOUT, if ABSTIME is invalid, return EINVAL. - The futex operation are issues with cancellable versions. */ + If abstime is NULL we use a cancellable futex wait operation. */ #define lll_wait_tid(tid, abstime) \ ({ \ int __res = 0; \
Florian, I noticed these comment mistakes while reviewing your patch for BZ #24211. OK for master? ~~~ This commit fixes several inaccurate comments in the implementation of __pthread_timedjoin_ex(), pthread_tryjoin_np, and lll_wait_tid, all of which are used to implement POSIX thread joining. Firstly, in pthread_tryjoin_np() we only attempt the join if a read of pd->tid == 0, because that means the thread has already been reaped by the kernel and we can safely join it without blocking. Secondly, all join implementations call the common __pthread_timedjoin_ex and only if abstime is NULL (block) do we actually need to use cancellation (to cancel the potentially infinite wait). Lastly, in lll_wait_tid we only call the canellation futex operations if abstime is non-NULL. With the comments corrected the implementation notes are consistent and logical. Signed-off-by: Carlos O'Donell <carlos@redhat.com> --- ChangeLog | 7 +++++++ nptl/lll_timedwait_tid.c | 3 +-- nptl/pthread_join_common.c | 4 ++-- nptl/pthread_tryjoin.c | 2 +- sysdeps/nptl/lowlevellock.h | 2 +- 5 files changed, 12 insertions(+), 6 deletions(-)