Message ID | 53F26769.20801@linux.vnet.ibm.com |
---|---|
State | New |
Headers | show |
On Mon, Aug 18, 2014 at 05:51:53PM -0300, Adhemerval Zanella wrote: > On 18-08-2014 16:53, Rich Felker wrote: > > A couple days ago I posted ideas for a fix for this issue on the bug > > tracker: > > > > https://sourceware.org/bugzilla/show_bug.cgi?id=13165#c38 > > > > Anybody who does glibc development/builds/testing up for trying my > > idea and seeing if it works? > > > > Rich > > > If I understood correctly, you are proposing something like: > > diff --git a/nptl/pthread_cond_wait.c b/nptl/pthread_cond_wait.c > index fc5eac4..a16c5d5 100644 > --- a/nptl/pthread_cond_wait.c > +++ b/nptl/pthread_cond_wait.c > @@ -118,14 +118,6 @@ __pthread_cond_wait (cond, mutex) > /* Make sure we are alone. */ > lll_lock (cond->__data.__lock, pshared); > > - /* Now we can release the mutex. */ > - err = __pthread_mutex_unlock_usercnt (mutex, 0); > - if (__glibc_unlikely (err)) > - { > - lll_unlock (cond->__data.__lock, pshared); > - return err; > - } > - > /* We have one new user of the condvar. */ > ++cond->__data.__total_seq; > ++cond->__data.__futex; > @@ -153,6 +145,14 @@ __pthread_cond_wait (cond, mutex) > /* Remember the broadcast counter. */ > cbuffer.bc_seq = cond->__data.__broadcast_seq; > > + /* Now we can release the mutex. */ > + err = __pthread_mutex_unlock_usercnt (mutex, 0); > + if (__glibc_unlikely (err)) > + { > + lll_unlock (cond->__data.__lock, pshared); > + return err; > + } > + > do > { > unsigned int futex_val = cond->__data.__futex; > > correct? Yes, that looks like what I had in mind. But on second thought I'm not sure it should be necessary; I missed the internal lock that's being taken before the mutex is unlocked. > I saw not NPTL issues in nether powerpc64 or x86_64. You mean you can't reproduce the issue on these targets? Maybe it's only present on targets that are using non-default versions of the code. If so, and if the offending target-specific versions have been removed, maybe the bug is fixed now? I haven't had a chance to try the test case on latest glibc, but the bug was present on x86 (32-bit) last time I checked. It would be nice if this bug has already been fixed without taking any specific action to do so... Rich
On Mon, 2014-08-18 at 17:27 -0400, Rich Felker wrote: > On Mon, Aug 18, 2014 at 05:51:53PM -0300, Adhemerval Zanella wrote: > > On 18-08-2014 16:53, Rich Felker wrote: > > > A couple days ago I posted ideas for a fix for this issue on the bug > > > tracker: > > > > > > https://sourceware.org/bugzilla/show_bug.cgi?id=13165#c38 > > > > > > Anybody who does glibc development/builds/testing up for trying my > > > idea and seeing if it works? > > > > > > Rich > > > > > If I understood correctly, you are proposing something like: > > > > diff --git a/nptl/pthread_cond_wait.c b/nptl/pthread_cond_wait.c > > index fc5eac4..a16c5d5 100644 > > --- a/nptl/pthread_cond_wait.c > > +++ b/nptl/pthread_cond_wait.c > > @@ -118,14 +118,6 @@ __pthread_cond_wait (cond, mutex) > > /* Make sure we are alone. */ > > lll_lock (cond->__data.__lock, pshared); > > > > - /* Now we can release the mutex. */ > > - err = __pthread_mutex_unlock_usercnt (mutex, 0); > > - if (__glibc_unlikely (err)) > > - { > > - lll_unlock (cond->__data.__lock, pshared); > > - return err; > > - } > > - > > /* We have one new user of the condvar. */ > > ++cond->__data.__total_seq; > > ++cond->__data.__futex; > > @@ -153,6 +145,14 @@ __pthread_cond_wait (cond, mutex) > > /* Remember the broadcast counter. */ > > cbuffer.bc_seq = cond->__data.__broadcast_seq; > > > > + /* Now we can release the mutex. */ > > + err = __pthread_mutex_unlock_usercnt (mutex, 0); > > + if (__glibc_unlikely (err)) > > + { > > + lll_unlock (cond->__data.__lock, pshared); > > + return err; > > + } > > + > > do > > { > > unsigned int futex_val = cond->__data.__futex; > > > > correct? > > Yes, that looks like what I had in mind. But on second thought I'm not > sure it should be necessary; I missed the internal lock that's being > taken before the mutex is unlocked. > > > I saw not NPTL issues in nether powerpc64 or x86_64. > > You mean you can't reproduce the issue on these targets? Maybe it's > only present on targets that are using non-default versions of the > code. If so, and if the offending target-specific versions have been > removed, maybe the bug is fixed now? I haven't had a chance to try the > test case on latest glibc, but the bug was present on x86 (32-bit) > last time I checked. > > It would be nice if this bug has already been fixed without taking any > specific action to do so... The bug is not fixed. It is an algorithmic issue -- though of course it may or may not trigger in a particular execution depending on the respective interleavings.
On 18-08-2014 19:18, Torvald Riegel wrote: > On Mon, 2014-08-18 at 17:27 -0400, Rich Felker wrote: >> On Mon, Aug 18, 2014 at 05:51:53PM -0300, Adhemerval Zanella wrote: >>> On 18-08-2014 16:53, Rich Felker wrote: >>>> A couple days ago I posted ideas for a fix for this issue on the bug >>>> tracker: >>>> >>>> https://sourceware.org/bugzilla/show_bug.cgi?id=13165#c38 >>>> >>>> Anybody who does glibc development/builds/testing up for trying my >>>> idea and seeing if it works? >>>> >>>> Rich >>>> >>> If I understood correctly, you are proposing something like: >>> >>> diff --git a/nptl/pthread_cond_wait.c b/nptl/pthread_cond_wait.c >>> index fc5eac4..a16c5d5 100644 >>> --- a/nptl/pthread_cond_wait.c >>> +++ b/nptl/pthread_cond_wait.c >>> @@ -118,14 +118,6 @@ __pthread_cond_wait (cond, mutex) >>> /* Make sure we are alone. */ >>> lll_lock (cond->__data.__lock, pshared); >>> >>> - /* Now we can release the mutex. */ >>> - err = __pthread_mutex_unlock_usercnt (mutex, 0); >>> - if (__glibc_unlikely (err)) >>> - { >>> - lll_unlock (cond->__data.__lock, pshared); >>> - return err; >>> - } >>> - >>> /* We have one new user of the condvar. */ >>> ++cond->__data.__total_seq; >>> ++cond->__data.__futex; >>> @@ -153,6 +145,14 @@ __pthread_cond_wait (cond, mutex) >>> /* Remember the broadcast counter. */ >>> cbuffer.bc_seq = cond->__data.__broadcast_seq; >>> >>> + /* Now we can release the mutex. */ >>> + err = __pthread_mutex_unlock_usercnt (mutex, 0); >>> + if (__glibc_unlikely (err)) >>> + { >>> + lll_unlock (cond->__data.__lock, pshared); >>> + return err; >>> + } >>> + >>> do >>> { >>> unsigned int futex_val = cond->__data.__futex; >>> >>> correct? >> Yes, that looks like what I had in mind. But on second thought I'm not >> sure it should be necessary; I missed the internal lock that's being >> taken before the mutex is unlocked. >> >>> I saw not NPTL issues in nether powerpc64 or x86_64. >> You mean you can't reproduce the issue on these targets? Maybe it's >> only present on targets that are using non-default versions of the >> code. If so, and if the offending target-specific versions have been >> removed, maybe the bug is fixed now? I haven't had a chance to try the >> test case on latest glibc, but the bug was present on x86 (32-bit) >> last time I checked. >> >> It would be nice if this bug has already been fixed without taking any >> specific action to do so... > The bug is not fixed. It is an algorithmic issue -- though of course it > may or may not trigger in a particular execution depending on the > respective interleavings. > I just noted I wasn't specific, by 'not issues' I meant NPTL testcase ran fine, however the testcase from bug stills fails.
diff --git a/nptl/pthread_cond_wait.c b/nptl/pthread_cond_wait.c index fc5eac4..a16c5d5 100644 --- a/nptl/pthread_cond_wait.c +++ b/nptl/pthread_cond_wait.c @@ -118,14 +118,6 @@ __pthread_cond_wait (cond, mutex) /* Make sure we are alone. */ lll_lock (cond->__data.__lock, pshared); - /* Now we can release the mutex. */ - err = __pthread_mutex_unlock_usercnt (mutex, 0); - if (__glibc_unlikely (err)) - { - lll_unlock (cond->__data.__lock, pshared); - return err; - } - /* We have one new user of the condvar. */ ++cond->__data.__total_seq; ++cond->__data.__futex; @@ -153,6 +145,14 @@ __pthread_cond_wait (cond, mutex) /* Remember the broadcast counter. */ cbuffer.bc_seq = cond->__data.__broadcast_seq; + /* Now we can release the mutex. */ + err = __pthread_mutex_unlock_usercnt (mutex, 0); + if (__glibc_unlikely (err)) + { + lll_unlock (cond->__data.__lock, pshared); + return err; + } + do { unsigned int futex_val = cond->__data.__futex;