Message ID | 535A078F.3050003@huawei.com |
---|---|
State | New |
Headers | show |
On 25 April 2014 07:58, Yang Yingliang <yangyingliang@huawei.com> wrote: > Hi, > > > I have 22 threads wait in pthread_cond_timedwait. When they are all woke up, I found > there are more than one threads can access shared data in pthread_cond_timedwait. > > I added print messages as follow code: > > --- libc/nptl/pthread_cond_timedwait.c > +++ libc/nptl/pthread_cond_timedwait.c > @@ -34,6 +34,7 @@ > #else > # include <bits/libc-vdso.h> > #endif > +#include <stdio.h> > > /* Cleanup handler, defined in pthread_cond_wait.c. */ > extern void __condvar_cleanup (void *arg) > @@ -235,7 +239,9 @@ > > bc_out: > > +printf("start do sub :%d, lock:%d %p\n", cond->__data.__nwaiters, cond->__data.__lock, pthread_self()); > cond->__data.__nwaiters -= 1 << COND_NWAITERS_SHIFT; > +printf("end do sub :%d, lock:%d %p\n", cond->__data.__nwaiters, cond->__data.__lock, pthread_self()); > > /* If pthread_cond_destroy was called on this variable already, > notify the pthread_cond_destroy caller all waiters have left > > > I tested on Linux arma15el 3.10.37+ #2 SMP Fri Apr 25 11:23:25 CST 2014 armv7l GNU/Linux. > Here is the result: > > start do sub :45, lock:1 0xb6d9a460 > end do sub :43, lock:1 0xb6d9a460 > start do sub :43, lock:1 0xb6d9e460 > end do sub :41, lock:2 0xb6d9e460 > start do sub :43, lock:2 0xb6dbe460 //two threads both access the shared data > start do sub :41, lock:1 0xb6daa460 > end do sub :39, lock:2 0xb6daa460 > start do sub :39, lock:2 0xb6de6460 > end do sub :37, lock:2 0xb6de6460 > start do sub :37, lock:2 0xb6db6460 > end do sub :35, lock:2 0xb6db6460 > start do sub :35, lock:2 0xb6dc2460 > end do sub :33, lock:2 0xb6dc2460 > end do sub :37, lock:2 0xb6dbe460 > start do sub :33, lock:2 0xb6dc6460 > end do sub :31, lock:0 0xb6dc6460 > start do sub :31, lock:2 0xb6dae460 > end do sub :29, lock:2 0xb6dae460 > start do sub :29, lock:2 0xb6db2460 > end do sub :27, lock:2 0xb6db2460 > start do sub :27, lock:2 0xb6dba460 > end do sub :25, lock:2 0xb6dba460 > start do sub :25, lock:2 0xb6da2460 > end do sub :23, lock:2 0xb6da2460 > > Is lll_lock (cond->__data.__lock, pshared) failed? > > pshared is LLL_SHARED. I have had a quick look at this and there is no obvious reason I can see for this behaviour, unless there is some way that IO buffering could cause the messages to be strangely interleaved. The other alternative that may be worth investigating is whether or not ldrex/strex is working correctly in your SMP system.
On 2014/4/25 17:43, Will Newton wrote: > On 25 April 2014 07:58, Yang Yingliang <yangyingliang@huawei.com> wrote: >> Hi, >> >> >> I have 22 threads wait in pthread_cond_timedwait. When they are all woke up, I found >> there are more than one threads can access shared data in pthread_cond_timedwait. >> >> I added print messages as follow code: >> >> --- libc/nptl/pthread_cond_timedwait.c >> +++ libc/nptl/pthread_cond_timedwait.c >> @@ -34,6 +34,7 @@ >> #else >> # include <bits/libc-vdso.h> >> #endif >> +#include <stdio.h> >> >> /* Cleanup handler, defined in pthread_cond_wait.c. */ >> extern void __condvar_cleanup (void *arg) >> @@ -235,7 +239,9 @@ >> >> bc_out: >> >> +printf("start do sub :%d, lock:%d %p\n", cond->__data.__nwaiters, cond->__data.__lock, pthread_self()); >> cond->__data.__nwaiters -= 1 << COND_NWAITERS_SHIFT; >> +printf("end do sub :%d, lock:%d %p\n", cond->__data.__nwaiters, cond->__data.__lock, pthread_self()); >> >> /* If pthread_cond_destroy was called on this variable already, >> notify the pthread_cond_destroy caller all waiters have left >> >> >> I tested on Linux arma15el 3.10.37+ #2 SMP Fri Apr 25 11:23:25 CST 2014 armv7l GNU/Linux. >> Here is the result: >> >> start do sub :45, lock:1 0xb6d9a460 >> end do sub :43, lock:1 0xb6d9a460 >> start do sub :43, lock:1 0xb6d9e460 >> end do sub :41, lock:2 0xb6d9e460 >> start do sub :43, lock:2 0xb6dbe460 //two threads both access the shared data >> start do sub :41, lock:1 0xb6daa460 >> end do sub :39, lock:2 0xb6daa460 >> start do sub :39, lock:2 0xb6de6460 >> end do sub :37, lock:2 0xb6de6460 >> start do sub :37, lock:2 0xb6db6460 >> end do sub :35, lock:2 0xb6db6460 >> start do sub :35, lock:2 0xb6dc2460 >> end do sub :33, lock:2 0xb6dc2460 >> end do sub :37, lock:2 0xb6dbe460 >> start do sub :33, lock:2 0xb6dc6460 >> end do sub :31, lock:0 0xb6dc6460 >> start do sub :31, lock:2 0xb6dae460 >> end do sub :29, lock:2 0xb6dae460 >> start do sub :29, lock:2 0xb6db2460 >> end do sub :27, lock:2 0xb6db2460 >> start do sub :27, lock:2 0xb6dba460 >> end do sub :25, lock:2 0xb6dba460 >> start do sub :25, lock:2 0xb6da2460 >> end do sub :23, lock:2 0xb6da2460 >> >> Is lll_lock (cond->__data.__lock, pshared) failed? >> >> pshared is LLL_SHARED. > > I have had a quick look at this and there is no obvious reason I can > see for this behaviour, unless there is some way that IO buffering > could cause the messages to be strangely interleaved. The other > alternative that may be worth investigating is whether or not > ldrex/strex is working correctly in your SMP system. > After doing some investigation, it looks like atomic_compare_and_exchange_val_acq not doing atomic. So two threads can both acquire lock when futex is 0. Is there something wrong in atomic_compare_and_exchange_val_acq ? #define __lll_lock(futex, private) \ ((void) ({ \ int *__futex = (futex); \ if (__builtin_expect (atomic_compare_and_exchange_val_acq (__futex, \ 1, 0), 0)) \ { \ if (__builtin_constant_p (private) && (private) == LLL_PRIVATE) \ __lll_lock_wait_private (__futex); \ else \ __lll_lock_wait (__futex, private); \ } \ }))
On 2014/4/26 14:45, Yang Yingliang wrote: > On 2014/4/25 17:43, Will Newton wrote: >> On 25 April 2014 07:58, Yang Yingliang <yangyingliang@huawei.com> wrote: >>> Hi, >>> >>> >>> I have 22 threads wait in pthread_cond_timedwait. When they are all woke up, I found >>> there are more than one threads can access shared data in pthread_cond_timedwait. >>> >>> I added print messages as follow code: >>> >>> --- libc/nptl/pthread_cond_timedwait.c >>> +++ libc/nptl/pthread_cond_timedwait.c >>> @@ -34,6 +34,7 @@ >>> #else >>> # include <bits/libc-vdso.h> >>> #endif >>> +#include <stdio.h> >>> >>> /* Cleanup handler, defined in pthread_cond_wait.c. */ >>> extern void __condvar_cleanup (void *arg) >>> @@ -235,7 +239,9 @@ >>> >>> bc_out: >>> >>> +printf("start do sub :%d, lock:%d %p\n", cond->__data.__nwaiters, cond->__data.__lock, pthread_self()); >>> cond->__data.__nwaiters -= 1 << COND_NWAITERS_SHIFT; >>> +printf("end do sub :%d, lock:%d %p\n", cond->__data.__nwaiters, cond->__data.__lock, pthread_self()); >>> >>> /* If pthread_cond_destroy was called on this variable already, >>> notify the pthread_cond_destroy caller all waiters have left >>> >>> >>> I tested on Linux arma15el 3.10.37+ #2 SMP Fri Apr 25 11:23:25 CST 2014 armv7l GNU/Linux. >>> Here is the result: >>> >>> start do sub :45, lock:1 0xb6d9a460 >>> end do sub :43, lock:1 0xb6d9a460 >>> start do sub :43, lock:1 0xb6d9e460 >>> end do sub :41, lock:2 0xb6d9e460 >>> start do sub :43, lock:2 0xb6dbe460 //two threads both access the shared data >>> start do sub :41, lock:1 0xb6daa460 >>> end do sub :39, lock:2 0xb6daa460 >>> start do sub :39, lock:2 0xb6de6460 >>> end do sub :37, lock:2 0xb6de6460 >>> start do sub :37, lock:2 0xb6db6460 >>> end do sub :35, lock:2 0xb6db6460 >>> start do sub :35, lock:2 0xb6dc2460 >>> end do sub :33, lock:2 0xb6dc2460 >>> end do sub :37, lock:2 0xb6dbe460 >>> start do sub :33, lock:2 0xb6dc6460 >>> end do sub :31, lock:0 0xb6dc6460 >>> start do sub :31, lock:2 0xb6dae460 >>> end do sub :29, lock:2 0xb6dae460 >>> start do sub :29, lock:2 0xb6db2460 >>> end do sub :27, lock:2 0xb6db2460 >>> start do sub :27, lock:2 0xb6dba460 >>> end do sub :25, lock:2 0xb6dba460 >>> start do sub :25, lock:2 0xb6da2460 >>> end do sub :23, lock:2 0xb6da2460 >>> >>> Is lll_lock (cond->__data.__lock, pshared) failed? >>> >>> pshared is LLL_SHARED. >> >> I have had a quick look at this and there is no obvious reason I can >> see for this behaviour, unless there is some way that IO buffering >> could cause the messages to be strangely interleaved. The other >> alternative that may be worth investigating is whether or not >> ldrex/strex is working correctly in your SMP system. >> > > After doing some investigation, it looks like atomic_compare_and_exchange_val_acq > not doing atomic. So two threads can both acquire lock when futex is 0. Is there > something wrong in atomic_compare_and_exchange_val_acq ? > > #define __lll_lock(futex, private) \ > ((void) ({ \ > int *__futex = (futex); \ > if (__builtin_expect (atomic_compare_and_exchange_val_acq (__futex, \ > 1, 0), 0)) \ > { \ > if (__builtin_constant_p (private) && (private) == LLL_PRIVATE) \ > __lll_lock_wait_private (__futex); \ > else \ > __lll_lock_wait (__futex, private); \ > } \ > })) > I noticed that atomic_compare_and_exchange_val_acq are not atomic for ARM in glibc-2.18. I will try glibc-2.19.
On 26 April 2014 10:48, Yang Yingliang <yangyingliang@huawei.com> wrote: > On 2014/4/26 14:45, Yang Yingliang wrote: >> On 2014/4/25 17:43, Will Newton wrote: >>> On 25 April 2014 07:58, Yang Yingliang <yangyingliang@huawei.com> wrote: >>>> Hi, >>>> >>>> >>>> I have 22 threads wait in pthread_cond_timedwait. When they are all woke up, I found >>>> there are more than one threads can access shared data in pthread_cond_timedwait. >>>> >>>> I added print messages as follow code: >>>> >>>> --- libc/nptl/pthread_cond_timedwait.c >>>> +++ libc/nptl/pthread_cond_timedwait.c >>>> @@ -34,6 +34,7 @@ >>>> #else >>>> # include <bits/libc-vdso.h> >>>> #endif >>>> +#include <stdio.h> >>>> >>>> /* Cleanup handler, defined in pthread_cond_wait.c. */ >>>> extern void __condvar_cleanup (void *arg) >>>> @@ -235,7 +239,9 @@ >>>> >>>> bc_out: >>>> >>>> +printf("start do sub :%d, lock:%d %p\n", cond->__data.__nwaiters, cond->__data.__lock, pthread_self()); >>>> cond->__data.__nwaiters -= 1 << COND_NWAITERS_SHIFT; >>>> +printf("end do sub :%d, lock:%d %p\n", cond->__data.__nwaiters, cond->__data.__lock, pthread_self()); >>>> >>>> /* If pthread_cond_destroy was called on this variable already, >>>> notify the pthread_cond_destroy caller all waiters have left >>>> >>>> >>>> I tested on Linux arma15el 3.10.37+ #2 SMP Fri Apr 25 11:23:25 CST 2014 armv7l GNU/Linux. >>>> Here is the result: >>>> >>>> start do sub :45, lock:1 0xb6d9a460 >>>> end do sub :43, lock:1 0xb6d9a460 >>>> start do sub :43, lock:1 0xb6d9e460 >>>> end do sub :41, lock:2 0xb6d9e460 >>>> start do sub :43, lock:2 0xb6dbe460 //two threads both access the shared data >>>> start do sub :41, lock:1 0xb6daa460 >>>> end do sub :39, lock:2 0xb6daa460 >>>> start do sub :39, lock:2 0xb6de6460 >>>> end do sub :37, lock:2 0xb6de6460 >>>> start do sub :37, lock:2 0xb6db6460 >>>> end do sub :35, lock:2 0xb6db6460 >>>> start do sub :35, lock:2 0xb6dc2460 >>>> end do sub :33, lock:2 0xb6dc2460 >>>> end do sub :37, lock:2 0xb6dbe460 >>>> start do sub :33, lock:2 0xb6dc6460 >>>> end do sub :31, lock:0 0xb6dc6460 >>>> start do sub :31, lock:2 0xb6dae460 >>>> end do sub :29, lock:2 0xb6dae460 >>>> start do sub :29, lock:2 0xb6db2460 >>>> end do sub :27, lock:2 0xb6db2460 >>>> start do sub :27, lock:2 0xb6dba460 >>>> end do sub :25, lock:2 0xb6dba460 >>>> start do sub :25, lock:2 0xb6da2460 >>>> end do sub :23, lock:2 0xb6da2460 >>>> >>>> Is lll_lock (cond->__data.__lock, pshared) failed? >>>> >>>> pshared is LLL_SHARED. >>> >>> I have had a quick look at this and there is no obvious reason I can >>> see for this behaviour, unless there is some way that IO buffering >>> could cause the messages to be strangely interleaved. The other >>> alternative that may be worth investigating is whether or not >>> ldrex/strex is working correctly in your SMP system. >>> >> >> After doing some investigation, it looks like atomic_compare_and_exchange_val_acq >> not doing atomic. So two threads can both acquire lock when futex is 0. Is there >> something wrong in atomic_compare_and_exchange_val_acq ? >> >> #define __lll_lock(futex, private) \ >> ((void) ({ \ >> int *__futex = (futex); \ >> if (__builtin_expect (atomic_compare_and_exchange_val_acq (__futex, \ >> 1, 0), 0)) \ >> { \ >> if (__builtin_constant_p (private) && (private) == LLL_PRIVATE) \ >> __lll_lock_wait_private (__futex); \ >> else \ >> __lll_lock_wait (__futex, private); \ >> } \ >> })) >> > > I noticed that atomic_compare_and_exchange_val_acq are not atomic for ARM in glibc-2.18. > I will try glibc-2.19. Which version of gcc are you using? Does the disassembly of the code look reasonably correct i.e. ldrex/strex?
On Sat, 2014-04-26 at 17:48 +0800, Yang Yingliang wrote: > On 2014/4/26 14:45, Yang Yingliang wrote: > > On 2014/4/25 17:43, Will Newton wrote: > >> On 25 April 2014 07:58, Yang Yingliang <yangyingliang@huawei.com> wrote: > >>> Hi, > >>> > >>> > >>> I have 22 threads wait in pthread_cond_timedwait. When they are all woke up, I found > >>> there are more than one threads can access shared data in pthread_cond_timedwait. > >>> > >>> I added print messages as follow code: > >>> > >>> --- libc/nptl/pthread_cond_timedwait.c > >>> +++ libc/nptl/pthread_cond_timedwait.c > >>> @@ -34,6 +34,7 @@ > >>> #else > >>> # include <bits/libc-vdso.h> > >>> #endif > >>> +#include <stdio.h> > >>> > >>> /* Cleanup handler, defined in pthread_cond_wait.c. */ > >>> extern void __condvar_cleanup (void *arg) > >>> @@ -235,7 +239,9 @@ > >>> > >>> bc_out: > >>> > >>> +printf("start do sub :%d, lock:%d %p\n", cond->__data.__nwaiters, cond->__data.__lock, pthread_self()); > >>> cond->__data.__nwaiters -= 1 << COND_NWAITERS_SHIFT; > >>> +printf("end do sub :%d, lock:%d %p\n", cond->__data.__nwaiters, cond->__data.__lock, pthread_self()); > >>> > >>> /* If pthread_cond_destroy was called on this variable already, > >>> notify the pthread_cond_destroy caller all waiters have left > >>> > >>> > >>> I tested on Linux arma15el 3.10.37+ #2 SMP Fri Apr 25 11:23:25 CST 2014 armv7l GNU/Linux. > >>> Here is the result: > >>> > >>> start do sub :45, lock:1 0xb6d9a460 > >>> end do sub :43, lock:1 0xb6d9a460 > >>> start do sub :43, lock:1 0xb6d9e460 > >>> end do sub :41, lock:2 0xb6d9e460 > >>> start do sub :43, lock:2 0xb6dbe460 //two threads both access the shared data > >>> start do sub :41, lock:1 0xb6daa460 > >>> end do sub :39, lock:2 0xb6daa460 > >>> start do sub :39, lock:2 0xb6de6460 > >>> end do sub :37, lock:2 0xb6de6460 > >>> start do sub :37, lock:2 0xb6db6460 > >>> end do sub :35, lock:2 0xb6db6460 > >>> start do sub :35, lock:2 0xb6dc2460 > >>> end do sub :33, lock:2 0xb6dc2460 > >>> end do sub :37, lock:2 0xb6dbe460 > >>> start do sub :33, lock:2 0xb6dc6460 > >>> end do sub :31, lock:0 0xb6dc6460 > >>> start do sub :31, lock:2 0xb6dae460 > >>> end do sub :29, lock:2 0xb6dae460 > >>> start do sub :29, lock:2 0xb6db2460 > >>> end do sub :27, lock:2 0xb6db2460 > >>> start do sub :27, lock:2 0xb6dba460 > >>> end do sub :25, lock:2 0xb6dba460 > >>> start do sub :25, lock:2 0xb6da2460 > >>> end do sub :23, lock:2 0xb6da2460 > >>> > >>> Is lll_lock (cond->__data.__lock, pshared) failed? > >>> > >>> pshared is LLL_SHARED. > >> > >> I have had a quick look at this and there is no obvious reason I can > >> see for this behaviour, unless there is some way that IO buffering > >> could cause the messages to be strangely interleaved. The other > >> alternative that may be worth investigating is whether or not > >> ldrex/strex is working correctly in your SMP system. > >> > > > > After doing some investigation, it looks like atomic_compare_and_exchange_val_acq > > not doing atomic. So two threads can both acquire lock when futex is 0. Is there > > something wrong in atomic_compare_and_exchange_val_acq ? > > > > #define __lll_lock(futex, private) \ > > ((void) ({ \ > > int *__futex = (futex); \ > > if (__builtin_expect (atomic_compare_and_exchange_val_acq (__futex, \ > > 1, 0), 0)) \ > > { \ > > if (__builtin_constant_p (private) && (private) == LLL_PRIVATE) \ > > __lll_lock_wait_private (__futex); \ > > else \ > > __lll_lock_wait (__futex, private); \ > > } \ > > })) > > > > I noticed that atomic_compare_and_exchange_val_acq are not atomic for ARM in glibc-2.18. > I will try glibc-2.19. Could you share the full test case for this, please? Did you just test whether some mutex can be acquired several times?
--- libc/nptl/pthread_cond_timedwait.c +++ libc/nptl/pthread_cond_timedwait.c @@ -34,6 +34,7 @@ #else # include <bits/libc-vdso.h> #endif +#include <stdio.h> /* Cleanup handler, defined in pthread_cond_wait.c. */ extern void __condvar_cleanup (void *arg) @@ -235,7 +239,9 @@ bc_out: +printf("start do sub :%d, lock:%d %p\n", cond->__data.__nwaiters, cond->__data.__lock, pthread_self()); cond->__data.__nwaiters -= 1 << COND_NWAITERS_SHIFT; +printf("end do sub :%d, lock:%d %p\n", cond->__data.__nwaiters, cond->__data.__lock, pthread_self()); /* If pthread_cond_destroy was called on this variable already, notify the pthread_cond_destroy caller all waiters have left