Message ID | 534710B9.6010303@linux.vnet.ibm.com |
---|---|
State | New |
Headers | show |
On Thu, 2014-04-10 at 18:44 -0300, Adhemerval Zanella wrote: > On 10-04-2014 16:57, Torvald Riegel wrote: > > On Mon, 2014-04-07 at 17:17 -0300, Adhemerval Zanella wrote: > >> Digging up on current code and the unified one, I noted PowerPC one is currently doing load+and+store > >> condition followed by a full barrier. > > Hmm, the code I see is using __lll_acq_instr, which is an isync, which > > AFAICT is not a full barrier (hwsync). > >> This is overkill for some scenarios: if the initialization is > >> done (val & 2) we don't need a full barrier (isync), since the requirement (as Tovalds has explained) > >> is just a load acquire. > > The atomic_read_barrier used in the unified code seems to result in an > > lwsync. Is that faster than an isync? If not, the overhead may be due > > to something else. > > You are correct, bad wording from me. In fact the issue here ir, from some profiling, the load/store > with *reservation*, the isync is fact necessary in current case as acquire barrier. The atomic > read/store shows a better performance for POWER. And yes, lwsync is faster than isync (show below > by the experiment with load acquire/store release). Okay. If lwsync is preferable for an acquire load, it might be best to check the GCC atomics implementation because I believe it's following http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html, which suggests isync for acquire loads. > > > > >> I ran make check and results looks good. I also checked with GCC issues that originated the fix > >> that added the release barrier in the code (gcc.gnu.org/bugzilla/show_bug.cgi?id=52839#c10) and > >> it does not show the issue with new implementation. > > Thanks for doing the tests. Would you want to remove the powerpc > > implementation after this patch has been committed? > > Yes I planning to do it. Thanks. I've just committed the patch.
diff --git a/nptl/sysdeps/unix/sysv/linux/pthread_once.c b/nptl/sysdeps/unix/sysv/linux/pthread_once.c index 94e2427..a67ef67 100644 --- a/nptl/sysdeps/unix/sysv/linux/pthread_once.c +++ b/nptl/sysdeps/unix/sysv/linux/pthread_once.c @@ -63,8 +63,19 @@ __pthread_once (pthread_once_t *once_control, void (*init_routine) (void)) /* We need acquire memory order for this load because if the value signals that initialization has finished, we need to be see any data modifications done during initialization. */ +#if 0 val = *once_control; atomic_read_barrier(); +#endif + asm volatile ( + "ld %0,%1\n\t" + "cmpw 7,%0,%0\n\t" + "bne- 7,$+4\n\t" + "isync\n\t" + : "=r" (val) + : "m"(*once_control) + : "cr7", "memory"); + do { /* Check if the initialization has already been done. */ @@ -112,8 +123,14 @@ __pthread_once (pthread_once_t *once_control, void (*init_routine) (void)) /* Mark *once_control as having finished the initialization. We need release memory order here because we need to synchronize with other threads that want to use the initialized data. */ +#if 0 atomic_write_barrier(); *once_control = 2; +#endif + asm volatile ( + "lwsync\n\t" + "std %0,%1\n\t" + : : "r"(2), "m"(*once_control) : "memory"); /* Wake up all other threads. */ lll_futex_wake (once_control, INT_MAX, LLL_PRIVATE);