Message ID: 55BBD3E9.8040005@linaro.org
State: New
Adhemerval Zanella <adhemerval.zanella@linaro.org> writes:

> On 31-07-2015 11:57, Tulio Magno Quites Machado Filho wrote:
>> Let's say t1 is the first one to read rwlock->__data fields and they're all 0.
>> t1 will set is_lock_free to 0.
>
> I believe you mean is_lock_free will be set to '1' if all fields are '0' (since
> the lock will not be held).

Yes!  I'm sorry.

> I see now a window where the pthread_rwlock_t internal structures might have
> concurrent access and the transaction is never finished.  However, I do not
> think this is powerpc-specific, and thus I think it should be fixed in the
> generic algorithm instead.  Something like this:
>
> ...
>
> I did not fix the x86_64 code, which would require some adjustments.  I checked
> on ppc64le and make nptl/tests passes.

I didn't test your patch, but after reading it, I'm sure it does fix the
problem too.

> It would be good if you could create a
> testcase to stress this issue (and I do see this could be quite hard, since it
> is very timing-dependent).

Yes.
I tried, but I didn't like the idea of creating a testcase that would only
produce intermittent failures, especially because this failure is very
difficult to reproduce.
Any suggestions?

> The problem with this approach is that it couples the elide macros with the
> rdlock fields, and I think the idea would be to make this header more generic
> for multiple elide algorithms.  Suggestions?

And that's why I preferred to modify only the powerpc implementation.
Both proposals are identical in the sense that they move the memory access
into the transaction.

I'd appreciate it if you could explain why you prefer to change the argument
list of ELIDE_LOCK instead of the powerpc-specific code.
On 31-07-2015 18:44, Tulio Magno Quites Machado Filho wrote:
> Adhemerval Zanella <adhemerval.zanella@linaro.org> writes:
>
>> On 31-07-2015 11:57, Tulio Magno Quites Machado Filho wrote:
>>> Let's say t1 is the first one to read rwlock->__data fields and they're all 0.
>>> t1 will set is_lock_free to 0.
>>
>> I believe you mean is_lock_free will be set to '1' if all fields are '0' (since
>> the lock will not be held).
>
> Yes!  I'm sorry.
>
>> I see now a window where the pthread_rwlock_t internal structures might have
>> concurrent access and the transaction is never finished.  However, I do not
>> think this is powerpc-specific, and thus I think it should be fixed in the
>> generic algorithm instead.  Something like this:
>>
>> ...
>>
>> I did not fix the x86_64 code, which would require some adjustments.  I checked
>> on ppc64le and make nptl/tests passes.
>
> I didn't test your patch, but after reading it, I'm sure it does fix the
> problem too.
>
>> It would be good if you could create a
>> testcase to stress this issue (and I do see this could be quite hard, since it
>> is very timing-dependent).
>
> Yes.
> I tried, but I didn't like the idea of creating a testcase that would only
> produce intermittent failures, especially because this failure is very
> difficult to reproduce.
> Any suggestions?
>
>> The problem with this approach is that it couples the elide macros with the
>> rdlock fields, and I think the idea would be to make this header more generic
>> for multiple elide algorithms.  Suggestions?
>
> And that's why I preferred to modify only the powerpc implementation.
> Both proposals are identical in the sense that they move the memory access
> into the transaction.
>
> I'd appreciate it if you could explain why you prefer to change the argument
> list of ELIDE_LOCK instead of the powerpc-specific code.

Mainly because I don't see this issue as powerpc-specific: I think we should
also fix it for x86 (which currently uses the same mechanism) and for future
arches that might also implement lock elision using this mechanism.
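To make the window concrete, here is a condensed sketch of the two shapes
under discussion.  It is not the actual glibc source: the function names and
the zero abort code are illustrative, and it builds only with GCC's -mhtm on
a POWER8-or-later target.

    #include <stdbool.h>

    /* Pre-patch shape: the caller evaluates the condition at the call
       site, so the rwlock fields are read BEFORE the transaction begins
       and never enter its read set.  A writer that takes the lock between
       that read and __builtin_tbegin goes unnoticed, and the "elided"
       critical section runs concurrently with the lock holder.  */
    static inline bool
    broken_elide_lock (int is_lock_free)   /* already a stale snapshot */
    {
      if (__builtin_tbegin (0))
        {
          if (is_lock_free)
            return true;        /* transaction stays open until unlock */
          __builtin_tabort (0);
        }
      return false;
    }

    /* Patched shape: pass pointers and load the fields INSIDE the
       transaction, so the lock words join the read set and any
       concurrent writer update aborts the transaction.  */
    static inline bool
    fixed_elide_lock (int *lock, int *writer, unsigned int *readers)
    {
      if (__builtin_tbegin (0))
        {
          /* Compiler barrier: keep the loads from being hoisted above
             the tbegin (the same issue the patch below comments on).  */
          asm volatile ("" ::: "memory");
          if (*lock == 0 && *writer == 0 && *readers == 0)
            return true;        /* transaction stays open until unlock */
          __builtin_tabort (0);
        }
      return false;
    }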
Adhemerval Zanella <adhemerval.zanella@linaro.org> writes:

> Mainly because I don't see this issue as powerpc-specific: I think we should
> also fix it for x86 (which currently uses the same mechanism) and for future
> arches that might also implement lock elision using this mechanism.

AFAICS, this is powerpc-specific.  x86 ensures the memory access happens
inside the transaction.

Here is the x86 implementation:

/* is_lock_free must be executed inside the transaction */

/* Returns true if lock defined by IS_LOCK_FREE was elided.
   ADAPT_COUNT is a pointer to per-lock state variable. */

#define ELIDE_LOCK(adapt_count, is_lock_free)			\
  ({								\
    int ret = 0;						\
								\
    if ((adapt_count) <= 0)					\
      {								\
        for (int i = __elision_aconf.retry_try_xbegin; i > 0; i--) \
          {							\
            unsigned int status;				\
            if ((status = _xbegin ()) == _XBEGIN_STARTED)	\
              {							\
                if (is_lock_free)				\
                  {						\
                    ret = 1;					\
                    break;					\
                  }						\
                _xabort (_ABORT_LOCK_BUSY);			\
              }							\
            if (!elision_adapt (&(adapt_count), status))	\
              break;						\
          }							\
      }								\
    else							\
      (adapt_count)--; /* missing updates ok */			\
    ret;							\
  })
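The crucial detail is that is_lock_free is a macro parameter here: the
caller's expression is substituted textually and only evaluated when control
reaches the if, i.e. after _xbegin () has started the transaction.  A
minimal, self-contained illustration of that evaluation-order difference
(hypothetical names; begin () merely stands in for starting a transaction
while a writer races in; uses a GNU C statement expression, like the header
itself):

    #include <stdio.h>

    static int lock_word;

    /* Stand-in for _xbegin ()/__builtin_tbegin (); the store models a
       writer grabbing the lock at exactly this point.  */
    static void begin (void) { lock_word = 1; }

    /* Macro parameter: "lock_word == 0" is substituted textually and
       evaluated AFTER begin () runs -- the x86 pattern.  */
    #define ELIDE_MACRO(is_lock_free)  ({ begin (); (is_lock_free) ? 1 : 0; })

    /* Function parameter: the expression is evaluated at the call site,
       BEFORE begin () runs -- the pre-patch powerpc pattern.  */
    static int
    elide_func (int is_lock_free)
    {
      begin ();
      return is_lock_free ? 1 : 0;
    }

    int
    main (void)
    {
      lock_word = 0;
      /* Sees the fresh value and refuses to elide.  */
      printf ("macro:    elided=%d\n", ELIDE_MACRO (lock_word == 0));  /* 0 */
      lock_word = 0;
      /* Uses the stale snapshot and elides anyway.  */
      printf ("function: elided=%d\n", elide_func (lock_word == 0));   /* 1 */
      return 0;
    }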
> On 31/07/2015, at 20:05, Tulio Magno Quites Machado Filho <tuliom@linux.vnet.ibm.com> wrote:
>
> Adhemerval Zanella <adhemerval.zanella@linaro.org> writes:
>
>> Mainly because I don't see this issue as powerpc-specific: I think we should
>> also fix it for x86 (which currently uses the same mechanism) and for future
>> arches that might also implement lock elision using this mechanism.
>
> AFAICS, this is powerpc-specific.  x86 ensures the memory access happens
> inside the transaction.
>
> Here is the x86 implementation:

Right, on x86 it is evaluated as a macro.  Your initial approach does seem to
be better.  Thanks for catching it.  I would suggest adding a comment similar
to the x86 one as well.

> /* is_lock_free must be executed inside the transaction */
>
> /* Returns true if lock defined by IS_LOCK_FREE was elided.
>    ADAPT_COUNT is a pointer to per-lock state variable. */
>
> #define ELIDE_LOCK(adapt_count, is_lock_free)			\
>   ({								\
>     int ret = 0;						\
> 								\
>     if ((adapt_count) <= 0)					\
>       {								\
>         for (int i = __elision_aconf.retry_try_xbegin; i > 0; i--) \
>           {							\
>             unsigned int status;				\
>             if ((status = _xbegin ()) == _XBEGIN_STARTED)	\
>               {							\
>                 if (is_lock_free)				\
>                   {						\
>                     ret = 1;					\
>                     break;					\
>                   }						\
>                 _xabort (_ABORT_LOCK_BUSY);			\
>               }							\
>             if (!elision_adapt (&(adapt_count), status))	\
>               break;						\
>           }							\
>       }								\
>     else							\
>       (adapt_count)--; /* missing updates ok */			\
>     ret;							\
>   })
>
> --
> Tulio Magno
On Sat, 2015-08-01 at 09:23 -0300, Adhemerval Zanella wrote:
>
> > On 31/07/2015, at 20:05, Tulio Magno Quites Machado Filho <tuliom@linux.vnet.ibm.com> wrote:
> >
> > Adhemerval Zanella <adhemerval.zanella@linaro.org> writes:
> >
> >> Mainly because I don't see this issue as powerpc-specific: I think we should
> >> also fix it for x86 (which currently uses the same mechanism) and for future
> >> arches that might also implement lock elision using this mechanism.
> >
> > AFAICS, this is powerpc-specific.  x86 ensures the memory access happens
> > inside the transaction.
> >
> > Here is the x86 implementation:
>
> Right, on x86 it is evaluated as a macro.  Your initial approach does seem to
> be better.  Thanks for catching it.  I would suggest adding a comment similar
> to the x86 one as well.
>
> > /* is_lock_free must be executed inside the transaction */
> >

I think "evaluated" is better in this case.
diff --git a/nptl/pthread_rwlock_rdlock.c b/nptl/pthread_rwlock_rdlock.c
index eb7ac8d..3756bb6 100644
--- a/nptl/pthread_rwlock_rdlock.c
+++ b/nptl/pthread_rwlock_rdlock.c
@@ -126,9 +126,9 @@ __pthread_rwlock_rdlock (pthread_rwlock_t *rwlock)
   LIBC_PROBE (rdlock_entry, 1, rwlock);
 
   if (ELIDE_LOCK (rwlock->__data.__rwelision,
-		  rwlock->__data.__lock == 0
-		  && rwlock->__data.__writer == 0
-		  && rwlock->__data.__nr_readers == 0))
+		  rwlock->__data.__lock,
+		  rwlock->__data.__writer,
+		  rwlock->__data.__nr_readers))
     return 0;
 
   /* Make sure we are alone.  */
diff --git a/nptl/pthread_rwlock_tryrdlock.c b/nptl/pthread_rwlock_tryrdlock.c
index 256188a..bb199e4 100644
--- a/nptl/pthread_rwlock_tryrdlock.c
+++ b/nptl/pthread_rwlock_tryrdlock.c
@@ -33,9 +33,9 @@ __pthread_rwlock_tryrdlock (pthread_rwlock_t *rwlock)
     rwlock->__data.__shared == LLL_PRIVATE ? FUTEX_PRIVATE : FUTEX_SHARED;
 
   if (ELIDE_TRYLOCK (rwlock->__data.__rwelision,
-		     rwlock->__data.__lock == 0
-		     && rwlock->__data.__nr_readers == 0
-		     && rwlock->__data.__writer, 0))
+		     rwlock->__data.__lock,
+		     rwlock->__data.__writer,
+		     rwlock->__data.__nr_readers, 0))
     return 0;
 
   lll_lock (rwlock->__data.__lock, rwlock->__data.__shared);
diff --git a/nptl/pthread_rwlock_trywrlock.c b/nptl/pthread_rwlock_trywrlock.c
index 918a8f2..49c6757 100644
--- a/nptl/pthread_rwlock_trywrlock.c
+++ b/nptl/pthread_rwlock_trywrlock.c
@@ -28,9 +28,9 @@ __pthread_rwlock_trywrlock (pthread_rwlock_t *rwlock)
   int result = EBUSY;
 
   if (ELIDE_TRYLOCK (rwlock->__data.__rwelision,
-		     rwlock->__data.__lock == 0
-		     && rwlock->__data.__nr_readers == 0
-		     && rwlock->__data.__writer, 1))
+		     rwlock->__data.__lock,
+		     rwlock->__data.__writer,
+		     rwlock->__data.__nr_readers, 1))
     return 0;
 
   lll_lock (rwlock->__data.__lock, rwlock->__data.__shared);
diff --git a/nptl/pthread_rwlock_unlock.c b/nptl/pthread_rwlock_unlock.c
index bdd115d..0946d0c 100644
--- a/nptl/pthread_rwlock_unlock.c
+++ b/nptl/pthread_rwlock_unlock.c
@@ -35,8 +35,7 @@ __pthread_rwlock_unlock (pthread_rwlock_t *rwlock)
 
   LIBC_PROBE (rwlock_unlock, 1, rwlock);
 
-  if (ELIDE_UNLOCK (rwlock->__data.__writer == 0
-		    && rwlock->__data.__nr_readers == 0))
+  if (ELIDE_UNLOCK (rwlock->__data.__writer, rwlock->__data.__nr_readers))
     return 0;
 
   lll_lock (rwlock->__data.__lock, rwlock->__data.__shared);
diff --git a/nptl/pthread_rwlock_wrlock.c b/nptl/pthread_rwlock_wrlock.c
index 60fa909..0161876 100644
--- a/nptl/pthread_rwlock_wrlock.c
+++ b/nptl/pthread_rwlock_wrlock.c
@@ -98,9 +98,9 @@ __pthread_rwlock_wrlock (pthread_rwlock_t *rwlock)
   LIBC_PROBE (wrlock_entry, 1, rwlock);
 
   if (ELIDE_LOCK (rwlock->__data.__rwelision,
-		  rwlock->__data.__lock == 0
-		  && rwlock->__data.__writer == 0
-		  && rwlock->__data.__nr_readers == 0))
+		  rwlock->__data.__lock,
+		  rwlock->__data.__writer,
+		  rwlock->__data.__nr_readers))
     return 0;
 
   /* Make sure we are alone.  */
diff --git a/sysdeps/generic/elide.h b/sysdeps/generic/elide.h
index 80a3a03..64d9cce 100644
--- a/sysdeps/generic/elide.h
+++ b/sysdeps/generic/elide.h
@@ -15,11 +15,12 @@
    You should have received a copy of the GNU Lesser General Public
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
+
 #ifndef ELIDE_H
 #define ELIDE_H 1
 
-#define ELIDE_LOCK(adapt_count, is_lock_free) 0
-#define ELIDE_TRYLOCK(adapt_count, is_lock_free, write) 0
-#define ELIDE_UNLOCK(is_lock_free) 0
+#define ELIDE_LOCK(adapt_count, lock, writer, readers) 0
+#define ELIDE_TRYLOCK(adapt_count, lock, writer, readers, write) 0
+#define ELIDE_UNLOCK(writer, readers) 0
 
 #endif
diff --git a/sysdeps/powerpc/nptl/elide.h b/sysdeps/powerpc/nptl/elide.h
index 389f5a5..b3cc50a 100644
--- a/sysdeps/powerpc/nptl/elide.h
+++ b/sysdeps/powerpc/nptl/elide.h
@@ -27,7 +27,7 @@
    ADAPT_COUNT is a pointer to per-lock state variable. */
 
 static inline bool
-__elide_lock (uint8_t *adapt_count, int is_lock_free)
+__elide_lock (uint8_t *adapt_count, int *lock, int *writer, unsigned int *readers)
 {
   if (*adapt_count > 0)
     {
@@ -39,7 +39,10 @@ __elide_lock (uint8_t *adapt_count, int is_lock_free)
     {
       if (__builtin_tbegin (0))
 	{
-	  if (is_lock_free)
+	  /* The compiler barrier is required because some GCC versions might
+	     reorder the lock read before the transaction init builtin.  */
+	  asm volatile("" ::: "memory");
+	  if ((*lock == 0) && (*writer == 0) && (*readers == 0))
 	    return true;
 	  /* Lock was busy.  */
 	  __builtin_tabort (_ABORT_LOCK_BUSY);
@@ -66,30 +69,31 @@ __elide_lock (uint8_t *adapt_count, int is_lock_free)
   return false;
 }
 
-# define ELIDE_LOCK(adapt_count, is_lock_free) \
-  __elide_lock (&(adapt_count), is_lock_free)
+# define ELIDE_LOCK(adapt_count, lock, writer, readers) \
+  __elide_lock (&(adapt_count), &(lock), &(writer), &(readers))
 
 static inline bool
-__elide_trylock (uint8_t *adapt_count, int is_lock_free, int write)
+__elide_trylock (uint8_t *adapt_count, int *lock, int *writer,
+		 unsigned int *readers, int write)
 {
   if (__elision_aconf.try_tbegin > 0)
     {
       if (write)
 	__builtin_tabort (_ABORT_NESTED_TRYLOCK);
-      return __elide_lock (adapt_count, is_lock_free);
+      return __elide_lock (adapt_count, lock, writer, readers);
     }
   return false;
 }
 
-# define ELIDE_TRYLOCK(adapt_count, is_lock_free, write) \
-  __elide_trylock (&(adapt_count), is_lock_free, write)
+# define ELIDE_TRYLOCK(adapt_count, lock, writer, readers, write) \
+  __elide_trylock (&(adapt_count), &(lock), &(writer), &(readers), write)
 
 static inline bool
-__elide_unlock (int is_lock_free)
+__elide_unlock (int *writer, unsigned int *readers)
 {
-  if (is_lock_free)
+  if ((*writer == 0) && (*readers == 0))
     {
       __builtin_tend (0);
       return true;
@@ -97,14 +101,14 @@ __elide_unlock (int is_lock_free)
     }
   return false;
 }
 
-# define ELIDE_UNLOCK(is_lock_free) \
-  __elide_unlock (is_lock_free)
+# define ELIDE_UNLOCK(writer, readers) \
+  __elide_unlock (&(writer), &(readers))
 
 # else
 
-# define ELIDE_LOCK(adapt_count, is_lock_free) 0
-# define ELIDE_TRYLOCK(adapt_count, is_lock_free, write) 0
-# define ELIDE_UNLOCK(is_lock_free) 0
+# define ELIDE_LOCK(adapt_count, lock, writer, readers) 0
+# define ELIDE_TRYLOCK(adapt_count, lock, writer, readers, write) 0
+# define ELIDE_UNLOCK(writer, readers) 0
 
 #endif /* ENABLE_LOCK_ELISION */
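On the open testcase question from earlier in the thread: one possible shape
for a stress test, sketched under the assumption that readers check an
invariant the writer briefly breaks while holding the write lock.  All names,
thread counts, and iteration counts are arbitrary, and, as discussed above,
any failure would be intermittent; a passing run proves nothing.

    /* Hypothetical stress test: the writer makes `shared' transiently odd
       while holding the write lock, so readers must never observe an odd
       value.  If rdlock elision commits against a stale view of the lock
       words, a reader can slip into the writer's critical section.
       Build: gcc -O2 -pthread rwlock-stress.c  */
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define READERS 4
    #define ITERS   1000000

    static pthread_rwlock_t rwlock = PTHREAD_RWLOCK_INITIALIZER;
    static int shared;   /* only ever modified under the write lock */

    static void *
    writer (void *arg)
    {
      (void) arg;
      for (int i = 0; i < ITERS; i++)
        {
          pthread_rwlock_wrlock (&rwlock);
          shared++;   /* invariant broken: value is odd here */
          shared++;   /* invariant restored before unlock */
          pthread_rwlock_unlock (&rwlock);
        }
      return NULL;
    }

    static void *
    reader (void *arg)
    {
      (void) arg;
      for (int i = 0; i < ITERS; i++)
        {
          pthread_rwlock_rdlock (&rwlock);
          int v = shared;
          pthread_rwlock_unlock (&rwlock);
          if (v & 1)   /* the read section overlapped the write section */
            {
              fprintf (stderr, "reader observed writer in flight\n");
              exit (1);
            }
        }
      return NULL;
    }

    int
    main (void)
    {
      pthread_t w, r[READERS];
      pthread_create (&w, NULL, writer, NULL);
      for (int i = 0; i < READERS; i++)
        pthread_create (&r[i], NULL, reader, NULL);
      pthread_join (w, NULL);
      for (int i = 0; i < READERS; i++)
        pthread_join (r[i], NULL);
      puts ("no inconsistency observed (not proof of absence)");
      return 0;
    }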