From patchwork Thu Jan 19 11:14:14 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Stefan Liebler X-Patchwork-Id: 717020 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3v41TN33Y7z9sf9 for ; Thu, 19 Jan 2017 22:15:03 +1100 (AEDT) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; secure) header.d=sourceware.org header.i=@sourceware.org header.b="F3By5GUD"; dkim-atps=neutral DomainKey-Signature: a=rsa-sha1; c=nofws; d=sourceware.org; h=list-id :list-unsubscribe:list-subscribe:list-archive:list-post :list-help:sender:subject:to:references:from:date:mime-version :in-reply-to:content-type:message-id; q=dns; s=default; b=FayvfE j22NS3ng/3P+p7YorBTxlR1gDYZShqQJGEDnbUVTkBv2oJuYdaQOCPJI1krrn9mu +BLYokvwWRBsbGusPTAx+0nU2PkhqiB6Gb+2Wfb4X7sCAQ6ZNBg06LWmPX0uljv/ blf7dI+sha2Rqy4yoNNZJGRYgweLqUbdn3v94= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=sourceware.org; h=list-id :list-unsubscribe:list-subscribe:list-archive:list-post :list-help:sender:subject:to:references:from:date:mime-version :in-reply-to:content-type:message-id; s=default; bh=ClA8fMOycxHt JKxPBmlFDog2mTU=; b=F3By5GUDBbvr0xC7ORaNWTFtoS9w8u4DP8rYnwcT4Slc iEOtE5SMZrNBU66/XHhxoOKTwJ1fTlnZFlCqaNxT0Nank2imcOrD6tiwneV7zcJ7 Nv6hy7Z1XrbOz8IJAxgtt5u6CbX15XmUPQngJexQ4T5X7zC9a1LYYC4F60KoTQ4= Received: (qmail 66202 invoked by alias); 19 Jan 2017 11:14:55 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: libc-alpha-owner@sourceware.org Delivered-To: mailing list libc-alpha@sourceware.org Received: (qmail 66179 invoked by uid 89); 19 Jan 2017 11:14:53 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.3 required=5.0 tests=AWL, BAYES_00, KAM_LAZY_DOMAIN_SECURITY, KAM_LOTSOFHASH, RCVD_IN_DNSWL_LOW autolearn=no version=3.3.2 spammy=chapter, 5129, 1578, 917 X-HELO: mx0a-001b2d01.pphosted.com Subject: Re: [PATCH 4/4] S390: Optimize lock-elision by decrementing adapt_count at unlock. To: libc-alpha@sourceware.org References: <1481032315-12420-1-git-send-email-stli@linux.vnet.ibm.com> <1481032315-12420-4-git-send-email-stli@linux.vnet.ibm.com> <1484132014.5606.256.camel@redhat.com> <036b04fb-3f33-4e6b-2591-c5af87539cad@linux.vnet.ibm.com> <1484679174.5606.438.camel@redhat.com> <516ec4f6-4fe9-f1ce-f7f5-5fe2b458f987@linux.vnet.ibm.com> <1484756737.5606.454.camel@redhat.com> From: Stefan Liebler Date: Thu, 19 Jan 2017 12:14:14 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.5.1 MIME-Version: 1.0 In-Reply-To: <1484756737.5606.454.camel@redhat.com> X-TM-AS-GCONF: 00 X-Content-Scanned: Fidelis XPS MAILER x-cbid: 17011911-0020-0000-0000-000002604ECD X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 17011911-0021-0000-0000-00001F11F3B4 Message-Id: <907f3779-9601-69f3-be30-ab676d13d940@linux.vnet.ibm.com> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:, , definitions=2017-01-19_03:, , signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 spamscore=0 suspectscore=3 malwarescore=0 phishscore=0 adultscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1612050000 definitions=main-1701190155 On 01/18/2017 05:25 PM, Torvald Riegel wrote: > On Wed, 2017-01-18 at 13:28 +0100, Stefan Liebler wrote: >> On 01/17/2017 07:52 PM, Torvald Riegel wrote: >>> On Tue, 2017-01-17 at 16:28 +0100, Stefan Liebler wrote: >>> The rest of the diff looked OK. >>> >> If it's okay, I will make one patch with all the reviewed diffs and post >> it with a ChangeLog in this mail-thread. >> Then I would commit it before release to have a consistent state? > > Yes, please commit this. > > > Okay. I'll commit this patch. Thanks. S390: Adjust lock elision code after review. This patch adjusts s390 specific lock elision code after review of the following patches: -S390: Use own tbegin macro instead of __builtin_tbegin. (8bfc4a2ab4bebdf86c151665aae8a266e2f18fb4) -S390: Use new __libc_tbegin_retry macro in elision-lock.c. (53c5c3d5ac238901c13f28a73ba05b0678094e80) -S390: Optimize lock-elision by decrementing adapt_count at unlock. (dd037fb3df286b7c2d0b0c6f8d02a2dd8a8e8a08) The futex value is not tested before starting a transaction, __glibc_likely is used instead of __builtin_expect and comments are adjusted. ChangeLog: * sysdeps/unix/sysv/linux/s390/htm.h: Adjust comments. * sysdeps/unix/sysv/linux/s390/elision-unlock.c: Likewise. * sysdeps/unix/sysv/linux/s390/elision-lock.c: Adjust comments. (__lll_lock_elision): Do not test futex before starting a transaction. Use __glibc_likely instead of __builtin_expect. * sysdeps/unix/sysv/linux/s390/elision-trylock.c: Adjust comments. (__lll_trylock_elision): Do not test futex before starting a transaction. Use __glibc_likely instead of __builtin_expect. commit abdf7e3d264e56898f474405602205279895d9b6 Author: Stefan Liebler Date: Thu Jan 19 10:44:18 2017 +0100 S390: Adjust lock elision code after review. This patch adjusts s390 specific lock elision code after review of the following patches: -S390: Use own tbegin macro instead of __builtin_tbegin. (8bfc4a2ab4bebdf86c151665aae8a266e2f18fb4) -S390: Use new __libc_tbegin_retry macro in elision-lock.c. (53c5c3d5ac238901c13f28a73ba05b0678094e80) -S390: Optimize lock-elision by decrementing adapt_count at unlock. (dd037fb3df286b7c2d0b0c6f8d02a2dd8a8e8a08) The futex value is not tested before starting a transaction, __glibc_likely is used instead of __builtin_expect and comments are adjusted. ChangeLog: * sysdeps/unix/sysv/linux/s390/htm.h: Adjust comments. * sysdeps/unix/sysv/linux/s390/elision-unlock.c: Likewise. * sysdeps/unix/sysv/linux/s390/elision-lock.c: Adjust comments. (__lll_lock_elision): Do not test futex before starting a transaction. Use __glibc_likely instead of __builtin_expect. * sysdeps/unix/sysv/linux/s390/elision-trylock.c: Adjust comments. (__lll_trylock_elision): Do not test futex before starting a transaction. Use __glibc_likely instead of __builtin_expect. diff --git a/sysdeps/unix/sysv/linux/s390/elision-lock.c b/sysdeps/unix/sysv/linux/s390/elision-lock.c index a7c0fcc..0081537 100644 --- a/sysdeps/unix/sysv/linux/s390/elision-lock.c +++ b/sysdeps/unix/sysv/linux/s390/elision-lock.c @@ -50,27 +50,28 @@ __lll_lock_elision (int *futex, short *adapt_count, EXTRAARG int private) critical section uses lock elision) and outside of transactions. Thus, we need to use atomic accesses to avoid data races. However, the value of adapt_count is just a hint, so relaxed MO accesses are - sufficient. - Do not begin a transaction if another cpu has locked the - futex with normal locking. If adapt_count is zero, it remains and the - next pthread_mutex_lock call will try to start a transaction again. */ - if (atomic_load_relaxed (futex) == 0 - && atomic_load_relaxed (adapt_count) <= 0 && aconf.try_tbegin > 0) + sufficient. */ + if (atomic_load_relaxed (adapt_count) <= 0 && aconf.try_tbegin > 0) { + /* Start a transaction and retry it automatically if it aborts with + _HTM_TBEGIN_TRANSIENT. This macro calls tbegin at most retry_cnt + + 1 times. The second argument is considered as retry_cnt. */ int status = __libc_tbegin_retry ((void *) 0, aconf.try_tbegin - 1); - if (__builtin_expect (status == _HTM_TBEGIN_STARTED, - _HTM_TBEGIN_STARTED)) + if (__glibc_likely (status == _HTM_TBEGIN_STARTED)) { /* Check the futex to make sure nobody has touched it in the mean time. This forces the futex into the cache and makes - sure the transaction aborts if some other cpu uses the - lock (writes the futex). */ - if (__builtin_expect (atomic_load_relaxed (futex) == 0, 1)) + sure the transaction aborts if another thread acquires the lock + concurrently. */ + if (__glibc_likely (atomic_load_relaxed (futex) == 0)) /* Lock was free. Return to user code in a transaction. */ return 0; - /* Lock was busy. Fall back to normal locking. */ - if (__builtin_expect (__libc_tx_nesting_depth () <= 1, 1)) + /* Lock was busy. Fall back to normal locking. + This can be the case if e.g. adapt_count was decremented to zero + by a former release and another thread has been waken up and + acquired it. */ + if (__glibc_likely (__libc_tx_nesting_depth () <= 1)) { /* In a non-nested transaction there is no need to abort, which is expensive. Simply end the started transaction. */ @@ -118,6 +119,7 @@ __lll_lock_elision (int *futex, short *adapt_count, EXTRAARG int private) } } - /* Use normal locking as fallback path if transaction does not succeed. */ + /* Use normal locking as fallback path if the transaction does not + succeed. */ return LLL_LOCK ((*futex), private); } diff --git a/sysdeps/unix/sysv/linux/s390/elision-trylock.c b/sysdeps/unix/sysv/linux/s390/elision-trylock.c index 3c1d009..aa09073 100644 --- a/sysdeps/unix/sysv/linux/s390/elision-trylock.c +++ b/sysdeps/unix/sysv/linux/s390/elision-trylock.c @@ -51,31 +51,29 @@ __lll_trylock_elision (int *futex, short *adapt_count) critical section uses lock elision) and outside of transactions. Thus, we need to use atomic accesses to avoid data races. However, the value of adapt_count is just a hint, so relaxed MO accesses are - sufficient. - Do not begin a transaction if another cpu has locked the - futex with normal locking. If adapt_count is zero, it remains and the - next pthread_mutex_lock call will try to start a transaction again. */ - if (atomic_load_relaxed (futex) == 0 - && atomic_load_relaxed (adapt_count) <= 0 && aconf.try_tbegin > 0) + sufficient. */ + if (atomic_load_relaxed (adapt_count) <= 0 && aconf.try_tbegin > 0) { int status = __libc_tbegin ((void *) 0); - if (__builtin_expect (status == _HTM_TBEGIN_STARTED, - _HTM_TBEGIN_STARTED)) + if (__glibc_likely (status == _HTM_TBEGIN_STARTED)) { /* Check the futex to make sure nobody has touched it in the mean time. This forces the futex into the cache and makes - sure the transaction aborts if some other cpu uses the - lock (writes the futex). */ - if (__builtin_expect (atomic_load_relaxed (futex) == 0, 1)) + sure the transaction aborts if another thread acquires the lock + concurrently. */ + if (__glibc_likely (atomic_load_relaxed (futex) == 0)) /* Lock was free. Return to user code in a transaction. */ return 0; - /* Lock was busy. Fall back to normal locking. Since we are in - a non-nested transaction there is no need to abort, which is - expensive. Simply end the started transaction. */ + /* Lock was busy. Fall back to normal locking. + This can be the case if e.g. adapt_count was decremented to zero + by a former release and another thread has been waken up and + acquired it. + Since we are in a non-nested transaction there is no need to abort, + which is expensive. Simply end the started transaction. */ __libc_tend (); /* Note: Changing the adapt_count here might abort a transaction on a - different cpu, but that could happen anyway when the futex is + different CPU, but that could happen anyway when the futex is acquired, so there's no need to check the nesting depth here. See above for why relaxed MO is sufficient. */ if (aconf.skip_lock_busy > 0) @@ -93,6 +91,7 @@ __lll_trylock_elision (int *futex, short *adapt_count) /* Could do some retries here. */ } - /* Use normal locking as fallback path if transaction does not succeed. */ + /* Use normal locking as fallback path if the transaction does not + succeed. */ return lll_trylock (*futex); } diff --git a/sysdeps/unix/sysv/linux/s390/elision-unlock.c b/sysdeps/unix/sysv/linux/s390/elision-unlock.c index d9205fa..c062d71 100644 --- a/sysdeps/unix/sysv/linux/s390/elision-unlock.c +++ b/sysdeps/unix/sysv/linux/s390/elision-unlock.c @@ -26,8 +26,12 @@ __lll_unlock_elision(int *futex, short *adapt_count, int private) /* If the lock is free, we elided the lock earlier. This does not necessarily mean that we are in a transaction, because the user code may have closed the transaction, but that is impossible to detect reliably. - Relaxed MO access to futex is sufficient as we only need a hint, if we - started a transaction or acquired the futex in e.g. elision-lock.c. */ + Relaxed MO access to futex is sufficient because a correct program + will only release a lock it has acquired; therefore, it must either + changed the futex word's value to something !=0 or it must have used + elision; these are actions by the same thread, so these actions are + sequenced-before the relaxed load (and thus also happens-before the + relaxed load). Therefore, relaxed MO is sufficient. */ if (atomic_load_relaxed (futex) == 0) { __libc_tend (); @@ -36,17 +40,17 @@ __lll_unlock_elision(int *futex, short *adapt_count, int private) { /* Update the adapt_count while unlocking before completing the critical section. adapt_count is accessed concurrently outside of a - transaction or an aquired lock e.g. in elision-lock.c so we need to use - atomic accesses. However, the value of adapt_count is just a hint, so - relaxed MO accesses are sufficient. + transaction or a critical section (e.g. in elision-lock.c). So we need + to use atomic accesses. However, the value of adapt_count is just a + hint, so relaxed MO accesses are sufficient. If adapt_count would be decremented while locking, multiple - CPUs trying to lock the locked mutex will decrement adapt_count to + CPUs, trying to lock the acquired mutex, will decrement adapt_count to zero and another CPU will try to start a transaction, which will be immediately aborted as the mutex is locked. - If adapt_count would be decremented while unlocking after completing - the critical section, possible waiters will be waked up before - decrementing the adapt_count. Those waked up waiters could have - destroyed and freed this mutex! */ + The update of adapt_count is done before releasing the lock as POSIX' + mutex destruction requirements disallow accesses to the mutex after it + has been released and thus could have been acquired or destroyed by + another thread. */ short adapt_count_val = atomic_load_relaxed (adapt_count); if (adapt_count_val > 0) atomic_store_relaxed (adapt_count, adapt_count_val - 1); diff --git a/sysdeps/unix/sysv/linux/s390/htm.h b/sysdeps/unix/sysv/linux/s390/htm.h index 32d5a88..70d7f66 100644 --- a/sysdeps/unix/sysv/linux/s390/htm.h +++ b/sysdeps/unix/sysv/linux/s390/htm.h @@ -119,11 +119,16 @@ ar modification and fp operations. Some \ program-interruptions (e.g. a null \ pointer access) are filtered and the \ - trancsaction will abort. In this case \ + transaction will abort. In this case \ the normal lock path will execute it \ again and result in a core dump wich does \ now show at tbegin but the real executed \ - instruction. */ \ + instruction. \ + However it is not guaranteed that this \ + retry operate on the same data and thus \ + may not end in an program-interruption. \ + Note: This could also be used to probe \ + memory for being accessible! */ \ "2: tbegin 0, 0xFF0E\n\t" \ /* Branch away in abort case (this is the \ prefered sequence. See PoP in chapter 5 \ @@ -152,7 +157,8 @@ __ret; \ }) -/* These builtins are correct. Use them. */ +/* These builtins are usable in context of glibc lock elision code without any + changes. Use them. */ #define __libc_tend() \ ({ __asm__ __volatile__ (".machine push\n\t" \ ".machinemode \"zarch_nohighgprs\"\n\t" \