From patchwork Tue Aug 20 21:19:42 2019
From: "Paul A. Clarke"
To: libc-alpha@sourceware.org
Cc: tuliom@ascii.art.br, murphyp@linux.ibm.com
Subject: [PATCH 1/4] [powerpc] fe{en,dis}ableexcept, fesetmode: optimize FPSCR accesses
Date: Tue, 20 Aug 2019 16:19:42 -0500
Message-Id: <1566335985-14601-2-git-send-email-pc@us.ibm.com>
In-Reply-To: <1566335985-14601-1-git-send-email-pc@us.ibm.com>

Since fe{en,dis}ableexcept() and fesetmode() read-modify-write only the
"mode" bits (exception enables and rounding mode) of the Floating-Point
Status and Control Register (FPSCR), the lighter-weight 'mffsl'
instruction can be used to read the FPSCR (enables and rounding mode),
and 'mtfsf 0b00000011' can be used to write just those bits back to the
FPSCR.  The net result is better performance.

In addition, fe{en,dis}ableexcept() read the FPSCR again after writing
it, or determine that it does not need to be written because it is not
changing.  In either case, the local variable already holds the current
values of the enable bits in the FPSCR, so it can be used instead of
reading the FPSCR again.

Also, the value of the FPSCR that is read the second time is validated
against the requested enables.  Since the write cannot fail, this
validation step is unnecessary and can be removed.  Instead, the
exceptions to be enabled (or disabled) are transformed into the
available bits in the FPSCR, then transformed back and validated, to
ensure that all requested bits will actually be set.  For example,
FE_INVALID_SQRT can be requested but cannot actually be set.  This bit
is not mapped by the transformations, so comparing the request before
and after the round trip shows that the bit would not be set, and the
function returns -1 for failure.

Finally, convert the local macros in fesetmode.c to more generally
useful macros in fenv_libc.h.

2019-08-20  Paul A. Clarke

	* sysdeps/powerpc/fpu/fenv_libc.h (fesetenv_mode): New.
	(FPSCR_FPRF_MASK): New.
	(FPSCR_STATUS_MASK): New.
	* sysdeps/powerpc/fpu/feenablxcpt.c (feenableexcept): Use
	lighter-weight access to FPSCR; remove unnecessary second FPSCR
	read and validation.
	* sysdeps/powerpc/fpu/fedisblxcpt.c (fedisableexcept): Likewise.
	* sysdeps/powerpc/fpu/fesetmode.c (fesetmode): Use lighter-weight
	access to FPSCR; use macros in fenv_libc.h in favor of local ones.
---
v2:
- Address issue raised by Paul Murphy.  If the specified set of
  exceptions cannot be enabled (or disabled), then the function
  returns failure.
- The current code enables (or disables) what it can _and_ returns
  failure.  This version just returns failure.
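The round-trip check described above can be modelled in portable C. This is a minimal sketch, not the glibc code: the bit values and every name prefixed with demo_ are hypothetical, the FPSCR is simulated by a plain uint32_t, and the FE_ALL_INVALID folding step is omitted. DEMO_FE_INVALID_SQRT plays the role of FE_INVALID_SQRT, a flag that can be requested but has no enable bit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical exception flags standing in for the FE_* values.
   DEMO_FE_INVALID_SQRT can be requested, but no enable bit exists
   for it.  */
enum
{
  DEMO_FE_INEXACT = 1 << 0,
  DEMO_FE_DIVBYZERO = 1 << 1,
  DEMO_FE_UNDERFLOW = 1 << 2,
  DEMO_FE_OVERFLOW = 1 << 3,
  DEMO_FE_INVALID = 1 << 4,
  DEMO_FE_INVALID_SQRT = 1 << 5
};

/* Assumed enable bits in the low byte of the FPSCR image.  */
enum
{
  DEMO_XE = 0x08,
  DEMO_ZE = 0x10,
  DEMO_UE = 0x20,
  DEMO_OE = 0x40,
  DEMO_VE = 0x80
};

/* Stand-in for fenv_exceptions_to_reg: flags without an enable bit
   are silently dropped.  */
static uint32_t
demo_exceptions_to_reg (int excepts)
{
  uint32_t reg = 0;
  if (excepts & DEMO_FE_INEXACT)
    reg |= DEMO_XE;
  if (excepts & DEMO_FE_DIVBYZERO)
    reg |= DEMO_ZE;
  if (excepts & DEMO_FE_UNDERFLOW)
    reg |= DEMO_UE;
  if (excepts & DEMO_FE_OVERFLOW)
    reg |= DEMO_OE;
  if (excepts & DEMO_FE_INVALID)
    reg |= DEMO_VE;
  return reg;
}

/* Stand-in for fenv_reg_to_exceptions: the inverse mapping.  */
static int
demo_reg_to_exceptions (uint32_t reg)
{
  int excepts = 0;
  if (reg & DEMO_XE)
    excepts |= DEMO_FE_INEXACT;
  if (reg & DEMO_ZE)
    excepts |= DEMO_FE_DIVBYZERO;
  if (reg & DEMO_UE)
    excepts |= DEMO_FE_UNDERFLOW;
  if (reg & DEMO_OE)
    excepts |= DEMO_FE_OVERFLOW;
  if (reg & DEMO_VE)
    excepts |= DEMO_FE_INVALID;
  return excepts;
}

/* Enable EXCEPTS in the simulated register *FPSCR.  Returns the
   previously enabled exceptions, or -1 (leaving *FPSCR untouched)
   if any requested exception cannot be enabled.  */
static int
demo_feenableexcept (uint32_t *fpscr, int excepts)
{
  int result = demo_reg_to_exceptions (*fpscr);
  uint32_t bits = demo_exceptions_to_reg (excepts);

  /* Round trip: anything lost in the mapping cannot be enabled.  */
  if (demo_reg_to_exceptions (bits) != excepts)
    return -1;

  *fpscr |= bits;
  return result;
}
```

A request that includes the unmappable flag fails before the simulated register is touched, which matches the v2 behaviour of returning failure without partially enabling anything.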
 sysdeps/powerpc/fpu/fedisblxcpt.c | 14 ++++++++------
 sysdeps/powerpc/fpu/feenablxcpt.c | 15 ++++++++-------
 sysdeps/powerpc/fpu/fenv_libc.h   | 10 +++++++++-
 sysdeps/powerpc/fpu/fesetmode.c   | 15 +++++----------
 4 files changed, 30 insertions(+), 24 deletions(-)

diff --git a/sysdeps/powerpc/fpu/fedisblxcpt.c b/sysdeps/powerpc/fpu/fedisblxcpt.c
index 5cc8799..a2b7add 100644
--- a/sysdeps/powerpc/fpu/fedisblxcpt.c
+++ b/sysdeps/powerpc/fpu/fedisblxcpt.c
@@ -26,23 +26,25 @@ fedisableexcept (int excepts)
   int result, new;
 
   /* Get current exception mask to return.  */
-  fe.fenv = curr.fenv = fegetenv_register ();
+  fe.fenv = curr.fenv = fegetenv_status ();
   result = fenv_reg_to_exceptions (fe.l);
 
   if ((excepts & FE_ALL_INVALID) == FE_ALL_INVALID)
     excepts = (excepts | FE_INVALID) & ~ FE_ALL_INVALID;
 
+  new = fenv_exceptions_to_reg (excepts);
+
+  if (fenv_reg_to_exceptions (new) != excepts)
+    return -1;
+
   /* Sets the new exception mask.  */
-  fe.l &= ~ fenv_exceptions_to_reg (excepts);
+  fe.l &= ~new;
 
   if (fe.l != curr.l)
-    fesetenv_register (fe.fenv);
+    fesetenv_mode (fe.fenv);
 
-  new = __fegetexcept ();
   if (new == 0 && result != 0)
     (void)__fe_mask_env ();
 
-  if ((new & excepts) != 0)
-    result = -1;
   return result;
 }
diff --git a/sysdeps/powerpc/fpu/feenablxcpt.c b/sysdeps/powerpc/fpu/feenablxcpt.c
index 3b64398..c06a7fd 100644
--- a/sysdeps/powerpc/fpu/feenablxcpt.c
+++ b/sysdeps/powerpc/fpu/feenablxcpt.c
@@ -26,24 +26,25 @@ feenableexcept (int excepts)
   int result, new;
 
   /* Get current exception mask to return.  */
-  fe.fenv = curr.fenv = fegetenv_register ();
+  fe.fenv = curr.fenv = fegetenv_status ();
   result = fenv_reg_to_exceptions (fe.l);
 
   if ((excepts & FE_ALL_INVALID) == FE_ALL_INVALID)
     excepts = (excepts | FE_INVALID) & ~ FE_ALL_INVALID;
 
+  new = fenv_exceptions_to_reg (excepts);
+
+  if (fenv_reg_to_exceptions (new) != excepts)
+    return -1;
+
   /* Sets the new exception mask.  */
-  fe.l |= fenv_exceptions_to_reg (excepts);
+  fe.l |= new;
 
   if (fe.l != curr.l)
-    fesetenv_register (fe.fenv);
+    fesetenv_mode (fe.fenv);
 
-  new = __fegetexcept ();
   if (new != 0 && result == 0)
     (void) __fe_nomask_env_priv ();
 
-  if ((new & excepts) != excepts)
-    result = -1;
-
   return result;
 }
diff --git a/sysdeps/powerpc/fpu/fenv_libc.h b/sysdeps/powerpc/fpu/fenv_libc.h
index 853239f..8ba4832 100644
--- a/sysdeps/powerpc/fpu/fenv_libc.h
+++ b/sysdeps/powerpc/fpu/fenv_libc.h
@@ -70,6 +70,11 @@ extern const fenv_t *__fe_mask_env (void) attribute_hidden;
     __builtin_mtfsf (0xff, d); \
   } while(0)
 
+/* Set the last 2 nibbles of the FPSCR, which contain the
+   exception enables and the rounding mode.
+   'fegetenv_status' retrieves these bits by reading the FPSCR.  */
+#define fesetenv_mode(env) __builtin_mtfsf (0b00000011, (env));
+
 /* This very handy macro:
    - Sets the rounding mode to 'round to nearest';
    - Sets the processor into IEEE mode; and
@@ -206,8 +211,11 @@ enum {
   (FPSCR_VE_MASK|FPSCR_OE_MASK|FPSCR_UE_MASK|FPSCR_ZE_MASK|FPSCR_XE_MASK)
 #define FPSCR_BASIC_EXCEPTIONS_MASK \
   (FPSCR_VX_MASK|FPSCR_OX_MASK|FPSCR_UX_MASK|FPSCR_ZX_MASK|FPSCR_XX_MASK)
-
+#define FPSCR_FPRF_MASK \
+  (FPSCR_FPRF_C_MASK|FPSCR_FPRF_FL_MASK|FPSCR_FPRF_FG_MASK| \
+   FPSCR_FPRF_FE_MASK|FPSCR_FPRF_FU_MASK)
 #define FPSCR_CONTROL_MASK (FPSCR_ENABLES_MASK|FPSCR_NI_MASK|FPSCR_RN_MASK)
+#define FPSCR_STATUS_MASK (FPSCR_FR_MASK|FPSCR_FI_MASK|FPSCR_FPRF_MASK)
 
 /* The bits in the FENV(1) ABI for exceptions correspond one-to-one with bits
    in the FPSCR, albeit shifted to different but corresponding locations.
diff --git a/sysdeps/powerpc/fpu/fesetmode.c b/sysdeps/powerpc/fpu/fesetmode.c
index 4f4f71a..e92559b 100644
--- a/sysdeps/powerpc/fpu/fesetmode.c
+++ b/sysdeps/powerpc/fpu/fesetmode.c
@@ -19,11 +19,6 @@
 #include
 #include
 
-#define _FPU_MASK_ALL (_FPU_MASK_ZM | _FPU_MASK_OM | _FPU_MASK_UM \
-                       | _FPU_MASK_XM | _FPU_MASK_IM)
-
-#define FPU_STATUS 0xbffff700ULL
-
 int
 fesetmode (const femode_t *modep)
 {
@@ -32,18 +27,18 @@ fesetmode (const femode_t *modep)
 
   /* Logic regarding enabled exceptions as in fesetenv.  */
   new.fenv = *modep;
-  old.fenv = fegetenv_register ();
-  new.l = (new.l & ~FPU_STATUS) | (old.l & FPU_STATUS);
+  old.fenv = fegetenv_status ();
+  new.l = (new.l & ~FPSCR_STATUS_MASK) | (old.l & FPSCR_STATUS_MASK);
 
   if (old.l == new.l)
     return 0;
 
-  if ((old.l & _FPU_MASK_ALL) == 0 && (new.l & _FPU_MASK_ALL) != 0)
+  if ((old.l & FPSCR_ENABLES_MASK) == 0 && (new.l & FPSCR_ENABLES_MASK) != 0)
     (void) __fe_nomask_env_priv ();
 
-  if ((old.l & _FPU_MASK_ALL) != 0 && (new.l & _FPU_MASK_ALL) == 0)
+  if ((old.l & FPSCR_ENABLES_MASK) != 0 && (new.l & FPSCR_ENABLES_MASK) == 0)
     (void) __fe_mask_env ();
 
-  fesetenv_register (new.fenv);
+  fesetenv_mode (new.fenv);
   return 0;
 }

From patchwork Tue Aug 20 21:19:43 2019
From: "Paul A. Clarke"
To: libc-alpha@sourceware.org
Cc: tuliom@ascii.art.br, murphyp@linux.ibm.com
Subject: [PATCH 2/4] [powerpc] SET_RESTORE_ROUND improvements
Date: Tue, 20 Aug 2019 16:19:43 -0500
Message-Id: <1566335985-14601-3-git-send-email-pc@us.ibm.com>
In-Reply-To: <1566335985-14601-1-git-send-email-pc@us.ibm.com>

SET_RESTORE_ROUND uses libc_feholdsetround_ppc_ctx and
libc_feresetround_ppc_ctx to bracket a block of code where the
floating-point rounding mode must be set to a certain value.

For the *prologue*, libc_feholdsetround_ppc_ctx is used and performs:

1. Read/save FPSCR.
2. Create a new value for FPSCR with the new rounding mode and the
   enables cleared.
3. If the new value is different from the current value:
   a. If transitioning from a state where some exceptions are enabled,
      enter "ignore exceptions / non-stop" mode.
   b. Write the new value to FPSCR.
   c. Put a mark on the wall (ctx->updated_status) indicating that the
      FPSCR was changed.

(1) uses the 'mffs' instruction.  On POWER9, the lighter-weight 'mffsl'
instruction can be used, but it does not return all of the bits in the
FPSCR.  fegetenv_status uses 'mffsl' on POWER9 and 'mffs' otherwise, so
it can be used instead of fegetenv_register.

(3b) uses 'mtfsf 0b11111111' to write the entire FPSCR.  It must instead
use 'mtfsf 0b00000011' to write just the enables and the mode, because
some of the other bits are not valid if 'mffsl' was used.
fesetenv_mode uses 'mtfsf 0b00000011' on POWER9 and 'mtfsf 0b11111111'
otherwise.
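The prologue steps and the 'mtfsf' field mask can be sketched on a simulated 32-bit FPSCR image. The demo_ names, the struct, and the bit values (enables 0xf8 and RN 0x03 in the low byte) are assumptions for illustration; the MSR FE0/FE1 handling of step 3a is not modelled, and the L and W operands of 'mtfsf' are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout of the low byte of the FPSCR image: VE OE UE ZE XE
   enables (0xf8), NI (0x04), and the 2-bit rounding mode RN (0x03).  */
#define DEMO_ENABLES_MASK 0xf8u
#define DEMO_RN_MASK 0x03u

/* Bits replaced by 'mtfsf FLM,frB': each FLM bit selects one 4-bit
   field, and the least significant FLM bit selects the least
   significant nibble.  */
static uint32_t
demo_mtfsf_bits (unsigned flm)
{
  uint32_t mask = 0;
  assert (flm <= 0xffu);
  for (unsigned field = 0; field < 8; field++)
    if (flm & (1u << field))
      mask |= UINT32_C (0xf) << (4 * field);
  return mask;
}

/* Simulated mtfsf: copy only the selected fields of VALUE.  */
static void
demo_mtfsf (uint32_t *fpscr, unsigned flm, uint32_t value)
{
  uint32_t sel = demo_mtfsf_bits (flm);
  *fpscr = (*fpscr & ~sel) | (value & sel);
}

/* Hypothetical stand-in for the rm_ctx fields used by the prologue.  */
struct demo_ctx
{
  uint32_t env;          /* FPSCR image saved at entry.  */
  bool updated_status;   /* Whether the prologue wrote the FPSCR.  */
};

/* Prologue model: clear the enables and set the rounding mode to R.  */
static void
demo_holdsetround (uint32_t *fpscr, struct demo_ctx *ctx, uint32_t r)
{
  uint32_t old = *fpscr;
  uint32_t new = (old & ~(DEMO_ENABLES_MASK | DEMO_RN_MASK)) | r;

  ctx->env = old;
  ctx->updated_status = (new != old);
  if (ctx->updated_status)
    /* Only the low byte can differ, so the two rightmost fields
       (FLM 0b00000011) are enough.  */
    demo_mtfsf (fpscr, 0x03u, new);
}
```

With FLM 0b00000011 only the low byte is replaced, which is exactly the set of bits the prologue changes, so bits that may not be valid when read with 'mffsl' are never written back.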
For the *epilogue*, libc_feresetround_ppc_ctx checks the mark on the
wall, then calls libc_feresetround_ppc, which just calls
__libc_femergeenv_ppc with parameters such that it performs:

1. Retrieve the value of FPSCR saved in the prologue above.
2. Read FPSCR.
3. Create a new value of FPSCR where:
   - Summary bits and exception indicators = current OR saved.
   - Rounding mode and enables = saved.
   - Status bits = current.
4. If transitioning from some exceptions enabled to none, enter
   "ignore exceptions / non-stop" mode.
5. If transitioning from no exceptions enabled to some, enter
   "catch exceptions" mode.
6. Write the new value to FPSCR.

The summary bits are hardwired to the exception indicators, so there is
no need to restore any saved summary bits.  The exception indicator
bits, which are sticky and remain set unless explicitly cleared, would
only need to be restored if the code block might explicitly clear any of
them.  This is certainly not expected.  So the only bits that need to be
restored are the enables and the mode.

If only those bits are to be restored, there is no need to read the
FPSCR: steps (2) and (3) are unnecessary, and step (6) only needs to
write the bits being restored.

We know we are transitioning out of "ignore exceptions" mode, so step
(4) is unnecessary, and in step (5) we only need to check the state we
are entering.

2019-08-20  Paul A. Clarke

	* sysdeps/powerpc/fpu/fenv_private.h (libc_feholdsetround_ppc_ctx):
	Utilize lightweight FPSCR read if possible, set fewer FPSCR bits if
	possible.
	(libc_feresetround_ppc): Replace call to __libc_femergeenv_ppc with
	simpler required steps, set fewer FPSCR bits if possible.
	(libc_feholdsetround_noex_ppc_ctx): New.
	(libc_feholdsetround_noex_ctx): New.
	(libc_feholdsetround_noexf_ctx): New.
	(libc_feholdsetround_noexl_ctx): New.
---
v2:
- Address issue raised by Paul Murphy.  The first version of the patch
  was broken with respect to the "no exceptions" (NOEX) versions of the
  macros.
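The argument that only the enables and the mode need restoring can be checked with a small model of both epilogues on a simulated FPSCR image. All names are hypothetical; the sticky-flag mask is supplied by the caller instead of spelling out the real FPSCR layout, and the MSR mode switch of steps (4) and (5) is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Enables, NI and RN: the bits written by 'mtfsf 0b00000011'.  */
#define DEMO_MODE_BITS 0xffu

/* Rough model of the old merge-based epilogue: sticky flags =
   current | saved, mode = saved, everything else = current.
   FLAGS selects the sticky exception bits.  */
static uint32_t
demo_merge_epilogue (uint32_t current, uint32_t saved, uint32_t flags)
{
  uint32_t other = ~(DEMO_MODE_BITS | flags);
  return (current & other)
         | ((current | saved) & flags)
         | (saved & DEMO_MODE_BITS);
}

/* Model of the new epilogue: only the mode bits are written back from
   the saved image.  CURRENT stands for the register contents that the
   partial write leaves untouched; the real code does not read them.  */
static uint32_t
demo_reset_epilogue (uint32_t current, uint32_t saved)
{
  return (current & ~DEMO_MODE_BITS) | (saved & DEMO_MODE_BITS);
}
```

The two agree whenever the block only sets flags; they differ only if the block clears a flag that was already set on entry, which is the case the text above argues is not expected.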
 sysdeps/powerpc/fpu/fenv_private.h | 38 ++++++++++++++++++++++++++++++++++++--
 1 file changed, 36 insertions(+), 2 deletions(-)

diff --git a/sysdeps/powerpc/fpu/fenv_private.h b/sysdeps/powerpc/fpu/fenv_private.h
index 8c126f9..5ebe6cd 100644
--- a/sysdeps/powerpc/fpu/fenv_private.h
+++ b/sysdeps/powerpc/fpu/fenv_private.h
@@ -132,7 +132,17 @@ libc_fesetenv_ppc (const fenv_t *envp)
 static __always_inline void
 libc_feresetround_ppc (fenv_t *envp)
 {
-  __libc_femergeenv_ppc (envp, _FPU_MASK_TRAPS_RN, _FPU_MASK_FRAC_INEX_RET_CC);
+  fenv_union_t new = { .fenv = *envp };
+
+  /* If the old env has no enabled exceptions and the new env has any enabled
+     exceptions, then unmask SIGFPE in the MSR FE0/FE1 bits.  This will put the
+     hardware into "precise mode" and may cause the FPU to run slower on some
+     hardware.  */
+  if ((new.l & _FPU_ALL_TRAPS) != 0)
+    (void) __fe_nomask_env_priv ();
+
+  /* Atomically enable and raise (if appropriate) exceptions set in `new'.  */
+  fesetenv_mode (new.fenv);
 }
 
 static __always_inline int
@@ -176,9 +186,30 @@ libc_feholdsetround_ppc_ctx (struct rm_ctx *ctx, int r)
 {
   fenv_union_t old, new;
 
+  old.fenv = fegetenv_status ();
+
+  new.l = (old.l & ~(FPSCR_ENABLES_MASK|FPSCR_RN_MASK)) | r;
+
+  ctx->env = old.fenv;
+  if (__glibc_unlikely (new.l != old.l))
+    {
+      if ((old.l & _FPU_ALL_TRAPS) != 0)
+        (void) __fe_mask_env ();
+      fesetenv_mode (new.fenv);
+      ctx->updated_status = true;
+    }
+  else
+    ctx->updated_status = false;
+}
+
+static __always_inline void
+libc_feholdsetround_noex_ppc_ctx (struct rm_ctx *ctx, int r)
+{
+  fenv_union_t old, new;
+
   old.fenv = fegetenv_register ();
 
-  new.l = (old.l & _FPU_MASK_TRAPS_RN) | r;
+  new.l = (old.l & ~(FPSCR_ENABLES_MASK|FPSCR_RN_MASK)) | r;
 
   ctx->env = old.fenv;
   if (__glibc_unlikely (new.l != old.l))
@@ -218,6 +249,9 @@ libc_feresetround_ppc_ctx (struct rm_ctx *ctx)
 #define libc_feholdsetround_ctx libc_feholdsetround_ppc_ctx
 #define libc_feholdsetroundf_ctx libc_feholdsetround_ppc_ctx
 #define libc_feholdsetroundl_ctx libc_feholdsetround_ppc_ctx
+#define libc_feholdsetround_noex_ctx libc_feholdsetround_noex_ppc_ctx
+#define libc_feholdsetround_noexf_ctx libc_feholdsetround_noex_ppc_ctx
+#define libc_feholdsetround_noexl_ctx libc_feholdsetround_noex_ppc_ctx
 #define libc_feresetround_ctx libc_feresetround_ppc_ctx
 #define libc_feresetroundf_ctx libc_feresetround_ppc_ctx
 #define libc_feresetroundl_ctx libc_feresetround_ppc_ctx

From patchwork Tue Aug 20 21:19:44 2019
From: "Paul A. Clarke"
To: libc-alpha@sourceware.org
Cc: tuliom@ascii.art.br, murphyp@linux.ibm.com
Subject: [PATCH 3/4] [powerpc] fesetenv: optimize FPSCR access
Date: Tue, 20 Aug 2019 16:19:44 -0500
Message-Id: <1566335985-14601-4-git-send-email-pc@us.ibm.com>
In-Reply-To: <1566335985-14601-1-git-send-email-pc@us.ibm.com>

fesetenv() reads the current value of the Floating-Point Status and
Control Register (FPSCR) to determine the difference between the
current state of the exception enables and the newly requested state.
All of these bits are also returned by the lighter-weight 'mffsl'
instruction used by fegetenv_status().  Use that instead.

Also, remove the local macro _FPU_MASK_ALL in favor of a common macro,
FPSCR_ENABLES_MASK, from fenv_libc.h.

Finally, use a local variable ('new') in favor of a pointer dereference
('*envp').

2019-08-20  Paul A. Clarke

	* sysdeps/powerpc/fpu/fesetenv.c (__fesetenv): Utilize lightweight
	FPSCR read.
	(_FPU_MASK_ALL): Delete.
---
 sysdeps/powerpc/fpu/fesetenv.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/sysdeps/powerpc/fpu/fesetenv.c b/sysdeps/powerpc/fpu/fesetenv.c
index 009a4f0..5ca15c7 100644
--- a/sysdeps/powerpc/fpu/fesetenv.c
+++ b/sysdeps/powerpc/fpu/fesetenv.c
@@ -19,8 +19,6 @@
 #include
 #include
 
-#define _FPU_MASK_ALL (_FPU_MASK_ZM | _FPU_MASK_OM | _FPU_MASK_UM | _FPU_MASK_XM | _FPU_MASK_IM)
-
 int
 __fesetenv (const fenv_t *envp)
 {
@@ -28,25 +26,23 @@ __fesetenv (const fenv_t *envp)
 
   /* get the currently set exceptions.  */
   new.fenv = *envp;
-  old.fenv = fegetenv_register ();
-  if (old.l == new.l)
-    return 0;
+  old.fenv = fegetenv_status ();
 
   /* If the old env has no enabled exceptions and the new env has any enabled
      exceptions, then unmask SIGFPE in the MSR FE0/FE1 bits.  This will put the
      hardware into "precise mode" and may cause the FPU to run slower on some
      hardware.  */
-  if ((old.l & _FPU_MASK_ALL) == 0 && (new.l & _FPU_MASK_ALL) != 0)
+  if ((old.l & FPSCR_ENABLES_MASK) == 0 && (new.l & FPSCR_ENABLES_MASK) != 0)
     (void) __fe_nomask_env_priv ();
 
   /* If the old env had any enabled exceptions and the new env has no enabled
      exceptions, then mask SIGFPE in the MSR FE0/FE1 bits.  This may allow the
      FPU to run faster because it always takes the default action and can not
      generate SIGFPE.  */
-  if ((old.l & _FPU_MASK_ALL) != 0 && (new.l & _FPU_MASK_ALL) == 0)
+  if ((old.l & FPSCR_ENABLES_MASK) != 0 && (new.l & FPSCR_ENABLES_MASK) == 0)
     (void)__fe_mask_env ();
 
-  fesetenv_register (*envp);
+  fesetenv_register (new.fenv);
 
   /* Success.  */
   return 0;

From patchwork Tue Aug 20 21:19:45 2019
From: "Paul A. Clarke"
To: libc-alpha@sourceware.org
Cc: tuliom@ascii.art.br, murphyp@linux.ibm.com
Subject: [PATCH 4/4] [powerpc] fegetenv_status: simplify instruction generation
Date: Tue, 20 Aug 2019 16:19:45 -0500
Message-Id: <1566335985-14601-5-git-send-email-pc@us.ibm.com>
In-Reply-To: <1566335985-14601-1-git-send-email-pc@us.ibm.com>

fegetenv_status() wants to use the lighter-weight 'mffsl' instruction
for reading the Floating-Point Status and Control Register (FPSCR).  It
currently uses 'mffsl' directly when compiled with '-mcpu=power9', and
otherwise performs a runtime check (__builtin_cpu_supports ("arch_3_00")),
falling back to fegetenv_register() when that builtin is unavailable.

It turns out that the 'mffsl' instruction decodes as 'mffs' on
processors older than "arch_3_00", because the additional bits set for
'mffsl' are "don't care" bits for 'mffs'.  Since 'mffs' returns a
superset of the bits that 'mffsl' returns, this is harmless.

So, just generate 'mffsl' unconditionally.

2019-08-20  Paul A. Clarke

	* sysdeps/powerpc/fpu/fenv_libc.h (fegetenv_status_ISA300): Delete.
	(fegetenv_status): Generate 'mffsl' unconditionally.
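The encoding argument can be illustrated by assembling both instruction words. The field values used here (primary opcode 63, extended opcode 583, and 24 in bits 11:15 for 'mffsl') are taken from the Power ISA 3.0 description of the mffs family and should be checked against the ISA before relying on them. demo_as_seen_by_old_decoder is a hypothetical stand-in for an older decoder that ignores the reserved bits, not a model of any specific processor.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field values for the mffs family (Power ISA 3.0).  */
#define DEMO_PRIMARY_OPCODE 63u
#define DEMO_XO_MFFS_FAMILY 583u
#define DEMO_SUB_MFFS 0u
#define DEMO_SUB_MFFSL 24u

/* Bits 11:15 of the instruction word (big-endian bit numbering), the
   field that distinguishes 'mffsl' from 'mffs'.  */
#define DEMO_SUB_FIELD_MASK (UINT32_C (0x1f) << 16)

/* Assemble an X-form word with Rc = 0:
   opcode(0:5) | FRT(6:10) | sub(11:15) | 0(16:20) | XO(21:30) | Rc(31).  */
static uint32_t
demo_encode_mffs_family (unsigned frt, unsigned sub)
{
  assert (frt < 32 && sub < 32);
  return ((uint32_t) DEMO_PRIMARY_OPCODE << 26)
         | ((uint32_t) frt << 21)
         | ((uint32_t) sub << 16)
         | ((uint32_t) DEMO_XO_MFFS_FAMILY << 1);
}

/* Hypothetical view of a pre-ISA 3.0 decoder that treats bits 11:15
   as "don't care": the same word with those bits cleared.  */
static uint32_t
demo_as_seen_by_old_decoder (uint32_t insn)
{
  return insn & ~DEMO_SUB_FIELD_MASK;
}
```

Under these assumptions, every 'mffsl FRT' word reduces to the corresponding 'mffs FRT' word once the reserved bits are ignored, which is the property the commit message relies on.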
---
 sysdeps/powerpc/fpu/fenv_libc.h | 14 +-------------
 1 file changed, 1 insertion(+), 13 deletions(-)

diff --git a/sysdeps/powerpc/fpu/fenv_libc.h b/sysdeps/powerpc/fpu/fenv_libc.h
index 8ba4832..186612b 100644
--- a/sysdeps/powerpc/fpu/fenv_libc.h
+++ b/sysdeps/powerpc/fpu/fenv_libc.h
@@ -37,7 +37,7 @@ extern const fenv_t *__fe_mask_env (void) attribute_hidden;
 
 /* Equivalent to fegetenv_register, but only returns bits for
    status, exception enables, and mode.  */
-#define fegetenv_status_ISA300() \
+#define fegetenv_status() \
   ({register double __fr; \
     __asm__ __volatile__ ( \
       ".machine push; .machine \"power9\"; mffsl %0; .machine pop" \
@@ -45,18 +45,6 @@ extern const fenv_t *__fe_mask_env (void) attribute_hidden;
     __fr; \
   })
 
-#ifdef _ARCH_PWR9
-# define fegetenv_status() fegetenv_status_ISA300()
-#elif defined __BUILTIN_CPU_SUPPORTS__
-# define fegetenv_status() \
-  (__glibc_likely (__builtin_cpu_supports ("arch_3_00")) \
-   ? fegetenv_status_ISA300() \
-   : fegetenv_register() \
-  )
-#else
-# define fegetenv_status() fegetenv_register ()
-#endif
-
 /* Equivalent to fesetenv, but takes a fenv_t instead of a pointer.  */
 #define fesetenv_register(env) \
 	do { \