From patchwork Tue Aug 23 06:58:52 2016
X-Patchwork-Submitter: Stefan Liebler
X-Patchwork-Id: 661713
Subject: Re: [PATCH] S390: Do not set FE_INEXACT with feraiseexcept (FE_OWERFLOW|FE_UNDERFLOW).
To: libc-alpha@sourceware.org
References: <45cef91b-59ad-a35f-70ea-68c8d1349701@linux.vnet.ibm.com>
From: Stefan Liebler
Date: Tue, 23 Aug 2016 08:58:52 +0200
Message-Id: <72d18ee0-b0e0-d319-4656-f0a8651848dc@linux.vnet.ibm.com>

On 08/22/2016 12:59 PM, Joseph Myers wrote:
> On Mon, 22 Aug 2016, Stefan Liebler wrote:
>
>> For feraiseexcept I think if the z196-round-instruction is available with used
>> toolchain then it should be used to avoid FE_INEXACT. Then it is the same
>> behaviour as on intel.
>> If it is not available feraiseexcept will use the add/div instructions as
>> before with the FE_INEXACT flag/exception. And the additional _FPU_GETCW/SETCW
>> usages are avoided.
>>
>> What's your suggestion for feraiseexcept?
>
> I don't have a suggestion; I was simply observing that extra steps for
> this case (such as the x86 code does) are not actually needed to conform
> to the standard.
>

Okay. Then I've updated the patch.
It now uses the z196 load-rounded instruction, if available, to avoid setting
FE_INEXACT. If the instruction is not available, the old behaviour is kept and
FE_INEXACT is not cleared afterwards.

Bye
Stefan

ChangeLog:

    * config.h.in (HAVE_S390_MIN_Z196_ZARCH_ASM_SUPPORT): New undefine.
    * sysdeps/s390/configure.ac: Add test for z196 zarch support.
    * sysdeps/s390/configure: Regenerated.
    * sysdeps/s390/fpu/fraiseexcpt.c (__feraiseexcept): Use ledbra
    instruction for raising over-/underflow if z196 zarch is supported
    by default.
    * sysdeps/s390/fpu/fsetexcptflg.c (fesetexceptflag): Correct comment.

commit 224637717d53581a081bbb42471c7e91fc137085
Author: Stefan Liebler
Date:   Tue Aug 23 08:44:26 2016 +0200

    S390: Do not set FE_INEXACT with feraiseexcept (FE_OVERFLOW|FE_UNDERFLOW).

    On s390, feraiseexcept (FE_OVERFLOW|FE_UNDERFLOW) sets FE_INEXACT, too.
    This patch uses the z196 zarch load-rounded instruction, which can
    suppress the FE_INEXACT exception, if gcc has z196 support in the used
    configuration. Otherwise the FE_INEXACT flag is set as before.
    The gcc support is tested in a new configure check.

    A comment in fsetexcptflg.c is corrected: if the fpc is set with the
    _FPU_SETCW macro, newly set exceptions are not executed with the next
    floating-point instruction. It seems the comment was copied from
    e.g. the sysdeps/x86_64/fpu/fsetexcptflg.c file.

    ChangeLog:

        * config.h.in (HAVE_S390_MIN_Z196_ZARCH_ASM_SUPPORT): New undefine.
        * sysdeps/s390/configure.ac: Add test for z196 zarch support.
        * sysdeps/s390/configure: Regenerated.
        * sysdeps/s390/fpu/fraiseexcpt.c (__feraiseexcept): Use ledbra
        instruction for raising over-/underflow if z196 zarch is supported
        by default.
        * sysdeps/s390/fpu/fsetexcptflg.c (fesetexceptflag): Correct comment.
diff --git a/config.h.in b/config.h.in
index 856ef6a..8cd08b0 100644
--- a/config.h.in
+++ b/config.h.in
@@ -70,6 +70,9 @@
 /* Define if assembler supports AVX512DQ. */
 #undef HAVE_AVX512DQ_ASM_SUPPORT
 
+/* Define if assembler supports z196 zarch instructions as default on S390. */
+#undef HAVE_S390_MIN_Z196_ZARCH_ASM_SUPPORT
+
 /* Define if assembler supports vector instructions on S390. */
 #undef HAVE_S390_VX_ASM_SUPPORT
 
diff --git a/sysdeps/s390/configure b/sysdeps/s390/configure
index c9fb69c..347ac28 100644
--- a/sysdeps/s390/configure
+++ b/sysdeps/s390/configure
@@ -177,5 +177,41 @@ then
 
 fi
 
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for S390 z196 zarch instruction support as default" >&5
+$as_echo_n "checking for S390 z196 zarch instruction support as default... " >&6; }
+if ${libc_cv_asm_s390_min_z196_zarch+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  cat > conftest.c <<\EOF
+float testinsn (double e)
+{
+    float d;
+    __asm__ ("ledbra %0,5,%1,4" : "=f" (d) : "f" (e) );
+    return d;
+}
+EOF
+if { ac_try='${CC-cc} $CFLAGS $CPPFLAGS $LDFLAGS --shared conftest.c
+                        -o conftest.o &> /dev/null'
+  { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5
+  (eval $ac_try) 2>&5
+  ac_status=$?
+  $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; }; } ;
+then
+  libc_cv_asm_s390_min_z196_zarch=yes
+else
+  libc_cv_asm_s390_min_z196_zarch=no
+fi
+rm -f conftest*
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $libc_cv_asm_s390_min_z196_zarch" >&5
+$as_echo "$libc_cv_asm_s390_min_z196_zarch" >&6; }
+
+if test "$libc_cv_asm_s390_min_z196_zarch" = yes ;
+then
+  $as_echo "#define HAVE_S390_MIN_Z196_ZARCH_ASM_SUPPORT 1" >>confdefs.h
+
+fi
+
 test -n "$critic_missing" && as_fn_error $? "
 *** $critic_missing" "$LINENO" 5
diff --git a/sysdeps/s390/configure.ac b/sysdeps/s390/configure.ac
index 1db6d84..8a782e7 100644
--- a/sysdeps/s390/configure.ac
+++ b/sysdeps/s390/configure.ac
@@ -86,5 +86,31 @@ then
   AC_DEFINE(HAVE_S390_VX_GCC_SUPPORT)
 fi
 
+AC_CACHE_CHECK(for S390 z196 zarch instruction support as default,
+               libc_cv_asm_s390_min_z196_zarch, [dnl
+cat > conftest.c <<\EOF
+float testinsn (double e)
+{
+    float d;
+    __asm__ ("ledbra %0,5,%1,4" : "=f" (d) : "f" (e) );
+    return d;
+}
+EOF
+dnl
+dnl test, if assembler supports S390 z196 zarch instructions as default
+if AC_TRY_COMMAND([${CC-cc} $CFLAGS $CPPFLAGS $LDFLAGS --shared conftest.c
+                        -o conftest.o &> /dev/null]) ;
+then
+  libc_cv_asm_s390_min_z196_zarch=yes
+else
+  libc_cv_asm_s390_min_z196_zarch=no
+fi
+rm -f conftest* ])
+
+if test "$libc_cv_asm_s390_min_z196_zarch" = yes ;
+then
+  AC_DEFINE(HAVE_S390_MIN_Z196_ZARCH_ASM_SUPPORT)
+fi
+
 test -n "$critic_missing" && AC_MSG_ERROR([
 *** $critic_missing])
diff --git a/sysdeps/s390/fpu/fraiseexcpt.c b/sysdeps/s390/fpu/fraiseexcpt.c
index 92a1a7d..ac6dfe7 100644
--- a/sysdeps/s390/fpu/fraiseexcpt.c
+++ b/sysdeps/s390/fpu/fraiseexcpt.c
@@ -35,6 +35,23 @@ fexceptadd (float d, float e)
   __asm__ __volatile__ ("aebr %0,%1" : : "f" (d), "f" (e) );
 }
 
+#ifdef HAVE_S390_MIN_Z196_ZARCH_ASM_SUPPORT
+static __inline__ void
+fexceptround (double e)
+{
+  float d;
+  /* Load rounded from double to float with M3 = round toward 0, M4 = Suppress
+     IEEE-inexact exception.
+     In case of e=0x1p128 and the overflow-mask bit is zero, only the
+     IEEE-overflow flag is set. If overflow-mask bit is one, DXC field is set to
+     0x20 "IEEE overflow, exact".
+     In case of e=0x1p-150 and the underflow-mask bit is zero, only the
+     IEEE-underflow flag is set. If underflow-mask bit is one, DXC field is set
+     to 0x10 "IEEE underflow, exact".
+     This instruction is available with a zarch machine >= z196. */
+  __asm__ __volatile__ ("ledbra %0,5,%1,4" : "=f" (d) : "f" (e) );
+}
+#endif
 
 int
 __feraiseexcept (int excepts)
@@ -54,13 +71,29 @@ __feraiseexcept (int excepts)
 
   /* Next: overflow. */
   if (FE_OVERFLOW & excepts)
-    /* I don't think we can do the same trick as intel so we will have
-       to live with inexact coming also. */
-    fexceptadd (FLT_MAX, 1.0e32);
+    {
+#ifdef HAVE_S390_MIN_Z196_ZARCH_ASM_SUPPORT
+      fexceptround (0x1p128);
+#else
+      /* If overflow-mask bit is zero, both IEEE-overflow and IEEE-inexact flags
+         are set. If overflow-mask bit is one, DXC field is set to 0x2C "IEEE
+         overflow, inexact and incremented". */
+      fexceptadd (FLT_MAX, 1.0e32);
+#endif
+    }
 
   /* Next: underflow. */
   if (FE_UNDERFLOW & excepts)
-    fexceptdiv (FLT_MIN, 3.0);
+    {
+#ifdef HAVE_S390_MIN_Z196_ZARCH_ASM_SUPPORT
+      fexceptround (0x1p-150);
+#else
+      /* If underflow-mask bit is zero, both IEEE-underflow and IEEE-inexact
+         flags are set. If underflow-mask bit is one, DXC field is set to 0x1C
+         "IEEE underflow, inexact and incremented". */
+      fexceptdiv (FLT_MIN, 3.0);
+#endif
+    }
 
   /* Last: inexact. */
   if (FE_INEXACT & excepts)
diff --git a/sysdeps/s390/fpu/fsetexcptflg.c b/sysdeps/s390/fpu/fsetexcptflg.c
index 25ade85..56a52c6 100644
--- a/sysdeps/s390/fpu/fsetexcptflg.c
+++ b/sysdeps/s390/fpu/fsetexcptflg.c
@@ -45,8 +45,7 @@ fesetexceptflag (const fexcept_t *flagp, int excepts)
           & newexcepts;
 
   /* Store the new status word (along with the rest of the environment.
-     Possibly new exceptions are set but they won't get executed unless
-     the next floating-point instruction. */
+     Possibly new exceptions are set but they won't get executed. */
   _FPU_SETCW (temp);
 
   /* Success. */
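
As a side note on the two constants passed to fexceptround in the patch: the sketch below illustrates why 0x1p128 and 0x1p-150 are suitable inputs for a double-to-float conversion that should overflow or underflow. It assumes float is IEEE binary32, so FLT_MAX is just below 2^128 and the smallest subnormal is 0x1p-149. The helper names are hypothetical and exist only for this illustration; the code does not execute ledbra and says nothing about which flags the hardware sets.

```c
#include <float.h>

/* Hypothetical helper, for illustration only: nonzero if X is larger in
   magnitude than the largest finite float, so that converting it to
   float overflows.  */
int
exceeds_float_max (double x)
{
  return x > (double) FLT_MAX || x < -(double) FLT_MAX;
}

/* Hypothetical helper, for illustration only: nonzero if X is nonzero
   but smaller in magnitude than the smallest subnormal float
   (0x1p-149, assuming IEEE binary32), so that rounding toward zero
   yields 0.  */
int
below_float_subnormal (double x)
{
  double a = x < 0 ? -x : x;
  return a != 0.0 && a < 0x1p-149;
}
```

With these helpers, exceeds_float_max (0x1p128) and below_float_subnormal (0x1p-150) both hold, while the boundary values (double) FLT_MAX and 0x1p-149 are still representable as float.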