Message ID | 54627990.1060805@linux.vnet.ibm.com |
---|---|
State | New |
Headers | show |
On Tue, 11 Nov 2014, Adhemerval Zanella wrote: > + mtfsb0 4*cr7+lt /* Disable FE_INEXACT exception */ Disable trapping? If you disable trapping, you need to restore the previous state of that bit before returning. > + fdiv fp0,fp1,fp2 /* fp0 = -trunc (fp1 / fp2) */ > + fneg fp0,fp0 > + friz fp0,fp0 > + fmadd fp2,fp0,fp2,fp1 /* fp2 = x - (fp0) * y */ Are you sure this is correct when the integer quotient is not exactly representable in IEEE double? I'd expect you to need a loop in the case of large exponent differences. > + fcpsgn fp1,fp1,fp2 > + mtfsb0 4*cr1+eq /* Clear any FE_INEXACT exception */ Clear any FE_INEXACT bit, whether or not set before the function? It's not correct for <math.h> functions to clear exception bits that were already set - only bits they set, if previously clear. (See bug 15491 for such a bug in x86 nearbyint.)
On 11-11-2014 19:28, Joseph Myers wrote: > On Tue, 11 Nov 2014, Adhemerval Zanella wrote: > >> + mtfsb0 4*cr7+lt /* Disable FE_INEXACT exception */ > Disable trapping? If you disable trapping, you need to restore the > previous state of that bit before returning. > >> + fdiv fp0,fp1,fp2 /* fp0 = -trunc (fp1 / fp2) */ >> + fneg fp0,fp0 >> + friz fp0,fp0 >> + fmadd fp2,fp0,fp2,fp1 /* fp2 = x - (fp0) * y */ > Are you sure this is correct when the integer quotient is not exactly > representable in IEEE double? I'd expect you to need a loop in the case > of large exponent differences. Indeed I overlook it, as I am seeing now with a more extensive testing. Checking now, it seems using a FP division operation in the loop to correct it not the best strategy to achieve better performance. >> + fcpsgn fp1,fp1,fp2 >> + mtfsb0 4*cr1+eq /* Clear any FE_INEXACT exception */ > Clear any FE_INEXACT bit, whether or not set before the function? It's > not correct for <math.h> functions to clear exception bits that were > already set - only bits they set, if previously clear. (See bug 15491 for > such a bug in x86 nearbyint.) >
On Wed, 12 Nov 2014, Adhemerval Zanella wrote: > Indeed I overlook it, as I am seeing now with a more extensive testing. > Checking now, it seems using a FP division operation in the loop to > correct it not the best strategy to achieve better performance. And there are other problems with the division I didn't notice previously. If the exponent difference is large enough, the division may overflow. And you're assuming the result of (round to next integer toward 0 (divide and round in current rounding mode)) is the same as (divide in infinite precision, round once to an integer toward 0) - but depending on the arguments and rounding mode, the result of the division may be rounded away from 0 to an integer with magnitude greater than the desired result.
On 12-11-2014 14:19, Joseph Myers wrote: > On Wed, 12 Nov 2014, Adhemerval Zanella wrote: > >> Indeed I overlook it, as I am seeing now with a more extensive testing. >> Checking now, it seems using a FP division operation in the loop to >> correct it not the best strategy to achieve better performance. > And there are other problems with the division I didn't notice previously. > If the exponent difference is large enough, the division may overflow. > And you're assuming the result of (round to next integer toward 0 (divide > and round in current rounding mode)) is the same as (divide in infinite > precision, round once to an integer toward 0) - but depending on the > arguments and rounding mode, the result of the division may be rounded > away from 0 to an integer with magnitude greater than the desired result. > Yes, I noted that my naive approach is far from correct. I'm dropping this patch, thanks for the review.
diff --git a/sysdeps/powerpc/powerpc64/power8/fpu/e_fmod.S b/sysdeps/powerpc/powerpc64/power8/fpu/e_fmod.S new file mode 100644 index 0000000..e9b35a2 --- /dev/null +++ b/sysdeps/powerpc/powerpc64/power8/fpu/e_fmod.S @@ -0,0 +1,46 @@ +/* Finite fmod optimization - PowerPC64/POWER8 version. + Copyright (C) 2014 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <http://www.gnu.org/licenses/>. */ + +#include <sysdep.h> +#include <math_ldbl_opt.h> + +#define MFVSRD_R3_V2 .long 0x7c430066 /* mfvsrd r3,vs1 */ + +/* double [fp1] __ieee754_fmod (double [fp1] x, double [fp2] y) */ + + .machine power8 +ENTRY(__ieee754_fmod) + /* First check if 'y' is InF and return 'x' if it is the case. */ + MFVSRD_R3_V2 + lis r9,0x7ff0 /* r9 = 0x7ff0 */ + rldicl r10,r3,0,1 /* r10 = r3 & (0x8000000000000000) */ + sldi r9,r9,32 /* r9 = r9 << 52 */ + cmpd cr7,r10,r9 /* fp1 & 0x7ff0000000000000 ? */ + beqlr cr7 + + mtfsb0 4*cr7+lt /* Disable FE_INEXACT exception */ + fdiv fp0,fp1,fp2 /* fp0 = -trunc (fp1 / fp2) */ + fneg fp0,fp0 + friz fp0,fp0 + fmadd fp2,fp0,fp2,fp1 /* fp2 = x - (fp0) * y */ + fcpsgn fp1,fp1,fp2 + mtfsb0 4*cr1+eq /* Clear any FE_INEXACT exception */ + blr +END (__ieee754_fmod) + +strong_alias (__ieee754_fmod, __fmod_finite) diff --git a/sysdeps/powerpc/powerpc64/power8/fpu/e_fmodf.S b/sysdeps/powerpc/powerpc64/power8/fpu/e_fmodf.S new file mode 100644 index 0000000..7e16374 --- /dev/null +++ b/sysdeps/powerpc/powerpc64/power8/fpu/e_fmodf.S @@ -0,0 +1,46 @@ +/* Finite fmodf optimization - PowerPC64/POWER8 version. + Copyright (C) 2014 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <http://www.gnu.org/licenses/>. */ + +#include <sysdep.h> +#include <math_ldbl_opt.h> + +#define MFVSRD_R3_V2 .long 0x7c430066 /* mfvsrd r3,vs1 */ + +/* double [fp1] __ieee754_fmod (double [fp1] x, double [fp2] y) */ + + .machine power8 +ENTRY(__ieee754_fmodf) + /* First check if 'y' is InF and return 'x' if it is the case. */ + MFVSRD_R3_V2 + lis r9,0x7ff0 /* r9 = 0x7ff0 */ + rldicl r10,r3,0,1 /* r10 = r3 & (0x8000000000000000) */ + sldi r9,r9,32 /* r9 = r9 << 52 */ + cmpd cr7,r10,r9 /* fp1 & 0x7ff0000000000000 ? */ + beqlr cr7 + + mtfsb0 4*cr7+lt /* Disable FE_INEXACT exception */ + fdivs fp0,fp1,fp2 /* fp0 = -trunc (fp1 / fp2) */ + fneg fp0,fp0 + friz fp0,fp0 + fmadds fp2,fp0,fp2,fp1 /* fp2 = x - (fp0) * y */ + fcpsgn fp1,fp1,fp2 + mtfsb0 4*cr1+eq /* Clear any FE_INEXACT exception */ + blr +END (__ieee754_fmodf) + +strong_alias (__ieee754_fmodf, __fmodf_finite)