Message ID | 20240430124011.12408-1-adhemerval.zanella@linaro.org |
---|---|
State | New |
Headers | show |
Series | powerpc: Fix __fesetround_inline_nocheck on POWER9+ (BZ 31682) | expand |
On 4/30/24 7:40 AM, Adhemerval Zanella wrote: > The e68b1151f7460d5fa88c3a567c13f66052da79a7 commit changed the > __fesetround_inline_nocheck implementation to use mffscrni > (through __fe_mffscrn) instead of mtfsfi. For generic powerpc > ceil/floor/trunc, the function is supposed to not change the > Floating-Point Inexact Exception Enable bit, however mffscrni > clear bits 56:63 (VE, OE, UE, ZE, XE, NI, RN), different than mtfsfi. I don't think that explanation is correct. mffscrni should not alter the exception enable bits. It copies them into FRT, but does not clear them. From ISA 3.1 description of mffscrni: The contents of the control bits in the FPSCR, that is, bits 29:31 (DRN) and bits 56:63 (VE, OE, UE, ZE, XE, NI, RN), are placed into the corresponding bits in register FRT. All other bits in register FRT are set to 0. The contents of bits 62:63 of the FPSCR (RN) are set to the value of RM.
On 30/04/24 15:13, Paul E Murphy wrote: > > > On 4/30/24 7:40 AM, Adhemerval Zanella wrote: >> The e68b1151f7460d5fa88c3a567c13f66052da79a7 commit changed the >> __fesetround_inline_nocheck implementation to use mffscrni >> (through __fe_mffscrn) instead of mtfsfi. For generic powerpc >> ceil/floor/trunc, the function is supposed to not change the >> Floating-Point Inexact Exception Enable bit, however mffscrni >> clear bits 56:63 (VE, OE, UE, ZE, XE, NI, RN), different than mtfsfi. > > I don't think that explanation is correct. mffscrni should not alter the exception enable bits. It copies them into FRT, but does not clear them. Yeah, I forgot to add that mffscrni in this context clears because there is no easy way to mask out the current bits 56:63 since only the rounding bit is provided. So maybe: The e68b1151f7460d5fa88c3a567c13f66052da79a7 commit changed the __fesetround_inline_nocheck implementation to use mffscrni (through __fe_mffscrn) instead of mtfsfi. For generic powerpc ceil/floor/trunc, the function is supposed to not change the Floating-Point Inexact Exception Enable bit, however mffscrni usage always clear the the bits 56:63 (VE, OE, UE, ZE, XE, NI, RN), since only the rounding mode is provided. > > From ISA 3.1 description of mffscrni: > > The contents of the control bits in the FPSCR, that is, > bits 29:31 (DRN) and bits 56:63 (VE, OE, UE, ZE, XE, NI, RN), > are placed into the corresponding bits in register FRT. All > other bits in register FRT are set to 0. > > The contents of bits 62:63 of the FPSCR (RN) are set to > the value of RM.
On 4/30/24 1:34 PM, Adhemerval Zanella Netto wrote: > > > On 30/04/24 15:13, Paul E Murphy wrote: >> >> >> On 4/30/24 7:40 AM, Adhemerval Zanella wrote: >>> The e68b1151f7460d5fa88c3a567c13f66052da79a7 commit changed the >>> __fesetround_inline_nocheck implementation to use mffscrni >>> (through __fe_mffscrn) instead of mtfsfi. For generic powerpc >>> ceil/floor/trunc, the function is supposed to not change the >>> Floating-Point Inexact Exception Enable bit, however mffscrni >>> clear bits 56:63 (VE, OE, UE, ZE, XE, NI, RN), different than mtfsfi. >> >> I don't think that explanation is correct. mffscrni should not alter the exception enable bits. It copies them into FRT, but does not clear them. > > Yeah, I forgot to add that mffscrni in this context clears because > there is no easy way to mask out the current bits 56:63 since only > the rounding bit is provided. So maybe: > > The e68b1151f7460d5fa88c3a567c13f66052da79a7 commit changed the > __fesetround_inline_nocheck implementation to use mffscrni > (through __fe_mffscrn) instead of mtfsfi. For generic powerpc > ceil/floor/trunc, the function is supposed to not change the > Floating-Point Inexact Exception Enable bit, however mffscrni > usage always clear the the bits 56:63 (VE, OE, UE, ZE, XE, NI, RN), > since only the rounding mode is provided. I think the comment has mtfsfi and mffscrni swapped. The usage of mtfsfi clears bits 60 (XE) and 61 (NI) of the fpscr. mffscrni does not alter any fpscr bits besides RN. Should __fesetround_inline_nocheck be removed in favor of using __fesetround_inline? The description of __fesetround_inline_nocheck is confusing. It fails to mention that XE and NI are cleared, as implied by the usage of mtfsfi.
On 30/04/24 18:58, Paul E Murphy wrote: > > > On 4/30/24 1:34 PM, Adhemerval Zanella Netto wrote: >> >> >> On 30/04/24 15:13, Paul E Murphy wrote: >>> >>> >>> On 4/30/24 7:40 AM, Adhemerval Zanella wrote: >>>> The e68b1151f7460d5fa88c3a567c13f66052da79a7 commit changed the >>>> __fesetround_inline_nocheck implementation to use mffscrni >>>> (through __fe_mffscrn) instead of mtfsfi. For generic powerpc >>>> ceil/floor/trunc, the function is supposed to not change the >>>> Floating-Point Inexact Exception Enable bit, however mffscrni >>>> clear bits 56:63 (VE, OE, UE, ZE, XE, NI, RN), different than mtfsfi. >>> >>> I don't think that explanation is correct. mffscrni should not alter the exception enable bits. It copies them into FRT, but does not clear them. >> >> Yeah, I forgot to add that mffscrni in this context clears because >> there is no easy way to mask out the current bits 56:63 since only >> the rounding bit is provided. So maybe: >> >> The e68b1151f7460d5fa88c3a567c13f66052da79a7 commit changed the >> __fesetround_inline_nocheck implementation to use mffscrni >> (through __fe_mffscrn) instead of mtfsfi. For generic powerpc >> ceil/floor/trunc, the function is supposed to not change the >> Floating-Point Inexact Exception Enable bit, however mffscrni >> usage always clear the the bits 56:63 (VE, OE, UE, ZE, XE, NI, RN), >> since only the rounding mode is provided. > > I think the comment has mtfsfi and mffscrni swapped. The usage of mtfsfi clears bits 60 (XE) and 61 (NI) of the fpscr. mffscrni does not alter any fpscr bits besides RN. > > Should __fesetround_inline_nocheck be removed in favor of using __fesetround_inline? The description of __fesetround_inline_nocheck is confusing. It fails to mention that XE and NI are cleared, as implied by the usage of mtfsfi. The main issue is the use of __fe_mffscrn on POWER9, below it is a striped testcase from the generic ceil. If you run by building against POWER8: $ powerpc64le-glibc-linux-gnu-gcc -O2 -g -std=gnu11 -mcpu=power8 -D_GNU_SOURCE test.c -o test-pwr8 -lm $ ./test-pwr8 0000000000000000000000000000000000000000000000000000000000001000 0000000000000000000000000000000000000000000000000000000000000010 Now with POWER9: $ powerpc64le-glibc-linux-gnu-gcc -O2 -g -std=gnu11 -mcpu=power9 -D_GNU_SOURCE test.c -o test-pwr9 -lm $ ./test-pwr9 0000000000000000000000000000000000000000000000000000000000001000 0000000000000000000000000000000000000000000000000000000000001010 Floating point exception You can see that for POWER8, the XE bits it cleared as expected by __fesetround_inline_nocheck (because ceil/floor/trunc should not trigger any floating-point exception), while for POWER9, as you said, only the rounding bits are changed. --- #include <assert.h> #include <stdio.h> #include <stdint.h> #include <fenv.h> #include <math.h> typedef union { fenv_t fenv; unsigned long long l; } fenv_union_t; #ifdef __SET_FPSCR_RN_RETURNS_FPSCR__ #define __fe_mffscrn(rn) __builtin_set_fpscr_rn (rn) #else #define __fe_mffscrn(rn) \ ({register fenv_union_t __fr; \ if (__builtin_constant_p (rn)) \ __asm__ __volatile__ ( \ ".machine push; .machine \"power9\"; mffscrni %0,%1; .machine pop" \ : "=f" (__fr.fenv) : "n" (rn)); \ else \ { \ __fr.l = (rn); \ __asm__ __volatile__ ( \ ".machine push; .machine \"power9\"; mffscrn %0,%1; .machine pop" \ : "=f" (__fr.fenv) : "f" (__fr.fenv)); \ } \ __fr.fenv; \ }) #endif static void print_fpscr (void) { fenv_union_t fe = { .fenv = __builtin_mffs () }; printf ("%064b\n", fe.l); } static __always_inline void __fesetround_inline_nocheck (const int round) { #ifdef _ARCH_PWR9 __fe_mffscrn (round); #else asm volatile ("mtfsfi 7,%0" : : "n" (round)); #endif } enum round_mode { CEIL }; #define fegetenv_register() __builtin_mffs() static inline fenv_t set_fenv_mode (enum round_mode mode) { fenv_t fe = fegetenv_register (); __fesetround_inline_nocheck (FE_UPWARD); return fe; } static inline void reset_fenv_mode (fenv_t fe, enum round_mode mode) { __builtin_mtfsf (0xff, fe); } static inline double round_to_integer_double (enum round_mode mode, double x) { double r = x; /* Save current FPU rounding mode and inexact state. */ fenv_t fe = set_fenv_mode (mode); print_fpscr (); if (x > 0.0) { r += 0x1p+52; r -= 0x1p+52; r = fabs (r); } else if (x < 0.0) { r -= 0x1p+52; r += 0x1p+52; r = -fabs (r); } reset_fenv_mode (fe, mode); return r; } double __ceil (double x) { return round_to_integer_double (CEIL, x); } int main (int argc, char *argv[]) { assert (feenableexcept (FE_INEXACT) != -1); volatile double a, b __attribute__ ((unused)); a = 1.5; b = __ceil (a); return 0; } ---
On 5/1/24 9:01 AM, Adhemerval Zanella Netto wrote: > > > On 30/04/24 18:58, Paul E Murphy wrote: >> >> >> On 4/30/24 1:34 PM, Adhemerval Zanella Netto wrote: >>> >>> >>> On 30/04/24 15:13, Paul E Murphy wrote: >>>> >>>> >>>> On 4/30/24 7:40 AM, Adhemerval Zanella wrote: >>>>> The e68b1151f7460d5fa88c3a567c13f66052da79a7 commit changed the >>>>> __fesetround_inline_nocheck implementation to use mffscrni >>>>> (through __fe_mffscrn) instead of mtfsfi. For generic powerpc >>>>> ceil/floor/trunc, the function is supposed to not change the >>>>> Floating-Point Inexact Exception Enable bit, however mffscrni >>>>> clear bits 56:63 (VE, OE, UE, ZE, XE, NI, RN), different than mtfsfi. >>>> >>>> I don't think that explanation is correct. mffscrni should not alter the exception enable bits. It copies them into FRT, but does not clear them. >>> >>> Yeah, I forgot to add that mffscrni in this context clears because >>> there is no easy way to mask out the current bits 56:63 since only >>> the rounding bit is provided. So maybe: >>> >>> The e68b1151f7460d5fa88c3a567c13f66052da79a7 commit changed the >>> __fesetround_inline_nocheck implementation to use mffscrni >>> (through __fe_mffscrn) instead of mtfsfi. For generic powerpc >>> ceil/floor/trunc, the function is supposed to not change the >>> Floating-Point Inexact Exception Enable bit, however mffscrni >>> usage always clear the the bits 56:63 (VE, OE, UE, ZE, XE, NI, RN), >>> since only the rounding mode is provided. >> >> I think the comment has mtfsfi and mffscrni swapped. The usage of mtfsfi clears bits 60 (XE) and 61 (NI) of the fpscr. mffscrni does not alter any fpscr bits besides RN. >> >> Should __fesetround_inline_nocheck be removed in favor of using __fesetround_inline? The description of __fesetround_inline_nocheck is confusing. It fails to mention that XE and NI are cleared, as implied by the usage of mtfsfi. > The main issue is the use of __fe_mffscrn on POWER9, below it is a striped > testcase from the generic ceil. If you run by building against POWER8: > > $ powerpc64le-glibc-linux-gnu-gcc -O2 -g -std=gnu11 -mcpu=power8 -D_GNU_SOURCE test.c -o test-pwr8 -lm > $ ./test-pwr8 > 0000000000000000000000000000000000000000000000000000000000001000 > 0000000000000000000000000000000000000000000000000000000000000010 > > Now with POWER9: > $ powerpc64le-glibc-linux-gnu-gcc -O2 -g -std=gnu11 -mcpu=power9 -D_GNU_SOURCE test.c -o test-pwr9 -lm > $ ./test-pwr9 > 0000000000000000000000000000000000000000000000000000000000001000 > 0000000000000000000000000000000000000000000000000000000000001010 > Floating point exception > > You can see that for POWER8, the XE bits it cleared as expected by > __fesetround_inline_nocheck (because ceil/floor/trunc should not trigger > any floating-point exception), while for POWER9, as you said, only the > rounding bits are changed. The commit message is still hard to digest. The caller expects the inexact exception to be disable upon return. mffscrni does not change any exception enable bits. Can you also update the description (and maybe name) of the function to explicitly state this? Likewise, note that we expect the NI bit to be 0 always. Otherwise, this patch LGTM.
On 03/05/24 15:30, Paul E Murphy wrote: > > > On 5/1/24 9:01 AM, Adhemerval Zanella Netto wrote: >> >> >> On 30/04/24 18:58, Paul E Murphy wrote: >>> >>> >>> On 4/30/24 1:34 PM, Adhemerval Zanella Netto wrote: >>>> >>>> >>>> On 30/04/24 15:13, Paul E Murphy wrote: >>>>> >>>>> >>>>> On 4/30/24 7:40 AM, Adhemerval Zanella wrote: >>>>>> The e68b1151f7460d5fa88c3a567c13f66052da79a7 commit changed the >>>>>> __fesetround_inline_nocheck implementation to use mffscrni >>>>>> (through __fe_mffscrn) instead of mtfsfi. For generic powerpc >>>>>> ceil/floor/trunc, the function is supposed to not change the >>>>>> Floating-Point Inexact Exception Enable bit, however mffscrni >>>>>> clear bits 56:63 (VE, OE, UE, ZE, XE, NI, RN), different than mtfsfi. >>>>> >>>>> I don't think that explanation is correct. mffscrni should not alter the exception enable bits. It copies them into FRT, but does not clear them. >>>> >>>> Yeah, I forgot to add that mffscrni in this context clears because >>>> there is no easy way to mask out the current bits 56:63 since only >>>> the rounding bit is provided. So maybe: >>>> >>>> The e68b1151f7460d5fa88c3a567c13f66052da79a7 commit changed the >>>> __fesetround_inline_nocheck implementation to use mffscrni >>>> (through __fe_mffscrn) instead of mtfsfi. For generic powerpc >>>> ceil/floor/trunc, the function is supposed to not change the >>>> Floating-Point Inexact Exception Enable bit, however mffscrni >>>> usage always clear the the bits 56:63 (VE, OE, UE, ZE, XE, NI, RN), >>>> since only the rounding mode is provided. >>> >>> I think the comment has mtfsfi and mffscrni swapped. The usage of mtfsfi clears bits 60 (XE) and 61 (NI) of the fpscr. mffscrni does not alter any fpscr bits besides RN. >>> >>> Should __fesetround_inline_nocheck be removed in favor of using __fesetround_inline? The description of __fesetround_inline_nocheck is confusing. It fails to mention that XE and NI are cleared, as implied by the usage of mtfsfi. >> The main issue is the use of __fe_mffscrn on POWER9, below it is a striped >> testcase from the generic ceil. If you run by building against POWER8: >> >> $ powerpc64le-glibc-linux-gnu-gcc -O2 -g -std=gnu11 -mcpu=power8 -D_GNU_SOURCE test.c -o test-pwr8 -lm >> $ ./test-pwr8 >> 0000000000000000000000000000000000000000000000000000000000001000 >> 0000000000000000000000000000000000000000000000000000000000000010 >> >> Now with POWER9: >> $ powerpc64le-glibc-linux-gnu-gcc -O2 -g -std=gnu11 -mcpu=power9 -D_GNU_SOURCE test.c -o test-pwr9 -lm >> $ ./test-pwr9 >> 0000000000000000000000000000000000000000000000000000000000001000 >> 0000000000000000000000000000000000000000000000000000000000001010 >> Floating point exception >> >> You can see that for POWER8, the XE bits it cleared as expected by >> __fesetround_inline_nocheck (because ceil/floor/trunc should not trigger >> any floating-point exception), while for POWER9, as you said, only the >> rounding bits are changed. > > The commit message is still hard to digest. The caller expects the inexact exception to be disable upon return. mffscrni does not change any exception enable bits. Can you also update the description (and maybe name) of the function to explicitly state this? Likewise, note that we expect the NI bit to be 0 always. I give you that both the name and its description is ideal, I will update the patch. > > Otherwise, this patch LGTM.
diff --git a/sysdeps/powerpc/fpu/fenv_libc.h b/sysdeps/powerpc/fpu/fenv_libc.h index f9167056a8..afcf8d2e61 100644 --- a/sysdeps/powerpc/fpu/fenv_libc.h +++ b/sysdeps/powerpc/fpu/fenv_libc.h @@ -182,19 +182,13 @@ __fesetround_inline (int round) return 0; } -/* Same as __fesetround_inline, however without runtime check to use DFP - mtfsfi syntax (as relax_fenv_state) or if round value is valid. */ +/* Same as __fesetround_inline, but it does not change the floating-point + exceptions bits, and without runtime check to use DFP mtfsfi syntax + (as relax_fenv_state), or if round value is valid. */ static inline void __fesetround_inline_nocheck (const int round) { -#ifdef _ARCH_PWR9 - __fe_mffscrn (round); -#else - if (__glibc_likely (GLRO(dl_hwcap2) & PPC_FEATURE2_ARCH_3_00)) - __fe_mffscrn (round); - else - asm volatile ("mtfsfi 7,%0" : : "n" (round)); -#endif + asm volatile ("mtfsfi 7,%0" : : "n" (round)); } #define FPSCR_MASK(bit) (1 << (31 - (bit)))