diff mbox series

powerpc: Fix __fesetround_inline_nocheck on POWER9+ (BZ 31682)

Message ID 20240430124011.12408-1-adhemerval.zanella@linaro.org
State New
Headers show
Series powerpc: Fix __fesetround_inline_nocheck on POWER9+ (BZ 31682) | expand

Commit Message

Adhemerval Zanella Netto April 30, 2024, 12:40 p.m. UTC
The e68b1151f7460d5fa88c3a567c13f66052da79a7 commit changed the
__fesetround_inline_nocheck implementation to use mffscrni
(through __fe_mffscrn) instead of mtfsfi.  For generic powerpc
ceil/floor/trunc, the function is supposed to not change the
Floating-Point Inexact Exception Enable bit, however mffscrni
clear bits 56:63 (VE, OE, UE, ZE, XE, NI, RN), different than mtfsfi.

This patch fixes by reverting the optimization for the
__fesetround_inline_nocheck.

Checked on powerpc-linux-gnu.
---
 sysdeps/powerpc/fpu/fenv_libc.h | 14 ++++----------
 1 file changed, 4 insertions(+), 10 deletions(-)

Comments

Paul E. Murphy April 30, 2024, 6:13 p.m. UTC | #1
On 4/30/24 7:40 AM, Adhemerval Zanella wrote:
> The e68b1151f7460d5fa88c3a567c13f66052da79a7 commit changed the
> __fesetround_inline_nocheck implementation to use mffscrni
> (through __fe_mffscrn) instead of mtfsfi.  For generic powerpc
> ceil/floor/trunc, the function is supposed to not change the
> Floating-Point Inexact Exception Enable bit, however mffscrni
> clear bits 56:63 (VE, OE, UE, ZE, XE, NI, RN), different than mtfsfi.

I don't think that explanation is correct. mffscrni should not alter the 
exception enable bits. It copies them into FRT, but does not clear them.

 From ISA 3.1 description of mffscrni:

The contents of the control bits in the FPSCR, that is,
bits 29:31 (DRN) and bits 56:63 (VE, OE, UE, ZE, XE, NI, RN),
are placed into the corresponding bits in register FRT. All
other bits in register FRT are set to 0.

The contents of bits 62:63 of the FPSCR (RN) are set to
the value of RM.
Adhemerval Zanella Netto April 30, 2024, 6:34 p.m. UTC | #2
On 30/04/24 15:13, Paul E Murphy wrote:
> 
> 
> On 4/30/24 7:40 AM, Adhemerval Zanella wrote:
>> The e68b1151f7460d5fa88c3a567c13f66052da79a7 commit changed the
>> __fesetround_inline_nocheck implementation to use mffscrni
>> (through __fe_mffscrn) instead of mtfsfi.  For generic powerpc
>> ceil/floor/trunc, the function is supposed to not change the
>> Floating-Point Inexact Exception Enable bit, however mffscrni
>> clear bits 56:63 (VE, OE, UE, ZE, XE, NI, RN), different than mtfsfi.
> 
> I don't think that explanation is correct. mffscrni should not alter the exception enable bits. It copies them into FRT, but does not clear them.

Yeah, I forgot to add that mffscrni in this context clears because
there is no easy way to mask out the current bits 56:63 since only
the rounding bit is provided.  So maybe:

  The e68b1151f7460d5fa88c3a567c13f66052da79a7 commit changed the
  __fesetround_inline_nocheck implementation to use mffscrni
  (through __fe_mffscrn) instead of mtfsfi.  For generic powerpc
  ceil/floor/trunc, the function is supposed to not change the
  Floating-Point Inexact Exception Enable bit, however mffscrni
  usage always clear the the bits 56:63 (VE, OE, UE, ZE, XE, NI, RN),
  since only the rounding mode is provided.

> 
> From ISA 3.1 description of mffscrni:
> 
> The contents of the control bits in the FPSCR, that is,
> bits 29:31 (DRN) and bits 56:63 (VE, OE, UE, ZE, XE, NI, RN),
> are placed into the corresponding bits in register FRT. All
> other bits in register FRT are set to 0.
> 
> The contents of bits 62:63 of the FPSCR (RN) are set to
> the value of RM.
Paul E. Murphy April 30, 2024, 9:58 p.m. UTC | #3
On 4/30/24 1:34 PM, Adhemerval Zanella Netto wrote:
> 
> 
> On 30/04/24 15:13, Paul E Murphy wrote:
>>
>>
>> On 4/30/24 7:40 AM, Adhemerval Zanella wrote:
>>> The e68b1151f7460d5fa88c3a567c13f66052da79a7 commit changed the
>>> __fesetround_inline_nocheck implementation to use mffscrni
>>> (through __fe_mffscrn) instead of mtfsfi.  For generic powerpc
>>> ceil/floor/trunc, the function is supposed to not change the
>>> Floating-Point Inexact Exception Enable bit, however mffscrni
>>> clear bits 56:63 (VE, OE, UE, ZE, XE, NI, RN), different than mtfsfi.
>>
>> I don't think that explanation is correct. mffscrni should not alter the exception enable bits. It copies them into FRT, but does not clear them.
> 
> Yeah, I forgot to add that mffscrni in this context clears because
> there is no easy way to mask out the current bits 56:63 since only
> the rounding bit is provided.  So maybe:
> 
>    The e68b1151f7460d5fa88c3a567c13f66052da79a7 commit changed the
>    __fesetround_inline_nocheck implementation to use mffscrni
>    (through __fe_mffscrn) instead of mtfsfi.  For generic powerpc
>    ceil/floor/trunc, the function is supposed to not change the
>    Floating-Point Inexact Exception Enable bit, however mffscrni
>    usage always clear the the bits 56:63 (VE, OE, UE, ZE, XE, NI, RN),
>    since only the rounding mode is provided.

I think the comment has mtfsfi and mffscrni swapped.  The usage of 
mtfsfi clears bits 60 (XE) and 61 (NI) of the fpscr.  mffscrni does not 
alter any fpscr bits besides RN.

Should __fesetround_inline_nocheck be removed in favor of using 
__fesetround_inline?  The description of __fesetround_inline_nocheck is 
confusing. It fails to mention that XE and NI are cleared, as implied by 
the usage of mtfsfi.
Adhemerval Zanella Netto May 1, 2024, 2:01 p.m. UTC | #4
On 30/04/24 18:58, Paul E Murphy wrote:
> 
> 
> On 4/30/24 1:34 PM, Adhemerval Zanella Netto wrote:
>>
>>
>> On 30/04/24 15:13, Paul E Murphy wrote:
>>>
>>>
>>> On 4/30/24 7:40 AM, Adhemerval Zanella wrote:
>>>> The e68b1151f7460d5fa88c3a567c13f66052da79a7 commit changed the
>>>> __fesetround_inline_nocheck implementation to use mffscrni
>>>> (through __fe_mffscrn) instead of mtfsfi.  For generic powerpc
>>>> ceil/floor/trunc, the function is supposed to not change the
>>>> Floating-Point Inexact Exception Enable bit, however mffscrni
>>>> clear bits 56:63 (VE, OE, UE, ZE, XE, NI, RN), different than mtfsfi.
>>>
>>> I don't think that explanation is correct. mffscrni should not alter the exception enable bits. It copies them into FRT, but does not clear them.
>>
>> Yeah, I forgot to add that mffscrni in this context clears because
>> there is no easy way to mask out the current bits 56:63 since only
>> the rounding bit is provided.  So maybe:
>>
>>    The e68b1151f7460d5fa88c3a567c13f66052da79a7 commit changed the
>>    __fesetround_inline_nocheck implementation to use mffscrni
>>    (through __fe_mffscrn) instead of mtfsfi.  For generic powerpc
>>    ceil/floor/trunc, the function is supposed to not change the
>>    Floating-Point Inexact Exception Enable bit, however mffscrni
>>    usage always clear the the bits 56:63 (VE, OE, UE, ZE, XE, NI, RN),
>>    since only the rounding mode is provided.
> 
> I think the comment has mtfsfi and mffscrni swapped.  The usage of mtfsfi clears bits 60 (XE) and 61 (NI) of the fpscr.  mffscrni does not alter any fpscr bits besides RN.
> 
> Should __fesetround_inline_nocheck be removed in favor of using __fesetround_inline?  The description of __fesetround_inline_nocheck is confusing. It fails to mention that XE and NI are cleared, as implied by the usage of mtfsfi.
The main issue is the use of __fe_mffscrn on POWER9, below it is a striped
testcase from the generic ceil.  If you run by building against POWER8:

$ powerpc64le-glibc-linux-gnu-gcc -O2 -g -std=gnu11 -mcpu=power8 -D_GNU_SOURCE test.c -o test-pwr8 -lm
$ ./test-pwr8
0000000000000000000000000000000000000000000000000000000000001000
0000000000000000000000000000000000000000000000000000000000000010

Now with POWER9:
$ powerpc64le-glibc-linux-gnu-gcc -O2 -g -std=gnu11 -mcpu=power9 -D_GNU_SOURCE test.c -o test-pwr9 -lm
$ ./test-pwr9
0000000000000000000000000000000000000000000000000000000000001000
0000000000000000000000000000000000000000000000000000000000001010
Floating point exception

You can see that for POWER8, the XE bits it cleared as expected by
__fesetround_inline_nocheck (because ceil/floor/trunc should not trigger 
any floating-point exception), while for POWER9, as you said, only the 
rounding bits are changed.

---
#include <assert.h>
#include <stdio.h>
#include <stdint.h>
#include <fenv.h>
#include <math.h>

typedef union
{
  fenv_t fenv;
  unsigned long long l;
} fenv_union_t;

#ifdef __SET_FPSCR_RN_RETURNS_FPSCR__
#define __fe_mffscrn(rn)  __builtin_set_fpscr_rn (rn)
#else
#define __fe_mffscrn(rn)                                                \
  ({register fenv_union_t __fr;                                         \
    if (__builtin_constant_p (rn))                                      \
      __asm__ __volatile__ (                                            \
        ".machine push; .machine \"power9\"; mffscrni %0,%1; .machine pop" \
        : "=f" (__fr.fenv) : "n" (rn));                                 \
    else                                                                \
    {                                                                   \
      __fr.l = (rn);                                                    \
      __asm__ __volatile__ (                                            \
        ".machine push; .machine \"power9\"; mffscrn %0,%1; .machine pop" \
        : "=f" (__fr.fenv) : "f" (__fr.fenv));                          \
    }                                                                   \
    __fr.fenv;                                                          \
  })
#endif

static void
print_fpscr (void)
{
  fenv_union_t fe = { .fenv = __builtin_mffs () };
  printf ("%064b\n", fe.l);
}

static __always_inline void
__fesetround_inline_nocheck (const int round)
{
#ifdef _ARCH_PWR9
  __fe_mffscrn (round);
#else
  asm volatile ("mtfsfi 7,%0" : : "n" (round));
#endif
}

enum round_mode { CEIL };

#define fegetenv_register() __builtin_mffs()

static inline fenv_t
set_fenv_mode (enum round_mode mode)
{
  fenv_t fe = fegetenv_register ();
  __fesetround_inline_nocheck (FE_UPWARD);
  return fe;
}

static inline void
reset_fenv_mode (fenv_t fe, enum round_mode mode)
{
  __builtin_mtfsf (0xff, fe);
}

static inline double
round_to_integer_double (enum round_mode mode, double x)
{
  double r = x;

  /* Save current FPU rounding mode and inexact state.  */
  fenv_t fe = set_fenv_mode (mode);
  print_fpscr ();
  if (x > 0.0)
    {
      r += 0x1p+52;
      r -= 0x1p+52;
      r = fabs (r);
    }
  else if (x < 0.0)
    {
      r -= 0x1p+52;
      r += 0x1p+52;
      r = -fabs (r);
    }
  reset_fenv_mode (fe, mode);

  return r;
}

double
__ceil (double x)
{
  return round_to_integer_double (CEIL, x);
}

int main (int argc, char *argv[])
{
  assert (feenableexcept (FE_INEXACT) != -1);

  volatile double a, b __attribute__ ((unused));
  a = 1.5;
  b = __ceil (a);

  return 0;
}
---
Paul E. Murphy May 3, 2024, 6:30 p.m. UTC | #5
On 5/1/24 9:01 AM, Adhemerval Zanella Netto wrote:
> 
> 
> On 30/04/24 18:58, Paul E Murphy wrote:
>>
>>
>> On 4/30/24 1:34 PM, Adhemerval Zanella Netto wrote:
>>>
>>>
>>> On 30/04/24 15:13, Paul E Murphy wrote:
>>>>
>>>>
>>>> On 4/30/24 7:40 AM, Adhemerval Zanella wrote:
>>>>> The e68b1151f7460d5fa88c3a567c13f66052da79a7 commit changed the
>>>>> __fesetround_inline_nocheck implementation to use mffscrni
>>>>> (through __fe_mffscrn) instead of mtfsfi.  For generic powerpc
>>>>> ceil/floor/trunc, the function is supposed to not change the
>>>>> Floating-Point Inexact Exception Enable bit, however mffscrni
>>>>> clear bits 56:63 (VE, OE, UE, ZE, XE, NI, RN), different than mtfsfi.
>>>>
>>>> I don't think that explanation is correct. mffscrni should not alter the exception enable bits. It copies them into FRT, but does not clear them.
>>>
>>> Yeah, I forgot to add that mffscrni in this context clears because
>>> there is no easy way to mask out the current bits 56:63 since only
>>> the rounding bit is provided.  So maybe:
>>>
>>>     The e68b1151f7460d5fa88c3a567c13f66052da79a7 commit changed the
>>>     __fesetround_inline_nocheck implementation to use mffscrni
>>>     (through __fe_mffscrn) instead of mtfsfi.  For generic powerpc
>>>     ceil/floor/trunc, the function is supposed to not change the
>>>     Floating-Point Inexact Exception Enable bit, however mffscrni
>>>     usage always clear the the bits 56:63 (VE, OE, UE, ZE, XE, NI, RN),
>>>     since only the rounding mode is provided.
>>
>> I think the comment has mtfsfi and mffscrni swapped.  The usage of mtfsfi clears bits 60 (XE) and 61 (NI) of the fpscr.  mffscrni does not alter any fpscr bits besides RN.
>>
>> Should __fesetround_inline_nocheck be removed in favor of using __fesetround_inline?  The description of __fesetround_inline_nocheck is confusing. It fails to mention that XE and NI are cleared, as implied by the usage of mtfsfi.
> The main issue is the use of __fe_mffscrn on POWER9, below it is a striped
> testcase from the generic ceil.  If you run by building against POWER8:
> 
> $ powerpc64le-glibc-linux-gnu-gcc -O2 -g -std=gnu11 -mcpu=power8 -D_GNU_SOURCE test.c -o test-pwr8 -lm
> $ ./test-pwr8
> 0000000000000000000000000000000000000000000000000000000000001000
> 0000000000000000000000000000000000000000000000000000000000000010
> 
> Now with POWER9:
> $ powerpc64le-glibc-linux-gnu-gcc -O2 -g -std=gnu11 -mcpu=power9 -D_GNU_SOURCE test.c -o test-pwr9 -lm
> $ ./test-pwr9
> 0000000000000000000000000000000000000000000000000000000000001000
> 0000000000000000000000000000000000000000000000000000000000001010
> Floating point exception
> 
> You can see that for POWER8, the XE bits it cleared as expected by
> __fesetround_inline_nocheck (because ceil/floor/trunc should not trigger
> any floating-point exception), while for POWER9, as you said, only the
> rounding bits are changed.

The commit message is still hard to digest.  The caller expects the 
inexact exception to be disable upon return.  mffscrni does not change 
any exception enable bits.  Can you also update the description (and 
maybe name) of the function to explicitly state this?  Likewise, note 
that we expect the NI bit to be 0 always.

Otherwise, this patch LGTM.
Adhemerval Zanella Netto May 6, 2024, 5:08 p.m. UTC | #6
On 03/05/24 15:30, Paul E Murphy wrote:
> 
> 
> On 5/1/24 9:01 AM, Adhemerval Zanella Netto wrote:
>>
>>
>> On 30/04/24 18:58, Paul E Murphy wrote:
>>>
>>>
>>> On 4/30/24 1:34 PM, Adhemerval Zanella Netto wrote:
>>>>
>>>>
>>>> On 30/04/24 15:13, Paul E Murphy wrote:
>>>>>
>>>>>
>>>>> On 4/30/24 7:40 AM, Adhemerval Zanella wrote:
>>>>>> The e68b1151f7460d5fa88c3a567c13f66052da79a7 commit changed the
>>>>>> __fesetround_inline_nocheck implementation to use mffscrni
>>>>>> (through __fe_mffscrn) instead of mtfsfi.  For generic powerpc
>>>>>> ceil/floor/trunc, the function is supposed to not change the
>>>>>> Floating-Point Inexact Exception Enable bit, however mffscrni
>>>>>> clear bits 56:63 (VE, OE, UE, ZE, XE, NI, RN), different than mtfsfi.
>>>>>
>>>>> I don't think that explanation is correct. mffscrni should not alter the exception enable bits. It copies them into FRT, but does not clear them.
>>>>
>>>> Yeah, I forgot to add that mffscrni in this context clears because
>>>> there is no easy way to mask out the current bits 56:63 since only
>>>> the rounding bit is provided.  So maybe:
>>>>
>>>>     The e68b1151f7460d5fa88c3a567c13f66052da79a7 commit changed the
>>>>     __fesetround_inline_nocheck implementation to use mffscrni
>>>>     (through __fe_mffscrn) instead of mtfsfi.  For generic powerpc
>>>>     ceil/floor/trunc, the function is supposed to not change the
>>>>     Floating-Point Inexact Exception Enable bit, however mffscrni
>>>>     usage always clear the the bits 56:63 (VE, OE, UE, ZE, XE, NI, RN),
>>>>     since only the rounding mode is provided.
>>>
>>> I think the comment has mtfsfi and mffscrni swapped.  The usage of mtfsfi clears bits 60 (XE) and 61 (NI) of the fpscr.  mffscrni does not alter any fpscr bits besides RN.
>>>
>>> Should __fesetround_inline_nocheck be removed in favor of using __fesetround_inline?  The description of __fesetround_inline_nocheck is confusing. It fails to mention that XE and NI are cleared, as implied by the usage of mtfsfi.
>> The main issue is the use of __fe_mffscrn on POWER9, below it is a striped
>> testcase from the generic ceil.  If you run by building against POWER8:
>>
>> $ powerpc64le-glibc-linux-gnu-gcc -O2 -g -std=gnu11 -mcpu=power8 -D_GNU_SOURCE test.c -o test-pwr8 -lm
>> $ ./test-pwr8
>> 0000000000000000000000000000000000000000000000000000000000001000
>> 0000000000000000000000000000000000000000000000000000000000000010
>>
>> Now with POWER9:
>> $ powerpc64le-glibc-linux-gnu-gcc -O2 -g -std=gnu11 -mcpu=power9 -D_GNU_SOURCE test.c -o test-pwr9 -lm
>> $ ./test-pwr9
>> 0000000000000000000000000000000000000000000000000000000000001000
>> 0000000000000000000000000000000000000000000000000000000000001010
>> Floating point exception
>>
>> You can see that for POWER8, the XE bits it cleared as expected by
>> __fesetround_inline_nocheck (because ceil/floor/trunc should not trigger
>> any floating-point exception), while for POWER9, as you said, only the
>> rounding bits are changed.
> 
> The commit message is still hard to digest.  The caller expects the inexact exception to be disable upon return.  mffscrni does not change any exception enable bits.  Can you also update the description (and maybe name) of the function to explicitly state this?  Likewise, note that we expect the NI bit to be 0 always.

I give you that both the name and its description is ideal, I will update
the patch.

> 
> Otherwise, this patch LGTM.
diff mbox series

Patch

diff --git a/sysdeps/powerpc/fpu/fenv_libc.h b/sysdeps/powerpc/fpu/fenv_libc.h
index f9167056a8..afcf8d2e61 100644
--- a/sysdeps/powerpc/fpu/fenv_libc.h
+++ b/sysdeps/powerpc/fpu/fenv_libc.h
@@ -182,19 +182,13 @@  __fesetround_inline (int round)
   return 0;
 }
 
-/* Same as __fesetround_inline, however without runtime check to use DFP
-   mtfsfi syntax (as relax_fenv_state) or if round value is valid.  */
+/* Same as __fesetround_inline, but it does not change the floating-point
+   exceptions bits, and without runtime check to use DFP mtfsfi syntax
+   (as relax_fenv_state), or if round value is valid.  */
 static inline void
 __fesetround_inline_nocheck (const int round)
 {
-#ifdef _ARCH_PWR9
-  __fe_mffscrn (round);
-#else
-  if (__glibc_likely (GLRO(dl_hwcap2) & PPC_FEATURE2_ARCH_3_00))
-    __fe_mffscrn (round);
-  else
-    asm volatile ("mtfsfi 7,%0" : : "n" (round));
-#endif
+  asm volatile ("mtfsfi 7,%0" : : "n" (round));
 }
 
 #define FPSCR_MASK(bit) (1 << (31 - (bit)))