diff mbox series

[RFC] target/ppc: do not silence snan in xscvspdpn

Message ID 20211213121320.947362-1-matheus.ferst@eldorado.org.br
State Superseded
Headers show
Series [RFC] target/ppc: do not silence snan in xscvspdpn | expand

Commit Message

Matheus K. Ferst Dec. 13, 2021, 12:13 p.m. UTC
From: Matheus Ferst <matheus.ferst@eldorado.org.br>

The non-signalling versions of VSX scalar convert to shorter/longer
precision insns doesn't silence SNaNs in the hardware. We are currently
honoring this behavior in xscvdpspn, since helper_xscvdpspn handles the
conversion with extracts/deposits/etc. OTOH, xscvspdpn uses
float32_to_float64 that calls parts_float_to_float, which uses
parts_return_nan that finally calls parts_silence_nan if the input is an
SNaN.

To address this problem, this patch adds a new float_status flag
(return_snan) to avoid the call to parts_silence_nan in the
float_class_snan case of parts_return_nan.

Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
---
This behavior was observed in a Power9 and can be reproduced with the
following code:

int main(void)
{
    __uint128_t t, b = 0x7f80000200000000;

    asm("xscvspdpn %x0, %x1\n\t"
        : "=wa" (t)
        : "wa" (b << 64));
    printf("0x%016" PRIx64 "%016" PRIx64 "\n",
           (uint64_t)(t >> 64), (uint64_t)t);

    b = 0x7ff0000000000002;
    asm("xscvdpspn %x0, %x1\n\t"
        : "=wa" (t)
        : "wa" (b << 64));
    printf("0x%016" PRIx64 "%016" PRIx64 "\n",
           (uint64_t)(t >> 64), (uint64_t)t);

    return 0;
}

That results in:
$ ./xscv_non_signaling
xscvspdpn: 0x7ff00000400000000000000000000000
xscvdpspn: 0x7f8000007f8000000000000000000000
$ qemu-ppc64le ./xscv_non_signaling
xscvspdpn: 0x7ff80000400000000000000000000000
xscvdpspn: 0x7f8000007f8000000000000000000000

PowerISA is more descriptive of xscvdpspn wrt SNaN but doesn't say
anything about NaNs in xscvspdpn description. Should we match the
hardware behavior? If so, does it worth adding a new flag in
float_status for a single instruction? We could also handle that in
helper_xscvdpspn, e.g.:

float32 sp = xb >> 32;
float64 dp;

dp = float32_to_float64(xb >> 32, &tstat);
if (float32_classify(sp) == is_snan) {
    dp &= ~(1ULL << 51);
    dp |= (dp & 0x7ffffffffffffULL) == 0;
}

return dp;

but it feels kind hacky.
---
 fpu/softfloat-parts.c.inc      | 10 ++++++----
 fpu/softfloat-specialize.c.inc |  9 +++++++++
 include/fpu/softfloat-types.h  |  1 +
 target/ppc/fpu_helper.c        |  2 ++
 4 files changed, 18 insertions(+), 4 deletions(-)

Comments

Philippe Mathieu-Daudé Dec. 13, 2021, 12:36 p.m. UTC | #1
On 12/13/21 13:13, matheus.ferst@eldorado.org.br wrote:
> From: Matheus Ferst <matheus.ferst@eldorado.org.br>
> 
> The non-signalling versions of VSX scalar convert to shorter/longer
> precision insns doesn't silence SNaNs in the hardware. We are currently
> honoring this behavior in xscvdpspn, since helper_xscvdpspn handles the
> conversion with extracts/deposits/etc. OTOH, xscvspdpn uses
> float32_to_float64 that calls parts_float_to_float, which uses
> parts_return_nan that finally calls parts_silence_nan if the input is an
> SNaN.
> 
> To address this problem, this patch adds a new float_status flag
> (return_snan) to avoid the call to parts_silence_nan in the
> float_class_snan case of parts_return_nan.
> 
> Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
> ---
> This behavior was observed in a Power9 and can be reproduced with the
> following code:
> 
> int main(void)
> {
>     __uint128_t t, b = 0x7f80000200000000;
> 
>     asm("xscvspdpn %x0, %x1\n\t"
>         : "=wa" (t)
>         : "wa" (b << 64));
>     printf("0x%016" PRIx64 "%016" PRIx64 "\n",
>            (uint64_t)(t >> 64), (uint64_t)t);
> 
>     b = 0x7ff0000000000002;
>     asm("xscvdpspn %x0, %x1\n\t"
>         : "=wa" (t)
>         : "wa" (b << 64));
>     printf("0x%016" PRIx64 "%016" PRIx64 "\n",
>            (uint64_t)(t >> 64), (uint64_t)t);
> 
>     return 0;
> }

Why not add this test in tests/tcg/ppc64le/ ?

> 
> That results in:
> $ ./xscv_non_signaling
> xscvspdpn: 0x7ff00000400000000000000000000000
> xscvdpspn: 0x7f8000007f8000000000000000000000
> $ qemu-ppc64le ./xscv_non_signaling
> xscvspdpn: 0x7ff80000400000000000000000000000
> xscvdpspn: 0x7f8000007f8000000000000000000000
> 
> PowerISA is more descriptive of xscvdpspn wrt SNaN but doesn't say
> anything about NaNs in xscvspdpn description. Should we match the
> hardware behavior? If so, does it worth adding a new flag in
> float_status for a single instruction? We could also handle that in
> helper_xscvdpspn, e.g.:
> 
> float32 sp = xb >> 32;
> float64 dp;
> 
> dp = float32_to_float64(xb >> 32, &tstat);
> if (float32_classify(sp) == is_snan) {
>     dp &= ~(1ULL << 51);
>     dp |= (dp & 0x7ffffffffffffULL) == 0;
> }
> 
> return dp;
> 
> but it feels kind hacky.
Richard Henderson Dec. 13, 2021, 3:46 p.m. UTC | #2
On 12/13/21 4:13 AM, matheus.ferst@eldorado.org.br wrote:
> From: Matheus Ferst <matheus.ferst@eldorado.org.br>
> 
> The non-signalling versions of VSX scalar convert to shorter/longer
> precision insns doesn't silence SNaNs in the hardware. We are currently
> honoring this behavior in xscvdpspn, since helper_xscvdpspn handles the
> conversion with extracts/deposits/etc. OTOH, xscvspdpn uses
> float32_to_float64 that calls parts_float_to_float, which uses
> parts_return_nan that finally calls parts_silence_nan if the input is an
> SNaN.
> 
> To address this problem, this patch adds a new float_status flag
> (return_snan) to avoid the call to parts_silence_nan in the
> float_class_snan case of parts_return_nan.
> 
> Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>

I think you shouldn't be using softfloat at all for this, but use the existing 
helper_todouble function.


r~
Matheus K. Ferst Dec. 13, 2021, 8:15 p.m. UTC | #3
On 13/12/2021 09:36, Philippe Mathieu-Daudé wrote:
> On 12/13/21 13:13, matheus.ferst@eldorado.org.br wrote:
>> From: Matheus Ferst <matheus.ferst@eldorado.org.br>
>>
>> The non-signalling versions of VSX scalar convert to shorter/longer
>> precision insns doesn't silence SNaNs in the hardware. We are currently
>> honoring this behavior in xscvdpspn, since helper_xscvdpspn handles the
>> conversion with extracts/deposits/etc. OTOH, xscvspdpn uses
>> float32_to_float64 that calls parts_float_to_float, which uses
>> parts_return_nan that finally calls parts_silence_nan if the input is an
>> SNaN.
>>
>> To address this problem, this patch adds a new float_status flag
>> (return_snan) to avoid the call to parts_silence_nan in the
>> float_class_snan case of parts_return_nan.
>>
>> Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
>> ---
>> This behavior was observed in a Power9 and can be reproduced with the
>> following code:
>>
>> int main(void)
>> {
>>      __uint128_t t, b = 0x7f80000200000000;
>>
>>      asm("xscvspdpn %x0, %x1\n\t"
>>          : "=wa" (t)
>>          : "wa" (b << 64));
>>      printf("0x%016" PRIx64 "%016" PRIx64 "\n",
>>             (uint64_t)(t >> 64), (uint64_t)t);
>>
>>      b = 0x7ff0000000000002;
>>      asm("xscvdpspn %x0, %x1\n\t"
>>          : "=wa" (t)
>>          : "wa" (b << 64));
>>      printf("0x%016" PRIx64 "%016" PRIx64 "\n",
>>             (uint64_t)(t >> 64), (uint64_t)t);
>>
>>      return 0;
>> }
> 
> Why not add this test in tests/tcg/ppc64le/ ?

I'll add it in v2. Is it ok to use __uint128_t in tests?

Thanks,
Matheus K. Ferst
Instituto de Pesquisas ELDORADO <http://www.eldorado.org.br/>
Analista de Software
Aviso Legal - Disclaimer <https://www.eldorado.org.br/disclaimer.html>
Matheus K. Ferst Dec. 13, 2021, 8:15 p.m. UTC | #4
On 13/12/2021 12:46, Richard Henderson wrote:
> On 12/13/21 4:13 AM, matheus.ferst@eldorado.org.br wrote:
>> From: Matheus Ferst <matheus.ferst@eldorado.org.br>
>>
>> The non-signalling versions of VSX scalar convert to shorter/longer
>> precision insns doesn't silence SNaNs in the hardware. We are currently
>> honoring this behavior in xscvdpspn, since helper_xscvdpspn handles the
>> conversion with extracts/deposits/etc. OTOH, xscvspdpn uses
>> float32_to_float64 that calls parts_float_to_float, which uses
>> parts_return_nan that finally calls parts_silence_nan if the input is an
>> SNaN.
>>
>> To address this problem, this patch adds a new float_status flag
>> (return_snan) to avoid the call to parts_silence_nan in the
>> float_class_snan case of parts_return_nan.
>>
>> Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
> 
> I think you shouldn't be using softfloat at all for this, but use the 
> existing
> helper_todouble function.
> 

Oh, that's better, I'll send a v2.

Thanks,
Matheus K. Ferst
Instituto de Pesquisas ELDORADO <http://www.eldorado.org.br/>
Analista de Software
Aviso Legal - Disclaimer <https://www.eldorado.org.br/disclaimer.html>
Philippe Mathieu-Daudé Dec. 13, 2021, 10:19 p.m. UTC | #5
On 12/13/21 21:15, Matheus K. Ferst wrote:
> On 13/12/2021 09:36, Philippe Mathieu-Daudé wrote:
>> On 12/13/21 13:13, matheus.ferst@eldorado.org.br wrote:
>>> From: Matheus Ferst <matheus.ferst@eldorado.org.br>
>>>
>>> The non-signalling versions of VSX scalar convert to shorter/longer
>>> precision insns doesn't silence SNaNs in the hardware. We are currently
>>> honoring this behavior in xscvdpspn, since helper_xscvdpspn handles the
>>> conversion with extracts/deposits/etc. OTOH, xscvspdpn uses
>>> float32_to_float64 that calls parts_float_to_float, which uses
>>> parts_return_nan that finally calls parts_silence_nan if the input is an
>>> SNaN.
>>>
>>> To address this problem, this patch adds a new float_status flag
>>> (return_snan) to avoid the call to parts_silence_nan in the
>>> float_class_snan case of parts_return_nan.
>>>
>>> Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
>>> ---
>>> This behavior was observed in a Power9 and can be reproduced with the
>>> following code:
>>>
>>> int main(void)
>>> {
>>>      __uint128_t t, b = 0x7f80000200000000;
>>>
>>>      asm("xscvspdpn %x0, %x1\n\t"
>>>          : "=wa" (t)
>>>          : "wa" (b << 64));
>>>      printf("0x%016" PRIx64 "%016" PRIx64 "\n",
>>>             (uint64_t)(t >> 64), (uint64_t)t);
>>>
>>>      b = 0x7ff0000000000002;
>>>      asm("xscvdpspn %x0, %x1\n\t"
>>>          : "=wa" (t)
>>>          : "wa" (b << 64));
>>>      printf("0x%016" PRIx64 "%016" PRIx64 "\n",
>>>             (uint64_t)(t >> 64), (uint64_t)t);
>>>
>>>      return 0;
>>> }
>>
>> Why not add this test in tests/tcg/ppc64le/ ?
> 
> I'll add it in v2. Is it ok to use __uint128_t in tests?

What about:

  int main(void)
  {
  #ifndef __SIZEOF_INT128__
      printf("uint128_t not available, skipping...\n");
  #else
      ...
  #endif
      return 0;
  }
Alex Bennée Dec. 15, 2021, 3:55 p.m. UTC | #6
Philippe Mathieu-Daudé <f4bug@amsat.org> writes:

> On 12/13/21 21:15, Matheus K. Ferst wrote:
>> On 13/12/2021 09:36, Philippe Mathieu-Daudé wrote:
>>> On 12/13/21 13:13, matheus.ferst@eldorado.org.br wrote:
>>>> From: Matheus Ferst <matheus.ferst@eldorado.org.br>
>>>>
>>>> The non-signalling versions of VSX scalar convert to shorter/longer
>>>> precision insns doesn't silence SNaNs in the hardware. We are currently
>>>> honoring this behavior in xscvdpspn, since helper_xscvdpspn handles the
>>>> conversion with extracts/deposits/etc. OTOH, xscvspdpn uses
>>>> float32_to_float64 that calls parts_float_to_float, which uses
>>>> parts_return_nan that finally calls parts_silence_nan if the input is an
>>>> SNaN.
>>>>
>>>> To address this problem, this patch adds a new float_status flag
>>>> (return_snan) to avoid the call to parts_silence_nan in the
>>>> float_class_snan case of parts_return_nan.
>>>>
>>>> Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
>>>> ---
>>>> This behavior was observed in a Power9 and can be reproduced with the
>>>> following code:
>>>>
>>>> int main(void)
>>>> {
>>>>      __uint128_t t, b = 0x7f80000200000000;
>>>>
>>>>      asm("xscvspdpn %x0, %x1\n\t"
>>>>          : "=wa" (t)
>>>>          : "wa" (b << 64));
>>>>      printf("0x%016" PRIx64 "%016" PRIx64 "\n",
>>>>             (uint64_t)(t >> 64), (uint64_t)t);
>>>>
>>>>      b = 0x7ff0000000000002;
>>>>      asm("xscvdpspn %x0, %x1\n\t"
>>>>          : "=wa" (t)
>>>>          : "wa" (b << 64));
>>>>      printf("0x%016" PRIx64 "%016" PRIx64 "\n",
>>>>             (uint64_t)(t >> 64), (uint64_t)t);
>>>>
>>>>      return 0;
>>>> }
>>>
>>> Why not add this test in tests/tcg/ppc64le/ ?
>> 
>> I'll add it in v2. Is it ok to use __uint128_t in tests?
>
> What about:
>
>   int main(void)
>   {
>   #ifndef __SIZEOF_INT128__
>       printf("uint128_t not available, skipping...\n");
>   #else
>       ...
>   #endif
>       return 0;
>   }

We have a tests/tcg/configure.sh which does feature tests although it is
mainly testing for the presence of compiler target flags.  We do this
because while the docker compilers are all pretty modern the user might
be using their own cross compiler.

I'm generally not keen on the tests silently skipping because it gives a
false impression things have been tested. If it is possible to avoid
INT128 shenanigans to load the values into the inline assembler lets do
that, otherwise lets feature test and ifdef a skip-test in the makefile.
Matheus K. Ferst Dec. 15, 2021, 8:01 p.m. UTC | #7
On 15/12/2021 12:55, Alex Bennée wrote:
> Philippe Mathieu-Daudé <f4bug@amsat.org> writes:
> 
>> On 12/13/21 21:15, Matheus K. Ferst wrote:
>>> On 13/12/2021 09:36, Philippe Mathieu-Daudé wrote:
>>>> On 12/13/21 13:13, matheus.ferst@eldorado.org.br wrote:
>>>>> From: Matheus Ferst <matheus.ferst@eldorado.org.br>
>>>>>
>>>>> The non-signalling versions of VSX scalar convert to shorter/longer
>>>>> precision insns doesn't silence SNaNs in the hardware. We are currently
>>>>> honoring this behavior in xscvdpspn, since helper_xscvdpspn handles the
>>>>> conversion with extracts/deposits/etc. OTOH, xscvspdpn uses
>>>>> float32_to_float64 that calls parts_float_to_float, which uses
>>>>> parts_return_nan that finally calls parts_silence_nan if the input is an
>>>>> SNaN.
>>>>>
>>>>> To address this problem, this patch adds a new float_status flag
>>>>> (return_snan) to avoid the call to parts_silence_nan in the
>>>>> float_class_snan case of parts_return_nan.
>>>>>
>>>>> Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
>>>>> ---
>>>>> This behavior was observed in a Power9 and can be reproduced with the
>>>>> following code:
>>>>>
>>>>> int main(void)
>>>>> {
>>>>>       __uint128_t t, b = 0x7f80000200000000;
>>>>>
>>>>>       asm("xscvspdpn %x0, %x1\n\t"
>>>>>           : "=wa" (t)
>>>>>           : "wa" (b << 64));
>>>>>       printf("0x%016" PRIx64 "%016" PRIx64 "\n",
>>>>>              (uint64_t)(t >> 64), (uint64_t)t);
>>>>>
>>>>>       b = 0x7ff0000000000002;
>>>>>       asm("xscvdpspn %x0, %x1\n\t"
>>>>>           : "=wa" (t)
>>>>>           : "wa" (b << 64));
>>>>>       printf("0x%016" PRIx64 "%016" PRIx64 "\n",
>>>>>              (uint64_t)(t >> 64), (uint64_t)t);
>>>>>
>>>>>       return 0;
>>>>> }
>>>>
>>>> Why not add this test in tests/tcg/ppc64le/ ?
>>>
>>> I'll add it in v2. Is it ok to use __uint128_t in tests?
>>
>> What about:
>>
>>    int main(void)
>>    {
>>    #ifndef __SIZEOF_INT128__
>>        printf("uint128_t not available, skipping...\n");
>>    #else
>>        ...
>>    #endif
>>        return 0;
>>    }
> 
> We have a tests/tcg/configure.sh which does feature tests although it is
> mainly testing for the presence of compiler target flags.  We do this
> because while the docker compilers are all pretty modern the user might
> be using their own cross compiler.
> 
> I'm generally not keen on the tests silently skipping because it gives a
> false impression things have been tested. If it is possible to avoid
> INT128 shenanigans to load the values into the inline assembler lets do
> that, otherwise lets feature test and ifdef a skip-test in the makefile.
> 
> --
> Alex Bennée
> 

I think we can pass the value via register and use mtvsrd/mfvsrd to 
avoid the INT128 part.

There was a v2 of this patch, but I messed up the CC and only included 
get_maintainer.pl result and Philippe. The patch is now applied, so I'll 
send this change as a follow-up.

Thanks,
Matheus K. Ferst
Instituto de Pesquisas ELDORADO <http://www.eldorado.org.br/>
Analista de Software
Aviso Legal - Disclaimer <https://www.eldorado.org.br/disclaimer.html>
diff mbox series

Patch

diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc
index 41d4b17e41..ccc24f9153 100644
--- a/fpu/softfloat-parts.c.inc
+++ b/fpu/softfloat-parts.c.inc
@@ -20,10 +20,12 @@  static void partsN(return_nan)(FloatPartsN *a, float_status *s)
     switch (a->cls) {
     case float_class_snan:
         float_raise(float_flag_invalid, s);
-        if (s->default_nan_mode) {
-            parts_default_nan(a, s);
-        } else {
-            parts_silence_nan(a, s);
+        if (!return_snan(s)) {
+            if (s->default_nan_mode) {
+                parts_default_nan(a, s);
+            } else {
+                parts_silence_nan(a, s);
+            }
         }
         break;
     case float_class_qnan:
diff --git a/fpu/softfloat-specialize.c.inc b/fpu/softfloat-specialize.c.inc
index f2ad0f335e..70f686e0a3 100644
--- a/fpu/softfloat-specialize.c.inc
+++ b/fpu/softfloat-specialize.c.inc
@@ -92,6 +92,15 @@  static inline bool no_signaling_nans(float_status *status)
 #endif
 }
 
+static inline bool return_snan(float_status *status)
+{
+#if defined(TARGET_PPC)
+    return status->return_snan;
+#else
+    return false;
+#endif
+}
+
 /* Define how the architecture discriminates signaling NaNs.
  * This done with the most significant bit of the fraction.
  * In IEEE 754-1985 this was implementation defined, but in IEEE 754-2008
diff --git a/include/fpu/softfloat-types.h b/include/fpu/softfloat-types.h
index 5bcbd041f7..eb55298348 100644
--- a/include/fpu/softfloat-types.h
+++ b/include/fpu/softfloat-types.h
@@ -188,6 +188,7 @@  typedef struct float_status {
     bool snan_bit_is_one;
     bool use_first_nan;
     bool no_signaling_nans;
+    bool return_snan;
 } float_status;
 
 #endif /* SOFTFLOAT_TYPES_H */
diff --git a/target/ppc/fpu_helper.c b/target/ppc/fpu_helper.c
index c4896cecc8..9fc774fdb0 100644
--- a/target/ppc/fpu_helper.c
+++ b/target/ppc/fpu_helper.c
@@ -2746,6 +2746,8 @@  uint64_t helper_xscvdpspn(CPUPPCState *env, uint64_t xb)
 uint64_t helper_xscvspdpn(CPUPPCState *env, uint64_t xb)
 {
     float_status tstat = env->fp_status;
+
+    tstat.return_snan = true;
     set_float_exception_flags(0, &tstat);
 
     return float32_to_float64(xb >> 32, &tstat);