[RFC] target/ppc: do not silence snan in xscvspdpn

Message ID	20211213121320.947362-1-matheus.ferst@eldorado.org.br
State	Superseded
Headers	show Return-Path: <qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org> From: matheus.ferst@eldorado.org.br To: qemu-devel@nongnu.org, qemu-ppc@nongnu.org Subject: [RFC PATCH] target/ppc: do not silence snan in xscvspdpn Date: Mon, 13 Dec 2021 09:13:20 -0300 Message-Id: <20211213121320.947362-1-matheus.ferst@eldorado.org.br> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Received-SPF: pass client-ip=201.28.113.2; envelope-from=matheus.ferst@eldorado.org.br; helo=outlook.eldorado.org.br X-Spam_score_int: -10 X-Spam_score: -1.1 X-Spam_bar: - X-Spam_report: (-1.1 / 5.0 requ) BAYES_00=-1.9, RDNS_NONE=0.793, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=no autolearn_force=no X-Spam_action: no action Precedence: list Cc: peter.maydell@linaro.org, Matheus Ferst <matheus.ferst@eldorado.org.br>, danielhb413@gmail.com, groug@kaod.org, clg@kaod.org, alex.bennee@linaro.org, aurelien@aurel32.net, david@gibson.dropbear.id.au Errors-To: qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-ppc" <qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org>
Series	[RFC] target/ppc: do not silence snan in xscvspdpn \| expand [RFC] target/ppc: do not silence snan in xscvspdpn

Matheus K. Ferst Dec. 13, 2021, 12:13 p.m. UTC

From: Matheus Ferst <matheus.ferst@eldorado.org.br>

The non-signalling versions of VSX scalar convert to shorter/longer
precision insns doesn't silence SNaNs in the hardware. We are currently
honoring this behavior in xscvdpspn, since helper_xscvdpspn handles the
conversion with extracts/deposits/etc. OTOH, xscvspdpn uses
float32_to_float64 that calls parts_float_to_float, which uses
parts_return_nan that finally calls parts_silence_nan if the input is an
SNaN.

To address this problem, this patch adds a new float_status flag
(return_snan) to avoid the call to parts_silence_nan in the
float_class_snan case of parts_return_nan.

Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
---
This behavior was observed in a Power9 and can be reproduced with the
following code:

int main(void)
{
    __uint128_t t, b = 0x7f80000200000000;

    asm("xscvspdpn %x0, %x1\n\t"
        : "=wa" (t)
        : "wa" (b << 64));
    printf("0x%016" PRIx64 "%016" PRIx64 "\n",
           (uint64_t)(t >> 64), (uint64_t)t);

    b = 0x7ff0000000000002;
    asm("xscvdpspn %x0, %x1\n\t"
        : "=wa" (t)
        : "wa" (b << 64));
    printf("0x%016" PRIx64 "%016" PRIx64 "\n",
           (uint64_t)(t >> 64), (uint64_t)t);

    return 0;
}

That results in:
$ ./xscv_non_signaling
xscvspdpn: 0x7ff00000400000000000000000000000
xscvdpspn: 0x7f8000007f8000000000000000000000
$ qemu-ppc64le ./xscv_non_signaling
xscvspdpn: 0x7ff80000400000000000000000000000
xscvdpspn: 0x7f8000007f8000000000000000000000

PowerISA is more descriptive of xscvdpspn wrt SNaN but doesn't say
anything about NaNs in xscvspdpn description. Should we match the
hardware behavior? If so, does it worth adding a new flag in
float_status for a single instruction? We could also handle that in
helper_xscvdpspn, e.g.:

float32 sp = xb >> 32;
float64 dp;

dp = float32_to_float64(xb >> 32, &tstat);
if (float32_classify(sp) == is_snan) {
    dp &= ~(1ULL << 51);
    dp |= (dp & 0x7ffffffffffffULL) == 0;
}

return dp;

but it feels kind hacky.
---
 fpu/softfloat-parts.c.inc      | 10 ++++++----
 fpu/softfloat-specialize.c.inc |  9 +++++++++
 include/fpu/softfloat-types.h  |  1 +
 target/ppc/fpu_helper.c        |  2 ++
 4 files changed, 18 insertions(+), 4 deletions(-)

Philippe Mathieu-Daudé Dec. 13, 2021, 12:36 p.m. UTC | #1

On 12/13/21 13:13, matheus.ferst@eldorado.org.br wrote:
> From: Matheus Ferst <matheus.ferst@eldorado.org.br>
> 
> The non-signalling versions of VSX scalar convert to shorter/longer
> precision insns doesn't silence SNaNs in the hardware. We are currently
> honoring this behavior in xscvdpspn, since helper_xscvdpspn handles the
> conversion with extracts/deposits/etc. OTOH, xscvspdpn uses
> float32_to_float64 that calls parts_float_to_float, which uses
> parts_return_nan that finally calls parts_silence_nan if the input is an
> SNaN.
> 
> To address this problem, this patch adds a new float_status flag
> (return_snan) to avoid the call to parts_silence_nan in the
> float_class_snan case of parts_return_nan.
> 
> Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
> ---
> This behavior was observed in a Power9 and can be reproduced with the
> following code:
> 
> int main(void)
> {
>     __uint128_t t, b = 0x7f80000200000000;
> 
>     asm("xscvspdpn %x0, %x1\n\t"
>         : "=wa" (t)
>         : "wa" (b << 64));
>     printf("0x%016" PRIx64 "%016" PRIx64 "\n",
>            (uint64_t)(t >> 64), (uint64_t)t);
> 
>     b = 0x7ff0000000000002;
>     asm("xscvdpspn %x0, %x1\n\t"
>         : "=wa" (t)
>         : "wa" (b << 64));
>     printf("0x%016" PRIx64 "%016" PRIx64 "\n",
>            (uint64_t)(t >> 64), (uint64_t)t);
> 
>     return 0;
> }

Why not add this test in tests/tcg/ppc64le/ ?

> 
> That results in:
> $ ./xscv_non_signaling
> xscvspdpn: 0x7ff00000400000000000000000000000
> xscvdpspn: 0x7f8000007f8000000000000000000000
> $ qemu-ppc64le ./xscv_non_signaling
> xscvspdpn: 0x7ff80000400000000000000000000000
> xscvdpspn: 0x7f8000007f8000000000000000000000
> 
> PowerISA is more descriptive of xscvdpspn wrt SNaN but doesn't say
> anything about NaNs in xscvspdpn description. Should we match the
> hardware behavior? If so, does it worth adding a new flag in
> float_status for a single instruction? We could also handle that in
> helper_xscvdpspn, e.g.:
> 
> float32 sp = xb >> 32;
> float64 dp;
> 
> dp = float32_to_float64(xb >> 32, &tstat);
> if (float32_classify(sp) == is_snan) {
>     dp &= ~(1ULL << 51);
>     dp |= (dp & 0x7ffffffffffffULL) == 0;
> }
> 
> return dp;
> 
> but it feels kind hacky.

Richard Henderson Dec. 13, 2021, 3:46 p.m. UTC | #2

On 12/13/21 4:13 AM, matheus.ferst@eldorado.org.br wrote:
> From: Matheus Ferst <matheus.ferst@eldorado.org.br>
> 
> The non-signalling versions of VSX scalar convert to shorter/longer
> precision insns doesn't silence SNaNs in the hardware. We are currently
> honoring this behavior in xscvdpspn, since helper_xscvdpspn handles the
> conversion with extracts/deposits/etc. OTOH, xscvspdpn uses
> float32_to_float64 that calls parts_float_to_float, which uses
> parts_return_nan that finally calls parts_silence_nan if the input is an
> SNaN.
> 
> To address this problem, this patch adds a new float_status flag
> (return_snan) to avoid the call to parts_silence_nan in the
> float_class_snan case of parts_return_nan.
> 
> Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>

I think you shouldn't be using softfloat at all for this, but use the existing 
helper_todouble function.


r~

Matheus K. Ferst Dec. 13, 2021, 8:15 p.m. UTC | #3

On 13/12/2021 09:36, Philippe Mathieu-Daudé wrote:
> On 12/13/21 13:13, matheus.ferst@eldorado.org.br wrote:
>> From: Matheus Ferst <matheus.ferst@eldorado.org.br>
>>
>> The non-signalling versions of VSX scalar convert to shorter/longer
>> precision insns doesn't silence SNaNs in the hardware. We are currently
>> honoring this behavior in xscvdpspn, since helper_xscvdpspn handles the
>> conversion with extracts/deposits/etc. OTOH, xscvspdpn uses
>> float32_to_float64 that calls parts_float_to_float, which uses
>> parts_return_nan that finally calls parts_silence_nan if the input is an
>> SNaN.
>>
>> To address this problem, this patch adds a new float_status flag
>> (return_snan) to avoid the call to parts_silence_nan in the
>> float_class_snan case of parts_return_nan.
>>
>> Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
>> ---
>> This behavior was observed in a Power9 and can be reproduced with the
>> following code:
>>
>> int main(void)
>> {
>>      __uint128_t t, b = 0x7f80000200000000;
>>
>>      asm("xscvspdpn %x0, %x1\n\t"
>>          : "=wa" (t)
>>          : "wa" (b << 64));
>>      printf("0x%016" PRIx64 "%016" PRIx64 "\n",
>>             (uint64_t)(t >> 64), (uint64_t)t);
>>
>>      b = 0x7ff0000000000002;
>>      asm("xscvdpspn %x0, %x1\n\t"
>>          : "=wa" (t)
>>          : "wa" (b << 64));
>>      printf("0x%016" PRIx64 "%016" PRIx64 "\n",
>>             (uint64_t)(t >> 64), (uint64_t)t);
>>
>>      return 0;
>> }
> 
> Why not add this test in tests/tcg/ppc64le/ ?

I'll add it in v2. Is it ok to use __uint128_t in tests?

Thanks,
Matheus K. Ferst
Instituto de Pesquisas ELDORADO <http://www.eldorado.org.br/>
Analista de Software
Aviso Legal - Disclaimer <https://www.eldorado.org.br/disclaimer.html>

Matheus K. Ferst Dec. 13, 2021, 8:15 p.m. UTC | #4

On 13/12/2021 12:46, Richard Henderson wrote:
> On 12/13/21 4:13 AM, matheus.ferst@eldorado.org.br wrote:
>> From: Matheus Ferst <matheus.ferst@eldorado.org.br>
>>
>> The non-signalling versions of VSX scalar convert to shorter/longer
>> precision insns doesn't silence SNaNs in the hardware. We are currently
>> honoring this behavior in xscvdpspn, since helper_xscvdpspn handles the
>> conversion with extracts/deposits/etc. OTOH, xscvspdpn uses
>> float32_to_float64 that calls parts_float_to_float, which uses
>> parts_return_nan that finally calls parts_silence_nan if the input is an
>> SNaN.
>>
>> To address this problem, this patch adds a new float_status flag
>> (return_snan) to avoid the call to parts_silence_nan in the
>> float_class_snan case of parts_return_nan.
>>
>> Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
> 
> I think you shouldn't be using softfloat at all for this, but use the 
> existing
> helper_todouble function.
> 

Oh, that's better, I'll send a v2.

Thanks,
Matheus K. Ferst
Instituto de Pesquisas ELDORADO <http://www.eldorado.org.br/>
Analista de Software
Aviso Legal - Disclaimer <https://www.eldorado.org.br/disclaimer.html>

Philippe Mathieu-Daudé Dec. 13, 2021, 10:19 p.m. UTC | #5

On 12/13/21 21:15, Matheus K. Ferst wrote:
> On 13/12/2021 09:36, Philippe Mathieu-Daudé wrote:
>> On 12/13/21 13:13, matheus.ferst@eldorado.org.br wrote:
>>> From: Matheus Ferst <matheus.ferst@eldorado.org.br>
>>>
>>> The non-signalling versions of VSX scalar convert to shorter/longer
>>> precision insns doesn't silence SNaNs in the hardware. We are currently
>>> honoring this behavior in xscvdpspn, since helper_xscvdpspn handles the
>>> conversion with extracts/deposits/etc. OTOH, xscvspdpn uses
>>> float32_to_float64 that calls parts_float_to_float, which uses
>>> parts_return_nan that finally calls parts_silence_nan if the input is an
>>> SNaN.
>>>
>>> To address this problem, this patch adds a new float_status flag
>>> (return_snan) to avoid the call to parts_silence_nan in the
>>> float_class_snan case of parts_return_nan.
>>>
>>> Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
>>> ---
>>> This behavior was observed in a Power9 and can be reproduced with the
>>> following code:
>>>
>>> int main(void)
>>> {
>>>      __uint128_t t, b = 0x7f80000200000000;
>>>
>>>      asm("xscvspdpn %x0, %x1\n\t"
>>>          : "=wa" (t)
>>>          : "wa" (b << 64));
>>>      printf("0x%016" PRIx64 "%016" PRIx64 "\n",
>>>             (uint64_t)(t >> 64), (uint64_t)t);
>>>
>>>      b = 0x7ff0000000000002;
>>>      asm("xscvdpspn %x0, %x1\n\t"
>>>          : "=wa" (t)
>>>          : "wa" (b << 64));
>>>      printf("0x%016" PRIx64 "%016" PRIx64 "\n",
>>>             (uint64_t)(t >> 64), (uint64_t)t);
>>>
>>>      return 0;
>>> }
>>
>> Why not add this test in tests/tcg/ppc64le/ ?
> 
> I'll add it in v2. Is it ok to use __uint128_t in tests?

What about:

  int main(void)
  {
  #ifndef __SIZEOF_INT128__
      printf("uint128_t not available, skipping...\n");
  #else
      ...
  #endif
      return 0;
  }

Alex Bennée Dec. 15, 2021, 3:55 p.m. UTC | #6

Philippe Mathieu-Daudé <f4bug@amsat.org> writes:

> On 12/13/21 21:15, Matheus K. Ferst wrote:
>> On 13/12/2021 09:36, Philippe Mathieu-Daudé wrote:
>>> On 12/13/21 13:13, matheus.ferst@eldorado.org.br wrote:
>>>> From: Matheus Ferst <matheus.ferst@eldorado.org.br>
>>>>
>>>> The non-signalling versions of VSX scalar convert to shorter/longer
>>>> precision insns doesn't silence SNaNs in the hardware. We are currently
>>>> honoring this behavior in xscvdpspn, since helper_xscvdpspn handles the
>>>> conversion with extracts/deposits/etc. OTOH, xscvspdpn uses
>>>> float32_to_float64 that calls parts_float_to_float, which uses
>>>> parts_return_nan that finally calls parts_silence_nan if the input is an
>>>> SNaN.
>>>>
>>>> To address this problem, this patch adds a new float_status flag
>>>> (return_snan) to avoid the call to parts_silence_nan in the
>>>> float_class_snan case of parts_return_nan.
>>>>
>>>> Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
>>>> ---
>>>> This behavior was observed in a Power9 and can be reproduced with the
>>>> following code:
>>>>
>>>> int main(void)
>>>> {
>>>>      __uint128_t t, b = 0x7f80000200000000;
>>>>
>>>>      asm("xscvspdpn %x0, %x1\n\t"
>>>>          : "=wa" (t)
>>>>          : "wa" (b << 64));
>>>>      printf("0x%016" PRIx64 "%016" PRIx64 "\n",
>>>>             (uint64_t)(t >> 64), (uint64_t)t);
>>>>
>>>>      b = 0x7ff0000000000002;
>>>>      asm("xscvdpspn %x0, %x1\n\t"
>>>>          : "=wa" (t)
>>>>          : "wa" (b << 64));
>>>>      printf("0x%016" PRIx64 "%016" PRIx64 "\n",
>>>>             (uint64_t)(t >> 64), (uint64_t)t);
>>>>
>>>>      return 0;
>>>> }
>>>
>>> Why not add this test in tests/tcg/ppc64le/ ?
>> 
>> I'll add it in v2. Is it ok to use __uint128_t in tests?
>
> What about:
>
>   int main(void)
>   {
>   #ifndef __SIZEOF_INT128__
>       printf("uint128_t not available, skipping...\n");
>   #else
>       ...
>   #endif
>       return 0;
>   }

We have a tests/tcg/configure.sh which does feature tests although it is
mainly testing for the presence of compiler target flags.  We do this
because while the docker compilers are all pretty modern the user might
be using their own cross compiler.

I'm generally not keen on the tests silently skipping because it gives a
false impression things have been tested. If it is possible to avoid
INT128 shenanigans to load the values into the inline assembler lets do
that, otherwise lets feature test and ifdef a skip-test in the makefile.

Matheus K. Ferst Dec. 15, 2021, 8:01 p.m. UTC | #7

On 15/12/2021 12:55, Alex Bennée wrote:
> Philippe Mathieu-Daudé <f4bug@amsat.org> writes:
> 
>> On 12/13/21 21:15, Matheus K. Ferst wrote:
>>> On 13/12/2021 09:36, Philippe Mathieu-Daudé wrote:
>>>> On 12/13/21 13:13, matheus.ferst@eldorado.org.br wrote:
>>>>> From: Matheus Ferst <matheus.ferst@eldorado.org.br>
>>>>>
>>>>> The non-signalling versions of VSX scalar convert to shorter/longer
>>>>> precision insns doesn't silence SNaNs in the hardware. We are currently
>>>>> honoring this behavior in xscvdpspn, since helper_xscvdpspn handles the
>>>>> conversion with extracts/deposits/etc. OTOH, xscvspdpn uses
>>>>> float32_to_float64 that calls parts_float_to_float, which uses
>>>>> parts_return_nan that finally calls parts_silence_nan if the input is an
>>>>> SNaN.
>>>>>
>>>>> To address this problem, this patch adds a new float_status flag
>>>>> (return_snan) to avoid the call to parts_silence_nan in the
>>>>> float_class_snan case of parts_return_nan.
>>>>>
>>>>> Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
>>>>> ---
>>>>> This behavior was observed in a Power9 and can be reproduced with the
>>>>> following code:
>>>>>
>>>>> int main(void)
>>>>> {
>>>>>       __uint128_t t, b = 0x7f80000200000000;
>>>>>
>>>>>       asm("xscvspdpn %x0, %x1\n\t"
>>>>>           : "=wa" (t)
>>>>>           : "wa" (b << 64));
>>>>>       printf("0x%016" PRIx64 "%016" PRIx64 "\n",
>>>>>              (uint64_t)(t >> 64), (uint64_t)t);
>>>>>
>>>>>       b = 0x7ff0000000000002;
>>>>>       asm("xscvdpspn %x0, %x1\n\t"
>>>>>           : "=wa" (t)
>>>>>           : "wa" (b << 64));
>>>>>       printf("0x%016" PRIx64 "%016" PRIx64 "\n",
>>>>>              (uint64_t)(t >> 64), (uint64_t)t);
>>>>>
>>>>>       return 0;
>>>>> }
>>>>
>>>> Why not add this test in tests/tcg/ppc64le/ ?
>>>
>>> I'll add it in v2. Is it ok to use __uint128_t in tests?
>>
>> What about:
>>
>>    int main(void)
>>    {
>>    #ifndef __SIZEOF_INT128__
>>        printf("uint128_t not available, skipping...\n");
>>    #else
>>        ...
>>    #endif
>>        return 0;
>>    }
> 
> We have a tests/tcg/configure.sh which does feature tests although it is
> mainly testing for the presence of compiler target flags.  We do this
> because while the docker compilers are all pretty modern the user might
> be using their own cross compiler.
> 
> I'm generally not keen on the tests silently skipping because it gives a
> false impression things have been tested. If it is possible to avoid
> INT128 shenanigans to load the values into the inline assembler lets do
> that, otherwise lets feature test and ifdef a skip-test in the makefile.
> 
> --
> Alex Bennée
> 

I think we can pass the value via register and use mtvsrd/mfvsrd to 
avoid the INT128 part.

There was a v2 of this patch, but I messed up the CC and only included 
get_maintainer.pl result and Philippe. The patch is now applied, so I'll 
send this change as a follow-up.

Thanks,
Matheus K. Ferst
Instituto de Pesquisas ELDORADO <http://www.eldorado.org.br/>
Analista de Software
Aviso Legal - Disclaimer <https://www.eldorado.org.br/disclaimer.html>

[RFC] target/ppc: do not silence snan in xscvspdpn

Commit Message

Comments

Patch