diff mbox series

vect: Multistep float->int conversion only with no trapping math

Message ID 20240802124300.29260-1-jchrist@linux.ibm.com
State New
Headers show
Series vect: Multistep float->int conversion only with no trapping math | expand

Commit Message

Juergen Christ Aug. 2, 2024, 12:43 p.m. UTC
Do not convert floats to ints in multiple step if trapping math is
enabled.  This might hide some inexact signals.

Also use correct sign (the sign of the target integer type) for the
intermediate steps.  This only affects undefined behaviour (casting
floats to unsigned datatype where the float is negative).

gcc/ChangeLog:

	* tree-vect-stmts.cc (vectorizable_conversion): multi-step
          float to int conversion only with trapping math and correct
          sign.

Signed-off-by: Juergen Christ <jchrist@linux.ibm.com>

Bootstrapped and tested on x84 and s390.  Ok for trunk?

---
 gcc/tree-vect-stmts.cc | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

Comments

Richard Biener Aug. 5, 2024, 11 a.m. UTC | #1
On Fri, Aug 2, 2024 at 2:43 PM Juergen Christ <jchrist@linux.ibm.com> wrote:
>
> Do not convert floats to ints in multiple step if trapping math is
> enabled.  This might hide some inexact signals.
>
> Also use correct sign (the sign of the target integer type) for the
> intermediate steps.  This only affects undefined behaviour (casting
> floats to unsigned datatype where the float is negative).
>
> gcc/ChangeLog:
>
>         * tree-vect-stmts.cc (vectorizable_conversion): multi-step
>           float to int conversion only with trapping math and correct
>           sign.
>
> Signed-off-by: Juergen Christ <jchrist@linux.ibm.com>
>
> Bootstrapped and tested on x84 and s390.  Ok for trunk?
>
> ---
>  gcc/tree-vect-stmts.cc | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index fdcda0d2abae..2ddd13383193 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -5448,7 +5448,8 @@ vectorizable_conversion (vec_info *vinfo,
>             break;
>
>           cvt_type
> -           = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode), 0);
> +           = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode),
> +                                             TYPE_UNSIGNED (lhs_type));

But lhs_type should be a float type here, the idea that for a
FLOAT_EXPR (int -> float)
a signed integer type is the natural one to use - as it's 2x wider
than the original
RHS type it's signedness doesn't matter.  Note all float types should be
!TYPE_UNSIGNED so this hunk is a no-op but still less clear on the intent IMO.

Please drop it.

>           cvt_type = get_same_sized_vectype (cvt_type, vectype_in);
>           if (cvt_type == NULL_TREE)
>             goto unsupported;
> @@ -5505,10 +5506,11 @@ vectorizable_conversion (vec_info *vinfo,
>        if (GET_MODE_SIZE (lhs_mode) >= GET_MODE_SIZE (rhs_mode))
>         goto unsupported;
>
> -      if (code == FIX_TRUNC_EXPR)
> +      if (code == FIX_TRUNC_EXPR && !flag_trapping_math)
>         {
>           cvt_type
> -           = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode), 0);
> +           = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode),
> +                                             TYPE_UNSIGNED (lhs_type));

Here it might be relevant for correctness - we have to choose between
sfix and ufix for the float -> [u]int conversion.

Do  you have a testcase?  Shouldn't the exactness be independent of the integer
type we convert to?

>           cvt_type = get_same_sized_vectype (cvt_type, vectype_in);
>           if (cvt_type == NULL_TREE)
>             goto unsupported;
> --
> 2.43.5
>
Juergen Christ Aug. 5, 2024, 2:01 p.m. UTC | #2
Am Mon, Aug 05, 2024 at 01:00:31PM +0200 schrieb Richard Biener:
> On Fri, Aug 2, 2024 at 2:43 PM Juergen Christ <jchrist@linux.ibm.com> wrote:
> >
> > Do not convert floats to ints in multiple step if trapping math is
> > enabled.  This might hide some inexact signals.
> >
> > Also use correct sign (the sign of the target integer type) for the
> > intermediate steps.  This only affects undefined behaviour (casting
> > floats to unsigned datatype where the float is negative).
> >
> > gcc/ChangeLog:
> >
> >         * tree-vect-stmts.cc (vectorizable_conversion): multi-step
> >           float to int conversion only with trapping math and correct
> >           sign.
> >
> > Signed-off-by: Juergen Christ <jchrist@linux.ibm.com>
> >
> > Bootstrapped and tested on x84 and s390.  Ok for trunk?
> >
> > ---
> >  gcc/tree-vect-stmts.cc | 8 +++++---
> >  1 file changed, 5 insertions(+), 3 deletions(-)
> >
> > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > index fdcda0d2abae..2ddd13383193 100644
> > --- a/gcc/tree-vect-stmts.cc
> > +++ b/gcc/tree-vect-stmts.cc
> > @@ -5448,7 +5448,8 @@ vectorizable_conversion (vec_info *vinfo,
> >             break;
> >
> >           cvt_type
> > -           = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode), 0);
> > +           = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode),
> > +                                             TYPE_UNSIGNED (lhs_type));
> 
> But lhs_type should be a float type here, the idea that for a
> FLOAT_EXPR (int -> float)
> a signed integer type is the natural one to use - as it's 2x wider
> than the original
> RHS type it's signedness doesn't matter.  Note all float types should be
> !TYPE_UNSIGNED so this hunk is a no-op but still less clear on the intent IMO.
> 
> Please drop it.

Will do.  Sorry about that.

> >           cvt_type = get_same_sized_vectype (cvt_type, vectype_in);
> >           if (cvt_type == NULL_TREE)
> >             goto unsupported;
> > @@ -5505,10 +5506,11 @@ vectorizable_conversion (vec_info *vinfo,
> >        if (GET_MODE_SIZE (lhs_mode) >= GET_MODE_SIZE (rhs_mode))
> >         goto unsupported;
> >
> > -      if (code == FIX_TRUNC_EXPR)
> > +      if (code == FIX_TRUNC_EXPR && !flag_trapping_math)
> >         {
> >           cvt_type
> > -           = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode), 0);
> > +           = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode),
> > +                                             TYPE_UNSIGNED (lhs_type));
> 
> Here it might be relevant for correctness - we have to choose between
> sfix and ufix for the float -> [u]int conversion.
> 
> Do  you have a testcase?  Shouldn't the exactness be independent of the integer
> type we convert to?

I was looking at this little program which contains undefined behaviour:

#include <stdio.h>

__attribute__((noinline,noclone,noipa))
void
vec_pack_ufix_trunc_v2df (double *in, unsigned int *out);

void
vec_pack_ufix_trunc_v2df (double *in, unsigned int *out)
{
        out[0] = in[0];
        out[1] = in[1];
        out[2] = in[2];
        out[3] = in[3];
}

int main()
{
        double in[] = {-1,-2,-3,-4};
        unsigned int out[4];

        vec_pack_ufix_trunc_v2df (in, out);
        for (int i = 0; i < 4; ++i)
                printf("out[%d] = %u\n", i, out[i]);
        return 0;
}

On s390x, I get different results after vectorization:

out[0] = 4294967295
out[1] = 4294967294
out[2] = 4294967293
out[3] = 4294967292

than without vectorization:

out[0] = 0
out[1] = 0
out[2] = 0
out[3] = 0

Even if this is undefined behaviour, I think it would be nice to have
consistent results here.

Also, while I added an expander to circumvent this problem in a
previous patch, reviewers requested to hide this behind trapping math.
Thus, I looked into this.

Seeing the result from the CI for aarch64, I guess there are some
tests that actually expect this vectorization to always happen even
though it might not be save w.r.t. trapping math.

> 
> >           cvt_type = get_same_sized_vectype (cvt_type, vectype_in);
> >           if (cvt_type == NULL_TREE)
> >             goto unsupported;
> > --
> > 2.43.5
> >
Richard Biener Aug. 8, 2024, 12:06 p.m. UTC | #3
On Mon, Aug 5, 2024 at 4:02 PM Juergen Christ <jchrist@linux.ibm.com> wrote:
>
> Am Mon, Aug 05, 2024 at 01:00:31PM +0200 schrieb Richard Biener:
> > On Fri, Aug 2, 2024 at 2:43 PM Juergen Christ <jchrist@linux.ibm.com> wrote:
> > >
> > > Do not convert floats to ints in multiple step if trapping math is
> > > enabled.  This might hide some inexact signals.
> > >
> > > Also use correct sign (the sign of the target integer type) for the
> > > intermediate steps.  This only affects undefined behaviour (casting
> > > floats to unsigned datatype where the float is negative).
> > >
> > > gcc/ChangeLog:
> > >
> > >         * tree-vect-stmts.cc (vectorizable_conversion): multi-step
> > >           float to int conversion only with trapping math and correct
> > >           sign.
> > >
> > > Signed-off-by: Juergen Christ <jchrist@linux.ibm.com>
> > >
> > > Bootstrapped and tested on x84 and s390.  Ok for trunk?
> > >
> > > ---
> > >  gcc/tree-vect-stmts.cc | 8 +++++---
> > >  1 file changed, 5 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > > index fdcda0d2abae..2ddd13383193 100644
> > > --- a/gcc/tree-vect-stmts.cc
> > > +++ b/gcc/tree-vect-stmts.cc
> > > @@ -5448,7 +5448,8 @@ vectorizable_conversion (vec_info *vinfo,
> > >             break;
> > >
> > >           cvt_type
> > > -           = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode), 0);
> > > +           = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode),
> > > +                                             TYPE_UNSIGNED (lhs_type));
> >
> > But lhs_type should be a float type here, the idea that for a
> > FLOAT_EXPR (int -> float)
> > a signed integer type is the natural one to use - as it's 2x wider
> > than the original
> > RHS type it's signedness doesn't matter.  Note all float types should be
> > !TYPE_UNSIGNED so this hunk is a no-op but still less clear on the intent IMO.
> >
> > Please drop it.
>
> Will do.  Sorry about that.
>
> > >           cvt_type = get_same_sized_vectype (cvt_type, vectype_in);
> > >           if (cvt_type == NULL_TREE)
> > >             goto unsupported;
> > > @@ -5505,10 +5506,11 @@ vectorizable_conversion (vec_info *vinfo,
> > >        if (GET_MODE_SIZE (lhs_mode) >= GET_MODE_SIZE (rhs_mode))
> > >         goto unsupported;
> > >
> > > -      if (code == FIX_TRUNC_EXPR)
> > > +      if (code == FIX_TRUNC_EXPR && !flag_trapping_math)
> > >         {
> > >           cvt_type
> > > -           = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode), 0);
> > > +           = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode),
> > > +                                             TYPE_UNSIGNED (lhs_type));
> >
> > Here it might be relevant for correctness - we have to choose between
> > sfix and ufix for the float -> [u]int conversion.
> >
> > Do  you have a testcase?  Shouldn't the exactness be independent of the integer
> > type we convert to?
>
> I was looking at this little program which contains undefined behaviour:
>
> #include <stdio.h>
>
> __attribute__((noinline,noclone,noipa))
> void
> vec_pack_ufix_trunc_v2df (double *in, unsigned int *out);
>
> void
> vec_pack_ufix_trunc_v2df (double *in, unsigned int *out)
> {
>         out[0] = in[0];
>         out[1] = in[1];
>         out[2] = in[2];
>         out[3] = in[3];
> }
>
> int main()
> {
>         double in[] = {-1,-2,-3,-4};
>         unsigned int out[4];
>
>         vec_pack_ufix_trunc_v2df (in, out);
>         for (int i = 0; i < 4; ++i)
>                 printf("out[%d] = %u\n", i, out[i]);
>         return 0;
> }
>
> On s390x, I get different results after vectorization:
>
> out[0] = 4294967295
> out[1] = 4294967294
> out[2] = 4294967293
> out[3] = 4294967292
>
> than without vectorization:
>
> out[0] = 0
> out[1] = 0
> out[2] = 0
> out[3] = 0
>
> Even if this is undefined behaviour, I think it would be nice to have
> consistent results here.
>
> Also, while I added an expander to circumvent this problem in a
> previous patch, reviewers requested to hide this behind trapping math.
> Thus, I looked into this.
>
> Seeing the result from the CI for aarch64, I guess there are some
> tests that actually expect this vectorization to always happen even
> though it might not be save w.r.t. trapping math.

I do remember this was extensively discussed (but we might have missed
something) and one argument indeed was that when it's undefined behavior
we can do the vectorization given the actual values might be in-bound.

Richard.

> >
> > >           cvt_type = get_same_sized_vectype (cvt_type, vectype_in);
> > >           if (cvt_type == NULL_TREE)
> > >             goto unsupported;
> > > --
> > > 2.43.5
> > >
Juergen Christ Aug. 9, 2024, 12:58 p.m. UTC | #4
Am Thu, Aug 08, 2024 at 02:06:44PM +0200 schrieb Richard Biener:
> On Mon, Aug 5, 2024 at 4:02 PM Juergen Christ <jchrist@linux.ibm.com> wrote:
> >
> > Am Mon, Aug 05, 2024 at 01:00:31PM +0200 schrieb Richard Biener:
> > > On Fri, Aug 2, 2024 at 2:43 PM Juergen Christ <jchrist@linux.ibm.com> wrote:
> > > >
> > > > Do not convert floats to ints in multiple step if trapping math is
> > > > enabled.  This might hide some inexact signals.
> > > >
> > > > Also use correct sign (the sign of the target integer type) for the
> > > > intermediate steps.  This only affects undefined behaviour (casting
> > > > floats to unsigned datatype where the float is negative).
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > >         * tree-vect-stmts.cc (vectorizable_conversion): multi-step
> > > >           float to int conversion only with trapping math and correct
> > > >           sign.
> > > >
> > > > Signed-off-by: Juergen Christ <jchrist@linux.ibm.com>
> > > >
> > > > Bootstrapped and tested on x84 and s390.  Ok for trunk?
> > > >
> > > > ---
> > > >  gcc/tree-vect-stmts.cc | 8 +++++---
> > > >  1 file changed, 5 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > > > index fdcda0d2abae..2ddd13383193 100644
> > > > --- a/gcc/tree-vect-stmts.cc
> > > > +++ b/gcc/tree-vect-stmts.cc
> > > > @@ -5448,7 +5448,8 @@ vectorizable_conversion (vec_info *vinfo,
> > > >             break;
> > > >
> > > >           cvt_type
> > > > -           = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode), 0);
> > > > +           = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode),
> > > > +                                             TYPE_UNSIGNED (lhs_type));
> > >
> > > But lhs_type should be a float type here, the idea that for a
> > > FLOAT_EXPR (int -> float)
> > > a signed integer type is the natural one to use - as it's 2x wider
> > > than the original
> > > RHS type it's signedness doesn't matter.  Note all float types should be
> > > !TYPE_UNSIGNED so this hunk is a no-op but still less clear on the intent IMO.
> > >
> > > Please drop it.
> >
> > Will do.  Sorry about that.
> >
> > > >           cvt_type = get_same_sized_vectype (cvt_type, vectype_in);
> > > >           if (cvt_type == NULL_TREE)
> > > >             goto unsupported;
> > > > @@ -5505,10 +5506,11 @@ vectorizable_conversion (vec_info *vinfo,
> > > >        if (GET_MODE_SIZE (lhs_mode) >= GET_MODE_SIZE (rhs_mode))
> > > >         goto unsupported;
> > > >
> > > > -      if (code == FIX_TRUNC_EXPR)
> > > > +      if (code == FIX_TRUNC_EXPR && !flag_trapping_math)
> > > >         {
> > > >           cvt_type
> > > > -           = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode), 0);
> > > > +           = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode),
> > > > +                                             TYPE_UNSIGNED (lhs_type));
> > >
> > > Here it might be relevant for correctness - we have to choose between
> > > sfix and ufix for the float -> [u]int conversion.
> > >
> > > Do  you have a testcase?  Shouldn't the exactness be independent of the integer
> > > type we convert to?
> >
> > I was looking at this little program which contains undefined behaviour:
> >
> > #include <stdio.h>
> >
> > __attribute__((noinline,noclone,noipa))
> > void
> > vec_pack_ufix_trunc_v2df (double *in, unsigned int *out);
> >
> > void
> > vec_pack_ufix_trunc_v2df (double *in, unsigned int *out)
> > {
> >         out[0] = in[0];
> >         out[1] = in[1];
> >         out[2] = in[2];
> >         out[3] = in[3];
> > }
> >
> > int main()
> > {
> >         double in[] = {-1,-2,-3,-4};
> >         unsigned int out[4];
> >
> >         vec_pack_ufix_trunc_v2df (in, out);
> >         for (int i = 0; i < 4; ++i)
> >                 printf("out[%d] = %u\n", i, out[i]);
> >         return 0;
> > }
> >
> > On s390x, I get different results after vectorization:
> >
> > out[0] = 4294967295
> > out[1] = 4294967294
> > out[2] = 4294967293
> > out[3] = 4294967292
> >
> > than without vectorization:
> >
> > out[0] = 0
> > out[1] = 0
> > out[2] = 0
> > out[3] = 0
> >
> > Even if this is undefined behaviour, I think it would be nice to have
> > consistent results here.
> >
> > Also, while I added an expander to circumvent this problem in a
> > previous patch, reviewers requested to hide this behind trapping math.
> > Thus, I looked into this.
> >
> > Seeing the result from the CI for aarch64, I guess there are some
> > tests that actually expect this vectorization to always happen even
> > though it might not be save w.r.t. trapping math.
> 
> I do remember this was extensively discussed (but we might have missed
> something) and one argument indeed was that when it's undefined behavior
> we can do the vectorization given the actual values might be in-bound.

Okay.  Would you be fine with the patch to only vectorize when
trapping math is disabled?  I still could take care of the rest on
s390 backend side by defining the appropriate expanders.  Still think
it is weird though that we might produce different results on
vectorization than without vectorization.  Yes, that is what
"undefined behaviour" is all about, but we can simply fix this here.
Nevertheless, how about just adding the trapping math check?
Richard Biener Aug. 20, 2024, 8:15 a.m. UTC | #5
On Fri, Aug 9, 2024 at 2:58 PM Juergen Christ <jchrist@linux.ibm.com> wrote:
>
> Am Thu, Aug 08, 2024 at 02:06:44PM +0200 schrieb Richard Biener:
> > On Mon, Aug 5, 2024 at 4:02 PM Juergen Christ <jchrist@linux.ibm.com> wrote:
> > >
> > > Am Mon, Aug 05, 2024 at 01:00:31PM +0200 schrieb Richard Biener:
> > > > On Fri, Aug 2, 2024 at 2:43 PM Juergen Christ <jchrist@linux.ibm.com> wrote:
> > > > >
> > > > > Do not convert floats to ints in multiple step if trapping math is
> > > > > enabled.  This might hide some inexact signals.
> > > > >
> > > > > Also use correct sign (the sign of the target integer type) for the
> > > > > intermediate steps.  This only affects undefined behaviour (casting
> > > > > floats to unsigned datatype where the float is negative).
> > > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > >         * tree-vect-stmts.cc (vectorizable_conversion): multi-step
> > > > >           float to int conversion only with trapping math and correct
> > > > >           sign.
> > > > >
> > > > > Signed-off-by: Juergen Christ <jchrist@linux.ibm.com>
> > > > >
> > > > > Bootstrapped and tested on x84 and s390.  Ok for trunk?
> > > > >
> > > > > ---
> > > > >  gcc/tree-vect-stmts.cc | 8 +++++---
> > > > >  1 file changed, 5 insertions(+), 3 deletions(-)
> > > > >
> > > > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > > > > index fdcda0d2abae..2ddd13383193 100644
> > > > > --- a/gcc/tree-vect-stmts.cc
> > > > > +++ b/gcc/tree-vect-stmts.cc
> > > > > @@ -5448,7 +5448,8 @@ vectorizable_conversion (vec_info *vinfo,
> > > > >             break;
> > > > >
> > > > >           cvt_type
> > > > > -           = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode), 0);
> > > > > +           = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode),
> > > > > +                                             TYPE_UNSIGNED (lhs_type));
> > > >
> > > > But lhs_type should be a float type here, the idea that for a
> > > > FLOAT_EXPR (int -> float)
> > > > a signed integer type is the natural one to use - as it's 2x wider
> > > > than the original
> > > > RHS type it's signedness doesn't matter.  Note all float types should be
> > > > !TYPE_UNSIGNED so this hunk is a no-op but still less clear on the intent IMO.
> > > >
> > > > Please drop it.
> > >
> > > Will do.  Sorry about that.
> > >
> > > > >           cvt_type = get_same_sized_vectype (cvt_type, vectype_in);
> > > > >           if (cvt_type == NULL_TREE)
> > > > >             goto unsupported;
> > > > > @@ -5505,10 +5506,11 @@ vectorizable_conversion (vec_info *vinfo,
> > > > >        if (GET_MODE_SIZE (lhs_mode) >= GET_MODE_SIZE (rhs_mode))
> > > > >         goto unsupported;
> > > > >
> > > > > -      if (code == FIX_TRUNC_EXPR)
> > > > > +      if (code == FIX_TRUNC_EXPR && !flag_trapping_math)
> > > > >         {
> > > > >           cvt_type
> > > > > -           = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode), 0);
> > > > > +           = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode),
> > > > > +                                             TYPE_UNSIGNED (lhs_type));
> > > >
> > > > Here it might be relevant for correctness - we have to choose between
> > > > sfix and ufix for the float -> [u]int conversion.
> > > >
> > > > Do  you have a testcase?  Shouldn't the exactness be independent of the integer
> > > > type we convert to?
> > >
> > > I was looking at this little program which contains undefined behaviour:
> > >
> > > #include <stdio.h>
> > >
> > > __attribute__((noinline,noclone,noipa))
> > > void
> > > vec_pack_ufix_trunc_v2df (double *in, unsigned int *out);
> > >
> > > void
> > > vec_pack_ufix_trunc_v2df (double *in, unsigned int *out)
> > > {
> > >         out[0] = in[0];
> > >         out[1] = in[1];
> > >         out[2] = in[2];
> > >         out[3] = in[3];
> > > }
> > >
> > > int main()
> > > {
> > >         double in[] = {-1,-2,-3,-4};
> > >         unsigned int out[4];
> > >
> > >         vec_pack_ufix_trunc_v2df (in, out);
> > >         for (int i = 0; i < 4; ++i)
> > >                 printf("out[%d] = %u\n", i, out[i]);
> > >         return 0;
> > > }
> > >
> > > On s390x, I get different results after vectorization:
> > >
> > > out[0] = 4294967295
> > > out[1] = 4294967294
> > > out[2] = 4294967293
> > > out[3] = 4294967292
> > >
> > > than without vectorization:
> > >
> > > out[0] = 0
> > > out[1] = 0
> > > out[2] = 0
> > > out[3] = 0
> > >
> > > Even if this is undefined behaviour, I think it would be nice to have
> > > consistent results here.
> > >
> > > Also, while I added an expander to circumvent this problem in a
> > > previous patch, reviewers requested to hide this behind trapping math.
> > > Thus, I looked into this.
> > >
> > > Seeing the result from the CI for aarch64, I guess there are some
> > > tests that actually expect this vectorization to always happen even
> > > though it might not be save w.r.t. trapping math.
> >
> > I do remember this was extensively discussed (but we might have missed
> > something) and one argument indeed was that when it's undefined behavior
> > we can do the vectorization given the actual values might be in-bound.
>
> Okay.  Would you be fine with the patch to only vectorize when
> trapping math is disabled?  I still could take care of the rest on
> s390 backend side by defining the appropriate expanders.  Still think
> it is weird though that we might produce different results on
> vectorization than without vectorization.  Yes, that is what
> "undefined behaviour" is all about, but we can simply fix this here.
> Nevertheless, how about just adding the trapping math check?

So to summarize - the problem is different result when vectorizing
when the scalar code invokes undefined behavior?  I think there is
nothing to fix and we shouldn't pessimize code not invoking undefined
behavior by adding a trapping math check.

Or did I misunderstand things?

Thanks,
Richard.
Juergen Christ Aug. 20, 2024, 9:16 a.m. UTC | #6
Am Tue, Aug 20, 2024 at 10:15:22AM +0200 schrieb Richard Biener:
> On Fri, Aug 9, 2024 at 2:58 PM Juergen Christ <jchrist@linux.ibm.com> wrote:
> >
> > Am Thu, Aug 08, 2024 at 02:06:44PM +0200 schrieb Richard Biener:
> > > On Mon, Aug 5, 2024 at 4:02 PM Juergen Christ <jchrist@linux.ibm.com> wrote:
> > > >
> > > > Am Mon, Aug 05, 2024 at 01:00:31PM +0200 schrieb Richard Biener:
> > > > > On Fri, Aug 2, 2024 at 2:43 PM Juergen Christ <jchrist@linux.ibm.com> wrote:
> > > > > >
> > > > > > Do not convert floats to ints in multiple step if trapping math is
> > > > > > enabled.  This might hide some inexact signals.
> > > > > >
> > > > > > Also use correct sign (the sign of the target integer type) for the
> > > > > > intermediate steps.  This only affects undefined behaviour (casting
> > > > > > floats to unsigned datatype where the float is negative).
> > > > > >
> > > > > > gcc/ChangeLog:
> > > > > >
> > > > > >         * tree-vect-stmts.cc (vectorizable_conversion): multi-step
> > > > > >           float to int conversion only with trapping math and correct
> > > > > >           sign.
> > > > > >
> > > > > > Signed-off-by: Juergen Christ <jchrist@linux.ibm.com>
> > > > > >
> > > > > > Bootstrapped and tested on x84 and s390.  Ok for trunk?
> > > > > >
> > > > > > ---
> > > > > >  gcc/tree-vect-stmts.cc | 8 +++++---
> > > > > >  1 file changed, 5 insertions(+), 3 deletions(-)
> > > > > >
> > > > > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > > > > > index fdcda0d2abae..2ddd13383193 100644
> > > > > > --- a/gcc/tree-vect-stmts.cc
> > > > > > +++ b/gcc/tree-vect-stmts.cc
> > > > > > @@ -5448,7 +5448,8 @@ vectorizable_conversion (vec_info *vinfo,
> > > > > >             break;
> > > > > >
> > > > > >           cvt_type
> > > > > > -           = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode), 0);
> > > > > > +           = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode),
> > > > > > +                                             TYPE_UNSIGNED (lhs_type));
> > > > >
> > > > > But lhs_type should be a float type here, the idea that for a
> > > > > FLOAT_EXPR (int -> float)
> > > > > a signed integer type is the natural one to use - as it's 2x wider
> > > > > than the original
> > > > > RHS type it's signedness doesn't matter.  Note all float types should be
> > > > > !TYPE_UNSIGNED so this hunk is a no-op but still less clear on the intent IMO.
> > > > >
> > > > > Please drop it.
> > > >
> > > > Will do.  Sorry about that.
> > > >
> > > > > >           cvt_type = get_same_sized_vectype (cvt_type, vectype_in);
> > > > > >           if (cvt_type == NULL_TREE)
> > > > > >             goto unsupported;
> > > > > > @@ -5505,10 +5506,11 @@ vectorizable_conversion (vec_info *vinfo,
> > > > > >        if (GET_MODE_SIZE (lhs_mode) >= GET_MODE_SIZE (rhs_mode))
> > > > > >         goto unsupported;
> > > > > >
> > > > > > -      if (code == FIX_TRUNC_EXPR)
> > > > > > +      if (code == FIX_TRUNC_EXPR && !flag_trapping_math)
> > > > > >         {
> > > > > >           cvt_type
> > > > > > -           = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode), 0);
> > > > > > +           = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode),
> > > > > > +                                             TYPE_UNSIGNED (lhs_type));
> > > > >
> > > > > Here it might be relevant for correctness - we have to choose between
> > > > > sfix and ufix for the float -> [u]int conversion.
> > > > >
> > > > > Do  you have a testcase?  Shouldn't the exactness be independent of the integer
> > > > > type we convert to?
> > > >
> > > > I was looking at this little program which contains undefined behaviour:
> > > >
> > > > #include <stdio.h>
> > > >
> > > > __attribute__((noinline,noclone,noipa))
> > > > void
> > > > vec_pack_ufix_trunc_v2df (double *in, unsigned int *out);
> > > >
> > > > void
> > > > vec_pack_ufix_trunc_v2df (double *in, unsigned int *out)
> > > > {
> > > >         out[0] = in[0];
> > > >         out[1] = in[1];
> > > >         out[2] = in[2];
> > > >         out[3] = in[3];
> > > > }
> > > >
> > > > int main()
> > > > {
> > > >         double in[] = {-1,-2,-3,-4};
> > > >         unsigned int out[4];
> > > >
> > > >         vec_pack_ufix_trunc_v2df (in, out);
> > > >         for (int i = 0; i < 4; ++i)
> > > >                 printf("out[%d] = %u\n", i, out[i]);
> > > >         return 0;
> > > > }
> > > >
> > > > On s390x, I get different results after vectorization:
> > > >
> > > > out[0] = 4294967295
> > > > out[1] = 4294967294
> > > > out[2] = 4294967293
> > > > out[3] = 4294967292
> > > >
> > > > than without vectorization:
> > > >
> > > > out[0] = 0
> > > > out[1] = 0
> > > > out[2] = 0
> > > > out[3] = 0
> > > >
> > > > Even if this is undefined behaviour, I think it would be nice to have
> > > > consistent results here.
> > > >
> > > > Also, while I added an expander to circumvent this problem in a
> > > > previous patch, reviewers requested to hide this behind trapping math.
> > > > Thus, I looked into this.
> > > >
> > > > Seeing the result from the CI for aarch64, I guess there are some
> > > > tests that actually expect this vectorization to always happen even
> > > > though it might not be save w.r.t. trapping math.
> > >
> > > I do remember this was extensively discussed (but we might have missed
> > > something) and one argument indeed was that when it's undefined behavior
> > > we can do the vectorization given the actual values might be in-bound.
> >
> > Okay.  Would you be fine with the patch to only vectorize when
> > trapping math is disabled?  I still could take care of the rest on
> > s390 backend side by defining the appropriate expanders.  Still think
> > it is weird though that we might produce different results on
> > vectorization than without vectorization.  Yes, that is what
> > "undefined behaviour" is all about, but we can simply fix this here.
> > Nevertheless, how about just adding the trapping math check?
> 
> So to summarize - the problem is different result when vectorizing
> when the scalar code invokes undefined behavior?  I think there is
> nothing to fix and we shouldn't pessimize code not invoking undefined
> behavior by adding a trapping math check.
> 
> Or did I misunderstand things?

The different results can still be delt with in the backend.  The only
remaining part is the question if vectorization of FIX_TRUNC_EXPR in
multiple steps should be guarded by the trapping math flag or not?  I
think it should, but according to CI results, some architectures
already rely on the current behaviour.  So I am unsure if we should
add the flag or not.  What is your opinion on that?

Regards,

Juergen
Richard Biener Aug. 20, 2024, 12:51 p.m. UTC | #7
On Tue, Aug 20, 2024 at 11:16 AM Juergen Christ <jchrist@linux.ibm.com> wrote:
>
> Am Tue, Aug 20, 2024 at 10:15:22AM +0200 schrieb Richard Biener:
> > On Fri, Aug 9, 2024 at 2:58 PM Juergen Christ <jchrist@linux.ibm.com> wrote:
> > >
> > > Am Thu, Aug 08, 2024 at 02:06:44PM +0200 schrieb Richard Biener:
> > > > On Mon, Aug 5, 2024 at 4:02 PM Juergen Christ <jchrist@linux.ibm.com> wrote:
> > > > >
> > > > > Am Mon, Aug 05, 2024 at 01:00:31PM +0200 schrieb Richard Biener:
> > > > > > On Fri, Aug 2, 2024 at 2:43 PM Juergen Christ <jchrist@linux.ibm.com> wrote:
> > > > > > >
> > > > > > > Do not convert floats to ints in multiple step if trapping math is
> > > > > > > enabled.  This might hide some inexact signals.
> > > > > > >
> > > > > > > Also use correct sign (the sign of the target integer type) for the
> > > > > > > intermediate steps.  This only affects undefined behaviour (casting
> > > > > > > floats to unsigned datatype where the float is negative).
> > > > > > >
> > > > > > > gcc/ChangeLog:
> > > > > > >
> > > > > > >         * tree-vect-stmts.cc (vectorizable_conversion): multi-step
> > > > > > >           float to int conversion only with trapping math and correct
> > > > > > >           sign.
> > > > > > >
> > > > > > > Signed-off-by: Juergen Christ <jchrist@linux.ibm.com>
> > > > > > >
> > > > > > > Bootstrapped and tested on x84 and s390.  Ok for trunk?
> > > > > > >
> > > > > > > ---
> > > > > > >  gcc/tree-vect-stmts.cc | 8 +++++---
> > > > > > >  1 file changed, 5 insertions(+), 3 deletions(-)
> > > > > > >
> > > > > > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > > > > > > index fdcda0d2abae..2ddd13383193 100644
> > > > > > > --- a/gcc/tree-vect-stmts.cc
> > > > > > > +++ b/gcc/tree-vect-stmts.cc
> > > > > > > @@ -5448,7 +5448,8 @@ vectorizable_conversion (vec_info *vinfo,
> > > > > > >             break;
> > > > > > >
> > > > > > >           cvt_type
> > > > > > > -           = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode), 0);
> > > > > > > +           = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode),
> > > > > > > +                                             TYPE_UNSIGNED (lhs_type));
> > > > > >
> > > > > > But lhs_type should be a float type here, the idea that for a
> > > > > > FLOAT_EXPR (int -> float)
> > > > > > a signed integer type is the natural one to use - as it's 2x wider
> > > > > > than the original
> > > > > > RHS type it's signedness doesn't matter.  Note all float types should be
> > > > > > !TYPE_UNSIGNED so this hunk is a no-op but still less clear on the intent IMO.
> > > > > >
> > > > > > Please drop it.
> > > > >
> > > > > Will do.  Sorry about that.
> > > > >
> > > > > > >           cvt_type = get_same_sized_vectype (cvt_type, vectype_in);
> > > > > > >           if (cvt_type == NULL_TREE)
> > > > > > >             goto unsupported;
> > > > > > > @@ -5505,10 +5506,11 @@ vectorizable_conversion (vec_info *vinfo,
> > > > > > >        if (GET_MODE_SIZE (lhs_mode) >= GET_MODE_SIZE (rhs_mode))
> > > > > > >         goto unsupported;
> > > > > > >
> > > > > > > -      if (code == FIX_TRUNC_EXPR)
> > > > > > > +      if (code == FIX_TRUNC_EXPR && !flag_trapping_math)
> > > > > > >         {
> > > > > > >           cvt_type
> > > > > > > -           = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode), 0);
> > > > > > > +           = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode),
> > > > > > > +                                             TYPE_UNSIGNED (lhs_type));
> > > > > >
> > > > > > Here it might be relevant for correctness - we have to choose between
> > > > > > sfix and ufix for the float -> [u]int conversion.
> > > > > >
> > > > > > Do  you have a testcase?  Shouldn't the exactness be independent of the integer
> > > > > > type we convert to?
> > > > >
> > > > > I was looking at this little program which contains undefined behaviour:
> > > > >
> > > > > #include <stdio.h>
> > > > >
> > > > > __attribute__((noinline,noclone,noipa))
> > > > > void
> > > > > vec_pack_ufix_trunc_v2df (double *in, unsigned int *out);
> > > > >
> > > > > void
> > > > > vec_pack_ufix_trunc_v2df (double *in, unsigned int *out)
> > > > > {
> > > > >         out[0] = in[0];
> > > > >         out[1] = in[1];
> > > > >         out[2] = in[2];
> > > > >         out[3] = in[3];
> > > > > }
> > > > >
> > > > > int main()
> > > > > {
> > > > >         double in[] = {-1,-2,-3,-4};
> > > > >         unsigned int out[4];
> > > > >
> > > > >         vec_pack_ufix_trunc_v2df (in, out);
> > > > >         for (int i = 0; i < 4; ++i)
> > > > >                 printf("out[%d] = %u\n", i, out[i]);
> > > > >         return 0;
> > > > > }
> > > > >
> > > > > On s390x, I get different results after vectorization:
> > > > >
> > > > > out[0] = 4294967295
> > > > > out[1] = 4294967294
> > > > > out[2] = 4294967293
> > > > > out[3] = 4294967292
> > > > >
> > > > > than without vectorization:
> > > > >
> > > > > out[0] = 0
> > > > > out[1] = 0
> > > > > out[2] = 0
> > > > > out[3] = 0
> > > > >
> > > > > Even if this is undefined behaviour, I think it would be nice to have
> > > > > consistent results here.
> > > > >
> > > > > Also, while I added an expander to circumvent this problem in a
> > > > > previous patch, reviewers requested to hide this behind trapping math.
> > > > > Thus, I looked into this.
> > > > >
> > > > > Seeing the result from the CI for aarch64, I guess there are some
> > > > > tests that actually expect this vectorization to always happen even
> > > > > though it might not be save w.r.t. trapping math.
> > > >
> > > > I do remember this was extensively discussed (but we might have missed
> > > > something) and one argument indeed was that when it's undefined behavior
> > > > we can do the vectorization given the actual values might be in-bound.
> > >
> > > Okay.  Would you be fine with the patch to only vectorize when
> > > trapping math is disabled?  I still could take care of the rest on
> > > s390 backend side by defining the appropriate expanders.  Still think
> > > it is weird though that we might produce different results on
> > > vectorization than without vectorization.  Yes, that is what
> > > "undefined behaviour" is all about, but we can simply fix this here.
> > > Nevertheless, how about just adding the trapping math check?
> >
> > So to summarize - the problem is different result when vectorizing
> > when the scalar code invokes undefined behavior?  I think there is
> > nothing to fix and we shouldn't pessimize code not invoking undefined
> > behavior by adding a trapping math check.
> >
> > Or did I misunderstand things?
>
> The different results can still be delt with in the backend.  The only
> remaining part is the question if vectorization of FIX_TRUNC_EXPR in
> multiple steps should be guarded by the trapping math flag or not?  I
> think it should, but according to CI results, some architectures
> already rely on the current behaviour.  So I am unsure if we should
> add the flag or not.  What is your opinion on that?

If we are adding a float conversion as intermediate step that would
definitely need to be guarded by !flag_trapping_math, say for
double -> float -> int, but I think we're doing double -> long -> int
for this and check ranges to make sure we are not missing a
truncation.

Which path is it that does not properly honor exception flags in
the case of multi-step conversions?

Richard.

>
> Regards,
>
> Juergen
Juergen Christ Aug. 20, 2024, 1:35 p.m. UTC | #8
Am Tue, Aug 20, 2024 at 02:51:02PM +0200 schrieb Richard Biener:
> On Tue, Aug 20, 2024 at 11:16 AM Juergen Christ <jchrist@linux.ibm.com> wrote:
> >
> > Am Tue, Aug 20, 2024 at 10:15:22AM +0200 schrieb Richard Biener:
> > > On Fri, Aug 9, 2024 at 2:58 PM Juergen Christ <jchrist@linux.ibm.com> wrote:
> > > >
> > > > Am Thu, Aug 08, 2024 at 02:06:44PM +0200 schrieb Richard Biener:
> > > > > On Mon, Aug 5, 2024 at 4:02 PM Juergen Christ <jchrist@linux.ibm.com> wrote:
> > > > > >
> > > > > > Am Mon, Aug 05, 2024 at 01:00:31PM +0200 schrieb Richard Biener:
> > > > > > > On Fri, Aug 2, 2024 at 2:43 PM Juergen Christ <jchrist@linux.ibm.com> wrote:
> > > > > > > >
> > > > > > > > Do not convert floats to ints in multiple step if trapping math is
> > > > > > > > enabled.  This might hide some inexact signals.
> > > > > > > >
> > > > > > > > Also use correct sign (the sign of the target integer type) for the
> > > > > > > > intermediate steps.  This only affects undefined behaviour (casting
> > > > > > > > floats to unsigned datatype where the float is negative).
> > > > > > > >
> > > > > > > > gcc/ChangeLog:
> > > > > > > >
> > > > > > > >         * tree-vect-stmts.cc (vectorizable_conversion): multi-step
> > > > > > > >           float to int conversion only with trapping math and correct
> > > > > > > >           sign.
> > > > > > > >
> > > > > > > > Signed-off-by: Juergen Christ <jchrist@linux.ibm.com>
> > > > > > > >
> > > > > > > > Bootstrapped and tested on x84 and s390.  Ok for trunk?
> > > > > > > >
> > > > > > > > ---
> > > > > > > >  gcc/tree-vect-stmts.cc | 8 +++++---
> > > > > > > >  1 file changed, 5 insertions(+), 3 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > > > > > > > index fdcda0d2abae..2ddd13383193 100644
> > > > > > > > --- a/gcc/tree-vect-stmts.cc
> > > > > > > > +++ b/gcc/tree-vect-stmts.cc
> > > > > > > > @@ -5448,7 +5448,8 @@ vectorizable_conversion (vec_info *vinfo,
> > > > > > > >             break;
> > > > > > > >
> > > > > > > >           cvt_type
> > > > > > > > -           = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode), 0);
> > > > > > > > +           = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode),
> > > > > > > > +                                             TYPE_UNSIGNED (lhs_type));
> > > > > > >
> > > > > > > But lhs_type should be a float type here, the idea that for a
> > > > > > > FLOAT_EXPR (int -> float)
> > > > > > > a signed integer type is the natural one to use - as it's 2x wider
> > > > > > > than the original
> > > > > > > RHS type it's signedness doesn't matter.  Note all float types should be
> > > > > > > !TYPE_UNSIGNED so this hunk is a no-op but still less clear on the intent IMO.
> > > > > > >
> > > > > > > Please drop it.
> > > > > >
> > > > > > Will do.  Sorry about that.
> > > > > >
> > > > > > > >           cvt_type = get_same_sized_vectype (cvt_type, vectype_in);
> > > > > > > >           if (cvt_type == NULL_TREE)
> > > > > > > >             goto unsupported;
> > > > > > > > @@ -5505,10 +5506,11 @@ vectorizable_conversion (vec_info *vinfo,
> > > > > > > >        if (GET_MODE_SIZE (lhs_mode) >= GET_MODE_SIZE (rhs_mode))
> > > > > > > >         goto unsupported;
> > > > > > > >
> > > > > > > > -      if (code == FIX_TRUNC_EXPR)
> > > > > > > > +      if (code == FIX_TRUNC_EXPR && !flag_trapping_math)
> > > > > > > >         {
> > > > > > > >           cvt_type
> > > > > > > > -           = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode), 0);
> > > > > > > > +           = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode),
> > > > > > > > +                                             TYPE_UNSIGNED (lhs_type));
> > > > > > >
> > > > > > > Here it might be relevant for correctness - we have to choose between
> > > > > > > sfix and ufix for the float -> [u]int conversion.
> > > > > > >
> > > > > > > Do  you have a testcase?  Shouldn't the exactness be independent of the integer
> > > > > > > type we convert to?
> > > > > >
> > > > > > I was looking at this little program which contains undefined behaviour:
> > > > > >
> > > > > > #include <stdio.h>
> > > > > >
> > > > > > __attribute__((noinline,noclone,noipa))
> > > > > > void
> > > > > > vec_pack_ufix_trunc_v2df (double *in, unsigned int *out);
> > > > > >
> > > > > > void
> > > > > > vec_pack_ufix_trunc_v2df (double *in, unsigned int *out)
> > > > > > {
> > > > > >         out[0] = in[0];
> > > > > >         out[1] = in[1];
> > > > > >         out[2] = in[2];
> > > > > >         out[3] = in[3];
> > > > > > }
> > > > > >
> > > > > > int main()
> > > > > > {
> > > > > >         double in[] = {-1,-2,-3,-4};
> > > > > >         unsigned int out[4];
> > > > > >
> > > > > >         vec_pack_ufix_trunc_v2df (in, out);
> > > > > >         for (int i = 0; i < 4; ++i)
> > > > > >                 printf("out[%d] = %u\n", i, out[i]);
> > > > > >         return 0;
> > > > > > }
> > > > > >
> > > > > > On s390x, I get different results after vectorization:
> > > > > >
> > > > > > out[0] = 4294967295
> > > > > > out[1] = 4294967294
> > > > > > out[2] = 4294967293
> > > > > > out[3] = 4294967292
> > > > > >
> > > > > > than without vectorization:
> > > > > >
> > > > > > out[0] = 0
> > > > > > out[1] = 0
> > > > > > out[2] = 0
> > > > > > out[3] = 0
> > > > > >
> > > > > > Even if this is undefined behaviour, I think it would be nice to have
> > > > > > consistent results here.
> > > > > >
> > > > > > Also, while I added an expander to circumvent this problem in a
> > > > > > previous patch, reviewers requested to hide this behind trapping math.
> > > > > > Thus, I looked into this.
> > > > > >
> > > > > > Seeing the result from the CI for aarch64, I guess there are some
> > > > > > tests that actually expect this vectorization to always happen even
> > > > > > though it might not be save w.r.t. trapping math.
> > > > >
> > > > > I do remember this was extensively discussed (but we might have missed
> > > > > something) and one argument indeed was that when it's undefined behavior
> > > > > we can do the vectorization given the actual values might be in-bound.
> > > >
> > > > Okay.  Would you be fine with the patch to only vectorize when
> > > > trapping math is disabled?  I still could take care of the rest on
> > > > s390 backend side by defining the appropriate expanders.  Still think
> > > > it is weird though that we might produce different results on
> > > > vectorization than without vectorization.  Yes, that is what
> > > > "undefined behaviour" is all about, but we can simply fix this here.
> > > > Nevertheless, how about just adding the trapping math check?
> > >
> > > So to summarize - the problem is different result when vectorizing
> > > when the scalar code invokes undefined behavior?  I think there is
> > > nothing to fix and we shouldn't pessimize code not invoking undefined
> > > behavior by adding a trapping math check.
> > >
> > > Or did I misunderstand things?
> >
> > The different results can still be delt with in the backend.  The only
> > remaining part is the question if vectorization of FIX_TRUNC_EXPR in
> > multiple steps should be guarded by the trapping math flag or not?  I
> > think it should, but according to CI results, some architectures
> > already rely on the current behaviour.  So I am unsure if we should
> > add the flag or not.  What is your opinion on that?
> 
> If we are adding a float conversion as intermediate step that would
> definitely need to be guarded by !flag_trapping_math, say for
> double -> float -> int, but I think we're doing double -> long -> int
> for this and check ranges to make sure we are not missing a
> truncation.

I thought that was the problematic conversion.  At least I did not see
a check there, but maybe I am wrong.  Then I will drop this patch
completely.

Thanks,

Juergen
Richard Biener Aug. 21, 2024, 11:24 a.m. UTC | #9
On Tue, Aug 20, 2024 at 3:35 PM Juergen Christ <jchrist@linux.ibm.com> wrote:
>
> Am Tue, Aug 20, 2024 at 02:51:02PM +0200 schrieb Richard Biener:
> > On Tue, Aug 20, 2024 at 11:16 AM Juergen Christ <jchrist@linux.ibm.com> wrote:
> > >
> > > Am Tue, Aug 20, 2024 at 10:15:22AM +0200 schrieb Richard Biener:
> > > > On Fri, Aug 9, 2024 at 2:58 PM Juergen Christ <jchrist@linux.ibm.com> wrote:
> > > > >
> > > > > Am Thu, Aug 08, 2024 at 02:06:44PM +0200 schrieb Richard Biener:
> > > > > > On Mon, Aug 5, 2024 at 4:02 PM Juergen Christ <jchrist@linux.ibm.com> wrote:
> > > > > > >
> > > > > > > Am Mon, Aug 05, 2024 at 01:00:31PM +0200 schrieb Richard Biener:
> > > > > > > > On Fri, Aug 2, 2024 at 2:43 PM Juergen Christ <jchrist@linux.ibm.com> wrote:
> > > > > > > > >
> > > > > > > > > Do not convert floats to ints in multiple step if trapping math is
> > > > > > > > > enabled.  This might hide some inexact signals.
> > > > > > > > >
> > > > > > > > > Also use correct sign (the sign of the target integer type) for the
> > > > > > > > > intermediate steps.  This only affects undefined behaviour (casting
> > > > > > > > > floats to unsigned datatype where the float is negative).
> > > > > > > > >
> > > > > > > > > gcc/ChangeLog:
> > > > > > > > >
> > > > > > > > >         * tree-vect-stmts.cc (vectorizable_conversion): multi-step
> > > > > > > > >           float to int conversion only with trapping math and correct
> > > > > > > > >           sign.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Juergen Christ <jchrist@linux.ibm.com>
> > > > > > > > >
> > > > > > > > > Bootstrapped and tested on x84 and s390.  Ok for trunk?
> > > > > > > > >
> > > > > > > > > ---
> > > > > > > > >  gcc/tree-vect-stmts.cc | 8 +++++---
> > > > > > > > >  1 file changed, 5 insertions(+), 3 deletions(-)
> > > > > > > > >
> > > > > > > > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > > > > > > > > index fdcda0d2abae..2ddd13383193 100644
> > > > > > > > > --- a/gcc/tree-vect-stmts.cc
> > > > > > > > > +++ b/gcc/tree-vect-stmts.cc
> > > > > > > > > @@ -5448,7 +5448,8 @@ vectorizable_conversion (vec_info *vinfo,
> > > > > > > > >             break;
> > > > > > > > >
> > > > > > > > >           cvt_type
> > > > > > > > > -           = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode), 0);
> > > > > > > > > +           = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode),
> > > > > > > > > +                                             TYPE_UNSIGNED (lhs_type));
> > > > > > > >
> > > > > > > > But lhs_type should be a float type here, the idea that for a
> > > > > > > > FLOAT_EXPR (int -> float)
> > > > > > > > a signed integer type is the natural one to use - as it's 2x wider
> > > > > > > > than the original
> > > > > > > > RHS type it's signedness doesn't matter.  Note all float types should be
> > > > > > > > !TYPE_UNSIGNED so this hunk is a no-op but still less clear on the intent IMO.
> > > > > > > >
> > > > > > > > Please drop it.
> > > > > > >
> > > > > > > Will do.  Sorry about that.
> > > > > > >
> > > > > > > > >           cvt_type = get_same_sized_vectype (cvt_type, vectype_in);
> > > > > > > > >           if (cvt_type == NULL_TREE)
> > > > > > > > >             goto unsupported;
> > > > > > > > > @@ -5505,10 +5506,11 @@ vectorizable_conversion (vec_info *vinfo,
> > > > > > > > >        if (GET_MODE_SIZE (lhs_mode) >= GET_MODE_SIZE (rhs_mode))
> > > > > > > > >         goto unsupported;
> > > > > > > > >
> > > > > > > > > -      if (code == FIX_TRUNC_EXPR)
> > > > > > > > > +      if (code == FIX_TRUNC_EXPR && !flag_trapping_math)
> > > > > > > > >         {
> > > > > > > > >           cvt_type
> > > > > > > > > -           = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode), 0);
> > > > > > > > > +           = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode),
> > > > > > > > > +                                             TYPE_UNSIGNED (lhs_type));
> > > > > > > >
> > > > > > > > Here it might be relevant for correctness - we have to choose between
> > > > > > > > sfix and ufix for the float -> [u]int conversion.
> > > > > > > >
> > > > > > > > Do  you have a testcase?  Shouldn't the exactness be independent of the integer
> > > > > > > > type we convert to?
> > > > > > >
> > > > > > > I was looking at this little program which contains undefined behaviour:
> > > > > > >
> > > > > > > #include <stdio.h>
> > > > > > >
> > > > > > > __attribute__((noinline,noclone,noipa))
> > > > > > > void
> > > > > > > vec_pack_ufix_trunc_v2df (double *in, unsigned int *out);
> > > > > > >
> > > > > > > void
> > > > > > > vec_pack_ufix_trunc_v2df (double *in, unsigned int *out)
> > > > > > > {
> > > > > > >         out[0] = in[0];
> > > > > > >         out[1] = in[1];
> > > > > > >         out[2] = in[2];
> > > > > > >         out[3] = in[3];
> > > > > > > }
> > > > > > >
> > > > > > > int main()
> > > > > > > {
> > > > > > >         double in[] = {-1,-2,-3,-4};
> > > > > > >         unsigned int out[4];
> > > > > > >
> > > > > > >         vec_pack_ufix_trunc_v2df (in, out);
> > > > > > >         for (int i = 0; i < 4; ++i)
> > > > > > >                 printf("out[%d] = %u\n", i, out[i]);
> > > > > > >         return 0;
> > > > > > > }
> > > > > > >
> > > > > > > On s390x, I get different results after vectorization:
> > > > > > >
> > > > > > > out[0] = 4294967295
> > > > > > > out[1] = 4294967294
> > > > > > > out[2] = 4294967293
> > > > > > > out[3] = 4294967292
> > > > > > >
> > > > > > > than without vectorization:
> > > > > > >
> > > > > > > out[0] = 0
> > > > > > > out[1] = 0
> > > > > > > out[2] = 0
> > > > > > > out[3] = 0
> > > > > > >
> > > > > > > Even if this is undefined behaviour, I think it would be nice to have
> > > > > > > consistent results here.
> > > > > > >
> > > > > > > Also, while I added an expander to circumvent this problem in a
> > > > > > > previous patch, reviewers requested to hide this behind trapping math.
> > > > > > > Thus, I looked into this.
> > > > > > >
> > > > > > > Seeing the result from the CI for aarch64, I guess there are some
> > > > > > > tests that actually expect this vectorization to always happen even
> > > > > > > though it might not be save w.r.t. trapping math.
> > > > > >
> > > > > > I do remember this was extensively discussed (but we might have missed
> > > > > > something) and one argument indeed was that when it's undefined behavior
> > > > > > we can do the vectorization given the actual values might be in-bound.
> > > > >
> > > > > Okay.  Would you be fine with the patch to only vectorize when
> > > > > trapping math is disabled?  I still could take care of the rest on
> > > > > s390 backend side by defining the appropriate expanders.  Still think
> > > > > it is weird though that we might produce different results on
> > > > > vectorization than without vectorization.  Yes, that is what
> > > > > "undefined behaviour" is all about, but we can simply fix this here.
> > > > > Nevertheless, how about just adding the trapping math check?
> > > >
> > > > So to summarize - the problem is different result when vectorizing
> > > > when the scalar code invokes undefined behavior?  I think there is
> > > > nothing to fix and we shouldn't pessimize code not invoking undefined
> > > > behavior by adding a trapping math check.
> > > >
> > > > Or did I misunderstand things?
> > >
> > > The different results can still be delt with in the backend.  The only
> > > remaining part is the question if vectorization of FIX_TRUNC_EXPR in
> > > multiple steps should be guarded by the trapping math flag or not?  I
> > > think it should, but according to CI results, some architectures
> > > already rely on the current behaviour.  So I am unsure if we should
> > > add the flag or not.  What is your opinion on that?
> >
> > If we are adding a float conversion as intermediate step that would
> > definitely need to be guarded by !flag_trapping_math, say for
> > double -> float -> int, but I think we're doing double -> long -> int
> > for this and check ranges to make sure we are not missing a
> > truncation.
>
> I thought that was the problematic conversion.  At least I did not see
> a check there, but maybe I am wrong.  Then I will drop this patch
> completely.

So we're not doing any range analysis but for double -> int performed
as double -> long -> int when the double cannot be represented in int
the code invokes undefined behavior.  We're only considering intermediate
types where the final step is a truncation so we should be OK here.
The corner-case would be float->unsigned performed as
float->int->unsigned but I think that supportable_narrowing_operation
will reject the int->unsigned conversion.  I'm not sure we are
ever using a non vec_[un]pack_{u,s}fix_trunc optab aka
{u,s}fix_trunc directly for vectors (but x86 has patterns for this).
Doing float->unsigned via int would be wrong then (always, not
only with -ftrapping-math).

So, I think we're fine.

Richard.

>
> Thanks,
>
> Juergen
diff mbox series

Patch

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index fdcda0d2abae..2ddd13383193 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -5448,7 +5448,8 @@  vectorizable_conversion (vec_info *vinfo,
 	    break;
 
 	  cvt_type
-	    = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode), 0);
+	    = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode),
+					      TYPE_UNSIGNED (lhs_type));
 	  cvt_type = get_same_sized_vectype (cvt_type, vectype_in);
 	  if (cvt_type == NULL_TREE)
 	    goto unsupported;
@@ -5505,10 +5506,11 @@  vectorizable_conversion (vec_info *vinfo,
       if (GET_MODE_SIZE (lhs_mode) >= GET_MODE_SIZE (rhs_mode))
 	goto unsupported;
 
-      if (code == FIX_TRUNC_EXPR)
+      if (code == FIX_TRUNC_EXPR && !flag_trapping_math)
 	{
 	  cvt_type
-	    = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode), 0);
+	    = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode),
+					      TYPE_UNSIGNED (lhs_type));
 	  cvt_type = get_same_sized_vectype (cvt_type, vectype_in);
 	  if (cvt_type == NULL_TREE)
 	    goto unsupported;