Message ID | 20240802124300.29260-1-jchrist@linux.ibm.com |
---|---|
State | New |
Headers | show |
Series | vect: Multistep float->int conversion only with no trapping math | expand |
On Fri, Aug 2, 2024 at 2:43 PM Juergen Christ <jchrist@linux.ibm.com> wrote: > > Do not convert floats to ints in multiple step if trapping math is > enabled. This might hide some inexact signals. > > Also use correct sign (the sign of the target integer type) for the > intermediate steps. This only affects undefined behaviour (casting > floats to unsigned datatype where the float is negative). > > gcc/ChangeLog: > > * tree-vect-stmts.cc (vectorizable_conversion): multi-step > float to int conversion only with trapping math and correct > sign. > > Signed-off-by: Juergen Christ <jchrist@linux.ibm.com> > > Bootstrapped and tested on x84 and s390. Ok for trunk? > > --- > gcc/tree-vect-stmts.cc | 8 +++++--- > 1 file changed, 5 insertions(+), 3 deletions(-) > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc > index fdcda0d2abae..2ddd13383193 100644 > --- a/gcc/tree-vect-stmts.cc > +++ b/gcc/tree-vect-stmts.cc > @@ -5448,7 +5448,8 @@ vectorizable_conversion (vec_info *vinfo, > break; > > cvt_type > - = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode), 0); > + = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode), > + TYPE_UNSIGNED (lhs_type)); But lhs_type should be a float type here, the idea that for a FLOAT_EXPR (int -> float) a signed integer type is the natural one to use - as it's 2x wider than the original RHS type it's signedness doesn't matter. Note all float types should be !TYPE_UNSIGNED so this hunk is a no-op but still less clear on the intent IMO. Please drop it. > cvt_type = get_same_sized_vectype (cvt_type, vectype_in); > if (cvt_type == NULL_TREE) > goto unsupported; > @@ -5505,10 +5506,11 @@ vectorizable_conversion (vec_info *vinfo, > if (GET_MODE_SIZE (lhs_mode) >= GET_MODE_SIZE (rhs_mode)) > goto unsupported; > > - if (code == FIX_TRUNC_EXPR) > + if (code == FIX_TRUNC_EXPR && !flag_trapping_math) > { > cvt_type > - = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode), 0); > + = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode), > + TYPE_UNSIGNED (lhs_type)); Here it might be relevant for correctness - we have to choose between sfix and ufix for the float -> [u]int conversion. Do you have a testcase? Shouldn't the exactness be independent of the integer type we convert to? > cvt_type = get_same_sized_vectype (cvt_type, vectype_in); > if (cvt_type == NULL_TREE) > goto unsupported; > -- > 2.43.5 >
Am Mon, Aug 05, 2024 at 01:00:31PM +0200 schrieb Richard Biener: > On Fri, Aug 2, 2024 at 2:43 PM Juergen Christ <jchrist@linux.ibm.com> wrote: > > > > Do not convert floats to ints in multiple step if trapping math is > > enabled. This might hide some inexact signals. > > > > Also use correct sign (the sign of the target integer type) for the > > intermediate steps. This only affects undefined behaviour (casting > > floats to unsigned datatype where the float is negative). > > > > gcc/ChangeLog: > > > > * tree-vect-stmts.cc (vectorizable_conversion): multi-step > > float to int conversion only with trapping math and correct > > sign. > > > > Signed-off-by: Juergen Christ <jchrist@linux.ibm.com> > > > > Bootstrapped and tested on x84 and s390. Ok for trunk? > > > > --- > > gcc/tree-vect-stmts.cc | 8 +++++--- > > 1 file changed, 5 insertions(+), 3 deletions(-) > > > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc > > index fdcda0d2abae..2ddd13383193 100644 > > --- a/gcc/tree-vect-stmts.cc > > +++ b/gcc/tree-vect-stmts.cc > > @@ -5448,7 +5448,8 @@ vectorizable_conversion (vec_info *vinfo, > > break; > > > > cvt_type > > - = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode), 0); > > + = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode), > > + TYPE_UNSIGNED (lhs_type)); > > But lhs_type should be a float type here, the idea that for a > FLOAT_EXPR (int -> float) > a signed integer type is the natural one to use - as it's 2x wider > than the original > RHS type it's signedness doesn't matter. Note all float types should be > !TYPE_UNSIGNED so this hunk is a no-op but still less clear on the intent IMO. > > Please drop it. Will do. Sorry about that. > > cvt_type = get_same_sized_vectype (cvt_type, vectype_in); > > if (cvt_type == NULL_TREE) > > goto unsupported; > > @@ -5505,10 +5506,11 @@ vectorizable_conversion (vec_info *vinfo, > > if (GET_MODE_SIZE (lhs_mode) >= GET_MODE_SIZE (rhs_mode)) > > goto unsupported; > > > > - if (code == FIX_TRUNC_EXPR) > > + if (code == FIX_TRUNC_EXPR && !flag_trapping_math) > > { > > cvt_type > > - = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode), 0); > > + = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode), > > + TYPE_UNSIGNED (lhs_type)); > > Here it might be relevant for correctness - we have to choose between > sfix and ufix for the float -> [u]int conversion. > > Do you have a testcase? Shouldn't the exactness be independent of the integer > type we convert to? I was looking at this little program which contains undefined behaviour: #include <stdio.h> __attribute__((noinline,noclone,noipa)) void vec_pack_ufix_trunc_v2df (double *in, unsigned int *out); void vec_pack_ufix_trunc_v2df (double *in, unsigned int *out) { out[0] = in[0]; out[1] = in[1]; out[2] = in[2]; out[3] = in[3]; } int main() { double in[] = {-1,-2,-3,-4}; unsigned int out[4]; vec_pack_ufix_trunc_v2df (in, out); for (int i = 0; i < 4; ++i) printf("out[%d] = %u\n", i, out[i]); return 0; } On s390x, I get different results after vectorization: out[0] = 4294967295 out[1] = 4294967294 out[2] = 4294967293 out[3] = 4294967292 than without vectorization: out[0] = 0 out[1] = 0 out[2] = 0 out[3] = 0 Even if this is undefined behaviour, I think it would be nice to have consistent results here. Also, while I added an expander to circumvent this problem in a previous patch, reviewers requested to hide this behind trapping math. Thus, I looked into this. Seeing the result from the CI for aarch64, I guess there are some tests that actually expect this vectorization to always happen even though it might not be save w.r.t. trapping math. > > > cvt_type = get_same_sized_vectype (cvt_type, vectype_in); > > if (cvt_type == NULL_TREE) > > goto unsupported; > > -- > > 2.43.5 > >
On Mon, Aug 5, 2024 at 4:02 PM Juergen Christ <jchrist@linux.ibm.com> wrote: > > Am Mon, Aug 05, 2024 at 01:00:31PM +0200 schrieb Richard Biener: > > On Fri, Aug 2, 2024 at 2:43 PM Juergen Christ <jchrist@linux.ibm.com> wrote: > > > > > > Do not convert floats to ints in multiple step if trapping math is > > > enabled. This might hide some inexact signals. > > > > > > Also use correct sign (the sign of the target integer type) for the > > > intermediate steps. This only affects undefined behaviour (casting > > > floats to unsigned datatype where the float is negative). > > > > > > gcc/ChangeLog: > > > > > > * tree-vect-stmts.cc (vectorizable_conversion): multi-step > > > float to int conversion only with trapping math and correct > > > sign. > > > > > > Signed-off-by: Juergen Christ <jchrist@linux.ibm.com> > > > > > > Bootstrapped and tested on x84 and s390. Ok for trunk? > > > > > > --- > > > gcc/tree-vect-stmts.cc | 8 +++++--- > > > 1 file changed, 5 insertions(+), 3 deletions(-) > > > > > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc > > > index fdcda0d2abae..2ddd13383193 100644 > > > --- a/gcc/tree-vect-stmts.cc > > > +++ b/gcc/tree-vect-stmts.cc > > > @@ -5448,7 +5448,8 @@ vectorizable_conversion (vec_info *vinfo, > > > break; > > > > > > cvt_type > > > - = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode), 0); > > > + = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode), > > > + TYPE_UNSIGNED (lhs_type)); > > > > But lhs_type should be a float type here, the idea that for a > > FLOAT_EXPR (int -> float) > > a signed integer type is the natural one to use - as it's 2x wider > > than the original > > RHS type it's signedness doesn't matter. Note all float types should be > > !TYPE_UNSIGNED so this hunk is a no-op but still less clear on the intent IMO. > > > > Please drop it. > > Will do. Sorry about that. > > > > cvt_type = get_same_sized_vectype (cvt_type, vectype_in); > > > if (cvt_type == NULL_TREE) > > > goto unsupported; > > > @@ -5505,10 +5506,11 @@ vectorizable_conversion (vec_info *vinfo, > > > if (GET_MODE_SIZE (lhs_mode) >= GET_MODE_SIZE (rhs_mode)) > > > goto unsupported; > > > > > > - if (code == FIX_TRUNC_EXPR) > > > + if (code == FIX_TRUNC_EXPR && !flag_trapping_math) > > > { > > > cvt_type > > > - = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode), 0); > > > + = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode), > > > + TYPE_UNSIGNED (lhs_type)); > > > > Here it might be relevant for correctness - we have to choose between > > sfix and ufix for the float -> [u]int conversion. > > > > Do you have a testcase? Shouldn't the exactness be independent of the integer > > type we convert to? > > I was looking at this little program which contains undefined behaviour: > > #include <stdio.h> > > __attribute__((noinline,noclone,noipa)) > void > vec_pack_ufix_trunc_v2df (double *in, unsigned int *out); > > void > vec_pack_ufix_trunc_v2df (double *in, unsigned int *out) > { > out[0] = in[0]; > out[1] = in[1]; > out[2] = in[2]; > out[3] = in[3]; > } > > int main() > { > double in[] = {-1,-2,-3,-4}; > unsigned int out[4]; > > vec_pack_ufix_trunc_v2df (in, out); > for (int i = 0; i < 4; ++i) > printf("out[%d] = %u\n", i, out[i]); > return 0; > } > > On s390x, I get different results after vectorization: > > out[0] = 4294967295 > out[1] = 4294967294 > out[2] = 4294967293 > out[3] = 4294967292 > > than without vectorization: > > out[0] = 0 > out[1] = 0 > out[2] = 0 > out[3] = 0 > > Even if this is undefined behaviour, I think it would be nice to have > consistent results here. > > Also, while I added an expander to circumvent this problem in a > previous patch, reviewers requested to hide this behind trapping math. > Thus, I looked into this. > > Seeing the result from the CI for aarch64, I guess there are some > tests that actually expect this vectorization to always happen even > though it might not be save w.r.t. trapping math. I do remember this was extensively discussed (but we might have missed something) and one argument indeed was that when it's undefined behavior we can do the vectorization given the actual values might be in-bound. Richard. > > > > > cvt_type = get_same_sized_vectype (cvt_type, vectype_in); > > > if (cvt_type == NULL_TREE) > > > goto unsupported; > > > -- > > > 2.43.5 > > >
Am Thu, Aug 08, 2024 at 02:06:44PM +0200 schrieb Richard Biener: > On Mon, Aug 5, 2024 at 4:02 PM Juergen Christ <jchrist@linux.ibm.com> wrote: > > > > Am Mon, Aug 05, 2024 at 01:00:31PM +0200 schrieb Richard Biener: > > > On Fri, Aug 2, 2024 at 2:43 PM Juergen Christ <jchrist@linux.ibm.com> wrote: > > > > > > > > Do not convert floats to ints in multiple step if trapping math is > > > > enabled. This might hide some inexact signals. > > > > > > > > Also use correct sign (the sign of the target integer type) for the > > > > intermediate steps. This only affects undefined behaviour (casting > > > > floats to unsigned datatype where the float is negative). > > > > > > > > gcc/ChangeLog: > > > > > > > > * tree-vect-stmts.cc (vectorizable_conversion): multi-step > > > > float to int conversion only with trapping math and correct > > > > sign. > > > > > > > > Signed-off-by: Juergen Christ <jchrist@linux.ibm.com> > > > > > > > > Bootstrapped and tested on x84 and s390. Ok for trunk? > > > > > > > > --- > > > > gcc/tree-vect-stmts.cc | 8 +++++--- > > > > 1 file changed, 5 insertions(+), 3 deletions(-) > > > > > > > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc > > > > index fdcda0d2abae..2ddd13383193 100644 > > > > --- a/gcc/tree-vect-stmts.cc > > > > +++ b/gcc/tree-vect-stmts.cc > > > > @@ -5448,7 +5448,8 @@ vectorizable_conversion (vec_info *vinfo, > > > > break; > > > > > > > > cvt_type > > > > - = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode), 0); > > > > + = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode), > > > > + TYPE_UNSIGNED (lhs_type)); > > > > > > But lhs_type should be a float type here, the idea that for a > > > FLOAT_EXPR (int -> float) > > > a signed integer type is the natural one to use - as it's 2x wider > > > than the original > > > RHS type it's signedness doesn't matter. Note all float types should be > > > !TYPE_UNSIGNED so this hunk is a no-op but still less clear on the intent IMO. > > > > > > Please drop it. > > > > Will do. Sorry about that. > > > > > > cvt_type = get_same_sized_vectype (cvt_type, vectype_in); > > > > if (cvt_type == NULL_TREE) > > > > goto unsupported; > > > > @@ -5505,10 +5506,11 @@ vectorizable_conversion (vec_info *vinfo, > > > > if (GET_MODE_SIZE (lhs_mode) >= GET_MODE_SIZE (rhs_mode)) > > > > goto unsupported; > > > > > > > > - if (code == FIX_TRUNC_EXPR) > > > > + if (code == FIX_TRUNC_EXPR && !flag_trapping_math) > > > > { > > > > cvt_type > > > > - = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode), 0); > > > > + = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode), > > > > + TYPE_UNSIGNED (lhs_type)); > > > > > > Here it might be relevant for correctness - we have to choose between > > > sfix and ufix for the float -> [u]int conversion. > > > > > > Do you have a testcase? Shouldn't the exactness be independent of the integer > > > type we convert to? > > > > I was looking at this little program which contains undefined behaviour: > > > > #include <stdio.h> > > > > __attribute__((noinline,noclone,noipa)) > > void > > vec_pack_ufix_trunc_v2df (double *in, unsigned int *out); > > > > void > > vec_pack_ufix_trunc_v2df (double *in, unsigned int *out) > > { > > out[0] = in[0]; > > out[1] = in[1]; > > out[2] = in[2]; > > out[3] = in[3]; > > } > > > > int main() > > { > > double in[] = {-1,-2,-3,-4}; > > unsigned int out[4]; > > > > vec_pack_ufix_trunc_v2df (in, out); > > for (int i = 0; i < 4; ++i) > > printf("out[%d] = %u\n", i, out[i]); > > return 0; > > } > > > > On s390x, I get different results after vectorization: > > > > out[0] = 4294967295 > > out[1] = 4294967294 > > out[2] = 4294967293 > > out[3] = 4294967292 > > > > than without vectorization: > > > > out[0] = 0 > > out[1] = 0 > > out[2] = 0 > > out[3] = 0 > > > > Even if this is undefined behaviour, I think it would be nice to have > > consistent results here. > > > > Also, while I added an expander to circumvent this problem in a > > previous patch, reviewers requested to hide this behind trapping math. > > Thus, I looked into this. > > > > Seeing the result from the CI for aarch64, I guess there are some > > tests that actually expect this vectorization to always happen even > > though it might not be save w.r.t. trapping math. > > I do remember this was extensively discussed (but we might have missed > something) and one argument indeed was that when it's undefined behavior > we can do the vectorization given the actual values might be in-bound. Okay. Would you be fine with the patch to only vectorize when trapping math is disabled? I still could take care of the rest on s390 backend side by defining the appropriate expanders. Still think it is weird though that we might produce different results on vectorization than without vectorization. Yes, that is what "undefined behaviour" is all about, but we can simply fix this here. Nevertheless, how about just adding the trapping math check?
On Fri, Aug 9, 2024 at 2:58 PM Juergen Christ <jchrist@linux.ibm.com> wrote: > > Am Thu, Aug 08, 2024 at 02:06:44PM +0200 schrieb Richard Biener: > > On Mon, Aug 5, 2024 at 4:02 PM Juergen Christ <jchrist@linux.ibm.com> wrote: > > > > > > Am Mon, Aug 05, 2024 at 01:00:31PM +0200 schrieb Richard Biener: > > > > On Fri, Aug 2, 2024 at 2:43 PM Juergen Christ <jchrist@linux.ibm.com> wrote: > > > > > > > > > > Do not convert floats to ints in multiple step if trapping math is > > > > > enabled. This might hide some inexact signals. > > > > > > > > > > Also use correct sign (the sign of the target integer type) for the > > > > > intermediate steps. This only affects undefined behaviour (casting > > > > > floats to unsigned datatype where the float is negative). > > > > > > > > > > gcc/ChangeLog: > > > > > > > > > > * tree-vect-stmts.cc (vectorizable_conversion): multi-step > > > > > float to int conversion only with trapping math and correct > > > > > sign. > > > > > > > > > > Signed-off-by: Juergen Christ <jchrist@linux.ibm.com> > > > > > > > > > > Bootstrapped and tested on x84 and s390. Ok for trunk? > > > > > > > > > > --- > > > > > gcc/tree-vect-stmts.cc | 8 +++++--- > > > > > 1 file changed, 5 insertions(+), 3 deletions(-) > > > > > > > > > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc > > > > > index fdcda0d2abae..2ddd13383193 100644 > > > > > --- a/gcc/tree-vect-stmts.cc > > > > > +++ b/gcc/tree-vect-stmts.cc > > > > > @@ -5448,7 +5448,8 @@ vectorizable_conversion (vec_info *vinfo, > > > > > break; > > > > > > > > > > cvt_type > > > > > - = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode), 0); > > > > > + = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode), > > > > > + TYPE_UNSIGNED (lhs_type)); > > > > > > > > But lhs_type should be a float type here, the idea that for a > > > > FLOAT_EXPR (int -> float) > > > > a signed integer type is the natural one to use - as it's 2x wider > > > > than the original > > > > RHS type it's signedness doesn't matter. Note all float types should be > > > > !TYPE_UNSIGNED so this hunk is a no-op but still less clear on the intent IMO. > > > > > > > > Please drop it. > > > > > > Will do. Sorry about that. > > > > > > > > cvt_type = get_same_sized_vectype (cvt_type, vectype_in); > > > > > if (cvt_type == NULL_TREE) > > > > > goto unsupported; > > > > > @@ -5505,10 +5506,11 @@ vectorizable_conversion (vec_info *vinfo, > > > > > if (GET_MODE_SIZE (lhs_mode) >= GET_MODE_SIZE (rhs_mode)) > > > > > goto unsupported; > > > > > > > > > > - if (code == FIX_TRUNC_EXPR) > > > > > + if (code == FIX_TRUNC_EXPR && !flag_trapping_math) > > > > > { > > > > > cvt_type > > > > > - = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode), 0); > > > > > + = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode), > > > > > + TYPE_UNSIGNED (lhs_type)); > > > > > > > > Here it might be relevant for correctness - we have to choose between > > > > sfix and ufix for the float -> [u]int conversion. > > > > > > > > Do you have a testcase? Shouldn't the exactness be independent of the integer > > > > type we convert to? > > > > > > I was looking at this little program which contains undefined behaviour: > > > > > > #include <stdio.h> > > > > > > __attribute__((noinline,noclone,noipa)) > > > void > > > vec_pack_ufix_trunc_v2df (double *in, unsigned int *out); > > > > > > void > > > vec_pack_ufix_trunc_v2df (double *in, unsigned int *out) > > > { > > > out[0] = in[0]; > > > out[1] = in[1]; > > > out[2] = in[2]; > > > out[3] = in[3]; > > > } > > > > > > int main() > > > { > > > double in[] = {-1,-2,-3,-4}; > > > unsigned int out[4]; > > > > > > vec_pack_ufix_trunc_v2df (in, out); > > > for (int i = 0; i < 4; ++i) > > > printf("out[%d] = %u\n", i, out[i]); > > > return 0; > > > } > > > > > > On s390x, I get different results after vectorization: > > > > > > out[0] = 4294967295 > > > out[1] = 4294967294 > > > out[2] = 4294967293 > > > out[3] = 4294967292 > > > > > > than without vectorization: > > > > > > out[0] = 0 > > > out[1] = 0 > > > out[2] = 0 > > > out[3] = 0 > > > > > > Even if this is undefined behaviour, I think it would be nice to have > > > consistent results here. > > > > > > Also, while I added an expander to circumvent this problem in a > > > previous patch, reviewers requested to hide this behind trapping math. > > > Thus, I looked into this. > > > > > > Seeing the result from the CI for aarch64, I guess there are some > > > tests that actually expect this vectorization to always happen even > > > though it might not be save w.r.t. trapping math. > > > > I do remember this was extensively discussed (but we might have missed > > something) and one argument indeed was that when it's undefined behavior > > we can do the vectorization given the actual values might be in-bound. > > Okay. Would you be fine with the patch to only vectorize when > trapping math is disabled? I still could take care of the rest on > s390 backend side by defining the appropriate expanders. Still think > it is weird though that we might produce different results on > vectorization than without vectorization. Yes, that is what > "undefined behaviour" is all about, but we can simply fix this here. > Nevertheless, how about just adding the trapping math check? So to summarize - the problem is different result when vectorizing when the scalar code invokes undefined behavior? I think there is nothing to fix and we shouldn't pessimize code not invoking undefined behavior by adding a trapping math check. Or did I misunderstand things? Thanks, Richard.
Am Tue, Aug 20, 2024 at 10:15:22AM +0200 schrieb Richard Biener: > On Fri, Aug 9, 2024 at 2:58 PM Juergen Christ <jchrist@linux.ibm.com> wrote: > > > > Am Thu, Aug 08, 2024 at 02:06:44PM +0200 schrieb Richard Biener: > > > On Mon, Aug 5, 2024 at 4:02 PM Juergen Christ <jchrist@linux.ibm.com> wrote: > > > > > > > > Am Mon, Aug 05, 2024 at 01:00:31PM +0200 schrieb Richard Biener: > > > > > On Fri, Aug 2, 2024 at 2:43 PM Juergen Christ <jchrist@linux.ibm.com> wrote: > > > > > > > > > > > > Do not convert floats to ints in multiple step if trapping math is > > > > > > enabled. This might hide some inexact signals. > > > > > > > > > > > > Also use correct sign (the sign of the target integer type) for the > > > > > > intermediate steps. This only affects undefined behaviour (casting > > > > > > floats to unsigned datatype where the float is negative). > > > > > > > > > > > > gcc/ChangeLog: > > > > > > > > > > > > * tree-vect-stmts.cc (vectorizable_conversion): multi-step > > > > > > float to int conversion only with trapping math and correct > > > > > > sign. > > > > > > > > > > > > Signed-off-by: Juergen Christ <jchrist@linux.ibm.com> > > > > > > > > > > > > Bootstrapped and tested on x84 and s390. Ok for trunk? > > > > > > > > > > > > --- > > > > > > gcc/tree-vect-stmts.cc | 8 +++++--- > > > > > > 1 file changed, 5 insertions(+), 3 deletions(-) > > > > > > > > > > > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc > > > > > > index fdcda0d2abae..2ddd13383193 100644 > > > > > > --- a/gcc/tree-vect-stmts.cc > > > > > > +++ b/gcc/tree-vect-stmts.cc > > > > > > @@ -5448,7 +5448,8 @@ vectorizable_conversion (vec_info *vinfo, > > > > > > break; > > > > > > > > > > > > cvt_type > > > > > > - = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode), 0); > > > > > > + = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode), > > > > > > + TYPE_UNSIGNED (lhs_type)); > > > > > > > > > > But lhs_type should be a float type here, the idea that for a > > > > > FLOAT_EXPR (int -> float) > > > > > a signed integer type is the natural one to use - as it's 2x wider > > > > > than the original > > > > > RHS type it's signedness doesn't matter. Note all float types should be > > > > > !TYPE_UNSIGNED so this hunk is a no-op but still less clear on the intent IMO. > > > > > > > > > > Please drop it. > > > > > > > > Will do. Sorry about that. > > > > > > > > > > cvt_type = get_same_sized_vectype (cvt_type, vectype_in); > > > > > > if (cvt_type == NULL_TREE) > > > > > > goto unsupported; > > > > > > @@ -5505,10 +5506,11 @@ vectorizable_conversion (vec_info *vinfo, > > > > > > if (GET_MODE_SIZE (lhs_mode) >= GET_MODE_SIZE (rhs_mode)) > > > > > > goto unsupported; > > > > > > > > > > > > - if (code == FIX_TRUNC_EXPR) > > > > > > + if (code == FIX_TRUNC_EXPR && !flag_trapping_math) > > > > > > { > > > > > > cvt_type > > > > > > - = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode), 0); > > > > > > + = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode), > > > > > > + TYPE_UNSIGNED (lhs_type)); > > > > > > > > > > Here it might be relevant for correctness - we have to choose between > > > > > sfix and ufix for the float -> [u]int conversion. > > > > > > > > > > Do you have a testcase? Shouldn't the exactness be independent of the integer > > > > > type we convert to? > > > > > > > > I was looking at this little program which contains undefined behaviour: > > > > > > > > #include <stdio.h> > > > > > > > > __attribute__((noinline,noclone,noipa)) > > > > void > > > > vec_pack_ufix_trunc_v2df (double *in, unsigned int *out); > > > > > > > > void > > > > vec_pack_ufix_trunc_v2df (double *in, unsigned int *out) > > > > { > > > > out[0] = in[0]; > > > > out[1] = in[1]; > > > > out[2] = in[2]; > > > > out[3] = in[3]; > > > > } > > > > > > > > int main() > > > > { > > > > double in[] = {-1,-2,-3,-4}; > > > > unsigned int out[4]; > > > > > > > > vec_pack_ufix_trunc_v2df (in, out); > > > > for (int i = 0; i < 4; ++i) > > > > printf("out[%d] = %u\n", i, out[i]); > > > > return 0; > > > > } > > > > > > > > On s390x, I get different results after vectorization: > > > > > > > > out[0] = 4294967295 > > > > out[1] = 4294967294 > > > > out[2] = 4294967293 > > > > out[3] = 4294967292 > > > > > > > > than without vectorization: > > > > > > > > out[0] = 0 > > > > out[1] = 0 > > > > out[2] = 0 > > > > out[3] = 0 > > > > > > > > Even if this is undefined behaviour, I think it would be nice to have > > > > consistent results here. > > > > > > > > Also, while I added an expander to circumvent this problem in a > > > > previous patch, reviewers requested to hide this behind trapping math. > > > > Thus, I looked into this. > > > > > > > > Seeing the result from the CI for aarch64, I guess there are some > > > > tests that actually expect this vectorization to always happen even > > > > though it might not be save w.r.t. trapping math. > > > > > > I do remember this was extensively discussed (but we might have missed > > > something) and one argument indeed was that when it's undefined behavior > > > we can do the vectorization given the actual values might be in-bound. > > > > Okay. Would you be fine with the patch to only vectorize when > > trapping math is disabled? I still could take care of the rest on > > s390 backend side by defining the appropriate expanders. Still think > > it is weird though that we might produce different results on > > vectorization than without vectorization. Yes, that is what > > "undefined behaviour" is all about, but we can simply fix this here. > > Nevertheless, how about just adding the trapping math check? > > So to summarize - the problem is different result when vectorizing > when the scalar code invokes undefined behavior? I think there is > nothing to fix and we shouldn't pessimize code not invoking undefined > behavior by adding a trapping math check. > > Or did I misunderstand things? The different results can still be delt with in the backend. The only remaining part is the question if vectorization of FIX_TRUNC_EXPR in multiple steps should be guarded by the trapping math flag or not? I think it should, but according to CI results, some architectures already rely on the current behaviour. So I am unsure if we should add the flag or not. What is your opinion on that? Regards, Juergen
On Tue, Aug 20, 2024 at 11:16 AM Juergen Christ <jchrist@linux.ibm.com> wrote: > > Am Tue, Aug 20, 2024 at 10:15:22AM +0200 schrieb Richard Biener: > > On Fri, Aug 9, 2024 at 2:58 PM Juergen Christ <jchrist@linux.ibm.com> wrote: > > > > > > Am Thu, Aug 08, 2024 at 02:06:44PM +0200 schrieb Richard Biener: > > > > On Mon, Aug 5, 2024 at 4:02 PM Juergen Christ <jchrist@linux.ibm.com> wrote: > > > > > > > > > > Am Mon, Aug 05, 2024 at 01:00:31PM +0200 schrieb Richard Biener: > > > > > > On Fri, Aug 2, 2024 at 2:43 PM Juergen Christ <jchrist@linux.ibm.com> wrote: > > > > > > > > > > > > > > Do not convert floats to ints in multiple step if trapping math is > > > > > > > enabled. This might hide some inexact signals. > > > > > > > > > > > > > > Also use correct sign (the sign of the target integer type) for the > > > > > > > intermediate steps. This only affects undefined behaviour (casting > > > > > > > floats to unsigned datatype where the float is negative). > > > > > > > > > > > > > > gcc/ChangeLog: > > > > > > > > > > > > > > * tree-vect-stmts.cc (vectorizable_conversion): multi-step > > > > > > > float to int conversion only with trapping math and correct > > > > > > > sign. > > > > > > > > > > > > > > Signed-off-by: Juergen Christ <jchrist@linux.ibm.com> > > > > > > > > > > > > > > Bootstrapped and tested on x84 and s390. Ok for trunk? > > > > > > > > > > > > > > --- > > > > > > > gcc/tree-vect-stmts.cc | 8 +++++--- > > > > > > > 1 file changed, 5 insertions(+), 3 deletions(-) > > > > > > > > > > > > > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc > > > > > > > index fdcda0d2abae..2ddd13383193 100644 > > > > > > > --- a/gcc/tree-vect-stmts.cc > > > > > > > +++ b/gcc/tree-vect-stmts.cc > > > > > > > @@ -5448,7 +5448,8 @@ vectorizable_conversion (vec_info *vinfo, > > > > > > > break; > > > > > > > > > > > > > > cvt_type > > > > > > > - = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode), 0); > > > > > > > + = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode), > > > > > > > + TYPE_UNSIGNED (lhs_type)); > > > > > > > > > > > > But lhs_type should be a float type here, the idea that for a > > > > > > FLOAT_EXPR (int -> float) > > > > > > a signed integer type is the natural one to use - as it's 2x wider > > > > > > than the original > > > > > > RHS type it's signedness doesn't matter. Note all float types should be > > > > > > !TYPE_UNSIGNED so this hunk is a no-op but still less clear on the intent IMO. > > > > > > > > > > > > Please drop it. > > > > > > > > > > Will do. Sorry about that. > > > > > > > > > > > > cvt_type = get_same_sized_vectype (cvt_type, vectype_in); > > > > > > > if (cvt_type == NULL_TREE) > > > > > > > goto unsupported; > > > > > > > @@ -5505,10 +5506,11 @@ vectorizable_conversion (vec_info *vinfo, > > > > > > > if (GET_MODE_SIZE (lhs_mode) >= GET_MODE_SIZE (rhs_mode)) > > > > > > > goto unsupported; > > > > > > > > > > > > > > - if (code == FIX_TRUNC_EXPR) > > > > > > > + if (code == FIX_TRUNC_EXPR && !flag_trapping_math) > > > > > > > { > > > > > > > cvt_type > > > > > > > - = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode), 0); > > > > > > > + = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode), > > > > > > > + TYPE_UNSIGNED (lhs_type)); > > > > > > > > > > > > Here it might be relevant for correctness - we have to choose between > > > > > > sfix and ufix for the float -> [u]int conversion. > > > > > > > > > > > > Do you have a testcase? Shouldn't the exactness be independent of the integer > > > > > > type we convert to? > > > > > > > > > > I was looking at this little program which contains undefined behaviour: > > > > > > > > > > #include <stdio.h> > > > > > > > > > > __attribute__((noinline,noclone,noipa)) > > > > > void > > > > > vec_pack_ufix_trunc_v2df (double *in, unsigned int *out); > > > > > > > > > > void > > > > > vec_pack_ufix_trunc_v2df (double *in, unsigned int *out) > > > > > { > > > > > out[0] = in[0]; > > > > > out[1] = in[1]; > > > > > out[2] = in[2]; > > > > > out[3] = in[3]; > > > > > } > > > > > > > > > > int main() > > > > > { > > > > > double in[] = {-1,-2,-3,-4}; > > > > > unsigned int out[4]; > > > > > > > > > > vec_pack_ufix_trunc_v2df (in, out); > > > > > for (int i = 0; i < 4; ++i) > > > > > printf("out[%d] = %u\n", i, out[i]); > > > > > return 0; > > > > > } > > > > > > > > > > On s390x, I get different results after vectorization: > > > > > > > > > > out[0] = 4294967295 > > > > > out[1] = 4294967294 > > > > > out[2] = 4294967293 > > > > > out[3] = 4294967292 > > > > > > > > > > than without vectorization: > > > > > > > > > > out[0] = 0 > > > > > out[1] = 0 > > > > > out[2] = 0 > > > > > out[3] = 0 > > > > > > > > > > Even if this is undefined behaviour, I think it would be nice to have > > > > > consistent results here. > > > > > > > > > > Also, while I added an expander to circumvent this problem in a > > > > > previous patch, reviewers requested to hide this behind trapping math. > > > > > Thus, I looked into this. > > > > > > > > > > Seeing the result from the CI for aarch64, I guess there are some > > > > > tests that actually expect this vectorization to always happen even > > > > > though it might not be save w.r.t. trapping math. > > > > > > > > I do remember this was extensively discussed (but we might have missed > > > > something) and one argument indeed was that when it's undefined behavior > > > > we can do the vectorization given the actual values might be in-bound. > > > > > > Okay. Would you be fine with the patch to only vectorize when > > > trapping math is disabled? I still could take care of the rest on > > > s390 backend side by defining the appropriate expanders. Still think > > > it is weird though that we might produce different results on > > > vectorization than without vectorization. Yes, that is what > > > "undefined behaviour" is all about, but we can simply fix this here. > > > Nevertheless, how about just adding the trapping math check? > > > > So to summarize - the problem is different result when vectorizing > > when the scalar code invokes undefined behavior? I think there is > > nothing to fix and we shouldn't pessimize code not invoking undefined > > behavior by adding a trapping math check. > > > > Or did I misunderstand things? > > The different results can still be delt with in the backend. The only > remaining part is the question if vectorization of FIX_TRUNC_EXPR in > multiple steps should be guarded by the trapping math flag or not? I > think it should, but according to CI results, some architectures > already rely on the current behaviour. So I am unsure if we should > add the flag or not. What is your opinion on that? If we are adding a float conversion as intermediate step that would definitely need to be guarded by !flag_trapping_math, say for double -> float -> int, but I think we're doing double -> long -> int for this and check ranges to make sure we are not missing a truncation. Which path is it that does not properly honor exception flags in the case of multi-step conversions? Richard. > > Regards, > > Juergen
Am Tue, Aug 20, 2024 at 02:51:02PM +0200 schrieb Richard Biener: > On Tue, Aug 20, 2024 at 11:16 AM Juergen Christ <jchrist@linux.ibm.com> wrote: > > > > Am Tue, Aug 20, 2024 at 10:15:22AM +0200 schrieb Richard Biener: > > > On Fri, Aug 9, 2024 at 2:58 PM Juergen Christ <jchrist@linux.ibm.com> wrote: > > > > > > > > Am Thu, Aug 08, 2024 at 02:06:44PM +0200 schrieb Richard Biener: > > > > > On Mon, Aug 5, 2024 at 4:02 PM Juergen Christ <jchrist@linux.ibm.com> wrote: > > > > > > > > > > > > Am Mon, Aug 05, 2024 at 01:00:31PM +0200 schrieb Richard Biener: > > > > > > > On Fri, Aug 2, 2024 at 2:43 PM Juergen Christ <jchrist@linux.ibm.com> wrote: > > > > > > > > > > > > > > > > Do not convert floats to ints in multiple step if trapping math is > > > > > > > > enabled. This might hide some inexact signals. > > > > > > > > > > > > > > > > Also use correct sign (the sign of the target integer type) for the > > > > > > > > intermediate steps. This only affects undefined behaviour (casting > > > > > > > > floats to unsigned datatype where the float is negative). > > > > > > > > > > > > > > > > gcc/ChangeLog: > > > > > > > > > > > > > > > > * tree-vect-stmts.cc (vectorizable_conversion): multi-step > > > > > > > > float to int conversion only with trapping math and correct > > > > > > > > sign. > > > > > > > > > > > > > > > > Signed-off-by: Juergen Christ <jchrist@linux.ibm.com> > > > > > > > > > > > > > > > > Bootstrapped and tested on x84 and s390. Ok for trunk? > > > > > > > > > > > > > > > > --- > > > > > > > > gcc/tree-vect-stmts.cc | 8 +++++--- > > > > > > > > 1 file changed, 5 insertions(+), 3 deletions(-) > > > > > > > > > > > > > > > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc > > > > > > > > index fdcda0d2abae..2ddd13383193 100644 > > > > > > > > --- a/gcc/tree-vect-stmts.cc > > > > > > > > +++ b/gcc/tree-vect-stmts.cc > > > > > > > > @@ -5448,7 +5448,8 @@ vectorizable_conversion (vec_info *vinfo, > > > > > > > > break; > > > > > > > > > > > > > > > > cvt_type > > > > > > > > - = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode), 0); > > > > > > > > + = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode), > > > > > > > > + TYPE_UNSIGNED (lhs_type)); > > > > > > > > > > > > > > But lhs_type should be a float type here, the idea that for a > > > > > > > FLOAT_EXPR (int -> float) > > > > > > > a signed integer type is the natural one to use - as it's 2x wider > > > > > > > than the original > > > > > > > RHS type it's signedness doesn't matter. Note all float types should be > > > > > > > !TYPE_UNSIGNED so this hunk is a no-op but still less clear on the intent IMO. > > > > > > > > > > > > > > Please drop it. > > > > > > > > > > > > Will do. Sorry about that. > > > > > > > > > > > > > > cvt_type = get_same_sized_vectype (cvt_type, vectype_in); > > > > > > > > if (cvt_type == NULL_TREE) > > > > > > > > goto unsupported; > > > > > > > > @@ -5505,10 +5506,11 @@ vectorizable_conversion (vec_info *vinfo, > > > > > > > > if (GET_MODE_SIZE (lhs_mode) >= GET_MODE_SIZE (rhs_mode)) > > > > > > > > goto unsupported; > > > > > > > > > > > > > > > > - if (code == FIX_TRUNC_EXPR) > > > > > > > > + if (code == FIX_TRUNC_EXPR && !flag_trapping_math) > > > > > > > > { > > > > > > > > cvt_type > > > > > > > > - = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode), 0); > > > > > > > > + = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode), > > > > > > > > + TYPE_UNSIGNED (lhs_type)); > > > > > > > > > > > > > > Here it might be relevant for correctness - we have to choose between > > > > > > > sfix and ufix for the float -> [u]int conversion. > > > > > > > > > > > > > > Do you have a testcase? Shouldn't the exactness be independent of the integer > > > > > > > type we convert to? > > > > > > > > > > > > I was looking at this little program which contains undefined behaviour: > > > > > > > > > > > > #include <stdio.h> > > > > > > > > > > > > __attribute__((noinline,noclone,noipa)) > > > > > > void > > > > > > vec_pack_ufix_trunc_v2df (double *in, unsigned int *out); > > > > > > > > > > > > void > > > > > > vec_pack_ufix_trunc_v2df (double *in, unsigned int *out) > > > > > > { > > > > > > out[0] = in[0]; > > > > > > out[1] = in[1]; > > > > > > out[2] = in[2]; > > > > > > out[3] = in[3]; > > > > > > } > > > > > > > > > > > > int main() > > > > > > { > > > > > > double in[] = {-1,-2,-3,-4}; > > > > > > unsigned int out[4]; > > > > > > > > > > > > vec_pack_ufix_trunc_v2df (in, out); > > > > > > for (int i = 0; i < 4; ++i) > > > > > > printf("out[%d] = %u\n", i, out[i]); > > > > > > return 0; > > > > > > } > > > > > > > > > > > > On s390x, I get different results after vectorization: > > > > > > > > > > > > out[0] = 4294967295 > > > > > > out[1] = 4294967294 > > > > > > out[2] = 4294967293 > > > > > > out[3] = 4294967292 > > > > > > > > > > > > than without vectorization: > > > > > > > > > > > > out[0] = 0 > > > > > > out[1] = 0 > > > > > > out[2] = 0 > > > > > > out[3] = 0 > > > > > > > > > > > > Even if this is undefined behaviour, I think it would be nice to have > > > > > > consistent results here. > > > > > > > > > > > > Also, while I added an expander to circumvent this problem in a > > > > > > previous patch, reviewers requested to hide this behind trapping math. > > > > > > Thus, I looked into this. > > > > > > > > > > > > Seeing the result from the CI for aarch64, I guess there are some > > > > > > tests that actually expect this vectorization to always happen even > > > > > > though it might not be save w.r.t. trapping math. > > > > > > > > > > I do remember this was extensively discussed (but we might have missed > > > > > something) and one argument indeed was that when it's undefined behavior > > > > > we can do the vectorization given the actual values might be in-bound. > > > > > > > > Okay. Would you be fine with the patch to only vectorize when > > > > trapping math is disabled? I still could take care of the rest on > > > > s390 backend side by defining the appropriate expanders. Still think > > > > it is weird though that we might produce different results on > > > > vectorization than without vectorization. Yes, that is what > > > > "undefined behaviour" is all about, but we can simply fix this here. > > > > Nevertheless, how about just adding the trapping math check? > > > > > > So to summarize - the problem is different result when vectorizing > > > when the scalar code invokes undefined behavior? I think there is > > > nothing to fix and we shouldn't pessimize code not invoking undefined > > > behavior by adding a trapping math check. > > > > > > Or did I misunderstand things? > > > > The different results can still be delt with in the backend. The only > > remaining part is the question if vectorization of FIX_TRUNC_EXPR in > > multiple steps should be guarded by the trapping math flag or not? I > > think it should, but according to CI results, some architectures > > already rely on the current behaviour. So I am unsure if we should > > add the flag or not. What is your opinion on that? > > If we are adding a float conversion as intermediate step that would > definitely need to be guarded by !flag_trapping_math, say for > double -> float -> int, but I think we're doing double -> long -> int > for this and check ranges to make sure we are not missing a > truncation. I thought that was the problematic conversion. At least I did not see a check there, but maybe I am wrong. Then I will drop this patch completely. Thanks, Juergen
On Tue, Aug 20, 2024 at 3:35 PM Juergen Christ <jchrist@linux.ibm.com> wrote: > > Am Tue, Aug 20, 2024 at 02:51:02PM +0200 schrieb Richard Biener: > > On Tue, Aug 20, 2024 at 11:16 AM Juergen Christ <jchrist@linux.ibm.com> wrote: > > > > > > Am Tue, Aug 20, 2024 at 10:15:22AM +0200 schrieb Richard Biener: > > > > On Fri, Aug 9, 2024 at 2:58 PM Juergen Christ <jchrist@linux.ibm.com> wrote: > > > > > > > > > > Am Thu, Aug 08, 2024 at 02:06:44PM +0200 schrieb Richard Biener: > > > > > > On Mon, Aug 5, 2024 at 4:02 PM Juergen Christ <jchrist@linux.ibm.com> wrote: > > > > > > > > > > > > > > Am Mon, Aug 05, 2024 at 01:00:31PM +0200 schrieb Richard Biener: > > > > > > > > On Fri, Aug 2, 2024 at 2:43 PM Juergen Christ <jchrist@linux.ibm.com> wrote: > > > > > > > > > > > > > > > > > > Do not convert floats to ints in multiple step if trapping math is > > > > > > > > > enabled. This might hide some inexact signals. > > > > > > > > > > > > > > > > > > Also use correct sign (the sign of the target integer type) for the > > > > > > > > > intermediate steps. This only affects undefined behaviour (casting > > > > > > > > > floats to unsigned datatype where the float is negative). > > > > > > > > > > > > > > > > > > gcc/ChangeLog: > > > > > > > > > > > > > > > > > > * tree-vect-stmts.cc (vectorizable_conversion): multi-step > > > > > > > > > float to int conversion only with trapping math and correct > > > > > > > > > sign. > > > > > > > > > > > > > > > > > > Signed-off-by: Juergen Christ <jchrist@linux.ibm.com> > > > > > > > > > > > > > > > > > > Bootstrapped and tested on x84 and s390. Ok for trunk? > > > > > > > > > > > > > > > > > > --- > > > > > > > > > gcc/tree-vect-stmts.cc | 8 +++++--- > > > > > > > > > 1 file changed, 5 insertions(+), 3 deletions(-) > > > > > > > > > > > > > > > > > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc > > > > > > > > > index fdcda0d2abae..2ddd13383193 100644 > > > > > > > > > --- a/gcc/tree-vect-stmts.cc > > > > > > > > > +++ b/gcc/tree-vect-stmts.cc > > > > > > > > > @@ -5448,7 +5448,8 @@ vectorizable_conversion (vec_info *vinfo, > > > > > > > > > break; > > > > > > > > > > > > > > > > > > cvt_type > > > > > > > > > - = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode), 0); > > > > > > > > > + = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode), > > > > > > > > > + TYPE_UNSIGNED (lhs_type)); > > > > > > > > > > > > > > > > But lhs_type should be a float type here, the idea that for a > > > > > > > > FLOAT_EXPR (int -> float) > > > > > > > > a signed integer type is the natural one to use - as it's 2x wider > > > > > > > > than the original > > > > > > > > RHS type it's signedness doesn't matter. Note all float types should be > > > > > > > > !TYPE_UNSIGNED so this hunk is a no-op but still less clear on the intent IMO. > > > > > > > > > > > > > > > > Please drop it. > > > > > > > > > > > > > > Will do. Sorry about that. > > > > > > > > > > > > > > > > cvt_type = get_same_sized_vectype (cvt_type, vectype_in); > > > > > > > > > if (cvt_type == NULL_TREE) > > > > > > > > > goto unsupported; > > > > > > > > > @@ -5505,10 +5506,11 @@ vectorizable_conversion (vec_info *vinfo, > > > > > > > > > if (GET_MODE_SIZE (lhs_mode) >= GET_MODE_SIZE (rhs_mode)) > > > > > > > > > goto unsupported; > > > > > > > > > > > > > > > > > > - if (code == FIX_TRUNC_EXPR) > > > > > > > > > + if (code == FIX_TRUNC_EXPR && !flag_trapping_math) > > > > > > > > > { > > > > > > > > > cvt_type > > > > > > > > > - = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode), 0); > > > > > > > > > + = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode), > > > > > > > > > + TYPE_UNSIGNED (lhs_type)); > > > > > > > > > > > > > > > > Here it might be relevant for correctness - we have to choose between > > > > > > > > sfix and ufix for the float -> [u]int conversion. > > > > > > > > > > > > > > > > Do you have a testcase? Shouldn't the exactness be independent of the integer > > > > > > > > type we convert to? > > > > > > > > > > > > > > I was looking at this little program which contains undefined behaviour: > > > > > > > > > > > > > > #include <stdio.h> > > > > > > > > > > > > > > __attribute__((noinline,noclone,noipa)) > > > > > > > void > > > > > > > vec_pack_ufix_trunc_v2df (double *in, unsigned int *out); > > > > > > > > > > > > > > void > > > > > > > vec_pack_ufix_trunc_v2df (double *in, unsigned int *out) > > > > > > > { > > > > > > > out[0] = in[0]; > > > > > > > out[1] = in[1]; > > > > > > > out[2] = in[2]; > > > > > > > out[3] = in[3]; > > > > > > > } > > > > > > > > > > > > > > int main() > > > > > > > { > > > > > > > double in[] = {-1,-2,-3,-4}; > > > > > > > unsigned int out[4]; > > > > > > > > > > > > > > vec_pack_ufix_trunc_v2df (in, out); > > > > > > > for (int i = 0; i < 4; ++i) > > > > > > > printf("out[%d] = %u\n", i, out[i]); > > > > > > > return 0; > > > > > > > } > > > > > > > > > > > > > > On s390x, I get different results after vectorization: > > > > > > > > > > > > > > out[0] = 4294967295 > > > > > > > out[1] = 4294967294 > > > > > > > out[2] = 4294967293 > > > > > > > out[3] = 4294967292 > > > > > > > > > > > > > > than without vectorization: > > > > > > > > > > > > > > out[0] = 0 > > > > > > > out[1] = 0 > > > > > > > out[2] = 0 > > > > > > > out[3] = 0 > > > > > > > > > > > > > > Even if this is undefined behaviour, I think it would be nice to have > > > > > > > consistent results here. > > > > > > > > > > > > > > Also, while I added an expander to circumvent this problem in a > > > > > > > previous patch, reviewers requested to hide this behind trapping math. > > > > > > > Thus, I looked into this. > > > > > > > > > > > > > > Seeing the result from the CI for aarch64, I guess there are some > > > > > > > tests that actually expect this vectorization to always happen even > > > > > > > though it might not be save w.r.t. trapping math. > > > > > > > > > > > > I do remember this was extensively discussed (but we might have missed > > > > > > something) and one argument indeed was that when it's undefined behavior > > > > > > we can do the vectorization given the actual values might be in-bound. > > > > > > > > > > Okay. Would you be fine with the patch to only vectorize when > > > > > trapping math is disabled? I still could take care of the rest on > > > > > s390 backend side by defining the appropriate expanders. Still think > > > > > it is weird though that we might produce different results on > > > > > vectorization than without vectorization. Yes, that is what > > > > > "undefined behaviour" is all about, but we can simply fix this here. > > > > > Nevertheless, how about just adding the trapping math check? > > > > > > > > So to summarize - the problem is different result when vectorizing > > > > when the scalar code invokes undefined behavior? I think there is > > > > nothing to fix and we shouldn't pessimize code not invoking undefined > > > > behavior by adding a trapping math check. > > > > > > > > Or did I misunderstand things? > > > > > > The different results can still be delt with in the backend. The only > > > remaining part is the question if vectorization of FIX_TRUNC_EXPR in > > > multiple steps should be guarded by the trapping math flag or not? I > > > think it should, but according to CI results, some architectures > > > already rely on the current behaviour. So I am unsure if we should > > > add the flag or not. What is your opinion on that? > > > > If we are adding a float conversion as intermediate step that would > > definitely need to be guarded by !flag_trapping_math, say for > > double -> float -> int, but I think we're doing double -> long -> int > > for this and check ranges to make sure we are not missing a > > truncation. > > I thought that was the problematic conversion. At least I did not see > a check there, but maybe I am wrong. Then I will drop this patch > completely. So we're not doing any range analysis but for double -> int performed as double -> long -> int when the double cannot be represented in int the code invokes undefined behavior. We're only considering intermediate types where the final step is a truncation so we should be OK here. The corner-case would be float->unsigned performed as float->int->unsigned but I think that supportable_narrowing_operation will reject the int->unsigned conversion. I'm not sure we are ever using a non vec_[un]pack_{u,s}fix_trunc optab aka {u,s}fix_trunc directly for vectors (but x86 has patterns for this). Doing float->unsigned via int would be wrong then (always, not only with -ftrapping-math). So, I think we're fine. Richard. > > Thanks, > > Juergen
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index fdcda0d2abae..2ddd13383193 100644 --- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -5448,7 +5448,8 @@ vectorizable_conversion (vec_info *vinfo, break; cvt_type - = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode), 0); + = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode), + TYPE_UNSIGNED (lhs_type)); cvt_type = get_same_sized_vectype (cvt_type, vectype_in); if (cvt_type == NULL_TREE) goto unsupported; @@ -5505,10 +5506,11 @@ vectorizable_conversion (vec_info *vinfo, if (GET_MODE_SIZE (lhs_mode) >= GET_MODE_SIZE (rhs_mode)) goto unsupported; - if (code == FIX_TRUNC_EXPR) + if (code == FIX_TRUNC_EXPR && !flag_trapping_math) { cvt_type - = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode), 0); + = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode), + TYPE_UNSIGNED (lhs_type)); cvt_type = get_same_sized_vectype (cvt_type, vectype_in); if (cvt_type == NULL_TREE) goto unsupported;
Do not convert floats to ints in multiple step if trapping math is enabled. This might hide some inexact signals. Also use correct sign (the sign of the target integer type) for the intermediate steps. This only affects undefined behaviour (casting floats to unsigned datatype where the float is negative). gcc/ChangeLog: * tree-vect-stmts.cc (vectorizable_conversion): multi-step float to int conversion only with trapping math and correct sign. Signed-off-by: Juergen Christ <jchrist@linux.ibm.com> Bootstrapped and tested on x84 and s390. Ok for trunk? --- gcc/tree-vect-stmts.cc | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)