Message ID | 20221013123649.474497-1-aldyh@redhat.com |
---|---|
State | New |
Headers | show |
Series | [PR24021] Implement PLUS_EXPR range-op entry for floats. | expand |
On 10/13/22 14:36, Aldy Hernandez via Gcc-patches wrote:
> PR tree-optimization/24021
Ah - Verboten in Fortran:
$ cat d.f
DOUBLE PRECISION A, X
A = 0.0
DO X = 0.1, 1.0
A = A + X
ENDDO
END
$ gfortran d.f
d.f:3:9:
3 | DO X = 0.1, 1.0
| 1
Warning: Deleted feature: Loop variable at (1) must be integer
d.f:3:12:
3 | DO X = 0.1, 1.0
| 1
Warning: Deleted feature: Start expression in DO loop at (1) must be integer
d.f:3:17:
3 | DO X = 0.1, 1.0
| 1
Warning: Deleted feature: End expression in DO loop at (1) must be integer
I'm not following. My patch doesn't affect this behavior. What am I missing? Aldy On Thu, Oct 13, 2022 at 3:04 PM Toon Moene <toon@moene.org> wrote: > > On 10/13/22 14:36, Aldy Hernandez via Gcc-patches wrote: > > > PR tree-optimization/24021 > > Ah - Verboten in Fortran: > > $ cat d.f > DOUBLE PRECISION A, X > A = 0.0 > DO X = 0.1, 1.0 > A = A + X > ENDDO > END > $ gfortran d.f > d.f:3:9: > > 3 | DO X = 0.1, 1.0 > | 1 > Warning: Deleted feature: Loop variable at (1) must be integer > d.f:3:12: > > 3 | DO X = 0.1, 1.0 > | 1 > Warning: Deleted feature: Start expression in DO loop at (1) must be integer > d.f:3:17: > > 3 | DO X = 0.1, 1.0 > | 1 > Warning: Deleted feature: End expression in DO loop at (1) must be integer > > -- > Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290 > Saturnushof 14, 3738 XG Maartensdijk, The Netherlands >
It was just a comment on the code of the PR ... Toon. On 10/13/22 15:44, Aldy Hernandez wrote: > I'm not following. My patch doesn't affect this behavior. > > What am I missing? > > Aldy > > On Thu, Oct 13, 2022 at 3:04 PM Toon Moene <toon@moene.org> wrote: >> >> On 10/13/22 14:36, Aldy Hernandez via Gcc-patches wrote: >> >>> PR tree-optimization/24021 >> >> Ah - Verboten in Fortran: >> >> $ cat d.f >> DOUBLE PRECISION A, X >> A = 0.0 >> DO X = 0.1, 1.0 >> A = A + X >> ENDDO >> END >> $ gfortran d.f >> d.f:3:9: >> >> 3 | DO X = 0.1, 1.0 >> | 1 >> Warning: Deleted feature: Loop variable at (1) must be integer >> d.f:3:12: >> >> 3 | DO X = 0.1, 1.0 >> | 1 >> Warning: Deleted feature: Start expression in DO loop at (1) must be integer >> d.f:3:17: >> >> 3 | DO X = 0.1, 1.0 >> | 1 >> Warning: Deleted feature: End expression in DO loop at (1) must be integer >> >> -- >> Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290 >> Saturnushof 14, 3738 XG Maartensdijk, The Netherlands >> >
On Thu, Oct 13, 2022 at 02:36:49PM +0200, Aldy Hernandez wrote: > +// Like real_arithmetic, but round the result to INF if the operation > +// produced inexact results. > +// > +// ?? There is still one problematic case, i387. With > +// -fexcess-precision=standard we perform most SF/DFmode arithmetic in > +// XFmode (long_double_type_node), so that case is OK. But without > +// -mfpmath=sse, all the SF/DFmode computations are in XFmode > +// precision (64-bit mantissa) and only occassionally rounded to > +// SF/DFmode (when storing into memory from the 387 stack). Maybe > +// this is ok as well though it is just occassionally more precise. ?? > + > +static void > +frange_arithmetic (enum tree_code code, tree type, > + REAL_VALUE_TYPE &result, > + const REAL_VALUE_TYPE &op1, > + const REAL_VALUE_TYPE &op2, > + const REAL_VALUE_TYPE &inf) > +{ > + REAL_VALUE_TYPE value; > + enum machine_mode mode = TYPE_MODE (type); > + bool mode_composite = MODE_COMPOSITE_P (mode); > + > + bool inexact = real_arithmetic (&value, code, &op1, &op2); > + real_convert (&result, mode, &value); > + > + // If real_convert above has rounded an inexact value to towards > + // inf, we can keep the result as is, otherwise we'll adjust by 1 ulp > + // later (real_nextafter). > + bool rounding = (flag_rounding_math > + && (real_isneg (&inf) > + ? real_less (&result, &value) > + : !real_less (&value, &result))); I thought the agreement during Cauldron was that we'd do this always, regardless of flag_rounding_math. Because excess precision (the fast one like on ia32 or -mfpmath=387 on x86_64), or -frounding-math, or FMA contraction can all increase precision and worst case it all behaves like -frounding-math for the ranges. So, perhaps use: if ((mode_composite || (real_isneg (&inf) ? real_less (&result, &value) : !real_less (&value, &result)) && (inexact || !real_identical (&result, &value)))) ? No need to do the real_isneg/real_less stuff for mode_composite, then we do it always for inexacts, but otherwise we check if the rounding performed by real.cc has been in the conservative direction (for upper bound to +inf, for lower bound to -inf), if yes, we don't need to do anything, if yes, we frange_nextafter. As discussed, for mode_composite, I think we want to do the extra stuff for inexact denormals and otherwise do the nextafter unconditionally, because our internal mode_composite representation isn't precise enough. > + // Be extra careful if there may be discrepancies between the > + // compile and runtime results. > + if ((rounding || mode_composite) > + && (inexact || !real_identical (&result, &value))) > + { > + if (mode_composite) > + { > + bool denormal = (result.sig[SIGSZ-1] & SIG_MSB) == 0; Use real_isdenormal here? Though, real_iszero needs the same thing. > + if (denormal) > + { > + REAL_VALUE_TYPE tmp; And explain here why is this, that IBM extended denormals have just DFmode precision. Though, now that I think about it, while this is correct for denormals, > + real_convert (&tmp, DFmode, &value); > + frange_nextafter (DFmode, tmp, inf); > + real_convert (&result, mode, &tmp); > + } there are also the cases where the higher double exponent is in the [__DBL_MIN_EXP__, __LDBL_MIN_EXP__] aka [-1021, -968] or so. https://en.wikipedia.org/wiki/Double-precision_floating-point_format If the upper double is denormal in the DFmode sense, so smaller absolute value than __DBL_MIN__, then doing nextafter in DFmode is the right thing to do, the lower double must be always +/- zero. Now, if the result is __DBL_MIN__, the upper double is already normalized but we can add __DBL_DENORM_MIN__ to it, which will make the number have 54-bit precision. If the result is __DBL_MIN__ * 2, we can again add __DBL_DENORM_MIN__ and make it 55-bit precision. Etc. until we reach __DBL_MIN__ * 2e53 where it acts like fully normalized 106-bit precision number. I must say I'm not really sure what real_nextafter is doing in those cases, I'm afraid it doesn't handle it correctly but the only other use of real_nextafter is guarded with: /* Don't handle composite modes, nor decimal, nor modes without inf or denorm at least for now. */ if (format->pnan < format->p || format->b == 10 || !format->has_inf || !format->has_denorm) return false; so it isn't that big deal except for ranges. Jakub
On Thu, Oct 13, 2022 at 3:52 PM Toon Moene <toon@moene.org> wrote: > > It was just a comment on the code of the PR ... Ahh... the patch adds support for ranger handling of floating point PLUS_EXPRs in the IL. The PR's testcase is just one of the many things it will be able to solve. Aldy > > Toon. > > On 10/13/22 15:44, Aldy Hernandez wrote: > > > I'm not following. My patch doesn't affect this behavior. > > > > What am I missing? > > > > Aldy > > > > On Thu, Oct 13, 2022 at 3:04 PM Toon Moene <toon@moene.org> wrote: > >> > >> On 10/13/22 14:36, Aldy Hernandez via Gcc-patches wrote: > >> > >>> PR tree-optimization/24021 > >> > >> Ah - Verboten in Fortran: > >> > >> $ cat d.f > >> DOUBLE PRECISION A, X > >> A = 0.0 > >> DO X = 0.1, 1.0 > >> A = A + X > >> ENDDO > >> END > >> $ gfortran d.f > >> d.f:3:9: > >> > >> 3 | DO X = 0.1, 1.0 > >> | 1 > >> Warning: Deleted feature: Loop variable at (1) must be integer > >> d.f:3:12: > >> > >> 3 | DO X = 0.1, 1.0 > >> | 1 > >> Warning: Deleted feature: Start expression in DO loop at (1) must be integer > >> d.f:3:17: > >> > >> 3 | DO X = 0.1, 1.0 > >> | 1 > >> Warning: Deleted feature: End expression in DO loop at (1) must be integer > >> > >> -- > >> Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290 > >> Saturnushof 14, 3738 XG Maartensdijk, The Netherlands > >> > > > > -- > Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290 > Saturnushof 14, 3738 XG Maartensdijk, The Netherlands >
On Thu, Oct 13, 2022 at 7:57 PM Jakub Jelinek <jakub@redhat.com> wrote: > > On Thu, Oct 13, 2022 at 02:36:49PM +0200, Aldy Hernandez wrote: > > +// Like real_arithmetic, but round the result to INF if the operation > > +// produced inexact results. > > +// > > +// ?? There is still one problematic case, i387. With > > +// -fexcess-precision=standard we perform most SF/DFmode arithmetic in > > +// XFmode (long_double_type_node), so that case is OK. But without > > +// -mfpmath=sse, all the SF/DFmode computations are in XFmode > > +// precision (64-bit mantissa) and only occassionally rounded to > > +// SF/DFmode (when storing into memory from the 387 stack). Maybe > > +// this is ok as well though it is just occassionally more precise. ?? > > + > > +static void > > +frange_arithmetic (enum tree_code code, tree type, > > + REAL_VALUE_TYPE &result, > > + const REAL_VALUE_TYPE &op1, > > + const REAL_VALUE_TYPE &op2, > > + const REAL_VALUE_TYPE &inf) > > +{ > > + REAL_VALUE_TYPE value; > > + enum machine_mode mode = TYPE_MODE (type); > > + bool mode_composite = MODE_COMPOSITE_P (mode); > > + > > + bool inexact = real_arithmetic (&value, code, &op1, &op2); > > + real_convert (&result, mode, &value); > > + > > + // If real_convert above has rounded an inexact value to towards > > + // inf, we can keep the result as is, otherwise we'll adjust by 1 ulp > > + // later (real_nextafter). > > + bool rounding = (flag_rounding_math > > + && (real_isneg (&inf) > > + ? real_less (&result, &value) > > + : !real_less (&value, &result))); > > I thought the agreement during Cauldron was that we'd do this always, > regardless of flag_rounding_math. > Because excess precision (the fast one like on ia32 or -mfpmath=387 on > x86_64), or -frounding-math, or FMA contraction can all increase precision > and worst case it all behaves like -frounding-math for the ranges. > > So, perhaps use: > if ((mode_composite || (real_isneg (&inf) ? real_less (&result, &value) > : !real_less (&value, &result)) > && (inexact || !real_identical (&result, &value)))) Done. > ? > No need to do the real_isneg/real_less stuff for mode_composite, then > we do it always for inexacts, but otherwise we check if the rounding > performed by real.cc has been in the conservative direction (for upper > bound to +inf, for lower bound to -inf), if yes, we don't need to do > anything, if yes, we frange_nextafter. > > As discussed, for mode_composite, I think we want to do the extra > stuff for inexact denormals and otherwise do the nextafter unconditionally, > because our internal mode_composite representation isn't precise enough. > > > + // Be extra careful if there may be discrepancies between the > > + // compile and runtime results. > > + if ((rounding || mode_composite) > > + && (inexact || !real_identical (&result, &value))) > > + { > > + if (mode_composite) > > + { > > + bool denormal = (result.sig[SIGSZ-1] & SIG_MSB) == 0; > > Use real_isdenormal here? Done. > Though, real_iszero needs the same thing. So... real_isdenormal() || real_iszero() as in the attached patch? > > > + if (denormal) > > + { > > + REAL_VALUE_TYPE tmp; > > And explain here why is this, that IBM extended denormals have just > DFmode precision. Done. > Though, now that I think about it, while this is correct for denormals, > > > + real_convert (&tmp, DFmode, &value); > > + frange_nextafter (DFmode, tmp, inf); > > + real_convert (&result, mode, &tmp); > > + } > > there are also the cases where the higher double exponent is in the > [__DBL_MIN_EXP__, __LDBL_MIN_EXP__] aka [-1021, -968] or so. > https://en.wikipedia.org/wiki/Double-precision_floating-point_format > If the upper double is denormal in the DFmode sense, so smaller absolute > value than __DBL_MIN__, then doing nextafter in DFmode is the right thing to > do, the lower double must be always +/- zero. > Now, if the result is __DBL_MIN__, the upper double is already normalized > but we can add __DBL_DENORM_MIN__ to it, which will make the number have > 54-bit precision. > If the result is __DBL_MIN__ * 2, we can again add __DBL_DENORM_MIN__ > and make it 55-bit precision. Etc. until we reach __DBL_MIN__ * 2e53 > where it acts like fully normalized 106-bit precision number. > I must say I'm not really sure what real_nextafter is doing in those cases, > I'm afraid it doesn't handle it correctly but the only other use > of real_nextafter is guarded with: > /* Don't handle composite modes, nor decimal, nor modes without > inf or denorm at least for now. */ > if (format->pnan < format->p > || format->b == 10 > || !format->has_inf > || !format->has_denorm) > return false; Dunno. Is there a conservative thing we can do for mode_composites that aren't denormal or zero? How does this look? Aldy
PING On Mon, Oct 17, 2022 at 8:21 AM Aldy Hernandez <aldyh@redhat.com> wrote: > > On Thu, Oct 13, 2022 at 7:57 PM Jakub Jelinek <jakub@redhat.com> wrote: > > > > On Thu, Oct 13, 2022 at 02:36:49PM +0200, Aldy Hernandez wrote: > > > +// Like real_arithmetic, but round the result to INF if the operation > > > +// produced inexact results. > > > +// > > > +// ?? There is still one problematic case, i387. With > > > +// -fexcess-precision=standard we perform most SF/DFmode arithmetic in > > > +// XFmode (long_double_type_node), so that case is OK. But without > > > +// -mfpmath=sse, all the SF/DFmode computations are in XFmode > > > +// precision (64-bit mantissa) and only occassionally rounded to > > > +// SF/DFmode (when storing into memory from the 387 stack). Maybe > > > +// this is ok as well though it is just occassionally more precise. ?? > > > + > > > +static void > > > +frange_arithmetic (enum tree_code code, tree type, > > > + REAL_VALUE_TYPE &result, > > > + const REAL_VALUE_TYPE &op1, > > > + const REAL_VALUE_TYPE &op2, > > > + const REAL_VALUE_TYPE &inf) > > > +{ > > > + REAL_VALUE_TYPE value; > > > + enum machine_mode mode = TYPE_MODE (type); > > > + bool mode_composite = MODE_COMPOSITE_P (mode); > > > + > > > + bool inexact = real_arithmetic (&value, code, &op1, &op2); > > > + real_convert (&result, mode, &value); > > > + > > > + // If real_convert above has rounded an inexact value to towards > > > + // inf, we can keep the result as is, otherwise we'll adjust by 1 ulp > > > + // later (real_nextafter). > > > + bool rounding = (flag_rounding_math > > > + && (real_isneg (&inf) > > > + ? real_less (&result, &value) > > > + : !real_less (&value, &result))); > > > > I thought the agreement during Cauldron was that we'd do this always, > > regardless of flag_rounding_math. > > Because excess precision (the fast one like on ia32 or -mfpmath=387 on > > x86_64), or -frounding-math, or FMA contraction can all increase precision > > and worst case it all behaves like -frounding-math for the ranges. > > > > So, perhaps use: > > if ((mode_composite || (real_isneg (&inf) ? real_less (&result, &value) > > : !real_less (&value, &result)) > > && (inexact || !real_identical (&result, &value)))) > > Done. > > > ? > > No need to do the real_isneg/real_less stuff for mode_composite, then > > we do it always for inexacts, but otherwise we check if the rounding > > performed by real.cc has been in the conservative direction (for upper > > bound to +inf, for lower bound to -inf), if yes, we don't need to do > > anything, if yes, we frange_nextafter. > > > > As discussed, for mode_composite, I think we want to do the extra > > stuff for inexact denormals and otherwise do the nextafter unconditionally, > > because our internal mode_composite representation isn't precise enough. > > > > > + // Be extra careful if there may be discrepancies between the > > > + // compile and runtime results. > > > + if ((rounding || mode_composite) > > > + && (inexact || !real_identical (&result, &value))) > > > + { > > > + if (mode_composite) > > > + { > > > + bool denormal = (result.sig[SIGSZ-1] & SIG_MSB) == 0; > > > > Use real_isdenormal here? > > Done. > > > Though, real_iszero needs the same thing. > > So... real_isdenormal() || real_iszero() as in the attached patch? > > > > > > + if (denormal) > > > + { > > > + REAL_VALUE_TYPE tmp; > > > > And explain here why is this, that IBM extended denormals have just > > DFmode precision. > > Done. > > > Though, now that I think about it, while this is correct for denormals, > > > > > + real_convert (&tmp, DFmode, &value); > > > + frange_nextafter (DFmode, tmp, inf); > > > + real_convert (&result, mode, &tmp); > > > + } > > > > there are also the cases where the higher double exponent is in the > > [__DBL_MIN_EXP__, __LDBL_MIN_EXP__] aka [-1021, -968] or so. > > https://en.wikipedia.org/wiki/Double-precision_floating-point_format > > If the upper double is denormal in the DFmode sense, so smaller absolute > > value than __DBL_MIN__, then doing nextafter in DFmode is the right thing to > > do, the lower double must be always +/- zero. > > Now, if the result is __DBL_MIN__, the upper double is already normalized > > but we can add __DBL_DENORM_MIN__ to it, which will make the number have > > 54-bit precision. > > If the result is __DBL_MIN__ * 2, we can again add __DBL_DENORM_MIN__ > > and make it 55-bit precision. Etc. until we reach __DBL_MIN__ * 2e53 > > where it acts like fully normalized 106-bit precision number. > > I must say I'm not really sure what real_nextafter is doing in those cases, > > I'm afraid it doesn't handle it correctly but the only other use > > of real_nextafter is guarded with: > > /* Don't handle composite modes, nor decimal, nor modes without > > inf or denorm at least for now. */ > > if (format->pnan < format->p > > || format->b == 10 > > || !format->has_inf > > || !format->has_denorm) > > return false; > > Dunno. Is there a conservative thing we can do for mode_composites > that aren't denormal or zero? > > How does this look? > Aldy
On 10/24/22 00:04, Aldy Hernandez via Gcc-patches wrote:
> PING
I'd be a lot more comfortable if Jakub would chime in here.
Jeff
Ping ping On Mon, Oct 24, 2022, 08:04 Aldy Hernandez <aldyh@redhat.com> wrote: > PING > > On Mon, Oct 17, 2022 at 8:21 AM Aldy Hernandez <aldyh@redhat.com> wrote: > > > > On Thu, Oct 13, 2022 at 7:57 PM Jakub Jelinek <jakub@redhat.com> wrote: > > > > > > On Thu, Oct 13, 2022 at 02:36:49PM +0200, Aldy Hernandez wrote: > > > > +// Like real_arithmetic, but round the result to INF if the > operation > > > > +// produced inexact results. > > > > +// > > > > +// ?? There is still one problematic case, i387. With > > > > +// -fexcess-precision=standard we perform most SF/DFmode arithmetic > in > > > > +// XFmode (long_double_type_node), so that case is OK. But without > > > > +// -mfpmath=sse, all the SF/DFmode computations are in XFmode > > > > +// precision (64-bit mantissa) and only occassionally rounded to > > > > +// SF/DFmode (when storing into memory from the 387 stack). Maybe > > > > +// this is ok as well though it is just occassionally more precise. > ?? > > > > + > > > > +static void > > > > +frange_arithmetic (enum tree_code code, tree type, > > > > + REAL_VALUE_TYPE &result, > > > > + const REAL_VALUE_TYPE &op1, > > > > + const REAL_VALUE_TYPE &op2, > > > > + const REAL_VALUE_TYPE &inf) > > > > +{ > > > > + REAL_VALUE_TYPE value; > > > > + enum machine_mode mode = TYPE_MODE (type); > > > > + bool mode_composite = MODE_COMPOSITE_P (mode); > > > > + > > > > + bool inexact = real_arithmetic (&value, code, &op1, &op2); > > > > + real_convert (&result, mode, &value); > > > > + > > > > + // If real_convert above has rounded an inexact value to towards > > > > + // inf, we can keep the result as is, otherwise we'll adjust by 1 > ulp > > > > + // later (real_nextafter). > > > > + bool rounding = (flag_rounding_math > > > > + && (real_isneg (&inf) > > > > + ? real_less (&result, &value) > > > > + : !real_less (&value, &result))); > > > > > > I thought the agreement during Cauldron was that we'd do this always, > > > regardless of flag_rounding_math. > > > Because excess precision (the fast one like on ia32 or -mfpmath=387 on > > > x86_64), or -frounding-math, or FMA contraction can all increase > precision > > > and worst case it all behaves like -frounding-math for the ranges. > > > > > > So, perhaps use: > > > if ((mode_composite || (real_isneg (&inf) ? real_less (&result, > &value) > > > : !real_less (&value, > &result)) > > > && (inexact || !real_identical (&result, &value)))) > > > > Done. > > > > > ? > > > No need to do the real_isneg/real_less stuff for mode_composite, then > > > we do it always for inexacts, but otherwise we check if the rounding > > > performed by real.cc has been in the conservative direction (for upper > > > bound to +inf, for lower bound to -inf), if yes, we don't need to do > > > anything, if yes, we frange_nextafter. > > > > > > As discussed, for mode_composite, I think we want to do the extra > > > stuff for inexact denormals and otherwise do the nextafter > unconditionally, > > > because our internal mode_composite representation isn't precise > enough. > > > > > > > + // Be extra careful if there may be discrepancies between the > > > > + // compile and runtime results. > > > > + if ((rounding || mode_composite) > > > > + && (inexact || !real_identical (&result, &value))) > > > > + { > > > > + if (mode_composite) > > > > + { > > > > + bool denormal = (result.sig[SIGSZ-1] & SIG_MSB) == 0; > > > > > > Use real_isdenormal here? > > > > Done. > > > > > Though, real_iszero needs the same thing. > > > > So... real_isdenormal() || real_iszero() as in the attached patch? > > > > > > > > > + if (denormal) > > > > + { > > > > + REAL_VALUE_TYPE tmp; > > > > > > And explain here why is this, that IBM extended denormals have just > > > DFmode precision. > > > > Done. > > > > > Though, now that I think about it, while this is correct for denormals, > > > > > > > + real_convert (&tmp, DFmode, &value); > > > > + frange_nextafter (DFmode, tmp, inf); > > > > + real_convert (&result, mode, &tmp); > > > > + } > > > > > > there are also the cases where the higher double exponent is in the > > > [__DBL_MIN_EXP__, __LDBL_MIN_EXP__] aka [-1021, -968] or so. > > > https://en.wikipedia.org/wiki/Double-precision_floating-point_format > > > If the upper double is denormal in the DFmode sense, so smaller > absolute > > > value than __DBL_MIN__, then doing nextafter in DFmode is the right > thing to > > > do, the lower double must be always +/- zero. > > > Now, if the result is __DBL_MIN__, the upper double is already > normalized > > > but we can add __DBL_DENORM_MIN__ to it, which will make the number > have > > > 54-bit precision. > > > If the result is __DBL_MIN__ * 2, we can again add __DBL_DENORM_MIN__ > > > and make it 55-bit precision. Etc. until we reach __DBL_MIN__ * 2e53 > > > where it acts like fully normalized 106-bit precision number. > > > I must say I'm not really sure what real_nextafter is doing in those > cases, > > > I'm afraid it doesn't handle it correctly but the only other use > > > of real_nextafter is guarded with: > > > /* Don't handle composite modes, nor decimal, nor modes without > > > inf or denorm at least for now. */ > > > if (format->pnan < format->p > > > || format->b == 10 > > > || !format->has_inf > > > || !format->has_denorm) > > > return false; > > > > Dunno. Is there a conservative thing we can do for mode_composites > > that aren't denormal or zero? > > > > How does this look? > > Aldy >
On Mon, Oct 17, 2022 at 08:21:01AM +0200, Aldy Hernandez wrote: > --- a/gcc/range-op-float.cc > +++ b/gcc/range-op-float.cc > @@ -200,6 +200,116 @@ frelop_early_resolve (irange &r, tree type, > && relop_early_resolve (r, type, op1, op2, rel, my_rel)); > } > > +// If R contains a NAN of unknown sign, update the NAN's signbit > +// depending on two operands. > + > +inline void > +update_nan_sign (frange &r, const frange &op1, const frange &op2) > +{ > + if (!r.maybe_isnan ()) > + return; > + > + bool op1_nan = op1.maybe_isnan (); > + bool op2_nan = op2.maybe_isnan (); > + bool sign1, sign2; > + > + gcc_checking_assert (!r.nan_signbit_p (sign1)); > + if (op1_nan && op2_nan) > + { > + if (op1.nan_signbit_p (sign1) && op2.nan_signbit_p (sign2)) > + r.update_nan (sign1 | sign2); > + } > + else if (op1_nan) > + { > + if (op1.nan_signbit_p (sign1)) > + r.update_nan (sign1); > + } > + else if (op2_nan) > + { > + if (op2.nan_signbit_p (sign2)) > + r.update_nan (sign2); > + } > +} IEEE 754-2008 says: "When either an input or result is NaN, this standard does not interpret the sign of a NaN. Note, however, that operations on bit strings — copy, negate, abs, copySign — specify the sign bit of a NaN result, sometimes based upon the sign bit of a NaN operand. The logical predicate totalOrder is also affected by the sign bit of a NaN operand. For all other operations, this standard does not specify the sign bit of a NaN result, even when there is only one input NaN, or when the NaN is produced from an invalid operation." so, while one can e.g. see on x86_64 that in simple -O0 int main () { volatile float f1 = __builtin_nansf (""); volatile float f2 = __builtin_copysignf (__builtin_nansf (""), -1.0f); volatile float f3 = __builtin_nanf (""); volatile float f4 = __builtin_copysignf (__builtin_nanf (""), -1.0f); volatile float fzero = 0.0f; __builtin_printf ("%08x %08x\n", *(unsigned *)&f1, *(unsigned *)&f2); __builtin_printf ("%08x %08x\n", *(unsigned *)&f3, *(unsigned *)&f4); f1 = -f1; f2 = -f2; f3 = -f3; f4 = -f4; __builtin_printf ("%08x %08x\n", *(unsigned *)&f1, *(unsigned *)&f2); __builtin_printf ("%08x %08x\n", *(unsigned *)&f3, *(unsigned *)&f4); volatile float f5 = f1 + fzero; volatile float f6 = fzero + f1; __builtin_printf ("%08x %08x\n", *(unsigned *)&f5, *(unsigned *)&f6); f5 = f2 + fzero; f6 = fzero + f2; __builtin_printf ("%08x %08x\n", *(unsigned *)&f5, *(unsigned *)&f6); f5 = f3 + fzero; f6 = fzero + f3; __builtin_printf ("%08x %08x\n", *(unsigned *)&f5, *(unsigned *)&f6); f5 = f4 + fzero; f6 = fzero + f4; __builtin_printf ("%08x %08x\n", *(unsigned *)&f5, *(unsigned *)&f6); f5 = f1 + f2; f6 = f2 + f1; __builtin_printf ("%08x %08x\n", *(unsigned *)&f5, *(unsigned *)&f6); f5 = f2 + f3; f6 = f3 + f2; __builtin_printf ("%08x %08x\n", *(unsigned *)&f5, *(unsigned *)&f6); f5 = f3 + f4; f6 = f4 + f3; __builtin_printf ("%08x %08x\n", *(unsigned *)&f5, *(unsigned *)&f6); f5 = f4 + f1; f6 = f1 + f4; __builtin_printf ("%08x %08x\n", *(unsigned *)&f5, *(unsigned *)&f6); return 0; } result of: 7fa00000 ffa00000 7fc00000 ffc00000 ffa00000 7fa00000 ffc00000 7fc00000 ffe00000 ffe00000 7fe00000 7fe00000 ffc00000 ffc00000 7fc00000 7fc00000 7fe00000 ffe00000 ffc00000 7fe00000 7fc00000 ffc00000 ffe00000 7fc00000 which basically shows that copysign copies sign bit, negation toggles it, binary operation with a single NaN (quiet or signaling) get that NaN with its sign and for binary operation on 2 NaNs (again quiet or signaling) one gets sign from the second? operand, I think the above IEEE text doesn't guarantee it except for a simple assignment (but with no mode conversion; that one copies the bit), NEGATE_EXPR (toggles it), ABS_EXPR (clears it), __builtin_copysign*/IFN_COPYSIGN (copies it from the second operand). Everything else, including invalid operation cases, should set the possible sign bit values of NANs to 0/1 rather than just one of them. Perhaps COND_EXPR 2nd/3rd operand is a move too. As you are adding a binary operation, that should be one of those cases where we drop the NAN sign to VARYING. > + > +// If either operand is a NAN, set R to the combination of both NANs > +// signwise and return TRUE. > + > +inline bool > +propagate_nans (frange &r, const frange &op1, const frange &op2) > +{ > + if (op1.known_isnan () || op2.known_isnan ()) > + { > + r.set_nan (op1.type ()); > + update_nan_sign (r, op1, op2); > + return true; > + } > + return false; > +} > + > +// Set VALUE to its next real value, or INF if the operation overflows. > + > +inline void > +frange_nextafter (enum machine_mode mode, > + REAL_VALUE_TYPE &value, > + const REAL_VALUE_TYPE &inf) > +{ > + const real_format *fmt = REAL_MODE_FORMAT (mode); > + REAL_VALUE_TYPE tmp; > + bool overflow = real_nextafter (&tmp, fmt, &value, &inf); > + if (overflow) > + value = inf; > + else > + value = tmp; > +} > + > +// Like real_arithmetic, but round the result to INF if the operation > +// produced inexact results. > +// > +// ?? There is still one problematic case, i387. With > +// -fexcess-precision=standard we perform most SF/DFmode arithmetic in > +// XFmode (long_double_type_node), so that case is OK. But without > +// -mfpmath=sse, all the SF/DFmode computations are in XFmode > +// precision (64-bit mantissa) and only occassionally rounded to > +// SF/DFmode (when storing into memory from the 387 stack). Maybe > +// this is ok as well though it is just occassionally more precise. ?? > + > +static void > +frange_arithmetic (enum tree_code code, tree type, > + REAL_VALUE_TYPE &result, > + const REAL_VALUE_TYPE &op1, > + const REAL_VALUE_TYPE &op2, > + const REAL_VALUE_TYPE &inf) > +{ > + REAL_VALUE_TYPE value; > + enum machine_mode mode = TYPE_MODE (type); > + bool mode_composite = MODE_COMPOSITE_P (mode); > + > + bool inexact = real_arithmetic (&value, code, &op1, &op2); > + real_convert (&result, mode, &value); > + > + // Be extra careful if there may be discrepancies between the > + // compile and runtime results. > + if ((mode_composite || (real_isneg (&inf) ? real_less (&result, &value) > + : !real_less (&value, &result))) > + && (inexact || !real_identical (&result, &value))) > + { > + if (mode_composite) > + { > + if (real_isdenormal (&result) > + || real_iszero (&result)) > + { > + // IBM extended denormals only have DFmode precision. > + REAL_VALUE_TYPE tmp; > + real_convert (&tmp, DFmode, &value); > + frange_nextafter (DFmode, tmp, inf); > + real_convert (&result, mode, &tmp); > + return; > + } As discussed before, this might not be correct for the larger double double denormals (is correct for real_iszero). I'll try to play with it in self-tests and compare with what one gets from libm nextafterl: int main () { long double a = __builtin_nextafterl (0.0L, 1.0L); __builtin_printf ("%La\n", a); for (int i = 0; i < 108; i++, a *= 2.0L) __builtin_printf ("%d %La %La\n", i, a, __builtin_nextafterl (a, 1.0L)); } 0x0.0000000000001p-1022 0 0x0.0000000000001p-1022 0x0.0000000000002p-1022 1 0x0.0000000000002p-1022 0x0.0000000000003p-1022 2 0x0.0000000000004p-1022 0x0.0000000000005p-1022 3 0x0.0000000000008p-1022 0x0.0000000000009p-1022 4 0x0.000000000001p-1022 0x0.0000000000011p-1022 5 0x0.000000000002p-1022 0x0.0000000000021p-1022 6 0x0.000000000004p-1022 0x0.0000000000041p-1022 7 0x0.000000000008p-1022 0x0.0000000000081p-1022 8 0x0.00000000001p-1022 0x0.0000000000101p-1022 9 0x0.00000000002p-1022 0x0.0000000000201p-1022 10 0x0.00000000004p-1022 0x0.0000000000401p-1022 11 0x0.00000000008p-1022 0x0.0000000000801p-1022 12 0x0.0000000001p-1022 0x0.0000000001001p-1022 13 0x0.0000000002p-1022 0x0.0000000002001p-1022 14 0x0.0000000004p-1022 0x0.0000000004001p-1022 15 0x0.0000000008p-1022 0x0.0000000008001p-1022 16 0x0.000000001p-1022 0x0.0000000010001p-1022 17 0x0.000000002p-1022 0x0.0000000020001p-1022 18 0x0.000000004p-1022 0x0.0000000040001p-1022 19 0x0.000000008p-1022 0x0.0000000080001p-1022 20 0x0.00000001p-1022 0x0.0000000100001p-1022 21 0x0.00000002p-1022 0x0.0000000200001p-1022 22 0x0.00000004p-1022 0x0.0000000400001p-1022 23 0x0.00000008p-1022 0x0.0000000800001p-1022 24 0x0.0000001p-1022 0x0.0000001000001p-1022 25 0x0.0000002p-1022 0x0.0000002000001p-1022 26 0x0.0000004p-1022 0x0.0000004000001p-1022 27 0x0.0000008p-1022 0x0.0000008000001p-1022 28 0x0.000001p-1022 0x0.0000010000001p-1022 29 0x0.000002p-1022 0x0.0000020000001p-1022 30 0x0.000004p-1022 0x0.0000040000001p-1022 31 0x0.000008p-1022 0x0.0000080000001p-1022 32 0x0.00001p-1022 0x0.0000100000001p-1022 33 0x0.00002p-1022 0x0.0000200000001p-1022 34 0x0.00004p-1022 0x0.0000400000001p-1022 35 0x0.00008p-1022 0x0.0000800000001p-1022 36 0x0.0001p-1022 0x0.0001000000001p-1022 37 0x0.0002p-1022 0x0.0002000000001p-1022 38 0x0.0004p-1022 0x0.0004000000001p-1022 39 0x0.0008p-1022 0x0.0008000000001p-1022 40 0x0.001p-1022 0x0.0010000000001p-1022 41 0x0.002p-1022 0x0.0020000000001p-1022 42 0x0.004p-1022 0x0.0040000000001p-1022 43 0x0.008p-1022 0x0.0080000000001p-1022 44 0x0.01p-1022 0x0.0100000000001p-1022 45 0x0.02p-1022 0x0.0200000000001p-1022 46 0x0.04p-1022 0x0.0400000000001p-1022 47 0x0.08p-1022 0x0.0800000000001p-1022 48 0x0.1p-1022 0x0.1000000000001p-1022 49 0x0.2p-1022 0x0.2000000000001p-1022 50 0x0.4p-1022 0x0.4000000000001p-1022 51 0x0.8p-1022 0x0.8000000000001p-1022 52 0x1p-1022 0x1.0000000000001p-1022 53 0x1p-1021 0x1.00000000000008p-1021 54 0x1p-1020 0x1.00000000000004p-1020 55 0x1p-1019 0x1.00000000000002p-1019 56 0x1p-1018 0x1.00000000000001p-1018 57 0x1p-1017 0x1.000000000000008p-1017 58 0x1p-1016 0x1.000000000000004p-1016 59 0x1p-1015 0x1.000000000000002p-1015 60 0x1p-1014 0x1.000000000000001p-1014 61 0x1p-1013 0x1.0000000000000008p-1013 62 0x1p-1012 0x1.0000000000000004p-1012 63 0x1p-1011 0x1.0000000000000002p-1011 64 0x1p-1010 0x1.0000000000000001p-1010 65 0x1p-1009 0x1.00000000000000008p-1009 66 0x1p-1008 0x1.00000000000000004p-1008 67 0x1p-1007 0x1.00000000000000002p-1007 68 0x1p-1006 0x1.00000000000000001p-1006 69 0x1p-1005 0x1.000000000000000008p-1005 70 0x1p-1004 0x1.000000000000000004p-1004 71 0x1p-1003 0x1.000000000000000002p-1003 72 0x1p-1002 0x1.000000000000000001p-1002 73 0x1p-1001 0x1.0000000000000000008p-1001 74 0x1p-1000 0x1.0000000000000000004p-1000 75 0x1p-999 0x1.0000000000000000002p-999 76 0x1p-998 0x1.0000000000000000001p-998 77 0x1p-997 0x1.00000000000000000008p-997 78 0x1p-996 0x1.00000000000000000004p-996 79 0x1p-995 0x1.00000000000000000002p-995 80 0x1p-994 0x1.00000000000000000001p-994 81 0x1p-993 0x1.000000000000000000008p-993 82 0x1p-992 0x1.000000000000000000004p-992 83 0x1p-991 0x1.000000000000000000002p-991 84 0x1p-990 0x1.000000000000000000001p-990 85 0x1p-989 0x1.0000000000000000000008p-989 86 0x1p-988 0x1.0000000000000000000004p-988 87 0x1p-987 0x1.0000000000000000000002p-987 88 0x1p-986 0x1.0000000000000000000001p-986 89 0x1p-985 0x1.00000000000000000000008p-985 90 0x1p-984 0x1.00000000000000000000004p-984 91 0x1p-983 0x1.00000000000000000000002p-983 92 0x1p-982 0x1.00000000000000000000001p-982 93 0x1p-981 0x1.000000000000000000000008p-981 94 0x1p-980 0x1.000000000000000000000004p-980 95 0x1p-979 0x1.000000000000000000000002p-979 96 0x1p-978 0x1.000000000000000000000001p-978 97 0x1p-977 0x1.0000000000000000000000008p-977 98 0x1p-976 0x1.0000000000000000000000004p-976 99 0x1p-975 0x1.0000000000000000000000002p-975 100 0x1p-974 0x1.0000000000000000000000001p-974 101 0x1p-973 0x1.00000000000000000000000008p-973 102 0x1p-972 0x1.00000000000000000000000004p-972 103 0x1p-971 0x1.00000000000000000000000002p-971 104 0x1p-970 0x1.00000000000000000000000001p-970 105 0x1p-969 0x1.000000000000000000000000008p-969 106 0x1p-968 0x1.000000000000000000000000008p-968 107 0x1p-967 0x1.000000000000000000000000008p-967 Will need to find out which of these numbers GCC real_isdenormal actually treats as denormals and if there are any which aren't. Powerpc64le -mlong-double-128 -mabi=ibmlongdouble gcc certain prints: #define __DBL_MIN__ ((double)2.22507385850720138309023271733240406e-308L) #define __DBL_DENORM_MIN__ ((double)4.94065645841246544176568792868221372e-324L) #define __LDBL_MIN__ 2.00416836000897277799610805135016205e-292L #define __LDBL_DENORM_MIN__ 4.94065645841246544176568792868221372e-324L and so the LDBL denorm min is correctly the same as double denorm min, while minimum normal is actually quite larger for long double than double (also ok, but unusual, as e.g. __DBL_MIN__ is much smaller than __FLT_MIN__ and __FLT128_MIN__ much smaller than __DBL_MIN__). Note, seems glibc nextafterl for IBM double double is actually adding __DBL_DENORM_MIN__ to the value for the small ones, so perhaps we should do that too and not convert to DFmode and back. Jakub
On Mon, Oct 17, 2022 at 08:21:01AM +0200, Aldy Hernandez wrote: > +// Set VALUE to its next real value, or INF if the operation overflows. > + > +inline void > +frange_nextafter (enum machine_mode mode, > + REAL_VALUE_TYPE &value, > + const REAL_VALUE_TYPE &inf) > +{ > + const real_format *fmt = REAL_MODE_FORMAT (mode); > + REAL_VALUE_TYPE tmp; > + bool overflow = real_nextafter (&tmp, fmt, &value, &inf); > + if (overflow) > + value = inf; > + else > + value = tmp; Please change the above 5 lines to just real_nextafter (&tmp, fmt, &value, &inf); value = tmp; What real_nextafter returns isn't bool whether it overflowed, but a bool whether some exception should be raised for the nextafter, which can be either an overflow (in that case tmp will be already set to infinity of the right sign: if (REAL_EXP (r) > fmt->emax) { get_inf (r, x->sign); return true; } or underflow to zero: if (REAL_EXP (r) <= fmt->emin - fmt->p) { get_zero (r, x->sign); return true; } or when the result is subnormal: return r->cl == rvc_zero || REAL_EXP (r) < fmt->emin; But we shouldn't care about exceptions, we aren't emitting code, just estimating floating point range, and +-inf, or +-0, or subnormal are the right answers. Just for playing with this, I've tried: --- range-op-float.cc.jj 2022-11-02 10:06:02.620960861 +0100 +++ range-op-float.cc 2022-11-04 19:49:14.188075070 +0100 @@ -1815,6 +1815,20 @@ frange_float (const char *lb, const char return frange (type, min, max); } +inline void +frange_nextafter (enum machine_mode mode, + REAL_VALUE_TYPE &value, + const REAL_VALUE_TYPE &inf) +{ + const real_format *fmt = REAL_MODE_FORMAT (mode); + REAL_VALUE_TYPE tmp; + bool overflow = real_nextafter (&tmp, fmt, &value, &inf); + if (overflow) + value = inf; + else + value = tmp; +} + void range_op_float_tests () { @@ -1833,6 +1847,30 @@ range_op_float_tests () r1 = frange_float ("-1", "-0"); r1.update_nan (false); ASSERT_EQ (r, r1); + + if (MODE_COMPOSITE_P (TYPE_MODE (long_double_type_node))) + { + enum machine_mode mode = TYPE_MODE (long_double_type_node); + REAL_VALUE_TYPE result = dconst0; + { + REAL_VALUE_TYPE tmp; + real_convert (&tmp, DFmode, &result); + frange_nextafter (DFmode, tmp, dconstinf); + real_convert (&result, mode, &tmp); + } + for (int i = 0; i < 108; ++i) + { + char buf[1024], buf2[1024]; + REAL_VALUE_TYPE tmp, tmp2; + real_convert (&tmp, DFmode, &result); + frange_nextafter (DFmode, tmp, dconstinf); + real_convert (&tmp2, mode, &tmp); + real_to_hexadecimal (buf, &result, 1024, 0, 1); + real_to_hexadecimal (buf2, &tmp2, 1024, 0, 1); + fprintf (stderr, "%d %d %s %s\n", i, real_isdenormal (&result), buf, buf2); + real_arithmetic (&result, MULT_EXPR, &result, &dconst2); + } + } } } // namespace selftest in cross to powerpc64le-linux and I get: 0 0 0x0.8p-1073 0x0.8p-1072 1 0 0x0.8p-1072 0x0.cp-1072 2 0 0x0.8p-1071 0x0.ap-1071 3 0 0x0.8p-1070 0x0.9p-1070 4 0 0x0.8p-1069 0x0.88p-1069 5 0 0x0.8p-1068 0x0.84p-1068 6 0 0x0.8p-1067 0x0.82p-1067 7 0 0x0.8p-1066 0x0.81p-1066 8 0 0x0.8p-1065 0x0.808p-1065 9 0 0x0.8p-1064 0x0.804p-1064 10 0 0x0.8p-1063 0x0.802p-1063 11 0 0x0.8p-1062 0x0.801p-1062 12 0 0x0.8p-1061 0x0.8008p-1061 13 0 0x0.8p-1060 0x0.8004p-1060 14 0 0x0.8p-1059 0x0.8002p-1059 15 0 0x0.8p-1058 0x0.8001p-1058 16 0 0x0.8p-1057 0x0.80008p-1057 17 0 0x0.8p-1056 0x0.80004p-1056 18 0 0x0.8p-1055 0x0.80002p-1055 19 0 0x0.8p-1054 0x0.80001p-1054 20 0 0x0.8p-1053 0x0.800008p-1053 21 0 0x0.8p-1052 0x0.800004p-1052 22 0 0x0.8p-1051 0x0.800002p-1051 23 0 0x0.8p-1050 0x0.800001p-1050 24 0 0x0.8p-1049 0x0.8000008p-1049 25 0 0x0.8p-1048 0x0.8000004p-1048 26 0 0x0.8p-1047 0x0.8000002p-1047 27 0 0x0.8p-1046 0x0.8000001p-1046 28 0 0x0.8p-1045 0x0.80000008p-1045 29 0 0x0.8p-1044 0x0.80000004p-1044 30 0 0x0.8p-1043 0x0.80000002p-1043 31 0 0x0.8p-1042 0x0.80000001p-1042 32 0 0x0.8p-1041 0x0.800000008p-1041 33 0 0x0.8p-1040 0x0.800000004p-1040 34 0 0x0.8p-1039 0x0.800000002p-1039 35 0 0x0.8p-1038 0x0.800000001p-1038 36 0 0x0.8p-1037 0x0.8000000008p-1037 37 0 0x0.8p-1036 0x0.8000000004p-1036 38 0 0x0.8p-1035 0x0.8000000002p-1035 39 0 0x0.8p-1034 0x0.8000000001p-1034 40 0 0x0.8p-1033 0x0.80000000008p-1033 41 0 0x0.8p-1032 0x0.80000000004p-1032 42 0 0x0.8p-1031 0x0.80000000002p-1031 43 0 0x0.8p-1030 0x0.80000000001p-1030 44 0 0x0.8p-1029 0x0.800000000008p-1029 45 0 0x0.8p-1028 0x0.800000000004p-1028 46 0 0x0.8p-1027 0x0.800000000002p-1027 47 0 0x0.8p-1026 0x0.800000000001p-1026 48 0 0x0.8p-1025 0x0.8000000000008p-1025 49 0 0x0.8p-1024 0x0.8000000000004p-1024 50 0 0x0.8p-1023 0x0.8000000000002p-1023 51 0 0x0.8p-1022 0x0.8000000000001p-1022 52 0 0x0.8p-1021 0x0.80000000000008p-1021 53 0 0x0.8p-1020 0x0.80000000000008p-1020 54 0 0x0.8p-1019 0x0.80000000000008p-1019 55 0 0x0.8p-1018 0x0.80000000000008p-1018 56 0 0x0.8p-1017 0x0.80000000000008p-1017 57 0 0x0.8p-1016 0x0.80000000000008p-1016 58 0 0x0.8p-1015 0x0.80000000000008p-1015 59 0 0x0.8p-1014 0x0.80000000000008p-1014 60 0 0x0.8p-1013 0x0.80000000000008p-1013 61 0 0x0.8p-1012 0x0.80000000000008p-1012 62 0 0x0.8p-1011 0x0.80000000000008p-1011 63 0 0x0.8p-1010 0x0.80000000000008p-1010 64 0 0x0.8p-1009 0x0.80000000000008p-1009 65 0 0x0.8p-1008 0x0.80000000000008p-1008 66 0 0x0.8p-1007 0x0.80000000000008p-1007 67 0 0x0.8p-1006 0x0.80000000000008p-1006 68 0 0x0.8p-1005 0x0.80000000000008p-1005 69 0 0x0.8p-1004 0x0.80000000000008p-1004 70 0 0x0.8p-1003 0x0.80000000000008p-1003 71 0 0x0.8p-1002 0x0.80000000000008p-1002 72 0 0x0.8p-1001 0x0.80000000000008p-1001 73 0 0x0.8p-1000 0x0.80000000000008p-1000 74 0 0x0.8p-999 0x0.80000000000008p-999 75 0 0x0.8p-998 0x0.80000000000008p-998 76 0 0x0.8p-997 0x0.80000000000008p-997 77 0 0x0.8p-996 0x0.80000000000008p-996 78 0 0x0.8p-995 0x0.80000000000008p-995 79 0 0x0.8p-994 0x0.80000000000008p-994 80 0 0x0.8p-993 0x0.80000000000008p-993 81 0 0x0.8p-992 0x0.80000000000008p-992 82 0 0x0.8p-991 0x0.80000000000008p-991 83 0 0x0.8p-990 0x0.80000000000008p-990 84 0 0x0.8p-989 0x0.80000000000008p-989 85 0 0x0.8p-988 0x0.80000000000008p-988 86 0 0x0.8p-987 0x0.80000000000008p-987 87 0 0x0.8p-986 0x0.80000000000008p-986 88 0 0x0.8p-985 0x0.80000000000008p-985 89 0 0x0.8p-984 0x0.80000000000008p-984 90 0 0x0.8p-983 0x0.80000000000008p-983 91 0 0x0.8p-982 0x0.80000000000008p-982 92 0 0x0.8p-981 0x0.80000000000008p-981 93 0 0x0.8p-980 0x0.80000000000008p-980 94 0 0x0.8p-979 0x0.80000000000008p-979 95 0 0x0.8p-978 0x0.80000000000008p-978 96 0 0x0.8p-977 0x0.80000000000008p-977 97 0 0x0.8p-976 0x0.80000000000008p-976 98 0 0x0.8p-975 0x0.80000000000008p-975 99 0 0x0.8p-974 0x0.80000000000008p-974 100 0 0x0.8p-973 0x0.80000000000008p-973 101 0 0x0.8p-972 0x0.80000000000008p-972 102 0 0x0.8p-971 0x0.80000000000008p-971 103 0 0x0.8p-970 0x0.80000000000008p-970 104 0 0x0.8p-969 0x0.80000000000008p-969 105 0 0x0.8p-968 0x0.80000000000008p-968 106 0 0x0.8p-967 0x0.80000000000008p-967 107 0 0x0.8p-966 0x0.80000000000008p-966 One thing that is clear is that real_isdenormal is never true for these (but then a question is if flush_denormals_to_zero actually works). Jakub
On Fri, Nov 04, 2022 at 08:14:23PM +0100, Jakub Jelinek via Gcc-patches wrote: > One thing that is clear is that real_isdenormal is never true for these > (but then a question is if flush_denormals_to_zero actually works). So, after some more investigation, I think that is actually the case, real_isdenormal is only meaningful (will ever return true) right after round_for_format. The uses inside of real.cc are fine, real_to_target first calls round_for_format and then calls fmt->encode in which the real_isdenormal calls are done. And, round_for_format is what transforms the normalized way of expressing the number into the denormal form: /* Check the range of the exponent. If we're out of range, either underflow or overflow. */ if (REAL_EXP (r) > emax2) goto overflow; else if (REAL_EXP (r) <= emin2m1) { int diff; if (!fmt->has_denorm) { /* Don't underflow completely until we've had a chance to round. */ if (REAL_EXP (r) < emin2m1) goto underflow; } else { diff = emin2m1 - REAL_EXP (r) + 1; if (diff > p2) goto underflow; /* De-normalize the significand. */ r->sig[0] |= sticky_rshift_significand (r, r, diff); SET_REAL_EXP (r, REAL_EXP (r) + diff); } } But, real_to_target is one of the 4 callers of static round_for_format, another one is real_convert, but that one undoes that immediately: round_for_format (fmt, r); /* Make resulting NaN value to be qNaN. The caller has the responsibility to avoid the operation if flag_signaling_nans is on. */ if (r->cl == rvc_nan) r->signalling = 0; /* round_for_format de-normalizes denormals. Undo just that part. */ if (r->cl == rvc_normal) normalize (r); and the last two are inside of encoding the IBM double double composite. So, I think everywhere in the frange you'll see normalized REAL_VALUE_TYPE and so real_isdenormal will always be false there. You need to check for REAL_EXP (r) < fmt->emin or so (and ideally only on REAL_VALUE_TYPE already real_converted to the right mode (your frange_arithmetics does that already, which is good, but say conversions and other unary ops might need it too). If I in the hack from last mail replace - fprintf (stderr, "%d %d %s %s\n", i, real_isdenormal (&result), buf, buf2); + fprintf (stderr, "%d %d %s %s\n", i, REAL_EXP (&result) < REAL_MODE_FORMAT (mode)->emin, buf, buf2); then it is 1 until: 102 1 0x0.8p-971 0x0.80000000000008p-971 103 1 0x0.8p-970 0x0.80000000000008p-970 104 1 0x0.8p-969 0x0.80000000000008p-969 105 0 0x0.8p-968 0x0.80000000000008p-968 106 0 0x0.8p-967 0x0.80000000000008p-967 107 0 0x0.8p-966 0x0.80000000000008p-966 Now, the __LDBL_MIN__ powerpc64 gcc announces is 2.00416836000897277799610805135016205e-292L which is equivalent to: 0x1p-969 and that is equivalent to 0x0.8p-968, so I think that is exactly what we want. Jakub
On Fri, Nov 4, 2022 at 8:53 PM Jakub Jelinek <jakub@redhat.com> wrote: > > On Fri, Nov 04, 2022 at 08:14:23PM +0100, Jakub Jelinek via Gcc-patches wrote: > > One thing that is clear is that real_isdenormal is never true for these > > (but then a question is if flush_denormals_to_zero actually works). > > So, after some more investigation, I think that is actually the case, > real_isdenormal is only meaningful (will ever return true) right after > round_for_format. > The uses inside of real.cc are fine, real_to_target first calls > round_for_format and then calls fmt->encode in which the real_isdenormal > calls are done. And, round_for_format is what transforms the > normalized way of expressing the number into the denormal form: > /* Check the range of the exponent. If we're out of range, > either underflow or overflow. */ > if (REAL_EXP (r) > emax2) > goto overflow; > else if (REAL_EXP (r) <= emin2m1) > { > int diff; > > if (!fmt->has_denorm) > { > /* Don't underflow completely until we've had a chance to round. */ > if (REAL_EXP (r) < emin2m1) > goto underflow; > } > else > { > diff = emin2m1 - REAL_EXP (r) + 1; > if (diff > p2) > goto underflow; > > /* De-normalize the significand. */ > r->sig[0] |= sticky_rshift_significand (r, r, diff); > SET_REAL_EXP (r, REAL_EXP (r) + diff); > } > } > But, real_to_target is one of the 4 callers of static round_for_format, > another one is real_convert, but that one undoes that immediately: > round_for_format (fmt, r); > > /* Make resulting NaN value to be qNaN. The caller has the > responsibility to avoid the operation if flag_signaling_nans > is on. */ > if (r->cl == rvc_nan) > r->signalling = 0; > > /* round_for_format de-normalizes denormals. Undo just that part. */ > if (r->cl == rvc_normal) > normalize (r); > and the last two are inside of encoding the IBM double double composite. > So, I think everywhere in the frange you'll see normalized REAL_VALUE_TYPE > and so real_isdenormal will always be false there. > You need to check for REAL_EXP (r) < fmt->emin or so (and ideally only on > REAL_VALUE_TYPE already real_converted to the right mode (your > frange_arithmetics does that already, which is good, but say conversions > and other unary ops might need it too). Let me see if I understand you correctly... real_isdenormal is always returning false for our uses in frange? So instead of using real_isdenormal in flush_denormals_to_zero, perhaps we should be using: REAL_EXP (r) < REAL_MODE_FORMAT (mode)->emin ?? Could we perhaps make real_isdenormal always work for all values? Is that possible? Aldy
On Mon, Nov 07, 2022 at 01:35:35PM +0100, Aldy Hernandez wrote: > Let me see if I understand you correctly... > > real_isdenormal is always returning false for our uses in frange? So > instead of using real_isdenormal in flush_denormals_to_zero, perhaps > we should be using: > > REAL_EXP (r) < REAL_MODE_FORMAT (mode)->emin > > ?? > > Could we perhaps make real_isdenormal always work for all values? Is > that possible? Yes. Or have real_isdenormal_target for the uses in real.cc which would be private to real.cc and would do what real_isdenormal currently does, and then real_isdenormal with the above definition (well, r->cl == rvc_normal && ...). Jakub
On Mon, Nov 7, 2022 at 1:43 PM Jakub Jelinek <jakub@redhat.com> wrote: > > On Mon, Nov 07, 2022 at 01:35:35PM +0100, Aldy Hernandez wrote: > > Let me see if I understand you correctly... > > > > real_isdenormal is always returning false for our uses in frange? So > > instead of using real_isdenormal in flush_denormals_to_zero, perhaps > > we should be using: > > > > REAL_EXP (r) < REAL_MODE_FORMAT (mode)->emin > > > > ?? > > > > Could we perhaps make real_isdenormal always work for all values? Is > > that possible? > > Yes. Or have real_isdenormal_target for the uses in real.cc > which would be private to real.cc and would do what real_isdenormal > currently does, and then real_isdenormal with the above definition > (well, r->cl == rvc_normal && ...). If the REAL_EXP(r)... definition works for everyone, can't we just use that? Or do you think there'd be a measurable performance impact? Aldy
On Mon, Nov 07, 2022 at 01:48:28PM +0100, Aldy Hernandez wrote: > On Mon, Nov 7, 2022 at 1:43 PM Jakub Jelinek <jakub@redhat.com> wrote: > > > > On Mon, Nov 07, 2022 at 01:35:35PM +0100, Aldy Hernandez wrote: > > > Let me see if I understand you correctly... > > > > > > real_isdenormal is always returning false for our uses in frange? So > > > instead of using real_isdenormal in flush_denormals_to_zero, perhaps > > > we should be using: > > > > > > REAL_EXP (r) < REAL_MODE_FORMAT (mode)->emin > > > > > > ?? > > > > > > Could we perhaps make real_isdenormal always work for all values? Is > > > that possible? > > > > Yes. Or have real_isdenormal_target for the uses in real.cc > > which would be private to real.cc and would do what real_isdenormal > > currently does, and then real_isdenormal with the above definition > > (well, r->cl == rvc_normal && ...). > > If the REAL_EXP(r)... definition works for everyone, can't we just use > that? Or do you think there'd be a measurable performance impact? It doesn't work alone. Either denormals are in normalized form, then r->cl == rvc_normal && REAL_EXP (r) < REAL_MODE_FORMAT (mode)->emin is true, or they are in denormal form, then r->cl == rvc_normal && (r->sig[SIGSZ-1] & SIG_MSB) == 0 is true. You could use r->cl == rvc_normal && (REAL_EXP (r) < REAL_MODE_FORMAT (mode)->emin || (r->sig[SIGSZ-1] & SIG_MSB) == 0) but that would be waste of compile time both in real.cc and outside of real.cc. Jakub
Fair enough. How's this? Tested on x86-64 Linux. LAPACK regression testing as well. On Mon, Nov 7, 2022 at 1:56 PM Jakub Jelinek <jakub@redhat.com> wrote: > > On Mon, Nov 07, 2022 at 01:48:28PM +0100, Aldy Hernandez wrote: > > On Mon, Nov 7, 2022 at 1:43 PM Jakub Jelinek <jakub@redhat.com> wrote: > > > > > > On Mon, Nov 07, 2022 at 01:35:35PM +0100, Aldy Hernandez wrote: > > > > Let me see if I understand you correctly... > > > > > > > > real_isdenormal is always returning false for our uses in frange? So > > > > instead of using real_isdenormal in flush_denormals_to_zero, perhaps > > > > we should be using: > > > > > > > > REAL_EXP (r) < REAL_MODE_FORMAT (mode)->emin > > > > > > > > ?? > > > > > > > > Could we perhaps make real_isdenormal always work for all values? Is > > > > that possible? > > > > > > Yes. Or have real_isdenormal_target for the uses in real.cc > > > which would be private to real.cc and would do what real_isdenormal > > > currently does, and then real_isdenormal with the above definition > > > (well, r->cl == rvc_normal && ...). > > > > If the REAL_EXP(r)... definition works for everyone, can't we just use > > that? Or do you think there'd be a measurable performance impact? > > It doesn't work alone. > > Either denormals are in normalized form, then > r->cl == rvc_normal && REAL_EXP (r) < REAL_MODE_FORMAT (mode)->emin > is true, or they are in denormal form, then > r->cl == rvc_normal && (r->sig[SIGSZ-1] & SIG_MSB) == 0 > is true. > > You could use > r->cl == rvc_normal > && (REAL_EXP (r) < REAL_MODE_FORMAT (mode)->emin > || (r->sig[SIGSZ-1] & SIG_MSB) == 0) > but that would be waste of compile time both in real.cc and outside of > real.cc. > > Jakub >
On Fri, Nov 4, 2022 at 8:14 PM Jakub Jelinek <jakub@redhat.com> wrote: > > On Mon, Oct 17, 2022 at 08:21:01AM +0200, Aldy Hernandez wrote: > > +// Set VALUE to its next real value, or INF if the operation overflows. > > + > > +inline void > > +frange_nextafter (enum machine_mode mode, > > + REAL_VALUE_TYPE &value, > > + const REAL_VALUE_TYPE &inf) > > +{ > > + const real_format *fmt = REAL_MODE_FORMAT (mode); > > + REAL_VALUE_TYPE tmp; > > + bool overflow = real_nextafter (&tmp, fmt, &value, &inf); > > + if (overflow) > > + value = inf; > > + else > > + value = tmp; > > Please change the above 5 lines to just > real_nextafter (&tmp, fmt, &value, &inf); > value = tmp; Done. As suggested upthread, I have also adjusted update_nan_sign() to drop the NAN sign to VARYING if both operands are NAN. As an optimization I keep the sign if both operands are NAN and have the same sign. How's this? Tested on x86-64 Linux. LAPACK testing as well. Aldy
On Mon, Nov 07, 2022 at 04:38:29PM +0100, Aldy Hernandez wrote: > From d214bcdff2cb90ad1eb808d29bda6fb98d510b4c Mon Sep 17 00:00:00 2001 > From: Aldy Hernandez <aldyh@redhat.com> > Date: Mon, 7 Nov 2022 14:18:57 +0100 > Subject: [PATCH] Provide normalized and denormal format version of > real_isdenormal. > > Implement real_isdenormal_target() to be used within real.cc where the > argument is known to be in denormal format. Rewrite real_isdenormal() > for use outside of real.cc where the argument is known to be > normalized. > > gcc/ChangeLog: > > * real.cc (real_isdenormal_target): New. > (encode_ieee_single): Use real_isdenormal_target. > (encode_ieee_double): Same. > (encode_ieee_extended): Same. > (encode_ieee_quad): Same. > (encode_ieee_half): Same. > (encode_arm_bfloat_half): Same. > * value-range.cc (frange::flush_denormals_to_zero): Same. > * real.h (real_isdenormal): Rewrite to look at mode. I'd make real_isdenormal_target static inline bool rather than inline bool, it is only defined in real.cc, so there is no point exporting it. Though, as you've added the mode argument, the real.cc inline could very well also be called real_isdenormal too, it wouldn't be a redeclaration or ODR violation. Jakub
On Mon, Nov 07, 2022 at 04:41:23PM +0100, Aldy Hernandez wrote: > As suggested upthread, I have also adjusted update_nan_sign() to drop > the NAN sign to VARYING if both operands are NAN. As an optimization > I keep the sign if both operands are NAN and have the same sign. For NaNs this still relies on something IEEE754 doesn't guarantee, as I cited, after a binary operation the sign bit of the NaN is unspecified, whether there is one NaN operand or two. It might be that all CPUs handle it the way you've implemented (that for one NaN operand the sign of NaN result will be the same as that NaN operand and for two it will be the sign of one of the two NaNs operands, never something else), but I think we'd need to check more than one implementation for that (I've only tried x86_64 and thus SSE behavior in it), so one would need to test i387 long double behavior too, ARM/AArch64, PowerPC, s390{,x}, RISCV, ... The guarantee given by IEEE754 is only for those copy, negate, abs, copySign operations, so copying values around, NEG_EXPR, ABS_EXPR, __builtin_fabs*, __builtin_copysign*. Otherwise LGTM (but would be nice to get into GCC13 not just +, but also -, *, /, sqrt at least). Jakub
On Tue, Nov 8, 2022 at 12:07 PM Jakub Jelinek <jakub@redhat.com> wrote: > > On Mon, Nov 07, 2022 at 04:38:29PM +0100, Aldy Hernandez wrote: > > From d214bcdff2cb90ad1eb808d29bda6fb98d510b4c Mon Sep 17 00:00:00 2001 > > From: Aldy Hernandez <aldyh@redhat.com> > > Date: Mon, 7 Nov 2022 14:18:57 +0100 > > Subject: [PATCH] Provide normalized and denormal format version of > > real_isdenormal. > > > > Implement real_isdenormal_target() to be used within real.cc where the > > argument is known to be in denormal format. Rewrite real_isdenormal() > > for use outside of real.cc where the argument is known to be > > normalized. > > > > gcc/ChangeLog: > > > > * real.cc (real_isdenormal_target): New. > > (encode_ieee_single): Use real_isdenormal_target. > > (encode_ieee_double): Same. > > (encode_ieee_extended): Same. > > (encode_ieee_quad): Same. > > (encode_ieee_half): Same. > > (encode_arm_bfloat_half): Same. > > * value-range.cc (frange::flush_denormals_to_zero): Same. > > * real.h (real_isdenormal): Rewrite to look at mode. > > I'd make real_isdenormal_target static inline bool > rather than inline bool, it is only defined in real.cc, so there is > no point exporting it. Huh. I thought inline alone would inhibit the exporting. Thanks. > Though, as you've added the mode argument, the real.cc inline > could very well also be called real_isdenormal too, it wouldn't be > a redeclaration or ODR violation. Great, even better. OK pending tests? Aldy
On Tue, Nov 8, 2022 at 12:20 PM Jakub Jelinek <jakub@redhat.com> wrote: > > On Mon, Nov 07, 2022 at 04:41:23PM +0100, Aldy Hernandez wrote: > > As suggested upthread, I have also adjusted update_nan_sign() to drop > > the NAN sign to VARYING if both operands are NAN. As an optimization > > I keep the sign if both operands are NAN and have the same sign. > > For NaNs this still relies on something IEEE754 doesn't guarantee, > as I cited, after a binary operation the sign bit of the NaN is > unspecified, whether there is one NaN operand or two. > It might be that all CPUs handle it the way you've implemented > (that for one NaN operand the sign of NaN result will be the same > as that NaN operand and for two it will be the sign of one of the two > NaNs operands, never something else), but I think we'd need to check > more than one implementation for that (I've only tried x86_64 and thus > SSE behavior in it), so one would need to test i387 long double behavior > too, ARM/AArch64, PowerPC, s390{,x}, RISCV, ... > The guarantee given by IEEE754 is only for those copy, negate, abs, copySign > operations, so copying values around, NEG_EXPR, ABS_EXPR, __builtin_fabs*, > __builtin_copysign*. Ughh, that's unfortunate. OK, I've added a big note. > > Otherwise LGTM (but would be nice to get into GCC13 not just > +, but also -, *, /, sqrt at least). Minus is trivial as we can implement it with a negate and plus. I have a patch queued up for that. The rest require a bit more thought, though perhaps with what we have so far can serve as a base. I'll look into it. Attached is the patch I'm retesting. Thanks for your patience, and copious help here. Aldy
On Tue, Nov 08, 2022 at 01:47:58PM +0100, Aldy Hernandez wrote: > On Tue, Nov 8, 2022 at 12:07 PM Jakub Jelinek <jakub@redhat.com> wrote: > > > > On Mon, Nov 07, 2022 at 04:38:29PM +0100, Aldy Hernandez wrote: > > > From d214bcdff2cb90ad1eb808d29bda6fb98d510b4c Mon Sep 17 00:00:00 2001 > > > From: Aldy Hernandez <aldyh@redhat.com> > > > Date: Mon, 7 Nov 2022 14:18:57 +0100 > > > Subject: [PATCH] Provide normalized and denormal format version of > > > real_isdenormal. > > > > > > Implement real_isdenormal_target() to be used within real.cc where the > > > argument is known to be in denormal format. Rewrite real_isdenormal() > > > for use outside of real.cc where the argument is known to be > > > normalized. > > > > > > gcc/ChangeLog: > > > > > > * real.cc (real_isdenormal_target): New. > > > (encode_ieee_single): Use real_isdenormal_target. > > > (encode_ieee_double): Same. > > > (encode_ieee_extended): Same. > > > (encode_ieee_quad): Same. > > > (encode_ieee_half): Same. > > > (encode_arm_bfloat_half): Same. > > > * value-range.cc (frange::flush_denormals_to_zero): Same. > > > * real.h (real_isdenormal): Rewrite to look at mode. > > > > I'd make real_isdenormal_target static inline bool > > rather than inline bool, it is only defined in real.cc, so there is > > no point exporting it. > > Huh. I thought inline alone would inhibit the exporting. Thanks. That is what happens with C99 inline (unless there is extern for the decl), but C++ inline is different. It isn't guaranteed to be exported, but with -fkeep-inline-functions or if you say take address of the inline in a way that can't be optimized back into a call to the inline (or even just call it with -O0), it is exported. > > > Though, as you've added the mode argument, the real.cc inline > > could very well also be called real_isdenormal too, it wouldn't be > > a redeclaration or ODR violation. > > Great, even better. > > OK pending tests? > Aldy > From c3ca1d606bfb22bf4f8bc7ac0ce561bd6afc3368 Mon Sep 17 00:00:00 2001 > From: Aldy Hernandez <aldyh@redhat.com> > Date: Mon, 7 Nov 2022 14:18:57 +0100 > Subject: [PATCH] Provide normalized and denormal format version of > real_isdenormal. > > Implement a variant of real_isdenormal() to be used within real.cc > where the argument is known to be in denormal format. Rewrite > real_isdenormal() for use outside of real.cc where the argument is > known to be normalized. > > gcc/ChangeLog: > > * real.cc (real_isdenormal): New. > * real.h (real_isdenormal): Add mode argument. Rewrite for > normalized values. > * value-range.cc (frange::flush_denormals_to_zero): Pass mode to > real_isdenormal. > --- > gcc/real.cc | 10 ++++++++++ > gcc/real.h | 7 ++++--- > gcc/value-range.cc | 5 +++-- > 3 files changed, 17 insertions(+), 5 deletions(-) > > diff --git a/gcc/real.cc b/gcc/real.cc > index aae7c335d59..028aad95ec4 100644 > --- a/gcc/real.cc > +++ b/gcc/real.cc > @@ -111,6 +111,16 @@ static const REAL_VALUE_TYPE * real_digit (int); > static void times_pten (REAL_VALUE_TYPE *, int); > > static void round_for_format (const struct real_format *, REAL_VALUE_TYPE *); > + > +/* Determine whether a floating-point value X is a denormal. R is > + expected to be in denormal form, so this function is only > + meaningful after a call to round_for_format. */ > + > +static inline bool > +real_isdenormal (const REAL_VALUE_TYPE *r) > +{ > + return (r->sig[SIGSZ-1] & SIG_MSB) == 0; I would probably keep the r->cl == rvc_normal in here too. I know the code in real.cc didn't do it before, but what r->sig is for the rvc_zero/rvc_inf is unclear. It is true that get_zero/get_canonical_?nan/get_inf clear the whole sig, but not really sure if we guarantee that everywhere. The real.cc uses were like: bool denormal = ...; at the start of the function and then switch (...) { ... case rvc_normal: if (denormal) ... } so another even better possibility would be to use your simple real.cc (real_isdenormal) and drop all the denormal variables, so: - bool denormal = ...; switch (...) { ... case rvc_normal: - if (denormal) + if (real_isdenormal (r)) ... } Otherwise LGTM. Jakub
On Tue, Nov 08, 2022 at 02:06:58PM +0100, Aldy Hernandez wrote: > + gcc_checking_assert (!r.nan_signbit_p (sign1)); > + if (op1_nan && op2_nan) > + { > + // If boths signs agree, we could use that sign, but IEEE754 > + // does not guarantee this for a binary operator. The x86_64 > + // architure does keep the common known sign, but further tests > + // are needed to see if other architectures do the same (i387 > + // long double, ARM/aarch64, PowerPC, s390,{,x}, RSICV, etc). s/RSICV/RISCV/ > + // In the meantime, keep sign VARYING. > + ; > + } > + else if (op1_nan) > + { > + if (op1.nan_signbit_p (sign1)) > + r.update_nan (sign1); > + } > + else if (op2_nan) > + { > + if (op2.nan_signbit_p (sign2)) > + r.update_nan (sign2); > + } Well, these cases also aren't guaranteed for binary operator. I think a conforming implementation can say copy the NaN operand to result and toggle the sign. Or, if the operand would be a sNaN, it must turn it into a qNaN (don't remember right now if there are requirements on what it can do with the mantissa which needs to change for the sNaN -> qNaN difference at least, but whether it can just generate a canonical qNaN or needs to preserve at least some bits), but could e.g. clear or toggle the sign of the NaN as well. Whether there are any such implementations or not is a question. For the single qNaN operand case, it would surprise me if anybody bothered to tweak the sign bit in any way, just copying the input seems simpler to me, but for the sNaN -> qNaN conversion it wouldn't surprise me that much. Jakub
On Tue, Nov 8, 2022 at 2:25 PM Jakub Jelinek <jakub@redhat.com> wrote: > > On Tue, Nov 08, 2022 at 02:06:58PM +0100, Aldy Hernandez wrote: > > + gcc_checking_assert (!r.nan_signbit_p (sign1)); > > + if (op1_nan && op2_nan) > > + { > > + // If boths signs agree, we could use that sign, but IEEE754 > > + // does not guarantee this for a binary operator. The x86_64 > > + // architure does keep the common known sign, but further tests > > + // are needed to see if other architectures do the same (i387 > > + // long double, ARM/aarch64, PowerPC, s390,{,x}, RSICV, etc). > > s/RSICV/RISCV/ > > > + // In the meantime, keep sign VARYING. > > + ; > > + } > > + else if (op1_nan) > > + { > > + if (op1.nan_signbit_p (sign1)) > > + r.update_nan (sign1); > > + } > > + else if (op2_nan) > > + { > > + if (op2.nan_signbit_p (sign2)) > > + r.update_nan (sign2); > > + } > > Well, these cases also aren't guaranteed for binary operator. > I think a conforming implementation can say copy the NaN operand > to result and toggle the sign. Or, if the operand would be a sNaN, > it must turn it into a qNaN (don't remember right now if there are > requirements on what it can do with the mantissa which needs to change > for the sNaN -> qNaN difference at least, but whether it can just > generate a canonical qNaN or needs to preserve at least some bits), > but could e.g. clear or toggle the sign of the NaN as well. > Whether there are any such implementations or not is a question. > For the single qNaN operand case, it would surprise me if anybody > bothered to tweak the sign bit in any way, just copying the input > seems simpler to me, but for the sNaN -> qNaN conversion it wouldn't > surprise me that much. Well, perhaps we should just nuke update_nan_sign() altogether, and always keep the sign varying? inline bool propagate_nans (frange &r, const frange &op1, const frange &op2) { if (op1.known_isnan () || op2.known_isnan ()) { r.set_nan (op1.type ()); return true; } return false; } I'm fine either way. The less code the better :). Aldy
On Tue, Nov 08, 2022 at 02:47:35PM +0100, Aldy Hernandez wrote: > Well, perhaps we should just nuke update_nan_sign() altogether, and > always keep the sign varying? > > inline bool > propagate_nans (frange &r, const frange &op1, const frange &op2) > { > if (op1.known_isnan () || op2.known_isnan ()) > { > r.set_nan (op1.type ()); > return true; > } > return false; > } > > I'm fine either way. The less code the better :). Yes, but you had 2 callers, so something needs to be done also if in foperator_plus::fold_range. Jakub
On Tue, Nov 8, 2022 at 2:15 PM Jakub Jelinek <jakub@redhat.com> wrote: > > On Tue, Nov 08, 2022 at 01:47:58PM +0100, Aldy Hernandez wrote: > > On Tue, Nov 8, 2022 at 12:07 PM Jakub Jelinek <jakub@redhat.com> wrote: > > > > > > On Mon, Nov 07, 2022 at 04:38:29PM +0100, Aldy Hernandez wrote: > > > > From d214bcdff2cb90ad1eb808d29bda6fb98d510b4c Mon Sep 17 00:00:00 2001 > > > > From: Aldy Hernandez <aldyh@redhat.com> > > > > Date: Mon, 7 Nov 2022 14:18:57 +0100 > > > > Subject: [PATCH] Provide normalized and denormal format version of > > > > real_isdenormal. > > > > > > > > Implement real_isdenormal_target() to be used within real.cc where the > > > > argument is known to be in denormal format. Rewrite real_isdenormal() > > > > for use outside of real.cc where the argument is known to be > > > > normalized. > > > > > > > > gcc/ChangeLog: > > > > > > > > * real.cc (real_isdenormal_target): New. > > > > (encode_ieee_single): Use real_isdenormal_target. > > > > (encode_ieee_double): Same. > > > > (encode_ieee_extended): Same. > > > > (encode_ieee_quad): Same. > > > > (encode_ieee_half): Same. > > > > (encode_arm_bfloat_half): Same. > > > > * value-range.cc (frange::flush_denormals_to_zero): Same. > > > > * real.h (real_isdenormal): Rewrite to look at mode. > > > > > > I'd make real_isdenormal_target static inline bool > > > rather than inline bool, it is only defined in real.cc, so there is > > > no point exporting it. > > > > Huh. I thought inline alone would inhibit the exporting. Thanks. > > That is what happens with C99 inline (unless there is extern for the decl), > but C++ inline is different. It isn't guaranteed to be exported, but > with -fkeep-inline-functions or if you say take address of the inline > in a way that can't be optimized back into a call to the inline (or even > just call it with -O0), it is exported. > > > > > Though, as you've added the mode argument, the real.cc inline > > > could very well also be called real_isdenormal too, it wouldn't be > > > a redeclaration or ODR violation. > > > > Great, even better. > > > > OK pending tests? > > Aldy > > > From c3ca1d606bfb22bf4f8bc7ac0ce561bd6afc3368 Mon Sep 17 00:00:00 2001 > > From: Aldy Hernandez <aldyh@redhat.com> > > Date: Mon, 7 Nov 2022 14:18:57 +0100 > > Subject: [PATCH] Provide normalized and denormal format version of > > real_isdenormal. > > > > Implement a variant of real_isdenormal() to be used within real.cc > > where the argument is known to be in denormal format. Rewrite > > real_isdenormal() for use outside of real.cc where the argument is > > known to be normalized. > > > > gcc/ChangeLog: > > > > * real.cc (real_isdenormal): New. > > * real.h (real_isdenormal): Add mode argument. Rewrite for > > normalized values. > > * value-range.cc (frange::flush_denormals_to_zero): Pass mode to > > real_isdenormal. > > --- > > gcc/real.cc | 10 ++++++++++ > > gcc/real.h | 7 ++++--- > > gcc/value-range.cc | 5 +++-- > > 3 files changed, 17 insertions(+), 5 deletions(-) > > > > diff --git a/gcc/real.cc b/gcc/real.cc > > index aae7c335d59..028aad95ec4 100644 > > --- a/gcc/real.cc > > +++ b/gcc/real.cc > > @@ -111,6 +111,16 @@ static const REAL_VALUE_TYPE * real_digit (int); > > static void times_pten (REAL_VALUE_TYPE *, int); > > > > static void round_for_format (const struct real_format *, REAL_VALUE_TYPE *); > > + > > +/* Determine whether a floating-point value X is a denormal. R is > > + expected to be in denormal form, so this function is only > > + meaningful after a call to round_for_format. */ > > + > > +static inline bool > > +real_isdenormal (const REAL_VALUE_TYPE *r) > > +{ > > + return (r->sig[SIGSZ-1] & SIG_MSB) == 0; > > I would probably keep the r->cl == rvc_normal in here too. > I know the code in real.cc didn't do it before, but what > r->sig is for the rvc_zero/rvc_inf is unclear. It is true > that get_zero/get_canonical_?nan/get_inf clear the whole sig, > but not really sure if we guarantee that everywhere. > The real.cc uses were like: > bool denormal = ...; > at the start of the function and then > switch (...) > { > ... > case rvc_normal: > if (denormal) > ... > } > so another even better possibility would be to use your simple > real.cc (real_isdenormal) and drop all the denormal variables, so: > - bool denormal = ...; > switch (...) > { > ... > case rvc_normal: > - if (denormal) > + if (real_isdenormal (r)) > ... > } Sure. Attached patch in testing. Aldy
On Tue, Nov 08, 2022 at 03:02:40PM +0100, Aldy Hernandez wrote: > From d02ce8eaf16d2fc6db6472268fd962e09c2fd81e Mon Sep 17 00:00:00 2001 > From: Aldy Hernandez <aldyh@redhat.com> > Date: Mon, 7 Nov 2022 14:18:57 +0100 > Subject: [PATCH] Provide normalized and denormal format version of > real_isdenormal. > > Implement a variant of real_isdenormal() to be used within real.cc > where the argument is known to be in denormal format. Rewrite > real_isdenormal() for use outside of real.cc where the argument is > known to be normalized. > > gcc/ChangeLog: > > * real.cc (real_isdenormal): New. > (encode_ieee_single): Call real_isdenormal. > (encode_ieee_double): Same. > (encode_ieee_extended): Same. > (encode_ieee_quad): Same. > (encode_ieee_half): Same. > (encode_arm_bfloat_half): Same. > * real.h (real_isdenormal): Add mode argument. Rewrite for > normalized values. > * value-range.cc (frange::flush_denormals_to_zero): Pass mode to > real_isdenormal. LGTM, thanks. Jakub
On Tue, Nov 8, 2022 at 2:50 PM Jakub Jelinek <jakub@redhat.com> wrote: > > On Tue, Nov 08, 2022 at 02:47:35PM +0100, Aldy Hernandez wrote: > > Well, perhaps we should just nuke update_nan_sign() altogether, and > > always keep the sign varying? > > > > inline bool > > propagate_nans (frange &r, const frange &op1, const frange &op2) > > { > > if (op1.known_isnan () || op2.known_isnan ()) > > { > > r.set_nan (op1.type ()); > > return true; > > } > > return false; > > } > > > > I'm fine either way. The less code the better :). > > Yes, but you had 2 callers, so something needs to be done also if > in foperator_plus::fold_range. We can also remove the update_nan_sign() in the other call because the r.set() before it sets a default NAN (with a varying sign). Attached patch in testing. Aldy
On Tue, Nov 08, 2022 at 03:06:53PM +0100, Aldy Hernandez wrote: > +// If either operand is a NAN, set R to the combination of both NANs > +// signwise and return TRUE. This comment doesn't describe what it does now. If either operand is a NAN, set R to NAN with unspecified sign bit and return TRUE. ? Other than this LGTM. Jakub
On Tue, Nov 8, 2022 at 3:11 PM Jakub Jelinek <jakub@redhat.com> wrote: > > On Tue, Nov 08, 2022 at 03:06:53PM +0100, Aldy Hernandez wrote: > > +// If either operand is a NAN, set R to the combination of both NANs > > +// signwise and return TRUE. > > This comment doesn't describe what it does now. > If either operand is a NAN, set R to NAN with unspecified sign bit and return > TRUE. > ? OMG, I suck! // If either operand is a NAN, set R to NAN and return TRUE. Tests on-going :). Aldy
On Tue, Nov 8, 2022 at 3:20 AM Jakub Jelinek via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: > > On Mon, Nov 07, 2022 at 04:41:23PM +0100, Aldy Hernandez wrote: > > As suggested upthread, I have also adjusted update_nan_sign() to drop > > the NAN sign to VARYING if both operands are NAN. As an optimization > > I keep the sign if both operands are NAN and have the same sign. > > For NaNs this still relies on something IEEE754 doesn't guarantee, > as I cited, after a binary operation the sign bit of the NaN is > unspecified, whether there is one NaN operand or two. > It might be that all CPUs handle it the way you've implemented > (that for one NaN operand the sign of NaN result will be the same > as that NaN operand and for two it will be the sign of one of the two > NaNs operands, never something else), but I think we'd need to check > more than one implementation for that (I've only tried x86_64 and thus > SSE behavior in it), so one would need to test i387 long double behavior > too, ARM/AArch64, PowerPC, s390{,x}, RISCV, ... > The guarantee given by IEEE754 is only for those copy, negate, abs, copySign > operations, so copying values around, NEG_EXPR, ABS_EXPR, __builtin_fabs*, > __builtin_copysign*. FWIW, RISC-V canonicalizes NaNs by clearing the sign bit; the signs of the input NaNs do not factor in. > > Otherwise LGTM (but would be nice to get into GCC13 not just > +, but also -, *, /, sqrt at least). > > Jakub >
On Tue, Nov 08, 2022 at 09:44:40AM -0800, Andrew Waterman wrote: > On Tue, Nov 8, 2022 at 3:20 AM Jakub Jelinek via Gcc-patches > <gcc-patches@gcc.gnu.org> wrote: > > > > On Mon, Nov 07, 2022 at 04:41:23PM +0100, Aldy Hernandez wrote: > > > As suggested upthread, I have also adjusted update_nan_sign() to drop > > > the NAN sign to VARYING if both operands are NAN. As an optimization > > > I keep the sign if both operands are NAN and have the same sign. > > > > For NaNs this still relies on something IEEE754 doesn't guarantee, > > as I cited, after a binary operation the sign bit of the NaN is > > unspecified, whether there is one NaN operand or two. > > It might be that all CPUs handle it the way you've implemented > > (that for one NaN operand the sign of NaN result will be the same > > as that NaN operand and for two it will be the sign of one of the two > > NaNs operands, never something else), but I think we'd need to check > > more than one implementation for that (I've only tried x86_64 and thus > > SSE behavior in it), so one would need to test i387 long double behavior > > too, ARM/AArch64, PowerPC, s390{,x}, RISCV, ... > > The guarantee given by IEEE754 is only for those copy, negate, abs, copySign > > operations, so copying values around, NEG_EXPR, ABS_EXPR, __builtin_fabs*, > > __builtin_copysign*. > > FWIW, RISC-V canonicalizes NaNs by clearing the sign bit; the signs of > the input NaNs do not factor in. Just for binary operations and some unary, or also the ones that IEEE754 spells out (moves, negations, absolute value and copysign)? Jakub
On Tue, Nov 8, 2022 at 10:11 AM Jakub Jelinek <jakub@redhat.com> wrote: > > On Tue, Nov 08, 2022 at 09:44:40AM -0800, Andrew Waterman wrote: > > On Tue, Nov 8, 2022 at 3:20 AM Jakub Jelinek via Gcc-patches > > <gcc-patches@gcc.gnu.org> wrote: > > > > > > On Mon, Nov 07, 2022 at 04:41:23PM +0100, Aldy Hernandez wrote: > > > > As suggested upthread, I have also adjusted update_nan_sign() to drop > > > > the NAN sign to VARYING if both operands are NAN. As an optimization > > > > I keep the sign if both operands are NAN and have the same sign. > > > > > > For NaNs this still relies on something IEEE754 doesn't guarantee, > > > as I cited, after a binary operation the sign bit of the NaN is > > > unspecified, whether there is one NaN operand or two. > > > It might be that all CPUs handle it the way you've implemented > > > (that for one NaN operand the sign of NaN result will be the same > > > as that NaN operand and for two it will be the sign of one of the two > > > NaNs operands, never something else), but I think we'd need to check > > > more than one implementation for that (I've only tried x86_64 and thus > > > SSE behavior in it), so one would need to test i387 long double behavior > > > too, ARM/AArch64, PowerPC, s390{,x}, RISCV, ... > > > The guarantee given by IEEE754 is only for those copy, negate, abs, copySign > > > operations, so copying values around, NEG_EXPR, ABS_EXPR, __builtin_fabs*, > > > __builtin_copysign*. > > > > FWIW, RISC-V canonicalizes NaNs by clearing the sign bit; the signs of > > the input NaNs do not factor in. > > Just for binary operations and some unary, or also the ones that > IEEE754 spells out (moves, negations, absolute value and copysign)? I should've been more specific in my earlier email: I was referring to the arithmetic operators. Copysign and friends do not canonicalize NaNs. > > Jakub >
Sigh, one more thing. There are further possibilities for a NAN result, even if the operands are !NAN and the result from frange_arithmetic is free of NANs. Adding different signed infinities. For example, [-INF,+INF] + [-INF,+INF] has the possibility of adding -INF and +INF, which is a NAN. Since we end up calling frange arithmetic on the lower bounds and then on the upper bounds, we miss this, and mistakenly think we're free of NANs. I have a patch in testing, but FYI, in case anyone notices this before I get around to it tomorrow. Aldy On Tue, Nov 8, 2022 at 3:11 PM Jakub Jelinek <jakub@redhat.com> wrote: > > On Tue, Nov 08, 2022 at 03:06:53PM +0100, Aldy Hernandez wrote: > > +// If either operand is a NAN, set R to the combination of both NANs > > +// signwise and return TRUE. > > This comment doesn't describe what it does now. > If either operand is a NAN, set R to NAN with unspecified sign bit and return > TRUE. > ? > > Other than this LGTM. > > Jakub >
This patch fixes the oversight. Tested on x86-64 Linux. Pushed. On Wed, Nov 9, 2022 at 12:05 AM Aldy Hernandez <aldyh@redhat.com> wrote: > > Sigh, one more thing. > > There are further possibilities for a NAN result, even if the operands > are !NAN and the result from frange_arithmetic is free of NANs. > Adding different signed infinities. > > For example, [-INF,+INF] + [-INF,+INF] has the possibility of adding > -INF and +INF, which is a NAN. Since we end up calling frange > arithmetic on the lower bounds and then on the upper bounds, we miss > this, and mistakenly think we're free of NANs. > > I have a patch in testing, but FYI, in case anyone notices this before > I get around to it tomorrow. > > Aldy > > On Tue, Nov 8, 2022 at 3:11 PM Jakub Jelinek <jakub@redhat.com> wrote: > > > > On Tue, Nov 08, 2022 at 03:06:53PM +0100, Aldy Hernandez wrote: > > > +// If either operand is a NAN, set R to the combination of both NANs > > > +// signwise and return TRUE. > > > > This comment doesn't describe what it does now. > > If either operand is a NAN, set R to NAN with unspecified sign bit and return > > TRUE. > > ? > > > > Other than this LGTM. > > > > Jakub > >
diff --git a/gcc/range-op-float.cc b/gcc/range-op-float.cc index 23e0f5ef4e2..a967c4da393 100644 --- a/gcc/range-op-float.cc +++ b/gcc/range-op-float.cc @@ -200,6 +200,124 @@ frelop_early_resolve (irange &r, tree type, && relop_early_resolve (r, type, op1, op2, rel, my_rel)); } +// If R contains a NAN of unknown sign, update the NAN's signbit +// depending on two operands. + +inline void +update_nan_sign (frange &r, const frange &op1, const frange &op2) +{ + if (!r.maybe_isnan ()) + return; + + bool op1_nan = op1.maybe_isnan (); + bool op2_nan = op2.maybe_isnan (); + bool sign1, sign2; + + gcc_checking_assert (!r.nan_signbit_p (sign1)); + if (op1_nan && op2_nan) + { + if (op1.nan_signbit_p (sign1) && op2.nan_signbit_p (sign2)) + r.update_nan (sign1 | sign2); + } + else if (op1_nan) + { + if (op1.nan_signbit_p (sign1)) + r.update_nan (sign1); + } + else if (op2_nan) + { + if (op2.nan_signbit_p (sign2)) + r.update_nan (sign2); + } +} + +// If either operand is a NAN, set R to the combination of both NANs +// signwise and return TRUE. + +inline bool +propagate_nans (frange &r, const frange &op1, const frange &op2) +{ + if (op1.known_isnan () || op2.known_isnan ()) + { + r.set_nan (op1.type ()); + update_nan_sign (r, op1, op2); + return true; + } + return false; +} + +// Set VALUE to its next real value, or INF if the operation overflows. + +inline void +frange_nextafter (enum machine_mode mode, + REAL_VALUE_TYPE &value, + const REAL_VALUE_TYPE &inf) +{ + const real_format *fmt = REAL_MODE_FORMAT (mode); + REAL_VALUE_TYPE tmp; + bool overflow = real_nextafter (&tmp, fmt, &value, &inf); + if (overflow) + value = inf; + else + value = tmp; +} + +// Like real_arithmetic, but round the result to INF if the operation +// produced inexact results. +// +// ?? There is still one problematic case, i387. With +// -fexcess-precision=standard we perform most SF/DFmode arithmetic in +// XFmode (long_double_type_node), so that case is OK. But without +// -mfpmath=sse, all the SF/DFmode computations are in XFmode +// precision (64-bit mantissa) and only occassionally rounded to +// SF/DFmode (when storing into memory from the 387 stack). Maybe +// this is ok as well though it is just occassionally more precise. ?? + +static void +frange_arithmetic (enum tree_code code, tree type, + REAL_VALUE_TYPE &result, + const REAL_VALUE_TYPE &op1, + const REAL_VALUE_TYPE &op2, + const REAL_VALUE_TYPE &inf) +{ + REAL_VALUE_TYPE value; + enum machine_mode mode = TYPE_MODE (type); + bool mode_composite = MODE_COMPOSITE_P (mode); + + bool inexact = real_arithmetic (&value, code, &op1, &op2); + real_convert (&result, mode, &value); + + // If real_convert above has rounded an inexact value to towards + // inf, we can keep the result as is, otherwise we'll adjust by 1 ulp + // later (real_nextafter). + bool rounding = (flag_rounding_math + && (real_isneg (&inf) + ? real_less (&result, &value) + : !real_less (&value, &result))); + + // Be extra careful if there may be discrepancies between the + // compile and runtime results. + if ((rounding || mode_composite) + && (inexact || !real_identical (&result, &value))) + { + if (mode_composite) + { + bool denormal = (result.sig[SIGSZ-1] & SIG_MSB) == 0; + if (denormal) + { + REAL_VALUE_TYPE tmp; + real_convert (&tmp, DFmode, &value); + frange_nextafter (DFmode, tmp, inf); + real_convert (&result, mode, &tmp); + } + else + frange_nextafter (mode, result, inf); + } + else + frange_nextafter (mode, result, inf); + } +} + // Crop R to [-INF, MAX] where MAX is the maximum representable number // for TYPE. @@ -1620,6 +1738,58 @@ foperator_unordered_equal::op1_range (frange &r, tree type, return true; } +class foperator_plus : public range_operator_float +{ + using range_operator_float::fold_range; + +public: + bool fold_range (frange &r, tree type, + const frange &lh, + const frange &rh, + relation_kind rel = VREL_VARYING) const final override; +} fop_plus; + +bool +foperator_plus::fold_range (frange &r, tree type, + const frange &op1, const frange &op2, + relation_kind) const +{ + if (empty_range_varying (r, type, op1, op2)) + return true; + if (propagate_nans (r, op1, op2)) + return true; + + REAL_VALUE_TYPE lb, ub; + frange_arithmetic (PLUS_EXPR, type, lb, + op1.lower_bound (), op2.lower_bound (), dconstninf); + frange_arithmetic (PLUS_EXPR, type, ub, + op1.upper_bound (), op2.upper_bound (), dconstinf); + + // Handle possible NANs by saturating to the appropriate INF if only + // one end is a NAN. If both ends are a NAN, just return a NAN. + bool lb_nan = real_isnan (&lb); + bool ub_nan = real_isnan (&ub); + if (lb_nan && ub_nan) + { + r.set_nan (type); + return true; + } + if (lb_nan) + lb = dconstninf; + else if (ub_nan) + ub = dconstinf; + + // The setter sets NAN by default for HONOR_NANS. + r.set (type, lb, ub); + + if (lb_nan || ub_nan) + update_nan_sign (r, op1, op2); + else if (!op1.maybe_isnan () && !op2.maybe_isnan ()) + r.clear_nan (); + + return true; +} + // Instantiate a range_op_table for floating point operations. static floating_op_table global_floating_table; @@ -1652,6 +1822,7 @@ floating_op_table::floating_op_table () set (ABS_EXPR, fop_abs); set (NEGATE_EXPR, fop_negate); + set (PLUS_EXPR, fop_plus); } // Return a pointer to the range_operator_float instance, if there is diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp-float-plus.c b/gcc/testsuite/gcc.dg/tree-ssa/vrp-float-plus.c new file mode 100644 index 00000000000..3739ea4e810 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp-float-plus.c @@ -0,0 +1,21 @@ +// { dg-do compile } +// { dg-options "-O2 -fno-tree-fre -fno-tree-dominator-opts -fno-thread-jumps -fdump-tree-vrp2" } + +double BG_SplineLength () +{ + double lastPoint; + double i; + + for (i = 0.01;i<=1;i+=0.1f) + if (!(i != 0.0)) + { + lastPoint = i; + } + else + { + lastPoint = 2; + } + return lastPoint; +} + +// { dg-final { scan-tree-dump-times "return 2\\.0e" 1 "vrp2" } }