diff mbox series

[PR24021] Implement PLUS_EXPR range-op entry for floats.

Message ID 20221013123649.474497-1-aldyh@redhat.com
State New
Headers show
Series [PR24021] Implement PLUS_EXPR range-op entry for floats. | expand

Commit Message

Aldy Hernandez Oct. 13, 2022, 12:36 p.m. UTC
[Jakub, this is a cleaned up version of what we iterated on earlier
this summer.  It contains additional smarts to propagate NAN signs on
entry.  I'd like a nod before committing.]

This is the range-op entry for floating point PLUS_EXPR.  It's the
most intricate range entry we have so far, because we need to keep
track of rounding and target FP formats.  This will be the last FP
entry I commit, mostly to avoid disturbing the tree any further, and
also because what we have so far is enough for a solid VRP.

So far we track NANs and signs correctly.  We also handle relationals
(symbolics and numeric), both ordered and unordered, ABS_EXPR and
NEGATE_EXPR which are used to fold __builtin_isinf, and __builtin_sign
(__builtin_copysign is coming up).  All in all, I think this provide
more than enough for basic VRP on floats, as well as provide a basis
to flesh out the rest if there's interest.

My goal with this entry is to provide a template for additional binary
operators, as they tend to follow a similar pattern: handle NANs, do
the arithmetic while keeping track of rounding, and adjust for NAN.  I
may abstract the general parts as we do for irange's fold_range and
wi_fold.

Oh yeah... and I'd like to finally close this PR ;-).

How does this look?

	PR tree-optimization/24021

gcc/ChangeLog:

	* range-op-float.cc (update_nan_sign): New.
	(propagate_nans): New.
	(frange_nextafter): New.
	(frange_arithmetic): New.
	(class foperator_plus): New.
	(floating_op_table::floating_op_table): Add PLUS_EXPR entry.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/vrp-float-plus.c: New test.
---
 gcc/range-op-float.cc                         | 171 ++++++++++++++++++
 .../gcc.dg/tree-ssa/vrp-float-plus.c          |  21 +++
 2 files changed, 192 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/vrp-float-plus.c

Comments

Toon Moene Oct. 13, 2022, 1:02 p.m. UTC | #1
On 10/13/22 14:36, Aldy Hernandez via Gcc-patches wrote:

> 	PR tree-optimization/24021

Ah - Verboten in Fortran:

$ cat d.f
       DOUBLE PRECISION A, X
       A = 0.0
       DO X = 0.1, 1.0
          A = A + X
       ENDDO
       END
$ gfortran d.f
d.f:3:9:

     3 |       DO X = 0.1, 1.0
       |         1
Warning: Deleted feature: Loop variable at (1) must be integer
d.f:3:12:

     3 |       DO X = 0.1, 1.0
       |            1
Warning: Deleted feature: Start expression in DO loop at (1) must be integer
d.f:3:17:

     3 |       DO X = 0.1, 1.0
       |                 1
Warning: Deleted feature: End expression in DO loop at (1) must be integer
Aldy Hernandez Oct. 13, 2022, 1:44 p.m. UTC | #2
I'm not following.  My patch doesn't affect this behavior.

What am I missing?

Aldy

On Thu, Oct 13, 2022 at 3:04 PM Toon Moene <toon@moene.org> wrote:
>
> On 10/13/22 14:36, Aldy Hernandez via Gcc-patches wrote:
>
> >       PR tree-optimization/24021
>
> Ah - Verboten in Fortran:
>
> $ cat d.f
>        DOUBLE PRECISION A, X
>        A = 0.0
>        DO X = 0.1, 1.0
>           A = A + X
>        ENDDO
>        END
> $ gfortran d.f
> d.f:3:9:
>
>      3 |       DO X = 0.1, 1.0
>        |         1
> Warning: Deleted feature: Loop variable at (1) must be integer
> d.f:3:12:
>
>      3 |       DO X = 0.1, 1.0
>        |            1
> Warning: Deleted feature: Start expression in DO loop at (1) must be integer
> d.f:3:17:
>
>      3 |       DO X = 0.1, 1.0
>        |                 1
> Warning: Deleted feature: End expression in DO loop at (1) must be integer
>
> --
> Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290
> Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
>
Toon Moene Oct. 13, 2022, 1:52 p.m. UTC | #3
It was just a comment on the code of the PR ...

Toon.

On 10/13/22 15:44, Aldy Hernandez wrote:

> I'm not following.  My patch doesn't affect this behavior.
> 
> What am I missing?
> 
> Aldy
> 
> On Thu, Oct 13, 2022 at 3:04 PM Toon Moene <toon@moene.org> wrote:
>>
>> On 10/13/22 14:36, Aldy Hernandez via Gcc-patches wrote:
>>
>>>        PR tree-optimization/24021
>>
>> Ah - Verboten in Fortran:
>>
>> $ cat d.f
>>         DOUBLE PRECISION A, X
>>         A = 0.0
>>         DO X = 0.1, 1.0
>>            A = A + X
>>         ENDDO
>>         END
>> $ gfortran d.f
>> d.f:3:9:
>>
>>       3 |       DO X = 0.1, 1.0
>>         |         1
>> Warning: Deleted feature: Loop variable at (1) must be integer
>> d.f:3:12:
>>
>>       3 |       DO X = 0.1, 1.0
>>         |            1
>> Warning: Deleted feature: Start expression in DO loop at (1) must be integer
>> d.f:3:17:
>>
>>       3 |       DO X = 0.1, 1.0
>>         |                 1
>> Warning: Deleted feature: End expression in DO loop at (1) must be integer
>>
>> --
>> Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290
>> Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
>>
>
Jakub Jelinek Oct. 13, 2022, 5:57 p.m. UTC | #4
On Thu, Oct 13, 2022 at 02:36:49PM +0200, Aldy Hernandez wrote:
> +// Like real_arithmetic, but round the result to INF if the operation
> +// produced inexact results.
> +//
> +// ?? There is still one problematic case, i387.  With
> +// -fexcess-precision=standard we perform most SF/DFmode arithmetic in
> +// XFmode (long_double_type_node), so that case is OK.  But without
> +// -mfpmath=sse, all the SF/DFmode computations are in XFmode
> +// precision (64-bit mantissa) and only occassionally rounded to
> +// SF/DFmode (when storing into memory from the 387 stack).  Maybe
> +// this is ok as well though it is just occassionally more precise. ??
> +
> +static void
> +frange_arithmetic (enum tree_code code, tree type,
> +		   REAL_VALUE_TYPE &result,
> +		   const REAL_VALUE_TYPE &op1,
> +		   const REAL_VALUE_TYPE &op2,
> +		   const REAL_VALUE_TYPE &inf)
> +{
> +  REAL_VALUE_TYPE value;
> +  enum machine_mode mode = TYPE_MODE (type);
> +  bool mode_composite = MODE_COMPOSITE_P (mode);
> +
> +  bool inexact = real_arithmetic (&value, code, &op1, &op2);
> +  real_convert (&result, mode, &value);
> +
> +  // If real_convert above has rounded an inexact value to towards
> +  // inf, we can keep the result as is, otherwise we'll adjust by 1 ulp
> +  // later (real_nextafter).
> +  bool rounding = (flag_rounding_math
> +		   && (real_isneg (&inf)
> +		       ? real_less (&result, &value)
> +		       : !real_less (&value, &result)));

I thought the agreement during Cauldron was that we'd do this always,
regardless of flag_rounding_math.
Because excess precision (the fast one like on ia32 or -mfpmath=387 on
x86_64), or -frounding-math, or FMA contraction can all increase precision
and worst case it all behaves like -frounding-math for the ranges.

So, perhaps use:
  if ((mode_composite || (real_isneg (&inf) ? real_less (&result, &value)
					    : !real_less (&value, &result))
      && (inexact || !real_identical (&result, &value))))
?
No need to do the real_isneg/real_less stuff for mode_composite, then
we do it always for inexacts, but otherwise we check if the rounding
performed by real.cc has been in the conservative direction (for upper
bound to +inf, for lower bound to -inf), if yes, we don't need to do
anything, if yes, we frange_nextafter.

As discussed, for mode_composite, I think we want to do the extra
stuff for inexact denormals and otherwise do the nextafter unconditionally,
because our internal mode_composite representation isn't precise enough.

> +  // Be extra careful if there may be discrepancies between the
> +  // compile and runtime results.
> +  if ((rounding || mode_composite)
> +      && (inexact || !real_identical (&result, &value)))
> +    {
> +      if (mode_composite)
> +	{
> +	  bool denormal = (result.sig[SIGSZ-1] & SIG_MSB) == 0;

Use real_isdenormal here?
Though, real_iszero needs the same thing.

> +	  if (denormal)
> +	    {
> +	      REAL_VALUE_TYPE tmp;

And explain here why is this, that IBM extended denormals have just
DFmode precision.
Though, now that I think about it, while this is correct for denormals,

> +	      real_convert (&tmp, DFmode, &value);
> +	      frange_nextafter (DFmode, tmp, inf);
> +	      real_convert (&result, mode, &tmp);
> +	    }

there are also the cases where the higher double exponent is in the
[__DBL_MIN_EXP__, __LDBL_MIN_EXP__] aka [-1021, -968] or so.
https://en.wikipedia.org/wiki/Double-precision_floating-point_format
If the upper double is denormal in the DFmode sense, so smaller absolute
value than __DBL_MIN__, then doing nextafter in DFmode is the right thing to
do, the lower double must be always +/- zero.
Now, if the result is __DBL_MIN__, the upper double is already normalized
but we can add __DBL_DENORM_MIN__ to it, which will make the number have
54-bit precision.
If the result is __DBL_MIN__ * 2, we can again add __DBL_DENORM_MIN__
and make it 55-bit precision.  Etc. until we reach __DBL_MIN__ * 2e53
where it acts like fully normalized 106-bit precision number.
I must say I'm not really sure what real_nextafter is doing in those cases,
I'm afraid it doesn't handle it correctly but the only other use
of real_nextafter is guarded with:
  /* Don't handle composite modes, nor decimal, nor modes without
     inf or denorm at least for now.  */
  if (format->pnan < format->p
      || format->b == 10
      || !format->has_inf
      || !format->has_denorm)
    return false;
so it isn't that big deal except for ranges.

	Jakub
Aldy Hernandez Oct. 14, 2022, 8:04 a.m. UTC | #5
On Thu, Oct 13, 2022 at 3:52 PM Toon Moene <toon@moene.org> wrote:
>
> It was just a comment on the code of the PR ...

Ahh... the patch adds support for ranger handling of floating point
PLUS_EXPRs in the IL.  The PR's testcase is just one of the many
things it will be able to solve.

Aldy

>
> Toon.
>
> On 10/13/22 15:44, Aldy Hernandez wrote:
>
> > I'm not following.  My patch doesn't affect this behavior.
> >
> > What am I missing?
> >
> > Aldy
> >
> > On Thu, Oct 13, 2022 at 3:04 PM Toon Moene <toon@moene.org> wrote:
> >>
> >> On 10/13/22 14:36, Aldy Hernandez via Gcc-patches wrote:
> >>
> >>>        PR tree-optimization/24021
> >>
> >> Ah - Verboten in Fortran:
> >>
> >> $ cat d.f
> >>         DOUBLE PRECISION A, X
> >>         A = 0.0
> >>         DO X = 0.1, 1.0
> >>            A = A + X
> >>         ENDDO
> >>         END
> >> $ gfortran d.f
> >> d.f:3:9:
> >>
> >>       3 |       DO X = 0.1, 1.0
> >>         |         1
> >> Warning: Deleted feature: Loop variable at (1) must be integer
> >> d.f:3:12:
> >>
> >>       3 |       DO X = 0.1, 1.0
> >>         |            1
> >> Warning: Deleted feature: Start expression in DO loop at (1) must be integer
> >> d.f:3:17:
> >>
> >>       3 |       DO X = 0.1, 1.0
> >>         |                 1
> >> Warning: Deleted feature: End expression in DO loop at (1) must be integer
> >>
> >> --
> >> Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290
> >> Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
> >>
> >
>
> --
> Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290
> Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
>
Aldy Hernandez Oct. 17, 2022, 6:21 a.m. UTC | #6
On Thu, Oct 13, 2022 at 7:57 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Thu, Oct 13, 2022 at 02:36:49PM +0200, Aldy Hernandez wrote:
> > +// Like real_arithmetic, but round the result to INF if the operation
> > +// produced inexact results.
> > +//
> > +// ?? There is still one problematic case, i387.  With
> > +// -fexcess-precision=standard we perform most SF/DFmode arithmetic in
> > +// XFmode (long_double_type_node), so that case is OK.  But without
> > +// -mfpmath=sse, all the SF/DFmode computations are in XFmode
> > +// precision (64-bit mantissa) and only occassionally rounded to
> > +// SF/DFmode (when storing into memory from the 387 stack).  Maybe
> > +// this is ok as well though it is just occassionally more precise. ??
> > +
> > +static void
> > +frange_arithmetic (enum tree_code code, tree type,
> > +                REAL_VALUE_TYPE &result,
> > +                const REAL_VALUE_TYPE &op1,
> > +                const REAL_VALUE_TYPE &op2,
> > +                const REAL_VALUE_TYPE &inf)
> > +{
> > +  REAL_VALUE_TYPE value;
> > +  enum machine_mode mode = TYPE_MODE (type);
> > +  bool mode_composite = MODE_COMPOSITE_P (mode);
> > +
> > +  bool inexact = real_arithmetic (&value, code, &op1, &op2);
> > +  real_convert (&result, mode, &value);
> > +
> > +  // If real_convert above has rounded an inexact value to towards
> > +  // inf, we can keep the result as is, otherwise we'll adjust by 1 ulp
> > +  // later (real_nextafter).
> > +  bool rounding = (flag_rounding_math
> > +                && (real_isneg (&inf)
> > +                    ? real_less (&result, &value)
> > +                    : !real_less (&value, &result)));
>
> I thought the agreement during Cauldron was that we'd do this always,
> regardless of flag_rounding_math.
> Because excess precision (the fast one like on ia32 or -mfpmath=387 on
> x86_64), or -frounding-math, or FMA contraction can all increase precision
> and worst case it all behaves like -frounding-math for the ranges.
>
> So, perhaps use:
>   if ((mode_composite || (real_isneg (&inf) ? real_less (&result, &value)
>                                             : !real_less (&value, &result))
>       && (inexact || !real_identical (&result, &value))))

Done.

> ?
> No need to do the real_isneg/real_less stuff for mode_composite, then
> we do it always for inexacts, but otherwise we check if the rounding
> performed by real.cc has been in the conservative direction (for upper
> bound to +inf, for lower bound to -inf), if yes, we don't need to do
> anything, if yes, we frange_nextafter.
>
> As discussed, for mode_composite, I think we want to do the extra
> stuff for inexact denormals and otherwise do the nextafter unconditionally,
> because our internal mode_composite representation isn't precise enough.
>
> > +  // Be extra careful if there may be discrepancies between the
> > +  // compile and runtime results.
> > +  if ((rounding || mode_composite)
> > +      && (inexact || !real_identical (&result, &value)))
> > +    {
> > +      if (mode_composite)
> > +     {
> > +       bool denormal = (result.sig[SIGSZ-1] & SIG_MSB) == 0;
>
> Use real_isdenormal here?

Done.

> Though, real_iszero needs the same thing.

So... real_isdenormal() || real_iszero() as in the attached patch?

>
> > +       if (denormal)
> > +         {
> > +           REAL_VALUE_TYPE tmp;
>
> And explain here why is this, that IBM extended denormals have just
> DFmode precision.

Done.

> Though, now that I think about it, while this is correct for denormals,
>
> > +           real_convert (&tmp, DFmode, &value);
> > +           frange_nextafter (DFmode, tmp, inf);
> > +           real_convert (&result, mode, &tmp);
> > +         }
>
> there are also the cases where the higher double exponent is in the
> [__DBL_MIN_EXP__, __LDBL_MIN_EXP__] aka [-1021, -968] or so.
> https://en.wikipedia.org/wiki/Double-precision_floating-point_format
> If the upper double is denormal in the DFmode sense, so smaller absolute
> value than __DBL_MIN__, then doing nextafter in DFmode is the right thing to
> do, the lower double must be always +/- zero.
> Now, if the result is __DBL_MIN__, the upper double is already normalized
> but we can add __DBL_DENORM_MIN__ to it, which will make the number have
> 54-bit precision.
> If the result is __DBL_MIN__ * 2, we can again add __DBL_DENORM_MIN__
> and make it 55-bit precision.  Etc. until we reach __DBL_MIN__ * 2e53
> where it acts like fully normalized 106-bit precision number.
> I must say I'm not really sure what real_nextafter is doing in those cases,
> I'm afraid it doesn't handle it correctly but the only other use
> of real_nextafter is guarded with:
>   /* Don't handle composite modes, nor decimal, nor modes without
>      inf or denorm at least for now.  */
>   if (format->pnan < format->p
>       || format->b == 10
>       || !format->has_inf
>       || !format->has_denorm)
>     return false;

Dunno.  Is there a conservative thing we can do for mode_composites
that aren't denormal or zero?

How does this look?
Aldy
Aldy Hernandez Oct. 24, 2022, 6:04 a.m. UTC | #7
PING

On Mon, Oct 17, 2022 at 8:21 AM Aldy Hernandez <aldyh@redhat.com> wrote:
>
> On Thu, Oct 13, 2022 at 7:57 PM Jakub Jelinek <jakub@redhat.com> wrote:
> >
> > On Thu, Oct 13, 2022 at 02:36:49PM +0200, Aldy Hernandez wrote:
> > > +// Like real_arithmetic, but round the result to INF if the operation
> > > +// produced inexact results.
> > > +//
> > > +// ?? There is still one problematic case, i387.  With
> > > +// -fexcess-precision=standard we perform most SF/DFmode arithmetic in
> > > +// XFmode (long_double_type_node), so that case is OK.  But without
> > > +// -mfpmath=sse, all the SF/DFmode computations are in XFmode
> > > +// precision (64-bit mantissa) and only occassionally rounded to
> > > +// SF/DFmode (when storing into memory from the 387 stack).  Maybe
> > > +// this is ok as well though it is just occassionally more precise. ??
> > > +
> > > +static void
> > > +frange_arithmetic (enum tree_code code, tree type,
> > > +                REAL_VALUE_TYPE &result,
> > > +                const REAL_VALUE_TYPE &op1,
> > > +                const REAL_VALUE_TYPE &op2,
> > > +                const REAL_VALUE_TYPE &inf)
> > > +{
> > > +  REAL_VALUE_TYPE value;
> > > +  enum machine_mode mode = TYPE_MODE (type);
> > > +  bool mode_composite = MODE_COMPOSITE_P (mode);
> > > +
> > > +  bool inexact = real_arithmetic (&value, code, &op1, &op2);
> > > +  real_convert (&result, mode, &value);
> > > +
> > > +  // If real_convert above has rounded an inexact value to towards
> > > +  // inf, we can keep the result as is, otherwise we'll adjust by 1 ulp
> > > +  // later (real_nextafter).
> > > +  bool rounding = (flag_rounding_math
> > > +                && (real_isneg (&inf)
> > > +                    ? real_less (&result, &value)
> > > +                    : !real_less (&value, &result)));
> >
> > I thought the agreement during Cauldron was that we'd do this always,
> > regardless of flag_rounding_math.
> > Because excess precision (the fast one like on ia32 or -mfpmath=387 on
> > x86_64), or -frounding-math, or FMA contraction can all increase precision
> > and worst case it all behaves like -frounding-math for the ranges.
> >
> > So, perhaps use:
> >   if ((mode_composite || (real_isneg (&inf) ? real_less (&result, &value)
> >                                             : !real_less (&value, &result))
> >       && (inexact || !real_identical (&result, &value))))
>
> Done.
>
> > ?
> > No need to do the real_isneg/real_less stuff for mode_composite, then
> > we do it always for inexacts, but otherwise we check if the rounding
> > performed by real.cc has been in the conservative direction (for upper
> > bound to +inf, for lower bound to -inf), if yes, we don't need to do
> > anything, if yes, we frange_nextafter.
> >
> > As discussed, for mode_composite, I think we want to do the extra
> > stuff for inexact denormals and otherwise do the nextafter unconditionally,
> > because our internal mode_composite representation isn't precise enough.
> >
> > > +  // Be extra careful if there may be discrepancies between the
> > > +  // compile and runtime results.
> > > +  if ((rounding || mode_composite)
> > > +      && (inexact || !real_identical (&result, &value)))
> > > +    {
> > > +      if (mode_composite)
> > > +     {
> > > +       bool denormal = (result.sig[SIGSZ-1] & SIG_MSB) == 0;
> >
> > Use real_isdenormal here?
>
> Done.
>
> > Though, real_iszero needs the same thing.
>
> So... real_isdenormal() || real_iszero() as in the attached patch?
>
> >
> > > +       if (denormal)
> > > +         {
> > > +           REAL_VALUE_TYPE tmp;
> >
> > And explain here why is this, that IBM extended denormals have just
> > DFmode precision.
>
> Done.
>
> > Though, now that I think about it, while this is correct for denormals,
> >
> > > +           real_convert (&tmp, DFmode, &value);
> > > +           frange_nextafter (DFmode, tmp, inf);
> > > +           real_convert (&result, mode, &tmp);
> > > +         }
> >
> > there are also the cases where the higher double exponent is in the
> > [__DBL_MIN_EXP__, __LDBL_MIN_EXP__] aka [-1021, -968] or so.
> > https://en.wikipedia.org/wiki/Double-precision_floating-point_format
> > If the upper double is denormal in the DFmode sense, so smaller absolute
> > value than __DBL_MIN__, then doing nextafter in DFmode is the right thing to
> > do, the lower double must be always +/- zero.
> > Now, if the result is __DBL_MIN__, the upper double is already normalized
> > but we can add __DBL_DENORM_MIN__ to it, which will make the number have
> > 54-bit precision.
> > If the result is __DBL_MIN__ * 2, we can again add __DBL_DENORM_MIN__
> > and make it 55-bit precision.  Etc. until we reach __DBL_MIN__ * 2e53
> > where it acts like fully normalized 106-bit precision number.
> > I must say I'm not really sure what real_nextafter is doing in those cases,
> > I'm afraid it doesn't handle it correctly but the only other use
> > of real_nextafter is guarded with:
> >   /* Don't handle composite modes, nor decimal, nor modes without
> >      inf or denorm at least for now.  */
> >   if (format->pnan < format->p
> >       || format->b == 10
> >       || !format->has_inf
> >       || !format->has_denorm)
> >     return false;
>
> Dunno.  Is there a conservative thing we can do for mode_composites
> that aren't denormal or zero?
>
> How does this look?
> Aldy
Jeff Law Oct. 29, 2022, 4:55 a.m. UTC | #8
On 10/24/22 00:04, Aldy Hernandez via Gcc-patches wrote:
> PING

I'd be a lot more comfortable if Jakub would chime in here.


Jeff
Aldy Hernandez Oct. 31, 2022, 8:42 a.m. UTC | #9
Ping ping

On Mon, Oct 24, 2022, 08:04 Aldy Hernandez <aldyh@redhat.com> wrote:

> PING
>
> On Mon, Oct 17, 2022 at 8:21 AM Aldy Hernandez <aldyh@redhat.com> wrote:
> >
> > On Thu, Oct 13, 2022 at 7:57 PM Jakub Jelinek <jakub@redhat.com> wrote:
> > >
> > > On Thu, Oct 13, 2022 at 02:36:49PM +0200, Aldy Hernandez wrote:
> > > > +// Like real_arithmetic, but round the result to INF if the
> operation
> > > > +// produced inexact results.
> > > > +//
> > > > +// ?? There is still one problematic case, i387.  With
> > > > +// -fexcess-precision=standard we perform most SF/DFmode arithmetic
> in
> > > > +// XFmode (long_double_type_node), so that case is OK.  But without
> > > > +// -mfpmath=sse, all the SF/DFmode computations are in XFmode
> > > > +// precision (64-bit mantissa) and only occassionally rounded to
> > > > +// SF/DFmode (when storing into memory from the 387 stack).  Maybe
> > > > +// this is ok as well though it is just occassionally more precise.
> ??
> > > > +
> > > > +static void
> > > > +frange_arithmetic (enum tree_code code, tree type,
> > > > +                REAL_VALUE_TYPE &result,
> > > > +                const REAL_VALUE_TYPE &op1,
> > > > +                const REAL_VALUE_TYPE &op2,
> > > > +                const REAL_VALUE_TYPE &inf)
> > > > +{
> > > > +  REAL_VALUE_TYPE value;
> > > > +  enum machine_mode mode = TYPE_MODE (type);
> > > > +  bool mode_composite = MODE_COMPOSITE_P (mode);
> > > > +
> > > > +  bool inexact = real_arithmetic (&value, code, &op1, &op2);
> > > > +  real_convert (&result, mode, &value);
> > > > +
> > > > +  // If real_convert above has rounded an inexact value to towards
> > > > +  // inf, we can keep the result as is, otherwise we'll adjust by 1
> ulp
> > > > +  // later (real_nextafter).
> > > > +  bool rounding = (flag_rounding_math
> > > > +                && (real_isneg (&inf)
> > > > +                    ? real_less (&result, &value)
> > > > +                    : !real_less (&value, &result)));
> > >
> > > I thought the agreement during Cauldron was that we'd do this always,
> > > regardless of flag_rounding_math.
> > > Because excess precision (the fast one like on ia32 or -mfpmath=387 on
> > > x86_64), or -frounding-math, or FMA contraction can all increase
> precision
> > > and worst case it all behaves like -frounding-math for the ranges.
> > >
> > > So, perhaps use:
> > >   if ((mode_composite || (real_isneg (&inf) ? real_less (&result,
> &value)
> > >                                             : !real_less (&value,
> &result))
> > >       && (inexact || !real_identical (&result, &value))))
> >
> > Done.
> >
> > > ?
> > > No need to do the real_isneg/real_less stuff for mode_composite, then
> > > we do it always for inexacts, but otherwise we check if the rounding
> > > performed by real.cc has been in the conservative direction (for upper
> > > bound to +inf, for lower bound to -inf), if yes, we don't need to do
> > > anything, if yes, we frange_nextafter.
> > >
> > > As discussed, for mode_composite, I think we want to do the extra
> > > stuff for inexact denormals and otherwise do the nextafter
> unconditionally,
> > > because our internal mode_composite representation isn't precise
> enough.
> > >
> > > > +  // Be extra careful if there may be discrepancies between the
> > > > +  // compile and runtime results.
> > > > +  if ((rounding || mode_composite)
> > > > +      && (inexact || !real_identical (&result, &value)))
> > > > +    {
> > > > +      if (mode_composite)
> > > > +     {
> > > > +       bool denormal = (result.sig[SIGSZ-1] & SIG_MSB) == 0;
> > >
> > > Use real_isdenormal here?
> >
> > Done.
> >
> > > Though, real_iszero needs the same thing.
> >
> > So... real_isdenormal() || real_iszero() as in the attached patch?
> >
> > >
> > > > +       if (denormal)
> > > > +         {
> > > > +           REAL_VALUE_TYPE tmp;
> > >
> > > And explain here why is this, that IBM extended denormals have just
> > > DFmode precision.
> >
> > Done.
> >
> > > Though, now that I think about it, while this is correct for denormals,
> > >
> > > > +           real_convert (&tmp, DFmode, &value);
> > > > +           frange_nextafter (DFmode, tmp, inf);
> > > > +           real_convert (&result, mode, &tmp);
> > > > +         }
> > >
> > > there are also the cases where the higher double exponent is in the
> > > [__DBL_MIN_EXP__, __LDBL_MIN_EXP__] aka [-1021, -968] or so.
> > > https://en.wikipedia.org/wiki/Double-precision_floating-point_format
> > > If the upper double is denormal in the DFmode sense, so smaller
> absolute
> > > value than __DBL_MIN__, then doing nextafter in DFmode is the right
> thing to
> > > do, the lower double must be always +/- zero.
> > > Now, if the result is __DBL_MIN__, the upper double is already
> normalized
> > > but we can add __DBL_DENORM_MIN__ to it, which will make the number
> have
> > > 54-bit precision.
> > > If the result is __DBL_MIN__ * 2, we can again add __DBL_DENORM_MIN__
> > > and make it 55-bit precision.  Etc. until we reach __DBL_MIN__ * 2e53
> > > where it acts like fully normalized 106-bit precision number.
> > > I must say I'm not really sure what real_nextafter is doing in those
> cases,
> > > I'm afraid it doesn't handle it correctly but the only other use
> > > of real_nextafter is guarded with:
> > >   /* Don't handle composite modes, nor decimal, nor modes without
> > >      inf or denorm at least for now.  */
> > >   if (format->pnan < format->p
> > >       || format->b == 10
> > >       || !format->has_inf
> > >       || !format->has_denorm)
> > >     return false;
> >
> > Dunno.  Is there a conservative thing we can do for mode_composites
> > that aren't denormal or zero?
> >
> > How does this look?
> > Aldy
>
Jakub Jelinek Nov. 4, 2022, 1:16 p.m. UTC | #10
On Mon, Oct 17, 2022 at 08:21:01AM +0200, Aldy Hernandez wrote:
> --- a/gcc/range-op-float.cc
> +++ b/gcc/range-op-float.cc
> @@ -200,6 +200,116 @@ frelop_early_resolve (irange &r, tree type,
>  	  && relop_early_resolve (r, type, op1, op2, rel, my_rel));
>  }
>  
> +// If R contains a NAN of unknown sign, update the NAN's signbit
> +// depending on two operands.
> +
> +inline void
> +update_nan_sign (frange &r, const frange &op1, const frange &op2)
> +{
> +  if (!r.maybe_isnan ())
> +    return;
> +
> +  bool op1_nan = op1.maybe_isnan ();
> +  bool op2_nan = op2.maybe_isnan ();
> +  bool sign1, sign2;
> +
> +  gcc_checking_assert (!r.nan_signbit_p (sign1));
> +  if (op1_nan && op2_nan)
> +    {
> +      if (op1.nan_signbit_p (sign1) && op2.nan_signbit_p (sign2))
> +	r.update_nan (sign1 | sign2);
> +    }
> +  else if (op1_nan)
> +    {
> +      if (op1.nan_signbit_p (sign1))
> +	r.update_nan (sign1);
> +    }
> +  else if (op2_nan)
> +    {
> +      if (op2.nan_signbit_p (sign2))
> +	r.update_nan (sign2);
> +    }
> +}

IEEE 754-2008 says:
"When either an input or result is NaN, this standard does not interpret the sign of a NaN. Note, however,
that operations on bit strings — copy, negate, abs, copySign — specify the sign bit of a NaN result,
sometimes based upon the sign bit of a NaN operand. The logical predicate totalOrder is also affected by
the sign bit of a NaN operand. For all other operations, this standard does not specify the sign bit of a NaN
result, even when there is only one input NaN, or when the NaN is produced from an invalid
operation."
so, while one can e.g. see on x86_64 that in simple -O0
int
main ()
{
  volatile float f1 = __builtin_nansf ("");
  volatile float f2 = __builtin_copysignf (__builtin_nansf (""), -1.0f);
  volatile float f3 = __builtin_nanf ("");
  volatile float f4 = __builtin_copysignf (__builtin_nanf (""), -1.0f);
  volatile float fzero = 0.0f;
  __builtin_printf ("%08x %08x\n", *(unsigned *)&f1, *(unsigned *)&f2);
  __builtin_printf ("%08x %08x\n", *(unsigned *)&f3, *(unsigned *)&f4);
  f1 = -f1;
  f2 = -f2;
  f3 = -f3;
  f4 = -f4;
  __builtin_printf ("%08x %08x\n", *(unsigned *)&f1, *(unsigned *)&f2);
  __builtin_printf ("%08x %08x\n", *(unsigned *)&f3, *(unsigned *)&f4);
  volatile float f5 = f1 + fzero;
  volatile float f6 = fzero + f1;
  __builtin_printf ("%08x %08x\n", *(unsigned *)&f5, *(unsigned *)&f6);
  f5 = f2 + fzero;
  f6 = fzero + f2;
  __builtin_printf ("%08x %08x\n", *(unsigned *)&f5, *(unsigned *)&f6);
  f5 = f3 + fzero;
  f6 = fzero + f3;
  __builtin_printf ("%08x %08x\n", *(unsigned *)&f5, *(unsigned *)&f6);
  f5 = f4 + fzero;
  f6 = fzero + f4;
  __builtin_printf ("%08x %08x\n", *(unsigned *)&f5, *(unsigned *)&f6);
  f5 = f1 + f2;
  f6 = f2 + f1;
  __builtin_printf ("%08x %08x\n", *(unsigned *)&f5, *(unsigned *)&f6);
  f5 = f2 + f3;
  f6 = f3 + f2;
  __builtin_printf ("%08x %08x\n", *(unsigned *)&f5, *(unsigned *)&f6);
  f5 = f3 + f4;
  f6 = f4 + f3;
  __builtin_printf ("%08x %08x\n", *(unsigned *)&f5, *(unsigned *)&f6);
  f5 = f4 + f1;
  f6 = f1 + f4;
  __builtin_printf ("%08x %08x\n", *(unsigned *)&f5, *(unsigned *)&f6);
  return 0;
}
result of:
7fa00000 ffa00000
7fc00000 ffc00000
ffa00000 7fa00000
ffc00000 7fc00000
ffe00000 ffe00000
7fe00000 7fe00000
ffc00000 ffc00000
7fc00000 7fc00000
7fe00000 ffe00000
ffc00000 7fe00000
7fc00000 ffc00000
ffe00000 7fc00000
which basically shows that copysign copies sign bit, negation toggles it,
binary operation with a single NaN (quiet or signaling) get that NaN
with its sign and for binary operation on 2 NaNs (again quiet or signaling)
one gets sign from the second? operand, I think the above IEEE text
doesn't guarantee it except for a simple assignment (but with no mode
conversion; that one copies the bit), NEGATE_EXPR (toggles it),
ABS_EXPR (clears it), __builtin_copysign*/IFN_COPYSIGN (copies it from
the second operand).  Everything else, including invalid operation cases,
should set the possible sign bit values of NANs to 0/1 rather than just
one of them.  Perhaps COND_EXPR 2nd/3rd operand is a move too.

As you are adding a binary operation, that should be one of those cases
where we drop the NAN sign to VARYING.

> +
> +// If either operand is a NAN, set R to the combination of both NANs
> +// signwise and return TRUE.
> +
> +inline bool
> +propagate_nans (frange &r, const frange &op1, const frange &op2)
> +{
> +  if (op1.known_isnan () || op2.known_isnan ())
> +    {
> +      r.set_nan (op1.type ());
> +      update_nan_sign (r, op1, op2);
> +      return true;
> +    }
> +  return false;
> +}
> +
> +// Set VALUE to its next real value, or INF if the operation overflows.
> +
> +inline void
> +frange_nextafter (enum machine_mode mode,
> +		  REAL_VALUE_TYPE &value,
> +		  const REAL_VALUE_TYPE &inf)
> +{
> +  const real_format *fmt = REAL_MODE_FORMAT (mode);
> +  REAL_VALUE_TYPE tmp;
> +  bool overflow = real_nextafter (&tmp, fmt, &value, &inf);
> +  if (overflow)
> +    value = inf;
> +  else
> +    value = tmp;
> +}
> +
> +// Like real_arithmetic, but round the result to INF if the operation
> +// produced inexact results.
> +//
> +// ?? There is still one problematic case, i387.  With
> +// -fexcess-precision=standard we perform most SF/DFmode arithmetic in
> +// XFmode (long_double_type_node), so that case is OK.  But without
> +// -mfpmath=sse, all the SF/DFmode computations are in XFmode
> +// precision (64-bit mantissa) and only occassionally rounded to
> +// SF/DFmode (when storing into memory from the 387 stack).  Maybe
> +// this is ok as well though it is just occassionally more precise. ??
> +
> +static void
> +frange_arithmetic (enum tree_code code, tree type,
> +		   REAL_VALUE_TYPE &result,
> +		   const REAL_VALUE_TYPE &op1,
> +		   const REAL_VALUE_TYPE &op2,
> +		   const REAL_VALUE_TYPE &inf)
> +{
> +  REAL_VALUE_TYPE value;
> +  enum machine_mode mode = TYPE_MODE (type);
> +  bool mode_composite = MODE_COMPOSITE_P (mode);
> +
> +  bool inexact = real_arithmetic (&value, code, &op1, &op2);
> +  real_convert (&result, mode, &value);
> +
> +  // Be extra careful if there may be discrepancies between the
> +  // compile and runtime results.
> +  if ((mode_composite || (real_isneg (&inf) ? real_less (&result, &value)
> +			  : !real_less (&value, &result)))
> +      && (inexact || !real_identical (&result, &value)))
> +    {
> +      if (mode_composite)
> +	{
> +	  if (real_isdenormal (&result)
> +	      || real_iszero (&result))
> +	    {
> +	      // IBM extended denormals only have DFmode precision.
> +	      REAL_VALUE_TYPE tmp;
> +	      real_convert (&tmp, DFmode, &value);
> +	      frange_nextafter (DFmode, tmp, inf);
> +	      real_convert (&result, mode, &tmp);
> +	      return;
> +	    }

As discussed before, this might not be correct for the larger double double
denormals (is correct for real_iszero).  I'll try to play with it in
self-tests and compare with what one gets from libm nextafterl:
int
main ()
{
  long double a = __builtin_nextafterl (0.0L, 1.0L);
  __builtin_printf ("%La\n", a);
  for (int i = 0; i < 108; i++, a *= 2.0L)
    __builtin_printf ("%d %La %La\n", i, a, __builtin_nextafterl (a, 1.0L));
}
0x0.0000000000001p-1022
0 0x0.0000000000001p-1022 0x0.0000000000002p-1022
1 0x0.0000000000002p-1022 0x0.0000000000003p-1022
2 0x0.0000000000004p-1022 0x0.0000000000005p-1022
3 0x0.0000000000008p-1022 0x0.0000000000009p-1022
4 0x0.000000000001p-1022 0x0.0000000000011p-1022
5 0x0.000000000002p-1022 0x0.0000000000021p-1022
6 0x0.000000000004p-1022 0x0.0000000000041p-1022
7 0x0.000000000008p-1022 0x0.0000000000081p-1022
8 0x0.00000000001p-1022 0x0.0000000000101p-1022
9 0x0.00000000002p-1022 0x0.0000000000201p-1022
10 0x0.00000000004p-1022 0x0.0000000000401p-1022
11 0x0.00000000008p-1022 0x0.0000000000801p-1022
12 0x0.0000000001p-1022 0x0.0000000001001p-1022
13 0x0.0000000002p-1022 0x0.0000000002001p-1022
14 0x0.0000000004p-1022 0x0.0000000004001p-1022
15 0x0.0000000008p-1022 0x0.0000000008001p-1022
16 0x0.000000001p-1022 0x0.0000000010001p-1022
17 0x0.000000002p-1022 0x0.0000000020001p-1022
18 0x0.000000004p-1022 0x0.0000000040001p-1022
19 0x0.000000008p-1022 0x0.0000000080001p-1022
20 0x0.00000001p-1022 0x0.0000000100001p-1022
21 0x0.00000002p-1022 0x0.0000000200001p-1022
22 0x0.00000004p-1022 0x0.0000000400001p-1022
23 0x0.00000008p-1022 0x0.0000000800001p-1022
24 0x0.0000001p-1022 0x0.0000001000001p-1022
25 0x0.0000002p-1022 0x0.0000002000001p-1022
26 0x0.0000004p-1022 0x0.0000004000001p-1022
27 0x0.0000008p-1022 0x0.0000008000001p-1022
28 0x0.000001p-1022 0x0.0000010000001p-1022
29 0x0.000002p-1022 0x0.0000020000001p-1022
30 0x0.000004p-1022 0x0.0000040000001p-1022
31 0x0.000008p-1022 0x0.0000080000001p-1022
32 0x0.00001p-1022 0x0.0000100000001p-1022
33 0x0.00002p-1022 0x0.0000200000001p-1022
34 0x0.00004p-1022 0x0.0000400000001p-1022
35 0x0.00008p-1022 0x0.0000800000001p-1022
36 0x0.0001p-1022 0x0.0001000000001p-1022
37 0x0.0002p-1022 0x0.0002000000001p-1022
38 0x0.0004p-1022 0x0.0004000000001p-1022
39 0x0.0008p-1022 0x0.0008000000001p-1022
40 0x0.001p-1022 0x0.0010000000001p-1022
41 0x0.002p-1022 0x0.0020000000001p-1022
42 0x0.004p-1022 0x0.0040000000001p-1022
43 0x0.008p-1022 0x0.0080000000001p-1022
44 0x0.01p-1022 0x0.0100000000001p-1022
45 0x0.02p-1022 0x0.0200000000001p-1022
46 0x0.04p-1022 0x0.0400000000001p-1022
47 0x0.08p-1022 0x0.0800000000001p-1022
48 0x0.1p-1022 0x0.1000000000001p-1022
49 0x0.2p-1022 0x0.2000000000001p-1022
50 0x0.4p-1022 0x0.4000000000001p-1022
51 0x0.8p-1022 0x0.8000000000001p-1022
52 0x1p-1022 0x1.0000000000001p-1022
53 0x1p-1021 0x1.00000000000008p-1021
54 0x1p-1020 0x1.00000000000004p-1020
55 0x1p-1019 0x1.00000000000002p-1019
56 0x1p-1018 0x1.00000000000001p-1018
57 0x1p-1017 0x1.000000000000008p-1017
58 0x1p-1016 0x1.000000000000004p-1016
59 0x1p-1015 0x1.000000000000002p-1015
60 0x1p-1014 0x1.000000000000001p-1014
61 0x1p-1013 0x1.0000000000000008p-1013
62 0x1p-1012 0x1.0000000000000004p-1012
63 0x1p-1011 0x1.0000000000000002p-1011
64 0x1p-1010 0x1.0000000000000001p-1010
65 0x1p-1009 0x1.00000000000000008p-1009
66 0x1p-1008 0x1.00000000000000004p-1008
67 0x1p-1007 0x1.00000000000000002p-1007
68 0x1p-1006 0x1.00000000000000001p-1006
69 0x1p-1005 0x1.000000000000000008p-1005
70 0x1p-1004 0x1.000000000000000004p-1004
71 0x1p-1003 0x1.000000000000000002p-1003
72 0x1p-1002 0x1.000000000000000001p-1002
73 0x1p-1001 0x1.0000000000000000008p-1001
74 0x1p-1000 0x1.0000000000000000004p-1000
75 0x1p-999 0x1.0000000000000000002p-999
76 0x1p-998 0x1.0000000000000000001p-998
77 0x1p-997 0x1.00000000000000000008p-997
78 0x1p-996 0x1.00000000000000000004p-996
79 0x1p-995 0x1.00000000000000000002p-995
80 0x1p-994 0x1.00000000000000000001p-994
81 0x1p-993 0x1.000000000000000000008p-993
82 0x1p-992 0x1.000000000000000000004p-992
83 0x1p-991 0x1.000000000000000000002p-991
84 0x1p-990 0x1.000000000000000000001p-990
85 0x1p-989 0x1.0000000000000000000008p-989
86 0x1p-988 0x1.0000000000000000000004p-988
87 0x1p-987 0x1.0000000000000000000002p-987
88 0x1p-986 0x1.0000000000000000000001p-986
89 0x1p-985 0x1.00000000000000000000008p-985
90 0x1p-984 0x1.00000000000000000000004p-984
91 0x1p-983 0x1.00000000000000000000002p-983
92 0x1p-982 0x1.00000000000000000000001p-982
93 0x1p-981 0x1.000000000000000000000008p-981
94 0x1p-980 0x1.000000000000000000000004p-980
95 0x1p-979 0x1.000000000000000000000002p-979
96 0x1p-978 0x1.000000000000000000000001p-978
97 0x1p-977 0x1.0000000000000000000000008p-977
98 0x1p-976 0x1.0000000000000000000000004p-976
99 0x1p-975 0x1.0000000000000000000000002p-975
100 0x1p-974 0x1.0000000000000000000000001p-974
101 0x1p-973 0x1.00000000000000000000000008p-973
102 0x1p-972 0x1.00000000000000000000000004p-972
103 0x1p-971 0x1.00000000000000000000000002p-971
104 0x1p-970 0x1.00000000000000000000000001p-970
105 0x1p-969 0x1.000000000000000000000000008p-969
106 0x1p-968 0x1.000000000000000000000000008p-968
107 0x1p-967 0x1.000000000000000000000000008p-967

Will need to find out which of these numbers GCC
real_isdenormal actually treats as denormals and if there are
any which aren't.
Powerpc64le -mlong-double-128 -mabi=ibmlongdouble gcc certain prints:
#define __DBL_MIN__ ((double)2.22507385850720138309023271733240406e-308L)
#define __DBL_DENORM_MIN__ ((double)4.94065645841246544176568792868221372e-324L)
#define __LDBL_MIN__ 2.00416836000897277799610805135016205e-292L
#define __LDBL_DENORM_MIN__ 4.94065645841246544176568792868221372e-324L
and so the LDBL denorm min is correctly the same as double denorm min, while
minimum normal is actually quite larger for long double than double (also
ok, but unusual, as e.g. __DBL_MIN__ is much smaller than __FLT_MIN__ and
__FLT128_MIN__ much smaller than __DBL_MIN__).
Note, seems glibc nextafterl for IBM double double is actually adding
__DBL_DENORM_MIN__ to the value for the small ones, so perhaps we should
do that too and not convert to DFmode and back.

	Jakub
Jakub Jelinek Nov. 4, 2022, 7:14 p.m. UTC | #11
On Mon, Oct 17, 2022 at 08:21:01AM +0200, Aldy Hernandez wrote:
> +// Set VALUE to its next real value, or INF if the operation overflows.
> +
> +inline void
> +frange_nextafter (enum machine_mode mode,
> +		  REAL_VALUE_TYPE &value,
> +		  const REAL_VALUE_TYPE &inf)
> +{
> +  const real_format *fmt = REAL_MODE_FORMAT (mode);
> +  REAL_VALUE_TYPE tmp;
> +  bool overflow = real_nextafter (&tmp, fmt, &value, &inf);
> +  if (overflow)
> +    value = inf;
> +  else
> +    value = tmp;

Please change the above 5 lines to just
  real_nextafter (&tmp, fmt, &value, &inf);
  value = tmp;
What real_nextafter returns isn't bool whether it overflowed, but
a bool whether some exception should be raised for the nextafter,
which can be either an overflow (in that case tmp will be already set
to infinity of the right sign:
          if (REAL_EXP (r) > fmt->emax)
            {
              get_inf (r, x->sign);
              return true;
            }
or underflow to zero:
  if (REAL_EXP (r) <= fmt->emin - fmt->p)
    {
      get_zero (r, x->sign);
      return true;
    }
or when the result is subnormal:
  return r->cl == rvc_zero || REAL_EXP (r) < fmt->emin;
But we shouldn't care about exceptions, we aren't emitting code, just
estimating floating point range, and +-inf, or +-0, or subnormal
are the right answers.

Just for playing with this, I've tried:
--- range-op-float.cc.jj	2022-11-02 10:06:02.620960861 +0100
+++ range-op-float.cc	2022-11-04 19:49:14.188075070 +0100
@@ -1815,6 +1815,20 @@ frange_float (const char *lb, const char
   return frange (type, min, max);
 }
 
+inline void                                                                                                                                                                        
+frange_nextafter (enum machine_mode mode,                                                                                                                                          
+		  REAL_VALUE_TYPE &value,                                                                                                                                             
+		  const REAL_VALUE_TYPE &inf)                                                                                                                                         
+{                                                                                                                                                                                  
+  const real_format *fmt = REAL_MODE_FORMAT (mode);                                                                                                                                
+  REAL_VALUE_TYPE tmp;                                                                                                                                                             
+  bool overflow = real_nextafter (&tmp, fmt, &value, &inf);                                                                                                                        
+  if (overflow)                                                                                                                                                                    
+    value = inf;                                                                                                                                                                   
+  else                                                                                                                                                                             
+   value = tmp;                                                                                                                                                                   
+}                                                                                                                                                                                  
+
 void
 range_op_float_tests ()
 {
@@ -1833,6 +1847,30 @@ range_op_float_tests ()
   r1 = frange_float ("-1", "-0");
   r1.update_nan (false);
   ASSERT_EQ (r, r1);
+
+  if (MODE_COMPOSITE_P (TYPE_MODE (long_double_type_node)))
+    {
+      enum machine_mode mode = TYPE_MODE (long_double_type_node);
+      REAL_VALUE_TYPE result = dconst0;
+      {
+	REAL_VALUE_TYPE tmp;
+	real_convert (&tmp, DFmode, &result);
+	frange_nextafter (DFmode, tmp, dconstinf);
+	real_convert (&result, mode, &tmp);
+      }
+      for (int i = 0; i < 108; ++i)
+	{
+	  char buf[1024], buf2[1024];
+	  REAL_VALUE_TYPE tmp, tmp2;
+	  real_convert (&tmp, DFmode, &result);
+	  frange_nextafter (DFmode, tmp, dconstinf);
+	  real_convert (&tmp2, mode, &tmp);
+	  real_to_hexadecimal (buf, &result, 1024, 0, 1);
+	  real_to_hexadecimal (buf2, &tmp2, 1024, 0, 1);
+	  fprintf (stderr, "%d %d %s %s\n", i, real_isdenormal (&result), buf, buf2);
+	  real_arithmetic (&result, MULT_EXPR, &result, &dconst2);
+	}
+    }
 }
 
 } // namespace selftest

  
in cross to powerpc64le-linux and I get:
0 0 0x0.8p-1073 0x0.8p-1072
1 0 0x0.8p-1072 0x0.cp-1072
2 0 0x0.8p-1071 0x0.ap-1071
3 0 0x0.8p-1070 0x0.9p-1070
4 0 0x0.8p-1069 0x0.88p-1069
5 0 0x0.8p-1068 0x0.84p-1068
6 0 0x0.8p-1067 0x0.82p-1067
7 0 0x0.8p-1066 0x0.81p-1066
8 0 0x0.8p-1065 0x0.808p-1065
9 0 0x0.8p-1064 0x0.804p-1064
10 0 0x0.8p-1063 0x0.802p-1063
11 0 0x0.8p-1062 0x0.801p-1062
12 0 0x0.8p-1061 0x0.8008p-1061
13 0 0x0.8p-1060 0x0.8004p-1060
14 0 0x0.8p-1059 0x0.8002p-1059
15 0 0x0.8p-1058 0x0.8001p-1058
16 0 0x0.8p-1057 0x0.80008p-1057
17 0 0x0.8p-1056 0x0.80004p-1056
18 0 0x0.8p-1055 0x0.80002p-1055
19 0 0x0.8p-1054 0x0.80001p-1054
20 0 0x0.8p-1053 0x0.800008p-1053
21 0 0x0.8p-1052 0x0.800004p-1052
22 0 0x0.8p-1051 0x0.800002p-1051
23 0 0x0.8p-1050 0x0.800001p-1050
24 0 0x0.8p-1049 0x0.8000008p-1049
25 0 0x0.8p-1048 0x0.8000004p-1048
26 0 0x0.8p-1047 0x0.8000002p-1047
27 0 0x0.8p-1046 0x0.8000001p-1046
28 0 0x0.8p-1045 0x0.80000008p-1045
29 0 0x0.8p-1044 0x0.80000004p-1044
30 0 0x0.8p-1043 0x0.80000002p-1043
31 0 0x0.8p-1042 0x0.80000001p-1042
32 0 0x0.8p-1041 0x0.800000008p-1041
33 0 0x0.8p-1040 0x0.800000004p-1040
34 0 0x0.8p-1039 0x0.800000002p-1039
35 0 0x0.8p-1038 0x0.800000001p-1038
36 0 0x0.8p-1037 0x0.8000000008p-1037
37 0 0x0.8p-1036 0x0.8000000004p-1036
38 0 0x0.8p-1035 0x0.8000000002p-1035
39 0 0x0.8p-1034 0x0.8000000001p-1034
40 0 0x0.8p-1033 0x0.80000000008p-1033
41 0 0x0.8p-1032 0x0.80000000004p-1032
42 0 0x0.8p-1031 0x0.80000000002p-1031
43 0 0x0.8p-1030 0x0.80000000001p-1030
44 0 0x0.8p-1029 0x0.800000000008p-1029
45 0 0x0.8p-1028 0x0.800000000004p-1028
46 0 0x0.8p-1027 0x0.800000000002p-1027
47 0 0x0.8p-1026 0x0.800000000001p-1026
48 0 0x0.8p-1025 0x0.8000000000008p-1025
49 0 0x0.8p-1024 0x0.8000000000004p-1024
50 0 0x0.8p-1023 0x0.8000000000002p-1023
51 0 0x0.8p-1022 0x0.8000000000001p-1022
52 0 0x0.8p-1021 0x0.80000000000008p-1021
53 0 0x0.8p-1020 0x0.80000000000008p-1020
54 0 0x0.8p-1019 0x0.80000000000008p-1019
55 0 0x0.8p-1018 0x0.80000000000008p-1018
56 0 0x0.8p-1017 0x0.80000000000008p-1017
57 0 0x0.8p-1016 0x0.80000000000008p-1016
58 0 0x0.8p-1015 0x0.80000000000008p-1015
59 0 0x0.8p-1014 0x0.80000000000008p-1014
60 0 0x0.8p-1013 0x0.80000000000008p-1013
61 0 0x0.8p-1012 0x0.80000000000008p-1012
62 0 0x0.8p-1011 0x0.80000000000008p-1011
63 0 0x0.8p-1010 0x0.80000000000008p-1010
64 0 0x0.8p-1009 0x0.80000000000008p-1009
65 0 0x0.8p-1008 0x0.80000000000008p-1008
66 0 0x0.8p-1007 0x0.80000000000008p-1007
67 0 0x0.8p-1006 0x0.80000000000008p-1006
68 0 0x0.8p-1005 0x0.80000000000008p-1005
69 0 0x0.8p-1004 0x0.80000000000008p-1004
70 0 0x0.8p-1003 0x0.80000000000008p-1003
71 0 0x0.8p-1002 0x0.80000000000008p-1002
72 0 0x0.8p-1001 0x0.80000000000008p-1001
73 0 0x0.8p-1000 0x0.80000000000008p-1000
74 0 0x0.8p-999 0x0.80000000000008p-999
75 0 0x0.8p-998 0x0.80000000000008p-998
76 0 0x0.8p-997 0x0.80000000000008p-997
77 0 0x0.8p-996 0x0.80000000000008p-996
78 0 0x0.8p-995 0x0.80000000000008p-995
79 0 0x0.8p-994 0x0.80000000000008p-994
80 0 0x0.8p-993 0x0.80000000000008p-993
81 0 0x0.8p-992 0x0.80000000000008p-992
82 0 0x0.8p-991 0x0.80000000000008p-991
83 0 0x0.8p-990 0x0.80000000000008p-990
84 0 0x0.8p-989 0x0.80000000000008p-989
85 0 0x0.8p-988 0x0.80000000000008p-988
86 0 0x0.8p-987 0x0.80000000000008p-987
87 0 0x0.8p-986 0x0.80000000000008p-986
88 0 0x0.8p-985 0x0.80000000000008p-985
89 0 0x0.8p-984 0x0.80000000000008p-984
90 0 0x0.8p-983 0x0.80000000000008p-983
91 0 0x0.8p-982 0x0.80000000000008p-982
92 0 0x0.8p-981 0x0.80000000000008p-981
93 0 0x0.8p-980 0x0.80000000000008p-980
94 0 0x0.8p-979 0x0.80000000000008p-979
95 0 0x0.8p-978 0x0.80000000000008p-978
96 0 0x0.8p-977 0x0.80000000000008p-977
97 0 0x0.8p-976 0x0.80000000000008p-976
98 0 0x0.8p-975 0x0.80000000000008p-975
99 0 0x0.8p-974 0x0.80000000000008p-974
100 0 0x0.8p-973 0x0.80000000000008p-973
101 0 0x0.8p-972 0x0.80000000000008p-972
102 0 0x0.8p-971 0x0.80000000000008p-971
103 0 0x0.8p-970 0x0.80000000000008p-970
104 0 0x0.8p-969 0x0.80000000000008p-969
105 0 0x0.8p-968 0x0.80000000000008p-968
106 0 0x0.8p-967 0x0.80000000000008p-967
107 0 0x0.8p-966 0x0.80000000000008p-966

One thing that is clear is that real_isdenormal is never true for these
(but then a question is if flush_denormals_to_zero actually works).

	Jakub
Jakub Jelinek Nov. 4, 2022, 7:53 p.m. UTC | #12
On Fri, Nov 04, 2022 at 08:14:23PM +0100, Jakub Jelinek via Gcc-patches wrote:
> One thing that is clear is that real_isdenormal is never true for these
> (but then a question is if flush_denormals_to_zero actually works).

So, after some more investigation, I think that is actually the case,
real_isdenormal is only meaningful (will ever return true) right after
round_for_format.
The uses inside of real.cc are fine, real_to_target first calls
round_for_format and then calls fmt->encode in which the real_isdenormal
calls are done.  And, round_for_format is what transforms the
normalized way of expressing the number into the denormal form:
  /* Check the range of the exponent.  If we're out of range,
     either underflow or overflow.  */
  if (REAL_EXP (r) > emax2)
    goto overflow;
  else if (REAL_EXP (r) <= emin2m1)
    {
      int diff;

      if (!fmt->has_denorm)
        {
          /* Don't underflow completely until we've had a chance to round.  */
          if (REAL_EXP (r) < emin2m1)
            goto underflow;
        }
      else
        {
          diff = emin2m1 - REAL_EXP (r) + 1;
          if (diff > p2)
            goto underflow;

          /* De-normalize the significand.  */
          r->sig[0] |= sticky_rshift_significand (r, r, diff);
          SET_REAL_EXP (r, REAL_EXP (r) + diff);
        }
    }
But, real_to_target is one of the 4 callers of static round_for_format,
another one is real_convert, but that one undoes that immediately:
  round_for_format (fmt, r);
        
  /* Make resulting NaN value to be qNaN. The caller has the
     responsibility to avoid the operation if flag_signaling_nans
     is on.  */
  if (r->cl == rvc_nan)
    r->signalling = 0;
      
  /* round_for_format de-normalizes denormals.  Undo just that part.  */
  if (r->cl == rvc_normal)
    normalize (r);
and the last two are inside of encoding the IBM double double composite.
So, I think everywhere in the frange you'll see normalized REAL_VALUE_TYPE
and so real_isdenormal will always be false there.
You need to check for REAL_EXP (r) < fmt->emin or so (and ideally only on
REAL_VALUE_TYPE already real_converted to the right mode (your
frange_arithmetics does that already, which is good, but say conversions
and other unary ops might need it too).
If I in the hack from last mail replace
-	  fprintf (stderr, "%d %d %s %s\n", i, real_isdenormal (&result), buf, buf2);
+	  fprintf (stderr, "%d %d %s %s\n", i, REAL_EXP (&result) < REAL_MODE_FORMAT (mode)->emin, buf, buf2);
then it is 1 until:
102 1 0x0.8p-971 0x0.80000000000008p-971
103 1 0x0.8p-970 0x0.80000000000008p-970
104 1 0x0.8p-969 0x0.80000000000008p-969
105 0 0x0.8p-968 0x0.80000000000008p-968
106 0 0x0.8p-967 0x0.80000000000008p-967
107 0 0x0.8p-966 0x0.80000000000008p-966

Now, the __LDBL_MIN__ powerpc64 gcc announces is
2.00416836000897277799610805135016205e-292L
which is equivalent to:
0x1p-969
and that is equivalent to
0x0.8p-968, so I think that is exactly what we want.

	Jakub
Aldy Hernandez Nov. 7, 2022, 12:35 p.m. UTC | #13
On Fri, Nov 4, 2022 at 8:53 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Fri, Nov 04, 2022 at 08:14:23PM +0100, Jakub Jelinek via Gcc-patches wrote:
> > One thing that is clear is that real_isdenormal is never true for these
> > (but then a question is if flush_denormals_to_zero actually works).
>
> So, after some more investigation, I think that is actually the case,
> real_isdenormal is only meaningful (will ever return true) right after
> round_for_format.
> The uses inside of real.cc are fine, real_to_target first calls
> round_for_format and then calls fmt->encode in which the real_isdenormal
> calls are done.  And, round_for_format is what transforms the
> normalized way of expressing the number into the denormal form:
>   /* Check the range of the exponent.  If we're out of range,
>      either underflow or overflow.  */
>   if (REAL_EXP (r) > emax2)
>     goto overflow;
>   else if (REAL_EXP (r) <= emin2m1)
>     {
>       int diff;
>
>       if (!fmt->has_denorm)
>         {
>           /* Don't underflow completely until we've had a chance to round.  */
>           if (REAL_EXP (r) < emin2m1)
>             goto underflow;
>         }
>       else
>         {
>           diff = emin2m1 - REAL_EXP (r) + 1;
>           if (diff > p2)
>             goto underflow;
>
>           /* De-normalize the significand.  */
>           r->sig[0] |= sticky_rshift_significand (r, r, diff);
>           SET_REAL_EXP (r, REAL_EXP (r) + diff);
>         }
>     }
> But, real_to_target is one of the 4 callers of static round_for_format,
> another one is real_convert, but that one undoes that immediately:
>   round_for_format (fmt, r);
>
>   /* Make resulting NaN value to be qNaN. The caller has the
>      responsibility to avoid the operation if flag_signaling_nans
>      is on.  */
>   if (r->cl == rvc_nan)
>     r->signalling = 0;
>
>   /* round_for_format de-normalizes denormals.  Undo just that part.  */
>   if (r->cl == rvc_normal)
>     normalize (r);
> and the last two are inside of encoding the IBM double double composite.
> So, I think everywhere in the frange you'll see normalized REAL_VALUE_TYPE
> and so real_isdenormal will always be false there.
> You need to check for REAL_EXP (r) < fmt->emin or so (and ideally only on
> REAL_VALUE_TYPE already real_converted to the right mode (your
> frange_arithmetics does that already, which is good, but say conversions
> and other unary ops might need it too).

Let me see if I understand you correctly...

real_isdenormal is always returning false for our uses in frange?  So
instead of using real_isdenormal in flush_denormals_to_zero, perhaps
we should be using:

REAL_EXP (r) < REAL_MODE_FORMAT (mode)->emin

??

Could we perhaps make real_isdenormal always work for all values?  Is
that possible?

Aldy
Jakub Jelinek Nov. 7, 2022, 12:43 p.m. UTC | #14
On Mon, Nov 07, 2022 at 01:35:35PM +0100, Aldy Hernandez wrote:
> Let me see if I understand you correctly...
> 
> real_isdenormal is always returning false for our uses in frange?  So
> instead of using real_isdenormal in flush_denormals_to_zero, perhaps
> we should be using:
> 
> REAL_EXP (r) < REAL_MODE_FORMAT (mode)->emin
> 
> ??
> 
> Could we perhaps make real_isdenormal always work for all values?  Is
> that possible?

Yes.  Or have real_isdenormal_target for the uses in real.cc
which would be private to real.cc and would do what real_isdenormal
currently does, and then real_isdenormal with the above definition
(well, r->cl == rvc_normal && ...).

	Jakub
Aldy Hernandez Nov. 7, 2022, 12:48 p.m. UTC | #15
On Mon, Nov 7, 2022 at 1:43 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Mon, Nov 07, 2022 at 01:35:35PM +0100, Aldy Hernandez wrote:
> > Let me see if I understand you correctly...
> >
> > real_isdenormal is always returning false for our uses in frange?  So
> > instead of using real_isdenormal in flush_denormals_to_zero, perhaps
> > we should be using:
> >
> > REAL_EXP (r) < REAL_MODE_FORMAT (mode)->emin
> >
> > ??
> >
> > Could we perhaps make real_isdenormal always work for all values?  Is
> > that possible?
>
> Yes.  Or have real_isdenormal_target for the uses in real.cc
> which would be private to real.cc and would do what real_isdenormal
> currently does, and then real_isdenormal with the above definition
> (well, r->cl == rvc_normal && ...).

If the REAL_EXP(r)... definition works for everyone, can't we just use
that?  Or do you think there'd be a measurable performance impact?

Aldy
Jakub Jelinek Nov. 7, 2022, 12:56 p.m. UTC | #16
On Mon, Nov 07, 2022 at 01:48:28PM +0100, Aldy Hernandez wrote:
> On Mon, Nov 7, 2022 at 1:43 PM Jakub Jelinek <jakub@redhat.com> wrote:
> >
> > On Mon, Nov 07, 2022 at 01:35:35PM +0100, Aldy Hernandez wrote:
> > > Let me see if I understand you correctly...
> > >
> > > real_isdenormal is always returning false for our uses in frange?  So
> > > instead of using real_isdenormal in flush_denormals_to_zero, perhaps
> > > we should be using:
> > >
> > > REAL_EXP (r) < REAL_MODE_FORMAT (mode)->emin
> > >
> > > ??
> > >
> > > Could we perhaps make real_isdenormal always work for all values?  Is
> > > that possible?
> >
> > Yes.  Or have real_isdenormal_target for the uses in real.cc
> > which would be private to real.cc and would do what real_isdenormal
> > currently does, and then real_isdenormal with the above definition
> > (well, r->cl == rvc_normal && ...).
> 
> If the REAL_EXP(r)... definition works for everyone, can't we just use
> that?  Or do you think there'd be a measurable performance impact?

It doesn't work alone.

Either denormals are in normalized form, then
r->cl == rvc_normal && REAL_EXP (r) < REAL_MODE_FORMAT (mode)->emin
is true, or they are in denormal form, then
r->cl == rvc_normal && (r->sig[SIGSZ-1] & SIG_MSB) == 0
is true.

You could use
r->cl == rvc_normal
&& (REAL_EXP (r) < REAL_MODE_FORMAT (mode)->emin
    || (r->sig[SIGSZ-1] & SIG_MSB) == 0)
but that would be waste of compile time both in real.cc and outside of
real.cc.

	Jakub
Aldy Hernandez Nov. 7, 2022, 3:38 p.m. UTC | #17
Fair enough.

How's this?

Tested on x86-64 Linux.  LAPACK regression testing as well.

On Mon, Nov 7, 2022 at 1:56 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Mon, Nov 07, 2022 at 01:48:28PM +0100, Aldy Hernandez wrote:
> > On Mon, Nov 7, 2022 at 1:43 PM Jakub Jelinek <jakub@redhat.com> wrote:
> > >
> > > On Mon, Nov 07, 2022 at 01:35:35PM +0100, Aldy Hernandez wrote:
> > > > Let me see if I understand you correctly...
> > > >
> > > > real_isdenormal is always returning false for our uses in frange?  So
> > > > instead of using real_isdenormal in flush_denormals_to_zero, perhaps
> > > > we should be using:
> > > >
> > > > REAL_EXP (r) < REAL_MODE_FORMAT (mode)->emin
> > > >
> > > > ??
> > > >
> > > > Could we perhaps make real_isdenormal always work for all values?  Is
> > > > that possible?
> > >
> > > Yes.  Or have real_isdenormal_target for the uses in real.cc
> > > which would be private to real.cc and would do what real_isdenormal
> > > currently does, and then real_isdenormal with the above definition
> > > (well, r->cl == rvc_normal && ...).
> >
> > If the REAL_EXP(r)... definition works for everyone, can't we just use
> > that?  Or do you think there'd be a measurable performance impact?
>
> It doesn't work alone.
>
> Either denormals are in normalized form, then
> r->cl == rvc_normal && REAL_EXP (r) < REAL_MODE_FORMAT (mode)->emin
> is true, or they are in denormal form, then
> r->cl == rvc_normal && (r->sig[SIGSZ-1] & SIG_MSB) == 0
> is true.
>
> You could use
> r->cl == rvc_normal
> && (REAL_EXP (r) < REAL_MODE_FORMAT (mode)->emin
>     || (r->sig[SIGSZ-1] & SIG_MSB) == 0)
> but that would be waste of compile time both in real.cc and outside of
> real.cc.
>
>         Jakub
>
Aldy Hernandez Nov. 7, 2022, 3:41 p.m. UTC | #18
On Fri, Nov 4, 2022 at 8:14 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Mon, Oct 17, 2022 at 08:21:01AM +0200, Aldy Hernandez wrote:
> > +// Set VALUE to its next real value, or INF if the operation overflows.
> > +
> > +inline void
> > +frange_nextafter (enum machine_mode mode,
> > +               REAL_VALUE_TYPE &value,
> > +               const REAL_VALUE_TYPE &inf)
> > +{
> > +  const real_format *fmt = REAL_MODE_FORMAT (mode);
> > +  REAL_VALUE_TYPE tmp;
> > +  bool overflow = real_nextafter (&tmp, fmt, &value, &inf);
> > +  if (overflow)
> > +    value = inf;
> > +  else
> > +    value = tmp;
>
> Please change the above 5 lines to just
>   real_nextafter (&tmp, fmt, &value, &inf);
>   value = tmp;

Done.

As suggested upthread, I have also adjusted update_nan_sign() to drop
the NAN sign to VARYING if both operands are NAN.  As an optimization
I keep the sign if both operands are NAN and have the same sign.

How's this?

Tested on x86-64 Linux.  LAPACK testing as well.

Aldy
Jakub Jelinek Nov. 8, 2022, 11:07 a.m. UTC | #19
On Mon, Nov 07, 2022 at 04:38:29PM +0100, Aldy Hernandez wrote:
> From d214bcdff2cb90ad1eb808d29bda6fb98d510b4c Mon Sep 17 00:00:00 2001
> From: Aldy Hernandez <aldyh@redhat.com>
> Date: Mon, 7 Nov 2022 14:18:57 +0100
> Subject: [PATCH] Provide normalized and denormal format version of
>  real_isdenormal.
> 
> Implement real_isdenormal_target() to be used within real.cc where the
> argument is known to be in denormal format.  Rewrite real_isdenormal()
> for use outside of real.cc where the argument is known to be
> normalized.
> 
> gcc/ChangeLog:
> 
> 	* real.cc (real_isdenormal_target): New.
> 	(encode_ieee_single): Use real_isdenormal_target.
> 	(encode_ieee_double): Same.
> 	(encode_ieee_extended): Same.
> 	(encode_ieee_quad): Same.
> 	(encode_ieee_half): Same.
> 	(encode_arm_bfloat_half): Same.
> 	* value-range.cc (frange::flush_denormals_to_zero): Same.
> 	* real.h (real_isdenormal): Rewrite to look at mode.

I'd make real_isdenormal_target static inline bool
rather than inline bool, it is only defined in real.cc, so there is
no point exporting it.
Though, as you've added the mode argument, the real.cc inline
could very well also be called real_isdenormal too, it wouldn't be
a redeclaration or ODR violation. 

	Jakub
Jakub Jelinek Nov. 8, 2022, 11:20 a.m. UTC | #20
On Mon, Nov 07, 2022 at 04:41:23PM +0100, Aldy Hernandez wrote:
> As suggested upthread, I have also adjusted update_nan_sign() to drop
> the NAN sign to VARYING if both operands are NAN.  As an optimization
> I keep the sign if both operands are NAN and have the same sign.

For NaNs this still relies on something IEEE754 doesn't guarantee,
as I cited, after a binary operation the sign bit of the NaN is
unspecified, whether there is one NaN operand or two.
It might be that all CPUs handle it the way you've implemented
(that for one NaN operand the sign of NaN result will be the same
as that NaN operand and for two it will be the sign of one of the two
NaNs operands, never something else), but I think we'd need to check
more than one implementation for that (I've only tried x86_64 and thus
SSE behavior in it), so one would need to test i387 long double behavior
too, ARM/AArch64, PowerPC, s390{,x}, RISCV, ...
The guarantee given by IEEE754 is only for those copy, negate, abs, copySign
operations, so copying values around, NEG_EXPR, ABS_EXPR, __builtin_fabs*,
__builtin_copysign*.

Otherwise LGTM (but would be nice to get into GCC13 not just
+, but also -, *, /, sqrt at least).

	Jakub
Aldy Hernandez Nov. 8, 2022, 12:47 p.m. UTC | #21
On Tue, Nov 8, 2022 at 12:07 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Mon, Nov 07, 2022 at 04:38:29PM +0100, Aldy Hernandez wrote:
> > From d214bcdff2cb90ad1eb808d29bda6fb98d510b4c Mon Sep 17 00:00:00 2001
> > From: Aldy Hernandez <aldyh@redhat.com>
> > Date: Mon, 7 Nov 2022 14:18:57 +0100
> > Subject: [PATCH] Provide normalized and denormal format version of
> >  real_isdenormal.
> >
> > Implement real_isdenormal_target() to be used within real.cc where the
> > argument is known to be in denormal format.  Rewrite real_isdenormal()
> > for use outside of real.cc where the argument is known to be
> > normalized.
> >
> > gcc/ChangeLog:
> >
> >       * real.cc (real_isdenormal_target): New.
> >       (encode_ieee_single): Use real_isdenormal_target.
> >       (encode_ieee_double): Same.
> >       (encode_ieee_extended): Same.
> >       (encode_ieee_quad): Same.
> >       (encode_ieee_half): Same.
> >       (encode_arm_bfloat_half): Same.
> >       * value-range.cc (frange::flush_denormals_to_zero): Same.
> >       * real.h (real_isdenormal): Rewrite to look at mode.
>
> I'd make real_isdenormal_target static inline bool
> rather than inline bool, it is only defined in real.cc, so there is
> no point exporting it.

Huh.  I thought inline alone would inhibit the exporting.  Thanks.

> Though, as you've added the mode argument, the real.cc inline
> could very well also be called real_isdenormal too, it wouldn't be
> a redeclaration or ODR violation.

Great, even better.

OK pending tests?
Aldy
Aldy Hernandez Nov. 8, 2022, 1:06 p.m. UTC | #22
On Tue, Nov 8, 2022 at 12:20 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Mon, Nov 07, 2022 at 04:41:23PM +0100, Aldy Hernandez wrote:
> > As suggested upthread, I have also adjusted update_nan_sign() to drop
> > the NAN sign to VARYING if both operands are NAN.  As an optimization
> > I keep the sign if both operands are NAN and have the same sign.
>
> For NaNs this still relies on something IEEE754 doesn't guarantee,
> as I cited, after a binary operation the sign bit of the NaN is
> unspecified, whether there is one NaN operand or two.
> It might be that all CPUs handle it the way you've implemented
> (that for one NaN operand the sign of NaN result will be the same
> as that NaN operand and for two it will be the sign of one of the two
> NaNs operands, never something else), but I think we'd need to check
> more than one implementation for that (I've only tried x86_64 and thus
> SSE behavior in it), so one would need to test i387 long double behavior
> too, ARM/AArch64, PowerPC, s390{,x}, RISCV, ...
> The guarantee given by IEEE754 is only for those copy, negate, abs, copySign
> operations, so copying values around, NEG_EXPR, ABS_EXPR, __builtin_fabs*,
> __builtin_copysign*.

Ughh, that's unfortunate.  OK, I've added a big note.

>
> Otherwise LGTM (but would be nice to get into GCC13 not just
> +, but also -, *, /, sqrt at least).

Minus is trivial as we can implement it with a negate and plus.  I
have a patch queued up for that.  The rest require a bit more thought,
though perhaps with what we have so far can serve as a base.  I'll
look into it.

Attached is the patch I'm retesting.

Thanks for your patience, and copious help here.
Aldy
Jakub Jelinek Nov. 8, 2022, 1:15 p.m. UTC | #23
On Tue, Nov 08, 2022 at 01:47:58PM +0100, Aldy Hernandez wrote:
> On Tue, Nov 8, 2022 at 12:07 PM Jakub Jelinek <jakub@redhat.com> wrote:
> >
> > On Mon, Nov 07, 2022 at 04:38:29PM +0100, Aldy Hernandez wrote:
> > > From d214bcdff2cb90ad1eb808d29bda6fb98d510b4c Mon Sep 17 00:00:00 2001
> > > From: Aldy Hernandez <aldyh@redhat.com>
> > > Date: Mon, 7 Nov 2022 14:18:57 +0100
> > > Subject: [PATCH] Provide normalized and denormal format version of
> > >  real_isdenormal.
> > >
> > > Implement real_isdenormal_target() to be used within real.cc where the
> > > argument is known to be in denormal format.  Rewrite real_isdenormal()
> > > for use outside of real.cc where the argument is known to be
> > > normalized.
> > >
> > > gcc/ChangeLog:
> > >
> > >       * real.cc (real_isdenormal_target): New.
> > >       (encode_ieee_single): Use real_isdenormal_target.
> > >       (encode_ieee_double): Same.
> > >       (encode_ieee_extended): Same.
> > >       (encode_ieee_quad): Same.
> > >       (encode_ieee_half): Same.
> > >       (encode_arm_bfloat_half): Same.
> > >       * value-range.cc (frange::flush_denormals_to_zero): Same.
> > >       * real.h (real_isdenormal): Rewrite to look at mode.
> >
> > I'd make real_isdenormal_target static inline bool
> > rather than inline bool, it is only defined in real.cc, so there is
> > no point exporting it.
> 
> Huh.  I thought inline alone would inhibit the exporting.  Thanks.

That is what happens with C99 inline (unless there is extern for the decl),
but C++ inline is different.  It isn't guaranteed to be exported, but
with -fkeep-inline-functions or if you say take address of the inline
in a way that can't be optimized back into a call to the inline (or even
just call it with -O0), it is exported.
> 
> > Though, as you've added the mode argument, the real.cc inline
> > could very well also be called real_isdenormal too, it wouldn't be
> > a redeclaration or ODR violation.
> 
> Great, even better.
> 
> OK pending tests?
> Aldy

> From c3ca1d606bfb22bf4f8bc7ac0ce561bd6afc3368 Mon Sep 17 00:00:00 2001
> From: Aldy Hernandez <aldyh@redhat.com>
> Date: Mon, 7 Nov 2022 14:18:57 +0100
> Subject: [PATCH] Provide normalized and denormal format version of
>  real_isdenormal.
> 
> Implement a variant of real_isdenormal() to be used within real.cc
> where the argument is known to be in denormal format.  Rewrite
> real_isdenormal() for use outside of real.cc where the argument is
> known to be normalized.
> 
> gcc/ChangeLog:
> 
> 	* real.cc (real_isdenormal): New.
> 	* real.h (real_isdenormal): Add mode argument.  Rewrite for
> 	normalized values.
> 	* value-range.cc (frange::flush_denormals_to_zero): Pass mode to
> 	real_isdenormal.
> ---
>  gcc/real.cc        | 10 ++++++++++
>  gcc/real.h         |  7 ++++---
>  gcc/value-range.cc |  5 +++--
>  3 files changed, 17 insertions(+), 5 deletions(-)
> 
> diff --git a/gcc/real.cc b/gcc/real.cc
> index aae7c335d59..028aad95ec4 100644
> --- a/gcc/real.cc
> +++ b/gcc/real.cc
> @@ -111,6 +111,16 @@ static const REAL_VALUE_TYPE * real_digit (int);
>  static void times_pten (REAL_VALUE_TYPE *, int);
>  
>  static void round_for_format (const struct real_format *, REAL_VALUE_TYPE *);
> +
> +/* Determine whether a floating-point value X is a denormal.  R is
> +   expected to be in denormal form, so this function is only
> +   meaningful after a call to round_for_format.  */
> +
> +static inline bool
> +real_isdenormal (const REAL_VALUE_TYPE *r)
> +{
> +  return (r->sig[SIGSZ-1] & SIG_MSB) == 0;

I would probably keep the r->cl == rvc_normal in here too.
I know the code in real.cc didn't do it before, but what
r->sig is for the rvc_zero/rvc_inf is unclear.  It is true
that get_zero/get_canonical_?nan/get_inf clear the whole sig,
but not really sure if we guarantee that everywhere.
The real.cc uses were like:
  bool denormal = ...;
at the start of the function and then
  switch (...)
    {
...
    case rvc_normal:
      if (denormal)
	...
    }
so another even better possibility would be to use your simple
real.cc (real_isdenormal) and drop all the denormal variables, so:
- bool denormal = ...;
  switch (...)
    {
...
    case rvc_normal:
-     if (denormal)
+     if (real_isdenormal (r))
	...
    }

Otherwise LGTM.

	Jakub
Jakub Jelinek Nov. 8, 2022, 1:24 p.m. UTC | #24
On Tue, Nov 08, 2022 at 02:06:58PM +0100, Aldy Hernandez wrote:
> +  gcc_checking_assert (!r.nan_signbit_p (sign1));
> +  if (op1_nan && op2_nan)
> +    {
> +      // If boths signs agree, we could use that sign, but IEEE754
> +      // does not guarantee this for a binary operator.  The x86_64
> +      // architure does keep the common known sign, but further tests
> +      // are needed to see if other architectures do the same (i387
> +      // long double, ARM/aarch64, PowerPC, s390,{,x}, RSICV, etc).

s/RSICV/RISCV/

> +      // In the meantime, keep sign VARYING.
> +      ;
> +    }
> +  else if (op1_nan)
> +    {
> +      if (op1.nan_signbit_p (sign1))
> +	r.update_nan (sign1);
> +    }
> +  else if (op2_nan)
> +    {
> +      if (op2.nan_signbit_p (sign2))
> +	r.update_nan (sign2);
> +    }

Well, these cases also aren't guaranteed for binary operator.
I think a conforming implementation can say copy the NaN operand
to result and toggle the sign.  Or, if the operand would be a sNaN,
it must turn it into a qNaN (don't remember right now if there are
requirements on what it can do with the mantissa which needs to change
for the sNaN -> qNaN difference at least, but whether it can just
generate a canonical qNaN or needs to preserve at least some bits),
but could e.g. clear or toggle the sign of the NaN as well.
Whether there are any such implementations or not is a question.
For the single qNaN operand case, it would surprise me if anybody
bothered to tweak the sign bit in any way, just copying the input
seems simpler to me, but for the sNaN -> qNaN conversion it wouldn't
surprise me that much.

	Jakub
Aldy Hernandez Nov. 8, 2022, 1:47 p.m. UTC | #25
On Tue, Nov 8, 2022 at 2:25 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Tue, Nov 08, 2022 at 02:06:58PM +0100, Aldy Hernandez wrote:
> > +  gcc_checking_assert (!r.nan_signbit_p (sign1));
> > +  if (op1_nan && op2_nan)
> > +    {
> > +      // If boths signs agree, we could use that sign, but IEEE754
> > +      // does not guarantee this for a binary operator.  The x86_64
> > +      // architure does keep the common known sign, but further tests
> > +      // are needed to see if other architectures do the same (i387
> > +      // long double, ARM/aarch64, PowerPC, s390,{,x}, RSICV, etc).
>
> s/RSICV/RISCV/
>
> > +      // In the meantime, keep sign VARYING.
> > +      ;
> > +    }
> > +  else if (op1_nan)
> > +    {
> > +      if (op1.nan_signbit_p (sign1))
> > +     r.update_nan (sign1);
> > +    }
> > +  else if (op2_nan)
> > +    {
> > +      if (op2.nan_signbit_p (sign2))
> > +     r.update_nan (sign2);
> > +    }
>
> Well, these cases also aren't guaranteed for binary operator.
> I think a conforming implementation can say copy the NaN operand
> to result and toggle the sign.  Or, if the operand would be a sNaN,
> it must turn it into a qNaN (don't remember right now if there are
> requirements on what it can do with the mantissa which needs to change
> for the sNaN -> qNaN difference at least, but whether it can just
> generate a canonical qNaN or needs to preserve at least some bits),
> but could e.g. clear or toggle the sign of the NaN as well.
> Whether there are any such implementations or not is a question.
> For the single qNaN operand case, it would surprise me if anybody
> bothered to tweak the sign bit in any way, just copying the input
> seems simpler to me, but for the sNaN -> qNaN conversion it wouldn't
> surprise me that much.

Well, perhaps we should just nuke update_nan_sign() altogether, and
always keep the sign varying?

inline bool
propagate_nans (frange &r, const frange &op1, const frange &op2)
{
  if (op1.known_isnan () || op2.known_isnan ())
    {
      r.set_nan (op1.type ());
      return true;
    }
  return false;
}

I'm fine either way.  The less code the better :).

Aldy
Jakub Jelinek Nov. 8, 2022, 1:50 p.m. UTC | #26
On Tue, Nov 08, 2022 at 02:47:35PM +0100, Aldy Hernandez wrote:
> Well, perhaps we should just nuke update_nan_sign() altogether, and
> always keep the sign varying?
> 
> inline bool
> propagate_nans (frange &r, const frange &op1, const frange &op2)
> {
>   if (op1.known_isnan () || op2.known_isnan ())
>     {
>       r.set_nan (op1.type ());
>       return true;
>     }
>   return false;
> }
> 
> I'm fine either way.  The less code the better :).

Yes, but you had 2 callers, so something needs to be done also if
in foperator_plus::fold_range.

	Jakub
Aldy Hernandez Nov. 8, 2022, 2:02 p.m. UTC | #27
On Tue, Nov 8, 2022 at 2:15 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Tue, Nov 08, 2022 at 01:47:58PM +0100, Aldy Hernandez wrote:
> > On Tue, Nov 8, 2022 at 12:07 PM Jakub Jelinek <jakub@redhat.com> wrote:
> > >
> > > On Mon, Nov 07, 2022 at 04:38:29PM +0100, Aldy Hernandez wrote:
> > > > From d214bcdff2cb90ad1eb808d29bda6fb98d510b4c Mon Sep 17 00:00:00 2001
> > > > From: Aldy Hernandez <aldyh@redhat.com>
> > > > Date: Mon, 7 Nov 2022 14:18:57 +0100
> > > > Subject: [PATCH] Provide normalized and denormal format version of
> > > >  real_isdenormal.
> > > >
> > > > Implement real_isdenormal_target() to be used within real.cc where the
> > > > argument is known to be in denormal format.  Rewrite real_isdenormal()
> > > > for use outside of real.cc where the argument is known to be
> > > > normalized.
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > >       * real.cc (real_isdenormal_target): New.
> > > >       (encode_ieee_single): Use real_isdenormal_target.
> > > >       (encode_ieee_double): Same.
> > > >       (encode_ieee_extended): Same.
> > > >       (encode_ieee_quad): Same.
> > > >       (encode_ieee_half): Same.
> > > >       (encode_arm_bfloat_half): Same.
> > > >       * value-range.cc (frange::flush_denormals_to_zero): Same.
> > > >       * real.h (real_isdenormal): Rewrite to look at mode.
> > >
> > > I'd make real_isdenormal_target static inline bool
> > > rather than inline bool, it is only defined in real.cc, so there is
> > > no point exporting it.
> >
> > Huh.  I thought inline alone would inhibit the exporting.  Thanks.
>
> That is what happens with C99 inline (unless there is extern for the decl),
> but C++ inline is different.  It isn't guaranteed to be exported, but
> with -fkeep-inline-functions or if you say take address of the inline
> in a way that can't be optimized back into a call to the inline (or even
> just call it with -O0), it is exported.
> >
> > > Though, as you've added the mode argument, the real.cc inline
> > > could very well also be called real_isdenormal too, it wouldn't be
> > > a redeclaration or ODR violation.
> >
> > Great, even better.
> >
> > OK pending tests?
> > Aldy
>
> > From c3ca1d606bfb22bf4f8bc7ac0ce561bd6afc3368 Mon Sep 17 00:00:00 2001
> > From: Aldy Hernandez <aldyh@redhat.com>
> > Date: Mon, 7 Nov 2022 14:18:57 +0100
> > Subject: [PATCH] Provide normalized and denormal format version of
> >  real_isdenormal.
> >
> > Implement a variant of real_isdenormal() to be used within real.cc
> > where the argument is known to be in denormal format.  Rewrite
> > real_isdenormal() for use outside of real.cc where the argument is
> > known to be normalized.
> >
> > gcc/ChangeLog:
> >
> >       * real.cc (real_isdenormal): New.
> >       * real.h (real_isdenormal): Add mode argument.  Rewrite for
> >       normalized values.
> >       * value-range.cc (frange::flush_denormals_to_zero): Pass mode to
> >       real_isdenormal.
> > ---
> >  gcc/real.cc        | 10 ++++++++++
> >  gcc/real.h         |  7 ++++---
> >  gcc/value-range.cc |  5 +++--
> >  3 files changed, 17 insertions(+), 5 deletions(-)
> >
> > diff --git a/gcc/real.cc b/gcc/real.cc
> > index aae7c335d59..028aad95ec4 100644
> > --- a/gcc/real.cc
> > +++ b/gcc/real.cc
> > @@ -111,6 +111,16 @@ static const REAL_VALUE_TYPE * real_digit (int);
> >  static void times_pten (REAL_VALUE_TYPE *, int);
> >
> >  static void round_for_format (const struct real_format *, REAL_VALUE_TYPE *);
> > +
> > +/* Determine whether a floating-point value X is a denormal.  R is
> > +   expected to be in denormal form, so this function is only
> > +   meaningful after a call to round_for_format.  */
> > +
> > +static inline bool
> > +real_isdenormal (const REAL_VALUE_TYPE *r)
> > +{
> > +  return (r->sig[SIGSZ-1] & SIG_MSB) == 0;
>
> I would probably keep the r->cl == rvc_normal in here too.
> I know the code in real.cc didn't do it before, but what
> r->sig is for the rvc_zero/rvc_inf is unclear.  It is true
> that get_zero/get_canonical_?nan/get_inf clear the whole sig,
> but not really sure if we guarantee that everywhere.
> The real.cc uses were like:
>   bool denormal = ...;
> at the start of the function and then
>   switch (...)
>     {
> ...
>     case rvc_normal:
>       if (denormal)
>         ...
>     }
> so another even better possibility would be to use your simple
> real.cc (real_isdenormal) and drop all the denormal variables, so:
> - bool denormal = ...;
>   switch (...)
>     {
> ...
>     case rvc_normal:
> -     if (denormal)
> +     if (real_isdenormal (r))
>         ...
>     }

Sure.

Attached patch in testing.

Aldy
Jakub Jelinek Nov. 8, 2022, 2:03 p.m. UTC | #28
On Tue, Nov 08, 2022 at 03:02:40PM +0100, Aldy Hernandez wrote:
> From d02ce8eaf16d2fc6db6472268fd962e09c2fd81e Mon Sep 17 00:00:00 2001
> From: Aldy Hernandez <aldyh@redhat.com>
> Date: Mon, 7 Nov 2022 14:18:57 +0100
> Subject: [PATCH] Provide normalized and denormal format version of
>  real_isdenormal.
> 
> Implement a variant of real_isdenormal() to be used within real.cc
> where the argument is known to be in denormal format.  Rewrite
> real_isdenormal() for use outside of real.cc where the argument is
> known to be normalized.
> 
> gcc/ChangeLog:
> 
> 	* real.cc (real_isdenormal): New.
> 	(encode_ieee_single): Call real_isdenormal.
> 	(encode_ieee_double): Same.
> 	(encode_ieee_extended): Same.
> 	(encode_ieee_quad): Same.
> 	(encode_ieee_half): Same.
> 	(encode_arm_bfloat_half): Same.
> 	* real.h (real_isdenormal): Add mode argument.  Rewrite for
> 	normalized values.
> 	* value-range.cc (frange::flush_denormals_to_zero): Pass mode to
> 	real_isdenormal.

LGTM, thanks.

	Jakub
Aldy Hernandez Nov. 8, 2022, 2:06 p.m. UTC | #29
On Tue, Nov 8, 2022 at 2:50 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Tue, Nov 08, 2022 at 02:47:35PM +0100, Aldy Hernandez wrote:
> > Well, perhaps we should just nuke update_nan_sign() altogether, and
> > always keep the sign varying?
> >
> > inline bool
> > propagate_nans (frange &r, const frange &op1, const frange &op2)
> > {
> >   if (op1.known_isnan () || op2.known_isnan ())
> >     {
> >       r.set_nan (op1.type ());
> >       return true;
> >     }
> >   return false;
> > }
> >
> > I'm fine either way.  The less code the better :).
>
> Yes, but you had 2 callers, so something needs to be done also if
> in foperator_plus::fold_range.

We can also remove the update_nan_sign() in the other call because the
r.set() before it sets a default NAN (with a varying sign).

Attached patch in testing.

Aldy
Jakub Jelinek Nov. 8, 2022, 2:11 p.m. UTC | #30
On Tue, Nov 08, 2022 at 03:06:53PM +0100, Aldy Hernandez wrote:
> +// If either operand is a NAN, set R to the combination of both NANs
> +// signwise and return TRUE.

This comment doesn't describe what it does now.
If either operand is a NAN, set R to NAN with unspecified sign bit and return
TRUE.
?

Other than this LGTM.

	Jakub
Aldy Hernandez Nov. 8, 2022, 2:14 p.m. UTC | #31
On Tue, Nov 8, 2022 at 3:11 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Tue, Nov 08, 2022 at 03:06:53PM +0100, Aldy Hernandez wrote:
> > +// If either operand is a NAN, set R to the combination of both NANs
> > +// signwise and return TRUE.
>
> This comment doesn't describe what it does now.
> If either operand is a NAN, set R to NAN with unspecified sign bit and return
> TRUE.
> ?

OMG, I suck!

// If either operand is a NAN, set R to NAN and return TRUE.

Tests on-going :).

Aldy
Andrew Waterman Nov. 8, 2022, 5:44 p.m. UTC | #32
On Tue, Nov 8, 2022 at 3:20 AM Jakub Jelinek via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On Mon, Nov 07, 2022 at 04:41:23PM +0100, Aldy Hernandez wrote:
> > As suggested upthread, I have also adjusted update_nan_sign() to drop
> > the NAN sign to VARYING if both operands are NAN.  As an optimization
> > I keep the sign if both operands are NAN and have the same sign.
>
> For NaNs this still relies on something IEEE754 doesn't guarantee,
> as I cited, after a binary operation the sign bit of the NaN is
> unspecified, whether there is one NaN operand or two.
> It might be that all CPUs handle it the way you've implemented
> (that for one NaN operand the sign of NaN result will be the same
> as that NaN operand and for two it will be the sign of one of the two
> NaNs operands, never something else), but I think we'd need to check
> more than one implementation for that (I've only tried x86_64 and thus
> SSE behavior in it), so one would need to test i387 long double behavior
> too, ARM/AArch64, PowerPC, s390{,x}, RISCV, ...
> The guarantee given by IEEE754 is only for those copy, negate, abs, copySign
> operations, so copying values around, NEG_EXPR, ABS_EXPR, __builtin_fabs*,
> __builtin_copysign*.

FWIW, RISC-V canonicalizes NaNs by clearing the sign bit; the signs of
the input NaNs do not factor in.

>
> Otherwise LGTM (but would be nice to get into GCC13 not just
> +, but also -, *, /, sqrt at least).
>
>         Jakub
>
Jakub Jelinek Nov. 8, 2022, 6:11 p.m. UTC | #33
On Tue, Nov 08, 2022 at 09:44:40AM -0800, Andrew Waterman wrote:
> On Tue, Nov 8, 2022 at 3:20 AM Jakub Jelinek via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > On Mon, Nov 07, 2022 at 04:41:23PM +0100, Aldy Hernandez wrote:
> > > As suggested upthread, I have also adjusted update_nan_sign() to drop
> > > the NAN sign to VARYING if both operands are NAN.  As an optimization
> > > I keep the sign if both operands are NAN and have the same sign.
> >
> > For NaNs this still relies on something IEEE754 doesn't guarantee,
> > as I cited, after a binary operation the sign bit of the NaN is
> > unspecified, whether there is one NaN operand or two.
> > It might be that all CPUs handle it the way you've implemented
> > (that for one NaN operand the sign of NaN result will be the same
> > as that NaN operand and for two it will be the sign of one of the two
> > NaNs operands, never something else), but I think we'd need to check
> > more than one implementation for that (I've only tried x86_64 and thus
> > SSE behavior in it), so one would need to test i387 long double behavior
> > too, ARM/AArch64, PowerPC, s390{,x}, RISCV, ...
> > The guarantee given by IEEE754 is only for those copy, negate, abs, copySign
> > operations, so copying values around, NEG_EXPR, ABS_EXPR, __builtin_fabs*,
> > __builtin_copysign*.
> 
> FWIW, RISC-V canonicalizes NaNs by clearing the sign bit; the signs of
> the input NaNs do not factor in.

Just for binary operations and some unary, or also the ones that
IEEE754 spells out (moves, negations, absolute value and copysign)?

	Jakub
Andrew Waterman Nov. 8, 2022, 6:17 p.m. UTC | #34
On Tue, Nov 8, 2022 at 10:11 AM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Tue, Nov 08, 2022 at 09:44:40AM -0800, Andrew Waterman wrote:
> > On Tue, Nov 8, 2022 at 3:20 AM Jakub Jelinek via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> > >
> > > On Mon, Nov 07, 2022 at 04:41:23PM +0100, Aldy Hernandez wrote:
> > > > As suggested upthread, I have also adjusted update_nan_sign() to drop
> > > > the NAN sign to VARYING if both operands are NAN.  As an optimization
> > > > I keep the sign if both operands are NAN and have the same sign.
> > >
> > > For NaNs this still relies on something IEEE754 doesn't guarantee,
> > > as I cited, after a binary operation the sign bit of the NaN is
> > > unspecified, whether there is one NaN operand or two.
> > > It might be that all CPUs handle it the way you've implemented
> > > (that for one NaN operand the sign of NaN result will be the same
> > > as that NaN operand and for two it will be the sign of one of the two
> > > NaNs operands, never something else), but I think we'd need to check
> > > more than one implementation for that (I've only tried x86_64 and thus
> > > SSE behavior in it), so one would need to test i387 long double behavior
> > > too, ARM/AArch64, PowerPC, s390{,x}, RISCV, ...
> > > The guarantee given by IEEE754 is only for those copy, negate, abs, copySign
> > > operations, so copying values around, NEG_EXPR, ABS_EXPR, __builtin_fabs*,
> > > __builtin_copysign*.
> >
> > FWIW, RISC-V canonicalizes NaNs by clearing the sign bit; the signs of
> > the input NaNs do not factor in.
>
> Just for binary operations and some unary, or also the ones that
> IEEE754 spells out (moves, negations, absolute value and copysign)?

I should've been more specific in my earlier email: I was referring to
the arithmetic operators.  Copysign and friends do not canonicalize
NaNs.

>
>         Jakub
>
Aldy Hernandez Nov. 8, 2022, 11:05 p.m. UTC | #35
Sigh, one more thing.

There are further possibilities for a NAN result, even if the operands
are !NAN and the result from frange_arithmetic is free of NANs.
Adding different signed infinities.

For example, [-INF,+INF] + [-INF,+INF] has the possibility of adding
-INF and +INF, which is a NAN.  Since we end up calling frange
arithmetic on the lower bounds and then on the upper bounds, we miss
this, and mistakenly think we're free of NANs.

I have a patch in testing, but FYI, in case anyone notices this before
I get around to it tomorrow.

Aldy

On Tue, Nov 8, 2022 at 3:11 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Tue, Nov 08, 2022 at 03:06:53PM +0100, Aldy Hernandez wrote:
> > +// If either operand is a NAN, set R to the combination of both NANs
> > +// signwise and return TRUE.
>
> This comment doesn't describe what it does now.
> If either operand is a NAN, set R to NAN with unspecified sign bit and return
> TRUE.
> ?
>
> Other than this LGTM.
>
>         Jakub
>
Aldy Hernandez Nov. 9, 2022, 6:59 a.m. UTC | #36
This patch fixes the oversight.

Tested on x86-64 Linux.

Pushed.

On Wed, Nov 9, 2022 at 12:05 AM Aldy Hernandez <aldyh@redhat.com> wrote:
>
> Sigh, one more thing.
>
> There are further possibilities for a NAN result, even if the operands
> are !NAN and the result from frange_arithmetic is free of NANs.
> Adding different signed infinities.
>
> For example, [-INF,+INF] + [-INF,+INF] has the possibility of adding
> -INF and +INF, which is a NAN.  Since we end up calling frange
> arithmetic on the lower bounds and then on the upper bounds, we miss
> this, and mistakenly think we're free of NANs.
>
> I have a patch in testing, but FYI, in case anyone notices this before
> I get around to it tomorrow.
>
> Aldy
>
> On Tue, Nov 8, 2022 at 3:11 PM Jakub Jelinek <jakub@redhat.com> wrote:
> >
> > On Tue, Nov 08, 2022 at 03:06:53PM +0100, Aldy Hernandez wrote:
> > > +// If either operand is a NAN, set R to the combination of both NANs
> > > +// signwise and return TRUE.
> >
> > This comment doesn't describe what it does now.
> > If either operand is a NAN, set R to NAN with unspecified sign bit and return
> > TRUE.
> > ?
> >
> > Other than this LGTM.
> >
> >         Jakub
> >
diff mbox series

Patch

diff --git a/gcc/range-op-float.cc b/gcc/range-op-float.cc
index 23e0f5ef4e2..a967c4da393 100644
--- a/gcc/range-op-float.cc
+++ b/gcc/range-op-float.cc
@@ -200,6 +200,124 @@  frelop_early_resolve (irange &r, tree type,
 	  && relop_early_resolve (r, type, op1, op2, rel, my_rel));
 }
 
+// If R contains a NAN of unknown sign, update the NAN's signbit
+// depending on two operands.
+
+inline void
+update_nan_sign (frange &r, const frange &op1, const frange &op2)
+{
+  if (!r.maybe_isnan ())
+    return;
+
+  bool op1_nan = op1.maybe_isnan ();
+  bool op2_nan = op2.maybe_isnan ();
+  bool sign1, sign2;
+
+  gcc_checking_assert (!r.nan_signbit_p (sign1));
+  if (op1_nan && op2_nan)
+    {
+      if (op1.nan_signbit_p (sign1) && op2.nan_signbit_p (sign2))
+	r.update_nan (sign1 | sign2);
+    }
+  else if (op1_nan)
+    {
+      if (op1.nan_signbit_p (sign1))
+	r.update_nan (sign1);
+    }
+  else if (op2_nan)
+    {
+      if (op2.nan_signbit_p (sign2))
+	r.update_nan (sign2);
+    }
+}
+
+// If either operand is a NAN, set R to the combination of both NANs
+// signwise and return TRUE.
+
+inline bool
+propagate_nans (frange &r, const frange &op1, const frange &op2)
+{
+  if (op1.known_isnan () || op2.known_isnan ())
+    {
+      r.set_nan (op1.type ());
+      update_nan_sign (r, op1, op2);
+      return true;
+    }
+  return false;
+}
+
+// Set VALUE to its next real value, or INF if the operation overflows.
+
+inline void
+frange_nextafter (enum machine_mode mode,
+		  REAL_VALUE_TYPE &value,
+		  const REAL_VALUE_TYPE &inf)
+{
+  const real_format *fmt = REAL_MODE_FORMAT (mode);
+  REAL_VALUE_TYPE tmp;
+  bool overflow = real_nextafter (&tmp, fmt, &value, &inf);
+  if (overflow)
+    value = inf;
+  else
+    value = tmp;
+}
+
+// Like real_arithmetic, but round the result to INF if the operation
+// produced inexact results.
+//
+// ?? There is still one problematic case, i387.  With
+// -fexcess-precision=standard we perform most SF/DFmode arithmetic in
+// XFmode (long_double_type_node), so that case is OK.  But without
+// -mfpmath=sse, all the SF/DFmode computations are in XFmode
+// precision (64-bit mantissa) and only occassionally rounded to
+// SF/DFmode (when storing into memory from the 387 stack).  Maybe
+// this is ok as well though it is just occassionally more precise. ??
+
+static void
+frange_arithmetic (enum tree_code code, tree type,
+		   REAL_VALUE_TYPE &result,
+		   const REAL_VALUE_TYPE &op1,
+		   const REAL_VALUE_TYPE &op2,
+		   const REAL_VALUE_TYPE &inf)
+{
+  REAL_VALUE_TYPE value;
+  enum machine_mode mode = TYPE_MODE (type);
+  bool mode_composite = MODE_COMPOSITE_P (mode);
+
+  bool inexact = real_arithmetic (&value, code, &op1, &op2);
+  real_convert (&result, mode, &value);
+
+  // If real_convert above has rounded an inexact value to towards
+  // inf, we can keep the result as is, otherwise we'll adjust by 1 ulp
+  // later (real_nextafter).
+  bool rounding = (flag_rounding_math
+		   && (real_isneg (&inf)
+		       ? real_less (&result, &value)
+		       : !real_less (&value, &result)));
+
+  // Be extra careful if there may be discrepancies between the
+  // compile and runtime results.
+  if ((rounding || mode_composite)
+      && (inexact || !real_identical (&result, &value)))
+    {
+      if (mode_composite)
+	{
+	  bool denormal = (result.sig[SIGSZ-1] & SIG_MSB) == 0;
+	  if (denormal)
+	    {
+	      REAL_VALUE_TYPE tmp;
+	      real_convert (&tmp, DFmode, &value);
+	      frange_nextafter (DFmode, tmp, inf);
+	      real_convert (&result, mode, &tmp);
+	    }
+	  else
+	    frange_nextafter (mode, result, inf);
+	}
+      else
+	frange_nextafter (mode, result, inf);
+    }
+}
+
 // Crop R to [-INF, MAX] where MAX is the maximum representable number
 // for TYPE.
 
@@ -1620,6 +1738,58 @@  foperator_unordered_equal::op1_range (frange &r, tree type,
   return true;
 }
 
+class foperator_plus : public range_operator_float
+{
+  using range_operator_float::fold_range;
+
+public:
+  bool fold_range (frange &r, tree type,
+		   const frange &lh,
+		   const frange &rh,
+		   relation_kind rel = VREL_VARYING) const final override;
+} fop_plus;
+
+bool
+foperator_plus::fold_range (frange &r, tree type,
+			    const frange &op1, const frange &op2,
+			    relation_kind) const
+{
+  if (empty_range_varying (r, type, op1, op2))
+    return true;
+  if (propagate_nans (r, op1, op2))
+    return true;
+
+  REAL_VALUE_TYPE lb, ub;
+  frange_arithmetic (PLUS_EXPR, type, lb,
+		     op1.lower_bound (), op2.lower_bound (), dconstninf);
+  frange_arithmetic (PLUS_EXPR, type, ub,
+		     op1.upper_bound (), op2.upper_bound (), dconstinf);
+
+  // Handle possible NANs by saturating to the appropriate INF if only
+  // one end is a NAN.  If both ends are a NAN, just return a NAN.
+  bool lb_nan = real_isnan (&lb);
+  bool ub_nan = real_isnan (&ub);
+  if (lb_nan && ub_nan)
+    {
+      r.set_nan (type);
+      return true;
+    }
+  if (lb_nan)
+    lb = dconstninf;
+  else if (ub_nan)
+    ub = dconstinf;
+
+  // The setter sets NAN by default for HONOR_NANS.
+  r.set (type, lb, ub);
+
+  if (lb_nan || ub_nan)
+    update_nan_sign (r, op1, op2);
+  else if (!op1.maybe_isnan () && !op2.maybe_isnan ())
+    r.clear_nan ();
+
+  return true;
+}
+
 // Instantiate a range_op_table for floating point operations.
 static floating_op_table global_floating_table;
 
@@ -1652,6 +1822,7 @@  floating_op_table::floating_op_table ()
 
   set (ABS_EXPR, fop_abs);
   set (NEGATE_EXPR, fop_negate);
+  set (PLUS_EXPR, fop_plus);
 }
 
 // Return a pointer to the range_operator_float instance, if there is
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp-float-plus.c b/gcc/testsuite/gcc.dg/tree-ssa/vrp-float-plus.c
new file mode 100644
index 00000000000..3739ea4e810
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp-float-plus.c
@@ -0,0 +1,21 @@ 
+// { dg-do compile }
+// { dg-options "-O2 -fno-tree-fre -fno-tree-dominator-opts -fno-thread-jumps -fdump-tree-vrp2" }
+
+double BG_SplineLength ()
+{
+  double lastPoint;
+  double i;
+
+  for (i = 0.01;i<=1;i+=0.1f)
+    if (!(i != 0.0))
+      {
+        lastPoint = i;
+      }
+    else
+      {
+        lastPoint = 2;
+      }
+  return lastPoint;
+}
+
+// { dg-final { scan-tree-dump-times "return 2\\.0e" 1 "vrp2" } }