Message ID: alpine.DEB.2.02.1906221743430.16432@grove.saclay.inria.fr
State:      New
Series:     Start implementing -frounding-math
On June 22, 2019 6:10:15 PM GMT+02:00, Marc Glisse <marc.glisse@inria.fr> wrote:
> Hello,
>
> as discussed in the PR, this seems like a simple enough approach to
> handle FENV functionality safely, while keeping it possible to implement
> optimizations in the future.
>
> Some key missing things:
> - handle C, not just C++ (I don't care, but some people probably do)

As you tackle C++, what does the standard say about constexpr contexts and
FENV? That is, what is the FP environment at compile time? (I suppose the
FENV-modifying functions are not declared constexpr.)

> - handle vectors (for complex, I don't know what it means)
>
> Then flag_trapping_math should also enable this path, meaning that we
> should stop making it the default, or performance will suffer.

Do we need N variants of the functions to really encode the FP options into
the IL, and thus allow inlining of, say, functions with different
signed-zero flags?

I didn't look at the patch, but I suppose you rely on RTL not to do code
motion across FENV modifications and not to fold constants? That is, don't
we really need unspec_volatile variant patterns for the operations?

Thanks for working on this.

Richard.

> Nice to have:
> - parse the fenv_access pragma and make it set flag_rounding_math or
>   similar.
> - sqrt
>
> All the optimizations can come later (I count having different functions
> for flag_rounding_math and flag_trapping_math as one such optimization).
>
> I put the lowering in its own pass, because it needs to run at -O0 and
> there aren't that many passes at -O0 where I could put it. It would
> probably be better to handle this directly during expansion, but with my
> knowledge of the compiler it was easier to lower it before.
>
> This patch passes bootstrap+regtest on x86_64. I expect it may break a few
> testcases on some targets (arm?) that check that we optimize some things
> even with -frounding-math, but as far as I am concerned those do not count
> as regressions, because -frounding-math was never really implemented, so I
> would encourage target maintainers to xfail those for now.
>
> I'd like to handle this incrementally, rather than wait for a mega-patch
> that does everything, if that's ok. For instance, I didn't handle vectors
> in this first patch because the interaction with vector lowering was not
> completely obvious. Plus it may help get others to implement some parts
> of it ;-)
>
> 2019-06-24  Marc Glisse  <marc.glisse@inria.fr>
>
> 	PR middle-end/34678
> gcc/cp/
> 	* typeck.c (cp_build_binary_op): Generate internal functions for float
> 	operations with -frounding-math.
>
> gcc/
> 	* Makefile.in: Handle new file gimple-lower-fenv.cc.
> 	* gimple-lower-fenv.cc: New file.
> 	* internal-fn.c (expand_FENV_PLUS, expand_FENV_MINUS, expand_FENV_MULT,
> 	expand_FENV_DIV): New functions.
> 	* internal-fn.def (FENV_PLUS, FENV_MINUS, FENV_MULT, FENV_DIV): New
> 	internal functions.
> 	* passes.def (pass_lower_fenv): New pass.
> 	* tree-pass.h (make_pass_lower_fenv): Declare new function.
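Richard's constexpr question can be made concrete with a short sketch (the variable names here are invented for illustration) of the two kinds of initializers the thread goes on to discuss: one that the language requires to be a constant expression, and one that merely tries constexpr evaluation and stays valid if it fails.

```cpp
// Illustration only: which rounding mode applies to each initializer?

// Required to be a constant expression: the compiler must fold this, and
// the open question is which rounding mode that folding should assume
// (presumably round-to-nearest, the mode at program start).
constexpr double third_constexpr = 1. / 3.;

// Merely *tries* constexpr evaluation; the program remains valid if that
// fails, so under a dynamic rounding direction this one should arguably
// be evaluated at run time rather than folded.
const double third_const = 1. / 3.;
```

Today both fold to the same round-to-nearest value; the discussion below is about whether the second one should keep doing so under -frounding-math.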
On Sat, 22 Jun 2019, Richard Biener wrote:

> On June 22, 2019 6:10:15 PM GMT+02:00, Marc Glisse <marc.glisse@inria.fr> wrote:
>> Some key missing things:
>> - handle C, not just C++ (I don't care, but some people probably do)
>
> As you tackle C++, what does the standard say about constexpr contexts and
> FENV? That is, what's the FP environment at compile time (I suppose the
> FENV-modifying functions are not declared constexpr).

The C++ standard doesn't care much about fenv:

  [Note: This document does not require an implementation to support the
  FENV_ACCESS pragma; it is implementation-defined (15.8) whether the pragma
  is supported. As a consequence, it is implementation-defined whether
  these functions can be used to test floating-point status flags, set
  floating-point control modes, or run under non-default mode settings. If
  the pragma is used to enable control over the floating-point environment,
  this document does not specify the effect on floating-point evaluation in
  constant expressions. — end note]

We should care about the C standard, and do whatever makes sense for C++
without expecting the C++ standard to tell us exactly what that is. We can
check what Visual Studio and Intel do, but we don't have to follow them.

-frounding-math is supposed to be equivalent to "#pragma STDC FENV_ACCESS ON"
covering the whole program.

For constant expressions, I see a difference between

  constexpr double third = 1. / 3.;

which really needs to be done at compile time, and

  const double third = 1. / 3.;

which will try to evaluate the rhs as constexpr, but where the program is
still valid if that fails. The second one clearly should refuse to be
evaluated at compile time if we are specifying a dynamic rounding
direction. For the first one, I am not sure. I guess you should only write
that in "fenv_access off" regions, and I wouldn't mind a compile error.

Note that C2x adds a pragma fenv_round that specifies a rounding direction
for a region of code, which seems relevant for constant expressions. That
pragma looks hard, but maybe some pieces would be nice to add.

>> - handle vectors (for complex, I don't know what it means)
>>
>> Then flag_trapping_math should also enable this path, meaning that we
>> should stop making it the default, or performance will suffer.
>
> Do we need N variants of the functions to really encode the FP options into
> the IL, and thus allow inlining of, say, functions with different
> signed-zero flags?

Not sure what you are suggesting. I am essentially creating a new
tree_code (well, an internal function) for an addition-like function that
actually reads/writes memory, so it should be orthogonal to inlining, and
only the front end should care about -frounding-math. I didn't think about
the interaction with signed zeros. Ah, you mean
IFN_FENV_ADD_WITH_ROUNDING_AND_SIGNED_ZEROS, etc.? The ones I am starting
from are supposed to be safe-for-everything. As refinement, I was thinking
in two directions:
* add a third, constant argument, where we can specify extra info;
* add a variant for the case where the function is pure (because I expect
  that's easier on the compiler than "pure if (arg3 & 8) != 0").
I am not sure more variants are needed.

Also, while rounding clearly applies to an operation, signed zero kind of
seems to apply to a variable, and in an operation I don't really know if
it means that I can pretend that an argument of -0. is +0. (I can return
+inf for 1/-0.) or if it means I can return 0. when the operation should
return -0. Probably both... If we have just -fsigned-zeros but no rounding
or trapping, the penalty of using an IFN would be bad. But indeed, inlining
functions with different -f(no-)signed-zeros forces us to use
-fsigned-zeros for the whole merged function if we don't encode it in the
operations. Hmm.

> I didn't look at the patch, but I suppose you rely on RTL not to do code
> motion across FENV modifications and not to fold constants?

No, I rely on asm volatile to prevent that, as in your recent hack, except
that the asm only appears near expansion. I am trying to start from
something safe and refine with optimizations, no subtlety.

> That is, don't we really need unspec_volatile variant patterns for the
> operations?

Yes. One future optimization (that I listed in the PR) is to let targets
expand those IFNs as they like (without the asm barriers), using some
unspec_volatile. I hope we can get there, although just letting targets
replace "=g" with whatever in the asm would already get most of the
benefits.

I just thought of one issue for vector intrinsics, say _mm_add_pd, where
the fenv_access status that should matter is that of the caller, not the
one in emmintrin.h. But since I don't have the pragma or vectors, that can
wait.
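The "asm volatile near expansion" scheme Marc describes can be sketched at the source level: the operands and the result are laundered through empty asm statements so the optimizers can neither constant-fold the operation nor move it across fenv-modifying calls. This is only an illustrative model of the kludge (the function name is invented), not the actual expand_FENV_* code.

```cpp
// Source-level model of the asm-barrier kludge: hide the operands and the
// result from the optimizer so that x + y can neither be folded at compile
// time nor moved across fesetround()/feclearexcept() calls.
static inline double fenv_plus(double x, double y)
{
    asm volatile ("" : "+g" (x), "+g" (y)); // operands become "unknown"
    double r = x + y;                       // the actual FP addition
    asm volatile ("" : "+g" (r));           // result becomes "unknown"
    return r;
}
```

The empty asm bodies emit no instructions; the "+g" read-write constraints and the volatile qualifier are what pin the addition in place.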
On Sun, Jun 23, 2019 at 12:22 AM Marc Glisse <marc.glisse@inria.fr> wrote:
>
> On Sat, 22 Jun 2019, Richard Biener wrote:
>
> [...]
>
> The C++ standard doesn't care much about fenv:
>
>   [Note: This document does not require an implementation to support the
>   FENV_ACCESS pragma; it is implementation-defined (15.8) whether the pragma
>   is supported. As a consequence, it is implementation-defined whether
>   these functions can be used to test floating-point status flags, set
>   floating-point control modes, or run under non-default mode settings. If
>   the pragma is used to enable control over the floating-point environment,
>   this document does not specify the effect on floating-point evaluation in
>   constant expressions. — end note]

Oh, I see.

> We should care about the C standard, and do whatever makes sense for C++
> without expecting the C++ standard to tell us exactly what that is. We can
> check what Visual Studio and Intel do, but we don't have to follow them.

This makes it somewhat odd to implement this for C++ first and not C, but hey ;)

> -frounding-math is supposed to be equivalent to "#pragma STDC FENV_ACCESS ON"
> covering the whole program.
>
> For constant expressions, I see a difference between
>   constexpr double third = 1. / 3.;
> which really needs to be done at compile time, and
>   const double third = 1. / 3.;
> which will try to evaluate the rhs as constexpr, but where the program is
> still valid if that fails. [...]
>
> Note that C2x adds a pragma fenv_round that specifies a rounding direction
> for a region of code, which seems relevant for constant expressions. That
> pragma looks hard, but maybe some pieces would be nice to add.

Hmm. My thinking was along the lines that, at the start of main(), the
C abstract machine might specify that the initial rounding mode (and
exception state) is implementation-defined, and that all constant
expressions are evaluated while in this state. So we could define that to
be round-to-nearest and simply fold all constants, in contexts we are
allowed to evaluate at compile time, as we see them?

I guess fenv_round aims at using a pragma to change the rounding mode?

> [...] Ah, you mean IFN_FENV_ADD_WITH_ROUNDING_AND_SIGNED_ZEROS, etc.?

Yeah. Basically, the goal is to have the IL fully defined on its own,
without having its semantics depend on flag_*.

> The ones I am starting from are supposed to be safe-for-everything. As
> refinement, I was thinking in two directions:
> * add a third, constant argument, where we can specify extra info;
> * add a variant for the case where the function is pure (because I expect
>   that's easier on the compiler than "pure if (arg3 & 8) != 0").
> I am not sure more variants are needed.

For optimization, having an ADD_ROUND_TO_ZERO (or the extra params
specifying an explicit rounding mode) might be interesting, since on x86
there are now instructions with rounding-mode control bits.

> Also, while rounding clearly applies to an operation, signed zero kind of
> seems to apply to a variable [...] But indeed, inlining functions with
> different -f(no-)signed-zeros forces us to use -fsigned-zeros for the
> whole merged function if we don't encode it in the operations. Hmm.

Yeah. I guess we need to think about each and every case and how
to deal with it. There are denormals and flush-to-zero (not covered by
POSIX fenv modification, IIRC) and a lot of math optimization flags
that do not map to FP operations directly...

> > I didn't look at the patch, but I suppose you rely on RTL not to do code
> > motion across FENV modifications and not to fold constants?
>
> No, I rely on asm volatile to prevent that, as in your recent hack, except
> that the asm only appears near expansion. I am trying to start from
> something safe and refine with optimizations, no subtlety.

Ah, OK. So indeed, instead of a new pass doing the lowering on GIMPLE,
this should ideally be done by populating expand_FENV_* appropriately.

> > That is, don't we really need unspec_volatile variant patterns for the
> > operations?
>
> Yes. One future optimization (that I listed in the PR) is to let targets
> expand those IFNs as they like (without the asm barriers), using some
> unspec_volatile. [...]
>
> I just thought of one issue for vector intrinsics, say _mm_add_pd, where
> the fenv_access status that should matter is that of the caller, not the
> one in emmintrin.h. But since I don't have the pragma or vectors, that can
> wait.

True. I guess for the intrinsic headers we could invent some new attribute
(or assume such semantics for always_inline, which IIRC they are) saying
that a function inherits options from its caller (difficult if not inlined;
it would imply cloning, thus always_inline again...).

On the patch, I'd rename _DIV to _RDIV (to match the tree code we are
dealing with). You miss _NEGATE, and also _FIX_TRUNC and _FLOAT, in case
those might trap with -ftrapping-math. There are also internal functions
for POW, FMOD and others which are ECF_CONST but may not end up being
folded from their builtin counterparts with -frounding-math.

I guess builtins need the same treatment for -ftrapping-math as they do
for -frounding-math. I think you already mentioned that the default of
this flag doesn't make much sense (well, the flag isn't fully
honored/implemented).

So I think the patch is a good start, but I'd say we should not introduce
the new pass and instead expand to the asm() kludge directly, which would
also make it easier to handle some ops as unspecs in the target.

In the future, an optimize_fenv pass could annotate the calls with the
optional specifier if it detects regions with known exception/rounding
state, but it still may not rewrite the internal functions back to plain
operations (at least before IPA), since the IFNs are required so that the
FENV-modifying operations act as code-motion barriers.

Thanks,
Richard.

> --
> Marc Glisse
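Why folding constants regardless of rounding mode is observable at all can be shown with a small demo (helper name invented here): when the operands are hidden from the compiler, the very same division produces different values under different dynamic rounding modes. The volatile qualifiers stand in for the missing FENV_ACCESS support, keeping the division at run time.

```cpp
#include <cfenv>

// Evaluate 1./3. under a given dynamic rounding mode.  volatile keeps the
// compiler from folding the division at compile time, since GCC does not
// yet honor #pragma STDC FENV_ACCESS.
double third_rounded(int mode)
{
    volatile double one = 1.0, three = 3.0;
    int old = std::fegetround();  // save the caller's rounding mode
    std::fesetround(mode);
    volatile double r = one / three;
    std::fesetround(old);         // restore it
    return r;
}
```

On hardware with directed rounding, third_rounded(FE_UPWARD) is strictly greater than third_rounded(FE_DOWNWARD); a compiler that folds 1./3. in round-to-nearest erases that difference.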
On Mon, 24 Jun 2019, Richard Biener wrote:

> This makes it somewhat odd to implement this for C++ first and not C, but hey ;)

Well, I maintain a part of CGAL, a C++ library, that uses interval
arithmetic and thus relies on a non-default rounding direction. I am
trying to prepare this dog food so I can eat it myself...

> Hmm. My thinking was along the lines that, at the start of main(), the
> C abstract machine might specify that the initial rounding mode (and
> exception state) is implementation-defined, and that all constant
> expressions are evaluated while in this state. So we could define that
> to be round-to-nearest and simply fold all constants, in contexts we are
> allowed to evaluate at compile time, as we see them?

There are way too many such contexts. In C++, any initializer is
constexpr-evaluated if possible (PR 85746 shows that this is bad for
__builtin_constant_p), and I do want

  double d = 1. / 3;

to depend on the dynamic rounding direction. I'd rather err on the other
extreme and only fold when we are forced to, say

  constexpr double d = 1. / 3;

or even reject it because it is inexact, if pragmas put us in a region
with dynamic rounding.

> I guess fenv_round aims at using a pragma to change the rounding mode?

Yes. You can specify either a fixed rounding mode, or "dynamic". In the
first case, it overrides the dynamic rounding mode.

> For optimization, having an ADD_ROUND_TO_ZERO (or the extra params
> specifying an explicit rounding mode) might be interesting, since on x86
> there are now instructions with rounding-mode control bits.

Yes. Pragma fenv_round would match well with that. On the other hand, it
would be painful for platforms that do not have such instructions, forcing
us to generate plenty of fe[gs]etround calls, and probably to have a pass
that tries to reduce their number.

Side remark: I am sad that Intel added rounded versions for scalars and
512-bit vectors but not for the intermediate sizes, while I am most
interested in 128 bits. Masking most of the 512 bits still causes the
dreaded clock slow-down.

> Yeah. I guess we need to think about each and every case and how
> to deal with it. There are denormals and flush-to-zero (not covered by
> POSIX fenv modification, IIRC) and a lot of math optimization flags
> that do not map to FP operations directly...

If we really try to model all that, at some point we may as well remove
PLUS_EXPR for floats...

  .FENV_PLUS (x, y, flags)

where flags is a bitfield that specifies whether we care about signed
zeros, signalling NaNs, what the rounding is (dynamic, don't care, up,
down, etc.), whether we care about exceptions, whether we can do unsafe
optimizations, whether we can contract +* into an fma, etc. That would
force us to rewrite a lot of optimizations :-( And CSE might become
complicated with several expressions that differ only in their flags.

.FENV_PLUS (x, y) was supposed to be equivalent to .FENV_PLUS (x, y,
safeflags), where safeflags are the strictest flags possible, while
leaving existing stuff like -funsafe-math-optimizations alone (so no
regression), with the idea that the version with flags would come later.

> Ah, OK. So indeed, instead of a new pass doing the lowering on GIMPLE,
> this should ideally be done by populating expand_FENV_* appropriately.

Yes, I was lazy, because it means I need to understand better how
expansion works :-(

> On the patch, I'd rename _DIV to _RDIV (to match the tree code we are
> dealing with). You miss _NEGATE,

True. I am only interested in -frounding-math, so my first reaction was
that I didn't need to do anything for NEGATE, but indeed, with a
signalling NaN anything can have an effect.

> and also _FIX_TRUNC and _FLOAT, in case those might trap with
> -ftrapping-math.

I don't know much about fixed point, and I didn't think about conversions
yet. I'll have to check what the C standard says about those.

> There are also internal functions for POW, FMOD and others which are
> ECF_CONST but may not end up being folded from their builtin
> counterparts with -frounding-math.

I don't know how far this needs to go. SQRT has correctly rounded
instructions on several targets, so it is relevant. But unless your libm
provides a correctly rounded implementation of pow, the compiler could
also ignore it. The new pragma fenv_round is scary, in part because it
seems to imply that all math functions need to have a correctly rounding
implementation.

> I guess builtins need the same treatment for -ftrapping-math as they do
> for -frounding-math. I think you already mentioned that the default of
> this flag doesn't make much sense (well, the flag isn't fully
> honored/implemented).

PR 54192 (coincidentally, it caused a missed vectorization in
https://stackoverflow.com/a/56681744/1918193 last week).

> So I think the patch is a good start, but I'd say we should not introduce
> the new pass and instead expand to the asm() kludge directly, which would
> also make it easier to handle some ops as unspecs in the target.

This also answers what should be done with vectors: I'll need to add code
to tree-vect-generic for the new functions.
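The interval-arithmetic use case Marc cites (CGAL) is the canonical consumer of directed rounding: rounding down for lower bounds and up for upper bounds yields a machine interval guaranteed to contain the exact result — exactly the pattern that -frounding-math must keep the compiler from folding or reordering. A minimal sketch in that spirit (names invented, volatile standing in for the missing FENV_ACCESS support):

```cpp
#include <cfenv>

// Minimal interval division: compute [lo, hi] guaranteed to contain the
// exact quotient a / b (for b > 0), by redirecting the dynamic rounding
// mode around each hardware division.
struct interval { double lo, hi; };

interval interval_div(double a, double b)
{
    volatile double va = a, vb = b;   // keep the divisions at run time
    int old = std::fegetround();
    std::fesetround(FE_DOWNWARD);
    double lo = va / vb;              // rounded toward -infinity
    std::fesetround(FE_UPWARD);
    double hi = va / vb;              // rounded toward +infinity
    std::fesetround(old);
    return {lo, hi};
}
```

If the compiler constant-folds either division in round-to-nearest, or hoists one across the fesetround calls, the containment guarantee is silently lost — which is the correctness bug this whole patch series exists to prevent.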
On Mon, Jun 24, 2019 at 3:47 PM Marc Glisse <marc.glisse@inria.fr> wrote:
>
> On Mon, 24 Jun 2019, Richard Biener wrote:
>
> [...]
>
> Well, I maintain a part of CGAL, a C++ library, that uses interval
> arithmetic and thus relies on a non-default rounding direction. I am
> trying to prepare this dog food so I can eat it myself...

;)

> There are way too many such contexts. In C++, any initializer is
> constexpr-evaluated if possible (PR 85746 shows that this is bad for
> __builtin_constant_p), and I do want
>   double d = 1. / 3;
> to depend on the dynamic rounding direction. I'd rather err on the other
> extreme and only fold when we are forced to, say
>   constexpr double d = 1. / 3;
> or even reject it because it is inexact, if pragmas put us in a region
> with dynamic rounding.

OK, fair enough. I just hoped that a global

  double x = 1.0/3.0;

does not become a runtime initializer with -frounding-math ...

> Side remark: I am sad that Intel added rounded versions for scalars and
> 512-bit vectors but not for the intermediate sizes, while I am most
> interested in 128 bits. Masking most of the 512 bits still causes the
> dreaded clock slow-down.

Ick. I thought this was vector-length agnostic...

> .FENV_PLUS (x, y) was supposed to be equivalent to .FENV_PLUS (x, y,
> safeflags), where safeflags are the strictest flags possible, while
> leaving existing stuff like -funsafe-math-optimizations alone (so no
> regression), with the idea that the version with flags would come later.

Yeah, I'm fine with this incremental approach, and it really should be
constrained to FP environment access.

> Yes, I was lazy, because it means I need to understand better how
> expansion works :-(

A bit of copy & paste from examples could do the trick, I guess...

> I just thought of one issue for vector intrinsics, say _mm_add_pd, where
> the fenv_access status that should matter is that of the caller, not the
> one in emmintrin.h. But since I don't have the pragma or vectors, that
> can wait.

True.
I guess for the intrinsic headers we could invent some new attribute > > (or assume such semantics for always_inline which IIRC they are) saying > > that a function inherits options from the caller (difficult if not > > inlined, it would > > imply cloning, thus always-inline again...). > > > > On the patch I'd name _DIV _RDIV (to match the tree code we are dealing > > with). You miss _NEGATE > > True. I am only interested in -frounding-math, so my first reaction was > that I don't need to do anything for NEGATE, but indeed with a signalling > NaN anything can have an effect. > > > and also the _FIX_TRUNC and _FLOAT in case those might trap with > > -ftrapping-math. > > I don't know much about fixed point, and I didn't think about conversions > yet. I'll have to check what the C standard says about those. FIX_TRUNC is float -> integer conversion (overflow/underflow flag?) > > There are also internal functions for POW, FMOD and others which are > > ECF_CONST but may not end up being folded from their builtin > > counter-part with -frounding-math. > > I don't know how far this needs to go. SQRT has correctly rounded > instructions on several targets, so it is relevant. But unless your libm > provides a correctly-rounded implementation of pow, the compiler could > also ignore it. The new pragma fenv_round is scary in part because it > seems to imply that all math functions need to have a correctly rounding > implementation. > > > I guess builtins need the same treatment for -ftrapping-math as they > > do for -frounding-math. I think you already mentioned the default > > of this flag doesn't make much sense (well, the flag isn't fully > > honored/implemented). > > PR 54192 > (coincidentally, it caused a missed vectorization in > https://stackoverflow.com/a/56681744/1918193 last week) I commented there. Lets just make -frounding-math == FENV_ACCESS ON and keep -ftrapping-math as whether FP exceptions raise traps. 
> > So I think the patch is a good start but I'd say we should not introduce > > the new pass but instead expand to the asm() kludge directly which > > would make it also easier to handle some ops as unspecs in the target. > > This also answers what should be done with vectors, I'll need to add code > to tree-vect-generic for the new functions. Yeah. Auto-vectorizing would also need adjustment of course (also costing like estimate_num_insns or others). Richard. > -- > Marc Glisse
On Mon, 24 Jun 2019, Richard Biener wrote: >>>> -frounding-math is supposed to be equivalent to "#pragma stdc fenv_access >>>> on" covering the whole program. >>>> >>>> For constant expressions, I see a difference between >>>> constexpr double third = 1. / 3.; >>>> which really needs to be done at compile time, and >>>> const double third = 1. / 3.; >>>> which will try to evaluate the rhs as constexpr, but where the program is >>>> still valid if that fails. The second one clearly should refuse to be >>>> evaluated at compile time if we are specifying a dynamic rounding >>>> direction. For the first one, I am not sure. I guess you should only write >>>> that in "fenv_access off" regions and I wouldn't mind a compile error. >>>> >>>> Note that C2x adds a pragma fenv_round that specifies a rounding direction >>>> for a region of code, which seems relevant for constant expressions. That >>>> pragma looks hard, but maybe some pieces would be nice to add. >>> >>> Hmm. My thinking was along the line that at the start of main() the >>> C abstract machine might specify the initial rounding mode (and exception >>> state) is implementation defined and all constant expressions are evaluated >>> whilst being in this state. So we can define that to round-to-nearest and >>> simply fold all constants in contexts we are allowed to evaluate at >>> compile-time as we see them? >> >> There are way too many such contexts. In C++, any initializer is >> constexpr-evaluated if possible (PR 85746 shows that this is bad for >> __builtin_constant_p), and I do want >> double d = 1. / 3; >> to depend on the dynamic rounding direction. I'd rather err on the other >> extreme and only fold when we are forced to, say >> constexpr double d = 1. / 3; >> or even reject it because it is inexact, if pragmas put us in a region >> with dynamic rounding. > > OK, fair enough. I just hoped that global > > double x = 1.0/3.0; > > do not become runtime initializers with -frounding-math ... 
Ah, I wasn't thinking of globals. Ignoring the new pragma fenv_round, which I guess could affect this (the C draft isn't very explicit), the program doesn't have many chances to set a rounding mode before initializing globals. It could do so in the initializer of another variable, but relying on the order of initialization this way seems bad. Maybe in this case it would make sense to assume the default rounding mode... In practice, I would only set -frounding-math on a per function basis (possibly using pragma fenv_access), so the optimization of what happens to globals doesn't seem so important. >> Side remark, I am sad that Intel added rounded versions for scalars and >> 512 bit vectors but not for intermediate sizes, while I am most >> interested in 128 bits. Masking most of the 512 bits still causes the >> dreaded clock slow-down. > > Ick. I thought this was vector-length agnostic... I think all of the new stuff in AVX512 is, except rounding... Also, the rounded functions have exceptions disabled, which may make them hard to use with fenv_access. >>> I guess builtins need the same treatment for -ftrapping-math as they >>> do for -frounding-math. I think you already mentioned the default >>> of this flag doesn't make much sense (well, the flag isn't fully >>> honored/implemented). >> >> PR 54192 >> (coincidentally, it caused a missed vectorization in >> https://stackoverflow.com/a/56681744/1918193 last week) > > I commented there. Lets just make -frounding-math == FENV_ACCESS ON > and keep -ftrapping-math as whether FP exceptions raise traps. One issue is that the C pragmas do not let me convey that I am interested in dynamic rounding but not exception flags. It is possible to optimize quite a bit more with just rounding. In particular, the functions are pure (at some point we will have to teach the compiler the difference between the FP environment and general memory, but I'd rather wait). > Yeah. 
Auto-vectorizing would also need adjustment of course (also > costing like estimate_num_insns or others). Anything that is only about optimizing the code in -frounding-math functions can wait, that's the good point of implementing a new feature.
On 22/06/2019 23:21, Marc Glisse wrote:
> We should care about the C standard, and do whatever makes sense for C++
> without expecting the C++ standard to tell us exactly what that is. We
> can check what visual studio and intel do, but we don't have to follow them.
>
> -frounding-math is supposed to be equivalent to "#pragma stdc fenv_access
> on" covering the whole program.

i think there are 4 settings that make sense:
(i think function level granularity is ok for
this, iso c has block scope granularity, gcc
has translation unit level granularity.)

(1) except flags + only caller observes it.
i.e. exception flags raised during the execution
of the function matter, but only the caller
observes the flags by checking them.

(2) rounding mode + only caller changes it.
i.e. rounding mode may not be the default during
the execution of the function, but only the
caller may change the rounding mode.

(3) except flags + anything may observe/unset it.
i.e. exception flags raised during the execution
of the function matter, and any call or inline
asm may observe or unset them (unless the
compiler can prove otherwise).

(4) rounding mode + anything may change it.
i.e. rounding mode may not be the default or
change during the execution of a function,
and any call or inline asm may change it.

i think -frounding-math implements (2) fairly reliably,
and #pragma stdc fenv_access on requires (3) and (4).

-ftrapping-math was never clear, but it should
probably do (1) or (5) := (3)+"exceptions may trap".

so iso c has 2 levels: fenv access on/off, where
"on" means that essentially everything has to be
compiled with (3) and (4) (even functions that
don't do anything with fenv). this is not very
practical: most extern calls don't modify the fenv
so fp operations can be reordered around them,
(1) and (2) are more relaxed about this, however
that model needs fp barriers around the few calls
that actually do fenv access.
to me (1) + (2) + builtins for fp barriers seems more useful than iso c (3) + (4), but iso c is worth implementing too, since that's the standard. so ideally there would be multiple flags/function attributes and builtin barriers to make fenv access usable in practice. (however not many things care about fenv access so i don't know if that amount of work is justifiable). > For constant expressions, I see a difference between > constexpr double third = 1. / 3.; > which really needs to be done at compile time, and > const double third = 1. / 3.; > which will try to evaluate the rhs as constexpr, but where the program is still valid if that fails. The second one clearly should refuse to be > evaluated at compile time if we are specifying a dynamic rounding direction. For the first one, I am not sure. I guess you should only write > that in "fenv_access off" regions and I wouldn't mind a compile error. iso c specifies rules for const expressions: http://port70.net/~nsz/c/c11/n1570.html#F.8.4 static/thread storage duration is evaluated with default rounding mode and no exceptions are signaled. other initialization is evaluated at runtime. (i.e. rounding-mode dependent result and exception flags are observable).
On Mon, Jun 24, 2019 at 4:57 PM Marc Glisse <marc.glisse@inria.fr> wrote: > > On Mon, 24 Jun 2019, Richard Biener wrote: > > >>>> -frounding-math is supposed to be equivalent to "#pragma stdc fenv_access > >>>> on" covering the whole program. > >>>> > >>>> For constant expressions, I see a difference between > >>>> constexpr double third = 1. / 3.; > >>>> which really needs to be done at compile time, and > >>>> const double third = 1. / 3.; > >>>> which will try to evaluate the rhs as constexpr, but where the program is > >>>> still valid if that fails. The second one clearly should refuse to be > >>>> evaluated at compile time if we are specifying a dynamic rounding > >>>> direction. For the first one, I am not sure. I guess you should only write > >>>> that in "fenv_access off" regions and I wouldn't mind a compile error. > >>>> > >>>> Note that C2x adds a pragma fenv_round that specifies a rounding direction > >>>> for a region of code, which seems relevant for constant expressions. That > >>>> pragma looks hard, but maybe some pieces would be nice to add. > >>> > >>> Hmm. My thinking was along the line that at the start of main() the > >>> C abstract machine might specify the initial rounding mode (and exception > >>> state) is implementation defined and all constant expressions are evaluated > >>> whilst being in this state. So we can define that to round-to-nearest and > >>> simply fold all constants in contexts we are allowed to evaluate at > >>> compile-time as we see them? > >> > >> There are way too many such contexts. In C++, any initializer is > >> constexpr-evaluated if possible (PR 85746 shows that this is bad for > >> __builtin_constant_p), and I do want > >> double d = 1. / 3; > >> to depend on the dynamic rounding direction. I'd rather err on the other > >> extreme and only fold when we are forced to, say > >> constexpr double d = 1. / 3; > >> or even reject it because it is inexact, if pragmas put us in a region > >> with dynamic rounding. 
> > > > OK, fair enough. I just hoped that global > > > > double x = 1.0/3.0; > > > > do not become runtime initializers with -frounding-math ... > > Ah, I wasn't thinking of globals. Ignoring the new pragma fenv_round, > which I guess could affect this (the C draft isn't very explicit), the > program doesn't have many chances to set a rounding mode before > initializing globals. It could do so in the initializer of another > variable, but relying on the order of initialization this way seems bad. > Maybe in this case it would make sense to assume the default rounding > mode... > > In practice, I would only set -frounding-math on a per function basis > (possibly using pragma fenv_access), so the optimization of what happens > to globals doesn't seem so important. > > >> Side remark, I am sad that Intel added rounded versions for scalars and > >> 512 bit vectors but not for intermediate sizes, while I am most > >> interested in 128 bits. Masking most of the 512 bits still causes the > >> dreaded clock slow-down. > > > > Ick. I thought this was vector-length agnostic... > > I think all of the new stuff in AVX512 is, except rounding... > > Also, the rounded functions have exceptions disabled, which may make > them hard to use with fenv_access. > > >>> I guess builtins need the same treatment for -ftrapping-math as they > >>> do for -frounding-math. I think you already mentioned the default > >>> of this flag doesn't make much sense (well, the flag isn't fully > >>> honored/implemented). > >> > >> PR 54192 > >> (coincidentally, it caused a missed vectorization in > >> https://stackoverflow.com/a/56681744/1918193 last week) > > > > I commented there. Lets just make -frounding-math == FENV_ACCESS ON > > and keep -ftrapping-math as whether FP exceptions raise traps. > > One issue is that the C pragmas do not let me convey that I am interested > in dynamic rounding but not exception flags. It is possible to optimize > quite a bit more with just rounding. 
In particular, the functions are pure > (at some point we will have to teach the compiler the difference between > the FP environment and general memory, but I'd rather wait). > > > Yeah. Auto-vectorizing would also need adjustment of course (also > > costing like estimate_num_insns or others). > > Anything that is only about optimizing the code in -frounding-math > functions can wait, that's the good point of implementing a new feature. Sure - the only thing we may want to avoid is designing us into a corner we cannot easily escape from. Whenever I thought about -frounding-math and friends (and not doing asm()-like hacks ;)) I thought we need to make the data dependence on the FP environment explicit. So I'd have done { FP result, new FP ENV state } = FENV_PLUS (op1, op2, old FP ENV state); with the usual caveat of representing multiple return values. Our standard way via a projection riding ontop of _Complex types works as long as you use scalars and matching types, a more general projection facility would use N-uples of abitrary component types (since those are an implementation detail). My usual alternative was (ab-)using asm()s since those can have multiple outputs and provide internal-function like asm-body IDs more-or-less directly mapping to RTL instructions for example. With using global memory as FENV state you use virtual operands for this. And indeed for -frounding-math the operations itself do not change the FP environment (thus are pure) and the memory approach looks easiest (it's already implemented this way for builtins). Given the pace of improving -frounding-math support in the past I think it's fine to continue in this direction. Richard. > -- > Marc Glisse
On Mon, 24 Jun 2019, Szabolcs Nagy wrote: > On 22/06/2019 23:21, Marc Glisse wrote: >> We should care about the C standard, and do whatever makes sense for C++ without expecting the C++ standard to tell us exactly what that is. We >> can check what visual studio and intel do, but we don't have to follow them. >> >> -frounding-math is supposed to be equivalent to "#pragma stdc fenv_access on" covering the whole program. > > i think there are 4 settings that make sense: > (i think function level granularity is ok for > this, iso c has block scope granularity, gcc > has translation unit level granularity.) > > (1) except flags + only caller observes it. > i.e. exception flags raised during the execution > of the function matter, but only the caller > observes the flags by checking them. > > (2) rounding mode + only caller changes it. > i.e. rounding mode may not be the default during > the execution of the function, but only the > caller may change the rounding mode. > > (3) except flags + anything may observe/unset it. > i.e. exception flags raised during the execution > of the function matter, and any call or inline > asm may observe or unset them (unless the > compiler can prove otherwise). > > (4) rounding mode + anything may change it. > i.e. rounding mode may not be the default or > change during the execution of a function, > and any call or inline asm may change it. > > i think -frounding-math implements (2) fairly reliably, I hadn't thought of it that way, but it is true that this is fairly well handled. I could possibly use this in some places in CGAL, using a wrapper so I can specify noinline/noipa at the call site. I'll have to experiment. In particular it means that if I use -frounding-math to enable (4), there are valid uses where it will cause a speed regression :-( > and #pragma stdc fenv_access on requires (3) and (4). > > -ftrapping-math was never clear, but it should > probably do (1) or (5) := (3)+"exceptions may trap". 
> > so iso c has 2 levels: fenv access on/off, where > "on" means that essentially everything has to be > compiled with (3) and (4) (even functions that > don't do anything with fenv). this is not very > practical: most extern calls don't modify the fenv > so fp operations can be reordered around them, > (1) and (2) are more relaxed about this, however > that model needs fp barriers around the few calls > that actually does fenv access. > > to me (1) + (2) + builtins for fp barriers seems > more useful than iso c (3) + (4), but iso c is > worth implementing too, since that's the standard. > so ideally there would be multiple flags/function > attributes and builtin barriers to make fenv access > usable in practice. (however not many things care > about fenv access so i don't know if that amount > of work is justifiable). That makes sense. If we got (4), the interest for (2) would depend a lot on the speed difference. If the difference is small enough, then having only (4) might suffice. But at least separating rounding from exception flags seems good. Depending on how we change things, it could be nice to add to the decription of -frounding-math the precision you gave above (only the caller may change it). >> For constant expressions, I see a difference between >> constexpr double third = 1. / 3.; >> which really needs to be done at compile time, and >> const double third = 1. / 3.; >> which will try to evaluate the rhs as constexpr, but where the program is still valid if that fails. The second one clearly should refuse to be >> evaluated at compile time if we are specifying a dynamic rounding direction. For the first one, I am not sure. I guess you should only write >> that in "fenv_access off" regions and I wouldn't mind a compile error. > iso c specifies rules for const expressions: > http://port70.net/~nsz/c/c11/n1570.html#F.8.4 > > static/thread storage duration is evaluated with > default rounding mode and no exceptions are signaled. 
> > other initialization is evaluated at runtime. > (i.e. rounding-mode dependent result and > exception flags are observable). Thanks for the reference.
Hello,

just posting the current version of this patch, in case people have
comments. Some changes: the inline asm is introduced during expansion, and
the thing is controlled by a different flag (it should be controlled by the
pragma, but that's starting to be too many pieces to implement at the same
time, and I didn't want to cause a regression for people using
-frounding-math in the case where it actually works). I also added an extra
parameter, currently always 0, to specify some properties of the operation:
the first one I am thinking of is "don't care about exceptions" since I
only care about rounding, but that will require even more flags / pragmas
to specify the variants we want...

For the inline asm, I hesitated between building a temporary GIMPLE_ASM
just so I could pass it to the existing expansion, or "inlining" a
simplified version. This version always goes through the stack, which
matches well with the constraint "=m". One would have to modify the code to
allow "=x". Using "=mx", the compiler does simplify things so we actually
go through registers (it randomly leaves a dead store to the stack here or
there, but not that many and it looks like an existing missed
optimization), which makes me think it is not that important to write
specific code to handle "=x".

Some possible future work:
- target hook to specify a constraint different from "=m"
- target hook to expand the functions and/or the opaque pass-through
- more operations (maybe comparisons, conversions, etc)
- lowering generic vector operations, so I can enable them in the front-end
- parsing the pragma
- optimizations (at least exact constant folding)
- constexpr? Disable in some contexts where a dynamic rounding mode makes
  less sense?
- C front-end
- Use caller's environment for always_inline callee?
We would have to mark the call so we remember what the environment was, and it would be too late for some foldings, but we could still translate the operations that remain, which should be sufficient for the x86 *intrin.h files. To be safe we would have to assume fenv_access on for always_inline functions and only lower them to regular operations when we see the caller, but that might be too much.
On Sat, 22 Jun 2019, Marc Glisse wrote:

> as discussed in the PR, this seems like a simple enough approach to handle
> FENV functionality safely, while keeping it possible to implement
> optimizations in the future.

Could you give a high-level description of the implementation approach, and
how this design is intended to (eventually) achieve the required
constraints on code movement and removal? In
<https://gcc.gnu.org/ml/gcc/2013-01/msg00095.html> I listed those
constraints as:

* General calls may set, clear or test exceptions, or manipulate the
rounding mode (as may asms, depending on their inputs / outputs /
clobbers).

* Floating-point operations have the rounding mode as input. They may set
(but not clear or test) floating-point exception flags.

* Thus in general floating-point operations may not be moved across most
calls (or relevant asms), or values from one side of a call reused for the
same operation with the same inputs appearing on the other side of the
call.

* Statements such as "(void) (a * b);" can't be eliminated because they
may raise exceptions. (That's purely about exceptions, not rounding
modes.)

(I should add that const function calls should not depend on the rounding
mode, but pure calls may. Also, on some architectures there are explicit
register names for asms to use in inputs / outputs / clobbers to refer to
the floating-point state registers, and asms not referring to those can be
taken not to manipulate floating-point state, but other architectures
don't have such names. The safe approach for asms would be to assume that
all asms on all architectures can manipulate floating-point state, until
there is a way to declare what the relevant registers are.)

(I should also note that DFP has a separate rounding mode from binary FP,
but that is unlikely to affect anything in this patch - although there
might end up being potential minor optimizations from knowing that certain
asms only involve one of the two rounding modes.)
> I'd like to handle this incrementally, rather than wait for a mega-patch that > does everything, if that's ok. For instance, I didn't handle vectors in this > first patch because the interaction with vector lowering was not completely > obvious. Plus it may help get others to implement some parts of it ;-) Are there testcases that could be added initially to demonstrate how this fixes cases that are currently broken, even if other cases aren't fixed?
On Sun, 23 Jun 2019, Marc Glisse wrote:

> For constant expressions, I see a difference between
> constexpr double third = 1. / 3.;
> which really needs to be done at compile time, and
> const double third = 1. / 3.;
> which will try to evaluate the rhs as constexpr, but where the program is
> still valid if that fails. The second one clearly should refuse to be
> evaluated at compile time if we are specifying a dynamic rounding direction.

For C, initializers with static or thread storage duration always use
round-to-nearest and discard exceptions (see F.8.2 and F.8.5). This is
unaffected by FENV_ACCESS (but *is* affected by FENV_ROUND).

> Note that C2x adds a pragma fenv_round that specifies a rounding direction for
> a region of code, which seems relevant for constant expressions. That pragma
> looks hard, but maybe some pieces would be nice to add.

FENV_ROUND (and FENV_DEC_ROUND) shouldn't be that hard, given the
optimizers avoiding code movement that doesn't respect rounding modes
(though I'm only thinking of C here, not C++). You'd insert appropriate
built-in function calls to save and restore the dynamic rounding modes in
scopes with a constant rounding mode set, taking due care about scopes
being left through goto etc., and restore the mode around calls to
functions that aren't meant to be affected by the constant rounding modes -
you'd also need a built-in function to indicate to make a call that is
affected by the constant rounding modes (and make __builtin_tgmath do that
as well), and to define all the relevant functions as macros using that
built-in function in the standard library headers. Optimizations for
architectures supporting rounding modes embedded in instructions could come
later. Complications would include:

* <float.h> constants should use hex floats to avoid being affected by the
constant rounding mode (in turn, this may mean disallowing the FENV_ROUND
pragma in C90 mode because of the lack of hex floats there). If they use
decimal rather than hex they'd need to be very long constants to have
exactly the right value in all rounding modes.

* The built-in functions to change the dynamic rounding mode can't involve
calling fegetround / fesetround, because those are in libm and libm is not
supposed to be required unless you call a function in <math.h>,
<complex.h> or <fenv.h> (simply using a language feature such as a pragma
should not introduce a libm dependency). So a similar issue applies as
applied with atomic compound assignment for floating-point types: every
target with hardware floating point needs to have its own support for
expanding those built-in functions inline, and relevant tests will FAIL
(or be UNSUPPORTED through the compiler calling sorry () when the pragma
is used) on targets without that support, until it is added. (And in cases
where the rounding mode is TLS data in libc rather than in hardware, such
as soft-float PowerPC GNU/Linux and maybe some other cases for DFP, you
need new implementation-namespace interfaces there to save / restore it.)
On Mon, 24 Jun 2019, Richard Biener wrote: > On the patch I'd name _DIV _RDIV (to match the tree code we are dealing > with). You miss _NEGATE and also the _FIX_TRUNC and _FLOAT in > case those might trap with -ftrapping-math. There are also internal Negation (and abs and copysign) can never raise any exceptions even with signaling NaN arguments. Conversion between integers and floating-point *can* raise exceptions (depending on the types involved, e.g. conversions from int to IEEE double are always exact with no exceptions raised). And conversions from integer to floating-point, when the types mean they aren't necessarily exact, depend on the rounding mode (whereas conversions from floating-point to integer types always truncate towards 0).
On Wed, 7 Aug 2019, Joseph Myers wrote:

> On Sat, 22 Jun 2019, Marc Glisse wrote:
>
>> as discussed in the PR, this seems like a simple enough approach to handle
>> FENV functionality safely, while keeping it possible to implement
>> optimizations in the future.
>
> Could you give a high-level description of the implementation approach,

At the GIMPLE level, z = x + y is represented as a function call z = .FENV_PLUS (x, y, options). The floating-point environment (rounding mode, exceptions) is considered to be somewhere in memory (I think it still works if it is a hard register), so unless options say otherwise, .FENV_PLUS may read and write memory. Very few optimizations apply to general function calls, so this should avoid unwanted code motion or removal, and we can still implement specific optimizations just for those functions.

At the RTL level, the idea is that good back-ends would expand .FENV_PLUS however they want, but the default is to pass the arguments and the result through volatile asms, which are opaque to the optimizers and prevent constant propagation, removal, movement, etc.

(The "options" argument is there to avoid having many variants of the functions depending on whether we only care about rounding, only about exceptions, whether we may ignore signed zeros, etc., with 0 as the strictest, always-safe version. For explicitly rounded operations as with pragma fenv_round, a different function might be better, since the 0 case is not a safe replacement anymore.)

> and how this design is intended to (eventually) achieve the required
> constraints on code movement and removal?  In
> <https://gcc.gnu.org/ml/gcc/2013-01/msg00095.html> I listed those
> constraints as:
>
> * General calls may set, clear or test exceptions, or manipulate the
> rounding mode (as may asms, depending on their inputs / outputs / clobbers).

If the asm is volatile, this works fine. I'll come back to this below.

> * Floating-point operations have the rounding mode as input.  They may set
> (but not clear or test) floating-point exception flags.
>
> * Thus in general floating-point operations may not be moved across most
> calls (or relevant asms), or values from one side of a call reused for the
> same operation with the same inputs appearing on the other side of the
> call.
>
> * Statements such as "(void) (a * b);" can't be eliminated because they
> may raise exceptions.  (That's purely about exceptions, not rounding
> modes.)

I had to set TREE_SIDE_EFFECTS = 1 so the C++ front-end wouldn't remove it prematurely.

> (I should add that const function calls should not depend on the rounding
> mode, but pure calls may.

That fits perfectly with the idea of having the FP environment be part of memory.

> Also, on some architectures there are explicit
> register names for asms to use in inputs / outputs / clobbers to refer to
> the floating-point state registers, and asms not referring to those can be
> taken not to manipulate floating-point state, but other architectures
> don't have such names.  The safe approach for asms would be to assume that
> all asms on all architectures can manipulate floating-point state, until
> there is a way to declare what the relevant registers are.)

I assume that an asm using this register as a constraint is already somehow prevented from moving across function calls? If so, at least GIMPLE seems safe. For RTL, if those asms were volatile, the default expansion would be fine. If they don't need to be volatile and somehow manage to cross the pass-through asm, I guess a target hook adding extra inputs / outputs / clobbers to the pass-through asm would work. Or, best, the target would expand the operations to (unspec) insns that explicitly handle exactly those registers.
> (I should also note that DFP has a separate rounding mode from binary FP,
> but that is unlikely to affect anything in this patch - although there
> might end up being potential minor optimizations from knowing that certain
> asms only involve one of the two rounding modes.)
>
>> I'd like to handle this incrementally, rather than wait for a mega-patch that
>> does everything, if that's ok. For instance, I didn't handle vectors in this
>> first patch because the interaction with vector lowering was not completely
>> obvious. Plus it may help get others to implement some parts of it ;-)
>
> Are there testcases that could be added initially to demonstrate how this
> fixes cases that are currently broken, even if other cases aren't fixed?

Yes. I'll need to look into dg-require-effective-target fenv(_exceptions) to see how to disable those new tests where they are not supported. Many easy tests already start working, say computing 1./3 twice with a change of rounding mode in between and checking that the results differ, or computing 1./3, ignoring the result, and checking FE_INEXACT.

On Wed, 7 Aug 2019, Joseph Myers wrote:

> On Sun, 23 Jun 2019, Marc Glisse wrote:
>
>> For constant expressions, I see a difference between
>>   constexpr double third = 1. / 3.;
>> which really needs to be done at compile time, and
>>   const double third = 1. / 3.;
>> which will try to evaluate the rhs as constexpr, but where the program is
>> still valid if that fails. The second one clearly should refuse to be
>> evaluated at compile time if we are specifying a dynamic rounding direction.
>
> For C, initializers with static or thread storage duration always use
> round-to-nearest and discard exceptions (see F.8.2 and F.8.5).  This is
> unaffected by FENV_ACCESS (but *is* affected by FENV_ROUND).

Thanks for the clarification.

>> Note that C2x adds a pragma fenv_round that specifies a rounding direction for
>> a region of code, which seems relevant for constant expressions.  That pragma
>> looks hard, but maybe some pieces would be nice to add.
>
> FENV_ROUND (and FENV_DEC_ROUND) shouldn't be that hard, given the

On the glibc side I expect it to be a lot of work; it seems to require a correctly rounded version of all math functions...

> optimizers avoiding code movement that doesn't respect rounding modes
> (though I'm only thinking of C here, not C++).  You'd insert appropriate
> built-in function calls to save and restore the dynamic rounding modes in
> scopes with a constant rounding mode set, taking due care about scopes
> being left through goto etc., and restore the mode around calls to
> functions that aren't meant to be affected by the constant rounding modes
> - you'd also need a built-in function to indicate to make a call that is
> affected by the constant rounding modes (and make __builtin_tgmath do that
> as well), and to define all the relevant functions as macros using that
> built-in function in the standard library headers.  Optimizations for
> architectures supporting rounding modes embedded in instructions could
> come later.
>
> Complications would include:
>
> * <float.h> constants should use hex floats to avoid being affected by the
> constant rounding mode (in turn, this may mean disallowing the FENV_ROUND
> pragma in C90 mode because of the lack of hex floats there).  If they use
> decimal rather than hex they'd need to be very long constants to have
> exactly the right value in all rounding modes.

True. I thought that was on the libc side, but no, float.h is in GCC, and all the values are provided by the compiler as macros anyway. I haven't yet looked at the rounding that happens while parsing a literal, and in particular which pragmas are supposed to affect it (probably not fenv_access, only fenv_round).
It seems that hex floats are accepted even in C89 with a pedwarn that can be disabled with __extension__, although I am not sure whether using __extension__ in __FLT_MAX__ (so it wouldn't be a pure literal anymore) would cause trouble. We could also have #pragma fenv_round to_nearest (not the exact syntax) in float.h, although the C standard doesn't seem to have a push/pop mechanism to restore fenv_round at the end of the file.

> * The built-in functions to change the dynamic rounding mode can't involve
> calling fegetround / fesetround, because those are in libm and libm is not
> supposed to be required unless you call a function in <math.h>,
> <complex.h> or <fenv.h> (simply using a language feature such as a pragma
> should not introduce a libm dependency).  So a similar issue applies as
> applied with atomic compound assignment for floating-point types: every
> target with hardware floating point needs to have its own support for
> expanding those built-in functions inline, and relevant tests will FAIL
> (or be UNSUPPORTED through the compiler calling sorry () when the pragma
> is used) on targets without that support, until it is added.  (And in
> cases where the rounding mode is TLS data in libc rather than in
> hardware, such as soft-float PowerPC GNU/Linux and maybe some other cases
> for DFP, you need new implementation-namespace interfaces there to save /
> restore it.)

Honestly, that doesn't seem like a priority. Sure, long term, for strict conformance (and a bit for performance) it could make sense, but having a not-strictly-legal dependency on libm when using a pragma that is meant for use with fenv.h seems much better than missing the functionality altogether.
On Thu, 8 Aug 2019, Marc Glisse wrote:

>> FENV_ROUND (and FENV_DEC_ROUND) shouldn't be that hard, given the
>
> On the glibc side I expect it to be a lot of work, it seems to require a
> correctly rounded version of all math functions...

No, it doesn't. 18661-4 reserves cr* names for correctly rounded functions; most of the non-cr* names are not bound to the IEEE operations and so have no specific accuracy requirements. With FENV_ROUND they just need to behave the same as if the relevant dynamic rounding mode were set (via the compiler temporarily setting it before calling the function).

> It seems that hex floats are accepted even in C89 with a pedwarn that can be

Not for -std=c90 (since accepting p+ or p- as part of a pp-number would change the semantics of some valid C90 programs, see gcc.dg/c90-hexfloat-2.c), only for -std=gnu* and C99 and later standards.

> We could also have #pragma fenv_round to_nearest (not the exact syntax) in
> float.h, although the C standard doesn't seem to have a push/pop mechanism to
> restore fenv_round at the end of the file.

Also, what's relevant is the state when the macro is expanded, not when it's defined. (The math.h M_* constants aren't a big issue; at most they might need a few more digits so the constant rounds the same as the underlying irrational number in all rounding modes. The float.h constants are an issue precisely because the values are dyadic rationals but need many decimal digits to be represented exactly in decimal.)
On Thu, 8 Aug 2019, Joseph Myers wrote:

> On Thu, 8 Aug 2019, Marc Glisse wrote:
>
>>> FENV_ROUND (and FENV_DEC_ROUND) shouldn't be that hard, given the
>>
>> On the glibc side I expect it to be a lot of work, it seems to require a
>> correctly rounded version of all math functions...
>
> No, it doesn't.  18661-4 reserves cr* names for correctly rounded
> functions; most of the non-cr* names are not bound to the IEEE operations
> and so have no specific accuracy requirements, with FENV_ROUND they just
> need to behave the same as if the relevant dynamic rounding mode were set
> (via the compiler temporarily setting it before calling the function).

And since glibc just ignores the rounding mode, that's no constraint at all, at least on that platform. OK.
On Mon, 24 Jun 2019, Marc Glisse wrote:

>> OK, fair enough.  I just hoped that global
>>
>>   double x = 1.0/3.0;
>>
>> do not become runtime initializers with -frounding-math ...
>
> Ah, I wasn't thinking of globals. Ignoring the new pragma fenv_round, which I
> guess could affect this (the C draft isn't very explicit), the program doesn't

I think FENV_ROUND (for C) definitely affects both the interpretation of constants (if the constant isn't exactly representable in the format in which it is evaluated) and the evaluation of operations in initializers with static or thread storage duration. (F.8.2 in the current C2x draft says "During translation, constant rounding direction modes (7.6.2) are in effect where specified." and F.8.5 says "All computation for initialization of objects that have static or thread storage duration is done (as if) at translation time.")

> One issue is that the C pragmas do not let me convey that I am interested in
> dynamic rounding but not exception flags. It is possible to optimize quite a

TS 18661-5 allows e.g. "#pragma STDC FENV_EXCEPT FE_ALL_EXCEPT NO_FLAG" or "#pragma STDC FENV_EXCEPT FE_ALL_EXCEPT OPTIONAL_FLAG". (But it doesn't allow for saying you don't care about exceptions to the extent that raising spurious exceptions is OK.) Some parts of 18661-5 are probably substantially more complicated to implement than any of the other floating-point pragmas; I'm not sure there is any implementation experience at all with 18661-5, in any C implementation.

(On the other hand, CX_LIMITED_RANGE is probably the easiest of the floating-point pragmas to implement, because it has purely local effects: you just need two different forms of IR for complex multiplication and division, chosen based on whether the pragma is in effect in the current scope, and then lower them in two ways that GCC already supports.)
Index: gcc/Makefile.in
===================================================================
--- gcc/Makefile.in	(revision 272586)
+++ gcc/Makefile.in	(working copy)
@@ -1315,20 +1315,21 @@ OBJS = \
 	gimple.o \
 	gimple-builder.o \
 	gimple-expr.o \
 	gimple-iterator.o \
 	gimple-fold.o \
 	gimple-laddress.o \
 	gimple-loop-interchange.o \
 	gimple-loop-jam.o \
 	gimple-loop-versioning.o \
 	gimple-low.o \
+	gimple-lower-fenv.o \
 	gimple-pretty-print.o \
 	gimple-ssa-backprop.o \
 	gimple-ssa-evrp.o \
 	gimple-ssa-evrp-analyze.o \
 	gimple-ssa-isolate-paths.o \
 	gimple-ssa-nonnull-compare.o \
 	gimple-ssa-split-paths.o \
 	gimple-ssa-store-merging.o \
 	gimple-ssa-strength-reduction.o \
 	gimple-ssa-sprintf.o \
Index: gcc/cp/typeck.c
===================================================================
--- gcc/cp/typeck.c	(revision 272586)
+++ gcc/cp/typeck.c	(working copy)
@@ -5544,20 +5544,47 @@ cp_build_binary_op (const op_location_t
 	  if (TREE_TYPE (cop0) != orig_type)
 	    cop0 = cp_convert (orig_type, op0, complain);
 	  if (TREE_TYPE (cop1) != orig_type)
 	    cop1 = cp_convert (orig_type, op1, complain);
 	  instrument_expr = ubsan_instrument_division (location, cop0, cop1);
 	}
       else if (doing_shift && sanitize_flags_p (SANITIZE_SHIFT))
 	instrument_expr = ubsan_instrument_shift (location, code, op0, op1);
     }
 
+  // FIXME: vectors (and complex?) as well
+  if (flag_rounding_math && SCALAR_FLOAT_TYPE_P (build_type))
+    {
+      bool do_fenv_subst = true;
+      internal_fn ifn;
+      switch (resultcode)
+	{
+	case PLUS_EXPR:
+	  ifn = IFN_FENV_PLUS;
+	  break;
+	case MINUS_EXPR:
+	  ifn = IFN_FENV_MINUS;
+	  break;
+	case MULT_EXPR:
+	  ifn = IFN_FENV_MULT;
+	  break;
+	case RDIV_EXPR:
+	  ifn = IFN_FENV_DIV;
+	  break;
+	default:
+	  do_fenv_subst = false;
+	}
+      if (do_fenv_subst)
+	return build_call_expr_internal_loc (location, ifn, build_type,
+					     2, op0, op1);
+    }
+
   result = build2_loc (location, resultcode, build_type, op0, op1);
   if (final_type != 0)
     result = cp_convert (final_type, result, complain);
 
   if (instrument_expr != NULL)
     result = build2 (COMPOUND_EXPR, TREE_TYPE (result),
		     instrument_expr, result);
 
   if (!processing_template_decl)
     {
Index: gcc/gimple-lower-fenv.cc
===================================================================
--- gcc/gimple-lower-fenv.cc	(nonexistent)
+++ gcc/gimple-lower-fenv.cc	(working copy)
@@ -0,0 +1,144 @@
+/* Lower correctly rounded operations.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT
+ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "tree.h"
+#include "gimple.h"
+#include "tree-pass.h"
+#include "ssa.h"
+#include "gimple-iterator.h"
+
+/* Create a pass-through inline asm barrier from IN to OUT.  */
+static gasm*
+asm_barrier (tree out, tree in)
+{
+  vec<tree, va_gc> *inputs = NULL, *outputs = NULL;
+  if (out)
+    {
+      vec_safe_push (inputs,
+		     build_tree_list (build_tree_list
+				      (NULL_TREE, build_string (2, "0")), in));
+      vec_safe_push (outputs,
+		     build_tree_list (build_tree_list
+				      (NULL_TREE, build_string (3, "=g")),
+				      out));
+    }
+  else
+    {
+      vec_safe_push (inputs,
+		     build_tree_list (build_tree_list
+				      (NULL_TREE, build_string (2, "g")), in));
+    }
+  gasm *g = gimple_build_asm_vec ("", inputs, outputs, NULL, NULL);
+  gimple_asm_set_volatile (g, true);
+  if (out)
+    SSA_NAME_DEF_STMT (out) = g;
+  return g;
+}
+
+/* A simple pass that attempts to fold all fenv internal functions.  */
+
+namespace {
+
+const pass_data pass_data_lower_fenv =
+{
+  GIMPLE_PASS, /* type */
+  "lfenv", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_NONE, /* tv_id */
+  PROP_ssa, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  TODO_update_ssa, /* todo_flags_finish */
+};
+
+class pass_lower_fenv : public gimple_opt_pass
+{
+public:
+  pass_lower_fenv (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_lower_fenv, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual unsigned int execute (function *);
+}; // class pass_lower_fenv
+
+unsigned int
+pass_lower_fenv::execute (function *fun)
+{
+  basic_block bb;
+  FOR_EACH_BB_FN (bb, fun)
+    {
+      gimple_stmt_iterator i;
+      for (i = gsi_start_bb (bb); !gsi_end_p (i); gsi_next (&i))
+	{
+	  gimple *stmt = gsi_stmt (i);
+	  if (gimple_code (stmt) != GIMPLE_CALL
+	      || !gimple_call_internal_p (stmt))
+	    continue;
+
+	  tree_code code;
+	  switch (gimple_call_internal_fn (stmt))
+	    {
+	    case IFN_FENV_PLUS:
+	      code = PLUS_EXPR;
+	      break;
+	    case IFN_FENV_MINUS:
+	      code = MINUS_EXPR;
+	      break;
+	    case IFN_FENV_MULT:
+	      code = MULT_EXPR;
+	      break;
+	    case IFN_FENV_DIV:
+	      code = RDIV_EXPR;
+	      break;
+	    default:
+	      continue;
+	    }
+
+	  tree op0 = gimple_call_arg (stmt, 0);
+	  tree op1 = gimple_call_arg (stmt, 1);
+	  tree ftype = TREE_TYPE (op0);
+	  tree newop0 = make_ssa_name (ftype);
+	  tree newop1 = make_ssa_name (ftype);
+	  gsi_insert_before (&i, asm_barrier (newop0, op0), GSI_SAME_STMT);
+	  gsi_insert_before (&i, asm_barrier (newop1, op1), GSI_SAME_STMT);
+
+	  tree lhs = gimple_call_lhs (stmt);
+	  tree newlhs = make_ssa_name (ftype);
+	  gimple *new_stmt = gimple_build_assign (newlhs, code, newop0, newop1);
+	  gsi_insert_before (&i, new_stmt, GSI_SAME_STMT);
+	  gsi_replace (&i, asm_barrier (lhs, newlhs), false);
+	  unlink_stmt_vdef (stmt);
+	  release_ssa_name (gimple_vdef (stmt));
+	}
+    }
+  return 0;
+}
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_lower_fenv (gcc::context *ctxt)
+{
+  return new pass_lower_fenv (ctxt);
+}
Index: gcc/internal-fn.c
===================================================================
--- gcc/internal-fn.c	(revision 272586)
+++ gcc/internal-fn.c	(working copy)
@@ -2869,20 +2869,46 @@ expand_DIVMOD (internal_fn, gcall *call_
 }
 
 /* Expand a NOP.  */
 
 static void
 expand_NOP (internal_fn, gcall *)
 {
   /* Nothing.  But it shouldn't really prevail.  */
 }
 
+/* This should get expanded in the wmul pass.  */
+
+static void
+expand_FENV_PLUS (internal_fn, gcall *)
+{
+  gcc_unreachable ();
+}
+
+static void
+expand_FENV_MINUS (internal_fn, gcall *)
+{
+  gcc_unreachable ();
+}
+
+static void
+expand_FENV_MULT (internal_fn, gcall *)
+{
+  gcc_unreachable ();
+}
+
+static void
+expand_FENV_DIV (internal_fn, gcall *)
+{
+  gcc_unreachable ();
+}
+
 /* Expand a call to FN using the operands in STMT.  FN has a single
    output operand and NARGS input operands.  */
 
 static void
 expand_direct_optab_fn (internal_fn fn, gcall *stmt, direct_optab optab,
			unsigned int nargs)
 {
   expand_operand *ops = XALLOCAVEC (expand_operand, nargs + 1);
 
   tree_pair types = direct_internal_fn_types (fn, stmt);
Index: gcc/internal-fn.def
===================================================================
--- gcc/internal-fn.def	(revision 272586)
+++ gcc/internal-fn.def	(working copy)
@@ -345,16 +345,22 @@ DEF_INTERNAL_FN (FALLTHROUGH, ECF_LEAF |
 /* To implement __builtin_launder.  */
 DEF_INTERNAL_FN (LAUNDER, ECF_LEAF | ECF_NOTHROW | ECF_NOVOPS, NULL)
 
 /* Divmod function.  */
 DEF_INTERNAL_FN (DIVMOD, ECF_CONST | ECF_LEAF, NULL)
 
 /* A NOP function with arbitrary arguments and return value.  */
 DEF_INTERNAL_FN (NOP, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
 
+/* float operations with rounding / exception flags.  */
+DEF_INTERNAL_FN (FENV_PLUS, ECF_LEAF | ECF_NOTHROW, NULL)
+DEF_INTERNAL_FN (FENV_MINUS, ECF_LEAF | ECF_NOTHROW, NULL)
+DEF_INTERNAL_FN (FENV_MULT, ECF_LEAF | ECF_NOTHROW, NULL)
+DEF_INTERNAL_FN (FENV_DIV, ECF_LEAF | ECF_NOTHROW, NULL)
+
 #undef DEF_INTERNAL_INT_FN
 #undef DEF_INTERNAL_FLT_FN
 #undef DEF_INTERNAL_FLT_FLOATN_FN
 #undef DEF_INTERNAL_SIGNED_OPTAB_FN
 #undef DEF_INTERNAL_OPTAB_FN
 #undef DEF_INTERNAL_FN
Index: gcc/passes.def
===================================================================
--- gcc/passes.def	(revision 272586)
+++ gcc/passes.def	(working copy)
@@ -377,20 +377,21 @@ along with GCC; see the file COPYING3.
       PUSH_INSERT_PASSES_WITHIN (pass_tm_init)
	  NEXT_PASS (pass_tm_mark);
	  NEXT_PASS (pass_tm_memopt);
	  NEXT_PASS (pass_tm_edges);
      POP_INSERT_PASSES ()
      NEXT_PASS (pass_simduid_cleanup);
      NEXT_PASS (pass_vtable_verify);
      NEXT_PASS (pass_lower_vaarg);
      NEXT_PASS (pass_lower_vector);
      NEXT_PASS (pass_lower_complex_O0);
+      NEXT_PASS (pass_lower_fenv);
      NEXT_PASS (pass_sancov_O0);
      NEXT_PASS (pass_lower_switch_O0);
      NEXT_PASS (pass_asan_O0);
      NEXT_PASS (pass_tsan_O0);
      NEXT_PASS (pass_sanopt);
      NEXT_PASS (pass_cleanup_eh);
      NEXT_PASS (pass_lower_resx);
      NEXT_PASS (pass_nrv);
      NEXT_PASS (pass_cleanup_cfg_post_optimizing);
      NEXT_PASS (pass_warn_function_noreturn);
Index: gcc/tree-pass.h
===================================================================
--- gcc/tree-pass.h	(revision 272586)
+++ gcc/tree-pass.h	(working copy)
@@ -617,20 +617,21 @@ extern rtl_opt_pass *make_pass_shorten_b
 extern rtl_opt_pass *make_pass_set_nothrow_function_flags (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_dwarf2_frame (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_final (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_rtl_seqabstr (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_release_ssa_names (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_early_inline (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_local_fn_summary (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_update_address_taken (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_convert_switch (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_lower_vaarg (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_lower_fenv (gcc::context *ctxt);
 
 /* Current optimization pass.  */
 extern opt_pass *current_pass;
 
 extern bool execute_one_pass (opt_pass *);
 extern void execute_pass_list (function *, opt_pass *);
 extern void execute_ipa_pass_list (opt_pass *);
 extern void execute_ipa_summary_passes (ipa_opt_pass_d *);
 extern void execute_all_ipa_transforms (void);
 extern void execute_all_ipa_stmt_fixups (struct cgraph_node *, gimple **);