Message ID: alpine.DEB.2.02.1906221743430.16432@grove.saclay.inria.fr
State:      New
Series:     Start implementing -frounding-math
On June 22, 2019 6:10:15 PM GMT+02:00, Marc Glisse <marc.glisse@inria.fr> wrote:
> Hello,
>
> as discussed in the PR, this seems like a simple enough approach to
> handle FENV functionality safely, while keeping it possible to implement
> optimizations in the future.
>
> Some key missing things:
> - handle C, not just C++ (I don't care, but some people probably do)

As you tackle C++, what does the standard say about constexpr contexts and
FENV? That is, what is the FP environment at compile time? (I suppose the
FENV-modifying functions are not declared constexpr.)

> - handle vectors (for complex, I don't know what it means)
>
> Then flag_trapping_math should also enable this path, meaning that we
> should stop making it the default, or performance will suffer.

Do we need N variants of the functions to really encode the FP options into
the IL, and thus allow inlining of, say, functions with different
signed-zero flags?

I didn't look at the patch, but I suppose you rely on RTL not to do code
motion across FENV modifications and not to fold constants? That is, don't
we really need unspec_volatile variant patterns for the operations?

Thanks for working on this.

Richard.

> Nice to have:
> - parse the fenv_access pragma and make it set flag_rounding_math or
>   similar.
> - sqrt
>
> All the optimizations can come later (I count having different functions
> for flag_rounding_math and flag_trapping_math as one such optimization).
>
> I put the lowering in its own pass, because it needs to run at -O0 and
> there aren't that many passes at -O0 where I could put it. It would
> probably be better to handle this directly during expansion, but with my
> knowledge of the compiler it was easier to lower it before.
>
> This patch passes bootstrap+regtest on x86_64. I expect it may break a few
> testcases on some targets (arm?) that check that we optimize some things
> even with -frounding-math, but as far as I am concerned those do not count
> as regressions, because -frounding-math was never really implemented, so I
> would encourage target maintainers to xfail those for now.
>
> I'd like to handle this incrementally, rather than wait for a mega-patch
> that does everything, if that's ok. For instance, I didn't handle vectors
> in this first patch because the interaction with vector lowering was not
> completely obvious. Plus it may help get others to implement some parts
> of it ;-)
>
> 2019-06-24  Marc Glisse  <marc.glisse@inria.fr>
>
> 	PR middle-end/34678
> gcc/cp/
> 	* typeck.c (cp_build_binary_op): Generate internal functions for float
> 	operations with -frounding-math.
>
> gcc/
> 	* Makefile.in: Handle new file gimple-lower-fenv.cc.
> 	* gimple-lower-fenv.cc: New file.
> 	* internal-fn.c (expand_FENV_PLUS, expand_FENV_MINUS, expand_FENV_MULT,
> 	expand_FENV_DIV): New functions.
> 	* internal-fn.def (FENV_PLUS, FENV_MINUS, FENV_MULT, FENV_DIV): New
> 	internal functions.
> 	* passes.def (pass_lower_fenv): New pass.
> 	* tree-pass.h (make_pass_lower_fenv): Declare new function.
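Richard's constexpr question can be made concrete with a short sketch (the variable names here are invented for illustration) of the two kinds of initializers the thread goes on to discuss: one that the language requires to be a constant expression, and one that merely tries constexpr evaluation and stays valid if it fails.

```cpp
// Illustration only: which rounding mode applies to each initializer?

// Required to be a constant expression: the compiler must fold this, and
// the open question is which rounding mode that folding should assume
// (presumably round-to-nearest, the mode at program start).
constexpr double third_constexpr = 1. / 3.;

// Merely *tries* constexpr evaluation; the program remains valid if that
// fails, so under a dynamic rounding direction this one should arguably
// be evaluated at run time rather than folded.
const double third_const = 1. / 3.;
```

Today both fold to the same round-to-nearest value; the discussion below is about whether the second one should keep doing so under -frounding-math.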
On Sat, 22 Jun 2019, Richard Biener wrote:

> On June 22, 2019 6:10:15 PM GMT+02:00, Marc Glisse <marc.glisse@inria.fr> wrote:
>> Some key missing things:
>> - handle C, not just C++ (I don't care, but some people probably do)
>
> As you tackle C++, what does the standard say about constexpr contexts and
> FENV? That is, what's the FP environment at compile time (I suppose the
> FENV-modifying functions are not declared constexpr).

The C++ standard doesn't care much about fenv:

  [Note: This document does not require an implementation to support the
  FENV_ACCESS pragma; it is implementation-defined (15.8) whether the pragma
  is supported. As a consequence, it is implementation-defined whether
  these functions can be used to test floating-point status flags, set
  floating-point control modes, or run under non-default mode settings. If
  the pragma is used to enable control over the floating-point environment,
  this document does not specify the effect on floating-point evaluation in
  constant expressions. — end note]

We should care about the C standard, and do whatever makes sense for C++
without expecting the C++ standard to tell us exactly what that is. We can
check what Visual Studio and Intel do, but we don't have to follow them.

-frounding-math is supposed to be equivalent to "#pragma STDC FENV_ACCESS ON"
covering the whole program.

For constant expressions, I see a difference between

  constexpr double third = 1. / 3.;

which really needs to be done at compile time, and

  const double third = 1. / 3.;

which will try to evaluate the rhs as constexpr, but where the program is
still valid if that fails. The second one clearly should refuse to be
evaluated at compile time if we are specifying a dynamic rounding
direction. For the first one, I am not sure. I guess you should only write
that in "fenv_access off" regions, and I wouldn't mind a compile error.

Note that C2x adds a pragma fenv_round that specifies a rounding direction
for a region of code, which seems relevant for constant expressions. That
pragma looks hard, but maybe some pieces would be nice to add.

>> - handle vectors (for complex, I don't know what it means)
>>
>> Then flag_trapping_math should also enable this path, meaning that we
>> should stop making it the default, or performance will suffer.
>
> Do we need N variants of the functions to really encode the FP options into
> the IL, and thus allow inlining of, say, functions with different
> signed-zero flags?

Not sure what you are suggesting. I am essentially creating a new
tree_code (well, an internal function) for an addition-like function that
actually reads/writes memory, so it should be orthogonal to inlining, and
only the front end should care about -frounding-math. I didn't think about
the interaction with signed zeros. Ah, you mean
IFN_FENV_ADD_WITH_ROUNDING_AND_SIGNED_ZEROS, etc.? The ones I am starting
from are supposed to be safe-for-everything. As refinement, I was thinking
in two directions:
* add a third, constant argument, where we can specify extra info;
* add a variant for the case where the function is pure (because I expect
  that's easier on the compiler than "pure if (arg3 & 8) != 0").
I am not sure more variants are needed.

Also, while rounding clearly applies to an operation, signed zero kind of
seems to apply to a variable, and in an operation I don't really know if
it means that I can pretend that an argument of -0. is +0. (I can return
+inf for 1/-0.) or if it means I can return 0. when the operation should
return -0. Probably both... If we have just -fsigned-zeros but no rounding
or trapping, the penalty of using an IFN would be bad. But indeed, inlining
functions with different -f(no-)signed-zeros forces us to use
-fsigned-zeros for the whole merged function if we don't encode it in the
operations. Hmm.

> I didn't look at the patch, but I suppose you rely on RTL not to do code
> motion across FENV modifications and not to fold constants?

No, I rely on asm volatile to prevent that, as in your recent hack, except
that the asm only appears near expansion. I am trying to start from
something safe and refine with optimizations, no subtlety.

> That is, don't we really need unspec_volatile variant patterns for the
> operations?

Yes. One future optimization (that I listed in the PR) is to let targets
expand those IFNs as they like (without the asm barriers), using some
unspec_volatile. I hope we can get there, although just letting targets
replace "=g" with whatever in the asm would already get most of the
benefits.

I just thought of one issue for vector intrinsics, say _mm_add_pd, where
the fenv_access status that should matter is that of the caller, not the
one in emmintrin.h. But since I don't have the pragma or vectors, that can
wait.
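The "asm volatile near expansion" scheme Marc describes can be sketched at the source level: the operands and the result are laundered through empty asm statements so the optimizers can neither constant-fold the operation nor move it across fenv-modifying calls. This is only an illustrative model of the kludge (the function name is invented), not the actual expand_FENV_* code.

```cpp
// Source-level model of the asm-barrier kludge: hide the operands and the
// result from the optimizer so that x + y can neither be folded at compile
// time nor moved across fesetround()/feclearexcept() calls.
static inline double fenv_plus(double x, double y)
{
    asm volatile ("" : "+g" (x), "+g" (y)); // operands become "unknown"
    double r = x + y;                       // the actual FP addition
    asm volatile ("" : "+g" (r));           // result becomes "unknown"
    return r;
}
```

The empty asm bodies emit no instructions; the "+g" read-write constraints and the volatile qualifier are what pin the addition in place.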
On Sun, Jun 23, 2019 at 12:22 AM Marc Glisse <marc.glisse@inria.fr> wrote:
>
> On Sat, 22 Jun 2019, Richard Biener wrote:
>
> [...]
>
> The C++ standard doesn't care much about fenv:
>
>   [Note: This document does not require an implementation to support the
>   FENV_ACCESS pragma; it is implementation-defined (15.8) whether the pragma
>   is supported. As a consequence, it is implementation-defined whether
>   these functions can be used to test floating-point status flags, set
>   floating-point control modes, or run under non-default mode settings. If
>   the pragma is used to enable control over the floating-point environment,
>   this document does not specify the effect on floating-point evaluation in
>   constant expressions. — end note]

Oh, I see.

> We should care about the C standard, and do whatever makes sense for C++
> without expecting the C++ standard to tell us exactly what that is. We can
> check what Visual Studio and Intel do, but we don't have to follow them.

This makes it somewhat odd to implement this for C++ first and not C, but hey ;)

> -frounding-math is supposed to be equivalent to "#pragma STDC FENV_ACCESS ON"
> covering the whole program.
>
> For constant expressions, I see a difference between
>   constexpr double third = 1. / 3.;
> which really needs to be done at compile time, and
>   const double third = 1. / 3.;
> which will try to evaluate the rhs as constexpr, but where the program is
> still valid if that fails. [...]
>
> Note that C2x adds a pragma fenv_round that specifies a rounding direction
> for a region of code, which seems relevant for constant expressions. That
> pragma looks hard, but maybe some pieces would be nice to add.

Hmm. My thinking was along the lines that, at the start of main(), the
C abstract machine might specify that the initial rounding mode (and
exception state) is implementation-defined, and that all constant
expressions are evaluated while in this state. So we could define that to
be round-to-nearest and simply fold all constants, in contexts we are
allowed to evaluate at compile time, as we see them?

I guess fenv_round aims at using a pragma to change the rounding mode?

> [...] Ah, you mean IFN_FENV_ADD_WITH_ROUNDING_AND_SIGNED_ZEROS, etc.?

Yeah. Basically, the goal is to have the IL fully defined on its own,
without having its semantics depend on flag_*.

> The ones I am starting from are supposed to be safe-for-everything. As
> refinement, I was thinking in two directions:
> * add a third, constant argument, where we can specify extra info;
> * add a variant for the case where the function is pure (because I expect
>   that's easier on the compiler than "pure if (arg3 & 8) != 0").
> I am not sure more variants are needed.

For optimization, having an ADD_ROUND_TO_ZERO (or the extra params
specifying an explicit rounding mode) might be interesting, since on x86
there are now instructions with rounding-mode control bits.

> Also, while rounding clearly applies to an operation, signed zero kind of
> seems to apply to a variable [...] But indeed, inlining functions with
> different -f(no-)signed-zeros forces us to use -fsigned-zeros for the
> whole merged function if we don't encode it in the operations. Hmm.

Yeah. I guess we need to think about each and every case and how
to deal with it. There are denormals and flush-to-zero (not covered by
POSIX fenv modification, IIRC) and a lot of math optimization flags
that do not map to FP operations directly...

> > I didn't look at the patch, but I suppose you rely on RTL not to do code
> > motion across FENV modifications and not to fold constants?
>
> No, I rely on asm volatile to prevent that, as in your recent hack, except
> that the asm only appears near expansion. I am trying to start from
> something safe and refine with optimizations, no subtlety.

Ah, OK. So indeed, instead of a new pass doing the lowering on GIMPLE,
this should ideally be done by populating expand_FENV_* appropriately.

> > That is, don't we really need unspec_volatile variant patterns for the
> > operations?
>
> Yes. One future optimization (that I listed in the PR) is to let targets
> expand those IFNs as they like (without the asm barriers), using some
> unspec_volatile. [...]
>
> I just thought of one issue for vector intrinsics, say _mm_add_pd, where
> the fenv_access status that should matter is that of the caller, not the
> one in emmintrin.h. But since I don't have the pragma or vectors, that can
> wait.

True. I guess for the intrinsic headers we could invent some new attribute
(or assume such semantics for always_inline, which IIRC they are) saying
that a function inherits options from its caller (difficult if not inlined;
it would imply cloning, thus always_inline again...).

On the patch, I'd rename _DIV to _RDIV (to match the tree code we are
dealing with). You miss _NEGATE, and also _FIX_TRUNC and _FLOAT, in case
those might trap with -ftrapping-math. There are also internal functions
for POW, FMOD and others which are ECF_CONST but may not end up being
folded from their builtin counterparts with -frounding-math.

I guess builtins need the same treatment for -ftrapping-math as they do
for -frounding-math. I think you already mentioned that the default of
this flag doesn't make much sense (well, the flag isn't fully
honored/implemented).

So I think the patch is a good start, but I'd say we should not introduce
the new pass and instead expand to the asm() kludge directly, which would
also make it easier to handle some ops as unspecs in the target.

In the future, an optimize_fenv pass could annotate the calls with the
optional specifier if it detects regions with known exception/rounding
state, but it still may not rewrite the internal functions back to plain
operations (at least before IPA), since the IFNs are required so that the
FENV-modifying operations act as code-motion barriers.

Thanks,
Richard.

> --
> Marc Glisse
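Why folding constants regardless of rounding mode is observable at all can be shown with a small demo (helper name invented here): when the operands are hidden from the compiler, the very same division produces different values under different dynamic rounding modes. The volatile qualifiers stand in for the missing FENV_ACCESS support, keeping the division at run time.

```cpp
#include <cfenv>

// Evaluate 1./3. under a given dynamic rounding mode.  volatile keeps the
// compiler from folding the division at compile time, since GCC does not
// yet honor #pragma STDC FENV_ACCESS.
double third_rounded(int mode)
{
    volatile double one = 1.0, three = 3.0;
    int old = std::fegetround();  // save the caller's rounding mode
    std::fesetround(mode);
    volatile double r = one / three;
    std::fesetround(old);         // restore it
    return r;
}
```

On hardware with directed rounding, third_rounded(FE_UPWARD) is strictly greater than third_rounded(FE_DOWNWARD); a compiler that folds 1./3. in round-to-nearest erases that difference.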
On Mon, 24 Jun 2019, Richard Biener wrote:

> This makes it somewhat odd to implement this for C++ first and not C, but hey ;)

Well, I maintain a part of CGAL, a C++ library, that uses interval
arithmetic and thus relies on a non-default rounding direction. I am
trying to prepare this dog food so I can eat it myself...

> Hmm. My thinking was along the lines that, at the start of main(), the
> C abstract machine might specify that the initial rounding mode (and
> exception state) is implementation-defined, and that all constant
> expressions are evaluated while in this state. So we could define that
> to be round-to-nearest and simply fold all constants, in contexts we are
> allowed to evaluate at compile time, as we see them?

There are way too many such contexts. In C++, any initializer is
constexpr-evaluated if possible (PR 85746 shows that this is bad for
__builtin_constant_p), and I do want

  double d = 1. / 3;

to depend on the dynamic rounding direction. I'd rather err on the other
extreme and only fold when we are forced to, say

  constexpr double d = 1. / 3;

or even reject it because it is inexact, if pragmas put us in a region
with dynamic rounding.

> I guess fenv_round aims at using a pragma to change the rounding mode?

Yes. You can specify either a fixed rounding mode, or "dynamic". In the
first case, it overrides the dynamic rounding mode.

> For optimization, having an ADD_ROUND_TO_ZERO (or the extra params
> specifying an explicit rounding mode) might be interesting, since on x86
> there are now instructions with rounding-mode control bits.

Yes. Pragma fenv_round would match well with that. On the other hand, it
would be painful for platforms that do not have such instructions, forcing
us to generate plenty of fe[gs]etround calls, and probably to have a pass
that tries to reduce their number.

Side remark: I am sad that Intel added rounded versions for scalars and
512-bit vectors but not for the intermediate sizes, while I am most
interested in 128 bits. Masking most of the 512 bits still causes the
dreaded clock slow-down.

> Yeah. I guess we need to think about each and every case and how
> to deal with it. There are denormals and flush-to-zero (not covered by
> POSIX fenv modification, IIRC) and a lot of math optimization flags
> that do not map to FP operations directly...

If we really try to model all that, at some point we may as well remove
PLUS_EXPR for floats...

  .FENV_PLUS (x, y, flags)

where flags is a bitfield that specifies whether we care about signed
zeros, signalling NaNs, what the rounding is (dynamic, don't care, up,
down, etc.), whether we care about exceptions, whether we can do unsafe
optimizations, whether we can contract +* into an fma, etc. That would
force us to rewrite a lot of optimizations :-( And CSE might become
complicated with several expressions that differ only in their flags.

.FENV_PLUS (x, y) was supposed to be equivalent to .FENV_PLUS (x, y,
safeflags), where safeflags are the strictest flags possible, while
leaving existing stuff like -funsafe-math-optimizations alone (so no
regression), with the idea that the version with flags would come later.

> Ah, OK. So indeed, instead of a new pass doing the lowering on GIMPLE,
> this should ideally be done by populating expand_FENV_* appropriately.

Yes, I was lazy, because it means I need to understand better how
expansion works :-(

> On the patch, I'd rename _DIV to _RDIV (to match the tree code we are
> dealing with). You miss _NEGATE,

True. I am only interested in -frounding-math, so my first reaction was
that I didn't need to do anything for NEGATE, but indeed, with a
signalling NaN anything can have an effect.

> and also _FIX_TRUNC and _FLOAT, in case those might trap with
> -ftrapping-math.

I don't know much about fixed point, and I didn't think about conversions
yet. I'll have to check what the C standard says about those.

> There are also internal functions for POW, FMOD and others which are
> ECF_CONST but may not end up being folded from their builtin
> counterparts with -frounding-math.

I don't know how far this needs to go. SQRT has correctly rounded
instructions on several targets, so it is relevant. But unless your libm
provides a correctly rounded implementation of pow, the compiler could
also ignore it. The new pragma fenv_round is scary, in part because it
seems to imply that all math functions need to have a correctly rounding
implementation.

> I guess builtins need the same treatment for -ftrapping-math as they do
> for -frounding-math. I think you already mentioned that the default of
> this flag doesn't make much sense (well, the flag isn't fully
> honored/implemented).

PR 54192 (coincidentally, it caused a missed vectorization in
https://stackoverflow.com/a/56681744/1918193 last week).

> So I think the patch is a good start, but I'd say we should not introduce
> the new pass and instead expand to the asm() kludge directly, which would
> also make it easier to handle some ops as unspecs in the target.

This also answers what should be done with vectors: I'll need to add code
to tree-vect-generic for the new functions.
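The interval-arithmetic use case Marc cites (CGAL) is the canonical consumer of directed rounding: rounding down for lower bounds and up for upper bounds yields a machine interval guaranteed to contain the exact result — exactly the pattern that -frounding-math must keep the compiler from folding or reordering. A minimal sketch in that spirit (names invented, volatile standing in for the missing FENV_ACCESS support):

```cpp
#include <cfenv>

// Minimal interval division: compute [lo, hi] guaranteed to contain the
// exact quotient a / b (for b > 0), by redirecting the dynamic rounding
// mode around each hardware division.
struct interval { double lo, hi; };

interval interval_div(double a, double b)
{
    volatile double va = a, vb = b;   // keep the divisions at run time
    int old = std::fegetround();
    std::fesetround(FE_DOWNWARD);
    double lo = va / vb;              // rounded toward -infinity
    std::fesetround(FE_UPWARD);
    double hi = va / vb;              // rounded toward +infinity
    std::fesetround(old);
    return {lo, hi};
}
```

If the compiler constant-folds either division in round-to-nearest, or hoists one across the fesetround calls, the containment guarantee is silently lost — which is the correctness bug this whole patch series exists to prevent.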
On Mon, Jun 24, 2019 at 3:47 PM Marc Glisse <marc.glisse@inria.fr> wrote:
>
> On Mon, 24 Jun 2019, Richard Biener wrote:
>
> [...]
>
> Well, I maintain a part of CGAL, a C++ library, that uses interval
> arithmetic and thus relies on a non-default rounding direction. I am
> trying to prepare this dog food so I can eat it myself...

;)

> There are way too many such contexts. In C++, any initializer is
> constexpr-evaluated if possible (PR 85746 shows that this is bad for
> __builtin_constant_p), and I do want
>   double d = 1. / 3;
> to depend on the dynamic rounding direction. I'd rather err on the other
> extreme and only fold when we are forced to, say
>   constexpr double d = 1. / 3;
> or even reject it because it is inexact, if pragmas put us in a region
> with dynamic rounding.

OK, fair enough. I just hoped that a global

  double x = 1.0/3.0;

does not become a runtime initializer with -frounding-math ...

> Side remark: I am sad that Intel added rounded versions for scalars and
> 512-bit vectors but not for the intermediate sizes, while I am most
> interested in 128 bits. Masking most of the 512 bits still causes the
> dreaded clock slow-down.

Ick. I thought this was vector-length agnostic...

> .FENV_PLUS (x, y) was supposed to be equivalent to .FENV_PLUS (x, y,
> safeflags), where safeflags are the strictest flags possible, while
> leaving existing stuff like -funsafe-math-optimizations alone (so no
> regression), with the idea that the version with flags would come later.

Yeah, I'm fine with this incremental approach, and it really should be
constrained to FP environment access.

> Yes, I was lazy, because it means I need to understand better how
> expansion works :-(

A bit of copy & paste from examples could do the trick, I guess...

> I just thought of one issue for vector intrinsics, say _mm_add_pd, where
> the fenv_access status that should matter is that of the caller, not the
> one in emmintrin.h. But since I don't have the pragma or vectors, that
> can wait.

True.
I guess for the intrinsic headers we could invent some new attribute > > (or assume such semantics for always_inline which IIRC they are) saying > > that a function inherits options from the caller (difficult if not > > inlined, it would > > imply cloning, thus always-inline again...). > > > > On the patch I'd name _DIV _RDIV (to match the tree code we are dealing > > with). You miss _NEGATE > > True. I am only interested in -frounding-math, so my first reaction was > that I don't need to do anything for NEGATE, but indeed with a signalling > NaN anything can have an effect. > > > and also the _FIX_TRUNC and _FLOAT in case those might trap with > > -ftrapping-math. > > I don't know much about fixed point, and I didn't think about conversions > yet. I'll have to check what the C standard says about those. FIX_TRUNC is float -> integer conversion (overflow/underflow flag?) > > There are also internal functions for POW, FMOD and others which are > > ECF_CONST but may not end up being folded from their builtin > > counter-part with -frounding-math. > > I don't know how far this needs to go. SQRT has correctly rounded > instructions on several targets, so it is relevant. But unless your libm > provides a correctly-rounded implementation of pow, the compiler could > also ignore it. The new pragma fenv_round is scary in part because it > seems to imply that all math functions need to have a correctly rounding > implementation. > > > I guess builtins need the same treatment for -ftrapping-math as they > > do for -frounding-math. I think you already mentioned the default > > of this flag doesn't make much sense (well, the flag isn't fully > > honored/implemented). > > PR 54192 > (coincidentally, it caused a missed vectorization in > https://stackoverflow.com/a/56681744/1918193 last week) I commented there. Lets just make -frounding-math == FENV_ACCESS ON and keep -ftrapping-math as whether FP exceptions raise traps. 
> > So I think the patch is a good start but I'd say we should not introduce > > the new pass but instead expand to the asm() kludge directly which > > would make it also easier to handle some ops as unspecs in the target. > > This also answers what should be done with vectors, I'll need to add code > to tree-vect-generic for the new functions. Yeah. Auto-vectorizing would also need adjustment of course (also costing like estimate_num_insns or others). Richard. > -- > Marc Glisse
On Mon, 24 Jun 2019, Richard Biener wrote: >>>> -frounding-math is supposed to be equivalent to "#pragma stdc fenv_access >>>> on" covering the whole program. >>>> >>>> For constant expressions, I see a difference between >>>> constexpr double third = 1. / 3.; >>>> which really needs to be done at compile time, and >>>> const double third = 1. / 3.; >>>> which will try to evaluate the rhs as constexpr, but where the program is >>>> still valid if that fails. The second one clearly should refuse to be >>>> evaluated at compile time if we are specifying a dynamic rounding >>>> direction. For the first one, I am not sure. I guess you should only write >>>> that in "fenv_access off" regions and I wouldn't mind a compile error. >>>> >>>> Note that C2x adds a pragma fenv_round that specifies a rounding direction >>>> for a region of code, which seems relevant for constant expressions. That >>>> pragma looks hard, but maybe some pieces would be nice to add. >>> >>> Hmm. My thinking was along the line that at the start of main() the >>> C abstract machine might specify the initial rounding mode (and exception >>> state) is implementation defined and all constant expressions are evaluated >>> whilst being in this state. So we can define that to round-to-nearest and >>> simply fold all constants in contexts we are allowed to evaluate at >>> compile-time as we see them? >> >> There are way too many such contexts. In C++, any initializer is >> constexpr-evaluated if possible (PR 85746 shows that this is bad for >> __builtin_constant_p), and I do want >> double d = 1. / 3; >> to depend on the dynamic rounding direction. I'd rather err on the other >> extreme and only fold when we are forced to, say >> constexpr double d = 1. / 3; >> or even reject it because it is inexact, if pragmas put us in a region >> with dynamic rounding. > > OK, fair enough. I just hoped that global > > double x = 1.0/3.0; > > do not become runtime initializers with -frounding-math ... 
Ah, I wasn't thinking of globals. Ignoring the new pragma fenv_round, which I guess could affect this (the C draft isn't very explicit), the program doesn't have many chances to set a rounding mode before initializing globals. It could do so in the initializer of another variable, but relying on the order of initialization this way seems bad. Maybe in this case it would make sense to assume the default rounding mode... In practice, I would only set -frounding-math on a per function basis (possibly using pragma fenv_access), so the optimization of what happens to globals doesn't seem so important. >> Side remark, I am sad that Intel added rounded versions for scalars and >> 512 bit vectors but not for intermediate sizes, while I am most >> interested in 128 bits. Masking most of the 512 bits still causes the >> dreaded clock slow-down. > > Ick. I thought this was vector-length agnostic... I think all of the new stuff in AVX512 is, except rounding... Also, the rounded functions have exceptions disabled, which may make them hard to use with fenv_access. >>> I guess builtins need the same treatment for -ftrapping-math as they >>> do for -frounding-math. I think you already mentioned the default >>> of this flag doesn't make much sense (well, the flag isn't fully >>> honored/implemented). >> >> PR 54192 >> (coincidentally, it caused a missed vectorization in >> https://stackoverflow.com/a/56681744/1918193 last week) > > I commented there. Lets just make -frounding-math == FENV_ACCESS ON > and keep -ftrapping-math as whether FP exceptions raise traps. One issue is that the C pragmas do not let me convey that I am interested in dynamic rounding but not exception flags. It is possible to optimize quite a bit more with just rounding. In particular, the functions are pure (at some point we will have to teach the compiler the difference between the FP environment and general memory, but I'd rather wait). > Yeah. 
Auto-vectorizing would also need adjustment of course (also > costing like estimate_num_insns or others). Anything that is only about optimizing the code in -frounding-math functions can wait, that's the good point of implementing a new feature.
On 22/06/2019 23:21, Marc Glisse wrote:
> We should care about the C standard, and do whatever makes sense for C++
> without expecting the C++ standard to tell us exactly what that is. We
> can check what visual studio and intel do, but we don't have to follow them.
>
> -frounding-math is supposed to be equivalent to "#pragma stdc fenv_access
> on" covering the whole program.

i think there are 4 settings that make sense:
(i think function level granularity is ok for
this, iso c has block scope granularity, gcc
has translation unit level granularity.)

(1) except flags + only caller observes it.
i.e. exception flags raised during the execution
of the function matter, but only the caller
observes the flags by checking them.

(2) rounding mode + only caller changes it.
i.e. rounding mode may not be the default during
the execution of the function, but only the
caller may change the rounding mode.

(3) except flags + anything may observe/unset it.
i.e. exception flags raised during the execution
of the function matter, and any call or inline
asm may observe or unset them (unless the
compiler can prove otherwise).

(4) rounding mode + anything may change it.
i.e. rounding mode may not be the default or
change during the execution of a function,
and any call or inline asm may change it.

i think -frounding-math implements (2) fairly reliably,
and #pragma stdc fenv_access on requires (3) and (4).

-ftrapping-math was never clear, but it should
probably do (1) or (5) := (3)+"exceptions may trap".

so iso c has 2 levels: fenv access on/off, where
"on" means that essentially everything has to be
compiled with (3) and (4) (even functions that
don't do anything with fenv). this is not very
practical: most extern calls don't modify the fenv
so fp operations can be reordered around them,
(1) and (2) are more relaxed about this, however
that model needs fp barriers around the few calls
that actually do fenv access.
to me (1) + (2) + builtins for fp barriers seems more useful than iso c (3) + (4), but iso c is worth implementing too, since that's the standard. so ideally there would be multiple flags/function attributes and builtin barriers to make fenv access usable in practice. (however not many things care about fenv access so i don't know if that amount of work is justifiable). > For constant expressions, I see a difference between > constexpr double third = 1. / 3.; > which really needs to be done at compile time, and > const double third = 1. / 3.; > which will try to evaluate the rhs as constexpr, but where the program is still valid if that fails. The second one clearly should refuse to be > evaluated at compile time if we are specifying a dynamic rounding direction. For the first one, I am not sure. I guess you should only write > that in "fenv_access off" regions and I wouldn't mind a compile error. iso c specifies rules for const expressions: http://port70.net/~nsz/c/c11/n1570.html#F.8.4 static/thread storage duration is evaluated with default rounding mode and no exceptions are signaled. other initialization is evaluated at runtime. (i.e. rounding-mode dependent result and exception flags are observable).
On Mon, Jun 24, 2019 at 4:57 PM Marc Glisse <marc.glisse@inria.fr> wrote: > > On Mon, 24 Jun 2019, Richard Biener wrote: > > >>>> -frounding-math is supposed to be equivalent to "#pragma stdc fenv_access > >>>> on" covering the whole program. > >>>> > >>>> For constant expressions, I see a difference between > >>>> constexpr double third = 1. / 3.; > >>>> which really needs to be done at compile time, and > >>>> const double third = 1. / 3.; > >>>> which will try to evaluate the rhs as constexpr, but where the program is > >>>> still valid if that fails. The second one clearly should refuse to be > >>>> evaluated at compile time if we are specifying a dynamic rounding > >>>> direction. For the first one, I am not sure. I guess you should only write > >>>> that in "fenv_access off" regions and I wouldn't mind a compile error. > >>>> > >>>> Note that C2x adds a pragma fenv_round that specifies a rounding direction > >>>> for a region of code, which seems relevant for constant expressions. That > >>>> pragma looks hard, but maybe some pieces would be nice to add. > >>> > >>> Hmm. My thinking was along the line that at the start of main() the > >>> C abstract machine might specify the initial rounding mode (and exception > >>> state) is implementation defined and all constant expressions are evaluated > >>> whilst being in this state. So we can define that to round-to-nearest and > >>> simply fold all constants in contexts we are allowed to evaluate at > >>> compile-time as we see them? > >> > >> There are way too many such contexts. In C++, any initializer is > >> constexpr-evaluated if possible (PR 85746 shows that this is bad for > >> __builtin_constant_p), and I do want > >> double d = 1. / 3; > >> to depend on the dynamic rounding direction. I'd rather err on the other > >> extreme and only fold when we are forced to, say > >> constexpr double d = 1. / 3; > >> or even reject it because it is inexact, if pragmas put us in a region > >> with dynamic rounding. 
> > > > OK, fair enough. I just hoped that global > > > > double x = 1.0/3.0; > > > > do not become runtime initializers with -frounding-math ... > > Ah, I wasn't thinking of globals. Ignoring the new pragma fenv_round, > which I guess could affect this (the C draft isn't very explicit), the > program doesn't have many chances to set a rounding mode before > initializing globals. It could do so in the initializer of another > variable, but relying on the order of initialization this way seems bad. > Maybe in this case it would make sense to assume the default rounding > mode... > > In practice, I would only set -frounding-math on a per function basis > (possibly using pragma fenv_access), so the optimization of what happens > to globals doesn't seem so important. > > >> Side remark, I am sad that Intel added rounded versions for scalars and > >> 512 bit vectors but not for intermediate sizes, while I am most > >> interested in 128 bits. Masking most of the 512 bits still causes the > >> dreaded clock slow-down. > > > > Ick. I thought this was vector-length agnostic... > > I think all of the new stuff in AVX512 is, except rounding... > > Also, the rounded functions have exceptions disabled, which may make > them hard to use with fenv_access. > > >>> I guess builtins need the same treatment for -ftrapping-math as they > >>> do for -frounding-math. I think you already mentioned the default > >>> of this flag doesn't make much sense (well, the flag isn't fully > >>> honored/implemented). > >> > >> PR 54192 > >> (coincidentally, it caused a missed vectorization in > >> https://stackoverflow.com/a/56681744/1918193 last week) > > > > I commented there. Lets just make -frounding-math == FENV_ACCESS ON > > and keep -ftrapping-math as whether FP exceptions raise traps. > > One issue is that the C pragmas do not let me convey that I am interested > in dynamic rounding but not exception flags. It is possible to optimize > quite a bit more with just rounding. 
In particular, the functions are pure > (at some point we will have to teach the compiler the difference between > the FP environment and general memory, but I'd rather wait). > > > Yeah. Auto-vectorizing would also need adjustment of course (also > > costing like estimate_num_insns or others). > > Anything that is only about optimizing the code in -frounding-math > functions can wait, that's the good point of implementing a new feature. Sure - the only thing we may want to avoid is designing us into a corner we cannot easily escape from. Whenever I thought about -frounding-math and friends (and not doing asm()-like hacks ;)) I thought we need to make the data dependence on the FP environment explicit. So I'd have done { FP result, new FP ENV state } = FENV_PLUS (op1, op2, old FP ENV state); with the usual caveat of representing multiple return values. Our standard way via a projection riding ontop of _Complex types works as long as you use scalars and matching types, a more general projection facility would use N-uples of abitrary component types (since those are an implementation detail). My usual alternative was (ab-)using asm()s since those can have multiple outputs and provide internal-function like asm-body IDs more-or-less directly mapping to RTL instructions for example. With using global memory as FENV state you use virtual operands for this. And indeed for -frounding-math the operations itself do not change the FP environment (thus are pure) and the memory approach looks easiest (it's already implemented this way for builtins). Given the pace of improving -frounding-math support in the past I think it's fine to continue in this direction. Richard. > -- > Marc Glisse
On Mon, 24 Jun 2019, Szabolcs Nagy wrote: > On 22/06/2019 23:21, Marc Glisse wrote: >> We should care about the C standard, and do whatever makes sense for C++ without expecting the C++ standard to tell us exactly what that is. We >> can check what visual studio and intel do, but we don't have to follow them. >> >> -frounding-math is supposed to be equivalent to "#pragma stdc fenv_access on" covering the whole program. > > i think there are 4 settings that make sense: > (i think function level granularity is ok for > this, iso c has block scope granularity, gcc > has translation unit level granularity.) > > (1) except flags + only caller observes it. > i.e. exception flags raised during the execution > of the function matter, but only the caller > observes the flags by checking them. > > (2) rounding mode + only caller changes it. > i.e. rounding mode may not be the default during > the execution of the function, but only the > caller may change the rounding mode. > > (3) except flags + anything may observe/unset it. > i.e. exception flags raised during the execution > of the function matter, and any call or inline > asm may observe or unset them (unless the > compiler can prove otherwise). > > (4) rounding mode + anything may change it. > i.e. rounding mode may not be the default or > change during the execution of a function, > and any call or inline asm may change it. > > i think -frounding-math implements (2) fairly reliably, I hadn't thought of it that way, but it is true that this is fairly well handled. I could possibly use this in some places in CGAL, using a wrapper so I can specify noinline/noipa at the call site. I'll have to experiment. In particular it means that if I use -frounding-math to enable (4), there are valid uses where it will cause a speed regression :-( > and #pragma stdc fenv_access on requires (3) and (4). > > -ftrapping-math was never clear, but it should > probably do (1) or (5) := (3)+"exceptions may trap". 
> > so iso c has 2 levels: fenv access on/off, where > "on" means that essentially everything has to be > compiled with (3) and (4) (even functions that > don't do anything with fenv). this is not very > practical: most extern calls don't modify the fenv > so fp operations can be reordered around them, > (1) and (2) are more relaxed about this, however > that model needs fp barriers around the few calls > that actually does fenv access. > > to me (1) + (2) + builtins for fp barriers seems > more useful than iso c (3) + (4), but iso c is > worth implementing too, since that's the standard. > so ideally there would be multiple flags/function > attributes and builtin barriers to make fenv access > usable in practice. (however not many things care > about fenv access so i don't know if that amount > of work is justifiable). That makes sense. If we got (4), the interest for (2) would depend a lot on the speed difference. If the difference is small enough, then having only (4) might suffice. But at least separating rounding from exception flags seems good. Depending on how we change things, it could be nice to add to the decription of -frounding-math the precision you gave above (only the caller may change it). >> For constant expressions, I see a difference between >> constexpr double third = 1. / 3.; >> which really needs to be done at compile time, and >> const double third = 1. / 3.; >> which will try to evaluate the rhs as constexpr, but where the program is still valid if that fails. The second one clearly should refuse to be >> evaluated at compile time if we are specifying a dynamic rounding direction. For the first one, I am not sure. I guess you should only write >> that in "fenv_access off" regions and I wouldn't mind a compile error. > iso c specifies rules for const expressions: > http://port70.net/~nsz/c/c11/n1570.html#F.8.4 > > static/thread storage duration is evaluated with > default rounding mode and no exceptions are signaled. 
> > other initialization is evaluated at runtime. > (i.e. rounding-mode dependent result and > exception flags are observable). Thanks for the reference.
Hello,

just posting the current version of this patch, in case people have
comments. Some changes: the inline asm is introduced during expansion, and
the thing is controlled by a different flag (it should be controlled by the
pragma, but that's starting to be too many pieces to implement at the same
time, and I didn't want to cause a regression for people using
-frounding-math in the case where it actually works). I also added an extra
parameter, currently always 0, to specify some properties of the operation:
the first one I am thinking of is "don't care about exceptions" since I
only care about rounding, but that will require even more flags / pragmas
to specify the variants we want...

For the inline asm, I hesitated between building a temporary GIMPLE_ASM
just so I could pass it to the existing expansion, or "inlining" a
simplified version. This version always goes through the stack, which
matches well with the constraint "=m". One would have to modify the code to
allow "=x". Using "=mx", the compiler does simplify things so we actually
go through registers (it randomly leaves a dead store to the stack here or
there, but not that many and it looks like an existing missed
optimization), which makes me think it is not that important to write
specific code to handle "=x".

Some possible future work:
- target hook to specify a constraint different from "=m"
- target hook to expand the functions and/or the opaque pass-through
- more operations (maybe comparisons, conversions, etc)
- lowering generic vector operations, so I can enable them in the front-end
- parsing the pragma
- optimizations (at least exact constant folding)
- constexpr? Disable in some contexts where a dynamic rounding mode makes
  less sense?
- C front-end
- Use caller's environment for always_inline callee?
We would have to mark the call so we remember what the environment was, and it would be too late for some foldings, but we could still translate the operations that remain, which should be sufficient for the x86 *intrin.h files. To be safe we would have to assume fenv_access on for always_inline functions and only lower them to regular operations when we see the caller, but that might be too much.
On Sat, 22 Jun 2019, Marc Glisse wrote:

> as discussed in the PR, this seems like a simple enough approach to handle
> FENV functionality safely, while keeping it possible to implement
> optimizations in the future.

Could you give a high-level description of the implementation approach, and
how this design is intended to (eventually) achieve the required
constraints on code movement and removal? In
<https://gcc.gnu.org/ml/gcc/2013-01/msg00095.html> I listed those
constraints as:

* General calls may set, clear or test exceptions, or manipulate the
rounding mode (as may asms, depending on their inputs / outputs /
clobbers).

* Floating-point operations have the rounding mode as input. They may set
(but not clear or test) floating-point exception flags.

* Thus in general floating-point operations may not be moved across most
calls (or relevant asms), or values from one side of a call reused for the
same operation with the same inputs appearing on the other side of the
call.

* Statements such as "(void) (a * b);" can't be eliminated because they
may raise exceptions. (That's purely about exceptions, not rounding
modes.)

(I should add that const function calls should not depend on the rounding
mode, but pure calls may. Also, on some architectures there are explicit
register names for asms to use in inputs / outputs / clobbers to refer to
the floating-point state registers, and asms not referring to those can be
taken not to manipulate floating-point state, but other architectures
don't have such names. The safe approach for asms would be to assume that
all asms on all architectures can manipulate floating-point state, until
there is a way to declare what the relevant registers are.)

(I should also note that DFP has a separate rounding mode from binary FP,
but that is unlikely to affect anything in this patch - although there
might end up being potential minor optimizations from knowing that certain
asms only involve one of the two rounding modes.)
> I'd like to handle this incrementally, rather than wait for a mega-patch that > does everything, if that's ok. For instance, I didn't handle vectors in this > first patch because the interaction with vector lowering was not completely > obvious. Plus it may help get others to implement some parts of it ;-) Are there testcases that could be added initially to demonstrate how this fixes cases that are currently broken, even if other cases aren't fixed?
On Sun, 23 Jun 2019, Marc Glisse wrote:

> For constant expressions, I see a difference between
> constexpr double third = 1. / 3.;
> which really needs to be done at compile time, and
> const double third = 1. / 3.;
> which will try to evaluate the rhs as constexpr, but where the program is
> still valid if that fails. The second one clearly should refuse to be
> evaluated at compile time if we are specifying a dynamic rounding direction.

For C, initializers with static or thread storage duration always use
round-to-nearest and discard exceptions (see F.8.2 and F.8.5). This is
unaffected by FENV_ACCESS (but *is* affected by FENV_ROUND).

> Note that C2x adds a pragma fenv_round that specifies a rounding direction for
> a region of code, which seems relevant for constant expressions. That pragma
> looks hard, but maybe some pieces would be nice to add.

FENV_ROUND (and FENV_DEC_ROUND) shouldn't be that hard, given the
optimizers avoiding code movement that doesn't respect rounding modes
(though I'm only thinking of C here, not C++). You'd insert appropriate
built-in function calls to save and restore the dynamic rounding modes in
scopes with a constant rounding mode set, taking due care about scopes
being left through goto etc., and restore the mode around calls to
functions that aren't meant to be affected by the constant rounding modes -
you'd also need a built-in function to indicate to make a call that is
affected by the constant rounding modes (and make __builtin_tgmath do that
as well), and to define all the relevant functions as macros using that
built-in function in the standard library headers. Optimizations for
architectures supporting rounding modes embedded in instructions could come
later. Complications would include:

* <float.h> constants should use hex floats to avoid being affected by the
constant rounding mode (in turn, this may mean disallowing the FENV_ROUND
pragma in C90 mode because of the lack of hex floats there). If they use
decimal rather than hex they'd need to be very long constants to have
exactly the right value in all rounding modes.

* The built-in functions to change the dynamic rounding mode can't involve
calling fegetround / fesetround, because those are in libm and libm is not
supposed to be required unless you call a function in <math.h>,
<complex.h> or <fenv.h> (simply using a language feature such as a pragma
should not introduce a libm dependency). So a similar issue applies as
applied with atomic compound assignment for floating-point types: every
target with hardware floating point needs to have its own support for
expanding those built-in functions inline, and relevant tests will FAIL
(or be UNSUPPORTED through the compiler calling sorry () when the pragma
is used) on targets without that support, until it is added. (And in cases
where the rounding mode is TLS data in libc rather than in hardware, such
as soft-float PowerPC GNU/Linux and maybe some other cases for DFP, you
need new implementation-namespace interfaces there to save / restore it.)
On Mon, 24 Jun 2019, Richard Biener wrote: > On the patch I'd name _DIV _RDIV (to match the tree code we are dealing > with). You miss _NEGATE and also the _FIX_TRUNC and _FLOAT in > case those might trap with -ftrapping-math. There are also internal Negation (and abs and copysign) can never raise any exceptions even with signaling NaN arguments. Conversion between integers and floating-point *can* raise exceptions (depending on the types involved, e.g. conversions from int to IEEE double are always exact with no exceptions raised). And conversions from integer to floating-point, when the types mean they aren't necessarily exact, depend on the rounding mode (whereas conversions from floating-point to integer types always truncate towards 0).
On Wed, 7 Aug 2019, Joseph Myers wrote:

> On Sat, 22 Jun 2019, Marc Glisse wrote:
>
>> as discussed in the PR, this seems like a simple enough approach to handle
>> FENV functionality safely, while keeping it possible to implement
>> optimizations in the future.
>
> Could you give a high-level description of the implementation approach,

At the GIMPLE level, z = x + y is represented as a function call z = .FENV_PLUS (x, y, options). The floating-point environment (rounding mode, exceptions) is considered to be somewhere in memory (I think it still works if it is a hard register), so unless options say otherwise, .FENV_PLUS may read and write memory. Very few optimizations apply to general function calls, so this should avoid unwanted code motion or removal, and we can still implement specific optimizations just for those functions.

At the RTL level, the idea is that good back-ends would expand .FENV_PLUS however they want, but the default is to pass the arguments and the result through volatile asms, which are opaque to the optimizers and prevent constant propagation, removal, movement, etc.

(The "options" argument is there to avoid having many variants of the functions depending on whether we only care about rounding, only about exceptions, whether we may ignore signed zeros, etc., with 0 as the strictest, always-safe version. For explicitly rounded operations as with pragma fenv_round, a different function might be better, since the 0 case is not a safe replacement anymore.)

> and how this design is intended to (eventually) achieve the required
> constraints on code movement and removal?  In
> <https://gcc.gnu.org/ml/gcc/2013-01/msg00095.html> I listed those
> constraints as:
>
> * General calls may set, clear or test exceptions, or manipulate the
> rounding mode (as may asms, depending on their inputs / outputs / clobbers).

If the asm is volatile, this works fine. I'll come back to this below.

> * Floating-point operations have the rounding mode as input.  They may set
> (but not clear or test) floating-point exception flags.
>
> * Thus in general floating-point operations may not be moved across most
> calls (or relevant asms), or values from one side of a call reused for the
> same operation with the same inputs appearing on the other side of the
> call.
>
> * Statements such as "(void) (a * b);" can't be eliminated because they
> may raise exceptions.  (That's purely about exceptions, not rounding
> modes.)

I had to set TREE_SIDE_EFFECTS = 1 so the C++ front-end wouldn't remove it prematurely.

> (I should add that const function calls should not depend on the rounding
> mode, but pure calls may.

That fits perfectly with the idea of having the FP environment be part of memory.

> Also, on some architectures there are explicit
> register names for asms to use in inputs / outputs / clobbers to refer to
> the floating-point state registers, and asms not referring to those can be
> taken not to manipulate floating-point state, but other architectures
> don't have such names.  The safe approach for asms would be to assume that
> all asms on all architectures can manipulate floating-point state, until
> there is a way to declare what the relevant registers are.)

I assume that an asm using this register as a constraint is already somehow prevented from moving across function calls? If so, at least GIMPLE seems safe. For RTL, if those asms were volatile, the default expansion would be fine. If they don't need to be volatile and somehow manage to cross the pass-through asm, I guess a target hook adding extra inputs / outputs / clobbers to the pass-through asm would work. Or, best, the target would expand the operations to (unspec) insns that explicitly handle exactly those registers.
> (I should also note that DFP has a separate rounding mode from binary FP,
> but that is unlikely to affect anything in this patch - although there
> might end up being potential minor optimizations from knowing that certain
> asms only involve one of the two rounding modes.)
>
>> I'd like to handle this incrementally, rather than wait for a mega-patch that
>> does everything, if that's ok. For instance, I didn't handle vectors in this
>> first patch because the interaction with vector lowering was not completely
>> obvious. Plus it may help get others to implement some parts of it ;-)
>
> Are there testcases that could be added initially to demonstrate how this
> fixes cases that are currently broken, even if other cases aren't fixed?

Yes. I'll need to look into dg-require-effective-target fenv(_exceptions) to see how to disable those new tests where they are not supported. Many easy tests already start working, say computing 1./3 twice with a change of rounding mode in between and checking that the results differ, or computing 1./3, ignoring the result, and checking FE_INEXACT.

On Wed, 7 Aug 2019, Joseph Myers wrote:

> On Sun, 23 Jun 2019, Marc Glisse wrote:
>
>> For constant expressions, I see a difference between
>>   constexpr double third = 1. / 3.;
>> which really needs to be done at compile time, and
>>   const double third = 1. / 3.;
>> which will try to evaluate the rhs as constexpr, but where the program is
>> still valid if that fails. The second one clearly should refuse to be
>> evaluated at compile time if we are specifying a dynamic rounding direction.
>
> For C, initializers with static or thread storage duration always use
> round-to-nearest and discard exceptions (see F.8.2 and F.8.5).  This is
> unaffected by FENV_ACCESS (but *is* affected by FENV_ROUND).

Thanks for the clarification.

>> Note that C2x adds a pragma fenv_round that specifies a rounding direction for
>> a region of code, which seems relevant for constant expressions.  That pragma
>> looks hard, but maybe some pieces would be nice to add.
>
> FENV_ROUND (and FENV_DEC_ROUND) shouldn't be that hard, given the

On the glibc side I expect it to be a lot of work; it seems to require a correctly rounded version of all math functions...

> optimizers avoiding code movement that doesn't respect rounding modes
> (though I'm only thinking of C here, not C++).  You'd insert appropriate
> built-in function calls to save and restore the dynamic rounding modes in
> scopes with a constant rounding mode set, taking due care about scopes
> being left through goto etc., and restore the mode around calls to
> functions that aren't meant to be affected by the constant rounding modes
> - you'd also need a built-in function to indicate to make a call that is
> affected by the constant rounding modes (and make __builtin_tgmath do that
> as well), and to define all the relevant functions as macros using that
> built-in function in the standard library headers.  Optimizations for
> architectures supporting rounding modes embedded in instructions could
> come later.
>
> Complications would include:
>
> * <float.h> constants should use hex floats to avoid being affected by the
> constant rounding mode (in turn, this may mean disallowing the FENV_ROUND
> pragma in C90 mode because of the lack of hex floats there).  If they use
> decimal rather than hex they'd need to be very long constants to have
> exactly the right value in all rounding modes.

True. I thought that was on the libc side, but no, float.h is in GCC, and all the values are provided by the compiler as macros anyway. I haven't yet looked at the rounding that happens while parsing a literal, and in particular which pragmas are supposed to affect it (probably not fenv_access, only fenv_round).
It seems that hex floats are accepted even in C89 with a pedwarn that can be disabled with __extension__, although I am not sure whether using __extension__ in __FLT_MAX__ (so it wouldn't be a pure literal anymore) would cause trouble. We could also have #pragma fenv_round to_nearest (not the exact syntax) in float.h, although the C standard doesn't seem to have a push/pop mechanism to restore fenv_round at the end of the file.

> * The built-in functions to change the dynamic rounding mode can't involve
> calling fegetround / fesetround, because those are in libm and libm is not
> supposed to be required unless you call a function in <math.h>,
> <complex.h> or <fenv.h> (simply using a language feature such as a pragma
> should not introduce a libm dependency).  So a similar issue applies as
> applied with atomic compound assignment for floating-point types: every
> target with hardware floating point needs to have its own support for
> expanding those built-in functions inline, and relevant tests will FAIL
> (or be UNSUPPORTED through the compiler calling sorry () when the pragma
> is used) on targets without that support, until it is added.  (And in
> cases where the rounding mode is TLS data in libc rather than in
> hardware, such as soft-float PowerPC GNU/Linux and maybe some other cases
> for DFP, you need new implementation-namespace interfaces there to save /
> restore it.)

Honestly, that doesn't seem like a priority. Sure, long term, for strict conformance (and a bit for performance) it could make sense, but having a not-strictly-legal dependency on libm when using a pragma that is meant for use with fenv.h seems much better than missing the functionality altogether.
On Thu, 8 Aug 2019, Marc Glisse wrote:

>> FENV_ROUND (and FENV_DEC_ROUND) shouldn't be that hard, given the
>
> On the glibc side I expect it to be a lot of work, it seems to require a
> correctly rounded version of all math functions...

No, it doesn't. 18661-4 reserves cr* names for correctly rounded functions; most of the non-cr* names are not bound to the IEEE operations and so have no specific accuracy requirements. With FENV_ROUND they just need to behave the same as if the relevant dynamic rounding mode were set (via the compiler temporarily setting it before calling the function).

> It seems that hex floats are accepted even in C89 with a pedwarn that can be

Not for -std=c90 (since accepting p+ or p- as part of a pp-number would change the semantics of some valid C90 programs, see gcc.dg/c90-hexfloat-2.c), only for -std=gnu* and C99 and later standards.

> We could also have #pragma fenv_round to_nearest (not the exact syntax) in
> float.h, although the C standard doesn't seem to have a push/pop mechanism to
> restore fenv_round at the end of the file.

Also, what's relevant is the state when the macro is expanded, not when it's defined. (The math.h M_* constants aren't a big issue; at most they might need a few more digits so the constant rounds the same as the underlying irrational number in all rounding modes. The float.h constants are an issue precisely because the values are dyadic rationals but need many decimal digits to be represented exactly in decimal.)
On Thu, 8 Aug 2019, Joseph Myers wrote:

> On Thu, 8 Aug 2019, Marc Glisse wrote:
>
>>> FENV_ROUND (and FENV_DEC_ROUND) shouldn't be that hard, given the
>>
>> On the glibc side I expect it to be a lot of work, it seems to require a
>> correctly rounded version of all math functions...
>
> No, it doesn't.  18661-4 reserves cr* names for correctly rounded
> functions; most of the non-cr* names are not bound to the IEEE operations
> and so have no specific accuracy requirements, with FENV_ROUND they just
> need to behave the same as if the relevant dynamic rounding mode were set
> (via the compiler temporarily setting it before calling the function).

And since glibc just ignores the rounding mode, that's no constraint at all, at least on that platform. OK.
On Mon, 24 Jun 2019, Marc Glisse wrote:

>> OK, fair enough.  I just hoped that global
>>
>>   double x = 1.0/3.0;
>>
>> do not become runtime initializers with -frounding-math ...
>
> Ah, I wasn't thinking of globals. Ignoring the new pragma fenv_round, which I
> guess could affect this (the C draft isn't very explicit), the program doesn't

I think FENV_ROUND (for C) definitely affects both the interpretation of constants (if the constant isn't exactly representable in the format in which it is evaluated) and the evaluation of operations in initializers with static or thread storage duration. (F.8.2 in the current C2x draft says "During translation, constant rounding direction modes (7.6.2) are in effect where specified." and F.8.5 says "All computation for initialization of objects that have static or thread storage duration is done (as if) at translation time.")

> One issue is that the C pragmas do not let me convey that I am interested in
> dynamic rounding but not exception flags. It is possible to optimize quite a

TS 18661-5 allows e.g. "#pragma STDC FENV_EXCEPT FE_ALL_EXCEPT NO_FLAG" or "#pragma STDC FENV_EXCEPT FE_ALL_EXCEPT OPTIONAL_FLAG". (But it doesn't allow for saying you don't care about exceptions to the extent that raising spurious exceptions is OK.) Some parts of 18661-5 are probably substantially more complicated to implement than any of the other floating-point pragmas; I'm not sure there is any implementation experience at all with 18661-5, in any C implementation.

(On the other hand, CX_LIMITED_RANGE is probably the easiest of the floating-point pragmas to implement, because it has purely local effects: you just need two different forms of IR for complex multiplication and division, chosen based on whether the pragma is in effect in the current scope, and then lower them in two ways that GCC already supports.)
Index: gcc/Makefile.in
===================================================================
--- gcc/Makefile.in	(revision 272586)
+++ gcc/Makefile.in	(working copy)
@@ -1315,20 +1315,21 @@ OBJS = \
 	gimple.o \
 	gimple-builder.o \
 	gimple-expr.o \
 	gimple-iterator.o \
 	gimple-fold.o \
 	gimple-laddress.o \
 	gimple-loop-interchange.o \
 	gimple-loop-jam.o \
 	gimple-loop-versioning.o \
 	gimple-low.o \
+	gimple-lower-fenv.o \
 	gimple-pretty-print.o \
 	gimple-ssa-backprop.o \
 	gimple-ssa-evrp.o \
 	gimple-ssa-evrp-analyze.o \
 	gimple-ssa-isolate-paths.o \
 	gimple-ssa-nonnull-compare.o \
 	gimple-ssa-split-paths.o \
 	gimple-ssa-store-merging.o \
 	gimple-ssa-strength-reduction.o \
 	gimple-ssa-sprintf.o \
Index: gcc/cp/typeck.c
===================================================================
--- gcc/cp/typeck.c	(revision 272586)
+++ gcc/cp/typeck.c	(working copy)
@@ -5544,20 +5544,47 @@ cp_build_binary_op (const op_location_t
 	  if (TREE_TYPE (cop0) != orig_type)
 	    cop0 = cp_convert (orig_type, op0, complain);
 	  if (TREE_TYPE (cop1) != orig_type)
 	    cop1 = cp_convert (orig_type, op1, complain);
 	  instrument_expr = ubsan_instrument_division (location, cop0, cop1);
 	}
       else if (doing_shift && sanitize_flags_p (SANITIZE_SHIFT))
 	instrument_expr = ubsan_instrument_shift (location, code, op0, op1);
     }
 
+  // FIXME: vectors (and complex?) as well
+  if (flag_rounding_math && SCALAR_FLOAT_TYPE_P (build_type))
+    {
+      bool do_fenv_subst = true;
+      internal_fn ifn;
+      switch (resultcode)
+	{
+	case PLUS_EXPR:
+	  ifn = IFN_FENV_PLUS;
+	  break;
+	case MINUS_EXPR:
+	  ifn = IFN_FENV_MINUS;
+	  break;
+	case MULT_EXPR:
+	  ifn = IFN_FENV_MULT;
+	  break;
+	case RDIV_EXPR:
+	  ifn = IFN_FENV_DIV;
+	  break;
+	default:
+	  do_fenv_subst = false;
+	}
+      if (do_fenv_subst)
+	return build_call_expr_internal_loc (location, ifn, build_type,
+					     2, op0, op1);
+    }
+
   result = build2_loc (location, resultcode, build_type, op0, op1);
   if (final_type != 0)
     result = cp_convert (final_type, result, complain);
 
   if (instrument_expr != NULL)
     result = build2 (COMPOUND_EXPR, TREE_TYPE (result),
		     instrument_expr, result);
 
   if (!processing_template_decl)
     {
Index: gcc/gimple-lower-fenv.cc
===================================================================
--- gcc/gimple-lower-fenv.cc	(nonexistent)
+++ gcc/gimple-lower-fenv.cc	(working copy)
@@ -0,0 +1,144 @@
+/* Lower correctly rounded operations.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT
+ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "tree.h"
+#include "gimple.h"
+#include "tree-pass.h"
+#include "ssa.h"
+#include "gimple-iterator.h"
+
+/* Create a pass-through inline asm barrier from IN to OUT.  */
+static gasm*
+asm_barrier (tree out, tree in)
+{
+  vec<tree, va_gc> *inputs = NULL, *outputs = NULL;
+  if (out)
+    {
+      vec_safe_push (inputs,
+		     build_tree_list (build_tree_list
+				      (NULL_TREE, build_string (2, "0")), in));
+      vec_safe_push (outputs,
+		     build_tree_list (build_tree_list
+				      (NULL_TREE, build_string (3, "=g")),
+				      out));
+    }
+  else
+    {
+      vec_safe_push (inputs,
+		     build_tree_list (build_tree_list
+				      (NULL_TREE, build_string (2, "g")), in));
+    }
+  gasm *g = gimple_build_asm_vec ("", inputs, outputs, NULL, NULL);
+  gimple_asm_set_volatile (g, true);
+  if (out)
+    SSA_NAME_DEF_STMT (out) = g;
+  return g;
+}
+
+/* A simple pass that attempts to fold all fenv internal functions.  */
+
+namespace {
+
+const pass_data pass_data_lower_fenv =
+{
+  GIMPLE_PASS, /* type */
+  "lfenv", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_NONE, /* tv_id */
+  PROP_ssa, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  TODO_update_ssa, /* todo_flags_finish */
+};
+
+class pass_lower_fenv : public gimple_opt_pass
+{
+public:
+  pass_lower_fenv (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_lower_fenv, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual unsigned int execute (function *);
+}; // class pass_lower_fenv
+
+unsigned int
+pass_lower_fenv::execute (function *fun)
+{
+  basic_block bb;
+  FOR_EACH_BB_FN (bb, fun)
+    {
+      gimple_stmt_iterator i;
+      for (i = gsi_start_bb (bb); !gsi_end_p (i); gsi_next (&i))
+	{
+	  gimple *stmt = gsi_stmt (i);
+	  if (gimple_code (stmt) != GIMPLE_CALL
+	      || !gimple_call_internal_p (stmt))
+	    continue;
+
+	  tree_code code;
+	  switch (gimple_call_internal_fn (stmt))
+	    {
+	    case IFN_FENV_PLUS:
+	      code = PLUS_EXPR;
+	      break;
+	    case IFN_FENV_MINUS:
+	      code = MINUS_EXPR;
+	      break;
+	    case IFN_FENV_MULT:
+	      code = MULT_EXPR;
+	      break;
+	    case IFN_FENV_DIV:
+	      code = RDIV_EXPR;
+	      break;
+	    default:
+	      continue;
+	    }
+
+	  tree op0 = gimple_call_arg (stmt, 0);
+	  tree op1 = gimple_call_arg (stmt, 1);
+	  tree ftype = TREE_TYPE (op0);
+	  tree newop0 = make_ssa_name (ftype);
+	  tree newop1 = make_ssa_name (ftype);
+	  gsi_insert_before (&i, asm_barrier (newop0, op0), GSI_SAME_STMT);
+	  gsi_insert_before (&i, asm_barrier (newop1, op1), GSI_SAME_STMT);
+
+	  tree lhs = gimple_call_lhs (stmt);
+	  tree newlhs = make_ssa_name (ftype);
+	  gimple *new_stmt = gimple_build_assign (newlhs, code, newop0, newop1);
+	  gsi_insert_before (&i, new_stmt, GSI_SAME_STMT);
+	  gsi_replace (&i, asm_barrier (lhs, newlhs), false);
+	  unlink_stmt_vdef (stmt);
+	  release_ssa_name (gimple_vdef (stmt));
+	}
+    }
+  return 0;
+}
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_lower_fenv (gcc::context *ctxt)
+{
+  return new pass_lower_fenv (ctxt);
+}
Index: gcc/internal-fn.c
===================================================================
--- gcc/internal-fn.c	(revision 272586)
+++ gcc/internal-fn.c	(working copy)
@@ -2869,20 +2869,46 @@ expand_DIVMOD (internal_fn, gcall *call_
 }
 
 /* Expand a NOP.  */
 
 static void
 expand_NOP (internal_fn, gcall *)
 {
   /* Nothing.  But it shouldn't really prevail.  */
 }
 
+/* This should get expanded in the wmul pass.  */
+
+static void
+expand_FENV_PLUS (internal_fn, gcall *)
+{
+  gcc_unreachable ();
+}
+
+static void
+expand_FENV_MINUS (internal_fn, gcall *)
+{
+  gcc_unreachable ();
+}
+
+static void
+expand_FENV_MULT (internal_fn, gcall *)
+{
+  gcc_unreachable ();
+}
+
+static void
+expand_FENV_DIV (internal_fn, gcall *)
+{
+  gcc_unreachable ();
+}
+
 /* Expand a call to FN using the operands in STMT.  FN has a single
    output operand and NARGS input operands.  */
 
 static void
 expand_direct_optab_fn (internal_fn fn, gcall *stmt, direct_optab optab,
			unsigned int nargs)
 {
   expand_operand *ops = XALLOCAVEC (expand_operand, nargs + 1);
 
   tree_pair types = direct_internal_fn_types (fn, stmt);
Index: gcc/internal-fn.def
===================================================================
--- gcc/internal-fn.def	(revision 272586)
+++ gcc/internal-fn.def	(working copy)
@@ -345,16 +345,22 @@ DEF_INTERNAL_FN (FALLTHROUGH, ECF_LEAF |
 /* To implement __builtin_launder.  */
 DEF_INTERNAL_FN (LAUNDER, ECF_LEAF | ECF_NOTHROW | ECF_NOVOPS, NULL)
 
 /* Divmod function.  */
 DEF_INTERNAL_FN (DIVMOD, ECF_CONST | ECF_LEAF, NULL)
 
 /* A NOP function with arbitrary arguments and return value.  */
 DEF_INTERNAL_FN (NOP, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
 
+/* float operations with rounding / exception flags.  */
+DEF_INTERNAL_FN (FENV_PLUS, ECF_LEAF | ECF_NOTHROW, NULL)
+DEF_INTERNAL_FN (FENV_MINUS, ECF_LEAF | ECF_NOTHROW, NULL)
+DEF_INTERNAL_FN (FENV_MULT, ECF_LEAF | ECF_NOTHROW, NULL)
+DEF_INTERNAL_FN (FENV_DIV, ECF_LEAF | ECF_NOTHROW, NULL)
+
 #undef DEF_INTERNAL_INT_FN
 #undef DEF_INTERNAL_FLT_FN
 #undef DEF_INTERNAL_FLT_FLOATN_FN
 #undef DEF_INTERNAL_SIGNED_OPTAB_FN
 #undef DEF_INTERNAL_OPTAB_FN
 #undef DEF_INTERNAL_FN
Index: gcc/passes.def
===================================================================
--- gcc/passes.def	(revision 272586)
+++ gcc/passes.def	(working copy)
@@ -377,20 +377,21 @@ along with GCC; see the file COPYING3.
       PUSH_INSERT_PASSES_WITHIN (pass_tm_init)
	  NEXT_PASS (pass_tm_mark);
	  NEXT_PASS (pass_tm_memopt);
	  NEXT_PASS (pass_tm_edges);
      POP_INSERT_PASSES ()
      NEXT_PASS (pass_simduid_cleanup);
      NEXT_PASS (pass_vtable_verify);
      NEXT_PASS (pass_lower_vaarg);
      NEXT_PASS (pass_lower_vector);
      NEXT_PASS (pass_lower_complex_O0);
+      NEXT_PASS (pass_lower_fenv);
      NEXT_PASS (pass_sancov_O0);
      NEXT_PASS (pass_lower_switch_O0);
      NEXT_PASS (pass_asan_O0);
      NEXT_PASS (pass_tsan_O0);
      NEXT_PASS (pass_sanopt);
      NEXT_PASS (pass_cleanup_eh);
      NEXT_PASS (pass_lower_resx);
      NEXT_PASS (pass_nrv);
      NEXT_PASS (pass_cleanup_cfg_post_optimizing);
      NEXT_PASS (pass_warn_function_noreturn);
Index: gcc/tree-pass.h
===================================================================
--- gcc/tree-pass.h	(revision 272586)
+++ gcc/tree-pass.h	(working copy)
@@ -617,20 +617,21 @@ extern rtl_opt_pass *make_pass_shorten_b
 extern rtl_opt_pass *make_pass_set_nothrow_function_flags (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_dwarf2_frame (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_final (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_rtl_seqabstr (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_release_ssa_names (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_early_inline (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_local_fn_summary (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_update_address_taken (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_convert_switch (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_lower_vaarg (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_lower_fenv (gcc::context *ctxt);
 
 /* Current optimization pass.  */
 extern opt_pass *current_pass;
 
 extern bool execute_one_pass (opt_pass *);
 extern void execute_pass_list (function *, opt_pass *);
 extern void execute_ipa_pass_list (opt_pass *);
 extern void execute_ipa_summary_passes (ipa_opt_pass_d *);
 extern void execute_all_ipa_transforms (void);
 extern void execute_all_ipa_stmt_fixups (struct cgraph_node *, gimple **);