Message ID | 1367258703-6930-1-git-send-email-aliguori@us.ibm.com |
---|---|
State | New |
Anthony Liguori <aliguori@us.ibm.com> writes:

> N.B. If you are on CC, see after the '---' for a requested action!
>
> The license of SoftFloat-2b is claimed to be GPLv2 incompatible by
> the FSF due to an indemnification clause. The previous release,
> SoftFloat-2a, did not contain this clause. The only changes between
> these two versions as far as QEMU is concerned are the license change
> and a global modification of the comment structure. This patch rebases
> our softfloat code to SoftFloat-2a in order to have a GPLv2 compatible
> license.
>
> Please note, this is a comment-only change. The resulting binary should
> be the same.
>
> I created this patch using the following strategy:
>
> 1) Create a branch using the original import of softfloat code:
>    $ git checkout 158142c2c2df728cfa3b5320c65534921a764f26
>
> 2) Remove carriage returns from Softfloat-2b
>
> 3) Compare each of the softfloat files against Softfloat-2b using the
>    following mapping to generate Fabrice's original softfloat changes:
>
>    - fpu/softfloat.c -> softfloat/bits64/softfloat.c
>    - fpu/softfloat.h -> softfloat/bits64/386-Win32-gcc/softfloat.h
>    - fpu/softfloat-macros.h -> softfloat/bits64/softfloat-macros
>    - fpu/softfloat-specialize.h -> softfloat/bits64/386-Win32-gcc/softfloat-specialize
>
> 4) Replace our softfloat files with the corresponding files from Softfloat-2a
>
> 5) Apply the diffs from (3) to (4) and commit
>
> 6) Create a diff between (5) and 158142c2c2df728cfa3b5320c65534921a764f26
>    - This diff consists 100% of licensing change + comment reformatting
>
> 7) Checkout the latest master branch, apply the diff from (6)
>    - There were a lot of comment rejects, confirmed this was only comments
>      and then used an emacs macro to rewrite the comments to the Softfloat-2a
>      form.
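The strategy in steps 2 to 6 can be tried out on throwaway files. The following is only a toy sketch, not the commands actually used for this patch: the file names and contents are hypothetical stand-ins for the SoftFloat-2a, SoftFloat-2b and in-tree sources, and it assumes `diff`, `patch` and `mktemp` are available.

```shell
#!/bin/sh
# Toy walk-through of steps 2-6. All file names and contents are hypothetical.
set -eu
work=$(mktemp -d)
cd "$work"

# Stand-in for SoftFloat-2a: the older license text.
cat > upstream-2a.c <<'EOF'
/* license 2a */
int a;
int b;
int c;
int d;
int f(void) { return 1; }
int w;
int x;
int y;
int z;
EOF

# Stand-in for SoftFloat-2b: same code, new license text, CRLF line endings.
sed 's/license 2a/license 2b/' upstream-2a.c |
    awk '{ printf "%s\r\n", $0 }' > upstream-2b.c

# Stand-in for the in-tree file: 2b with LF endings plus one local code change.
sed -e 's/license 2a/license 2b/' -e 's/return 1/return 2/' \
    upstream-2a.c > ours.c

# Step 2: remove carriage returns from the 2b copy.
tr -d '\r' < upstream-2b.c > upstream-2b-unix.c

# Step 3: extract the local changes relative to 2b.
# diff exits with status 1 when the files differ, so mask that under `set -e`.
diff -u upstream-2b-unix.c ours.c > local.diff || true

# Steps 4-5: start from the 2a file and replay the local changes on it.
cp upstream-2a.c rebased.c
patch -s rebased.c < local.diff

# Step 6: compare the rebased file with the original in-tree file.
diff ours.c rebased.c > relicense.diff || true
cat relicense.diff
```

Because the local change is extracted against the 2b stand-in and replayed onto the 2a stand-in, the only difference left between `ours.c` and `rebased.c` should be the license line, which mirrors the claim in step 6 that the final diff is licensing and comment changes only.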
>
> Cc: Andreas Färber <afaerber@suse.de>
> Cc: Aurelien Jarno <aurelien@aurel32.net>
> Cc: Avi Kivity <avi.kivity@gmail.com>
> Cc: Ben Taylor <bentaylor.solx86@gmail.com>
> Cc: Blue Swirl <blauwirbel@gmail.com>
> Cc: Christophe Lyon <christophe.lyon@st.com>
> Cc: Fabrice Bellard <fabrice@bellard.org>
> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
> Cc: Jocelyn Mayer <l_indien@magic.fr>
> Cc: Juan Quintela <quintela@redhat.com>
> Cc: malc <av1474@comtv.ru>
> Cc: Max Filippov <jcmvbkbc@gmail.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Paul Brook <paul@codesourcery.com>
> Cc: Peter Maydell <peter.maydell@linaro.org>
> Cc: Richard Henderson <rth@twiddle.net>
> Cc: Richard Sandiford <rdsandiford@googlemail.com>
> Cc: Stefan Weil <weil@mail.berlios.de>
> Cc: Thiemo Seufer <ths@networkno.de>
> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
> ---
> In order to make this change, we need to relicense all contributions
> from initial import of the SoftFloat code to match the license of
> SoftFloat-2a (instead of the implied SoftFloat-2b license).
>
> If you are on CC, it is because you have contributed to the softfloat
> code in QEMU. Please respond to this note with:
>
> Acked-by: Your Name <your@email.com>
>
> To signify that you are able and willing to relicense your changes
> to the SoftFloat-1a license (or a GPL compatible license).

s/SoftFloat-1a/SoftFloat-2a/g.

Sorry about that. Thanks to Peter for spotting the typo.

Regards,

Anthony Liguori

>
> Please respond no later than May 6th, 2013. If we are unable to confirm
> relicense from an author, changes from that author will be reverted.
> ---
> For completeness, here is the full listing of contributions:
>
> Andreas Färber <afaerber@suse.de>
>     be45f06 Silence softfloat warnings on OpenSolaris
>     5aea4c5 softfloat: Replace uint16 type with uint_fast16_t
>     94a49d8 softfloat: Replace int16 type with int_fast16_t
>     c969654 softfloat: Fix mixups of int and int16
>     38641f8 softfloat: Use uint16 consistently
>     87b8cc3 softfloat: Resolve type mismatches between declaration and implementation
>     8d725fa softfloat: Prepend QEMU-style header with derivation notice
>     9f8d2a0 softfloat: Use uint32 consistently
>     bb98fe4 softfloat: Drop [s]bits{8, 16, 32, 64} types in favor of [u]int{8, 16, 32, 64}_t
>
> Aurelien Jarno <aurelien@aurel32.net>
>     1020160 softfloat: fix default-NaN mode
>     084d19b target-mips: Implement correct NaN propagation rules
>     196cfc8 softfloat: add a 1.0 constant for float32 and float64
>     1b2ad2e softfloat-native: fix *nan()
>     1f398e0 softfloat: use float{32,64,x80,128}_maybe_silence_nan()
>     211315f softfloat: rename float*_eq() into float*_eq_quiet()
>     2657d0f softfloat: rename float*_eq_signaling() into float*_eq()
>     30e7a22 Use float_relation_* constants
>     326b9e9 softfloat: fix float*_scalnb() corner cases
>     34d2386 softfloat: remove HPPA specific code
>     374dfc3 soft-float: add float32_log2() and float64_log2()
>     4cc5383 softfloat-native: add float*_is_any_nan() functions
>     587eabf softfloat: add float*_is_zero_or_denormal()
>     629bd74 softfloat-native: add float32_is_nan()
>     67b7861 softfloat: add float*_unordered_{,quiet}() functions
>     8229c99 softfloat: add float32_exp2()
>     85016c9 Assortment of soft-float fixes, by Aurelien Jarno.
>     8d6c92b softfloat-native: improve correctness of floatXX_is_neg()
>     93ae1c6 softfloat: fix float{32,64}_maybe_silence_nan() for MIPS
>     a167ba5 Add support for GNU/kFreeBSD
>     b3b4c7f softfloat: use GCC builtins to count the leading zeros
>     b4a0ef7 softfloat-native: add float*_unordered_quiet() functions
>     b689362 softfloat: move float*_eq and float*_eq_quiet
>     b76235e softfloat: fix floatx80_is_infinity()
>     bbc1ded softfloat: implement fused multiply-add NaN propagation for MIPS
>     be22a9a softfloat: always enable floatx80 and float128 support
>     c4b4c77 softfloat: add pi constants
>     c52ab6f fp: add floatXX_is_infinity(), floatXX_is_neg(), floatXX_is_zero()
>     cf67c6b softfloat-native: remove
>     d2b1027 softfloat-native: add a few constant values
>     d6882cf softfloat-native: fix float*_scalbn() functions
>     d735d69 softfloat: rename *IsNaN variables to *IsQuietNaN
>     dadd71a fp: fix float32_is_infinity()
>     de4af5f softfloat: fix floatx80_is_{quiet,signaling}_nan()
>     e024e88 target-ppc: Implement correct NaN propagation rules
>     e2f4220 softfloat: fix floatx80 handling of NaN
>     e872aa8 softfloat-native: fix type of float_rounding_mode
>     e908775 softfloat: SH4 has the sNaN bit set
>     f3218a8 softfloat: add floatx80 constants
>     f5a6425 softfloat: improve description of comparison functions
>     f6714d3 softfloat: add floatx80_compare*() functions
>     f6a7d92 softfloat: add float{x80,128}_maybe_silence_nan()
>
> Avi Kivity <avi.kivity@gmail.com>
>     3bf7e40 softfloat: fix for C99
>
> Ben Taylor <bentaylor.solx86@gmail.com>
>     0475a5c Solaris 9/x86 support, by Ben Taylor.
>     c94655b Updated Solaris isinf support, by Juergen Keil and Ben Taylor.
>
> Blue Swirl <blauwirbel@gmail.com>
>     128ab2f Preliminary OpenBSD host support (based on OpenBSD patches by Todd T. Fries)
>     14d483e Fix OpenSolaris softfloat warnings
>     179a2c1 Rename _BSD to HOST_BSD so that it's more obvious that it's defined by configure
>     1d6198c Remove unnecessary trailing newlines
>     1f58732 128-bit float support for user mode
>     2734c70 Rename one more _BSD to HOST_BSD (spotted by Hasso Tepper)
>     3f4cb3d Fix OpenSolaris gcc4 warnings: iovec type mismatches, missing 'static'
>     70c1470 Sparse fixes: dubious mixing of bitwise and logical operations
>     7c2a9d0 Fix math warnings on OpenBSD -current
>     b1d8e52 Fix undeclared symbol warnings from sparse
>     b55266b Suppress gcc 4.x -Wpointer-sign (included in -Wall) warnings
>     cd8a253 Fix more typos in softloat code (Eduardo Felipe)
>     d07cca0 Add native softfloat fpu functions (Christoph Egger)
>     ed086f3 softfloat: remove dead assignments, spotted by clang
>
> Christophe Lyon <christophe.lyon@st.com>
>     8559666 softfloat: move all default NaN definitions to softfloat.h.
>     bcd4d9a softfloat: Honour default_nan_mode for float-to-float conversions
>     c30fe7d softfloat: add _set_sign(), _infinity and _half for 32 and 64 bits floats.
>
> Fabrice Bellard <fabrice@bellard.org>
>     158142c soft float support
>     1b2b0af 64 bit fix
>     1d6bda3 added abs, chs and compare functions
>     38cfa06 Solaris port (Ben Taylor)
>     750afe9 avoid using char when it is not necessary
>     b109f9f more native FPU comparison functions - native FPU remainder
>     ec530c8 Solaris port (Ben Taylor)
>     fdbb469 Solaris/SPARC host port (Ben Taylor)
>
> Guan Xuetao <gxt@mprc.pku.edu.cn>
>     d2fbca9 unicore32: necessary modifications for other files to support unicore32
>
> Jocelyn Mayer <l_indien@magic.fr>
>     3430b0b Ooops... Typo.
>     75d62a5 Add missing softfloat helpers.
>
> Juan Quintela <quintela@redhat.com>
>     0eb4fc8 softfloat: make USE_SOFTFLOAT_STRUCT_TYPES compile
>     71e72a1 rename HOST_BSD to CONFIG_BSD
>     75b5a69 rename NEEDS_LIBSUNMATH to CONFIG_NEEDS_LIBSUNMATH
>     dfe5fff change HOST_SOLARIS to CONFIG_SOLARIS{_VERSION}
>     e2542fe rename WORDS_BIGENDIAN to HOST_WORDS_BIGENDIAN
>
> malc <av1474@comtv.ru>
>     947f5fc Add static qualifier to local functions
>     e58ffeb Remove all traces of __powerpc__
>
> Max Filippov <jcmvbkbc@gmail.com>
>     6617680 softfloat: make float_muladd_negate_* flags independent
>     213ff4e softfloat: add NO_SIGNALING_NANS
>     b81fe82 target-xtensa: specialize softfloat NaN rules
>
> Paolo Bonzini <pbonzini@redhat.com>
>     1de7afc misc: move include files to include/qemu/
>     6b4c305 fpu: move public header file to include/fpu
>     789ec7c softfloat: change default nan definitions to variables
>
> Paul Brook <paul@codesourcery.com>
>     6001149 ARM FP16 support
>     6939754 Correctly normalize values and handle zero inputs to scalbn functions.
>     3598ecb Remove missing include.
>     5c7908e Implement default-NaN mode.
>     7918bf4 Fix typo in BSD FP rounding mode names.
>     9027db8 Fix ARM default NaN.
>     9ee6e8b ARMv7 support.
>     a1b91bb Fix typo in softfloat code.
>     e6e5906 ColdFire target.
>     f090c9d Add strict checking mode for softfp code.
>     fe76d97 Implement flush-to-zero mode (denormal results are replaced with zero).
>
> Peter Maydell <peter.maydell@linaro.org>
>     1856987 softfloat: Rename float*_is_nan() functions to float*_is_quiet_nan()
>     760e141 softfloat: roundAndPackInt{32, 64}: Don't assume int32 is 32 bits
>     011da61 target-arm: Implement correct NaN propagation rules
>     21d6ebd softfloat: Add float*_is_any_nan() functions
>     274f1b0 softfloat: Add float*_min() and float*_max() functions
>     2ac8bd0 softfloat: Reinstate accidentally disabled target-specific NaN handling
>     2bed652 softfloat: Implement floatx80_is_any_nan() and float128_is_any_nan()
>     354f211 softfloat: abstract out target-specific NaN propagation rules
>     369be8f softfloat: Implement fused multiply-add
>     37d1866 softfloat: Implement flushing input denormals to zero
>     4be8eea fpu/softfloat.c: Remove pointless shift of always-zero value
>     600e30d softfloat: Fix single-to-half precision float conversions
>     6f3300a softfloat: Add float32_is_zero_or_denormal() function
>     b3a6a2e softfloat: float*_to_int32_round_to_zero: don't assume int32 is 32 bits
>     b408dbd softfloat: Add float*_maybe_silence_nan() functions
>     bb4d4bb softfloat: Add float16 type and float16 NaN handling functions
>     c29aca4 softfloat: Add setter function for tininess detection mode
>     cbcef45 softfloat: Add float/double to 16 bit integer conversion functions
>     d5138cf softfloat: Fix compilation failures with USE_SOFTFLOAT_STRUCT_TYPES
>     e3d142d fpu: Correct edgecase in float64_muladd
>     e6afc87 softfloat: Add new flag for when denormal result is flushed to zero
>     e744c06 fpu/softfloat.c: Return correctly signed values from uint64_to_float32
>     f591e1b softfloat: Correctly handle NaNs in float16_to_float32()
>
> Richard Henderson <rth@twiddle.net>
>     17ed229 softfloat: Fix uint64_to_float64
>     1e397ea softfloat: Implement uint64_to_float128
>     8443eff target-alpha: Split up FPCR value into separate fields.
>     990b3e1 target-alpha: Enable softfloat.
>     ba0e276 target-alpha: Fixes for alpha-linux syscalls.
>
> Richard Sandiford <rdsandiford@googlemail.com>
>     a6e7c18 softfloat: Handle float_muladd_negate_c when product is zero
>
> Stefan Weil <weil@mail.berlios.de>
>     bc4347b arm host: fix compiler warning
>
> Thiemo Seufer <ths@networkno.de>
>     5a6932d Fix NaN handling for MIPS and HPPA.
>     5fafdf2 find -type f | xargs sed -i 's/[\t ]$//g' # on most files
>     63a654b trunc() for Solaris 9 / SPARC, by Juergen Keil.
>     924b2c0 Add proper float*_is_nan prototypes.
>     b645bb4 Fix softfloat NaN handling.
>     fc81ba5 Check that HOST_SOLARIS is defined before relying on its value. Spotted by Joachim Henke.
> ---
>  fpu/softfloat-macros.h     |  430 ++++----
>  fpu/softfloat-specialize.h |  494 +++++----
>  fpu/softfloat.c            | 2436 ++++++++++++++++++++++++--------------------
>  include/fpu/softfloat.h    |  242 +++--
>  4 files changed, 1981 insertions(+), 1621 deletions(-)
>
> diff --git a/fpu/softfloat-macros.h b/fpu/softfloat-macros.h
> index b5164af..2009315 100644
> --- a/fpu/softfloat-macros.h
> +++ b/fpu/softfloat-macros.h
> @@ -4,10 +4,11 @@
> * Derived from SoftFloat.
> */
>
> -/*============================================================================
> +/*
> +===============================================================================
>
> This C source fragment is part of the SoftFloat IEC/IEEE Floating-point
> -Arithmetic Package, Release 2b.
> +Arithmetic Package, Release 2a.
>
> Written by John R. Hauser. This work was made possible in part by the
> International Computer Science Institute, located at Suite 600, 1947 Center
> @@ -16,28 +17,27 @@ National Science Foundation under grant MIP-9311980. The original version
> of this code was written as part of a project to build a fixed-point vector
> processor in collaboration with the University of California at Berkeley,
> overseen by Profs. Nelson Morgan and John Wawrzynek. More information
> -is available through the Web page `http://www.cs.berkeley.edu/~jhauser/
> +is available through the Web page `http://HTTP.CS.Berkeley.EDU/~jhauser/
> arithmetic/SoftFloat.html'.
>
> -THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort has
> -been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT TIMES
> -RESULT IN INCORRECT BEHAVIOR. USE OF THIS SOFTWARE IS RESTRICTED TO PERSONS
> -AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ALL LOSSES,
> -COSTS, OR OTHER PROBLEMS THEY INCUR DUE TO THE SOFTWARE, AND WHO FURTHERMORE
> -EFFECTIVELY INDEMNIFY JOHN HAUSER AND THE INTERNATIONAL COMPUTER SCIENCE
> -INSTITUTE (possibly via similar legal notice) AGAINST ALL LOSSES, COSTS, OR
> -OTHER PROBLEMS INCURRED BY THEIR CUSTOMERS AND CLIENTS DUE TO THE SOFTWARE.
> +THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort
> +has been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT
> +TIMES RESULT IN INCORRECT BEHAVIOR. USE OF THIS SOFTWARE IS RESTRICTED TO
> +PERSONS AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ANY
> +AND ALL LOSSES, COSTS, OR OTHER PROBLEMS ARISING FROM ITS USE.
>
> Derivative works are acceptable, even for commercial purposes, so long as
> -(1) the source code for the derivative work includes prominent notice that
> -the work is derivative, and (2) the source code includes prominent notice with
> -these four paragraphs for those parts of this code that are retained.
> +(1) they include prominent notice that the work is derivative, and (2) they
> +include prominent notice akin to these four paragraphs for those parts of
> +this code that are retained.
>
> =============================================================================*/
>
> -/*----------------------------------------------------------------------------
> -| This macro tests for minimum version of the GNU C compiler.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +This macro tests for minimum version of the GNU C compiler.
> +-------------------------------------------------------------------------------
> +*/
> #if defined(__GNUC__) && defined(__GNUC_MINOR__)
> # define SOFTFLOAT_GNUC_PREREQ(maj, min) \
> ((__GNUC__ << 16) + __GNUC_MINOR__ >= ((maj) << 16) + (min))
> @@ -46,14 +46,16 @@ these four paragraphs for those parts of this code that are retained.
> #endif
>
>
> -/*----------------------------------------------------------------------------
> -| Shifts `a' right by the number of bits given in `count'. If any nonzero
> -| bits are shifted off, they are ``jammed'' into the least significant bit of
> -| the result by setting the least significant bit to 1. The value of `count'
> -| can be arbitrarily large; in particular, if `count' is greater than 32, the
> -| result will be either 0 or 1, depending on whether `a' is zero or nonzero.
> -| The result is stored in the location pointed to by `zPtr'.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Shifts `a' right by the number of bits given in `count'. If any nonzero
> +bits are shifted off, they are ``jammed'' into the least significant bit of
> +the result by setting the least significant bit to 1. The value of `count'
> +can be arbitrarily large; in particular, if `count' is greater than 32, the
> +result will be either 0 or 1, depending on whether `a' is zero or nonzero.
> +The result is stored in the location pointed to by `zPtr'.
> +-------------------------------------------------------------------------------
> +*/
>
> INLINE void shift32RightJamming(uint32_t a, int_fast16_t count, uint32_t *zPtr)
> {
> @@ -72,14 +74,16 @@ INLINE void shift32RightJamming(uint32_t a, int_fast16_t count, uint32_t *zPtr)
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Shifts `a' right by the number of bits given in `count'. If any nonzero
> -| bits are shifted off, they are ``jammed'' into the least significant bit of
> -| the result by setting the least significant bit to 1. The value of `count'
> -| can be arbitrarily large; in particular, if `count' is greater than 64, the
> -| result will be either 0 or 1, depending on whether `a' is zero or nonzero.
> -| The result is stored in the location pointed to by `zPtr'.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Shifts `a' right by the number of bits given in `count'. If any nonzero
> +bits are shifted off, they are ``jammed'' into the least significant bit of
> +the result by setting the least significant bit to 1. The value of `count'
> +can be arbitrarily large; in particular, if `count' is greater than 64, the
> +result will be either 0 or 1, depending on whether `a' is zero or nonzero.
> +The result is stored in the location pointed to by `zPtr'.
> +-------------------------------------------------------------------------------
> +*/
>
> INLINE void shift64RightJamming(uint64_t a, int_fast16_t count, uint64_t *zPtr)
> {
> @@ -98,23 +102,24 @@ INLINE void shift64RightJamming(uint64_t a, int_fast16_t count, uint64_t *zPtr)
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Shifts the 128-bit value formed by concatenating `a0' and `a1' right by 64
> -| _plus_ the number of bits given in `count'. The shifted result is at most
> -| 64 nonzero bits; this is stored at the location pointed to by `z0Ptr'. The
> -| bits shifted off form a second 64-bit result as follows: The _last_ bit
> -| shifted off is the most-significant bit of the extra result, and the other
> -| 63 bits of the extra result are all zero if and only if _all_but_the_last_
> -| bits shifted off were all zero. This extra result is stored in the location
> -| pointed to by `z1Ptr'. The value of `count' can be arbitrarily large.
> -| (This routine makes more sense if `a0' and `a1' are considered to form
> -| a fixed-point value with binary point between `a0' and `a1'. This fixed-
> -| point value is shifted right by the number of bits given in `count', and
> -| the integer part of the result is returned at the location pointed to by
> -| `z0Ptr'. The fractional part of the result may be slightly corrupted as
> -| described above, and is returned at the location pointed to by `z1Ptr'.)
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Shifts the 128-bit value formed by concatenating `a0' and `a1' right by 64
> +_plus_ the number of bits given in `count'. The shifted result is at most
> +64 nonzero bits; this is stored at the location pointed to by `z0Ptr'. The
> +bits shifted off form a second 64-bit result as follows: The _last_ bit
> +shifted off is the most-significant bit of the extra result, and the other
> +63 bits of the extra result are all zero if and only if _all_but_the_last_
> +bits shifted off were all zero. This extra result is stored in the location
> +pointed to by `z1Ptr'. The value of `count' can be arbitrarily large.
> + (This routine makes more sense if `a0' and `a1' are considered to form a
> +fixed-point value with binary point between `a0' and `a1'. This fixed-point
> +value is shifted right by the number of bits given in `count', and the
> +integer part of the result is returned at the location pointed to by
> +`z0Ptr'. The fractional part of the result may be slightly corrupted as
> +described above, and is returned at the location pointed to by `z1Ptr'.)
> +-------------------------------------------------------------------------------
> +*/
> INLINE void
> shift64ExtraRightJamming(
> uint64_t a0, uint64_t a1, int_fast16_t count, uint64_t *z0Ptr, uint64_t *z1Ptr)
> @@ -144,14 +149,15 @@ INLINE void
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Shifts the 128-bit value formed by concatenating `a0' and `a1' right by the
> -| number of bits given in `count'. Any bits shifted off are lost. The value
> -| of `count' can be arbitrarily large; in particular, if `count' is greater
> -| than 128, the result will be 0. The result is broken into two 64-bit pieces
> -| which are stored at the locations pointed to by `z0Ptr' and `z1Ptr'.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Shifts the 128-bit value formed by concatenating `a0' and `a1' right by the
> +number of bits given in `count'. Any bits shifted off are lost. The value
> +of `count' can be arbitrarily large; in particular, if `count' is greater
> +than 128, the result will be 0. The result is broken into two 64-bit pieces
> +which are stored at the locations pointed to by `z0Ptr' and `z1Ptr'.
> +-------------------------------------------------------------------------------
> +*/
> INLINE void
> shift128Right(
> uint64_t a0, uint64_t a1, int_fast16_t count, uint64_t *z0Ptr, uint64_t *z1Ptr)
> @@ -176,17 +182,18 @@ INLINE void
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Shifts the 128-bit value formed by concatenating `a0' and `a1' right by the
> -| number of bits given in `count'. If any nonzero bits are shifted off, they
> -| are ``jammed'' into the least significant bit of the result by setting the
> -| least significant bit to 1. The value of `count' can be arbitrarily large;
> -| in particular, if `count' is greater than 128, the result will be either
> -| 0 or 1, depending on whether the concatenation of `a0' and `a1' is zero or
> -| nonzero. The result is broken into two 64-bit pieces which are stored at
> -| the locations pointed to by `z0Ptr' and `z1Ptr'.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Shifts the 128-bit value formed by concatenating `a0' and `a1' right by the
> +number of bits given in `count'. If any nonzero bits are shifted off, they
> +are ``jammed'' into the least significant bit of the result by setting the
> +least significant bit to 1. The value of `count' can be arbitrarily large;
> +in particular, if `count' is greater than 128, the result will be either
> +0 or 1, depending on whether the concatenation of `a0' and `a1' is zero or
> +nonzero. The result is broken into two 64-bit pieces which are stored at
> +the locations pointed to by `z0Ptr' and `z1Ptr'.
> +-------------------------------------------------------------------------------
> +*/
> INLINE void
> shift128RightJamming(
> uint64_t a0, uint64_t a1, int_fast16_t count, uint64_t *z0Ptr, uint64_t *z1Ptr)
> @@ -219,25 +226,26 @@ INLINE void
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Shifts the 192-bit value formed by concatenating `a0', `a1', and `a2' right
> -| by 64 _plus_ the number of bits given in `count'. The shifted result is
> -| at most 128 nonzero bits; these are broken into two 64-bit pieces which are
> -| stored at the locations pointed to by `z0Ptr' and `z1Ptr'. The bits shifted
> -| off form a third 64-bit result as follows: The _last_ bit shifted off is
> -| the most-significant bit of the extra result, and the other 63 bits of the
> -| extra result are all zero if and only if _all_but_the_last_ bits shifted off
> -| were all zero. This extra result is stored in the location pointed to by
> -| `z2Ptr'. The value of `count' can be arbitrarily large.
> -| (This routine makes more sense if `a0', `a1', and `a2' are considered
> -| to form a fixed-point value with binary point between `a1' and `a2'. This
> -| fixed-point value is shifted right by the number of bits given in `count',
> -| and the integer part of the result is returned at the locations pointed to
> -| by `z0Ptr' and `z1Ptr'. The fractional part of the result may be slightly
> -| corrupted as described above, and is returned at the location pointed to by
> -| `z2Ptr'.)
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Shifts the 192-bit value formed by concatenating `a0', `a1', and `a2' right
> +by 64 _plus_ the number of bits given in `count'. The shifted result is
> +at most 128 nonzero bits; these are broken into two 64-bit pieces which are
> +stored at the locations pointed to by `z0Ptr' and `z1Ptr'. The bits shifted
> +off form a third 64-bit result as follows: The _last_ bit shifted off is
> +the most-significant bit of the extra result, and the other 63 bits of the
> +extra result are all zero if and only if _all_but_the_last_ bits shifted off
> +were all zero. This extra result is stored in the location pointed to by
> +`z2Ptr'. The value of `count' can be arbitrarily large.
> + (This routine makes more sense if `a0', `a1', and `a2' are considered
> +to form a fixed-point value with binary point between `a1' and `a2'. This
> +fixed-point value is shifted right by the number of bits given in `count',
> +and the integer part of the result is returned at the locations pointed to
> +by `z0Ptr' and `z1Ptr'. The fractional part of the result may be slightly
> +corrupted as described above, and is returned at the location pointed to by
> +`z2Ptr'.)
> +-------------------------------------------------------------------------------
> +*/
> INLINE void
> shift128ExtraRightJamming(
> uint64_t a0,
> @@ -289,13 +297,14 @@ INLINE void
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Shifts the 128-bit value formed by concatenating `a0' and `a1' left by the
> -| number of bits given in `count'. Any bits shifted off are lost. The value
> -| of `count' must be less than 64. The result is broken into two 64-bit
> -| pieces which are stored at the locations pointed to by `z0Ptr' and `z1Ptr'.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Shifts the 128-bit value formed by concatenating `a0' and `a1' left by the
> +number of bits given in `count'. Any bits shifted off are lost. The value
> +of `count' must be less than 64. The result is broken into two 64-bit
> +pieces which are stored at the locations pointed to by `z0Ptr' and `z1Ptr'.
> +-------------------------------------------------------------------------------
> +*/
> INLINE void
> shortShift128Left(
> uint64_t a0, uint64_t a1, int_fast16_t count, uint64_t *z0Ptr, uint64_t *z1Ptr)
> @@ -307,14 +316,15 @@ INLINE void
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Shifts the 192-bit value formed by concatenating `a0', `a1', and `a2' left
> -| by the number of bits given in `count'. Any bits shifted off are lost.
> -| The value of `count' must be less than 64. The result is broken into three
> -| 64-bit pieces which are stored at the locations pointed to by `z0Ptr',
> -| `z1Ptr', and `z2Ptr'.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Shifts the 192-bit value formed by concatenating `a0', `a1', and `a2' left
> +by the number of bits given in `count'. Any bits shifted off are lost.
> +The value of `count' must be less than 64. The result is broken into three
> +64-bit pieces which are stored at the locations pointed to by `z0Ptr',
> +`z1Ptr', and `z2Ptr'.
> +-------------------------------------------------------------------------------
> +*/
> INLINE void
> shortShift192Left(
> uint64_t a0,
> @@ -343,13 +353,14 @@ INLINE void
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Adds the 128-bit value formed by concatenating `a0' and `a1' to the 128-bit
> -| value formed by concatenating `b0' and `b1'. Addition is modulo 2^128, so
> -| any carry out is lost. The result is broken into two 64-bit pieces which
> -| are stored at the locations pointed to by `z0Ptr' and `z1Ptr'.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Adds the 128-bit value formed by concatenating `a0' and `a1' to the 128-bit
> +value formed by concatenating `b0' and `b1'. Addition is modulo 2^128, so
> +any carry out is lost. The result is broken into two 64-bit pieces which
> +are stored at the locations pointed to by `z0Ptr' and `z1Ptr'.
> +-------------------------------------------------------------------------------
> +*/
> INLINE void
> add128(
> uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1, uint64_t *z0Ptr, uint64_t *z1Ptr )
> @@ -362,14 +373,15 @@ INLINE void
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Adds the 192-bit value formed by concatenating `a0', `a1', and `a2' to the
> -| 192-bit value formed by concatenating `b0', `b1', and `b2'. Addition is
> -| modulo 2^192, so any carry out is lost. The result is broken into three
> -| 64-bit pieces which are stored at the locations pointed to by `z0Ptr',
> -| `z1Ptr', and `z2Ptr'.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Adds the 192-bit value formed by concatenating `a0', `a1', and `a2' to the
> +192-bit value formed by concatenating `b0', `b1', and `b2'. Addition is
> +modulo 2^192, so any carry out is lost. The result is broken into three
> +64-bit pieces which are stored at the locations pointed to by `z0Ptr',
> +`z1Ptr', and `z2Ptr'.
> +-------------------------------------------------------------------------------
> +*/
> INLINE void
> add192(
> uint64_t a0,
> @@ -400,14 +412,15 @@ INLINE void
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Subtracts the 128-bit value formed by concatenating `b0' and `b1' from the
> -| 128-bit value formed by concatenating `a0' and `a1'. Subtraction is modulo
> -| 2^128, so any borrow out (carry out) is lost. The result is broken into two
> -| 64-bit pieces which are stored at the locations pointed to by `z0Ptr' and
> -| `z1Ptr'.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Subtracts the 128-bit value formed by concatenating `b0' and `b1' from the
> +128-bit value formed by concatenating `a0' and `a1'. Subtraction is modulo
> +2^128, so any borrow out (carry out) is lost. The result is broken into two
> +64-bit pieces which are stored at the locations pointed to by `z0Ptr' and
> +`z1Ptr'.
> +-------------------------------------------------------------------------------
> +*/
> INLINE void
> sub128(
> uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1, uint64_t *z0Ptr, uint64_t *z1Ptr )
> @@ -418,14 +431,15 @@ INLINE void
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Subtracts the 192-bit value formed by concatenating `b0', `b1', and `b2'
> -| from the 192-bit value formed by concatenating `a0', `a1', and `a2'.
> -| Subtraction is modulo 2^192, so any borrow out (carry out) is lost. The
> -| result is broken into three 64-bit pieces which are stored at the locations
> -| pointed to by `z0Ptr', `z1Ptr', and `z2Ptr'.
> -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Subtracts the 192-bit value formed by concatenating `b0', `b1', and `b2' > +from the 192-bit value formed by concatenating `a0', `a1', and `a2'. > +Subtraction is modulo 2^192, so any borrow out (carry out) is lost. The > +result is broken into three 64-bit pieces which are stored at the locations > +pointed to by `z0Ptr', `z1Ptr', and `z2Ptr'. > +------------------------------------------------------------------------------- > +*/ > INLINE void > sub192( > uint64_t a0, > @@ -456,11 +470,13 @@ INLINE void > > } > > -/*---------------------------------------------------------------------------- > -| Multiplies `a' by `b' to obtain a 128-bit product. The product is broken > -| into two 64-bit pieces which are stored at the locations pointed to by > -| `z0Ptr' and `z1Ptr'. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Multiplies `a' by `b' to obtain a 128-bit product. The product is broken > +into two 64-bit pieces which are stored at the locations pointed to by > +`z0Ptr' and `z1Ptr'. > +------------------------------------------------------------------------------- > +*/ > > INLINE void mul64To128( uint64_t a, uint64_t b, uint64_t *z0Ptr, uint64_t *z1Ptr ) > { > @@ -485,13 +501,14 @@ INLINE void mul64To128( uint64_t a, uint64_t b, uint64_t *z0Ptr, uint64_t *z1Ptr > > } > > -/*---------------------------------------------------------------------------- > -| Multiplies the 128-bit value formed by concatenating `a0' and `a1' by > -| `b' to obtain a 192-bit product. The product is broken into three 64-bit > -| pieces which are stored at the locations pointed to by `z0Ptr', `z1Ptr', and > -| `z2Ptr'. 
> -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Multiplies the 128-bit value formed by concatenating `a0' and `a1' by > +`b' to obtain a 192-bit product. The product is broken into three 64-bit > +pieces which are stored at the locations pointed to by `z0Ptr', `z1Ptr', and > +`z2Ptr'. > +------------------------------------------------------------------------------- > +*/ > INLINE void > mul128By64To192( > uint64_t a0, > @@ -513,13 +530,14 @@ INLINE void > > } > > -/*---------------------------------------------------------------------------- > -| Multiplies the 128-bit value formed by concatenating `a0' and `a1' to the > -| 128-bit value formed by concatenating `b0' and `b1' to obtain a 256-bit > -| product. The product is broken into four 64-bit pieces which are stored at > -| the locations pointed to by `z0Ptr', `z1Ptr', `z2Ptr', and `z3Ptr'. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Multiplies the 128-bit value formed by concatenating `a0' and `a1' to the > +128-bit value formed by concatenating `b0' and `b1' to obtain a 256-bit > +product. The product is broken into four 64-bit pieces which are stored at > +the locations pointed to by `z0Ptr', `z1Ptr', `z2Ptr', and `z3Ptr'. > +------------------------------------------------------------------------------- > +*/ > INLINE void > mul128To256( > uint64_t a0, > @@ -550,14 +568,16 @@ INLINE void > > } > > -/*---------------------------------------------------------------------------- > -| Returns an approximation to the 64-bit integer quotient obtained by dividing > -| `b' into the 128-bit value formed by concatenating `a0' and `a1'. The > -| divisor `b' must be at least 2^63. 
If q is the exact quotient truncated > -| toward zero, the approximation returned lies between q and q + 2 inclusive. > -| If the exact quotient q is larger than 64 bits, the maximum positive 64-bit > -| unsigned integer is returned. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns an approximation to the 64-bit integer quotient obtained by dividing > +`b' into the 128-bit value formed by concatenating `a0' and `a1'. The > +divisor `b' must be at least 2^63. If q is the exact quotient truncated > +toward zero, the approximation returned lies between q and q + 2 inclusive. > +If the exact quotient q is larger than 64 bits, the maximum positive 64-bit > +unsigned integer is returned. > +------------------------------------------------------------------------------- > +*/ > > static uint64_t estimateDiv128To64( uint64_t a0, uint64_t a1, uint64_t b ) > { > @@ -581,15 +601,17 @@ static uint64_t estimateDiv128To64( uint64_t a0, uint64_t a1, uint64_t b ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns an approximation to the square root of the 32-bit significand given > -| by `a'. Considered as an integer, `a' must be at least 2^31. If bit 0 of > -| `aExp' (the least significant bit) is 1, the integer returned approximates > -| 2^31*sqrt(`a'/2^31), where `a' is considered an integer. If bit 0 of `aExp' > -| is 0, the integer returned approximates 2^31*sqrt(`a'/2^30). In either > -| case, the approximation returned lies strictly within +/-2 of the exact > -| value. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns an approximation to the square root of the 32-bit significand given > +by `a'. Considered as an integer, `a' must be at least 2^31. 
If bit 0 of > +`aExp' (the least significant bit) is 1, the integer returned approximates > +2^31*sqrt(`a'/2^31), where `a' is considered an integer. If bit 0 of `aExp' > +is 0, the integer returned approximates 2^31*sqrt(`a'/2^30). In either > +case, the approximation returned lies strictly within +/-2 of the exact > +value. > +------------------------------------------------------------------------------- > +*/ > > static uint32_t estimateSqrt32(int_fast16_t aExp, uint32_t a) > { > @@ -620,10 +642,12 @@ static uint32_t estimateSqrt32(int_fast16_t aExp, uint32_t a) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the number of leading 0 bits before the most-significant 1 bit of > -| `a'. If `a' is zero, 32 is returned. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the number of leading 0 bits before the most-significant 1 bit of > +`a'. If `a' is zero, 32 is returned. > +------------------------------------------------------------------------------- > +*/ > > static int8 countLeadingZeros32( uint32_t a ) > { > @@ -668,10 +692,12 @@ static int8 countLeadingZeros32( uint32_t a ) > #endif > } > > -/*---------------------------------------------------------------------------- > -| Returns the number of leading 0 bits before the most-significant 1 bit of > -| `a'. If `a' is zero, 64 is returned. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the number of leading 0 bits before the most-significant 1 bit of > +`a'. If `a' is zero, 64 is returned. 
> +------------------------------------------------------------------------------- > +*/ > > static int8 countLeadingZeros64( uint64_t a ) > { > @@ -696,11 +722,13 @@ static int8 countLeadingZeros64( uint64_t a ) > #endif > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the 128-bit value formed by concatenating `a0' and `a1' > -| is equal to the 128-bit value formed by concatenating `b0' and `b1'. > -| Otherwise, returns 0. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the 128-bit value formed by concatenating `a0' and `a1' > +is equal to the 128-bit value formed by concatenating `b0' and `b1'. > +Otherwise, returns 0. > +------------------------------------------------------------------------------- > +*/ > > INLINE flag eq128( uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1 ) > { > @@ -709,11 +737,13 @@ INLINE flag eq128( uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1 ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the 128-bit value formed by concatenating `a0' and `a1' is less > -| than or equal to the 128-bit value formed by concatenating `b0' and `b1'. > -| Otherwise, returns 0. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the 128-bit value formed by concatenating `a0' and `a1' is less > +than or equal to the 128-bit value formed by concatenating `b0' and `b1'. > +Otherwise, returns 0. 
> +------------------------------------------------------------------------------- > +*/ > > INLINE flag le128( uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1 ) > { > @@ -722,11 +752,13 @@ INLINE flag le128( uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1 ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the 128-bit value formed by concatenating `a0' and `a1' is less > -| than the 128-bit value formed by concatenating `b0' and `b1'. Otherwise, > -| returns 0. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the 128-bit value formed by concatenating `a0' and `a1' is less > +than the 128-bit value formed by concatenating `b0' and `b1'. Otherwise, > +returns 0. > +------------------------------------------------------------------------------- > +*/ > > INLINE flag lt128( uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1 ) > { > @@ -735,11 +767,13 @@ INLINE flag lt128( uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1 ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the 128-bit value formed by concatenating `a0' and `a1' is > -| not equal to the 128-bit value formed by concatenating `b0' and `b1'. > -| Otherwise, returns 0. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the 128-bit value formed by concatenating `a0' and `a1' is > +not equal to the 128-bit value formed by concatenating `b0' and `b1'. > +Otherwise, returns 0. 
> +------------------------------------------------------------------------------- > +*/ > > INLINE flag ne128( uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1 ) > { > diff --git a/fpu/softfloat-specialize.h b/fpu/softfloat-specialize.h > index 518f694..ba9bfeb 100644 > --- a/fpu/softfloat-specialize.h > +++ b/fpu/softfloat-specialize.h > @@ -4,10 +4,11 @@ > * Derived from SoftFloat. > */ > > -/*============================================================================ > +/* > +=============================================================================== > > This C source fragment is part of the SoftFloat IEC/IEEE Floating-point > -Arithmetic Package, Release 2b. > +Arithmetic Package, Release 2a. > > Written by John R. Hauser. This work was made possible in part by the > International Computer Science Institute, located at Suite 600, 1947 Center > @@ -16,22 +17,19 @@ National Science Foundation under grant MIP-9311980. The original version > of this code was written as part of a project to build a fixed-point vector > processor in collaboration with the University of California at Berkeley, > overseen by Profs. Nelson Morgan and John Wawrzynek. More information > -is available through the Web page `http://www.cs.berkeley.edu/~jhauser/ > +is available through the Web page `http://HTTP.CS.Berkeley.EDU/~jhauser/ > arithmetic/SoftFloat.html'. > > -THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort has > -been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT TIMES > -RESULT IN INCORRECT BEHAVIOR. 
USE OF THIS SOFTWARE IS RESTRICTED TO PERSONS > -AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ALL LOSSES, > -COSTS, OR OTHER PROBLEMS THEY INCUR DUE TO THE SOFTWARE, AND WHO FURTHERMORE > -EFFECTIVELY INDEMNIFY JOHN HAUSER AND THE INTERNATIONAL COMPUTER SCIENCE > -INSTITUTE (possibly via similar legal warning) AGAINST ALL LOSSES, COSTS, OR > -OTHER PROBLEMS INCURRED BY THEIR CUSTOMERS AND CLIENTS DUE TO THE SOFTWARE. > +THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort > +has been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT > +TIMES RESULT IN INCORRECT BEHAVIOR. USE OF THIS SOFTWARE IS RESTRICTED TO > +PERSONS AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ANY > +AND ALL LOSSES, COSTS, OR OTHER PROBLEMS ARISING FROM ITS USE. > > Derivative works are acceptable, even for commercial purposes, so long as > -(1) the source code for the derivative work includes prominent notice that > -the work is derivative, and (2) the source code includes prominent notice with > -these four paragraphs for those parts of this code that are retained. > +(1) they include prominent notice that the work is derivative, and (2) they > +include prominent notice akin to these four paragraphs for those parts of > +this code that are retained. > > =============================================================================*/ > > @@ -48,9 +46,11 @@ these four paragraphs for those parts of this code that are retained. > #define NO_SIGNALING_NANS 1 > #endif > > -/*---------------------------------------------------------------------------- > -| The pattern for a default generated half-precision NaN. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +The pattern for a default generated half-precision NaN. 
> +------------------------------------------------------------------------------- > +*/ > #if defined(TARGET_ARM) > const float16 float16_default_nan = const_float16(0x7E00); > #elif SNAN_BIT_IS_ONE > @@ -59,9 +59,11 @@ const float16 float16_default_nan = const_float16(0x7DFF); > const float16 float16_default_nan = const_float16(0xFE00); > #endif > > -/*---------------------------------------------------------------------------- > -| The pattern for a default generated single-precision NaN. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +The pattern for a default generated single-precision NaN. > +------------------------------------------------------------------------------- > +*/ > #if defined(TARGET_SPARC) > const float32 float32_default_nan = const_float32(0x7FFFFFFF); > #elif defined(TARGET_PPC) || defined(TARGET_ARM) || defined(TARGET_ALPHA) || \ > @@ -73,9 +75,11 @@ const float32 float32_default_nan = const_float32(0x7FBFFFFF); > const float32 float32_default_nan = const_float32(0xFFC00000); > #endif > > -/*---------------------------------------------------------------------------- > -| The pattern for a default generated double-precision NaN. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +The pattern for a default generated double-precision NaN. 
> +------------------------------------------------------------------------------- > +*/ > #if defined(TARGET_SPARC) > const float64 float64_default_nan = const_float64(LIT64( 0x7FFFFFFFFFFFFFFF )); > #elif defined(TARGET_PPC) || defined(TARGET_ARM) || defined(TARGET_ALPHA) > @@ -86,9 +90,11 @@ const float64 float64_default_nan = const_float64(LIT64( 0x7FF7FFFFFFFFFFFF )); > const float64 float64_default_nan = const_float64(LIT64( 0xFFF8000000000000 )); > #endif > > -/*---------------------------------------------------------------------------- > -| The pattern for a default generated extended double-precision NaN. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +The pattern for a default generated extended double-precision NaN. > +------------------------------------------------------------------------------- > +*/ > #if SNAN_BIT_IS_ONE > #define floatx80_default_nan_high 0x7FFF > #define floatx80_default_nan_low LIT64( 0xBFFFFFFFFFFFFFFF ) > @@ -100,10 +106,12 @@ const float64 float64_default_nan = const_float64(LIT64( 0xFFF8000000000000 )); > const floatx80 floatx80_default_nan > = make_floatx80_init(floatx80_default_nan_high, floatx80_default_nan_low); > > -/*---------------------------------------------------------------------------- > -| The pattern for a default generated quadruple-precision NaN. The `high' and > -| `low' values hold the most- and least-significant bits, respectively. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +The pattern for a default generated quadruple-precision NaN. The `high' and > +`low' values hold the most- and least-significant bits, respectively. 
> +------------------------------------------------------------------------------- > +*/ > #if SNAN_BIT_IS_ONE > #define float128_default_nan_high LIT64( 0x7FFF7FFFFFFFFFFF ) > #define float128_default_nan_low LIT64( 0xFFFFFFFFFFFFFFFF ) > @@ -115,21 +123,25 @@ const floatx80 floatx80_default_nan > const float128 float128_default_nan > = make_float128_init(float128_default_nan_high, float128_default_nan_low); > > -/*---------------------------------------------------------------------------- > -| Raises the exceptions specified by `flags'. Floating-point traps can be > -| defined here if desired. It is currently not possible for such a trap > -| to substitute a result value. If traps are not implemented, this routine > -| should be simply `float_exception_flags |= flags;'. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Raises the exceptions specified by `flags'. Floating-point traps can be > +defined here if desired. It is currently not possible for such a trap > +to substitute a result value. If traps are not implemented, this routine > +should be simply `float_exception_flags |= flags;'. > +------------------------------------------------------------------------------- > +*/ > > void float_raise( int8 flags STATUS_PARAM ) > { > STATUS(float_exception_flags) |= flags; > } > > -/*---------------------------------------------------------------------------- > -| Internal canonical NaN format. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Internal canonical NaN format. 
> +------------------------------------------------------------------------------- > +*/ > typedef struct { > flag sign; > uint64_t high, low; > @@ -146,10 +158,12 @@ int float16_is_signaling_nan(float16 a_) > return 0; > } > #else > -/*---------------------------------------------------------------------------- > -| Returns 1 if the half-precision floating-point value `a' is a quiet > -| NaN; otherwise returns 0. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the half-precision floating-point value `a' is a quiet > +NaN; otherwise returns 0. > +------------------------------------------------------------------------------- > +*/ > > int float16_is_quiet_nan(float16 a_) > { > @@ -161,10 +175,12 @@ int float16_is_quiet_nan(float16 a_) > #endif > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the half-precision floating-point value `a' is a signaling > -| NaN; otherwise returns 0. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the half-precision floating-point value `a' is a signaling > +NaN; otherwise returns 0. > +------------------------------------------------------------------------------- > +*/ > > int float16_is_signaling_nan(float16 a_) > { > @@ -177,10 +193,12 @@ int float16_is_signaling_nan(float16 a_) > } > #endif > > -/*---------------------------------------------------------------------------- > -| Returns a quiet NaN if the half-precision floating point value `a' is a > -| signaling NaN; otherwise returns `a'. 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns a quiet NaN if the half-precision floating point value `a' is a > +signaling NaN; otherwise returns `a'. > +------------------------------------------------------------------------------- > +*/ > float16 float16_maybe_silence_nan(float16 a_) > { > if (float16_is_signaling_nan(a_)) { > @@ -199,11 +217,13 @@ float16 float16_maybe_silence_nan(float16 a_) > return a_; > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the half-precision floating-point NaN > -| `a' to the canonical NaN format. If `a' is a signaling NaN, the invalid > -| exception is raised. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the half-precision floating-point NaN > +`a' to the canonical NaN format. If `a' is a signaling NaN, the invalid > +exception is raised. > +------------------------------------------------------------------------------- > +*/ > > static commonNaNT float16ToCommonNaN( float16 a STATUS_PARAM ) > { > @@ -216,10 +236,12 @@ static commonNaNT float16ToCommonNaN( float16 a STATUS_PARAM ) > return z; > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the canonical NaN `a' to the half- > -| precision floating-point format. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the canonical NaN `a' to the half- > +precision floating-point format. 
> +------------------------------------------------------------------------------- > +*/ > > static float16 commonNaNToFloat16(commonNaNT a STATUS_PARAM) > { > @@ -248,10 +270,12 @@ int float32_is_signaling_nan(float32 a_) > return 0; > } > #else > -/*---------------------------------------------------------------------------- > -| Returns 1 if the single-precision floating-point value `a' is a quiet > -| NaN; otherwise returns 0. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the single-precision floating-point value `a' is a quiet > +NaN; otherwise returns 0. > +------------------------------------------------------------------------------- > +*/ > > int float32_is_quiet_nan( float32 a_ ) > { > @@ -263,10 +287,12 @@ int float32_is_quiet_nan( float32 a_ ) > #endif > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the single-precision floating-point value `a' is a signaling > -| NaN; otherwise returns 0. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the single-precision floating-point value `a' is a signaling > +NaN; otherwise returns 0. > +------------------------------------------------------------------------------- > +*/ > > int float32_is_signaling_nan( float32 a_ ) > { > @@ -279,10 +305,12 @@ int float32_is_signaling_nan( float32 a_ ) > } > #endif > > -/*---------------------------------------------------------------------------- > -| Returns a quiet NaN if the single-precision floating point value `a' is a > -| signaling NaN; otherwise returns `a'. 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns a quiet NaN if the single-precision floating point value `a' is a > +signaling NaN; otherwise returns `a'. > +------------------------------------------------------------------------------- > +*/ > > float32 float32_maybe_silence_nan( float32 a_ ) > { > @@ -302,12 +330,13 @@ float32 float32_maybe_silence_nan( float32 a_ ) > return a_; > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the single-precision floating-point NaN > -| `a' to the canonical NaN format. If `a' is a signaling NaN, the invalid > -| exception is raised. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the single-precision floating-point NaN > +`a' to the canonical NaN format. If `a' is a signaling NaN, the invalid > +exception is raised. > +------------------------------------------------------------------------------- > +*/ > static commonNaNT float32ToCommonNaN( float32 a STATUS_PARAM ) > { > commonNaNT z; > @@ -319,10 +348,12 @@ static commonNaNT float32ToCommonNaN( float32 a STATUS_PARAM ) > return z; > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the canonical NaN `a' to the single- > -| precision floating-point format. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the canonical NaN `a' to the single- > +precision floating-point format. 
> +------------------------------------------------------------------------------- > +*/ > > static float32 commonNaNToFloat32( commonNaNT a STATUS_PARAM) > { > @@ -339,22 +370,24 @@ static float32 commonNaNToFloat32( commonNaNT a STATUS_PARAM) > return float32_default_nan; > } > > -/*---------------------------------------------------------------------------- > -| Select which NaN to propagate for a two-input operation. > -| IEEE754 doesn't specify all the details of this, so the > -| algorithm is target-specific. > -| The routine is passed various bits of information about the > -| two NaNs and should return 0 to select NaN a and 1 for NaN b. > -| Note that signalling NaNs are always squashed to quiet NaNs > -| by the caller, by calling floatXX_maybe_silence_nan() before > -| returning them. > -| > -| aIsLargerSignificand is only valid if both a and b are NaNs > -| of some kind, and is true if a has the larger significand, > -| or if both a and b have the same significand but a is > -| positive but b is negative. It is only needed for the x87 > -| tie-break rule. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Select which NaN to propagate for a two-input operation. > +IEEE754 doesn't specify all the details of this, so the > +algorithm is target-specific. > +The routine is passed various bits of information about the > +two NaNs and should return 0 to select NaN a and 1 for NaN b. > +Note that signalling NaNs are always squashed to quiet NaNs > +by the caller, by calling floatXX_maybe_silence_nan() before > +returning them. > + > +aIsLargerSignificand is only valid if both a and b are NaNs > +of some kind, and is true if a has the larger significand, > +or if both a and b have the same significand but a is > +positive but b is negative. It is only needed for the x87 > +tie-break rule. 
> +------------------------------------------------------------------------------- > +*/ > > #if defined(TARGET_ARM) > static int pickNaN(flag aIsQNaN, flag aIsSNaN, flag bIsQNaN, flag bIsSNaN, > @@ -451,12 +484,14 @@ static int pickNaN(flag aIsQNaN, flag aIsSNaN, flag bIsQNaN, flag bIsSNaN, > } > #endif > > -/*---------------------------------------------------------------------------- > -| Select which NaN to propagate for a three-input operation. > -| For the moment we assume that no CPU needs the 'larger significand' > -| information. > -| Return values : 0 : a; 1 : b; 2 : c; 3 : default-NaN > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Select which NaN to propagate for a three-input operation. > +For the moment we assume that no CPU needs the 'larger significand' > +information. > +Return values : 0 : a; 1 : b; 2 : c; 3 : default-NaN > +------------------------------------------------------------------------------- > +*/ > #if defined(TARGET_ARM) > static int pickNaNMulAdd(flag aIsQNaN, flag aIsSNaN, flag bIsQNaN, flag bIsSNaN, > flag cIsQNaN, flag cIsSNaN, flag infzero STATUS_PARAM) > @@ -554,12 +589,13 @@ static int pickNaNMulAdd(flag aIsQNaN, flag aIsSNaN, flag bIsQNaN, flag bIsSNaN, > } > #endif > > -/*---------------------------------------------------------------------------- > -| Takes two single-precision floating-point values `a' and `b', one of which > -| is a NaN, and returns the appropriate NaN result. If either `a' or `b' is a > -| signaling NaN, the invalid exception is raised. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Takes two single-precision floating-point values `a' and `b', one of which > +is a NaN, and returns the appropriate NaN result. 
If either `a' or `b' is a > +signaling NaN, the invalid exception is raised. > +------------------------------------------------------------------------------- > +*/ > static float32 propagateFloat32NaN( float32 a, float32 b STATUS_PARAM) > { > flag aIsQuietNaN, aIsSignalingNaN, bIsQuietNaN, bIsSignalingNaN; > @@ -594,14 +630,16 @@ static float32 propagateFloat32NaN( float32 a, float32 b STATUS_PARAM) > } > } > > -/*---------------------------------------------------------------------------- > -| Takes three single-precision floating-point values `a', `b' and `c', one of > -| which is a NaN, and returns the appropriate NaN result. If any of `a', > -| `b' or `c' is a signaling NaN, the invalid exception is raised. > -| The input infzero indicates whether a*b was 0*inf or inf*0 (in which case > -| obviously c is a NaN, and whether to propagate c or some other NaN is > -| implementation defined). > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Takes three single-precision floating-point values `a', `b' and `c', one of > +which is a NaN, and returns the appropriate NaN result. If any of `a', > +`b' or `c' is a signaling NaN, the invalid exception is raised. > +The input infzero indicates whether a*b was 0*inf or inf*0 (in which case > +obviously c is a NaN, and whether to propagate c or some other NaN is > +implementation defined). > +------------------------------------------------------------------------------- > +*/ > > static float32 propagateFloat32MulAddNaN(float32 a, float32 b, > float32 c, flag infzero STATUS_PARAM) > @@ -656,10 +694,12 @@ int float64_is_signaling_nan(float64 a_) > return 0; > } > #else > -/*---------------------------------------------------------------------------- > -| Returns 1 if the double-precision floating-point value `a' is a quiet > -| NaN; otherwise returns 0. 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the double-precision floating-point value `a' is a quiet > +NaN; otherwise returns 0. > +------------------------------------------------------------------------------- > +*/ > > int float64_is_quiet_nan( float64 a_ ) > { > @@ -673,10 +713,12 @@ int float64_is_quiet_nan( float64 a_ ) > #endif > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the double-precision floating-point value `a' is a signaling > -| NaN; otherwise returns 0. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the double-precision floating-point value `a' is a signaling > +NaN; otherwise returns 0. > +------------------------------------------------------------------------------- > +*/ > > int float64_is_signaling_nan( float64 a_ ) > { > @@ -691,10 +733,12 @@ int float64_is_signaling_nan( float64 a_ ) > } > #endif > > -/*---------------------------------------------------------------------------- > -| Returns a quiet NaN if the double-precision floating point value `a' is a > -| signaling NaN; otherwise returns `a'. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns a quiet NaN if the double-precision floating point value `a' is a > +signaling NaN; otherwise returns `a'. 
> +------------------------------------------------------------------------------- > +*/ > > float64 float64_maybe_silence_nan( float64 a_ ) > { > @@ -714,12 +758,13 @@ float64 float64_maybe_silence_nan( float64 a_ ) > return a_; > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the double-precision floating-point NaN > -| `a' to the canonical NaN format. If `a' is a signaling NaN, the invalid > -| exception is raised. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the double-precision floating-point NaN > +`a' to the canonical NaN format. If `a' is a signaling NaN, the invalid > +exception is raised. > +------------------------------------------------------------------------------- > +*/ > static commonNaNT float64ToCommonNaN( float64 a STATUS_PARAM) > { > commonNaNT z; > @@ -731,10 +776,12 @@ static commonNaNT float64ToCommonNaN( float64 a STATUS_PARAM) > return z; > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the canonical NaN `a' to the double- > -| precision floating-point format. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the canonical NaN `a' to the double- > +precision floating-point format. 
> +------------------------------------------------------------------------------- > +*/ > > static float64 commonNaNToFloat64( commonNaNT a STATUS_PARAM) > { > @@ -753,12 +800,13 @@ static float64 commonNaNToFloat64( commonNaNT a STATUS_PARAM) > return float64_default_nan; > } > > -/*---------------------------------------------------------------------------- > -| Takes two double-precision floating-point values `a' and `b', one of which > -| is a NaN, and returns the appropriate NaN result. If either `a' or `b' is a > -| signaling NaN, the invalid exception is raised. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Takes two double-precision floating-point values `a' and `b', one of which > +is a NaN, and returns the appropriate NaN result. If either `a' or `b' is a > +signaling NaN, the invalid exception is raised. > +------------------------------------------------------------------------------- > +*/ > static float64 propagateFloat64NaN( float64 a, float64 b STATUS_PARAM) > { > flag aIsQuietNaN, aIsSignalingNaN, bIsQuietNaN, bIsSignalingNaN; > @@ -793,14 +841,16 @@ static float64 propagateFloat64NaN( float64 a, float64 b STATUS_PARAM) > } > } > > -/*---------------------------------------------------------------------------- > -| Takes three double-precision floating-point values `a', `b' and `c', one of > -| which is a NaN, and returns the appropriate NaN result. If any of `a', > -| `b' or `c' is a signaling NaN, the invalid exception is raised. > -| The input infzero indicates whether a*b was 0*inf or inf*0 (in which case > -| obviously c is a NaN, and whether to propagate c or some other NaN is > -| implementation defined). 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Takes three double-precision floating-point values `a', `b' and `c', one of > +which is a NaN, and returns the appropriate NaN result. If any of `a', > +`b' or `c' is a signaling NaN, the invalid exception is raised. > +The input infzero indicates whether a*b was 0*inf or inf*0 (in which case > +obviously c is a NaN, and whether to propagate c or some other NaN is > +implementation defined). > +------------------------------------------------------------------------------- > +*/ > > static float64 propagateFloat64MulAddNaN(float64 a, float64 b, > float64 c, flag infzero STATUS_PARAM) > @@ -855,11 +905,13 @@ int floatx80_is_signaling_nan(floatx80 a_) > return 0; > } > #else > -/*---------------------------------------------------------------------------- > -| Returns 1 if the extended double-precision floating-point value `a' is a > -| quiet NaN; otherwise returns 0. This slightly differs from the same > -| function for other types as floatx80 has an explicit bit. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the extended double-precision floating-point value `a' is a > +quiet NaN; otherwise returns 0. This slightly differs from the same > +function for other types as floatx80 has an explicit bit. > +------------------------------------------------------------------------------- > +*/ > > int floatx80_is_quiet_nan( floatx80 a ) > { > @@ -877,11 +929,13 @@ int floatx80_is_quiet_nan( floatx80 a ) > #endif > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the extended double-precision floating-point value `a' is a > -| signaling NaN; otherwise returns 0. 
This slightly differs from the same > -| function for other types as floatx80 has an explicit bit. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the extended double-precision floating-point value `a' is a > +signaling NaN; otherwise returns 0. This slightly differs from the same > +function for other types as floatx80 has an explicit bit. > +------------------------------------------------------------------------------- > +*/ > > int floatx80_is_signaling_nan( floatx80 a ) > { > @@ -900,10 +954,12 @@ int floatx80_is_signaling_nan( floatx80 a ) > } > #endif > > -/*---------------------------------------------------------------------------- > -| Returns a quiet NaN if the extended double-precision floating point value > -| `a' is a signaling NaN; otherwise returns `a'. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns a quiet NaN if the extended double-precision floating point value > +`a' is a signaling NaN; otherwise returns `a'. > +------------------------------------------------------------------------------- > +*/ > > floatx80 floatx80_maybe_silence_nan( floatx80 a ) > { > @@ -923,12 +979,13 @@ floatx80 floatx80_maybe_silence_nan( floatx80 a ) > return a; > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the extended double-precision floating- > -| point NaN `a' to the canonical NaN format. If `a' is a signaling NaN, the > -| invalid exception is raised. 
> -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the extended double-precision floating- > +point NaN `a' to the canonical NaN format. If `a' is a signaling NaN, the > +invalid exception is raised. > +------------------------------------------------------------------------------- > +*/ > static commonNaNT floatx80ToCommonNaN( floatx80 a STATUS_PARAM) > { > commonNaNT z; > @@ -946,10 +1003,12 @@ static commonNaNT floatx80ToCommonNaN( floatx80 a STATUS_PARAM) > return z; > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the canonical NaN `a' to the extended > -| double-precision floating-point format. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the canonical NaN `a' to the extended > +double-precision floating-point format. > +------------------------------------------------------------------------------- > +*/ > > static floatx80 commonNaNToFloatx80( commonNaNT a STATUS_PARAM) > { > @@ -972,12 +1031,13 @@ static floatx80 commonNaNToFloatx80( commonNaNT a STATUS_PARAM) > return z; > } > > -/*---------------------------------------------------------------------------- > -| Takes two extended double-precision floating-point values `a' and `b', one > -| of which is a NaN, and returns the appropriate NaN result. If either `a' or > -| `b' is a signaling NaN, the invalid exception is raised. 
> -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Takes two extended double-precision floating-point values `a' and `b', one > +of which is a NaN, and returns the appropriate NaN result. If either `a' or > +`b' is a signaling NaN, the invalid exception is raised. > +------------------------------------------------------------------------------- > +*/ > static floatx80 propagateFloatx80NaN( floatx80 a, floatx80 b STATUS_PARAM) > { > flag aIsQuietNaN, aIsSignalingNaN, bIsQuietNaN, bIsSignalingNaN; > @@ -1023,10 +1083,12 @@ int float128_is_signaling_nan(float128 a_) > return 0; > } > #else > -/*---------------------------------------------------------------------------- > -| Returns 1 if the quadruple-precision floating-point value `a' is a quiet > -| NaN; otherwise returns 0. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the quadruple-precision floating-point value `a' is a quiet > +NaN; otherwise returns 0. > +------------------------------------------------------------------------------- > +*/ > > int float128_is_quiet_nan( float128 a ) > { > @@ -1041,10 +1103,12 @@ int float128_is_quiet_nan( float128 a ) > #endif > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the quadruple-precision floating-point value `a' is a > -| signaling NaN; otherwise returns 0. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the quadruple-precision floating-point value `a' is a > +signaling NaN; otherwise returns 0. 
> +------------------------------------------------------------------------------- > +*/ > > int float128_is_signaling_nan( float128 a ) > { > @@ -1060,10 +1124,12 @@ int float128_is_signaling_nan( float128 a ) > } > #endif > > -/*---------------------------------------------------------------------------- > -| Returns a quiet NaN if the quadruple-precision floating point value `a' is > -| a signaling NaN; otherwise returns `a'. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns a quiet NaN if the quadruple-precision floating point value `a' is > +a signaling NaN; otherwise returns `a'. > +------------------------------------------------------------------------------- > +*/ > > float128 float128_maybe_silence_nan( float128 a ) > { > @@ -1083,12 +1149,13 @@ float128 float128_maybe_silence_nan( float128 a ) > return a; > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the quadruple-precision floating-point NaN > -| `a' to the canonical NaN format. If `a' is a signaling NaN, the invalid > -| exception is raised. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the quadruple-precision floating-point NaN > +`a' to the canonical NaN format. If `a' is a signaling NaN, the invalid > +exception is raised. 
> +------------------------------------------------------------------------------- > +*/ > static commonNaNT float128ToCommonNaN( float128 a STATUS_PARAM) > { > commonNaNT z; > @@ -1099,10 +1166,12 @@ static commonNaNT float128ToCommonNaN( float128 a STATUS_PARAM) > return z; > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the canonical NaN `a' to the quadruple- > -| precision floating-point format. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the canonical NaN `a' to the quadruple- > +precision floating-point format. > +------------------------------------------------------------------------------- > +*/ > > static float128 commonNaNToFloat128( commonNaNT a STATUS_PARAM) > { > @@ -1119,12 +1188,13 @@ static float128 commonNaNToFloat128( commonNaNT a STATUS_PARAM) > return z; > } > > -/*---------------------------------------------------------------------------- > -| Takes two quadruple-precision floating-point values `a' and `b', one of > -| which is a NaN, and returns the appropriate NaN result. If either `a' or > -| `b' is a signaling NaN, the invalid exception is raised. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Takes two quadruple-precision floating-point values `a' and `b', one of > +which is a NaN, and returns the appropriate NaN result. If either `a' or > +`b' is a signaling NaN, the invalid exception is raised. 
> +------------------------------------------------------------------------------- > +*/ > static float128 propagateFloat128NaN( float128 a, float128 b STATUS_PARAM) > { > flag aIsQuietNaN, aIsSignalingNaN, bIsQuietNaN, bIsSignalingNaN; > diff --git a/fpu/softfloat.c b/fpu/softfloat.c > index 7ba51b6..9145582 100644 > --- a/fpu/softfloat.c > +++ b/fpu/softfloat.c > @@ -4,10 +4,11 @@ > * Derived from SoftFloat. > */ > > -/*============================================================================ > +/* > +=============================================================================== > > -This C source file is part of the SoftFloat IEC/IEEE Floating-point Arithmetic > -Package, Release 2b. > +This C source file is part of the SoftFloat IEC/IEEE Floating-point > +Arithmetic Package, Release 2a. > > Written by John R. Hauser. This work was made possible in part by the > International Computer Science Institute, located at Suite 600, 1947 Center > @@ -16,24 +17,22 @@ National Science Foundation under grant MIP-9311980. The original version > of this code was written as part of a project to build a fixed-point vector > processor in collaboration with the University of California at Berkeley, > overseen by Profs. Nelson Morgan and John Wawrzynek. More information > -is available through the Web page `http://www.cs.berkeley.edu/~jhauser/ > +is available through the Web page `http://HTTP.CS.Berkeley.EDU/~jhauser/ > arithmetic/SoftFloat.html'. > > -THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort has > -been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT TIMES > -RESULT IN INCORRECT BEHAVIOR. 
USE OF THIS SOFTWARE IS RESTRICTED TO PERSONS > -AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ALL LOSSES, > -COSTS, OR OTHER PROBLEMS THEY INCUR DUE TO THE SOFTWARE, AND WHO FURTHERMORE > -EFFECTIVELY INDEMNIFY JOHN HAUSER AND THE INTERNATIONAL COMPUTER SCIENCE > -INSTITUTE (possibly via similar legal warning) AGAINST ALL LOSSES, COSTS, OR > -OTHER PROBLEMS INCURRED BY THEIR CUSTOMERS AND CLIENTS DUE TO THE SOFTWARE. > +THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort > +has been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT > +TIMES RESULT IN INCORRECT BEHAVIOR. USE OF THIS SOFTWARE IS RESTRICTED TO > +PERSONS AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ANY > +AND ALL LOSSES, COSTS, OR OTHER PROBLEMS ARISING FROM ITS USE. > > Derivative works are acceptable, even for commercial purposes, so long as > -(1) the source code for the derivative work includes prominent notice that > -the work is derivative, and (2) the source code includes prominent notice with > -these four paragraphs for those parts of this code that are retained. > +(1) they include prominent notice that the work is derivative, and (2) they > +include prominent notice akin to these four paragraphs for those parts of > +this code that are retained. > > -=============================================================================*/ > +=============================================================================== > +*/ > > /* softfloat (and in particular the code in softfloat-specialize.h) is > * target-dependent and needs the TARGET_* macros. > @@ -42,21 +41,25 @@ these four paragraphs for those parts of this code that are retained. > > #include "fpu/softfloat.h" > > -/*---------------------------------------------------------------------------- > -| Primitive arithmetic functions, including multi-word arithmetic, and > -| division and square root approximations. (Can be specialized to target if > -| desired.) 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Primitive arithmetic functions, including multi-word arithmetic, and > +division and square root approximations. (Can be specialized to target if > +desired.) > +------------------------------------------------------------------------------- > +*/ > #include "softfloat-macros.h" > > -/*---------------------------------------------------------------------------- > -| Functions and definitions to determine: (1) whether tininess for underflow > -| is detected before or after rounding by default, (2) what (if anything) > -| happens when exceptions are raised, (3) how signaling NaNs are distinguished > -| from quiet NaNs, (4) the default generated quiet NaNs, and (5) how NaNs > -| are propagated from function inputs to output. These details are target- > -| specific. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Functions and definitions to determine: (1) whether tininess for underflow > +is detected before or after rounding by default, (2) what (if anything) > +happens when exceptions are raised, (3) how signaling NaNs are distinguished > +from quiet NaNs, (4) the default generated quiet NaNs, and (5) how NaNs > +are propagated from function inputs to output. These details are target- > +specific. > +------------------------------------------------------------------------------- > +*/ > #include "softfloat-specialize.h" > > void set_float_rounding_mode(int val STATUS_PARAM) > @@ -74,43 +77,51 @@ void set_floatx80_rounding_precision(int val STATUS_PARAM) > STATUS(floatx80_rounding_precision) = val; > } > > -/*---------------------------------------------------------------------------- > -| Returns the fraction bits of the half-precision floating-point value `a'. 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the fraction bits of the half-precision floating-point value `a'. > +------------------------------------------------------------------------------- > +*/ > > INLINE uint32_t extractFloat16Frac(float16 a) > { > return float16_val(a) & 0x3ff; > } > > -/*---------------------------------------------------------------------------- > -| Returns the exponent bits of the half-precision floating-point value `a'. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the exponent bits of the half-precision floating-point value `a'. > +------------------------------------------------------------------------------- > +*/ > > INLINE int_fast16_t extractFloat16Exp(float16 a) > { > return (float16_val(a) >> 10) & 0x1f; > } > > -/*---------------------------------------------------------------------------- > -| Returns the sign bit of the single-precision floating-point value `a'. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the sign bit of the single-precision floating-point value `a'. > +------------------------------------------------------------------------------- > +*/ > > INLINE flag extractFloat16Sign(float16 a) > { > return float16_val(a)>>15; > } > > -/*---------------------------------------------------------------------------- > -| Takes a 64-bit fixed-point value `absZ' with binary point between bits 6 > -| and 7, and returns the properly rounded 32-bit integer corresponding to the > -| input. If `zSign' is 1, the input is negated before being converted to an > -| integer. Bit 63 of `absZ' must be zero. 
Ordinarily, the fixed-point input > -| is simply rounded to an integer, with the inexact exception raised if the > -| input cannot be represented exactly as an integer. However, if the fixed- > -| point input is too large, the invalid exception is raised and the largest > -| positive or negative integer is returned. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Takes a 64-bit fixed-point value `absZ' with binary point between bits 6 > +and 7, and returns the properly rounded 32-bit integer corresponding to the > +input. If `zSign' is 1, the input is negated before being converted to an > +integer. Bit 63 of `absZ' must be zero. Ordinarily, the fixed-point input > +is simply rounded to an integer, with the inexact exception raised if the > +input cannot be represented exactly as an integer. However, if the fixed- > +point input is too large, the invalid exception is raised and the largest > +positive or negative integer is returned. > +------------------------------------------------------------------------------- > +*/ > > static int32 roundAndPackInt32( flag zSign, uint64_t absZ STATUS_PARAM) > { > @@ -150,17 +161,19 @@ static int32 roundAndPackInt32( flag zSign, uint64_t absZ STATUS_PARAM) > > } > > -/*---------------------------------------------------------------------------- > -| Takes the 128-bit fixed-point value formed by concatenating `absZ0' and > -| `absZ1', with binary point between bits 63 and 64 (between the input words), > -| and returns the properly rounded 64-bit integer corresponding to the input. > -| If `zSign' is 1, the input is negated before being converted to an integer. > -| Ordinarily, the fixed-point input is simply rounded to an integer, with > -| the inexact exception raised if the input cannot be represented exactly as > -| an integer. 
However, if the fixed-point input is too large, the invalid > -| exception is raised and the largest positive or negative integer is > -| returned. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Takes the 128-bit fixed-point value formed by concatenating `absZ0' and > +`absZ1', with binary point between bits 63 and 64 (between the input words), > +and returns the properly rounded 64-bit integer corresponding to the input. > +If `zSign' is 1, the input is negated before being converted to an integer. > +Ordinarily, the fixed-point input is simply rounded to an integer, with > +the inexact exception raised if the input cannot be represented exactly as > +an integer. However, if the fixed-point input is too large, the invalid > +exception is raised and the largest positive or negative integer is > +returned. > +------------------------------------------------------------------------------- > +*/ > > static int64 roundAndPackInt64( flag zSign, uint64_t absZ0, uint64_t absZ1 STATUS_PARAM) > { > @@ -203,9 +216,11 @@ static int64 roundAndPackInt64( flag zSign, uint64_t absZ0, uint64_t absZ1 STATU > > } > > -/*---------------------------------------------------------------------------- > -| Returns the fraction bits of the single-precision floating-point value `a'. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the fraction bits of the single-precision floating-point value `a'. 
> +------------------------------------------------------------------------------- > +*/ > > INLINE uint32_t extractFloat32Frac( float32 a ) > { > @@ -214,9 +229,11 @@ INLINE uint32_t extractFloat32Frac( float32 a ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the exponent bits of the single-precision floating-point value `a'. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the exponent bits of the single-precision floating-point value `a'. > +------------------------------------------------------------------------------- > +*/ > > INLINE int_fast16_t extractFloat32Exp(float32 a) > { > @@ -225,10 +242,11 @@ INLINE int_fast16_t extractFloat32Exp(float32 a) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the sign bit of the single-precision floating-point value `a'. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the sign bit of the single-precision floating-point value `a'. > +------------------------------------------------------------------------------- > +*/ > INLINE flag extractFloat32Sign( float32 a ) > { > > @@ -236,10 +254,12 @@ INLINE flag extractFloat32Sign( float32 a ) > > } > > -/*---------------------------------------------------------------------------- > -| If `a' is denormal and we are in flush-to-zero mode then set the > -| input-denormal exception and return zero. Otherwise just return the value. 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +If `a' is denormal and we are in flush-to-zero mode then set the > +input-denormal exception and return zero. Otherwise just return the value. > +------------------------------------------------------------------------------- > +*/ > static float32 float32_squash_input_denormal(float32 a STATUS_PARAM) > { > if (STATUS(flush_inputs_to_zero)) { > @@ -251,13 +271,14 @@ static float32 float32_squash_input_denormal(float32 a STATUS_PARAM) > return a; > } > > -/*---------------------------------------------------------------------------- > -| Normalizes the subnormal single-precision floating-point value represented > -| by the denormalized significand `aSig'. The normalized exponent and > -| significand are stored at the locations pointed to by `zExpPtr' and > -| `zSigPtr', respectively. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Normalizes the subnormal single-precision floating-point value represented > +by the denormalized significand `aSig'. The normalized exponent and > +significand are stored at the locations pointed to by `zExpPtr' and > +`zSigPtr', respectively. > +------------------------------------------------------------------------------- > +*/ > static void > normalizeFloat32Subnormal(uint32_t aSig, int_fast16_t *zExpPtr, uint32_t *zSigPtr) > { > @@ -269,16 +290,18 @@ static void > > } > > -/*---------------------------------------------------------------------------- > -| Packs the sign `zSign', exponent `zExp', and significand `zSig' into a > -| single-precision floating-point value, returning the result. After being > -| shifted into the proper positions, the three fields are simply added > -| together to form the result. 
This means that any integer portion of `zSig' > -| will be added into the exponent. Since a properly normalized significand > -| will have an integer portion equal to 1, the `zExp' input should be 1 less > -| than the desired result exponent whenever `zSig' is a complete, normalized > -| significand. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Packs the sign `zSign', exponent `zExp', and significand `zSig' into a > +single-precision floating-point value, returning the result. After being > +shifted into the proper positions, the three fields are simply added > +together to form the result. This means that any integer portion of `zSig' > +will be added into the exponent. Since a properly normalized significand > +will have an integer portion equal to 1, the `zExp' input should be 1 less > +than the desired result exponent whenever `zSig' is a complete, normalized > +significand. > +------------------------------------------------------------------------------- > +*/ > > INLINE float32 packFloat32(flag zSign, int_fast16_t zExp, uint32_t zSig) > { > @@ -288,27 +311,29 @@ INLINE float32 packFloat32(flag zSign, int_fast16_t zExp, uint32_t zSig) > > } > > -/*---------------------------------------------------------------------------- > -| Takes an abstract floating-point value having sign `zSign', exponent `zExp', > -| and significand `zSig', and returns the proper single-precision floating- > -| point value corresponding to the abstract input. Ordinarily, the abstract > -| value is simply rounded and packed into the single-precision format, with > -| the inexact exception raised if the abstract input cannot be represented > -| exactly. However, if the abstract value is too large, the overflow and > -| inexact exceptions are raised and an infinity or maximal finite value is > -| returned. 
If the abstract value is too small, the input value is rounded to > -| a subnormal number, and the underflow and inexact exceptions are raised if > -| the abstract input cannot be represented exactly as a subnormal single- > -| precision floating-point number. > -| The input significand `zSig' has its binary point between bits 30 > -| and 29, which is 7 bits to the left of the usual location. This shifted > -| significand must be normalized or smaller. If `zSig' is not normalized, > -| `zExp' must be 0; in that case, the result returned is a subnormal number, > -| and it must not require rounding. In the usual case that `zSig' is > -| normalized, `zExp' must be 1 less than the ``true'' floating-point exponent. > -| The handling of underflow and overflow follows the IEC/IEEE Standard for > -| Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Takes an abstract floating-point value having sign `zSign', exponent `zExp', > +and significand `zSig', and returns the proper single-precision floating- > +point value corresponding to the abstract input. Ordinarily, the abstract > +value is simply rounded and packed into the single-precision format, with > +the inexact exception raised if the abstract input cannot be represented > +exactly. However, if the abstract value is too large, the overflow and > +inexact exceptions are raised and an infinity or maximal finite value is > +returned. If the abstract value is too small, the input value is rounded to > +a subnormal number, and the underflow and inexact exceptions are raised if > +the abstract input cannot be represented exactly as a subnormal single- > +precision floating-point number. > + The input significand `zSig' has its binary point between bits 30 > +and 29, which is 7 bits to the left of the usual location. 
This shifted > +significand must be normalized or smaller. If `zSig' is not normalized, > +`zExp' must be 0; in that case, the result returned is a subnormal number, > +and it must not require rounding. In the usual case that `zSig' is > +normalized, `zExp' must be 1 less than the ``true'' floating-point exponent. > +The handling of underflow and overflow follows the IEC/IEEE Standard for > +Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > static float32 roundAndPackFloat32(flag zSign, int_fast16_t zExp, uint32_t zSig STATUS_PARAM) > { > @@ -366,15 +391,16 @@ static float32 roundAndPackFloat32(flag zSign, int_fast16_t zExp, uint32_t zSig > > } > > -/*---------------------------------------------------------------------------- > -| Takes an abstract floating-point value having sign `zSign', exponent `zExp', > -| and significand `zSig', and returns the proper single-precision floating- > -| point value corresponding to the abstract input. This routine is just like > -| `roundAndPackFloat32' except that `zSig' does not have to be normalized. > -| Bit 31 of `zSig' must be zero, and `zExp' must be 1 less than the ``true'' > -| floating-point exponent. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Takes an abstract floating-point value having sign `zSign', exponent `zExp', > +and significand `zSig', and returns the proper single-precision floating- > +point value corresponding to the abstract input. This routine is just like > +`roundAndPackFloat32' except that `zSig' does not have to be normalized. > +Bit 31 of `zSig' must be zero, and `zExp' must be 1 less than the ``true'' > +floating-point exponent. 
> +------------------------------------------------------------------------------- > +*/ > static float32 > normalizeRoundAndPackFloat32(flag zSign, int_fast16_t zExp, uint32_t zSig STATUS_PARAM) > { > @@ -385,9 +411,11 @@ static float32 > > } > > -/*---------------------------------------------------------------------------- > -| Returns the fraction bits of the double-precision floating-point value `a'. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the fraction bits of the double-precision floating-point value `a'. > +------------------------------------------------------------------------------- > +*/ > > INLINE uint64_t extractFloat64Frac( float64 a ) > { > @@ -396,9 +424,11 @@ INLINE uint64_t extractFloat64Frac( float64 a ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the exponent bits of the double-precision floating-point value `a'. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the exponent bits of the double-precision floating-point value `a'. > +------------------------------------------------------------------------------- > +*/ > > INLINE int_fast16_t extractFloat64Exp(float64 a) > { > @@ -407,10 +437,11 @@ INLINE int_fast16_t extractFloat64Exp(float64 a) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the sign bit of the double-precision floating-point value `a'. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the sign bit of the double-precision floating-point value `a'. 
> +------------------------------------------------------------------------------- > +*/ > INLINE flag extractFloat64Sign( float64 a ) > { > > @@ -418,10 +449,12 @@ INLINE flag extractFloat64Sign( float64 a ) > > } > > -/*---------------------------------------------------------------------------- > -| If `a' is denormal and we are in flush-to-zero mode then set the > -| input-denormal exception and return zero. Otherwise just return the value. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +If `a' is denormal and we are in flush-to-zero mode then set the > +input-denormal exception and return zero. Otherwise just return the value. > +------------------------------------------------------------------------------- > +*/ > static float64 float64_squash_input_denormal(float64 a STATUS_PARAM) > { > if (STATUS(flush_inputs_to_zero)) { > @@ -433,13 +466,14 @@ static float64 float64_squash_input_denormal(float64 a STATUS_PARAM) > return a; > } > > -/*---------------------------------------------------------------------------- > -| Normalizes the subnormal double-precision floating-point value represented > -| by the denormalized significand `aSig'. The normalized exponent and > -| significand are stored at the locations pointed to by `zExpPtr' and > -| `zSigPtr', respectively. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Normalizes the subnormal double-precision floating-point value represented > +by the denormalized significand `aSig'. The normalized exponent and > +significand are stored at the locations pointed to by `zExpPtr' and > +`zSigPtr', respectively. 
> +------------------------------------------------------------------------------- > +*/ > static void > normalizeFloat64Subnormal(uint64_t aSig, int_fast16_t *zExpPtr, uint64_t *zSigPtr) > { > @@ -451,16 +485,18 @@ static void > > } > > -/*---------------------------------------------------------------------------- > -| Packs the sign `zSign', exponent `zExp', and significand `zSig' into a > -| double-precision floating-point value, returning the result. After being > -| shifted into the proper positions, the three fields are simply added > -| together to form the result. This means that any integer portion of `zSig' > -| will be added into the exponent. Since a properly normalized significand > -| will have an integer portion equal to 1, the `zExp' input should be 1 less > -| than the desired result exponent whenever `zSig' is a complete, normalized > -| significand. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Packs the sign `zSign', exponent `zExp', and significand `zSig' into a > +double-precision floating-point value, returning the result. After being > +shifted into the proper positions, the three fields are simply added > +together to form the result. This means that any integer portion of `zSig' > +will be added into the exponent. Since a properly normalized significand > +will have an integer portion equal to 1, the `zExp' input should be 1 less > +than the desired result exponent whenever `zSig' is a complete, normalized > +significand. 
> +------------------------------------------------------------------------------- > +*/ > > INLINE float64 packFloat64(flag zSign, int_fast16_t zExp, uint64_t zSig) > { > @@ -470,27 +506,29 @@ INLINE float64 packFloat64(flag zSign, int_fast16_t zExp, uint64_t zSig) > > } > > -/*---------------------------------------------------------------------------- > -| Takes an abstract floating-point value having sign `zSign', exponent `zExp', > -| and significand `zSig', and returns the proper double-precision floating- > -| point value corresponding to the abstract input. Ordinarily, the abstract > -| value is simply rounded and packed into the double-precision format, with > -| the inexact exception raised if the abstract input cannot be represented > -| exactly. However, if the abstract value is too large, the overflow and > -| inexact exceptions are raised and an infinity or maximal finite value is > -| returned. If the abstract value is too small, the input value is rounded > -| to a subnormal number, and the underflow and inexact exceptions are raised > -| if the abstract input cannot be represented exactly as a subnormal double- > -| precision floating-point number. > -| The input significand `zSig' has its binary point between bits 62 > -| and 61, which is 10 bits to the left of the usual location. This shifted > -| significand must be normalized or smaller. If `zSig' is not normalized, > -| `zExp' must be 0; in that case, the result returned is a subnormal number, > -| and it must not require rounding. In the usual case that `zSig' is > -| normalized, `zExp' must be 1 less than the ``true'' floating-point exponent. > -| The handling of underflow and overflow follows the IEC/IEEE Standard for > -| Binary Floating-Point Arithmetic. 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Takes an abstract floating-point value having sign `zSign', exponent `zExp', > +and significand `zSig', and returns the proper double-precision floating- > +point value corresponding to the abstract input. Ordinarily, the abstract > +value is simply rounded and packed into the double-precision format, with > +the inexact exception raised if the abstract input cannot be represented > +exactly. However, if the abstract value is too large, the overflow and > +inexact exceptions are raised and an infinity or maximal finite value is > +returned. If the abstract value is too small, the input value is rounded > +to a subnormal number, and the underflow and inexact exceptions are raised > +if the abstract input cannot be represented exactly as a subnormal double- > +precision floating-point number. > + The input significand `zSig' has its binary point between bits 62 > +and 61, which is 10 bits to the left of the usual location. This shifted > +significand must be normalized or smaller. If `zSig' is not normalized, > +`zExp' must be 0; in that case, the result returned is a subnormal number, > +and it must not require rounding. In the usual case that `zSig' is > +normalized, `zExp' must be 1 less than the ``true'' floating-point exponent. > +The handling of underflow and overflow follows the IEC/IEEE Standard for > +Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > > static float64 roundAndPackFloat64(flag zSign, int_fast16_t zExp, uint64_t zSig STATUS_PARAM) > { > @@ -548,15 +586,16 @@ static float64 roundAndPackFloat64(flag zSign, int_fast16_t zExp, uint64_t zSig > > } > > -/*---------------------------------------------------------------------------- > -| Takes an abstract floating-point value having sign `zSign', exponent `zExp', > -| and significand `zSig', and returns the proper double-precision floating- > -| point value corresponding to the abstract input. This routine is just like > -| `roundAndPackFloat64' except that `zSig' does not have to be normalized. > -| Bit 63 of `zSig' must be zero, and `zExp' must be 1 less than the ``true'' > -| floating-point exponent. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Takes an abstract floating-point value having sign `zSign', exponent `zExp', > +and significand `zSig', and returns the proper double-precision floating- > +point value corresponding to the abstract input. This routine is just like > +`roundAndPackFloat64' except that `zSig' does not have to be normalized. > +Bit 63 of `zSig' must be zero, and `zExp' must be 1 less than the ``true'' > +floating-point exponent. > +------------------------------------------------------------------------------- > +*/ > static float64 > normalizeRoundAndPackFloat64(flag zSign, int_fast16_t zExp, uint64_t zSig STATUS_PARAM) > { > @@ -567,10 +606,12 @@ static float64 > > } > > -/*---------------------------------------------------------------------------- > -| Returns the fraction bits of the extended double-precision floating-point > -| value `a'. 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the fraction bits of the extended double-precision floating-point > +value `a'. > +------------------------------------------------------------------------------- > +*/ > > INLINE uint64_t extractFloatx80Frac( floatx80 a ) > { > @@ -579,11 +620,12 @@ INLINE uint64_t extractFloatx80Frac( floatx80 a ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the exponent bits of the extended double-precision floating-point > -| value `a'. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the exponent bits of the extended double-precision floating-point > +value `a'. > +------------------------------------------------------------------------------- > +*/ > INLINE int32 extractFloatx80Exp( floatx80 a ) > { > > @@ -591,11 +633,12 @@ INLINE int32 extractFloatx80Exp( floatx80 a ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the sign bit of the extended double-precision floating-point value > -| `a'. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the sign bit of the extended double-precision floating-point value > +`a'. 
> +------------------------------------------------------------------------------- > +*/ > INLINE flag extractFloatx80Sign( floatx80 a ) > { > > @@ -603,13 +646,14 @@ INLINE flag extractFloatx80Sign( floatx80 a ) > > } > > -/*---------------------------------------------------------------------------- > -| Normalizes the subnormal extended double-precision floating-point value > -| represented by the denormalized significand `aSig'. The normalized exponent > -| and significand are stored at the locations pointed to by `zExpPtr' and > -| `zSigPtr', respectively. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Normalizes the subnormal extended double-precision floating-point value > +represented by the denormalized significand `aSig'. The normalized exponent > +and significand are stored at the locations pointed to by `zExpPtr' and > +`zSigPtr', respectively. > +------------------------------------------------------------------------------- > +*/ > static void > normalizeFloatx80Subnormal( uint64_t aSig, int32 *zExpPtr, uint64_t *zSigPtr ) > { > @@ -621,10 +665,12 @@ static void > > } > > -/*---------------------------------------------------------------------------- > -| Packs the sign `zSign', exponent `zExp', and significand `zSig' into an > -| extended double-precision floating-point value, returning the result. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Packs the sign `zSign', exponent `zExp', and significand `zSig' into an > +extended double-precision floating-point value, returning the result. 
> +------------------------------------------------------------------------------- > +*/ > > INLINE floatx80 packFloatx80( flag zSign, int32 zExp, uint64_t zSig ) > { > @@ -636,30 +682,31 @@ INLINE floatx80 packFloatx80( flag zSign, int32 zExp, uint64_t zSig ) > > } > > -/*---------------------------------------------------------------------------- > -| Takes an abstract floating-point value having sign `zSign', exponent `zExp', > -| and extended significand formed by the concatenation of `zSig0' and `zSig1', > -| and returns the proper extended double-precision floating-point value > -| corresponding to the abstract input. Ordinarily, the abstract value is > -| rounded and packed into the extended double-precision format, with the > -| inexact exception raised if the abstract input cannot be represented > -| exactly. However, if the abstract value is too large, the overflow and > -| inexact exceptions are raised and an infinity or maximal finite value is > -| returned. If the abstract value is too small, the input value is rounded to > -| a subnormal number, and the underflow and inexact exceptions are raised if > -| the abstract input cannot be represented exactly as a subnormal extended > -| double-precision floating-point number. > -| If `roundingPrecision' is 32 or 64, the result is rounded to the same > -| number of bits as single or double precision, respectively. Otherwise, the > -| result is rounded to the full precision of the extended double-precision > -| format. > -| The input significand must be normalized or smaller. If the input > -| significand is not normalized, `zExp' must be 0; in that case, the result > -| returned is a subnormal number, and it must not require rounding. The > -| handling of underflow and overflow follows the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. 
> -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Takes an abstract floating-point value having sign `zSign', exponent `zExp', > +and extended significand formed by the concatenation of `zSig0' and `zSig1', > +and returns the proper extended double-precision floating-point value > +corresponding to the abstract input. Ordinarily, the abstract value is > +rounded and packed into the extended double-precision format, with the > +inexact exception raised if the abstract input cannot be represented > +exactly. However, if the abstract value is too large, the overflow and > +inexact exceptions are raised and an infinity or maximal finite value is > +returned. If the abstract value is too small, the input value is rounded to > +a subnormal number, and the underflow and inexact exceptions are raised if > +the abstract input cannot be represented exactly as a subnormal extended > +double-precision floating-point number. > + If `roundingPrecision' is 32 or 64, the result is rounded to the same > +number of bits as single or double precision, respectively. Otherwise, the > +result is rounded to the full precision of the extended double-precision > +format. > + The input significand must be normalized or smaller. If the input > +significand is not normalized, `zExp' must be 0; in that case, the result > +returned is a subnormal number, and it must not require rounding. The > +handling of underflow and overflow follows the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > static floatx80 > roundAndPackFloatx80( > int8 roundingPrecision, flag zSign, int32 zExp, uint64_t zSig0, uint64_t zSig1 > @@ -823,15 +870,16 @@ static floatx80 > > } > > -/*---------------------------------------------------------------------------- > -| Takes an abstract floating-point value having sign `zSign', exponent > -| `zExp', and significand formed by the concatenation of `zSig0' and `zSig1', > -| and returns the proper extended double-precision floating-point value > -| corresponding to the abstract input. This routine is just like > -| `roundAndPackFloatx80' except that the input significand does not have to be > -| normalized. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Takes an abstract floating-point value having sign `zSign', exponent > +`zExp', and significand formed by the concatenation of `zSig0' and `zSig1', > +and returns the proper extended double-precision floating-point value > +corresponding to the abstract input. This routine is just like > +`roundAndPackFloatx80' except that the input significand does not have to be > +normalized. > +------------------------------------------------------------------------------- > +*/ > static floatx80 > normalizeRoundAndPackFloatx80( > int8 roundingPrecision, flag zSign, int32 zExp, uint64_t zSig0, uint64_t zSig1 > @@ -852,10 +900,12 @@ static floatx80 > > } > > -/*---------------------------------------------------------------------------- > -| Returns the least-significant 64 fraction bits of the quadruple-precision > -| floating-point value `a'. 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the least-significant 64 fraction bits of the quadruple-precision > +floating-point value `a'. > +------------------------------------------------------------------------------- > +*/ > > INLINE uint64_t extractFloat128Frac1( float128 a ) > { > @@ -864,10 +914,12 @@ INLINE uint64_t extractFloat128Frac1( float128 a ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the most-significant 48 fraction bits of the quadruple-precision > -| floating-point value `a'. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the most-significant 48 fraction bits of the quadruple-precision > +floating-point value `a'. > +------------------------------------------------------------------------------- > +*/ > > INLINE uint64_t extractFloat128Frac0( float128 a ) > { > @@ -876,11 +928,12 @@ INLINE uint64_t extractFloat128Frac0( float128 a ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the exponent bits of the quadruple-precision floating-point value > -| `a'. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the exponent bits of the quadruple-precision floating-point value > +`a'. 
> +------------------------------------------------------------------------------- > +*/ > INLINE int32 extractFloat128Exp( float128 a ) > { > > @@ -888,10 +941,11 @@ INLINE int32 extractFloat128Exp( float128 a ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the sign bit of the quadruple-precision floating-point value `a'. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the sign bit of the quadruple-precision floating-point value `a'. > +------------------------------------------------------------------------------- > +*/ > INLINE flag extractFloat128Sign( float128 a ) > { > > @@ -899,16 +953,17 @@ INLINE flag extractFloat128Sign( float128 a ) > > } > > -/*---------------------------------------------------------------------------- > -| Normalizes the subnormal quadruple-precision floating-point value > -| represented by the denormalized significand formed by the concatenation of > -| `aSig0' and `aSig1'. The normalized exponent is stored at the location > -| pointed to by `zExpPtr'. The most significant 49 bits of the normalized > -| significand are stored at the location pointed to by `zSig0Ptr', and the > -| least significant 64 bits of the normalized significand are stored at the > -| location pointed to by `zSig1Ptr'. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Normalizes the subnormal quadruple-precision floating-point value > +represented by the denormalized significand formed by the concatenation of > +`aSig0' and `aSig1'. The normalized exponent is stored at the location > +pointed to by `zExpPtr'. 
The most significant 49 bits of the normalized > +significand are stored at the location pointed to by `zSig0Ptr', and the > +least significant 64 bits of the normalized significand are stored at the > +location pointed to by `zSig1Ptr'. > +------------------------------------------------------------------------------- > +*/ > static void > normalizeFloat128Subnormal( > uint64_t aSig0, > @@ -940,19 +995,20 @@ static void > > } > > -/*---------------------------------------------------------------------------- > -| Packs the sign `zSign', the exponent `zExp', and the significand formed > -| by the concatenation of `zSig0' and `zSig1' into a quadruple-precision > -| floating-point value, returning the result. After being shifted into the > -| proper positions, the three fields `zSign', `zExp', and `zSig0' are simply > -| added together to form the most significant 32 bits of the result. This > -| means that any integer portion of `zSig0' will be added into the exponent. > -| Since a properly normalized significand will have an integer portion equal > -| to 1, the `zExp' input should be 1 less than the desired result exponent > -| whenever `zSig0' and `zSig1' concatenated form a complete, normalized > -| significand. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Packs the sign `zSign', the exponent `zExp', and the significand formed > +by the concatenation of `zSig0' and `zSig1' into a quadruple-precision > +floating-point value, returning the result. After being shifted into the > +proper positions, the three fields `zSign', `zExp', and `zSig0' are simply > +added together to form the most significant 32 bits of the result. This > +means that any integer portion of `zSig0' will be added into the exponent. 
> +Since a properly normalized significand will have an integer portion equal > +to 1, the `zExp' input should be 1 less than the desired result exponent > +whenever `zSig0' and `zSig1' concatenated form a complete, normalized > +significand. > +------------------------------------------------------------------------------- > +*/ > INLINE float128 > packFloat128( flag zSign, int32 zExp, uint64_t zSig0, uint64_t zSig1 ) > { > @@ -964,27 +1020,28 @@ INLINE float128 > > } > > -/*---------------------------------------------------------------------------- > -| Takes an abstract floating-point value having sign `zSign', exponent `zExp', > -| and extended significand formed by the concatenation of `zSig0', `zSig1', > -| and `zSig2', and returns the proper quadruple-precision floating-point value > -| corresponding to the abstract input. Ordinarily, the abstract value is > -| simply rounded and packed into the quadruple-precision format, with the > -| inexact exception raised if the abstract input cannot be represented > -| exactly. However, if the abstract value is too large, the overflow and > -| inexact exceptions are raised and an infinity or maximal finite value is > -| returned. If the abstract value is too small, the input value is rounded to > -| a subnormal number, and the underflow and inexact exceptions are raised if > -| the abstract input cannot be represented exactly as a subnormal quadruple- > -| precision floating-point number. > -| The input significand must be normalized or smaller. If the input > -| significand is not normalized, `zExp' must be 0; in that case, the result > -| returned is a subnormal number, and it must not require rounding. In the > -| usual case that the input significand is normalized, `zExp' must be 1 less > -| than the ``true'' floating-point exponent. The handling of underflow and > -| overflow follows the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Takes an abstract floating-point value having sign `zSign', exponent `zExp', > +and extended significand formed by the concatenation of `zSig0', `zSig1', > +and `zSig2', and returns the proper quadruple-precision floating-point value > +corresponding to the abstract input. Ordinarily, the abstract value is > +simply rounded and packed into the quadruple-precision format, with the > +inexact exception raised if the abstract input cannot be represented > +exactly. However, if the abstract value is too large, the overflow and > +inexact exceptions are raised and an infinity or maximal finite value is > +returned. If the abstract value is too small, the input value is rounded to > +a subnormal number, and the underflow and inexact exceptions are raised if > +the abstract input cannot be represented exactly as a subnormal quadruple- > +precision floating-point number. > + The input significand must be normalized or smaller. If the input > +significand is not normalized, `zExp' must be 0; in that case, the result > +returned is a subnormal number, and it must not require rounding. In the > +usual case that the input significand is normalized, `zExp' must be 1 less > +than the ``true'' floating-point exponent. The handling of underflow and > +overflow follows the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > static float128 > roundAndPackFloat128( > flag zSign, int32 zExp, uint64_t zSig0, uint64_t zSig1, uint64_t zSig2 STATUS_PARAM) > @@ -1079,16 +1136,17 @@ static float128 > > } > > -/*---------------------------------------------------------------------------- > -| Takes an abstract floating-point value having sign `zSign', exponent `zExp', > -| and significand formed by the concatenation of `zSig0' and `zSig1', and > -| returns the proper quadruple-precision floating-point value corresponding > -| to the abstract input. This routine is just like `roundAndPackFloat128' > -| except that the input significand has fewer bits and does not have to be > -| normalized. In all cases, `zExp' must be 1 less than the ``true'' floating- > -| point exponent. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Takes an abstract floating-point value having sign `zSign', exponent `zExp', > +and significand formed by the concatenation of `zSig0' and `zSig1', and > +returns the proper quadruple-precision floating-point value corresponding > +to the abstract input. This routine is just like `roundAndPackFloat128' > +except that the input significand has fewer bits and does not have to be > +normalized. In all cases, `zExp' must be 1 less than the ``true'' floating- > +point exponent. > +------------------------------------------------------------------------------- > +*/ > static float128 > normalizeRoundAndPackFloat128( > flag zSign, int32 zExp, uint64_t zSig0, uint64_t zSig1 STATUS_PARAM) > @@ -1115,13 +1173,14 @@ static float128 > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the 32-bit two's complement integer `a' > -| to the single-precision floating-point format. 
The conversion is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > -float32 int32_to_float32( int32 a STATUS_PARAM ) > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the 32-bit two's complement integer `a' > +to the single-precision floating-point format. The conversion is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > +float32 int32_to_float32( int32 a STATUS_PARAM) > { > flag zSign; > > @@ -1132,13 +1191,14 @@ float32 int32_to_float32( int32 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the 32-bit two's complement integer `a' > -| to the double-precision floating-point format. The conversion is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > -float64 int32_to_float64( int32 a STATUS_PARAM ) > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the 32-bit two's complement integer `a' > +to the double-precision floating-point format. The conversion is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > +float64 int32_to_float64( int32 a STATUS_PARAM) > { > flag zSign; > uint32 absA; > @@ -1154,13 +1214,14 @@ float64 int32_to_float64( int32 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the 32-bit two's complement integer `a' > -| to the extended double-precision floating-point format. The conversion > -| is performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the 32-bit two's complement integer `a' > +to the extended double-precision floating-point format. The conversion > +is performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic. > +------------------------------------------------------------------------------- > +*/ > floatx80 int32_to_floatx80( int32 a STATUS_PARAM ) > { > flag zSign; > @@ -1177,12 +1238,13 @@ floatx80 int32_to_floatx80( int32 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the 32-bit two's complement integer `a' to > -| the quadruple-precision floating-point format. The conversion is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the 32-bit two's complement integer `a' to > +the quadruple-precision floating-point format. The conversion is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > float128 int32_to_float128( int32 a STATUS_PARAM ) > { > flag zSign; > @@ -1199,12 +1261,13 @@ float128 int32_to_float128( int32 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the 64-bit two's complement integer `a' > -| to the single-precision floating-point format. The conversion is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the 64-bit two's complement integer `a' > +to the single-precision floating-point format. The conversion is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float32 int64_to_float32( int64 a STATUS_PARAM ) > { > flag zSign; > @@ -1252,12 +1315,13 @@ float32 uint64_to_float32( uint64 a STATUS_PARAM ) > } > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the 64-bit two's complement integer `a' > -| to the double-precision floating-point format. The conversion is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the 64-bit two's complement integer `a' > +to the double-precision floating-point format. The conversion is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > float64 int64_to_float64( int64 a STATUS_PARAM ) > { > flag zSign; > @@ -1285,13 +1349,14 @@ float64 uint64_to_float64(uint64 a STATUS_PARAM) > return normalizeRoundAndPackFloat64(0, exp, a STATUS_VAR); > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the 64-bit two's complement integer `a' > -| to the extended double-precision floating-point format. The conversion > -| is performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the 64-bit two's complement integer `a' > +to the extended double-precision floating-point format. The conversion > +is performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic. > +------------------------------------------------------------------------------- > +*/ > floatx80 int64_to_floatx80( int64 a STATUS_PARAM ) > { > flag zSign; > @@ -1306,12 +1371,13 @@ floatx80 int64_to_floatx80( int64 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the 64-bit two's complement integer `a' to > -| the quadruple-precision floating-point format. The conversion is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the 64-bit two's complement integer `a' to > +the quadruple-precision floating-point format. 
The conversion is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float128 int64_to_float128( int64 a STATUS_PARAM ) > { > flag zSign; > @@ -1347,16 +1413,17 @@ float128 uint64_to_float128(uint64 a STATUS_PARAM) > return normalizeRoundAndPackFloat128(0, 0x406E, a, 0 STATUS_VAR); > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the single-precision floating-point value > -| `a' to the 32-bit two's complement integer format. The conversion is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic---which means in particular that the conversion is rounded > -| according to the current rounding mode. If `a' is a NaN, the largest > -| positive integer is returned. Otherwise, if the conversion overflows, the > -| largest integer with the same sign as `a' is returned. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the single-precision floating-point value > +`a' to the 32-bit two's complement integer format. The conversion is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic---which means in particular that the conversion is rounded > +according to the current rounding mode. If `a' is a NaN, the largest > +positive integer is returned. Otherwise, if the conversion overflows, the > +largest integer with the same sign as `a' is returned. 
> +------------------------------------------------------------------------------- > +*/ > int32 float32_to_int32( float32 a STATUS_PARAM ) > { > flag aSign; > @@ -1378,16 +1445,17 @@ int32 float32_to_int32( float32 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the single-precision floating-point value > -| `a' to the 32-bit two's complement integer format. The conversion is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic, except that the conversion is always rounded toward zero. > -| If `a' is a NaN, the largest positive integer is returned. Otherwise, if > -| the conversion overflows, the largest integer with the same sign as `a' is > -| returned. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the single-precision floating-point value > +`a' to the 32-bit two's complement integer format. The conversion is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic, except that the conversion is always rounded toward zero. > +If `a' is a NaN, the largest positive integer is returned. Otherwise, if > +the conversion overflows, the largest integer with the same sign as `a' is > +returned. > +------------------------------------------------------------------------------- > +*/ > int32 float32_to_int32_round_to_zero( float32 a STATUS_PARAM ) > { > flag aSign; > @@ -1421,15 +1489,17 @@ int32 float32_to_int32_round_to_zero( float32 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the single-precision floating-point value > -| `a' to the 16-bit two's complement integer format. 
The conversion is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic, except that the conversion is always rounded toward zero. > -| If `a' is a NaN, the largest positive integer is returned. Otherwise, if > -| the conversion overflows, the largest integer with the same sign as `a' is > -| returned. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the single-precision floating-point value > +`a' to the 16-bit two's complement integer format. The conversion is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic, except that the conversion is always rounded toward zero. > +If `a' is a NaN, the largest positive integer is returned. Otherwise, if > +the conversion overflows, the largest integer with the same sign as `a' is > +returned. > +------------------------------------------------------------------------------- > +*/ > > int_fast16_t float32_to_int16_round_to_zero(float32 a STATUS_PARAM) > { > @@ -1470,16 +1540,17 @@ int_fast16_t float32_to_int16_round_to_zero(float32 a STATUS_PARAM) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the single-precision floating-point value > -| `a' to the 64-bit two's complement integer format. The conversion is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic---which means in particular that the conversion is rounded > -| according to the current rounding mode. If `a' is a NaN, the largest > -| positive integer is returned. Otherwise, if the conversion overflows, the > -| largest integer with the same sign as `a' is returned. 
> -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the single-precision floating-point value > +`a' to the 64-bit two's complement integer format. The conversion is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic---which means in particular that the conversion is rounded > +according to the current rounding mode. If `a' is a NaN, the largest > +positive integer is returned. Otherwise, if the conversion overflows, the > +largest integer with the same sign as `a' is returned. > +------------------------------------------------------------------------------- > +*/ > int64 float32_to_int64( float32 a STATUS_PARAM ) > { > flag aSign; > @@ -1507,16 +1578,17 @@ int64 float32_to_int64( float32 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the single-precision floating-point value > -| `a' to the 64-bit two's complement integer format. The conversion is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic, except that the conversion is always rounded toward zero. If > -| `a' is a NaN, the largest positive integer is returned. Otherwise, if the > -| conversion overflows, the largest integer with the same sign as `a' is > -| returned. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the single-precision floating-point value > +`a' to the 64-bit two's complement integer format. The conversion is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic, except that the conversion is always rounded toward zero. 
If > +`a' is a NaN, the largest positive integer is returned. Otherwise, if the > +conversion overflows, the largest integer with the same sign as `a' is > +returned. > +------------------------------------------------------------------------------- > +*/ > int64 float32_to_int64_round_to_zero( float32 a STATUS_PARAM ) > { > flag aSign; > @@ -1554,13 +1626,14 @@ int64 float32_to_int64_round_to_zero( float32 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the single-precision floating-point value > -| `a' to the double-precision floating-point format. The conversion is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the single-precision floating-point value > +`a' to the double-precision floating-point format. The conversion is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float64 float32_to_float64( float32 a STATUS_PARAM ) > { > flag aSign; > @@ -1584,13 +1657,14 @@ float64 float32_to_float64( float32 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the single-precision floating-point value > -| `a' to the extended double-precision floating-point format. The conversion > -| is performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic. 
> -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the single-precision floating-point value > +`a' to the extended double-precision floating-point format. The conversion > +is performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic. > +------------------------------------------------------------------------------- > +*/ > floatx80 float32_to_floatx80( float32 a STATUS_PARAM ) > { > flag aSign; > @@ -1614,13 +1688,14 @@ floatx80 float32_to_floatx80( float32 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the single-precision floating-point value > -| `a' to the double-precision floating-point format. The conversion is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the single-precision floating-point value > +`a' to the double-precision floating-point format. The conversion is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float128 float32_to_float128( float32 a STATUS_PARAM ) > { > flag aSign; > @@ -1644,14 +1719,15 @@ float128 float32_to_float128( float32 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Rounds the single-precision floating-point value `a' to an integer, and > -| returns the result as a single-precision floating-point value. 
The > -| operation is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > -float32 float32_round_to_int( float32 a STATUS_PARAM) > +/* > +------------------------------------------------------------------------------- > +Rounds the single-precision floating-point value `a' to an integer, and > +returns the result as a single-precision floating-point value. The > +operation is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > +float32 float32_round_to_int( float32 a STATUS_PARAM ) > { > flag aSign; > int_fast16_t aExp; > @@ -1704,15 +1780,16 @@ float32 float32_round_to_int( float32 a STATUS_PARAM) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of adding the absolute values of the single-precision > -| floating-point values `a' and `b'. If `zSign' is 1, the sum is negated > -| before being returned. `zSign' is ignored if the result is a NaN. > -| The addition is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > -static float32 addFloat32Sigs( float32 a, float32 b, flag zSign STATUS_PARAM) > +/* > +------------------------------------------------------------------------------- > +Returns the result of adding the absolute values of the single-precision > +floating-point values `a' and `b'. If `zSign' is 1, the sum is negated > +before being returned. `zSign' is ignored if the result is a NaN. > +The addition is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > +static float32 addFloat32Sigs( float32 a, float32 b, flag zSign STATUS_PARAM ) > { > int_fast16_t aExp, bExp, zExp; > uint32_t aSig, bSig, zSig; > @@ -1783,15 +1860,16 @@ static float32 addFloat32Sigs( float32 a, float32 b, flag zSign STATUS_PARAM) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of subtracting the absolute values of the single- > -| precision floating-point values `a' and `b'. If `zSign' is 1, the > -| difference is negated before being returned. `zSign' is ignored if the > -| result is a NaN. The subtraction is performed according to the IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > -static float32 subFloat32Sigs( float32 a, float32 b, flag zSign STATUS_PARAM) > +/* > +------------------------------------------------------------------------------- > +Returns the result of subtracting the absolute values of the single- > +precision floating-point values `a' and `b'. If `zSign' is 1, the > +difference is negated before being returned. `zSign' is ignored if the > +result is a NaN. The subtraction is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > +static float32 subFloat32Sigs( float32 a, float32 b, flag zSign STATUS_PARAM ) > { > int_fast16_t aExp, bExp, zExp; > uint32_t aSig, bSig, zSig; > @@ -1858,12 +1936,13 @@ static float32 subFloat32Sigs( float32 a, float32 b, flag zSign STATUS_PARAM) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of adding the single-precision floating-point values `a' > -| and `b'. 
The operation is performed according to the IEC/IEEE Standard for > -| Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of adding the single-precision floating-point values `a' > +and `b'. The operation is performed according to the IEC/IEEE Standard for > +Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float32 float32_add( float32 a, float32 b STATUS_PARAM ) > { > flag aSign, bSign; > @@ -1881,12 +1960,13 @@ float32 float32_add( float32 a, float32 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of subtracting the single-precision floating-point values > -| `a' and `b'. The operation is performed according to the IEC/IEEE Standard > -| for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of subtracting the single-precision floating-point values > +`a' and `b'. The operation is performed according to the IEC/IEEE Standard > +for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float32 float32_sub( float32 a, float32 b STATUS_PARAM ) > { > flag aSign, bSign; > @@ -1904,12 +1984,13 @@ float32 float32_sub( float32 a, float32 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of multiplying the single-precision floating-point values > -| `a' and `b'. The operation is performed according to the IEC/IEEE Standard > -| for Binary Floating-Point Arithmetic. 
> -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of multiplying the single-precision floating-point values > +`a' and `b'. The operation is performed according to the IEC/IEEE Standard > +for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float32 float32_mul( float32 a, float32 b STATUS_PARAM ) > { > flag aSign, bSign, zSign; > @@ -1967,12 +2048,13 @@ float32 float32_mul( float32 a, float32 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of dividing the single-precision floating-point value `a' > -| by the corresponding value `b'. The operation is performed according to the > -| IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of dividing the single-precision floating-point value `a' > +by the corresponding value `b'. The operation is performed according to the > +IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float32 float32_div( float32 a, float32 b STATUS_PARAM ) > { > flag aSign, bSign, zSign; > @@ -2031,12 +2113,13 @@ float32 float32_div( float32 a, float32 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the remainder of the single-precision floating-point value `a' > -| with respect to the corresponding value `b'. The operation is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the remainder of the single-precision floating-point value `a' > +with respect to the corresponding value `b'. The operation is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float32 float32_rem( float32 a, float32 b STATUS_PARAM ) > { > flag aSign, zSign; > @@ -2132,16 +2215,18 @@ float32 float32_rem( float32 a, float32 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of multiplying the single-precision floating-point values > -| `a' and `b' then adding 'c', with no intermediate rounding step after the > -| multiplication. The operation is performed according to the IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic 754-2008. > -| The flags argument allows the caller to select negation of the > -| addend, the intermediate product, or the final result. (The difference > -| between this and having the caller do a separate negation is that negating > -| externally will flip the sign bit on NaNs.) > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the result of multiplying the single-precision floating-point values > +`a' and `b' then adding 'c', with no intermediate rounding step after the > +multiplication. The operation is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic 754-2008. > +The flags argument allows the caller to select negation of the > +addend, the intermediate product, or the final result. 
(The difference > +between this and having the caller do a separate negation is that negating > +externally will flip the sign bit on NaNs.) > +------------------------------------------------------------------------------- > +*/ > > float32 float32_muladd(float32 a, float32 b, float32 c, int flags STATUS_PARAM) > { > @@ -2339,12 +2424,13 @@ float32 float32_muladd(float32 a, float32 b, float32 c, int flags STATUS_PARAM) > } > > > -/*---------------------------------------------------------------------------- > -| Returns the square root of the single-precision floating-point value `a'. > -| The operation is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the square root of the single-precision floating-point value `a'. > +The operation is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float32 float32_sqrt( float32 a STATUS_PARAM ) > { > flag aSign; > @@ -2394,23 +2480,25 @@ float32 float32_sqrt( float32 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the binary exponential of the single-precision floating-point value > -| `a'. The operation is performed according to the IEC/IEEE Standard for > -| Binary Floating-Point Arithmetic. > -| > -| Uses the following identities: > -| > -| 1. ------------------------------------------------------------------------- > -| x x*ln(2) > -| 2 = e > -| > -| 2. ------------------------------------------------------------------------- > -| 2 3 4 5 n > -| x x x x x x x > -| e = 1 + --- + --- + --- + --- + --- + ... + --- + ... > -| 1! 2! 3! 4! 5! n! 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the binary exponential of the single-precision floating-point value > +`a'. The operation is performed according to the IEC/IEEE Standard for > +Binary Floating-Point Arithmetic. > + > +Uses the following identities: > + > +1. ------------------------------------------------------------------------- > + x x*ln(2) > + 2 = e > + > +2. ------------------------------------------------------------------------- > + 2 3 4 5 n > + x x x x x x x > + e = 1 + --- + --- + --- + --- + --- + ... + --- + ... > + 1! 2! 3! 4! 5! n! > +------------------------------------------------------------------------------- > +*/ > > static const float64 float32_exp2_coefficients[15] = > { > @@ -2474,11 +2562,13 @@ float32 float32_exp2( float32 a STATUS_PARAM ) > return float64_to_float32(r, status); > } > > -/*---------------------------------------------------------------------------- > -| Returns the binary log of the single-precision floating-point value `a'. > -| The operation is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the binary log of the single-precision floating-point value `a'. > +The operation is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > float32 float32_log2( float32 a STATUS_PARAM ) > { > flag aSign, zSign; > @@ -2522,12 +2612,14 @@ float32 float32_log2( float32 a STATUS_PARAM ) > return normalizeRoundAndPackFloat32( zSign, 0x85, zSig STATUS_VAR ); > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the single-precision floating-point value `a' is equal to > -| the corresponding value `b', and 0 otherwise. The invalid exception is > -| raised if either operand is a NaN. Otherwise, the comparison is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the single-precision floating-point value `a' is equal to > +the corresponding value `b', and 0 otherwise. The invalid exception is > +raised if either operand is a NaN. Otherwise, the comparison is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float32_eq( float32 a, float32 b STATUS_PARAM ) > { > @@ -2546,12 +2638,14 @@ int float32_eq( float32 a, float32 b STATUS_PARAM ) > return ( av == bv ) || ( (uint32_t) ( ( av | bv )<<1 ) == 0 ); > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the single-precision floating-point value `a' is less than > -| or equal to the corresponding value `b', and 0 otherwise. The invalid > -| exception is raised if either operand is a NaN. The comparison is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the single-precision floating-point value `a' is less than > +or equal to the corresponding value `b', and 0 otherwise. The invalid > +exception is raised if either operand is a NaN. The comparison is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float32_le( float32 a, float32 b STATUS_PARAM ) > { > @@ -2575,12 +2669,14 @@ int float32_le( float32 a, float32 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the single-precision floating-point value `a' is less than > -| the corresponding value `b', and 0 otherwise. The invalid exception is > -| raised if either operand is a NaN. The comparison is performed according > -| to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the single-precision floating-point value `a' is less than > +the corresponding value `b', and 0 otherwise. The invalid exception is > +raised if either operand is a NaN. The comparison is performed according > +to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float32_lt( float32 a, float32 b STATUS_PARAM ) > { > @@ -2604,12 +2700,14 @@ int float32_lt( float32 a, float32 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the single-precision floating-point values `a' and `b' cannot > -| be compared, and 0 otherwise. 
The invalid exception is raised if either > -| operand is a NaN. The comparison is performed according to the IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the single-precision floating-point values `a' and `b' cannot > +be compared, and 0 otherwise. The invalid exception is raised if either > +operand is a NaN. The comparison is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float32_unordered( float32 a, float32 b STATUS_PARAM ) > { > @@ -2625,12 +2723,14 @@ int float32_unordered( float32 a, float32 b STATUS_PARAM ) > return 0; > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the single-precision floating-point value `a' is equal to > -| the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an > -| exception. The comparison is performed according to the IEC/IEEE Standard > -| for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the single-precision floating-point value `a' is equal to > +the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an > +exception. The comparison is performed according to the IEC/IEEE Standard > +for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > > int float32_eq_quiet( float32 a, float32 b STATUS_PARAM ) > { > @@ -2649,12 +2749,14 @@ int float32_eq_quiet( float32 a, float32 b STATUS_PARAM ) > ( (uint32_t) ( ( float32_val(a) | float32_val(b) )<<1 ) == 0 ); > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the single-precision floating-point value `a' is less than or > -| equal to the corresponding value `b', and 0 otherwise. Quiet NaNs do not > -| cause an exception. Otherwise, the comparison is performed according to the > -| IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the single-precision floating-point value `a' is less than or > +equal to the corresponding value `b', and 0 otherwise. Quiet NaNs do not > +cause an exception. Otherwise, the comparison is performed according to the > +IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float32_le_quiet( float32 a, float32 b STATUS_PARAM ) > { > @@ -2680,12 +2782,14 @@ int float32_le_quiet( float32 a, float32 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the single-precision floating-point value `a' is less than > -| the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an > -| exception. Otherwise, the comparison is performed according to the IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic. 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the single-precision floating-point value `a' is less than > +the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an > +exception. Otherwise, the comparison is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float32_lt_quiet( float32 a, float32 b STATUS_PARAM ) > { > @@ -2711,12 +2815,14 @@ int float32_lt_quiet( float32 a, float32 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the single-precision floating-point values `a' and `b' cannot > -| be compared, and 0 otherwise. Quiet NaNs do not cause an exception. The > -| comparison is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the single-precision floating-point values `a' and `b' cannot > +be compared, and 0 otherwise. Quiet NaNs do not cause an exception. The > +comparison is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float32_unordered_quiet( float32 a, float32 b STATUS_PARAM ) > { > @@ -2734,16 +2840,17 @@ int float32_unordered_quiet( float32 a, float32 b STATUS_PARAM ) > return 0; > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the double-precision floating-point value > -| `a' to the 32-bit two's complement integer format. 
The conversion is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic---which means in particular that the conversion is rounded > -| according to the current rounding mode. If `a' is a NaN, the largest > -| positive integer is returned. Otherwise, if the conversion overflows, the > -| largest integer with the same sign as `a' is returned. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the double-precision floating-point value > +`a' to the 32-bit two's complement integer format. The conversion is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic---which means in particular that the conversion is rounded > +according to the current rounding mode. If `a' is a NaN, the largest > +positive integer is returned. Otherwise, if the conversion overflows, the > +largest integer with the same sign as `a' is returned. > +------------------------------------------------------------------------------- > +*/ > int32 float64_to_int32( float64 a STATUS_PARAM ) > { > flag aSign; > @@ -2762,16 +2869,17 @@ int32 float64_to_int32( float64 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the double-precision floating-point value > -| `a' to the 32-bit two's complement integer format. The conversion is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic, except that the conversion is always rounded toward zero. > -| If `a' is a NaN, the largest positive integer is returned. Otherwise, if > -| the conversion overflows, the largest integer with the same sign as `a' is > -| returned. 
> -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the double-precision floating-point value > +`a' to the 32-bit two's complement integer format. The conversion is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic, except that the conversion is always rounded toward zero. > +If `a' is a NaN, the largest positive integer is returned. Otherwise, if > +the conversion overflows, the largest integer with the same sign as `a' is > +returned. > +------------------------------------------------------------------------------- > +*/ > int32 float64_to_int32_round_to_zero( float64 a STATUS_PARAM ) > { > flag aSign; > @@ -2809,15 +2917,17 @@ int32 float64_to_int32_round_to_zero( float64 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the double-precision floating-point value > -| `a' to the 16-bit two's complement integer format. The conversion is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic, except that the conversion is always rounded toward zero. > -| If `a' is a NaN, the largest positive integer is returned. Otherwise, if > -| the conversion overflows, the largest integer with the same sign as `a' is > -| returned. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the double-precision floating-point value > +`a' to the 16-bit two's complement integer format. The conversion is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic, except that the conversion is always rounded toward zero. > +If `a' is a NaN, the largest positive integer is returned. 
Otherwise, if > +the conversion overflows, the largest integer with the same sign as `a' is > +returned. > +------------------------------------------------------------------------------- > +*/ > > int_fast16_t float64_to_int16_round_to_zero(float64 a STATUS_PARAM) > { > @@ -2860,16 +2970,17 @@ int_fast16_t float64_to_int16_round_to_zero(float64 a STATUS_PARAM) > return z; > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the double-precision floating-point value > -| `a' to the 64-bit two's complement integer format. The conversion is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic---which means in particular that the conversion is rounded > -| according to the current rounding mode. If `a' is a NaN, the largest > -| positive integer is returned. Otherwise, if the conversion overflows, the > -| largest integer with the same sign as `a' is returned. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the double-precision floating-point value > +`a' to the 64-bit two's complement integer format. The conversion is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic---which means in particular that the conversion is rounded > +according to the current rounding mode. If `a' is a NaN, the largest > +positive integer is returned. Otherwise, if the conversion overflows, the > +largest integer with the same sign as `a' is returned. 
> +------------------------------------------------------------------------------- > +*/ > int64 float64_to_int64( float64 a STATUS_PARAM ) > { > flag aSign; > @@ -2903,16 +3014,17 @@ int64 float64_to_int64( float64 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the double-precision floating-point value > -| `a' to the 64-bit two's complement integer format. The conversion is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic, except that the conversion is always rounded toward zero. > -| If `a' is a NaN, the largest positive integer is returned. Otherwise, if > -| the conversion overflows, the largest integer with the same sign as `a' is > -| returned. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the double-precision floating-point value > +`a' to the 64-bit two's complement integer format. The conversion is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic, except that the conversion is always rounded toward zero. > +If `a' is a NaN, the largest positive integer is returned. Otherwise, if > +the conversion overflows, the largest integer with the same sign as `a' is > +returned. > +------------------------------------------------------------------------------- > +*/ > int64 float64_to_int64_round_to_zero( float64 a STATUS_PARAM ) > { > flag aSign; > @@ -2956,13 +3068,14 @@ int64 float64_to_int64_round_to_zero( float64 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the double-precision floating-point value > -| `a' to the single-precision floating-point format. 
The conversion is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the double-precision floating-point value > +`a' to the single-precision floating-point format. The conversion is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float32 float64_to_float32( float64 a STATUS_PARAM ) > { > flag aSign; > @@ -2989,16 +3102,18 @@ float32 float64_to_float32( float64 a STATUS_PARAM ) > } > > > -/*---------------------------------------------------------------------------- > -| Packs the sign `zSign', exponent `zExp', and significand `zSig' into a > -| half-precision floating-point value, returning the result. After being > -| shifted into the proper positions, the three fields are simply added > -| together to form the result. This means that any integer portion of `zSig' > -| will be added into the exponent. Since a properly normalized significand > -| will have an integer portion equal to 1, the `zExp' input should be 1 less > -| than the desired result exponent whenever `zSig' is a complete, normalized > -| significand. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Packs the sign `zSign', exponent `zExp', and significand `zSig' into a > +half-precision floating-point value, returning the result. After being > +shifted into the proper positions, the three fields are simply added > +together to form the result. This means that any integer portion of `zSig' > +will be added into the exponent. 
Since a properly normalized significand > +will have an integer portion equal to 1, the `zExp' input should be 1 less > +than the desired result exponent whenever `zSig' is a complete, normalized > +significand. > +------------------------------------------------------------------------------- > +*/ > static float16 packFloat16(flag zSign, int_fast16_t zExp, uint16_t zSig) > { > return make_float16( > @@ -3132,13 +3247,14 @@ float16 float32_to_float16(float32 a, flag ieee STATUS_PARAM) > return packFloat16(aSign, aExp + 14, aSig >> 13); > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the double-precision floating-point value > -| `a' to the extended double-precision floating-point format. The conversion > -| is performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the double-precision floating-point value > +`a' to the extended double-precision floating-point format. The conversion > +is performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic. > +------------------------------------------------------------------------------- > +*/ > floatx80 float64_to_floatx80( float64 a STATUS_PARAM ) > { > flag aSign; > @@ -3163,13 +3279,14 @@ floatx80 float64_to_floatx80( float64 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the double-precision floating-point value > -| `a' to the quadruple-precision floating-point format. The conversion is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic. 
> -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the double-precision floating-point value > +`a' to the quadruple-precision floating-point format. The conversion is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float128 float64_to_float128( float64 a STATUS_PARAM ) > { > flag aSign; > @@ -3194,13 +3311,14 @@ float128 float64_to_float128( float64 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Rounds the double-precision floating-point value `a' to an integer, and > -| returns the result as a double-precision floating-point value. The > -| operation is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Rounds the double-precision floating-point value `a' to an integer, and > +returns the result as a double-precision floating-point value. The > +operation is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float64 float64_round_to_int( float64 a STATUS_PARAM ) > { > flag aSign; > @@ -3267,14 +3385,15 @@ float64 float64_trunc_to_int( float64 a STATUS_PARAM) > return res; > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of adding the absolute values of the double-precision > -| floating-point values `a' and `b'. If `zSign' is 1, the sum is negated > -| before being returned. 
`zSign' is ignored if the result is a NaN. > -| The addition is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of adding the absolute values of the double-precision > +floating-point values `a' and `b'. If `zSign' is 1, the sum is negated > +before being returned. `zSign' is ignored if the result is a NaN. > +The addition is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > static float64 addFloat64Sigs( float64 a, float64 b, flag zSign STATUS_PARAM ) > { > int_fast16_t aExp, bExp, zExp; > @@ -3346,14 +3465,15 @@ static float64 addFloat64Sigs( float64 a, float64 b, flag zSign STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of subtracting the absolute values of the double- > -| precision floating-point values `a' and `b'. If `zSign' is 1, the > -| difference is negated before being returned. `zSign' is ignored if the > -| result is a NaN. The subtraction is performed according to the IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of subtracting the absolute values of the double- > +precision floating-point values `a' and `b'. If `zSign' is 1, the > +difference is negated before being returned. `zSign' is ignored if the > +result is a NaN. The subtraction is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > static float64 subFloat64Sigs( float64 a, float64 b, flag zSign STATUS_PARAM ) > { > int_fast16_t aExp, bExp, zExp; > @@ -3421,12 +3541,13 @@ static float64 subFloat64Sigs( float64 a, float64 b, flag zSign STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of adding the double-precision floating-point values `a' > -| and `b'. The operation is performed according to the IEC/IEEE Standard for > -| Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of adding the double-precision floating-point values `a' > +and `b'. The operation is performed according to the IEC/IEEE Standard for > +Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float64 float64_add( float64 a, float64 b STATUS_PARAM ) > { > flag aSign, bSign; > @@ -3444,12 +3565,13 @@ float64 float64_add( float64 a, float64 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of subtracting the double-precision floating-point values > -| `a' and `b'. The operation is performed according to the IEC/IEEE Standard > -| for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of subtracting the double-precision floating-point values > +`a' and `b'. The operation is performed according to the IEC/IEEE Standard > +for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > float64 float64_sub( float64 a, float64 b STATUS_PARAM ) > { > flag aSign, bSign; > @@ -3467,12 +3589,13 @@ float64 float64_sub( float64 a, float64 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of multiplying the double-precision floating-point values > -| `a' and `b'. The operation is performed according to the IEC/IEEE Standard > -| for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of multiplying the double-precision floating-point values > +`a' and `b'. The operation is performed according to the IEC/IEEE Standard > +for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float64 float64_mul( float64 a, float64 b STATUS_PARAM ) > { > flag aSign, bSign, zSign; > @@ -3528,12 +3651,13 @@ float64 float64_mul( float64 a, float64 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of dividing the double-precision floating-point value `a' > -| by the corresponding value `b'. The operation is performed according to > -| the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of dividing the double-precision floating-point value `a' > +by the corresponding value `b'. The operation is performed according to > +the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > float64 float64_div( float64 a, float64 b STATUS_PARAM ) > { > flag aSign, bSign, zSign; > @@ -3600,12 +3724,13 @@ float64 float64_div( float64 a, float64 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the remainder of the double-precision floating-point value `a' > -| with respect to the corresponding value `b'. The operation is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the remainder of the double-precision floating-point value `a' > +with respect to the corresponding value `b'. The operation is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float64 float64_rem( float64 a, float64 b STATUS_PARAM ) > { > flag aSign, zSign; > @@ -3686,16 +3811,18 @@ float64 float64_rem( float64 a, float64 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of multiplying the double-precision floating-point values > -| `a' and `b' then adding 'c', with no intermediate rounding step after the > -| multiplication. The operation is performed according to the IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic 754-2008. > -| The flags argument allows the caller to select negation of the > -| addend, the intermediate product, or the final result. (The difference > -| between this and having the caller do a separate negation is that negating > -| externally will flip the sign bit on NaNs.) 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the result of multiplying the double-precision floating-point values > +`a' and `b' then adding 'c', with no intermediate rounding step after the > +multiplication. The operation is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic 754-2008. > +The flags argument allows the caller to select negation of the > +addend, the intermediate product, or the final result. (The difference > +between this and having the caller do a separate negation is that negating > +externally will flip the sign bit on NaNs.) > +------------------------------------------------------------------------------- > +*/ > > float64 float64_muladd(float64 a, float64 b, float64 c, int flags STATUS_PARAM) > { > @@ -3912,12 +4039,13 @@ float64 float64_muladd(float64 a, float64 b, float64 c, int flags STATUS_PARAM) > } > } > > -/*---------------------------------------------------------------------------- > -| Returns the square root of the double-precision floating-point value `a'. > -| The operation is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the square root of the double-precision floating-point value `a'. > +The operation is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > float64 float64_sqrt( float64 a STATUS_PARAM ) > { > flag aSign; > @@ -3964,11 +4092,13 @@ float64 float64_sqrt( float64 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the binary log of the double-precision floating-point value `a'. > -| The operation is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the binary log of the double-precision floating-point value `a'. > +The operation is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float64 float64_log2( float64 a STATUS_PARAM ) > { > flag aSign, zSign; > @@ -4011,12 +4141,14 @@ float64 float64_log2( float64 a STATUS_PARAM ) > return normalizeRoundAndPackFloat64( zSign, 0x408, zSig STATUS_VAR ); > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the double-precision floating-point value `a' is equal to the > -| corresponding value `b', and 0 otherwise. The invalid exception is raised > -| if either operand is a NaN. Otherwise, the comparison is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the double-precision floating-point value `a' is equal to the > +corresponding value `b', and 0 otherwise. The invalid exception is raised > +if either operand is a NaN. 
Otherwise, the comparison is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float64_eq( float64 a, float64 b STATUS_PARAM ) > { > @@ -4036,12 +4168,14 @@ int float64_eq( float64 a, float64 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the double-precision floating-point value `a' is less than or > -| equal to the corresponding value `b', and 0 otherwise. The invalid > -| exception is raised if either operand is a NaN. The comparison is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the double-precision floating-point value `a' is less than or > +equal to the corresponding value `b', and 0 otherwise. The invalid > +exception is raised if either operand is a NaN. The comparison is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float64_le( float64 a, float64 b STATUS_PARAM ) > { > @@ -4065,12 +4199,14 @@ int float64_le( float64 a, float64 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the double-precision floating-point value `a' is less than > -| the corresponding value `b', and 0 otherwise. The invalid exception is > -| raised if either operand is a NaN. The comparison is performed according > -| to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the double-precision floating-point value `a' is less than > +the corresponding value `b', and 0 otherwise. The invalid exception is > +raised if either operand is a NaN. The comparison is performed according > +to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float64_lt( float64 a, float64 b STATUS_PARAM ) > { > @@ -4094,12 +4230,14 @@ int float64_lt( float64 a, float64 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the double-precision floating-point values `a' and `b' cannot > -| be compared, and 0 otherwise. The invalid exception is raised if either > -| operand is a NaN. The comparison is performed according to the IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the double-precision floating-point values `a' and `b' cannot > +be compared, and 0 otherwise. The invalid exception is raised if either > +operand is a NaN. The comparison is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float64_unordered( float64 a, float64 b STATUS_PARAM ) > { > @@ -4115,12 +4253,14 @@ int float64_unordered( float64 a, float64 b STATUS_PARAM ) > return 0; > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the double-precision floating-point value `a' is equal to the > -| corresponding value `b', and 0 otherwise. 
Quiet NaNs do not cause an > -| exception.The comparison is performed according to the IEC/IEEE Standard > -| for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the double-precision floating-point value `a' is equal to the > +corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an > +exception.The comparison is performed according to the IEC/IEEE Standard > +for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float64_eq_quiet( float64 a, float64 b STATUS_PARAM ) > { > @@ -4142,12 +4282,14 @@ int float64_eq_quiet( float64 a, float64 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the double-precision floating-point value `a' is less than or > -| equal to the corresponding value `b', and 0 otherwise. Quiet NaNs do not > -| cause an exception. Otherwise, the comparison is performed according to the > -| IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the double-precision floating-point value `a' is less than or > +equal to the corresponding value `b', and 0 otherwise. Quiet NaNs do not > +cause an exception. Otherwise, the comparison is performed according to the > +IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > > int float64_le_quiet( float64 a, float64 b STATUS_PARAM ) > { > @@ -4173,12 +4315,14 @@ int float64_le_quiet( float64 a, float64 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the double-precision floating-point value `a' is less than > -| the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an > -| exception. Otherwise, the comparison is performed according to the IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the double-precision floating-point value `a' is less than > +the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an > +exception. Otherwise, the comparison is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float64_lt_quiet( float64 a, float64 b STATUS_PARAM ) > { > @@ -4204,12 +4348,14 @@ int float64_lt_quiet( float64 a, float64 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the double-precision floating-point values `a' and `b' cannot > -| be compared, and 0 otherwise. Quiet NaNs do not cause an exception. The > -| comparison is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the double-precision floating-point values `a' and `b' cannot > +be compared, and 0 otherwise. Quiet NaNs do not cause an exception. 
The > +comparison is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float64_unordered_quiet( float64 a, float64 b STATUS_PARAM ) > { > @@ -4227,16 +4373,17 @@ int float64_unordered_quiet( float64 a, float64 b STATUS_PARAM ) > return 0; > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the extended double-precision floating- > -| point value `a' to the 32-bit two's complement integer format. The > -| conversion is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic---which means in particular that the conversion > -| is rounded according to the current rounding mode. If `a' is a NaN, the > -| largest positive integer is returned. Otherwise, if the conversion > -| overflows, the largest integer with the same sign as `a' is returned. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the extended double-precision floating- > +point value `a' to the 32-bit two's complement integer format. The > +conversion is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic---which means in particular that the conversion > +is rounded according to the current rounding mode. If `a' is a NaN, the > +largest positive integer is returned. Otherwise, if the conversion > +overflows, the largest integer with the same sign as `a' is returned. 
> +------------------------------------------------------------------------------- > +*/ > int32 floatx80_to_int32( floatx80 a STATUS_PARAM ) > { > flag aSign; > @@ -4254,16 +4401,17 @@ int32 floatx80_to_int32( floatx80 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the extended double-precision floating- > -| point value `a' to the 32-bit two's complement integer format. The > -| conversion is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic, except that the conversion is always rounded > -| toward zero. If `a' is a NaN, the largest positive integer is returned. > -| Otherwise, if the conversion overflows, the largest integer with the same > -| sign as `a' is returned. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the extended double-precision floating- > +point value `a' to the 32-bit two's complement integer format. The > +conversion is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic, except that the conversion is always rounded > +toward zero. If `a' is a NaN, the largest positive integer is returned. > +Otherwise, if the conversion overflows, the largest integer with the same > +sign as `a' is returned. > +------------------------------------------------------------------------------- > +*/ > int32 floatx80_to_int32_round_to_zero( floatx80 a STATUS_PARAM ) > { > flag aSign; > @@ -4299,16 +4447,17 @@ int32 floatx80_to_int32_round_to_zero( floatx80 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the extended double-precision floating- > -| point value `a' to the 64-bit two's complement integer format. 
The > -| conversion is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic---which means in particular that the conversion > -| is rounded according to the current rounding mode. If `a' is a NaN, > -| the largest positive integer is returned. Otherwise, if the conversion > -| overflows, the largest integer with the same sign as `a' is returned. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the extended double-precision floating- > +point value `a' to the 64-bit two's complement integer format. The > +conversion is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic---which means in particular that the conversion > +is rounded according to the current rounding mode. If `a' is a NaN, > +the largest positive integer is returned. Otherwise, if the conversion > +overflows, the largest integer with the same sign as `a' is returned. > +------------------------------------------------------------------------------- > +*/ > int64 floatx80_to_int64( floatx80 a STATUS_PARAM ) > { > flag aSign; > @@ -4339,16 +4488,17 @@ int64 floatx80_to_int64( floatx80 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the extended double-precision floating- > -| point value `a' to the 64-bit two's complement integer format. The > -| conversion is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic, except that the conversion is always rounded > -| toward zero. If `a' is a NaN, the largest positive integer is returned. > -| Otherwise, if the conversion overflows, the largest integer with the same > -| sign as `a' is returned. 
> -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the extended double-precision floating- > +point value `a' to the 64-bit two's complement integer format. The > +conversion is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic, except that the conversion is always rounded > +toward zero. If `a' is a NaN, the largest positive integer is returned. > +Otherwise, if the conversion overflows, the largest integer with the same > +sign as `a' is returned. > +------------------------------------------------------------------------------- > +*/ > int64 floatx80_to_int64_round_to_zero( floatx80 a STATUS_PARAM ) > { > flag aSign; > @@ -4383,13 +4533,14 @@ int64 floatx80_to_int64_round_to_zero( floatx80 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the extended double-precision floating- > -| point value `a' to the single-precision floating-point format. The > -| conversion is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the extended double-precision floating- > +point value `a' to the single-precision floating-point format. The > +conversion is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > float32 floatx80_to_float32( floatx80 a STATUS_PARAM ) > { > flag aSign; > @@ -4411,13 +4562,14 @@ float32 floatx80_to_float32( floatx80 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the extended double-precision floating- > -| point value `a' to the double-precision floating-point format. The > -| conversion is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the extended double-precision floating- > +point value `a' to the double-precision floating-point format. The > +conversion is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float64 floatx80_to_float64( floatx80 a STATUS_PARAM ) > { > flag aSign; > @@ -4439,13 +4591,14 @@ float64 floatx80_to_float64( floatx80 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the extended double-precision floating- > -| point value `a' to the quadruple-precision floating-point format. The > -| conversion is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the extended double-precision floating- > +point value `a' to the quadruple-precision floating-point format. 
The > +conversion is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float128 floatx80_to_float128( floatx80 a STATUS_PARAM ) > { > flag aSign; > @@ -4463,13 +4616,14 @@ float128 floatx80_to_float128( floatx80 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Rounds the extended double-precision floating-point value `a' to an integer, > -| and returns the result as an extended quadruple-precision floating-point > -| value. The operation is performed according to the IEC/IEEE Standard for > -| Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Rounds the extended double-precision floating-point value `a' to an integer, > +and returns the result as an extended quadruple-precision floating-point > +value. The operation is performed according to the IEC/IEEE Standard for > +Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > floatx80 floatx80_round_to_int( floatx80 a STATUS_PARAM ) > { > flag aSign; > @@ -4536,14 +4690,15 @@ floatx80 floatx80_round_to_int( floatx80 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of adding the absolute values of the extended double- > -| precision floating-point values `a' and `b'. If `zSign' is 1, the sum is > -| negated before being returned. `zSign' is ignored if the result is a NaN. > -| The addition is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. 
> -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of adding the absolute values of the extended double- > +precision floating-point values `a' and `b'. If `zSign' is 1, the sum is > +negated before being returned. `zSign' is ignored if the result is a NaN. > +The addition is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > static floatx80 addFloatx80Sigs( floatx80 a, floatx80 b, flag zSign STATUS_PARAM) > { > int32 aExp, bExp, zExp; > @@ -4602,14 +4757,15 @@ static floatx80 addFloatx80Sigs( floatx80 a, floatx80 b, flag zSign STATUS_PARAM > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of subtracting the absolute values of the extended > -| double-precision floating-point values `a' and `b'. If `zSign' is 1, the > -| difference is negated before being returned. `zSign' is ignored if the > -| result is a NaN. The subtraction is performed according to the IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of subtracting the absolute values of the extended > +double-precision floating-point values `a' and `b'. If `zSign' is 1, the > +difference is negated before being returned. `zSign' is ignored if the > +result is a NaN. The subtraction is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > static floatx80 subFloatx80Sigs( floatx80 a, floatx80 b, flag zSign STATUS_PARAM ) > { > int32 aExp, bExp, zExp; > @@ -4670,12 +4826,13 @@ static floatx80 subFloatx80Sigs( floatx80 a, floatx80 b, flag zSign STATUS_PARAM > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of adding the extended double-precision floating-point > -| values `a' and `b'. The operation is performed according to the IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of adding the extended double-precision floating-point > +values `a' and `b'. The operation is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > floatx80 floatx80_add( floatx80 a, floatx80 b STATUS_PARAM ) > { > flag aSign, bSign; > @@ -4691,12 +4848,13 @@ floatx80 floatx80_add( floatx80 a, floatx80 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of subtracting the extended double-precision floating- > -| point values `a' and `b'. The operation is performed according to the > -| IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of subtracting the extended double-precision floating- > +point values `a' and `b'. The operation is performed according to the > +IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > floatx80 floatx80_sub( floatx80 a, floatx80 b STATUS_PARAM ) > { > flag aSign, bSign; > @@ -4712,12 +4870,13 @@ floatx80 floatx80_sub( floatx80 a, floatx80 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of multiplying the extended double-precision floating- > -| point values `a' and `b'. The operation is performed according to the > -| IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of multiplying the extended double-precision floating- > +point values `a' and `b'. The operation is performed according to the > +IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > floatx80 floatx80_mul( floatx80 a, floatx80 b STATUS_PARAM ) > { > flag aSign, bSign, zSign; > @@ -4771,12 +4930,13 @@ floatx80 floatx80_mul( floatx80 a, floatx80 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of dividing the extended double-precision floating-point > -| value `a' by the corresponding value `b'. The operation is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of dividing the extended double-precision floating-point > +value `a' by the corresponding value `b'. The operation is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > floatx80 floatx80_div( floatx80 a, floatx80 b STATUS_PARAM ) > { > flag aSign, bSign, zSign; > @@ -4851,12 +5011,13 @@ floatx80 floatx80_div( floatx80 a, floatx80 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the remainder of the extended double-precision floating-point value > -| `a' with respect to the corresponding value `b'. The operation is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the remainder of the extended double-precision floating-point value > +`a' with respect to the corresponding value `b'. The operation is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > floatx80 floatx80_rem( floatx80 a, floatx80 b STATUS_PARAM ) > { > flag aSign, zSign; > @@ -4947,12 +5108,13 @@ floatx80 floatx80_rem( floatx80 a, floatx80 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the square root of the extended double-precision floating-point > -| value `a'. The operation is performed according to the IEC/IEEE Standard > -| for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the square root of the extended double-precision floating-point > +value `a'. The operation is performed according to the IEC/IEEE Standard > +for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > floatx80 floatx80_sqrt( floatx80 a STATUS_PARAM ) > { > flag aSign; > @@ -5017,12 +5179,14 @@ floatx80 floatx80_sqrt( floatx80 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the extended double-precision floating-point value `a' is equal > -| to the corresponding value `b', and 0 otherwise. The invalid exception is > -| raised if either operand is a NaN. Otherwise, the comparison is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the extended double-precision floating-point value `a' is equal > +to the corresponding value `b', and 0 otherwise. The invalid exception is > +raised if either operand is a NaN. Otherwise, the comparison is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int floatx80_eq( floatx80 a, floatx80 b STATUS_PARAM ) > { > @@ -5044,13 +5208,15 @@ int floatx80_eq( floatx80 a, floatx80 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the extended double-precision floating-point value `a' is > -| less than or equal to the corresponding value `b', and 0 otherwise. The > -| invalid exception is raised if either operand is a NaN. The comparison is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic. 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the extended double-precision floating-point value `a' is > +less than or equal to the corresponding value `b', and 0 otherwise. The > +invalid exception is raised if either operand is a NaN. The comparison is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int floatx80_le( floatx80 a, floatx80 b STATUS_PARAM ) > { > @@ -5078,12 +5244,14 @@ int floatx80_le( floatx80 a, floatx80 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the extended double-precision floating-point value `a' is > -| less than the corresponding value `b', and 0 otherwise. The invalid > -| exception is raised if either operand is a NaN. The comparison is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the extended double-precision floating-point value `a' is > +less than the corresponding value `b', and 0 otherwise. The invalid > +exception is raised if either operand is a NaN. The comparison is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > > int floatx80_lt( floatx80 a, floatx80 b STATUS_PARAM ) > { > @@ -5111,12 +5279,14 @@ int floatx80_lt( floatx80 a, floatx80 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the extended double-precision floating-point values `a' and `b' > -| cannot be compared, and 0 otherwise. The invalid exception is raised if > -| either operand is a NaN. The comparison is performed according to the > -| IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the extended double-precision floating-point values `a' and `b' > +cannot be compared, and 0 otherwise. The invalid exception is raised if > +either operand is a NaN. The comparison is performed according to the > +IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > int floatx80_unordered( floatx80 a, floatx80 b STATUS_PARAM ) > { > if ( ( ( extractFloatx80Exp( a ) == 0x7FFF ) > @@ -5130,12 +5300,14 @@ int floatx80_unordered( floatx80 a, floatx80 b STATUS_PARAM ) > return 0; > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the extended double-precision floating-point value `a' is > -| equal to the corresponding value `b', and 0 otherwise. Quiet NaNs do not > -| cause an exception. The comparison is performed according to the IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic. 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the extended double-precision floating-point value `a' is > +equal to the corresponding value `b', and 0 otherwise. Quiet NaNs do not > +cause an exception. The comparison is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int floatx80_eq_quiet( floatx80 a, floatx80 b STATUS_PARAM ) > { > @@ -5160,12 +5332,14 @@ int floatx80_eq_quiet( floatx80 a, floatx80 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the extended double-precision floating-point value `a' is less > -| than or equal to the corresponding value `b', and 0 otherwise. Quiet NaNs > -| do not cause an exception. Otherwise, the comparison is performed according > -| to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the extended double-precision floating-point value `a' is less > +than or equal to the corresponding value `b', and 0 otherwise. Quiet NaNs > +do not cause an exception. Otherwise, the comparison is performed according > +to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > > int floatx80_le_quiet( floatx80 a, floatx80 b STATUS_PARAM ) > { > @@ -5196,12 +5370,14 @@ int floatx80_le_quiet( floatx80 a, floatx80 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the extended double-precision floating-point value `a' is less > -| than the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause > -| an exception. Otherwise, the comparison is performed according to the > -| IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the extended double-precision floating-point value `a' is less > +than the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause > +an exception. Otherwise, the comparison is performed according to the > +IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int floatx80_lt_quiet( floatx80 a, floatx80 b STATUS_PARAM ) > { > @@ -5232,12 +5408,14 @@ int floatx80_lt_quiet( floatx80 a, floatx80 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the extended double-precision floating-point values `a' and `b' > -| cannot be compared, and 0 otherwise. Quiet NaNs do not cause an exception. > -| The comparison is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the extended double-precision floating-point values `a' and `b' > +cannot be compared, and 0 otherwise. 
Quiet NaNs do not cause an exception. > +The comparison is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > int floatx80_unordered_quiet( floatx80 a, floatx80 b STATUS_PARAM ) > { > if ( ( ( extractFloatx80Exp( a ) == 0x7FFF ) > @@ -5254,16 +5432,17 @@ int floatx80_unordered_quiet( floatx80 a, floatx80 b STATUS_PARAM ) > return 0; > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the quadruple-precision floating-point > -| value `a' to the 32-bit two's complement integer format. The conversion > -| is performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic---which means in particular that the conversion is rounded > -| according to the current rounding mode. If `a' is a NaN, the largest > -| positive integer is returned. Otherwise, if the conversion overflows, the > -| largest integer with the same sign as `a' is returned. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the quadruple-precision floating-point > +value `a' to the 32-bit two's complement integer format. The conversion > +is performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic---which means in particular that the conversion is rounded > +according to the current rounding mode. If `a' is a NaN, the largest > +positive integer is returned. Otherwise, if the conversion overflows, the > +largest integer with the same sign as `a' is returned. 
> +------------------------------------------------------------------------------- > +*/ > int32 float128_to_int32( float128 a STATUS_PARAM ) > { > flag aSign; > @@ -5283,16 +5462,17 @@ int32 float128_to_int32( float128 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the quadruple-precision floating-point > -| value `a' to the 32-bit two's complement integer format. The conversion > -| is performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic, except that the conversion is always rounded toward zero. If > -| `a' is a NaN, the largest positive integer is returned. Otherwise, if the > -| conversion overflows, the largest integer with the same sign as `a' is > -| returned. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the quadruple-precision floating-point > +value `a' to the 32-bit two's complement integer format. The conversion > +is performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic, except that the conversion is always rounded toward zero. If > +`a' is a NaN, the largest positive integer is returned. Otherwise, if the > +conversion overflows, the largest integer with the same sign as `a' is > +returned. > +------------------------------------------------------------------------------- > +*/ > int32 float128_to_int32_round_to_zero( float128 a STATUS_PARAM ) > { > flag aSign; > @@ -5331,16 +5511,17 @@ int32 float128_to_int32_round_to_zero( float128 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the quadruple-precision floating-point > -| value `a' to the 64-bit two's complement integer format. 
The conversion > -| is performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic---which means in particular that the conversion is rounded > -| according to the current rounding mode. If `a' is a NaN, the largest > -| positive integer is returned. Otherwise, if the conversion overflows, the > -| largest integer with the same sign as `a' is returned. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the quadruple-precision floating-point > +value `a' to the 64-bit two's complement integer format. The conversion > +is performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic---which means in particular that the conversion is rounded > +according to the current rounding mode. If `a' is a NaN, the largest > +positive integer is returned. Otherwise, if the conversion overflows, the > +largest integer with the same sign as `a' is returned. > +------------------------------------------------------------------------------- > +*/ > int64 float128_to_int64( float128 a STATUS_PARAM ) > { > flag aSign; > @@ -5374,16 +5555,17 @@ int64 float128_to_int64( float128 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the quadruple-precision floating-point > -| value `a' to the 64-bit two's complement integer format. The conversion > -| is performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic, except that the conversion is always rounded toward zero. > -| If `a' is a NaN, the largest positive integer is returned. Otherwise, if > -| the conversion overflows, the largest integer with the same sign as `a' is > -| returned. 
> -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the quadruple-precision floating-point > +value `a' to the 64-bit two's complement integer format. The conversion > +is performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic, except that the conversion is always rounded toward zero. > +If `a' is a NaN, the largest positive integer is returned. Otherwise, if > +the conversion overflows, the largest integer with the same sign as `a' is > +returned. > +------------------------------------------------------------------------------- > +*/ > int64 float128_to_int64_round_to_zero( float128 a STATUS_PARAM ) > { > flag aSign; > @@ -5435,13 +5617,14 @@ int64 float128_to_int64_round_to_zero( float128 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the quadruple-precision floating-point > -| value `a' to the single-precision floating-point format. The conversion > -| is performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the quadruple-precision floating-point > +value `a' to the single-precision floating-point format. The conversion > +is performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > float32 float128_to_float32( float128 a STATUS_PARAM ) > { > flag aSign; > @@ -5470,13 +5653,14 @@ float32 float128_to_float32( float128 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the quadruple-precision floating-point > -| value `a' to the double-precision floating-point format. The conversion > -| is performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the quadruple-precision floating-point > +value `a' to the double-precision floating-point format. The conversion > +is performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float64 float128_to_float64( float128 a STATUS_PARAM ) > { > flag aSign; > @@ -5503,13 +5687,14 @@ float64 float128_to_float64( float128 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the quadruple-precision floating-point > -| value `a' to the extended double-precision floating-point format. The > -| conversion is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the quadruple-precision floating-point > +value `a' to the extended double-precision floating-point format. 
The > +conversion is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > floatx80 float128_to_floatx80( float128 a STATUS_PARAM ) > { > flag aSign; > @@ -5538,13 +5723,14 @@ floatx80 float128_to_floatx80( float128 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Rounds the quadruple-precision floating-point value `a' to an integer, and > -| returns the result as a quadruple-precision floating-point value. The > -| operation is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Rounds the quadruple-precision floating-point value `a' to an integer, and > +returns the result as a quadruple-precision floating-point value. The > +operation is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float128 float128_round_to_int( float128 a STATUS_PARAM ) > { > flag aSign; > @@ -5641,14 +5827,15 @@ float128 float128_round_to_int( float128 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of adding the absolute values of the quadruple-precision > -| floating-point values `a' and `b'. If `zSign' is 1, the sum is negated > -| before being returned. `zSign' is ignored if the result is a NaN. > -| The addition is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. 
> -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of adding the absolute values of the quadruple-precision > +floating-point values `a' and `b'. If `zSign' is 1, the sum is negated > +before being returned. `zSign' is ignored if the result is a NaN. > +The addition is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > static float128 addFloat128Sigs( float128 a, float128 b, flag zSign STATUS_PARAM) > { > int32 aExp, bExp, zExp; > @@ -5727,14 +5914,15 @@ static float128 addFloat128Sigs( float128 a, float128 b, flag zSign STATUS_PARAM > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of subtracting the absolute values of the quadruple- > -| precision floating-point values `a' and `b'. If `zSign' is 1, the > -| difference is negated before being returned. `zSign' is ignored if the > -| result is a NaN. The subtraction is performed according to the IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of subtracting the absolute values of the quadruple- > +precision floating-point values `a' and `b'. If `zSign' is 1, the > +difference is negated before being returned. `zSign' is ignored if the > +result is a NaN. The subtraction is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > static float128 subFloat128Sigs( float128 a, float128 b, flag zSign STATUS_PARAM) > { > int32 aExp, bExp, zExp; > @@ -5811,12 +5999,13 @@ static float128 subFloat128Sigs( float128 a, float128 b, flag zSign STATUS_PARAM > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of adding the quadruple-precision floating-point values > -| `a' and `b'. The operation is performed according to the IEC/IEEE Standard > -| for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of adding the quadruple-precision floating-point values > +`a' and `b'. The operation is performed according to the IEC/IEEE Standard > +for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float128 float128_add( float128 a, float128 b STATUS_PARAM ) > { > flag aSign, bSign; > @@ -5832,12 +6021,13 @@ float128 float128_add( float128 a, float128 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of subtracting the quadruple-precision floating-point > -| values `a' and `b'. The operation is performed according to the IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of subtracting the quadruple-precision floating-point > +values `a' and `b'. The operation is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > float128 float128_sub( float128 a, float128 b STATUS_PARAM ) > { > flag aSign, bSign; > @@ -5853,12 +6043,13 @@ float128 float128_sub( float128 a, float128 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of multiplying the quadruple-precision floating-point > -| values `a' and `b'. The operation is performed according to the IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of multiplying the quadruple-precision floating-point > +values `a' and `b'. The operation is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float128 float128_mul( float128 a, float128 b STATUS_PARAM ) > { > flag aSign, bSign, zSign; > @@ -5917,12 +6108,13 @@ float128 float128_mul( float128 a, float128 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of dividing the quadruple-precision floating-point value > -| `a' by the corresponding value `b'. The operation is performed according to > -| the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of dividing the quadruple-precision floating-point value > +`a' by the corresponding value `b'. The operation is performed according to > +the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > float128 float128_div( float128 a, float128 b STATUS_PARAM ) > { > flag aSign, bSign, zSign; > @@ -6001,12 +6193,13 @@ float128 float128_div( float128 a, float128 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the remainder of the quadruple-precision floating-point value `a' > -| with respect to the corresponding value `b'. The operation is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the remainder of the quadruple-precision floating-point value `a' > +with respect to the corresponding value `b'. The operation is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float128 float128_rem( float128 a, float128 b STATUS_PARAM ) > { > flag aSign, zSign; > @@ -6110,12 +6303,13 @@ float128 float128_rem( float128 a, float128 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the square root of the quadruple-precision floating-point value `a'. > -| The operation is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the square root of the quadruple-precision floating-point value `a'. > +The operation is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > float128 float128_sqrt( float128 a STATUS_PARAM ) > { > flag aSign; > @@ -6179,12 +6373,14 @@ float128 float128_sqrt( float128 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the quadruple-precision floating-point value `a' is equal to > -| the corresponding value `b', and 0 otherwise. The invalid exception is > -| raised if either operand is a NaN. Otherwise, the comparison is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the quadruple-precision floating-point value `a' is equal to > +the corresponding value `b', and 0 otherwise. The invalid exception is > +raised if either operand is a NaN. Otherwise, the comparison is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float128_eq( float128 a, float128 b STATUS_PARAM ) > { > @@ -6206,12 +6402,14 @@ int float128_eq( float128 a, float128 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the quadruple-precision floating-point value `a' is less than > -| or equal to the corresponding value `b', and 0 otherwise. The invalid > -| exception is raised if either operand is a NaN. The comparison is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the quadruple-precision floating-point value `a' is less than > +or equal to the corresponding value `b', and 0 otherwise. The invalid > +exception is raised if either operand is a NaN. The comparison is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float128_le( float128 a, float128 b STATUS_PARAM ) > { > @@ -6239,12 +6437,14 @@ int float128_le( float128 a, float128 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the quadruple-precision floating-point value `a' is less than > -| the corresponding value `b', and 0 otherwise. The invalid exception is > -| raised if either operand is a NaN. The comparison is performed according > -| to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the quadruple-precision floating-point value `a' is less than > +the corresponding value `b', and 0 otherwise. The invalid exception is > +raised if either operand is a NaN. The comparison is performed according > +to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > > int float128_lt( float128 a, float128 b STATUS_PARAM ) > { > @@ -6272,12 +6472,14 @@ int float128_lt( float128 a, float128 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the quadruple-precision floating-point values `a' and `b' cannot > -| be compared, and 0 otherwise. The invalid exception is raised if either > -| operand is a NaN. The comparison is performed according to the IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the quadruple-precision floating-point values `a' and `b' cannot > +be compared, and 0 otherwise. The invalid exception is raised if either > +operand is a NaN. The comparison is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float128_unordered( float128 a, float128 b STATUS_PARAM ) > { > @@ -6292,12 +6494,14 @@ int float128_unordered( float128 a, float128 b STATUS_PARAM ) > return 0; > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the quadruple-precision floating-point value `a' is equal to > -| the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an > -| exception. The comparison is performed according to the IEC/IEEE Standard > -| for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the quadruple-precision floating-point value `a' is equal to > +the corresponding value `b', and 0 otherwise. 
Quiet NaNs do not cause an > +exception. The comparison is performed according to the IEC/IEEE Standard > +for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float128_eq_quiet( float128 a, float128 b STATUS_PARAM ) > { > @@ -6322,12 +6526,14 @@ int float128_eq_quiet( float128 a, float128 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the quadruple-precision floating-point value `a' is less than > -| or equal to the corresponding value `b', and 0 otherwise. Quiet NaNs do not > -| cause an exception. Otherwise, the comparison is performed according to the > -| IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the quadruple-precision floating-point value `a' is less than > +or equal to the corresponding value `b', and 0 otherwise. Quiet NaNs do not > +cause an exception. Otherwise, the comparison is performed according to the > +IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float128_le_quiet( float128 a, float128 b STATUS_PARAM ) > { > @@ -6358,12 +6564,14 @@ int float128_le_quiet( float128 a, float128 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the quadruple-precision floating-point value `a' is less than > -| the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an > -| exception. Otherwise, the comparison is performed according to the IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic. 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the quadruple-precision floating-point value `a' is less than > +the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an > +exception. Otherwise, the comparison is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float128_lt_quiet( float128 a, float128 b STATUS_PARAM ) > { > @@ -6394,12 +6602,14 @@ int float128_lt_quiet( float128 a, float128 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the quadruple-precision floating-point values `a' and `b' cannot > -| be compared, and 0 otherwise. Quiet NaNs do not cause an exception. The > -| comparison is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the quadruple-precision floating-point values `a' and `b' cannot > +be compared, and 0 otherwise. Quiet NaNs do not cause an exception. The > +comparison is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float128_unordered_quiet( float128 a, float128 b STATUS_PARAM ) > { > diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h > index f3927e2..b646621 100644 > --- a/include/fpu/softfloat.h > +++ b/include/fpu/softfloat.h > @@ -4,10 +4,11 @@ > * Derived from SoftFloat. 
> */ > > -/*============================================================================ > +/* > +============================================================================ > > -This C header file is part of the SoftFloat IEC/IEEE Floating-point Arithmetic > -Package, Release 2b. > +This C header file is part of the SoftFloat IEC/IEEE Floating-point > +Arithmetic Package, Release 2a. > > Written by John R. Hauser. This work was made possible in part by the > International Computer Science Institute, located at Suite 600, 1947 Center > @@ -16,24 +17,22 @@ National Science Foundation under grant MIP-9311980. The original version > of this code was written as part of a project to build a fixed-point vector > processor in collaboration with the University of California at Berkeley, > overseen by Profs. Nelson Morgan and John Wawrzynek. More information > -is available through the Web page `http://www.cs.berkeley.edu/~jhauser/ > +is available through the Web page `http://HTTP.CS.Berkeley.EDU/~jhauser/ > arithmetic/SoftFloat.html'. > > -THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort has > -been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT TIMES > -RESULT IN INCORRECT BEHAVIOR. USE OF THIS SOFTWARE IS RESTRICTED TO PERSONS > -AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ALL LOSSES, > -COSTS, OR OTHER PROBLEMS THEY INCUR DUE TO THE SOFTWARE, AND WHO FURTHERMORE > -EFFECTIVELY INDEMNIFY JOHN HAUSER AND THE INTERNATIONAL COMPUTER SCIENCE > -INSTITUTE (possibly via similar legal warning) AGAINST ALL LOSSES, COSTS, OR > -OTHER PROBLEMS INCURRED BY THEIR CUSTOMERS AND CLIENTS DUE TO THE SOFTWARE. > +THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort > +has been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT > +TIMES RESULT IN INCORRECT BEHAVIOR. 
USE OF THIS SOFTWARE IS RESTRICTED TO > +PERSONS AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ANY > +AND ALL LOSSES, COSTS, OR OTHER PROBLEMS ARISING FROM ITS USE. > > Derivative works are acceptable, even for commercial purposes, so long as > -(1) the source code for the derivative work includes prominent notice that > -the work is derivative, and (2) the source code includes prominent notice with > -these four paragraphs for those parts of this code that are retained. > +(1) they include prominent notice that the work is derivative, and (2) they > +include prominent notice akin to these four paragraphs for those parts of > +this code that are retained. > > -=============================================================================*/ > +=============================================================================== > +*/ > > #ifndef SOFTFLOAT_H > #define SOFTFLOAT_H > @@ -46,14 +45,16 @@ these four paragraphs for those parts of this code that are retained. > #include "config-host.h" > #include "qemu/osdep.h" > > -/*---------------------------------------------------------------------------- > -| Each of the following `typedef's defines the most convenient type that holds > -| integers of at least as many bits as specified. For example, `uint8' should > -| be the most convenient type that can hold unsigned integers of as many as > -| 8 bits. The `flag' type must be able to hold either a 0 or 1. For most > -| implementations of C, `flag', `uint8', and `int8' should all be `typedef'ed > -| to the same as `int'. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Each of the following `typedef's defines the most convenient type that holds > +integers of at least as many bits as specified. For example, `uint8' should > +be the most convenient type that can hold unsigned integers of as many as > +8 bits. 
The `flag' type must be able to hold either a 0 or 1. For most > +implementations of C, `flag', `uint8', and `int8' should all be `typedef'ed > +to the same as `int'. > +------------------------------------------------------------------------------- > +*/ > typedef uint8_t flag; > typedef uint8_t uint8; > typedef int8_t int8; > @@ -69,9 +70,11 @@ typedef int64_t int64; > #define STATUS(field) status->field > #define STATUS_VAR , status > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE floating-point ordering relations > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Software IEC/IEEE floating-point ordering relations > +------------------------------------------------------------------------------- > +*/ > enum { > float_relation_less = -1, > float_relation_equal = 0, > @@ -79,9 +82,11 @@ enum { > float_relation_unordered = 2 > }; > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE floating-point types. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Software IEC/IEEE floating-point types. > +------------------------------------------------------------------------------- > +*/ > /* Use structures for soft-float types. This prevents accidentally mixing > them with native int/float types. A sufficiently clever compiler and > sane ABI should be able to see though these structs. 
However > @@ -137,17 +142,21 @@ typedef struct { > #define make_float128(high_, low_) ((float128) { .high = high_, .low = low_ }) > #define make_float128_init(high_, low_) { .high = high_, .low = low_ } > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE floating-point underflow tininess-detection mode. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Software IEC/IEEE floating-point underflow tininess-detection mode. > +------------------------------------------------------------------------------- > +*/ > enum { > float_tininess_after_rounding = 0, > float_tininess_before_rounding = 1 > }; > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE floating-point rounding mode. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Software IEC/IEEE floating-point rounding mode. > +------------------------------------------------------------------------------- > +*/ > enum { > float_round_nearest_even = 0, > float_round_down = 1, > @@ -155,9 +164,11 @@ enum { > float_round_to_zero = 3 > }; > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE floating-point exception flags. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Software IEC/IEEE floating-point exception flags. 
> +------------------------------------------------------------------------------- > +*/ > enum { > float_flag_invalid = 1, > float_flag_divbyzero = 4, > @@ -167,7 +178,6 @@ enum { > float_flag_input_denormal = 64, > float_flag_output_denormal = 128 > }; > - > typedef struct float_status { > signed char float_detect_tininess; > signed char float_rounding_mode; > @@ -204,27 +214,33 @@ INLINE int get_float_exception_flags(float_status *status) > } > void set_floatx80_rounding_precision(int val STATUS_PARAM); > > -/*---------------------------------------------------------------------------- > -| Routine to raise any or all of the software IEC/IEEE floating-point > -| exception flags. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Routine to raise any or all of the software IEC/IEEE floating-point > +exception flags. > +------------------------------------------------------------------------------- > +*/ > void float_raise( int8 flags STATUS_PARAM); > > -/*---------------------------------------------------------------------------- > -| Options to indicate which negations to perform in float*_muladd() > -| Using these differs from negating an input or output before calling > -| the muladd function in that this means that a NaN doesn't have its > -| sign bit inverted before it is propagated. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Options to indicate which negations to perform in float*_muladd() > +Using these differs from negating an input or output before calling > +the muladd function in that this means that a NaN doesn't have its > +sign bit inverted before it is propagated. 
> +------------------------------------------------------------------------------- > +*/ > enum { > float_muladd_negate_c = 1, > float_muladd_negate_product = 2, > float_muladd_negate_result = 4, > }; > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE integer-to-floating-point conversion routines. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Software IEC/IEEE integer-to-floating-point conversion routines. > +------------------------------------------------------------------------------- > +*/ > float32 int32_to_float32( int32 STATUS_PARAM ); > float64 int32_to_float64( int32 STATUS_PARAM ); > float32 uint32_to_float32( uint32 STATUS_PARAM ); > @@ -239,15 +255,19 @@ floatx80 int64_to_floatx80( int64 STATUS_PARAM ); > float128 int64_to_float128( int64 STATUS_PARAM ); > float128 uint64_to_float128( uint64 STATUS_PARAM ); > > -/*---------------------------------------------------------------------------- > -| Software half-precision conversion routines. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Software half-precision conversion routines. > +*---------------------------------------------------------------------------- > +*/ > float16 float32_to_float16( float32, flag STATUS_PARAM ); > float32 float16_to_float32( float16, flag STATUS_PARAM ); > > -/*---------------------------------------------------------------------------- > -| Software half-precision operations. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Software half-precision operations. 
> +------------------------------------------------------------------------------- > +*/ > int float16_is_quiet_nan( float16 ); > int float16_is_signaling_nan( float16 ); > float16 float16_maybe_silence_nan( float16 ); > @@ -257,14 +277,18 @@ INLINE int float16_is_any_nan(float16 a) > return ((float16_val(a) & ~0x8000) > 0x7c00); > } > > -/*---------------------------------------------------------------------------- > -| The pattern for a default generated half-precision NaN. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +The pattern for a default generated half-precision NaN. > +------------------------------------------------------------------------------- > +*/ > extern const float16 float16_default_nan; > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE single-precision conversion routines. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Software IEC/IEEE single-precision conversion routines. > +------------------------------------------------------------------------------- > +*/ > int_fast16_t float32_to_int16_round_to_zero(float32 STATUS_PARAM); > uint_fast16_t float32_to_uint16_round_to_zero(float32 STATUS_PARAM); > int32 float32_to_int32( float32 STATUS_PARAM ); > @@ -277,9 +301,11 @@ float64 float32_to_float64( float32 STATUS_PARAM ); > floatx80 float32_to_floatx80( float32 STATUS_PARAM ); > float128 float32_to_float128( float32 STATUS_PARAM ); > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE single-precision operations. 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Software IEC/IEEE single-precision operations. > +------------------------------------------------------------------------------- > +*/ > float32 float32_round_to_int( float32 STATUS_PARAM ); > float32 float32_add( float32, float32 STATUS_PARAM ); > float32 float32_sub( float32, float32 STATUS_PARAM ); > @@ -361,14 +387,18 @@ INLINE float32 float32_set_sign(float32 a, int sign) > #define float32_infinity make_float32(0x7f800000) > > > -/*---------------------------------------------------------------------------- > -| The pattern for a default generated single-precision NaN. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +The pattern for a default generated single-precision NaN. > +------------------------------------------------------------------------------- > +*/ > extern const float32 float32_default_nan; > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE double-precision conversion routines. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Software IEC/IEEE double-precision conversion routines. 
> +------------------------------------------------------------------------------- > +*/ > int_fast16_t float64_to_int16_round_to_zero(float64 STATUS_PARAM); > uint_fast16_t float64_to_uint16_round_to_zero(float64 STATUS_PARAM); > int32 float64_to_int32( float64 STATUS_PARAM ); > @@ -383,9 +413,11 @@ float32 float64_to_float32( float64 STATUS_PARAM ); > floatx80 float64_to_floatx80( float64 STATUS_PARAM ); > float128 float64_to_float128( float64 STATUS_PARAM ); > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE double-precision operations. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Software IEC/IEEE double-precision operations. > +------------------------------------------------------------------------------- > +*/ > float64 float64_round_to_int( float64 STATUS_PARAM ); > float64 float64_trunc_to_int( float64 STATUS_PARAM ); > float64 float64_add( float64, float64 STATUS_PARAM ); > @@ -467,14 +499,18 @@ INLINE float64 float64_set_sign(float64 a, int sign) > #define float64_half make_float64(0x3fe0000000000000LL) > #define float64_infinity make_float64(0x7ff0000000000000LL) > > -/*---------------------------------------------------------------------------- > -| The pattern for a default generated double-precision NaN. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +The pattern for a default generated double-precision NaN. > +------------------------------------------------------------------------------- > +*/ > extern const float64 float64_default_nan; > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE extended double-precision conversion routines. 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Software IEC/IEEE extended double-precision conversion routines. > +------------------------------------------------------------------------------- > +*/ > int32 floatx80_to_int32( floatx80 STATUS_PARAM ); > int32 floatx80_to_int32_round_to_zero( floatx80 STATUS_PARAM ); > int64 floatx80_to_int64( floatx80 STATUS_PARAM ); > @@ -483,9 +519,11 @@ float32 floatx80_to_float32( floatx80 STATUS_PARAM ); > float64 floatx80_to_float64( floatx80 STATUS_PARAM ); > float128 floatx80_to_float128( floatx80 STATUS_PARAM ); > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE extended double-precision operations. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Software IEC/IEEE extended double-precision operations. > +------------------------------------------------------------------------------- > +*/ > floatx80 floatx80_round_to_int( floatx80 STATUS_PARAM ); > floatx80 floatx80_add( floatx80, floatx80 STATUS_PARAM ); > floatx80 floatx80_sub( floatx80, floatx80 STATUS_PARAM ); > @@ -552,14 +590,18 @@ INLINE int floatx80_is_any_nan(floatx80 a) > #define floatx80_half make_floatx80(0x3ffe, 0x8000000000000000LL) > #define floatx80_infinity make_floatx80(0x7fff, 0x8000000000000000LL) > > -/*---------------------------------------------------------------------------- > -| The pattern for a default generated extended double-precision NaN. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +The pattern for a default generated extended double-precision NaN. 
> +------------------------------------------------------------------------------- > +*/ > extern const floatx80 floatx80_default_nan; > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE quadruple-precision conversion routines. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Software IEC/IEEE quadruple-precision conversion routines. > +------------------------------------------------------------------------------- > +*/ > int32 float128_to_int32( float128 STATUS_PARAM ); > int32 float128_to_int32_round_to_zero( float128 STATUS_PARAM ); > int64 float128_to_int64( float128 STATUS_PARAM ); > @@ -568,9 +610,11 @@ float32 float128_to_float32( float128 STATUS_PARAM ); > float64 float128_to_float64( float128 STATUS_PARAM ); > floatx80 float128_to_floatx80( float128 STATUS_PARAM ); > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE quadruple-precision operations. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Software IEC/IEEE quadruple-precision operations. > +------------------------------------------------------------------------------- > +*/ > float128 float128_round_to_int( float128 STATUS_PARAM ); > float128 float128_add( float128, float128 STATUS_PARAM ); > float128 float128_sub( float128, float128 STATUS_PARAM ); > @@ -633,9 +677,11 @@ INLINE int float128_is_any_nan(float128 a) > > #define float128_zero make_float128(0, 0) > > -/*---------------------------------------------------------------------------- > -| The pattern for a default generated quadruple-precision NaN. 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +The pattern for a default generated quadruple-precision NaN. > +------------------------------------------------------------------------------- > +*/ > extern const float128 float128_default_nan; > > #endif /* !SOFTFLOAT_H */ > -- > 1.8.0
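An aside for readers of the header diff above: the inline `float16_is_any_nan()` visible in the hunk context masks off the sign bit and compares what is left against the infinity pattern. Below is a minimal sketch of that test on raw 16-bit patterns. It assumes a plain `uint16_t` in place of QEMU's `float16` type, and the names `half_is_any_nan` and `half_nan_examples` are hypothetical, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-alone variant of the float16_is_any_nan() test quoted
 * in the diff; it takes the raw bit pattern instead of a float16. */
static inline int half_is_any_nan(uint16_t bits)
{
    /* Drop the sign bit (0x8000). 0x7c00 is the infinity pattern (exponent
     * all ones, fraction zero), so any larger value has an all-ones exponent
     * and a nonzero fraction, which is a NaN. */
    return (bits & ~0x8000) > 0x7c00;
}

/* A few spot checks; call this from a test harness or from main(). */
void half_nan_examples(void)
{
    assert(!half_is_any_nan(0x3c00)); /* 1.0 */
    assert(!half_is_any_nan(0x7c00)); /* +infinity */
    assert(!half_is_any_nan(0xfc00)); /* -infinity */
    assert(half_is_any_nan(0x7e00));  /* NaN: fraction is nonzero */
    assert(half_is_any_nan(0xfc01));  /* NaN with the sign bit set */
}
```

Because the sign bit is cleared first, both infinities are rejected and NaNs of either sign are accepted, with no separate check for quiet versus signaling encodings.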
Anthony Liguori <aliguori@us.ibm.com> writes: > Thiemo Seufer <ths@networkno.de> > 5a6932d Fix NaN handling for MIPS and HPPA. > 5fafdf2 find -type f | xargs sed -i 's/[\t ]$//g' # on most files > 63a654b trunc() for Solaris 9 / SPARC, by Juergen Keil. > 924b2c0 Add proper float*_is_nan prototypes. > b645bb4 Fix softfloat NaN handling. > fc81ba5 Check that HOST_SOLARIS is defined before relying on its > value. Spotted by Joachim Henke. As most people know, Thiemo passed away a few years ago. I think we're going to have to revert all of these commits unless someone is able to contact his estate and get permission to relicense. Regards, Anthony Liguori > --- > fpu/softfloat-macros.h | 430 ++++---- > fpu/softfloat-specialize.h | 494 +++++---- > fpu/softfloat.c | 2436 ++++++++++++++++++++++++-------------------- > include/fpu/softfloat.h | 242 +++-- > 4 files changed, 1981 insertions(+), 1621 deletions(-) > > diff --git a/fpu/softfloat-macros.h b/fpu/softfloat-macros.h > index b5164af..2009315 100644 > --- a/fpu/softfloat-macros.h > +++ b/fpu/softfloat-macros.h > @@ -4,10 +4,11 @@ > * Derived from SoftFloat. > */ > > -/*============================================================================ > +/* > +=============================================================================== > > This C source fragment is part of the SoftFloat IEC/IEEE Floating-point > -Arithmetic Package, Release 2b. > +Arithmetic Package, Release 2a. > > Written by John R. Hauser. This work was made possible in part by the > International Computer Science Institute, located at Suite 600, 1947 Center > @@ -16,28 +17,27 @@ National Science Foundation under grant MIP-9311980. The original version > of this code was written as part of a project to build a fixed-point vector > processor in collaboration with the University of California at Berkeley, > overseen by Profs. Nelson Morgan and John Wawrzynek. 
More information > -is available through the Web page `http://www.cs.berkeley.edu/~jhauser/ > +is available through the Web page `http://HTTP.CS.Berkeley.EDU/~jhauser/ > arithmetic/SoftFloat.html'. > > -THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort has > -been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT TIMES > -RESULT IN INCORRECT BEHAVIOR. USE OF THIS SOFTWARE IS RESTRICTED TO PERSONS > -AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ALL LOSSES, > -COSTS, OR OTHER PROBLEMS THEY INCUR DUE TO THE SOFTWARE, AND WHO FURTHERMORE > -EFFECTIVELY INDEMNIFY JOHN HAUSER AND THE INTERNATIONAL COMPUTER SCIENCE > -INSTITUTE (possibly via similar legal notice) AGAINST ALL LOSSES, COSTS, OR > -OTHER PROBLEMS INCURRED BY THEIR CUSTOMERS AND CLIENTS DUE TO THE SOFTWARE. > +THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort > +has been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT > +TIMES RESULT IN INCORRECT BEHAVIOR. USE OF THIS SOFTWARE IS RESTRICTED TO > +PERSONS AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ANY > +AND ALL LOSSES, COSTS, OR OTHER PROBLEMS ARISING FROM ITS USE. > > Derivative works are acceptable, even for commercial purposes, so long as > -(1) the source code for the derivative work includes prominent notice that > -the work is derivative, and (2) the source code includes prominent notice with > -these four paragraphs for those parts of this code that are retained. > +(1) they include prominent notice that the work is derivative, and (2) they > +include prominent notice akin to these four paragraphs for those parts of > +this code that are retained. > > =============================================================================*/ > > -/*---------------------------------------------------------------------------- > -| This macro tests for minimum version of the GNU C compiler. 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +This macro tests for minimum version of the GNU C compiler. > +------------------------------------------------------------------------------- > +*/ > #if defined(__GNUC__) && defined(__GNUC_MINOR__) > # define SOFTFLOAT_GNUC_PREREQ(maj, min) \ > ((__GNUC__ << 16) + __GNUC_MINOR__ >= ((maj) << 16) + (min)) > @@ -46,14 +46,16 @@ these four paragraphs for those parts of this code that are retained. > #endif > > > -/*---------------------------------------------------------------------------- > -| Shifts `a' right by the number of bits given in `count'. If any nonzero > -| bits are shifted off, they are ``jammed'' into the least significant bit of > -| the result by setting the least significant bit to 1. The value of `count' > -| can be arbitrarily large; in particular, if `count' is greater than 32, the > -| result will be either 0 or 1, depending on whether `a' is zero or nonzero. > -| The result is stored in the location pointed to by `zPtr'. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Shifts `a' right by the number of bits given in `count'. If any nonzero > +bits are shifted off, they are ``jammed'' into the least significant bit of > +the result by setting the least significant bit to 1. The value of `count' > +can be arbitrarily large; in particular, if `count' is greater than 32, the > +result will be either 0 or 1, depending on whether `a' is zero or nonzero. > +The result is stored in the location pointed to by `zPtr'. 
> +------------------------------------------------------------------------------- > +*/ > > INLINE void shift32RightJamming(uint32_t a, int_fast16_t count, uint32_t *zPtr) > { > @@ -72,14 +74,16 @@ INLINE void shift32RightJamming(uint32_t a, int_fast16_t count, uint32_t *zPtr) > > } > > -/*---------------------------------------------------------------------------- > -| Shifts `a' right by the number of bits given in `count'. If any nonzero > -| bits are shifted off, they are ``jammed'' into the least significant bit of > -| the result by setting the least significant bit to 1. The value of `count' > -| can be arbitrarily large; in particular, if `count' is greater than 64, the > -| result will be either 0 or 1, depending on whether `a' is zero or nonzero. > -| The result is stored in the location pointed to by `zPtr'. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Shifts `a' right by the number of bits given in `count'. If any nonzero > +bits are shifted off, they are ``jammed'' into the least significant bit of > +the result by setting the least significant bit to 1. The value of `count' > +can be arbitrarily large; in particular, if `count' is greater than 64, the > +result will be either 0 or 1, depending on whether `a' is zero or nonzero. > +The result is stored in the location pointed to by `zPtr'. > +------------------------------------------------------------------------------- > +*/ > > INLINE void shift64RightJamming(uint64_t a, int_fast16_t count, uint64_t *zPtr) > { > @@ -98,23 +102,24 @@ INLINE void shift64RightJamming(uint64_t a, int_fast16_t count, uint64_t *zPtr) > > } > > -/*---------------------------------------------------------------------------- > -| Shifts the 128-bit value formed by concatenating `a0' and `a1' right by 64 > -| _plus_ the number of bits given in `count'. 
The shifted result is at most > -| 64 nonzero bits; this is stored at the location pointed to by `z0Ptr'. The > -| bits shifted off form a second 64-bit result as follows: The _last_ bit > -| shifted off is the most-significant bit of the extra result, and the other > -| 63 bits of the extra result are all zero if and only if _all_but_the_last_ > -| bits shifted off were all zero. This extra result is stored in the location > -| pointed to by `z1Ptr'. The value of `count' can be arbitrarily large. > -| (This routine makes more sense if `a0' and `a1' are considered to form > -| a fixed-point value with binary point between `a0' and `a1'. This fixed- > -| point value is shifted right by the number of bits given in `count', and > -| the integer part of the result is returned at the location pointed to by > -| `z0Ptr'. The fractional part of the result may be slightly corrupted as > -| described above, and is returned at the location pointed to by `z1Ptr'.) > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Shifts the 128-bit value formed by concatenating `a0' and `a1' right by 64 > +_plus_ the number of bits given in `count'. The shifted result is at most > +64 nonzero bits; this is stored at the location pointed to by `z0Ptr'. The > +bits shifted off form a second 64-bit result as follows: The _last_ bit > +shifted off is the most-significant bit of the extra result, and the other > +63 bits of the extra result are all zero if and only if _all_but_the_last_ > +bits shifted off were all zero. This extra result is stored in the location > +pointed to by `z1Ptr'. The value of `count' can be arbitrarily large. > + (This routine makes more sense if `a0' and `a1' are considered to form a > +fixed-point value with binary point between `a0' and `a1'. 
This fixed-point > +value is shifted right by the number of bits given in `count', and the > +integer part of the result is returned at the location pointed to by > +`z0Ptr'. The fractional part of the result may be slightly corrupted as > +described above, and is returned at the location pointed to by `z1Ptr'.) > +------------------------------------------------------------------------------- > +*/ > INLINE void > shift64ExtraRightJamming( > uint64_t a0, uint64_t a1, int_fast16_t count, uint64_t *z0Ptr, uint64_t *z1Ptr) > @@ -144,14 +149,15 @@ INLINE void > > } > > -/*---------------------------------------------------------------------------- > -| Shifts the 128-bit value formed by concatenating `a0' and `a1' right by the > -| number of bits given in `count'. Any bits shifted off are lost. The value > -| of `count' can be arbitrarily large; in particular, if `count' is greater > -| than 128, the result will be 0. The result is broken into two 64-bit pieces > -| which are stored at the locations pointed to by `z0Ptr' and `z1Ptr'. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Shifts the 128-bit value formed by concatenating `a0' and `a1' right by the > +number of bits given in `count'. Any bits shifted off are lost. The value > +of `count' can be arbitrarily large; in particular, if `count' is greater > +than 128, the result will be 0. The result is broken into two 64-bit pieces > +which are stored at the locations pointed to by `z0Ptr' and `z1Ptr'. 
> +------------------------------------------------------------------------------- > +*/ > INLINE void > shift128Right( > uint64_t a0, uint64_t a1, int_fast16_t count, uint64_t *z0Ptr, uint64_t *z1Ptr) > @@ -176,17 +182,18 @@ INLINE void > > } > > -/*---------------------------------------------------------------------------- > -| Shifts the 128-bit value formed by concatenating `a0' and `a1' right by the > -| number of bits given in `count'. If any nonzero bits are shifted off, they > -| are ``jammed'' into the least significant bit of the result by setting the > -| least significant bit to 1. The value of `count' can be arbitrarily large; > -| in particular, if `count' is greater than 128, the result will be either > -| 0 or 1, depending on whether the concatenation of `a0' and `a1' is zero or > -| nonzero. The result is broken into two 64-bit pieces which are stored at > -| the locations pointed to by `z0Ptr' and `z1Ptr'. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Shifts the 128-bit value formed by concatenating `a0' and `a1' right by the > +number of bits given in `count'. If any nonzero bits are shifted off, they > +are ``jammed'' into the least significant bit of the result by setting the > +least significant bit to 1. The value of `count' can be arbitrarily large; > +in particular, if `count' is greater than 128, the result will be either > +0 or 1, depending on whether the concatenation of `a0' and `a1' is zero or > +nonzero. The result is broken into two 64-bit pieces which are stored at > +the locations pointed to by `z0Ptr' and `z1Ptr'. 
> +------------------------------------------------------------------------------- > +*/ > INLINE void > shift128RightJamming( > uint64_t a0, uint64_t a1, int_fast16_t count, uint64_t *z0Ptr, uint64_t *z1Ptr) > @@ -219,25 +226,26 @@ INLINE void > > } > > -/*---------------------------------------------------------------------------- > -| Shifts the 192-bit value formed by concatenating `a0', `a1', and `a2' right > -| by 64 _plus_ the number of bits given in `count'. The shifted result is > -| at most 128 nonzero bits; these are broken into two 64-bit pieces which are > -| stored at the locations pointed to by `z0Ptr' and `z1Ptr'. The bits shifted > -| off form a third 64-bit result as follows: The _last_ bit shifted off is > -| the most-significant bit of the extra result, and the other 63 bits of the > -| extra result are all zero if and only if _all_but_the_last_ bits shifted off > -| were all zero. This extra result is stored in the location pointed to by > -| `z2Ptr'. The value of `count' can be arbitrarily large. > -| (This routine makes more sense if `a0', `a1', and `a2' are considered > -| to form a fixed-point value with binary point between `a1' and `a2'. This > -| fixed-point value is shifted right by the number of bits given in `count', > -| and the integer part of the result is returned at the locations pointed to > -| by `z0Ptr' and `z1Ptr'. The fractional part of the result may be slightly > -| corrupted as described above, and is returned at the location pointed to by > -| `z2Ptr'.) > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Shifts the 192-bit value formed by concatenating `a0', `a1', and `a2' right > +by 64 _plus_ the number of bits given in `count'. The shifted result is > +at most 128 nonzero bits; these are broken into two 64-bit pieces which are > +stored at the locations pointed to by `z0Ptr' and `z1Ptr'. 
The bits shifted > +off form a third 64-bit result as follows: The _last_ bit shifted off is > +the most-significant bit of the extra result, and the other 63 bits of the > +extra result are all zero if and only if _all_but_the_last_ bits shifted off > +were all zero. This extra result is stored in the location pointed to by > +`z2Ptr'. The value of `count' can be arbitrarily large. > + (This routine makes more sense if `a0', `a1', and `a2' are considered > +to form a fixed-point value with binary point between `a1' and `a2'. This > +fixed-point value is shifted right by the number of bits given in `count', > +and the integer part of the result is returned at the locations pointed to > +by `z0Ptr' and `z1Ptr'. The fractional part of the result may be slightly > +corrupted as described above, and is returned at the location pointed to by > +`z2Ptr'.) > +------------------------------------------------------------------------------- > +*/ > INLINE void > shift128ExtraRightJamming( > uint64_t a0, > @@ -289,13 +297,14 @@ INLINE void > > } > > -/*---------------------------------------------------------------------------- > -| Shifts the 128-bit value formed by concatenating `a0' and `a1' left by the > -| number of bits given in `count'. Any bits shifted off are lost. The value > -| of `count' must be less than 64. The result is broken into two 64-bit > -| pieces which are stored at the locations pointed to by `z0Ptr' and `z1Ptr'. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Shifts the 128-bit value formed by concatenating `a0' and `a1' left by the > +number of bits given in `count'. Any bits shifted off are lost. The value > +of `count' must be less than 64. The result is broken into two 64-bit > +pieces which are stored at the locations pointed to by `z0Ptr' and `z1Ptr'. 
> +------------------------------------------------------------------------------- > +*/ > INLINE void > shortShift128Left( > uint64_t a0, uint64_t a1, int_fast16_t count, uint64_t *z0Ptr, uint64_t *z1Ptr) > @@ -307,14 +316,15 @@ INLINE void > > } > > -/*---------------------------------------------------------------------------- > -| Shifts the 192-bit value formed by concatenating `a0', `a1', and `a2' left > -| by the number of bits given in `count'. Any bits shifted off are lost. > -| The value of `count' must be less than 64. The result is broken into three > -| 64-bit pieces which are stored at the locations pointed to by `z0Ptr', > -| `z1Ptr', and `z2Ptr'. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Shifts the 192-bit value formed by concatenating `a0', `a1', and `a2' left > +by the number of bits given in `count'. Any bits shifted off are lost. > +The value of `count' must be less than 64. The result is broken into three > +64-bit pieces which are stored at the locations pointed to by `z0Ptr', > +`z1Ptr', and `z2Ptr'. > +------------------------------------------------------------------------------- > +*/ > INLINE void > shortShift192Left( > uint64_t a0, > @@ -343,13 +353,14 @@ INLINE void > > } > > -/*---------------------------------------------------------------------------- > -| Adds the 128-bit value formed by concatenating `a0' and `a1' to the 128-bit > -| value formed by concatenating `b0' and `b1'. Addition is modulo 2^128, so > -| any carry out is lost. The result is broken into two 64-bit pieces which > -| are stored at the locations pointed to by `z0Ptr' and `z1Ptr'. 
> -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Adds the 128-bit value formed by concatenating `a0' and `a1' to the 128-bit > +value formed by concatenating `b0' and `b1'. Addition is modulo 2^128, so > +any carry out is lost. The result is broken into two 64-bit pieces which > +are stored at the locations pointed to by `z0Ptr' and `z1Ptr'. > +------------------------------------------------------------------------------- > +*/ > INLINE void > add128( > uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1, uint64_t *z0Ptr, uint64_t *z1Ptr ) > @@ -362,14 +373,15 @@ INLINE void > > } > > -/*---------------------------------------------------------------------------- > -| Adds the 192-bit value formed by concatenating `a0', `a1', and `a2' to the > -| 192-bit value formed by concatenating `b0', `b1', and `b2'. Addition is > -| modulo 2^192, so any carry out is lost. The result is broken into three > -| 64-bit pieces which are stored at the locations pointed to by `z0Ptr', > -| `z1Ptr', and `z2Ptr'. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Adds the 192-bit value formed by concatenating `a0', `a1', and `a2' to the > +192-bit value formed by concatenating `b0', `b1', and `b2'. Addition is > +modulo 2^192, so any carry out is lost. The result is broken into three > +64-bit pieces which are stored at the locations pointed to by `z0Ptr', > +`z1Ptr', and `z2Ptr'. 
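As an aside on the 192-bit addition described above: the "modulo 2^192, carry out is lost" behaviour can be sketched with explicit carry propagation across three 64-bit words. This is an illustrative sketch under the comment's stated semantics, not the routine from the patch; `sketch_add192` is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical helper: adds a0:a1:a2 and b0:b1:b2 as 192-bit unsigned
 * values (a0/b0 most significant). The carry out of the top word is
 * dropped, which makes the addition modulo 2^192. */
static void sketch_add192(uint64_t a0, uint64_t a1, uint64_t a2,
                          uint64_t b0, uint64_t b1, uint64_t b2,
                          uint64_t *z0, uint64_t *z1, uint64_t *z2)
{
    uint64_t s2 = a2 + b2;
    uint64_t carry_low = (s2 < a2);   /* unsigned wrap means a carry out */
    uint64_t s1 = a1 + b1;
    uint64_t carry_mid = (s1 < a1);
    s1 += carry_low;
    carry_mid += (s1 < carry_low);    /* adding the low carry may wrap too */
    *z0 = a0 + b0 + carry_mid;        /* any carry out of here is lost */
    *z1 = s1;
    *z2 = s2;
}
```

The two sources of carry into the top word cannot both occur at once: if `a1 + b1` wrapped, the sum is at most 2^64 - 2, so adding one more cannot wrap again.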
> +------------------------------------------------------------------------------- > +*/ > INLINE void > add192( > uint64_t a0, > @@ -400,14 +412,15 @@ INLINE void > > } > > -/*---------------------------------------------------------------------------- > -| Subtracts the 128-bit value formed by concatenating `b0' and `b1' from the > -| 128-bit value formed by concatenating `a0' and `a1'. Subtraction is modulo > -| 2^128, so any borrow out (carry out) is lost. The result is broken into two > -| 64-bit pieces which are stored at the locations pointed to by `z0Ptr' and > -| `z1Ptr'. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Subtracts the 128-bit value formed by concatenating `b0' and `b1' from the > +128-bit value formed by concatenating `a0' and `a1'. Subtraction is modulo > +2^128, so any borrow out (carry out) is lost. The result is broken into two > +64-bit pieces which are stored at the locations pointed to by `z0Ptr' and > +`z1Ptr'. > +------------------------------------------------------------------------------- > +*/ > INLINE void > sub128( > uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1, uint64_t *z0Ptr, uint64_t *z1Ptr ) > @@ -418,14 +431,15 @@ INLINE void > > } > > -/*---------------------------------------------------------------------------- > -| Subtracts the 192-bit value formed by concatenating `b0', `b1', and `b2' > -| from the 192-bit value formed by concatenating `a0', `a1', and `a2'. > -| Subtraction is modulo 2^192, so any borrow out (carry out) is lost. The > -| result is broken into three 64-bit pieces which are stored at the locations > -| pointed to by `z0Ptr', `z1Ptr', and `z2Ptr'. 
> -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Subtracts the 192-bit value formed by concatenating `b0', `b1', and `b2' > +from the 192-bit value formed by concatenating `a0', `a1', and `a2'. > +Subtraction is modulo 2^192, so any borrow out (carry out) is lost. The > +result is broken into three 64-bit pieces which are stored at the locations > +pointed to by `z0Ptr', `z1Ptr', and `z2Ptr'. > +------------------------------------------------------------------------------- > +*/ > INLINE void > sub192( > uint64_t a0, > @@ -456,11 +470,13 @@ INLINE void > > } > > -/*---------------------------------------------------------------------------- > -| Multiplies `a' by `b' to obtain a 128-bit product. The product is broken > -| into two 64-bit pieces which are stored at the locations pointed to by > -| `z0Ptr' and `z1Ptr'. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Multiplies `a' by `b' to obtain a 128-bit product. The product is broken > +into two 64-bit pieces which are stored at the locations pointed to by > +`z0Ptr' and `z1Ptr'. > +------------------------------------------------------------------------------- > +*/ > > INLINE void mul64To128( uint64_t a, uint64_t b, uint64_t *z0Ptr, uint64_t *z1Ptr ) > { > @@ -485,13 +501,14 @@ INLINE void mul64To128( uint64_t a, uint64_t b, uint64_t *z0Ptr, uint64_t *z1Ptr > > } > > -/*---------------------------------------------------------------------------- > -| Multiplies the 128-bit value formed by concatenating `a0' and `a1' by > -| `b' to obtain a 192-bit product. The product is broken into three 64-bit > -| pieces which are stored at the locations pointed to by `z0Ptr', `z1Ptr', and > -| `z2Ptr'. 
> -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Multiplies the 128-bit value formed by concatenating `a0' and `a1' by > +`b' to obtain a 192-bit product. The product is broken into three 64-bit > +pieces which are stored at the locations pointed to by `z0Ptr', `z1Ptr', and > +`z2Ptr'. > +------------------------------------------------------------------------------- > +*/ > INLINE void > mul128By64To192( > uint64_t a0, > @@ -513,13 +530,14 @@ INLINE void > > } > > -/*---------------------------------------------------------------------------- > -| Multiplies the 128-bit value formed by concatenating `a0' and `a1' to the > -| 128-bit value formed by concatenating `b0' and `b1' to obtain a 256-bit > -| product. The product is broken into four 64-bit pieces which are stored at > -| the locations pointed to by `z0Ptr', `z1Ptr', `z2Ptr', and `z3Ptr'. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Multiplies the 128-bit value formed by concatenating `a0' and `a1' to the > +128-bit value formed by concatenating `b0' and `b1' to obtain a 256-bit > +product. The product is broken into four 64-bit pieces which are stored at > +the locations pointed to by `z0Ptr', `z1Ptr', `z2Ptr', and `z3Ptr'. > +------------------------------------------------------------------------------- > +*/ > INLINE void > mul128To256( > uint64_t a0, > @@ -550,14 +568,16 @@ INLINE void > > } > > -/*---------------------------------------------------------------------------- > -| Returns an approximation to the 64-bit integer quotient obtained by dividing > -| `b' into the 128-bit value formed by concatenating `a0' and `a1'. The > -| divisor `b' must be at least 2^63. 
If q is the exact quotient truncated > -| toward zero, the approximation returned lies between q and q + 2 inclusive. > -| If the exact quotient q is larger than 64 bits, the maximum positive 64-bit > -| unsigned integer is returned. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns an approximation to the 64-bit integer quotient obtained by dividing > +`b' into the 128-bit value formed by concatenating `a0' and `a1'. The > +divisor `b' must be at least 2^63. If q is the exact quotient truncated > +toward zero, the approximation returned lies between q and q + 2 inclusive. > +If the exact quotient q is larger than 64 bits, the maximum positive 64-bit > +unsigned integer is returned. > +------------------------------------------------------------------------------- > +*/ > > static uint64_t estimateDiv128To64( uint64_t a0, uint64_t a1, uint64_t b ) > { > @@ -581,15 +601,17 @@ static uint64_t estimateDiv128To64( uint64_t a0, uint64_t a1, uint64_t b ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns an approximation to the square root of the 32-bit significand given > -| by `a'. Considered as an integer, `a' must be at least 2^31. If bit 0 of > -| `aExp' (the least significant bit) is 1, the integer returned approximates > -| 2^31*sqrt(`a'/2^31), where `a' is considered an integer. If bit 0 of `aExp' > -| is 0, the integer returned approximates 2^31*sqrt(`a'/2^30). In either > -| case, the approximation returned lies strictly within +/-2 of the exact > -| value. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns an approximation to the square root of the 32-bit significand given > +by `a'. Considered as an integer, `a' must be at least 2^31. 
If bit 0 of > +`aExp' (the least significant bit) is 1, the integer returned approximates > +2^31*sqrt(`a'/2^31), where `a' is considered an integer. If bit 0 of `aExp' > +is 0, the integer returned approximates 2^31*sqrt(`a'/2^30). In either > +case, the approximation returned lies strictly within +/-2 of the exact > +value. > +------------------------------------------------------------------------------- > +*/ > > static uint32_t estimateSqrt32(int_fast16_t aExp, uint32_t a) > { > @@ -620,10 +642,12 @@ static uint32_t estimateSqrt32(int_fast16_t aExp, uint32_t a) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the number of leading 0 bits before the most-significant 1 bit of > -| `a'. If `a' is zero, 32 is returned. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the number of leading 0 bits before the most-significant 1 bit of > +`a'. If `a' is zero, 32 is returned. > +------------------------------------------------------------------------------- > +*/ > > static int8 countLeadingZeros32( uint32_t a ) > { > @@ -668,10 +692,12 @@ static int8 countLeadingZeros32( uint32_t a ) > #endif > } > > -/*---------------------------------------------------------------------------- > -| Returns the number of leading 0 bits before the most-significant 1 bit of > -| `a'. If `a' is zero, 64 is returned. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the number of leading 0 bits before the most-significant 1 bit of > +`a'. If `a' is zero, 64 is returned. 
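As an aside on the leading-zero count described above: a portable binary-search sketch shows the contract, including the "returns 64 for zero" case. This is an assumption-laden illustration, not the implementation in the file being patched; `sketch_clz64` is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical helper: number of leading 0 bits before the most
 * significant 1 bit of `a`; returns 64 when `a` is zero. */
static int sketch_clz64(uint64_t a)
{
    int n = 0;

    if (a == 0) {
        return 64;
    }
    /* Halve the search window each step, shifting the value up so the
     * remaining candidate bits always sit at the top of the word. */
    if (!(a >> 32)) { n += 32; a <<= 32; }
    if (!(a >> 48)) { n += 16; a <<= 16; }
    if (!(a >> 56)) { n += 8;  a <<= 8;  }
    if (!(a >> 60)) { n += 4;  a <<= 4;  }
    if (!(a >> 62)) { n += 2;  a <<= 2;  }
    if (!(a >> 63)) { n += 1; }
    return n;
}
```

The explicit zero check matters because compiler builtins for this operation are typically undefined for a zero argument.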
> +------------------------------------------------------------------------------- > +*/ > > static int8 countLeadingZeros64( uint64_t a ) > { > @@ -696,11 +722,13 @@ static int8 countLeadingZeros64( uint64_t a ) > #endif > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the 128-bit value formed by concatenating `a0' and `a1' > -| is equal to the 128-bit value formed by concatenating `b0' and `b1'. > -| Otherwise, returns 0. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the 128-bit value formed by concatenating `a0' and `a1' > +is equal to the 128-bit value formed by concatenating `b0' and `b1'. > +Otherwise, returns 0. > +------------------------------------------------------------------------------- > +*/ > > INLINE flag eq128( uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1 ) > { > @@ -709,11 +737,13 @@ INLINE flag eq128( uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1 ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the 128-bit value formed by concatenating `a0' and `a1' is less > -| than or equal to the 128-bit value formed by concatenating `b0' and `b1'. > -| Otherwise, returns 0. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the 128-bit value formed by concatenating `a0' and `a1' is less > +than or equal to the 128-bit value formed by concatenating `b0' and `b1'. > +Otherwise, returns 0. 
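As an aside on the 128-bit comparisons described above: with `a0`/`b0` as the most-significant halves, the ordering reduces to comparing the high words first and falling back to the low words only on a tie. A minimal sketch, with hypothetical `sketch_` names and plain `int` return values standing in for the `flag` type:

```c
#include <stdint.h>

/* Hypothetical helper: 1 if a0:a1 <= b0:b1 as unsigned 128-bit values. */
static int sketch_le128(uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1)
{
    return (a0 < b0) || ((a0 == b0) && (a1 <= b1));
}

/* Hypothetical helper: 1 if a0:a1 < b0:b1 as unsigned 128-bit values. */
static int sketch_lt128(uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1)
{
    return (a0 < b0) || ((a0 == b0) && (a1 < b1));
}
```

The only difference between the two is the comparison applied to the low words when the high words are equal.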
> +------------------------------------------------------------------------------- > +*/ > > INLINE flag le128( uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1 ) > { > @@ -722,11 +752,13 @@ INLINE flag le128( uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1 ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the 128-bit value formed by concatenating `a0' and `a1' is less > -| than the 128-bit value formed by concatenating `b0' and `b1'. Otherwise, > -| returns 0. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the 128-bit value formed by concatenating `a0' and `a1' is less > +than the 128-bit value formed by concatenating `b0' and `b1'. Otherwise, > +returns 0. > +------------------------------------------------------------------------------- > +*/ > > INLINE flag lt128( uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1 ) > { > @@ -735,11 +767,13 @@ INLINE flag lt128( uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1 ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the 128-bit value formed by concatenating `a0' and `a1' is > -| not equal to the 128-bit value formed by concatenating `b0' and `b1'. > -| Otherwise, returns 0. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the 128-bit value formed by concatenating `a0' and `a1' is > +not equal to the 128-bit value formed by concatenating `b0' and `b1'. > +Otherwise, returns 0. 
> +------------------------------------------------------------------------------- > +*/ > > INLINE flag ne128( uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1 ) > { > diff --git a/fpu/softfloat-specialize.h b/fpu/softfloat-specialize.h > index 518f694..ba9bfeb 100644 > --- a/fpu/softfloat-specialize.h > +++ b/fpu/softfloat-specialize.h > @@ -4,10 +4,11 @@ > * Derived from SoftFloat. > */ > > -/*============================================================================ > +/* > +=============================================================================== > > This C source fragment is part of the SoftFloat IEC/IEEE Floating-point > -Arithmetic Package, Release 2b. > +Arithmetic Package, Release 2a. > > Written by John R. Hauser. This work was made possible in part by the > International Computer Science Institute, located at Suite 600, 1947 Center > @@ -16,22 +17,19 @@ National Science Foundation under grant MIP-9311980. The original version > of this code was written as part of a project to build a fixed-point vector > processor in collaboration with the University of California at Berkeley, > overseen by Profs. Nelson Morgan and John Wawrzynek. More information > -is available through the Web page `http://www.cs.berkeley.edu/~jhauser/ > +is available through the Web page `http://HTTP.CS.Berkeley.EDU/~jhauser/ > arithmetic/SoftFloat.html'. > > -THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort has > -been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT TIMES > -RESULT IN INCORRECT BEHAVIOR. 
USE OF THIS SOFTWARE IS RESTRICTED TO PERSONS > -AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ALL LOSSES, > -COSTS, OR OTHER PROBLEMS THEY INCUR DUE TO THE SOFTWARE, AND WHO FURTHERMORE > -EFFECTIVELY INDEMNIFY JOHN HAUSER AND THE INTERNATIONAL COMPUTER SCIENCE > -INSTITUTE (possibly via similar legal warning) AGAINST ALL LOSSES, COSTS, OR > -OTHER PROBLEMS INCURRED BY THEIR CUSTOMERS AND CLIENTS DUE TO THE SOFTWARE. > +THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort > +has been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT > +TIMES RESULT IN INCORRECT BEHAVIOR. USE OF THIS SOFTWARE IS RESTRICTED TO > +PERSONS AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ANY > +AND ALL LOSSES, COSTS, OR OTHER PROBLEMS ARISING FROM ITS USE. > > Derivative works are acceptable, even for commercial purposes, so long as > -(1) the source code for the derivative work includes prominent notice that > -the work is derivative, and (2) the source code includes prominent notice with > -these four paragraphs for those parts of this code that are retained. > +(1) they include prominent notice that the work is derivative, and (2) they > +include prominent notice akin to these four paragraphs for those parts of > +this code that are retained. > > =============================================================================*/ > > @@ -48,9 +46,11 @@ these four paragraphs for those parts of this code that are retained. > #define NO_SIGNALING_NANS 1 > #endif > > -/*---------------------------------------------------------------------------- > -| The pattern for a default generated half-precision NaN. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +The pattern for a default generated half-precision NaN. 
> +------------------------------------------------------------------------------- > +*/ > #if defined(TARGET_ARM) > const float16 float16_default_nan = const_float16(0x7E00); > #elif SNAN_BIT_IS_ONE > @@ -59,9 +59,11 @@ const float16 float16_default_nan = const_float16(0x7DFF); > const float16 float16_default_nan = const_float16(0xFE00); > #endif > > -/*---------------------------------------------------------------------------- > -| The pattern for a default generated single-precision NaN. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +The pattern for a default generated single-precision NaN. > +------------------------------------------------------------------------------- > +*/ > #if defined(TARGET_SPARC) > const float32 float32_default_nan = const_float32(0x7FFFFFFF); > #elif defined(TARGET_PPC) || defined(TARGET_ARM) || defined(TARGET_ALPHA) || \ > @@ -73,9 +75,11 @@ const float32 float32_default_nan = const_float32(0x7FBFFFFF); > const float32 float32_default_nan = const_float32(0xFFC00000); > #endif > > -/*---------------------------------------------------------------------------- > -| The pattern for a default generated double-precision NaN. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +The pattern for a default generated double-precision NaN. 
> +------------------------------------------------------------------------------- > +*/ > #if defined(TARGET_SPARC) > const float64 float64_default_nan = const_float64(LIT64( 0x7FFFFFFFFFFFFFFF )); > #elif defined(TARGET_PPC) || defined(TARGET_ARM) || defined(TARGET_ALPHA) > @@ -86,9 +90,11 @@ const float64 float64_default_nan = const_float64(LIT64( 0x7FF7FFFFFFFFFFFF )); > const float64 float64_default_nan = const_float64(LIT64( 0xFFF8000000000000 )); > #endif > > -/*---------------------------------------------------------------------------- > -| The pattern for a default generated extended double-precision NaN. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +The pattern for a default generated extended double-precision NaN. > +------------------------------------------------------------------------------- > +*/ > #if SNAN_BIT_IS_ONE > #define floatx80_default_nan_high 0x7FFF > #define floatx80_default_nan_low LIT64( 0xBFFFFFFFFFFFFFFF ) > @@ -100,10 +106,12 @@ const float64 float64_default_nan = const_float64(LIT64( 0xFFF8000000000000 )); > const floatx80 floatx80_default_nan > = make_floatx80_init(floatx80_default_nan_high, floatx80_default_nan_low); > > -/*---------------------------------------------------------------------------- > -| The pattern for a default generated quadruple-precision NaN. The `high' and > -| `low' values hold the most- and least-significant bits, respectively. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +The pattern for a default generated quadruple-precision NaN. The `high' and > +`low' values hold the most- and least-significant bits, respectively. 
> +------------------------------------------------------------------------------- > +*/ > #if SNAN_BIT_IS_ONE > #define float128_default_nan_high LIT64( 0x7FFF7FFFFFFFFFFF ) > #define float128_default_nan_low LIT64( 0xFFFFFFFFFFFFFFFF ) > @@ -115,21 +123,25 @@ const floatx80 floatx80_default_nan > const float128 float128_default_nan > = make_float128_init(float128_default_nan_high, float128_default_nan_low); > > -/*---------------------------------------------------------------------------- > -| Raises the exceptions specified by `flags'. Floating-point traps can be > -| defined here if desired. It is currently not possible for such a trap > -| to substitute a result value. If traps are not implemented, this routine > -| should be simply `float_exception_flags |= flags;'. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Raises the exceptions specified by `flags'. Floating-point traps can be > +defined here if desired. It is currently not possible for such a trap > +to substitute a result value. If traps are not implemented, this routine > +should be simply `float_exception_flags |= flags;'. > +------------------------------------------------------------------------------- > +*/ > > void float_raise( int8 flags STATUS_PARAM ) > { > STATUS(float_exception_flags) |= flags; > } > > -/*---------------------------------------------------------------------------- > -| Internal canonical NaN format. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Internal canonical NaN format. 
> +------------------------------------------------------------------------------- > +*/ > typedef struct { > flag sign; > uint64_t high, low; > @@ -146,10 +158,12 @@ int float16_is_signaling_nan(float16 a_) > return 0; > } > #else > -/*---------------------------------------------------------------------------- > -| Returns 1 if the half-precision floating-point value `a' is a quiet > -| NaN; otherwise returns 0. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the half-precision floating-point value `a' is a quiet > +NaN; otherwise returns 0. > +------------------------------------------------------------------------------- > +*/ > > int float16_is_quiet_nan(float16 a_) > { > @@ -161,10 +175,12 @@ int float16_is_quiet_nan(float16 a_) > #endif > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the half-precision floating-point value `a' is a signaling > -| NaN; otherwise returns 0. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the half-precision floating-point value `a' is a signaling > +NaN; otherwise returns 0. > +------------------------------------------------------------------------------- > +*/ > > int float16_is_signaling_nan(float16 a_) > { > @@ -177,10 +193,12 @@ int float16_is_signaling_nan(float16 a_) > } > #endif > > -/*---------------------------------------------------------------------------- > -| Returns a quiet NaN if the half-precision floating point value `a' is a > -| signaling NaN; otherwise returns `a'. 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns a quiet NaN if the half-precision floating point value `a' is a > +signaling NaN; otherwise returns `a'. > +------------------------------------------------------------------------------- > +*/ > float16 float16_maybe_silence_nan(float16 a_) > { > if (float16_is_signaling_nan(a_)) { > @@ -199,11 +217,13 @@ float16 float16_maybe_silence_nan(float16 a_) > return a_; > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the half-precision floating-point NaN > -| `a' to the canonical NaN format. If `a' is a signaling NaN, the invalid > -| exception is raised. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the half-precision floating-point NaN > +`a' to the canonical NaN format. If `a' is a signaling NaN, the invalid > +exception is raised. > +------------------------------------------------------------------------------- > +*/ > > static commonNaNT float16ToCommonNaN( float16 a STATUS_PARAM ) > { > @@ -216,10 +236,12 @@ static commonNaNT float16ToCommonNaN( float16 a STATUS_PARAM ) > return z; > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the canonical NaN `a' to the half- > -| precision floating-point format. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the canonical NaN `a' to the half- > +precision floating-point format. 
> +------------------------------------------------------------------------------- > +*/ > > static float16 commonNaNToFloat16(commonNaNT a STATUS_PARAM) > { > @@ -248,10 +270,12 @@ int float32_is_signaling_nan(float32 a_) > return 0; > } > #else > -/*---------------------------------------------------------------------------- > -| Returns 1 if the single-precision floating-point value `a' is a quiet > -| NaN; otherwise returns 0. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the single-precision floating-point value `a' is a quiet > +NaN; otherwise returns 0. > +------------------------------------------------------------------------------- > +*/ > > int float32_is_quiet_nan( float32 a_ ) > { > @@ -263,10 +287,12 @@ int float32_is_quiet_nan( float32 a_ ) > #endif > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the single-precision floating-point value `a' is a signaling > -| NaN; otherwise returns 0. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the single-precision floating-point value `a' is a signaling > +NaN; otherwise returns 0. > +------------------------------------------------------------------------------- > +*/ > > int float32_is_signaling_nan( float32 a_ ) > { > @@ -279,10 +305,12 @@ int float32_is_signaling_nan( float32 a_ ) > } > #endif > > -/*---------------------------------------------------------------------------- > -| Returns a quiet NaN if the single-precision floating point value `a' is a > -| signaling NaN; otherwise returns `a'. 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns a quiet NaN if the single-precision floating point value `a' is a > +signaling NaN; otherwise returns `a'. > +------------------------------------------------------------------------------- > +*/ > > float32 float32_maybe_silence_nan( float32 a_ ) > { > @@ -302,12 +330,13 @@ float32 float32_maybe_silence_nan( float32 a_ ) > return a_; > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the single-precision floating-point NaN > -| `a' to the canonical NaN format. If `a' is a signaling NaN, the invalid > -| exception is raised. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the single-precision floating-point NaN > +`a' to the canonical NaN format. If `a' is a signaling NaN, the invalid > +exception is raised. > +------------------------------------------------------------------------------- > +*/ > static commonNaNT float32ToCommonNaN( float32 a STATUS_PARAM ) > { > commonNaNT z; > @@ -319,10 +348,12 @@ static commonNaNT float32ToCommonNaN( float32 a STATUS_PARAM ) > return z; > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the canonical NaN `a' to the single- > -| precision floating-point format. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the canonical NaN `a' to the single- > +precision floating-point format. 
> +------------------------------------------------------------------------------- > +*/ > > static float32 commonNaNToFloat32( commonNaNT a STATUS_PARAM) > { > @@ -339,22 +370,24 @@ static float32 commonNaNToFloat32( commonNaNT a STATUS_PARAM) > return float32_default_nan; > } > > -/*---------------------------------------------------------------------------- > -| Select which NaN to propagate for a two-input operation. > -| IEEE754 doesn't specify all the details of this, so the > -| algorithm is target-specific. > -| The routine is passed various bits of information about the > -| two NaNs and should return 0 to select NaN a and 1 for NaN b. > -| Note that signalling NaNs are always squashed to quiet NaNs > -| by the caller, by calling floatXX_maybe_silence_nan() before > -| returning them. > -| > -| aIsLargerSignificand is only valid if both a and b are NaNs > -| of some kind, and is true if a has the larger significand, > -| or if both a and b have the same significand but a is > -| positive but b is negative. It is only needed for the x87 > -| tie-break rule. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Select which NaN to propagate for a two-input operation. > +IEEE754 doesn't specify all the details of this, so the > +algorithm is target-specific. > +The routine is passed various bits of information about the > +two NaNs and should return 0 to select NaN a and 1 for NaN b. > +Note that signalling NaNs are always squashed to quiet NaNs > +by the caller, by calling floatXX_maybe_silence_nan() before > +returning them. > + > +aIsLargerSignificand is only valid if both a and b are NaNs > +of some kind, and is true if a has the larger significand, > +or if both a and b have the same significand but a is > +positive but b is negative. It is only needed for the x87 > +tie-break rule. 
> +-------------------------------------------------------------------------------
> +*/
>
> #if defined(TARGET_ARM)
> static int pickNaN(flag aIsQNaN, flag aIsSNaN, flag bIsQNaN, flag bIsSNaN,
> @@ -451,12 +484,14 @@ static int pickNaN(flag aIsQNaN, flag aIsSNaN, flag bIsQNaN, flag bIsSNaN,
> }
> #endif
>
> -/*----------------------------------------------------------------------------
> -| Select which NaN to propagate for a three-input operation.
> -| For the moment we assume that no CPU needs the 'larger significand'
> -| information.
> -| Return values : 0 : a; 1 : b; 2 : c; 3 : default-NaN
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Select which NaN to propagate for a three-input operation.
> +For the moment we assume that no CPU needs the 'larger significand'
> +information.
> +Return values : 0 : a; 1 : b; 2 : c; 3 : default-NaN
> +-------------------------------------------------------------------------------
> +*/
> #if defined(TARGET_ARM)
> static int pickNaNMulAdd(flag aIsQNaN, flag aIsSNaN, flag bIsQNaN, flag bIsSNaN,
> flag cIsQNaN, flag cIsSNaN, flag infzero STATUS_PARAM)
> @@ -554,12 +589,13 @@ static int pickNaNMulAdd(flag aIsQNaN, flag aIsSNaN, flag bIsQNaN, flag bIsSNaN,
> }
> #endif
>
> -/*----------------------------------------------------------------------------
> -| Takes two single-precision floating-point values `a' and `b', one of which
> -| is a NaN, and returns the appropriate NaN result. If either `a' or `b' is a
> -| signaling NaN, the invalid exception is raised.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Takes two single-precision floating-point values `a' and `b', one of which
> +is a NaN, and returns the appropriate NaN result. If either `a' or `b' is a
> +signaling NaN, the invalid exception is raised.
> +-------------------------------------------------------------------------------
> +*/
> static float32 propagateFloat32NaN( float32 a, float32 b STATUS_PARAM)
> {
> flag aIsQuietNaN, aIsSignalingNaN, bIsQuietNaN, bIsSignalingNaN;
> @@ -594,14 +630,16 @@ static float32 propagateFloat32NaN( float32 a, float32 b STATUS_PARAM)
> }
> }
>
> -/*----------------------------------------------------------------------------
> -| Takes three single-precision floating-point values `a', `b' and `c', one of
> -| which is a NaN, and returns the appropriate NaN result. If any of `a',
> -| `b' or `c' is a signaling NaN, the invalid exception is raised.
> -| The input infzero indicates whether a*b was 0*inf or inf*0 (in which case
> -| obviously c is a NaN, and whether to propagate c or some other NaN is
> -| implementation defined).
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Takes three single-precision floating-point values `a', `b' and `c', one of
> +which is a NaN, and returns the appropriate NaN result. If any of `a',
> +`b' or `c' is a signaling NaN, the invalid exception is raised.
> +The input infzero indicates whether a*b was 0*inf or inf*0 (in which case
> +obviously c is a NaN, and whether to propagate c or some other NaN is
> +implementation defined).
> +-------------------------------------------------------------------------------
> +*/
>
> static float32 propagateFloat32MulAddNaN(float32 a, float32 b,
> float32 c, flag infzero STATUS_PARAM)
> @@ -656,10 +694,12 @@ int float64_is_signaling_nan(float64 a_)
> return 0;
> }
> #else
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the double-precision floating-point value `a' is a quiet
> -| NaN; otherwise returns 0.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns 1 if the double-precision floating-point value `a' is a quiet
> +NaN; otherwise returns 0.
> +-------------------------------------------------------------------------------
> +*/
>
> int float64_is_quiet_nan( float64 a_ )
> {
> @@ -673,10 +713,12 @@ int float64_is_quiet_nan( float64 a_ )
> #endif
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the double-precision floating-point value `a' is a signaling
> -| NaN; otherwise returns 0.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns 1 if the double-precision floating-point value `a' is a signaling
> +NaN; otherwise returns 0.
> +-------------------------------------------------------------------------------
> +*/
>
> int float64_is_signaling_nan( float64 a_ )
> {
> @@ -691,10 +733,12 @@ int float64_is_signaling_nan( float64 a_ )
> }
> #endif
>
> -/*----------------------------------------------------------------------------
> -| Returns a quiet NaN if the double-precision floating point value `a' is a
> -| signaling NaN; otherwise returns `a'.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns a quiet NaN if the double-precision floating point value `a' is a
> +signaling NaN; otherwise returns `a'.
> +-------------------------------------------------------------------------------
> +*/
>
> float64 float64_maybe_silence_nan( float64 a_ )
> {
> @@ -714,12 +758,13 @@ float64 float64_maybe_silence_nan( float64 a_ )
> return a_;
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the double-precision floating-point NaN
> -| `a' to the canonical NaN format. If `a' is a signaling NaN, the invalid
> -| exception is raised.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of converting the double-precision floating-point NaN
> +`a' to the canonical NaN format. If `a' is a signaling NaN, the invalid
> +exception is raised.
> +-------------------------------------------------------------------------------
> +*/
> static commonNaNT float64ToCommonNaN( float64 a STATUS_PARAM)
> {
> commonNaNT z;
> @@ -731,10 +776,12 @@ static commonNaNT float64ToCommonNaN( float64 a STATUS_PARAM)
> return z;
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the canonical NaN `a' to the double-
> -| precision floating-point format.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of converting the canonical NaN `a' to the double-
> +precision floating-point format.
> +-------------------------------------------------------------------------------
> +*/
>
> static float64 commonNaNToFloat64( commonNaNT a STATUS_PARAM)
> {
> @@ -753,12 +800,13 @@ static float64 commonNaNToFloat64( commonNaNT a STATUS_PARAM)
> return float64_default_nan;
> }
>
> -/*----------------------------------------------------------------------------
> -| Takes two double-precision floating-point values `a' and `b', one of which
> -| is a NaN, and returns the appropriate NaN result. If either `a' or `b' is a
> -| signaling NaN, the invalid exception is raised.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Takes two double-precision floating-point values `a' and `b', one of which
> +is a NaN, and returns the appropriate NaN result. If either `a' or `b' is a
> +signaling NaN, the invalid exception is raised.
> +-------------------------------------------------------------------------------
> +*/
> static float64 propagateFloat64NaN( float64 a, float64 b STATUS_PARAM)
> {
> flag aIsQuietNaN, aIsSignalingNaN, bIsQuietNaN, bIsSignalingNaN;
> @@ -793,14 +841,16 @@ static float64 propagateFloat64NaN( float64 a, float64 b STATUS_PARAM)
> }
> }
>
> -/*----------------------------------------------------------------------------
> -| Takes three double-precision floating-point values `a', `b' and `c', one of
> -| which is a NaN, and returns the appropriate NaN result. If any of `a',
> -| `b' or `c' is a signaling NaN, the invalid exception is raised.
> -| The input infzero indicates whether a*b was 0*inf or inf*0 (in which case
> -| obviously c is a NaN, and whether to propagate c or some other NaN is
> -| implementation defined).
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Takes three double-precision floating-point values `a', `b' and `c', one of
> +which is a NaN, and returns the appropriate NaN result. If any of `a',
> +`b' or `c' is a signaling NaN, the invalid exception is raised.
> +The input infzero indicates whether a*b was 0*inf or inf*0 (in which case
> +obviously c is a NaN, and whether to propagate c or some other NaN is
> +implementation defined).
> +-------------------------------------------------------------------------------
> +*/
>
> static float64 propagateFloat64MulAddNaN(float64 a, float64 b,
> float64 c, flag infzero STATUS_PARAM)
> @@ -855,11 +905,13 @@ int floatx80_is_signaling_nan(floatx80 a_)
> return 0;
> }
> #else
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the extended double-precision floating-point value `a' is a
> -| quiet NaN; otherwise returns 0. This slightly differs from the same
> -| function for other types as floatx80 has an explicit bit.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns 1 if the extended double-precision floating-point value `a' is a
> +quiet NaN; otherwise returns 0. This slightly differs from the same
> +function for other types as floatx80 has an explicit bit.
> +-------------------------------------------------------------------------------
> +*/
>
> int floatx80_is_quiet_nan( floatx80 a )
> {
> @@ -877,11 +929,13 @@ int floatx80_is_quiet_nan( floatx80 a )
> #endif
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the extended double-precision floating-point value `a' is a
> -| signaling NaN; otherwise returns 0. This slightly differs from the same
> -| function for other types as floatx80 has an explicit bit.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns 1 if the extended double-precision floating-point value `a' is a
> +signaling NaN; otherwise returns 0. This slightly differs from the same
> +function for other types as floatx80 has an explicit bit.
> +-------------------------------------------------------------------------------
> +*/
>
> int floatx80_is_signaling_nan( floatx80 a )
> {
> @@ -900,10 +954,12 @@ int floatx80_is_signaling_nan( floatx80 a )
> }
> #endif
>
> -/*----------------------------------------------------------------------------
> -| Returns a quiet NaN if the extended double-precision floating point value
> -| `a' is a signaling NaN; otherwise returns `a'.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns a quiet NaN if the extended double-precision floating point value
> +`a' is a signaling NaN; otherwise returns `a'.
> +-------------------------------------------------------------------------------
> +*/
>
> floatx80 floatx80_maybe_silence_nan( floatx80 a )
> {
> @@ -923,12 +979,13 @@ floatx80 floatx80_maybe_silence_nan( floatx80 a )
> return a;
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the extended double-precision floating-
> -| point NaN `a' to the canonical NaN format. If `a' is a signaling NaN, the
> -| invalid exception is raised.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of converting the extended double-precision floating-
> +point NaN `a' to the canonical NaN format. If `a' is a signaling NaN, the
> +invalid exception is raised.
> +-------------------------------------------------------------------------------
> +*/
> static commonNaNT floatx80ToCommonNaN( floatx80 a STATUS_PARAM)
> {
> commonNaNT z;
> @@ -946,10 +1003,12 @@ static commonNaNT floatx80ToCommonNaN( floatx80 a STATUS_PARAM)
> return z;
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the canonical NaN `a' to the extended
> -| double-precision floating-point format.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of converting the canonical NaN `a' to the extended
> +double-precision floating-point format.
> +-------------------------------------------------------------------------------
> +*/
>
> static floatx80 commonNaNToFloatx80( commonNaNT a STATUS_PARAM)
> {
> @@ -972,12 +1031,13 @@ static floatx80 commonNaNToFloatx80( commonNaNT a STATUS_PARAM)
> return z;
> }
>
> -/*----------------------------------------------------------------------------
> -| Takes two extended double-precision floating-point values `a' and `b', one
> -| of which is a NaN, and returns the appropriate NaN result. If either `a' or
> -| `b' is a signaling NaN, the invalid exception is raised.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Takes two extended double-precision floating-point values `a' and `b', one
> +of which is a NaN, and returns the appropriate NaN result. If either `a' or
> +`b' is a signaling NaN, the invalid exception is raised.
> +-------------------------------------------------------------------------------
> +*/
> static floatx80 propagateFloatx80NaN( floatx80 a, floatx80 b STATUS_PARAM)
> {
> flag aIsQuietNaN, aIsSignalingNaN, bIsQuietNaN, bIsSignalingNaN;
> @@ -1023,10 +1083,12 @@ int float128_is_signaling_nan(float128 a_)
> return 0;
> }
> #else
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the quadruple-precision floating-point value `a' is a quiet
> -| NaN; otherwise returns 0.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns 1 if the quadruple-precision floating-point value `a' is a quiet
> +NaN; otherwise returns 0.
> +-------------------------------------------------------------------------------
> +*/
>
> int float128_is_quiet_nan( float128 a )
> {
> @@ -1041,10 +1103,12 @@ int float128_is_quiet_nan( float128 a )
> #endif
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the quadruple-precision floating-point value `a' is a
> -| signaling NaN; otherwise returns 0.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns 1 if the quadruple-precision floating-point value `a' is a
> +signaling NaN; otherwise returns 0.
> +-------------------------------------------------------------------------------
> +*/
>
> int float128_is_signaling_nan( float128 a )
> {
> @@ -1060,10 +1124,12 @@ int float128_is_signaling_nan( float128 a )
> }
> #endif
>
> -/*----------------------------------------------------------------------------
> -| Returns a quiet NaN if the quadruple-precision floating point value `a' is
> -| a signaling NaN; otherwise returns `a'.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns a quiet NaN if the quadruple-precision floating point value `a' is
> +a signaling NaN; otherwise returns `a'.
> +-------------------------------------------------------------------------------
> +*/
>
> float128 float128_maybe_silence_nan( float128 a )
> {
> @@ -1083,12 +1149,13 @@ float128 float128_maybe_silence_nan( float128 a )
> return a;
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the quadruple-precision floating-point NaN
> -| `a' to the canonical NaN format. If `a' is a signaling NaN, the invalid
> -| exception is raised.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of converting the quadruple-precision floating-point NaN
> +`a' to the canonical NaN format. If `a' is a signaling NaN, the invalid
> +exception is raised.
> +-------------------------------------------------------------------------------
> +*/
> static commonNaNT float128ToCommonNaN( float128 a STATUS_PARAM)
> {
> commonNaNT z;
> @@ -1099,10 +1166,12 @@ static commonNaNT float128ToCommonNaN( float128 a STATUS_PARAM)
> return z;
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the canonical NaN `a' to the quadruple-
> -| precision floating-point format.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of converting the canonical NaN `a' to the quadruple-
> +precision floating-point format.
> +-------------------------------------------------------------------------------
> +*/
>
> static float128 commonNaNToFloat128( commonNaNT a STATUS_PARAM)
> {
> @@ -1119,12 +1188,13 @@ static float128 commonNaNToFloat128( commonNaNT a STATUS_PARAM)
> return z;
> }
>
> -/*----------------------------------------------------------------------------
> -| Takes two quadruple-precision floating-point values `a' and `b', one of
> -| which is a NaN, and returns the appropriate NaN result. If either `a' or
> -| `b' is a signaling NaN, the invalid exception is raised.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Takes two quadruple-precision floating-point values `a' and `b', one of
> +which is a NaN, and returns the appropriate NaN result. If either `a' or
> +`b' is a signaling NaN, the invalid exception is raised.
> +-------------------------------------------------------------------------------
> +*/
> static float128 propagateFloat128NaN( float128 a, float128 b STATUS_PARAM)
> {
> flag aIsQuietNaN, aIsSignalingNaN, bIsQuietNaN, bIsSignalingNaN;
> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
> index 7ba51b6..9145582 100644
> --- a/fpu/softfloat.c
> +++ b/fpu/softfloat.c
> @@ -4,10 +4,11 @@
> * Derived from SoftFloat.
> */
>
> -/*============================================================================
> +/*
> +===============================================================================
>
> -This C source file is part of the SoftFloat IEC/IEEE Floating-point Arithmetic
> -Package, Release 2b.
> +This C source file is part of the SoftFloat IEC/IEEE Floating-point
> +Arithmetic Package, Release 2a.
>
> Written by John R. Hauser. This work was made possible in part by the
> International Computer Science Institute, located at Suite 600, 1947 Center
> @@ -16,24 +17,22 @@ National Science Foundation under grant MIP-9311980. The original version
> of this code was written as part of a project to build a fixed-point vector
> processor in collaboration with the University of California at Berkeley,
> overseen by Profs. Nelson Morgan and John Wawrzynek. More information
> -is available through the Web page `http://www.cs.berkeley.edu/~jhauser/
> +is available through the Web page `http://HTTP.CS.Berkeley.EDU/~jhauser/
> arithmetic/SoftFloat.html'.
>
> -THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort has
> -been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT TIMES
> -RESULT IN INCORRECT BEHAVIOR. USE OF THIS SOFTWARE IS RESTRICTED TO PERSONS
> -AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ALL LOSSES,
> -COSTS, OR OTHER PROBLEMS THEY INCUR DUE TO THE SOFTWARE, AND WHO FURTHERMORE
> -EFFECTIVELY INDEMNIFY JOHN HAUSER AND THE INTERNATIONAL COMPUTER SCIENCE
> -INSTITUTE (possibly via similar legal warning) AGAINST ALL LOSSES, COSTS, OR
> -OTHER PROBLEMS INCURRED BY THEIR CUSTOMERS AND CLIENTS DUE TO THE SOFTWARE.
> +THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort
> +has been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT
> +TIMES RESULT IN INCORRECT BEHAVIOR. USE OF THIS SOFTWARE IS RESTRICTED TO
> +PERSONS AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ANY
> +AND ALL LOSSES, COSTS, OR OTHER PROBLEMS ARISING FROM ITS USE.
>
> Derivative works are acceptable, even for commercial purposes, so long as
> -(1) the source code for the derivative work includes prominent notice that
> -the work is derivative, and (2) the source code includes prominent notice with
> -these four paragraphs for those parts of this code that are retained.
> +(1) they include prominent notice that the work is derivative, and (2) they
> +include prominent notice akin to these four paragraphs for those parts of
> +this code that are retained.
>
> -=============================================================================*/
> +===============================================================================
> +*/
>
> /* softfloat (and in particular the code in softfloat-specialize.h) is
> * target-dependent and needs the TARGET_* macros.
> @@ -42,21 +41,25 @@ these four paragraphs for those parts of this code that are retained.
>
> #include "fpu/softfloat.h"
>
> -/*----------------------------------------------------------------------------
> -| Primitive arithmetic functions, including multi-word arithmetic, and
> -| division and square root approximations. (Can be specialized to target if
> -| desired.)
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Primitive arithmetic functions, including multi-word arithmetic, and
> +division and square root approximations. (Can be specialized to target if
> +desired.)
> +-------------------------------------------------------------------------------
> +*/
> #include "softfloat-macros.h"
>
> -/*----------------------------------------------------------------------------
> -| Functions and definitions to determine: (1) whether tininess for underflow
> -| is detected before or after rounding by default, (2) what (if anything)
> -| happens when exceptions are raised, (3) how signaling NaNs are distinguished
> -| from quiet NaNs, (4) the default generated quiet NaNs, and (5) how NaNs
> -| are propagated from function inputs to output. These details are target-
> -| specific.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Functions and definitions to determine: (1) whether tininess for underflow
> +is detected before or after rounding by default, (2) what (if anything)
> +happens when exceptions are raised, (3) how signaling NaNs are distinguished
> +from quiet NaNs, (4) the default generated quiet NaNs, and (5) how NaNs
> +are propagated from function inputs to output. These details are target-
> +specific.
> +-------------------------------------------------------------------------------
> +*/
> #include "softfloat-specialize.h"
>
> void set_float_rounding_mode(int val STATUS_PARAM)
> @@ -74,43 +77,51 @@ void set_floatx80_rounding_precision(int val STATUS_PARAM)
> STATUS(floatx80_rounding_precision) = val;
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the fraction bits of the half-precision floating-point value `a'.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns the fraction bits of the half-precision floating-point value `a'.
> +-------------------------------------------------------------------------------
> +*/
>
> INLINE uint32_t extractFloat16Frac(float16 a)
> {
> return float16_val(a) & 0x3ff;
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the exponent bits of the half-precision floating-point value `a'.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns the exponent bits of the half-precision floating-point value `a'.
> +-------------------------------------------------------------------------------
> +*/
>
> INLINE int_fast16_t extractFloat16Exp(float16 a)
> {
> return (float16_val(a) >> 10) & 0x1f;
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the sign bit of the single-precision floating-point value `a'.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns the sign bit of the single-precision floating-point value `a'.
> +-------------------------------------------------------------------------------
> +*/
>
> INLINE flag extractFloat16Sign(float16 a)
> {
> return float16_val(a)>>15;
> }
>
> -/*----------------------------------------------------------------------------
> -| Takes a 64-bit fixed-point value `absZ' with binary point between bits 6
> -| and 7, and returns the properly rounded 32-bit integer corresponding to the
> -| input. If `zSign' is 1, the input is negated before being converted to an
> -| integer. Bit 63 of `absZ' must be zero. Ordinarily, the fixed-point input
> -| is simply rounded to an integer, with the inexact exception raised if the
> -| input cannot be represented exactly as an integer. However, if the fixed-
> -| point input is too large, the invalid exception is raised and the largest
> -| positive or negative integer is returned.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Takes a 64-bit fixed-point value `absZ' with binary point between bits 6
> +and 7, and returns the properly rounded 32-bit integer corresponding to the
> +input. If `zSign' is 1, the input is negated before being converted to an
> +integer. Bit 63 of `absZ' must be zero. Ordinarily, the fixed-point input
> +is simply rounded to an integer, with the inexact exception raised if the
> +input cannot be represented exactly as an integer. However, if the fixed-
> +point input is too large, the invalid exception is raised and the largest
> +positive or negative integer is returned.
> +-------------------------------------------------------------------------------
> +*/
>
> static int32 roundAndPackInt32( flag zSign, uint64_t absZ STATUS_PARAM)
> {
> @@ -150,17 +161,19 @@ static int32 roundAndPackInt32( flag zSign, uint64_t absZ STATUS_PARAM)
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Takes the 128-bit fixed-point value formed by concatenating `absZ0' and
> -| `absZ1', with binary point between bits 63 and 64 (between the input words),
> -| and returns the properly rounded 64-bit integer corresponding to the input.
> -| If `zSign' is 1, the input is negated before being converted to an integer.
> -| Ordinarily, the fixed-point input is simply rounded to an integer, with
> -| the inexact exception raised if the input cannot be represented exactly as
> -| an integer. However, if the fixed-point input is too large, the invalid
> -| exception is raised and the largest positive or negative integer is
> -| returned.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Takes the 128-bit fixed-point value formed by concatenating `absZ0' and
> +`absZ1', with binary point between bits 63 and 64 (between the input words),
> +and returns the properly rounded 64-bit integer corresponding to the input.
> +If `zSign' is 1, the input is negated before being converted to an integer.
> +Ordinarily, the fixed-point input is simply rounded to an integer, with
> +the inexact exception raised if the input cannot be represented exactly as
> +an integer. However, if the fixed-point input is too large, the invalid
> +exception is raised and the largest positive or negative integer is
> +returned.
> +-------------------------------------------------------------------------------
> +*/
>
> static int64 roundAndPackInt64( flag zSign, uint64_t absZ0, uint64_t absZ1 STATUS_PARAM)
> {
> @@ -203,9 +216,11 @@ static int64 roundAndPackInt64( flag zSign, uint64_t absZ0, uint64_t absZ1 STATU
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the fraction bits of the single-precision floating-point value `a'.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns the fraction bits of the single-precision floating-point value `a'.
> +-------------------------------------------------------------------------------
> +*/
>
> INLINE uint32_t extractFloat32Frac( float32 a )
> {
> @@ -214,9 +229,11 @@ INLINE uint32_t extractFloat32Frac( float32 a )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the exponent bits of the single-precision floating-point value `a'.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns the exponent bits of the single-precision floating-point value `a'.
> +-------------------------------------------------------------------------------
> +*/
>
> INLINE int_fast16_t extractFloat32Exp(float32 a)
> {
> @@ -225,10 +242,11 @@ INLINE int_fast16_t extractFloat32Exp(float32 a)
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the sign bit of the single-precision floating-point value `a'.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the sign bit of the single-precision floating-point value `a'.
> +-------------------------------------------------------------------------------
> +*/
> INLINE flag extractFloat32Sign( float32 a )
> {
>
> @@ -236,10 +254,12 @@ INLINE flag extractFloat32Sign( float32 a )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| If `a' is denormal and we are in flush-to-zero mode then set the
> -| input-denormal exception and return zero. Otherwise just return the value.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +If `a' is denormal and we are in flush-to-zero mode then set the
> +input-denormal exception and return zero. Otherwise just return the value.
> +-------------------------------------------------------------------------------
> +*/
> static float32 float32_squash_input_denormal(float32 a STATUS_PARAM)
> {
> if (STATUS(flush_inputs_to_zero)) {
> @@ -251,13 +271,14 @@ static float32 float32_squash_input_denormal(float32 a STATUS_PARAM)
> return a;
> }
>
> -/*----------------------------------------------------------------------------
> -| Normalizes the subnormal single-precision floating-point value represented
> -| by the denormalized significand `aSig'. The normalized exponent and
> -| significand are stored at the locations pointed to by `zExpPtr' and
> -| `zSigPtr', respectively.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Normalizes the subnormal single-precision floating-point value represented
> +by the denormalized significand `aSig'. The normalized exponent and
> +significand are stored at the locations pointed to by `zExpPtr' and
> +`zSigPtr', respectively.
> +-------------------------------------------------------------------------------
> +*/
> static void
> normalizeFloat32Subnormal(uint32_t aSig, int_fast16_t *zExpPtr, uint32_t *zSigPtr)
> {
> @@ -269,16 +290,18 @@ static void
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Packs the sign `zSign', exponent `zExp', and significand `zSig' into a
> -| single-precision floating-point value, returning the result. After being
> -| shifted into the proper positions, the three fields are simply added
> -| together to form the result.
This means that any integer portion of `zSig'
> -| will be added into the exponent. Since a properly normalized significand
> -| will have an integer portion equal to 1, the `zExp' input should be 1 less
> -| than the desired result exponent whenever `zSig' is a complete, normalized
> -| significand.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Packs the sign `zSign', exponent `zExp', and significand `zSig' into a
> +single-precision floating-point value, returning the result. After being
> +shifted into the proper positions, the three fields are simply added
> +together to form the result. This means that any integer portion of `zSig'
> +will be added into the exponent. Since a properly normalized significand
> +will have an integer portion equal to 1, the `zExp' input should be 1 less
> +than the desired result exponent whenever `zSig' is a complete, normalized
> +significand.
> +-------------------------------------------------------------------------------
> +*/
>
> INLINE float32 packFloat32(flag zSign, int_fast16_t zExp, uint32_t zSig)
> {
> @@ -288,27 +311,29 @@ INLINE float32 packFloat32(flag zSign, int_fast16_t zExp, uint32_t zSig)
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Takes an abstract floating-point value having sign `zSign', exponent `zExp',
> -| and significand `zSig', and returns the proper single-precision floating-
> -| point value corresponding to the abstract input. Ordinarily, the abstract
> -| value is simply rounded and packed into the single-precision format, with
> -| the inexact exception raised if the abstract input cannot be represented
> -| exactly. However, if the abstract value is too large, the overflow and
> -| inexact exceptions are raised and an infinity or maximal finite value is
> -| returned.
If the abstract value is too small, the input value is rounded to
> -| a subnormal number, and the underflow and inexact exceptions are raised if
> -| the abstract input cannot be represented exactly as a subnormal single-
> -| precision floating-point number.
> -| The input significand `zSig' has its binary point between bits 30
> -| and 29, which is 7 bits to the left of the usual location. This shifted
> -| significand must be normalized or smaller. If `zSig' is not normalized,
> -| `zExp' must be 0; in that case, the result returned is a subnormal number,
> -| and it must not require rounding. In the usual case that `zSig' is
> -| normalized, `zExp' must be 1 less than the ``true'' floating-point exponent.
> -| The handling of underflow and overflow follows the IEC/IEEE Standard for
> -| Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Takes an abstract floating-point value having sign `zSign', exponent `zExp',
> +and significand `zSig', and returns the proper single-precision floating-
> +point value corresponding to the abstract input. Ordinarily, the abstract
> +value is simply rounded and packed into the single-precision format, with
> +the inexact exception raised if the abstract input cannot be represented
> +exactly. However, if the abstract value is too large, the overflow and
> +inexact exceptions are raised and an infinity or maximal finite value is
> +returned. If the abstract value is too small, the input value is rounded to
> +a subnormal number, and the underflow and inexact exceptions are raised if
> +the abstract input cannot be represented exactly as a subnormal single-
> +precision floating-point number.
> + The input significand `zSig' has its binary point between bits 30
> +and 29, which is 7 bits to the left of the usual location.
This shifted
> +significand must be normalized or smaller. If `zSig' is not normalized,
> +`zExp' must be 0; in that case, the result returned is a subnormal number,
> +and it must not require rounding. In the usual case that `zSig' is
> +normalized, `zExp' must be 1 less than the ``true'' floating-point exponent.
> +The handling of underflow and overflow follows the IEC/IEEE Standard for
> +Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
>
> static float32 roundAndPackFloat32(flag zSign, int_fast16_t zExp, uint32_t zSig STATUS_PARAM)
> {
> @@ -366,15 +391,16 @@ static float32 roundAndPackFloat32(flag zSign, int_fast16_t zExp, uint32_t zSig
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Takes an abstract floating-point value having sign `zSign', exponent `zExp',
> -| and significand `zSig', and returns the proper single-precision floating-
> -| point value corresponding to the abstract input. This routine is just like
> -| `roundAndPackFloat32' except that `zSig' does not have to be normalized.
> -| Bit 31 of `zSig' must be zero, and `zExp' must be 1 less than the ``true''
> -| floating-point exponent.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Takes an abstract floating-point value having sign `zSign', exponent `zExp',
> +and significand `zSig', and returns the proper single-precision floating-
> +point value corresponding to the abstract input. This routine is just like
> +`roundAndPackFloat32' except that `zSig' does not have to be normalized.
> +Bit 31 of `zSig' must be zero, and `zExp' must be 1 less than the ``true''
> +floating-point exponent.
> +-------------------------------------------------------------------------------
> +*/
> static float32
> normalizeRoundAndPackFloat32(flag zSign, int_fast16_t zExp, uint32_t zSig STATUS_PARAM)
> {
> @@ -385,9 +411,11 @@ static float32
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the fraction bits of the double-precision floating-point value `a'.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns the fraction bits of the double-precision floating-point value `a'.
> +-------------------------------------------------------------------------------
> +*/
>
> INLINE uint64_t extractFloat64Frac( float64 a )
> {
> @@ -396,9 +424,11 @@ INLINE uint64_t extractFloat64Frac( float64 a )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the exponent bits of the double-precision floating-point value `a'.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns the exponent bits of the double-precision floating-point value `a'.
> +-------------------------------------------------------------------------------
> +*/
>
> INLINE int_fast16_t extractFloat64Exp(float64 a)
> {
> @@ -407,10 +437,11 @@ INLINE int_fast16_t extractFloat64Exp(float64 a)
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the sign bit of the double-precision floating-point value `a'.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the sign bit of the double-precision floating-point value `a'.
> +-------------------------------------------------------------------------------
> +*/
> INLINE flag extractFloat64Sign( float64 a )
> {
>
> @@ -418,10 +449,12 @@ INLINE flag extractFloat64Sign( float64 a )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| If `a' is denormal and we are in flush-to-zero mode then set the
> -| input-denormal exception and return zero. Otherwise just return the value.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +If `a' is denormal and we are in flush-to-zero mode then set the
> +input-denormal exception and return zero. Otherwise just return the value.
> +-------------------------------------------------------------------------------
> +*/
> static float64 float64_squash_input_denormal(float64 a STATUS_PARAM)
> {
> if (STATUS(flush_inputs_to_zero)) {
> @@ -433,13 +466,14 @@ static float64 float64_squash_input_denormal(float64 a STATUS_PARAM)
> return a;
> }
>
> -/*----------------------------------------------------------------------------
> -| Normalizes the subnormal double-precision floating-point value represented
> -| by the denormalized significand `aSig'. The normalized exponent and
> -| significand are stored at the locations pointed to by `zExpPtr' and
> -| `zSigPtr', respectively.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Normalizes the subnormal double-precision floating-point value represented
> +by the denormalized significand `aSig'. The normalized exponent and
> +significand are stored at the locations pointed to by `zExpPtr' and
> +`zSigPtr', respectively.
> +-------------------------------------------------------------------------------
> +*/
> static void
> normalizeFloat64Subnormal(uint64_t aSig, int_fast16_t *zExpPtr, uint64_t *zSigPtr)
> {
> @@ -451,16 +485,18 @@ static void
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Packs the sign `zSign', exponent `zExp', and significand `zSig' into a
> -| double-precision floating-point value, returning the result. After being
> -| shifted into the proper positions, the three fields are simply added
> -| together to form the result. This means that any integer portion of `zSig'
> -| will be added into the exponent. Since a properly normalized significand
> -| will have an integer portion equal to 1, the `zExp' input should be 1 less
> -| than the desired result exponent whenever `zSig' is a complete, normalized
> -| significand.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Packs the sign `zSign', exponent `zExp', and significand `zSig' into a
> +double-precision floating-point value, returning the result. After being
> +shifted into the proper positions, the three fields are simply added
> +together to form the result. This means that any integer portion of `zSig'
> +will be added into the exponent. Since a properly normalized significand
> +will have an integer portion equal to 1, the `zExp' input should be 1 less
> +than the desired result exponent whenever `zSig' is a complete, normalized
> +significand.
> +-------------------------------------------------------------------------------
> +*/
>
> INLINE float64 packFloat64(flag zSign, int_fast16_t zExp, uint64_t zSig)
> {
> @@ -470,27 +506,29 @@ INLINE float64 packFloat64(flag zSign, int_fast16_t zExp, uint64_t zSig)
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Takes an abstract floating-point value having sign `zSign', exponent `zExp',
> -| and significand `zSig', and returns the proper double-precision floating-
> -| point value corresponding to the abstract input. Ordinarily, the abstract
> -| value is simply rounded and packed into the double-precision format, with
> -| the inexact exception raised if the abstract input cannot be represented
> -| exactly. However, if the abstract value is too large, the overflow and
> -| inexact exceptions are raised and an infinity or maximal finite value is
> -| returned. If the abstract value is too small, the input value is rounded
> -| to a subnormal number, and the underflow and inexact exceptions are raised
> -| if the abstract input cannot be represented exactly as a subnormal double-
> -| precision floating-point number.
> -| The input significand `zSig' has its binary point between bits 62
> -| and 61, which is 10 bits to the left of the usual location. This shifted
> -| significand must be normalized or smaller. If `zSig' is not normalized,
> -| `zExp' must be 0; in that case, the result returned is a subnormal number,
> -| and it must not require rounding. In the usual case that `zSig' is
> -| normalized, `zExp' must be 1 less than the ``true'' floating-point exponent.
> -| The handling of underflow and overflow follows the IEC/IEEE Standard for
> -| Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Takes an abstract floating-point value having sign `zSign', exponent `zExp',
> +and significand `zSig', and returns the proper double-precision floating-
> +point value corresponding to the abstract input. Ordinarily, the abstract
> +value is simply rounded and packed into the double-precision format, with
> +the inexact exception raised if the abstract input cannot be represented
> +exactly. However, if the abstract value is too large, the overflow and
> +inexact exceptions are raised and an infinity or maximal finite value is
> +returned. If the abstract value is too small, the input value is rounded
> +to a subnormal number, and the underflow and inexact exceptions are raised
> +if the abstract input cannot be represented exactly as a subnormal double-
> +precision floating-point number.
> + The input significand `zSig' has its binary point between bits 62
> +and 61, which is 10 bits to the left of the usual location. This shifted
> +significand must be normalized or smaller. If `zSig' is not normalized,
> +`zExp' must be 0; in that case, the result returned is a subnormal number,
> +and it must not require rounding. In the usual case that `zSig' is
> +normalized, `zExp' must be 1 less than the ``true'' floating-point exponent.
> +The handling of underflow and overflow follows the IEC/IEEE Standard for
> +Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
>
> static float64 roundAndPackFloat64(flag zSign, int_fast16_t zExp, uint64_t zSig STATUS_PARAM)
> {
> @@ -548,15 +586,16 @@ static float64 roundAndPackFloat64(flag zSign, int_fast16_t zExp, uint64_t zSig
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Takes an abstract floating-point value having sign `zSign', exponent `zExp',
> -| and significand `zSig', and returns the proper double-precision floating-
> -| point value corresponding to the abstract input. This routine is just like
> -| `roundAndPackFloat64' except that `zSig' does not have to be normalized.
> -| Bit 63 of `zSig' must be zero, and `zExp' must be 1 less than the ``true''
> -| floating-point exponent.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Takes an abstract floating-point value having sign `zSign', exponent `zExp',
> +and significand `zSig', and returns the proper double-precision floating-
> +point value corresponding to the abstract input. This routine is just like
> +`roundAndPackFloat64' except that `zSig' does not have to be normalized.
> +Bit 63 of `zSig' must be zero, and `zExp' must be 1 less than the ``true''
> +floating-point exponent.
> +-------------------------------------------------------------------------------
> +*/
> static float64
> normalizeRoundAndPackFloat64(flag zSign, int_fast16_t zExp, uint64_t zSig STATUS_PARAM)
> {
> @@ -567,10 +606,12 @@ static float64
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the fraction bits of the extended double-precision floating-point
> -| value `a'.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns the fraction bits of the extended double-precision floating-point
> +value `a'.
> +-------------------------------------------------------------------------------
> +*/
>
> INLINE uint64_t extractFloatx80Frac( floatx80 a )
> {
> @@ -579,11 +620,12 @@ INLINE uint64_t extractFloatx80Frac( floatx80 a )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the exponent bits of the extended double-precision floating-point
> -| value `a'.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the exponent bits of the extended double-precision floating-point
> +value `a'.
> +-------------------------------------------------------------------------------
> +*/
> INLINE int32 extractFloatx80Exp( floatx80 a )
> {
>
> @@ -591,11 +633,12 @@ INLINE int32 extractFloatx80Exp( floatx80 a )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the sign bit of the extended double-precision floating-point value
> -| `a'.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the sign bit of the extended double-precision floating-point value
> +`a'.
> +-------------------------------------------------------------------------------
> +*/
> INLINE flag extractFloatx80Sign( floatx80 a )
> {
>
> @@ -603,13 +646,14 @@ INLINE flag extractFloatx80Sign( floatx80 a )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Normalizes the subnormal extended double-precision floating-point value
> -| represented by the denormalized significand `aSig'. The normalized exponent
> -| and significand are stored at the locations pointed to by `zExpPtr' and
> -| `zSigPtr', respectively.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Normalizes the subnormal extended double-precision floating-point value
> +represented by the denormalized significand `aSig'. The normalized exponent
> +and significand are stored at the locations pointed to by `zExpPtr' and
> +`zSigPtr', respectively.
> +-------------------------------------------------------------------------------
> +*/
> static void
> normalizeFloatx80Subnormal( uint64_t aSig, int32 *zExpPtr, uint64_t *zSigPtr )
> {
> @@ -621,10 +665,12 @@ static void
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Packs the sign `zSign', exponent `zExp', and significand `zSig' into an
> -| extended double-precision floating-point value, returning the result.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Packs the sign `zSign', exponent `zExp', and significand `zSig' into an
> +extended double-precision floating-point value, returning the result.
> +-------------------------------------------------------------------------------
> +*/
>
> INLINE floatx80 packFloatx80( flag zSign, int32 zExp, uint64_t zSig )
> {
> @@ -636,30 +682,31 @@ INLINE floatx80 packFloatx80( flag zSign, int32 zExp, uint64_t zSig )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Takes an abstract floating-point value having sign `zSign', exponent `zExp',
> -| and extended significand formed by the concatenation of `zSig0' and `zSig1',
> -| and returns the proper extended double-precision floating-point value
> -| corresponding to the abstract input. Ordinarily, the abstract value is
> -| rounded and packed into the extended double-precision format, with the
> -| inexact exception raised if the abstract input cannot be represented
> -| exactly. However, if the abstract value is too large, the overflow and
> -| inexact exceptions are raised and an infinity or maximal finite value is
> -| returned. If the abstract value is too small, the input value is rounded to
> -| a subnormal number, and the underflow and inexact exceptions are raised if
> -| the abstract input cannot be represented exactly as a subnormal extended
> -| double-precision floating-point number.
> -| If `roundingPrecision' is 32 or 64, the result is rounded to the same
> -| number of bits as single or double precision, respectively. Otherwise, the
> -| result is rounded to the full precision of the extended double-precision
> -| format.
> -| The input significand must be normalized or smaller. If the input
> -| significand is not normalized, `zExp' must be 0; in that case, the result
> -| returned is a subnormal number, and it must not require rounding. The
> -| handling of underflow and overflow follows the IEC/IEEE Standard for Binary
> -| Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Takes an abstract floating-point value having sign `zSign', exponent `zExp',
> +and extended significand formed by the concatenation of `zSig0' and `zSig1',
> +and returns the proper extended double-precision floating-point value
> +corresponding to the abstract input. Ordinarily, the abstract value is
> +rounded and packed into the extended double-precision format, with the
> +inexact exception raised if the abstract input cannot be represented
> +exactly. However, if the abstract value is too large, the overflow and
> +inexact exceptions are raised and an infinity or maximal finite value is
> +returned. If the abstract value is too small, the input value is rounded to
> +a subnormal number, and the underflow and inexact exceptions are raised if
> +the abstract input cannot be represented exactly as a subnormal extended
> +double-precision floating-point number.
> + If `roundingPrecision' is 32 or 64, the result is rounded to the same
> +number of bits as single or double precision, respectively. Otherwise, the
> +result is rounded to the full precision of the extended double-precision
> +format.
> + The input significand must be normalized or smaller. If the input
> +significand is not normalized, `zExp' must be 0; in that case, the result
> +returned is a subnormal number, and it must not require rounding. The
> +handling of underflow and overflow follows the IEC/IEEE Standard for Binary
> +Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
> static floatx80
> roundAndPackFloatx80(
> int8 roundingPrecision, flag zSign, int32 zExp, uint64_t zSig0, uint64_t zSig1
> @@ -823,15 +870,16 @@ static floatx80
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Takes an abstract floating-point value having sign `zSign', exponent
> -| `zExp', and significand formed by the concatenation of `zSig0' and `zSig1',
> -| and returns the proper extended double-precision floating-point value
> -| corresponding to the abstract input. This routine is just like
> -| `roundAndPackFloatx80' except that the input significand does not have to be
> -| normalized.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Takes an abstract floating-point value having sign `zSign', exponent
> +`zExp', and significand formed by the concatenation of `zSig0' and `zSig1',
> +and returns the proper extended double-precision floating-point value
> +corresponding to the abstract input. This routine is just like
> +`roundAndPackFloatx80' except that the input significand does not have to be
> +normalized.
> +-------------------------------------------------------------------------------
> +*/
> static floatx80
> normalizeRoundAndPackFloatx80(
> int8 roundingPrecision, flag zSign, int32 zExp, uint64_t zSig0, uint64_t zSig1
> @@ -852,10 +900,12 @@ static floatx80
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the least-significant 64 fraction bits of the quadruple-precision
> -| floating-point value `a'.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns the least-significant 64 fraction bits of the quadruple-precision
> +floating-point value `a'.
> +-------------------------------------------------------------------------------
> +*/
>
> INLINE uint64_t extractFloat128Frac1( float128 a )
> {
> @@ -864,10 +914,12 @@ INLINE uint64_t extractFloat128Frac1( float128 a )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the most-significant 48 fraction bits of the quadruple-precision
> -| floating-point value `a'.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns the most-significant 48 fraction bits of the quadruple-precision
> +floating-point value `a'.
> +-------------------------------------------------------------------------------
> +*/
>
> INLINE uint64_t extractFloat128Frac0( float128 a )
> {
> @@ -876,11 +928,12 @@ INLINE uint64_t extractFloat128Frac0( float128 a )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the exponent bits of the quadruple-precision floating-point value
> -| `a'.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the exponent bits of the quadruple-precision floating-point value
> +`a'.
> +-------------------------------------------------------------------------------
> +*/
> INLINE int32 extractFloat128Exp( float128 a )
> {
>
> @@ -888,10 +941,11 @@ INLINE int32 extractFloat128Exp( float128 a )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the sign bit of the quadruple-precision floating-point value `a'.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the sign bit of the quadruple-precision floating-point value `a'.
> +-------------------------------------------------------------------------------
> +*/
> INLINE flag extractFloat128Sign( float128 a )
> {
>
> @@ -899,16 +953,17 @@ INLINE flag extractFloat128Sign( float128 a )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Normalizes the subnormal quadruple-precision floating-point value
> -| represented by the denormalized significand formed by the concatenation of
> -| `aSig0' and `aSig1'. The normalized exponent is stored at the location
> -| pointed to by `zExpPtr'. The most significant 49 bits of the normalized
> -| significand are stored at the location pointed to by `zSig0Ptr', and the
> -| least significant 64 bits of the normalized significand are stored at the
> -| location pointed to by `zSig1Ptr'.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Normalizes the subnormal quadruple-precision floating-point value
> +represented by the denormalized significand formed by the concatenation of
> +`aSig0' and `aSig1'. The normalized exponent is stored at the location
> +pointed to by `zExpPtr'.
The most significant 49 bits of the normalized
> +significand are stored at the location pointed to by `zSig0Ptr', and the
> +least significant 64 bits of the normalized significand are stored at the
> +location pointed to by `zSig1Ptr'.
> +-------------------------------------------------------------------------------
> +*/
> static void
> normalizeFloat128Subnormal(
> uint64_t aSig0,
> @@ -940,19 +995,20 @@ static void
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Packs the sign `zSign', the exponent `zExp', and the significand formed
> -| by the concatenation of `zSig0' and `zSig1' into a quadruple-precision
> -| floating-point value, returning the result. After being shifted into the
> -| proper positions, the three fields `zSign', `zExp', and `zSig0' are simply
> -| added together to form the most significant 32 bits of the result. This
> -| means that any integer portion of `zSig0' will be added into the exponent.
> -| Since a properly normalized significand will have an integer portion equal
> -| to 1, the `zExp' input should be 1 less than the desired result exponent
> -| whenever `zSig0' and `zSig1' concatenated form a complete, normalized
> -| significand.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Packs the sign `zSign', the exponent `zExp', and the significand formed
> +by the concatenation of `zSig0' and `zSig1' into a quadruple-precision
> +floating-point value, returning the result. After being shifted into the
> +proper positions, the three fields `zSign', `zExp', and `zSig0' are simply
> +added together to form the most significant 32 bits of the result. This
> +means that any integer portion of `zSig0' will be added into the exponent.
> +Since a properly normalized significand will have an integer portion equal > +to 1, the `zExp' input should be 1 less than the desired result exponent > +whenever `zSig0' and `zSig1' concatenated form a complete, normalized > +significand. > +------------------------------------------------------------------------------- > +*/ > INLINE float128 > packFloat128( flag zSign, int32 zExp, uint64_t zSig0, uint64_t zSig1 ) > { > @@ -964,27 +1020,28 @@ INLINE float128 > > } > > -/*---------------------------------------------------------------------------- > -| Takes an abstract floating-point value having sign `zSign', exponent `zExp', > -| and extended significand formed by the concatenation of `zSig0', `zSig1', > -| and `zSig2', and returns the proper quadruple-precision floating-point value > -| corresponding to the abstract input. Ordinarily, the abstract value is > -| simply rounded and packed into the quadruple-precision format, with the > -| inexact exception raised if the abstract input cannot be represented > -| exactly. However, if the abstract value is too large, the overflow and > -| inexact exceptions are raised and an infinity or maximal finite value is > -| returned. If the abstract value is too small, the input value is rounded to > -| a subnormal number, and the underflow and inexact exceptions are raised if > -| the abstract input cannot be represented exactly as a subnormal quadruple- > -| precision floating-point number. > -| The input significand must be normalized or smaller. If the input > -| significand is not normalized, `zExp' must be 0; in that case, the result > -| returned is a subnormal number, and it must not require rounding. In the > -| usual case that the input significand is normalized, `zExp' must be 1 less > -| than the ``true'' floating-point exponent. The handling of underflow and > -| overflow follows the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Takes an abstract floating-point value having sign `zSign', exponent `zExp', > +and extended significand formed by the concatenation of `zSig0', `zSig1', > +and `zSig2', and returns the proper quadruple-precision floating-point value > +corresponding to the abstract input. Ordinarily, the abstract value is > +simply rounded and packed into the quadruple-precision format, with the > +inexact exception raised if the abstract input cannot be represented > +exactly. However, if the abstract value is too large, the overflow and > +inexact exceptions are raised and an infinity or maximal finite value is > +returned. If the abstract value is too small, the input value is rounded to > +a subnormal number, and the underflow and inexact exceptions are raised if > +the abstract input cannot be represented exactly as a subnormal quadruple- > +precision floating-point number. > + The input significand must be normalized or smaller. If the input > +significand is not normalized, `zExp' must be 0; in that case, the result > +returned is a subnormal number, and it must not require rounding. In the > +usual case that the input significand is normalized, `zExp' must be 1 less > +than the ``true'' floating-point exponent. The handling of underflow and > +overflow follows the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > static float128 > roundAndPackFloat128( > flag zSign, int32 zExp, uint64_t zSig0, uint64_t zSig1, uint64_t zSig2 STATUS_PARAM) > @@ -1079,16 +1136,17 @@ static float128 > > } > > -/*---------------------------------------------------------------------------- > -| Takes an abstract floating-point value having sign `zSign', exponent `zExp', > -| and significand formed by the concatenation of `zSig0' and `zSig1', and > -| returns the proper quadruple-precision floating-point value corresponding > -| to the abstract input. This routine is just like `roundAndPackFloat128' > -| except that the input significand has fewer bits and does not have to be > -| normalized. In all cases, `zExp' must be 1 less than the ``true'' floating- > -| point exponent. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Takes an abstract floating-point value having sign `zSign', exponent `zExp', > +and significand formed by the concatenation of `zSig0' and `zSig1', and > +returns the proper quadruple-precision floating-point value corresponding > +to the abstract input. This routine is just like `roundAndPackFloat128' > +except that the input significand has fewer bits and does not have to be > +normalized. In all cases, `zExp' must be 1 less than the ``true'' floating- > +point exponent. > +------------------------------------------------------------------------------- > +*/ > static float128 > normalizeRoundAndPackFloat128( > flag zSign, int32 zExp, uint64_t zSig0, uint64_t zSig1 STATUS_PARAM) > @@ -1115,13 +1173,14 @@ static float128 > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the 32-bit two's complement integer `a' > -| to the single-precision floating-point format. 
The conversion is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > -float32 int32_to_float32( int32 a STATUS_PARAM ) > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the 32-bit two's complement integer `a' > +to the single-precision floating-point format. The conversion is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > +float32 int32_to_float32( int32 a STATUS_PARAM) > { > flag zSign; > > @@ -1132,13 +1191,14 @@ float32 int32_to_float32( int32 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the 32-bit two's complement integer `a' > -| to the double-precision floating-point format. The conversion is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > -float64 int32_to_float64( int32 a STATUS_PARAM ) > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the 32-bit two's complement integer `a' > +to the double-precision floating-point format. The conversion is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > +float64 int32_to_float64( int32 a STATUS_PARAM) > { > flag zSign; > uint32 absA; > @@ -1154,13 +1214,14 @@ float64 int32_to_float64( int32 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the 32-bit two's complement integer `a' > -| to the extended double-precision floating-point format. The conversion > -| is performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the 32-bit two's complement integer `a' > +to the extended double-precision floating-point format. The conversion > +is performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic. > +------------------------------------------------------------------------------- > +*/ > floatx80 int32_to_floatx80( int32 a STATUS_PARAM ) > { > flag zSign; > @@ -1177,12 +1238,13 @@ floatx80 int32_to_floatx80( int32 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the 32-bit two's complement integer `a' to > -| the quadruple-precision floating-point format. The conversion is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the 32-bit two's complement integer `a' to > +the quadruple-precision floating-point format. The conversion is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > float128 int32_to_float128( int32 a STATUS_PARAM ) > { > flag zSign; > @@ -1199,12 +1261,13 @@ float128 int32_to_float128( int32 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the 64-bit two's complement integer `a' > -| to the single-precision floating-point format. The conversion is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the 64-bit two's complement integer `a' > +to the single-precision floating-point format. The conversion is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float32 int64_to_float32( int64 a STATUS_PARAM ) > { > flag zSign; > @@ -1252,12 +1315,13 @@ float32 uint64_to_float32( uint64 a STATUS_PARAM ) > } > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the 64-bit two's complement integer `a' > -| to the double-precision floating-point format. The conversion is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the 64-bit two's complement integer `a' > +to the double-precision floating-point format. The conversion is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > float64 int64_to_float64( int64 a STATUS_PARAM ) > { > flag zSign; > @@ -1285,13 +1349,14 @@ float64 uint64_to_float64(uint64 a STATUS_PARAM) > return normalizeRoundAndPackFloat64(0, exp, a STATUS_VAR); > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the 64-bit two's complement integer `a' > -| to the extended double-precision floating-point format. The conversion > -| is performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the 64-bit two's complement integer `a' > +to the extended double-precision floating-point format. The conversion > +is performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic. > +------------------------------------------------------------------------------- > +*/ > floatx80 int64_to_floatx80( int64 a STATUS_PARAM ) > { > flag zSign; > @@ -1306,12 +1371,13 @@ floatx80 int64_to_floatx80( int64 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the 64-bit two's complement integer `a' to > -| the quadruple-precision floating-point format. The conversion is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the 64-bit two's complement integer `a' to > +the quadruple-precision floating-point format. 
The conversion is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float128 int64_to_float128( int64 a STATUS_PARAM ) > { > flag zSign; > @@ -1347,16 +1413,17 @@ float128 uint64_to_float128(uint64 a STATUS_PARAM) > return normalizeRoundAndPackFloat128(0, 0x406E, a, 0 STATUS_VAR); > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the single-precision floating-point value > -| `a' to the 32-bit two's complement integer format. The conversion is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic---which means in particular that the conversion is rounded > -| according to the current rounding mode. If `a' is a NaN, the largest > -| positive integer is returned. Otherwise, if the conversion overflows, the > -| largest integer with the same sign as `a' is returned. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the single-precision floating-point value > +`a' to the 32-bit two's complement integer format. The conversion is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic---which means in particular that the conversion is rounded > +according to the current rounding mode. If `a' is a NaN, the largest > +positive integer is returned. Otherwise, if the conversion overflows, the > +largest integer with the same sign as `a' is returned. 
> +------------------------------------------------------------------------------- > +*/ > int32 float32_to_int32( float32 a STATUS_PARAM ) > { > flag aSign; > @@ -1378,16 +1445,17 @@ int32 float32_to_int32( float32 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the single-precision floating-point value > -| `a' to the 32-bit two's complement integer format. The conversion is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic, except that the conversion is always rounded toward zero. > -| If `a' is a NaN, the largest positive integer is returned. Otherwise, if > -| the conversion overflows, the largest integer with the same sign as `a' is > -| returned. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the single-precision floating-point value > +`a' to the 32-bit two's complement integer format. The conversion is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic, except that the conversion is always rounded toward zero. > +If `a' is a NaN, the largest positive integer is returned. Otherwise, if > +the conversion overflows, the largest integer with the same sign as `a' is > +returned. > +------------------------------------------------------------------------------- > +*/ > int32 float32_to_int32_round_to_zero( float32 a STATUS_PARAM ) > { > flag aSign; > @@ -1421,15 +1489,17 @@ int32 float32_to_int32_round_to_zero( float32 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the single-precision floating-point value > -| `a' to the 16-bit two's complement integer format. 
The conversion is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic, except that the conversion is always rounded toward zero. > -| If `a' is a NaN, the largest positive integer is returned. Otherwise, if > -| the conversion overflows, the largest integer with the same sign as `a' is > -| returned. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the single-precision floating-point value > +`a' to the 16-bit two's complement integer format. The conversion is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic, except that the conversion is always rounded toward zero. > +If `a' is a NaN, the largest positive integer is returned. Otherwise, if > +the conversion overflows, the largest integer with the same sign as `a' is > +returned. > +------------------------------------------------------------------------------- > +*/ > > int_fast16_t float32_to_int16_round_to_zero(float32 a STATUS_PARAM) > { > @@ -1470,16 +1540,17 @@ int_fast16_t float32_to_int16_round_to_zero(float32 a STATUS_PARAM) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the single-precision floating-point value > -| `a' to the 64-bit two's complement integer format. The conversion is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic---which means in particular that the conversion is rounded > -| according to the current rounding mode. If `a' is a NaN, the largest > -| positive integer is returned. Otherwise, if the conversion overflows, the > -| largest integer with the same sign as `a' is returned. 
> -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the single-precision floating-point value > +`a' to the 64-bit two's complement integer format. The conversion is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic---which means in particular that the conversion is rounded > +according to the current rounding mode. If `a' is a NaN, the largest > +positive integer is returned. Otherwise, if the conversion overflows, the > +largest integer with the same sign as `a' is returned. > +------------------------------------------------------------------------------- > +*/ > int64 float32_to_int64( float32 a STATUS_PARAM ) > { > flag aSign; > @@ -1507,16 +1578,17 @@ int64 float32_to_int64( float32 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the single-precision floating-point value > -| `a' to the 64-bit two's complement integer format. The conversion is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic, except that the conversion is always rounded toward zero. If > -| `a' is a NaN, the largest positive integer is returned. Otherwise, if the > -| conversion overflows, the largest integer with the same sign as `a' is > -| returned. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the single-precision floating-point value > +`a' to the 64-bit two's complement integer format. The conversion is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic, except that the conversion is always rounded toward zero. 
If > +`a' is a NaN, the largest positive integer is returned. Otherwise, if the > +conversion overflows, the largest integer with the same sign as `a' is > +returned. > +------------------------------------------------------------------------------- > +*/ > int64 float32_to_int64_round_to_zero( float32 a STATUS_PARAM ) > { > flag aSign; > @@ -1554,13 +1626,14 @@ int64 float32_to_int64_round_to_zero( float32 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the single-precision floating-point value > -| `a' to the double-precision floating-point format. The conversion is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the single-precision floating-point value > +`a' to the double-precision floating-point format. The conversion is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float64 float32_to_float64( float32 a STATUS_PARAM ) > { > flag aSign; > @@ -1584,13 +1657,14 @@ float64 float32_to_float64( float32 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the single-precision floating-point value > -| `a' to the extended double-precision floating-point format. The conversion > -| is performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic. 
> -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the single-precision floating-point value > +`a' to the extended double-precision floating-point format. The conversion > +is performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic. > +------------------------------------------------------------------------------- > +*/ > floatx80 float32_to_floatx80( float32 a STATUS_PARAM ) > { > flag aSign; > @@ -1614,13 +1688,14 @@ floatx80 float32_to_floatx80( float32 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the single-precision floating-point value > -| `a' to the double-precision floating-point format. The conversion is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the single-precision floating-point value > +`a' to the double-precision floating-point format. The conversion is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float128 float32_to_float128( float32 a STATUS_PARAM ) > { > flag aSign; > @@ -1644,14 +1719,15 @@ float128 float32_to_float128( float32 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Rounds the single-precision floating-point value `a' to an integer, and > -| returns the result as a single-precision floating-point value. 
The > -| operation is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > -float32 float32_round_to_int( float32 a STATUS_PARAM) > +/* > +------------------------------------------------------------------------------- > +Rounds the single-precision floating-point value `a' to an integer, and > +returns the result as a single-precision floating-point value. The > +operation is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > +float32 float32_round_to_int( float32 a STATUS_PARAM ) > { > flag aSign; > int_fast16_t aExp; > @@ -1704,15 +1780,16 @@ float32 float32_round_to_int( float32 a STATUS_PARAM) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of adding the absolute values of the single-precision > -| floating-point values `a' and `b'. If `zSign' is 1, the sum is negated > -| before being returned. `zSign' is ignored if the result is a NaN. > -| The addition is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > -static float32 addFloat32Sigs( float32 a, float32 b, flag zSign STATUS_PARAM) > +/* > +------------------------------------------------------------------------------- > +Returns the result of adding the absolute values of the single-precision > +floating-point values `a' and `b'. If `zSign' is 1, the sum is negated > +before being returned. `zSign' is ignored if the result is a NaN. > +The addition is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > +static float32 addFloat32Sigs( float32 a, float32 b, flag zSign STATUS_PARAM ) > { > int_fast16_t aExp, bExp, zExp; > uint32_t aSig, bSig, zSig; > @@ -1783,15 +1860,16 @@ static float32 addFloat32Sigs( float32 a, float32 b, flag zSign STATUS_PARAM) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of subtracting the absolute values of the single- > -| precision floating-point values `a' and `b'. If `zSign' is 1, the > -| difference is negated before being returned. `zSign' is ignored if the > -| result is a NaN. The subtraction is performed according to the IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > -static float32 subFloat32Sigs( float32 a, float32 b, flag zSign STATUS_PARAM) > +/* > +------------------------------------------------------------------------------- > +Returns the result of subtracting the absolute values of the single- > +precision floating-point values `a' and `b'. If `zSign' is 1, the > +difference is negated before being returned. `zSign' is ignored if the > +result is a NaN. The subtraction is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > +static float32 subFloat32Sigs( float32 a, float32 b, flag zSign STATUS_PARAM ) > { > int_fast16_t aExp, bExp, zExp; > uint32_t aSig, bSig, zSig; > @@ -1858,12 +1936,13 @@ static float32 subFloat32Sigs( float32 a, float32 b, flag zSign STATUS_PARAM) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of adding the single-precision floating-point values `a' > -| and `b'. 
The operation is performed according to the IEC/IEEE Standard for > -| Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of adding the single-precision floating-point values `a' > +and `b'. The operation is performed according to the IEC/IEEE Standard for > +Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float32 float32_add( float32 a, float32 b STATUS_PARAM ) > { > flag aSign, bSign; > @@ -1881,12 +1960,13 @@ float32 float32_add( float32 a, float32 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of subtracting the single-precision floating-point values > -| `a' and `b'. The operation is performed according to the IEC/IEEE Standard > -| for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of subtracting the single-precision floating-point values > +`a' and `b'. The operation is performed according to the IEC/IEEE Standard > +for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float32 float32_sub( float32 a, float32 b STATUS_PARAM ) > { > flag aSign, bSign; > @@ -1904,12 +1984,13 @@ float32 float32_sub( float32 a, float32 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of multiplying the single-precision floating-point values > -| `a' and `b'. The operation is performed according to the IEC/IEEE Standard > -| for Binary Floating-Point Arithmetic. 
> -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of multiplying the single-precision floating-point values > +`a' and `b'. The operation is performed according to the IEC/IEEE Standard > +for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float32 float32_mul( float32 a, float32 b STATUS_PARAM ) > { > flag aSign, bSign, zSign; > @@ -1967,12 +2048,13 @@ float32 float32_mul( float32 a, float32 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of dividing the single-precision floating-point value `a' > -| by the corresponding value `b'. The operation is performed according to the > -| IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of dividing the single-precision floating-point value `a' > +by the corresponding value `b'. The operation is performed according to the > +IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float32 float32_div( float32 a, float32 b STATUS_PARAM ) > { > flag aSign, bSign, zSign; > @@ -2031,12 +2113,13 @@ float32 float32_div( float32 a, float32 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the remainder of the single-precision floating-point value `a' > -| with respect to the corresponding value `b'. The operation is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the remainder of the single-precision floating-point value `a' > +with respect to the corresponding value `b'. The operation is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float32 float32_rem( float32 a, float32 b STATUS_PARAM ) > { > flag aSign, zSign; > @@ -2132,16 +2215,18 @@ float32 float32_rem( float32 a, float32 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of multiplying the single-precision floating-point values > -| `a' and `b' then adding 'c', with no intermediate rounding step after the > -| multiplication. The operation is performed according to the IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic 754-2008. > -| The flags argument allows the caller to select negation of the > -| addend, the intermediate product, or the final result. (The difference > -| between this and having the caller do a separate negation is that negating > -| externally will flip the sign bit on NaNs.) > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the result of multiplying the single-precision floating-point values > +`a' and `b' then adding 'c', with no intermediate rounding step after the > +multiplication. The operation is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic 754-2008. > +The flags argument allows the caller to select negation of the > +addend, the intermediate product, or the final result. 
(The difference > +between this and having the caller do a separate negation is that negating > +externally will flip the sign bit on NaNs.) > +------------------------------------------------------------------------------- > +*/ > > float32 float32_muladd(float32 a, float32 b, float32 c, int flags STATUS_PARAM) > { > @@ -2339,12 +2424,13 @@ float32 float32_muladd(float32 a, float32 b, float32 c, int flags STATUS_PARAM) > } > > > -/*---------------------------------------------------------------------------- > -| Returns the square root of the single-precision floating-point value `a'. > -| The operation is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the square root of the single-precision floating-point value `a'. > +The operation is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float32 float32_sqrt( float32 a STATUS_PARAM ) > { > flag aSign; > @@ -2394,23 +2480,25 @@ float32 float32_sqrt( float32 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the binary exponential of the single-precision floating-point value > -| `a'. The operation is performed according to the IEC/IEEE Standard for > -| Binary Floating-Point Arithmetic. > -| > -| Uses the following identities: > -| > -| 1. ------------------------------------------------------------------------- > -| x x*ln(2) > -| 2 = e > -| > -| 2. ------------------------------------------------------------------------- > -| 2 3 4 5 n > -| x x x x x x x > -| e = 1 + --- + --- + --- + --- + --- + ... + --- + ... > -| 1! 2! 3! 4! 5! n! 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the binary exponential of the single-precision floating-point value > +`a'. The operation is performed according to the IEC/IEEE Standard for > +Binary Floating-Point Arithmetic. > + > +Uses the following identities: > + > +1. ------------------------------------------------------------------------- > + x x*ln(2) > + 2 = e > + > +2. ------------------------------------------------------------------------- > + 2 3 4 5 n > + x x x x x x x > + e = 1 + --- + --- + --- + --- + --- + ... + --- + ... > + 1! 2! 3! 4! 5! n! > +------------------------------------------------------------------------------- > +*/ > > static const float64 float32_exp2_coefficients[15] = > { > @@ -2474,11 +2562,13 @@ float32 float32_exp2( float32 a STATUS_PARAM ) > return float64_to_float32(r, status); > } > > -/*---------------------------------------------------------------------------- > -| Returns the binary log of the single-precision floating-point value `a'. > -| The operation is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the binary log of the single-precision floating-point value `a'. > +The operation is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > float32 float32_log2( float32 a STATUS_PARAM ) > { > flag aSign, zSign; > @@ -2522,12 +2612,14 @@ float32 float32_log2( float32 a STATUS_PARAM ) > return normalizeRoundAndPackFloat32( zSign, 0x85, zSig STATUS_VAR ); > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the single-precision floating-point value `a' is equal to > -| the corresponding value `b', and 0 otherwise. The invalid exception is > -| raised if either operand is a NaN. Otherwise, the comparison is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the single-precision floating-point value `a' is equal to > +the corresponding value `b', and 0 otherwise. The invalid exception is > +raised if either operand is a NaN. Otherwise, the comparison is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float32_eq( float32 a, float32 b STATUS_PARAM ) > { > @@ -2546,12 +2638,14 @@ int float32_eq( float32 a, float32 b STATUS_PARAM ) > return ( av == bv ) || ( (uint32_t) ( ( av | bv )<<1 ) == 0 ); > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the single-precision floating-point value `a' is less than > -| or equal to the corresponding value `b', and 0 otherwise. The invalid > -| exception is raised if either operand is a NaN. The comparison is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the single-precision floating-point value `a' is less than > +or equal to the corresponding value `b', and 0 otherwise. The invalid > +exception is raised if either operand is a NaN. The comparison is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float32_le( float32 a, float32 b STATUS_PARAM ) > { > @@ -2575,12 +2669,14 @@ int float32_le( float32 a, float32 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the single-precision floating-point value `a' is less than > -| the corresponding value `b', and 0 otherwise. The invalid exception is > -| raised if either operand is a NaN. The comparison is performed according > -| to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the single-precision floating-point value `a' is less than > +the corresponding value `b', and 0 otherwise. The invalid exception is > +raised if either operand is a NaN. The comparison is performed according > +to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float32_lt( float32 a, float32 b STATUS_PARAM ) > { > @@ -2604,12 +2700,14 @@ int float32_lt( float32 a, float32 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the single-precision floating-point values `a' and `b' cannot > -| be compared, and 0 otherwise. 
The invalid exception is raised if either > -| operand is a NaN. The comparison is performed according to the IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the single-precision floating-point values `a' and `b' cannot > +be compared, and 0 otherwise. The invalid exception is raised if either > +operand is a NaN. The comparison is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float32_unordered( float32 a, float32 b STATUS_PARAM ) > { > @@ -2625,12 +2723,14 @@ int float32_unordered( float32 a, float32 b STATUS_PARAM ) > return 0; > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the single-precision floating-point value `a' is equal to > -| the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an > -| exception. The comparison is performed according to the IEC/IEEE Standard > -| for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the single-precision floating-point value `a' is equal to > +the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an > +exception. The comparison is performed according to the IEC/IEEE Standard > +for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > > int float32_eq_quiet( float32 a, float32 b STATUS_PARAM ) > { > @@ -2649,12 +2749,14 @@ int float32_eq_quiet( float32 a, float32 b STATUS_PARAM ) > ( (uint32_t) ( ( float32_val(a) | float32_val(b) )<<1 ) == 0 ); > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the single-precision floating-point value `a' is less than or > -| equal to the corresponding value `b', and 0 otherwise. Quiet NaNs do not > -| cause an exception. Otherwise, the comparison is performed according to the > -| IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the single-precision floating-point value `a' is less than or > +equal to the corresponding value `b', and 0 otherwise. Quiet NaNs do not > +cause an exception. Otherwise, the comparison is performed according to the > +IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float32_le_quiet( float32 a, float32 b STATUS_PARAM ) > { > @@ -2680,12 +2782,14 @@ int float32_le_quiet( float32 a, float32 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the single-precision floating-point value `a' is less than > -| the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an > -| exception. Otherwise, the comparison is performed according to the IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic. 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the single-precision floating-point value `a' is less than > +the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an > +exception. Otherwise, the comparison is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float32_lt_quiet( float32 a, float32 b STATUS_PARAM ) > { > @@ -2711,12 +2815,14 @@ int float32_lt_quiet( float32 a, float32 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the single-precision floating-point values `a' and `b' cannot > -| be compared, and 0 otherwise. Quiet NaNs do not cause an exception. The > -| comparison is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the single-precision floating-point values `a' and `b' cannot > +be compared, and 0 otherwise. Quiet NaNs do not cause an exception. The > +comparison is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float32_unordered_quiet( float32 a, float32 b STATUS_PARAM ) > { > @@ -2734,16 +2840,17 @@ int float32_unordered_quiet( float32 a, float32 b STATUS_PARAM ) > return 0; > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the double-precision floating-point value > -| `a' to the 32-bit two's complement integer format. 
The conversion is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic---which means in particular that the conversion is rounded > -| according to the current rounding mode. If `a' is a NaN, the largest > -| positive integer is returned. Otherwise, if the conversion overflows, the > -| largest integer with the same sign as `a' is returned. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the double-precision floating-point value > +`a' to the 32-bit two's complement integer format. The conversion is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic---which means in particular that the conversion is rounded > +according to the current rounding mode. If `a' is a NaN, the largest > +positive integer is returned. Otherwise, if the conversion overflows, the > +largest integer with the same sign as `a' is returned. > +------------------------------------------------------------------------------- > +*/ > int32 float64_to_int32( float64 a STATUS_PARAM ) > { > flag aSign; > @@ -2762,16 +2869,17 @@ int32 float64_to_int32( float64 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the double-precision floating-point value > -| `a' to the 32-bit two's complement integer format. The conversion is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic, except that the conversion is always rounded toward zero. > -| If `a' is a NaN, the largest positive integer is returned. Otherwise, if > -| the conversion overflows, the largest integer with the same sign as `a' is > -| returned. 
> -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the double-precision floating-point value > +`a' to the 32-bit two's complement integer format. The conversion is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic, except that the conversion is always rounded toward zero. > +If `a' is a NaN, the largest positive integer is returned. Otherwise, if > +the conversion overflows, the largest integer with the same sign as `a' is > +returned. > +------------------------------------------------------------------------------- > +*/ > int32 float64_to_int32_round_to_zero( float64 a STATUS_PARAM ) > { > flag aSign; > @@ -2809,15 +2917,17 @@ int32 float64_to_int32_round_to_zero( float64 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the double-precision floating-point value > -| `a' to the 16-bit two's complement integer format. The conversion is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic, except that the conversion is always rounded toward zero. > -| If `a' is a NaN, the largest positive integer is returned. Otherwise, if > -| the conversion overflows, the largest integer with the same sign as `a' is > -| returned. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the double-precision floating-point value > +`a' to the 16-bit two's complement integer format. The conversion is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic, except that the conversion is always rounded toward zero. > +If `a' is a NaN, the largest positive integer is returned. 
Otherwise, if > +the conversion overflows, the largest integer with the same sign as `a' is > +returned. > +------------------------------------------------------------------------------- > +*/ > > int_fast16_t float64_to_int16_round_to_zero(float64 a STATUS_PARAM) > { > @@ -2860,16 +2970,17 @@ int_fast16_t float64_to_int16_round_to_zero(float64 a STATUS_PARAM) > return z; > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the double-precision floating-point value > -| `a' to the 64-bit two's complement integer format. The conversion is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic---which means in particular that the conversion is rounded > -| according to the current rounding mode. If `a' is a NaN, the largest > -| positive integer is returned. Otherwise, if the conversion overflows, the > -| largest integer with the same sign as `a' is returned. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the double-precision floating-point value > +`a' to the 64-bit two's complement integer format. The conversion is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic---which means in particular that the conversion is rounded > +according to the current rounding mode. If `a' is a NaN, the largest > +positive integer is returned. Otherwise, if the conversion overflows, the > +largest integer with the same sign as `a' is returned. 
> +------------------------------------------------------------------------------- > +*/ > int64 float64_to_int64( float64 a STATUS_PARAM ) > { > flag aSign; > @@ -2903,16 +3014,17 @@ int64 float64_to_int64( float64 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the double-precision floating-point value > -| `a' to the 64-bit two's complement integer format. The conversion is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic, except that the conversion is always rounded toward zero. > -| If `a' is a NaN, the largest positive integer is returned. Otherwise, if > -| the conversion overflows, the largest integer with the same sign as `a' is > -| returned. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the double-precision floating-point value > +`a' to the 64-bit two's complement integer format. The conversion is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic, except that the conversion is always rounded toward zero. > +If `a' is a NaN, the largest positive integer is returned. Otherwise, if > +the conversion overflows, the largest integer with the same sign as `a' is > +returned. > +------------------------------------------------------------------------------- > +*/ > int64 float64_to_int64_round_to_zero( float64 a STATUS_PARAM ) > { > flag aSign; > @@ -2956,13 +3068,14 @@ int64 float64_to_int64_round_to_zero( float64 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the double-precision floating-point value > -| `a' to the single-precision floating-point format. 
The conversion is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the double-precision floating-point value > +`a' to the single-precision floating-point format. The conversion is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float32 float64_to_float32( float64 a STATUS_PARAM ) > { > flag aSign; > @@ -2989,16 +3102,18 @@ float32 float64_to_float32( float64 a STATUS_PARAM ) > } > > > -/*---------------------------------------------------------------------------- > -| Packs the sign `zSign', exponent `zExp', and significand `zSig' into a > -| half-precision floating-point value, returning the result. After being > -| shifted into the proper positions, the three fields are simply added > -| together to form the result. This means that any integer portion of `zSig' > -| will be added into the exponent. Since a properly normalized significand > -| will have an integer portion equal to 1, the `zExp' input should be 1 less > -| than the desired result exponent whenever `zSig' is a complete, normalized > -| significand. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Packs the sign `zSign', exponent `zExp', and significand `zSig' into a > +half-precision floating-point value, returning the result. After being > +shifted into the proper positions, the three fields are simply added > +together to form the result. This means that any integer portion of `zSig' > +will be added into the exponent. 
Since a properly normalized significand > +will have an integer portion equal to 1, the `zExp' input should be 1 less > +than the desired result exponent whenever `zSig' is a complete, normalized > +significand. > +------------------------------------------------------------------------------- > +*/ > static float16 packFloat16(flag zSign, int_fast16_t zExp, uint16_t zSig) > { > return make_float16( > @@ -3132,13 +3247,14 @@ float16 float32_to_float16(float32 a, flag ieee STATUS_PARAM) > return packFloat16(aSign, aExp + 14, aSig >> 13); > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the double-precision floating-point value > -| `a' to the extended double-precision floating-point format. The conversion > -| is performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the double-precision floating-point value > +`a' to the extended double-precision floating-point format. The conversion > +is performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic. > +------------------------------------------------------------------------------- > +*/ > floatx80 float64_to_floatx80( float64 a STATUS_PARAM ) > { > flag aSign; > @@ -3163,13 +3279,14 @@ floatx80 float64_to_floatx80( float64 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the double-precision floating-point value > -| `a' to the quadruple-precision floating-point format. The conversion is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic. 
> -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the double-precision floating-point value > +`a' to the quadruple-precision floating-point format. The conversion is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float128 float64_to_float128( float64 a STATUS_PARAM ) > { > flag aSign; > @@ -3194,13 +3311,14 @@ float128 float64_to_float128( float64 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Rounds the double-precision floating-point value `a' to an integer, and > -| returns the result as a double-precision floating-point value. The > -| operation is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Rounds the double-precision floating-point value `a' to an integer, and > +returns the result as a double-precision floating-point value. The > +operation is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float64 float64_round_to_int( float64 a STATUS_PARAM ) > { > flag aSign; > @@ -3267,14 +3385,15 @@ float64 float64_trunc_to_int( float64 a STATUS_PARAM) > return res; > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of adding the absolute values of the double-precision > -| floating-point values `a' and `b'. If `zSign' is 1, the sum is negated > -| before being returned. 
`zSign' is ignored if the result is a NaN. > -| The addition is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of adding the absolute values of the double-precision > +floating-point values `a' and `b'. If `zSign' is 1, the sum is negated > +before being returned. `zSign' is ignored if the result is a NaN. > +The addition is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > static float64 addFloat64Sigs( float64 a, float64 b, flag zSign STATUS_PARAM ) > { > int_fast16_t aExp, bExp, zExp; > @@ -3346,14 +3465,15 @@ static float64 addFloat64Sigs( float64 a, float64 b, flag zSign STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of subtracting the absolute values of the double- > -| precision floating-point values `a' and `b'. If `zSign' is 1, the > -| difference is negated before being returned. `zSign' is ignored if the > -| result is a NaN. The subtraction is performed according to the IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of subtracting the absolute values of the double- > +precision floating-point values `a' and `b'. If `zSign' is 1, the > +difference is negated before being returned. `zSign' is ignored if the > +result is a NaN. The subtraction is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > static float64 subFloat64Sigs( float64 a, float64 b, flag zSign STATUS_PARAM ) > { > int_fast16_t aExp, bExp, zExp; > @@ -3421,12 +3541,13 @@ static float64 subFloat64Sigs( float64 a, float64 b, flag zSign STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of adding the double-precision floating-point values `a' > -| and `b'. The operation is performed according to the IEC/IEEE Standard for > -| Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of adding the double-precision floating-point values `a' > +and `b'. The operation is performed according to the IEC/IEEE Standard for > +Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float64 float64_add( float64 a, float64 b STATUS_PARAM ) > { > flag aSign, bSign; > @@ -3444,12 +3565,13 @@ float64 float64_add( float64 a, float64 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of subtracting the double-precision floating-point values > -| `a' and `b'. The operation is performed according to the IEC/IEEE Standard > -| for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of subtracting the double-precision floating-point values > +`a' and `b'. The operation is performed according to the IEC/IEEE Standard > +for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > float64 float64_sub( float64 a, float64 b STATUS_PARAM ) > { > flag aSign, bSign; > @@ -3467,12 +3589,13 @@ float64 float64_sub( float64 a, float64 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of multiplying the double-precision floating-point values > -| `a' and `b'. The operation is performed according to the IEC/IEEE Standard > -| for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of multiplying the double-precision floating-point values > +`a' and `b'. The operation is performed according to the IEC/IEEE Standard > +for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float64 float64_mul( float64 a, float64 b STATUS_PARAM ) > { > flag aSign, bSign, zSign; > @@ -3528,12 +3651,13 @@ float64 float64_mul( float64 a, float64 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of dividing the double-precision floating-point value `a' > -| by the corresponding value `b'. The operation is performed according to > -| the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of dividing the double-precision floating-point value `a' > +by the corresponding value `b'. The operation is performed according to > +the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > float64 float64_div( float64 a, float64 b STATUS_PARAM ) > { > flag aSign, bSign, zSign; > @@ -3600,12 +3724,13 @@ float64 float64_div( float64 a, float64 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the remainder of the double-precision floating-point value `a' > -| with respect to the corresponding value `b'. The operation is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the remainder of the double-precision floating-point value `a' > +with respect to the corresponding value `b'. The operation is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float64 float64_rem( float64 a, float64 b STATUS_PARAM ) > { > flag aSign, zSign; > @@ -3686,16 +3811,18 @@ float64 float64_rem( float64 a, float64 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of multiplying the double-precision floating-point values > -| `a' and `b' then adding 'c', with no intermediate rounding step after the > -| multiplication. The operation is performed according to the IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic 754-2008. > -| The flags argument allows the caller to select negation of the > -| addend, the intermediate product, or the final result. (The difference > -| between this and having the caller do a separate negation is that negating > -| externally will flip the sign bit on NaNs.) 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the result of multiplying the double-precision floating-point values > +`a' and `b' then adding 'c', with no intermediate rounding step after the > +multiplication. The operation is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic 754-2008. > +The flags argument allows the caller to select negation of the > +addend, the intermediate product, or the final result. (The difference > +between this and having the caller do a separate negation is that negating > +externally will flip the sign bit on NaNs.) > +------------------------------------------------------------------------------- > +*/ > > float64 float64_muladd(float64 a, float64 b, float64 c, int flags STATUS_PARAM) > { > @@ -3912,12 +4039,13 @@ float64 float64_muladd(float64 a, float64 b, float64 c, int flags STATUS_PARAM) > } > } > > -/*---------------------------------------------------------------------------- > -| Returns the square root of the double-precision floating-point value `a'. > -| The operation is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the square root of the double-precision floating-point value `a'. > +The operation is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > float64 float64_sqrt( float64 a STATUS_PARAM ) > { > flag aSign; > @@ -3964,11 +4092,13 @@ float64 float64_sqrt( float64 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the binary log of the double-precision floating-point value `a'. > -| The operation is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the binary log of the double-precision floating-point value `a'. > +The operation is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float64 float64_log2( float64 a STATUS_PARAM ) > { > flag aSign, zSign; > @@ -4011,12 +4141,14 @@ float64 float64_log2( float64 a STATUS_PARAM ) > return normalizeRoundAndPackFloat64( zSign, 0x408, zSig STATUS_VAR ); > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the double-precision floating-point value `a' is equal to the > -| corresponding value `b', and 0 otherwise. The invalid exception is raised > -| if either operand is a NaN. Otherwise, the comparison is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the double-precision floating-point value `a' is equal to the > +corresponding value `b', and 0 otherwise. The invalid exception is raised > +if either operand is a NaN. 
Otherwise, the comparison is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float64_eq( float64 a, float64 b STATUS_PARAM ) > { > @@ -4036,12 +4168,14 @@ int float64_eq( float64 a, float64 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the double-precision floating-point value `a' is less than or > -| equal to the corresponding value `b', and 0 otherwise. The invalid > -| exception is raised if either operand is a NaN. The comparison is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the double-precision floating-point value `a' is less than or > +equal to the corresponding value `b', and 0 otherwise. The invalid > +exception is raised if either operand is a NaN. The comparison is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float64_le( float64 a, float64 b STATUS_PARAM ) > { > @@ -4065,12 +4199,14 @@ int float64_le( float64 a, float64 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the double-precision floating-point value `a' is less than > -| the corresponding value `b', and 0 otherwise. The invalid exception is > -| raised if either operand is a NaN. The comparison is performed according > -| to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the double-precision floating-point value `a' is less than > +the corresponding value `b', and 0 otherwise. The invalid exception is > +raised if either operand is a NaN. The comparison is performed according > +to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float64_lt( float64 a, float64 b STATUS_PARAM ) > { > @@ -4094,12 +4230,14 @@ int float64_lt( float64 a, float64 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the double-precision floating-point values `a' and `b' cannot > -| be compared, and 0 otherwise. The invalid exception is raised if either > -| operand is a NaN. The comparison is performed according to the IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the double-precision floating-point values `a' and `b' cannot > +be compared, and 0 otherwise. The invalid exception is raised if either > +operand is a NaN. The comparison is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float64_unordered( float64 a, float64 b STATUS_PARAM ) > { > @@ -4115,12 +4253,14 @@ int float64_unordered( float64 a, float64 b STATUS_PARAM ) > return 0; > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the double-precision floating-point value `a' is equal to the > -| corresponding value `b', and 0 otherwise. 
Quiet NaNs do not cause an > -| exception.The comparison is performed according to the IEC/IEEE Standard > -| for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the double-precision floating-point value `a' is equal to the > +corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an > +exception.The comparison is performed according to the IEC/IEEE Standard > +for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float64_eq_quiet( float64 a, float64 b STATUS_PARAM ) > { > @@ -4142,12 +4282,14 @@ int float64_eq_quiet( float64 a, float64 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the double-precision floating-point value `a' is less than or > -| equal to the corresponding value `b', and 0 otherwise. Quiet NaNs do not > -| cause an exception. Otherwise, the comparison is performed according to the > -| IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the double-precision floating-point value `a' is less than or > +equal to the corresponding value `b', and 0 otherwise. Quiet NaNs do not > +cause an exception. Otherwise, the comparison is performed according to the > +IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > > int float64_le_quiet( float64 a, float64 b STATUS_PARAM ) > { > @@ -4173,12 +4315,14 @@ int float64_le_quiet( float64 a, float64 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the double-precision floating-point value `a' is less than > -| the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an > -| exception. Otherwise, the comparison is performed according to the IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the double-precision floating-point value `a' is less than > +the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an > +exception. Otherwise, the comparison is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float64_lt_quiet( float64 a, float64 b STATUS_PARAM ) > { > @@ -4204,12 +4348,14 @@ int float64_lt_quiet( float64 a, float64 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the double-precision floating-point values `a' and `b' cannot > -| be compared, and 0 otherwise. Quiet NaNs do not cause an exception. The > -| comparison is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the double-precision floating-point values `a' and `b' cannot > +be compared, and 0 otherwise. Quiet NaNs do not cause an exception. 
The > +comparison is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float64_unordered_quiet( float64 a, float64 b STATUS_PARAM ) > { > @@ -4227,16 +4373,17 @@ int float64_unordered_quiet( float64 a, float64 b STATUS_PARAM ) > return 0; > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the extended double-precision floating- > -| point value `a' to the 32-bit two's complement integer format. The > -| conversion is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic---which means in particular that the conversion > -| is rounded according to the current rounding mode. If `a' is a NaN, the > -| largest positive integer is returned. Otherwise, if the conversion > -| overflows, the largest integer with the same sign as `a' is returned. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the extended double-precision floating- > +point value `a' to the 32-bit two's complement integer format. The > +conversion is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic---which means in particular that the conversion > +is rounded according to the current rounding mode. If `a' is a NaN, the > +largest positive integer is returned. Otherwise, if the conversion > +overflows, the largest integer with the same sign as `a' is returned. 
> +------------------------------------------------------------------------------- > +*/ > int32 floatx80_to_int32( floatx80 a STATUS_PARAM ) > { > flag aSign; > @@ -4254,16 +4401,17 @@ int32 floatx80_to_int32( floatx80 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the extended double-precision floating- > -| point value `a' to the 32-bit two's complement integer format. The > -| conversion is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic, except that the conversion is always rounded > -| toward zero. If `a' is a NaN, the largest positive integer is returned. > -| Otherwise, if the conversion overflows, the largest integer with the same > -| sign as `a' is returned. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the extended double-precision floating- > +point value `a' to the 32-bit two's complement integer format. The > +conversion is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic, except that the conversion is always rounded > +toward zero. If `a' is a NaN, the largest positive integer is returned. > +Otherwise, if the conversion overflows, the largest integer with the same > +sign as `a' is returned. > +------------------------------------------------------------------------------- > +*/ > int32 floatx80_to_int32_round_to_zero( floatx80 a STATUS_PARAM ) > { > flag aSign; > @@ -4299,16 +4447,17 @@ int32 floatx80_to_int32_round_to_zero( floatx80 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the extended double-precision floating- > -| point value `a' to the 64-bit two's complement integer format. 
The > -| conversion is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic---which means in particular that the conversion > -| is rounded according to the current rounding mode. If `a' is a NaN, > -| the largest positive integer is returned. Otherwise, if the conversion > -| overflows, the largest integer with the same sign as `a' is returned. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the extended double-precision floating- > +point value `a' to the 64-bit two's complement integer format. The > +conversion is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic---which means in particular that the conversion > +is rounded according to the current rounding mode. If `a' is a NaN, > +the largest positive integer is returned. Otherwise, if the conversion > +overflows, the largest integer with the same sign as `a' is returned. > +------------------------------------------------------------------------------- > +*/ > int64 floatx80_to_int64( floatx80 a STATUS_PARAM ) > { > flag aSign; > @@ -4339,16 +4488,17 @@ int64 floatx80_to_int64( floatx80 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the extended double-precision floating- > -| point value `a' to the 64-bit two's complement integer format. The > -| conversion is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic, except that the conversion is always rounded > -| toward zero. If `a' is a NaN, the largest positive integer is returned. > -| Otherwise, if the conversion overflows, the largest integer with the same > -| sign as `a' is returned. 
> -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the extended double-precision floating- > +point value `a' to the 64-bit two's complement integer format. The > +conversion is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic, except that the conversion is always rounded > +toward zero. If `a' is a NaN, the largest positive integer is returned. > +Otherwise, if the conversion overflows, the largest integer with the same > +sign as `a' is returned. > +------------------------------------------------------------------------------- > +*/ > int64 floatx80_to_int64_round_to_zero( floatx80 a STATUS_PARAM ) > { > flag aSign; > @@ -4383,13 +4533,14 @@ int64 floatx80_to_int64_round_to_zero( floatx80 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the extended double-precision floating- > -| point value `a' to the single-precision floating-point format. The > -| conversion is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the extended double-precision floating- > +point value `a' to the single-precision floating-point format. The > +conversion is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > float32 floatx80_to_float32( floatx80 a STATUS_PARAM ) > { > flag aSign; > @@ -4411,13 +4562,14 @@ float32 floatx80_to_float32( floatx80 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the extended double-precision floating- > -| point value `a' to the double-precision floating-point format. The > -| conversion is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the extended double-precision floating- > +point value `a' to the double-precision floating-point format. The > +conversion is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float64 floatx80_to_float64( floatx80 a STATUS_PARAM ) > { > flag aSign; > @@ -4439,13 +4591,14 @@ float64 floatx80_to_float64( floatx80 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the extended double-precision floating- > -| point value `a' to the quadruple-precision floating-point format. The > -| conversion is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the extended double-precision floating- > +point value `a' to the quadruple-precision floating-point format. 
The > +conversion is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float128 floatx80_to_float128( floatx80 a STATUS_PARAM ) > { > flag aSign; > @@ -4463,13 +4616,14 @@ float128 floatx80_to_float128( floatx80 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Rounds the extended double-precision floating-point value `a' to an integer, > -| and returns the result as an extended quadruple-precision floating-point > -| value. The operation is performed according to the IEC/IEEE Standard for > -| Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Rounds the extended double-precision floating-point value `a' to an integer, > +and returns the result as an extended quadruple-precision floating-point > +value. The operation is performed according to the IEC/IEEE Standard for > +Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > floatx80 floatx80_round_to_int( floatx80 a STATUS_PARAM ) > { > flag aSign; > @@ -4536,14 +4690,15 @@ floatx80 floatx80_round_to_int( floatx80 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of adding the absolute values of the extended double- > -| precision floating-point values `a' and `b'. If `zSign' is 1, the sum is > -| negated before being returned. `zSign' is ignored if the result is a NaN. > -| The addition is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. 
> -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of adding the absolute values of the extended double- > +precision floating-point values `a' and `b'. If `zSign' is 1, the sum is > +negated before being returned. `zSign' is ignored if the result is a NaN. > +The addition is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > static floatx80 addFloatx80Sigs( floatx80 a, floatx80 b, flag zSign STATUS_PARAM) > { > int32 aExp, bExp, zExp; > @@ -4602,14 +4757,15 @@ static floatx80 addFloatx80Sigs( floatx80 a, floatx80 b, flag zSign STATUS_PARAM > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of subtracting the absolute values of the extended > -| double-precision floating-point values `a' and `b'. If `zSign' is 1, the > -| difference is negated before being returned. `zSign' is ignored if the > -| result is a NaN. The subtraction is performed according to the IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of subtracting the absolute values of the extended > +double-precision floating-point values `a' and `b'. If `zSign' is 1, the > +difference is negated before being returned. `zSign' is ignored if the > +result is a NaN. The subtraction is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > static floatx80 subFloatx80Sigs( floatx80 a, floatx80 b, flag zSign STATUS_PARAM ) > { > int32 aExp, bExp, zExp; > @@ -4670,12 +4826,13 @@ static floatx80 subFloatx80Sigs( floatx80 a, floatx80 b, flag zSign STATUS_PARAM > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of adding the extended double-precision floating-point > -| values `a' and `b'. The operation is performed according to the IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of adding the extended double-precision floating-point > +values `a' and `b'. The operation is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > floatx80 floatx80_add( floatx80 a, floatx80 b STATUS_PARAM ) > { > flag aSign, bSign; > @@ -4691,12 +4848,13 @@ floatx80 floatx80_add( floatx80 a, floatx80 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of subtracting the extended double-precision floating- > -| point values `a' and `b'. The operation is performed according to the > -| IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of subtracting the extended double-precision floating- > +point values `a' and `b'. The operation is performed according to the > +IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > floatx80 floatx80_sub( floatx80 a, floatx80 b STATUS_PARAM ) > { > flag aSign, bSign; > @@ -4712,12 +4870,13 @@ floatx80 floatx80_sub( floatx80 a, floatx80 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of multiplying the extended double-precision floating- > -| point values `a' and `b'. The operation is performed according to the > -| IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of multiplying the extended double-precision floating- > +point values `a' and `b'. The operation is performed according to the > +IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > floatx80 floatx80_mul( floatx80 a, floatx80 b STATUS_PARAM ) > { > flag aSign, bSign, zSign; > @@ -4771,12 +4930,13 @@ floatx80 floatx80_mul( floatx80 a, floatx80 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of dividing the extended double-precision floating-point > -| value `a' by the corresponding value `b'. The operation is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of dividing the extended double-precision floating-point > +value `a' by the corresponding value `b'. The operation is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > floatx80 floatx80_div( floatx80 a, floatx80 b STATUS_PARAM ) > { > flag aSign, bSign, zSign; > @@ -4851,12 +5011,13 @@ floatx80 floatx80_div( floatx80 a, floatx80 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the remainder of the extended double-precision floating-point value > -| `a' with respect to the corresponding value `b'. The operation is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the remainder of the extended double-precision floating-point value > +`a' with respect to the corresponding value `b'. The operation is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > floatx80 floatx80_rem( floatx80 a, floatx80 b STATUS_PARAM ) > { > flag aSign, zSign; > @@ -4947,12 +5108,13 @@ floatx80 floatx80_rem( floatx80 a, floatx80 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the square root of the extended double-precision floating-point > -| value `a'. The operation is performed according to the IEC/IEEE Standard > -| for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the square root of the extended double-precision floating-point > +value `a'. The operation is performed according to the IEC/IEEE Standard > +for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > floatx80 floatx80_sqrt( floatx80 a STATUS_PARAM ) > { > flag aSign; > @@ -5017,12 +5179,14 @@ floatx80 floatx80_sqrt( floatx80 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the extended double-precision floating-point value `a' is equal > -| to the corresponding value `b', and 0 otherwise. The invalid exception is > -| raised if either operand is a NaN. Otherwise, the comparison is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the extended double-precision floating-point value `a' is equal > +to the corresponding value `b', and 0 otherwise. The invalid exception is > +raised if either operand is a NaN. Otherwise, the comparison is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int floatx80_eq( floatx80 a, floatx80 b STATUS_PARAM ) > { > @@ -5044,13 +5208,15 @@ int floatx80_eq( floatx80 a, floatx80 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the extended double-precision floating-point value `a' is > -| less than or equal to the corresponding value `b', and 0 otherwise. The > -| invalid exception is raised if either operand is a NaN. The comparison is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic. 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the extended double-precision floating-point value `a' is > +less than or equal to the corresponding value `b', and 0 otherwise. The > +invalid exception is raised if either operand is a NaN. The comparison is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int floatx80_le( floatx80 a, floatx80 b STATUS_PARAM ) > { > @@ -5078,12 +5244,14 @@ int floatx80_le( floatx80 a, floatx80 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the extended double-precision floating-point value `a' is > -| less than the corresponding value `b', and 0 otherwise. The invalid > -| exception is raised if either operand is a NaN. The comparison is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the extended double-precision floating-point value `a' is > +less than the corresponding value `b', and 0 otherwise. The invalid > +exception is raised if either operand is a NaN. The comparison is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> +-------------------------------------------------------------------------------
> +*/
>
>  int floatx80_lt( floatx80 a, floatx80 b STATUS_PARAM )
>  {
> @@ -5111,12 +5279,14 @@ int floatx80_lt( floatx80 a, floatx80 b STATUS_PARAM )
>
>  }
>
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the extended double-precision floating-point values `a' and `b'
> -| cannot be compared, and 0 otherwise. The invalid exception is raised if
> -| either operand is a NaN. The comparison is performed according to the
> -| IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns 1 if the extended double-precision floating-point values `a' and `b'
> +cannot be compared, and 0 otherwise. The invalid exception is raised if
> +either operand is a NaN. The comparison is performed according to the
> +IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
>  int floatx80_unordered( floatx80 a, floatx80 b STATUS_PARAM )
>  {
>      if ( ( ( extractFloatx80Exp( a ) == 0x7FFF )
> @@ -5130,12 +5300,14 @@ int floatx80_unordered( floatx80 a, floatx80 b STATUS_PARAM )
>      return 0;
>  }
>
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the extended double-precision floating-point value `a' is
> -| equal to the corresponding value `b', and 0 otherwise. Quiet NaNs do not
> -| cause an exception. The comparison is performed according to the IEC/IEEE
> -| Standard for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns 1 if the extended double-precision floating-point value `a' is
> +equal to the corresponding value `b', and 0 otherwise. Quiet NaNs do not
> +cause an exception. The comparison is performed according to the IEC/IEEE
> +Standard for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
>
>  int floatx80_eq_quiet( floatx80 a, floatx80 b STATUS_PARAM )
>  {
> @@ -5160,12 +5332,14 @@ int floatx80_eq_quiet( floatx80 a, floatx80 b STATUS_PARAM )
>
>  }
>
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the extended double-precision floating-point value `a' is less
> -| than or equal to the corresponding value `b', and 0 otherwise. Quiet NaNs
> -| do not cause an exception. Otherwise, the comparison is performed according
> -| to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns 1 if the extended double-precision floating-point value `a' is less
> +than or equal to the corresponding value `b', and 0 otherwise. Quiet NaNs
> +do not cause an exception. Otherwise, the comparison is performed according
> +to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
>
>  int floatx80_le_quiet( floatx80 a, floatx80 b STATUS_PARAM )
>  {
> @@ -5196,12 +5370,14 @@ int floatx80_le_quiet( floatx80 a, floatx80 b STATUS_PARAM )
>
>  }
>
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the extended double-precision floating-point value `a' is less
> -| than the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause
> -| an exception. Otherwise, the comparison is performed according to the
> -| IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns 1 if the extended double-precision floating-point value `a' is less
> +than the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause
> +an exception. Otherwise, the comparison is performed according to the
> +IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
>
>  int floatx80_lt_quiet( floatx80 a, floatx80 b STATUS_PARAM )
>  {
> @@ -5232,12 +5408,14 @@ int floatx80_lt_quiet( floatx80 a, floatx80 b STATUS_PARAM )
>
>  }
>
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the extended double-precision floating-point values `a' and `b'
> -| cannot be compared, and 0 otherwise. Quiet NaNs do not cause an exception.
> -| The comparison is performed according to the IEC/IEEE Standard for Binary
> -| Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns 1 if the extended double-precision floating-point values `a' and `b'
> +cannot be compared, and 0 otherwise. Quiet NaNs do not cause an exception.
> +The comparison is performed according to the IEC/IEEE Standard for Binary
> +Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
>  int floatx80_unordered_quiet( floatx80 a, floatx80 b STATUS_PARAM )
>  {
>      if ( ( ( extractFloatx80Exp( a ) == 0x7FFF )
> @@ -5254,16 +5432,17 @@ int floatx80_unordered_quiet( floatx80 a, floatx80 b STATUS_PARAM )
>      return 0;
>  }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the quadruple-precision floating-point
> -| value `a' to the 32-bit two's complement integer format. The conversion
> -| is performed according to the IEC/IEEE Standard for Binary Floating-Point
> -| Arithmetic---which means in particular that the conversion is rounded
> -| according to the current rounding mode. If `a' is a NaN, the largest
> -| positive integer is returned. Otherwise, if the conversion overflows, the
> -| largest integer with the same sign as `a' is returned.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of converting the quadruple-precision floating-point
> +value `a' to the 32-bit two's complement integer format. The conversion
> +is performed according to the IEC/IEEE Standard for Binary Floating-Point
> +Arithmetic---which means in particular that the conversion is rounded
> +according to the current rounding mode. If `a' is a NaN, the largest
> +positive integer is returned. Otherwise, if the conversion overflows, the
> +largest integer with the same sign as `a' is returned.
> +-------------------------------------------------------------------------------
> +*/
>  int32 float128_to_int32( float128 a STATUS_PARAM )
>  {
>      flag aSign;
> @@ -5283,16 +5462,17 @@ int32 float128_to_int32( float128 a STATUS_PARAM )
>
>  }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the quadruple-precision floating-point
> -| value `a' to the 32-bit two's complement integer format. The conversion
> -| is performed according to the IEC/IEEE Standard for Binary Floating-Point
> -| Arithmetic, except that the conversion is always rounded toward zero. If
> -| `a' is a NaN, the largest positive integer is returned. Otherwise, if the
> -| conversion overflows, the largest integer with the same sign as `a' is
> -| returned.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of converting the quadruple-precision floating-point
> +value `a' to the 32-bit two's complement integer format. The conversion
> +is performed according to the IEC/IEEE Standard for Binary Floating-Point
> +Arithmetic, except that the conversion is always rounded toward zero. If
> +`a' is a NaN, the largest positive integer is returned. Otherwise, if the
> +conversion overflows, the largest integer with the same sign as `a' is
> +returned.
> +-------------------------------------------------------------------------------
> +*/
>  int32 float128_to_int32_round_to_zero( float128 a STATUS_PARAM )
>  {
>      flag aSign;
> @@ -5331,16 +5511,17 @@ int32 float128_to_int32_round_to_zero( float128 a STATUS_PARAM )
>
>  }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the quadruple-precision floating-point
> -| value `a' to the 64-bit two's complement integer format. The conversion
> -| is performed according to the IEC/IEEE Standard for Binary Floating-Point
> -| Arithmetic---which means in particular that the conversion is rounded
> -| according to the current rounding mode. If `a' is a NaN, the largest
> -| positive integer is returned. Otherwise, if the conversion overflows, the
> -| largest integer with the same sign as `a' is returned.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of converting the quadruple-precision floating-point
> +value `a' to the 64-bit two's complement integer format. The conversion
> +is performed according to the IEC/IEEE Standard for Binary Floating-Point
> +Arithmetic---which means in particular that the conversion is rounded
> +according to the current rounding mode. If `a' is a NaN, the largest
> +positive integer is returned. Otherwise, if the conversion overflows, the
> +largest integer with the same sign as `a' is returned.
> +-------------------------------------------------------------------------------
> +*/
>  int64 float128_to_int64( float128 a STATUS_PARAM )
>  {
>      flag aSign;
> @@ -5374,16 +5555,17 @@ int64 float128_to_int64( float128 a STATUS_PARAM )
>
>  }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the quadruple-precision floating-point
> -| value `a' to the 64-bit two's complement integer format. The conversion
> -| is performed according to the IEC/IEEE Standard for Binary Floating-Point
> -| Arithmetic, except that the conversion is always rounded toward zero.
> -| If `a' is a NaN, the largest positive integer is returned. Otherwise, if
> -| the conversion overflows, the largest integer with the same sign as `a' is
> -| returned.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of converting the quadruple-precision floating-point
> +value `a' to the 64-bit two's complement integer format. The conversion
> +is performed according to the IEC/IEEE Standard for Binary Floating-Point
> +Arithmetic, except that the conversion is always rounded toward zero.
> +If `a' is a NaN, the largest positive integer is returned. Otherwise, if
> +the conversion overflows, the largest integer with the same sign as `a' is
> +returned.
> +-------------------------------------------------------------------------------
> +*/
>  int64 float128_to_int64_round_to_zero( float128 a STATUS_PARAM )
>  {
>      flag aSign;
> @@ -5435,13 +5617,14 @@ int64 float128_to_int64_round_to_zero( float128 a STATUS_PARAM )
>
>  }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the quadruple-precision floating-point
> -| value `a' to the single-precision floating-point format. The conversion
> -| is performed according to the IEC/IEEE Standard for Binary Floating-Point
> -| Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of converting the quadruple-precision floating-point
> +value `a' to the single-precision floating-point format. The conversion
> +is performed according to the IEC/IEEE Standard for Binary Floating-Point
> +Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
>  float32 float128_to_float32( float128 a STATUS_PARAM )
>  {
>      flag aSign;
> @@ -5470,13 +5653,14 @@ float32 float128_to_float32( float128 a STATUS_PARAM )
>
>  }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the quadruple-precision floating-point
> -| value `a' to the double-precision floating-point format. The conversion
> -| is performed according to the IEC/IEEE Standard for Binary Floating-Point
> -| Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of converting the quadruple-precision floating-point
> +value `a' to the double-precision floating-point format. The conversion
> +is performed according to the IEC/IEEE Standard for Binary Floating-Point
> +Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
>  float64 float128_to_float64( float128 a STATUS_PARAM )
>  {
>      flag aSign;
> @@ -5503,13 +5687,14 @@ float64 float128_to_float64( float128 a STATUS_PARAM )
>
>  }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the quadruple-precision floating-point
> -| value `a' to the extended double-precision floating-point format. The
> -| conversion is performed according to the IEC/IEEE Standard for Binary
> -| Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of converting the quadruple-precision floating-point
> +value `a' to the extended double-precision floating-point format. The
> +conversion is performed according to the IEC/IEEE Standard for Binary
> +Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
>  floatx80 float128_to_floatx80( float128 a STATUS_PARAM )
>  {
>      flag aSign;
> @@ -5538,13 +5723,14 @@ floatx80 float128_to_floatx80( float128 a STATUS_PARAM )
>
>  }
>
> -/*----------------------------------------------------------------------------
> -| Rounds the quadruple-precision floating-point value `a' to an integer, and
> -| returns the result as a quadruple-precision floating-point value. The
> -| operation is performed according to the IEC/IEEE Standard for Binary
> -| Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Rounds the quadruple-precision floating-point value `a' to an integer, and
> +returns the result as a quadruple-precision floating-point value. The
> +operation is performed according to the IEC/IEEE Standard for Binary
> +Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
>  float128 float128_round_to_int( float128 a STATUS_PARAM )
>  {
>      flag aSign;
> @@ -5641,14 +5827,15 @@ float128 float128_round_to_int( float128 a STATUS_PARAM )
>
>  }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of adding the absolute values of the quadruple-precision
> -| floating-point values `a' and `b'. If `zSign' is 1, the sum is negated
> -| before being returned. `zSign' is ignored if the result is a NaN.
> -| The addition is performed according to the IEC/IEEE Standard for Binary
> -| Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of adding the absolute values of the quadruple-precision
> +floating-point values `a' and `b'. If `zSign' is 1, the sum is negated
> +before being returned. `zSign' is ignored if the result is a NaN.
> +The addition is performed according to the IEC/IEEE Standard for Binary
> +Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
>  static float128 addFloat128Sigs( float128 a, float128 b, flag zSign STATUS_PARAM)
>  {
>      int32 aExp, bExp, zExp;
> @@ -5727,14 +5914,15 @@ static float128 addFloat128Sigs( float128 a, float128 b, flag zSign STATUS_PARAM
>
>  }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of subtracting the absolute values of the quadruple-
> -| precision floating-point values `a' and `b'. If `zSign' is 1, the
> -| difference is negated before being returned. `zSign' is ignored if the
> -| result is a NaN. The subtraction is performed according to the IEC/IEEE
> -| Standard for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of subtracting the absolute values of the quadruple-
> +precision floating-point values `a' and `b'. If `zSign' is 1, the
> +difference is negated before being returned. `zSign' is ignored if the
> +result is a NaN. The subtraction is performed according to the IEC/IEEE
> +Standard for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
>  static float128 subFloat128Sigs( float128 a, float128 b, flag zSign STATUS_PARAM)
>  {
>      int32 aExp, bExp, zExp;
> @@ -5811,12 +5999,13 @@ static float128 subFloat128Sigs( float128 a, float128 b, flag zSign STATUS_PARAM
>
>  }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of adding the quadruple-precision floating-point values
> -| `a' and `b'. The operation is performed according to the IEC/IEEE Standard
> -| for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of adding the quadruple-precision floating-point values
> +`a' and `b'. The operation is performed according to the IEC/IEEE Standard
> +for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
>  float128 float128_add( float128 a, float128 b STATUS_PARAM )
>  {
>      flag aSign, bSign;
> @@ -5832,12 +6021,13 @@ float128 float128_add( float128 a, float128 b STATUS_PARAM )
>
>  }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of subtracting the quadruple-precision floating-point
> -| values `a' and `b'. The operation is performed according to the IEC/IEEE
> -| Standard for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of subtracting the quadruple-precision floating-point
> +values `a' and `b'. The operation is performed according to the IEC/IEEE
> +Standard for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
>  float128 float128_sub( float128 a, float128 b STATUS_PARAM )
>  {
>      flag aSign, bSign;
> @@ -5853,12 +6043,13 @@ float128 float128_sub( float128 a, float128 b STATUS_PARAM )
>
>  }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of multiplying the quadruple-precision floating-point
> -| values `a' and `b'. The operation is performed according to the IEC/IEEE
> -| Standard for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of multiplying the quadruple-precision floating-point
> +values `a' and `b'. The operation is performed according to the IEC/IEEE
> +Standard for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
>  float128 float128_mul( float128 a, float128 b STATUS_PARAM )
>  {
>      flag aSign, bSign, zSign;
> @@ -5917,12 +6108,13 @@ float128 float128_mul( float128 a, float128 b STATUS_PARAM )
>
>  }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of dividing the quadruple-precision floating-point value
> -| `a' by the corresponding value `b'. The operation is performed according to
> -| the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of dividing the quadruple-precision floating-point value
> +`a' by the corresponding value `b'. The operation is performed according to
> +the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
>  float128 float128_div( float128 a, float128 b STATUS_PARAM )
>  {
>      flag aSign, bSign, zSign;
> @@ -6001,12 +6193,13 @@ float128 float128_div( float128 a, float128 b STATUS_PARAM )
>
>  }
>
> -/*----------------------------------------------------------------------------
> -| Returns the remainder of the quadruple-precision floating-point value `a'
> -| with respect to the corresponding value `b'. The operation is performed
> -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the remainder of the quadruple-precision floating-point value `a'
> +with respect to the corresponding value `b'. The operation is performed
> +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
>  float128 float128_rem( float128 a, float128 b STATUS_PARAM )
>  {
>      flag aSign, zSign;
> @@ -6110,12 +6303,13 @@ float128 float128_rem( float128 a, float128 b STATUS_PARAM )
>
>  }
>
> -/*----------------------------------------------------------------------------
> -| Returns the square root of the quadruple-precision floating-point value `a'.
> -| The operation is performed according to the IEC/IEEE Standard for Binary
> -| Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the square root of the quadruple-precision floating-point value `a'.
> +The operation is performed according to the IEC/IEEE Standard for Binary
> +Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
>  float128 float128_sqrt( float128 a STATUS_PARAM )
>  {
>      flag aSign;
> @@ -6179,12 +6373,14 @@ float128 float128_sqrt( float128 a STATUS_PARAM )
>
>  }
>
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the quadruple-precision floating-point value `a' is equal to
> -| the corresponding value `b', and 0 otherwise. The invalid exception is
> -| raised if either operand is a NaN. Otherwise, the comparison is performed
> -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns 1 if the quadruple-precision floating-point value `a' is equal to
> +the corresponding value `b', and 0 otherwise. The invalid exception is
> +raised if either operand is a NaN. Otherwise, the comparison is performed
> +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
>
>  int float128_eq( float128 a, float128 b STATUS_PARAM )
>  {
> @@ -6206,12 +6402,14 @@ int float128_eq( float128 a, float128 b STATUS_PARAM )
>
>  }
>
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the quadruple-precision floating-point value `a' is less than
> -| or equal to the corresponding value `b', and 0 otherwise. The invalid
> -| exception is raised if either operand is a NaN. The comparison is performed
> -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns 1 if the quadruple-precision floating-point value `a' is less than
> +or equal to the corresponding value `b', and 0 otherwise. The invalid
> +exception is raised if either operand is a NaN. The comparison is performed
> +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
>
>  int float128_le( float128 a, float128 b STATUS_PARAM )
>  {
> @@ -6239,12 +6437,14 @@ int float128_le( float128 a, float128 b STATUS_PARAM )
>
>  }
>
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the quadruple-precision floating-point value `a' is less than
> -| the corresponding value `b', and 0 otherwise. The invalid exception is
> -| raised if either operand is a NaN. The comparison is performed according
> -| to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns 1 if the quadruple-precision floating-point value `a' is less than
> +the corresponding value `b', and 0 otherwise. The invalid exception is
> +raised if either operand is a NaN. The comparison is performed according
> +to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
>
>  int float128_lt( float128 a, float128 b STATUS_PARAM )
>  {
> @@ -6272,12 +6472,14 @@ int float128_lt( float128 a, float128 b STATUS_PARAM )
>
>  }
>
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the quadruple-precision floating-point values `a' and `b' cannot
> -| be compared, and 0 otherwise. The invalid exception is raised if either
> -| operand is a NaN. The comparison is performed according to the IEC/IEEE
> -| Standard for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns 1 if the quadruple-precision floating-point values `a' and `b' cannot
> +be compared, and 0 otherwise. The invalid exception is raised if either
> +operand is a NaN. The comparison is performed according to the IEC/IEEE
> +Standard for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
>
>  int float128_unordered( float128 a, float128 b STATUS_PARAM )
>  {
> @@ -6292,12 +6494,14 @@ int float128_unordered( float128 a, float128 b STATUS_PARAM )
>      return 0;
>  }
>
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the quadruple-precision floating-point value `a' is equal to
> -| the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an
> -| exception. The comparison is performed according to the IEC/IEEE Standard
> -| for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns 1 if the quadruple-precision floating-point value `a' is equal to
> +the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an
> +exception. The comparison is performed according to the IEC/IEEE Standard
> +for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
>
>  int float128_eq_quiet( float128 a, float128 b STATUS_PARAM )
>  {
> @@ -6322,12 +6526,14 @@ int float128_eq_quiet( float128 a, float128 b STATUS_PARAM )
>
>  }
>
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the quadruple-precision floating-point value `a' is less than
> -| or equal to the corresponding value `b', and 0 otherwise. Quiet NaNs do not
> -| cause an exception. Otherwise, the comparison is performed according to the
> -| IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns 1 if the quadruple-precision floating-point value `a' is less than
> +or equal to the corresponding value `b', and 0 otherwise. Quiet NaNs do not
> +cause an exception. Otherwise, the comparison is performed according to the
> +IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
>
>  int float128_le_quiet( float128 a, float128 b STATUS_PARAM )
>  {
> @@ -6358,12 +6564,14 @@ int float128_le_quiet( float128 a, float128 b STATUS_PARAM )
>
>  }
>
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the quadruple-precision floating-point value `a' is less than
> -| the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an
> -| exception. Otherwise, the comparison is performed according to the IEC/IEEE
> -| Standard for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns 1 if the quadruple-precision floating-point value `a' is less than
> +the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an
> +exception. Otherwise, the comparison is performed according to the IEC/IEEE
> +Standard for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
>
>  int float128_lt_quiet( float128 a, float128 b STATUS_PARAM )
>  {
> @@ -6394,12 +6602,14 @@ int float128_lt_quiet( float128 a, float128 b STATUS_PARAM )
>
>  }
>
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the quadruple-precision floating-point values `a' and `b' cannot
> -| be compared, and 0 otherwise. Quiet NaNs do not cause an exception. The
> -| comparison is performed according to the IEC/IEEE Standard for Binary
> -| Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns 1 if the quadruple-precision floating-point values `a' and `b' cannot
> +be compared, and 0 otherwise. Quiet NaNs do not cause an exception. The
> +comparison is performed according to the IEC/IEEE Standard for Binary
> +Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
>
>  int float128_unordered_quiet( float128 a, float128 b STATUS_PARAM )
>  {
> diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
> index f3927e2..b646621 100644
> --- a/include/fpu/softfloat.h
> +++ b/include/fpu/softfloat.h
> @@ -4,10 +4,11 @@
>   * Derived from SoftFloat.
> */ > > -/*============================================================================ > +/* > +============================================================================ > > -This C header file is part of the SoftFloat IEC/IEEE Floating-point Arithmetic > -Package, Release 2b. > +This C header file is part of the SoftFloat IEC/IEEE Floating-point > +Arithmetic Package, Release 2a. > > Written by John R. Hauser. This work was made possible in part by the > International Computer Science Institute, located at Suite 600, 1947 Center > @@ -16,24 +17,22 @@ National Science Foundation under grant MIP-9311980. The original version > of this code was written as part of a project to build a fixed-point vector > processor in collaboration with the University of California at Berkeley, > overseen by Profs. Nelson Morgan and John Wawrzynek. More information > -is available through the Web page `http://www.cs.berkeley.edu/~jhauser/ > +is available through the Web page `http://HTTP.CS.Berkeley.EDU/~jhauser/ > arithmetic/SoftFloat.html'. > > -THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort has > -been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT TIMES > -RESULT IN INCORRECT BEHAVIOR. USE OF THIS SOFTWARE IS RESTRICTED TO PERSONS > -AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ALL LOSSES, > -COSTS, OR OTHER PROBLEMS THEY INCUR DUE TO THE SOFTWARE, AND WHO FURTHERMORE > -EFFECTIVELY INDEMNIFY JOHN HAUSER AND THE INTERNATIONAL COMPUTER SCIENCE > -INSTITUTE (possibly via similar legal warning) AGAINST ALL LOSSES, COSTS, OR > -OTHER PROBLEMS INCURRED BY THEIR CUSTOMERS AND CLIENTS DUE TO THE SOFTWARE. > +THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort > +has been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT > +TIMES RESULT IN INCORRECT BEHAVIOR. 
USE OF THIS SOFTWARE IS RESTRICTED TO > +PERSONS AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ANY > +AND ALL LOSSES, COSTS, OR OTHER PROBLEMS ARISING FROM ITS USE. > > Derivative works are acceptable, even for commercial purposes, so long as > -(1) the source code for the derivative work includes prominent notice that > -the work is derivative, and (2) the source code includes prominent notice with > -these four paragraphs for those parts of this code that are retained. > +(1) they include prominent notice that the work is derivative, and (2) they > +include prominent notice akin to these four paragraphs for those parts of > +this code that are retained. > > -=============================================================================*/ > +=============================================================================== > +*/ > > #ifndef SOFTFLOAT_H > #define SOFTFLOAT_H > @@ -46,14 +45,16 @@ these four paragraphs for those parts of this code that are retained. > #include "config-host.h" > #include "qemu/osdep.h" > > -/*---------------------------------------------------------------------------- > -| Each of the following `typedef's defines the most convenient type that holds > -| integers of at least as many bits as specified. For example, `uint8' should > -| be the most convenient type that can hold unsigned integers of as many as > -| 8 bits. The `flag' type must be able to hold either a 0 or 1. For most > -| implementations of C, `flag', `uint8', and `int8' should all be `typedef'ed > -| to the same as `int'. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Each of the following `typedef's defines the most convenient type that holds > +integers of at least as many bits as specified. For example, `uint8' should > +be the most convenient type that can hold unsigned integers of as many as > +8 bits. 
The `flag' type must be able to hold either a 0 or 1. For most > +implementations of C, `flag', `uint8', and `int8' should all be `typedef'ed > +to the same as `int'. > +------------------------------------------------------------------------------- > +*/ > typedef uint8_t flag; > typedef uint8_t uint8; > typedef int8_t int8; > @@ -69,9 +70,11 @@ typedef int64_t int64; > #define STATUS(field) status->field > #define STATUS_VAR , status > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE floating-point ordering relations > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Software IEC/IEEE floating-point ordering relations > +------------------------------------------------------------------------------- > +*/ > enum { > float_relation_less = -1, > float_relation_equal = 0, > @@ -79,9 +82,11 @@ enum { > float_relation_unordered = 2 > }; > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE floating-point types. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Software IEC/IEEE floating-point types. > +------------------------------------------------------------------------------- > +*/ > /* Use structures for soft-float types. This prevents accidentally mixing > them with native int/float types. A sufficiently clever compiler and > sane ABI should be able to see though these structs. 
However > @@ -137,17 +142,21 @@ typedef struct { > #define make_float128(high_, low_) ((float128) { .high = high_, .low = low_ }) > #define make_float128_init(high_, low_) { .high = high_, .low = low_ } > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE floating-point underflow tininess-detection mode. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Software IEC/IEEE floating-point underflow tininess-detection mode. > +------------------------------------------------------------------------------- > +*/ > enum { > float_tininess_after_rounding = 0, > float_tininess_before_rounding = 1 > }; > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE floating-point rounding mode. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Software IEC/IEEE floating-point rounding mode. > +------------------------------------------------------------------------------- > +*/ > enum { > float_round_nearest_even = 0, > float_round_down = 1, > @@ -155,9 +164,11 @@ enum { > float_round_to_zero = 3 > }; > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE floating-point exception flags. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Software IEC/IEEE floating-point exception flags. 
> +------------------------------------------------------------------------------- > +*/ > enum { > float_flag_invalid = 1, > float_flag_divbyzero = 4, > @@ -167,7 +178,6 @@ enum { > float_flag_input_denormal = 64, > float_flag_output_denormal = 128 > }; > - > typedef struct float_status { > signed char float_detect_tininess; > signed char float_rounding_mode; > @@ -204,27 +214,33 @@ INLINE int get_float_exception_flags(float_status *status) > } > void set_floatx80_rounding_precision(int val STATUS_PARAM); > > -/*---------------------------------------------------------------------------- > -| Routine to raise any or all of the software IEC/IEEE floating-point > -| exception flags. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Routine to raise any or all of the software IEC/IEEE floating-point > +exception flags. > +------------------------------------------------------------------------------- > +*/ > void float_raise( int8 flags STATUS_PARAM); > > -/*---------------------------------------------------------------------------- > -| Options to indicate which negations to perform in float*_muladd() > -| Using these differs from negating an input or output before calling > -| the muladd function in that this means that a NaN doesn't have its > -| sign bit inverted before it is propagated. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Options to indicate which negations to perform in float*_muladd() > +Using these differs from negating an input or output before calling > +the muladd function in that this means that a NaN doesn't have its > +sign bit inverted before it is propagated. 
> +------------------------------------------------------------------------------- > +*/ > enum { > float_muladd_negate_c = 1, > float_muladd_negate_product = 2, > float_muladd_negate_result = 4, > }; > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE integer-to-floating-point conversion routines. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Software IEC/IEEE integer-to-floating-point conversion routines. > +------------------------------------------------------------------------------- > +*/ > float32 int32_to_float32( int32 STATUS_PARAM ); > float64 int32_to_float64( int32 STATUS_PARAM ); > float32 uint32_to_float32( uint32 STATUS_PARAM ); > @@ -239,15 +255,19 @@ floatx80 int64_to_floatx80( int64 STATUS_PARAM ); > float128 int64_to_float128( int64 STATUS_PARAM ); > float128 uint64_to_float128( uint64 STATUS_PARAM ); > > -/*---------------------------------------------------------------------------- > -| Software half-precision conversion routines. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Software half-precision conversion routines. > +*---------------------------------------------------------------------------- > +*/ > float16 float32_to_float16( float32, flag STATUS_PARAM ); > float32 float16_to_float32( float16, flag STATUS_PARAM ); > > -/*---------------------------------------------------------------------------- > -| Software half-precision operations. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Software half-precision operations. 
> +------------------------------------------------------------------------------- > +*/ > int float16_is_quiet_nan( float16 ); > int float16_is_signaling_nan( float16 ); > float16 float16_maybe_silence_nan( float16 ); > @@ -257,14 +277,18 @@ INLINE int float16_is_any_nan(float16 a) > return ((float16_val(a) & ~0x8000) > 0x7c00); > } > > -/*---------------------------------------------------------------------------- > -| The pattern for a default generated half-precision NaN. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +The pattern for a default generated half-precision NaN. > +------------------------------------------------------------------------------- > +*/ > extern const float16 float16_default_nan; > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE single-precision conversion routines. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Software IEC/IEEE single-precision conversion routines. > +------------------------------------------------------------------------------- > +*/ > int_fast16_t float32_to_int16_round_to_zero(float32 STATUS_PARAM); > uint_fast16_t float32_to_uint16_round_to_zero(float32 STATUS_PARAM); > int32 float32_to_int32( float32 STATUS_PARAM ); > @@ -277,9 +301,11 @@ float64 float32_to_float64( float32 STATUS_PARAM ); > floatx80 float32_to_floatx80( float32 STATUS_PARAM ); > float128 float32_to_float128( float32 STATUS_PARAM ); > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE single-precision operations. 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Software IEC/IEEE single-precision operations. > +------------------------------------------------------------------------------- > +*/ > float32 float32_round_to_int( float32 STATUS_PARAM ); > float32 float32_add( float32, float32 STATUS_PARAM ); > float32 float32_sub( float32, float32 STATUS_PARAM ); > @@ -361,14 +387,18 @@ INLINE float32 float32_set_sign(float32 a, int sign) > #define float32_infinity make_float32(0x7f800000) > > > -/*---------------------------------------------------------------------------- > -| The pattern for a default generated single-precision NaN. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +The pattern for a default generated single-precision NaN. > +------------------------------------------------------------------------------- > +*/ > extern const float32 float32_default_nan; > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE double-precision conversion routines. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Software IEC/IEEE double-precision conversion routines. 
> +------------------------------------------------------------------------------- > +*/ > int_fast16_t float64_to_int16_round_to_zero(float64 STATUS_PARAM); > uint_fast16_t float64_to_uint16_round_to_zero(float64 STATUS_PARAM); > int32 float64_to_int32( float64 STATUS_PARAM ); > @@ -383,9 +413,11 @@ float32 float64_to_float32( float64 STATUS_PARAM ); > floatx80 float64_to_floatx80( float64 STATUS_PARAM ); > float128 float64_to_float128( float64 STATUS_PARAM ); > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE double-precision operations. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Software IEC/IEEE double-precision operations. > +------------------------------------------------------------------------------- > +*/ > float64 float64_round_to_int( float64 STATUS_PARAM ); > float64 float64_trunc_to_int( float64 STATUS_PARAM ); > float64 float64_add( float64, float64 STATUS_PARAM ); > @@ -467,14 +499,18 @@ INLINE float64 float64_set_sign(float64 a, int sign) > #define float64_half make_float64(0x3fe0000000000000LL) > #define float64_infinity make_float64(0x7ff0000000000000LL) > > -/*---------------------------------------------------------------------------- > -| The pattern for a default generated double-precision NaN. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +The pattern for a default generated double-precision NaN. > +------------------------------------------------------------------------------- > +*/ > extern const float64 float64_default_nan; > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE extended double-precision conversion routines. 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Software IEC/IEEE extended double-precision conversion routines. > +------------------------------------------------------------------------------- > +*/ > int32 floatx80_to_int32( floatx80 STATUS_PARAM ); > int32 floatx80_to_int32_round_to_zero( floatx80 STATUS_PARAM ); > int64 floatx80_to_int64( floatx80 STATUS_PARAM ); > @@ -483,9 +519,11 @@ float32 floatx80_to_float32( floatx80 STATUS_PARAM ); > float64 floatx80_to_float64( floatx80 STATUS_PARAM ); > float128 floatx80_to_float128( floatx80 STATUS_PARAM ); > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE extended double-precision operations. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Software IEC/IEEE extended double-precision operations. > +------------------------------------------------------------------------------- > +*/ > floatx80 floatx80_round_to_int( floatx80 STATUS_PARAM ); > floatx80 floatx80_add( floatx80, floatx80 STATUS_PARAM ); > floatx80 floatx80_sub( floatx80, floatx80 STATUS_PARAM ); > @@ -552,14 +590,18 @@ INLINE int floatx80_is_any_nan(floatx80 a) > #define floatx80_half make_floatx80(0x3ffe, 0x8000000000000000LL) > #define floatx80_infinity make_floatx80(0x7fff, 0x8000000000000000LL) > > -/*---------------------------------------------------------------------------- > -| The pattern for a default generated extended double-precision NaN. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +The pattern for a default generated extended double-precision NaN. 
> +------------------------------------------------------------------------------- > +*/ > extern const floatx80 floatx80_default_nan; > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE quadruple-precision conversion routines. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Software IEC/IEEE quadruple-precision conversion routines. > +------------------------------------------------------------------------------- > +*/ > int32 float128_to_int32( float128 STATUS_PARAM ); > int32 float128_to_int32_round_to_zero( float128 STATUS_PARAM ); > int64 float128_to_int64( float128 STATUS_PARAM ); > @@ -568,9 +610,11 @@ float32 float128_to_float32( float128 STATUS_PARAM ); > float64 float128_to_float64( float128 STATUS_PARAM ); > floatx80 float128_to_floatx80( float128 STATUS_PARAM ); > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE quadruple-precision operations. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Software IEC/IEEE quadruple-precision operations. > +------------------------------------------------------------------------------- > +*/ > float128 float128_round_to_int( float128 STATUS_PARAM ); > float128 float128_add( float128, float128 STATUS_PARAM ); > float128 float128_sub( float128, float128 STATUS_PARAM ); > @@ -633,9 +677,11 @@ INLINE int float128_is_any_nan(float128 a) > > #define float128_zero make_float128(0, 0) > > -/*---------------------------------------------------------------------------- > -| The pattern for a default generated quadruple-precision NaN. 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +The pattern for a default generated quadruple-precision NaN. > +------------------------------------------------------------------------------- > +*/ > extern const float128 float128_default_nan; > > #endif /* !SOFTFLOAT_H */ > -- > 1.8.0
On 2013-04-29 19:05, Anthony Liguori wrote:
> The license of SoftFloat-2b is claimed to be GPLv2 incompatible by
> the FSF due to an indemnification clause. The previous release,
> SoftFloat-2a, did not contain this clause. The only changes between
> these two versions as far as QEMU is concerned is the license change
> and a global modification of the comment structure. This patch rebases
> our softfloat code to SoftFloat-2a in order to have a GPLv2 compatible
> license.

Acked-by: Richard Henderson <rth@twiddle.net>

r~
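A minimal sketch, not part of the patch, of one way to sanity-check the claim quoted above that the rebase only touches the license text and comment structure. It treats every C comment as whitespace, collapses whitespace runs, and compares what is left. The names `normalize_source` and `same_code_modulo_comments` are hypothetical and exist only for this illustration; nothing here comes from QEMU or SoftFloat.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from QEMU or SoftFloat): return a malloc'ed copy
 * of `src` in which every C comment counts as whitespace and every run of
 * whitespace is collapsed to one space. String and character literals are
 * copied verbatim. Returns NULL if allocation fails. */
char *normalize_source(const char *src)
{
    size_t n = strlen(src), i = 0, o = 0;
    char *out = malloc(n + 1);   /* output is never longer than input */

    if (!out) {
        return NULL;
    }
    while (i < n) {
        char c = src[i];

        if (c == '/' && src[i + 1] == '*') {
            /* Block comment: skip to the closing delimiter. */
            i += 2;
            while (i < n && !(src[i] == '*' && src[i + 1] == '/')) {
                i++;
            }
            i = (i < n) ? i + 2 : n;
            c = ' ';
        } else if (c == '/' && src[i + 1] == '/') {
            /* Line comment: skip to the end of the line. */
            while (i < n && src[i] != '\n') {
                i++;
            }
            c = ' ';
        } else if (c == '"' || c == '\'') {
            /* Literal: copy through the matching quote, honouring escapes. */
            out[o++] = src[i++];
            while (i < n && src[i] != c) {
                if (src[i] == '\\' && i + 1 < n) {
                    out[o++] = src[i++];
                }
                out[o++] = src[i++];
            }
            if (i < n) {
                out[o++] = src[i++];
            }
            continue;
        } else {
            i++;
        }
        if (isspace((unsigned char)c)) {
            /* Emit at most one space, and none at the very start. */
            if (o > 0 && out[o - 1] != ' ') {
                out[o++] = ' ';
            }
        } else {
            out[o++] = c;
        }
    }
    if (o > 0 && out[o - 1] == ' ') {
        o--;                     /* drop trailing space */
    }
    out[o] = '\0';
    return out;
}

/* Hypothetical helper: 1 if `a` and `b` differ only in comments and
 * whitespace layout, 0 otherwise (including on allocation failure). */
int same_code_modulo_comments(const char *a, const char *b)
{
    char *na = normalize_source(a);
    char *nb = normalize_source(b);
    int same = na && nb && strcmp(na, nb) == 0;

    free(na);
    free(nb);
    return same;
}
```

One would feed it the pre- and post-patch contents of each softfloat file. This is only a heuristic: collapsing newlines ignores the line-sensitivity of preprocessor directives, so comparing the compiled objects remains the stronger check of "the resulting binary should be the same".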
On 29 April 2013 19:53, Anthony Liguori <aliguori@us.ibm.com> wrote:
> Anthony Liguori <aliguori@us.ibm.com> writes:
>
>> Thiemo Seufer <ths@networkno.de>

[rearranging the order a little]

>> 5a6932d Fix NaN handling for MIPS and HPPA.
>> b645bb4 Fix softfloat NaN handling.

These two commits are basically the SNAN_BIT_IS_ONE support, needed
for MIPS, SH4 and Unicore32. It should be easy enough for somebody
who cares about those targets to reinstate.

>> 924b2c0 Add proper float*_is_nan prototypes.

This is just adding some function prototypes -- we can trivially
reimplement it (probably by fixing the compile errors that result
when it's reverted).

>> 5fafdf2 find -type f | xargs sed -i 's/[\t ]$//g' # on most files
>> 63a654b trunc() for Solaris 9 / SPARC, by Juergen Keil.
>> fc81ba5 Check that HOST_SOLARIS is defined before relying on its
>> value. Spotted by Joachim Henke.

These three are all changes to files that have subsequently been
deleted (the softfloat-native support was dropped altogether).

-- PMM
On 29.04.2013 20:53, Anthony Liguori wrote:
> Anthony Liguori <aliguori@us.ibm.com> writes:
>
>> Thiemo Seufer <ths@networkno.de>
>> 5a6932d Fix NaN handling for MIPS and HPPA.
>> 5fafdf2 find -type f | xargs sed -i 's/[\t ]$//g' # on most files
>> 63a654b trunc() for Solaris 9 / SPARC, by Juergen Keil.
>> 924b2c0 Add proper float*_is_nan prototypes.
>> b645bb4 Fix softfloat NaN handling.
>> fc81ba5 Check that HOST_SOLARIS is defined before relying on its
>> value. Spotted by Joachim Henke.
> As most people know, Thiemo passed away a few years ago. I think we're
> going to have to revert all of these commits unless someone is able to
> contact his estate and get permission to relicense.
>
> Regards,
>
> Anthony Liguori

I'll try to ask his parents for the permission.

Regards,
Stefan Weil
On 29/04/2013 20:05, Anthony Liguori wrote:
> N.B. If you are on CC, see after the '---' for a requested action!
>
> The license of SoftFloat-2b is claimed to be GPLv2 incompatible by
> the FSF due to an indemnification clause. The previous release,
> SoftFloat-2a, did not contain this clause. The only changes between
> these two versions as far as QEMU is concerned is the license change
> and a global modification of the comment structure. This patch rebases
> our softfloat code to SoftFloat-2a in order to have a GPLv2 compatible
> license.
>
> Please note, this is a comment-only change. The resulting binary should
> be the same.
>
> I created this patch using the following strategy:
>
> 1) Create a branch using the original import of softfloat code:
> $ git checkout 158142c2c2df728cfa3b5320c65534921a764f26
>
> 2) Remove carriage returns from Softfloat-2b
>
> 3) Compare each of the softfloat files against Softfloat-2b using the
> following mapping to generate Fabrice's original softfloat changes:
>
> - fpu/softfloat.c -> softfloat/bits64/softfloat.c
> - fpu/softfloat.h -> softfloat/bits64/386-Win32-gcc/softfloat.h
> - fpu/softfloat-macros.h -> softfloat/bits64/softfloat-macros
> - fpu/softfloat-specialize.h -> softfloat/bits64/386-Win32-gcc/softfloat-specialize
>
> 4) Replace our softfloat files with the corresponding files from Softfloat-2a
>
> 5) Apply the diffs from (3) to (4) and commit
>
> 6) Create a diff between (5) and 158142c2c2df728cfa3b5320c65534921a764f26
> - This diff consists 100% of licensing change + comment reformating
>
> 7) Checkout the latest master branch, apply the diff from (6)
> - There were a lot of comment rejects, confirmed this was only comments
> and then used an emacs macro to rewrite the comments to the Softfloat-2a
> form.
>
> Cc: Andreas Färber <afaerber@suse.de>
> Cc: Aurelien Jarno <aurelien@aurel32.net>
> Cc: Avi Kivity <avi.kivity@gmail.com>
> Cc: Ben Taylor <bentaylor.solx86@gmail.com>
> Cc: Blue Swirl <blauwirbel@gmail.com>
> Cc: Christophe Lyon <christophe.lyon@st.com>
> Cc: Fabrice Bellard <fabrice@bellard.org>
> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
> Cc: Jocelyn Mayer <l_indien@magic.fr>
> Cc: Juan Quintela <quintela@redhat.com>
> Cc: malc <av1474@comtv.ru>
> Cc: Max Filippov <jcmvbkbc@gmail.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Paul Brook <paul@codesourcery.com>
> Cc: Peter Maydell <peter.maydell@linaro.org>
> Cc: Richard Henderson <rth@twiddle.net>
> Cc: Richard Sandiford <rdsandiford@googlemail.com>
> Cc: Stefan Weil <weil@mail.berlios.de>
> Cc: Thiemo Seufer <ths@networkno.de>
> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
> ---
> In order to make this change, we need to relicense all contributions
> from initial import of the SoftFloat code to match the license of
> SoftFloat-2a (instead of the implied SoftFloat-2b license).

All Red Hat contributions (at least Avi, Juan, me; don't know about rth)
are available under GPLv2+; other authors have also agreed to it. For
this particular license,

Acked-by: Paolo Bonzini <pbonzini@redhat.com>

But it doesn't look like a task that can ever be completed. I'll
shortly find out how many of those addresses bounce.

> If you are on CC, it is because you have contributed to the softfloat
> code in QEMU. Please response to this note with:
>
> Acked-by: Your Name <your@email.com>
>
> To significant that you are able and willing to relicense your changes
> to the SoftFloat-1a license (or a GPL compatible license).
>
> Please respond no later than May 6th, 2013. If we are unable to confirm
> relicense from an author, changes from that author will be reverted.

How is that possible for six-year-old patches such as commit b645bb4
(Fix softfloat NaN handling., 2007-05-11)? Ten days before a release,
even?

Paolo > --- > For completeness, here is the full listing of contributions: > > Andreas Färber <afaerber@suse.de> > be45f06 Silence softfloat warnings on OpenSolaris > 5aea4c5 softfloat: Replace uint16 type with uint_fast16_t > 94a49d8 softfloat: Replace int16 type with int_fast16_t > c969654 softfloat: Fix mixups of int and int16 > 38641f8 softfloat: Use uint16 consistently > 87b8cc3 softfloat: Resolve type mismatches between declaration and implementation > 8d725fa softfloat: Prepend QEMU-style header with derivation notice > 9f8d2a0 softfloat: Use uint32 consistently > bb98fe4 softfloat: Drop [s]bits{8, 16, 32, 64} types in favor of [u]int{8, 16, 32, 64}_t > > Aurelien Jarno <aurelien@aurel32.net> > 1020160 softfloat: fix default-NaN mode > 084d19b target-mips: Implement correct NaN propagation rules > 196cfc8 softfloat: add a 1.0 constant for float32 and float64 > 1b2ad2e softfloat-native: fix *nan() > 1f398e0 softfloat: use float{32,64,x80,128}_maybe_silence_nan() > 211315f softfloat: rename float*_eq() into float*_eq_quiet() > 2657d0f softfloat: rename float*_eq_signaling() into float*_eq() > 30e7a22 Use float_relation_* constants > 326b9e9 softfloat: fix float*_scalnb() corner cases > 34d2386 softfloat: remove HPPA specific code > 374dfc3 soft-float: add float32_log2() and float64_log2() > 4cc5383 softfloat-native: add float*_is_any_nan() functions > 587eabf softfloat: add float*_is_zero_or_denormal() > 629bd74 softfloat-native: add float32_is_nan() > 67b7861 softfloat: add float*_unordered_{,quiet}() functions > 8229c99 softfloat: add float32_exp2() > 85016c9 Assortment of soft-float fixes, by Aurelien Jarno. 
> 8d6c92b softfloat-native: improve correctness of floatXX_is_neg() > 93ae1c6 softfloat: fix float{32,64}_maybe_silence_nan() for MIPS > a167ba5 Add support for GNU/kFreeBSD > b3b4c7f softfloat: use GCC builtins to count the leading zeros > b4a0ef7 softfloat-native: add float*_unordered_quiet() functions > b689362 softfloat: move float*_eq and float*_eq_quiet > b76235e softfloat: fix floatx80_is_infinity() > bbc1ded softfloat: implement fused multiply-add NaN propagation for MIPS > be22a9a softfloat: always enable floatx80 and float128 support > c4b4c77 softfloat: add pi constants > c52ab6f fp: add floatXX_is_infinity(), floatXX_is_neg(), floatXX_is_zero() > cf67c6b softfloat-native: remove > d2b1027 softfloat-native: add a few constant values > d6882cf softfloat-native: fix float*_scalbn() functions > d735d69 softfloat: rename *IsNaN variables to *IsQuietNaN > dadd71a fp: fix float32_is_infinity() > de4af5f softfloat: fix floatx80_is_{quiet,signaling}_nan() > e024e88 target-ppc: Implement correct NaN propagation rules > e2f4220 softfloat: fix floatx80 handling of NaN > e872aa8 softfloat-native: fix type of float_rounding_mode > e908775 softfloat: SH4 has the sNaN bit set > f3218a8 softfloat: add floatx80 constants > f5a6425 softfloat: improve description of comparison functions > f6714d3 softfloat: add floatx80_compare*() functions > f6a7d92 softfloat: add float{x80,128}_maybe_silence_nan() > > Avi Kivity <avi.kivity@gmail.com> > 3bf7e40 softfloat: fix for C99 > > Ben Taylor <bentaylor.solx86@gmail.com> > 0475a5c Solaris 9/x86 support, by Ben Taylor. > c94655b Updated Solaris isinf support, by Juergen Keil and Ben Taylor. > > Blue Swirl <blauwirbel@gmail.com> > 128ab2f Preliminary OpenBSD host support (based on OpenBSD patches by Todd T. 
Fries) > 14d483e Fix OpenSolaris softfloat warnings > 179a2c1 Rename _BSD to HOST_BSD so that it's more obvious that it's defined by configure > 1d6198c Remove unnecessary trailing newlines > 1f58732 128-bit float support for user mode > 2734c70 Rename one more _BSD to HOST_BSD (spotted by Hasso Tepper) > 3f4cb3d Fix OpenSolaris gcc4 warnings: iovec type mismatches, missing 'static' > 70c1470 Sparse fixes: dubious mixing of bitwise and logical operations > 7c2a9d0 Fix math warnings on OpenBSD -current > b1d8e52 Fix undeclared symbol warnings from sparse > b55266b Suppress gcc 4.x -Wpointer-sign (included in -Wall) warnings > cd8a253 Fix more typos in softloat code (Eduardo Felipe) > d07cca0 Add native softfloat fpu functions (Christoph Egger) > ed086f3 softfloat: remove dead assignments, spotted by clang > > Christophe Lyon <christophe.lyon@st.com> > 8559666 softfloat: move all default NaN definitions to softfloat.h. > bcd4d9a softfloat: Honour default_nan_mode for float-to-float conversions > c30fe7d softfloat: add _set_sign(), _infinity and _half for 32 and 64 bits floats. > > Fabrice Bellard <fabrice@bellard.org> > 158142c soft float support > 1b2b0af 64 bit fix > 1d6bda3 added abs, chs and compare functions > 38cfa06 Solaris port (Ben Taylor) > 750afe9 avoid using char when it is not necessary > b109f9f more native FPU comparison functions - native FPU remainder > ec530c8 Solaris port (Ben Taylor) > fdbb469 Solaris/SPARC host port (Ben Taylor) > > Guan Xuetao <gxt@mprc.pku.edu.cn> > d2fbca9 unicore32: necessary modifications for other files to support unicore32 > > Jocelyn Mayer <l_indien@magic.fr> > 3430b0b Ooops... Typo. > 75d62a5 Add missing softfloat helpers. 
> > Juan Quintela <quintela@redhat.com> > 0eb4fc8 softfloat: make USE_SOFTFLOAT_STRUCT_TYPES compile > 71e72a1 rename HOST_BSD to CONFIG_BSD > 75b5a69 rename NEEDS_LIBSUNMATH to CONFIG_NEEDS_LIBSUNMATH > dfe5fff change HOST_SOLARIS to CONFIG_SOLARIS{_VERSION} > e2542fe rename WORDS_BIGENDIAN to HOST_WORDS_BIGENDIAN > > malc <av1474@comtv.ru> > 947f5fc Add static qualifier to local functions > e58ffeb Remove all traces of __powerpc__ > > Max Filippov <jcmvbkbc@gmail.com> > 6617680 softfloat: make float_muladd_negate_* flags independent > 213ff4e softfloat: add NO_SIGNALING_NANS > b81fe82 target-xtensa: specialize softfloat NaN rules > > Paolo Bonzini <pbonzini@redhat.com> > 1de7afc misc: move include files to include/qemu/ > 6b4c305 fpu: move public header file to include/fpu > 789ec7c softfloat: change default nan definitions to variables > > Paul Brook <paul@codesourcery.com> > 6001149 ARM FP16 support > 6939754 Correctly normalize values and handle zero inputs to scalbn functions. > 3598ecb Remove missing include. > 5c7908e Implement default-NaN mode. > 7918bf4 Fix typo in BSD FP rounding mode names. > 9027db8 Fix ARM default NaN. > 9ee6e8b ARMv7 support. > a1b91bb Fix typo in softfloat code. > e6e5906 ColdFire target. > f090c9d Add strict checking mode for softfp code. > fe76d97 Implement flush-to-zero mode (denormal results are replaced with zero). 
>
> Peter Maydell <peter.maydell@linaro.org>
> 1856987 softfloat: Rename float*_is_nan() functions to float*_is_quiet_nan()
> 760e141 softfloat: roundAndPackInt{32, 64}: Don't assume int32 is 32 bits
> 011da61 target-arm: Implement correct NaN propagation rules
> 21d6ebd softfloat: Add float*_is_any_nan() functions
> 274f1b0 softfloat: Add float*_min() and float*_max() functions
> 2ac8bd0 softfloat: Reinstate accidentally disabled target-specific NaN handling
> 2bed652 softfloat: Implement floatx80_is_any_nan() and float128_is_any_nan()
> 354f211 softfloat: abstract out target-specific NaN propagation rules
> 369be8f softfloat: Implement fused multiply-add
> 37d1866 softfloat: Implement flushing input denormals to zero
> 4be8eea fpu/softfloat.c: Remove pointless shift of always-zero value
> 600e30d softfloat: Fix single-to-half precision float conversions
> 6f3300a softfloat: Add float32_is_zero_or_denormal() function
> b3a6a2e softfloat: float*_to_int32_round_to_zero: don't assume int32 is 32 bits
> b408dbd softfloat: Add float*_maybe_silence_nan() functions
> bb4d4bb softfloat: Add float16 type and float16 NaN handling functions
> c29aca4 softfloat: Add setter function for tininess detection mode
> cbcef45 softfloat: Add float/double to 16 bit integer conversion functions
> d5138cf softfloat: Fix compilation failures with USE_SOFTFLOAT_STRUCT_TYPES
> e3d142d fpu: Correct edgecase in float64_muladd
> e6afc87 softfloat: Add new flag for when denormal result is flushed to zero
> e744c06 fpu/softfloat.c: Return correctly signed values from uint64_to_float32
> f591e1b softfloat: Correctly handle NaNs in float16_to_float32()
>
> Richard Henderson <rth@twiddle.net>
> 17ed229 softfloat: Fix uint64_to_float64
> 1e397ea softfloat: Implement uint64_to_float128
> 8443eff target-alpha: Split up FPCR value into separate fields.
> 990b3e1 target-alpha: Enable softfloat.
> ba0e276 target-alpha: Fixes for alpha-linux syscalls.
>
> Richard Sandiford <rdsandiford@googlemail.com>
> a6e7c18 softfloat: Handle float_muladd_negate_c when product is zero
>
> Stefan Weil <weil@mail.berlios.de>
> bc4347b arm host: fix compiler warning
>
> Thiemo Seufer <ths@networkno.de>
> 5a6932d Fix NaN handling for MIPS and HPPA.
> 5fafdf2 find -type f | xargs sed -i 's/[\t ]$//g' # on most files
> 63a654b trunc() for Solaris 9 / SPARC, by Juergen Keil.
> 924b2c0 Add proper float*_is_nan prototypes.
> b645bb4 Fix softfloat NaN handling.
> fc81ba5 Check that HOST_SOLARIS is defined before relying on its value. Spotted by Joachim Henke.
> ---
>  fpu/softfloat-macros.h     |  430 ++++----
>  fpu/softfloat-specialize.h |  494 +++++----
>  fpu/softfloat.c            | 2436 ++++++++++++++++++++++++--------------------
>  include/fpu/softfloat.h    |  242 +++--
>  4 files changed, 1981 insertions(+), 1621 deletions(-)
>
> diff --git a/fpu/softfloat-macros.h b/fpu/softfloat-macros.h
> index b5164af..2009315 100644
> --- a/fpu/softfloat-macros.h
> +++ b/fpu/softfloat-macros.h
> @@ -4,10 +4,11 @@
>   * Derived from SoftFloat.
>   */
>
> -/*============================================================================
> +/*
> +===============================================================================
>
>  This C source fragment is part of the SoftFloat IEC/IEEE Floating-point
> -Arithmetic Package, Release 2b.
> +Arithmetic Package, Release 2a.
>
>  Written by John R. Hauser. This work was made possible in part by the
>  International Computer Science Institute, located at Suite 600, 1947 Center
> @@ -16,28 +17,27 @@ National Science Foundation under grant MIP-9311980. The original version
>  of this code was written as part of a project to build a fixed-point vector
>  processor in collaboration with the University of California at Berkeley,
>  overseen by Profs. Nelson Morgan and John Wawrzynek.
More information
> -is available through the Web page `http://www.cs.berkeley.edu/~jhauser/
> +is available through the Web page `http://HTTP.CS.Berkeley.EDU/~jhauser/
>  arithmetic/SoftFloat.html'.
>
> -THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort has
> -been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT TIMES
> -RESULT IN INCORRECT BEHAVIOR. USE OF THIS SOFTWARE IS RESTRICTED TO PERSONS
> -AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ALL LOSSES,
> -COSTS, OR OTHER PROBLEMS THEY INCUR DUE TO THE SOFTWARE, AND WHO FURTHERMORE
> -EFFECTIVELY INDEMNIFY JOHN HAUSER AND THE INTERNATIONAL COMPUTER SCIENCE
> -INSTITUTE (possibly via similar legal notice) AGAINST ALL LOSSES, COSTS, OR
> -OTHER PROBLEMS INCURRED BY THEIR CUSTOMERS AND CLIENTS DUE TO THE SOFTWARE.
> +THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort
> +has been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT
> +TIMES RESULT IN INCORRECT BEHAVIOR. USE OF THIS SOFTWARE IS RESTRICTED TO
> +PERSONS AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ANY
> +AND ALL LOSSES, COSTS, OR OTHER PROBLEMS ARISING FROM ITS USE.
>
>  Derivative works are acceptable, even for commercial purposes, so long as
> -(1) the source code for the derivative work includes prominent notice that
> -the work is derivative, and (2) the source code includes prominent notice with
> -these four paragraphs for those parts of this code that are retained.
> +(1) they include prominent notice that the work is derivative, and (2) they
> +include prominent notice akin to these four paragraphs for those parts of
> +this code that are retained.
>
>  =============================================================================*/
>
> -/*----------------------------------------------------------------------------
> -| This macro tests for minimum version of the GNU C compiler.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +This macro tests for minimum version of the GNU C compiler.
> +-------------------------------------------------------------------------------
> +*/
>  #if defined(__GNUC__) && defined(__GNUC_MINOR__)
>  # define SOFTFLOAT_GNUC_PREREQ(maj, min) \
>  ((__GNUC__ << 16) + __GNUC_MINOR__ >= ((maj) << 16) + (min))
> @@ -46,14 +46,16 @@ these four paragraphs for those parts of this code that are retained.
>  #endif
>
>
> -/*----------------------------------------------------------------------------
> -| Shifts `a' right by the number of bits given in `count'. If any nonzero
> -| bits are shifted off, they are ``jammed'' into the least significant bit of
> -| the result by setting the least significant bit to 1. The value of `count'
> -| can be arbitrarily large; in particular, if `count' is greater than 32, the
> -| result will be either 0 or 1, depending on whether `a' is zero or nonzero.
> -| The result is stored in the location pointed to by `zPtr'.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Shifts `a' right by the number of bits given in `count'. If any nonzero
> +bits are shifted off, they are ``jammed'' into the least significant bit of
> +the result by setting the least significant bit to 1. The value of `count'
> +can be arbitrarily large; in particular, if `count' is greater than 32, the
> +result will be either 0 or 1, depending on whether `a' is zero or nonzero.
> +The result is stored in the location pointed to by `zPtr'.
> +-------------------------------------------------------------------------------
> +*/
>
>  INLINE void shift32RightJamming(uint32_t a, int_fast16_t count, uint32_t *zPtr)
>  {
> @@ -72,14 +74,16 @@ INLINE void shift32RightJamming(uint32_t a, int_fast16_t count, uint32_t *zPtr)
>
>  }
>
> -/*----------------------------------------------------------------------------
> -| Shifts `a' right by the number of bits given in `count'. If any nonzero
> -| bits are shifted off, they are ``jammed'' into the least significant bit of
> -| the result by setting the least significant bit to 1. The value of `count'
> -| can be arbitrarily large; in particular, if `count' is greater than 64, the
> -| result will be either 0 or 1, depending on whether `a' is zero or nonzero.
> -| The result is stored in the location pointed to by `zPtr'.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Shifts `a' right by the number of bits given in `count'. If any nonzero
> +bits are shifted off, they are ``jammed'' into the least significant bit of
> +the result by setting the least significant bit to 1. The value of `count'
> +can be arbitrarily large; in particular, if `count' is greater than 64, the
> +result will be either 0 or 1, depending on whether `a' is zero or nonzero.
> +The result is stored in the location pointed to by `zPtr'.
> +-------------------------------------------------------------------------------
> +*/
>
>  INLINE void shift64RightJamming(uint64_t a, int_fast16_t count, uint64_t *zPtr)
>  {
> @@ -98,23 +102,24 @@ INLINE void shift64RightJamming(uint64_t a, int_fast16_t count, uint64_t *zPtr)
>
>  }
>
> -/*----------------------------------------------------------------------------
> -| Shifts the 128-bit value formed by concatenating `a0' and `a1' right by 64
> -| _plus_ the number of bits given in `count'.
The shifted result is at most
> -| 64 nonzero bits; this is stored at the location pointed to by `z0Ptr'. The
> -| bits shifted off form a second 64-bit result as follows: The _last_ bit
> -| shifted off is the most-significant bit of the extra result, and the other
> -| 63 bits of the extra result are all zero if and only if _all_but_the_last_
> -| bits shifted off were all zero. This extra result is stored in the location
> -| pointed to by `z1Ptr'. The value of `count' can be arbitrarily large.
> -| (This routine makes more sense if `a0' and `a1' are considered to form
> -| a fixed-point value with binary point between `a0' and `a1'. This fixed-
> -| point value is shifted right by the number of bits given in `count', and
> -| the integer part of the result is returned at the location pointed to by
> -| `z0Ptr'. The fractional part of the result may be slightly corrupted as
> -| described above, and is returned at the location pointed to by `z1Ptr'.)
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Shifts the 128-bit value formed by concatenating `a0' and `a1' right by 64
> +_plus_ the number of bits given in `count'. The shifted result is at most
> +64 nonzero bits; this is stored at the location pointed to by `z0Ptr'. The
> +bits shifted off form a second 64-bit result as follows: The _last_ bit
> +shifted off is the most-significant bit of the extra result, and the other
> +63 bits of the extra result are all zero if and only if _all_but_the_last_
> +bits shifted off were all zero. This extra result is stored in the location
> +pointed to by `z1Ptr'. The value of `count' can be arbitrarily large.
> + (This routine makes more sense if `a0' and `a1' are considered to form a
> +fixed-point value with binary point between `a0' and `a1'.
This fixed-point
> +value is shifted right by the number of bits given in `count', and the
> +integer part of the result is returned at the location pointed to by
> +`z0Ptr'. The fractional part of the result may be slightly corrupted as
> +described above, and is returned at the location pointed to by `z1Ptr'.)
> +-------------------------------------------------------------------------------
> +*/
>  INLINE void
>  shift64ExtraRightJamming(
>  uint64_t a0, uint64_t a1, int_fast16_t count, uint64_t *z0Ptr, uint64_t *z1Ptr)
> @@ -144,14 +149,15 @@ INLINE void
>
>  }
>
> -/*----------------------------------------------------------------------------
> -| Shifts the 128-bit value formed by concatenating `a0' and `a1' right by the
> -| number of bits given in `count'. Any bits shifted off are lost. The value
> -| of `count' can be arbitrarily large; in particular, if `count' is greater
> -| than 128, the result will be 0. The result is broken into two 64-bit pieces
> -| which are stored at the locations pointed to by `z0Ptr' and `z1Ptr'.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Shifts the 128-bit value formed by concatenating `a0' and `a1' right by the
> +number of bits given in `count'. Any bits shifted off are lost. The value
> +of `count' can be arbitrarily large; in particular, if `count' is greater
> +than 128, the result will be 0. The result is broken into two 64-bit pieces
> +which are stored at the locations pointed to by `z0Ptr' and `z1Ptr'.
> +-------------------------------------------------------------------------------
> +*/
>  INLINE void
>  shift128Right(
>  uint64_t a0, uint64_t a1, int_fast16_t count, uint64_t *z0Ptr, uint64_t *z1Ptr)
> @@ -176,17 +182,18 @@ INLINE void
>
>  }
>
> -/*----------------------------------------------------------------------------
> -| Shifts the 128-bit value formed by concatenating `a0' and `a1' right by the
> -| number of bits given in `count'. If any nonzero bits are shifted off, they
> -| are ``jammed'' into the least significant bit of the result by setting the
> -| least significant bit to 1. The value of `count' can be arbitrarily large;
> -| in particular, if `count' is greater than 128, the result will be either
> -| 0 or 1, depending on whether the concatenation of `a0' and `a1' is zero or
> -| nonzero. The result is broken into two 64-bit pieces which are stored at
> -| the locations pointed to by `z0Ptr' and `z1Ptr'.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Shifts the 128-bit value formed by concatenating `a0' and `a1' right by the
> +number of bits given in `count'. If any nonzero bits are shifted off, they
> +are ``jammed'' into the least significant bit of the result by setting the
> +least significant bit to 1. The value of `count' can be arbitrarily large;
> +in particular, if `count' is greater than 128, the result will be either
> +0 or 1, depending on whether the concatenation of `a0' and `a1' is zero or
> +nonzero. The result is broken into two 64-bit pieces which are stored at
> +the locations pointed to by `z0Ptr' and `z1Ptr'.
> +-------------------------------------------------------------------------------
> +*/
>  INLINE void
>  shift128RightJamming(
>  uint64_t a0, uint64_t a1, int_fast16_t count, uint64_t *z0Ptr, uint64_t *z1Ptr)
> @@ -219,25 +226,26 @@ INLINE void
>
>  }
>
> -/*----------------------------------------------------------------------------
> -| Shifts the 192-bit value formed by concatenating `a0', `a1', and `a2' right
> -| by 64 _plus_ the number of bits given in `count'. The shifted result is
> -| at most 128 nonzero bits; these are broken into two 64-bit pieces which are
> -| stored at the locations pointed to by `z0Ptr' and `z1Ptr'. The bits shifted
> -| off form a third 64-bit result as follows: The _last_ bit shifted off is
> -| the most-significant bit of the extra result, and the other 63 bits of the
> -| extra result are all zero if and only if _all_but_the_last_ bits shifted off
> -| were all zero. This extra result is stored in the location pointed to by
> -| `z2Ptr'. The value of `count' can be arbitrarily large.
> -| (This routine makes more sense if `a0', `a1', and `a2' are considered
> -| to form a fixed-point value with binary point between `a1' and `a2'. This
> -| fixed-point value is shifted right by the number of bits given in `count',
> -| and the integer part of the result is returned at the locations pointed to
> -| by `z0Ptr' and `z1Ptr'. The fractional part of the result may be slightly
> -| corrupted as described above, and is returned at the location pointed to by
> -| `z2Ptr'.)
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Shifts the 192-bit value formed by concatenating `a0', `a1', and `a2' right
> +by 64 _plus_ the number of bits given in `count'. The shifted result is
> +at most 128 nonzero bits; these are broken into two 64-bit pieces which are
> +stored at the locations pointed to by `z0Ptr' and `z1Ptr'.
The bits shifted
> +off form a third 64-bit result as follows: The _last_ bit shifted off is
> +the most-significant bit of the extra result, and the other 63 bits of the
> +extra result are all zero if and only if _all_but_the_last_ bits shifted off
> +were all zero. This extra result is stored in the location pointed to by
> +`z2Ptr'. The value of `count' can be arbitrarily large.
> + (This routine makes more sense if `a0', `a1', and `a2' are considered
> +to form a fixed-point value with binary point between `a1' and `a2'. This
> +fixed-point value is shifted right by the number of bits given in `count',
> +and the integer part of the result is returned at the locations pointed to
> +by `z0Ptr' and `z1Ptr'. The fractional part of the result may be slightly
> +corrupted as described above, and is returned at the location pointed to by
> +`z2Ptr'.)
> +-------------------------------------------------------------------------------
> +*/
>  INLINE void
>  shift128ExtraRightJamming(
>  uint64_t a0,
> @@ -289,13 +297,14 @@ INLINE void
>
>  }
>
> -/*----------------------------------------------------------------------------
> -| Shifts the 128-bit value formed by concatenating `a0' and `a1' left by the
> -| number of bits given in `count'. Any bits shifted off are lost. The value
> -| of `count' must be less than 64. The result is broken into two 64-bit
> -| pieces which are stored at the locations pointed to by `z0Ptr' and `z1Ptr'.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Shifts the 128-bit value formed by concatenating `a0' and `a1' left by the
> +number of bits given in `count'. Any bits shifted off are lost. The value
> +of `count' must be less than 64. The result is broken into two 64-bit
> +pieces which are stored at the locations pointed to by `z0Ptr' and `z1Ptr'.
> +-------------------------------------------------------------------------------
> +*/
>  INLINE void
>  shortShift128Left(
>  uint64_t a0, uint64_t a1, int_fast16_t count, uint64_t *z0Ptr, uint64_t *z1Ptr)
> @@ -307,14 +316,15 @@ INLINE void
>
>  }
>
> -/*----------------------------------------------------------------------------
> -| Shifts the 192-bit value formed by concatenating `a0', `a1', and `a2' left
> -| by the number of bits given in `count'. Any bits shifted off are lost.
> -| The value of `count' must be less than 64. The result is broken into three
> -| 64-bit pieces which are stored at the locations pointed to by `z0Ptr',
> -| `z1Ptr', and `z2Ptr'.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Shifts the 192-bit value formed by concatenating `a0', `a1', and `a2' left
> +by the number of bits given in `count'. Any bits shifted off are lost.
> +The value of `count' must be less than 64. The result is broken into three
> +64-bit pieces which are stored at the locations pointed to by `z0Ptr',
> +`z1Ptr', and `z2Ptr'.
> +-------------------------------------------------------------------------------
> +*/
>  INLINE void
>  shortShift192Left(
>  uint64_t a0,
> @@ -343,13 +353,14 @@ INLINE void
>
>  }
>
> -/*----------------------------------------------------------------------------
> -| Adds the 128-bit value formed by concatenating `a0' and `a1' to the 128-bit
> -| value formed by concatenating `b0' and `b1'. Addition is modulo 2^128, so
> -| any carry out is lost. The result is broken into two 64-bit pieces which
> -| are stored at the locations pointed to by `z0Ptr' and `z1Ptr'.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Adds the 128-bit value formed by concatenating `a0' and `a1' to the 128-bit
> +value formed by concatenating `b0' and `b1'. Addition is modulo 2^128, so
> +any carry out is lost. The result is broken into two 64-bit pieces which
> +are stored at the locations pointed to by `z0Ptr' and `z1Ptr'.
> +-------------------------------------------------------------------------------
> +*/
>  INLINE void
>  add128(
>  uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1, uint64_t *z0Ptr, uint64_t *z1Ptr )
> @@ -362,14 +373,15 @@ INLINE void
>
>  }
>
> -/*----------------------------------------------------------------------------
> -| Adds the 192-bit value formed by concatenating `a0', `a1', and `a2' to the
> -| 192-bit value formed by concatenating `b0', `b1', and `b2'. Addition is
> -| modulo 2^192, so any carry out is lost. The result is broken into three
> -| 64-bit pieces which are stored at the locations pointed to by `z0Ptr',
> -| `z1Ptr', and `z2Ptr'.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Adds the 192-bit value formed by concatenating `a0', `a1', and `a2' to the
> +192-bit value formed by concatenating `b0', `b1', and `b2'. Addition is
> +modulo 2^192, so any carry out is lost. The result is broken into three
> +64-bit pieces which are stored at the locations pointed to by `z0Ptr',
> +`z1Ptr', and `z2Ptr'.
> +-------------------------------------------------------------------------------
> +*/
>  INLINE void
>  add192(
>  uint64_t a0,
> @@ -400,14 +412,15 @@ INLINE void
>
>  }
>
> -/*----------------------------------------------------------------------------
> -| Subtracts the 128-bit value formed by concatenating `b0' and `b1' from the
> -| 128-bit value formed by concatenating `a0' and `a1'. Subtraction is modulo
> -| 2^128, so any borrow out (carry out) is lost. The result is broken into two
> -| 64-bit pieces which are stored at the locations pointed to by `z0Ptr' and
> -| `z1Ptr'.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Subtracts the 128-bit value formed by concatenating `b0' and `b1' from the
> +128-bit value formed by concatenating `a0' and `a1'. Subtraction is modulo
> +2^128, so any borrow out (carry out) is lost. The result is broken into two
> +64-bit pieces which are stored at the locations pointed to by `z0Ptr' and
> +`z1Ptr'.
> +-------------------------------------------------------------------------------
> +*/
>  INLINE void
>  sub128(
>  uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1, uint64_t *z0Ptr, uint64_t *z1Ptr )
> @@ -418,14 +431,15 @@ INLINE void
>
>  }
>
> -/*----------------------------------------------------------------------------
> -| Subtracts the 192-bit value formed by concatenating `b0', `b1', and `b2'
> -| from the 192-bit value formed by concatenating `a0', `a1', and `a2'.
> -| Subtraction is modulo 2^192, so any borrow out (carry out) is lost. The
> -| result is broken into three 64-bit pieces which are stored at the locations
> -| pointed to by `z0Ptr', `z1Ptr', and `z2Ptr'.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Subtracts the 192-bit value formed by concatenating `b0', `b1', and `b2'
> +from the 192-bit value formed by concatenating `a0', `a1', and `a2'.
> +Subtraction is modulo 2^192, so any borrow out (carry out) is lost. The
> +result is broken into three 64-bit pieces which are stored at the locations
> +pointed to by `z0Ptr', `z1Ptr', and `z2Ptr'.
> +-------------------------------------------------------------------------------
> +*/
>  INLINE void
>  sub192(
>  uint64_t a0,
> @@ -456,11 +470,13 @@ INLINE void
>
>  }
>
> -/*----------------------------------------------------------------------------
> -| Multiplies `a' by `b' to obtain a 128-bit product. The product is broken
> -| into two 64-bit pieces which are stored at the locations pointed to by
> -| `z0Ptr' and `z1Ptr'.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Multiplies `a' by `b' to obtain a 128-bit product. The product is broken
> +into two 64-bit pieces which are stored at the locations pointed to by
> +`z0Ptr' and `z1Ptr'.
> +-------------------------------------------------------------------------------
> +*/
>
>  INLINE void mul64To128( uint64_t a, uint64_t b, uint64_t *z0Ptr, uint64_t *z1Ptr )
>  {
> @@ -485,13 +501,14 @@ INLINE void mul64To128( uint64_t a, uint64_t b, uint64_t *z0Ptr, uint64_t *z1Ptr
>
>  }
>
> -/*----------------------------------------------------------------------------
> -| Multiplies the 128-bit value formed by concatenating `a0' and `a1' by
> -| `b' to obtain a 192-bit product. The product is broken into three 64-bit
> -| pieces which are stored at the locations pointed to by `z0Ptr', `z1Ptr', and
> -| `z2Ptr'.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Multiplies the 128-bit value formed by concatenating `a0' and `a1' by
> +`b' to obtain a 192-bit product. The product is broken into three 64-bit
> +pieces which are stored at the locations pointed to by `z0Ptr', `z1Ptr', and
> +`z2Ptr'.
> +-------------------------------------------------------------------------------
> +*/
>  INLINE void
>  mul128By64To192(
>  uint64_t a0,
> @@ -513,13 +530,14 @@ INLINE void
>
>  }
>
> -/*----------------------------------------------------------------------------
> -| Multiplies the 128-bit value formed by concatenating `a0' and `a1' to the
> -| 128-bit value formed by concatenating `b0' and `b1' to obtain a 256-bit
> -| product. The product is broken into four 64-bit pieces which are stored at
> -| the locations pointed to by `z0Ptr', `z1Ptr', `z2Ptr', and `z3Ptr'.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Multiplies the 128-bit value formed by concatenating `a0' and `a1' to the
> +128-bit value formed by concatenating `b0' and `b1' to obtain a 256-bit
> +product. The product is broken into four 64-bit pieces which are stored at
> +the locations pointed to by `z0Ptr', `z1Ptr', `z2Ptr', and `z3Ptr'.
> +-------------------------------------------------------------------------------
> +*/
>  INLINE void
>  mul128To256(
>  uint64_t a0,
> @@ -550,14 +568,16 @@ INLINE void
>
>  }
>
> -/*----------------------------------------------------------------------------
> -| Returns an approximation to the 64-bit integer quotient obtained by dividing
> -| `b' into the 128-bit value formed by concatenating `a0' and `a1'. The
> -| divisor `b' must be at least 2^63.
If q is the exact quotient truncated
> -| toward zero, the approximation returned lies between q and q + 2 inclusive.
> -| If the exact quotient q is larger than 64 bits, the maximum positive 64-bit
> -| unsigned integer is returned.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns an approximation to the 64-bit integer quotient obtained by dividing
> +`b' into the 128-bit value formed by concatenating `a0' and `a1'. The
> +divisor `b' must be at least 2^63. If q is the exact quotient truncated
> +toward zero, the approximation returned lies between q and q + 2 inclusive.
> +If the exact quotient q is larger than 64 bits, the maximum positive 64-bit
> +unsigned integer is returned.
> +-------------------------------------------------------------------------------
> +*/
>
>  static uint64_t estimateDiv128To64( uint64_t a0, uint64_t a1, uint64_t b )
>  {
> @@ -581,15 +601,17 @@ static uint64_t estimateDiv128To64( uint64_t a0, uint64_t a1, uint64_t b )
>
>  }
>
> -/*----------------------------------------------------------------------------
> -| Returns an approximation to the square root of the 32-bit significand given
> -| by `a'. Considered as an integer, `a' must be at least 2^31. If bit 0 of
> -| `aExp' (the least significant bit) is 1, the integer returned approximates
> -| 2^31*sqrt(`a'/2^31), where `a' is considered an integer. If bit 0 of `aExp'
> -| is 0, the integer returned approximates 2^31*sqrt(`a'/2^30). In either
> -| case, the approximation returned lies strictly within +/-2 of the exact
> -| value.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns an approximation to the square root of the 32-bit significand given
> +by `a'. Considered as an integer, `a' must be at least 2^31.
If bit 0 of
> +`aExp' (the least significant bit) is 1, the integer returned approximates
> +2^31*sqrt(`a'/2^31), where `a' is considered an integer. If bit 0 of `aExp'
> +is 0, the integer returned approximates 2^31*sqrt(`a'/2^30). In either
> +case, the approximation returned lies strictly within +/-2 of the exact
> +value.
> +-------------------------------------------------------------------------------
> +*/
>
>  static uint32_t estimateSqrt32(int_fast16_t aExp, uint32_t a)
>  {
> @@ -620,10 +642,12 @@ static uint32_t estimateSqrt32(int_fast16_t aExp, uint32_t a)
>
>  }
>
> -/*----------------------------------------------------------------------------
> -| Returns the number of leading 0 bits before the most-significant 1 bit of
> -| `a'. If `a' is zero, 32 is returned.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns the number of leading 0 bits before the most-significant 1 bit of
> +`a'. If `a' is zero, 32 is returned.
> +-------------------------------------------------------------------------------
> +*/
>
>  static int8 countLeadingZeros32( uint32_t a )
>  {
> @@ -668,10 +692,12 @@ static int8 countLeadingZeros32( uint32_t a )
>  #endif
>  }
>
> -/*----------------------------------------------------------------------------
> -| Returns the number of leading 0 bits before the most-significant 1 bit of
> -| `a'. If `a' is zero, 64 is returned.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns the number of leading 0 bits before the most-significant 1 bit of
> +`a'. If `a' is zero, 64 is returned.
> +-------------------------------------------------------------------------------
> +*/
>
>  static int8 countLeadingZeros64( uint64_t a )
>  {
> @@ -696,11 +722,13 @@ static int8 countLeadingZeros64( uint64_t a )
>  #endif
>  }
>
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the 128-bit value formed by concatenating `a0' and `a1'
> -| is equal to the 128-bit value formed by concatenating `b0' and `b1'.
> -| Otherwise, returns 0.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns 1 if the 128-bit value formed by concatenating `a0' and `a1'
> +is equal to the 128-bit value formed by concatenating `b0' and `b1'.
> +Otherwise, returns 0.
> +-------------------------------------------------------------------------------
> +*/
>
>  INLINE flag eq128( uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1 )
>  {
> @@ -709,11 +737,13 @@ INLINE flag eq128( uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1 )
>
>  }
>
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the 128-bit value formed by concatenating `a0' and `a1' is less
> -| than or equal to the 128-bit value formed by concatenating `b0' and `b1'.
> -| Otherwise, returns 0.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns 1 if the 128-bit value formed by concatenating `a0' and `a1' is less
> +than or equal to the 128-bit value formed by concatenating `b0' and `b1'.
> +Otherwise, returns 0.
> +------------------------------------------------------------------------------- > +*/ > > INLINE flag le128( uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1 ) > { > @@ -722,11 +752,13 @@ INLINE flag le128( uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1 ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the 128-bit value formed by concatenating `a0' and `a1' is less > -| than the 128-bit value formed by concatenating `b0' and `b1'. Otherwise, > -| returns 0. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the 128-bit value formed by concatenating `a0' and `a1' is less > +than the 128-bit value formed by concatenating `b0' and `b1'. Otherwise, > +returns 0. > +------------------------------------------------------------------------------- > +*/ > > INLINE flag lt128( uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1 ) > { > @@ -735,11 +767,13 @@ INLINE flag lt128( uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1 ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the 128-bit value formed by concatenating `a0' and `a1' is > -| not equal to the 128-bit value formed by concatenating `b0' and `b1'. > -| Otherwise, returns 0. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the 128-bit value formed by concatenating `a0' and `a1' is > +not equal to the 128-bit value formed by concatenating `b0' and `b1'. > +Otherwise, returns 0. 
> +------------------------------------------------------------------------------- > +*/ > > INLINE flag ne128( uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1 ) > { > diff --git a/fpu/softfloat-specialize.h b/fpu/softfloat-specialize.h > index 518f694..ba9bfeb 100644 > --- a/fpu/softfloat-specialize.h > +++ b/fpu/softfloat-specialize.h > @@ -4,10 +4,11 @@ > * Derived from SoftFloat. > */ > > -/*============================================================================ > +/* > +=============================================================================== > > This C source fragment is part of the SoftFloat IEC/IEEE Floating-point > -Arithmetic Package, Release 2b. > +Arithmetic Package, Release 2a. > > Written by John R. Hauser. This work was made possible in part by the > International Computer Science Institute, located at Suite 600, 1947 Center > @@ -16,22 +17,19 @@ National Science Foundation under grant MIP-9311980. The original version > of this code was written as part of a project to build a fixed-point vector > processor in collaboration with the University of California at Berkeley, > overseen by Profs. Nelson Morgan and John Wawrzynek. More information > -is available through the Web page `http://www.cs.berkeley.edu/~jhauser/ > +is available through the Web page `http://HTTP.CS.Berkeley.EDU/~jhauser/ > arithmetic/SoftFloat.html'. > > -THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort has > -been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT TIMES > -RESULT IN INCORRECT BEHAVIOR. 
USE OF THIS SOFTWARE IS RESTRICTED TO PERSONS > -AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ALL LOSSES, > -COSTS, OR OTHER PROBLEMS THEY INCUR DUE TO THE SOFTWARE, AND WHO FURTHERMORE > -EFFECTIVELY INDEMNIFY JOHN HAUSER AND THE INTERNATIONAL COMPUTER SCIENCE > -INSTITUTE (possibly via similar legal warning) AGAINST ALL LOSSES, COSTS, OR > -OTHER PROBLEMS INCURRED BY THEIR CUSTOMERS AND CLIENTS DUE TO THE SOFTWARE. > +THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort > +has been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT > +TIMES RESULT IN INCORRECT BEHAVIOR. USE OF THIS SOFTWARE IS RESTRICTED TO > +PERSONS AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ANY > +AND ALL LOSSES, COSTS, OR OTHER PROBLEMS ARISING FROM ITS USE. > > Derivative works are acceptable, even for commercial purposes, so long as > -(1) the source code for the derivative work includes prominent notice that > -the work is derivative, and (2) the source code includes prominent notice with > -these four paragraphs for those parts of this code that are retained. > +(1) they include prominent notice that the work is derivative, and (2) they > +include prominent notice akin to these four paragraphs for those parts of > +this code that are retained. > > =============================================================================*/ > > @@ -48,9 +46,11 @@ these four paragraphs for those parts of this code that are retained. > #define NO_SIGNALING_NANS 1 > #endif > > -/*---------------------------------------------------------------------------- > -| The pattern for a default generated half-precision NaN. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +The pattern for a default generated half-precision NaN. 
> +------------------------------------------------------------------------------- > +*/ > #if defined(TARGET_ARM) > const float16 float16_default_nan = const_float16(0x7E00); > #elif SNAN_BIT_IS_ONE > @@ -59,9 +59,11 @@ const float16 float16_default_nan = const_float16(0x7DFF); > const float16 float16_default_nan = const_float16(0xFE00); > #endif > > -/*---------------------------------------------------------------------------- > -| The pattern for a default generated single-precision NaN. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +The pattern for a default generated single-precision NaN. > +------------------------------------------------------------------------------- > +*/ > #if defined(TARGET_SPARC) > const float32 float32_default_nan = const_float32(0x7FFFFFFF); > #elif defined(TARGET_PPC) || defined(TARGET_ARM) || defined(TARGET_ALPHA) || \ > @@ -73,9 +75,11 @@ const float32 float32_default_nan = const_float32(0x7FBFFFFF); > const float32 float32_default_nan = const_float32(0xFFC00000); > #endif > > -/*---------------------------------------------------------------------------- > -| The pattern for a default generated double-precision NaN. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +The pattern for a default generated double-precision NaN. 
> +------------------------------------------------------------------------------- > +*/ > #if defined(TARGET_SPARC) > const float64 float64_default_nan = const_float64(LIT64( 0x7FFFFFFFFFFFFFFF )); > #elif defined(TARGET_PPC) || defined(TARGET_ARM) || defined(TARGET_ALPHA) > @@ -86,9 +90,11 @@ const float64 float64_default_nan = const_float64(LIT64( 0x7FF7FFFFFFFFFFFF )); > const float64 float64_default_nan = const_float64(LIT64( 0xFFF8000000000000 )); > #endif > > -/*---------------------------------------------------------------------------- > -| The pattern for a default generated extended double-precision NaN. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +The pattern for a default generated extended double-precision NaN. > +------------------------------------------------------------------------------- > +*/ > #if SNAN_BIT_IS_ONE > #define floatx80_default_nan_high 0x7FFF > #define floatx80_default_nan_low LIT64( 0xBFFFFFFFFFFFFFFF ) > @@ -100,10 +106,12 @@ const float64 float64_default_nan = const_float64(LIT64( 0xFFF8000000000000 )); > const floatx80 floatx80_default_nan > = make_floatx80_init(floatx80_default_nan_high, floatx80_default_nan_low); > > -/*---------------------------------------------------------------------------- > -| The pattern for a default generated quadruple-precision NaN. The `high' and > -| `low' values hold the most- and least-significant bits, respectively. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +The pattern for a default generated quadruple-precision NaN. The `high' and > +`low' values hold the most- and least-significant bits, respectively. 
> +------------------------------------------------------------------------------- > +*/ > #if SNAN_BIT_IS_ONE > #define float128_default_nan_high LIT64( 0x7FFF7FFFFFFFFFFF ) > #define float128_default_nan_low LIT64( 0xFFFFFFFFFFFFFFFF ) > @@ -115,21 +123,25 @@ const floatx80 floatx80_default_nan > const float128 float128_default_nan > = make_float128_init(float128_default_nan_high, float128_default_nan_low); > > -/*---------------------------------------------------------------------------- > -| Raises the exceptions specified by `flags'. Floating-point traps can be > -| defined here if desired. It is currently not possible for such a trap > -| to substitute a result value. If traps are not implemented, this routine > -| should be simply `float_exception_flags |= flags;'. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Raises the exceptions specified by `flags'. Floating-point traps can be > +defined here if desired. It is currently not possible for such a trap > +to substitute a result value. If traps are not implemented, this routine > +should be simply `float_exception_flags |= flags;'. > +------------------------------------------------------------------------------- > +*/ > > void float_raise( int8 flags STATUS_PARAM ) > { > STATUS(float_exception_flags) |= flags; > } > > -/*---------------------------------------------------------------------------- > -| Internal canonical NaN format. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Internal canonical NaN format. 
> +------------------------------------------------------------------------------- > +*/ > typedef struct { > flag sign; > uint64_t high, low; > @@ -146,10 +158,12 @@ int float16_is_signaling_nan(float16 a_) > return 0; > } > #else > -/*---------------------------------------------------------------------------- > -| Returns 1 if the half-precision floating-point value `a' is a quiet > -| NaN; otherwise returns 0. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the half-precision floating-point value `a' is a quiet > +NaN; otherwise returns 0. > +------------------------------------------------------------------------------- > +*/ > > int float16_is_quiet_nan(float16 a_) > { > @@ -161,10 +175,12 @@ int float16_is_quiet_nan(float16 a_) > #endif > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the half-precision floating-point value `a' is a signaling > -| NaN; otherwise returns 0. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the half-precision floating-point value `a' is a signaling > +NaN; otherwise returns 0. > +------------------------------------------------------------------------------- > +*/ > > int float16_is_signaling_nan(float16 a_) > { > @@ -177,10 +193,12 @@ int float16_is_signaling_nan(float16 a_) > } > #endif > > -/*---------------------------------------------------------------------------- > -| Returns a quiet NaN if the half-precision floating point value `a' is a > -| signaling NaN; otherwise returns `a'. 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns a quiet NaN if the half-precision floating point value `a' is a > +signaling NaN; otherwise returns `a'. > +------------------------------------------------------------------------------- > +*/ > float16 float16_maybe_silence_nan(float16 a_) > { > if (float16_is_signaling_nan(a_)) { > @@ -199,11 +217,13 @@ float16 float16_maybe_silence_nan(float16 a_) > return a_; > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the half-precision floating-point NaN > -| `a' to the canonical NaN format. If `a' is a signaling NaN, the invalid > -| exception is raised. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the half-precision floating-point NaN > +`a' to the canonical NaN format. If `a' is a signaling NaN, the invalid > +exception is raised. > +------------------------------------------------------------------------------- > +*/ > > static commonNaNT float16ToCommonNaN( float16 a STATUS_PARAM ) > { > @@ -216,10 +236,12 @@ static commonNaNT float16ToCommonNaN( float16 a STATUS_PARAM ) > return z; > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the canonical NaN `a' to the half- > -| precision floating-point format. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the canonical NaN `a' to the half- > +precision floating-point format. 
> +------------------------------------------------------------------------------- > +*/ > > static float16 commonNaNToFloat16(commonNaNT a STATUS_PARAM) > { > @@ -248,10 +270,12 @@ int float32_is_signaling_nan(float32 a_) > return 0; > } > #else > -/*---------------------------------------------------------------------------- > -| Returns 1 if the single-precision floating-point value `a' is a quiet > -| NaN; otherwise returns 0. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the single-precision floating-point value `a' is a quiet > +NaN; otherwise returns 0. > +------------------------------------------------------------------------------- > +*/ > > int float32_is_quiet_nan( float32 a_ ) > { > @@ -263,10 +287,12 @@ int float32_is_quiet_nan( float32 a_ ) > #endif > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the single-precision floating-point value `a' is a signaling > -| NaN; otherwise returns 0. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the single-precision floating-point value `a' is a signaling > +NaN; otherwise returns 0. > +------------------------------------------------------------------------------- > +*/ > > int float32_is_signaling_nan( float32 a_ ) > { > @@ -279,10 +305,12 @@ int float32_is_signaling_nan( float32 a_ ) > } > #endif > > -/*---------------------------------------------------------------------------- > -| Returns a quiet NaN if the single-precision floating point value `a' is a > -| signaling NaN; otherwise returns `a'. 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns a quiet NaN if the single-precision floating point value `a' is a > +signaling NaN; otherwise returns `a'. > +------------------------------------------------------------------------------- > +*/ > > float32 float32_maybe_silence_nan( float32 a_ ) > { > @@ -302,12 +330,13 @@ float32 float32_maybe_silence_nan( float32 a_ ) > return a_; > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the single-precision floating-point NaN > -| `a' to the canonical NaN format. If `a' is a signaling NaN, the invalid > -| exception is raised. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the single-precision floating-point NaN > +`a' to the canonical NaN format. If `a' is a signaling NaN, the invalid > +exception is raised. > +------------------------------------------------------------------------------- > +*/ > static commonNaNT float32ToCommonNaN( float32 a STATUS_PARAM ) > { > commonNaNT z; > @@ -319,10 +348,12 @@ static commonNaNT float32ToCommonNaN( float32 a STATUS_PARAM ) > return z; > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the canonical NaN `a' to the single- > -| precision floating-point format. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the canonical NaN `a' to the single- > +precision floating-point format. 
> +------------------------------------------------------------------------------- > +*/ > > static float32 commonNaNToFloat32( commonNaNT a STATUS_PARAM) > { > @@ -339,22 +370,24 @@ static float32 commonNaNToFloat32( commonNaNT a STATUS_PARAM) > return float32_default_nan; > } > > -/*---------------------------------------------------------------------------- > -| Select which NaN to propagate for a two-input operation. > -| IEEE754 doesn't specify all the details of this, so the > -| algorithm is target-specific. > -| The routine is passed various bits of information about the > -| two NaNs and should return 0 to select NaN a and 1 for NaN b. > -| Note that signalling NaNs are always squashed to quiet NaNs > -| by the caller, by calling floatXX_maybe_silence_nan() before > -| returning them. > -| > -| aIsLargerSignificand is only valid if both a and b are NaNs > -| of some kind, and is true if a has the larger significand, > -| or if both a and b have the same significand but a is > -| positive but b is negative. It is only needed for the x87 > -| tie-break rule. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Select which NaN to propagate for a two-input operation. > +IEEE754 doesn't specify all the details of this, so the > +algorithm is target-specific. > +The routine is passed various bits of information about the > +two NaNs and should return 0 to select NaN a and 1 for NaN b. > +Note that signalling NaNs are always squashed to quiet NaNs > +by the caller, by calling floatXX_maybe_silence_nan() before > +returning them. > + > +aIsLargerSignificand is only valid if both a and b are NaNs > +of some kind, and is true if a has the larger significand, > +or if both a and b have the same significand but a is > +positive but b is negative. It is only needed for the x87 > +tie-break rule. 
> +------------------------------------------------------------------------------- > +*/ > > #if defined(TARGET_ARM) > static int pickNaN(flag aIsQNaN, flag aIsSNaN, flag bIsQNaN, flag bIsSNaN, > @@ -451,12 +484,14 @@ static int pickNaN(flag aIsQNaN, flag aIsSNaN, flag bIsQNaN, flag bIsSNaN, > } > #endif > > -/*---------------------------------------------------------------------------- > -| Select which NaN to propagate for a three-input operation. > -| For the moment we assume that no CPU needs the 'larger significand' > -| information. > -| Return values : 0 : a; 1 : b; 2 : c; 3 : default-NaN > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Select which NaN to propagate for a three-input operation. > +For the moment we assume that no CPU needs the 'larger significand' > +information. > +Return values : 0 : a; 1 : b; 2 : c; 3 : default-NaN > +------------------------------------------------------------------------------- > +*/ > #if defined(TARGET_ARM) > static int pickNaNMulAdd(flag aIsQNaN, flag aIsSNaN, flag bIsQNaN, flag bIsSNaN, > flag cIsQNaN, flag cIsSNaN, flag infzero STATUS_PARAM) > @@ -554,12 +589,13 @@ static int pickNaNMulAdd(flag aIsQNaN, flag aIsSNaN, flag bIsQNaN, flag bIsSNaN, > } > #endif > > -/*---------------------------------------------------------------------------- > -| Takes two single-precision floating-point values `a' and `b', one of which > -| is a NaN, and returns the appropriate NaN result. If either `a' or `b' is a > -| signaling NaN, the invalid exception is raised. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Takes two single-precision floating-point values `a' and `b', one of which > +is a NaN, and returns the appropriate NaN result. 
If either `a' or `b' is a > +signaling NaN, the invalid exception is raised. > +------------------------------------------------------------------------------- > +*/ > static float32 propagateFloat32NaN( float32 a, float32 b STATUS_PARAM) > { > flag aIsQuietNaN, aIsSignalingNaN, bIsQuietNaN, bIsSignalingNaN; > @@ -594,14 +630,16 @@ static float32 propagateFloat32NaN( float32 a, float32 b STATUS_PARAM) > } > } > > -/*---------------------------------------------------------------------------- > -| Takes three single-precision floating-point values `a', `b' and `c', one of > -| which is a NaN, and returns the appropriate NaN result. If any of `a', > -| `b' or `c' is a signaling NaN, the invalid exception is raised. > -| The input infzero indicates whether a*b was 0*inf or inf*0 (in which case > -| obviously c is a NaN, and whether to propagate c or some other NaN is > -| implementation defined). > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Takes three single-precision floating-point values `a', `b' and `c', one of > +which is a NaN, and returns the appropriate NaN result. If any of `a', > +`b' or `c' is a signaling NaN, the invalid exception is raised. > +The input infzero indicates whether a*b was 0*inf or inf*0 (in which case > +obviously c is a NaN, and whether to propagate c or some other NaN is > +implementation defined). > +------------------------------------------------------------------------------- > +*/ > > static float32 propagateFloat32MulAddNaN(float32 a, float32 b, > float32 c, flag infzero STATUS_PARAM) > @@ -656,10 +694,12 @@ int float64_is_signaling_nan(float64 a_) > return 0; > } > #else > -/*---------------------------------------------------------------------------- > -| Returns 1 if the double-precision floating-point value `a' is a quiet > -| NaN; otherwise returns 0. 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the double-precision floating-point value `a' is a quiet > +NaN; otherwise returns 0. > +------------------------------------------------------------------------------- > +*/ > > int float64_is_quiet_nan( float64 a_ ) > { > @@ -673,10 +713,12 @@ int float64_is_quiet_nan( float64 a_ ) > #endif > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the double-precision floating-point value `a' is a signaling > -| NaN; otherwise returns 0. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the double-precision floating-point value `a' is a signaling > +NaN; otherwise returns 0. > +------------------------------------------------------------------------------- > +*/ > > int float64_is_signaling_nan( float64 a_ ) > { > @@ -691,10 +733,12 @@ int float64_is_signaling_nan( float64 a_ ) > } > #endif > > -/*---------------------------------------------------------------------------- > -| Returns a quiet NaN if the double-precision floating point value `a' is a > -| signaling NaN; otherwise returns `a'. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns a quiet NaN if the double-precision floating point value `a' is a > +signaling NaN; otherwise returns `a'. 
> +------------------------------------------------------------------------------- > +*/ > > float64 float64_maybe_silence_nan( float64 a_ ) > { > @@ -714,12 +758,13 @@ float64 float64_maybe_silence_nan( float64 a_ ) > return a_; > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the double-precision floating-point NaN > -| `a' to the canonical NaN format. If `a' is a signaling NaN, the invalid > -| exception is raised. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the double-precision floating-point NaN > +`a' to the canonical NaN format. If `a' is a signaling NaN, the invalid > +exception is raised. > +------------------------------------------------------------------------------- > +*/ > static commonNaNT float64ToCommonNaN( float64 a STATUS_PARAM) > { > commonNaNT z; > @@ -731,10 +776,12 @@ static commonNaNT float64ToCommonNaN( float64 a STATUS_PARAM) > return z; > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the canonical NaN `a' to the double- > -| precision floating-point format. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the canonical NaN `a' to the double- > +precision floating-point format. 
> +------------------------------------------------------------------------------- > +*/ > > static float64 commonNaNToFloat64( commonNaNT a STATUS_PARAM) > { > @@ -753,12 +800,13 @@ static float64 commonNaNToFloat64( commonNaNT a STATUS_PARAM) > return float64_default_nan; > } > > -/*---------------------------------------------------------------------------- > -| Takes two double-precision floating-point values `a' and `b', one of which > -| is a NaN, and returns the appropriate NaN result. If either `a' or `b' is a > -| signaling NaN, the invalid exception is raised. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Takes two double-precision floating-point values `a' and `b', one of which > +is a NaN, and returns the appropriate NaN result. If either `a' or `b' is a > +signaling NaN, the invalid exception is raised. > +------------------------------------------------------------------------------- > +*/ > static float64 propagateFloat64NaN( float64 a, float64 b STATUS_PARAM) > { > flag aIsQuietNaN, aIsSignalingNaN, bIsQuietNaN, bIsSignalingNaN; > @@ -793,14 +841,16 @@ static float64 propagateFloat64NaN( float64 a, float64 b STATUS_PARAM) > } > } > > -/*---------------------------------------------------------------------------- > -| Takes three double-precision floating-point values `a', `b' and `c', one of > -| which is a NaN, and returns the appropriate NaN result. If any of `a', > -| `b' or `c' is a signaling NaN, the invalid exception is raised. > -| The input infzero indicates whether a*b was 0*inf or inf*0 (in which case > -| obviously c is a NaN, and whether to propagate c or some other NaN is > -| implementation defined). 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Takes three double-precision floating-point values `a', `b' and `c', one of > +which is a NaN, and returns the appropriate NaN result. If any of `a', > +`b' or `c' is a signaling NaN, the invalid exception is raised. > +The input infzero indicates whether a*b was 0*inf or inf*0 (in which case > +obviously c is a NaN, and whether to propagate c or some other NaN is > +implementation defined). > +------------------------------------------------------------------------------- > +*/ > > static float64 propagateFloat64MulAddNaN(float64 a, float64 b, > float64 c, flag infzero STATUS_PARAM) > @@ -855,11 +905,13 @@ int floatx80_is_signaling_nan(floatx80 a_) > return 0; > } > #else > -/*---------------------------------------------------------------------------- > -| Returns 1 if the extended double-precision floating-point value `a' is a > -| quiet NaN; otherwise returns 0. This slightly differs from the same > -| function for other types as floatx80 has an explicit bit. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the extended double-precision floating-point value `a' is a > +quiet NaN; otherwise returns 0. This slightly differs from the same > +function for other types as floatx80 has an explicit bit. > +------------------------------------------------------------------------------- > +*/ > > int floatx80_is_quiet_nan( floatx80 a ) > { > @@ -877,11 +929,13 @@ int floatx80_is_quiet_nan( floatx80 a ) > #endif > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the extended double-precision floating-point value `a' is a > -| signaling NaN; otherwise returns 0. 
This slightly differs from the same > -| function for other types as floatx80 has an explicit bit. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the extended double-precision floating-point value `a' is a > +signaling NaN; otherwise returns 0. This slightly differs from the same > +function for other types as floatx80 has an explicit bit. > +------------------------------------------------------------------------------- > +*/ > > int floatx80_is_signaling_nan( floatx80 a ) > { > @@ -900,10 +954,12 @@ int floatx80_is_signaling_nan( floatx80 a ) > } > #endif > > -/*---------------------------------------------------------------------------- > -| Returns a quiet NaN if the extended double-precision floating point value > -| `a' is a signaling NaN; otherwise returns `a'. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns a quiet NaN if the extended double-precision floating point value > +`a' is a signaling NaN; otherwise returns `a'. > +------------------------------------------------------------------------------- > +*/ > > floatx80 floatx80_maybe_silence_nan( floatx80 a ) > { > @@ -923,12 +979,13 @@ floatx80 floatx80_maybe_silence_nan( floatx80 a ) > return a; > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the extended double-precision floating- > -| point NaN `a' to the canonical NaN format. If `a' is a signaling NaN, the > -| invalid exception is raised. 
> -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the extended double-precision floating- > +point NaN `a' to the canonical NaN format. If `a' is a signaling NaN, the > +invalid exception is raised. > +------------------------------------------------------------------------------- > +*/ > static commonNaNT floatx80ToCommonNaN( floatx80 a STATUS_PARAM) > { > commonNaNT z; > @@ -946,10 +1003,12 @@ static commonNaNT floatx80ToCommonNaN( floatx80 a STATUS_PARAM) > return z; > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the canonical NaN `a' to the extended > -| double-precision floating-point format. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the canonical NaN `a' to the extended > +double-precision floating-point format. > +------------------------------------------------------------------------------- > +*/ > > static floatx80 commonNaNToFloatx80( commonNaNT a STATUS_PARAM) > { > @@ -972,12 +1031,13 @@ static floatx80 commonNaNToFloatx80( commonNaNT a STATUS_PARAM) > return z; > } > > -/*---------------------------------------------------------------------------- > -| Takes two extended double-precision floating-point values `a' and `b', one > -| of which is a NaN, and returns the appropriate NaN result. If either `a' or > -| `b' is a signaling NaN, the invalid exception is raised. 
> -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Takes two extended double-precision floating-point values `a' and `b', one > +of which is a NaN, and returns the appropriate NaN result. If either `a' or > +`b' is a signaling NaN, the invalid exception is raised. > +------------------------------------------------------------------------------- > +*/ > static floatx80 propagateFloatx80NaN( floatx80 a, floatx80 b STATUS_PARAM) > { > flag aIsQuietNaN, aIsSignalingNaN, bIsQuietNaN, bIsSignalingNaN; > @@ -1023,10 +1083,12 @@ int float128_is_signaling_nan(float128 a_) > return 0; > } > #else > -/*---------------------------------------------------------------------------- > -| Returns 1 if the quadruple-precision floating-point value `a' is a quiet > -| NaN; otherwise returns 0. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the quadruple-precision floating-point value `a' is a quiet > +NaN; otherwise returns 0. > +------------------------------------------------------------------------------- > +*/ > > int float128_is_quiet_nan( float128 a ) > { > @@ -1041,10 +1103,12 @@ int float128_is_quiet_nan( float128 a ) > #endif > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the quadruple-precision floating-point value `a' is a > -| signaling NaN; otherwise returns 0. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the quadruple-precision floating-point value `a' is a > +signaling NaN; otherwise returns 0. 
> +------------------------------------------------------------------------------- > +*/ > > int float128_is_signaling_nan( float128 a ) > { > @@ -1060,10 +1124,12 @@ int float128_is_signaling_nan( float128 a ) > } > #endif > > -/*---------------------------------------------------------------------------- > -| Returns a quiet NaN if the quadruple-precision floating point value `a' is > -| a signaling NaN; otherwise returns `a'. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns a quiet NaN if the quadruple-precision floating point value `a' is > +a signaling NaN; otherwise returns `a'. > +------------------------------------------------------------------------------- > +*/ > > float128 float128_maybe_silence_nan( float128 a ) > { > @@ -1083,12 +1149,13 @@ float128 float128_maybe_silence_nan( float128 a ) > return a; > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the quadruple-precision floating-point NaN > -| `a' to the canonical NaN format. If `a' is a signaling NaN, the invalid > -| exception is raised. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the quadruple-precision floating-point NaN > +`a' to the canonical NaN format. If `a' is a signaling NaN, the invalid > +exception is raised. 
> +------------------------------------------------------------------------------- > +*/ > static commonNaNT float128ToCommonNaN( float128 a STATUS_PARAM) > { > commonNaNT z; > @@ -1099,10 +1166,12 @@ static commonNaNT float128ToCommonNaN( float128 a STATUS_PARAM) > return z; > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the canonical NaN `a' to the quadruple- > -| precision floating-point format. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the canonical NaN `a' to the quadruple- > +precision floating-point format. > +------------------------------------------------------------------------------- > +*/ > > static float128 commonNaNToFloat128( commonNaNT a STATUS_PARAM) > { > @@ -1119,12 +1188,13 @@ static float128 commonNaNToFloat128( commonNaNT a STATUS_PARAM) > return z; > } > > -/*---------------------------------------------------------------------------- > -| Takes two quadruple-precision floating-point values `a' and `b', one of > -| which is a NaN, and returns the appropriate NaN result. If either `a' or > -| `b' is a signaling NaN, the invalid exception is raised. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Takes two quadruple-precision floating-point values `a' and `b', one of > +which is a NaN, and returns the appropriate NaN result. If either `a' or > +`b' is a signaling NaN, the invalid exception is raised. 
> +------------------------------------------------------------------------------- > +*/ > static float128 propagateFloat128NaN( float128 a, float128 b STATUS_PARAM) > { > flag aIsQuietNaN, aIsSignalingNaN, bIsQuietNaN, bIsSignalingNaN; > diff --git a/fpu/softfloat.c b/fpu/softfloat.c > index 7ba51b6..9145582 100644 > --- a/fpu/softfloat.c > +++ b/fpu/softfloat.c > @@ -4,10 +4,11 @@ > * Derived from SoftFloat. > */ > > -/*============================================================================ > +/* > +=============================================================================== > > -This C source file is part of the SoftFloat IEC/IEEE Floating-point Arithmetic > -Package, Release 2b. > +This C source file is part of the SoftFloat IEC/IEEE Floating-point > +Arithmetic Package, Release 2a. > > Written by John R. Hauser. This work was made possible in part by the > International Computer Science Institute, located at Suite 600, 1947 Center > @@ -16,24 +17,22 @@ National Science Foundation under grant MIP-9311980. The original version > of this code was written as part of a project to build a fixed-point vector > processor in collaboration with the University of California at Berkeley, > overseen by Profs. Nelson Morgan and John Wawrzynek. More information > -is available through the Web page `http://www.cs.berkeley.edu/~jhauser/ > +is available through the Web page `http://HTTP.CS.Berkeley.EDU/~jhauser/ > arithmetic/SoftFloat.html'. > > -THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort has > -been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT TIMES > -RESULT IN INCORRECT BEHAVIOR. 
USE OF THIS SOFTWARE IS RESTRICTED TO PERSONS > -AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ALL LOSSES, > -COSTS, OR OTHER PROBLEMS THEY INCUR DUE TO THE SOFTWARE, AND WHO FURTHERMORE > -EFFECTIVELY INDEMNIFY JOHN HAUSER AND THE INTERNATIONAL COMPUTER SCIENCE > -INSTITUTE (possibly via similar legal warning) AGAINST ALL LOSSES, COSTS, OR > -OTHER PROBLEMS INCURRED BY THEIR CUSTOMERS AND CLIENTS DUE TO THE SOFTWARE. > +THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort > +has been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT > +TIMES RESULT IN INCORRECT BEHAVIOR. USE OF THIS SOFTWARE IS RESTRICTED TO > +PERSONS AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ANY > +AND ALL LOSSES, COSTS, OR OTHER PROBLEMS ARISING FROM ITS USE. > > Derivative works are acceptable, even for commercial purposes, so long as > -(1) the source code for the derivative work includes prominent notice that > -the work is derivative, and (2) the source code includes prominent notice with > -these four paragraphs for those parts of this code that are retained. > +(1) they include prominent notice that the work is derivative, and (2) they > +include prominent notice akin to these four paragraphs for those parts of > +this code that are retained. > > -=============================================================================*/ > +=============================================================================== > +*/ > > /* softfloat (and in particular the code in softfloat-specialize.h) is > * target-dependent and needs the TARGET_* macros. > @@ -42,21 +41,25 @@ these four paragraphs for those parts of this code that are retained. > > #include "fpu/softfloat.h" > > -/*---------------------------------------------------------------------------- > -| Primitive arithmetic functions, including multi-word arithmetic, and > -| division and square root approximations. (Can be specialized to target if > -| desired.) 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Primitive arithmetic functions, including multi-word arithmetic, and > +division and square root approximations. (Can be specialized to target if > +desired.) > +------------------------------------------------------------------------------- > +*/ > #include "softfloat-macros.h" > > -/*---------------------------------------------------------------------------- > -| Functions and definitions to determine: (1) whether tininess for underflow > -| is detected before or after rounding by default, (2) what (if anything) > -| happens when exceptions are raised, (3) how signaling NaNs are distinguished > -| from quiet NaNs, (4) the default generated quiet NaNs, and (5) how NaNs > -| are propagated from function inputs to output. These details are target- > -| specific. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Functions and definitions to determine: (1) whether tininess for underflow > +is detected before or after rounding by default, (2) what (if anything) > +happens when exceptions are raised, (3) how signaling NaNs are distinguished > +from quiet NaNs, (4) the default generated quiet NaNs, and (5) how NaNs > +are propagated from function inputs to output. These details are target- > +specific. > +------------------------------------------------------------------------------- > +*/ > #include "softfloat-specialize.h" > > void set_float_rounding_mode(int val STATUS_PARAM) > @@ -74,43 +77,51 @@ void set_floatx80_rounding_precision(int val STATUS_PARAM) > STATUS(floatx80_rounding_precision) = val; > } > > -/*---------------------------------------------------------------------------- > -| Returns the fraction bits of the half-precision floating-point value `a'. 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the fraction bits of the half-precision floating-point value `a'. > +------------------------------------------------------------------------------- > +*/ > > INLINE uint32_t extractFloat16Frac(float16 a) > { > return float16_val(a) & 0x3ff; > } > > -/*---------------------------------------------------------------------------- > -| Returns the exponent bits of the half-precision floating-point value `a'. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the exponent bits of the half-precision floating-point value `a'. > +------------------------------------------------------------------------------- > +*/ > > INLINE int_fast16_t extractFloat16Exp(float16 a) > { > return (float16_val(a) >> 10) & 0x1f; > } > > -/*---------------------------------------------------------------------------- > -| Returns the sign bit of the single-precision floating-point value `a'. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the sign bit of the single-precision floating-point value `a'. > +------------------------------------------------------------------------------- > +*/ > > INLINE flag extractFloat16Sign(float16 a) > { > return float16_val(a)>>15; > } > > -/*---------------------------------------------------------------------------- > -| Takes a 64-bit fixed-point value `absZ' with binary point between bits 6 > -| and 7, and returns the properly rounded 32-bit integer corresponding to the > -| input. If `zSign' is 1, the input is negated before being converted to an > -| integer. Bit 63 of `absZ' must be zero. 
Ordinarily, the fixed-point input > -| is simply rounded to an integer, with the inexact exception raised if the > -| input cannot be represented exactly as an integer. However, if the fixed- > -| point input is too large, the invalid exception is raised and the largest > -| positive or negative integer is returned. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Takes a 64-bit fixed-point value `absZ' with binary point between bits 6 > +and 7, and returns the properly rounded 32-bit integer corresponding to the > +input. If `zSign' is 1, the input is negated before being converted to an > +integer. Bit 63 of `absZ' must be zero. Ordinarily, the fixed-point input > +is simply rounded to an integer, with the inexact exception raised if the > +input cannot be represented exactly as an integer. However, if the fixed- > +point input is too large, the invalid exception is raised and the largest > +positive or negative integer is returned. > +------------------------------------------------------------------------------- > +*/ > > static int32 roundAndPackInt32( flag zSign, uint64_t absZ STATUS_PARAM) > { > @@ -150,17 +161,19 @@ static int32 roundAndPackInt32( flag zSign, uint64_t absZ STATUS_PARAM) > > } > > -/*---------------------------------------------------------------------------- > -| Takes the 128-bit fixed-point value formed by concatenating `absZ0' and > -| `absZ1', with binary point between bits 63 and 64 (between the input words), > -| and returns the properly rounded 64-bit integer corresponding to the input. > -| If `zSign' is 1, the input is negated before being converted to an integer. > -| Ordinarily, the fixed-point input is simply rounded to an integer, with > -| the inexact exception raised if the input cannot be represented exactly as > -| an integer. 
However, if the fixed-point input is too large, the invalid > -| exception is raised and the largest positive or negative integer is > -| returned. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Takes the 128-bit fixed-point value formed by concatenating `absZ0' and > +`absZ1', with binary point between bits 63 and 64 (between the input words), > +and returns the properly rounded 64-bit integer corresponding to the input. > +If `zSign' is 1, the input is negated before being converted to an integer. > +Ordinarily, the fixed-point input is simply rounded to an integer, with > +the inexact exception raised if the input cannot be represented exactly as > +an integer. However, if the fixed-point input is too large, the invalid > +exception is raised and the largest positive or negative integer is > +returned. > +------------------------------------------------------------------------------- > +*/ > > static int64 roundAndPackInt64( flag zSign, uint64_t absZ0, uint64_t absZ1 STATUS_PARAM) > { > @@ -203,9 +216,11 @@ static int64 roundAndPackInt64( flag zSign, uint64_t absZ0, uint64_t absZ1 STATU > > } > > -/*---------------------------------------------------------------------------- > -| Returns the fraction bits of the single-precision floating-point value `a'. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the fraction bits of the single-precision floating-point value `a'. 
> +------------------------------------------------------------------------------- > +*/ > > INLINE uint32_t extractFloat32Frac( float32 a ) > { > @@ -214,9 +229,11 @@ INLINE uint32_t extractFloat32Frac( float32 a ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the exponent bits of the single-precision floating-point value `a'. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the exponent bits of the single-precision floating-point value `a'. > +------------------------------------------------------------------------------- > +*/ > > INLINE int_fast16_t extractFloat32Exp(float32 a) > { > @@ -225,10 +242,11 @@ INLINE int_fast16_t extractFloat32Exp(float32 a) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the sign bit of the single-precision floating-point value `a'. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the sign bit of the single-precision floating-point value `a'. > +------------------------------------------------------------------------------- > +*/ > INLINE flag extractFloat32Sign( float32 a ) > { > > @@ -236,10 +254,12 @@ INLINE flag extractFloat32Sign( float32 a ) > > } > > -/*---------------------------------------------------------------------------- > -| If `a' is denormal and we are in flush-to-zero mode then set the > -| input-denormal exception and return zero. Otherwise just return the value. 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +If `a' is denormal and we are in flush-to-zero mode then set the > +input-denormal exception and return zero. Otherwise just return the value. > +------------------------------------------------------------------------------- > +*/ > static float32 float32_squash_input_denormal(float32 a STATUS_PARAM) > { > if (STATUS(flush_inputs_to_zero)) { > @@ -251,13 +271,14 @@ static float32 float32_squash_input_denormal(float32 a STATUS_PARAM) > return a; > } > > -/*---------------------------------------------------------------------------- > -| Normalizes the subnormal single-precision floating-point value represented > -| by the denormalized significand `aSig'. The normalized exponent and > -| significand are stored at the locations pointed to by `zExpPtr' and > -| `zSigPtr', respectively. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Normalizes the subnormal single-precision floating-point value represented > +by the denormalized significand `aSig'. The normalized exponent and > +significand are stored at the locations pointed to by `zExpPtr' and > +`zSigPtr', respectively. > +------------------------------------------------------------------------------- > +*/ > static void > normalizeFloat32Subnormal(uint32_t aSig, int_fast16_t *zExpPtr, uint32_t *zSigPtr) > { > @@ -269,16 +290,18 @@ static void > > } > > -/*---------------------------------------------------------------------------- > -| Packs the sign `zSign', exponent `zExp', and significand `zSig' into a > -| single-precision floating-point value, returning the result. After being > -| shifted into the proper positions, the three fields are simply added > -| together to form the result. 
This means that any integer portion of `zSig' > -| will be added into the exponent. Since a properly normalized significand > -| will have an integer portion equal to 1, the `zExp' input should be 1 less > -| than the desired result exponent whenever `zSig' is a complete, normalized > -| significand. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Packs the sign `zSign', exponent `zExp', and significand `zSig' into a > +single-precision floating-point value, returning the result. After being > +shifted into the proper positions, the three fields are simply added > +together to form the result. This means that any integer portion of `zSig' > +will be added into the exponent. Since a properly normalized significand > +will have an integer portion equal to 1, the `zExp' input should be 1 less > +than the desired result exponent whenever `zSig' is a complete, normalized > +significand. > +------------------------------------------------------------------------------- > +*/ > > INLINE float32 packFloat32(flag zSign, int_fast16_t zExp, uint32_t zSig) > { > @@ -288,27 +311,29 @@ INLINE float32 packFloat32(flag zSign, int_fast16_t zExp, uint32_t zSig) > > } > > -/*---------------------------------------------------------------------------- > -| Takes an abstract floating-point value having sign `zSign', exponent `zExp', > -| and significand `zSig', and returns the proper single-precision floating- > -| point value corresponding to the abstract input. Ordinarily, the abstract > -| value is simply rounded and packed into the single-precision format, with > -| the inexact exception raised if the abstract input cannot be represented > -| exactly. However, if the abstract value is too large, the overflow and > -| inexact exceptions are raised and an infinity or maximal finite value is > -| returned. 
If the abstract value is too small, the input value is rounded to > -| a subnormal number, and the underflow and inexact exceptions are raised if > -| the abstract input cannot be represented exactly as a subnormal single- > -| precision floating-point number. > -| The input significand `zSig' has its binary point between bits 30 > -| and 29, which is 7 bits to the left of the usual location. This shifted > -| significand must be normalized or smaller. If `zSig' is not normalized, > -| `zExp' must be 0; in that case, the result returned is a subnormal number, > -| and it must not require rounding. In the usual case that `zSig' is > -| normalized, `zExp' must be 1 less than the ``true'' floating-point exponent. > -| The handling of underflow and overflow follows the IEC/IEEE Standard for > -| Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Takes an abstract floating-point value having sign `zSign', exponent `zExp', > +and significand `zSig', and returns the proper single-precision floating- > +point value corresponding to the abstract input. Ordinarily, the abstract > +value is simply rounded and packed into the single-precision format, with > +the inexact exception raised if the abstract input cannot be represented > +exactly. However, if the abstract value is too large, the overflow and > +inexact exceptions are raised and an infinity or maximal finite value is > +returned. If the abstract value is too small, the input value is rounded to > +a subnormal number, and the underflow and inexact exceptions are raised if > +the abstract input cannot be represented exactly as a subnormal single- > +precision floating-point number. > + The input significand `zSig' has its binary point between bits 30 > +and 29, which is 7 bits to the left of the usual location. 
This shifted > +significand must be normalized or smaller. If `zSig' is not normalized, > +`zExp' must be 0; in that case, the result returned is a subnormal number, > +and it must not require rounding. In the usual case that `zSig' is > +normalized, `zExp' must be 1 less than the ``true'' floating-point exponent. > +The handling of underflow and overflow follows the IEC/IEEE Standard for > +Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > static float32 roundAndPackFloat32(flag zSign, int_fast16_t zExp, uint32_t zSig STATUS_PARAM) > { > @@ -366,15 +391,16 @@ static float32 roundAndPackFloat32(flag zSign, int_fast16_t zExp, uint32_t zSig > > } > > -/*---------------------------------------------------------------------------- > -| Takes an abstract floating-point value having sign `zSign', exponent `zExp', > -| and significand `zSig', and returns the proper single-precision floating- > -| point value corresponding to the abstract input. This routine is just like > -| `roundAndPackFloat32' except that `zSig' does not have to be normalized. > -| Bit 31 of `zSig' must be zero, and `zExp' must be 1 less than the ``true'' > -| floating-point exponent. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Takes an abstract floating-point value having sign `zSign', exponent `zExp', > +and significand `zSig', and returns the proper single-precision floating- > +point value corresponding to the abstract input. This routine is just like > +`roundAndPackFloat32' except that `zSig' does not have to be normalized. > +Bit 31 of `zSig' must be zero, and `zExp' must be 1 less than the ``true'' > +floating-point exponent. 
> +------------------------------------------------------------------------------- > +*/ > static float32 > normalizeRoundAndPackFloat32(flag zSign, int_fast16_t zExp, uint32_t zSig STATUS_PARAM) > { > @@ -385,9 +411,11 @@ static float32 > > } > > -/*---------------------------------------------------------------------------- > -| Returns the fraction bits of the double-precision floating-point value `a'. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the fraction bits of the double-precision floating-point value `a'. > +------------------------------------------------------------------------------- > +*/ > > INLINE uint64_t extractFloat64Frac( float64 a ) > { > @@ -396,9 +424,11 @@ INLINE uint64_t extractFloat64Frac( float64 a ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the exponent bits of the double-precision floating-point value `a'. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the exponent bits of the double-precision floating-point value `a'. > +------------------------------------------------------------------------------- > +*/ > > INLINE int_fast16_t extractFloat64Exp(float64 a) > { > @@ -407,10 +437,11 @@ INLINE int_fast16_t extractFloat64Exp(float64 a) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the sign bit of the double-precision floating-point value `a'. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the sign bit of the double-precision floating-point value `a'. 
> +------------------------------------------------------------------------------- > +*/ > INLINE flag extractFloat64Sign( float64 a ) > { > > @@ -418,10 +449,12 @@ INLINE flag extractFloat64Sign( float64 a ) > > } > > -/*---------------------------------------------------------------------------- > -| If `a' is denormal and we are in flush-to-zero mode then set the > -| input-denormal exception and return zero. Otherwise just return the value. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +If `a' is denormal and we are in flush-to-zero mode then set the > +input-denormal exception and return zero. Otherwise just return the value. > +------------------------------------------------------------------------------- > +*/ > static float64 float64_squash_input_denormal(float64 a STATUS_PARAM) > { > if (STATUS(flush_inputs_to_zero)) { > @@ -433,13 +466,14 @@ static float64 float64_squash_input_denormal(float64 a STATUS_PARAM) > return a; > } > > -/*---------------------------------------------------------------------------- > -| Normalizes the subnormal double-precision floating-point value represented > -| by the denormalized significand `aSig'. The normalized exponent and > -| significand are stored at the locations pointed to by `zExpPtr' and > -| `zSigPtr', respectively. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Normalizes the subnormal double-precision floating-point value represented > +by the denormalized significand `aSig'. The normalized exponent and > +significand are stored at the locations pointed to by `zExpPtr' and > +`zSigPtr', respectively. 
> +------------------------------------------------------------------------------- > +*/ > static void > normalizeFloat64Subnormal(uint64_t aSig, int_fast16_t *zExpPtr, uint64_t *zSigPtr) > { > @@ -451,16 +485,18 @@ static void > > } > > -/*---------------------------------------------------------------------------- > -| Packs the sign `zSign', exponent `zExp', and significand `zSig' into a > -| double-precision floating-point value, returning the result. After being > -| shifted into the proper positions, the three fields are simply added > -| together to form the result. This means that any integer portion of `zSig' > -| will be added into the exponent. Since a properly normalized significand > -| will have an integer portion equal to 1, the `zExp' input should be 1 less > -| than the desired result exponent whenever `zSig' is a complete, normalized > -| significand. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Packs the sign `zSign', exponent `zExp', and significand `zSig' into a > +double-precision floating-point value, returning the result. After being > +shifted into the proper positions, the three fields are simply added > +together to form the result. This means that any integer portion of `zSig' > +will be added into the exponent. Since a properly normalized significand > +will have an integer portion equal to 1, the `zExp' input should be 1 less > +than the desired result exponent whenever `zSig' is a complete, normalized > +significand. 
> +------------------------------------------------------------------------------- > +*/ > > INLINE float64 packFloat64(flag zSign, int_fast16_t zExp, uint64_t zSig) > { > @@ -470,27 +506,29 @@ INLINE float64 packFloat64(flag zSign, int_fast16_t zExp, uint64_t zSig) > > } > > -/*---------------------------------------------------------------------------- > -| Takes an abstract floating-point value having sign `zSign', exponent `zExp', > -| and significand `zSig', and returns the proper double-precision floating- > -| point value corresponding to the abstract input. Ordinarily, the abstract > -| value is simply rounded and packed into the double-precision format, with > -| the inexact exception raised if the abstract input cannot be represented > -| exactly. However, if the abstract value is too large, the overflow and > -| inexact exceptions are raised and an infinity or maximal finite value is > -| returned. If the abstract value is too small, the input value is rounded > -| to a subnormal number, and the underflow and inexact exceptions are raised > -| if the abstract input cannot be represented exactly as a subnormal double- > -| precision floating-point number. > -| The input significand `zSig' has its binary point between bits 62 > -| and 61, which is 10 bits to the left of the usual location. This shifted > -| significand must be normalized or smaller. If `zSig' is not normalized, > -| `zExp' must be 0; in that case, the result returned is a subnormal number, > -| and it must not require rounding. In the usual case that `zSig' is > -| normalized, `zExp' must be 1 less than the ``true'' floating-point exponent. > -| The handling of underflow and overflow follows the IEC/IEEE Standard for > -| Binary Floating-Point Arithmetic. 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Takes an abstract floating-point value having sign `zSign', exponent `zExp', > +and significand `zSig', and returns the proper double-precision floating- > +point value corresponding to the abstract input. Ordinarily, the abstract > +value is simply rounded and packed into the double-precision format, with > +the inexact exception raised if the abstract input cannot be represented > +exactly. However, if the abstract value is too large, the overflow and > +inexact exceptions are raised and an infinity or maximal finite value is > +returned. If the abstract value is too small, the input value is rounded > +to a subnormal number, and the underflow and inexact exceptions are raised > +if the abstract input cannot be represented exactly as a subnormal double- > +precision floating-point number. > + The input significand `zSig' has its binary point between bits 62 > +and 61, which is 10 bits to the left of the usual location. This shifted > +significand must be normalized or smaller. If `zSig' is not normalized, > +`zExp' must be 0; in that case, the result returned is a subnormal number, > +and it must not require rounding. In the usual case that `zSig' is > +normalized, `zExp' must be 1 less than the ``true'' floating-point exponent. > +The handling of underflow and overflow follows the IEC/IEEE Standard for > +Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > > static float64 roundAndPackFloat64(flag zSign, int_fast16_t zExp, uint64_t zSig STATUS_PARAM) > { > @@ -548,15 +586,16 @@ static float64 roundAndPackFloat64(flag zSign, int_fast16_t zExp, uint64_t zSig > > } > > -/*---------------------------------------------------------------------------- > -| Takes an abstract floating-point value having sign `zSign', exponent `zExp', > -| and significand `zSig', and returns the proper double-precision floating- > -| point value corresponding to the abstract input. This routine is just like > -| `roundAndPackFloat64' except that `zSig' does not have to be normalized. > -| Bit 63 of `zSig' must be zero, and `zExp' must be 1 less than the ``true'' > -| floating-point exponent. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Takes an abstract floating-point value having sign `zSign', exponent `zExp', > +and significand `zSig', and returns the proper double-precision floating- > +point value corresponding to the abstract input. This routine is just like > +`roundAndPackFloat64' except that `zSig' does not have to be normalized. > +Bit 63 of `zSig' must be zero, and `zExp' must be 1 less than the ``true'' > +floating-point exponent. > +------------------------------------------------------------------------------- > +*/ > static float64 > normalizeRoundAndPackFloat64(flag zSign, int_fast16_t zExp, uint64_t zSig STATUS_PARAM) > { > @@ -567,10 +606,12 @@ static float64 > > } > > -/*---------------------------------------------------------------------------- > -| Returns the fraction bits of the extended double-precision floating-point > -| value `a'. 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the fraction bits of the extended double-precision floating-point > +value `a'. > +------------------------------------------------------------------------------- > +*/ > > INLINE uint64_t extractFloatx80Frac( floatx80 a ) > { > @@ -579,11 +620,12 @@ INLINE uint64_t extractFloatx80Frac( floatx80 a ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the exponent bits of the extended double-precision floating-point > -| value `a'. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the exponent bits of the extended double-precision floating-point > +value `a'. > +------------------------------------------------------------------------------- > +*/ > INLINE int32 extractFloatx80Exp( floatx80 a ) > { > > @@ -591,11 +633,12 @@ INLINE int32 extractFloatx80Exp( floatx80 a ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the sign bit of the extended double-precision floating-point value > -| `a'. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the sign bit of the extended double-precision floating-point value > +`a'. 
> +------------------------------------------------------------------------------- > +*/ > INLINE flag extractFloatx80Sign( floatx80 a ) > { > > @@ -603,13 +646,14 @@ INLINE flag extractFloatx80Sign( floatx80 a ) > > } > > -/*---------------------------------------------------------------------------- > -| Normalizes the subnormal extended double-precision floating-point value > -| represented by the denormalized significand `aSig'. The normalized exponent > -| and significand are stored at the locations pointed to by `zExpPtr' and > -| `zSigPtr', respectively. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Normalizes the subnormal extended double-precision floating-point value > +represented by the denormalized significand `aSig'. The normalized exponent > +and significand are stored at the locations pointed to by `zExpPtr' and > +`zSigPtr', respectively. > +------------------------------------------------------------------------------- > +*/ > static void > normalizeFloatx80Subnormal( uint64_t aSig, int32 *zExpPtr, uint64_t *zSigPtr ) > { > @@ -621,10 +665,12 @@ static void > > } > > -/*---------------------------------------------------------------------------- > -| Packs the sign `zSign', exponent `zExp', and significand `zSig' into an > -| extended double-precision floating-point value, returning the result. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Packs the sign `zSign', exponent `zExp', and significand `zSig' into an > +extended double-precision floating-point value, returning the result. 
> +------------------------------------------------------------------------------- > +*/ > > INLINE floatx80 packFloatx80( flag zSign, int32 zExp, uint64_t zSig ) > { > @@ -636,30 +682,31 @@ INLINE floatx80 packFloatx80( flag zSign, int32 zExp, uint64_t zSig ) > > } > > -/*---------------------------------------------------------------------------- > -| Takes an abstract floating-point value having sign `zSign', exponent `zExp', > -| and extended significand formed by the concatenation of `zSig0' and `zSig1', > -| and returns the proper extended double-precision floating-point value > -| corresponding to the abstract input. Ordinarily, the abstract value is > -| rounded and packed into the extended double-precision format, with the > -| inexact exception raised if the abstract input cannot be represented > -| exactly. However, if the abstract value is too large, the overflow and > -| inexact exceptions are raised and an infinity or maximal finite value is > -| returned. If the abstract value is too small, the input value is rounded to > -| a subnormal number, and the underflow and inexact exceptions are raised if > -| the abstract input cannot be represented exactly as a subnormal extended > -| double-precision floating-point number. > -| If `roundingPrecision' is 32 or 64, the result is rounded to the same > -| number of bits as single or double precision, respectively. Otherwise, the > -| result is rounded to the full precision of the extended double-precision > -| format. > -| The input significand must be normalized or smaller. If the input > -| significand is not normalized, `zExp' must be 0; in that case, the result > -| returned is a subnormal number, and it must not require rounding. The > -| handling of underflow and overflow follows the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. 
> -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Takes an abstract floating-point value having sign `zSign', exponent `zExp', > +and extended significand formed by the concatenation of `zSig0' and `zSig1', > +and returns the proper extended double-precision floating-point value > +corresponding to the abstract input. Ordinarily, the abstract value is > +rounded and packed into the extended double-precision format, with the > +inexact exception raised if the abstract input cannot be represented > +exactly. However, if the abstract value is too large, the overflow and > +inexact exceptions are raised and an infinity or maximal finite value is > +returned. If the abstract value is too small, the input value is rounded to > +a subnormal number, and the underflow and inexact exceptions are raised if > +the abstract input cannot be represented exactly as a subnormal extended > +double-precision floating-point number. > + If `roundingPrecision' is 32 or 64, the result is rounded to the same > +number of bits as single or double precision, respectively. Otherwise, the > +result is rounded to the full precision of the extended double-precision > +format. > + The input significand must be normalized or smaller. If the input > +significand is not normalized, `zExp' must be 0; in that case, the result > +returned is a subnormal number, and it must not require rounding. The > +handling of underflow and overflow follows the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > static floatx80 > roundAndPackFloatx80( > int8 roundingPrecision, flag zSign, int32 zExp, uint64_t zSig0, uint64_t zSig1 > @@ -823,15 +870,16 @@ static floatx80 > > } > > -/*---------------------------------------------------------------------------- > -| Takes an abstract floating-point value having sign `zSign', exponent > -| `zExp', and significand formed by the concatenation of `zSig0' and `zSig1', > -| and returns the proper extended double-precision floating-point value > -| corresponding to the abstract input. This routine is just like > -| `roundAndPackFloatx80' except that the input significand does not have to be > -| normalized. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Takes an abstract floating-point value having sign `zSign', exponent > +`zExp', and significand formed by the concatenation of `zSig0' and `zSig1', > +and returns the proper extended double-precision floating-point value > +corresponding to the abstract input. This routine is just like > +`roundAndPackFloatx80' except that the input significand does not have to be > +normalized. > +------------------------------------------------------------------------------- > +*/ > static floatx80 > normalizeRoundAndPackFloatx80( > int8 roundingPrecision, flag zSign, int32 zExp, uint64_t zSig0, uint64_t zSig1 > @@ -852,10 +900,12 @@ static floatx80 > > } > > -/*---------------------------------------------------------------------------- > -| Returns the least-significant 64 fraction bits of the quadruple-precision > -| floating-point value `a'. 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the least-significant 64 fraction bits of the quadruple-precision > +floating-point value `a'. > +------------------------------------------------------------------------------- > +*/ > > INLINE uint64_t extractFloat128Frac1( float128 a ) > { > @@ -864,10 +914,12 @@ INLINE uint64_t extractFloat128Frac1( float128 a ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the most-significant 48 fraction bits of the quadruple-precision > -| floating-point value `a'. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the most-significant 48 fraction bits of the quadruple-precision > +floating-point value `a'. > +------------------------------------------------------------------------------- > +*/ > > INLINE uint64_t extractFloat128Frac0( float128 a ) > { > @@ -876,11 +928,12 @@ INLINE uint64_t extractFloat128Frac0( float128 a ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the exponent bits of the quadruple-precision floating-point value > -| `a'. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the exponent bits of the quadruple-precision floating-point value > +`a'. 
> +------------------------------------------------------------------------------- > +*/ > INLINE int32 extractFloat128Exp( float128 a ) > { > > @@ -888,10 +941,11 @@ INLINE int32 extractFloat128Exp( float128 a ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the sign bit of the quadruple-precision floating-point value `a'. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the sign bit of the quadruple-precision floating-point value `a'. > +------------------------------------------------------------------------------- > +*/ > INLINE flag extractFloat128Sign( float128 a ) > { > > @@ -899,16 +953,17 @@ INLINE flag extractFloat128Sign( float128 a ) > > } > > -/*---------------------------------------------------------------------------- > -| Normalizes the subnormal quadruple-precision floating-point value > -| represented by the denormalized significand formed by the concatenation of > -| `aSig0' and `aSig1'. The normalized exponent is stored at the location > -| pointed to by `zExpPtr'. The most significant 49 bits of the normalized > -| significand are stored at the location pointed to by `zSig0Ptr', and the > -| least significant 64 bits of the normalized significand are stored at the > -| location pointed to by `zSig1Ptr'. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Normalizes the subnormal quadruple-precision floating-point value > +represented by the denormalized significand formed by the concatenation of > +`aSig0' and `aSig1'. The normalized exponent is stored at the location > +pointed to by `zExpPtr'. 
The most significant 49 bits of the normalized > +significand are stored at the location pointed to by `zSig0Ptr', and the > +least significant 64 bits of the normalized significand are stored at the > +location pointed to by `zSig1Ptr'. > +------------------------------------------------------------------------------- > +*/ > static void > normalizeFloat128Subnormal( > uint64_t aSig0, > @@ -940,19 +995,20 @@ static void > > } > > -/*---------------------------------------------------------------------------- > -| Packs the sign `zSign', the exponent `zExp', and the significand formed > -| by the concatenation of `zSig0' and `zSig1' into a quadruple-precision > -| floating-point value, returning the result. After being shifted into the > -| proper positions, the three fields `zSign', `zExp', and `zSig0' are simply > -| added together to form the most significant 32 bits of the result. This > -| means that any integer portion of `zSig0' will be added into the exponent. > -| Since a properly normalized significand will have an integer portion equal > -| to 1, the `zExp' input should be 1 less than the desired result exponent > -| whenever `zSig0' and `zSig1' concatenated form a complete, normalized > -| significand. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Packs the sign `zSign', the exponent `zExp', and the significand formed > +by the concatenation of `zSig0' and `zSig1' into a quadruple-precision > +floating-point value, returning the result. After being shifted into the > +proper positions, the three fields `zSign', `zExp', and `zSig0' are simply > +added together to form the most significant 32 bits of the result. This > +means that any integer portion of `zSig0' will be added into the exponent. 
> +Since a properly normalized significand will have an integer portion equal > +to 1, the `zExp' input should be 1 less than the desired result exponent > +whenever `zSig0' and `zSig1' concatenated form a complete, normalized > +significand. > +------------------------------------------------------------------------------- > +*/ > INLINE float128 > packFloat128( flag zSign, int32 zExp, uint64_t zSig0, uint64_t zSig1 ) > { > @@ -964,27 +1020,28 @@ INLINE float128 > > } > > -/*---------------------------------------------------------------------------- > -| Takes an abstract floating-point value having sign `zSign', exponent `zExp', > -| and extended significand formed by the concatenation of `zSig0', `zSig1', > -| and `zSig2', and returns the proper quadruple-precision floating-point value > -| corresponding to the abstract input. Ordinarily, the abstract value is > -| simply rounded and packed into the quadruple-precision format, with the > -| inexact exception raised if the abstract input cannot be represented > -| exactly. However, if the abstract value is too large, the overflow and > -| inexact exceptions are raised and an infinity or maximal finite value is > -| returned. If the abstract value is too small, the input value is rounded to > -| a subnormal number, and the underflow and inexact exceptions are raised if > -| the abstract input cannot be represented exactly as a subnormal quadruple- > -| precision floating-point number. > -| The input significand must be normalized or smaller. If the input > -| significand is not normalized, `zExp' must be 0; in that case, the result > -| returned is a subnormal number, and it must not require rounding. In the > -| usual case that the input significand is normalized, `zExp' must be 1 less > -| than the ``true'' floating-point exponent. The handling of underflow and > -| overflow follows the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Takes an abstract floating-point value having sign `zSign', exponent `zExp', > +and extended significand formed by the concatenation of `zSig0', `zSig1', > +and `zSig2', and returns the proper quadruple-precision floating-point value > +corresponding to the abstract input. Ordinarily, the abstract value is > +simply rounded and packed into the quadruple-precision format, with the > +inexact exception raised if the abstract input cannot be represented > +exactly. However, if the abstract value is too large, the overflow and > +inexact exceptions are raised and an infinity or maximal finite value is > +returned. If the abstract value is too small, the input value is rounded to > +a subnormal number, and the underflow and inexact exceptions are raised if > +the abstract input cannot be represented exactly as a subnormal quadruple- > +precision floating-point number. > + The input significand must be normalized or smaller. If the input > +significand is not normalized, `zExp' must be 0; in that case, the result > +returned is a subnormal number, and it must not require rounding. In the > +usual case that the input significand is normalized, `zExp' must be 1 less > +than the ``true'' floating-point exponent. The handling of underflow and > +overflow follows the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > static float128 > roundAndPackFloat128( > flag zSign, int32 zExp, uint64_t zSig0, uint64_t zSig1, uint64_t zSig2 STATUS_PARAM) > @@ -1079,16 +1136,17 @@ static float128 > > } > > -/*---------------------------------------------------------------------------- > -| Takes an abstract floating-point value having sign `zSign', exponent `zExp', > -| and significand formed by the concatenation of `zSig0' and `zSig1', and > -| returns the proper quadruple-precision floating-point value corresponding > -| to the abstract input. This routine is just like `roundAndPackFloat128' > -| except that the input significand has fewer bits and does not have to be > -| normalized. In all cases, `zExp' must be 1 less than the ``true'' floating- > -| point exponent. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Takes an abstract floating-point value having sign `zSign', exponent `zExp', > +and significand formed by the concatenation of `zSig0' and `zSig1', and > +returns the proper quadruple-precision floating-point value corresponding > +to the abstract input. This routine is just like `roundAndPackFloat128' > +except that the input significand has fewer bits and does not have to be > +normalized. In all cases, `zExp' must be 1 less than the ``true'' floating- > +point exponent. > +------------------------------------------------------------------------------- > +*/ > static float128 > normalizeRoundAndPackFloat128( > flag zSign, int32 zExp, uint64_t zSig0, uint64_t zSig1 STATUS_PARAM) > @@ -1115,13 +1173,14 @@ static float128 > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the 32-bit two's complement integer `a' > -| to the single-precision floating-point format. 
The conversion is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > -float32 int32_to_float32( int32 a STATUS_PARAM ) > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the 32-bit two's complement integer `a' > +to the single-precision floating-point format. The conversion is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > +float32 int32_to_float32( int32 a STATUS_PARAM) > { > flag zSign; > > @@ -1132,13 +1191,14 @@ float32 int32_to_float32( int32 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the 32-bit two's complement integer `a' > -| to the double-precision floating-point format. The conversion is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > -float64 int32_to_float64( int32 a STATUS_PARAM ) > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the 32-bit two's complement integer `a' > +to the double-precision floating-point format. The conversion is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > +float64 int32_to_float64( int32 a STATUS_PARAM) > { > flag zSign; > uint32 absA; > @@ -1154,13 +1214,14 @@ float64 int32_to_float64( int32 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the 32-bit two's complement integer `a' > -| to the extended double-precision floating-point format. The conversion > -| is performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the 32-bit two's complement integer `a' > +to the extended double-precision floating-point format. The conversion > +is performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic. > +------------------------------------------------------------------------------- > +*/ > floatx80 int32_to_floatx80( int32 a STATUS_PARAM ) > { > flag zSign; > @@ -1177,12 +1238,13 @@ floatx80 int32_to_floatx80( int32 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the 32-bit two's complement integer `a' to > -| the quadruple-precision floating-point format. The conversion is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the 32-bit two's complement integer `a' to > +the quadruple-precision floating-point format. The conversion is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > float128 int32_to_float128( int32 a STATUS_PARAM ) > { > flag zSign; > @@ -1199,12 +1261,13 @@ float128 int32_to_float128( int32 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the 64-bit two's complement integer `a' > -| to the single-precision floating-point format. The conversion is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the 64-bit two's complement integer `a' > +to the single-precision floating-point format. The conversion is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float32 int64_to_float32( int64 a STATUS_PARAM ) > { > flag zSign; > @@ -1252,12 +1315,13 @@ float32 uint64_to_float32( uint64 a STATUS_PARAM ) > } > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the 64-bit two's complement integer `a' > -| to the double-precision floating-point format. The conversion is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the 64-bit two's complement integer `a' > +to the double-precision floating-point format. The conversion is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > float64 int64_to_float64( int64 a STATUS_PARAM ) > { > flag zSign; > @@ -1285,13 +1349,14 @@ float64 uint64_to_float64(uint64 a STATUS_PARAM) > return normalizeRoundAndPackFloat64(0, exp, a STATUS_VAR); > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the 64-bit two's complement integer `a' > -| to the extended double-precision floating-point format. The conversion > -| is performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the 64-bit two's complement integer `a' > +to the extended double-precision floating-point format. The conversion > +is performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic. > +------------------------------------------------------------------------------- > +*/ > floatx80 int64_to_floatx80( int64 a STATUS_PARAM ) > { > flag zSign; > @@ -1306,12 +1371,13 @@ floatx80 int64_to_floatx80( int64 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the 64-bit two's complement integer `a' to > -| the quadruple-precision floating-point format. The conversion is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the 64-bit two's complement integer `a' to > +the quadruple-precision floating-point format. 
The conversion is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float128 int64_to_float128( int64 a STATUS_PARAM ) > { > flag zSign; > @@ -1347,16 +1413,17 @@ float128 uint64_to_float128(uint64 a STATUS_PARAM) > return normalizeRoundAndPackFloat128(0, 0x406E, a, 0 STATUS_VAR); > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the single-precision floating-point value > -| `a' to the 32-bit two's complement integer format. The conversion is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic---which means in particular that the conversion is rounded > -| according to the current rounding mode. If `a' is a NaN, the largest > -| positive integer is returned. Otherwise, if the conversion overflows, the > -| largest integer with the same sign as `a' is returned. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the single-precision floating-point value > +`a' to the 32-bit two's complement integer format. The conversion is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic---which means in particular that the conversion is rounded > +according to the current rounding mode. If `a' is a NaN, the largest > +positive integer is returned. Otherwise, if the conversion overflows, the > +largest integer with the same sign as `a' is returned. 
> +------------------------------------------------------------------------------- > +*/ > int32 float32_to_int32( float32 a STATUS_PARAM ) > { > flag aSign; > @@ -1378,16 +1445,17 @@ int32 float32_to_int32( float32 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the single-precision floating-point value > -| `a' to the 32-bit two's complement integer format. The conversion is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic, except that the conversion is always rounded toward zero. > -| If `a' is a NaN, the largest positive integer is returned. Otherwise, if > -| the conversion overflows, the largest integer with the same sign as `a' is > -| returned. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the single-precision floating-point value > +`a' to the 32-bit two's complement integer format. The conversion is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic, except that the conversion is always rounded toward zero. > +If `a' is a NaN, the largest positive integer is returned. Otherwise, if > +the conversion overflows, the largest integer with the same sign as `a' is > +returned. > +------------------------------------------------------------------------------- > +*/ > int32 float32_to_int32_round_to_zero( float32 a STATUS_PARAM ) > { > flag aSign; > @@ -1421,15 +1489,17 @@ int32 float32_to_int32_round_to_zero( float32 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the single-precision floating-point value > -| `a' to the 16-bit two's complement integer format. 
The conversion is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic, except that the conversion is always rounded toward zero. > -| If `a' is a NaN, the largest positive integer is returned. Otherwise, if > -| the conversion overflows, the largest integer with the same sign as `a' is > -| returned. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the single-precision floating-point value > +`a' to the 16-bit two's complement integer format. The conversion is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic, except that the conversion is always rounded toward zero. > +If `a' is a NaN, the largest positive integer is returned. Otherwise, if > +the conversion overflows, the largest integer with the same sign as `a' is > +returned. > +------------------------------------------------------------------------------- > +*/ > > int_fast16_t float32_to_int16_round_to_zero(float32 a STATUS_PARAM) > { > @@ -1470,16 +1540,17 @@ int_fast16_t float32_to_int16_round_to_zero(float32 a STATUS_PARAM) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the single-precision floating-point value > -| `a' to the 64-bit two's complement integer format. The conversion is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic---which means in particular that the conversion is rounded > -| according to the current rounding mode. If `a' is a NaN, the largest > -| positive integer is returned. Otherwise, if the conversion overflows, the > -| largest integer with the same sign as `a' is returned. 
> -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the single-precision floating-point value > +`a' to the 64-bit two's complement integer format. The conversion is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic---which means in particular that the conversion is rounded > +according to the current rounding mode. If `a' is a NaN, the largest > +positive integer is returned. Otherwise, if the conversion overflows, the > +largest integer with the same sign as `a' is returned. > +------------------------------------------------------------------------------- > +*/ > int64 float32_to_int64( float32 a STATUS_PARAM ) > { > flag aSign; > @@ -1507,16 +1578,17 @@ int64 float32_to_int64( float32 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the single-precision floating-point value > -| `a' to the 64-bit two's complement integer format. The conversion is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic, except that the conversion is always rounded toward zero. If > -| `a' is a NaN, the largest positive integer is returned. Otherwise, if the > -| conversion overflows, the largest integer with the same sign as `a' is > -| returned. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the single-precision floating-point value > +`a' to the 64-bit two's complement integer format. The conversion is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic, except that the conversion is always rounded toward zero. 
If > +`a' is a NaN, the largest positive integer is returned. Otherwise, if the > +conversion overflows, the largest integer with the same sign as `a' is > +returned. > +------------------------------------------------------------------------------- > +*/ > int64 float32_to_int64_round_to_zero( float32 a STATUS_PARAM ) > { > flag aSign; > @@ -1554,13 +1626,14 @@ int64 float32_to_int64_round_to_zero( float32 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the single-precision floating-point value > -| `a' to the double-precision floating-point format. The conversion is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the single-precision floating-point value > +`a' to the double-precision floating-point format. The conversion is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float64 float32_to_float64( float32 a STATUS_PARAM ) > { > flag aSign; > @@ -1584,13 +1657,14 @@ float64 float32_to_float64( float32 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the single-precision floating-point value > -| `a' to the extended double-precision floating-point format. The conversion > -| is performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic. 
> -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the single-precision floating-point value > +`a' to the extended double-precision floating-point format. The conversion > +is performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic. > +------------------------------------------------------------------------------- > +*/ > floatx80 float32_to_floatx80( float32 a STATUS_PARAM ) > { > flag aSign; > @@ -1614,13 +1688,14 @@ floatx80 float32_to_floatx80( float32 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the single-precision floating-point value > -| `a' to the double-precision floating-point format. The conversion is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the single-precision floating-point value > +`a' to the double-precision floating-point format. The conversion is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float128 float32_to_float128( float32 a STATUS_PARAM ) > { > flag aSign; > @@ -1644,14 +1719,15 @@ float128 float32_to_float128( float32 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Rounds the single-precision floating-point value `a' to an integer, and > -| returns the result as a single-precision floating-point value. 
The > -| operation is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > -float32 float32_round_to_int( float32 a STATUS_PARAM) > +/* > +------------------------------------------------------------------------------- > +Rounds the single-precision floating-point value `a' to an integer, and > +returns the result as a single-precision floating-point value. The > +operation is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > +float32 float32_round_to_int( float32 a STATUS_PARAM ) > { > flag aSign; > int_fast16_t aExp; > @@ -1704,15 +1780,16 @@ float32 float32_round_to_int( float32 a STATUS_PARAM) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of adding the absolute values of the single-precision > -| floating-point values `a' and `b'. If `zSign' is 1, the sum is negated > -| before being returned. `zSign' is ignored if the result is a NaN. > -| The addition is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > -static float32 addFloat32Sigs( float32 a, float32 b, flag zSign STATUS_PARAM) > +/* > +------------------------------------------------------------------------------- > +Returns the result of adding the absolute values of the single-precision > +floating-point values `a' and `b'. If `zSign' is 1, the sum is negated > +before being returned. `zSign' is ignored if the result is a NaN. > +The addition is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > +static float32 addFloat32Sigs( float32 a, float32 b, flag zSign STATUS_PARAM ) > { > int_fast16_t aExp, bExp, zExp; > uint32_t aSig, bSig, zSig; > @@ -1783,15 +1860,16 @@ static float32 addFloat32Sigs( float32 a, float32 b, flag zSign STATUS_PARAM) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of subtracting the absolute values of the single- > -| precision floating-point values `a' and `b'. If `zSign' is 1, the > -| difference is negated before being returned. `zSign' is ignored if the > -| result is a NaN. The subtraction is performed according to the IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > -static float32 subFloat32Sigs( float32 a, float32 b, flag zSign STATUS_PARAM) > +/* > +------------------------------------------------------------------------------- > +Returns the result of subtracting the absolute values of the single- > +precision floating-point values `a' and `b'. If `zSign' is 1, the > +difference is negated before being returned. `zSign' is ignored if the > +result is a NaN. The subtraction is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > +static float32 subFloat32Sigs( float32 a, float32 b, flag zSign STATUS_PARAM ) > { > int_fast16_t aExp, bExp, zExp; > uint32_t aSig, bSig, zSig; > @@ -1858,12 +1936,13 @@ static float32 subFloat32Sigs( float32 a, float32 b, flag zSign STATUS_PARAM) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of adding the single-precision floating-point values `a' > -| and `b'. 
The operation is performed according to the IEC/IEEE Standard for > -| Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of adding the single-precision floating-point values `a' > +and `b'. The operation is performed according to the IEC/IEEE Standard for > +Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float32 float32_add( float32 a, float32 b STATUS_PARAM ) > { > flag aSign, bSign; > @@ -1881,12 +1960,13 @@ float32 float32_add( float32 a, float32 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of subtracting the single-precision floating-point values > -| `a' and `b'. The operation is performed according to the IEC/IEEE Standard > -| for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of subtracting the single-precision floating-point values > +`a' and `b'. The operation is performed according to the IEC/IEEE Standard > +for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float32 float32_sub( float32 a, float32 b STATUS_PARAM ) > { > flag aSign, bSign; > @@ -1904,12 +1984,13 @@ float32 float32_sub( float32 a, float32 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of multiplying the single-precision floating-point values > -| `a' and `b'. The operation is performed according to the IEC/IEEE Standard > -| for Binary Floating-Point Arithmetic. 
> -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of multiplying the single-precision floating-point values > +`a' and `b'. The operation is performed according to the IEC/IEEE Standard > +for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float32 float32_mul( float32 a, float32 b STATUS_PARAM ) > { > flag aSign, bSign, zSign; > @@ -1967,12 +2048,13 @@ float32 float32_mul( float32 a, float32 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of dividing the single-precision floating-point value `a' > -| by the corresponding value `b'. The operation is performed according to the > -| IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of dividing the single-precision floating-point value `a' > +by the corresponding value `b'. The operation is performed according to the > +IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float32 float32_div( float32 a, float32 b STATUS_PARAM ) > { > flag aSign, bSign, zSign; > @@ -2031,12 +2113,13 @@ float32 float32_div( float32 a, float32 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the remainder of the single-precision floating-point value `a' > -| with respect to the corresponding value `b'. The operation is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the remainder of the single-precision floating-point value `a' > +with respect to the corresponding value `b'. The operation is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float32 float32_rem( float32 a, float32 b STATUS_PARAM ) > { > flag aSign, zSign; > @@ -2132,16 +2215,18 @@ float32 float32_rem( float32 a, float32 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of multiplying the single-precision floating-point values > -| `a' and `b' then adding 'c', with no intermediate rounding step after the > -| multiplication. The operation is performed according to the IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic 754-2008. > -| The flags argument allows the caller to select negation of the > -| addend, the intermediate product, or the final result. (The difference > -| between this and having the caller do a separate negation is that negating > -| externally will flip the sign bit on NaNs.) > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the result of multiplying the single-precision floating-point values > +`a' and `b' then adding 'c', with no intermediate rounding step after the > +multiplication. The operation is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic 754-2008. > +The flags argument allows the caller to select negation of the > +addend, the intermediate product, or the final result. 
(The difference > +between this and having the caller do a separate negation is that negating > +externally will flip the sign bit on NaNs.) > +------------------------------------------------------------------------------- > +*/ > > float32 float32_muladd(float32 a, float32 b, float32 c, int flags STATUS_PARAM) > { > @@ -2339,12 +2424,13 @@ float32 float32_muladd(float32 a, float32 b, float32 c, int flags STATUS_PARAM) > } > > > -/*---------------------------------------------------------------------------- > -| Returns the square root of the single-precision floating-point value `a'. > -| The operation is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the square root of the single-precision floating-point value `a'. > +The operation is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float32 float32_sqrt( float32 a STATUS_PARAM ) > { > flag aSign; > @@ -2394,23 +2480,25 @@ float32 float32_sqrt( float32 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the binary exponential of the single-precision floating-point value > -| `a'. The operation is performed according to the IEC/IEEE Standard for > -| Binary Floating-Point Arithmetic. > -| > -| Uses the following identities: > -| > -| 1. ------------------------------------------------------------------------- > -| x x*ln(2) > -| 2 = e > -| > -| 2. ------------------------------------------------------------------------- > -| 2 3 4 5 n > -| x x x x x x x > -| e = 1 + --- + --- + --- + --- + --- + ... + --- + ... > -| 1! 2! 3! 4! 5! n! 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the binary exponential of the single-precision floating-point value > +`a'. The operation is performed according to the IEC/IEEE Standard for > +Binary Floating-Point Arithmetic. > + > +Uses the following identities: > + > +1. ------------------------------------------------------------------------- > + x x*ln(2) > + 2 = e > + > +2. ------------------------------------------------------------------------- > + 2 3 4 5 n > + x x x x x x x > + e = 1 + --- + --- + --- + --- + --- + ... + --- + ... > + 1! 2! 3! 4! 5! n! > +------------------------------------------------------------------------------- > +*/ > > static const float64 float32_exp2_coefficients[15] = > { > @@ -2474,11 +2562,13 @@ float32 float32_exp2( float32 a STATUS_PARAM ) > return float64_to_float32(r, status); > } > > -/*---------------------------------------------------------------------------- > -| Returns the binary log of the single-precision floating-point value `a'. > -| The operation is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the binary log of the single-precision floating-point value `a'. > +The operation is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > float32 float32_log2( float32 a STATUS_PARAM ) > { > flag aSign, zSign; > @@ -2522,12 +2612,14 @@ float32 float32_log2( float32 a STATUS_PARAM ) > return normalizeRoundAndPackFloat32( zSign, 0x85, zSig STATUS_VAR ); > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the single-precision floating-point value `a' is equal to > -| the corresponding value `b', and 0 otherwise. The invalid exception is > -| raised if either operand is a NaN. Otherwise, the comparison is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the single-precision floating-point value `a' is equal to > +the corresponding value `b', and 0 otherwise. The invalid exception is > +raised if either operand is a NaN. Otherwise, the comparison is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float32_eq( float32 a, float32 b STATUS_PARAM ) > { > @@ -2546,12 +2638,14 @@ int float32_eq( float32 a, float32 b STATUS_PARAM ) > return ( av == bv ) || ( (uint32_t) ( ( av | bv )<<1 ) == 0 ); > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the single-precision floating-point value `a' is less than > -| or equal to the corresponding value `b', and 0 otherwise. The invalid > -| exception is raised if either operand is a NaN. The comparison is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the single-precision floating-point value `a' is less than > +or equal to the corresponding value `b', and 0 otherwise. The invalid > +exception is raised if either operand is a NaN. The comparison is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float32_le( float32 a, float32 b STATUS_PARAM ) > { > @@ -2575,12 +2669,14 @@ int float32_le( float32 a, float32 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the single-precision floating-point value `a' is less than > -| the corresponding value `b', and 0 otherwise. The invalid exception is > -| raised if either operand is a NaN. The comparison is performed according > -| to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the single-precision floating-point value `a' is less than > +the corresponding value `b', and 0 otherwise. The invalid exception is > +raised if either operand is a NaN. The comparison is performed according > +to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float32_lt( float32 a, float32 b STATUS_PARAM ) > { > @@ -2604,12 +2700,14 @@ int float32_lt( float32 a, float32 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the single-precision floating-point values `a' and `b' cannot > -| be compared, and 0 otherwise. 
The invalid exception is raised if either > -| operand is a NaN. The comparison is performed according to the IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the single-precision floating-point values `a' and `b' cannot > +be compared, and 0 otherwise. The invalid exception is raised if either > +operand is a NaN. The comparison is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float32_unordered( float32 a, float32 b STATUS_PARAM ) > { > @@ -2625,12 +2723,14 @@ int float32_unordered( float32 a, float32 b STATUS_PARAM ) > return 0; > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the single-precision floating-point value `a' is equal to > -| the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an > -| exception. The comparison is performed according to the IEC/IEEE Standard > -| for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the single-precision floating-point value `a' is equal to > +the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an > +exception. The comparison is performed according to the IEC/IEEE Standard > +for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > > int float32_eq_quiet( float32 a, float32 b STATUS_PARAM ) > { > @@ -2649,12 +2749,14 @@ int float32_eq_quiet( float32 a, float32 b STATUS_PARAM ) > ( (uint32_t) ( ( float32_val(a) | float32_val(b) )<<1 ) == 0 ); > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the single-precision floating-point value `a' is less than or > -| equal to the corresponding value `b', and 0 otherwise. Quiet NaNs do not > -| cause an exception. Otherwise, the comparison is performed according to the > -| IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the single-precision floating-point value `a' is less than or > +equal to the corresponding value `b', and 0 otherwise. Quiet NaNs do not > +cause an exception. Otherwise, the comparison is performed according to the > +IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float32_le_quiet( float32 a, float32 b STATUS_PARAM ) > { > @@ -2680,12 +2782,14 @@ int float32_le_quiet( float32 a, float32 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the single-precision floating-point value `a' is less than > -| the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an > -| exception. Otherwise, the comparison is performed according to the IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic. 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the single-precision floating-point value `a' is less than > +the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an > +exception. Otherwise, the comparison is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float32_lt_quiet( float32 a, float32 b STATUS_PARAM ) > { > @@ -2711,12 +2815,14 @@ int float32_lt_quiet( float32 a, float32 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the single-precision floating-point values `a' and `b' cannot > -| be compared, and 0 otherwise. Quiet NaNs do not cause an exception. The > -| comparison is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the single-precision floating-point values `a' and `b' cannot > +be compared, and 0 otherwise. Quiet NaNs do not cause an exception. The > +comparison is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float32_unordered_quiet( float32 a, float32 b STATUS_PARAM ) > { > @@ -2734,16 +2840,17 @@ int float32_unordered_quiet( float32 a, float32 b STATUS_PARAM ) > return 0; > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the double-precision floating-point value > -| `a' to the 32-bit two's complement integer format. 
The conversion is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic---which means in particular that the conversion is rounded > -| according to the current rounding mode. If `a' is a NaN, the largest > -| positive integer is returned. Otherwise, if the conversion overflows, the > -| largest integer with the same sign as `a' is returned. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the double-precision floating-point value > +`a' to the 32-bit two's complement integer format. The conversion is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic---which means in particular that the conversion is rounded > +according to the current rounding mode. If `a' is a NaN, the largest > +positive integer is returned. Otherwise, if the conversion overflows, the > +largest integer with the same sign as `a' is returned. > +------------------------------------------------------------------------------- > +*/ > int32 float64_to_int32( float64 a STATUS_PARAM ) > { > flag aSign; > @@ -2762,16 +2869,17 @@ int32 float64_to_int32( float64 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the double-precision floating-point value > -| `a' to the 32-bit two's complement integer format. The conversion is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic, except that the conversion is always rounded toward zero. > -| If `a' is a NaN, the largest positive integer is returned. Otherwise, if > -| the conversion overflows, the largest integer with the same sign as `a' is > -| returned. 
> -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the double-precision floating-point value > +`a' to the 32-bit two's complement integer format. The conversion is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic, except that the conversion is always rounded toward zero. > +If `a' is a NaN, the largest positive integer is returned. Otherwise, if > +the conversion overflows, the largest integer with the same sign as `a' is > +returned. > +------------------------------------------------------------------------------- > +*/ > int32 float64_to_int32_round_to_zero( float64 a STATUS_PARAM ) > { > flag aSign; > @@ -2809,15 +2917,17 @@ int32 float64_to_int32_round_to_zero( float64 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the double-precision floating-point value > -| `a' to the 16-bit two's complement integer format. The conversion is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic, except that the conversion is always rounded toward zero. > -| If `a' is a NaN, the largest positive integer is returned. Otherwise, if > -| the conversion overflows, the largest integer with the same sign as `a' is > -| returned. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the double-precision floating-point value > +`a' to the 16-bit two's complement integer format. The conversion is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic, except that the conversion is always rounded toward zero. > +If `a' is a NaN, the largest positive integer is returned. 
Otherwise, if > +the conversion overflows, the largest integer with the same sign as `a' is > +returned. > +------------------------------------------------------------------------------- > +*/ > > int_fast16_t float64_to_int16_round_to_zero(float64 a STATUS_PARAM) > { > @@ -2860,16 +2970,17 @@ int_fast16_t float64_to_int16_round_to_zero(float64 a STATUS_PARAM) > return z; > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the double-precision floating-point value > -| `a' to the 64-bit two's complement integer format. The conversion is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic---which means in particular that the conversion is rounded > -| according to the current rounding mode. If `a' is a NaN, the largest > -| positive integer is returned. Otherwise, if the conversion overflows, the > -| largest integer with the same sign as `a' is returned. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the double-precision floating-point value > +`a' to the 64-bit two's complement integer format. The conversion is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic---which means in particular that the conversion is rounded > +according to the current rounding mode. If `a' is a NaN, the largest > +positive integer is returned. Otherwise, if the conversion overflows, the > +largest integer with the same sign as `a' is returned. 
> +------------------------------------------------------------------------------- > +*/ > int64 float64_to_int64( float64 a STATUS_PARAM ) > { > flag aSign; > @@ -2903,16 +3014,17 @@ int64 float64_to_int64( float64 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the double-precision floating-point value > -| `a' to the 64-bit two's complement integer format. The conversion is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic, except that the conversion is always rounded toward zero. > -| If `a' is a NaN, the largest positive integer is returned. Otherwise, if > -| the conversion overflows, the largest integer with the same sign as `a' is > -| returned. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the double-precision floating-point value > +`a' to the 64-bit two's complement integer format. The conversion is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic, except that the conversion is always rounded toward zero. > +If `a' is a NaN, the largest positive integer is returned. Otherwise, if > +the conversion overflows, the largest integer with the same sign as `a' is > +returned. > +------------------------------------------------------------------------------- > +*/ > int64 float64_to_int64_round_to_zero( float64 a STATUS_PARAM ) > { > flag aSign; > @@ -2956,13 +3068,14 @@ int64 float64_to_int64_round_to_zero( float64 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the double-precision floating-point value > -| `a' to the single-precision floating-point format. 
The conversion is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the double-precision floating-point value > +`a' to the single-precision floating-point format. The conversion is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float32 float64_to_float32( float64 a STATUS_PARAM ) > { > flag aSign; > @@ -2989,16 +3102,18 @@ float32 float64_to_float32( float64 a STATUS_PARAM ) > } > > > -/*---------------------------------------------------------------------------- > -| Packs the sign `zSign', exponent `zExp', and significand `zSig' into a > -| half-precision floating-point value, returning the result. After being > -| shifted into the proper positions, the three fields are simply added > -| together to form the result. This means that any integer portion of `zSig' > -| will be added into the exponent. Since a properly normalized significand > -| will have an integer portion equal to 1, the `zExp' input should be 1 less > -| than the desired result exponent whenever `zSig' is a complete, normalized > -| significand. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Packs the sign `zSign', exponent `zExp', and significand `zSig' into a > +half-precision floating-point value, returning the result. After being > +shifted into the proper positions, the three fields are simply added > +together to form the result. This means that any integer portion of `zSig' > +will be added into the exponent. 
Since a properly normalized significand > +will have an integer portion equal to 1, the `zExp' input should be 1 less > +than the desired result exponent whenever `zSig' is a complete, normalized > +significand. > +------------------------------------------------------------------------------- > +*/ > static float16 packFloat16(flag zSign, int_fast16_t zExp, uint16_t zSig) > { > return make_float16( > @@ -3132,13 +3247,14 @@ float16 float32_to_float16(float32 a, flag ieee STATUS_PARAM) > return packFloat16(aSign, aExp + 14, aSig >> 13); > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the double-precision floating-point value > -| `a' to the extended double-precision floating-point format. The conversion > -| is performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the double-precision floating-point value > +`a' to the extended double-precision floating-point format. The conversion > +is performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic. > +------------------------------------------------------------------------------- > +*/ > floatx80 float64_to_floatx80( float64 a STATUS_PARAM ) > { > flag aSign; > @@ -3163,13 +3279,14 @@ floatx80 float64_to_floatx80( float64 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the double-precision floating-point value > -| `a' to the quadruple-precision floating-point format. The conversion is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic. 
> -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the double-precision floating-point value > +`a' to the quadruple-precision floating-point format. The conversion is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float128 float64_to_float128( float64 a STATUS_PARAM ) > { > flag aSign; > @@ -3194,13 +3311,14 @@ float128 float64_to_float128( float64 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Rounds the double-precision floating-point value `a' to an integer, and > -| returns the result as a double-precision floating-point value. The > -| operation is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Rounds the double-precision floating-point value `a' to an integer, and > +returns the result as a double-precision floating-point value. The > +operation is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float64 float64_round_to_int( float64 a STATUS_PARAM ) > { > flag aSign; > @@ -3267,14 +3385,15 @@ float64 float64_trunc_to_int( float64 a STATUS_PARAM) > return res; > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of adding the absolute values of the double-precision > -| floating-point values `a' and `b'. If `zSign' is 1, the sum is negated > -| before being returned. 
`zSign' is ignored if the result is a NaN. > -| The addition is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of adding the absolute values of the double-precision > +floating-point values `a' and `b'. If `zSign' is 1, the sum is negated > +before being returned. `zSign' is ignored if the result is a NaN. > +The addition is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > static float64 addFloat64Sigs( float64 a, float64 b, flag zSign STATUS_PARAM ) > { > int_fast16_t aExp, bExp, zExp; > @@ -3346,14 +3465,15 @@ static float64 addFloat64Sigs( float64 a, float64 b, flag zSign STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of subtracting the absolute values of the double- > -| precision floating-point values `a' and `b'. If `zSign' is 1, the > -| difference is negated before being returned. `zSign' is ignored if the > -| result is a NaN. The subtraction is performed according to the IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of subtracting the absolute values of the double- > +precision floating-point values `a' and `b'. If `zSign' is 1, the > +difference is negated before being returned. `zSign' is ignored if the > +result is a NaN. The subtraction is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > static float64 subFloat64Sigs( float64 a, float64 b, flag zSign STATUS_PARAM ) > { > int_fast16_t aExp, bExp, zExp; > @@ -3421,12 +3541,13 @@ static float64 subFloat64Sigs( float64 a, float64 b, flag zSign STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of adding the double-precision floating-point values `a' > -| and `b'. The operation is performed according to the IEC/IEEE Standard for > -| Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of adding the double-precision floating-point values `a' > +and `b'. The operation is performed according to the IEC/IEEE Standard for > +Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float64 float64_add( float64 a, float64 b STATUS_PARAM ) > { > flag aSign, bSign; > @@ -3444,12 +3565,13 @@ float64 float64_add( float64 a, float64 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of subtracting the double-precision floating-point values > -| `a' and `b'. The operation is performed according to the IEC/IEEE Standard > -| for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of subtracting the double-precision floating-point values > +`a' and `b'. The operation is performed according to the IEC/IEEE Standard > +for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > float64 float64_sub( float64 a, float64 b STATUS_PARAM ) > { > flag aSign, bSign; > @@ -3467,12 +3589,13 @@ float64 float64_sub( float64 a, float64 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of multiplying the double-precision floating-point values > -| `a' and `b'. The operation is performed according to the IEC/IEEE Standard > -| for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of multiplying the double-precision floating-point values > +`a' and `b'. The operation is performed according to the IEC/IEEE Standard > +for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float64 float64_mul( float64 a, float64 b STATUS_PARAM ) > { > flag aSign, bSign, zSign; > @@ -3528,12 +3651,13 @@ float64 float64_mul( float64 a, float64 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of dividing the double-precision floating-point value `a' > -| by the corresponding value `b'. The operation is performed according to > -| the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of dividing the double-precision floating-point value `a' > +by the corresponding value `b'. The operation is performed according to > +the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > float64 float64_div( float64 a, float64 b STATUS_PARAM ) > { > flag aSign, bSign, zSign; > @@ -3600,12 +3724,13 @@ float64 float64_div( float64 a, float64 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the remainder of the double-precision floating-point value `a' > -| with respect to the corresponding value `b'. The operation is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the remainder of the double-precision floating-point value `a' > +with respect to the corresponding value `b'. The operation is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float64 float64_rem( float64 a, float64 b STATUS_PARAM ) > { > flag aSign, zSign; > @@ -3686,16 +3811,18 @@ float64 float64_rem( float64 a, float64 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of multiplying the double-precision floating-point values > -| `a' and `b' then adding 'c', with no intermediate rounding step after the > -| multiplication. The operation is performed according to the IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic 754-2008. > -| The flags argument allows the caller to select negation of the > -| addend, the intermediate product, or the final result. (The difference > -| between this and having the caller do a separate negation is that negating > -| externally will flip the sign bit on NaNs.) 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the result of multiplying the double-precision floating-point values > +`a' and `b' then adding 'c', with no intermediate rounding step after the > +multiplication. The operation is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic 754-2008. > +The flags argument allows the caller to select negation of the > +addend, the intermediate product, or the final result. (The difference > +between this and having the caller do a separate negation is that negating > +externally will flip the sign bit on NaNs.) > +------------------------------------------------------------------------------- > +*/ > > float64 float64_muladd(float64 a, float64 b, float64 c, int flags STATUS_PARAM) > { > @@ -3912,12 +4039,13 @@ float64 float64_muladd(float64 a, float64 b, float64 c, int flags STATUS_PARAM) > } > } > > -/*---------------------------------------------------------------------------- > -| Returns the square root of the double-precision floating-point value `a'. > -| The operation is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the square root of the double-precision floating-point value `a'. > +The operation is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > float64 float64_sqrt( float64 a STATUS_PARAM ) > { > flag aSign; > @@ -3964,11 +4092,13 @@ float64 float64_sqrt( float64 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the binary log of the double-precision floating-point value `a'. > -| The operation is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the binary log of the double-precision floating-point value `a'. > +The operation is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float64 float64_log2( float64 a STATUS_PARAM ) > { > flag aSign, zSign; > @@ -4011,12 +4141,14 @@ float64 float64_log2( float64 a STATUS_PARAM ) > return normalizeRoundAndPackFloat64( zSign, 0x408, zSig STATUS_VAR ); > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the double-precision floating-point value `a' is equal to the > -| corresponding value `b', and 0 otherwise. The invalid exception is raised > -| if either operand is a NaN. Otherwise, the comparison is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the double-precision floating-point value `a' is equal to the > +corresponding value `b', and 0 otherwise. The invalid exception is raised > +if either operand is a NaN. 
Otherwise, the comparison is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float64_eq( float64 a, float64 b STATUS_PARAM ) > { > @@ -4036,12 +4168,14 @@ int float64_eq( float64 a, float64 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the double-precision floating-point value `a' is less than or > -| equal to the corresponding value `b', and 0 otherwise. The invalid > -| exception is raised if either operand is a NaN. The comparison is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the double-precision floating-point value `a' is less than or > +equal to the corresponding value `b', and 0 otherwise. The invalid > +exception is raised if either operand is a NaN. The comparison is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float64_le( float64 a, float64 b STATUS_PARAM ) > { > @@ -4065,12 +4199,14 @@ int float64_le( float64 a, float64 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the double-precision floating-point value `a' is less than > -| the corresponding value `b', and 0 otherwise. The invalid exception is > -| raised if either operand is a NaN. The comparison is performed according > -| to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the double-precision floating-point value `a' is less than > +the corresponding value `b', and 0 otherwise. The invalid exception is > +raised if either operand is a NaN. The comparison is performed according > +to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float64_lt( float64 a, float64 b STATUS_PARAM ) > { > @@ -4094,12 +4230,14 @@ int float64_lt( float64 a, float64 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the double-precision floating-point values `a' and `b' cannot > -| be compared, and 0 otherwise. The invalid exception is raised if either > -| operand is a NaN. The comparison is performed according to the IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the double-precision floating-point values `a' and `b' cannot > +be compared, and 0 otherwise. The invalid exception is raised if either > +operand is a NaN. The comparison is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float64_unordered( float64 a, float64 b STATUS_PARAM ) > { > @@ -4115,12 +4253,14 @@ int float64_unordered( float64 a, float64 b STATUS_PARAM ) > return 0; > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the double-precision floating-point value `a' is equal to the > -| corresponding value `b', and 0 otherwise. 
Quiet NaNs do not cause an > -| exception.The comparison is performed according to the IEC/IEEE Standard > -| for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the double-precision floating-point value `a' is equal to the > +corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an > +exception.The comparison is performed according to the IEC/IEEE Standard > +for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float64_eq_quiet( float64 a, float64 b STATUS_PARAM ) > { > @@ -4142,12 +4282,14 @@ int float64_eq_quiet( float64 a, float64 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the double-precision floating-point value `a' is less than or > -| equal to the corresponding value `b', and 0 otherwise. Quiet NaNs do not > -| cause an exception. Otherwise, the comparison is performed according to the > -| IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the double-precision floating-point value `a' is less than or > +equal to the corresponding value `b', and 0 otherwise. Quiet NaNs do not > +cause an exception. Otherwise, the comparison is performed according to the > +IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > > int float64_le_quiet( float64 a, float64 b STATUS_PARAM ) > { > @@ -4173,12 +4315,14 @@ int float64_le_quiet( float64 a, float64 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the double-precision floating-point value `a' is less than > -| the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an > -| exception. Otherwise, the comparison is performed according to the IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the double-precision floating-point value `a' is less than > +the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an > +exception. Otherwise, the comparison is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float64_lt_quiet( float64 a, float64 b STATUS_PARAM ) > { > @@ -4204,12 +4348,14 @@ int float64_lt_quiet( float64 a, float64 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the double-precision floating-point values `a' and `b' cannot > -| be compared, and 0 otherwise. Quiet NaNs do not cause an exception. The > -| comparison is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the double-precision floating-point values `a' and `b' cannot > +be compared, and 0 otherwise. Quiet NaNs do not cause an exception. 
The > +comparison is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float64_unordered_quiet( float64 a, float64 b STATUS_PARAM ) > { > @@ -4227,16 +4373,17 @@ int float64_unordered_quiet( float64 a, float64 b STATUS_PARAM ) > return 0; > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the extended double-precision floating- > -| point value `a' to the 32-bit two's complement integer format. The > -| conversion is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic---which means in particular that the conversion > -| is rounded according to the current rounding mode. If `a' is a NaN, the > -| largest positive integer is returned. Otherwise, if the conversion > -| overflows, the largest integer with the same sign as `a' is returned. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the extended double-precision floating- > +point value `a' to the 32-bit two's complement integer format. The > +conversion is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic---which means in particular that the conversion > +is rounded according to the current rounding mode. If `a' is a NaN, the > +largest positive integer is returned. Otherwise, if the conversion > +overflows, the largest integer with the same sign as `a' is returned. 
> +------------------------------------------------------------------------------- > +*/ > int32 floatx80_to_int32( floatx80 a STATUS_PARAM ) > { > flag aSign; > @@ -4254,16 +4401,17 @@ int32 floatx80_to_int32( floatx80 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the extended double-precision floating- > -| point value `a' to the 32-bit two's complement integer format. The > -| conversion is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic, except that the conversion is always rounded > -| toward zero. If `a' is a NaN, the largest positive integer is returned. > -| Otherwise, if the conversion overflows, the largest integer with the same > -| sign as `a' is returned. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the extended double-precision floating- > +point value `a' to the 32-bit two's complement integer format. The > +conversion is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic, except that the conversion is always rounded > +toward zero. If `a' is a NaN, the largest positive integer is returned. > +Otherwise, if the conversion overflows, the largest integer with the same > +sign as `a' is returned. > +------------------------------------------------------------------------------- > +*/ > int32 floatx80_to_int32_round_to_zero( floatx80 a STATUS_PARAM ) > { > flag aSign; > @@ -4299,16 +4447,17 @@ int32 floatx80_to_int32_round_to_zero( floatx80 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the extended double-precision floating- > -| point value `a' to the 64-bit two's complement integer format. 
The > -| conversion is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic---which means in particular that the conversion > -| is rounded according to the current rounding mode. If `a' is a NaN, > -| the largest positive integer is returned. Otherwise, if the conversion > -| overflows, the largest integer with the same sign as `a' is returned. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the extended double-precision floating- > +point value `a' to the 64-bit two's complement integer format. The > +conversion is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic---which means in particular that the conversion > +is rounded according to the current rounding mode. If `a' is a NaN, > +the largest positive integer is returned. Otherwise, if the conversion > +overflows, the largest integer with the same sign as `a' is returned. > +------------------------------------------------------------------------------- > +*/ > int64 floatx80_to_int64( floatx80 a STATUS_PARAM ) > { > flag aSign; > @@ -4339,16 +4488,17 @@ int64 floatx80_to_int64( floatx80 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the extended double-precision floating- > -| point value `a' to the 64-bit two's complement integer format. The > -| conversion is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic, except that the conversion is always rounded > -| toward zero. If `a' is a NaN, the largest positive integer is returned. > -| Otherwise, if the conversion overflows, the largest integer with the same > -| sign as `a' is returned. 
> -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the extended double-precision floating- > +point value `a' to the 64-bit two's complement integer format. The > +conversion is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic, except that the conversion is always rounded > +toward zero. If `a' is a NaN, the largest positive integer is returned. > +Otherwise, if the conversion overflows, the largest integer with the same > +sign as `a' is returned. > +------------------------------------------------------------------------------- > +*/ > int64 floatx80_to_int64_round_to_zero( floatx80 a STATUS_PARAM ) > { > flag aSign; > @@ -4383,13 +4533,14 @@ int64 floatx80_to_int64_round_to_zero( floatx80 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the extended double-precision floating- > -| point value `a' to the single-precision floating-point format. The > -| conversion is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the extended double-precision floating- > +point value `a' to the single-precision floating-point format. The > +conversion is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > float32 floatx80_to_float32( floatx80 a STATUS_PARAM ) > { > flag aSign; > @@ -4411,13 +4562,14 @@ float32 floatx80_to_float32( floatx80 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the extended double-precision floating- > -| point value `a' to the double-precision floating-point format. The > -| conversion is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the extended double-precision floating- > +point value `a' to the double-precision floating-point format. The > +conversion is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float64 floatx80_to_float64( floatx80 a STATUS_PARAM ) > { > flag aSign; > @@ -4439,13 +4591,14 @@ float64 floatx80_to_float64( floatx80 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the extended double-precision floating- > -| point value `a' to the quadruple-precision floating-point format. The > -| conversion is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the extended double-precision floating- > +point value `a' to the quadruple-precision floating-point format. 
The > +conversion is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float128 floatx80_to_float128( floatx80 a STATUS_PARAM ) > { > flag aSign; > @@ -4463,13 +4616,14 @@ float128 floatx80_to_float128( floatx80 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Rounds the extended double-precision floating-point value `a' to an integer, > -| and returns the result as an extended quadruple-precision floating-point > -| value. The operation is performed according to the IEC/IEEE Standard for > -| Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Rounds the extended double-precision floating-point value `a' to an integer, > +and returns the result as an extended quadruple-precision floating-point > +value. The operation is performed according to the IEC/IEEE Standard for > +Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > floatx80 floatx80_round_to_int( floatx80 a STATUS_PARAM ) > { > flag aSign; > @@ -4536,14 +4690,15 @@ floatx80 floatx80_round_to_int( floatx80 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of adding the absolute values of the extended double- > -| precision floating-point values `a' and `b'. If `zSign' is 1, the sum is > -| negated before being returned. `zSign' is ignored if the result is a NaN. > -| The addition is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. 
> -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of adding the absolute values of the extended double- > +precision floating-point values `a' and `b'. If `zSign' is 1, the sum is > +negated before being returned. `zSign' is ignored if the result is a NaN. > +The addition is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > static floatx80 addFloatx80Sigs( floatx80 a, floatx80 b, flag zSign STATUS_PARAM) > { > int32 aExp, bExp, zExp; > @@ -4602,14 +4757,15 @@ static floatx80 addFloatx80Sigs( floatx80 a, floatx80 b, flag zSign STATUS_PARAM > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of subtracting the absolute values of the extended > -| double-precision floating-point values `a' and `b'. If `zSign' is 1, the > -| difference is negated before being returned. `zSign' is ignored if the > -| result is a NaN. The subtraction is performed according to the IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of subtracting the absolute values of the extended > +double-precision floating-point values `a' and `b'. If `zSign' is 1, the > +difference is negated before being returned. `zSign' is ignored if the > +result is a NaN. The subtraction is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > static floatx80 subFloatx80Sigs( floatx80 a, floatx80 b, flag zSign STATUS_PARAM ) > { > int32 aExp, bExp, zExp; > @@ -4670,12 +4826,13 @@ static floatx80 subFloatx80Sigs( floatx80 a, floatx80 b, flag zSign STATUS_PARAM > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of adding the extended double-precision floating-point > -| values `a' and `b'. The operation is performed according to the IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of adding the extended double-precision floating-point > +values `a' and `b'. The operation is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > floatx80 floatx80_add( floatx80 a, floatx80 b STATUS_PARAM ) > { > flag aSign, bSign; > @@ -4691,12 +4848,13 @@ floatx80 floatx80_add( floatx80 a, floatx80 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of subtracting the extended double-precision floating- > -| point values `a' and `b'. The operation is performed according to the > -| IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of subtracting the extended double-precision floating- > +point values `a' and `b'. The operation is performed according to the > +IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > floatx80 floatx80_sub( floatx80 a, floatx80 b STATUS_PARAM ) > { > flag aSign, bSign; > @@ -4712,12 +4870,13 @@ floatx80 floatx80_sub( floatx80 a, floatx80 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of multiplying the extended double-precision floating- > -| point values `a' and `b'. The operation is performed according to the > -| IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of multiplying the extended double-precision floating- > +point values `a' and `b'. The operation is performed according to the > +IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > floatx80 floatx80_mul( floatx80 a, floatx80 b STATUS_PARAM ) > { > flag aSign, bSign, zSign; > @@ -4771,12 +4930,13 @@ floatx80 floatx80_mul( floatx80 a, floatx80 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of dividing the extended double-precision floating-point > -| value `a' by the corresponding value `b'. The operation is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of dividing the extended double-precision floating-point > +value `a' by the corresponding value `b'. The operation is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > floatx80 floatx80_div( floatx80 a, floatx80 b STATUS_PARAM ) > { > flag aSign, bSign, zSign; > @@ -4851,12 +5011,13 @@ floatx80 floatx80_div( floatx80 a, floatx80 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the remainder of the extended double-precision floating-point value > -| `a' with respect to the corresponding value `b'. The operation is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the remainder of the extended double-precision floating-point value > +`a' with respect to the corresponding value `b'. The operation is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > floatx80 floatx80_rem( floatx80 a, floatx80 b STATUS_PARAM ) > { > flag aSign, zSign; > @@ -4947,12 +5108,13 @@ floatx80 floatx80_rem( floatx80 a, floatx80 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the square root of the extended double-precision floating-point > -| value `a'. The operation is performed according to the IEC/IEEE Standard > -| for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the square root of the extended double-precision floating-point > +value `a'. The operation is performed according to the IEC/IEEE Standard > +for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > floatx80 floatx80_sqrt( floatx80 a STATUS_PARAM ) > { > flag aSign; > @@ -5017,12 +5179,14 @@ floatx80 floatx80_sqrt( floatx80 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the extended double-precision floating-point value `a' is equal > -| to the corresponding value `b', and 0 otherwise. The invalid exception is > -| raised if either operand is a NaN. Otherwise, the comparison is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the extended double-precision floating-point value `a' is equal > +to the corresponding value `b', and 0 otherwise. The invalid exception is > +raised if either operand is a NaN. Otherwise, the comparison is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int floatx80_eq( floatx80 a, floatx80 b STATUS_PARAM ) > { > @@ -5044,13 +5208,15 @@ int floatx80_eq( floatx80 a, floatx80 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the extended double-precision floating-point value `a' is > -| less than or equal to the corresponding value `b', and 0 otherwise. The > -| invalid exception is raised if either operand is a NaN. The comparison is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic. 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the extended double-precision floating-point value `a' is > +less than or equal to the corresponding value `b', and 0 otherwise. The > +invalid exception is raised if either operand is a NaN. The comparison is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int floatx80_le( floatx80 a, floatx80 b STATUS_PARAM ) > { > @@ -5078,12 +5244,14 @@ int floatx80_le( floatx80 a, floatx80 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the extended double-precision floating-point value `a' is > -| less than the corresponding value `b', and 0 otherwise. The invalid > -| exception is raised if either operand is a NaN. The comparison is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the extended double-precision floating-point value `a' is > +less than the corresponding value `b', and 0 otherwise. The invalid > +exception is raised if either operand is a NaN. The comparison is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > > int floatx80_lt( floatx80 a, floatx80 b STATUS_PARAM ) > { > @@ -5111,12 +5279,14 @@ int floatx80_lt( floatx80 a, floatx80 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the extended double-precision floating-point values `a' and `b' > -| cannot be compared, and 0 otherwise. The invalid exception is raised if > -| either operand is a NaN. The comparison is performed according to the > -| IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the extended double-precision floating-point values `a' and `b' > +cannot be compared, and 0 otherwise. The invalid exception is raised if > +either operand is a NaN. The comparison is performed according to the > +IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > int floatx80_unordered( floatx80 a, floatx80 b STATUS_PARAM ) > { > if ( ( ( extractFloatx80Exp( a ) == 0x7FFF ) > @@ -5130,12 +5300,14 @@ int floatx80_unordered( floatx80 a, floatx80 b STATUS_PARAM ) > return 0; > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the extended double-precision floating-point value `a' is > -| equal to the corresponding value `b', and 0 otherwise. Quiet NaNs do not > -| cause an exception. The comparison is performed according to the IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic. 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the extended double-precision floating-point value `a' is > +equal to the corresponding value `b', and 0 otherwise. Quiet NaNs do not > +cause an exception. The comparison is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int floatx80_eq_quiet( floatx80 a, floatx80 b STATUS_PARAM ) > { > @@ -5160,12 +5332,14 @@ int floatx80_eq_quiet( floatx80 a, floatx80 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the extended double-precision floating-point value `a' is less > -| than or equal to the corresponding value `b', and 0 otherwise. Quiet NaNs > -| do not cause an exception. Otherwise, the comparison is performed according > -| to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the extended double-precision floating-point value `a' is less > +than or equal to the corresponding value `b', and 0 otherwise. Quiet NaNs > +do not cause an exception. Otherwise, the comparison is performed according > +to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > > int floatx80_le_quiet( floatx80 a, floatx80 b STATUS_PARAM ) > { > @@ -5196,12 +5370,14 @@ int floatx80_le_quiet( floatx80 a, floatx80 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the extended double-precision floating-point value `a' is less > -| than the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause > -| an exception. Otherwise, the comparison is performed according to the > -| IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the extended double-precision floating-point value `a' is less > +than the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause > +an exception. Otherwise, the comparison is performed according to the > +IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int floatx80_lt_quiet( floatx80 a, floatx80 b STATUS_PARAM ) > { > @@ -5232,12 +5408,14 @@ int floatx80_lt_quiet( floatx80 a, floatx80 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the extended double-precision floating-point values `a' and `b' > -| cannot be compared, and 0 otherwise. Quiet NaNs do not cause an exception. > -| The comparison is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the extended double-precision floating-point values `a' and `b' > +cannot be compared, and 0 otherwise. 
Quiet NaNs do not cause an exception. > +The comparison is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > int floatx80_unordered_quiet( floatx80 a, floatx80 b STATUS_PARAM ) > { > if ( ( ( extractFloatx80Exp( a ) == 0x7FFF ) > @@ -5254,16 +5432,17 @@ int floatx80_unordered_quiet( floatx80 a, floatx80 b STATUS_PARAM ) > return 0; > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the quadruple-precision floating-point > -| value `a' to the 32-bit two's complement integer format. The conversion > -| is performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic---which means in particular that the conversion is rounded > -| according to the current rounding mode. If `a' is a NaN, the largest > -| positive integer is returned. Otherwise, if the conversion overflows, the > -| largest integer with the same sign as `a' is returned. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the quadruple-precision floating-point > +value `a' to the 32-bit two's complement integer format. The conversion > +is performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic---which means in particular that the conversion is rounded > +according to the current rounding mode. If `a' is a NaN, the largest > +positive integer is returned. Otherwise, if the conversion overflows, the > +largest integer with the same sign as `a' is returned. 
> +------------------------------------------------------------------------------- > +*/ > int32 float128_to_int32( float128 a STATUS_PARAM ) > { > flag aSign; > @@ -5283,16 +5462,17 @@ int32 float128_to_int32( float128 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the quadruple-precision floating-point > -| value `a' to the 32-bit two's complement integer format. The conversion > -| is performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic, except that the conversion is always rounded toward zero. If > -| `a' is a NaN, the largest positive integer is returned. Otherwise, if the > -| conversion overflows, the largest integer with the same sign as `a' is > -| returned. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the quadruple-precision floating-point > +value `a' to the 32-bit two's complement integer format. The conversion > +is performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic, except that the conversion is always rounded toward zero. If > +`a' is a NaN, the largest positive integer is returned. Otherwise, if the > +conversion overflows, the largest integer with the same sign as `a' is > +returned. > +------------------------------------------------------------------------------- > +*/ > int32 float128_to_int32_round_to_zero( float128 a STATUS_PARAM ) > { > flag aSign; > @@ -5331,16 +5511,17 @@ int32 float128_to_int32_round_to_zero( float128 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the quadruple-precision floating-point > -| value `a' to the 64-bit two's complement integer format. 
The conversion > -| is performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic---which means in particular that the conversion is rounded > -| according to the current rounding mode. If `a' is a NaN, the largest > -| positive integer is returned. Otherwise, if the conversion overflows, the > -| largest integer with the same sign as `a' is returned. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the quadruple-precision floating-point > +value `a' to the 64-bit two's complement integer format. The conversion > +is performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic---which means in particular that the conversion is rounded > +according to the current rounding mode. If `a' is a NaN, the largest > +positive integer is returned. Otherwise, if the conversion overflows, the > +largest integer with the same sign as `a' is returned. > +------------------------------------------------------------------------------- > +*/ > int64 float128_to_int64( float128 a STATUS_PARAM ) > { > flag aSign; > @@ -5374,16 +5555,17 @@ int64 float128_to_int64( float128 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the quadruple-precision floating-point > -| value `a' to the 64-bit two's complement integer format. The conversion > -| is performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic, except that the conversion is always rounded toward zero. > -| If `a' is a NaN, the largest positive integer is returned. Otherwise, if > -| the conversion overflows, the largest integer with the same sign as `a' is > -| returned. 
> -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the quadruple-precision floating-point > +value `a' to the 64-bit two's complement integer format. The conversion > +is performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic, except that the conversion is always rounded toward zero. > +If `a' is a NaN, the largest positive integer is returned. Otherwise, if > +the conversion overflows, the largest integer with the same sign as `a' is > +returned. > +------------------------------------------------------------------------------- > +*/ > int64 float128_to_int64_round_to_zero( float128 a STATUS_PARAM ) > { > flag aSign; > @@ -5435,13 +5617,14 @@ int64 float128_to_int64_round_to_zero( float128 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the quadruple-precision floating-point > -| value `a' to the single-precision floating-point format. The conversion > -| is performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the quadruple-precision floating-point > +value `a' to the single-precision floating-point format. The conversion > +is performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > float32 float128_to_float32( float128 a STATUS_PARAM ) > { > flag aSign; > @@ -5470,13 +5653,14 @@ float32 float128_to_float32( float128 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the quadruple-precision floating-point > -| value `a' to the double-precision floating-point format. The conversion > -| is performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the quadruple-precision floating-point > +value `a' to the double-precision floating-point format. The conversion > +is performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float64 float128_to_float64( float128 a STATUS_PARAM ) > { > flag aSign; > @@ -5503,13 +5687,14 @@ float64 float128_to_float64( float128 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the quadruple-precision floating-point > -| value `a' to the extended double-precision floating-point format. The > -| conversion is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the quadruple-precision floating-point > +value `a' to the extended double-precision floating-point format. 
The > +conversion is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > floatx80 float128_to_floatx80( float128 a STATUS_PARAM ) > { > flag aSign; > @@ -5538,13 +5723,14 @@ floatx80 float128_to_floatx80( float128 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Rounds the quadruple-precision floating-point value `a' to an integer, and > -| returns the result as a quadruple-precision floating-point value. The > -| operation is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Rounds the quadruple-precision floating-point value `a' to an integer, and > +returns the result as a quadruple-precision floating-point value. The > +operation is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float128 float128_round_to_int( float128 a STATUS_PARAM ) > { > flag aSign; > @@ -5641,14 +5827,15 @@ float128 float128_round_to_int( float128 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of adding the absolute values of the quadruple-precision > -| floating-point values `a' and `b'. If `zSign' is 1, the sum is negated > -| before being returned. `zSign' is ignored if the result is a NaN. > -| The addition is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. 
> -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of adding the absolute values of the quadruple-precision > +floating-point values `a' and `b'. If `zSign' is 1, the sum is negated > +before being returned. `zSign' is ignored if the result is a NaN. > +The addition is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > static float128 addFloat128Sigs( float128 a, float128 b, flag zSign STATUS_PARAM) > { > int32 aExp, bExp, zExp; > @@ -5727,14 +5914,15 @@ static float128 addFloat128Sigs( float128 a, float128 b, flag zSign STATUS_PARAM > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of subtracting the absolute values of the quadruple- > -| precision floating-point values `a' and `b'. If `zSign' is 1, the > -| difference is negated before being returned. `zSign' is ignored if the > -| result is a NaN. The subtraction is performed according to the IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of subtracting the absolute values of the quadruple- > +precision floating-point values `a' and `b'. If `zSign' is 1, the > +difference is negated before being returned. `zSign' is ignored if the > +result is a NaN. The subtraction is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > static float128 subFloat128Sigs( float128 a, float128 b, flag zSign STATUS_PARAM) > { > int32 aExp, bExp, zExp; > @@ -5811,12 +5999,13 @@ static float128 subFloat128Sigs( float128 a, float128 b, flag zSign STATUS_PARAM > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of adding the quadruple-precision floating-point values > -| `a' and `b'. The operation is performed according to the IEC/IEEE Standard > -| for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of adding the quadruple-precision floating-point values > +`a' and `b'. The operation is performed according to the IEC/IEEE Standard > +for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float128 float128_add( float128 a, float128 b STATUS_PARAM ) > { > flag aSign, bSign; > @@ -5832,12 +6021,13 @@ float128 float128_add( float128 a, float128 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of subtracting the quadruple-precision floating-point > -| values `a' and `b'. The operation is performed according to the IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of subtracting the quadruple-precision floating-point > +values `a' and `b'. The operation is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > float128 float128_sub( float128 a, float128 b STATUS_PARAM ) > { > flag aSign, bSign; > @@ -5853,12 +6043,13 @@ float128 float128_sub( float128 a, float128 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of multiplying the quadruple-precision floating-point > -| values `a' and `b'. The operation is performed according to the IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of multiplying the quadruple-precision floating-point > +values `a' and `b'. The operation is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float128 float128_mul( float128 a, float128 b STATUS_PARAM ) > { > flag aSign, bSign, zSign; > @@ -5917,12 +6108,13 @@ float128 float128_mul( float128 a, float128 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of dividing the quadruple-precision floating-point value > -| `a' by the corresponding value `b'. The operation is performed according to > -| the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of dividing the quadruple-precision floating-point value > +`a' by the corresponding value `b'. The operation is performed according to > +the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > float128 float128_div( float128 a, float128 b STATUS_PARAM ) > { > flag aSign, bSign, zSign; > @@ -6001,12 +6193,13 @@ float128 float128_div( float128 a, float128 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the remainder of the quadruple-precision floating-point value `a' > -| with respect to the corresponding value `b'. The operation is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the remainder of the quadruple-precision floating-point value `a' > +with respect to the corresponding value `b'. The operation is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float128 float128_rem( float128 a, float128 b STATUS_PARAM ) > { > flag aSign, zSign; > @@ -6110,12 +6303,13 @@ float128 float128_rem( float128 a, float128 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the square root of the quadruple-precision floating-point value `a'. > -| The operation is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the square root of the quadruple-precision floating-point value `a'. > +The operation is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > float128 float128_sqrt( float128 a STATUS_PARAM ) > { > flag aSign; > @@ -6179,12 +6373,14 @@ float128 float128_sqrt( float128 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the quadruple-precision floating-point value `a' is equal to > -| the corresponding value `b', and 0 otherwise. The invalid exception is > -| raised if either operand is a NaN. Otherwise, the comparison is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the quadruple-precision floating-point value `a' is equal to > +the corresponding value `b', and 0 otherwise. The invalid exception is > +raised if either operand is a NaN. Otherwise, the comparison is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float128_eq( float128 a, float128 b STATUS_PARAM ) > { > @@ -6206,12 +6402,14 @@ int float128_eq( float128 a, float128 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the quadruple-precision floating-point value `a' is less than > -| or equal to the corresponding value `b', and 0 otherwise. The invalid > -| exception is raised if either operand is a NaN. The comparison is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the quadruple-precision floating-point value `a' is less than > +or equal to the corresponding value `b', and 0 otherwise. The invalid > +exception is raised if either operand is a NaN. The comparison is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float128_le( float128 a, float128 b STATUS_PARAM ) > { > @@ -6239,12 +6437,14 @@ int float128_le( float128 a, float128 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the quadruple-precision floating-point value `a' is less than > -| the corresponding value `b', and 0 otherwise. The invalid exception is > -| raised if either operand is a NaN. The comparison is performed according > -| to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the quadruple-precision floating-point value `a' is less than > +the corresponding value `b', and 0 otherwise. The invalid exception is > +raised if either operand is a NaN. The comparison is performed according > +to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > > int float128_lt( float128 a, float128 b STATUS_PARAM ) > { > @@ -6272,12 +6472,14 @@ int float128_lt( float128 a, float128 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the quadruple-precision floating-point values `a' and `b' cannot > -| be compared, and 0 otherwise. The invalid exception is raised if either > -| operand is a NaN. The comparison is performed according to the IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the quadruple-precision floating-point values `a' and `b' cannot > +be compared, and 0 otherwise. The invalid exception is raised if either > +operand is a NaN. The comparison is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float128_unordered( float128 a, float128 b STATUS_PARAM ) > { > @@ -6292,12 +6494,14 @@ int float128_unordered( float128 a, float128 b STATUS_PARAM ) > return 0; > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the quadruple-precision floating-point value `a' is equal to > -| the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an > -| exception. The comparison is performed according to the IEC/IEEE Standard > -| for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the quadruple-precision floating-point value `a' is equal to > +the corresponding value `b', and 0 otherwise. 
Quiet NaNs do not cause an > +exception. The comparison is performed according to the IEC/IEEE Standard > +for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float128_eq_quiet( float128 a, float128 b STATUS_PARAM ) > { > @@ -6322,12 +6526,14 @@ int float128_eq_quiet( float128 a, float128 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the quadruple-precision floating-point value `a' is less than > -| or equal to the corresponding value `b', and 0 otherwise. Quiet NaNs do not > -| cause an exception. Otherwise, the comparison is performed according to the > -| IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the quadruple-precision floating-point value `a' is less than > +or equal to the corresponding value `b', and 0 otherwise. Quiet NaNs do not > +cause an exception. Otherwise, the comparison is performed according to the > +IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float128_le_quiet( float128 a, float128 b STATUS_PARAM ) > { > @@ -6358,12 +6564,14 @@ int float128_le_quiet( float128 a, float128 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the quadruple-precision floating-point value `a' is less than > -| the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an > -| exception. Otherwise, the comparison is performed according to the IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic. 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the quadruple-precision floating-point value `a' is less than > +the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an > +exception. Otherwise, the comparison is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float128_lt_quiet( float128 a, float128 b STATUS_PARAM ) > { > @@ -6394,12 +6602,14 @@ int float128_lt_quiet( float128 a, float128 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the quadruple-precision floating-point values `a' and `b' cannot > -| be compared, and 0 otherwise. Quiet NaNs do not cause an exception. The > -| comparison is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the quadruple-precision floating-point values `a' and `b' cannot > +be compared, and 0 otherwise. Quiet NaNs do not cause an exception. The > +comparison is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float128_unordered_quiet( float128 a, float128 b STATUS_PARAM ) > { > diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h > index f3927e2..b646621 100644 > --- a/include/fpu/softfloat.h > +++ b/include/fpu/softfloat.h > @@ -4,10 +4,11 @@ > * Derived from SoftFloat. 
> */ > > -/*============================================================================ > +/* > +============================================================================ > > -This C header file is part of the SoftFloat IEC/IEEE Floating-point Arithmetic > -Package, Release 2b. > +This C header file is part of the SoftFloat IEC/IEEE Floating-point > +Arithmetic Package, Release 2a. > > Written by John R. Hauser. This work was made possible in part by the > International Computer Science Institute, located at Suite 600, 1947 Center > @@ -16,24 +17,22 @@ National Science Foundation under grant MIP-9311980. The original version > of this code was written as part of a project to build a fixed-point vector > processor in collaboration with the University of California at Berkeley, > overseen by Profs. Nelson Morgan and John Wawrzynek. More information > -is available through the Web page `http://www.cs.berkeley.edu/~jhauser/ > +is available through the Web page `http://HTTP.CS.Berkeley.EDU/~jhauser/ > arithmetic/SoftFloat.html'. > > -THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort has > -been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT TIMES > -RESULT IN INCORRECT BEHAVIOR. USE OF THIS SOFTWARE IS RESTRICTED TO PERSONS > -AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ALL LOSSES, > -COSTS, OR OTHER PROBLEMS THEY INCUR DUE TO THE SOFTWARE, AND WHO FURTHERMORE > -EFFECTIVELY INDEMNIFY JOHN HAUSER AND THE INTERNATIONAL COMPUTER SCIENCE > -INSTITUTE (possibly via similar legal warning) AGAINST ALL LOSSES, COSTS, OR > -OTHER PROBLEMS INCURRED BY THEIR CUSTOMERS AND CLIENTS DUE TO THE SOFTWARE. > +THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort > +has been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT > +TIMES RESULT IN INCORRECT BEHAVIOR. 
USE OF THIS SOFTWARE IS RESTRICTED TO > +PERSONS AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ANY > +AND ALL LOSSES, COSTS, OR OTHER PROBLEMS ARISING FROM ITS USE. > > Derivative works are acceptable, even for commercial purposes, so long as > -(1) the source code for the derivative work includes prominent notice that > -the work is derivative, and (2) the source code includes prominent notice with > -these four paragraphs for those parts of this code that are retained. > +(1) they include prominent notice that the work is derivative, and (2) they > +include prominent notice akin to these four paragraphs for those parts of > +this code that are retained. > > -=============================================================================*/ > +=============================================================================== > +*/ > > #ifndef SOFTFLOAT_H > #define SOFTFLOAT_H > @@ -46,14 +45,16 @@ these four paragraphs for those parts of this code that are retained. > #include "config-host.h" > #include "qemu/osdep.h" > > -/*---------------------------------------------------------------------------- > -| Each of the following `typedef's defines the most convenient type that holds > -| integers of at least as many bits as specified. For example, `uint8' should > -| be the most convenient type that can hold unsigned integers of as many as > -| 8 bits. The `flag' type must be able to hold either a 0 or 1. For most > -| implementations of C, `flag', `uint8', and `int8' should all be `typedef'ed > -| to the same as `int'. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Each of the following `typedef's defines the most convenient type that holds > +integers of at least as many bits as specified. For example, `uint8' should > +be the most convenient type that can hold unsigned integers of as many as > +8 bits. 
The `flag' type must be able to hold either a 0 or 1. For most > +implementations of C, `flag', `uint8', and `int8' should all be `typedef'ed > +to the same as `int'. > +------------------------------------------------------------------------------- > +*/ > typedef uint8_t flag; > typedef uint8_t uint8; > typedef int8_t int8; > @@ -69,9 +70,11 @@ typedef int64_t int64; > #define STATUS(field) status->field > #define STATUS_VAR , status > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE floating-point ordering relations > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Software IEC/IEEE floating-point ordering relations > +------------------------------------------------------------------------------- > +*/ > enum { > float_relation_less = -1, > float_relation_equal = 0, > @@ -79,9 +82,11 @@ enum { > float_relation_unordered = 2 > }; > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE floating-point types. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Software IEC/IEEE floating-point types. > +------------------------------------------------------------------------------- > +*/ > /* Use structures for soft-float types. This prevents accidentally mixing > them with native int/float types. A sufficiently clever compiler and > sane ABI should be able to see though these structs. 
However > @@ -137,17 +142,21 @@ typedef struct { > #define make_float128(high_, low_) ((float128) { .high = high_, .low = low_ }) > #define make_float128_init(high_, low_) { .high = high_, .low = low_ } > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE floating-point underflow tininess-detection mode. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Software IEC/IEEE floating-point underflow tininess-detection mode. > +------------------------------------------------------------------------------- > +*/ > enum { > float_tininess_after_rounding = 0, > float_tininess_before_rounding = 1 > }; > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE floating-point rounding mode. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Software IEC/IEEE floating-point rounding mode. > +------------------------------------------------------------------------------- > +*/ > enum { > float_round_nearest_even = 0, > float_round_down = 1, > @@ -155,9 +164,11 @@ enum { > float_round_to_zero = 3 > }; > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE floating-point exception flags. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Software IEC/IEEE floating-point exception flags. 
> +------------------------------------------------------------------------------- > +*/ > enum { > float_flag_invalid = 1, > float_flag_divbyzero = 4, > @@ -167,7 +178,6 @@ enum { > float_flag_input_denormal = 64, > float_flag_output_denormal = 128 > }; > - > typedef struct float_status { > signed char float_detect_tininess; > signed char float_rounding_mode; > @@ -204,27 +214,33 @@ INLINE int get_float_exception_flags(float_status *status) > } > void set_floatx80_rounding_precision(int val STATUS_PARAM); > > -/*---------------------------------------------------------------------------- > -| Routine to raise any or all of the software IEC/IEEE floating-point > -| exception flags. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Routine to raise any or all of the software IEC/IEEE floating-point > +exception flags. > +------------------------------------------------------------------------------- > +*/ > void float_raise( int8 flags STATUS_PARAM); > > -/*---------------------------------------------------------------------------- > -| Options to indicate which negations to perform in float*_muladd() > -| Using these differs from negating an input or output before calling > -| the muladd function in that this means that a NaN doesn't have its > -| sign bit inverted before it is propagated. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Options to indicate which negations to perform in float*_muladd() > +Using these differs from negating an input or output before calling > +the muladd function in that this means that a NaN doesn't have its > +sign bit inverted before it is propagated. 
> +------------------------------------------------------------------------------- > +*/ > enum { > float_muladd_negate_c = 1, > float_muladd_negate_product = 2, > float_muladd_negate_result = 4, > }; > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE integer-to-floating-point conversion routines. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Software IEC/IEEE integer-to-floating-point conversion routines. > +------------------------------------------------------------------------------- > +*/ > float32 int32_to_float32( int32 STATUS_PARAM ); > float64 int32_to_float64( int32 STATUS_PARAM ); > float32 uint32_to_float32( uint32 STATUS_PARAM ); > @@ -239,15 +255,19 @@ floatx80 int64_to_floatx80( int64 STATUS_PARAM ); > float128 int64_to_float128( int64 STATUS_PARAM ); > float128 uint64_to_float128( uint64 STATUS_PARAM ); > > -/*---------------------------------------------------------------------------- > -| Software half-precision conversion routines. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Software half-precision conversion routines. > +*---------------------------------------------------------------------------- > +*/ > float16 float32_to_float16( float32, flag STATUS_PARAM ); > float32 float16_to_float32( float16, flag STATUS_PARAM ); > > -/*---------------------------------------------------------------------------- > -| Software half-precision operations. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Software half-precision operations. 
> +------------------------------------------------------------------------------- > +*/ > int float16_is_quiet_nan( float16 ); > int float16_is_signaling_nan( float16 ); > float16 float16_maybe_silence_nan( float16 ); > @@ -257,14 +277,18 @@ INLINE int float16_is_any_nan(float16 a) > return ((float16_val(a) & ~0x8000) > 0x7c00); > } > > -/*---------------------------------------------------------------------------- > -| The pattern for a default generated half-precision NaN. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +The pattern for a default generated half-precision NaN. > +------------------------------------------------------------------------------- > +*/ > extern const float16 float16_default_nan; > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE single-precision conversion routines. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Software IEC/IEEE single-precision conversion routines. > +------------------------------------------------------------------------------- > +*/ > int_fast16_t float32_to_int16_round_to_zero(float32 STATUS_PARAM); > uint_fast16_t float32_to_uint16_round_to_zero(float32 STATUS_PARAM); > int32 float32_to_int32( float32 STATUS_PARAM ); > @@ -277,9 +301,11 @@ float64 float32_to_float64( float32 STATUS_PARAM ); > floatx80 float32_to_floatx80( float32 STATUS_PARAM ); > float128 float32_to_float128( float32 STATUS_PARAM ); > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE single-precision operations. 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Software IEC/IEEE single-precision operations. > +------------------------------------------------------------------------------- > +*/ > float32 float32_round_to_int( float32 STATUS_PARAM ); > float32 float32_add( float32, float32 STATUS_PARAM ); > float32 float32_sub( float32, float32 STATUS_PARAM ); > @@ -361,14 +387,18 @@ INLINE float32 float32_set_sign(float32 a, int sign) > #define float32_infinity make_float32(0x7f800000) > > > -/*---------------------------------------------------------------------------- > -| The pattern for a default generated single-precision NaN. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +The pattern for a default generated single-precision NaN. > +------------------------------------------------------------------------------- > +*/ > extern const float32 float32_default_nan; > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE double-precision conversion routines. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Software IEC/IEEE double-precision conversion routines. 
> +------------------------------------------------------------------------------- > +*/ > int_fast16_t float64_to_int16_round_to_zero(float64 STATUS_PARAM); > uint_fast16_t float64_to_uint16_round_to_zero(float64 STATUS_PARAM); > int32 float64_to_int32( float64 STATUS_PARAM ); > @@ -383,9 +413,11 @@ float32 float64_to_float32( float64 STATUS_PARAM ); > floatx80 float64_to_floatx80( float64 STATUS_PARAM ); > float128 float64_to_float128( float64 STATUS_PARAM ); > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE double-precision operations. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Software IEC/IEEE double-precision operations. > +------------------------------------------------------------------------------- > +*/ > float64 float64_round_to_int( float64 STATUS_PARAM ); > float64 float64_trunc_to_int( float64 STATUS_PARAM ); > float64 float64_add( float64, float64 STATUS_PARAM ); > @@ -467,14 +499,18 @@ INLINE float64 float64_set_sign(float64 a, int sign) > #define float64_half make_float64(0x3fe0000000000000LL) > #define float64_infinity make_float64(0x7ff0000000000000LL) > > -/*---------------------------------------------------------------------------- > -| The pattern for a default generated double-precision NaN. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +The pattern for a default generated double-precision NaN. > +------------------------------------------------------------------------------- > +*/ > extern const float64 float64_default_nan; > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE extended double-precision conversion routines. 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Software IEC/IEEE extended double-precision conversion routines. > +------------------------------------------------------------------------------- > +*/ > int32 floatx80_to_int32( floatx80 STATUS_PARAM ); > int32 floatx80_to_int32_round_to_zero( floatx80 STATUS_PARAM ); > int64 floatx80_to_int64( floatx80 STATUS_PARAM ); > @@ -483,9 +519,11 @@ float32 floatx80_to_float32( floatx80 STATUS_PARAM ); > float64 floatx80_to_float64( floatx80 STATUS_PARAM ); > float128 floatx80_to_float128( floatx80 STATUS_PARAM ); > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE extended double-precision operations. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Software IEC/IEEE extended double-precision operations. > +------------------------------------------------------------------------------- > +*/ > floatx80 floatx80_round_to_int( floatx80 STATUS_PARAM ); > floatx80 floatx80_add( floatx80, floatx80 STATUS_PARAM ); > floatx80 floatx80_sub( floatx80, floatx80 STATUS_PARAM ); > @@ -552,14 +590,18 @@ INLINE int floatx80_is_any_nan(floatx80 a) > #define floatx80_half make_floatx80(0x3ffe, 0x8000000000000000LL) > #define floatx80_infinity make_floatx80(0x7fff, 0x8000000000000000LL) > > -/*---------------------------------------------------------------------------- > -| The pattern for a default generated extended double-precision NaN. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +The pattern for a default generated extended double-precision NaN. 
> +------------------------------------------------------------------------------- > +*/ > extern const floatx80 floatx80_default_nan; > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE quadruple-precision conversion routines. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Software IEC/IEEE quadruple-precision conversion routines. > +------------------------------------------------------------------------------- > +*/ > int32 float128_to_int32( float128 STATUS_PARAM ); > int32 float128_to_int32_round_to_zero( float128 STATUS_PARAM ); > int64 float128_to_int64( float128 STATUS_PARAM ); > @@ -568,9 +610,11 @@ float32 float128_to_float32( float128 STATUS_PARAM ); > float64 float128_to_float64( float128 STATUS_PARAM ); > floatx80 float128_to_floatx80( float128 STATUS_PARAM ); > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE quadruple-precision operations. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Software IEC/IEEE quadruple-precision operations. > +------------------------------------------------------------------------------- > +*/ > float128 float128_round_to_int( float128 STATUS_PARAM ); > float128 float128_add( float128, float128 STATUS_PARAM ); > float128 float128_sub( float128, float128 STATUS_PARAM ); > @@ -633,9 +677,11 @@ INLINE int float128_is_any_nan(float128 a) > > #define float128_zero make_float128(0, 0) > > -/*---------------------------------------------------------------------------- > -| The pattern for a default generated quadruple-precision NaN. 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +The pattern for a default generated quadruple-precision NaN. > +------------------------------------------------------------------------------- > +*/ > extern const float128 float128_default_nan; > > #endif /* !SOFTFLOAT_H */ >
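A side note on the hunk context quoted above: the unchanged `float16_is_any_nan` line tests `(float16_val(a) & ~0x8000) > 0x7c00`. A minimal sketch of why that comparison works, assuming the value is handled as its raw 16-bit IEEE half-precision encoding. The names `f16_bits`, `f16_is_any_nan` and `f16_is_infinity` are hypothetical stand-ins for illustration, not QEMU's own types or API:

```c
#include <stdint.h>

/* Hypothetical stand-in for QEMU's float16 type: the raw 16 bits of an
 * IEEE 754 half-precision value (1 sign, 5 exponent, 10 fraction bits). */
typedef uint16_t f16_bits;

/* Nonzero if `a` encodes any NaN, quiet or signaling.
 * Masking with 0x7fff drops the sign bit (the same effect as `& ~0x8000`
 * in the quoted hunk). 0x7c00 is the all-ones exponent with a zero
 * fraction, i.e. infinity; anything strictly above it has a nonzero
 * fraction and is therefore a NaN. */
static int f16_is_any_nan(f16_bits a)
{
    return (a & 0x7fff) > 0x7c00;
}

/* Nonzero if `a` encodes +inf or -inf; included only for contrast. */
static int f16_is_infinity(f16_bits a)
{
    return (a & 0x7fff) == 0x7c00;
}
```

With these definitions, `f16_is_any_nan(0x7e00)` and `f16_is_any_nan(0xfe00)` are nonzero, while `f16_is_any_nan(0x7c00)` (positive infinity) and `f16_is_any_nan(0x3c00)` (1.0) are zero. None of this is touched by the patch itself, which only changes comments.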
On Mon, Apr 29, 2013 at 10:05 PM, Anthony Liguori <aliguori@us.ibm.com> wrote:
> The license of SoftFloat-2b is claimed to be GPLv2 incompatible by
> the FSF due to an indemnification clause. The previous release,
> SoftFloat-2a, did not contain this clause. The only changes between
> these two versions as far as QEMU is concerned is the license change
> and a global modification of the comment structure. This patch rebases
> our softfloat code to SoftFloat-2a in order to have a GPLv2 compatible
> license.

...

> In order to make this change, we need to relicense all contributions
> from initial import of the SoftFloat code to match the license of
> SoftFloat-2a (instead of the implied SoftFloat-2b license).
>
> If you are on CC, it is because you have contributed to the softfloat
> code in QEMU. Please respond to this note with:
>
> Acked-by: Your Name <your@email.com>
>
> To signify that you are able and willing to relicense your changes
> to the SoftFloat-1a license (or a GPL compatible license).

Acked-by: Max Filippov <jcmvbkbc@gmail.com>
Paolo Bonzini <pbonzini@redhat.com> writes:

> On 29/04/2013 20:05, Anthony Liguori wrote:
>> In order to make this change, we need to relicense all contributions
>> from initial import of the SoftFloat code to match the license of
>> SoftFloat-2a (instead of the implied SoftFloat-2b license).
>
> All Red Hat contributions (at least Avi, Juan, me; don't know about rth)
> are available under GPLv2+; also other authors agreed on it. For this
> particular license,
>
> Acked-by: Paolo Bonzini <pbonzini@redhat.com>

Thanks Paolo.

> But it doesn't look like a task that can be ever completed. I'll
> shortly find out how many of those addresses bounce.
>
>> If you are on CC, it is because you have contributed to the softfloat
>> code in QEMU. Please respond to this note with:
>>
>> Acked-by: Your Name <your@email.com>
>>
>> To signify that you are able and willing to relicense your changes
>> to the SoftFloat-1a license (or a GPL compatible license).
>>
>> Please respond no later than May 6th, 2013. If we are unable to confirm
>> relicense from an author, changes from that author will be reverted.
>
> How is that possible for six year old patches such as commit b645bb4
> (Fix softfloat NaN handling., 2007-05-11)? Ten days before a release, even?

In a perfect world where we can get responses from all copyright holders, this patch is comments-only so it could go into 1.5 with no risk at all.

I don't expect we live in a perfect world though. So let's see what response we can get over the next week. Then we'll look at what would need to be reverted, and depending on what can and can't be reasonably reverted, we can either (1) try harder to contact the author or (2) figure out whether we can rewrite the code.

It's too hard to tell what we can do now until we get a first round of Acks.

I expect that we will also need to make a decision on the 6th about whether to delay the release or not.
I'm strongly leaning toward delaying the 1.5 release until we can straighten out this issue. Regards, Anthony Liguori > > Paolo > >> --- >> For completeness, here is the full listing of contributions: >> >> Andreas Färber <afaerber@suse.de> >> be45f06 Silence softfloat warnings on OpenSolaris >> 5aea4c5 softfloat: Replace uint16 type with uint_fast16_t >> 94a49d8 softfloat: Replace int16 type with int_fast16_t >> c969654 softfloat: Fix mixups of int and int16 >> 38641f8 softfloat: Use uint16 consistently >> 87b8cc3 softfloat: Resolve type mismatches between declaration and implementation >> 8d725fa softfloat: Prepend QEMU-style header with derivation notice >> 9f8d2a0 softfloat: Use uint32 consistently >> bb98fe4 softfloat: Drop [s]bits{8, 16, 32, 64} types in favor of [u]int{8, 16, 32, 64}_t >> >> Aurelien Jarno <aurelien@aurel32.net> >> 1020160 softfloat: fix default-NaN mode >> 084d19b target-mips: Implement correct NaN propagation rules >> 196cfc8 softfloat: add a 1.0 constant for float32 and float64 >> 1b2ad2e softfloat-native: fix *nan() >> 1f398e0 softfloat: use float{32,64,x80,128}_maybe_silence_nan() >> 211315f softfloat: rename float*_eq() into float*_eq_quiet() >> 2657d0f softfloat: rename float*_eq_signaling() into float*_eq() >> 30e7a22 Use float_relation_* constants >> 326b9e9 softfloat: fix float*_scalnb() corner cases >> 34d2386 softfloat: remove HPPA specific code >> 374dfc3 soft-float: add float32_log2() and float64_log2() >> 4cc5383 softfloat-native: add float*_is_any_nan() functions >> 587eabf softfloat: add float*_is_zero_or_denormal() >> 629bd74 softfloat-native: add float32_is_nan() >> 67b7861 softfloat: add float*_unordered_{,quiet}() functions >> 8229c99 softfloat: add float32_exp2() >> 85016c9 Assortment of soft-float fixes, by Aurelien Jarno. 
>> 8d6c92b softfloat-native: improve correctness of floatXX_is_neg() >> 93ae1c6 softfloat: fix float{32,64}_maybe_silence_nan() for MIPS >> a167ba5 Add support for GNU/kFreeBSD >> b3b4c7f softfloat: use GCC builtins to count the leading zeros >> b4a0ef7 softfloat-native: add float*_unordered_quiet() functions >> b689362 softfloat: move float*_eq and float*_eq_quiet >> b76235e softfloat: fix floatx80_is_infinity() >> bbc1ded softfloat: implement fused multiply-add NaN propagation for MIPS >> be22a9a softfloat: always enable floatx80 and float128 support >> c4b4c77 softfloat: add pi constants >> c52ab6f fp: add floatXX_is_infinity(), floatXX_is_neg(), floatXX_is_zero() >> cf67c6b softfloat-native: remove >> d2b1027 softfloat-native: add a few constant values >> d6882cf softfloat-native: fix float*_scalbn() functions >> d735d69 softfloat: rename *IsNaN variables to *IsQuietNaN >> dadd71a fp: fix float32_is_infinity() >> de4af5f softfloat: fix floatx80_is_{quiet,signaling}_nan() >> e024e88 target-ppc: Implement correct NaN propagation rules >> e2f4220 softfloat: fix floatx80 handling of NaN >> e872aa8 softfloat-native: fix type of float_rounding_mode >> e908775 softfloat: SH4 has the sNaN bit set >> f3218a8 softfloat: add floatx80 constants >> f5a6425 softfloat: improve description of comparison functions >> f6714d3 softfloat: add floatx80_compare*() functions >> f6a7d92 softfloat: add float{x80,128}_maybe_silence_nan() >> >> Avi Kivity <avi.kivity@gmail.com> >> 3bf7e40 softfloat: fix for C99 >> >> Ben Taylor <bentaylor.solx86@gmail.com> >> 0475a5c Solaris 9/x86 support, by Ben Taylor. >> c94655b Updated Solaris isinf support, by Juergen Keil and Ben Taylor. >> >> Blue Swirl <blauwirbel@gmail.com> >> 128ab2f Preliminary OpenBSD host support (based on OpenBSD patches by Todd T. 
Fries) >> 14d483e Fix OpenSolaris softfloat warnings >> 179a2c1 Rename _BSD to HOST_BSD so that it's more obvious that it's defined by configure >> 1d6198c Remove unnecessary trailing newlines >> 1f58732 128-bit float support for user mode >> 2734c70 Rename one more _BSD to HOST_BSD (spotted by Hasso Tepper) >> 3f4cb3d Fix OpenSolaris gcc4 warnings: iovec type mismatches, missing 'static' >> 70c1470 Sparse fixes: dubious mixing of bitwise and logical operations >> 7c2a9d0 Fix math warnings on OpenBSD -current >> b1d8e52 Fix undeclared symbol warnings from sparse >> b55266b Suppress gcc 4.x -Wpointer-sign (included in -Wall) warnings >> cd8a253 Fix more typos in softloat code (Eduardo Felipe) >> d07cca0 Add native softfloat fpu functions (Christoph Egger) >> ed086f3 softfloat: remove dead assignments, spotted by clang >> >> Christophe Lyon <christophe.lyon@st.com> >> 8559666 softfloat: move all default NaN definitions to softfloat.h. >> bcd4d9a softfloat: Honour default_nan_mode for float-to-float conversions >> c30fe7d softfloat: add _set_sign(), _infinity and _half for 32 and 64 bits floats. >> >> Fabrice Bellard <fabrice@bellard.org> >> 158142c soft float support >> 1b2b0af 64 bit fix >> 1d6bda3 added abs, chs and compare functions >> 38cfa06 Solaris port (Ben Taylor) >> 750afe9 avoid using char when it is not necessary >> b109f9f more native FPU comparison functions - native FPU remainder >> ec530c8 Solaris port (Ben Taylor) >> fdbb469 Solaris/SPARC host port (Ben Taylor) >> >> Guan Xuetao <gxt@mprc.pku.edu.cn> >> d2fbca9 unicore32: necessary modifications for other files to support unicore32 >> >> Jocelyn Mayer <l_indien@magic.fr> >> 3430b0b Ooops... Typo. >> 75d62a5 Add missing softfloat helpers. 
>> >> Juan Quintela <quintela@redhat.com> >> 0eb4fc8 softfloat: make USE_SOFTFLOAT_STRUCT_TYPES compile >> 71e72a1 rename HOST_BSD to CONFIG_BSD >> 75b5a69 rename NEEDS_LIBSUNMATH to CONFIG_NEEDS_LIBSUNMATH >> dfe5fff change HOST_SOLARIS to CONFIG_SOLARIS{_VERSION} >> e2542fe rename WORDS_BIGENDIAN to HOST_WORDS_BIGENDIAN >> >> malc <av1474@comtv.ru> >> 947f5fc Add static qualifier to local functions >> e58ffeb Remove all traces of __powerpc__ >> >> Max Filippov <jcmvbkbc@gmail.com> >> 6617680 softfloat: make float_muladd_negate_* flags independent >> 213ff4e softfloat: add NO_SIGNALING_NANS >> b81fe82 target-xtensa: specialize softfloat NaN rules >> >> Paolo Bonzini <pbonzini@redhat.com> >> 1de7afc misc: move include files to include/qemu/ >> 6b4c305 fpu: move public header file to include/fpu >> 789ec7c softfloat: change default nan definitions to variables >> >> Paul Brook <paul@codesourcery.com> >> 6001149 ARM FP16 support >> 6939754 Correctly normalize values and handle zero inputs to scalbn functions. >> 3598ecb Remove missing include. >> 5c7908e Implement default-NaN mode. >> 7918bf4 Fix typo in BSD FP rounding mode names. >> 9027db8 Fix ARM default NaN. >> 9ee6e8b ARMv7 support. >> a1b91bb Fix typo in softfloat code. >> e6e5906 ColdFire target. >> f090c9d Add strict checking mode for softfp code. >> fe76d97 Implement flush-to-zero mode (denormal results are replaced with zero). 
>> >> Peter Maydell <peter.maydell@linaro.org> >> 1856987 softfloat: Rename float*_is_nan() functions to float*_is_quiet_nan() >> 760e141 softfloat: roundAndPackInt{32, 64}: Don't assume int32 is 32 bits >> 011da61 target-arm: Implement correct NaN propagation rules >> 21d6ebd softfloat: Add float*_is_any_nan() functions >> 274f1b0 softfloat: Add float*_min() and float*_max() functions >> 2ac8bd0 softfloat: Reinstate accidentally disabled target-specific NaN handling >> 2bed652 softfloat: Implement floatx80_is_any_nan() and float128_is_any_nan() >> 354f211 softfloat: abstract out target-specific NaN propagation rules >> 369be8f softfloat: Implement fused multiply-add >> 37d1866 softfloat: Implement flushing input denormals to zero >> 4be8eea fpu/softfloat.c: Remove pointless shift of always-zero value >> 600e30d softfloat: Fix single-to-half precision float conversions >> 6f3300a softfloat: Add float32_is_zero_or_denormal() function >> b3a6a2e softfloat: float*_to_int32_round_to_zero: don't assume int32 is 32 bits >> b408dbd softfloat: Add float*_maybe_silence_nan() functions >> bb4d4bb softfloat: Add float16 type and float16 NaN handling functions >> c29aca4 softfloat: Add setter function for tininess detection mode >> cbcef45 softfloat: Add float/double to 16 bit integer conversion functions >> d5138cf softfloat: Fix compilation failures with USE_SOFTFLOAT_STRUCT_TYPES >> e3d142d fpu: Correct edgecase in float64_muladd >> e6afc87 softfloat: Add new flag for when denormal result is flushed to zero >> e744c06 fpu/softfloat.c: Return correctly signed values from uint64_to_float32 >> f591e1b softfloat: Correctly handle NaNs in float16_to_float32() >> >> Richard Henderson <rth@twiddle.net> >> 17ed229 softfloat: Fix uint64_to_float64 >> 1e397ea softfloat: Implement uint64_to_float128 >> 8443eff target-alpha: Split up FPCR value into separate fields. >> 990b3e1 target-alpha: Enable softfloat. >> ba0e276 target-alpha: Fixes for alpha-linux syscalls. 
>> >> Richard Sandiford <rdsandiford@googlemail.com> >> a6e7c18 softfloat: Handle float_muladd_negate_c when product is zero >> >> Stefan Weil <weil@mail.berlios.de> >> bc4347b arm host: fix compiler warning >> >> Thiemo Seufer <ths@networkno.de> >> 5a6932d Fix NaN handling for MIPS and HPPA. >> 5fafdf2 find -type f | xargs sed -i 's/[\t ]$//g' # on most files >> 63a654b trunc() for Solaris 9 / SPARC, by Juergen Keil. >> 924b2c0 Add proper float*_is_nan prototypes. >> b645bb4 Fix softfloat NaN handling. >> fc81ba5 Check that HOST_SOLARIS is defined before relying on its value. Spotted by Joachim Henke. >> --- >> fpu/softfloat-macros.h | 430 ++++---- >> fpu/softfloat-specialize.h | 494 +++++---- >> fpu/softfloat.c | 2436 ++++++++++++++++++++++++-------------------- >> include/fpu/softfloat.h | 242 +++-- >> 4 files changed, 1981 insertions(+), 1621 deletions(-) >> >> diff --git a/fpu/softfloat-macros.h b/fpu/softfloat-macros.h >> index b5164af..2009315 100644 >> --- a/fpu/softfloat-macros.h >> +++ b/fpu/softfloat-macros.h >> @@ -4,10 +4,11 @@ >> * Derived from SoftFloat. >> */ >> >> -/*============================================================================ >> +/* >> +=============================================================================== >> >> This C source fragment is part of the SoftFloat IEC/IEEE Floating-point >> -Arithmetic Package, Release 2b. >> +Arithmetic Package, Release 2a. >> >> Written by John R. Hauser. This work was made possible in part by the >> International Computer Science Institute, located at Suite 600, 1947 Center >> @@ -16,28 +17,27 @@ National Science Foundation under grant MIP-9311980. The original version >> of this code was written as part of a project to build a fixed-point vector >> processor in collaboration with the University of California at Berkeley, >> overseen by Profs. Nelson Morgan and John Wawrzynek. 
More information >> -is available through the Web page `http://www.cs.berkeley.edu/~jhauser/ >> +is available through the Web page `http://HTTP.CS.Berkeley.EDU/~jhauser/ >> arithmetic/SoftFloat.html'. >> >> -THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort has >> -been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT TIMES >> -RESULT IN INCORRECT BEHAVIOR. USE OF THIS SOFTWARE IS RESTRICTED TO PERSONS >> -AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ALL LOSSES, >> -COSTS, OR OTHER PROBLEMS THEY INCUR DUE TO THE SOFTWARE, AND WHO FURTHERMORE >> -EFFECTIVELY INDEMNIFY JOHN HAUSER AND THE INTERNATIONAL COMPUTER SCIENCE >> -INSTITUTE (possibly via similar legal notice) AGAINST ALL LOSSES, COSTS, OR >> -OTHER PROBLEMS INCURRED BY THEIR CUSTOMERS AND CLIENTS DUE TO THE SOFTWARE. >> +THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort >> +has been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT >> +TIMES RESULT IN INCORRECT BEHAVIOR. USE OF THIS SOFTWARE IS RESTRICTED TO >> +PERSONS AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ANY >> +AND ALL LOSSES, COSTS, OR OTHER PROBLEMS ARISING FROM ITS USE. >> >> Derivative works are acceptable, even for commercial purposes, so long as >> -(1) the source code for the derivative work includes prominent notice that >> -the work is derivative, and (2) the source code includes prominent notice with >> -these four paragraphs for those parts of this code that are retained. >> +(1) they include prominent notice that the work is derivative, and (2) they >> +include prominent notice akin to these four paragraphs for those parts of >> +this code that are retained. >> >> =============================================================================*/ >> >> -/*---------------------------------------------------------------------------- >> -| This macro tests for minimum version of the GNU C compiler. 
>> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +This macro tests for minimum version of the GNU C compiler. >> +------------------------------------------------------------------------------- >> +*/ >> #if defined(__GNUC__) && defined(__GNUC_MINOR__) >> # define SOFTFLOAT_GNUC_PREREQ(maj, min) \ >> ((__GNUC__ << 16) + __GNUC_MINOR__ >= ((maj) << 16) + (min)) >> @@ -46,14 +46,16 @@ these four paragraphs for those parts of this code that are retained. >> #endif >> >> >> -/*---------------------------------------------------------------------------- >> -| Shifts `a' right by the number of bits given in `count'. If any nonzero >> -| bits are shifted off, they are ``jammed'' into the least significant bit of >> -| the result by setting the least significant bit to 1. The value of `count' >> -| can be arbitrarily large; in particular, if `count' is greater than 32, the >> -| result will be either 0 or 1, depending on whether `a' is zero or nonzero. >> -| The result is stored in the location pointed to by `zPtr'. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Shifts `a' right by the number of bits given in `count'. If any nonzero >> +bits are shifted off, they are ``jammed'' into the least significant bit of >> +the result by setting the least significant bit to 1. The value of `count' >> +can be arbitrarily large; in particular, if `count' is greater than 32, the >> +result will be either 0 or 1, depending on whether `a' is zero or nonzero. >> +The result is stored in the location pointed to by `zPtr'. 
>> +------------------------------------------------------------------------------- >> +*/ >> >> INLINE void shift32RightJamming(uint32_t a, int_fast16_t count, uint32_t *zPtr) >> { >> @@ -72,14 +74,16 @@ INLINE void shift32RightJamming(uint32_t a, int_fast16_t count, uint32_t *zPtr) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Shifts `a' right by the number of bits given in `count'. If any nonzero >> -| bits are shifted off, they are ``jammed'' into the least significant bit of >> -| the result by setting the least significant bit to 1. The value of `count' >> -| can be arbitrarily large; in particular, if `count' is greater than 64, the >> -| result will be either 0 or 1, depending on whether `a' is zero or nonzero. >> -| The result is stored in the location pointed to by `zPtr'. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Shifts `a' right by the number of bits given in `count'. If any nonzero >> +bits are shifted off, they are ``jammed'' into the least significant bit of >> +the result by setting the least significant bit to 1. The value of `count' >> +can be arbitrarily large; in particular, if `count' is greater than 64, the >> +result will be either 0 or 1, depending on whether `a' is zero or nonzero. >> +The result is stored in the location pointed to by `zPtr'. >> +------------------------------------------------------------------------------- >> +*/ >> >> INLINE void shift64RightJamming(uint64_t a, int_fast16_t count, uint64_t *zPtr) >> { >> @@ -98,23 +102,24 @@ INLINE void shift64RightJamming(uint64_t a, int_fast16_t count, uint64_t *zPtr) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Shifts the 128-bit value formed by concatenating `a0' and `a1' right by 64 >> -| _plus_ the number of bits given in `count'. 
The shifted result is at most >> -| 64 nonzero bits; this is stored at the location pointed to by `z0Ptr'. The >> -| bits shifted off form a second 64-bit result as follows: The _last_ bit >> -| shifted off is the most-significant bit of the extra result, and the other >> -| 63 bits of the extra result are all zero if and only if _all_but_the_last_ >> -| bits shifted off were all zero. This extra result is stored in the location >> -| pointed to by `z1Ptr'. The value of `count' can be arbitrarily large. >> -| (This routine makes more sense if `a0' and `a1' are considered to form >> -| a fixed-point value with binary point between `a0' and `a1'. This fixed- >> -| point value is shifted right by the number of bits given in `count', and >> -| the integer part of the result is returned at the location pointed to by >> -| `z0Ptr'. The fractional part of the result may be slightly corrupted as >> -| described above, and is returned at the location pointed to by `z1Ptr'.) >> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Shifts the 128-bit value formed by concatenating `a0' and `a1' right by 64 >> +_plus_ the number of bits given in `count'. The shifted result is at most >> +64 nonzero bits; this is stored at the location pointed to by `z0Ptr'. The >> +bits shifted off form a second 64-bit result as follows: The _last_ bit >> +shifted off is the most-significant bit of the extra result, and the other >> +63 bits of the extra result are all zero if and only if _all_but_the_last_ >> +bits shifted off were all zero. This extra result is stored in the location >> +pointed to by `z1Ptr'. The value of `count' can be arbitrarily large. >> + (This routine makes more sense if `a0' and `a1' are considered to form a >> +fixed-point value with binary point between `a0' and `a1'. 
This fixed-point >> +value is shifted right by the number of bits given in `count', and the >> +integer part of the result is returned at the location pointed to by >> +`z0Ptr'. The fractional part of the result may be slightly corrupted as >> +described above, and is returned at the location pointed to by `z1Ptr'.) >> +------------------------------------------------------------------------------- >> +*/ >> INLINE void >> shift64ExtraRightJamming( >> uint64_t a0, uint64_t a1, int_fast16_t count, uint64_t *z0Ptr, uint64_t *z1Ptr) >> @@ -144,14 +149,15 @@ INLINE void >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Shifts the 128-bit value formed by concatenating `a0' and `a1' right by the >> -| number of bits given in `count'. Any bits shifted off are lost. The value >> -| of `count' can be arbitrarily large; in particular, if `count' is greater >> -| than 128, the result will be 0. The result is broken into two 64-bit pieces >> -| which are stored at the locations pointed to by `z0Ptr' and `z1Ptr'. >> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Shifts the 128-bit value formed by concatenating `a0' and `a1' right by the >> +number of bits given in `count'. Any bits shifted off are lost. The value >> +of `count' can be arbitrarily large; in particular, if `count' is greater >> +than 128, the result will be 0. The result is broken into two 64-bit pieces >> +which are stored at the locations pointed to by `z0Ptr' and `z1Ptr'. 
>> +------------------------------------------------------------------------------- >> +*/ >> INLINE void >> shift128Right( >> uint64_t a0, uint64_t a1, int_fast16_t count, uint64_t *z0Ptr, uint64_t *z1Ptr) >> @@ -176,17 +182,18 @@ INLINE void >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Shifts the 128-bit value formed by concatenating `a0' and `a1' right by the >> -| number of bits given in `count'. If any nonzero bits are shifted off, they >> -| are ``jammed'' into the least significant bit of the result by setting the >> -| least significant bit to 1. The value of `count' can be arbitrarily large; >> -| in particular, if `count' is greater than 128, the result will be either >> -| 0 or 1, depending on whether the concatenation of `a0' and `a1' is zero or >> -| nonzero. The result is broken into two 64-bit pieces which are stored at >> -| the locations pointed to by `z0Ptr' and `z1Ptr'. >> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Shifts the 128-bit value formed by concatenating `a0' and `a1' right by the >> +number of bits given in `count'. If any nonzero bits are shifted off, they >> +are ``jammed'' into the least significant bit of the result by setting the >> +least significant bit to 1. The value of `count' can be arbitrarily large; >> +in particular, if `count' is greater than 128, the result will be either >> +0 or 1, depending on whether the concatenation of `a0' and `a1' is zero or >> +nonzero. The result is broken into two 64-bit pieces which are stored at >> +the locations pointed to by `z0Ptr' and `z1Ptr'. 
>> +------------------------------------------------------------------------------- >> +*/ >> INLINE void >> shift128RightJamming( >> uint64_t a0, uint64_t a1, int_fast16_t count, uint64_t *z0Ptr, uint64_t *z1Ptr) >> @@ -219,25 +226,26 @@ INLINE void >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Shifts the 192-bit value formed by concatenating `a0', `a1', and `a2' right >> -| by 64 _plus_ the number of bits given in `count'. The shifted result is >> -| at most 128 nonzero bits; these are broken into two 64-bit pieces which are >> -| stored at the locations pointed to by `z0Ptr' and `z1Ptr'. The bits shifted >> -| off form a third 64-bit result as follows: The _last_ bit shifted off is >> -| the most-significant bit of the extra result, and the other 63 bits of the >> -| extra result are all zero if and only if _all_but_the_last_ bits shifted off >> -| were all zero. This extra result is stored in the location pointed to by >> -| `z2Ptr'. The value of `count' can be arbitrarily large. >> -| (This routine makes more sense if `a0', `a1', and `a2' are considered >> -| to form a fixed-point value with binary point between `a1' and `a2'. This >> -| fixed-point value is shifted right by the number of bits given in `count', >> -| and the integer part of the result is returned at the locations pointed to >> -| by `z0Ptr' and `z1Ptr'. The fractional part of the result may be slightly >> -| corrupted as described above, and is returned at the location pointed to by >> -| `z2Ptr'.) >> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Shifts the 192-bit value formed by concatenating `a0', `a1', and `a2' right >> +by 64 _plus_ the number of bits given in `count'. 
The shifted result is >> +at most 128 nonzero bits; these are broken into two 64-bit pieces which are >> +stored at the locations pointed to by `z0Ptr' and `z1Ptr'. The bits shifted >> +off form a third 64-bit result as follows: The _last_ bit shifted off is >> +the most-significant bit of the extra result, and the other 63 bits of the >> +extra result are all zero if and only if _all_but_the_last_ bits shifted off >> +were all zero. This extra result is stored in the location pointed to by >> +`z2Ptr'. The value of `count' can be arbitrarily large. >> + (This routine makes more sense if `a0', `a1', and `a2' are considered >> +to form a fixed-point value with binary point between `a1' and `a2'. This >> +fixed-point value is shifted right by the number of bits given in `count', >> +and the integer part of the result is returned at the locations pointed to >> +by `z0Ptr' and `z1Ptr'. The fractional part of the result may be slightly >> +corrupted as described above, and is returned at the location pointed to by >> +`z2Ptr'.) >> +------------------------------------------------------------------------------- >> +*/ >> INLINE void >> shift128ExtraRightJamming( >> uint64_t a0, >> @@ -289,13 +297,14 @@ INLINE void >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Shifts the 128-bit value formed by concatenating `a0' and `a1' left by the >> -| number of bits given in `count'. Any bits shifted off are lost. The value >> -| of `count' must be less than 64. The result is broken into two 64-bit >> -| pieces which are stored at the locations pointed to by `z0Ptr' and `z1Ptr'. >> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Shifts the 128-bit value formed by concatenating `a0' and `a1' left by the >> +number of bits given in `count'. Any bits shifted off are lost. 
The value >> +of `count' must be less than 64. The result is broken into two 64-bit >> +pieces which are stored at the locations pointed to by `z0Ptr' and `z1Ptr'. >> +------------------------------------------------------------------------------- >> +*/ >> INLINE void >> shortShift128Left( >> uint64_t a0, uint64_t a1, int_fast16_t count, uint64_t *z0Ptr, uint64_t *z1Ptr) >> @@ -307,14 +316,15 @@ INLINE void >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Shifts the 192-bit value formed by concatenating `a0', `a1', and `a2' left >> -| by the number of bits given in `count'. Any bits shifted off are lost. >> -| The value of `count' must be less than 64. The result is broken into three >> -| 64-bit pieces which are stored at the locations pointed to by `z0Ptr', >> -| `z1Ptr', and `z2Ptr'. >> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Shifts the 192-bit value formed by concatenating `a0', `a1', and `a2' left >> +by the number of bits given in `count'. Any bits shifted off are lost. >> +The value of `count' must be less than 64. The result is broken into three >> +64-bit pieces which are stored at the locations pointed to by `z0Ptr', >> +`z1Ptr', and `z2Ptr'. >> +------------------------------------------------------------------------------- >> +*/ >> INLINE void >> shortShift192Left( >> uint64_t a0, >> @@ -343,13 +353,14 @@ INLINE void >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Adds the 128-bit value formed by concatenating `a0' and `a1' to the 128-bit >> -| value formed by concatenating `b0' and `b1'. Addition is modulo 2^128, so >> -| any carry out is lost. The result is broken into two 64-bit pieces which >> -| are stored at the locations pointed to by `z0Ptr' and `z1Ptr'. 
>> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Adds the 128-bit value formed by concatenating `a0' and `a1' to the 128-bit >> +value formed by concatenating `b0' and `b1'. Addition is modulo 2^128, so >> +any carry out is lost. The result is broken into two 64-bit pieces which >> +are stored at the locations pointed to by `z0Ptr' and `z1Ptr'. >> +------------------------------------------------------------------------------- >> +*/ >> INLINE void >> add128( >> uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1, uint64_t *z0Ptr, uint64_t *z1Ptr ) >> @@ -362,14 +373,15 @@ INLINE void >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Adds the 192-bit value formed by concatenating `a0', `a1', and `a2' to the >> -| 192-bit value formed by concatenating `b0', `b1', and `b2'. Addition is >> -| modulo 2^192, so any carry out is lost. The result is broken into three >> -| 64-bit pieces which are stored at the locations pointed to by `z0Ptr', >> -| `z1Ptr', and `z2Ptr'. >> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Adds the 192-bit value formed by concatenating `a0', `a1', and `a2' to the >> +192-bit value formed by concatenating `b0', `b1', and `b2'. Addition is >> +modulo 2^192, so any carry out is lost. The result is broken into three >> +64-bit pieces which are stored at the locations pointed to by `z0Ptr', >> +`z1Ptr', and `z2Ptr'. 
>> +------------------------------------------------------------------------------- >> +*/ >> INLINE void >> add192( >> uint64_t a0, >> @@ -400,14 +412,15 @@ INLINE void >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Subtracts the 128-bit value formed by concatenating `b0' and `b1' from the >> -| 128-bit value formed by concatenating `a0' and `a1'. Subtraction is modulo >> -| 2^128, so any borrow out (carry out) is lost. The result is broken into two >> -| 64-bit pieces which are stored at the locations pointed to by `z0Ptr' and >> -| `z1Ptr'. >> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Subtracts the 128-bit value formed by concatenating `b0' and `b1' from the >> +128-bit value formed by concatenating `a0' and `a1'. Subtraction is modulo >> +2^128, so any borrow out (carry out) is lost. The result is broken into two >> +64-bit pieces which are stored at the locations pointed to by `z0Ptr' and >> +`z1Ptr'. >> +------------------------------------------------------------------------------- >> +*/ >> INLINE void >> sub128( >> uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1, uint64_t *z0Ptr, uint64_t *z1Ptr ) >> @@ -418,14 +431,15 @@ INLINE void >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Subtracts the 192-bit value formed by concatenating `b0', `b1', and `b2' >> -| from the 192-bit value formed by concatenating `a0', `a1', and `a2'. >> -| Subtraction is modulo 2^192, so any borrow out (carry out) is lost. The >> -| result is broken into three 64-bit pieces which are stored at the locations >> -| pointed to by `z0Ptr', `z1Ptr', and `z2Ptr'. 
>> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Subtracts the 192-bit value formed by concatenating `b0', `b1', and `b2' >> +from the 192-bit value formed by concatenating `a0', `a1', and `a2'. >> +Subtraction is modulo 2^192, so any borrow out (carry out) is lost. The >> +result is broken into three 64-bit pieces which are stored at the locations >> +pointed to by `z0Ptr', `z1Ptr', and `z2Ptr'. >> +------------------------------------------------------------------------------- >> +*/ >> INLINE void >> sub192( >> uint64_t a0, >> @@ -456,11 +470,13 @@ INLINE void >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Multiplies `a' by `b' to obtain a 128-bit product. The product is broken >> -| into two 64-bit pieces which are stored at the locations pointed to by >> -| `z0Ptr' and `z1Ptr'. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Multiplies `a' by `b' to obtain a 128-bit product. The product is broken >> +into two 64-bit pieces which are stored at the locations pointed to by >> +`z0Ptr' and `z1Ptr'. >> +------------------------------------------------------------------------------- >> +*/ >> >> INLINE void mul64To128( uint64_t a, uint64_t b, uint64_t *z0Ptr, uint64_t *z1Ptr ) >> { >> @@ -485,13 +501,14 @@ INLINE void mul64To128( uint64_t a, uint64_t b, uint64_t *z0Ptr, uint64_t *z1Ptr >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Multiplies the 128-bit value formed by concatenating `a0' and `a1' by >> -| `b' to obtain a 192-bit product. The product is broken into three 64-bit >> -| pieces which are stored at the locations pointed to by `z0Ptr', `z1Ptr', and >> -| `z2Ptr'. 
>> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Multiplies the 128-bit value formed by concatenating `a0' and `a1' by >> +`b' to obtain a 192-bit product. The product is broken into three 64-bit >> +pieces which are stored at the locations pointed to by `z0Ptr', `z1Ptr', and >> +`z2Ptr'. >> +------------------------------------------------------------------------------- >> +*/ >> INLINE void >> mul128By64To192( >> uint64_t a0, >> @@ -513,13 +530,14 @@ INLINE void >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Multiplies the 128-bit value formed by concatenating `a0' and `a1' to the >> -| 128-bit value formed by concatenating `b0' and `b1' to obtain a 256-bit >> -| product. The product is broken into four 64-bit pieces which are stored at >> -| the locations pointed to by `z0Ptr', `z1Ptr', `z2Ptr', and `z3Ptr'. >> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Multiplies the 128-bit value formed by concatenating `a0' and `a1' to the >> +128-bit value formed by concatenating `b0' and `b1' to obtain a 256-bit >> +product. The product is broken into four 64-bit pieces which are stored at >> +the locations pointed to by `z0Ptr', `z1Ptr', `z2Ptr', and `z3Ptr'. >> +------------------------------------------------------------------------------- >> +*/ >> INLINE void >> mul128To256( >> uint64_t a0, >> @@ -550,14 +568,16 @@ INLINE void >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns an approximation to the 64-bit integer quotient obtained by dividing >> -| `b' into the 128-bit value formed by concatenating `a0' and `a1'. The >> -| divisor `b' must be at least 2^63. 
If q is the exact quotient truncated >> -| toward zero, the approximation returned lies between q and q + 2 inclusive. >> -| If the exact quotient q is larger than 64 bits, the maximum positive 64-bit >> -| unsigned integer is returned. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Returns an approximation to the 64-bit integer quotient obtained by dividing >> +`b' into the 128-bit value formed by concatenating `a0' and `a1'. The >> +divisor `b' must be at least 2^63. If q is the exact quotient truncated >> +toward zero, the approximation returned lies between q and q + 2 inclusive. >> +If the exact quotient q is larger than 64 bits, the maximum positive 64-bit >> +unsigned integer is returned. >> +------------------------------------------------------------------------------- >> +*/ >> >> static uint64_t estimateDiv128To64( uint64_t a0, uint64_t a1, uint64_t b ) >> { >> @@ -581,15 +601,17 @@ static uint64_t estimateDiv128To64( uint64_t a0, uint64_t a1, uint64_t b ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns an approximation to the square root of the 32-bit significand given >> -| by `a'. Considered as an integer, `a' must be at least 2^31. If bit 0 of >> -| `aExp' (the least significant bit) is 1, the integer returned approximates >> -| 2^31*sqrt(`a'/2^31), where `a' is considered an integer. If bit 0 of `aExp' >> -| is 0, the integer returned approximates 2^31*sqrt(`a'/2^30). In either >> -| case, the approximation returned lies strictly within +/-2 of the exact >> -| value. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Returns an approximation to the square root of the 32-bit significand given >> +by `a'. 
Considered as an integer, `a' must be at least 2^31. If bit 0 of >> +`aExp' (the least significant bit) is 1, the integer returned approximates >> +2^31*sqrt(`a'/2^31), where `a' is considered an integer. If bit 0 of `aExp' >> +is 0, the integer returned approximates 2^31*sqrt(`a'/2^30). In either >> +case, the approximation returned lies strictly within +/-2 of the exact >> +value. >> +------------------------------------------------------------------------------- >> +*/ >> >> static uint32_t estimateSqrt32(int_fast16_t aExp, uint32_t a) >> { >> @@ -620,10 +642,12 @@ static uint32_t estimateSqrt32(int_fast16_t aExp, uint32_t a) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns the number of leading 0 bits before the most-significant 1 bit of >> -| `a'. If `a' is zero, 32 is returned. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Returns the number of leading 0 bits before the most-significant 1 bit of >> +`a'. If `a' is zero, 32 is returned. >> +------------------------------------------------------------------------------- >> +*/ >> >> static int8 countLeadingZeros32( uint32_t a ) >> { >> @@ -668,10 +692,12 @@ static int8 countLeadingZeros32( uint32_t a ) >> #endif >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns the number of leading 0 bits before the most-significant 1 bit of >> -| `a'. If `a' is zero, 64 is returned. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Returns the number of leading 0 bits before the most-significant 1 bit of >> +`a'. If `a' is zero, 64 is returned. 
>> +------------------------------------------------------------------------------- >> +*/ >> >> static int8 countLeadingZeros64( uint64_t a ) >> { >> @@ -696,11 +722,13 @@ static int8 countLeadingZeros64( uint64_t a ) >> #endif >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns 1 if the 128-bit value formed by concatenating `a0' and `a1' >> -| is equal to the 128-bit value formed by concatenating `b0' and `b1'. >> -| Otherwise, returns 0. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Returns 1 if the 128-bit value formed by concatenating `a0' and `a1' >> +is equal to the 128-bit value formed by concatenating `b0' and `b1'. >> +Otherwise, returns 0. >> +------------------------------------------------------------------------------- >> +*/ >> >> INLINE flag eq128( uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1 ) >> { >> @@ -709,11 +737,13 @@ INLINE flag eq128( uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1 ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns 1 if the 128-bit value formed by concatenating `a0' and `a1' is less >> -| than or equal to the 128-bit value formed by concatenating `b0' and `b1'. >> -| Otherwise, returns 0. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Returns 1 if the 128-bit value formed by concatenating `a0' and `a1' is less >> +than or equal to the 128-bit value formed by concatenating `b0' and `b1'. >> +Otherwise, returns 0. 
>> +------------------------------------------------------------------------------- >> +*/ >> >> INLINE flag le128( uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1 ) >> { >> @@ -722,11 +752,13 @@ INLINE flag le128( uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1 ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns 1 if the 128-bit value formed by concatenating `a0' and `a1' is less >> -| than the 128-bit value formed by concatenating `b0' and `b1'. Otherwise, >> -| returns 0. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Returns 1 if the 128-bit value formed by concatenating `a0' and `a1' is less >> +than the 128-bit value formed by concatenating `b0' and `b1'. Otherwise, >> +returns 0. >> +------------------------------------------------------------------------------- >> +*/ >> >> INLINE flag lt128( uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1 ) >> { >> @@ -735,11 +767,13 @@ INLINE flag lt128( uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1 ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns 1 if the 128-bit value formed by concatenating `a0' and `a1' is >> -| not equal to the 128-bit value formed by concatenating `b0' and `b1'. >> -| Otherwise, returns 0. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Returns 1 if the 128-bit value formed by concatenating `a0' and `a1' is >> +not equal to the 128-bit value formed by concatenating `b0' and `b1'. >> +Otherwise, returns 0. 
>> +------------------------------------------------------------------------------- >> +*/ >> >> INLINE flag ne128( uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1 ) >> { >> diff --git a/fpu/softfloat-specialize.h b/fpu/softfloat-specialize.h >> index 518f694..ba9bfeb 100644 >> --- a/fpu/softfloat-specialize.h >> +++ b/fpu/softfloat-specialize.h >> @@ -4,10 +4,11 @@ >> * Derived from SoftFloat. >> */ >> >> -/*============================================================================ >> +/* >> +=============================================================================== >> >> This C source fragment is part of the SoftFloat IEC/IEEE Floating-point >> -Arithmetic Package, Release 2b. >> +Arithmetic Package, Release 2a. >> >> Written by John R. Hauser. This work was made possible in part by the >> International Computer Science Institute, located at Suite 600, 1947 Center >> @@ -16,22 +17,19 @@ National Science Foundation under grant MIP-9311980. The original version >> of this code was written as part of a project to build a fixed-point vector >> processor in collaboration with the University of California at Berkeley, >> overseen by Profs. Nelson Morgan and John Wawrzynek. More information >> -is available through the Web page `http://www.cs.berkeley.edu/~jhauser/ >> +is available through the Web page `http://HTTP.CS.Berkeley.EDU/~jhauser/ >> arithmetic/SoftFloat.html'. >> >> -THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort has >> -been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT TIMES >> -RESULT IN INCORRECT BEHAVIOR. 
USE OF THIS SOFTWARE IS RESTRICTED TO PERSONS >> -AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ALL LOSSES, >> -COSTS, OR OTHER PROBLEMS THEY INCUR DUE TO THE SOFTWARE, AND WHO FURTHERMORE >> -EFFECTIVELY INDEMNIFY JOHN HAUSER AND THE INTERNATIONAL COMPUTER SCIENCE >> -INSTITUTE (possibly via similar legal warning) AGAINST ALL LOSSES, COSTS, OR >> -OTHER PROBLEMS INCURRED BY THEIR CUSTOMERS AND CLIENTS DUE TO THE SOFTWARE. >> +THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort >> +has been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT >> +TIMES RESULT IN INCORRECT BEHAVIOR. USE OF THIS SOFTWARE IS RESTRICTED TO >> +PERSONS AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ANY >> +AND ALL LOSSES, COSTS, OR OTHER PROBLEMS ARISING FROM ITS USE. >> >> Derivative works are acceptable, even for commercial purposes, so long as >> -(1) the source code for the derivative work includes prominent notice that >> -the work is derivative, and (2) the source code includes prominent notice with >> -these four paragraphs for those parts of this code that are retained. >> +(1) they include prominent notice that the work is derivative, and (2) they >> +include prominent notice akin to these four paragraphs for those parts of >> +this code that are retained. >> >> =============================================================================*/ >> >> @@ -48,9 +46,11 @@ these four paragraphs for those parts of this code that are retained. >> #define NO_SIGNALING_NANS 1 >> #endif >> >> -/*---------------------------------------------------------------------------- >> -| The pattern for a default generated half-precision NaN. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +The pattern for a default generated half-precision NaN. 
>> +------------------------------------------------------------------------------- >> +*/ >> #if defined(TARGET_ARM) >> const float16 float16_default_nan = const_float16(0x7E00); >> #elif SNAN_BIT_IS_ONE >> @@ -59,9 +59,11 @@ const float16 float16_default_nan = const_float16(0x7DFF); >> const float16 float16_default_nan = const_float16(0xFE00); >> #endif >> >> -/*---------------------------------------------------------------------------- >> -| The pattern for a default generated single-precision NaN. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +The pattern for a default generated single-precision NaN. >> +------------------------------------------------------------------------------- >> +*/ >> #if defined(TARGET_SPARC) >> const float32 float32_default_nan = const_float32(0x7FFFFFFF); >> #elif defined(TARGET_PPC) || defined(TARGET_ARM) || defined(TARGET_ALPHA) || \ >> @@ -73,9 +75,11 @@ const float32 float32_default_nan = const_float32(0x7FBFFFFF); >> const float32 float32_default_nan = const_float32(0xFFC00000); >> #endif >> >> -/*---------------------------------------------------------------------------- >> -| The pattern for a default generated double-precision NaN. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +The pattern for a default generated double-precision NaN. 
>> +------------------------------------------------------------------------------- >> +*/ >> #if defined(TARGET_SPARC) >> const float64 float64_default_nan = const_float64(LIT64( 0x7FFFFFFFFFFFFFFF )); >> #elif defined(TARGET_PPC) || defined(TARGET_ARM) || defined(TARGET_ALPHA) >> @@ -86,9 +90,11 @@ const float64 float64_default_nan = const_float64(LIT64( 0x7FF7FFFFFFFFFFFF )); >> const float64 float64_default_nan = const_float64(LIT64( 0xFFF8000000000000 )); >> #endif >> >> -/*---------------------------------------------------------------------------- >> -| The pattern for a default generated extended double-precision NaN. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +The pattern for a default generated extended double-precision NaN. >> +------------------------------------------------------------------------------- >> +*/ >> #if SNAN_BIT_IS_ONE >> #define floatx80_default_nan_high 0x7FFF >> #define floatx80_default_nan_low LIT64( 0xBFFFFFFFFFFFFFFF ) >> @@ -100,10 +106,12 @@ const float64 float64_default_nan = const_float64(LIT64( 0xFFF8000000000000 )); >> const floatx80 floatx80_default_nan >> = make_floatx80_init(floatx80_default_nan_high, floatx80_default_nan_low); >> >> -/*---------------------------------------------------------------------------- >> -| The pattern for a default generated quadruple-precision NaN. The `high' and >> -| `low' values hold the most- and least-significant bits, respectively. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +The pattern for a default generated quadruple-precision NaN. The `high' and >> +`low' values hold the most- and least-significant bits, respectively. 
>> +------------------------------------------------------------------------------- >> +*/ >> #if SNAN_BIT_IS_ONE >> #define float128_default_nan_high LIT64( 0x7FFF7FFFFFFFFFFF ) >> #define float128_default_nan_low LIT64( 0xFFFFFFFFFFFFFFFF ) >> @@ -115,21 +123,25 @@ const floatx80 floatx80_default_nan >> const float128 float128_default_nan >> = make_float128_init(float128_default_nan_high, float128_default_nan_low); >> >> -/*---------------------------------------------------------------------------- >> -| Raises the exceptions specified by `flags'. Floating-point traps can be >> -| defined here if desired. It is currently not possible for such a trap >> -| to substitute a result value. If traps are not implemented, this routine >> -| should be simply `float_exception_flags |= flags;'. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Raises the exceptions specified by `flags'. Floating-point traps can be >> +defined here if desired. It is currently not possible for such a trap >> +to substitute a result value. If traps are not implemented, this routine >> +should be simply `float_exception_flags |= flags;'. >> +------------------------------------------------------------------------------- >> +*/ >> >> void float_raise( int8 flags STATUS_PARAM ) >> { >> STATUS(float_exception_flags) |= flags; >> } >> >> -/*---------------------------------------------------------------------------- >> -| Internal canonical NaN format. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Internal canonical NaN format. 
>> +------------------------------------------------------------------------------- >> +*/ >> typedef struct { >> flag sign; >> uint64_t high, low; >> @@ -146,10 +158,12 @@ int float16_is_signaling_nan(float16 a_) >> return 0; >> } >> #else >> -/*---------------------------------------------------------------------------- >> -| Returns 1 if the half-precision floating-point value `a' is a quiet >> -| NaN; otherwise returns 0. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Returns 1 if the half-precision floating-point value `a' is a quiet >> +NaN; otherwise returns 0. >> +------------------------------------------------------------------------------- >> +*/ >> >> int float16_is_quiet_nan(float16 a_) >> { >> @@ -161,10 +175,12 @@ int float16_is_quiet_nan(float16 a_) >> #endif >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns 1 if the half-precision floating-point value `a' is a signaling >> -| NaN; otherwise returns 0. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Returns 1 if the half-precision floating-point value `a' is a signaling >> +NaN; otherwise returns 0. >> +------------------------------------------------------------------------------- >> +*/ >> >> int float16_is_signaling_nan(float16 a_) >> { >> @@ -177,10 +193,12 @@ int float16_is_signaling_nan(float16 a_) >> } >> #endif >> >> -/*---------------------------------------------------------------------------- >> -| Returns a quiet NaN if the half-precision floating point value `a' is a >> -| signaling NaN; otherwise returns `a'. 
>> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Returns a quiet NaN if the half-precision floating point value `a' is a >> +signaling NaN; otherwise returns `a'. >> +------------------------------------------------------------------------------- >> +*/ >> float16 float16_maybe_silence_nan(float16 a_) >> { >> if (float16_is_signaling_nan(a_)) { >> @@ -199,11 +217,13 @@ float16 float16_maybe_silence_nan(float16 a_) >> return a_; >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns the result of converting the half-precision floating-point NaN >> -| `a' to the canonical NaN format. If `a' is a signaling NaN, the invalid >> -| exception is raised. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Returns the result of converting the half-precision floating-point NaN >> +`a' to the canonical NaN format. If `a' is a signaling NaN, the invalid >> +exception is raised. >> +------------------------------------------------------------------------------- >> +*/ >> >> static commonNaNT float16ToCommonNaN( float16 a STATUS_PARAM ) >> { >> @@ -216,10 +236,12 @@ static commonNaNT float16ToCommonNaN( float16 a STATUS_PARAM ) >> return z; >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns the result of converting the canonical NaN `a' to the half- >> -| precision floating-point format. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Returns the result of converting the canonical NaN `a' to the half- >> +precision floating-point format. 
>> +------------------------------------------------------------------------------- >> +*/ >> >> static float16 commonNaNToFloat16(commonNaNT a STATUS_PARAM) >> { >> @@ -248,10 +270,12 @@ int float32_is_signaling_nan(float32 a_) >> return 0; >> } >> #else >> -/*---------------------------------------------------------------------------- >> -| Returns 1 if the single-precision floating-point value `a' is a quiet >> -| NaN; otherwise returns 0. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Returns 1 if the single-precision floating-point value `a' is a quiet >> +NaN; otherwise returns 0. >> +------------------------------------------------------------------------------- >> +*/ >> >> int float32_is_quiet_nan( float32 a_ ) >> { >> @@ -263,10 +287,12 @@ int float32_is_quiet_nan( float32 a_ ) >> #endif >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns 1 if the single-precision floating-point value `a' is a signaling >> -| NaN; otherwise returns 0. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Returns 1 if the single-precision floating-point value `a' is a signaling >> +NaN; otherwise returns 0. >> +------------------------------------------------------------------------------- >> +*/ >> >> int float32_is_signaling_nan( float32 a_ ) >> { >> @@ -279,10 +305,12 @@ int float32_is_signaling_nan( float32 a_ ) >> } >> #endif >> >> -/*---------------------------------------------------------------------------- >> -| Returns a quiet NaN if the single-precision floating point value `a' is a >> -| signaling NaN; otherwise returns `a'. 
>> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Returns a quiet NaN if the single-precision floating point value `a' is a >> +signaling NaN; otherwise returns `a'. >> +------------------------------------------------------------------------------- >> +*/ >> >> float32 float32_maybe_silence_nan( float32 a_ ) >> { >> @@ -302,12 +330,13 @@ float32 float32_maybe_silence_nan( float32 a_ ) >> return a_; >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns the result of converting the single-precision floating-point NaN >> -| `a' to the canonical NaN format. If `a' is a signaling NaN, the invalid >> -| exception is raised. >> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Returns the result of converting the single-precision floating-point NaN >> +`a' to the canonical NaN format. If `a' is a signaling NaN, the invalid >> +exception is raised. >> +------------------------------------------------------------------------------- >> +*/ >> static commonNaNT float32ToCommonNaN( float32 a STATUS_PARAM ) >> { >> commonNaNT z; >> @@ -319,10 +348,12 @@ static commonNaNT float32ToCommonNaN( float32 a STATUS_PARAM ) >> return z; >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns the result of converting the canonical NaN `a' to the single- >> -| precision floating-point format. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Returns the result of converting the canonical NaN `a' to the single- >> +precision floating-point format. 
>> +------------------------------------------------------------------------------- >> +*/ >> >> static float32 commonNaNToFloat32( commonNaNT a STATUS_PARAM) >> { >> @@ -339,22 +370,24 @@ static float32 commonNaNToFloat32( commonNaNT a STATUS_PARAM) >> return float32_default_nan; >> } >> >> -/*---------------------------------------------------------------------------- >> -| Select which NaN to propagate for a two-input operation. >> -| IEEE754 doesn't specify all the details of this, so the >> -| algorithm is target-specific. >> -| The routine is passed various bits of information about the >> -| two NaNs and should return 0 to select NaN a and 1 for NaN b. >> -| Note that signalling NaNs are always squashed to quiet NaNs >> -| by the caller, by calling floatXX_maybe_silence_nan() before >> -| returning them. >> -| >> -| aIsLargerSignificand is only valid if both a and b are NaNs >> -| of some kind, and is true if a has the larger significand, >> -| or if both a and b have the same significand but a is >> -| positive but b is negative. It is only needed for the x87 >> -| tie-break rule. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Select which NaN to propagate for a two-input operation. >> +IEEE754 doesn't specify all the details of this, so the >> +algorithm is target-specific. >> +The routine is passed various bits of information about the >> +two NaNs and should return 0 to select NaN a and 1 for NaN b. >> +Note that signalling NaNs are always squashed to quiet NaNs >> +by the caller, by calling floatXX_maybe_silence_nan() before >> +returning them. >> + >> +aIsLargerSignificand is only valid if both a and b are NaNs >> +of some kind, and is true if a has the larger significand, >> +or if both a and b have the same significand but a is >> +positive but b is negative. It is only needed for the x87 >> +tie-break rule. 
>> +------------------------------------------------------------------------------- >> +*/ >> >> #if defined(TARGET_ARM) >> static int pickNaN(flag aIsQNaN, flag aIsSNaN, flag bIsQNaN, flag bIsSNaN, >> @@ -451,12 +484,14 @@ static int pickNaN(flag aIsQNaN, flag aIsSNaN, flag bIsQNaN, flag bIsSNaN, >> } >> #endif >> >> -/*---------------------------------------------------------------------------- >> -| Select which NaN to propagate for a three-input operation. >> -| For the moment we assume that no CPU needs the 'larger significand' >> -| information. >> -| Return values : 0 : a; 1 : b; 2 : c; 3 : default-NaN >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Select which NaN to propagate for a three-input operation. >> +For the moment we assume that no CPU needs the 'larger significand' >> +information. >> +Return values : 0 : a; 1 : b; 2 : c; 3 : default-NaN >> +------------------------------------------------------------------------------- >> +*/ >> #if defined(TARGET_ARM) >> static int pickNaNMulAdd(flag aIsQNaN, flag aIsSNaN, flag bIsQNaN, flag bIsSNaN, >> flag cIsQNaN, flag cIsSNaN, flag infzero STATUS_PARAM) >> @@ -554,12 +589,13 @@ static int pickNaNMulAdd(flag aIsQNaN, flag aIsSNaN, flag bIsQNaN, flag bIsSNaN, >> } >> #endif >> >> -/*---------------------------------------------------------------------------- >> -| Takes two single-precision floating-point values `a' and `b', one of which >> -| is a NaN, and returns the appropriate NaN result. If either `a' or `b' is a >> -| signaling NaN, the invalid exception is raised. >> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Takes two single-precision floating-point values `a' and `b', one of which >> +is a NaN, and returns the appropriate NaN result. 
If either `a' or `b' is a >> +signaling NaN, the invalid exception is raised. >> +------------------------------------------------------------------------------- >> +*/ >> static float32 propagateFloat32NaN( float32 a, float32 b STATUS_PARAM) >> { >> flag aIsQuietNaN, aIsSignalingNaN, bIsQuietNaN, bIsSignalingNaN; >> @@ -594,14 +630,16 @@ static float32 propagateFloat32NaN( float32 a, float32 b STATUS_PARAM) >> } >> } >> >> -/*---------------------------------------------------------------------------- >> -| Takes three single-precision floating-point values `a', `b' and `c', one of >> -| which is a NaN, and returns the appropriate NaN result. If any of `a', >> -| `b' or `c' is a signaling NaN, the invalid exception is raised. >> -| The input infzero indicates whether a*b was 0*inf or inf*0 (in which case >> -| obviously c is a NaN, and whether to propagate c or some other NaN is >> -| implementation defined). >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Takes three single-precision floating-point values `a', `b' and `c', one of >> +which is a NaN, and returns the appropriate NaN result. If any of `a', >> +`b' or `c' is a signaling NaN, the invalid exception is raised. >> +The input infzero indicates whether a*b was 0*inf or inf*0 (in which case >> +obviously c is a NaN, and whether to propagate c or some other NaN is >> +implementation defined). >> +------------------------------------------------------------------------------- >> +*/ >> >> static float32 propagateFloat32MulAddNaN(float32 a, float32 b, >> float32 c, flag infzero STATUS_PARAM) >> @@ -656,10 +694,12 @@ int float64_is_signaling_nan(float64 a_) >> return 0; >> } >> #else >> -/*---------------------------------------------------------------------------- >> -| Returns 1 if the double-precision floating-point value `a' is a quiet >> -| NaN; otherwise returns 0. 
>> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Returns 1 if the double-precision floating-point value `a' is a quiet >> +NaN; otherwise returns 0. >> +------------------------------------------------------------------------------- >> +*/ >> >> int float64_is_quiet_nan( float64 a_ ) >> { >> @@ -673,10 +713,12 @@ int float64_is_quiet_nan( float64 a_ ) >> #endif >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns 1 if the double-precision floating-point value `a' is a signaling >> -| NaN; otherwise returns 0. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Returns 1 if the double-precision floating-point value `a' is a signaling >> +NaN; otherwise returns 0. >> +------------------------------------------------------------------------------- >> +*/ >> >> int float64_is_signaling_nan( float64 a_ ) >> { >> @@ -691,10 +733,12 @@ int float64_is_signaling_nan( float64 a_ ) >> } >> #endif >> >> -/*---------------------------------------------------------------------------- >> -| Returns a quiet NaN if the double-precision floating point value `a' is a >> -| signaling NaN; otherwise returns `a'. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Returns a quiet NaN if the double-precision floating point value `a' is a >> +signaling NaN; otherwise returns `a'. 
>> +------------------------------------------------------------------------------- >> +*/ >> >> float64 float64_maybe_silence_nan( float64 a_ ) >> { >> @@ -714,12 +758,13 @@ float64 float64_maybe_silence_nan( float64 a_ ) >> return a_; >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns the result of converting the double-precision floating-point NaN >> -| `a' to the canonical NaN format. If `a' is a signaling NaN, the invalid >> -| exception is raised. >> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Returns the result of converting the double-precision floating-point NaN >> +`a' to the canonical NaN format. If `a' is a signaling NaN, the invalid >> +exception is raised. >> +------------------------------------------------------------------------------- >> +*/ >> static commonNaNT float64ToCommonNaN( float64 a STATUS_PARAM) >> { >> commonNaNT z; >> @@ -731,10 +776,12 @@ static commonNaNT float64ToCommonNaN( float64 a STATUS_PARAM) >> return z; >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns the result of converting the canonical NaN `a' to the double- >> -| precision floating-point format. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Returns the result of converting the canonical NaN `a' to the double- >> +precision floating-point format. 
>> +------------------------------------------------------------------------------- >> +*/ >> >> static float64 commonNaNToFloat64( commonNaNT a STATUS_PARAM) >> { >> @@ -753,12 +800,13 @@ static float64 commonNaNToFloat64( commonNaNT a STATUS_PARAM) >> return float64_default_nan; >> } >> >> -/*---------------------------------------------------------------------------- >> -| Takes two double-precision floating-point values `a' and `b', one of which >> -| is a NaN, and returns the appropriate NaN result. If either `a' or `b' is a >> -| signaling NaN, the invalid exception is raised. >> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Takes two double-precision floating-point values `a' and `b', one of which >> +is a NaN, and returns the appropriate NaN result. If either `a' or `b' is a >> +signaling NaN, the invalid exception is raised. >> +------------------------------------------------------------------------------- >> +*/ >> static float64 propagateFloat64NaN( float64 a, float64 b STATUS_PARAM) >> { >> flag aIsQuietNaN, aIsSignalingNaN, bIsQuietNaN, bIsSignalingNaN; >> @@ -793,14 +841,16 @@ static float64 propagateFloat64NaN( float64 a, float64 b STATUS_PARAM) >> } >> } >> >> -/*---------------------------------------------------------------------------- >> -| Takes three double-precision floating-point values `a', `b' and `c', one of >> -| which is a NaN, and returns the appropriate NaN result. If any of `a', >> -| `b' or `c' is a signaling NaN, the invalid exception is raised. >> -| The input infzero indicates whether a*b was 0*inf or inf*0 (in which case >> -| obviously c is a NaN, and whether to propagate c or some other NaN is >> -| implementation defined). 
>> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Takes three double-precision floating-point values `a', `b' and `c', one of >> +which is a NaN, and returns the appropriate NaN result. If any of `a', >> +`b' or `c' is a signaling NaN, the invalid exception is raised. >> +The input infzero indicates whether a*b was 0*inf or inf*0 (in which case >> +obviously c is a NaN, and whether to propagate c or some other NaN is >> +implementation defined). >> +------------------------------------------------------------------------------- >> +*/ >> >> static float64 propagateFloat64MulAddNaN(float64 a, float64 b, >> float64 c, flag infzero STATUS_PARAM) >> @@ -855,11 +905,13 @@ int floatx80_is_signaling_nan(floatx80 a_) >> return 0; >> } >> #else >> -/*---------------------------------------------------------------------------- >> -| Returns 1 if the extended double-precision floating-point value `a' is a >> -| quiet NaN; otherwise returns 0. This slightly differs from the same >> -| function for other types as floatx80 has an explicit bit. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Returns 1 if the extended double-precision floating-point value `a' is a >> +quiet NaN; otherwise returns 0. This slightly differs from the same >> +function for other types as floatx80 has an explicit bit. >> +------------------------------------------------------------------------------- >> +*/ >> >> int floatx80_is_quiet_nan( floatx80 a ) >> { >> @@ -877,11 +929,13 @@ int floatx80_is_quiet_nan( floatx80 a ) >> #endif >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns 1 if the extended double-precision floating-point value `a' is a >> -| signaling NaN; otherwise returns 0. 
This slightly differs from the same >> -| function for other types as floatx80 has an explicit bit. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Returns 1 if the extended double-precision floating-point value `a' is a >> +signaling NaN; otherwise returns 0. This slightly differs from the same >> +function for other types as floatx80 has an explicit bit. >> +------------------------------------------------------------------------------- >> +*/ >> >> int floatx80_is_signaling_nan( floatx80 a ) >> { >> @@ -900,10 +954,12 @@ int floatx80_is_signaling_nan( floatx80 a ) >> } >> #endif >> >> -/*---------------------------------------------------------------------------- >> -| Returns a quiet NaN if the extended double-precision floating point value >> -| `a' is a signaling NaN; otherwise returns `a'. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Returns a quiet NaN if the extended double-precision floating point value >> +`a' is a signaling NaN; otherwise returns `a'. >> +------------------------------------------------------------------------------- >> +*/ >> >> floatx80 floatx80_maybe_silence_nan( floatx80 a ) >> { >> @@ -923,12 +979,13 @@ floatx80 floatx80_maybe_silence_nan( floatx80 a ) >> return a; >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns the result of converting the extended double-precision floating- >> -| point NaN `a' to the canonical NaN format. If `a' is a signaling NaN, the >> -| invalid exception is raised. 
>> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Returns the result of converting the extended double-precision floating- >> +point NaN `a' to the canonical NaN format. If `a' is a signaling NaN, the >> +invalid exception is raised. >> +------------------------------------------------------------------------------- >> +*/ >> static commonNaNT floatx80ToCommonNaN( floatx80 a STATUS_PARAM) >> { >> commonNaNT z; >> @@ -946,10 +1003,12 @@ static commonNaNT floatx80ToCommonNaN( floatx80 a STATUS_PARAM) >> return z; >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns the result of converting the canonical NaN `a' to the extended >> -| double-precision floating-point format. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Returns the result of converting the canonical NaN `a' to the extended >> +double-precision floating-point format. >> +------------------------------------------------------------------------------- >> +*/ >> >> static floatx80 commonNaNToFloatx80( commonNaNT a STATUS_PARAM) >> { >> @@ -972,12 +1031,13 @@ static floatx80 commonNaNToFloatx80( commonNaNT a STATUS_PARAM) >> return z; >> } >> >> -/*---------------------------------------------------------------------------- >> -| Takes two extended double-precision floating-point values `a' and `b', one >> -| of which is a NaN, and returns the appropriate NaN result. If either `a' or >> -| `b' is a signaling NaN, the invalid exception is raised. 
>> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Takes two extended double-precision floating-point values `a' and `b', one >> +of which is a NaN, and returns the appropriate NaN result. If either `a' or >> +`b' is a signaling NaN, the invalid exception is raised. >> +------------------------------------------------------------------------------- >> +*/ >> static floatx80 propagateFloatx80NaN( floatx80 a, floatx80 b STATUS_PARAM) >> { >> flag aIsQuietNaN, aIsSignalingNaN, bIsQuietNaN, bIsSignalingNaN; >> @@ -1023,10 +1083,12 @@ int float128_is_signaling_nan(float128 a_) >> return 0; >> } >> #else >> -/*---------------------------------------------------------------------------- >> -| Returns 1 if the quadruple-precision floating-point value `a' is a quiet >> -| NaN; otherwise returns 0. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Returns 1 if the quadruple-precision floating-point value `a' is a quiet >> +NaN; otherwise returns 0. >> +------------------------------------------------------------------------------- >> +*/ >> >> int float128_is_quiet_nan( float128 a ) >> { >> @@ -1041,10 +1103,12 @@ int float128_is_quiet_nan( float128 a ) >> #endif >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns 1 if the quadruple-precision floating-point value `a' is a >> -| signaling NaN; otherwise returns 0. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Returns 1 if the quadruple-precision floating-point value `a' is a >> +signaling NaN; otherwise returns 0. 
>> +------------------------------------------------------------------------------- >> +*/ >> >> int float128_is_signaling_nan( float128 a ) >> { >> @@ -1060,10 +1124,12 @@ int float128_is_signaling_nan( float128 a ) >> } >> #endif >> >> -/*---------------------------------------------------------------------------- >> -| Returns a quiet NaN if the quadruple-precision floating point value `a' is >> -| a signaling NaN; otherwise returns `a'. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Returns a quiet NaN if the quadruple-precision floating point value `a' is >> +a signaling NaN; otherwise returns `a'. >> +------------------------------------------------------------------------------- >> +*/ >> >> float128 float128_maybe_silence_nan( float128 a ) >> { >> @@ -1083,12 +1149,13 @@ float128 float128_maybe_silence_nan( float128 a ) >> return a; >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns the result of converting the quadruple-precision floating-point NaN >> -| `a' to the canonical NaN format. If `a' is a signaling NaN, the invalid >> -| exception is raised. >> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Returns the result of converting the quadruple-precision floating-point NaN >> +`a' to the canonical NaN format. If `a' is a signaling NaN, the invalid >> +exception is raised. 
>> +------------------------------------------------------------------------------- >> +*/ >> static commonNaNT float128ToCommonNaN( float128 a STATUS_PARAM) >> { >> commonNaNT z; >> @@ -1099,10 +1166,12 @@ static commonNaNT float128ToCommonNaN( float128 a STATUS_PARAM) >> return z; >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns the result of converting the canonical NaN `a' to the quadruple- >> -| precision floating-point format. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Returns the result of converting the canonical NaN `a' to the quadruple- >> +precision floating-point format. >> +------------------------------------------------------------------------------- >> +*/ >> >> static float128 commonNaNToFloat128( commonNaNT a STATUS_PARAM) >> { >> @@ -1119,12 +1188,13 @@ static float128 commonNaNToFloat128( commonNaNT a STATUS_PARAM) >> return z; >> } >> >> -/*---------------------------------------------------------------------------- >> -| Takes two quadruple-precision floating-point values `a' and `b', one of >> -| which is a NaN, and returns the appropriate NaN result. If either `a' or >> -| `b' is a signaling NaN, the invalid exception is raised. >> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Takes two quadruple-precision floating-point values `a' and `b', one of >> +which is a NaN, and returns the appropriate NaN result. If either `a' or >> +`b' is a signaling NaN, the invalid exception is raised. 
>> +-------------------------------------------------------------------------------
>> +*/
>> static float128 propagateFloat128NaN( float128 a, float128 b STATUS_PARAM)
>> {
>> flag aIsQuietNaN, aIsSignalingNaN, bIsQuietNaN, bIsSignalingNaN;
>> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
>> index 7ba51b6..9145582 100644
>> --- a/fpu/softfloat.c
>> +++ b/fpu/softfloat.c
>> @@ -4,10 +4,11 @@
>> * Derived from SoftFloat.
>> */
>>
>> -/*============================================================================
>> +/*
>> +===============================================================================
>>
>> -This C source file is part of the SoftFloat IEC/IEEE Floating-point Arithmetic
>> -Package, Release 2b.
>> +This C source file is part of the SoftFloat IEC/IEEE Floating-point
>> +Arithmetic Package, Release 2a.
>>
>> Written by John R. Hauser. This work was made possible in part by the
>> International Computer Science Institute, located at Suite 600, 1947 Center
>> @@ -16,24 +17,22 @@ National Science Foundation under grant MIP-9311980. The original version
>> of this code was written as part of a project to build a fixed-point vector
>> processor in collaboration with the University of California at Berkeley,
>> overseen by Profs. Nelson Morgan and John Wawrzynek. More information
>> -is available through the Web page `http://www.cs.berkeley.edu/~jhauser/
>> +is available through the Web page `http://HTTP.CS.Berkeley.EDU/~jhauser/
>> arithmetic/SoftFloat.html'.
>>
>> -THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort has
>> -been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT TIMES
>> -RESULT IN INCORRECT BEHAVIOR. USE OF THIS SOFTWARE IS RESTRICTED TO PERSONS
>> -AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ALL LOSSES,
>> -COSTS, OR OTHER PROBLEMS THEY INCUR DUE TO THE SOFTWARE, AND WHO FURTHERMORE
>> -EFFECTIVELY INDEMNIFY JOHN HAUSER AND THE INTERNATIONAL COMPUTER SCIENCE
>> -INSTITUTE (possibly via similar legal warning) AGAINST ALL LOSSES, COSTS, OR
>> -OTHER PROBLEMS INCURRED BY THEIR CUSTOMERS AND CLIENTS DUE TO THE SOFTWARE.
>> +THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort
>> +has been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT
>> +TIMES RESULT IN INCORRECT BEHAVIOR. USE OF THIS SOFTWARE IS RESTRICTED TO
>> +PERSONS AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ANY
>> +AND ALL LOSSES, COSTS, OR OTHER PROBLEMS ARISING FROM ITS USE.
>>
>> Derivative works are acceptable, even for commercial purposes, so long as
>> -(1) the source code for the derivative work includes prominent notice that
>> -the work is derivative, and (2) the source code includes prominent notice with
>> -these four paragraphs for those parts of this code that are retained.
>> +(1) they include prominent notice that the work is derivative, and (2) they
>> +include prominent notice akin to these four paragraphs for those parts of
>> +this code that are retained.
>>
>> -=============================================================================*/
>> +===============================================================================
>> +*/
>>
>> /* softfloat (and in particular the code in softfloat-specialize.h) is
>> * target-dependent and needs the TARGET_* macros.
>> @@ -42,21 +41,25 @@ these four paragraphs for those parts of this code that are retained.
>>
>> #include "fpu/softfloat.h"
>>
>> -/*----------------------------------------------------------------------------
>> -| Primitive arithmetic functions, including multi-word arithmetic, and
>> -| division and square root approximations.
(Can be specialized to target if >> -| desired.) >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Primitive arithmetic functions, including multi-word arithmetic, and >> +division and square root approximations. (Can be specialized to target if >> +desired.) >> +------------------------------------------------------------------------------- >> +*/ >> #include "softfloat-macros.h" >> >> -/*---------------------------------------------------------------------------- >> -| Functions and definitions to determine: (1) whether tininess for underflow >> -| is detected before or after rounding by default, (2) what (if anything) >> -| happens when exceptions are raised, (3) how signaling NaNs are distinguished >> -| from quiet NaNs, (4) the default generated quiet NaNs, and (5) how NaNs >> -| are propagated from function inputs to output. These details are target- >> -| specific. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Functions and definitions to determine: (1) whether tininess for underflow >> +is detected before or after rounding by default, (2) what (if anything) >> +happens when exceptions are raised, (3) how signaling NaNs are distinguished >> +from quiet NaNs, (4) the default generated quiet NaNs, and (5) how NaNs >> +are propagated from function inputs to output. These details are target- >> +specific. 
>> +-------------------------------------------------------------------------------
>> +*/
>> #include "softfloat-specialize.h"
>>
>> void set_float_rounding_mode(int val STATUS_PARAM)
>> @@ -74,43 +77,51 @@ void set_floatx80_rounding_precision(int val STATUS_PARAM)
>> STATUS(floatx80_rounding_precision) = val;
>> }
>>
>> -/*----------------------------------------------------------------------------
>> -| Returns the fraction bits of the half-precision floating-point value `a'.
>> -*----------------------------------------------------------------------------*/
>> +/*
>> +-------------------------------------------------------------------------------
>> +Returns the fraction bits of the half-precision floating-point value `a'.
>> +-------------------------------------------------------------------------------
>> +*/
>>
>> INLINE uint32_t extractFloat16Frac(float16 a)
>> {
>> return float16_val(a) & 0x3ff;
>> }
>>
>> -/*----------------------------------------------------------------------------
>> -| Returns the exponent bits of the half-precision floating-point value `a'.
>> -*----------------------------------------------------------------------------*/
>> +/*
>> +-------------------------------------------------------------------------------
>> +Returns the exponent bits of the half-precision floating-point value `a'.
>> +-------------------------------------------------------------------------------
>> +*/
>>
>> INLINE int_fast16_t extractFloat16Exp(float16 a)
>> {
>> return (float16_val(a) >> 10) & 0x1f;
>> }
>>
>> -/*----------------------------------------------------------------------------
>> -| Returns the sign bit of the single-precision floating-point value `a'.
>> -*----------------------------------------------------------------------------*/
>> +/*
>> +-------------------------------------------------------------------------------
>> +Returns the sign bit of the single-precision floating-point value `a'.
>> +-------------------------------------------------------------------------------
>> +*/
>>
>> INLINE flag extractFloat16Sign(float16 a)
>> {
>> return float16_val(a)>>15;
>> }
>>
>> -/*----------------------------------------------------------------------------
>> -| Takes a 64-bit fixed-point value `absZ' with binary point between bits 6
>> -| and 7, and returns the properly rounded 32-bit integer corresponding to the
>> -| input. If `zSign' is 1, the input is negated before being converted to an
>> -| integer. Bit 63 of `absZ' must be zero. Ordinarily, the fixed-point input
>> -| is simply rounded to an integer, with the inexact exception raised if the
>> -| input cannot be represented exactly as an integer. However, if the fixed-
>> -| point input is too large, the invalid exception is raised and the largest
>> -| positive or negative integer is returned.
>> -*----------------------------------------------------------------------------*/
>> +/*
>> +-------------------------------------------------------------------------------
>> +Takes a 64-bit fixed-point value `absZ' with binary point between bits 6
>> +and 7, and returns the properly rounded 32-bit integer corresponding to the
>> +input. If `zSign' is 1, the input is negated before being converted to an
>> +integer. Bit 63 of `absZ' must be zero. Ordinarily, the fixed-point input
>> +is simply rounded to an integer, with the inexact exception raised if the
>> +input cannot be represented exactly as an integer. However, if the fixed-
>> +point input is too large, the invalid exception is raised and the largest
>> +positive or negative integer is returned.
>> +-------------------------------------------------------------------------------
>> +*/
>>
>> static int32 roundAndPackInt32( flag zSign, uint64_t absZ STATUS_PARAM)
>> {
>> @@ -150,17 +161,19 @@ static int32 roundAndPackInt32( flag zSign, uint64_t absZ STATUS_PARAM)
>>
>> }
>>
>> -/*----------------------------------------------------------------------------
>> -| Takes the 128-bit fixed-point value formed by concatenating `absZ0' and
>> -| `absZ1', with binary point between bits 63 and 64 (between the input words),
>> -| and returns the properly rounded 64-bit integer corresponding to the input.
>> -| If `zSign' is 1, the input is negated before being converted to an integer.
>> -| Ordinarily, the fixed-point input is simply rounded to an integer, with
>> -| the inexact exception raised if the input cannot be represented exactly as
>> -| an integer. However, if the fixed-point input is too large, the invalid
>> -| exception is raised and the largest positive or negative integer is
>> -| returned.
>> -*----------------------------------------------------------------------------*/
>> +/*
>> +-------------------------------------------------------------------------------
>> +Takes the 128-bit fixed-point value formed by concatenating `absZ0' and
>> +`absZ1', with binary point between bits 63 and 64 (between the input words),
>> +and returns the properly rounded 64-bit integer corresponding to the input.
>> +If `zSign' is 1, the input is negated before being converted to an integer.
>> +Ordinarily, the fixed-point input is simply rounded to an integer, with
>> +the inexact exception raised if the input cannot be represented exactly as
>> +an integer. However, if the fixed-point input is too large, the invalid
>> +exception is raised and the largest positive or negative integer is
>> +returned.
>> +-------------------------------------------------------------------------------
>> +*/
>>
>> static int64 roundAndPackInt64( flag zSign, uint64_t absZ0, uint64_t absZ1 STATUS_PARAM)
>> {
>> @@ -203,9 +216,11 @@ static int64 roundAndPackInt64( flag zSign, uint64_t absZ0, uint64_t absZ1 STATU
>>
>> }
>>
>> -/*----------------------------------------------------------------------------
>> -| Returns the fraction bits of the single-precision floating-point value `a'.
>> -*----------------------------------------------------------------------------*/
>> +/*
>> +-------------------------------------------------------------------------------
>> +Returns the fraction bits of the single-precision floating-point value `a'.
>> +-------------------------------------------------------------------------------
>> +*/
>>
>> INLINE uint32_t extractFloat32Frac( float32 a )
>> {
>> @@ -214,9 +229,11 @@ INLINE uint32_t extractFloat32Frac( float32 a )
>>
>> }
>>
>> -/*----------------------------------------------------------------------------
>> -| Returns the exponent bits of the single-precision floating-point value `a'.
>> -*----------------------------------------------------------------------------*/
>> +/*
>> +-------------------------------------------------------------------------------
>> +Returns the exponent bits of the single-precision floating-point value `a'.
>> +-------------------------------------------------------------------------------
>> +*/
>>
>> INLINE int_fast16_t extractFloat32Exp(float32 a)
>> {
>> @@ -225,10 +242,11 @@ INLINE int_fast16_t extractFloat32Exp(float32 a)
>>
>> }
>>
>> -/*----------------------------------------------------------------------------
>> -| Returns the sign bit of the single-precision floating-point value `a'.
>> -*----------------------------------------------------------------------------*/
>> -
>> +/*
>> +-------------------------------------------------------------------------------
>> +Returns the sign bit of the single-precision floating-point value `a'.
>> +-------------------------------------------------------------------------------
>> +*/
>> INLINE flag extractFloat32Sign( float32 a )
>> {
>>
>> @@ -236,10 +254,12 @@ INLINE flag extractFloat32Sign( float32 a )
>>
>> }
>>
>> -/*----------------------------------------------------------------------------
>> -| If `a' is denormal and we are in flush-to-zero mode then set the
>> -| input-denormal exception and return zero. Otherwise just return the value.
>> -*----------------------------------------------------------------------------*/
>> +/*
>> +-------------------------------------------------------------------------------
>> +If `a' is denormal and we are in flush-to-zero mode then set the
>> +input-denormal exception and return zero. Otherwise just return the value.
>> +-------------------------------------------------------------------------------
>> +*/
>> static float32 float32_squash_input_denormal(float32 a STATUS_PARAM)
>> {
>> if (STATUS(flush_inputs_to_zero)) {
>> @@ -251,13 +271,14 @@ static float32 float32_squash_input_denormal(float32 a STATUS_PARAM)
>> return a;
>> }
>>
>> -/*----------------------------------------------------------------------------
>> -| Normalizes the subnormal single-precision floating-point value represented
>> -| by the denormalized significand `aSig'. The normalized exponent and
>> -| significand are stored at the locations pointed to by `zExpPtr' and
>> -| `zSigPtr', respectively.
>> -*----------------------------------------------------------------------------*/
>> -
>> +/*
>> +-------------------------------------------------------------------------------
>> +Normalizes the subnormal single-precision floating-point value represented
>> +by the denormalized significand `aSig'. The normalized exponent and
>> +significand are stored at the locations pointed to by `zExpPtr' and
>> +`zSigPtr', respectively.
>> +-------------------------------------------------------------------------------
>> +*/
>> static void
>> normalizeFloat32Subnormal(uint32_t aSig, int_fast16_t *zExpPtr, uint32_t *zSigPtr)
>> {
>> @@ -269,16 +290,18 @@ static void
>>
>> }
>>
>> -/*----------------------------------------------------------------------------
>> -| Packs the sign `zSign', exponent `zExp', and significand `zSig' into a
>> -| single-precision floating-point value, returning the result. After being
>> -| shifted into the proper positions, the three fields are simply added
>> -| together to form the result. This means that any integer portion of `zSig'
>> -| will be added into the exponent. Since a properly normalized significand
>> -| will have an integer portion equal to 1, the `zExp' input should be 1 less
>> -| than the desired result exponent whenever `zSig' is a complete, normalized
>> -| significand.
>> -*----------------------------------------------------------------------------*/
>> +/*
>> +-------------------------------------------------------------------------------
>> +Packs the sign `zSign', exponent `zExp', and significand `zSig' into a
>> +single-precision floating-point value, returning the result. After being
>> +shifted into the proper positions, the three fields are simply added
>> +together to form the result. This means that any integer portion of `zSig'
>> +will be added into the exponent. Since a properly normalized significand
>> +will have an integer portion equal to 1, the `zExp' input should be 1 less
>> +than the desired result exponent whenever `zSig' is a complete, normalized
>> +significand.
>> +-------------------------------------------------------------------------------
>> +*/
>>
>> INLINE float32 packFloat32(flag zSign, int_fast16_t zExp, uint32_t zSig)
>> {
>> @@ -288,27 +311,29 @@ INLINE float32 packFloat32(flag zSign, int_fast16_t zExp, uint32_t zSig)
>>
>> }
>>
>> -/*----------------------------------------------------------------------------
>> -| Takes an abstract floating-point value having sign `zSign', exponent `zExp',
>> -| and significand `zSig', and returns the proper single-precision floating-
>> -| point value corresponding to the abstract input. Ordinarily, the abstract
>> -| value is simply rounded and packed into the single-precision format, with
>> -| the inexact exception raised if the abstract input cannot be represented
>> -| exactly. However, if the abstract value is too large, the overflow and
>> -| inexact exceptions are raised and an infinity or maximal finite value is
>> -| returned. If the abstract value is too small, the input value is rounded to
>> -| a subnormal number, and the underflow and inexact exceptions are raised if
>> -| the abstract input cannot be represented exactly as a subnormal single-
>> -| precision floating-point number.
>> -| The input significand `zSig' has its binary point between bits 30
>> -| and 29, which is 7 bits to the left of the usual location. This shifted
>> -| significand must be normalized or smaller. If `zSig' is not normalized,
>> -| `zExp' must be 0; in that case, the result returned is a subnormal number,
>> -| and it must not require rounding. In the usual case that `zSig' is
>> -| normalized, `zExp' must be 1 less than the ``true'' floating-point exponent.
>> -| The handling of underflow and overflow follows the IEC/IEEE Standard for
>> -| Binary Floating-Point Arithmetic.
>> -*----------------------------------------------------------------------------*/
>> +/*
>> +-------------------------------------------------------------------------------
>> +Takes an abstract floating-point value having sign `zSign', exponent `zExp',
>> +and significand `zSig', and returns the proper single-precision floating-
>> +point value corresponding to the abstract input. Ordinarily, the abstract
>> +value is simply rounded and packed into the single-precision format, with
>> +the inexact exception raised if the abstract input cannot be represented
>> +exactly. However, if the abstract value is too large, the overflow and
>> +inexact exceptions are raised and an infinity or maximal finite value is
>> +returned. If the abstract value is too small, the input value is rounded to
>> +a subnormal number, and the underflow and inexact exceptions are raised if
>> +the abstract input cannot be represented exactly as a subnormal single-
>> +precision floating-point number.
>> + The input significand `zSig' has its binary point between bits 30
>> +and 29, which is 7 bits to the left of the usual location. This shifted
>> +significand must be normalized or smaller. If `zSig' is not normalized,
>> +`zExp' must be 0; in that case, the result returned is a subnormal number,
>> +and it must not require rounding. In the usual case that `zSig' is
>> +normalized, `zExp' must be 1 less than the ``true'' floating-point exponent.
>> +The handling of underflow and overflow follows the IEC/IEEE Standard for
>> +Binary Floating-Point Arithmetic.
>> +-------------------------------------------------------------------------------
>> +*/
>>
>> static float32 roundAndPackFloat32(flag zSign, int_fast16_t zExp, uint32_t zSig STATUS_PARAM)
>> {
>> @@ -366,15 +391,16 @@ static float32 roundAndPackFloat32(flag zSign, int_fast16_t zExp, uint32_t zSig
>>
>> }
>>
>> -/*----------------------------------------------------------------------------
>> -| Takes an abstract floating-point value having sign `zSign', exponent `zExp',
>> -| and significand `zSig', and returns the proper single-precision floating-
>> -| point value corresponding to the abstract input. This routine is just like
>> -| `roundAndPackFloat32' except that `zSig' does not have to be normalized.
>> -| Bit 31 of `zSig' must be zero, and `zExp' must be 1 less than the ``true''
>> -| floating-point exponent.
>> -*----------------------------------------------------------------------------*/
>> -
>> +/*
>> +-------------------------------------------------------------------------------
>> +Takes an abstract floating-point value having sign `zSign', exponent `zExp',
>> +and significand `zSig', and returns the proper single-precision floating-
>> +point value corresponding to the abstract input. This routine is just like
>> +`roundAndPackFloat32' except that `zSig' does not have to be normalized.
>> +Bit 31 of `zSig' must be zero, and `zExp' must be 1 less than the ``true''
>> +floating-point exponent.
>> +-------------------------------------------------------------------------------
>> +*/
>> static float32
>> normalizeRoundAndPackFloat32(flag zSign, int_fast16_t zExp, uint32_t zSig STATUS_PARAM)
>> {
>> @@ -385,9 +411,11 @@ static float32
>>
>> }
>>
>> -/*----------------------------------------------------------------------------
>> -| Returns the fraction bits of the double-precision floating-point value `a'.
>> -*----------------------------------------------------------------------------*/
>> +/*
>> +-------------------------------------------------------------------------------
>> +Returns the fraction bits of the double-precision floating-point value `a'.
>> +-------------------------------------------------------------------------------
>> +*/
>>
>> INLINE uint64_t extractFloat64Frac( float64 a )
>> {
>> @@ -396,9 +424,11 @@ INLINE uint64_t extractFloat64Frac( float64 a )
>>
>> }
>>
>> -/*----------------------------------------------------------------------------
>> -| Returns the exponent bits of the double-precision floating-point value `a'.
>> -*----------------------------------------------------------------------------*/
>> +/*
>> +-------------------------------------------------------------------------------
>> +Returns the exponent bits of the double-precision floating-point value `a'.
>> +-------------------------------------------------------------------------------
>> +*/
>>
>> INLINE int_fast16_t extractFloat64Exp(float64 a)
>> {
>> @@ -407,10 +437,11 @@ INLINE int_fast16_t extractFloat64Exp(float64 a)
>>
>> }
>>
>> -/*----------------------------------------------------------------------------
>> -| Returns the sign bit of the double-precision floating-point value `a'.
>> -*----------------------------------------------------------------------------*/
>> -
>> +/*
>> +-------------------------------------------------------------------------------
>> +Returns the sign bit of the double-precision floating-point value `a'.
>> +-------------------------------------------------------------------------------
>> +*/
>> INLINE flag extractFloat64Sign( float64 a )
>> {
>>
>> @@ -418,10 +449,12 @@ INLINE flag extractFloat64Sign( float64 a )
>>
>> }
>>
>> -/*----------------------------------------------------------------------------
>> -| If `a' is denormal and we are in flush-to-zero mode then set the
>> -| input-denormal exception and return zero. Otherwise just return the value.
>> -*----------------------------------------------------------------------------*/
>> +/*
>> +-------------------------------------------------------------------------------
>> +If `a' is denormal and we are in flush-to-zero mode then set the
>> +input-denormal exception and return zero. Otherwise just return the value.
>> +-------------------------------------------------------------------------------
>> +*/
>> static float64 float64_squash_input_denormal(float64 a STATUS_PARAM)
>> {
>> if (STATUS(flush_inputs_to_zero)) {
>> @@ -433,13 +466,14 @@ static float64 float64_squash_input_denormal(float64 a STATUS_PARAM)
>> return a;
>> }
>>
>> -/*----------------------------------------------------------------------------
>> -| Normalizes the subnormal double-precision floating-point value represented
>> -| by the denormalized significand `aSig'. The normalized exponent and
>> -| significand are stored at the locations pointed to by `zExpPtr' and
>> -| `zSigPtr', respectively.
>> -*----------------------------------------------------------------------------*/
>> -
>> +/*
>> +-------------------------------------------------------------------------------
>> +Normalizes the subnormal double-precision floating-point value represented
>> +by the denormalized significand `aSig'. The normalized exponent and
>> +significand are stored at the locations pointed to by `zExpPtr' and
>> +`zSigPtr', respectively.
>> +-------------------------------------------------------------------------------
>> +*/
>> static void
>> normalizeFloat64Subnormal(uint64_t aSig, int_fast16_t *zExpPtr, uint64_t *zSigPtr)
>> {
>> @@ -451,16 +485,18 @@ static void
>>
>> }
>>
>> -/*----------------------------------------------------------------------------
>> -| Packs the sign `zSign', exponent `zExp', and significand `zSig' into a
>> -| double-precision floating-point value, returning the result. After being
>> -| shifted into the proper positions, the three fields are simply added
>> -| together to form the result. This means that any integer portion of `zSig'
>> -| will be added into the exponent. Since a properly normalized significand
>> -| will have an integer portion equal to 1, the `zExp' input should be 1 less
>> -| than the desired result exponent whenever `zSig' is a complete, normalized
>> -| significand.
>> -*----------------------------------------------------------------------------*/
>> +/*
>> +-------------------------------------------------------------------------------
>> +Packs the sign `zSign', exponent `zExp', and significand `zSig' into a
>> +double-precision floating-point value, returning the result. After being
>> +shifted into the proper positions, the three fields are simply added
>> +together to form the result. This means that any integer portion of `zSig'
>> +will be added into the exponent. Since a properly normalized significand
>> +will have an integer portion equal to 1, the `zExp' input should be 1 less
>> +than the desired result exponent whenever `zSig' is a complete, normalized
>> +significand.
>> +-------------------------------------------------------------------------------
>> +*/
>>
>> INLINE float64 packFloat64(flag zSign, int_fast16_t zExp, uint64_t zSig)
>> {
>> @@ -470,27 +506,29 @@ INLINE float64 packFloat64(flag zSign, int_fast16_t zExp, uint64_t zSig)
>>
>> }
>>
>> -/*----------------------------------------------------------------------------
>> -| Takes an abstract floating-point value having sign `zSign', exponent `zExp',
>> -| and significand `zSig', and returns the proper double-precision floating-
>> -| point value corresponding to the abstract input. Ordinarily, the abstract
>> -| value is simply rounded and packed into the double-precision format, with
>> -| the inexact exception raised if the abstract input cannot be represented
>> -| exactly. However, if the abstract value is too large, the overflow and
>> -| inexact exceptions are raised and an infinity or maximal finite value is
>> -| returned. If the abstract value is too small, the input value is rounded
>> -| to a subnormal number, and the underflow and inexact exceptions are raised
>> -| if the abstract input cannot be represented exactly as a subnormal double-
>> -| precision floating-point number.
>> -| The input significand `zSig' has its binary point between bits 62
>> -| and 61, which is 10 bits to the left of the usual location. This shifted
>> -| significand must be normalized or smaller. If `zSig' is not normalized,
>> -| `zExp' must be 0; in that case, the result returned is a subnormal number,
>> -| and it must not require rounding. In the usual case that `zSig' is
>> -| normalized, `zExp' must be 1 less than the ``true'' floating-point exponent.
>> -| The handling of underflow and overflow follows the IEC/IEEE Standard for
>> -| Binary Floating-Point Arithmetic.
>> -*----------------------------------------------------------------------------*/
>> +/*
>> +-------------------------------------------------------------------------------
>> +Takes an abstract floating-point value having sign `zSign', exponent `zExp',
>> +and significand `zSig', and returns the proper double-precision floating-
>> +point value corresponding to the abstract input. Ordinarily, the abstract
>> +value is simply rounded and packed into the double-precision format, with
>> +the inexact exception raised if the abstract input cannot be represented
>> +exactly. However, if the abstract value is too large, the overflow and
>> +inexact exceptions are raised and an infinity or maximal finite value is
>> +returned. If the abstract value is too small, the input value is rounded
>> +to a subnormal number, and the underflow and inexact exceptions are raised
>> +if the abstract input cannot be represented exactly as a subnormal double-
>> +precision floating-point number.
>> + The input significand `zSig' has its binary point between bits 62
>> +and 61, which is 10 bits to the left of the usual location. This shifted
>> +significand must be normalized or smaller. If `zSig' is not normalized,
>> +`zExp' must be 0; in that case, the result returned is a subnormal number,
>> +and it must not require rounding. In the usual case that `zSig' is
>> +normalized, `zExp' must be 1 less than the ``true'' floating-point exponent.
>> +The handling of underflow and overflow follows the IEC/IEEE Standard for
>> +Binary Floating-Point Arithmetic.
>> +-------------------------------------------------------------------------------
>> +*/
>>
>> static float64 roundAndPackFloat64(flag zSign, int_fast16_t zExp, uint64_t zSig STATUS_PARAM)
>> {
>> @@ -548,15 +586,16 @@ static float64 roundAndPackFloat64(flag zSign, int_fast16_t zExp, uint64_t zSig
>>
>> }
>>
>> -/*----------------------------------------------------------------------------
>> -| Takes an abstract floating-point value having sign `zSign', exponent `zExp',
>> -| and significand `zSig', and returns the proper double-precision floating-
>> -| point value corresponding to the abstract input. This routine is just like
>> -| `roundAndPackFloat64' except that `zSig' does not have to be normalized.
>> -| Bit 63 of `zSig' must be zero, and `zExp' must be 1 less than the ``true''
>> -| floating-point exponent.
>> -*----------------------------------------------------------------------------*/
>> -
>> +/*
>> +-------------------------------------------------------------------------------
>> +Takes an abstract floating-point value having sign `zSign', exponent `zExp',
>> +and significand `zSig', and returns the proper double-precision floating-
>> +point value corresponding to the abstract input. This routine is just like
>> +`roundAndPackFloat64' except that `zSig' does not have to be normalized.
>> +Bit 63 of `zSig' must be zero, and `zExp' must be 1 less than the ``true''
>> +floating-point exponent.
>> +-------------------------------------------------------------------------------
>> +*/
>> static float64
>> normalizeRoundAndPackFloat64(flag zSign, int_fast16_t zExp, uint64_t zSig STATUS_PARAM)
>> {
>> @@ -567,10 +606,12 @@ static float64
>>
>> }
>>
>> -/*----------------------------------------------------------------------------
>> -| Returns the fraction bits of the extended double-precision floating-point
>> -| value `a'.
>> -*----------------------------------------------------------------------------*/
>> +/*
>> +-------------------------------------------------------------------------------
>> +Returns the fraction bits of the extended double-precision floating-point
>> +value `a'.
>> +-------------------------------------------------------------------------------
>> +*/
>>
>> INLINE uint64_t extractFloatx80Frac( floatx80 a )
>> {
>> @@ -579,11 +620,12 @@ INLINE uint64_t extractFloatx80Frac( floatx80 a )
>>
>> }
>>
>> -/*----------------------------------------------------------------------------
>> -| Returns the exponent bits of the extended double-precision floating-point
>> -| value `a'.
>> -*----------------------------------------------------------------------------*/
>> -
>> +/*
>> +-------------------------------------------------------------------------------
>> +Returns the exponent bits of the extended double-precision floating-point
>> +value `a'.
>> +-------------------------------------------------------------------------------
>> +*/
>> INLINE int32 extractFloatx80Exp( floatx80 a )
>> {
>>
>> @@ -591,11 +633,12 @@ INLINE int32 extractFloatx80Exp( floatx80 a )
>>
>> }
>>
>> -/*----------------------------------------------------------------------------
>> -| Returns the sign bit of the extended double-precision floating-point value
>> -| `a'.
>> -*----------------------------------------------------------------------------*/
>> -
>> +/*
>> +-------------------------------------------------------------------------------
>> +Returns the sign bit of the extended double-precision floating-point value
>> +`a'.
>> +-------------------------------------------------------------------------------
>> +*/
>> INLINE flag extractFloatx80Sign( floatx80 a )
>> {
>>
>> @@ -603,13 +646,14 @@ INLINE flag extractFloatx80Sign( floatx80 a )
>>
>> }
>>
>> -/*----------------------------------------------------------------------------
>> -| Normalizes the subnormal extended double-precision floating-point value
>> -| represented by the denormalized significand `aSig'. The normalized exponent
>> -| and significand are stored at the locations pointed to by `zExpPtr' and
>> -| `zSigPtr', respectively.
>> -*----------------------------------------------------------------------------*/
>> -
>> +/*
>> +-------------------------------------------------------------------------------
>> +Normalizes the subnormal extended double-precision floating-point value
>> +represented by the denormalized significand `aSig'. The normalized exponent
>> +and significand are stored at the locations pointed to by `zExpPtr' and
>> +`zSigPtr', respectively.
>> +-------------------------------------------------------------------------------
>> +*/
>> static void
>> normalizeFloatx80Subnormal( uint64_t aSig, int32 *zExpPtr, uint64_t *zSigPtr )
>> {
>> @@ -621,10 +665,12 @@ static void
>>
>> }
>>
>> -/*----------------------------------------------------------------------------
>> -| Packs the sign `zSign', exponent `zExp', and significand `zSig' into an
>> -| extended double-precision floating-point value, returning the result.
>> -*----------------------------------------------------------------------------*/
>> +/*
>> +-------------------------------------------------------------------------------
>> +Packs the sign `zSign', exponent `zExp', and significand `zSig' into an
>> +extended double-precision floating-point value, returning the result.
>> +-------------------------------------------------------------------------------
>> +*/
>>
>> INLINE floatx80 packFloatx80( flag zSign, int32 zExp, uint64_t zSig )
>> {
>> @@ -636,30 +682,31 @@ INLINE floatx80 packFloatx80( flag zSign, int32 zExp, uint64_t zSig )
>>
>> }
>>
>> -/*----------------------------------------------------------------------------
>> -| Takes an abstract floating-point value having sign `zSign', exponent `zExp',
>> -| and extended significand formed by the concatenation of `zSig0' and `zSig1',
>> -| and returns the proper extended double-precision floating-point value
>> -| corresponding to the abstract input. Ordinarily, the abstract value is
>> -| rounded and packed into the extended double-precision format, with the
>> -| inexact exception raised if the abstract input cannot be represented
>> -| exactly. However, if the abstract value is too large, the overflow and
>> -| inexact exceptions are raised and an infinity or maximal finite value is
>> -| returned. If the abstract value is too small, the input value is rounded to
>> -| a subnormal number, and the underflow and inexact exceptions are raised if
>> -| the abstract input cannot be represented exactly as a subnormal extended
>> -| double-precision floating-point number.
>> -| If `roundingPrecision' is 32 or 64, the result is rounded to the same
>> -| number of bits as single or double precision, respectively. Otherwise, the
>> -| result is rounded to the full precision of the extended double-precision
>> -| format.
>> -| The input significand must be normalized or smaller. If the input
>> -| significand is not normalized, `zExp' must be 0; in that case, the result
>> -| returned is a subnormal number, and it must not require rounding. The
>> -| handling of underflow and overflow follows the IEC/IEEE Standard for Binary
>> -| Floating-Point Arithmetic.
>> -*----------------------------------------------------------------------------*/
>> -
>> +/*
>> +-------------------------------------------------------------------------------
>> +Takes an abstract floating-point value having sign `zSign', exponent `zExp',
>> +and extended significand formed by the concatenation of `zSig0' and `zSig1',
>> +and returns the proper extended double-precision floating-point value
>> +corresponding to the abstract input. Ordinarily, the abstract value is
>> +rounded and packed into the extended double-precision format, with the
>> +inexact exception raised if the abstract input cannot be represented
>> +exactly. However, if the abstract value is too large, the overflow and
>> +inexact exceptions are raised and an infinity or maximal finite value is
>> +returned. If the abstract value is too small, the input value is rounded to
>> +a subnormal number, and the underflow and inexact exceptions are raised if
>> +the abstract input cannot be represented exactly as a subnormal extended
>> +double-precision floating-point number.
>> + If `roundingPrecision' is 32 or 64, the result is rounded to the same
>> +number of bits as single or double precision, respectively. Otherwise, the
>> +result is rounded to the full precision of the extended double-precision
>> +format.
>> + The input significand must be normalized or smaller. If the input
>> +significand is not normalized, `zExp' must be 0; in that case, the result
>> +returned is a subnormal number, and it must not require rounding. The
>> +handling of underflow and overflow follows the IEC/IEEE Standard for Binary
>> +Floating-Point Arithmetic.
>> +-------------------------------------------------------------------------------
>> +*/
>> static floatx80
>> roundAndPackFloatx80(
>> int8 roundingPrecision, flag zSign, int32 zExp, uint64_t zSig0, uint64_t zSig1
>> @@ -823,15 +870,16 @@ static floatx80
>>
>> }
>>
>> -/*----------------------------------------------------------------------------
>> -| Takes an abstract floating-point value having sign `zSign', exponent
>> -| `zExp', and significand formed by the concatenation of `zSig0' and `zSig1',
>> -| and returns the proper extended double-precision floating-point value
>> -| corresponding to the abstract input. This routine is just like
>> -| `roundAndPackFloatx80' except that the input significand does not have to be
>> -| normalized.
>> -*----------------------------------------------------------------------------*/
>> -
>> +/*
>> +-------------------------------------------------------------------------------
>> +Takes an abstract floating-point value having sign `zSign', exponent
>> +`zExp', and significand formed by the concatenation of `zSig0' and `zSig1',
>> +and returns the proper extended double-precision floating-point value
>> +corresponding to the abstract input. This routine is just like
>> +`roundAndPackFloatx80' except that the input significand does not have to be
>> +normalized.
>> +-------------------------------------------------------------------------------
>> +*/
>> static floatx80
>> normalizeRoundAndPackFloatx80(
>> int8 roundingPrecision, flag zSign, int32 zExp, uint64_t zSig0, uint64_t zSig1
>> @@ -852,10 +900,12 @@ static floatx80
>>
>> }
>>
>> -/*----------------------------------------------------------------------------
>> -| Returns the least-significant 64 fraction bits of the quadruple-precision
>> -| floating-point value `a'.
>> -*----------------------------------------------------------------------------*/
>> +/*
>> +-------------------------------------------------------------------------------
>> +Returns the least-significant 64 fraction bits of the quadruple-precision
>> +floating-point value `a'.
>> +-------------------------------------------------------------------------------
>> +*/
>>
>> INLINE uint64_t extractFloat128Frac1( float128 a )
>> {
>> @@ -864,10 +914,12 @@ INLINE uint64_t extractFloat128Frac1( float128 a )
>>
>> }
>>
>> -/*----------------------------------------------------------------------------
>> -| Returns the most-significant 48 fraction bits of the quadruple-precision
>> -| floating-point value `a'.
>> -*----------------------------------------------------------------------------*/
>> +/*
>> +-------------------------------------------------------------------------------
>> +Returns the most-significant 48 fraction bits of the quadruple-precision
>> +floating-point value `a'.
>> +-------------------------------------------------------------------------------
>> +*/
>>
>> INLINE uint64_t extractFloat128Frac0( float128 a )
>> {
>> @@ -876,11 +928,12 @@ INLINE uint64_t extractFloat128Frac0( float128 a )
>>
>> }
>>
>> -/*----------------------------------------------------------------------------
>> -| Returns the exponent bits of the quadruple-precision floating-point value
>> -| `a'.
>> -*----------------------------------------------------------------------------*/
>> -
>> +/*
>> +-------------------------------------------------------------------------------
>> +Returns the exponent bits of the quadruple-precision floating-point value
>> +`a'.
>> +-------------------------------------------------------------------------------
>> +*/
>> INLINE int32 extractFloat128Exp( float128 a )
>> {
>>
>> @@ -888,10 +941,11 @@ INLINE int32 extractFloat128Exp( float128 a )
>>
>> }
>>
>> -/*----------------------------------------------------------------------------
>> -| Returns the sign bit of the quadruple-precision floating-point value `a'.
>> -*----------------------------------------------------------------------------*/
>> -
>> +/*
>> +-------------------------------------------------------------------------------
>> +Returns the sign bit of the quadruple-precision floating-point value `a'.
>> +-------------------------------------------------------------------------------
>> +*/
>> INLINE flag extractFloat128Sign( float128 a )
>> {
>>
>> @@ -899,16 +953,17 @@ INLINE flag extractFloat128Sign( float128 a )
>>
>> }
>>
>> -/*----------------------------------------------------------------------------
>> -| Normalizes the subnormal quadruple-precision floating-point value
>> -| represented by the denormalized significand formed by the concatenation of
>> -| `aSig0' and `aSig1'. The normalized exponent is stored at the location
>> -| pointed to by `zExpPtr'. The most significant 49 bits of the normalized
>> -| significand are stored at the location pointed to by `zSig0Ptr', and the
>> -| least significant 64 bits of the normalized significand are stored at the
>> -| location pointed to by `zSig1Ptr'.
>> -*----------------------------------------------------------------------------*/
>> -
>> +/*
>> +-------------------------------------------------------------------------------
>> +Normalizes the subnormal quadruple-precision floating-point value
>> +represented by the denormalized significand formed by the concatenation of
>> +`aSig0' and `aSig1'. The normalized exponent is stored at the location
>> +pointed to by `zExpPtr'. The most significant 49 bits of the normalized
>> +significand are stored at the location pointed to by `zSig0Ptr', and the
>> +least significant 64 bits of the normalized significand are stored at the
>> +location pointed to by `zSig1Ptr'.
>> +-------------------------------------------------------------------------------
>> +*/
>> static void
>> normalizeFloat128Subnormal(
>> uint64_t aSig0,
>> @@ -940,19 +995,20 @@ static void
>>
>> }
>>
>> -/*----------------------------------------------------------------------------
>> -| Packs the sign `zSign', the exponent `zExp', and the significand formed
>> -| by the concatenation of `zSig0' and `zSig1' into a quadruple-precision
>> -| floating-point value, returning the result. After being shifted into the
>> -| proper positions, the three fields `zSign', `zExp', and `zSig0' are simply
>> -| added together to form the most significant 32 bits of the result. This
>> -| means that any integer portion of `zSig0' will be added into the exponent.
>> -| Since a properly normalized significand will have an integer portion equal
>> -| to 1, the `zExp' input should be 1 less than the desired result exponent
>> -| whenever `zSig0' and `zSig1' concatenated form a complete, normalized
>> -| significand.
>> -*----------------------------------------------------------------------------*/
>> -
>> +/*
>> +-------------------------------------------------------------------------------
>> +Packs the sign `zSign', the exponent `zExp', and the significand formed
>> +by the concatenation of `zSig0' and `zSig1' into a quadruple-precision
>> +floating-point value, returning the result. After being shifted into the
>> +proper positions, the three fields `zSign', `zExp', and `zSig0' are simply
>> +added together to form the most significant 32 bits of the result. This
>> +means that any integer portion of `zSig0' will be added into the exponent.
>> +Since a properly normalized significand will have an integer portion equal
>> +to 1, the `zExp' input should be 1 less than the desired result exponent
>> +whenever `zSig0' and `zSig1' concatenated form a complete, normalized
>> +significand.
>> +-------------------------------------------------------------------------------
>> +*/
>> INLINE float128
>> packFloat128( flag zSign, int32 zExp, uint64_t zSig0, uint64_t zSig1 )
>> {
>> @@ -964,27 +1020,28 @@ INLINE float128
>>
>> }
>>
>> -/*----------------------------------------------------------------------------
>> -| Takes an abstract floating-point value having sign `zSign', exponent `zExp',
>> -| and extended significand formed by the concatenation of `zSig0', `zSig1',
>> -| and `zSig2', and returns the proper quadruple-precision floating-point value
>> -| corresponding to the abstract input. Ordinarily, the abstract value is
>> -| simply rounded and packed into the quadruple-precision format, with the
>> -| inexact exception raised if the abstract input cannot be represented
>> -| exactly. However, if the abstract value is too large, the overflow and
>> -| inexact exceptions are raised and an infinity or maximal finite value is
>> -| returned. If the abstract value is too small, the input value is rounded to
>> -| a subnormal number, and the underflow and inexact exceptions are raised if
>> -| the abstract input cannot be represented exactly as a subnormal quadruple-
>> -| precision floating-point number.
>> -| The input significand must be normalized or smaller. If the input
>> -| significand is not normalized, `zExp' must be 0; in that case, the result
>> -| returned is a subnormal number, and it must not require rounding. In the
>> -| usual case that the input significand is normalized, `zExp' must be 1 less
>> -| than the ``true'' floating-point exponent. The handling of underflow and
>> -| overflow follows the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
>> -*----------------------------------------------------------------------------*/
>> -
>> +/*
>> +-------------------------------------------------------------------------------
>> +Takes an abstract floating-point value having sign `zSign', exponent `zExp',
>> +and extended significand formed by the concatenation of `zSig0', `zSig1',
>> +and `zSig2', and returns the proper quadruple-precision floating-point value
>> +corresponding to the abstract input. Ordinarily, the abstract value is
>> +simply rounded and packed into the quadruple-precision format, with the
>> +inexact exception raised if the abstract input cannot be represented
>> +exactly. However, if the abstract value is too large, the overflow and
>> +inexact exceptions are raised and an infinity or maximal finite value is
>> +returned. If the abstract value is too small, the input value is rounded to
>> +a subnormal number, and the underflow and inexact exceptions are raised if
>> +the abstract input cannot be represented exactly as a subnormal quadruple-
>> +precision floating-point number.
>> + The input significand must be normalized or smaller. If the input
>> +significand is not normalized, `zExp' must be 0; in that case, the result
>> +returned is a subnormal number, and it must not require rounding. In the
>> +usual case that the input significand is normalized, `zExp' must be 1 less
>> +than the ``true'' floating-point exponent. The handling of underflow and
>> +overflow follows the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
>> +-------------------------------------------------------------------------------
>> +*/
>> static float128
>> roundAndPackFloat128(
>> flag zSign, int32 zExp, uint64_t zSig0, uint64_t zSig1, uint64_t zSig2 STATUS_PARAM)
>> @@ -1079,16 +1136,17 @@ static float128
>>
>> }
>>
>> -/*----------------------------------------------------------------------------
>> -| Takes an abstract floating-point value having sign `zSign', exponent `zExp',
>> -| and significand formed by the concatenation of `zSig0' and `zSig1', and
>> -| returns the proper quadruple-precision floating-point value corresponding
>> -| to the abstract input. This routine is just like `roundAndPackFloat128'
>> -| except that the input significand has fewer bits and does not have to be
>> -| normalized. In all cases, `zExp' must be 1 less than the ``true'' floating-
>> -| point exponent.
>> -*----------------------------------------------------------------------------*/
>> -
>> +/*
>> +-------------------------------------------------------------------------------
>> +Takes an abstract floating-point value having sign `zSign', exponent `zExp',
>> +and significand formed by the concatenation of `zSig0' and `zSig1', and
>> +returns the proper quadruple-precision floating-point value corresponding
>> +to the abstract input. This routine is just like `roundAndPackFloat128'
>> +except that the input significand has fewer bits and does not have to be
>> +normalized. In all cases, `zExp' must be 1 less than the ``true'' floating-
>> +point exponent.
>> +-------------------------------------------------------------------------------
>> +*/
>> static float128
>> normalizeRoundAndPackFloat128(
>> flag zSign, int32 zExp, uint64_t zSig0, uint64_t zSig1 STATUS_PARAM)
>> @@ -1115,13 +1173,14 @@ static float128
>>
>> }
>>
>> -/*----------------------------------------------------------------------------
>> -| Returns the result of converting the 32-bit two's complement integer `a'
>> -| to the single-precision floating-point format. The conversion is performed
>> -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
>> -*----------------------------------------------------------------------------*/
>> -
>> -float32 int32_to_float32( int32 a STATUS_PARAM )
>> +/*
>> +-------------------------------------------------------------------------------
>> +Returns the result of converting the 32-bit two's complement integer `a'
>> +to the single-precision floating-point format. The conversion is performed
>> +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
>> +-------------------------------------------------------------------------------
>> +*/
>> +float32 int32_to_float32( int32 a STATUS_PARAM)
>> {
>> flag zSign;
>>
>> @@ -1132,13 +1191,14 @@ float32 int32_to_float32( int32 a STATUS_PARAM )
>>
>> }
>>
>> -/*----------------------------------------------------------------------------
>> -| Returns the result of converting the 32-bit two's complement integer `a'
>> -| to the double-precision floating-point format. The conversion is performed
>> -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
>> -*----------------------------------------------------------------------------*/
>> -
>> -float64 int32_to_float64( int32 a STATUS_PARAM )
>> +/*
>> +-------------------------------------------------------------------------------
>> +Returns the result of converting the 32-bit two's complement integer `a'
>> +to the double-precision floating-point format. The conversion is performed
>> +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
>> +-------------------------------------------------------------------------------
>> +*/
>> +float64 int32_to_float64( int32 a STATUS_PARAM)
>> {
>> flag zSign;
>> uint32 absA;
>> @@ -1154,13 +1214,14 @@ float64 int32_to_float64( int32 a STATUS_PARAM )
>>
>> }
>>
>> -/*----------------------------------------------------------------------------
>> -| Returns the result of converting the 32-bit two's complement integer `a'
>> -| to the extended double-precision floating-point format. The conversion
>> -| is performed according to the IEC/IEEE Standard for Binary Floating-Point
>> -| Arithmetic.
>> -*----------------------------------------------------------------------------*/
>> -
>> +/*
>> +-------------------------------------------------------------------------------
>> +Returns the result of converting the 32-bit two's complement integer `a'
>> +to the extended double-precision floating-point format. The conversion
>> +is performed according to the IEC/IEEE Standard for Binary Floating-Point
>> +Arithmetic.
>> +-------------------------------------------------------------------------------
>> +*/
>> floatx80 int32_to_floatx80( int32 a STATUS_PARAM )
>> {
>> flag zSign;
>> @@ -1177,12 +1238,13 @@ floatx80 int32_to_floatx80( int32 a STATUS_PARAM )
>>
>> }
>>
>> -/*----------------------------------------------------------------------------
>> -| Returns the result of converting the 32-bit two's complement integer `a' to
>> -| the quadruple-precision floating-point format. The conversion is performed
>> -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
>> -*----------------------------------------------------------------------------*/
>> -
>> +/*
>> +-------------------------------------------------------------------------------
>> +Returns the result of converting the 32-bit two's complement integer `a' to
>> +the quadruple-precision floating-point format. The conversion is performed
>> +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
>> +-------------------------------------------------------------------------------
>> +*/
>> float128 int32_to_float128( int32 a STATUS_PARAM )
>> {
>> flag zSign;
>> @@ -1199,12 +1261,13 @@ float128 int32_to_float128( int32 a STATUS_PARAM )
>>
>> }
>>
>> -/*----------------------------------------------------------------------------
>> -| Returns the result of converting the 64-bit two's complement integer `a'
>> -| to the single-precision floating-point format. The conversion is performed
>> -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
>> -*----------------------------------------------------------------------------*/
>> -
>> +/*
>> +-------------------------------------------------------------------------------
>> +Returns the result of converting the 64-bit two's complement integer `a'
>> +to the single-precision floating-point format. The conversion is performed
>> +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
>> +-------------------------------------------------------------------------------
>> +*/
>> float32 int64_to_float32( int64 a STATUS_PARAM )
>> {
>> flag zSign;
>> @@ -1252,12 +1315,13 @@ float32 uint64_to_float32( uint64 a STATUS_PARAM )
>> }
>> }
>>
>> -/*----------------------------------------------------------------------------
>> -| Returns the result of converting the 64-bit two's complement integer `a'
>> -| to the double-precision floating-point format. The conversion is performed
>> -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
>> -*----------------------------------------------------------------------------*/
>> -
>> +/*
>> +-------------------------------------------------------------------------------
>> +Returns the result of converting the 64-bit two's complement integer `a'
>> +to the double-precision floating-point format. The conversion is performed
>> +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
>> +-------------------------------------------------------------------------------
>> +*/
>> float64 int64_to_float64( int64 a STATUS_PARAM )
>> {
>> flag zSign;
>> @@ -1285,13 +1349,14 @@ float64 uint64_to_float64(uint64 a STATUS_PARAM)
>> return normalizeRoundAndPackFloat64(0, exp, a STATUS_VAR);
>> }
>>
>> -/*----------------------------------------------------------------------------
>> -| Returns the result of converting the 64-bit two's complement integer `a'
>> -| to the extended double-precision floating-point format. The conversion
>> -| is performed according to the IEC/IEEE Standard for Binary Floating-Point
>> -| Arithmetic.
>> -*----------------------------------------------------------------------------*/
>> -
>> +/*
>> +-------------------------------------------------------------------------------
>> +Returns the result of converting the 64-bit two's complement integer `a'
>> +to the extended double-precision floating-point format. The conversion
>> +is performed according to the IEC/IEEE Standard for Binary Floating-Point
>> +Arithmetic.
>> +-------------------------------------------------------------------------------
>> +*/
>> floatx80 int64_to_floatx80( int64 a STATUS_PARAM )
>> {
>> flag zSign;
>> @@ -1306,12 +1371,13 @@ floatx80 int64_to_floatx80( int64 a STATUS_PARAM )
>>
>> }
>>
>> -/*----------------------------------------------------------------------------
>> -| Returns the result of converting the 64-bit two's complement integer `a' to
>> -| the quadruple-precision floating-point format. The conversion is performed
>> -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
>> -*----------------------------------------------------------------------------*/
>> -
>> +/*
>> +-------------------------------------------------------------------------------
>> +Returns the result of converting the 64-bit two's complement integer `a' to
>> +the quadruple-precision floating-point format. The conversion is performed
>> +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
>> +-------------------------------------------------------------------------------
>> +*/
>> float128 int64_to_float128( int64 a STATUS_PARAM )
>> {
>> flag zSign;
>> @@ -1347,16 +1413,17 @@ float128 uint64_to_float128(uint64 a STATUS_PARAM)
>> return normalizeRoundAndPackFloat128(0, 0x406E, a, 0 STATUS_VAR);
>> }
>>
>> -/*----------------------------------------------------------------------------
>> -| Returns the result of converting the single-precision floating-point value
>> -| `a' to the 32-bit two's complement integer format. The conversion is
>> -| performed according to the IEC/IEEE Standard for Binary Floating-Point
>> -| Arithmetic---which means in particular that the conversion is rounded
>> -| according to the current rounding mode. If `a' is a NaN, the largest
>> -| positive integer is returned. Otherwise, if the conversion overflows, the
>> -| largest integer with the same sign as `a' is returned.
>> -*----------------------------------------------------------------------------*/
>> -
>> +/*
>> +-------------------------------------------------------------------------------
>> +Returns the result of converting the single-precision floating-point value
>> +`a' to the 32-bit two's complement integer format. The conversion is
>> +performed according to the IEC/IEEE Standard for Binary Floating-Point
>> +Arithmetic---which means in particular that the conversion is rounded
>> +according to the current rounding mode. If `a' is a NaN, the largest
>> +positive integer is returned. Otherwise, if the conversion overflows, the
>> +largest integer with the same sign as `a' is returned.
>> +-------------------------------------------------------------------------------
>> +*/
>> int32 float32_to_int32( float32 a STATUS_PARAM )
>> {
>> flag aSign;
>> @@ -1378,16 +1445,17 @@ int32 float32_to_int32( float32 a STATUS_PARAM )
>>
>> }
>>
>> -/*----------------------------------------------------------------------------
>> -| Returns the result of converting the single-precision floating-point value
>> -| `a' to the 32-bit two's complement integer format. The conversion is
>> -| performed according to the IEC/IEEE Standard for Binary Floating-Point
>> -| Arithmetic, except that the conversion is always rounded toward zero.
>> -| If `a' is a NaN, the largest positive integer is returned. Otherwise, if
>> -| the conversion overflows, the largest integer with the same sign as `a' is
>> -| returned.
>> -*----------------------------------------------------------------------------*/
>> -
>> +/*
>> +-------------------------------------------------------------------------------
>> +Returns the result of converting the single-precision floating-point value
>> +`a' to the 32-bit two's complement integer format. The conversion is
>> +performed according to the IEC/IEEE Standard for Binary Floating-Point
>> +Arithmetic, except that the conversion is always rounded toward zero.
>> +If `a' is a NaN, the largest positive integer is returned. Otherwise, if
>> +the conversion overflows, the largest integer with the same sign as `a' is
>> +returned.
>> +-------------------------------------------------------------------------------
>> +*/
>> int32 float32_to_int32_round_to_zero( float32 a STATUS_PARAM )
>> {
>> flag aSign;
>> @@ -1421,15 +1489,17 @@ int32 float32_to_int32_round_to_zero( float32 a STATUS_PARAM )
>>
>> }
>>
>> -/*----------------------------------------------------------------------------
>> -| Returns the result of converting the single-precision floating-point value
>> -| `a' to the 16-bit two's complement integer format. The conversion is
>> -| performed according to the IEC/IEEE Standard for Binary Floating-Point
>> -| Arithmetic, except that the conversion is always rounded toward zero.
>> -| If `a' is a NaN, the largest positive integer is returned. Otherwise, if
>> -| the conversion overflows, the largest integer with the same sign as `a' is
>> -| returned.
>> -*----------------------------------------------------------------------------*/
>> +/*
>> +-------------------------------------------------------------------------------
>> +Returns the result of converting the single-precision floating-point value
>> +`a' to the 16-bit two's complement integer format. The conversion is
>> +performed according to the IEC/IEEE Standard for Binary Floating-Point
>> +Arithmetic, except that the conversion is always rounded toward zero.
>> +If `a' is a NaN, the largest positive integer is returned. Otherwise, if
>> +the conversion overflows, the largest integer with the same sign as `a' is
>> +returned.
>> +-------------------------------------------------------------------------------
>> +*/
>>
>> int_fast16_t float32_to_int16_round_to_zero(float32 a STATUS_PARAM)
>> {
>> @@ -1470,16 +1540,17 @@ int_fast16_t float32_to_int16_round_to_zero(float32 a STATUS_PARAM)
>>
>> }
>>
>> -/*----------------------------------------------------------------------------
>> -| Returns the result of converting the single-precision floating-point value
>> -| `a' to the 64-bit two's complement integer format. The conversion is
>> -| performed according to the IEC/IEEE Standard for Binary Floating-Point
>> -| Arithmetic---which means in particular that the conversion is rounded
>> -| according to the current rounding mode. If `a' is a NaN, the largest
>> -| positive integer is returned. Otherwise, if the conversion overflows, the
>> -| largest integer with the same sign as `a' is returned.
>> -*----------------------------------------------------------------------------*/
>> -
>> +/*
>> +-------------------------------------------------------------------------------
>> +Returns the result of converting the single-precision floating-point value
>> +`a' to the 64-bit two's complement integer format. The conversion is
>> +performed according to the IEC/IEEE Standard for Binary Floating-Point
>> +Arithmetic---which means in particular that the conversion is rounded
>> +according to the current rounding mode. If `a' is a NaN, the largest
>> +positive integer is returned. Otherwise, if the conversion overflows, the
>> +largest integer with the same sign as `a' is returned.
>> +-------------------------------------------------------------------------------
>> +*/
>> int64 float32_to_int64( float32 a STATUS_PARAM )
>> {
>> flag aSign;
>> @@ -1507,16 +1578,17 @@ int64 float32_to_int64( float32 a STATUS_PARAM )
>>
>> }
>>
>> -/*----------------------------------------------------------------------------
>> -| Returns the result of converting the single-precision floating-point value
>> -| `a' to the 64-bit two's complement integer format. The conversion is
>> -| performed according to the IEC/IEEE Standard for Binary Floating-Point
>> -| Arithmetic, except that the conversion is always rounded toward zero. If
>> -| `a' is a NaN, the largest positive integer is returned. Otherwise, if the
>> -| conversion overflows, the largest integer with the same sign as `a' is
>> -| returned.
>> -*----------------------------------------------------------------------------*/
>> -
>> +/*
>> +-------------------------------------------------------------------------------
>> +Returns the result of converting the single-precision floating-point value
>> +`a' to the 64-bit two's complement integer format. The conversion is
>> +performed according to the IEC/IEEE Standard for Binary Floating-Point
>> +Arithmetic, except that the conversion is always rounded toward zero. If
>> +`a' is a NaN, the largest positive integer is returned. Otherwise, if the
>> +conversion overflows, the largest integer with the same sign as `a' is
>> +returned.
>> +-------------------------------------------------------------------------------
>> +*/
>> int64 float32_to_int64_round_to_zero( float32 a STATUS_PARAM )
>> {
>> flag aSign;
>> @@ -1554,13 +1626,14 @@ int64 float32_to_int64_round_to_zero( float32 a STATUS_PARAM )
>>
>> }
>>
>> -/*----------------------------------------------------------------------------
>> -| Returns the result of converting the single-precision floating-point value
>> -| `a' to the double-precision floating-point format. The conversion is
>> -| performed according to the IEC/IEEE Standard for Binary Floating-Point
>> -| Arithmetic.
>> -*----------------------------------------------------------------------------*/
>> -
>> +/*
>> +-------------------------------------------------------------------------------
>> +Returns the result of converting the single-precision floating-point value
>> +`a' to the double-precision floating-point format. The conversion is
>> +performed according to the IEC/IEEE Standard for Binary Floating-Point
>> +Arithmetic.
>> +-------------------------------------------------------------------------------
>> +*/
>> float64 float32_to_float64( float32 a STATUS_PARAM )
>> {
>> flag aSign;
>> @@ -1584,13 +1657,14 @@ float64 float32_to_float64( float32 a STATUS_PARAM )
>>
>> }
>>
>> -/*----------------------------------------------------------------------------
>> -| Returns the result of converting the single-precision floating-point value
>> -| `a' to the extended double-precision floating-point format. The conversion
>> -| is performed according to the IEC/IEEE Standard for Binary Floating-Point
>> -| Arithmetic.
>> -*----------------------------------------------------------------------------*/
>> -
>> +/*
>> +-------------------------------------------------------------------------------
>> +Returns the result of converting the single-precision floating-point value
>> +`a' to the extended double-precision floating-point format. The conversion
>> +is performed according to the IEC/IEEE Standard for Binary Floating-Point
>> +Arithmetic.
>> +-------------------------------------------------------------------------------
>> +*/
>> floatx80 float32_to_floatx80( float32 a STATUS_PARAM )
>> {
>> flag aSign;
>> @@ -1614,13 +1688,14 @@ floatx80 float32_to_floatx80( float32 a STATUS_PARAM )
>>
>> }
>>
>> -/*----------------------------------------------------------------------------
>> -| Returns the result of converting the single-precision floating-point value
>> -| `a' to the double-precision floating-point format. The conversion is
>> -| performed according to the IEC/IEEE Standard for Binary Floating-Point
>> -| Arithmetic.
>> -*----------------------------------------------------------------------------*/
>> -
>> +/*
>> +-------------------------------------------------------------------------------
>> +Returns the result of converting the single-precision floating-point value
>> +`a' to the double-precision floating-point format. The conversion is
>> +performed according to the IEC/IEEE Standard for Binary Floating-Point
>> +Arithmetic.
>> +-------------------------------------------------------------------------------
>> +*/
>> float128 float32_to_float128( float32 a STATUS_PARAM )
>> {
>> flag aSign;
>> @@ -1644,14 +1719,15 @@ float128 float32_to_float128( float32 a STATUS_PARAM )
>>
>> }
>>
>> -/*----------------------------------------------------------------------------
>> -| Rounds the single-precision floating-point value `a' to an integer, and
>> -| returns the result as a single-precision floating-point value. The
>> -| operation is performed according to the IEC/IEEE Standard for Binary
>> -| Floating-Point Arithmetic.
>> -*----------------------------------------------------------------------------*/
>> -
>> -float32 float32_round_to_int( float32 a STATUS_PARAM)
>> +/*
>> +-------------------------------------------------------------------------------
>> +Rounds the single-precision floating-point value `a' to an integer, and
>> +returns the result as a single-precision floating-point value. The
>> +operation is performed according to the IEC/IEEE Standard for Binary
>> +Floating-Point Arithmetic.
>> +-------------------------------------------------------------------------------
>> +*/
>> +float32 float32_round_to_int( float32 a STATUS_PARAM )
>> {
>> flag aSign;
>> int_fast16_t aExp;
>> @@ -1704,15 +1780,16 @@ float32 float32_round_to_int( float32 a STATUS_PARAM)
>>
>> }
>>
>> -/*----------------------------------------------------------------------------
>> -| Returns the result of adding the absolute values of the single-precision
>> -| floating-point values `a' and `b'. If `zSign' is 1, the sum is negated
>> -| before being returned. `zSign' is ignored if the result is a NaN.
>> -| The addition is performed according to the IEC/IEEE Standard for Binary
>> -| Floating-Point Arithmetic.
>> -*----------------------------------------------------------------------------*/
>> -
>> -static float32 addFloat32Sigs( float32 a, float32 b, flag zSign STATUS_PARAM)
>> +/*
>> +-------------------------------------------------------------------------------
>> +Returns the result of adding the absolute values of the single-precision
>> +floating-point values `a' and `b'. If `zSign' is 1, the sum is negated
>> +before being returned. `zSign' is ignored if the result is a NaN.
>> +The addition is performed according to the IEC/IEEE Standard for Binary
>> +Floating-Point Arithmetic.
>> +-------------------------------------------------------------------------------
>> +*/
>> +static float32 addFloat32Sigs( float32 a, float32 b, flag zSign STATUS_PARAM )
>> {
>> int_fast16_t aExp, bExp, zExp;
>> uint32_t aSig, bSig, zSig;
>> @@ -1783,15 +1860,16 @@ static float32 addFloat32Sigs( float32 a, float32 b, flag zSign STATUS_PARAM)
>>
>> }
>>
>> -/*----------------------------------------------------------------------------
>> -| Returns the result of subtracting the absolute values of the single-
>> -| precision floating-point values `a' and `b'. If `zSign' is 1, the
>> -| difference is negated before being returned. `zSign' is ignored if the
>> -| result is a NaN. The subtraction is performed according to the IEC/IEEE
>> -| Standard for Binary Floating-Point Arithmetic.
>> -*----------------------------------------------------------------------------*/
>> -
>> -static float32 subFloat32Sigs( float32 a, float32 b, flag zSign STATUS_PARAM)
>> +/*
>> +-------------------------------------------------------------------------------
>> +Returns the result of subtracting the absolute values of the single-
>> +precision floating-point values `a' and `b'. If `zSign' is 1, the
>> +difference is negated before being returned. `zSign' is ignored if the
>> +result is a NaN. The subtraction is performed according to the IEC/IEEE
>> +Standard for Binary Floating-Point Arithmetic.
>> +-------------------------------------------------------------------------------
>> +*/
>> +static float32 subFloat32Sigs( float32 a, float32 b, flag zSign STATUS_PARAM )
>> {
>> int_fast16_t aExp, bExp, zExp;
>> uint32_t aSig, bSig, zSig;
>> @@ -1858,12 +1936,13 @@ static float32 subFloat32Sigs( float32 a, float32 b, flag zSign STATUS_PARAM)
>>
>> }
>>
>> -/*----------------------------------------------------------------------------
>> -| Returns the result of adding the single-precision floating-point values `a'
>> -| and `b'. The operation is performed according to the IEC/IEEE Standard for
>> -| Binary Floating-Point Arithmetic.
>> -*----------------------------------------------------------------------------*/
>> -
>> +/*
>> +-------------------------------------------------------------------------------
>> +Returns the result of adding the single-precision floating-point values `a'
>> +and `b'. The operation is performed according to the IEC/IEEE Standard for
>> +Binary Floating-Point Arithmetic.
>> +-------------------------------------------------------------------------------
>> +*/
>> float32 float32_add( float32 a, float32 b STATUS_PARAM )
>> {
>> flag aSign, bSign;
>> @@ -1881,12 +1960,13 @@ float32 float32_add( float32 a, float32 b STATUS_PARAM )
>>
>> }
>>
>> -/*----------------------------------------------------------------------------
>> -| Returns the result of subtracting the single-precision floating-point values
>> -| `a' and `b'. The operation is performed according to the IEC/IEEE Standard
>> -| for Binary Floating-Point Arithmetic.
>> -*----------------------------------------------------------------------------*/
>> -
>> +/*
>> +-------------------------------------------------------------------------------
>> +Returns the result of subtracting the single-precision floating-point values
>> +`a' and `b'. The operation is performed according to the IEC/IEEE Standard
>> +for Binary Floating-Point Arithmetic.
>> +------------------------------------------------------------------------------- >> +*/ >> float32 float32_sub( float32 a, float32 b STATUS_PARAM ) >> { >> flag aSign, bSign; >> @@ -1904,12 +1984,13 @@ float32 float32_sub( float32 a, float32 b STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns the result of multiplying the single-precision floating-point values >> -| `a' and `b'. The operation is performed according to the IEC/IEEE Standard >> -| for Binary Floating-Point Arithmetic. >> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Returns the result of multiplying the single-precision floating-point values >> +`a' and `b'. The operation is performed according to the IEC/IEEE Standard >> +for Binary Floating-Point Arithmetic. >> +------------------------------------------------------------------------------- >> +*/ >> float32 float32_mul( float32 a, float32 b STATUS_PARAM ) >> { >> flag aSign, bSign, zSign; >> @@ -1967,12 +2048,13 @@ float32 float32_mul( float32 a, float32 b STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns the result of dividing the single-precision floating-point value `a' >> -| by the corresponding value `b'. The operation is performed according to the >> -| IEC/IEEE Standard for Binary Floating-Point Arithmetic. >> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Returns the result of dividing the single-precision floating-point value `a' >> +by the corresponding value `b'. The operation is performed according to the >> +IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
>> +------------------------------------------------------------------------------- >> +*/ >> float32 float32_div( float32 a, float32 b STATUS_PARAM ) >> { >> flag aSign, bSign, zSign; >> @@ -2031,12 +2113,13 @@ float32 float32_div( float32 a, float32 b STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns the remainder of the single-precision floating-point value `a' >> -| with respect to the corresponding value `b'. The operation is performed >> -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. >> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Returns the remainder of the single-precision floating-point value `a' >> +with respect to the corresponding value `b'. The operation is performed >> +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. >> +------------------------------------------------------------------------------- >> +*/ >> float32 float32_rem( float32 a, float32 b STATUS_PARAM ) >> { >> flag aSign, zSign; >> @@ -2132,16 +2215,18 @@ float32 float32_rem( float32 a, float32 b STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns the result of multiplying the single-precision floating-point values >> -| `a' and `b' then adding 'c', with no intermediate rounding step after the >> -| multiplication. The operation is performed according to the IEC/IEEE >> -| Standard for Binary Floating-Point Arithmetic 754-2008. >> -| The flags argument allows the caller to select negation of the >> -| addend, the intermediate product, or the final result. (The difference >> -| between this and having the caller do a separate negation is that negating >> -| externally will flip the sign bit on NaNs.) 
>> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Returns the result of multiplying the single-precision floating-point values >> +`a' and `b' then adding 'c', with no intermediate rounding step after the >> +multiplication. The operation is performed according to the IEC/IEEE >> +Standard for Binary Floating-Point Arithmetic 754-2008. >> +The flags argument allows the caller to select negation of the >> +addend, the intermediate product, or the final result. (The difference >> +between this and having the caller do a separate negation is that negating >> +externally will flip the sign bit on NaNs.) >> +------------------------------------------------------------------------------- >> +*/ >> >> float32 float32_muladd(float32 a, float32 b, float32 c, int flags STATUS_PARAM) >> { >> @@ -2339,12 +2424,13 @@ float32 float32_muladd(float32 a, float32 b, float32 c, int flags STATUS_PARAM) >> } >> >> >> -/*---------------------------------------------------------------------------- >> -| Returns the square root of the single-precision floating-point value `a'. >> -| The operation is performed according to the IEC/IEEE Standard for Binary >> -| Floating-Point Arithmetic. >> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Returns the square root of the single-precision floating-point value `a'. >> +The operation is performed according to the IEC/IEEE Standard for Binary >> +Floating-Point Arithmetic. 
>> +------------------------------------------------------------------------------- >> +*/ >> float32 float32_sqrt( float32 a STATUS_PARAM ) >> { >> flag aSign; >> @@ -2394,23 +2480,25 @@ float32 float32_sqrt( float32 a STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns the binary exponential of the single-precision floating-point value >> -| `a'. The operation is performed according to the IEC/IEEE Standard for >> -| Binary Floating-Point Arithmetic. >> -| >> -| Uses the following identities: >> -| >> -| 1. ------------------------------------------------------------------------- >> -| x x*ln(2) >> -| 2 = e >> -| >> -| 2. ------------------------------------------------------------------------- >> -| 2 3 4 5 n >> -| x x x x x x x >> -| e = 1 + --- + --- + --- + --- + --- + ... + --- + ... >> -| 1! 2! 3! 4! 5! n! >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Returns the binary exponential of the single-precision floating-point value >> +`a'. The operation is performed according to the IEC/IEEE Standard for >> +Binary Floating-Point Arithmetic. >> + >> +Uses the following identities: >> + >> +1. ------------------------------------------------------------------------- >> + x x*ln(2) >> + 2 = e >> + >> +2. ------------------------------------------------------------------------- >> + 2 3 4 5 n >> + x x x x x x x >> + e = 1 + --- + --- + --- + --- + --- + ... + --- + ... >> + 1! 2! 3! 4! 5! n! 
>> +------------------------------------------------------------------------------- >> +*/ >> >> static const float64 float32_exp2_coefficients[15] = >> { >> @@ -2474,11 +2562,13 @@ float32 float32_exp2( float32 a STATUS_PARAM ) >> return float64_to_float32(r, status); >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns the binary log of the single-precision floating-point value `a'. >> -| The operation is performed according to the IEC/IEEE Standard for Binary >> -| Floating-Point Arithmetic. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Returns the binary log of the single-precision floating-point value `a'. >> +The operation is performed according to the IEC/IEEE Standard for Binary >> +Floating-Point Arithmetic. >> +------------------------------------------------------------------------------- >> +*/ >> float32 float32_log2( float32 a STATUS_PARAM ) >> { >> flag aSign, zSign; >> @@ -2522,12 +2612,14 @@ float32 float32_log2( float32 a STATUS_PARAM ) >> return normalizeRoundAndPackFloat32( zSign, 0x85, zSig STATUS_VAR ); >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns 1 if the single-precision floating-point value `a' is equal to >> -| the corresponding value `b', and 0 otherwise. The invalid exception is >> -| raised if either operand is a NaN. Otherwise, the comparison is performed >> -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Returns 1 if the single-precision floating-point value `a' is equal to >> +the corresponding value `b', and 0 otherwise. The invalid exception is >> +raised if either operand is a NaN. 
Otherwise, the comparison is performed >> +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. >> +------------------------------------------------------------------------------- >> +*/ >> >> int float32_eq( float32 a, float32 b STATUS_PARAM ) >> { >> @@ -2546,12 +2638,14 @@ int float32_eq( float32 a, float32 b STATUS_PARAM ) >> return ( av == bv ) || ( (uint32_t) ( ( av | bv )<<1 ) == 0 ); >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns 1 if the single-precision floating-point value `a' is less than >> -| or equal to the corresponding value `b', and 0 otherwise. The invalid >> -| exception is raised if either operand is a NaN. The comparison is performed >> -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Returns 1 if the single-precision floating-point value `a' is less than >> +or equal to the corresponding value `b', and 0 otherwise. The invalid >> +exception is raised if either operand is a NaN. The comparison is performed >> +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. >> +------------------------------------------------------------------------------- >> +*/ >> >> int float32_le( float32 a, float32 b STATUS_PARAM ) >> { >> @@ -2575,12 +2669,14 @@ int float32_le( float32 a, float32 b STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns 1 if the single-precision floating-point value `a' is less than >> -| the corresponding value `b', and 0 otherwise. The invalid exception is >> -| raised if either operand is a NaN. The comparison is performed according >> -| to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
>> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Returns 1 if the single-precision floating-point value `a' is less than >> +the corresponding value `b', and 0 otherwise. The invalid exception is >> +raised if either operand is a NaN. The comparison is performed according >> +to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. >> +------------------------------------------------------------------------------- >> +*/ >> >> int float32_lt( float32 a, float32 b STATUS_PARAM ) >> { >> @@ -2604,12 +2700,14 @@ int float32_lt( float32 a, float32 b STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns 1 if the single-precision floating-point values `a' and `b' cannot >> -| be compared, and 0 otherwise. The invalid exception is raised if either >> -| operand is a NaN. The comparison is performed according to the IEC/IEEE >> -| Standard for Binary Floating-Point Arithmetic. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Returns 1 if the single-precision floating-point values `a' and `b' cannot >> +be compared, and 0 otherwise. The invalid exception is raised if either >> +operand is a NaN. The comparison is performed according to the IEC/IEEE >> +Standard for Binary Floating-Point Arithmetic. 
>> +------------------------------------------------------------------------------- >> +*/ >> >> int float32_unordered( float32 a, float32 b STATUS_PARAM ) >> { >> @@ -2625,12 +2723,14 @@ int float32_unordered( float32 a, float32 b STATUS_PARAM ) >> return 0; >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns 1 if the single-precision floating-point value `a' is equal to >> -| the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an >> -| exception. The comparison is performed according to the IEC/IEEE Standard >> -| for Binary Floating-Point Arithmetic. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Returns 1 if the single-precision floating-point value `a' is equal to >> +the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an >> +exception. The comparison is performed according to the IEC/IEEE Standard >> +for Binary Floating-Point Arithmetic. >> +------------------------------------------------------------------------------- >> +*/ >> >> int float32_eq_quiet( float32 a, float32 b STATUS_PARAM ) >> { >> @@ -2649,12 +2749,14 @@ int float32_eq_quiet( float32 a, float32 b STATUS_PARAM ) >> ( (uint32_t) ( ( float32_val(a) | float32_val(b) )<<1 ) == 0 ); >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns 1 if the single-precision floating-point value `a' is less than or >> -| equal to the corresponding value `b', and 0 otherwise. Quiet NaNs do not >> -| cause an exception. Otherwise, the comparison is performed according to the >> -| IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
>> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Returns 1 if the single-precision floating-point value `a' is less than or >> +equal to the corresponding value `b', and 0 otherwise. Quiet NaNs do not >> +cause an exception. Otherwise, the comparison is performed according to the >> +IEC/IEEE Standard for Binary Floating-Point Arithmetic. >> +------------------------------------------------------------------------------- >> +*/ >> >> int float32_le_quiet( float32 a, float32 b STATUS_PARAM ) >> { >> @@ -2680,12 +2782,14 @@ int float32_le_quiet( float32 a, float32 b STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns 1 if the single-precision floating-point value `a' is less than >> -| the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an >> -| exception. Otherwise, the comparison is performed according to the IEC/IEEE >> -| Standard for Binary Floating-Point Arithmetic. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Returns 1 if the single-precision floating-point value `a' is less than >> +the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an >> +exception. Otherwise, the comparison is performed according to the IEC/IEEE >> +Standard for Binary Floating-Point Arithmetic. 
>> +------------------------------------------------------------------------------- >> +*/ >> >> int float32_lt_quiet( float32 a, float32 b STATUS_PARAM ) >> { >> @@ -2711,12 +2815,14 @@ int float32_lt_quiet( float32 a, float32 b STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns 1 if the single-precision floating-point values `a' and `b' cannot >> -| be compared, and 0 otherwise. Quiet NaNs do not cause an exception. The >> -| comparison is performed according to the IEC/IEEE Standard for Binary >> -| Floating-Point Arithmetic. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Returns 1 if the single-precision floating-point values `a' and `b' cannot >> +be compared, and 0 otherwise. Quiet NaNs do not cause an exception. The >> +comparison is performed according to the IEC/IEEE Standard for Binary >> +Floating-Point Arithmetic. >> +------------------------------------------------------------------------------- >> +*/ >> >> int float32_unordered_quiet( float32 a, float32 b STATUS_PARAM ) >> { >> @@ -2734,16 +2840,17 @@ int float32_unordered_quiet( float32 a, float32 b STATUS_PARAM ) >> return 0; >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns the result of converting the double-precision floating-point value >> -| `a' to the 32-bit two's complement integer format. The conversion is >> -| performed according to the IEC/IEEE Standard for Binary Floating-Point >> -| Arithmetic---which means in particular that the conversion is rounded >> -| according to the current rounding mode. If `a' is a NaN, the largest >> -| positive integer is returned. Otherwise, if the conversion overflows, the >> -| largest integer with the same sign as `a' is returned. 
>> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Returns the result of converting the double-precision floating-point value >> +`a' to the 32-bit two's complement integer format. The conversion is >> +performed according to the IEC/IEEE Standard for Binary Floating-Point >> +Arithmetic---which means in particular that the conversion is rounded >> +according to the current rounding mode. If `a' is a NaN, the largest >> +positive integer is returned. Otherwise, if the conversion overflows, the >> +largest integer with the same sign as `a' is returned. >> +------------------------------------------------------------------------------- >> +*/ >> int32 float64_to_int32( float64 a STATUS_PARAM ) >> { >> flag aSign; >> @@ -2762,16 +2869,17 @@ int32 float64_to_int32( float64 a STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns the result of converting the double-precision floating-point value >> -| `a' to the 32-bit two's complement integer format. The conversion is >> -| performed according to the IEC/IEEE Standard for Binary Floating-Point >> -| Arithmetic, except that the conversion is always rounded toward zero. >> -| If `a' is a NaN, the largest positive integer is returned. Otherwise, if >> -| the conversion overflows, the largest integer with the same sign as `a' is >> -| returned. >> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Returns the result of converting the double-precision floating-point value >> +`a' to the 32-bit two's complement integer format. The conversion is >> +performed according to the IEC/IEEE Standard for Binary Floating-Point >> +Arithmetic, except that the conversion is always rounded toward zero. 
>> +If `a' is a NaN, the largest positive integer is returned. Otherwise, if >> +the conversion overflows, the largest integer with the same sign as `a' is >> +returned. >> +------------------------------------------------------------------------------- >> +*/ >> int32 float64_to_int32_round_to_zero( float64 a STATUS_PARAM ) >> { >> flag aSign; >> @@ -2809,15 +2917,17 @@ int32 float64_to_int32_round_to_zero( float64 a STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns the result of converting the double-precision floating-point value >> -| `a' to the 16-bit two's complement integer format. The conversion is >> -| performed according to the IEC/IEEE Standard for Binary Floating-Point >> -| Arithmetic, except that the conversion is always rounded toward zero. >> -| If `a' is a NaN, the largest positive integer is returned. Otherwise, if >> -| the conversion overflows, the largest integer with the same sign as `a' is >> -| returned. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Returns the result of converting the double-precision floating-point value >> +`a' to the 16-bit two's complement integer format. The conversion is >> +performed according to the IEC/IEEE Standard for Binary Floating-Point >> +Arithmetic, except that the conversion is always rounded toward zero. >> +If `a' is a NaN, the largest positive integer is returned. Otherwise, if >> +the conversion overflows, the largest integer with the same sign as `a' is >> +returned. 
>> +------------------------------------------------------------------------------- >> +*/ >> >> int_fast16_t float64_to_int16_round_to_zero(float64 a STATUS_PARAM) >> { >> @@ -2860,16 +2970,17 @@ int_fast16_t float64_to_int16_round_to_zero(float64 a STATUS_PARAM) >> return z; >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns the result of converting the double-precision floating-point value >> -| `a' to the 64-bit two's complement integer format. The conversion is >> -| performed according to the IEC/IEEE Standard for Binary Floating-Point >> -| Arithmetic---which means in particular that the conversion is rounded >> -| according to the current rounding mode. If `a' is a NaN, the largest >> -| positive integer is returned. Otherwise, if the conversion overflows, the >> -| largest integer with the same sign as `a' is returned. >> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Returns the result of converting the double-precision floating-point value >> +`a' to the 64-bit two's complement integer format. The conversion is >> +performed according to the IEC/IEEE Standard for Binary Floating-Point >> +Arithmetic---which means in particular that the conversion is rounded >> +according to the current rounding mode. If `a' is a NaN, the largest >> +positive integer is returned. Otherwise, if the conversion overflows, the >> +largest integer with the same sign as `a' is returned. 
>> +------------------------------------------------------------------------------- >> +*/ >> int64 float64_to_int64( float64 a STATUS_PARAM ) >> { >> flag aSign; >> @@ -2903,16 +3014,17 @@ int64 float64_to_int64( float64 a STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns the result of converting the double-precision floating-point value >> -| `a' to the 64-bit two's complement integer format. The conversion is >> -| performed according to the IEC/IEEE Standard for Binary Floating-Point >> -| Arithmetic, except that the conversion is always rounded toward zero. >> -| If `a' is a NaN, the largest positive integer is returned. Otherwise, if >> -| the conversion overflows, the largest integer with the same sign as `a' is >> -| returned. >> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Returns the result of converting the double-precision floating-point value >> +`a' to the 64-bit two's complement integer format. The conversion is >> +performed according to the IEC/IEEE Standard for Binary Floating-Point >> +Arithmetic, except that the conversion is always rounded toward zero. >> +If `a' is a NaN, the largest positive integer is returned. Otherwise, if >> +the conversion overflows, the largest integer with the same sign as `a' is >> +returned. >> +------------------------------------------------------------------------------- >> +*/ >> int64 float64_to_int64_round_to_zero( float64 a STATUS_PARAM ) >> { >> flag aSign; >> @@ -2956,13 +3068,14 @@ int64 float64_to_int64_round_to_zero( float64 a STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns the result of converting the double-precision floating-point value >> -| `a' to the single-precision floating-point format. 
The conversion is >> -| performed according to the IEC/IEEE Standard for Binary Floating-Point >> -| Arithmetic. >> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Returns the result of converting the double-precision floating-point value >> +`a' to the single-precision floating-point format. The conversion is >> +performed according to the IEC/IEEE Standard for Binary Floating-Point >> +Arithmetic. >> +------------------------------------------------------------------------------- >> +*/ >> float32 float64_to_float32( float64 a STATUS_PARAM ) >> { >> flag aSign; >> @@ -2989,16 +3102,18 @@ float32 float64_to_float32( float64 a STATUS_PARAM ) >> } >> >> >> -/*---------------------------------------------------------------------------- >> -| Packs the sign `zSign', exponent `zExp', and significand `zSig' into a >> -| half-precision floating-point value, returning the result. After being >> -| shifted into the proper positions, the three fields are simply added >> -| together to form the result. This means that any integer portion of `zSig' >> -| will be added into the exponent. Since a properly normalized significand >> -| will have an integer portion equal to 1, the `zExp' input should be 1 less >> -| than the desired result exponent whenever `zSig' is a complete, normalized >> -| significand. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Packs the sign `zSign', exponent `zExp', and significand `zSig' into a >> +half-precision floating-point value, returning the result. After being >> +shifted into the proper positions, the three fields are simply added >> +together to form the result. This means that any integer portion of `zSig' >> +will be added into the exponent. 
Since a properly normalized significand >> +will have an integer portion equal to 1, the `zExp' input should be 1 less >> +than the desired result exponent whenever `zSig' is a complete, normalized >> +significand. >> +------------------------------------------------------------------------------- >> +*/ >> static float16 packFloat16(flag zSign, int_fast16_t zExp, uint16_t zSig) >> { >> return make_float16( >> @@ -3132,13 +3247,14 @@ float16 float32_to_float16(float32 a, flag ieee STATUS_PARAM) >> return packFloat16(aSign, aExp + 14, aSig >> 13); >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns the result of converting the double-precision floating-point value >> -| `a' to the extended double-precision floating-point format. The conversion >> -| is performed according to the IEC/IEEE Standard for Binary Floating-Point >> -| Arithmetic. >> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Returns the result of converting the double-precision floating-point value >> +`a' to the extended double-precision floating-point format. The conversion >> +is performed according to the IEC/IEEE Standard for Binary Floating-Point >> +Arithmetic. >> +------------------------------------------------------------------------------- >> +*/ >> floatx80 float64_to_floatx80( float64 a STATUS_PARAM ) >> { >> flag aSign; >> @@ -3163,13 +3279,14 @@ floatx80 float64_to_floatx80( float64 a STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns the result of converting the double-precision floating-point value >> -| `a' to the quadruple-precision floating-point format. The conversion is >> -| performed according to the IEC/IEEE Standard for Binary Floating-Point >> -| Arithmetic. 
>> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Returns the result of converting the double-precision floating-point value >> +`a' to the quadruple-precision floating-point format. The conversion is >> +performed according to the IEC/IEEE Standard for Binary Floating-Point >> +Arithmetic. >> +------------------------------------------------------------------------------- >> +*/ >> float128 float64_to_float128( float64 a STATUS_PARAM ) >> { >> flag aSign; >> @@ -3194,13 +3311,14 @@ float128 float64_to_float128( float64 a STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Rounds the double-precision floating-point value `a' to an integer, and >> -| returns the result as a double-precision floating-point value. The >> -| operation is performed according to the IEC/IEEE Standard for Binary >> -| Floating-Point Arithmetic. >> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Rounds the double-precision floating-point value `a' to an integer, and >> +returns the result as a double-precision floating-point value. The >> +operation is performed according to the IEC/IEEE Standard for Binary >> +Floating-Point Arithmetic. >> +------------------------------------------------------------------------------- >> +*/ >> float64 float64_round_to_int( float64 a STATUS_PARAM ) >> { >> flag aSign; >> @@ -3267,14 +3385,15 @@ float64 float64_trunc_to_int( float64 a STATUS_PARAM) >> return res; >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns the result of adding the absolute values of the double-precision >> -| floating-point values `a' and `b'. 
If `zSign' is 1, the sum is negated >> -| before being returned. `zSign' is ignored if the result is a NaN. >> -| The addition is performed according to the IEC/IEEE Standard for Binary >> -| Floating-Point Arithmetic. >> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Returns the result of adding the absolute values of the double-precision >> +floating-point values `a' and `b'. If `zSign' is 1, the sum is negated >> +before being returned. `zSign' is ignored if the result is a NaN. >> +The addition is performed according to the IEC/IEEE Standard for Binary >> +Floating-Point Arithmetic. >> +------------------------------------------------------------------------------- >> +*/ >> static float64 addFloat64Sigs( float64 a, float64 b, flag zSign STATUS_PARAM ) >> { >> int_fast16_t aExp, bExp, zExp; >> @@ -3346,14 +3465,15 @@ static float64 addFloat64Sigs( float64 a, float64 b, flag zSign STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns the result of subtracting the absolute values of the double- >> -| precision floating-point values `a' and `b'. If `zSign' is 1, the >> -| difference is negated before being returned. `zSign' is ignored if the >> -| result is a NaN. The subtraction is performed according to the IEC/IEEE >> -| Standard for Binary Floating-Point Arithmetic. >> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Returns the result of subtracting the absolute values of the double- >> +precision floating-point values `a' and `b'. If `zSign' is 1, the >> +difference is negated before being returned. `zSign' is ignored if the >> +result is a NaN. 
The subtraction is performed according to the IEC/IEEE >> +Standard for Binary Floating-Point Arithmetic. >> +------------------------------------------------------------------------------- >> +*/ >> static float64 subFloat64Sigs( float64 a, float64 b, flag zSign STATUS_PARAM ) >> { >> int_fast16_t aExp, bExp, zExp; >> @@ -3421,12 +3541,13 @@ static float64 subFloat64Sigs( float64 a, float64 b, flag zSign STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns the result of adding the double-precision floating-point values `a' >> -| and `b'. The operation is performed according to the IEC/IEEE Standard for >> -| Binary Floating-Point Arithmetic. >> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Returns the result of adding the double-precision floating-point values `a' >> +and `b'. The operation is performed according to the IEC/IEEE Standard for >> +Binary Floating-Point Arithmetic. >> +------------------------------------------------------------------------------- >> +*/ >> float64 float64_add( float64 a, float64 b STATUS_PARAM ) >> { >> flag aSign, bSign; >> @@ -3444,12 +3565,13 @@ float64 float64_add( float64 a, float64 b STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns the result of subtracting the double-precision floating-point values >> -| `a' and `b'. The operation is performed according to the IEC/IEEE Standard >> -| for Binary Floating-Point Arithmetic. >> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Returns the result of subtracting the double-precision floating-point values >> +`a' and `b'. 
The operation is performed according to the IEC/IEEE Standard >> +for Binary Floating-Point Arithmetic. >> +------------------------------------------------------------------------------- >> +*/ >> float64 float64_sub( float64 a, float64 b STATUS_PARAM ) >> { >> flag aSign, bSign; >> @@ -3467,12 +3589,13 @@ float64 float64_sub( float64 a, float64 b STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns the result of multiplying the double-precision floating-point values >> -| `a' and `b'. The operation is performed according to the IEC/IEEE Standard >> -| for Binary Floating-Point Arithmetic. >> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Returns the result of multiplying the double-precision floating-point values >> +`a' and `b'. The operation is performed according to the IEC/IEEE Standard >> +for Binary Floating-Point Arithmetic. >> +------------------------------------------------------------------------------- >> +*/ >> float64 float64_mul( float64 a, float64 b STATUS_PARAM ) >> { >> flag aSign, bSign, zSign; >> @@ -3528,12 +3651,13 @@ float64 float64_mul( float64 a, float64 b STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns the result of dividing the double-precision floating-point value `a' >> -| by the corresponding value `b'. The operation is performed according to >> -| the IEC/IEEE Standard for Binary Floating-Point Arithmetic. >> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Returns the result of dividing the double-precision floating-point value `a' >> +by the corresponding value `b'. 
The operation is performed according to >> +the IEC/IEEE Standard for Binary Floating-Point Arithmetic. >> +------------------------------------------------------------------------------- >> +*/ >> float64 float64_div( float64 a, float64 b STATUS_PARAM ) >> { >> flag aSign, bSign, zSign; >> @@ -3600,12 +3724,13 @@ float64 float64_div( float64 a, float64 b STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns the remainder of the double-precision floating-point value `a' >> -| with respect to the corresponding value `b'. The operation is performed >> -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. >> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Returns the remainder of the double-precision floating-point value `a' >> +with respect to the corresponding value `b'. The operation is performed >> +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. >> +------------------------------------------------------------------------------- >> +*/ >> float64 float64_rem( float64 a, float64 b STATUS_PARAM ) >> { >> flag aSign, zSign; >> @@ -3686,16 +3811,18 @@ float64 float64_rem( float64 a, float64 b STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns the result of multiplying the double-precision floating-point values >> -| `a' and `b' then adding 'c', with no intermediate rounding step after the >> -| multiplication. The operation is performed according to the IEC/IEEE >> -| Standard for Binary Floating-Point Arithmetic 754-2008. >> -| The flags argument allows the caller to select negation of the >> -| addend, the intermediate product, or the final result. 
(The difference >> -| between this and having the caller do a separate negation is that negating >> -| externally will flip the sign bit on NaNs.) >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Returns the result of multiplying the double-precision floating-point values >> +`a' and `b' then adding 'c', with no intermediate rounding step after the >> +multiplication. The operation is performed according to the IEC/IEEE >> +Standard for Binary Floating-Point Arithmetic 754-2008. >> +The flags argument allows the caller to select negation of the >> +addend, the intermediate product, or the final result. (The difference >> +between this and having the caller do a separate negation is that negating >> +externally will flip the sign bit on NaNs.) >> +------------------------------------------------------------------------------- >> +*/ >> >> float64 float64_muladd(float64 a, float64 b, float64 c, int flags STATUS_PARAM) >> { >> @@ -3912,12 +4039,13 @@ float64 float64_muladd(float64 a, float64 b, float64 c, int flags STATUS_PARAM) >> } >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns the square root of the double-precision floating-point value `a'. >> -| The operation is performed according to the IEC/IEEE Standard for Binary >> -| Floating-Point Arithmetic. >> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Returns the square root of the double-precision floating-point value `a'. >> +The operation is performed according to the IEC/IEEE Standard for Binary >> +Floating-Point Arithmetic. 
>> +------------------------------------------------------------------------------- >> +*/ >> float64 float64_sqrt( float64 a STATUS_PARAM ) >> { >> flag aSign; >> @@ -3964,11 +4092,13 @@ float64 float64_sqrt( float64 a STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns the binary log of the double-precision floating-point value `a'. >> -| The operation is performed according to the IEC/IEEE Standard for Binary >> -| Floating-Point Arithmetic. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Returns the binary log of the double-precision floating-point value `a'. >> +The operation is performed according to the IEC/IEEE Standard for Binary >> +Floating-Point Arithmetic. >> +------------------------------------------------------------------------------- >> +*/ >> float64 float64_log2( float64 a STATUS_PARAM ) >> { >> flag aSign, zSign; >> @@ -4011,12 +4141,14 @@ float64 float64_log2( float64 a STATUS_PARAM ) >> return normalizeRoundAndPackFloat64( zSign, 0x408, zSig STATUS_VAR ); >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns 1 if the double-precision floating-point value `a' is equal to the >> -| corresponding value `b', and 0 otherwise. The invalid exception is raised >> -| if either operand is a NaN. Otherwise, the comparison is performed >> -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Returns 1 if the double-precision floating-point value `a' is equal to the >> +corresponding value `b', and 0 otherwise. The invalid exception is raised >> +if either operand is a NaN. 
Otherwise, the comparison is performed >> +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. >> +------------------------------------------------------------------------------- >> +*/ >> >> int float64_eq( float64 a, float64 b STATUS_PARAM ) >> { >> @@ -4036,12 +4168,14 @@ int float64_eq( float64 a, float64 b STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns 1 if the double-precision floating-point value `a' is less than or >> -| equal to the corresponding value `b', and 0 otherwise. The invalid >> -| exception is raised if either operand is a NaN. The comparison is performed >> -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Returns 1 if the double-precision floating-point value `a' is less than or >> +equal to the corresponding value `b', and 0 otherwise. The invalid >> +exception is raised if either operand is a NaN. The comparison is performed >> +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. >> +------------------------------------------------------------------------------- >> +*/ >> >> int float64_le( float64 a, float64 b STATUS_PARAM ) >> { >> @@ -4065,12 +4199,14 @@ int float64_le( float64 a, float64 b STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns 1 if the double-precision floating-point value `a' is less than >> -| the corresponding value `b', and 0 otherwise. The invalid exception is >> -| raised if either operand is a NaN. The comparison is performed according >> -| to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
>> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Returns 1 if the double-precision floating-point value `a' is less than >> +the corresponding value `b', and 0 otherwise. The invalid exception is >> +raised if either operand is a NaN. The comparison is performed according >> +to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. >> +------------------------------------------------------------------------------- >> +*/ >> >> int float64_lt( float64 a, float64 b STATUS_PARAM ) >> { >> @@ -4094,12 +4230,14 @@ int float64_lt( float64 a, float64 b STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns 1 if the double-precision floating-point values `a' and `b' cannot >> -| be compared, and 0 otherwise. The invalid exception is raised if either >> -| operand is a NaN. The comparison is performed according to the IEC/IEEE >> -| Standard for Binary Floating-Point Arithmetic. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Returns 1 if the double-precision floating-point values `a' and `b' cannot >> +be compared, and 0 otherwise. The invalid exception is raised if either >> +operand is a NaN. The comparison is performed according to the IEC/IEEE >> +Standard for Binary Floating-Point Arithmetic. 
>> +------------------------------------------------------------------------------- >> +*/ >> >> int float64_unordered( float64 a, float64 b STATUS_PARAM ) >> { >> @@ -4115,12 +4253,14 @@ int float64_unordered( float64 a, float64 b STATUS_PARAM ) >> return 0; >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns 1 if the double-precision floating-point value `a' is equal to the >> -| corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an >> -| exception.The comparison is performed according to the IEC/IEEE Standard >> -| for Binary Floating-Point Arithmetic. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Returns 1 if the double-precision floating-point value `a' is equal to the >> +corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an >> +exception.The comparison is performed according to the IEC/IEEE Standard >> +for Binary Floating-Point Arithmetic. >> +------------------------------------------------------------------------------- >> +*/ >> >> int float64_eq_quiet( float64 a, float64 b STATUS_PARAM ) >> { >> @@ -4142,12 +4282,14 @@ int float64_eq_quiet( float64 a, float64 b STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns 1 if the double-precision floating-point value `a' is less than or >> -| equal to the corresponding value `b', and 0 otherwise. Quiet NaNs do not >> -| cause an exception. Otherwise, the comparison is performed according to the >> -| IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
>> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Returns 1 if the double-precision floating-point value `a' is less than or >> +equal to the corresponding value `b', and 0 otherwise. Quiet NaNs do not >> +cause an exception. Otherwise, the comparison is performed according to the >> +IEC/IEEE Standard for Binary Floating-Point Arithmetic. >> +------------------------------------------------------------------------------- >> +*/ >> >> int float64_le_quiet( float64 a, float64 b STATUS_PARAM ) >> { >> @@ -4173,12 +4315,14 @@ int float64_le_quiet( float64 a, float64 b STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns 1 if the double-precision floating-point value `a' is less than >> -| the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an >> -| exception. Otherwise, the comparison is performed according to the IEC/IEEE >> -| Standard for Binary Floating-Point Arithmetic. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Returns 1 if the double-precision floating-point value `a' is less than >> +the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an >> +exception. Otherwise, the comparison is performed according to the IEC/IEEE >> +Standard for Binary Floating-Point Arithmetic. 
>> +------------------------------------------------------------------------------- >> +*/ >> >> int float64_lt_quiet( float64 a, float64 b STATUS_PARAM ) >> { >> @@ -4204,12 +4348,14 @@ int float64_lt_quiet( float64 a, float64 b STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns 1 if the double-precision floating-point values `a' and `b' cannot >> -| be compared, and 0 otherwise. Quiet NaNs do not cause an exception. The >> -| comparison is performed according to the IEC/IEEE Standard for Binary >> -| Floating-Point Arithmetic. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Returns 1 if the double-precision floating-point values `a' and `b' cannot >> +be compared, and 0 otherwise. Quiet NaNs do not cause an exception. The >> +comparison is performed according to the IEC/IEEE Standard for Binary >> +Floating-Point Arithmetic. >> +------------------------------------------------------------------------------- >> +*/ >> >> int float64_unordered_quiet( float64 a, float64 b STATUS_PARAM ) >> { >> @@ -4227,16 +4373,17 @@ int float64_unordered_quiet( float64 a, float64 b STATUS_PARAM ) >> return 0; >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns the result of converting the extended double-precision floating- >> -| point value `a' to the 32-bit two's complement integer format. The >> -| conversion is performed according to the IEC/IEEE Standard for Binary >> -| Floating-Point Arithmetic---which means in particular that the conversion >> -| is rounded according to the current rounding mode. If `a' is a NaN, the >> -| largest positive integer is returned. Otherwise, if the conversion >> -| overflows, the largest integer with the same sign as `a' is returned. 
>> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Returns the result of converting the extended double-precision floating- >> +point value `a' to the 32-bit two's complement integer format. The >> +conversion is performed according to the IEC/IEEE Standard for Binary >> +Floating-Point Arithmetic---which means in particular that the conversion >> +is rounded according to the current rounding mode. If `a' is a NaN, the >> +largest positive integer is returned. Otherwise, if the conversion >> +overflows, the largest integer with the same sign as `a' is returned. >> +------------------------------------------------------------------------------- >> +*/ >> int32 floatx80_to_int32( floatx80 a STATUS_PARAM ) >> { >> flag aSign; >> @@ -4254,16 +4401,17 @@ int32 floatx80_to_int32( floatx80 a STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns the result of converting the extended double-precision floating- >> -| point value `a' to the 32-bit two's complement integer format. The >> -| conversion is performed according to the IEC/IEEE Standard for Binary >> -| Floating-Point Arithmetic, except that the conversion is always rounded >> -| toward zero. If `a' is a NaN, the largest positive integer is returned. >> -| Otherwise, if the conversion overflows, the largest integer with the same >> -| sign as `a' is returned. >> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Returns the result of converting the extended double-precision floating- >> +point value `a' to the 32-bit two's complement integer format. 
The >> +conversion is performed according to the IEC/IEEE Standard for Binary >> +Floating-Point Arithmetic, except that the conversion is always rounded >> +toward zero. If `a' is a NaN, the largest positive integer is returned. >> +Otherwise, if the conversion overflows, the largest integer with the same >> +sign as `a' is returned. >> +------------------------------------------------------------------------------- >> +*/ >> int32 floatx80_to_int32_round_to_zero( floatx80 a STATUS_PARAM ) >> { >> flag aSign; >> @@ -4299,16 +4447,17 @@ int32 floatx80_to_int32_round_to_zero( floatx80 a STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns the result of converting the extended double-precision floating- >> -| point value `a' to the 64-bit two's complement integer format. The >> -| conversion is performed according to the IEC/IEEE Standard for Binary >> -| Floating-Point Arithmetic---which means in particular that the conversion >> -| is rounded according to the current rounding mode. If `a' is a NaN, >> -| the largest positive integer is returned. Otherwise, if the conversion >> -| overflows, the largest integer with the same sign as `a' is returned. >> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Returns the result of converting the extended double-precision floating- >> +point value `a' to the 64-bit two's complement integer format. The >> +conversion is performed according to the IEC/IEEE Standard for Binary >> +Floating-Point Arithmetic---which means in particular that the conversion >> +is rounded according to the current rounding mode. If `a' is a NaN, >> +the largest positive integer is returned. Otherwise, if the conversion >> +overflows, the largest integer with the same sign as `a' is returned. 
>> +------------------------------------------------------------------------------- >> +*/ >> int64 floatx80_to_int64( floatx80 a STATUS_PARAM ) >> { >> flag aSign; >> @@ -4339,16 +4488,17 @@ int64 floatx80_to_int64( floatx80 a STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns the result of converting the extended double-precision floating- >> -| point value `a' to the 64-bit two's complement integer format. The >> -| conversion is performed according to the IEC/IEEE Standard for Binary >> -| Floating-Point Arithmetic, except that the conversion is always rounded >> -| toward zero. If `a' is a NaN, the largest positive integer is returned. >> -| Otherwise, if the conversion overflows, the largest integer with the same >> -| sign as `a' is returned. >> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Returns the result of converting the extended double-precision floating- >> +point value `a' to the 64-bit two's complement integer format. The >> +conversion is performed according to the IEC/IEEE Standard for Binary >> +Floating-Point Arithmetic, except that the conversion is always rounded >> +toward zero. If `a' is a NaN, the largest positive integer is returned. >> +Otherwise, if the conversion overflows, the largest integer with the same >> +sign as `a' is returned. 
>> +------------------------------------------------------------------------------- >> +*/ >> int64 floatx80_to_int64_round_to_zero( floatx80 a STATUS_PARAM ) >> { >> flag aSign; >> @@ -4383,13 +4533,14 @@ int64 floatx80_to_int64_round_to_zero( floatx80 a STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns the result of converting the extended double-precision floating- >> -| point value `a' to the single-precision floating-point format. The >> -| conversion is performed according to the IEC/IEEE Standard for Binary >> -| Floating-Point Arithmetic. >> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Returns the result of converting the extended double-precision floating- >> +point value `a' to the single-precision floating-point format. The >> +conversion is performed according to the IEC/IEEE Standard for Binary >> +Floating-Point Arithmetic. >> +------------------------------------------------------------------------------- >> +*/ >> float32 floatx80_to_float32( floatx80 a STATUS_PARAM ) >> { >> flag aSign; >> @@ -4411,13 +4562,14 @@ float32 floatx80_to_float32( floatx80 a STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns the result of converting the extended double-precision floating- >> -| point value `a' to the double-precision floating-point format. The >> -| conversion is performed according to the IEC/IEEE Standard for Binary >> -| Floating-Point Arithmetic. >> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Returns the result of converting the extended double-precision floating- >> +point value `a' to the double-precision floating-point format. 
The >> +conversion is performed according to the IEC/IEEE Standard for Binary >> +Floating-Point Arithmetic. >> +------------------------------------------------------------------------------- >> +*/ >> float64 floatx80_to_float64( floatx80 a STATUS_PARAM ) >> { >> flag aSign; >> @@ -4439,13 +4591,14 @@ float64 floatx80_to_float64( floatx80 a STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns the result of converting the extended double-precision floating- >> -| point value `a' to the quadruple-precision floating-point format. The >> -| conversion is performed according to the IEC/IEEE Standard for Binary >> -| Floating-Point Arithmetic. >> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Returns the result of converting the extended double-precision floating- >> +point value `a' to the quadruple-precision floating-point format. The >> +conversion is performed according to the IEC/IEEE Standard for Binary >> +Floating-Point Arithmetic. >> +------------------------------------------------------------------------------- >> +*/ >> float128 floatx80_to_float128( floatx80 a STATUS_PARAM ) >> { >> flag aSign; >> @@ -4463,13 +4616,14 @@ float128 floatx80_to_float128( floatx80 a STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Rounds the extended double-precision floating-point value `a' to an integer, >> -| and returns the result as an extended quadruple-precision floating-point >> -| value. The operation is performed according to the IEC/IEEE Standard for >> -| Binary Floating-Point Arithmetic. 
>> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Rounds the extended double-precision floating-point value `a' to an integer, >> +and returns the result as an extended quadruple-precision floating-point >> +value. The operation is performed according to the IEC/IEEE Standard for >> +Binary Floating-Point Arithmetic. >> +------------------------------------------------------------------------------- >> +*/ >> floatx80 floatx80_round_to_int( floatx80 a STATUS_PARAM ) >> { >> flag aSign; >> @@ -4536,14 +4690,15 @@ floatx80 floatx80_round_to_int( floatx80 a STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns the result of adding the absolute values of the extended double- >> -| precision floating-point values `a' and `b'. If `zSign' is 1, the sum is >> -| negated before being returned. `zSign' is ignored if the result is a NaN. >> -| The addition is performed according to the IEC/IEEE Standard for Binary >> -| Floating-Point Arithmetic. >> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Returns the result of adding the absolute values of the extended double- >> +precision floating-point values `a' and `b'. If `zSign' is 1, the sum is >> +negated before being returned. `zSign' is ignored if the result is a NaN. >> +The addition is performed according to the IEC/IEEE Standard for Binary >> +Floating-Point Arithmetic. 
>> +------------------------------------------------------------------------------- >> +*/ >> static floatx80 addFloatx80Sigs( floatx80 a, floatx80 b, flag zSign STATUS_PARAM) >> { >> int32 aExp, bExp, zExp; >> @@ -4602,14 +4757,15 @@ static floatx80 addFloatx80Sigs( floatx80 a, floatx80 b, flag zSign STATUS_PARAM >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns the result of subtracting the absolute values of the extended >> -| double-precision floating-point values `a' and `b'. If `zSign' is 1, the >> -| difference is negated before being returned. `zSign' is ignored if the >> -| result is a NaN. The subtraction is performed according to the IEC/IEEE >> -| Standard for Binary Floating-Point Arithmetic. >> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Returns the result of subtracting the absolute values of the extended >> +double-precision floating-point values `a' and `b'. If `zSign' is 1, the >> +difference is negated before being returned. `zSign' is ignored if the >> +result is a NaN. The subtraction is performed according to the IEC/IEEE >> +Standard for Binary Floating-Point Arithmetic. >> +------------------------------------------------------------------------------- >> +*/ >> static floatx80 subFloatx80Sigs( floatx80 a, floatx80 b, flag zSign STATUS_PARAM ) >> { >> int32 aExp, bExp, zExp; >> @@ -4670,12 +4826,13 @@ static floatx80 subFloatx80Sigs( floatx80 a, floatx80 b, flag zSign STATUS_PARAM >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns the result of adding the extended double-precision floating-point >> -| values `a' and `b'. The operation is performed according to the IEC/IEEE >> -| Standard for Binary Floating-Point Arithmetic. 
>> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Returns the result of adding the extended double-precision floating-point >> +values `a' and `b'. The operation is performed according to the IEC/IEEE >> +Standard for Binary Floating-Point Arithmetic. >> +------------------------------------------------------------------------------- >> +*/ >> floatx80 floatx80_add( floatx80 a, floatx80 b STATUS_PARAM ) >> { >> flag aSign, bSign; >> @@ -4691,12 +4848,13 @@ floatx80 floatx80_add( floatx80 a, floatx80 b STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns the result of subtracting the extended double-precision floating- >> -| point values `a' and `b'. The operation is performed according to the >> -| IEC/IEEE Standard for Binary Floating-Point Arithmetic. >> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Returns the result of subtracting the extended double-precision floating- >> +point values `a' and `b'. The operation is performed according to the >> +IEC/IEEE Standard for Binary Floating-Point Arithmetic. >> +------------------------------------------------------------------------------- >> +*/ >> floatx80 floatx80_sub( floatx80 a, floatx80 b STATUS_PARAM ) >> { >> flag aSign, bSign; >> @@ -4712,12 +4870,13 @@ floatx80 floatx80_sub( floatx80 a, floatx80 b STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns the result of multiplying the extended double-precision floating- >> -| point values `a' and `b'. The operation is performed according to the >> -| IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
>> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Returns the result of multiplying the extended double-precision floating- >> +point values `a' and `b'. The operation is performed according to the >> +IEC/IEEE Standard for Binary Floating-Point Arithmetic. >> +------------------------------------------------------------------------------- >> +*/ >> floatx80 floatx80_mul( floatx80 a, floatx80 b STATUS_PARAM ) >> { >> flag aSign, bSign, zSign; >> @@ -4771,12 +4930,13 @@ floatx80 floatx80_mul( floatx80 a, floatx80 b STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns the result of dividing the extended double-precision floating-point >> -| value `a' by the corresponding value `b'. The operation is performed >> -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. >> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Returns the result of dividing the extended double-precision floating-point >> +value `a' by the corresponding value `b'. The operation is performed >> +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. >> +------------------------------------------------------------------------------- >> +*/ >> floatx80 floatx80_div( floatx80 a, floatx80 b STATUS_PARAM ) >> { >> flag aSign, bSign, zSign; >> @@ -4851,12 +5011,13 @@ floatx80 floatx80_div( floatx80 a, floatx80 b STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns the remainder of the extended double-precision floating-point value >> -| `a' with respect to the corresponding value `b'. 
The operation is performed >> -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. >> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Returns the remainder of the extended double-precision floating-point value >> +`a' with respect to the corresponding value `b'. The operation is performed >> +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. >> +------------------------------------------------------------------------------- >> +*/ >> floatx80 floatx80_rem( floatx80 a, floatx80 b STATUS_PARAM ) >> { >> flag aSign, zSign; >> @@ -4947,12 +5108,13 @@ floatx80 floatx80_rem( floatx80 a, floatx80 b STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns the square root of the extended double-precision floating-point >> -| value `a'. The operation is performed according to the IEC/IEEE Standard >> -| for Binary Floating-Point Arithmetic. >> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Returns the square root of the extended double-precision floating-point >> +value `a'. The operation is performed according to the IEC/IEEE Standard >> +for Binary Floating-Point Arithmetic. >> +------------------------------------------------------------------------------- >> +*/ >> floatx80 floatx80_sqrt( floatx80 a STATUS_PARAM ) >> { >> flag aSign; >> @@ -5017,12 +5179,14 @@ floatx80 floatx80_sqrt( floatx80 a STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns 1 if the extended double-precision floating-point value `a' is equal >> -| to the corresponding value `b', and 0 otherwise. 
The invalid exception is >> -| raised if either operand is a NaN. Otherwise, the comparison is performed >> -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Returns 1 if the extended double-precision floating-point value `a' is equal >> +to the corresponding value `b', and 0 otherwise. The invalid exception is >> +raised if either operand is a NaN. Otherwise, the comparison is performed >> +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. >> +------------------------------------------------------------------------------- >> +*/ >> >> int floatx80_eq( floatx80 a, floatx80 b STATUS_PARAM ) >> { >> @@ -5044,13 +5208,15 @@ int floatx80_eq( floatx80 a, floatx80 b STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns 1 if the extended double-precision floating-point value `a' is >> -| less than or equal to the corresponding value `b', and 0 otherwise. The >> -| invalid exception is raised if either operand is a NaN. The comparison is >> -| performed according to the IEC/IEEE Standard for Binary Floating-Point >> -| Arithmetic. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Returns 1 if the extended double-precision floating-point value `a' is >> +less than or equal to the corresponding value `b', and 0 otherwise. The >> +invalid exception is raised if either operand is a NaN. The comparison is >> +performed according to the IEC/IEEE Standard for Binary Floating-Point >> +Arithmetic. 
>> +------------------------------------------------------------------------------- >> +*/ >> >> int floatx80_le( floatx80 a, floatx80 b STATUS_PARAM ) >> { >> @@ -5078,12 +5244,14 @@ int floatx80_le( floatx80 a, floatx80 b STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns 1 if the extended double-precision floating-point value `a' is >> -| less than the corresponding value `b', and 0 otherwise. The invalid >> -| exception is raised if either operand is a NaN. The comparison is performed >> -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Returns 1 if the extended double-precision floating-point value `a' is >> +less than the corresponding value `b', and 0 otherwise. The invalid >> +exception is raised if either operand is a NaN. The comparison is performed >> +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. >> +------------------------------------------------------------------------------- >> +*/ >> >> int floatx80_lt( floatx80 a, floatx80 b STATUS_PARAM ) >> { >> @@ -5111,12 +5279,14 @@ int floatx80_lt( floatx80 a, floatx80 b STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns 1 if the extended double-precision floating-point values `a' and `b' >> -| cannot be compared, and 0 otherwise. The invalid exception is raised if >> -| either operand is a NaN. The comparison is performed according to the >> -| IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
>> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Returns 1 if the extended double-precision floating-point values `a' and `b' >> +cannot be compared, and 0 otherwise. The invalid exception is raised if >> +either operand is a NaN. The comparison is performed according to the >> +IEC/IEEE Standard for Binary Floating-Point Arithmetic. >> +------------------------------------------------------------------------------- >> +*/ >> int floatx80_unordered( floatx80 a, floatx80 b STATUS_PARAM ) >> { >> if ( ( ( extractFloatx80Exp( a ) == 0x7FFF ) >> @@ -5130,12 +5300,14 @@ int floatx80_unordered( floatx80 a, floatx80 b STATUS_PARAM ) >> return 0; >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns 1 if the extended double-precision floating-point value `a' is >> -| equal to the corresponding value `b', and 0 otherwise. Quiet NaNs do not >> -| cause an exception. The comparison is performed according to the IEC/IEEE >> -| Standard for Binary Floating-Point Arithmetic. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Returns 1 if the extended double-precision floating-point value `a' is >> +equal to the corresponding value `b', and 0 otherwise. Quiet NaNs do not >> +cause an exception. The comparison is performed according to the IEC/IEEE >> +Standard for Binary Floating-Point Arithmetic. 
>> +------------------------------------------------------------------------------- >> +*/ >> >> int floatx80_eq_quiet( floatx80 a, floatx80 b STATUS_PARAM ) >> { >> @@ -5160,12 +5332,14 @@ int floatx80_eq_quiet( floatx80 a, floatx80 b STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns 1 if the extended double-precision floating-point value `a' is less >> -| than or equal to the corresponding value `b', and 0 otherwise. Quiet NaNs >> -| do not cause an exception. Otherwise, the comparison is performed according >> -| to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Returns 1 if the extended double-precision floating-point value `a' is less >> +than or equal to the corresponding value `b', and 0 otherwise. Quiet NaNs >> +do not cause an exception. Otherwise, the comparison is performed according >> +to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. >> +------------------------------------------------------------------------------- >> +*/ >> >> int floatx80_le_quiet( floatx80 a, floatx80 b STATUS_PARAM ) >> { >> @@ -5196,12 +5370,14 @@ int floatx80_le_quiet( floatx80 a, floatx80 b STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns 1 if the extended double-precision floating-point value `a' is less >> -| than the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause >> -| an exception. Otherwise, the comparison is performed according to the >> -| IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
>> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Returns 1 if the extended double-precision floating-point value `a' is less >> +than the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause >> +an exception. Otherwise, the comparison is performed according to the >> +IEC/IEEE Standard for Binary Floating-Point Arithmetic. >> +------------------------------------------------------------------------------- >> +*/ >> >> int floatx80_lt_quiet( floatx80 a, floatx80 b STATUS_PARAM ) >> { >> @@ -5232,12 +5408,14 @@ int floatx80_lt_quiet( floatx80 a, floatx80 b STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns 1 if the extended double-precision floating-point values `a' and `b' >> -| cannot be compared, and 0 otherwise. Quiet NaNs do not cause an exception. >> -| The comparison is performed according to the IEC/IEEE Standard for Binary >> -| Floating-Point Arithmetic. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Returns 1 if the extended double-precision floating-point values `a' and `b' >> +cannot be compared, and 0 otherwise. Quiet NaNs do not cause an exception. >> +The comparison is performed according to the IEC/IEEE Standard for Binary >> +Floating-Point Arithmetic. 
>> +------------------------------------------------------------------------------- >> +*/ >> int floatx80_unordered_quiet( floatx80 a, floatx80 b STATUS_PARAM ) >> { >> if ( ( ( extractFloatx80Exp( a ) == 0x7FFF ) >> @@ -5254,16 +5432,17 @@ int floatx80_unordered_quiet( floatx80 a, floatx80 b STATUS_PARAM ) >> return 0; >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns the result of converting the quadruple-precision floating-point >> -| value `a' to the 32-bit two's complement integer format. The conversion >> -| is performed according to the IEC/IEEE Standard for Binary Floating-Point >> -| Arithmetic---which means in particular that the conversion is rounded >> -| according to the current rounding mode. If `a' is a NaN, the largest >> -| positive integer is returned. Otherwise, if the conversion overflows, the >> -| largest integer with the same sign as `a' is returned. >> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Returns the result of converting the quadruple-precision floating-point >> +value `a' to the 32-bit two's complement integer format. The conversion >> +is performed according to the IEC/IEEE Standard for Binary Floating-Point >> +Arithmetic---which means in particular that the conversion is rounded >> +according to the current rounding mode. If `a' is a NaN, the largest >> +positive integer is returned. Otherwise, if the conversion overflows, the >> +largest integer with the same sign as `a' is returned. 
>> +------------------------------------------------------------------------------- >> +*/ >> int32 float128_to_int32( float128 a STATUS_PARAM ) >> { >> flag aSign; >> @@ -5283,16 +5462,17 @@ int32 float128_to_int32( float128 a STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns the result of converting the quadruple-precision floating-point >> -| value `a' to the 32-bit two's complement integer format. The conversion >> -| is performed according to the IEC/IEEE Standard for Binary Floating-Point >> -| Arithmetic, except that the conversion is always rounded toward zero. If >> -| `a' is a NaN, the largest positive integer is returned. Otherwise, if the >> -| conversion overflows, the largest integer with the same sign as `a' is >> -| returned. >> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Returns the result of converting the quadruple-precision floating-point >> +value `a' to the 32-bit two's complement integer format. The conversion >> +is performed according to the IEC/IEEE Standard for Binary Floating-Point >> +Arithmetic, except that the conversion is always rounded toward zero. If >> +`a' is a NaN, the largest positive integer is returned. Otherwise, if the >> +conversion overflows, the largest integer with the same sign as `a' is >> +returned. >> +------------------------------------------------------------------------------- >> +*/ >> int32 float128_to_int32_round_to_zero( float128 a STATUS_PARAM ) >> { >> flag aSign; >> @@ -5331,16 +5511,17 @@ int32 float128_to_int32_round_to_zero( float128 a STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns the result of converting the quadruple-precision floating-point >> -| value `a' to the 64-bit two's complement integer format. 
The conversion >> -| is performed according to the IEC/IEEE Standard for Binary Floating-Point >> -| Arithmetic---which means in particular that the conversion is rounded >> -| according to the current rounding mode. If `a' is a NaN, the largest >> -| positive integer is returned. Otherwise, if the conversion overflows, the >> -| largest integer with the same sign as `a' is returned. >> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Returns the result of converting the quadruple-precision floating-point >> +value `a' to the 64-bit two's complement integer format. The conversion >> +is performed according to the IEC/IEEE Standard for Binary Floating-Point >> +Arithmetic---which means in particular that the conversion is rounded >> +according to the current rounding mode. If `a' is a NaN, the largest >> +positive integer is returned. Otherwise, if the conversion overflows, the >> +largest integer with the same sign as `a' is returned. >> +------------------------------------------------------------------------------- >> +*/ >> int64 float128_to_int64( float128 a STATUS_PARAM ) >> { >> flag aSign; >> @@ -5374,16 +5555,17 @@ int64 float128_to_int64( float128 a STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns the result of converting the quadruple-precision floating-point >> -| value `a' to the 64-bit two's complement integer format. The conversion >> -| is performed according to the IEC/IEEE Standard for Binary Floating-Point >> -| Arithmetic, except that the conversion is always rounded toward zero. >> -| If `a' is a NaN, the largest positive integer is returned. Otherwise, if >> -| the conversion overflows, the largest integer with the same sign as `a' is >> -| returned. 
>> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Returns the result of converting the quadruple-precision floating-point >> +value `a' to the 64-bit two's complement integer format. The conversion >> +is performed according to the IEC/IEEE Standard for Binary Floating-Point >> +Arithmetic, except that the conversion is always rounded toward zero. >> +If `a' is a NaN, the largest positive integer is returned. Otherwise, if >> +the conversion overflows, the largest integer with the same sign as `a' is >> +returned. >> +------------------------------------------------------------------------------- >> +*/ >> int64 float128_to_int64_round_to_zero( float128 a STATUS_PARAM ) >> { >> flag aSign; >> @@ -5435,13 +5617,14 @@ int64 float128_to_int64_round_to_zero( float128 a STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns the result of converting the quadruple-precision floating-point >> -| value `a' to the single-precision floating-point format. The conversion >> -| is performed according to the IEC/IEEE Standard for Binary Floating-Point >> -| Arithmetic. >> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Returns the result of converting the quadruple-precision floating-point >> +value `a' to the single-precision floating-point format. The conversion >> +is performed according to the IEC/IEEE Standard for Binary Floating-Point >> +Arithmetic. 
>> +------------------------------------------------------------------------------- >> +*/ >> float32 float128_to_float32( float128 a STATUS_PARAM ) >> { >> flag aSign; >> @@ -5470,13 +5653,14 @@ float32 float128_to_float32( float128 a STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns the result of converting the quadruple-precision floating-point >> -| value `a' to the double-precision floating-point format. The conversion >> -| is performed according to the IEC/IEEE Standard for Binary Floating-Point >> -| Arithmetic. >> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Returns the result of converting the quadruple-precision floating-point >> +value `a' to the double-precision floating-point format. The conversion >> +is performed according to the IEC/IEEE Standard for Binary Floating-Point >> +Arithmetic. >> +------------------------------------------------------------------------------- >> +*/ >> float64 float128_to_float64( float128 a STATUS_PARAM ) >> { >> flag aSign; >> @@ -5503,13 +5687,14 @@ float64 float128_to_float64( float128 a STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns the result of converting the quadruple-precision floating-point >> -| value `a' to the extended double-precision floating-point format. The >> -| conversion is performed according to the IEC/IEEE Standard for Binary >> -| Floating-Point Arithmetic. >> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Returns the result of converting the quadruple-precision floating-point >> +value `a' to the extended double-precision floating-point format. 
The >> +conversion is performed according to the IEC/IEEE Standard for Binary >> +Floating-Point Arithmetic. >> +------------------------------------------------------------------------------- >> +*/ >> floatx80 float128_to_floatx80( float128 a STATUS_PARAM ) >> { >> flag aSign; >> @@ -5538,13 +5723,14 @@ floatx80 float128_to_floatx80( float128 a STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Rounds the quadruple-precision floating-point value `a' to an integer, and >> -| returns the result as a quadruple-precision floating-point value. The >> -| operation is performed according to the IEC/IEEE Standard for Binary >> -| Floating-Point Arithmetic. >> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Rounds the quadruple-precision floating-point value `a' to an integer, and >> +returns the result as a quadruple-precision floating-point value. The >> +operation is performed according to the IEC/IEEE Standard for Binary >> +Floating-Point Arithmetic. >> +------------------------------------------------------------------------------- >> +*/ >> float128 float128_round_to_int( float128 a STATUS_PARAM ) >> { >> flag aSign; >> @@ -5641,14 +5827,15 @@ float128 float128_round_to_int( float128 a STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns the result of adding the absolute values of the quadruple-precision >> -| floating-point values `a' and `b'. If `zSign' is 1, the sum is negated >> -| before being returned. `zSign' is ignored if the result is a NaN. >> -| The addition is performed according to the IEC/IEEE Standard for Binary >> -| Floating-Point Arithmetic. 
>> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Returns the result of adding the absolute values of the quadruple-precision >> +floating-point values `a' and `b'. If `zSign' is 1, the sum is negated >> +before being returned. `zSign' is ignored if the result is a NaN. >> +The addition is performed according to the IEC/IEEE Standard for Binary >> +Floating-Point Arithmetic. >> +------------------------------------------------------------------------------- >> +*/ >> static float128 addFloat128Sigs( float128 a, float128 b, flag zSign STATUS_PARAM) >> { >> int32 aExp, bExp, zExp; >> @@ -5727,14 +5914,15 @@ static float128 addFloat128Sigs( float128 a, float128 b, flag zSign STATUS_PARAM >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns the result of subtracting the absolute values of the quadruple- >> -| precision floating-point values `a' and `b'. If `zSign' is 1, the >> -| difference is negated before being returned. `zSign' is ignored if the >> -| result is a NaN. The subtraction is performed according to the IEC/IEEE >> -| Standard for Binary Floating-Point Arithmetic. >> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Returns the result of subtracting the absolute values of the quadruple- >> +precision floating-point values `a' and `b'. If `zSign' is 1, the >> +difference is negated before being returned. `zSign' is ignored if the >> +result is a NaN. The subtraction is performed according to the IEC/IEEE >> +Standard for Binary Floating-Point Arithmetic. 
>> +------------------------------------------------------------------------------- >> +*/ >> static float128 subFloat128Sigs( float128 a, float128 b, flag zSign STATUS_PARAM) >> { >> int32 aExp, bExp, zExp; >> @@ -5811,12 +5999,13 @@ static float128 subFloat128Sigs( float128 a, float128 b, flag zSign STATUS_PARAM >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns the result of adding the quadruple-precision floating-point values >> -| `a' and `b'. The operation is performed according to the IEC/IEEE Standard >> -| for Binary Floating-Point Arithmetic. >> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Returns the result of adding the quadruple-precision floating-point values >> +`a' and `b'. The operation is performed according to the IEC/IEEE Standard >> +for Binary Floating-Point Arithmetic. >> +------------------------------------------------------------------------------- >> +*/ >> float128 float128_add( float128 a, float128 b STATUS_PARAM ) >> { >> flag aSign, bSign; >> @@ -5832,12 +6021,13 @@ float128 float128_add( float128 a, float128 b STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns the result of subtracting the quadruple-precision floating-point >> -| values `a' and `b'. The operation is performed according to the IEC/IEEE >> -| Standard for Binary Floating-Point Arithmetic. >> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Returns the result of subtracting the quadruple-precision floating-point >> +values `a' and `b'. The operation is performed according to the IEC/IEEE >> +Standard for Binary Floating-Point Arithmetic. 
>> +------------------------------------------------------------------------------- >> +*/ >> float128 float128_sub( float128 a, float128 b STATUS_PARAM ) >> { >> flag aSign, bSign; >> @@ -5853,12 +6043,13 @@ float128 float128_sub( float128 a, float128 b STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns the result of multiplying the quadruple-precision floating-point >> -| values `a' and `b'. The operation is performed according to the IEC/IEEE >> -| Standard for Binary Floating-Point Arithmetic. >> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Returns the result of multiplying the quadruple-precision floating-point >> +values `a' and `b'. The operation is performed according to the IEC/IEEE >> +Standard for Binary Floating-Point Arithmetic. >> +------------------------------------------------------------------------------- >> +*/ >> float128 float128_mul( float128 a, float128 b STATUS_PARAM ) >> { >> flag aSign, bSign, zSign; >> @@ -5917,12 +6108,13 @@ float128 float128_mul( float128 a, float128 b STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns the result of dividing the quadruple-precision floating-point value >> -| `a' by the corresponding value `b'. The operation is performed according to >> -| the IEC/IEEE Standard for Binary Floating-Point Arithmetic. >> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Returns the result of dividing the quadruple-precision floating-point value >> +`a' by the corresponding value `b'. The operation is performed according to >> +the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
>> +------------------------------------------------------------------------------- >> +*/ >> float128 float128_div( float128 a, float128 b STATUS_PARAM ) >> { >> flag aSign, bSign, zSign; >> @@ -6001,12 +6193,13 @@ float128 float128_div( float128 a, float128 b STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns the remainder of the quadruple-precision floating-point value `a' >> -| with respect to the corresponding value `b'. The operation is performed >> -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. >> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Returns the remainder of the quadruple-precision floating-point value `a' >> +with respect to the corresponding value `b'. The operation is performed >> +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. >> +------------------------------------------------------------------------------- >> +*/ >> float128 float128_rem( float128 a, float128 b STATUS_PARAM ) >> { >> flag aSign, zSign; >> @@ -6110,12 +6303,13 @@ float128 float128_rem( float128 a, float128 b STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns the square root of the quadruple-precision floating-point value `a'. >> -| The operation is performed according to the IEC/IEEE Standard for Binary >> -| Floating-Point Arithmetic. >> -*----------------------------------------------------------------------------*/ >> - >> +/* >> +------------------------------------------------------------------------------- >> +Returns the square root of the quadruple-precision floating-point value `a'. >> +The operation is performed according to the IEC/IEEE Standard for Binary >> +Floating-Point Arithmetic. 
>> +------------------------------------------------------------------------------- >> +*/ >> float128 float128_sqrt( float128 a STATUS_PARAM ) >> { >> flag aSign; >> @@ -6179,12 +6373,14 @@ float128 float128_sqrt( float128 a STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns 1 if the quadruple-precision floating-point value `a' is equal to >> -| the corresponding value `b', and 0 otherwise. The invalid exception is >> -| raised if either operand is a NaN. Otherwise, the comparison is performed >> -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Returns 1 if the quadruple-precision floating-point value `a' is equal to >> +the corresponding value `b', and 0 otherwise. The invalid exception is >> +raised if either operand is a NaN. Otherwise, the comparison is performed >> +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. >> +------------------------------------------------------------------------------- >> +*/ >> >> int float128_eq( float128 a, float128 b STATUS_PARAM ) >> { >> @@ -6206,12 +6402,14 @@ int float128_eq( float128 a, float128 b STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns 1 if the quadruple-precision floating-point value `a' is less than >> -| or equal to the corresponding value `b', and 0 otherwise. The invalid >> -| exception is raised if either operand is a NaN. The comparison is performed >> -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
>> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Returns 1 if the quadruple-precision floating-point value `a' is less than >> +or equal to the corresponding value `b', and 0 otherwise. The invalid >> +exception is raised if either operand is a NaN. The comparison is performed >> +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. >> +------------------------------------------------------------------------------- >> +*/ >> >> int float128_le( float128 a, float128 b STATUS_PARAM ) >> { >> @@ -6239,12 +6437,14 @@ int float128_le( float128 a, float128 b STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns 1 if the quadruple-precision floating-point value `a' is less than >> -| the corresponding value `b', and 0 otherwise. The invalid exception is >> -| raised if either operand is a NaN. The comparison is performed according >> -| to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Returns 1 if the quadruple-precision floating-point value `a' is less than >> +the corresponding value `b', and 0 otherwise. The invalid exception is >> +raised if either operand is a NaN. The comparison is performed according >> +to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
>> +------------------------------------------------------------------------------- >> +*/ >> >> int float128_lt( float128 a, float128 b STATUS_PARAM ) >> { >> @@ -6272,12 +6472,14 @@ int float128_lt( float128 a, float128 b STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns 1 if the quadruple-precision floating-point values `a' and `b' cannot >> -| be compared, and 0 otherwise. The invalid exception is raised if either >> -| operand is a NaN. The comparison is performed according to the IEC/IEEE >> -| Standard for Binary Floating-Point Arithmetic. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Returns 1 if the quadruple-precision floating-point values `a' and `b' cannot >> +be compared, and 0 otherwise. The invalid exception is raised if either >> +operand is a NaN. The comparison is performed according to the IEC/IEEE >> +Standard for Binary Floating-Point Arithmetic. >> +------------------------------------------------------------------------------- >> +*/ >> >> int float128_unordered( float128 a, float128 b STATUS_PARAM ) >> { >> @@ -6292,12 +6494,14 @@ int float128_unordered( float128 a, float128 b STATUS_PARAM ) >> return 0; >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns 1 if the quadruple-precision floating-point value `a' is equal to >> -| the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an >> -| exception. The comparison is performed according to the IEC/IEEE Standard >> -| for Binary Floating-Point Arithmetic. 
>> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Returns 1 if the quadruple-precision floating-point value `a' is equal to >> +the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an >> +exception. The comparison is performed according to the IEC/IEEE Standard >> +for Binary Floating-Point Arithmetic. >> +------------------------------------------------------------------------------- >> +*/ >> >> int float128_eq_quiet( float128 a, float128 b STATUS_PARAM ) >> { >> @@ -6322,12 +6526,14 @@ int float128_eq_quiet( float128 a, float128 b STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns 1 if the quadruple-precision floating-point value `a' is less than >> -| or equal to the corresponding value `b', and 0 otherwise. Quiet NaNs do not >> -| cause an exception. Otherwise, the comparison is performed according to the >> -| IEC/IEEE Standard for Binary Floating-Point Arithmetic. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Returns 1 if the quadruple-precision floating-point value `a' is less than >> +or equal to the corresponding value `b', and 0 otherwise. Quiet NaNs do not >> +cause an exception. Otherwise, the comparison is performed according to the >> +IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
>> +------------------------------------------------------------------------------- >> +*/ >> >> int float128_le_quiet( float128 a, float128 b STATUS_PARAM ) >> { >> @@ -6358,12 +6564,14 @@ int float128_le_quiet( float128 a, float128 b STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns 1 if the quadruple-precision floating-point value `a' is less than >> -| the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an >> -| exception. Otherwise, the comparison is performed according to the IEC/IEEE >> -| Standard for Binary Floating-Point Arithmetic. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Returns 1 if the quadruple-precision floating-point value `a' is less than >> +the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an >> +exception. Otherwise, the comparison is performed according to the IEC/IEEE >> +Standard for Binary Floating-Point Arithmetic. >> +------------------------------------------------------------------------------- >> +*/ >> >> int float128_lt_quiet( float128 a, float128 b STATUS_PARAM ) >> { >> @@ -6394,12 +6602,14 @@ int float128_lt_quiet( float128 a, float128 b STATUS_PARAM ) >> >> } >> >> -/*---------------------------------------------------------------------------- >> -| Returns 1 if the quadruple-precision floating-point values `a' and `b' cannot >> -| be compared, and 0 otherwise. Quiet NaNs do not cause an exception. The >> -| comparison is performed according to the IEC/IEEE Standard for Binary >> -| Floating-Point Arithmetic. 
>> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Returns 1 if the quadruple-precision floating-point values `a' and `b' cannot >> +be compared, and 0 otherwise. Quiet NaNs do not cause an exception. The >> +comparison is performed according to the IEC/IEEE Standard for Binary >> +Floating-Point Arithmetic. >> +------------------------------------------------------------------------------- >> +*/ >> >> int float128_unordered_quiet( float128 a, float128 b STATUS_PARAM ) >> { >> diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h >> index f3927e2..b646621 100644 >> --- a/include/fpu/softfloat.h >> +++ b/include/fpu/softfloat.h >> @@ -4,10 +4,11 @@ >> * Derived from SoftFloat. >> */ >> >> -/*============================================================================ >> +/* >> +============================================================================ >> >> -This C header file is part of the SoftFloat IEC/IEEE Floating-point Arithmetic >> -Package, Release 2b. >> +This C header file is part of the SoftFloat IEC/IEEE Floating-point >> +Arithmetic Package, Release 2a. >> >> Written by John R. Hauser. This work was made possible in part by the >> International Computer Science Institute, located at Suite 600, 1947 Center >> @@ -16,24 +17,22 @@ National Science Foundation under grant MIP-9311980. The original version >> of this code was written as part of a project to build a fixed-point vector >> processor in collaboration with the University of California at Berkeley, >> overseen by Profs. Nelson Morgan and John Wawrzynek. More information >> -is available through the Web page `http://www.cs.berkeley.edu/~jhauser/ >> +is available through the Web page `http://HTTP.CS.Berkeley.EDU/~jhauser/ >> arithmetic/SoftFloat.html'. >> >> -THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. 
Although reasonable effort has >> -been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT TIMES >> -RESULT IN INCORRECT BEHAVIOR. USE OF THIS SOFTWARE IS RESTRICTED TO PERSONS >> -AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ALL LOSSES, >> -COSTS, OR OTHER PROBLEMS THEY INCUR DUE TO THE SOFTWARE, AND WHO FURTHERMORE >> -EFFECTIVELY INDEMNIFY JOHN HAUSER AND THE INTERNATIONAL COMPUTER SCIENCE >> -INSTITUTE (possibly via similar legal warning) AGAINST ALL LOSSES, COSTS, OR >> -OTHER PROBLEMS INCURRED BY THEIR CUSTOMERS AND CLIENTS DUE TO THE SOFTWARE. >> +THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort >> +has been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT >> +TIMES RESULT IN INCORRECT BEHAVIOR. USE OF THIS SOFTWARE IS RESTRICTED TO >> +PERSONS AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ANY >> +AND ALL LOSSES, COSTS, OR OTHER PROBLEMS ARISING FROM ITS USE. >> >> Derivative works are acceptable, even for commercial purposes, so long as >> -(1) the source code for the derivative work includes prominent notice that >> -the work is derivative, and (2) the source code includes prominent notice with >> -these four paragraphs for those parts of this code that are retained. >> +(1) they include prominent notice that the work is derivative, and (2) they >> +include prominent notice akin to these four paragraphs for those parts of >> +this code that are retained. >> >> -=============================================================================*/ >> +=============================================================================== >> +*/ >> >> #ifndef SOFTFLOAT_H >> #define SOFTFLOAT_H >> @@ -46,14 +45,16 @@ these four paragraphs for those parts of this code that are retained. 
>> #include "config-host.h" >> #include "qemu/osdep.h" >> >> -/*---------------------------------------------------------------------------- >> -| Each of the following `typedef's defines the most convenient type that holds >> -| integers of at least as many bits as specified. For example, `uint8' should >> -| be the most convenient type that can hold unsigned integers of as many as >> -| 8 bits. The `flag' type must be able to hold either a 0 or 1. For most >> -| implementations of C, `flag', `uint8', and `int8' should all be `typedef'ed >> -| to the same as `int'. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Each of the following `typedef's defines the most convenient type that holds >> +integers of at least as many bits as specified. For example, `uint8' should >> +be the most convenient type that can hold unsigned integers of as many as >> +8 bits. The `flag' type must be able to hold either a 0 or 1. For most >> +implementations of C, `flag', `uint8', and `int8' should all be `typedef'ed >> +to the same as `int'. 
>> +------------------------------------------------------------------------------- >> +*/ >> typedef uint8_t flag; >> typedef uint8_t uint8; >> typedef int8_t int8; >> @@ -69,9 +70,11 @@ typedef int64_t int64; >> #define STATUS(field) status->field >> #define STATUS_VAR , status >> >> -/*---------------------------------------------------------------------------- >> -| Software IEC/IEEE floating-point ordering relations >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Software IEC/IEEE floating-point ordering relations >> +------------------------------------------------------------------------------- >> +*/ >> enum { >> float_relation_less = -1, >> float_relation_equal = 0, >> @@ -79,9 +82,11 @@ enum { >> float_relation_unordered = 2 >> }; >> >> -/*---------------------------------------------------------------------------- >> -| Software IEC/IEEE floating-point types. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Software IEC/IEEE floating-point types. >> +------------------------------------------------------------------------------- >> +*/ >> /* Use structures for soft-float types. This prevents accidentally mixing >> them with native int/float types. A sufficiently clever compiler and >> sane ABI should be able to see though these structs. However >> @@ -137,17 +142,21 @@ typedef struct { >> #define make_float128(high_, low_) ((float128) { .high = high_, .low = low_ }) >> #define make_float128_init(high_, low_) { .high = high_, .low = low_ } >> >> -/*---------------------------------------------------------------------------- >> -| Software IEC/IEEE floating-point underflow tininess-detection mode. 
>> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Software IEC/IEEE floating-point underflow tininess-detection mode. >> +------------------------------------------------------------------------------- >> +*/ >> enum { >> float_tininess_after_rounding = 0, >> float_tininess_before_rounding = 1 >> }; >> >> -/*---------------------------------------------------------------------------- >> -| Software IEC/IEEE floating-point rounding mode. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Software IEC/IEEE floating-point rounding mode. >> +------------------------------------------------------------------------------- >> +*/ >> enum { >> float_round_nearest_even = 0, >> float_round_down = 1, >> @@ -155,9 +164,11 @@ enum { >> float_round_to_zero = 3 >> }; >> >> -/*---------------------------------------------------------------------------- >> -| Software IEC/IEEE floating-point exception flags. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Software IEC/IEEE floating-point exception flags. 
>> +------------------------------------------------------------------------------- >> +*/ >> enum { >> float_flag_invalid = 1, >> float_flag_divbyzero = 4, >> @@ -167,7 +178,6 @@ enum { >> float_flag_input_denormal = 64, >> float_flag_output_denormal = 128 >> }; >> - >> typedef struct float_status { >> signed char float_detect_tininess; >> signed char float_rounding_mode; >> @@ -204,27 +214,33 @@ INLINE int get_float_exception_flags(float_status *status) >> } >> void set_floatx80_rounding_precision(int val STATUS_PARAM); >> >> -/*---------------------------------------------------------------------------- >> -| Routine to raise any or all of the software IEC/IEEE floating-point >> -| exception flags. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Routine to raise any or all of the software IEC/IEEE floating-point >> +exception flags. >> +------------------------------------------------------------------------------- >> +*/ >> void float_raise( int8 flags STATUS_PARAM); >> >> -/*---------------------------------------------------------------------------- >> -| Options to indicate which negations to perform in float*_muladd() >> -| Using these differs from negating an input or output before calling >> -| the muladd function in that this means that a NaN doesn't have its >> -| sign bit inverted before it is propagated. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Options to indicate which negations to perform in float*_muladd() >> +Using these differs from negating an input or output before calling >> +the muladd function in that this means that a NaN doesn't have its >> +sign bit inverted before it is propagated. 
>> +------------------------------------------------------------------------------- >> +*/ >> enum { >> float_muladd_negate_c = 1, >> float_muladd_negate_product = 2, >> float_muladd_negate_result = 4, >> }; >> >> -/*---------------------------------------------------------------------------- >> -| Software IEC/IEEE integer-to-floating-point conversion routines. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Software IEC/IEEE integer-to-floating-point conversion routines. >> +------------------------------------------------------------------------------- >> +*/ >> float32 int32_to_float32( int32 STATUS_PARAM ); >> float64 int32_to_float64( int32 STATUS_PARAM ); >> float32 uint32_to_float32( uint32 STATUS_PARAM ); >> @@ -239,15 +255,19 @@ floatx80 int64_to_floatx80( int64 STATUS_PARAM ); >> float128 int64_to_float128( int64 STATUS_PARAM ); >> float128 uint64_to_float128( uint64 STATUS_PARAM ); >> >> -/*---------------------------------------------------------------------------- >> -| Software half-precision conversion routines. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Software half-precision conversion routines. >> +*---------------------------------------------------------------------------- >> +*/ >> float16 float32_to_float16( float32, flag STATUS_PARAM ); >> float32 float16_to_float32( float16, flag STATUS_PARAM ); >> >> -/*---------------------------------------------------------------------------- >> -| Software half-precision operations. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Software half-precision operations. 
>> +------------------------------------------------------------------------------- >> +*/ >> int float16_is_quiet_nan( float16 ); >> int float16_is_signaling_nan( float16 ); >> float16 float16_maybe_silence_nan( float16 ); >> @@ -257,14 +277,18 @@ INLINE int float16_is_any_nan(float16 a) >> return ((float16_val(a) & ~0x8000) > 0x7c00); >> } >> >> -/*---------------------------------------------------------------------------- >> -| The pattern for a default generated half-precision NaN. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +The pattern for a default generated half-precision NaN. >> +------------------------------------------------------------------------------- >> +*/ >> extern const float16 float16_default_nan; >> >> -/*---------------------------------------------------------------------------- >> -| Software IEC/IEEE single-precision conversion routines. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Software IEC/IEEE single-precision conversion routines. >> +------------------------------------------------------------------------------- >> +*/ >> int_fast16_t float32_to_int16_round_to_zero(float32 STATUS_PARAM); >> uint_fast16_t float32_to_uint16_round_to_zero(float32 STATUS_PARAM); >> int32 float32_to_int32( float32 STATUS_PARAM ); >> @@ -277,9 +301,11 @@ float64 float32_to_float64( float32 STATUS_PARAM ); >> floatx80 float32_to_floatx80( float32 STATUS_PARAM ); >> float128 float32_to_float128( float32 STATUS_PARAM ); >> >> -/*---------------------------------------------------------------------------- >> -| Software IEC/IEEE single-precision operations. 
>> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Software IEC/IEEE single-precision operations. >> +------------------------------------------------------------------------------- >> +*/ >> float32 float32_round_to_int( float32 STATUS_PARAM ); >> float32 float32_add( float32, float32 STATUS_PARAM ); >> float32 float32_sub( float32, float32 STATUS_PARAM ); >> @@ -361,14 +387,18 @@ INLINE float32 float32_set_sign(float32 a, int sign) >> #define float32_infinity make_float32(0x7f800000) >> >> >> -/*---------------------------------------------------------------------------- >> -| The pattern for a default generated single-precision NaN. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +The pattern for a default generated single-precision NaN. >> +------------------------------------------------------------------------------- >> +*/ >> extern const float32 float32_default_nan; >> >> -/*---------------------------------------------------------------------------- >> -| Software IEC/IEEE double-precision conversion routines. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Software IEC/IEEE double-precision conversion routines. 
>> +------------------------------------------------------------------------------- >> +*/ >> int_fast16_t float64_to_int16_round_to_zero(float64 STATUS_PARAM); >> uint_fast16_t float64_to_uint16_round_to_zero(float64 STATUS_PARAM); >> int32 float64_to_int32( float64 STATUS_PARAM ); >> @@ -383,9 +413,11 @@ float32 float64_to_float32( float64 STATUS_PARAM ); >> floatx80 float64_to_floatx80( float64 STATUS_PARAM ); >> float128 float64_to_float128( float64 STATUS_PARAM ); >> >> -/*---------------------------------------------------------------------------- >> -| Software IEC/IEEE double-precision operations. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Software IEC/IEEE double-precision operations. >> +------------------------------------------------------------------------------- >> +*/ >> float64 float64_round_to_int( float64 STATUS_PARAM ); >> float64 float64_trunc_to_int( float64 STATUS_PARAM ); >> float64 float64_add( float64, float64 STATUS_PARAM ); >> @@ -467,14 +499,18 @@ INLINE float64 float64_set_sign(float64 a, int sign) >> #define float64_half make_float64(0x3fe0000000000000LL) >> #define float64_infinity make_float64(0x7ff0000000000000LL) >> >> -/*---------------------------------------------------------------------------- >> -| The pattern for a default generated double-precision NaN. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +The pattern for a default generated double-precision NaN. >> +------------------------------------------------------------------------------- >> +*/ >> extern const float64 float64_default_nan; >> >> -/*---------------------------------------------------------------------------- >> -| Software IEC/IEEE extended double-precision conversion routines. 
>> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Software IEC/IEEE extended double-precision conversion routines. >> +------------------------------------------------------------------------------- >> +*/ >> int32 floatx80_to_int32( floatx80 STATUS_PARAM ); >> int32 floatx80_to_int32_round_to_zero( floatx80 STATUS_PARAM ); >> int64 floatx80_to_int64( floatx80 STATUS_PARAM ); >> @@ -483,9 +519,11 @@ float32 floatx80_to_float32( floatx80 STATUS_PARAM ); >> float64 floatx80_to_float64( floatx80 STATUS_PARAM ); >> float128 floatx80_to_float128( floatx80 STATUS_PARAM ); >> >> -/*---------------------------------------------------------------------------- >> -| Software IEC/IEEE extended double-precision operations. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Software IEC/IEEE extended double-precision operations. >> +------------------------------------------------------------------------------- >> +*/ >> floatx80 floatx80_round_to_int( floatx80 STATUS_PARAM ); >> floatx80 floatx80_add( floatx80, floatx80 STATUS_PARAM ); >> floatx80 floatx80_sub( floatx80, floatx80 STATUS_PARAM ); >> @@ -552,14 +590,18 @@ INLINE int floatx80_is_any_nan(floatx80 a) >> #define floatx80_half make_floatx80(0x3ffe, 0x8000000000000000LL) >> #define floatx80_infinity make_floatx80(0x7fff, 0x8000000000000000LL) >> >> -/*---------------------------------------------------------------------------- >> -| The pattern for a default generated extended double-precision NaN. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +The pattern for a default generated extended double-precision NaN. 
>> +------------------------------------------------------------------------------- >> +*/ >> extern const floatx80 floatx80_default_nan; >> >> -/*---------------------------------------------------------------------------- >> -| Software IEC/IEEE quadruple-precision conversion routines. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Software IEC/IEEE quadruple-precision conversion routines. >> +------------------------------------------------------------------------------- >> +*/ >> int32 float128_to_int32( float128 STATUS_PARAM ); >> int32 float128_to_int32_round_to_zero( float128 STATUS_PARAM ); >> int64 float128_to_int64( float128 STATUS_PARAM ); >> @@ -568,9 +610,11 @@ float32 float128_to_float32( float128 STATUS_PARAM ); >> float64 float128_to_float64( float128 STATUS_PARAM ); >> floatx80 float128_to_floatx80( float128 STATUS_PARAM ); >> >> -/*---------------------------------------------------------------------------- >> -| Software IEC/IEEE quadruple-precision operations. >> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +Software IEC/IEEE quadruple-precision operations. >> +------------------------------------------------------------------------------- >> +*/ >> float128 float128_round_to_int( float128 STATUS_PARAM ); >> float128 float128_add( float128, float128 STATUS_PARAM ); >> float128 float128_sub( float128, float128 STATUS_PARAM ); >> @@ -633,9 +677,11 @@ INLINE int float128_is_any_nan(float128 a) >> >> #define float128_zero make_float128(0, 0) >> >> -/*---------------------------------------------------------------------------- >> -| The pattern for a default generated quadruple-precision NaN. 
>> -*----------------------------------------------------------------------------*/ >> +/* >> +------------------------------------------------------------------------------- >> +The pattern for a default generated quadruple-precision NaN. >> +------------------------------------------------------------------------------- >> +*/ >> extern const float128 float128_default_nan; >> >> #endif /* !SOFTFLOAT_H */ >>
Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 29/04/2013 20:05, Anthony Liguori wrote:
>> N.B. If you are on CC, see after the '---' for a requested action!
>>
>> The license of SoftFloat-2b is claimed to be GPLv2 incompatible by
>> the FSF due to an indemnification clause. The previous release,
>> SoftFloat-2a, did not contain this clause. The only changes between
>> these two versions as far as QEMU is concerned is the license change
>> and a global modification of the comment structure. This patch rebases
>> our softfloat code to SoftFloat-2a in order to have a GPLv2 compatible
>> license.
>>
>> Please note, this is a comment-only change. The resulting binary should
>> be the same.
>>
>> I created this patch using the following strategy:
>>
>> 1) Create a branch using the original import of softfloat code:
>> $ git checkout 158142c2c2df728cfa3b5320c65534921a764f26
>>
>> 2) Remove carriage returns from Softfloat-2b
>>
>> 3) Compare each of the softfloat files against Softfloat-2b using the
>> following mapping to generate Fabrice's original softfloat changes:
>>
>> - fpu/softfloat.c -> softfloat/bits64/softfloat.c
>> - fpu/softfloat.h -> softfloat/bits64/386-Win32-gcc/softfloat.h
>> - fpu/softfloat-macros.h -> softfloat/bits64/softfloat-macros
>> - fpu/softfloat-specialize.h -> softfloat/bits64/386-Win32-gcc/softfloat-specialize
>>
>> 4) Replace our softfloat files with the corresponding files from Softfloat-2a
>>
>> 5) Apply the diffs from (3) to (4) and commit
>>
>> 6) Create a diff between (5) and 158142c2c2df728cfa3b5320c65534921a764f26
>> - This diff consists 100% of licensing change + comment reformating
>>
>> 7) Checkout the latest master branch, apply the diff from (6)
>> - There were a lot of comment rejects, confirmed this was only comments
>> and then used an emacs macro to rewrite the comments to the Softfloat-2a
>> form.
>>
>> Cc: Andreas Färber <afaerber@suse.de>
>> Cc: Aurelien Jarno <aurelien@aurel32.net>
>> Cc: Avi Kivity <avi.kivity@gmail.com>
>> Cc: Ben Taylor <bentaylor.solx86@gmail.com>
>> Cc: Blue Swirl <blauwirbel@gmail.com>
>> Cc: Christophe Lyon <christophe.lyon@st.com>
>> Cc: Fabrice Bellard <fabrice@bellard.org>
>> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
>> Cc: Jocelyn Mayer <l_indien@magic.fr>
>> Cc: Juan Quintela <quintela@redhat.com>
>> Cc: malc <av1474@comtv.ru>
>> Cc: Max Filippov <jcmvbkbc@gmail.com>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Cc: Paul Brook <paul@codesourcery.com>
>> Cc: Peter Maydell <peter.maydell@linaro.org>
>> Cc: Richard Henderson <rth@twiddle.net>
>> Cc: Richard Sandiford <rdsandiford@googlemail.com>
>> Cc: Stefan Weil <weil@mail.berlios.de>
>> Cc: Thiemo Seufer <ths@networkno.de>
>> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
>> ---
>> In order to make this change, we need to relicense all contributions
>> from initial import of the SoftFloat code to match the license of
>> SoftFloat-2a (instead of the implied SoftFloat-2b license).
>
> All Red Hat contributions (at least Avi, Juan, me; don't know about rth)
> are available under GPLv2+; also other authors agreed on it. For this
> particular license,
>
> Acked-by: Paolo Bonzini <pbonzini@redhat.com>

Acked-by: Juan Quintela <quintela@redhat.com>

As said by Paolo, any contribution by me is under GPLv2+ O:-)

Anthony, thanks for the effort.

Later, Juan.
On 29.04.2013 20:05, Anthony Liguori wrote:
> N.B. If you are on CC, see after the '---' for a requested action!
>
> The license of SoftFloat-2b is claimed to be GPLv2 incompatible by
> the FSF due to an indemnification clause. The previous release,
> SoftFloat-2a, did not contain this clause. The only changes between
> these two versions as far as QEMU is concerned is the license change
> and a global modification of the comment structure. This patch rebases
> our softfloat code to SoftFloat-2a in order to have a GPLv2 compatible
> license.
> ...
> ---
> In order to make this change, we need to relicense all contributions
> from initial import of the SoftFloat code to match the license of
> SoftFloat-2a (instead of the implied SoftFloat-2b license).
>
> If you are on CC, it is because you have contributed to the softfloat
> code in QEMU. Please response to this note with:
>
> Acked-by: Your Name <your@email.com>
>
> To significant that you are able and willing to relicense your changes
> to the SoftFloat-1a license (or a GPL compatible license).
>
> Please respond no later than May 6th, 2013. If we are unable to confirm
> relicense from an author, changes from that author will be reverted.
> ---
>

Acked-by: Stefan Weil <sw@weilnetz.de>

(weil@mail.berlios.de was my former mail address.
It still can be used because berlios was not closed)

All my contributions to QEMU may be used with GPLv2+.

Regards,
Stefan
Anthony Liguori <aliguori@us.ibm.com> writes:
> In order to make this change, we need to relicense all contributions
> from initial import of the SoftFloat code to match the license of
> SoftFloat-2a (instead of the implied SoftFloat-2b license).
>
> If you are on CC, it is because you have contributed to the softfloat
> code in QEMU. Please response to this note with:
>
> Acked-by: Your Name <your@email.com>
>
> To significant that you are able and willing to relicense your changes
> to the SoftFloat-1a license (or a GPL compatible license).
>
> Please respond no later than May 6th, 2013. If we are unable to confirm
> relicense from an author, changes from that author will be reverted.

Acked-by: Richard Sandiford <rdsandiford@googlemail.com>

Richard
On Apr 29, 2013 9:06 PM, "Anthony Liguori" <aliguori@us.ibm.com> wrote:
>
> In order to make this change, we need to relicense all contributions
> from initial import of the SoftFloat code to match the license of
> SoftFloat-2a (instead of the implied SoftFloat-2b license).

Acked-by: Avi Kivity <avi.kivity@gmail.com>
Acked-by: Your Name bentaylor.solx86@gmail.com if needed.

It looked like my and Juergen's additions may have already been deprecated.

Ben

On Mon, Apr 29, 2013 at 2:05 PM, Anthony Liguori <aliguori@us.ibm.com> wrote:

> N.B. If you are on CC, see after the '---' for a requested action!
>
> The license of SoftFloat-2b is claimed to be GPLv2 incompatible by
> the FSF due to an indemnification clause. The previous release,
> SoftFloat-2a, did not contain this clause. The only changes between
> these two versions as far as QEMU is concerned is the license change
> and a global modification of the comment structure. This patch rebases
> our softfloat code to SoftFloat-2a in order to have a GPLv2 compatible
> license.
>
> Please note, this is a comment-only change. The resulting binary should
> be the same.
>
> I created this patch using the following strategy:
>
> 1) Create a branch using the original import of softfloat code:
>    $ git checkout 158142c2c2df728cfa3b5320c65534921a764f26
>
> 2) Remove carriage returns from Softfloat-2b
>
> 3) Compare each of the softfloat files against Softfloat-2b using the
>    following mapping to generate Fabrice's original softfloat changes:
>
>    - fpu/softfloat.c -> softfloat/bits64/softfloat.c
>    - fpu/softfloat.h -> softfloat/bits64/386-Win32-gcc/softfloat.h
>    - fpu/softfloat-macros.h -> softfloat/bits64/softfloat-macros
>    - fpu/softfloat-specialize.h -> softfloat/bits64/386-Win32-gcc/softfloat-specialize
>
> 4) Replace our softfloat files with the corresponding files from Softfloat-2a
>
> 5) Apply the diffs from (3) to (4) and commit
>
> 6) Create a diff between (5) and 158142c2c2df728cfa3b5320c65534921a764f26
>    - This diff consists 100% of licensing change + comment reformating
>
> 7) Checkout the latest master branch, apply the diff from (6)
>    - There were a lot of comment rejects, confirmed this was only comments
>      and then used an emacs macro to rewrite the comments to the Softfloat-2a
>      form.
>
> Cc: Andreas Färber <afaerber@suse.de>
> Cc: Aurelien Jarno <aurelien@aurel32.net>
> Cc: Avi Kivity <avi.kivity@gmail.com>
> Cc: Ben Taylor <bentaylor.solx86@gmail.com>
> Cc: Blue Swirl <blauwirbel@gmail.com>
> Cc: Christophe Lyon <christophe.lyon@st.com>
> Cc: Fabrice Bellard <fabrice@bellard.org>
> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
> Cc: Jocelyn Mayer <l_indien@magic.fr>
> Cc: Juan Quintela <quintela@redhat.com>
> Cc: malc <av1474@comtv.ru>
> Cc: Max Filippov <jcmvbkbc@gmail.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Paul Brook <paul@codesourcery.com>
> Cc: Peter Maydell <peter.maydell@linaro.org>
> Cc: Richard Henderson <rth@twiddle.net>
> Cc: Richard Sandiford <rdsandiford@googlemail.com>
> Cc: Stefan Weil <weil@mail.berlios.de>
> Cc: Thiemo Seufer <ths@networkno.de>
> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
> ---
> In order to make this change, we need to relicense all contributions
> from initial import of the SoftFloat code to match the license of
> SoftFloat-2a (instead of the implied SoftFloat-2b license).
>
> If you are on CC, it is because you have contributed to the softfloat
> code in QEMU. Please response to this note with:
>
> Acked-by: Your Name <your@email.com>
>
> To significant that you are able and willing to relicense your changes
> to the SoftFloat-1a license (or a GPL compatible license).
>
> Please respond no later than May 6th, 2013. If we are unable to confirm
> relicense from an author, changes from that author will be reverted.
> ---
> For completeness, here is the full listing of contributions:
>
> Andreas Färber <afaerber@suse.de>
>   be45f06 Silence softfloat warnings on OpenSolaris
>   5aea4c5 softfloat: Replace uint16 type with uint_fast16_t
>   94a49d8 softfloat: Replace int16 type with int_fast16_t
>   c969654 softfloat: Fix mixups of int and int16
>   38641f8 softfloat: Use uint16 consistently
>   87b8cc3 softfloat: Resolve type mismatches between declaration and implementation
>   8d725fa softfloat: Prepend QEMU-style header with derivation notice
>   9f8d2a0 softfloat: Use uint32 consistently
>   bb98fe4 softfloat: Drop [s]bits{8, 16, 32, 64} types in favor of [u]int{8, 16, 32, 64}_t
>
> Aurelien Jarno <aurelien@aurel32.net>
>   1020160 softfloat: fix default-NaN mode
>   084d19b target-mips: Implement correct NaN propagation rules
>   196cfc8 softfloat: add a 1.0 constant for float32 and float64
>   1b2ad2e softfloat-native: fix *nan()
>   1f398e0 softfloat: use float{32,64,x80,128}_maybe_silence_nan()
>   211315f softfloat: rename float*_eq() into float*_eq_quiet()
>   2657d0f softfloat: rename float*_eq_signaling() into float*_eq()
>   30e7a22 Use float_relation_* constants
>   326b9e9 softfloat: fix float*_scalnb() corner cases
>   34d2386 softfloat: remove HPPA specific code
>   374dfc3 soft-float: add float32_log2() and float64_log2()
>   4cc5383 softfloat-native: add float*_is_any_nan() functions
>   587eabf softfloat: add float*_is_zero_or_denormal()
>   629bd74 softfloat-native: add float32_is_nan()
>   67b7861 softfloat: add float*_unordered_{,quiet}() functions
>   8229c99 softfloat: add float32_exp2()
>   85016c9 Assortment of soft-float fixes, by Aurelien Jarno.
>   8d6c92b softfloat-native: improve correctness of floatXX_is_neg()
>   93ae1c6 softfloat: fix float{32,64}_maybe_silence_nan() for MIPS
>   a167ba5 Add support for GNU/kFreeBSD
>   b3b4c7f softfloat: use GCC builtins to count the leading zeros
>   b4a0ef7 softfloat-native: add float*_unordered_quiet() functions
>   b689362 softfloat: move float*_eq and float*_eq_quiet
>   b76235e softfloat: fix floatx80_is_infinity()
>   bbc1ded softfloat: implement fused multiply-add NaN propagation for MIPS
>   be22a9a softfloat: always enable floatx80 and float128 support
>   c4b4c77 softfloat: add pi constants
>   c52ab6f fp: add floatXX_is_infinity(), floatXX_is_neg(), floatXX_is_zero()
>   cf67c6b softfloat-native: remove
>   d2b1027 softfloat-native: add a few constant values
>   d6882cf softfloat-native: fix float*_scalbn() functions
>   d735d69 softfloat: rename *IsNaN variables to *IsQuietNaN
>   dadd71a fp: fix float32_is_infinity()
>   de4af5f softfloat: fix floatx80_is_{quiet,signaling}_nan()
>   e024e88 target-ppc: Implement correct NaN propagation rules
>   e2f4220 softfloat: fix floatx80 handling of NaN
>   e872aa8 softfloat-native: fix type of float_rounding_mode
>   e908775 softfloat: SH4 has the sNaN bit set
>   f3218a8 softfloat: add floatx80 constants
>   f5a6425 softfloat: improve description of comparison functions
>   f6714d3 softfloat: add floatx80_compare*() functions
>   f6a7d92 softfloat: add float{x80,128}_maybe_silence_nan()
>
> Avi Kivity <avi.kivity@gmail.com>
>   3bf7e40 softfloat: fix for C99
>
> Ben Taylor <bentaylor.solx86@gmail.com>
>   0475a5c Solaris 9/x86 support, by Ben Taylor.
>   c94655b Updated Solaris isinf support, by Juergen Keil and Ben Taylor.
>
> Blue Swirl <blauwirbel@gmail.com>
>   128ab2f Preliminary OpenBSD host support (based on OpenBSD patches by Todd T. Fries)
>   14d483e Fix OpenSolaris softfloat warnings
>   179a2c1 Rename _BSD to HOST_BSD so that it's more obvious that it's defined by configure
>   1d6198c Remove unnecessary trailing newlines
>   1f58732 128-bit float support for user mode
>   2734c70 Rename one more _BSD to HOST_BSD (spotted by Hasso Tepper)
>   3f4cb3d Fix OpenSolaris gcc4 warnings: iovec type mismatches, missing 'static'
>   70c1470 Sparse fixes: dubious mixing of bitwise and logical operations
>   7c2a9d0 Fix math warnings on OpenBSD -current
>   b1d8e52 Fix undeclared symbol warnings from sparse
>   b55266b Suppress gcc 4.x -Wpointer-sign (included in -Wall) warnings
>   cd8a253 Fix more typos in softloat code (Eduardo Felipe)
>   d07cca0 Add native softfloat fpu functions (Christoph Egger)
>   ed086f3 softfloat: remove dead assignments, spotted by clang
>
> Christophe Lyon <christophe.lyon@st.com>
>   8559666 softfloat: move all default NaN definitions to softfloat.h.
>   bcd4d9a softfloat: Honour default_nan_mode for float-to-float conversions
>   c30fe7d softfloat: add _set_sign(), _infinity and _half for 32 and 64 bits floats.
>
> Fabrice Bellard <fabrice@bellard.org>
>   158142c soft float support
>   1b2b0af 64 bit fix
>   1d6bda3 added abs, chs and compare functions
>   38cfa06 Solaris port (Ben Taylor)
>   750afe9 avoid using char when it is not necessary
>   b109f9f more native FPU comparison functions - native FPU remainder
>   ec530c8 Solaris port (Ben Taylor)
>   fdbb469 Solaris/SPARC host port (Ben Taylor)
>
> Guan Xuetao <gxt@mprc.pku.edu.cn>
>   d2fbca9 unicore32: necessary modifications for other files to support unicore32
>
> Jocelyn Mayer <l_indien@magic.fr>
>   3430b0b Ooops... Typo.
>   75d62a5 Add missing softfloat helpers.
>
> Juan Quintela <quintela@redhat.com>
>   0eb4fc8 softfloat: make USE_SOFTFLOAT_STRUCT_TYPES compile
>   71e72a1 rename HOST_BSD to CONFIG_BSD
>   75b5a69 rename NEEDS_LIBSUNMATH to CONFIG_NEEDS_LIBSUNMATH
>   dfe5fff change HOST_SOLARIS to CONFIG_SOLARIS{_VERSION}
>   e2542fe rename WORDS_BIGENDIAN to HOST_WORDS_BIGENDIAN
>
> malc <av1474@comtv.ru>
>   947f5fc Add static qualifier to local functions
>   e58ffeb Remove all traces of __powerpc__
>
> Max Filippov <jcmvbkbc@gmail.com>
>   6617680 softfloat: make float_muladd_negate_* flags independent
>   213ff4e softfloat: add NO_SIGNALING_NANS
>   b81fe82 target-xtensa: specialize softfloat NaN rules
>
> Paolo Bonzini <pbonzini@redhat.com>
>   1de7afc misc: move include files to include/qemu/
>   6b4c305 fpu: move public header file to include/fpu
>   789ec7c softfloat: change default nan definitions to variables
>
> Paul Brook <paul@codesourcery.com>
>   6001149 ARM FP16 support
>   6939754 Correctly normalize values and handle zero inputs to scalbn functions.
>   3598ecb Remove missing include.
>   5c7908e Implement default-NaN mode.
>   7918bf4 Fix typo in BSD FP rounding mode names.
>   9027db8 Fix ARM default NaN.
>   9ee6e8b ARMv7 support.
>   a1b91bb Fix typo in softfloat code.
>   e6e5906 ColdFire target.
>   f090c9d Add strict checking mode for softfp code.
>   fe76d97 Implement flush-to-zero mode (denormal results are replaced with zero).
>
> Peter Maydell <peter.maydell@linaro.org>
>   1856987 softfloat: Rename float*_is_nan() functions to float*_is_quiet_nan()
>   760e141 softfloat: roundAndPackInt{32, 64}: Don't assume int32 is 32 bits
>   011da61 target-arm: Implement correct NaN propagation rules
>   21d6ebd softfloat: Add float*_is_any_nan() functions
>   274f1b0 softfloat: Add float*_min() and float*_max() functions
>   2ac8bd0 softfloat: Reinstate accidentally disabled target-specific NaN handling
>   2bed652 softfloat: Implement floatx80_is_any_nan() and float128_is_any_nan()
>   354f211 softfloat: abstract out target-specific NaN propagation rules
>   369be8f softfloat: Implement fused multiply-add
>   37d1866 softfloat: Implement flushing input denormals to zero
>   4be8eea fpu/softfloat.c: Remove pointless shift of always-zero value
>   600e30d softfloat: Fix single-to-half precision float conversions
>   6f3300a softfloat: Add float32_is_zero_or_denormal() function
>   b3a6a2e softfloat: float*_to_int32_round_to_zero: don't assume int32 is 32 bits
>   b408dbd softfloat: Add float*_maybe_silence_nan() functions
>   bb4d4bb softfloat: Add float16 type and float16 NaN handling functions
>   c29aca4 softfloat: Add setter function for tininess detection mode
>   cbcef45 softfloat: Add float/double to 16 bit integer conversion functions
>   d5138cf softfloat: Fix compilation failures with USE_SOFTFLOAT_STRUCT_TYPES
>   e3d142d fpu: Correct edgecase in float64_muladd
>   e6afc87 softfloat: Add new flag for when denormal result is flushed to zero
>   e744c06 fpu/softfloat.c: Return correctly signed values from uint64_to_float32
>   f591e1b softfloat: Correctly handle NaNs in float16_to_float32()
>
> Richard Henderson <rth@twiddle.net>
>   17ed229 softfloat: Fix uint64_to_float64
>   1e397ea softfloat: Implement uint64_to_float128
>   8443eff target-alpha: Split up FPCR value into separate fields.
>   990b3e1 target-alpha: Enable softfloat.
>   ba0e276 target-alpha: Fixes for alpha-linux syscalls.
>
> Richard Sandiford <rdsandiford@googlemail.com>
>   a6e7c18 softfloat: Handle float_muladd_negate_c when product is zero
>
> Stefan Weil <weil@mail.berlios.de>
>   bc4347b arm host: fix compiler warning
>
> Thiemo Seufer <ths@networkno.de>
>   5a6932d Fix NaN handling for MIPS and HPPA.
>   5fafdf2 find -type f | xargs sed -i 's/[\t ]$//g' # on most files
>   63a654b trunc() for Solaris 9 / SPARC, by Juergen Keil.
>   924b2c0 Add proper float*_is_nan prototypes.
>   b645bb4 Fix softfloat NaN handling.
>   fc81ba5 Check that HOST_SOLARIS is defined before relying on its value. Spotted by Joachim Henke.
> ---
>  fpu/softfloat-macros.h     |  430 ++++----
>  fpu/softfloat-specialize.h |  494 +++++----
>  fpu/softfloat.c            | 2436 ++++++++++++++++++++++++--------------------
>  include/fpu/softfloat.h    |  242 +++--
>  4 files changed, 1981 insertions(+), 1621 deletions(-)
>
> diff --git a/fpu/softfloat-macros.h b/fpu/softfloat-macros.h
> index b5164af..2009315 100644
> --- a/fpu/softfloat-macros.h
> +++ b/fpu/softfloat-macros.h
> @@ -4,10 +4,11 @@
>   * Derived from SoftFloat.
>   */
>
> -/*============================================================================
> +/*
> +===============================================================================
>
>  This C source fragment is part of the SoftFloat IEC/IEEE Floating-point
> -Arithmetic Package, Release 2b.
> +Arithmetic Package, Release 2a.
>
>  Written by John R. Hauser. This work was made possible in part by the
>  International Computer Science Institute, located at Suite 600, 1947 Center
> @@ -16,28 +17,27 @@ National Science Foundation under grant MIP-9311980. The original version
>  of this code was written as part of a project to build a fixed-point vector
>  processor in collaboration with the University of California at Berkeley,
>  overseen by Profs. Nelson Morgan and John Wawrzynek. More information
> -is available through the Web page `http://www.cs.berkeley.edu/~jhauser/
> +is available through the Web page `http://HTTP.CS.Berkeley.EDU/~jhauser/
>  arithmetic/SoftFloat.html'.
>
> -THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort has
> -been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT TIMES
> -RESULT IN INCORRECT BEHAVIOR. USE OF THIS SOFTWARE IS RESTRICTED TO PERSONS
> -AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ALL LOSSES,
> -COSTS, OR OTHER PROBLEMS THEY INCUR DUE TO THE SOFTWARE, AND WHO FURTHERMORE
> -EFFECTIVELY INDEMNIFY JOHN HAUSER AND THE INTERNATIONAL COMPUTER SCIENCE
> -INSTITUTE (possibly via similar legal notice) AGAINST ALL LOSSES, COSTS, OR
> -OTHER PROBLEMS INCURRED BY THEIR CUSTOMERS AND CLIENTS DUE TO THE SOFTWARE.
> +THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort
> +has been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT
> +TIMES RESULT IN INCORRECT BEHAVIOR. USE OF THIS SOFTWARE IS RESTRICTED TO
> +PERSONS AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ANY
> +AND ALL LOSSES, COSTS, OR OTHER PROBLEMS ARISING FROM ITS USE.
>
>  Derivative works are acceptable, even for commercial purposes, so long as
> -(1) the source code for the derivative work includes prominent notice that
> -the work is derivative, and (2) the source code includes prominent notice with
> -these four paragraphs for those parts of this code that are retained.
> +(1) they include prominent notice that the work is derivative, and (2) they
> +include prominent notice akin to these four paragraphs for those parts of
> +this code that are retained.
>
>  =============================================================================*/
>
> -/*----------------------------------------------------------------------------
> -| This macro tests for minimum version of the GNU C compiler.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +This macro tests for minimum version of the GNU C compiler.
> +-------------------------------------------------------------------------------
> +*/
>  #if defined(__GNUC__) && defined(__GNUC_MINOR__)
>  # define SOFTFLOAT_GNUC_PREREQ(maj, min) \
>           ((__GNUC__ << 16) + __GNUC_MINOR__ >= ((maj) << 16) + (min))
> @@ -46,14 +46,16 @@ these four paragraphs for those parts of this code that are retained.
>  #endif
>
>
> -/*----------------------------------------------------------------------------
> -| Shifts `a' right by the number of bits given in `count'. If any nonzero
> -| bits are shifted off, they are ``jammed'' into the least significant bit of
> -| the result by setting the least significant bit to 1. The value of `count'
> -| can be arbitrarily large; in particular, if `count' is greater than 32, the
> -| result will be either 0 or 1, depending on whether `a' is zero or nonzero.
> -| The result is stored in the location pointed to by `zPtr'.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Shifts `a' right by the number of bits given in `count'. If any nonzero
> +bits are shifted off, they are ``jammed'' into the least significant bit of
> +the result by setting the least significant bit to 1. The value of `count'
> +can be arbitrarily large; in particular, if `count' is greater than 32, the
> +result will be either 0 or 1, depending on whether `a' is zero or nonzero.
> +The result is stored in the location pointed to by `zPtr'.
> +-------------------------------------------------------------------------------
> +*/
>
>  INLINE void shift32RightJamming(uint32_t a, int_fast16_t count, uint32_t *zPtr)
>  {
> @@ -72,14 +74,16 @@ INLINE void shift32RightJamming(uint32_t a, int_fast16_t count, uint32_t *zPtr)
>
>  }
>
> -/*----------------------------------------------------------------------------
> -| Shifts `a' right by the number of bits given in `count'. If any nonzero
> -| bits are shifted off, they are ``jammed'' into the least significant bit of
> -| the result by setting the least significant bit to 1. The value of `count'
> -| can be arbitrarily large; in particular, if `count' is greater than 64, the
> -| result will be either 0 or 1, depending on whether `a' is zero or nonzero.
> -| The result is stored in the location pointed to by `zPtr'.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Shifts `a' right by the number of bits given in `count'. If any nonzero
> +bits are shifted off, they are ``jammed'' into the least significant bit of
> +the result by setting the least significant bit to 1. The value of `count'
> +can be arbitrarily large; in particular, if `count' is greater than 64, the
> +result will be either 0 or 1, depending on whether `a' is zero or nonzero.
> +The result is stored in the location pointed to by `zPtr'.
> +-------------------------------------------------------------------------------
> +*/
>
>  INLINE void shift64RightJamming(uint64_t a, int_fast16_t count, uint64_t *zPtr)
>  {
> @@ -98,23 +102,24 @@ INLINE void shift64RightJamming(uint64_t a, int_fast16_t count, uint64_t *zPtr)
>
>  }
>
> -/*----------------------------------------------------------------------------
> -| Shifts the 128-bit value formed by concatenating `a0' and `a1' right by 64
> -| _plus_ the number of bits given in `count'. The shifted result is at most
> -| 64 nonzero bits; this is stored at the location pointed to by `z0Ptr'. The
> -| bits shifted off form a second 64-bit result as follows: The _last_ bit
> -| shifted off is the most-significant bit of the extra result, and the other
> -| 63 bits of the extra result are all zero if and only if _all_but_the_last_
> -| bits shifted off were all zero. This extra result is stored in the location
> -| pointed to by `z1Ptr'. The value of `count' can be arbitrarily large.
> -|     (This routine makes more sense if `a0' and `a1' are considered to form
> -| a fixed-point value with binary point between `a0' and `a1'. This fixed-
> -| point value is shifted right by the number of bits given in `count', and
> -| the integer part of the result is returned at the location pointed to by
> -| `z0Ptr'. The fractional part of the result may be slightly corrupted as
> -| described above, and is returned at the location pointed to by `z1Ptr'.)
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Shifts the 128-bit value formed by concatenating `a0' and `a1' right by 64
> +_plus_ the number of bits given in `count'. The shifted result is at most
> +64 nonzero bits; this is stored at the location pointed to by `z0Ptr'. The
> +bits shifted off form a second 64-bit result as follows: The _last_ bit
> +shifted off is the most-significant bit of the extra result, and the other
> +63 bits of the extra result are all zero if and only if _all_but_the_last_
> +bits shifted off were all zero. This extra result is stored in the location
> +pointed to by `z1Ptr'. The value of `count' can be arbitrarily large.
> +    (This routine makes more sense if `a0' and `a1' are considered to form a
> +fixed-point value with binary point between `a0' and `a1'. This fixed-point
> +value is shifted right by the number of bits given in `count', and the
> +integer part of the result is returned at the location pointed to by
> +`z0Ptr'. The fractional part of the result may be slightly corrupted as
> +described above, and is returned at the location pointed to by `z1Ptr'.)
> +-------------------------------------------------------------------------------
> +*/
>  INLINE void
>   shift64ExtraRightJamming(
>       uint64_t a0, uint64_t a1, int_fast16_t count, uint64_t *z0Ptr, uint64_t *z1Ptr)
> @@ -144,14 +149,15 @@ INLINE void
>
>  }
>
> -/*----------------------------------------------------------------------------
> -| Shifts the 128-bit value formed by concatenating `a0' and `a1' right by the
> -| number of bits given in `count'. Any bits shifted off are lost. The value
> -| of `count' can be arbitrarily large; in particular, if `count' is greater
> -| than 128, the result will be 0. The result is broken into two 64-bit pieces
> -| which are stored at the locations pointed to by `z0Ptr' and `z1Ptr'.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Shifts the 128-bit value formed by concatenating `a0' and `a1' right by the
> +number of bits given in `count'. Any bits shifted off are lost. The value
> +of `count' can be arbitrarily large; in particular, if `count' is greater
> +than 128, the result will be 0. The result is broken into two 64-bit pieces
> +which are stored at the locations pointed to by `z0Ptr' and `z1Ptr'.
> +-------------------------------------------------------------------------------
> +*/
>  INLINE void
>   shift128Right(
>       uint64_t a0, uint64_t a1, int_fast16_t count, uint64_t *z0Ptr, uint64_t *z1Ptr)
> @@ -176,17 +182,18 @@ INLINE void
>
>  }
>
> -/*----------------------------------------------------------------------------
> -| Shifts the 128-bit value formed by concatenating `a0' and `a1' right by the
> -| number of bits given in `count'. If any nonzero bits are shifted off, they
> -| are ``jammed'' into the least significant bit of the result by setting the
> -| least significant bit to 1. The value of `count' can be arbitrarily large;
> -| in particular, if `count' is greater than 128, the result will be either
> -| 0 or 1, depending on whether the concatenation of `a0' and `a1' is zero or
> -| nonzero. The result is broken into two 64-bit pieces which are stored at
> -| the locations pointed to by `z0Ptr' and `z1Ptr'.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Shifts the 128-bit value formed by concatenating `a0' and `a1' right by the
> +number of bits given in `count'. If any nonzero bits are shifted off, they
> +are ``jammed'' into the least significant bit of the result by setting the
> +least significant bit to 1. The value of `count' can be arbitrarily large;
> +in particular, if `count' is greater than 128, the result will be either
> +0 or 1, depending on whether the concatenation of `a0' and `a1' is zero or
> +nonzero. The result is broken into two 64-bit pieces which are stored at
> +the locations pointed to by `z0Ptr' and `z1Ptr'.
> +-------------------------------------------------------------------------------
> +*/
>  INLINE void
>   shift128RightJamming(
>       uint64_t a0, uint64_t a1, int_fast16_t count, uint64_t *z0Ptr, uint64_t *z1Ptr)
> @@ -219,25 +226,26 @@ INLINE void
>
>  }
>
> -/*----------------------------------------------------------------------------
> -| Shifts the 192-bit value formed by concatenating `a0', `a1', and `a2' right
> -| by 64 _plus_ the number of bits given in `count'. The shifted result is
> -| at most 128 nonzero bits; these are broken into two 64-bit pieces which are
> -| stored at the locations pointed to by `z0Ptr' and `z1Ptr'. The bits shifted
> -| off form a third 64-bit result as follows: The _last_ bit shifted off is
> -| the most-significant bit of the extra result, and the other 63 bits of the
> -| extra result are all zero if and only if _all_but_the_last_ bits shifted off
> -| were all zero. This extra result is stored in the location pointed to by
> -| `z2Ptr'. The value of `count' can be arbitrarily large.
> -|     (This routine makes more sense if `a0', `a1', and `a2' are considered
> -| to form a fixed-point value with binary point between `a1' and `a2'. This
> -| fixed-point value is shifted right by the number of bits given in `count',
> -| and the integer part of the result is returned at the locations pointed to
> -| by `z0Ptr' and `z1Ptr'. The fractional part of the result may be slightly
> -| corrupted as described above, and is returned at the location pointed to by
> -| `z2Ptr'.)
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Shifts the 192-bit value formed by concatenating `a0', `a1', and `a2' right
> +by 64 _plus_ the number of bits given in `count'. The shifted result is
> +at most 128 nonzero bits; these are broken into two 64-bit pieces which are
> +stored at the locations pointed to by `z0Ptr' and `z1Ptr'. The bits shifted
> +off form a third 64-bit result as follows: The _last_ bit shifted off is
> +the most-significant bit of the extra result, and the other 63 bits of the
> +extra result are all zero if and only if _all_but_the_last_ bits shifted off
> +were all zero. This extra result is stored in the location pointed to by
> +`z2Ptr'. The value of `count' can be arbitrarily large.
> +    (This routine makes more sense if `a0', `a1', and `a2' are considered
> +to form a fixed-point value with binary point between `a1' and `a2'. This
> +fixed-point value is shifted right by the number of bits given in `count',
> +and the integer part of the result is returned at the locations pointed to
> +by `z0Ptr' and `z1Ptr'. The fractional part of the result may be slightly
> +corrupted as described above, and is returned at the location pointed to by
> +`z2Ptr'.)
> +-------------------------------------------------------------------------------
> +*/
>  INLINE void
>   shift128ExtraRightJamming(
>       uint64_t a0,
> @@ -289,13 +297,14 @@ INLINE void
>
>  }
>
> -/*----------------------------------------------------------------------------
> -| Shifts the 128-bit value formed by concatenating `a0' and `a1' left by the
> -| number of bits given in `count'. Any bits shifted off are lost. The value
> -| of `count' must be less than 64. The result is broken into two 64-bit
> -| pieces which are stored at the locations pointed to by `z0Ptr' and `z1Ptr'.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Shifts the 128-bit value formed by concatenating `a0' and `a1' left by the
> +number of bits given in `count'. Any bits shifted off are lost. The value
> +of `count' must be less than 64. The result is broken into two 64-bit
> +pieces which are stored at the locations pointed to by `z0Ptr' and `z1Ptr'.
> +-------------------------------------------------------------------------------
> +*/
>  INLINE void
>   shortShift128Left(
>       uint64_t a0, uint64_t a1, int_fast16_t count, uint64_t *z0Ptr, uint64_t *z1Ptr)
> @@ -307,14 +316,15 @@ INLINE void
>
>  }
>
> -/*----------------------------------------------------------------------------
> -| Shifts the 192-bit value formed by concatenating `a0', `a1', and `a2' left
> -| by the number of bits given in `count'. Any bits shifted off are lost.
> -| The value of `count' must be less than 64. The result is broken into three
> -| 64-bit pieces which are stored at the locations pointed to by `z0Ptr',
> -| `z1Ptr', and `z2Ptr'.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Shifts the 192-bit value formed by concatenating `a0', `a1', and `a2' left
> +by the number of bits given in `count'. Any bits shifted off are lost.
> +The value of `count' must be less than 64. The result is broken into three
> +64-bit pieces which are stored at the locations pointed to by `z0Ptr',
> +`z1Ptr', and `z2Ptr'.
> +-------------------------------------------------------------------------------
> +*/
>  INLINE void
>   shortShift192Left(
>       uint64_t a0,
> @@ -343,13 +353,14 @@ INLINE void
>
>  }
>
> -/*----------------------------------------------------------------------------
> -| Adds the 128-bit value formed by concatenating `a0' and `a1' to the 128-bit
> -| value formed by concatenating `b0' and `b1'. Addition is modulo 2^128, so
> -| any carry out is lost. The result is broken into two 64-bit pieces which
> -| are stored at the locations pointed to by `z0Ptr' and `z1Ptr'.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Adds the 128-bit value formed by concatenating `a0' and `a1' to the 128-bit
> +value formed by concatenating `b0' and `b1'. Addition is modulo 2^128, so
> +any carry out is lost. The result is broken into two 64-bit pieces which
> +are stored at the locations pointed to by `z0Ptr' and `z1Ptr'.
> +-------------------------------------------------------------------------------
> +*/
>  INLINE void
>   add128(
>       uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1, uint64_t *z0Ptr, uint64_t *z1Ptr )
> @@ -362,14 +373,15 @@ INLINE void
>
>  }
>
> -/*----------------------------------------------------------------------------
> -| Adds the 192-bit value formed by concatenating `a0', `a1', and `a2' to the
> -| 192-bit value formed by concatenating `b0', `b1', and `b2'. Addition is
> -| modulo 2^192, so any carry out is lost. The result is broken into three
> -| 64-bit pieces which are stored at the locations pointed to by `z0Ptr',
> -| `z1Ptr', and `z2Ptr'.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Adds the 192-bit value formed by concatenating `a0', `a1', and `a2' to the
> +192-bit value formed by concatenating `b0', `b1', and `b2'. Addition is
> +modulo 2^192, so any carry out is lost. The result is broken into three
> +64-bit pieces which are stored at the locations pointed to by `z0Ptr',
> +`z1Ptr', and `z2Ptr'.
> > +------------------------------------------------------------------------------- > +*/ > INLINE void > add192( > uint64_t a0, > @@ -400,14 +412,15 @@ INLINE void > > } > > > -/*---------------------------------------------------------------------------- > -| Subtracts the 128-bit value formed by concatenating `b0' and `b1' from > the > -| 128-bit value formed by concatenating `a0' and `a1'. Subtraction is > modulo > -| 2^128, so any borrow out (carry out) is lost. The result is broken > into two > -| 64-bit pieces which are stored at the locations pointed to by `z0Ptr' > and > -| `z1Ptr'. > > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Subtracts the 128-bit value formed by concatenating `b0' and `b1' from the > +128-bit value formed by concatenating `a0' and `a1'. Subtraction is > modulo > +2^128, so any borrow out (carry out) is lost. The result is broken into > two > +64-bit pieces which are stored at the locations pointed to by `z0Ptr' and > +`z1Ptr'. > > +------------------------------------------------------------------------------- > +*/ > INLINE void > sub128( > uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1, uint64_t *z0Ptr, > uint64_t *z1Ptr ) > @@ -418,14 +431,15 @@ INLINE void > > } > > > -/*---------------------------------------------------------------------------- > -| Subtracts the 192-bit value formed by concatenating `b0', `b1', and `b2' > -| from the 192-bit value formed by concatenating `a0', `a1', and `a2'. > -| Subtraction is modulo 2^192, so any borrow out (carry out) is lost. The > -| result is broken into three 64-bit pieces which are stored at the > locations > -| pointed to by `z0Ptr', `z1Ptr', and `z2Ptr'. 
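The three-limb subtraction described in the sub192 comment is easy to get subtly wrong, because a borrow from the lowest limb can ripple through a middle limb that is zero. A rough sketch of the borrow handling follows, assuming only what the comment states (modulo 2^192, borrow out lost); `sub192_sketch` is an illustrative name and not the routine in the patch.

```c
#include <stdint.h>

/* Hypothetical helper: (a0,a1,a2) - (b0,b1,b2) modulo 2^192,
 * with index 0 the most-significant 64-bit limb. */
static void sub192_sketch(uint64_t a0, uint64_t a1, uint64_t a2,
                          uint64_t b0, uint64_t b1, uint64_t b2,
                          uint64_t *z0Ptr, uint64_t *z1Ptr, uint64_t *z2Ptr)
{
    uint64_t z2 = a2 - b2;
    uint64_t borrow2 = (a2 < b2);            /* borrow out of the low limb */
    uint64_t t1 = a1 - b1;
    /* The middle limb borrows if a1 < b1, or if a1 == b1 and a borrow
     * arrives from below (t1 == 0 and borrow2 == 1). */
    uint64_t borrow1 = (uint64_t)(a1 < b1) | (uint64_t)(t1 < borrow2);
    uint64_t z1 = t1 - borrow2;
    uint64_t z0 = a0 - b0 - borrow1;         /* final borrow out is dropped */

    *z0Ptr = z0;
    *z1Ptr = z1;
    *z2Ptr = z2;
}
```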
> > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Subtracts the 192-bit value formed by concatenating `b0', `b1', and `b2' > +from the 192-bit value formed by concatenating `a0', `a1', and `a2'. > +Subtraction is modulo 2^192, so any borrow out (carry out) is lost. The > +result is broken into three 64-bit pieces which are stored at the > locations > +pointed to by `z0Ptr', `z1Ptr', and `z2Ptr'. > > +------------------------------------------------------------------------------- > +*/ > INLINE void > sub192( > uint64_t a0, > @@ -456,11 +470,13 @@ INLINE void > > } > > > -/*---------------------------------------------------------------------------- > -| Multiplies `a' by `b' to obtain a 128-bit product. The product is > broken > -| into two 64-bit pieces which are stored at the locations pointed to by > -| `z0Ptr' and `z1Ptr'. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Multiplies `a' by `b' to obtain a 128-bit product. The product is broken > +into two 64-bit pieces which are stored at the locations pointed to by > +`z0Ptr' and `z1Ptr'. > > +------------------------------------------------------------------------------- > +*/ > > INLINE void mul64To128( uint64_t a, uint64_t b, uint64_t *z0Ptr, uint64_t > *z1Ptr ) > { > @@ -485,13 +501,14 @@ INLINE void mul64To128( uint64_t a, uint64_t b, > uint64_t *z0Ptr, uint64_t *z1Ptr > > } > > > -/*---------------------------------------------------------------------------- > -| Multiplies the 128-bit value formed by concatenating `a0' and `a1' by > -| `b' to obtain a 192-bit product. The product is broken into three > 64-bit > -| pieces which are stored at the locations pointed to by `z0Ptr', > `z1Ptr', and > -| `z2Ptr'. 
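As a cross-check on the two multiply comments above (64x64 to 128 bits, and 128x64 to 192 bits), here is a portable sketch built from 32-bit partial products. It assumes nothing beyond the comment text; the `_sketch` names are invented for this note and the layout is not claimed to match the code in the patch.

```c
#include <stdint.h>

/* Hypothetical helper: 64x64 -> 128-bit product via 32-bit halves.
 * *z0Ptr receives the high 64 bits, *z1Ptr the low 64 bits. */
static void mul64To128_sketch(uint64_t a, uint64_t b,
                              uint64_t *z0Ptr, uint64_t *z1Ptr)
{
    uint64_t aLow = (uint32_t)a, aHigh = a >> 32;
    uint64_t bLow = (uint32_t)b, bHigh = b >> 32;
    uint64_t z1 = aLow * bLow;
    uint64_t midA = aLow * bHigh;
    uint64_t midB = aHigh * bLow;
    uint64_t z0 = aHigh * bHigh;

    midA += midB;                                   /* may wrap once */
    z0 += ((uint64_t)(midA < midB) << 32) + (midA >> 32);
    midA <<= 32;
    z1 += midA;
    z0 += (z1 < midA);                              /* carry from low word */
    *z0Ptr = z0;
    *z1Ptr = z1;
}

/* Hypothetical helper: (a0:a1) * b -> 192-bit product in three limbs,
 * most-significant limb first. */
static void mul128By64To192_sketch(uint64_t a0, uint64_t a1, uint64_t b,
                                   uint64_t *z0Ptr, uint64_t *z1Ptr,
                                   uint64_t *z2Ptr)
{
    uint64_t hi0, lo0, hi1, lo1, z1;

    mul64To128_sketch(a1, b, &hi1, &lo1);   /* contributes to bits 0..127 */
    mul64To128_sketch(a0, b, &hi0, &lo0);   /* contributes to bits 64..191 */
    z1 = hi1 + lo0;
    *z2Ptr = lo1;
    *z1Ptr = z1;
    *z0Ptr = hi0 + (z1 < lo0);              /* cannot overflow: product < 2^192 */
}
```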
> > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Multiplies the 128-bit value formed by concatenating `a0' and `a1' by > +`b' to obtain a 192-bit product. The product is broken into three 64-bit > +pieces which are stored at the locations pointed to by `z0Ptr', `z1Ptr', > and > +`z2Ptr'. > > +------------------------------------------------------------------------------- > +*/ > INLINE void > mul128By64To192( > uint64_t a0, > @@ -513,13 +530,14 @@ INLINE void > > } > > > -/*---------------------------------------------------------------------------- > -| Multiplies the 128-bit value formed by concatenating `a0' and `a1' to > the > -| 128-bit value formed by concatenating `b0' and `b1' to obtain a 256-bit > -| product. The product is broken into four 64-bit pieces which are > stored at > -| the locations pointed to by `z0Ptr', `z1Ptr', `z2Ptr', and `z3Ptr'. > > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Multiplies the 128-bit value formed by concatenating `a0' and `a1' to the > +128-bit value formed by concatenating `b0' and `b1' to obtain a 256-bit > +product. The product is broken into four 64-bit pieces which are stored > at > +the locations pointed to by `z0Ptr', `z1Ptr', `z2Ptr', and `z3Ptr'. > > +------------------------------------------------------------------------------- > +*/ > INLINE void > mul128To256( > uint64_t a0, > @@ -550,14 +568,16 @@ INLINE void > > } > > > -/*---------------------------------------------------------------------------- > -| Returns an approximation to the 64-bit integer quotient obtained by > dividing > -| `b' into the 128-bit value formed by concatenating `a0' and `a1'. The > -| divisor `b' must be at least 2^63. 
If q is the exact quotient truncated > -| toward zero, the approximation returned lies between q and q + 2 > inclusive. > -| If the exact quotient q is larger than 64 bits, the maximum positive > 64-bit > -| unsigned integer is returned. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Returns an approximation to the 64-bit integer quotient obtained by > dividing > +`b' into the 128-bit value formed by concatenating `a0' and `a1'. The > +divisor `b' must be at least 2^63. If q is the exact quotient truncated > +toward zero, the approximation returned lies between q and q + 2 > inclusive. > +If the exact quotient q is larger than 64 bits, the maximum positive > 64-bit > +unsigned integer is returned. > > +------------------------------------------------------------------------------- > +*/ > > static uint64_t estimateDiv128To64( uint64_t a0, uint64_t a1, uint64_t b ) > { > @@ -581,15 +601,17 @@ static uint64_t estimateDiv128To64( uint64_t a0, > uint64_t a1, uint64_t b ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns an approximation to the square root of the 32-bit significand > given > -| by `a'. Considered as an integer, `a' must be at least 2^31. If bit 0 > of > -| `aExp' (the least significant bit) is 1, the integer returned > approximates > -| 2^31*sqrt(`a'/2^31), where `a' is considered an integer. If bit 0 of > `aExp' > -| is 0, the integer returned approximates 2^31*sqrt(`a'/2^30). In either > -| case, the approximation returned lies strictly within +/-2 of the exact > -| value. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Returns an approximation to the square root of the 32-bit significand > given > +by `a'. 
Considered as an integer, `a' must be at least 2^31. If bit 0 of > +`aExp' (the least significant bit) is 1, the integer returned approximates > +2^31*sqrt(`a'/2^31), where `a' is considered an integer. If bit 0 of > `aExp' > +is 0, the integer returned approximates 2^31*sqrt(`a'/2^30). In either > +case, the approximation returned lies strictly within +/-2 of the exact > +value. > > +------------------------------------------------------------------------------- > +*/ > > static uint32_t estimateSqrt32(int_fast16_t aExp, uint32_t a) > { > @@ -620,10 +642,12 @@ static uint32_t estimateSqrt32(int_fast16_t aExp, > uint32_t a) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns the number of leading 0 bits before the most-significant 1 bit > of > -| `a'. If `a' is zero, 32 is returned. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Returns the number of leading 0 bits before the most-significant 1 bit of > +`a'. If `a' is zero, 32 is returned. > > +------------------------------------------------------------------------------- > +*/ > > static int8 countLeadingZeros32( uint32_t a ) > { > @@ -668,10 +692,12 @@ static int8 countLeadingZeros32( uint32_t a ) > #endif > } > > > -/*---------------------------------------------------------------------------- > -| Returns the number of leading 0 bits before the most-significant 1 bit > of > -| `a'. If `a' is zero, 64 is returned. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Returns the number of leading 0 bits before the most-significant 1 bit of > +`a'. If `a' is zero, 64 is returned. 
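The contract in the two countLeadingZeros comments (32 or 64 is returned for a zero input) is the part most likely to differ from a compiler builtin, so a plain-C reference is handy when comparing. The following is a slow but portable sketch assuming only the comment text; the function names are illustrative.

```c
#include <stdint.h>

/* Hypothetical reference version: leading zero bits of a 32-bit value,
 * returning 32 when the input is zero. */
static int clz32_sketch(uint32_t a)
{
    int n = 0;

    if (a == 0) {
        return 32;
    }
    while (!(a & 0x80000000u)) {
        a <<= 1;
        n++;
    }
    return n;
}

/* Hypothetical reference version for 64 bits, returning 64 for zero. */
static int clz64_sketch(uint64_t a)
{
    if (a >> 32) {
        return clz32_sketch((uint32_t)(a >> 32));
    }
    return 32 + clz32_sketch((uint32_t)a);
}
```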
> > +------------------------------------------------------------------------------- > +*/ > > static int8 countLeadingZeros64( uint64_t a ) > { > @@ -696,11 +722,13 @@ static int8 countLeadingZeros64( uint64_t a ) > #endif > } > > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the 128-bit value formed by concatenating `a0' and `a1' > -| is equal to the 128-bit value formed by concatenating `b0' and `b1'. > -| Otherwise, returns 0. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Returns 1 if the 128-bit value formed by concatenating `a0' and `a1' > +is equal to the 128-bit value formed by concatenating `b0' and `b1'. > +Otherwise, returns 0. > > +------------------------------------------------------------------------------- > +*/ > > INLINE flag eq128( uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1 ) > { > @@ -709,11 +737,13 @@ INLINE flag eq128( uint64_t a0, uint64_t a1, > uint64_t b0, uint64_t b1 ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the 128-bit value formed by concatenating `a0' and `a1' is > less > -| than or equal to the 128-bit value formed by concatenating `b0' and > `b1'. > -| Otherwise, returns 0. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Returns 1 if the 128-bit value formed by concatenating `a0' and `a1' is > less > +than or equal to the 128-bit value formed by concatenating `b0' and `b1'. > +Otherwise, returns 0. 
> > +------------------------------------------------------------------------------- > +*/ > > INLINE flag le128( uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1 ) > { > @@ -722,11 +752,13 @@ INLINE flag le128( uint64_t a0, uint64_t a1, > uint64_t b0, uint64_t b1 ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the 128-bit value formed by concatenating `a0' and `a1' is > less > -| than the 128-bit value formed by concatenating `b0' and `b1'. > Otherwise, > -| returns 0. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Returns 1 if the 128-bit value formed by concatenating `a0' and `a1' is > less > +than the 128-bit value formed by concatenating `b0' and `b1'. Otherwise, > +returns 0. > > +------------------------------------------------------------------------------- > +*/ > > INLINE flag lt128( uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1 ) > { > @@ -735,11 +767,13 @@ INLINE flag lt128( uint64_t a0, uint64_t a1, > uint64_t b0, uint64_t b1 ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the 128-bit value formed by concatenating `a0' and `a1' is > -| not equal to the 128-bit value formed by concatenating `b0' and `b1'. > -| Otherwise, returns 0. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Returns 1 if the 128-bit value formed by concatenating `a0' and `a1' is > +not equal to the 128-bit value formed by concatenating `b0' and `b1'. > +Otherwise, returns 0. 
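The four 128-bit comparison comments (eq128, le128, lt128, ne128) all reduce to comparing the high halves first and falling back to the low halves on a tie. A compact sketch, assuming `a0`/`b0` are the most-significant halves as the "concatenating" wording suggests; the `_sketch` names are made up here.

```c
#include <stdint.h>

/* Hypothetical helpers written from the comment text only. */
static int eq128_sketch(uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1)
{
    return (a0 == b0) && (a1 == b1);
}

static int le128_sketch(uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1)
{
    /* High halves decide unless they tie. */
    return (a0 < b0) || ((a0 == b0) && (a1 <= b1));
}

static int lt128_sketch(uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1)
{
    return (a0 < b0) || ((a0 == b0) && (a1 < b1));
}

static int ne128_sketch(uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1)
{
    return (a0 != b0) || (a1 != b1);
}
```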
> > +------------------------------------------------------------------------------- > +*/ > > INLINE flag ne128( uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1 ) > { > diff --git a/fpu/softfloat-specialize.h b/fpu/softfloat-specialize.h > index 518f694..ba9bfeb 100644 > --- a/fpu/softfloat-specialize.h > +++ b/fpu/softfloat-specialize.h > @@ -4,10 +4,11 @@ > * Derived from SoftFloat. > */ > > > -/*============================================================================ > +/* > > +=============================================================================== > > This C source fragment is part of the SoftFloat IEC/IEEE Floating-point > -Arithmetic Package, Release 2b. > +Arithmetic Package, Release 2a. > > Written by John R. Hauser. This work was made possible in part by the > International Computer Science Institute, located at Suite 600, 1947 > Center > @@ -16,22 +17,19 @@ National Science Foundation under grant MIP-9311980. > The original version > of this code was written as part of a project to build a fixed-point > vector > processor in collaboration with the University of California at Berkeley, > overseen by Profs. Nelson Morgan and John Wawrzynek. More information > -is available through the Web page `http://www.cs.berkeley.edu/~jhauser/ > +is available through the Web page `http://HTTP.CS.Berkeley.EDU/~jhauser/ > arithmetic/SoftFloat.html'. > > -THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort > has > -been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT TIMES > -RESULT IN INCORRECT BEHAVIOR. 
USE OF THIS SOFTWARE IS RESTRICTED TO > PERSONS > -AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ALL > LOSSES, > -COSTS, OR OTHER PROBLEMS THEY INCUR DUE TO THE SOFTWARE, AND WHO > FURTHERMORE > -EFFECTIVELY INDEMNIFY JOHN HAUSER AND THE INTERNATIONAL COMPUTER SCIENCE > -INSTITUTE (possibly via similar legal warning) AGAINST ALL LOSSES, COSTS, > OR > -OTHER PROBLEMS INCURRED BY THEIR CUSTOMERS AND CLIENTS DUE TO THE > SOFTWARE. > +THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort > +has been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT > +TIMES RESULT IN INCORRECT BEHAVIOR. USE OF THIS SOFTWARE IS RESTRICTED TO > +PERSONS AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR > ANY > +AND ALL LOSSES, COSTS, OR OTHER PROBLEMS ARISING FROM ITS USE. > > Derivative works are acceptable, even for commercial purposes, so long as > -(1) the source code for the derivative work includes prominent notice that > -the work is derivative, and (2) the source code includes prominent notice > with > -these four paragraphs for those parts of this code that are retained. > +(1) they include prominent notice that the work is derivative, and (2) > they > +include prominent notice akin to these four paragraphs for those parts of > +this code that are retained. > > > =============================================================================*/ > > @@ -48,9 +46,11 @@ these four paragraphs for those parts of this code that > are retained. > #define NO_SIGNALING_NANS 1 > #endif > > > -/*---------------------------------------------------------------------------- > -| The pattern for a default generated half-precision NaN. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +The pattern for a default generated half-precision NaN. 
> > +------------------------------------------------------------------------------- > +*/ > #if defined(TARGET_ARM) > const float16 float16_default_nan = const_float16(0x7E00); > #elif SNAN_BIT_IS_ONE > @@ -59,9 +59,11 @@ const float16 float16_default_nan = > const_float16(0x7DFF); > const float16 float16_default_nan = const_float16(0xFE00); > #endif > > > -/*---------------------------------------------------------------------------- > -| The pattern for a default generated single-precision NaN. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +The pattern for a default generated single-precision NaN. > > +------------------------------------------------------------------------------- > +*/ > #if defined(TARGET_SPARC) > const float32 float32_default_nan = const_float32(0x7FFFFFFF); > #elif defined(TARGET_PPC) || defined(TARGET_ARM) || defined(TARGET_ALPHA) > || \ > @@ -73,9 +75,11 @@ const float32 float32_default_nan = > const_float32(0x7FBFFFFF); > const float32 float32_default_nan = const_float32(0xFFC00000); > #endif > > > -/*---------------------------------------------------------------------------- > -| The pattern for a default generated double-precision NaN. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +The pattern for a default generated double-precision NaN. 
> > +------------------------------------------------------------------------------- > +*/ > #if defined(TARGET_SPARC) > const float64 float64_default_nan = const_float64(LIT64( > 0x7FFFFFFFFFFFFFFF )); > #elif defined(TARGET_PPC) || defined(TARGET_ARM) || defined(TARGET_ALPHA) > @@ -86,9 +90,11 @@ const float64 float64_default_nan = > const_float64(LIT64( 0x7FF7FFFFFFFFFFFF )); > const float64 float64_default_nan = const_float64(LIT64( > 0xFFF8000000000000 )); > #endif > > > -/*---------------------------------------------------------------------------- > -| The pattern for a default generated extended double-precision NaN. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +The pattern for a default generated extended double-precision NaN. > > +------------------------------------------------------------------------------- > +*/ > #if SNAN_BIT_IS_ONE > #define floatx80_default_nan_high 0x7FFF > #define floatx80_default_nan_low LIT64( 0xBFFFFFFFFFFFFFFF ) > @@ -100,10 +106,12 @@ const float64 float64_default_nan = > const_float64(LIT64( 0xFFF8000000000000 )); > const floatx80 floatx80_default_nan > = make_floatx80_init(floatx80_default_nan_high, > floatx80_default_nan_low); > > > -/*---------------------------------------------------------------------------- > -| The pattern for a default generated quadruple-precision NaN. The > `high' and > -| `low' values hold the most- and least-significant bits, respectively. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +The pattern for a default generated quadruple-precision NaN. The `high' > and > +`low' values hold the most- and least-significant bits, respectively. 
> > +------------------------------------------------------------------------------- > +*/ > #if SNAN_BIT_IS_ONE > #define float128_default_nan_high LIT64( 0x7FFF7FFFFFFFFFFF ) > #define float128_default_nan_low LIT64( 0xFFFFFFFFFFFFFFFF ) > @@ -115,21 +123,25 @@ const floatx80 floatx80_default_nan > const float128 float128_default_nan > = make_float128_init(float128_default_nan_high, > float128_default_nan_low); > > > -/*---------------------------------------------------------------------------- > -| Raises the exceptions specified by `flags'. Floating-point traps can be > -| defined here if desired. It is currently not possible for such a trap > -| to substitute a result value. If traps are not implemented, this > routine > -| should be simply `float_exception_flags |= flags;'. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Raises the exceptions specified by `flags'. Floating-point traps can be > +defined here if desired. It is currently not possible for such a trap > +to substitute a result value. If traps are not implemented, this routine > +should be simply `float_exception_flags |= flags;'. > > +------------------------------------------------------------------------------- > +*/ > > void float_raise( int8 flags STATUS_PARAM ) > { > STATUS(float_exception_flags) |= flags; > } > > > -/*---------------------------------------------------------------------------- > -| Internal canonical NaN format. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Internal canonical NaN format. 
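The float_raise comment above says that, without traps, the routine is just an OR into the exception flags. The point of the OR is that flags are sticky: once raised they stay set until something clears them. A tiny sketch of that behaviour follows; the struct, the names and the flag values are placeholders chosen for illustration and are not taken from the patch.

```c
#include <stdint.h>

/* Placeholder flag values, for illustration only. */
enum {
    FLAG_INVALID_SKETCH = 1,
    FLAG_INEXACT_SKETCH = 32
};

/* Hypothetical stand-in for the status word that STATUS() refers to. */
typedef struct {
    uint8_t float_exception_flags;
} status_sketch;

static void float_raise_sketch(status_sketch *s, uint8_t flags)
{
    s->float_exception_flags |= flags;   /* accumulate, never clear */
}
```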
> > +------------------------------------------------------------------------------- > +*/ > typedef struct { > flag sign; > uint64_t high, low; > @@ -146,10 +158,12 @@ int float16_is_signaling_nan(float16 a_) > return 0; > } > #else > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the half-precision floating-point value `a' is a quiet > -| NaN; otherwise returns 0. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Returns 1 if the half-precision floating-point value `a' is a quiet > +NaN; otherwise returns 0. > > +------------------------------------------------------------------------------- > +*/ > > int float16_is_quiet_nan(float16 a_) > { > @@ -161,10 +175,12 @@ int float16_is_quiet_nan(float16 a_) > #endif > } > > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the half-precision floating-point value `a' is a signaling > -| NaN; otherwise returns 0. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Returns 1 if the half-precision floating-point value `a' is a signaling > +NaN; otherwise returns 0. > > +------------------------------------------------------------------------------- > +*/ > > int float16_is_signaling_nan(float16 a_) > { > @@ -177,10 +193,12 @@ int float16_is_signaling_nan(float16 a_) > } > #endif > > > -/*---------------------------------------------------------------------------- > -| Returns a quiet NaN if the half-precision floating point value `a' is a > -| signaling NaN; otherwise returns `a'. 
> > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Returns a quiet NaN if the half-precision floating point value `a' is a > +signaling NaN; otherwise returns `a'. > > +------------------------------------------------------------------------------- > +*/ > float16 float16_maybe_silence_nan(float16 a_) > { > if (float16_is_signaling_nan(a_)) { > @@ -199,11 +217,13 @@ float16 float16_maybe_silence_nan(float16 a_) > return a_; > } > > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the half-precision floating-point NaN > -| `a' to the canonical NaN format. If `a' is a signaling NaN, the invalid > -| exception is raised. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Returns the result of converting the half-precision floating-point NaN > +`a' to the canonical NaN format. If `a' is a signaling NaN, the invalid > +exception is raised. > > +------------------------------------------------------------------------------- > +*/ > > static commonNaNT float16ToCommonNaN( float16 a STATUS_PARAM ) > { > @@ -216,10 +236,12 @@ static commonNaNT float16ToCommonNaN( float16 a > STATUS_PARAM ) > return z; > } > > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the canonical NaN `a' to the half- > -| precision floating-point format. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Returns the result of converting the canonical NaN `a' to the half- > +precision floating-point format. 
> > +------------------------------------------------------------------------------- > +*/ > > static float16 commonNaNToFloat16(commonNaNT a STATUS_PARAM) > { > @@ -248,10 +270,12 @@ int float32_is_signaling_nan(float32 a_) > return 0; > } > #else > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the single-precision floating-point value `a' is a quiet > -| NaN; otherwise returns 0. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Returns 1 if the single-precision floating-point value `a' is a quiet > +NaN; otherwise returns 0. > > +------------------------------------------------------------------------------- > +*/ > > int float32_is_quiet_nan( float32 a_ ) > { > @@ -263,10 +287,12 @@ int float32_is_quiet_nan( float32 a_ ) > #endif > } > > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the single-precision floating-point value `a' is a > signaling > -| NaN; otherwise returns 0. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Returns 1 if the single-precision floating-point value `a' is a signaling > +NaN; otherwise returns 0. > > +------------------------------------------------------------------------------- > +*/ > > int float32_is_signaling_nan( float32 a_ ) > { > @@ -279,10 +305,12 @@ int float32_is_signaling_nan( float32 a_ ) > } > #endif > > > -/*---------------------------------------------------------------------------- > -| Returns a quiet NaN if the single-precision floating point value `a' is > a > -| signaling NaN; otherwise returns `a'. 
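To make the three single-precision comments above concrete (is_quiet_nan, is_signaling_nan, maybe_silence_nan), here is a sketch for one convention only: the one where the most-significant fraction bit being 1 marks a quiet NaN, which I take to correspond to SNAN_BIT_IS_ONE being 0. Targets with the opposite convention need the test inverted. It works on raw `uint32_t` bit patterns and the names are invented for this note.

```c
#include <stdint.h>

#define F32_EXP_MASK_SKETCH   0x7F800000u   /* exponent field, bits 30..23 */
#define F32_FRAC_MASK_SKETCH  0x007FFFFFu   /* fraction field, bits 22..0 */
#define F32_QUIET_BIT_SKETCH  0x00400000u   /* top fraction bit (assumed quiet) */

/* A NaN has an all-ones exponent and a non-zero fraction. */
static int f32_is_nan_sketch(uint32_t a)
{
    return (a & F32_EXP_MASK_SKETCH) == F32_EXP_MASK_SKETCH
        && (a & F32_FRAC_MASK_SKETCH) != 0;
}

static int f32_is_quiet_nan_sketch(uint32_t a)
{
    return f32_is_nan_sketch(a) && (a & F32_QUIET_BIT_SKETCH) != 0;
}

static int f32_is_signaling_nan_sketch(uint32_t a)
{
    return f32_is_nan_sketch(a) && (a & F32_QUIET_BIT_SKETCH) == 0;
}

/* Returns a quiet NaN for a signaling NaN input, otherwise `a' unchanged. */
static uint32_t f32_maybe_silence_nan_sketch(uint32_t a)
{
    if (f32_is_signaling_nan_sketch(a)) {
        return a | F32_QUIET_BIT_SKETCH;
    }
    return a;
}
```

Note that infinity (0x7F800000) has a zero fraction, so it is neither kind of NaN under this test.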
> > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Returns a quiet NaN if the single-precision floating point value `a' is a > +signaling NaN; otherwise returns `a'. > > +------------------------------------------------------------------------------- > +*/ > > float32 float32_maybe_silence_nan( float32 a_ ) > { > @@ -302,12 +330,13 @@ float32 float32_maybe_silence_nan( float32 a_ ) > return a_; > } > > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the single-precision floating-point NaN > -| `a' to the canonical NaN format. If `a' is a signaling NaN, the invalid > -| exception is raised. > > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Returns the result of converting the single-precision floating-point NaN > +`a' to the canonical NaN format. If `a' is a signaling NaN, the invalid > +exception is raised. > > +------------------------------------------------------------------------------- > +*/ > static commonNaNT float32ToCommonNaN( float32 a STATUS_PARAM ) > { > commonNaNT z; > @@ -319,10 +348,12 @@ static commonNaNT float32ToCommonNaN( float32 a > STATUS_PARAM ) > return z; > } > > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the canonical NaN `a' to the single- > -| precision floating-point format. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Returns the result of converting the canonical NaN `a' to the single- > +precision floating-point format. 
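The pair of conversion comments above (single-precision NaN to the canonical format and back) can be illustrated with a round trip. The sketch below assumes a canonical struct with `sign`, `high` and `low` fields as quoted earlier in the diff, left-justifies the 23 fraction bits in `high`, and treats a clear top fraction bit as signaling. The fallback value and all names are assumptions for this note; the invalid exception is modelled as a plain int flag.

```c
#include <stdint.h>

/* Hypothetical mirror of the canonical NaN struct quoted in the diff. */
typedef struct {
    int sign;
    uint64_t high, low;
} commonNaN_sketch;

/* Assumed fallback when no payload survives; 0xFFC00000 is one of the
 * default patterns quoted in the patch, picked here only as an example. */
#define F32_DEFAULT_NAN_SKETCH 0xFFC00000u

/* Precondition: `a' is a NaN bit pattern. Sets *invalid for a signaling
 * NaN (top fraction bit clear, by assumption). */
static commonNaN_sketch f32_to_common_sketch(uint32_t a, int *invalid)
{
    commonNaN_sketch z;

    if ((a & 0x00400000u) == 0) {
        *invalid = 1;
    }
    z.sign = (int)(a >> 31);
    z.low = 0;
    z.high = (uint64_t)a << 41;     /* fraction bits land in bits 63..41 */
    return z;
}

static uint32_t common_to_f32_sketch(commonNaN_sketch a)
{
    uint32_t frac = (uint32_t)(a.high >> 41);

    if (frac == 0) {
        return F32_DEFAULT_NAN_SKETCH;  /* a zero fraction would be infinity */
    }
    return ((uint32_t)a.sign << 31) | 0x7F800000u | frac;
}
```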
> > +------------------------------------------------------------------------------- > +*/ > > static float32 commonNaNToFloat32( commonNaNT a STATUS_PARAM) > { > @@ -339,22 +370,24 @@ static float32 commonNaNToFloat32( commonNaNT a > STATUS_PARAM) > return float32_default_nan; > } > > > -/*---------------------------------------------------------------------------- > -| Select which NaN to propagate for a two-input operation. > -| IEEE754 doesn't specify all the details of this, so the > -| algorithm is target-specific. > -| The routine is passed various bits of information about the > -| two NaNs and should return 0 to select NaN a and 1 for NaN b. > -| Note that signalling NaNs are always squashed to quiet NaNs > -| by the caller, by calling floatXX_maybe_silence_nan() before > -| returning them. > -| > -| aIsLargerSignificand is only valid if both a and b are NaNs > -| of some kind, and is true if a has the larger significand, > -| or if both a and b have the same significand but a is > -| positive but b is negative. It is only needed for the x87 > -| tie-break rule. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Select which NaN to propagate for a two-input operation. > +IEEE754 doesn't specify all the details of this, so the > +algorithm is target-specific. > +The routine is passed various bits of information about the > +two NaNs and should return 0 to select NaN a and 1 for NaN b. > +Note that signalling NaNs are always squashed to quiet NaNs > +by the caller, by calling floatXX_maybe_silence_nan() before > +returning them. > + > +aIsLargerSignificand is only valid if both a and b are NaNs > +of some kind, and is true if a has the larger significand, > +or if both a and b have the same significand but a is > +positive but b is negative. It is only needed for the x87 > +tie-break rule. 
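Since the pickNaN comment only fixes the interface (return 0 to select NaN a, 1 to select NaN b) and leaves the policy to the target, one possible policy may help readers see how the inputs are used. The rule below (prefer a signaling NaN, and prefer a over b otherwise) is purely an example and is not claimed to be what any particular target in the patch does. It also drops the `aIsLargerSignificand` argument, which the comment says is only needed for the x87 tie-break.

```c
/* Hypothetical example policy: returns 0 to pick NaN a, 1 to pick NaN b.
 * At least one of the four inputs is expected to be non-zero. */
static int pickNaN_sketch(int aIsQNaN, int aIsSNaN, int bIsQNaN, int bIsSNaN)
{
    (void)bIsQNaN;      /* not needed by this particular rule */

    if (aIsSNaN) {
        return 0;       /* signaling a wins */
    }
    if (bIsSNaN) {
        return 1;       /* then signaling b */
    }
    if (aIsQNaN) {
        return 0;       /* then quiet a */
    }
    return 1;           /* otherwise b must be the NaN */
}
```

Per the comment, the caller is still responsible for silencing whichever NaN is selected before returning it.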
> > +------------------------------------------------------------------------------- > +*/ > > #if defined(TARGET_ARM) > static int pickNaN(flag aIsQNaN, flag aIsSNaN, flag bIsQNaN, flag bIsSNaN, > @@ -451,12 +484,14 @@ static int pickNaN(flag aIsQNaN, flag aIsSNaN, flag > bIsQNaN, flag bIsSNaN, > } > #endif > > > -/*---------------------------------------------------------------------------- > -| Select which NaN to propagate for a three-input operation. > -| For the moment we assume that no CPU needs the 'larger significand' > -| information. > -| Return values : 0 : a; 1 : b; 2 : c; 3 : default-NaN > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Select which NaN to propagate for a three-input operation. > +For the moment we assume that no CPU needs the 'larger significand' > +information. > +Return values : 0 : a; 1 : b; 2 : c; 3 : default-NaN > > +------------------------------------------------------------------------------- > +*/ > #if defined(TARGET_ARM) > static int pickNaNMulAdd(flag aIsQNaN, flag aIsSNaN, flag bIsQNaN, flag > bIsSNaN, > flag cIsQNaN, flag cIsSNaN, flag infzero > STATUS_PARAM) > @@ -554,12 +589,13 @@ static int pickNaNMulAdd(flag aIsQNaN, flag aIsSNaN, > flag bIsQNaN, flag bIsSNaN, > } > #endif > > > -/*---------------------------------------------------------------------------- > -| Takes two single-precision floating-point values `a' and `b', one of > which > -| is a NaN, and returns the appropriate NaN result. If either `a' or `b' > is a > -| signaling NaN, the invalid exception is raised. > > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Takes two single-precision floating-point values `a' and `b', one of which > +is a NaN, and returns the appropriate NaN result. 
If either `a' or `b' > is a > +signaling NaN, the invalid exception is raised. > > +------------------------------------------------------------------------------- > +*/ > static float32 propagateFloat32NaN( float32 a, float32 b STATUS_PARAM) > { > flag aIsQuietNaN, aIsSignalingNaN, bIsQuietNaN, bIsSignalingNaN; > @@ -594,14 +630,16 @@ static float32 propagateFloat32NaN( float32 a, > float32 b STATUS_PARAM) > } > } > > > -/*---------------------------------------------------------------------------- > -| Takes three single-precision floating-point values `a', `b' and `c', > one of > -| which is a NaN, and returns the appropriate NaN result. If any of `a', > -| `b' or `c' is a signaling NaN, the invalid exception is raised. > -| The input infzero indicates whether a*b was 0*inf or inf*0 (in which > case > -| obviously c is a NaN, and whether to propagate c or some other NaN is > -| implementation defined). > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Takes three single-precision floating-point values `a', `b' and `c', one > of > +which is a NaN, and returns the appropriate NaN result. If any of `a', > +`b' or `c' is a signaling NaN, the invalid exception is raised. > +The input infzero indicates whether a*b was 0*inf or inf*0 (in which case > +obviously c is a NaN, and whether to propagate c or some other NaN is > +implementation defined). > > +------------------------------------------------------------------------------- > +*/ > > static float32 propagateFloat32MulAddNaN(float32 a, float32 b, > float32 c, flag infzero > STATUS_PARAM) > @@ -656,10 +694,12 @@ int float64_is_signaling_nan(float64 a_) > return 0; > } > #else > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the double-precision floating-point value `a' is a quiet > -| NaN; otherwise returns 0. 
> > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Returns 1 if the double-precision floating-point value `a' is a quiet > +NaN; otherwise returns 0. > > +------------------------------------------------------------------------------- > +*/ > > int float64_is_quiet_nan( float64 a_ ) > { > @@ -673,10 +713,12 @@ int float64_is_quiet_nan( float64 a_ ) > #endif > } > > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the double-precision floating-point value `a' is a > signaling > -| NaN; otherwise returns 0. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Returns 1 if the double-precision floating-point value `a' is a signaling > +NaN; otherwise returns 0. > > +------------------------------------------------------------------------------- > +*/ > > int float64_is_signaling_nan( float64 a_ ) > { > @@ -691,10 +733,12 @@ int float64_is_signaling_nan( float64 a_ ) > } > #endif > > > -/*---------------------------------------------------------------------------- > -| Returns a quiet NaN if the double-precision floating point value `a' is > a > -| signaling NaN; otherwise returns `a'. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Returns a quiet NaN if the double-precision floating point value `a' is a > +signaling NaN; otherwise returns `a'. 
> > +------------------------------------------------------------------------------- > +*/ > > float64 float64_maybe_silence_nan( float64 a_ ) > { > @@ -714,12 +758,13 @@ float64 float64_maybe_silence_nan( float64 a_ ) > return a_; > } > > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the double-precision floating-point NaN > -| `a' to the canonical NaN format. If `a' is a signaling NaN, the invalid > -| exception is raised. > > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Returns the result of converting the double-precision floating-point NaN > +`a' to the canonical NaN format. If `a' is a signaling NaN, the invalid > +exception is raised. > > +------------------------------------------------------------------------------- > +*/ > static commonNaNT float64ToCommonNaN( float64 a STATUS_PARAM) > { > commonNaNT z; > @@ -731,10 +776,12 @@ static commonNaNT float64ToCommonNaN( float64 a > STATUS_PARAM) > return z; > } > > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the canonical NaN `a' to the double- > -| precision floating-point format. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Returns the result of converting the canonical NaN `a' to the double- > +precision floating-point format. 
> > +------------------------------------------------------------------------------- > +*/ > > static float64 commonNaNToFloat64( commonNaNT a STATUS_PARAM) > { > @@ -753,12 +800,13 @@ static float64 commonNaNToFloat64( commonNaNT a > STATUS_PARAM) > return float64_default_nan; > } > > > -/*---------------------------------------------------------------------------- > -| Takes two double-precision floating-point values `a' and `b', one of > which > -| is a NaN, and returns the appropriate NaN result. If either `a' or `b' > is a > -| signaling NaN, the invalid exception is raised. > > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Takes two double-precision floating-point values `a' and `b', one of which > +is a NaN, and returns the appropriate NaN result. If either `a' or `b' > is a > +signaling NaN, the invalid exception is raised. > > +------------------------------------------------------------------------------- > +*/ > static float64 propagateFloat64NaN( float64 a, float64 b STATUS_PARAM) > { > flag aIsQuietNaN, aIsSignalingNaN, bIsQuietNaN, bIsSignalingNaN; > @@ -793,14 +841,16 @@ static float64 propagateFloat64NaN( float64 a, > float64 b STATUS_PARAM) > } > } > > > -/*---------------------------------------------------------------------------- > -| Takes three double-precision floating-point values `a', `b' and `c', > one of > -| which is a NaN, and returns the appropriate NaN result. If any of `a', > -| `b' or `c' is a signaling NaN, the invalid exception is raised. > -| The input infzero indicates whether a*b was 0*inf or inf*0 (in which > case > -| obviously c is a NaN, and whether to propagate c or some other NaN is > -| implementation defined). 
> > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Takes three double-precision floating-point values `a', `b' and `c', one > of > +which is a NaN, and returns the appropriate NaN result. If any of `a', > +`b' or `c' is a signaling NaN, the invalid exception is raised. > +The input infzero indicates whether a*b was 0*inf or inf*0 (in which case > +obviously c is a NaN, and whether to propagate c or some other NaN is > +implementation defined). > > +------------------------------------------------------------------------------- > +*/ > > static float64 propagateFloat64MulAddNaN(float64 a, float64 b, > float64 c, flag infzero > STATUS_PARAM) > @@ -855,11 +905,13 @@ int floatx80_is_signaling_nan(floatx80 a_) > return 0; > } > #else > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the extended double-precision floating-point value `a' is a > -| quiet NaN; otherwise returns 0. This slightly differs from the same > -| function for other types as floatx80 has an explicit bit. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Returns 1 if the extended double-precision floating-point value `a' is a > +quiet NaN; otherwise returns 0. This slightly differs from the same > +function for other types as floatx80 has an explicit bit. > > +------------------------------------------------------------------------------- > +*/ > > int floatx80_is_quiet_nan( floatx80 a ) > { > @@ -877,11 +929,13 @@ int floatx80_is_quiet_nan( floatx80 a ) > #endif > } > > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the extended double-precision floating-point value `a' is a > -| signaling NaN; otherwise returns 0. 
This slightly differs from the same > -| function for other types as floatx80 has an explicit bit. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Returns 1 if the extended double-precision floating-point value `a' is a > +signaling NaN; otherwise returns 0. This slightly differs from the same > +function for other types as floatx80 has an explicit bit. > > +------------------------------------------------------------------------------- > +*/ > > int floatx80_is_signaling_nan( floatx80 a ) > { > @@ -900,10 +954,12 @@ int floatx80_is_signaling_nan( floatx80 a ) > } > #endif > > > -/*---------------------------------------------------------------------------- > -| Returns a quiet NaN if the extended double-precision floating point > value > -| `a' is a signaling NaN; otherwise returns `a'. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Returns a quiet NaN if the extended double-precision floating point value > +`a' is a signaling NaN; otherwise returns `a'. > > +------------------------------------------------------------------------------- > +*/ > > floatx80 floatx80_maybe_silence_nan( floatx80 a ) > { > @@ -923,12 +979,13 @@ floatx80 floatx80_maybe_silence_nan( floatx80 a ) > return a; > } > > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the extended double-precision floating- > -| point NaN `a' to the canonical NaN format. If `a' is a signaling NaN, > the > -| invalid exception is raised. 
> > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Returns the result of converting the extended double-precision floating- > +point NaN `a' to the canonical NaN format. If `a' is a signaling NaN, the > +invalid exception is raised. > > +------------------------------------------------------------------------------- > +*/ > static commonNaNT floatx80ToCommonNaN( floatx80 a STATUS_PARAM) > { > commonNaNT z; > @@ -946,10 +1003,12 @@ static commonNaNT floatx80ToCommonNaN( floatx80 a > STATUS_PARAM) > return z; > } > > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the canonical NaN `a' to the extended > -| double-precision floating-point format. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Returns the result of converting the canonical NaN `a' to the extended > +double-precision floating-point format. > > +------------------------------------------------------------------------------- > +*/ > > static floatx80 commonNaNToFloatx80( commonNaNT a STATUS_PARAM) > { > @@ -972,12 +1031,13 @@ static floatx80 commonNaNToFloatx80( commonNaNT a > STATUS_PARAM) > return z; > } > > > -/*---------------------------------------------------------------------------- > -| Takes two extended double-precision floating-point values `a' and `b', > one > -| of which is a NaN, and returns the appropriate NaN result. If either > `a' or > -| `b' is a signaling NaN, the invalid exception is raised. 
> > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Takes two extended double-precision floating-point values `a' and `b', one > +of which is a NaN, and returns the appropriate NaN result. If either `a' > or > +`b' is a signaling NaN, the invalid exception is raised. > > +------------------------------------------------------------------------------- > +*/ > static floatx80 propagateFloatx80NaN( floatx80 a, floatx80 b STATUS_PARAM) > { > flag aIsQuietNaN, aIsSignalingNaN, bIsQuietNaN, bIsSignalingNaN; > @@ -1023,10 +1083,12 @@ int float128_is_signaling_nan(float128 a_) > return 0; > } > #else > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the quadruple-precision floating-point value `a' is a quiet > -| NaN; otherwise returns 0. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Returns 1 if the quadruple-precision floating-point value `a' is a quiet > +NaN; otherwise returns 0. > > +------------------------------------------------------------------------------- > +*/ > > int float128_is_quiet_nan( float128 a ) > { > @@ -1041,10 +1103,12 @@ int float128_is_quiet_nan( float128 a ) > #endif > } > > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the quadruple-precision floating-point value `a' is a > -| signaling NaN; otherwise returns 0. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Returns 1 if the quadruple-precision floating-point value `a' is a > +signaling NaN; otherwise returns 0. 
> > +------------------------------------------------------------------------------- > +*/ > > int float128_is_signaling_nan( float128 a ) > { > @@ -1060,10 +1124,12 @@ int float128_is_signaling_nan( float128 a ) > } > #endif > > > -/*---------------------------------------------------------------------------- > -| Returns a quiet NaN if the quadruple-precision floating point value `a' > is > -| a signaling NaN; otherwise returns `a'. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Returns a quiet NaN if the quadruple-precision floating point value `a' is > +a signaling NaN; otherwise returns `a'. > > +------------------------------------------------------------------------------- > +*/ > > float128 float128_maybe_silence_nan( float128 a ) > { > @@ -1083,12 +1149,13 @@ float128 float128_maybe_silence_nan( float128 a ) > return a; > } > > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the quadruple-precision floating-point > NaN > -| `a' to the canonical NaN format. If `a' is a signaling NaN, the invalid > -| exception is raised. > > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Returns the result of converting the quadruple-precision floating-point > NaN > +`a' to the canonical NaN format. If `a' is a signaling NaN, the invalid > +exception is raised. 
> > +------------------------------------------------------------------------------- > +*/ > static commonNaNT float128ToCommonNaN( float128 a STATUS_PARAM) > { > commonNaNT z; > @@ -1099,10 +1166,12 @@ static commonNaNT float128ToCommonNaN( float128 a > STATUS_PARAM) > return z; > } > > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the canonical NaN `a' to the quadruple- > -| precision floating-point format. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Returns the result of converting the canonical NaN `a' to the quadruple- > +precision floating-point format. > > +------------------------------------------------------------------------------- > +*/ > > static float128 commonNaNToFloat128( commonNaNT a STATUS_PARAM) > { > @@ -1119,12 +1188,13 @@ static float128 commonNaNToFloat128( commonNaNT a > STATUS_PARAM) > return z; > } > > > -/*---------------------------------------------------------------------------- > -| Takes two quadruple-precision floating-point values `a' and `b', one of > -| which is a NaN, and returns the appropriate NaN result. If either `a' > or > -| `b' is a signaling NaN, the invalid exception is raised. > > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Takes two quadruple-precision floating-point values `a' and `b', one of > +which is a NaN, and returns the appropriate NaN result. If either `a' or > +`b' is a signaling NaN, the invalid exception is raised. 
> > +------------------------------------------------------------------------------- > +*/ > static float128 propagateFloat128NaN( float128 a, float128 b STATUS_PARAM) > { > flag aIsQuietNaN, aIsSignalingNaN, bIsQuietNaN, bIsSignalingNaN; > diff --git a/fpu/softfloat.c b/fpu/softfloat.c > index 7ba51b6..9145582 100644 > --- a/fpu/softfloat.c > +++ b/fpu/softfloat.c > @@ -4,10 +4,11 @@ > * Derived from SoftFloat. > */ > > > -/*============================================================================ > +/* > > +=============================================================================== > > -This C source file is part of the SoftFloat IEC/IEEE Floating-point > Arithmetic > -Package, Release 2b. > +This C source file is part of the SoftFloat IEC/IEEE Floating-point > +Arithmetic Package, Release 2a. > > Written by John R. Hauser. This work was made possible in part by the > International Computer Science Institute, located at Suite 600, 1947 > Center > @@ -16,24 +17,22 @@ National Science Foundation under grant MIP-9311980. > The original version > of this code was written as part of a project to build a fixed-point > vector > processor in collaboration with the University of California at Berkeley, > overseen by Profs. Nelson Morgan and John Wawrzynek. More information > -is available through the Web page `http://www.cs.berkeley.edu/~jhauser/ > +is available through the Web page `http://HTTP.CS.Berkeley.EDU/~jhauser/ > arithmetic/SoftFloat.html'. > > -THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort > has > -been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT TIMES > -RESULT IN INCORRECT BEHAVIOR. 
USE OF THIS SOFTWARE IS RESTRICTED TO > PERSONS > -AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ALL > LOSSES, > -COSTS, OR OTHER PROBLEMS THEY INCUR DUE TO THE SOFTWARE, AND WHO > FURTHERMORE > -EFFECTIVELY INDEMNIFY JOHN HAUSER AND THE INTERNATIONAL COMPUTER SCIENCE > -INSTITUTE (possibly via similar legal warning) AGAINST ALL LOSSES, COSTS, > OR > -OTHER PROBLEMS INCURRED BY THEIR CUSTOMERS AND CLIENTS DUE TO THE > SOFTWARE. > +THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort > +has been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT > +TIMES RESULT IN INCORRECT BEHAVIOR. USE OF THIS SOFTWARE IS RESTRICTED TO > +PERSONS AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR > ANY > +AND ALL LOSSES, COSTS, OR OTHER PROBLEMS ARISING FROM ITS USE. > > Derivative works are acceptable, even for commercial purposes, so long as > -(1) the source code for the derivative work includes prominent notice that > -the work is derivative, and (2) the source code includes prominent notice > with > -these four paragraphs for those parts of this code that are retained. > +(1) they include prominent notice that the work is derivative, and (2) > they > +include prominent notice akin to these four paragraphs for those parts of > +this code that are retained. > > > -=============================================================================*/ > > +=============================================================================== > +*/ > > /* softfloat (and in particular the code in softfloat-specialize.h) is > * target-dependent and needs the TARGET_* macros. > @@ -42,21 +41,25 @@ these four paragraphs for those parts of this code > that are retained. > > #include "fpu/softfloat.h" > > > -/*---------------------------------------------------------------------------- > -| Primitive arithmetic functions, including multi-word arithmetic, and > -| division and square root approximations. 
(Can be specialized to target > if > -| desired.) > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Primitive arithmetic functions, including multi-word arithmetic, and > +division and square root approximations. (Can be specialized to target if > +desired.) > > +------------------------------------------------------------------------------- > +*/ > #include "softfloat-macros.h" > > > -/*---------------------------------------------------------------------------- > -| Functions and definitions to determine: (1) whether tininess for > underflow > -| is detected before or after rounding by default, (2) what (if anything) > -| happens when exceptions are raised, (3) how signaling NaNs are > distinguished > -| from quiet NaNs, (4) the default generated quiet NaNs, and (5) how NaNs > -| are propagated from function inputs to output. These details are > target- > -| specific. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Functions and definitions to determine: (1) whether tininess for > underflow > +is detected before or after rounding by default, (2) what (if anything) > +happens when exceptions are raised, (3) how signaling NaNs are > distinguished > +from quiet NaNs, (4) the default generated quiet NaNs, and (5) how NaNs > +are propagated from function inputs to output. These details are target- > +specific. 
> > +------------------------------------------------------------------------------- > +*/ > #include "softfloat-specialize.h" > > void set_float_rounding_mode(int val STATUS_PARAM) > @@ -74,43 +77,51 @@ void set_floatx80_rounding_precision(int val > STATUS_PARAM) > STATUS(floatx80_rounding_precision) = val; > } > > > -/*---------------------------------------------------------------------------- > -| Returns the fraction bits of the half-precision floating-point value > `a'. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Returns the fraction bits of the half-precision floating-point value `a'. > > +------------------------------------------------------------------------------- > +*/ > > INLINE uint32_t extractFloat16Frac(float16 a) > { > return float16_val(a) & 0x3ff; > } > > > -/*---------------------------------------------------------------------------- > -| Returns the exponent bits of the half-precision floating-point value > `a'. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Returns the exponent bits of the half-precision floating-point value `a'. > > +------------------------------------------------------------------------------- > +*/ > > INLINE int_fast16_t extractFloat16Exp(float16 a) > { > return (float16_val(a) >> 10) & 0x1f; > } > > > -/*---------------------------------------------------------------------------- > -| Returns the sign bit of the single-precision floating-point value `a'. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Returns the sign bit of the single-precision floating-point value `a'. 
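For readers following the half-precision accessors in this hunk, here is a minimal sketch of the bit layout they rely on. It assumes the IEEE 754 binary16 format (1 sign bit, 5 exponent bits, 10 fraction bits) and reuses the masks and shifts from the quoted code. The f16_* names are made up for illustration and are not QEMU symbols.

```c
#include <stdint.h>

/* Illustrative helpers only; the names are hypothetical.
 * binary16 layout: bit 15 = sign, bits 14..10 = exponent, bits 9..0 = fraction. */
static inline uint32_t f16_frac(uint16_t a)
{
    return a & 0x3ff;           /* low 10 bits */
}

static inline int f16_exp(uint16_t a)
{
    return (a >> 10) & 0x1f;    /* next 5 bits, still biased by 15 */
}

static inline int f16_sign(uint16_t a)
{
    return a >> 15;             /* top bit */
}
```

With these, 0x3C00 (1.0 in half precision) splits into sign 0, biased exponent 15 and fraction 0.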
> > +------------------------------------------------------------------------------- > +*/ > > INLINE flag extractFloat16Sign(float16 a) > { > return float16_val(a)>>15; > } > > > -/*---------------------------------------------------------------------------- > -| Takes a 64-bit fixed-point value `absZ' with binary point between bits 6 > -| and 7, and returns the properly rounded 32-bit integer corresponding to > the > -| input. If `zSign' is 1, the input is negated before being converted to > an > -| integer. Bit 63 of `absZ' must be zero. Ordinarily, the fixed-point > input > -| is simply rounded to an integer, with the inexact exception raised if > the > -| input cannot be represented exactly as an integer. However, if the > fixed- > -| point input is too large, the invalid exception is raised and the > largest > -| positive or negative integer is returned. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Takes a 64-bit fixed-point value `absZ' with binary point between bits 6 > +and 7, and returns the properly rounded 32-bit integer corresponding to > the > +input. If `zSign' is 1, the input is negated before being converted to an > +integer. Bit 63 of `absZ' must be zero. Ordinarily, the fixed-point > input > +is simply rounded to an integer, with the inexact exception raised if the > +input cannot be represented exactly as an integer. However, if the fixed- > +point input is too large, the invalid exception is raised and the largest > +positive or negative integer is returned. 
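The rounding rule in the roundAndPackInt32 comment can be sketched as below. This is a simplified illustration, not the function in the patch: it assumes round-to-nearest-even only, and it reports the inexact and invalid conditions through two hypothetical out-parameters instead of the status word. The name round_fixed7_to_int32 is made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sketch under the assumptions stated above. abs_z is a magnitude with
 * 7 fraction bits (binary point between bits 6 and 7). */
static int32_t round_fixed7_to_int32(bool sign, uint64_t abs_z,
                                     bool *inexact, bool *invalid)
{
    assert((abs_z >> 63) == 0);              /* bit 63 must be zero */

    uint64_t round_bits = abs_z & 0x7f;      /* the discarded fraction */
    uint64_t mag = (abs_z + 0x40) >> 7;      /* add one half, then truncate */
    if (round_bits == 0x40) {
        mag &= ~(uint64_t)1;                 /* exact tie: pick the even result */
    }

    int64_t z = sign ? -(int64_t)mag : (int64_t)mag;
    *invalid = (z > INT32_MAX || z < INT32_MIN);
    if (*invalid) {
        *inexact = false;
        return sign ? INT32_MIN : INT32_MAX; /* largest value of that sign */
    }
    *inexact = (round_bits != 0);
    return (int32_t)z;
}
```

For example, 0x140 encodes 2.5 and rounds to 2 (tie to even), while 0x160 encodes 2.75 and rounds to 3. Both set the inexact indication.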
> > +------------------------------------------------------------------------------- > +*/ > > static int32 roundAndPackInt32( flag zSign, uint64_t absZ STATUS_PARAM) > { > @@ -150,17 +161,19 @@ static int32 roundAndPackInt32( flag zSign, uint64_t > absZ STATUS_PARAM) > > } > > > -/*---------------------------------------------------------------------------- > -| Takes the 128-bit fixed-point value formed by concatenating `absZ0' and > -| `absZ1', with binary point between bits 63 and 64 (between the input > words), > -| and returns the properly rounded 64-bit integer corresponding to the > input. > -| If `zSign' is 1, the input is negated before being converted to an > integer. > -| Ordinarily, the fixed-point input is simply rounded to an integer, with > -| the inexact exception raised if the input cannot be represented exactly > as > -| an integer. However, if the fixed-point input is too large, the invalid > -| exception is raised and the largest positive or negative integer is > -| returned. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Takes the 128-bit fixed-point value formed by concatenating `absZ0' and > +`absZ1', with binary point between bits 63 and 64 (between the input > words), > +and returns the properly rounded 64-bit integer corresponding to the > input. > +If `zSign' is 1, the input is negated before being converted to an > integer. > +Ordinarily, the fixed-point input is simply rounded to an integer, with > +the inexact exception raised if the input cannot be represented exactly as > +an integer. However, if the fixed-point input is too large, the invalid > +exception is raised and the largest positive or negative integer is > +returned. 
> > +------------------------------------------------------------------------------- > +*/ > > static int64 roundAndPackInt64( flag zSign, uint64_t absZ0, uint64_t > absZ1 STATUS_PARAM) > { > @@ -203,9 +216,11 @@ static int64 roundAndPackInt64( flag zSign, uint64_t > absZ0, uint64_t absZ1 STATU > > } > > > -/*---------------------------------------------------------------------------- > -| Returns the fraction bits of the single-precision floating-point value > `a'. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Returns the fraction bits of the single-precision floating-point value > `a'. > > +------------------------------------------------------------------------------- > +*/ > > INLINE uint32_t extractFloat32Frac( float32 a ) > { > @@ -214,9 +229,11 @@ INLINE uint32_t extractFloat32Frac( float32 a ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns the exponent bits of the single-precision floating-point value > `a'. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Returns the exponent bits of the single-precision floating-point value > `a'. > > +------------------------------------------------------------------------------- > +*/ > > INLINE int_fast16_t extractFloat32Exp(float32 a) > { > @@ -225,10 +242,11 @@ INLINE int_fast16_t extractFloat32Exp(float32 a) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns the sign bit of the single-precision floating-point value `a'. 
> > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Returns the sign bit of the single-precision floating-point value `a'. > > +------------------------------------------------------------------------------- > +*/ > INLINE flag extractFloat32Sign( float32 a ) > { > > @@ -236,10 +254,12 @@ INLINE flag extractFloat32Sign( float32 a ) > > } > > > -/*---------------------------------------------------------------------------- > -| If `a' is denormal and we are in flush-to-zero mode then set the > -| input-denormal exception and return zero. Otherwise just return the > value. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +If `a' is denormal and we are in flush-to-zero mode then set the > +input-denormal exception and return zero. Otherwise just return the value. > > +------------------------------------------------------------------------------- > +*/ > static float32 float32_squash_input_denormal(float32 a STATUS_PARAM) > { > if (STATUS(flush_inputs_to_zero)) { > @@ -251,13 +271,14 @@ static float32 float32_squash_input_denormal(float32 > a STATUS_PARAM) > return a; > } > > > -/*---------------------------------------------------------------------------- > -| Normalizes the subnormal single-precision floating-point value > represented > -| by the denormalized significand `aSig'. The normalized exponent and > -| significand are stored at the locations pointed to by `zExpPtr' and > -| `zSigPtr', respectively. > > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Normalizes the subnormal single-precision floating-point value represented > +by the denormalized significand `aSig'. 
The normalized exponent and > +significand are stored at the locations pointed to by `zExpPtr' and > +`zSigPtr', respectively. > > +------------------------------------------------------------------------------- > +*/ > static void > normalizeFloat32Subnormal(uint32_t aSig, int_fast16_t *zExpPtr, uint32_t > *zSigPtr) > { > @@ -269,16 +290,18 @@ static void > > } > > > -/*---------------------------------------------------------------------------- > -| Packs the sign `zSign', exponent `zExp', and significand `zSig' into a > -| single-precision floating-point value, returning the result. After > being > -| shifted into the proper positions, the three fields are simply added > -| together to form the result. This means that any integer portion of > `zSig' > -| will be added into the exponent. Since a properly normalized > significand > -| will have an integer portion equal to 1, the `zExp' input should be 1 > less > -| than the desired result exponent whenever `zSig' is a complete, > normalized > -| significand. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Packs the sign `zSign', exponent `zExp', and significand `zSig' into a > +single-precision floating-point value, returning the result. After being > +shifted into the proper positions, the three fields are simply added > +together to form the result. This means that any integer portion of > `zSig' > +will be added into the exponent. Since a properly normalized significand > +will have an integer portion equal to 1, the `zExp' input should be 1 less > +than the desired result exponent whenever `zSig' is a complete, normalized > +significand. 
> > +------------------------------------------------------------------------------- > +*/ > > INLINE float32 packFloat32(flag zSign, int_fast16_t zExp, uint32_t zSig) > { > @@ -288,27 +311,29 @@ INLINE float32 packFloat32(flag zSign, int_fast16_t > zExp, uint32_t zSig) > > } > > > -/*---------------------------------------------------------------------------- > -| Takes an abstract floating-point value having sign `zSign', exponent > `zExp', > -| and significand `zSig', and returns the proper single-precision > floating- > -| point value corresponding to the abstract input. Ordinarily, the > abstract > -| value is simply rounded and packed into the single-precision format, > with > -| the inexact exception raised if the abstract input cannot be represented > -| exactly. However, if the abstract value is too large, the overflow and > -| inexact exceptions are raised and an infinity or maximal finite value is > -| returned. If the abstract value is too small, the input value is > rounded to > -| a subnormal number, and the underflow and inexact exceptions are raised > if > -| the abstract input cannot be represented exactly as a subnormal single- > -| precision floating-point number. > -| The input significand `zSig' has its binary point between bits 30 > -| and 29, which is 7 bits to the left of the usual location. This shifted > -| significand must be normalized or smaller. If `zSig' is not normalized, > -| `zExp' must be 0; in that case, the result returned is a subnormal > number, > -| and it must not require rounding. In the usual case that `zSig' is > -| normalized, `zExp' must be 1 less than the ``true'' floating-point > exponent. > -| The handling of underflow and overflow follows the IEC/IEEE Standard for > -| Binary Floating-Point Arithmetic. 
> > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Takes an abstract floating-point value having sign `zSign', exponent > `zExp', > +and significand `zSig', and returns the proper single-precision floating- > +point value corresponding to the abstract input. Ordinarily, the abstract > +value is simply rounded and packed into the single-precision format, with > +the inexact exception raised if the abstract input cannot be represented > +exactly. However, if the abstract value is too large, the overflow and > +inexact exceptions are raised and an infinity or maximal finite value is > +returned. If the abstract value is too small, the input value is rounded > to > +a subnormal number, and the underflow and inexact exceptions are raised if > +the abstract input cannot be represented exactly as a subnormal single- > +precision floating-point number. > + The input significand `zSig' has its binary point between bits 30 > +and 29, which is 7 bits to the left of the usual location. This shifted > +significand must be normalized or smaller. If `zSig' is not normalized, > +`zExp' must be 0; in that case, the result returned is a subnormal number, > +and it must not require rounding. In the usual case that `zSig' is > +normalized, `zExp' must be 1 less than the ``true'' floating-point > exponent. > +The handling of underflow and overflow follows the IEC/IEEE Standard for > +Binary Floating-Point Arithmetic. 
> > +------------------------------------------------------------------------------- > +*/ > > static float32 roundAndPackFloat32(flag zSign, int_fast16_t zExp, > uint32_t zSig STATUS_PARAM) > { > @@ -366,15 +391,16 @@ static float32 roundAndPackFloat32(flag zSign, > int_fast16_t zExp, uint32_t zSig > > } > > > -/*---------------------------------------------------------------------------- > -| Takes an abstract floating-point value having sign `zSign', exponent > `zExp', > -| and significand `zSig', and returns the proper single-precision > floating- > -| point value corresponding to the abstract input. This routine is just > like > -| `roundAndPackFloat32' except that `zSig' does not have to be normalized. > -| Bit 31 of `zSig' must be zero, and `zExp' must be 1 less than the > ``true'' > -| floating-point exponent. > > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Takes an abstract floating-point value having sign `zSign', exponent > `zExp', > +and significand `zSig', and returns the proper single-precision floating- > +point value corresponding to the abstract input. This routine is just > like > +`roundAndPackFloat32' except that `zSig' does not have to be normalized. > +Bit 31 of `zSig' must be zero, and `zExp' must be 1 less than the ``true'' > +floating-point exponent. > > +------------------------------------------------------------------------------- > +*/ > static float32 > normalizeRoundAndPackFloat32(flag zSign, int_fast16_t zExp, uint32_t > zSig STATUS_PARAM) > { > @@ -385,9 +411,11 @@ static float32 > > } > > > -/*---------------------------------------------------------------------------- > -| Returns the fraction bits of the double-precision floating-point value > `a'. 
> > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Returns the fraction bits of the double-precision floating-point value > `a'. > > +------------------------------------------------------------------------------- > +*/ > > INLINE uint64_t extractFloat64Frac( float64 a ) > { > @@ -396,9 +424,11 @@ INLINE uint64_t extractFloat64Frac( float64 a ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns the exponent bits of the double-precision floating-point value > `a'. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Returns the exponent bits of the double-precision floating-point value > `a'. > > +------------------------------------------------------------------------------- > +*/ > > INLINE int_fast16_t extractFloat64Exp(float64 a) > { > @@ -407,10 +437,11 @@ INLINE int_fast16_t extractFloat64Exp(float64 a) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns the sign bit of the double-precision floating-point value `a'. > > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Returns the sign bit of the double-precision floating-point value `a'. > > +------------------------------------------------------------------------------- > +*/ > INLINE flag extractFloat64Sign( float64 a ) > { > > @@ -418,10 +449,12 @@ INLINE flag extractFloat64Sign( float64 a ) > > } > > > -/*---------------------------------------------------------------------------- > -| If `a' is denormal and we are in flush-to-zero mode then set the > -| input-denormal exception and return zero. 
Otherwise just return the > value. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +If `a' is denormal and we are in flush-to-zero mode then set the > +input-denormal exception and return zero. Otherwise just return the value. > > +------------------------------------------------------------------------------- > +*/ > static float64 float64_squash_input_denormal(float64 a STATUS_PARAM) > { > if (STATUS(flush_inputs_to_zero)) { > @@ -433,13 +466,14 @@ static float64 float64_squash_input_denormal(float64 > a STATUS_PARAM) > return a; > } > > > -/*---------------------------------------------------------------------------- > -| Normalizes the subnormal double-precision floating-point value > represented > -| by the denormalized significand `aSig'. The normalized exponent and > -| significand are stored at the locations pointed to by `zExpPtr' and > -| `zSigPtr', respectively. > > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Normalizes the subnormal double-precision floating-point value represented > +by the denormalized significand `aSig'. The normalized exponent and > +significand are stored at the locations pointed to by `zExpPtr' and > +`zSigPtr', respectively. > > +------------------------------------------------------------------------------- > +*/ > static void > normalizeFloat64Subnormal(uint64_t aSig, int_fast16_t *zExpPtr, uint64_t > *zSigPtr) > { > @@ -451,16 +485,18 @@ static void > > } > > > -/*---------------------------------------------------------------------------- > -| Packs the sign `zSign', exponent `zExp', and significand `zSig' into a > -| double-precision floating-point value, returning the result. 
After > being > -| shifted into the proper positions, the three fields are simply added > -| together to form the result. This means that any integer portion of > `zSig' > -| will be added into the exponent. Since a properly normalized > significand > -| will have an integer portion equal to 1, the `zExp' input should be 1 > less > -| than the desired result exponent whenever `zSig' is a complete, > normalized > -| significand. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Packs the sign `zSign', exponent `zExp', and significand `zSig' into a > +double-precision floating-point value, returning the result. After being > +shifted into the proper positions, the three fields are simply added > +together to form the result. This means that any integer portion of > `zSig' > +will be added into the exponent. Since a properly normalized significand > +will have an integer portion equal to 1, the `zExp' input should be 1 less > +than the desired result exponent whenever `zSig' is a complete, normalized > +significand. > > +------------------------------------------------------------------------------- > +*/ > > INLINE float64 packFloat64(flag zSign, int_fast16_t zExp, uint64_t zSig) > { > @@ -470,27 +506,29 @@ INLINE float64 packFloat64(flag zSign, int_fast16_t > zExp, uint64_t zSig) > > } > > > -/*---------------------------------------------------------------------------- > -| Takes an abstract floating-point value having sign `zSign', exponent > `zExp', > -| and significand `zSig', and returns the proper double-precision > floating- > -| point value corresponding to the abstract input. Ordinarily, the > abstract > -| value is simply rounded and packed into the double-precision format, > with > -| the inexact exception raised if the abstract input cannot be represented > -| exactly. 
However, if the abstract value is too large, the overflow and > -| inexact exceptions are raised and an infinity or maximal finite value is > -| returned. If the abstract value is too small, the input value is > rounded > -| to a subnormal number, and the underflow and inexact exceptions are > raised > -| if the abstract input cannot be represented exactly as a subnormal > double- > -| precision floating-point number. > -| The input significand `zSig' has its binary point between bits 62 > -| and 61, which is 10 bits to the left of the usual location. This > shifted > -| significand must be normalized or smaller. If `zSig' is not normalized, > -| `zExp' must be 0; in that case, the result returned is a subnormal > number, > -| and it must not require rounding. In the usual case that `zSig' is > -| normalized, `zExp' must be 1 less than the ``true'' floating-point > exponent. > -| The handling of underflow and overflow follows the IEC/IEEE Standard for > -| Binary Floating-Point Arithmetic. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Takes an abstract floating-point value having sign `zSign', exponent > `zExp', > +and significand `zSig', and returns the proper double-precision floating- > +point value corresponding to the abstract input. Ordinarily, the abstract > +value is simply rounded and packed into the double-precision format, with > +the inexact exception raised if the abstract input cannot be represented > +exactly. However, if the abstract value is too large, the overflow and > +inexact exceptions are raised and an infinity or maximal finite value is > +returned. If the abstract value is too small, the input value is rounded > +to a subnormal number, and the underflow and inexact exceptions are raised > +if the abstract input cannot be represented exactly as a subnormal double- > +precision floating-point number. 
> + The input significand `zSig' has its binary point between bits 62 > +and 61, which is 10 bits to the left of the usual location. This shifted > +significand must be normalized or smaller. If `zSig' is not normalized, > +`zExp' must be 0; in that case, the result returned is a subnormal number, > +and it must not require rounding. In the usual case that `zSig' is > +normalized, `zExp' must be 1 less than the ``true'' floating-point > exponent. > +The handling of underflow and overflow follows the IEC/IEEE Standard for > +Binary Floating-Point Arithmetic. > > +------------------------------------------------------------------------------- > +*/ > > static float64 roundAndPackFloat64(flag zSign, int_fast16_t zExp, > uint64_t zSig STATUS_PARAM) > { > @@ -548,15 +586,16 @@ static float64 roundAndPackFloat64(flag zSign, > int_fast16_t zExp, uint64_t zSig > > } > > > -/*---------------------------------------------------------------------------- > -| Takes an abstract floating-point value having sign `zSign', exponent > `zExp', > -| and significand `zSig', and returns the proper double-precision > floating- > -| point value corresponding to the abstract input. This routine is just > like > -| `roundAndPackFloat64' except that `zSig' does not have to be normalized. > -| Bit 63 of `zSig' must be zero, and `zExp' must be 1 less than the > ``true'' > -| floating-point exponent. > > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Takes an abstract floating-point value having sign `zSign', exponent > `zExp', > +and significand `zSig', and returns the proper double-precision floating- > +point value corresponding to the abstract input. This routine is just > like > +`roundAndPackFloat64' except that `zSig' does not have to be normalized. > +Bit 63 of `zSig' must be zero, and `zExp' must be 1 less than the ``true'' > +floating-point exponent. 
> > +------------------------------------------------------------------------------- > +*/ > static float64 > normalizeRoundAndPackFloat64(flag zSign, int_fast16_t zExp, uint64_t > zSig STATUS_PARAM) > { > @@ -567,10 +606,12 @@ static float64 > > } > > > -/*---------------------------------------------------------------------------- > -| Returns the fraction bits of the extended double-precision > floating-point > -| value `a'. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Returns the fraction bits of the extended double-precision floating-point > +value `a'. > > +------------------------------------------------------------------------------- > +*/ > > INLINE uint64_t extractFloatx80Frac( floatx80 a ) > { > @@ -579,11 +620,12 @@ INLINE uint64_t extractFloatx80Frac( floatx80 a ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns the exponent bits of the extended double-precision > floating-point > -| value `a'. > > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Returns the exponent bits of the extended double-precision floating-point > +value `a'. > > +------------------------------------------------------------------------------- > +*/ > INLINE int32 extractFloatx80Exp( floatx80 a ) > { > > @@ -591,11 +633,12 @@ INLINE int32 extractFloatx80Exp( floatx80 a ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns the sign bit of the extended double-precision floating-point > value > -| `a'. 
> > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Returns the sign bit of the extended double-precision floating-point value > +`a'. > > +------------------------------------------------------------------------------- > +*/ > INLINE flag extractFloatx80Sign( floatx80 a ) > { > > @@ -603,13 +646,14 @@ INLINE flag extractFloatx80Sign( floatx80 a ) > > } > > > -/*---------------------------------------------------------------------------- > -| Normalizes the subnormal extended double-precision floating-point value > -| represented by the denormalized significand `aSig'. The normalized > exponent > -| and significand are stored at the locations pointed to by `zExpPtr' and > -| `zSigPtr', respectively. > > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Normalizes the subnormal extended double-precision floating-point value > +represented by the denormalized significand `aSig'. The normalized > exponent > +and significand are stored at the locations pointed to by `zExpPtr' and > +`zSigPtr', respectively. > > +------------------------------------------------------------------------------- > +*/ > static void > normalizeFloatx80Subnormal( uint64_t aSig, int32 *zExpPtr, uint64_t > *zSigPtr ) > { > @@ -621,10 +665,12 @@ static void > > } > > > -/*---------------------------------------------------------------------------- > -| Packs the sign `zSign', exponent `zExp', and significand `zSig' into an > -| extended double-precision floating-point value, returning the result. 
> > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Packs the sign `zSign', exponent `zExp', and significand `zSig' into an > +extended double-precision floating-point value, returning the result. > > +------------------------------------------------------------------------------- > +*/ > > INLINE floatx80 packFloatx80( flag zSign, int32 zExp, uint64_t zSig ) > { > @@ -636,30 +682,31 @@ INLINE floatx80 packFloatx80( flag zSign, int32 > zExp, uint64_t zSig ) > > } > > > -/*---------------------------------------------------------------------------- > -| Takes an abstract floating-point value having sign `zSign', exponent > `zExp', > -| and extended significand formed by the concatenation of `zSig0' and > `zSig1', > -| and returns the proper extended double-precision floating-point value > -| corresponding to the abstract input. Ordinarily, the abstract value is > -| rounded and packed into the extended double-precision format, with the > -| inexact exception raised if the abstract input cannot be represented > -| exactly. However, if the abstract value is too large, the overflow and > -| inexact exceptions are raised and an infinity or maximal finite value is > -| returned. If the abstract value is too small, the input value is > rounded to > -| a subnormal number, and the underflow and inexact exceptions are raised > if > -| the abstract input cannot be represented exactly as a subnormal extended > -| double-precision floating-point number. > -| If `roundingPrecision' is 32 or 64, the result is rounded to the > same > -| number of bits as single or double precision, respectively. Otherwise, > the > -| result is rounded to the full precision of the extended double-precision > -| format. > -| The input significand must be normalized or smaller. 
If the input > -| significand is not normalized, `zExp' must be 0; in that case, the > result > -| returned is a subnormal number, and it must not require rounding. The > -| handling of underflow and overflow follows the IEC/IEEE Standard for > Binary > -| Floating-Point Arithmetic. > > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Takes an abstract floating-point value having sign `zSign', exponent > `zExp', > +and extended significand formed by the concatenation of `zSig0' and > `zSig1', > +and returns the proper extended double-precision floating-point value > +corresponding to the abstract input. Ordinarily, the abstract value is > +rounded and packed into the extended double-precision format, with the > +inexact exception raised if the abstract input cannot be represented > +exactly. However, if the abstract value is too large, the overflow and > +inexact exceptions are raised and an infinity or maximal finite value is > +returned. If the abstract value is too small, the input value is rounded > to > +a subnormal number, and the underflow and inexact exceptions are raised if > +the abstract input cannot be represented exactly as a subnormal extended > +double-precision floating-point number. > + If `roundingPrecision' is 32 or 64, the result is rounded to the same > +number of bits as single or double precision, respectively. Otherwise, > the > +result is rounded to the full precision of the extended double-precision > +format. > + The input significand must be normalized or smaller. If the input > +significand is not normalized, `zExp' must be 0; in that case, the result > +returned is a subnormal number, and it must not require rounding. The > +handling of underflow and overflow follows the IEC/IEEE Standard for > Binary > +Floating-Point Arithmetic. 
> > +------------------------------------------------------------------------------- > +*/ > static floatx80 > roundAndPackFloatx80( > int8 roundingPrecision, flag zSign, int32 zExp, uint64_t zSig0, > uint64_t zSig1 > @@ -823,15 +870,16 @@ static floatx80 > > } > > > -/*---------------------------------------------------------------------------- > -| Takes an abstract floating-point value having sign `zSign', exponent > -| `zExp', and significand formed by the concatenation of `zSig0' and > `zSig1', > -| and returns the proper extended double-precision floating-point value > -| corresponding to the abstract input. This routine is just like > -| `roundAndPackFloatx80' except that the input significand does not have > to be > -| normalized. > > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Takes an abstract floating-point value having sign `zSign', exponent > +`zExp', and significand formed by the concatenation of `zSig0' and > `zSig1', > +and returns the proper extended double-precision floating-point value > +corresponding to the abstract input. This routine is just like > +`roundAndPackFloatx80' except that the input significand does not have to > be > +normalized. > > +------------------------------------------------------------------------------- > +*/ > static floatx80 > normalizeRoundAndPackFloatx80( > int8 roundingPrecision, flag zSign, int32 zExp, uint64_t zSig0, > uint64_t zSig1 > @@ -852,10 +900,12 @@ static floatx80 > > } > > > -/*---------------------------------------------------------------------------- > -| Returns the least-significant 64 fraction bits of the > quadruple-precision > -| floating-point value `a'. 
> > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Returns the least-significant 64 fraction bits of the quadruple-precision > +floating-point value `a'. > > +------------------------------------------------------------------------------- > +*/ > > INLINE uint64_t extractFloat128Frac1( float128 a ) > { > @@ -864,10 +914,12 @@ INLINE uint64_t extractFloat128Frac1( float128 a ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns the most-significant 48 fraction bits of the quadruple-precision > -| floating-point value `a'. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Returns the most-significant 48 fraction bits of the quadruple-precision > +floating-point value `a'. > > +------------------------------------------------------------------------------- > +*/ > > INLINE uint64_t extractFloat128Frac0( float128 a ) > { > @@ -876,11 +928,12 @@ INLINE uint64_t extractFloat128Frac0( float128 a ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns the exponent bits of the quadruple-precision floating-point > value > -| `a'. > > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Returns the exponent bits of the quadruple-precision floating-point value > +`a'. 
> > +------------------------------------------------------------------------------- > +*/ > INLINE int32 extractFloat128Exp( float128 a ) > { > > @@ -888,10 +941,11 @@ INLINE int32 extractFloat128Exp( float128 a ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns the sign bit of the quadruple-precision floating-point value > `a'. > > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Returns the sign bit of the quadruple-precision floating-point value `a'. > > +------------------------------------------------------------------------------- > +*/ > INLINE flag extractFloat128Sign( float128 a ) > { > > @@ -899,16 +953,17 @@ INLINE flag extractFloat128Sign( float128 a ) > > } > > > -/*---------------------------------------------------------------------------- > -| Normalizes the subnormal quadruple-precision floating-point value > -| represented by the denormalized significand formed by the concatenation > of > -| `aSig0' and `aSig1'. The normalized exponent is stored at the location > -| pointed to by `zExpPtr'. The most significant 49 bits of the normalized > -| significand are stored at the location pointed to by `zSig0Ptr', and the > -| least significant 64 bits of the normalized significand are stored at > the > -| location pointed to by `zSig1Ptr'. > > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Normalizes the subnormal quadruple-precision floating-point value > +represented by the denormalized significand formed by the concatenation of > +`aSig0' and `aSig1'. The normalized exponent is stored at the location > +pointed to by `zExpPtr'. 
The most significant 49 bits of the normalized > +significand are stored at the location pointed to by `zSig0Ptr', and the > +least significant 64 bits of the normalized significand are stored at the > +location pointed to by `zSig1Ptr'. > > +------------------------------------------------------------------------------- > +*/ > static void > normalizeFloat128Subnormal( > uint64_t aSig0, > @@ -940,19 +995,20 @@ static void > > } > > > -/*---------------------------------------------------------------------------- > -| Packs the sign `zSign', the exponent `zExp', and the significand formed > -| by the concatenation of `zSig0' and `zSig1' into a quadruple-precision > -| floating-point value, returning the result. After being shifted into > the > -| proper positions, the three fields `zSign', `zExp', and `zSig0' are > simply > -| added together to form the most significant 32 bits of the result. This > -| means that any integer portion of `zSig0' will be added into the > exponent. > -| Since a properly normalized significand will have an integer portion > equal > -| to 1, the `zExp' input should be 1 less than the desired result exponent > -| whenever `zSig0' and `zSig1' concatenated form a complete, normalized > -| significand. > > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Packs the sign `zSign', the exponent `zExp', and the significand formed > +by the concatenation of `zSig0' and `zSig1' into a quadruple-precision > +floating-point value, returning the result. After being shifted into the > +proper positions, the three fields `zSign', `zExp', and `zSig0' are simply > +added together to form the most significant 32 bits of the result. This > +means that any integer portion of `zSig0' will be added into the exponent. 
> +Since a properly normalized significand will have an integer portion equal > +to 1, the `zExp' input should be 1 less than the desired result exponent > +whenever `zSig0' and `zSig1' concatenated form a complete, normalized > +significand. > > +------------------------------------------------------------------------------- > +*/ > INLINE float128 > packFloat128( flag zSign, int32 zExp, uint64_t zSig0, uint64_t zSig1 ) > { > @@ -964,27 +1020,28 @@ INLINE float128 > > } > > > -/*---------------------------------------------------------------------------- > -| Takes an abstract floating-point value having sign `zSign', exponent > `zExp', > -| and extended significand formed by the concatenation of `zSig0', > `zSig1', > -| and `zSig2', and returns the proper quadruple-precision floating-point > value > -| corresponding to the abstract input. Ordinarily, the abstract value is > -| simply rounded and packed into the quadruple-precision format, with the > -| inexact exception raised if the abstract input cannot be represented > -| exactly. However, if the abstract value is too large, the overflow and > -| inexact exceptions are raised and an infinity or maximal finite value is > -| returned. If the abstract value is too small, the input value is > rounded to > -| a subnormal number, and the underflow and inexact exceptions are raised > if > -| the abstract input cannot be represented exactly as a subnormal > quadruple- > -| precision floating-point number. > -| The input significand must be normalized or smaller. If the input > -| significand is not normalized, `zExp' must be 0; in that case, the > result > -| returned is a subnormal number, and it must not require rounding. In > the > -| usual case that the input significand is normalized, `zExp' must be 1 > less > -| than the ``true'' floating-point exponent. The handling of underflow > and > -| overflow follows the IEC/IEEE Standard for Binary Floating-Point > Arithmetic. 
> > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Takes an abstract floating-point value having sign `zSign', exponent > `zExp', > +and extended significand formed by the concatenation of `zSig0', `zSig1', > +and `zSig2', and returns the proper quadruple-precision floating-point > value > +corresponding to the abstract input. Ordinarily, the abstract value is > +simply rounded and packed into the quadruple-precision format, with the > +inexact exception raised if the abstract input cannot be represented > +exactly. However, if the abstract value is too large, the overflow and > +inexact exceptions are raised and an infinity or maximal finite value is > +returned. If the abstract value is too small, the input value is rounded > to > +a subnormal number, and the underflow and inexact exceptions are raised if > +the abstract input cannot be represented exactly as a subnormal quadruple- > +precision floating-point number. > + The input significand must be normalized or smaller. If the input > +significand is not normalized, `zExp' must be 0; in that case, the result > +returned is a subnormal number, and it must not require rounding. In the > +usual case that the input significand is normalized, `zExp' must be 1 less > +than the ``true'' floating-point exponent. The handling of underflow and > +overflow follows the IEC/IEEE Standard for Binary Floating-Point > Arithmetic. 
> > +------------------------------------------------------------------------------- > +*/ > static float128 > roundAndPackFloat128( > flag zSign, int32 zExp, uint64_t zSig0, uint64_t zSig1, uint64_t > zSig2 STATUS_PARAM) > @@ -1079,16 +1136,17 @@ static float128 > > } > > > -/*---------------------------------------------------------------------------- > -| Takes an abstract floating-point value having sign `zSign', exponent > `zExp', > -| and significand formed by the concatenation of `zSig0' and `zSig1', and > -| returns the proper quadruple-precision floating-point value > corresponding > -| to the abstract input. This routine is just like `roundAndPackFloat128' > -| except that the input significand has fewer bits and does not have to be > -| normalized. In all cases, `zExp' must be 1 less than the ``true'' > floating- > -| point exponent. > > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Takes an abstract floating-point value having sign `zSign', exponent > `zExp', > +and significand formed by the concatenation of `zSig0' and `zSig1', and > +returns the proper quadruple-precision floating-point value corresponding > +to the abstract input. This routine is just like `roundAndPackFloat128' > +except that the input significand has fewer bits and does not have to be > +normalized. In all cases, `zExp' must be 1 less than the ``true'' > floating- > +point exponent. > > +------------------------------------------------------------------------------- > +*/ > static float128 > normalizeRoundAndPackFloat128( > flag zSign, int32 zExp, uint64_t zSig0, uint64_t zSig1 STATUS_PARAM) > @@ -1115,13 +1173,14 @@ static float128 > > } > > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the 32-bit two's complement integer `a' > -| to the single-precision floating-point format. 
The conversion is > performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > > -*----------------------------------------------------------------------------*/ > - > -float32 int32_to_float32( int32 a STATUS_PARAM ) > +/* > > +------------------------------------------------------------------------------- > +Returns the result of converting the 32-bit two's complement integer `a' > +to the single-precision floating-point format. The conversion is > performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > > +------------------------------------------------------------------------------- > +*/ > +float32 int32_to_float32( int32 a STATUS_PARAM) > { > flag zSign; > > @@ -1132,13 +1191,14 @@ float32 int32_to_float32( int32 a STATUS_PARAM ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the 32-bit two's complement integer `a' > -| to the double-precision floating-point format. The conversion is > performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > > -*----------------------------------------------------------------------------*/ > - > -float64 int32_to_float64( int32 a STATUS_PARAM ) > +/* > > +------------------------------------------------------------------------------- > +Returns the result of converting the 32-bit two's complement integer `a' > +to the double-precision floating-point format. The conversion is > performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> > +------------------------------------------------------------------------------- > +*/ > +float64 int32_to_float64( int32 a STATUS_PARAM) > { > flag zSign; > uint32 absA; > @@ -1154,13 +1214,14 @@ float64 int32_to_float64( int32 a STATUS_PARAM ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the 32-bit two's complement integer `a' > -| to the extended double-precision floating-point format. The conversion > -| is performed according to the IEC/IEEE Standard for Binary > Floating-Point > -| Arithmetic. > > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Returns the result of converting the 32-bit two's complement integer `a' > +to the extended double-precision floating-point format. The conversion > +is performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic. > > +------------------------------------------------------------------------------- > +*/ > floatx80 int32_to_floatx80( int32 a STATUS_PARAM ) > { > flag zSign; > @@ -1177,12 +1238,13 @@ floatx80 int32_to_floatx80( int32 a STATUS_PARAM ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the 32-bit two's complement integer > `a' to > -| the quadruple-precision floating-point format. The conversion is > performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Returns the result of converting the 32-bit two's complement integer `a' > to > +the quadruple-precision floating-point format. 
The conversion is > performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > > +------------------------------------------------------------------------------- > +*/ > float128 int32_to_float128( int32 a STATUS_PARAM ) > { > flag zSign; > @@ -1199,12 +1261,13 @@ float128 int32_to_float128( int32 a STATUS_PARAM ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the 64-bit two's complement integer `a' > -| to the single-precision floating-point format. The conversion is > performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Returns the result of converting the 64-bit two's complement integer `a' > +to the single-precision floating-point format. The conversion is > performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > > +------------------------------------------------------------------------------- > +*/ > float32 int64_to_float32( int64 a STATUS_PARAM ) > { > flag zSign; > @@ -1252,12 +1315,13 @@ float32 uint64_to_float32( uint64 a STATUS_PARAM ) > } > } > > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the 64-bit two's complement integer `a' > -| to the double-precision floating-point format. The conversion is > performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Returns the result of converting the 64-bit two's complement integer `a' > +to the double-precision floating-point format. 
The conversion is > performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > > +------------------------------------------------------------------------------- > +*/ > float64 int64_to_float64( int64 a STATUS_PARAM ) > { > flag zSign; > @@ -1285,13 +1349,14 @@ float64 uint64_to_float64(uint64 a STATUS_PARAM) > return normalizeRoundAndPackFloat64(0, exp, a STATUS_VAR); > } > > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the 64-bit two's complement integer `a' > -| to the extended double-precision floating-point format. The conversion > -| is performed according to the IEC/IEEE Standard for Binary > Floating-Point > -| Arithmetic. > > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Returns the result of converting the 64-bit two's complement integer `a' > +to the extended double-precision floating-point format. The conversion > +is performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic. > > +------------------------------------------------------------------------------- > +*/ > floatx80 int64_to_floatx80( int64 a STATUS_PARAM ) > { > flag zSign; > @@ -1306,12 +1371,13 @@ floatx80 int64_to_floatx80( int64 a STATUS_PARAM ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the 64-bit two's complement integer > `a' to > -| the quadruple-precision floating-point format. The conversion is > performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Returns the result of converting the 64-bit two's complement integer `a' > to > +the quadruple-precision floating-point format. The conversion is > performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > > +------------------------------------------------------------------------------- > +*/ > float128 int64_to_float128( int64 a STATUS_PARAM ) > { > flag zSign; > @@ -1347,16 +1413,17 @@ float128 uint64_to_float128(uint64 a STATUS_PARAM) > return normalizeRoundAndPackFloat128(0, 0x406E, a, 0 STATUS_VAR); > } > > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the single-precision floating-point > value > -| `a' to the 32-bit two's complement integer format. The conversion is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic---which means in particular that the conversion is rounded > -| according to the current rounding mode. If `a' is a NaN, the largest > -| positive integer is returned. Otherwise, if the conversion overflows, > the > -| largest integer with the same sign as `a' is returned. > > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Returns the result of converting the single-precision floating-point value > +`a' to the 32-bit two's complement integer format. The conversion is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic---which means in particular that the conversion is rounded > +according to the current rounding mode. If `a' is a NaN, the largest > +positive integer is returned. 
Otherwise, if the conversion overflows, the > +largest integer with the same sign as `a' is returned. > > +------------------------------------------------------------------------------- > +*/ > int32 float32_to_int32( float32 a STATUS_PARAM ) > { > flag aSign; > @@ -1378,16 +1445,17 @@ int32 float32_to_int32( float32 a STATUS_PARAM ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the single-precision floating-point > value > -| `a' to the 32-bit two's complement integer format. The conversion is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic, except that the conversion is always rounded toward zero. > -| If `a' is a NaN, the largest positive integer is returned. Otherwise, > if > -| the conversion overflows, the largest integer with the same sign as `a' > is > -| returned. > > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Returns the result of converting the single-precision floating-point value > +`a' to the 32-bit two's complement integer format. The conversion is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic, except that the conversion is always rounded toward zero. > +If `a' is a NaN, the largest positive integer is returned. Otherwise, if > +the conversion overflows, the largest integer with the same sign as `a' is > +returned. 
> > +------------------------------------------------------------------------------- > +*/ > int32 float32_to_int32_round_to_zero( float32 a STATUS_PARAM ) > { > flag aSign; > @@ -1421,15 +1489,17 @@ int32 float32_to_int32_round_to_zero( float32 a > STATUS_PARAM ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the single-precision floating-point > value > -| `a' to the 16-bit two's complement integer format. The conversion is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic, except that the conversion is always rounded toward zero. > -| If `a' is a NaN, the largest positive integer is returned. Otherwise, > if > -| the conversion overflows, the largest integer with the same sign as `a' > is > -| returned. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Returns the result of converting the single-precision floating-point value > +`a' to the 16-bit two's complement integer format. The conversion is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic, except that the conversion is always rounded toward zero. > +If `a' is a NaN, the largest positive integer is returned. Otherwise, if > +the conversion overflows, the largest integer with the same sign as `a' is > +returned. > > +------------------------------------------------------------------------------- > +*/ > > int_fast16_t float32_to_int16_round_to_zero(float32 a STATUS_PARAM) > { > @@ -1470,16 +1540,17 @@ int_fast16_t > float32_to_int16_round_to_zero(float32 a STATUS_PARAM) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the single-precision floating-point > value > -| `a' to the 64-bit two's complement integer format. 
The conversion is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic---which means in particular that the conversion is rounded > -| according to the current rounding mode. If `a' is a NaN, the largest > -| positive integer is returned. Otherwise, if the conversion overflows, > the > -| largest integer with the same sign as `a' is returned. > > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Returns the result of converting the single-precision floating-point value > +`a' to the 64-bit two's complement integer format. The conversion is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic---which means in particular that the conversion is rounded > +according to the current rounding mode. If `a' is a NaN, the largest > +positive integer is returned. Otherwise, if the conversion overflows, the > +largest integer with the same sign as `a' is returned. > > +------------------------------------------------------------------------------- > +*/ > int64 float32_to_int64( float32 a STATUS_PARAM ) > { > flag aSign; > @@ -1507,16 +1578,17 @@ int64 float32_to_int64( float32 a STATUS_PARAM ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the single-precision floating-point > value > -| `a' to the 64-bit two's complement integer format. The conversion is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic, except that the conversion is always rounded toward zero. > If > -| `a' is a NaN, the largest positive integer is returned. Otherwise, if > the > -| conversion overflows, the largest integer with the same sign as `a' is > -| returned. 
> > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Returns the result of converting the single-precision floating-point value > +`a' to the 64-bit two's complement integer format. The conversion is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic, except that the conversion is always rounded toward zero. If > +`a' is a NaN, the largest positive integer is returned. Otherwise, if the > +conversion overflows, the largest integer with the same sign as `a' is > +returned. > > +------------------------------------------------------------------------------- > +*/ > int64 float32_to_int64_round_to_zero( float32 a STATUS_PARAM ) > { > flag aSign; > @@ -1554,13 +1626,14 @@ int64 float32_to_int64_round_to_zero( float32 a > STATUS_PARAM ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the single-precision floating-point > value > -| `a' to the double-precision floating-point format. The conversion is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic. > > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Returns the result of converting the single-precision floating-point value > +`a' to the double-precision floating-point format. The conversion is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic. 
> > +------------------------------------------------------------------------------- > +*/ > float64 float32_to_float64( float32 a STATUS_PARAM ) > { > flag aSign; > @@ -1584,13 +1657,14 @@ float64 float32_to_float64( float32 a STATUS_PARAM > ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the single-precision floating-point > value > -| `a' to the extended double-precision floating-point format. The > conversion > -| is performed according to the IEC/IEEE Standard for Binary > Floating-Point > -| Arithmetic. > > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Returns the result of converting the single-precision floating-point value > +`a' to the extended double-precision floating-point format. The > conversion > +is performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic. > > +------------------------------------------------------------------------------- > +*/ > floatx80 float32_to_floatx80( float32 a STATUS_PARAM ) > { > flag aSign; > @@ -1614,13 +1688,14 @@ floatx80 float32_to_floatx80( float32 a > STATUS_PARAM ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the single-precision floating-point > value > -| `a' to the double-precision floating-point format. The conversion is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic. > > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Returns the result of converting the single-precision floating-point value > +`a' to the double-precision floating-point format. 
The conversion is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic. > > +------------------------------------------------------------------------------- > +*/ > float128 float32_to_float128( float32 a STATUS_PARAM ) > { > flag aSign; > @@ -1644,14 +1719,15 @@ float128 float32_to_float128( float32 a > STATUS_PARAM ) > > } > > > -/*---------------------------------------------------------------------------- > -| Rounds the single-precision floating-point value `a' to an integer, and > -| returns the result as a single-precision floating-point value. The > -| operation is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > > -*----------------------------------------------------------------------------*/ > - > -float32 float32_round_to_int( float32 a STATUS_PARAM) > +/* > > +------------------------------------------------------------------------------- > +Rounds the single-precision floating-point value `a' to an integer, and > +returns the result as a single-precision floating-point value. The > +operation is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. > > +------------------------------------------------------------------------------- > +*/ > +float32 float32_round_to_int( float32 a STATUS_PARAM ) > { > flag aSign; > int_fast16_t aExp; > @@ -1704,15 +1780,16 @@ float32 float32_round_to_int( float32 a > STATUS_PARAM) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns the result of adding the absolute values of the single-precision > -| floating-point values `a' and `b'. If `zSign' is 1, the sum is negated > -| before being returned. `zSign' is ignored if the result is a NaN. > -| The addition is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. 
> > -*----------------------------------------------------------------------------*/ > - > -static float32 addFloat32Sigs( float32 a, float32 b, flag zSign > STATUS_PARAM) > +/* > > +------------------------------------------------------------------------------- > +Returns the result of adding the absolute values of the single-precision > +floating-point values `a' and `b'. If `zSign' is 1, the sum is negated > +before being returned. `zSign' is ignored if the result is a NaN. > +The addition is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. > > +------------------------------------------------------------------------------- > +*/ > +static float32 addFloat32Sigs( float32 a, float32 b, flag zSign > STATUS_PARAM ) > { > int_fast16_t aExp, bExp, zExp; > uint32_t aSig, bSig, zSig; > @@ -1783,15 +1860,16 @@ static float32 addFloat32Sigs( float32 a, float32 > b, flag zSign STATUS_PARAM) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns the result of subtracting the absolute values of the single- > -| precision floating-point values `a' and `b'. If `zSign' is 1, the > -| difference is negated before being returned. `zSign' is ignored if the > -| result is a NaN. The subtraction is performed according to the IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic. > > -*----------------------------------------------------------------------------*/ > - > -static float32 subFloat32Sigs( float32 a, float32 b, flag zSign > STATUS_PARAM) > +/* > > +------------------------------------------------------------------------------- > +Returns the result of subtracting the absolute values of the single- > +precision floating-point values `a' and `b'. If `zSign' is 1, the > +difference is negated before being returned. `zSign' is ignored if the > +result is a NaN. The subtraction is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic. 
> > +------------------------------------------------------------------------------- > +*/ > +static float32 subFloat32Sigs( float32 a, float32 b, flag zSign > STATUS_PARAM ) > { > int_fast16_t aExp, bExp, zExp; > uint32_t aSig, bSig, zSig; > @@ -1858,12 +1936,13 @@ static float32 subFloat32Sigs( float32 a, float32 > b, flag zSign STATUS_PARAM) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns the result of adding the single-precision floating-point values > `a' > -| and `b'. The operation is performed according to the IEC/IEEE Standard > for > -| Binary Floating-Point Arithmetic. > > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Returns the result of adding the single-precision floating-point values > `a' > +and `b'. The operation is performed according to the IEC/IEEE Standard > for > +Binary Floating-Point Arithmetic. > > +------------------------------------------------------------------------------- > +*/ > float32 float32_add( float32 a, float32 b STATUS_PARAM ) > { > flag aSign, bSign; > @@ -1881,12 +1960,13 @@ float32 float32_add( float32 a, float32 b > STATUS_PARAM ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns the result of subtracting the single-precision floating-point > values > -| `a' and `b'. The operation is performed according to the IEC/IEEE > Standard > -| for Binary Floating-Point Arithmetic. > > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Returns the result of subtracting the single-precision floating-point > values > +`a' and `b'. The operation is performed according to the IEC/IEEE > Standard > +for Binary Floating-Point Arithmetic. 
> > +------------------------------------------------------------------------------- > +*/ > float32 float32_sub( float32 a, float32 b STATUS_PARAM ) > { > flag aSign, bSign; > @@ -1904,12 +1984,13 @@ float32 float32_sub( float32 a, float32 b > STATUS_PARAM ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns the result of multiplying the single-precision floating-point > values > -| `a' and `b'. The operation is performed according to the IEC/IEEE > Standard > -| for Binary Floating-Point Arithmetic. > > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Returns the result of multiplying the single-precision floating-point > values > +`a' and `b'. The operation is performed according to the IEC/IEEE > Standard > +for Binary Floating-Point Arithmetic. > > +------------------------------------------------------------------------------- > +*/ > float32 float32_mul( float32 a, float32 b STATUS_PARAM ) > { > flag aSign, bSign, zSign; > @@ -1967,12 +2048,13 @@ float32 float32_mul( float32 a, float32 b > STATUS_PARAM ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns the result of dividing the single-precision floating-point > value `a' > -| by the corresponding value `b'. The operation is performed according > to the > -| IEC/IEEE Standard for Binary Floating-Point Arithmetic. > > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Returns the result of dividing the single-precision floating-point value > `a' > +by the corresponding value `b'. The operation is performed according to > the > +IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> > +------------------------------------------------------------------------------- > +*/ > float32 float32_div( float32 a, float32 b STATUS_PARAM ) > { > flag aSign, bSign, zSign; > @@ -2031,12 +2113,13 @@ float32 float32_div( float32 a, float32 b > STATUS_PARAM ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns the remainder of the single-precision floating-point value `a' > -| with respect to the corresponding value `b'. The operation is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Returns the remainder of the single-precision floating-point value `a' > +with respect to the corresponding value `b'. The operation is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > > +------------------------------------------------------------------------------- > +*/ > float32 float32_rem( float32 a, float32 b STATUS_PARAM ) > { > flag aSign, zSign; > @@ -2132,16 +2215,18 @@ float32 float32_rem( float32 a, float32 b > STATUS_PARAM ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns the result of multiplying the single-precision floating-point > values > -| `a' and `b' then adding 'c', with no intermediate rounding step after > the > -| multiplication. The operation is performed according to the IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic 754-2008. > -| The flags argument allows the caller to select negation of the > -| addend, the intermediate product, or the final result. (The difference > -| between this and having the caller do a separate negation is that > negating > -| externally will flip the sign bit on NaNs.) 
> > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Returns the result of multiplying the single-precision floating-point > values > +`a' and `b' then adding 'c', with no intermediate rounding step after the > +multiplication. The operation is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic 754-2008. > +The flags argument allows the caller to select negation of the > +addend, the intermediate product, or the final result. (The difference > +between this and having the caller do a separate negation is that negating > +externally will flip the sign bit on NaNs.) > > +------------------------------------------------------------------------------- > +*/ > > float32 float32_muladd(float32 a, float32 b, float32 c, int flags > STATUS_PARAM) > { > @@ -2339,12 +2424,13 @@ float32 float32_muladd(float32 a, float32 b, > float32 c, int flags STATUS_PARAM) > } > > > > -/*---------------------------------------------------------------------------- > -| Returns the square root of the single-precision floating-point value > `a'. > -| The operation is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Returns the square root of the single-precision floating-point value `a'. > +The operation is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. 
>
> +-------------------------------------------------------------------------------
> +*/
> float32 float32_sqrt( float32 a STATUS_PARAM )
> {
> flag aSign;
> @@ -2394,23 +2480,25 @@ float32 float32_sqrt( float32 a STATUS_PARAM )
>
> }
>
>
> -/*----------------------------------------------------------------------------
> -| Returns the binary exponential of the single-precision floating-point
> value
> -| `a'. The operation is performed according to the IEC/IEEE Standard for
> -| Binary Floating-Point Arithmetic.
> -|
> -| Uses the following identities:
> -|
> -| 1. -------------------------------------------------------------------------
> -|      x    x*ln(2)
> -|     2  = e
> -|
> -| 2. -------------------------------------------------------------------------
> -|                      2     3     4     5           n
> -|      x        x     x     x     x     x           x
> -|     e  = 1 + --- + --- + --- + --- + --- + ... + --- + ...
> -|               1!    2!    3!    4!    5!          n!
>
> -*----------------------------------------------------------------------------*/
> +/*
>
> +-------------------------------------------------------------------------------
> +Returns the binary exponential of the single-precision floating-point
> value
> +`a'. The operation is performed according to the IEC/IEEE Standard for
> +Binary Floating-Point Arithmetic.
> +
> +Uses the following identities:
> +
> +1. -------------------------------------------------------------------------
> +     x    x*ln(2)
> +    2  = e
> +
> +2. -------------------------------------------------------------------------
> +                     2     3     4     5           n
> +     x        x     x     x     x     x           x
> +    e  = 1 + --- + --- + --- + --- + --- + ... + --- + ...
> +              1!    2!    3!    4!    5!          n!
>
> +-------------------------------------------------------------------------------
> +*/
>
> static const float64 float32_exp2_coefficients[15] =
> {
> @@ -2474,11 +2562,13 @@ float32 float32_exp2( float32 a STATUS_PARAM )
> return float64_to_float32(r, status);
> }
>
>
> -/*----------------------------------------------------------------------------
> -| Returns the binary log of the single-precision floating-point value `a'.
> -| The operation is performed according to the IEC/IEEE Standard for Binary
> -| Floating-Point Arithmetic.
>
> -*----------------------------------------------------------------------------*/
> +/*
>
> +-------------------------------------------------------------------------------
> +Returns the binary log of the single-precision floating-point value `a'.
> +The operation is performed according to the IEC/IEEE Standard for Binary
> +Floating-Point Arithmetic.
>
> +-------------------------------------------------------------------------------
> +*/
> float32 float32_log2( float32 a STATUS_PARAM )
> {
> flag aSign, zSign;
> @@ -2522,12 +2612,14 @@ float32 float32_log2( float32 a STATUS_PARAM )
> return normalizeRoundAndPackFloat32( zSign, 0x85, zSig STATUS_VAR );
> }
>
>
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the single-precision floating-point value `a' is equal to
> -| the corresponding value `b', and 0 otherwise. The invalid exception is
> -| raised if either operand is a NaN. Otherwise, the comparison is
> performed
> -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
>
> -*----------------------------------------------------------------------------*/
> +/*
>
> +-------------------------------------------------------------------------------
> +Returns 1 if the single-precision floating-point value `a' is equal to
> +the corresponding value `b', and 0 otherwise. The invalid exception is
> +raised if either operand is a NaN. Otherwise, the comparison is performed
> +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
>
> +-------------------------------------------------------------------------------
> +*/
>
> int float32_eq( float32 a, float32 b STATUS_PARAM )
> {
> @@ -2546,12 +2638,14 @@ int float32_eq( float32 a, float32 b STATUS_PARAM )
> return ( av == bv ) || ( (uint32_t) ( ( av | bv )<<1 ) == 0 );
> }
>
>
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the single-precision floating-point value `a' is less than
> -| or equal to the corresponding value `b', and 0 otherwise. The invalid
> -| exception is raised if either operand is a NaN. The comparison is
> performed
> -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
>
> -*----------------------------------------------------------------------------*/
> +/*
>
> +-------------------------------------------------------------------------------
> +Returns 1 if the single-precision floating-point value `a' is less than
> +or equal to the corresponding value `b', and 0 otherwise. The invalid
> +exception is raised if either operand is a NaN. The comparison is
> performed
> +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
>
> +-------------------------------------------------------------------------------
> +*/
>
> int float32_le( float32 a, float32 b STATUS_PARAM )
> {
> @@ -2575,12 +2669,14 @@ int float32_le( float32 a, float32 b STATUS_PARAM )
>
> }
>
>
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the single-precision floating-point value `a' is less than
> -| the corresponding value `b', and 0 otherwise. The invalid exception is
> -| raised if either operand is a NaN. The comparison is performed
> according
> -| to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
>
> -*----------------------------------------------------------------------------*/
> +/*
>
> +-------------------------------------------------------------------------------
> +Returns 1 if the single-precision floating-point value `a' is less than
> +the corresponding value `b', and 0 otherwise. The invalid exception is
> +raised if either operand is a NaN. The comparison is performed according
> +to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
>
> +-------------------------------------------------------------------------------
> +*/
>
> int float32_lt( float32 a, float32 b STATUS_PARAM )
> {
> @@ -2604,12 +2700,14 @@ int float32_lt( float32 a, float32 b STATUS_PARAM )
>
> }
>
>
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the single-precision floating-point values `a' and `b'
> cannot
> -| be compared, and 0 otherwise. The invalid exception is raised if either
> -| operand is a NaN. The comparison is performed according to the IEC/IEEE
> -| Standard for Binary Floating-Point Arithmetic.
>
> -*----------------------------------------------------------------------------*/
> +/*
>
> +-------------------------------------------------------------------------------
> +Returns 1 if the single-precision floating-point values `a' and `b' cannot
> +be compared, and 0 otherwise. The invalid exception is raised if either
> +operand is a NaN. The comparison is performed according to the IEC/IEEE
> +Standard for Binary Floating-Point Arithmetic.
>
> +-------------------------------------------------------------------------------
> +*/
>
> int float32_unordered( float32 a, float32 b STATUS_PARAM )
> {
> @@ -2625,12 +2723,14 @@ int float32_unordered( float32 a, float32 b
> STATUS_PARAM )
> return 0;
> }
>
>
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the single-precision floating-point value `a' is equal to
> -| the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause
> an
> -| exception. The comparison is performed according to the IEC/IEEE
> Standard
> -| for Binary Floating-Point Arithmetic.
>
> -*----------------------------------------------------------------------------*/
> +/*
>
> +-------------------------------------------------------------------------------
> +Returns 1 if the single-precision floating-point value `a' is equal to
> +the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an
> +exception. The comparison is performed according to the IEC/IEEE Standard
> +for Binary Floating-Point Arithmetic.
>
> +-------------------------------------------------------------------------------
> +*/
>
> int float32_eq_quiet( float32 a, float32 b STATUS_PARAM )
> {
> @@ -2649,12 +2749,14 @@ int float32_eq_quiet( float32 a, float32 b
> STATUS_PARAM )
> ( (uint32_t) ( ( float32_val(a) | float32_val(b) )<<1 ) == 0
> );
> }
>
>
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the single-precision floating-point value `a' is less than
> or
> -| equal to the corresponding value `b', and 0 otherwise. Quiet NaNs do
> not
> -| cause an exception. Otherwise, the comparison is performed according
> to the
> -| IEC/IEEE Standard for Binary Floating-Point Arithmetic.
>
> -*----------------------------------------------------------------------------*/
> +/*
>
> +-------------------------------------------------------------------------------
> +Returns 1 if the single-precision floating-point value `a' is less than or
> +equal to the corresponding value `b', and 0 otherwise. Quiet NaNs do not
> +cause an exception. Otherwise, the comparison is performed according to
> the
> +IEC/IEEE Standard for Binary Floating-Point Arithmetic.
>
> +-------------------------------------------------------------------------------
> +*/
>
> int float32_le_quiet( float32 a, float32 b STATUS_PARAM )
> {
> @@ -2680,12 +2782,14 @@ int float32_le_quiet( float32 a, float32 b
> STATUS_PARAM )
>
> }
>
>
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the single-precision floating-point value `a' is less than
> -| the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause
> an
> -| exception. Otherwise, the comparison is performed according to the
> IEC/IEEE
> -| Standard for Binary Floating-Point Arithmetic.
>
> -*----------------------------------------------------------------------------*/
> +/*
>
> +-------------------------------------------------------------------------------
> +Returns 1 if the single-precision floating-point value `a' is less than
> +the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an
> +exception. Otherwise, the comparison is performed according to the
> IEC/IEEE
> +Standard for Binary Floating-Point Arithmetic.
>
> +-------------------------------------------------------------------------------
> +*/
>
> int float32_lt_quiet( float32 a, float32 b STATUS_PARAM )
> {
> @@ -2711,12 +2815,14 @@ int float32_lt_quiet( float32 a, float32 b
> STATUS_PARAM )
>
> }
>
>
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the single-precision floating-point values `a' and `b'
> cannot
> -| be compared, and 0 otherwise. Quiet NaNs do not cause an exception.
> The
> -| comparison is performed according to the IEC/IEEE Standard for Binary
> -| Floating-Point Arithmetic.
>
> -*----------------------------------------------------------------------------*/
> +/*
>
> +-------------------------------------------------------------------------------
> +Returns 1 if the single-precision floating-point values `a' and `b' cannot
> +be compared, and 0 otherwise. Quiet NaNs do not cause an exception. The
> +comparison is performed according to the IEC/IEEE Standard for Binary
> +Floating-Point Arithmetic.
>
> +-------------------------------------------------------------------------------
> +*/
>
> int float32_unordered_quiet( float32 a, float32 b STATUS_PARAM )
> {
> @@ -2734,16 +2840,17 @@ int float32_unordered_quiet( float32 a, float32 b
> STATUS_PARAM )
> return 0;
> }
>
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the double-precision floating-point
> value
> -| `a' to the 32-bit two's complement integer format. The conversion is
> -| performed according to the IEC/IEEE Standard for Binary Floating-Point
> -| Arithmetic---which means in particular that the conversion is rounded
> -| according to the current rounding mode. If `a' is a NaN, the largest
> -| positive integer is returned. Otherwise, if the conversion overflows,
> the
> -| largest integer with the same sign as `a' is returned.
>
> -*----------------------------------------------------------------------------*/
> -
> +/*
>
> +-------------------------------------------------------------------------------
> +Returns the result of converting the double-precision floating-point value
> +`a' to the 32-bit two's complement integer format. The conversion is
> +performed according to the IEC/IEEE Standard for Binary Floating-Point
> +Arithmetic---which means in particular that the conversion is rounded
> +according to the current rounding mode. If `a' is a NaN, the largest
> +positive integer is returned. Otherwise, if the conversion overflows, the
> +largest integer with the same sign as `a' is returned.
>
> +-------------------------------------------------------------------------------
> +*/
> int32 float64_to_int32( float64 a STATUS_PARAM )
> {
> flag aSign;
> @@ -2762,16 +2869,17 @@ int32 float64_to_int32( float64 a STATUS_PARAM )
>
> }
>
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the double-precision floating-point
> value
> -| `a' to the 32-bit two's complement integer format. The conversion is
> -| performed according to the IEC/IEEE Standard for Binary Floating-Point
> -| Arithmetic, except that the conversion is always rounded toward zero.
> -| If `a' is a NaN, the largest positive integer is returned. Otherwise,
> if
> -| the conversion overflows, the largest integer with the same sign as `a'
> is
> -| returned.
>
> -*----------------------------------------------------------------------------*/
> -
> +/*
>
> +-------------------------------------------------------------------------------
> +Returns the result of converting the double-precision floating-point value
> +`a' to the 32-bit two's complement integer format. The conversion is
> +performed according to the IEC/IEEE Standard for Binary Floating-Point
> +Arithmetic, except that the conversion is always rounded toward zero.
> +If `a' is a NaN, the largest positive integer is returned. Otherwise, if
> +the conversion overflows, the largest integer with the same sign as `a' is
> +returned.
>
> +-------------------------------------------------------------------------------
> +*/
> int32 float64_to_int32_round_to_zero( float64 a STATUS_PARAM )
> {
> flag aSign;
> @@ -2809,15 +2917,17 @@ int32 float64_to_int32_round_to_zero( float64 a
> STATUS_PARAM )
>
> }
>
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the double-precision floating-point
> value
> -| `a' to the 16-bit two's complement integer format. The conversion is
> -| performed according to the IEC/IEEE Standard for Binary Floating-Point
> -| Arithmetic, except that the conversion is always rounded toward zero.
> -| If `a' is a NaN, the largest positive integer is returned. Otherwise,
> if
> -| the conversion overflows, the largest integer with the same sign as `a'
> is
> -| returned.
>
> -*----------------------------------------------------------------------------*/
> +/*
>
> +-------------------------------------------------------------------------------
> +Returns the result of converting the double-precision floating-point value
> +`a' to the 16-bit two's complement integer format. The conversion is
> +performed according to the IEC/IEEE Standard for Binary Floating-Point
> +Arithmetic, except that the conversion is always rounded toward zero.
> +If `a' is a NaN, the largest positive integer is returned. Otherwise, if
> +the conversion overflows, the largest integer with the same sign as `a' is
> +returned.
>
> +-------------------------------------------------------------------------------
> +*/
>
> int_fast16_t float64_to_int16_round_to_zero(float64 a STATUS_PARAM)
> {
> @@ -2860,16 +2970,17 @@ int_fast16_t
> float64_to_int16_round_to_zero(float64 a STATUS_PARAM)
> return z;
> }
>
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the double-precision floating-point
> value
> -| `a' to the 64-bit two's complement integer format. The conversion is
> -| performed according to the IEC/IEEE Standard for Binary Floating-Point
> -| Arithmetic---which means in particular that the conversion is rounded
> -| according to the current rounding mode. If `a' is a NaN, the largest
> -| positive integer is returned. Otherwise, if the conversion overflows,
> the
> -| largest integer with the same sign as `a' is returned.
>
> -*----------------------------------------------------------------------------*/
> -
> +/*
>
> +-------------------------------------------------------------------------------
> +Returns the result of converting the double-precision floating-point value
> +`a' to the 64-bit two's complement integer format. The conversion is
> +performed according to the IEC/IEEE Standard for Binary Floating-Point
> +Arithmetic---which means in particular that the conversion is rounded
> +according to the current rounding mode. If `a' is a NaN, the largest
> +positive integer is returned. Otherwise, if the conversion overflows, the
> +largest integer with the same sign as `a' is returned.
>
> +-------------------------------------------------------------------------------
> +*/
> int64 float64_to_int64( float64 a STATUS_PARAM )
> {
> flag aSign;
> @@ -2903,16 +3014,17 @@ int64 float64_to_int64( float64 a STATUS_PARAM )
>
> }
>
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the double-precision floating-point
> value
> -| `a' to the 64-bit two's complement integer format. The conversion is
> -| performed according to the IEC/IEEE Standard for Binary Floating-Point
> -| Arithmetic, except that the conversion is always rounded toward zero.
> -| If `a' is a NaN, the largest positive integer is returned. Otherwise,
> if
> -| the conversion overflows, the largest integer with the same sign as `a'
> is
> -| returned.
>
> -*----------------------------------------------------------------------------*/
> -
> +/*
>
> +-------------------------------------------------------------------------------
> +Returns the result of converting the double-precision floating-point value
> +`a' to the 64-bit two's complement integer format. The conversion is
> +performed according to the IEC/IEEE Standard for Binary Floating-Point
> +Arithmetic, except that the conversion is always rounded toward zero.
> +If `a' is a NaN, the largest positive integer is returned. Otherwise, if
> +the conversion overflows, the largest integer with the same sign as `a' is
> +returned.
>
> +-------------------------------------------------------------------------------
> +*/
> int64 float64_to_int64_round_to_zero( float64 a STATUS_PARAM )
> {
> flag aSign;
> @@ -2956,13 +3068,14 @@ int64 float64_to_int64_round_to_zero( float64 a
> STATUS_PARAM )
>
> }
>
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the double-precision floating-point
> value
> -| `a' to the single-precision floating-point format. The conversion is
> -| performed according to the IEC/IEEE Standard for Binary Floating-Point
> -| Arithmetic.
>
> -*----------------------------------------------------------------------------*/
> -
> +/*
>
> +-------------------------------------------------------------------------------
> +Returns the result of converting the double-precision floating-point value
> +`a' to the single-precision floating-point format. The conversion is
> +performed according to the IEC/IEEE Standard for Binary Floating-Point
> +Arithmetic.
>
> +-------------------------------------------------------------------------------
> +*/
> float32 float64_to_float32( float64 a STATUS_PARAM )
> {
> flag aSign;
> @@ -2989,16 +3102,18 @@ float32 float64_to_float32( float64 a STATUS_PARAM
> )
> }
>
>
>
> -/*----------------------------------------------------------------------------
> -| Packs the sign `zSign', exponent `zExp', and significand `zSig' into a
> -| half-precision floating-point value, returning the result. After being
> -| shifted into the proper positions, the three fields are simply added
> -| together to form the result. This means that any integer portion of
> `zSig'
> -| will be added into the exponent. Since a properly normalized
> significand
> -| will have an integer portion equal to 1, the `zExp' input should be 1
> less
> -| than the desired result exponent whenever `zSig' is a complete,
> normalized
> -| significand.
>
> -*----------------------------------------------------------------------------*/
> +/*
>
> +-------------------------------------------------------------------------------
> +Packs the sign `zSign', exponent `zExp', and significand `zSig' into a
> +half-precision floating-point value, returning the result. After being
> +shifted into the proper positions, the three fields are simply added
> +together to form the result. This means that any integer portion of
> `zSig'
> +will be added into the exponent. Since a properly normalized significand
> +will have an integer portion equal to 1, the `zExp' input should be 1 less
> +than the desired result exponent whenever `zSig' is a complete, normalized
> +significand.
>
> +-------------------------------------------------------------------------------
> +*/
> static float16 packFloat16(flag zSign, int_fast16_t zExp, uint16_t zSig)
> {
> return make_float16(
> @@ -3132,13 +3247,14 @@ float16 float32_to_float16(float32 a, flag ieee
> STATUS_PARAM)
> return packFloat16(aSign, aExp + 14, aSig >> 13);
> }
>
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the double-precision floating-point
> value
> -| `a' to the extended double-precision floating-point format. The
> conversion
> -| is performed according to the IEC/IEEE Standard for Binary
> Floating-Point
> -| Arithmetic.
>
> -*----------------------------------------------------------------------------*/
> -
> +/*
>
> +-------------------------------------------------------------------------------
> +Returns the result of converting the double-precision floating-point value
> +`a' to the extended double-precision floating-point format. The
> conversion
> +is performed according to the IEC/IEEE Standard for Binary Floating-Point
> +Arithmetic.
>
> +-------------------------------------------------------------------------------
> +*/
> floatx80 float64_to_floatx80( float64 a STATUS_PARAM )
> {
> flag aSign;
> @@ -3163,13 +3279,14 @@ floatx80 float64_to_floatx80( float64 a
> STATUS_PARAM )
>
> }
>
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the double-precision floating-point
> value
> -| `a' to the quadruple-precision floating-point format. The conversion is
> -| performed according to the IEC/IEEE Standard for Binary Floating-Point
> -| Arithmetic.
>
> -*----------------------------------------------------------------------------*/
> -
> +/*
>
> +-------------------------------------------------------------------------------
> +Returns the result of converting the double-precision floating-point value
> +`a' to the quadruple-precision floating-point format. The conversion is
> +performed according to the IEC/IEEE Standard for Binary Floating-Point
> +Arithmetic.
>
> +-------------------------------------------------------------------------------
> +*/
> float128 float64_to_float128( float64 a STATUS_PARAM )
> {
> flag aSign;
> @@ -3194,13 +3311,14 @@ float128 float64_to_float128( float64 a
> STATUS_PARAM )
>
> }
>
>
> -/*----------------------------------------------------------------------------
> -| Rounds the double-precision floating-point value `a' to an integer, and
> -| returns the result as a double-precision floating-point value. The
> -| operation is performed according to the IEC/IEEE Standard for Binary
> -| Floating-Point Arithmetic.
>
> -*----------------------------------------------------------------------------*/
> -
> +/*
>
> +-------------------------------------------------------------------------------
> +Rounds the double-precision floating-point value `a' to an integer, and
> +returns the result as a double-precision floating-point value. The
> +operation is performed according to the IEC/IEEE Standard for Binary
> +Floating-Point Arithmetic.
>
> +-------------------------------------------------------------------------------
> +*/
> float64 float64_round_to_int( float64 a STATUS_PARAM )
> {
> flag aSign;
> @@ -3267,14 +3385,15 @@ float64 float64_trunc_to_int( float64 a
> STATUS_PARAM)
> return res;
> }
>
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of adding the absolute values of the double-precision
> -| floating-point values `a' and `b'. If `zSign' is 1, the sum is negated
> -| before being returned. `zSign' is ignored if the result is a NaN.
> -| The addition is performed according to the IEC/IEEE Standard for Binary
> -| Floating-Point Arithmetic.
>
> -*----------------------------------------------------------------------------*/
> -
> +/*
>
> +-------------------------------------------------------------------------------
> +Returns the result of adding the absolute values of the double-precision
> +floating-point values `a' and `b'. If `zSign' is 1, the sum is negated
> +before being returned. `zSign' is ignored if the result is a NaN.
> +The addition is performed according to the IEC/IEEE Standard for Binary
> +Floating-Point Arithmetic.
>
> +-------------------------------------------------------------------------------
> +*/
> static float64 addFloat64Sigs( float64 a, float64 b, flag zSign
> STATUS_PARAM )
> {
> int_fast16_t aExp, bExp, zExp;
> @@ -3346,14 +3465,15 @@ static float64 addFloat64Sigs( float64 a, float64
> b, flag zSign STATUS_PARAM )
>
> }
>
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of subtracting the absolute values of the double-
> -| precision floating-point values `a' and `b'. If `zSign' is 1, the
> -| difference is negated before being returned. `zSign' is ignored if the
> -| result is a NaN. The subtraction is performed according to the IEC/IEEE
> -| Standard for Binary Floating-Point Arithmetic.
>
> -*----------------------------------------------------------------------------*/
> -
> +/*
>
> +-------------------------------------------------------------------------------
> +Returns the result of subtracting the absolute values of the double-
> +precision floating-point values `a' and `b'. If `zSign' is 1, the
> +difference is negated before being returned. `zSign' is ignored if the
> +result is a NaN. The subtraction is performed according to the IEC/IEEE
> +Standard for Binary Floating-Point Arithmetic.
>
> +-------------------------------------------------------------------------------
> +*/
> static float64 subFloat64Sigs( float64 a, float64 b, flag zSign
> STATUS_PARAM )
> {
> int_fast16_t aExp, bExp, zExp;
> @@ -3421,12 +3541,13 @@ static float64 subFloat64Sigs( float64 a, float64
> b, flag zSign STATUS_PARAM )
>
> }
>
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of adding the double-precision floating-point values
> `a'
> -| and `b'. The operation is performed according to the IEC/IEEE Standard
> for
> -| Binary Floating-Point Arithmetic.
>
> -*----------------------------------------------------------------------------*/
> -
> +/*
>
> +-------------------------------------------------------------------------------
> +Returns the result of adding the double-precision floating-point values
> `a'
> +and `b'. The operation is performed according to the IEC/IEEE Standard
> for
> +Binary Floating-Point Arithmetic.
>
> +-------------------------------------------------------------------------------
> +*/
> float64 float64_add( float64 a, float64 b STATUS_PARAM )
> {
> flag aSign, bSign;
> @@ -3444,12 +3565,13 @@ float64 float64_add( float64 a, float64 b
> STATUS_PARAM )
>
> }
>
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of subtracting the double-precision floating-point
> values
> -| `a' and `b'. The operation is performed according to the IEC/IEEE
> Standard
> -| for Binary Floating-Point Arithmetic.
>
> -*----------------------------------------------------------------------------*/
> -
> +/*
>
> +-------------------------------------------------------------------------------
> +Returns the result of subtracting the double-precision floating-point
> values
> +`a' and `b'. The operation is performed according to the IEC/IEEE
> Standard
> +for Binary Floating-Point Arithmetic.
>
> +-------------------------------------------------------------------------------
> +*/
> float64 float64_sub( float64 a, float64 b STATUS_PARAM )
> {
> flag aSign, bSign;
> @@ -3467,12 +3589,13 @@ float64 float64_sub( float64 a, float64 b
> STATUS_PARAM )
>
> }
>
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of multiplying the double-precision floating-point
> values
> -| `a' and `b'. The operation is performed according to the IEC/IEEE
> Standard
> -| for Binary Floating-Point Arithmetic.
>
> -*----------------------------------------------------------------------------*/
> -
> +/*
>
> +-------------------------------------------------------------------------------
> +Returns the result of multiplying the double-precision floating-point
> values
> +`a' and `b'. The operation is performed according to the IEC/IEEE
> Standard
> +for Binary Floating-Point Arithmetic.
>
> +-------------------------------------------------------------------------------
> +*/
> float64 float64_mul( float64 a, float64 b STATUS_PARAM )
> {
> flag aSign, bSign, zSign;
> @@ -3528,12 +3651,13 @@ float64 float64_mul( float64 a, float64 b
> STATUS_PARAM )
>
> }
>
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of dividing the double-precision floating-point
> value `a'
> -| by the corresponding value `b'. The operation is performed according to
> -| the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
>
> -*----------------------------------------------------------------------------*/
> -
> +/*
>
> +-------------------------------------------------------------------------------
> +Returns the result of dividing the double-precision floating-point value
> `a'
> +by the corresponding value `b'. The operation is performed according to
> +the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
>
> +-------------------------------------------------------------------------------
> +*/
> float64 float64_div( float64 a, float64 b STATUS_PARAM )
> {
> flag aSign, bSign, zSign;
> @@ -3600,12 +3724,13 @@ float64 float64_div( float64 a, float64 b
> STATUS_PARAM )
>
> }
>
>
> -/*----------------------------------------------------------------------------
> -| Returns the remainder of the double-precision floating-point value `a'
> -| with respect to the corresponding value `b'. The operation is performed
> -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
>
> -*----------------------------------------------------------------------------*/
> -
> +/*
>
> +-------------------------------------------------------------------------------
> +Returns the remainder of the double-precision floating-point value `a'
> +with respect to the corresponding value `b'. The operation is performed
> +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
>
> +-------------------------------------------------------------------------------
> +*/
> float64 float64_rem( float64 a, float64 b STATUS_PARAM )
> {
> flag aSign, zSign;
> @@ -3686,16 +3811,18 @@ float64 float64_rem( float64 a, float64 b
> STATUS_PARAM )
>
> }
>
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of multiplying the double-precision floating-point
> values
> -| `a' and `b' then adding 'c', with no intermediate rounding step after
> the
> -| multiplication. The operation is performed according to the IEC/IEEE
> -| Standard for Binary Floating-Point Arithmetic 754-2008.
> -| The flags argument allows the caller to select negation of the
> -| addend, the intermediate product, or the final result. (The difference
> -| between this and having the caller do a separate negation is that
> negating
> -| externally will flip the sign bit on NaNs.)
> > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Returns the result of multiplying the double-precision floating-point > values > +`a' and `b' then adding 'c', with no intermediate rounding step after the > +multiplication. The operation is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic 754-2008. > +The flags argument allows the caller to select negation of the > +addend, the intermediate product, or the final result. (The difference > +between this and having the caller do a separate negation is that negating > +externally will flip the sign bit on NaNs.) > > +------------------------------------------------------------------------------- > +*/ > > float64 float64_muladd(float64 a, float64 b, float64 c, int flags > STATUS_PARAM) > { > @@ -3912,12 +4039,13 @@ float64 float64_muladd(float64 a, float64 b, > float64 c, int flags STATUS_PARAM) > } > } > > > -/*---------------------------------------------------------------------------- > -| Returns the square root of the double-precision floating-point value > `a'. > -| The operation is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Returns the square root of the double-precision floating-point value `a'. > +The operation is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. 
> > +------------------------------------------------------------------------------- > +*/ > float64 float64_sqrt( float64 a STATUS_PARAM ) > { > flag aSign; > @@ -3964,11 +4092,13 @@ float64 float64_sqrt( float64 a STATUS_PARAM ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns the binary log of the double-precision floating-point value `a'. > -| The operation is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Returns the binary log of the double-precision floating-point value `a'. > +The operation is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. > > +------------------------------------------------------------------------------- > +*/ > float64 float64_log2( float64 a STATUS_PARAM ) > { > flag aSign, zSign; > @@ -4011,12 +4141,14 @@ float64 float64_log2( float64 a STATUS_PARAM ) > return normalizeRoundAndPackFloat64( zSign, 0x408, zSig STATUS_VAR ); > } > > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the double-precision floating-point value `a' is equal to > the > -| corresponding value `b', and 0 otherwise. The invalid exception is > raised > -| if either operand is a NaN. Otherwise, the comparison is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Returns 1 if the double-precision floating-point value `a' is equal to the > +corresponding value `b', and 0 otherwise. The invalid exception is raised > +if either operand is a NaN. 
Otherwise, the comparison is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > > +------------------------------------------------------------------------------- > +*/ > > int float64_eq( float64 a, float64 b STATUS_PARAM ) > { > @@ -4036,12 +4168,14 @@ int float64_eq( float64 a, float64 b STATUS_PARAM ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the double-precision floating-point value `a' is less than > or > -| equal to the corresponding value `b', and 0 otherwise. The invalid > -| exception is raised if either operand is a NaN. The comparison is > performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Returns 1 if the double-precision floating-point value `a' is less than or > +equal to the corresponding value `b', and 0 otherwise. The invalid > +exception is raised if either operand is a NaN. The comparison is > performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > > +------------------------------------------------------------------------------- > +*/ > > int float64_le( float64 a, float64 b STATUS_PARAM ) > { > @@ -4065,12 +4199,14 @@ int float64_le( float64 a, float64 b STATUS_PARAM ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the double-precision floating-point value `a' is less than > -| the corresponding value `b', and 0 otherwise. The invalid exception is > -| raised if either operand is a NaN. The comparison is performed > according > -| to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Returns 1 if the double-precision floating-point value `a' is less than > +the corresponding value `b', and 0 otherwise. The invalid exception is > +raised if either operand is a NaN. The comparison is performed according > +to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > > +------------------------------------------------------------------------------- > +*/ > > int float64_lt( float64 a, float64 b STATUS_PARAM ) > { > @@ -4094,12 +4230,14 @@ int float64_lt( float64 a, float64 b STATUS_PARAM ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the double-precision floating-point values `a' and `b' > cannot > -| be compared, and 0 otherwise. The invalid exception is raised if either > -| operand is a NaN. The comparison is performed according to the IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Returns 1 if the double-precision floating-point values `a' and `b' cannot > +be compared, and 0 otherwise. The invalid exception is raised if either > +operand is a NaN. The comparison is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic. 
> > +------------------------------------------------------------------------------- > +*/ > > int float64_unordered( float64 a, float64 b STATUS_PARAM ) > { > @@ -4115,12 +4253,14 @@ int float64_unordered( float64 a, float64 b > STATUS_PARAM ) > return 0; > } > > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the double-precision floating-point value `a' is equal to > the > -| corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an > -| exception.The comparison is performed according to the IEC/IEEE Standard > -| for Binary Floating-Point Arithmetic. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Returns 1 if the double-precision floating-point value `a' is equal to the > +corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an > +exception.The comparison is performed according to the IEC/IEEE Standard > +for Binary Floating-Point Arithmetic. > > +------------------------------------------------------------------------------- > +*/ > > int float64_eq_quiet( float64 a, float64 b STATUS_PARAM ) > { > @@ -4142,12 +4282,14 @@ int float64_eq_quiet( float64 a, float64 b > STATUS_PARAM ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the double-precision floating-point value `a' is less than > or > -| equal to the corresponding value `b', and 0 otherwise. Quiet NaNs do > not > -| cause an exception. Otherwise, the comparison is performed according > to the > -| IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Returns 1 if the double-precision floating-point value `a' is less than or > +equal to the corresponding value `b', and 0 otherwise. Quiet NaNs do not > +cause an exception. Otherwise, the comparison is performed according to > the > +IEC/IEEE Standard for Binary Floating-Point Arithmetic. > > +------------------------------------------------------------------------------- > +*/ > > int float64_le_quiet( float64 a, float64 b STATUS_PARAM ) > { > @@ -4173,12 +4315,14 @@ int float64_le_quiet( float64 a, float64 b > STATUS_PARAM ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the double-precision floating-point value `a' is less than > -| the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause > an > -| exception. Otherwise, the comparison is performed according to the > IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Returns 1 if the double-precision floating-point value `a' is less than > +the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an > +exception. Otherwise, the comparison is performed according to the > IEC/IEEE > +Standard for Binary Floating-Point Arithmetic. 
> > +------------------------------------------------------------------------------- > +*/ > > int float64_lt_quiet( float64 a, float64 b STATUS_PARAM ) > { > @@ -4204,12 +4348,14 @@ int float64_lt_quiet( float64 a, float64 b > STATUS_PARAM ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the double-precision floating-point values `a' and `b' > cannot > -| be compared, and 0 otherwise. Quiet NaNs do not cause an exception. > The > -| comparison is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Returns 1 if the double-precision floating-point values `a' and `b' cannot > +be compared, and 0 otherwise. Quiet NaNs do not cause an exception. The > +comparison is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. > > +------------------------------------------------------------------------------- > +*/ > > int float64_unordered_quiet( float64 a, float64 b STATUS_PARAM ) > { > @@ -4227,16 +4373,17 @@ int float64_unordered_quiet( float64 a, float64 b > STATUS_PARAM ) > return 0; > } > > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the extended double-precision floating- > -| point value `a' to the 32-bit two's complement integer format. The > -| conversion is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic---which means in particular that the > conversion > -| is rounded according to the current rounding mode. If `a' is a NaN, the > -| largest positive integer is returned. Otherwise, if the conversion > -| overflows, the largest integer with the same sign as `a' is returned. 
> > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Returns the result of converting the extended double-precision floating- > +point value `a' to the 32-bit two's complement integer format. The > +conversion is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic---which means in particular that the conversion > +is rounded according to the current rounding mode. If `a' is a NaN, the > +largest positive integer is returned. Otherwise, if the conversion > +overflows, the largest integer with the same sign as `a' is returned. > > +------------------------------------------------------------------------------- > +*/ > int32 floatx80_to_int32( floatx80 a STATUS_PARAM ) > { > flag aSign; > @@ -4254,16 +4401,17 @@ int32 floatx80_to_int32( floatx80 a STATUS_PARAM ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the extended double-precision floating- > -| point value `a' to the 32-bit two's complement integer format. The > -| conversion is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic, except that the conversion is always rounded > -| toward zero. If `a' is a NaN, the largest positive integer is returned. > -| Otherwise, if the conversion overflows, the largest integer with the > same > -| sign as `a' is returned. > > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Returns the result of converting the extended double-precision floating- > +point value `a' to the 32-bit two's complement integer format. 
The > +conversion is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic, except that the conversion is always rounded > +toward zero. If `a' is a NaN, the largest positive integer is returned. > +Otherwise, if the conversion overflows, the largest integer with the same > +sign as `a' is returned. > > +------------------------------------------------------------------------------- > +*/ > int32 floatx80_to_int32_round_to_zero( floatx80 a STATUS_PARAM ) > { > flag aSign; > @@ -4299,16 +4447,17 @@ int32 floatx80_to_int32_round_to_zero( floatx80 a > STATUS_PARAM ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the extended double-precision floating- > -| point value `a' to the 64-bit two's complement integer format. The > -| conversion is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic---which means in particular that the > conversion > -| is rounded according to the current rounding mode. If `a' is a NaN, > -| the largest positive integer is returned. Otherwise, if the conversion > -| overflows, the largest integer with the same sign as `a' is returned. > > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Returns the result of converting the extended double-precision floating- > +point value `a' to the 64-bit two's complement integer format. The > +conversion is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic---which means in particular that the conversion > +is rounded according to the current rounding mode. If `a' is a NaN, > +the largest positive integer is returned. Otherwise, if the conversion > +overflows, the largest integer with the same sign as `a' is returned. 
> > +------------------------------------------------------------------------------- > +*/ > int64 floatx80_to_int64( floatx80 a STATUS_PARAM ) > { > flag aSign; > @@ -4339,16 +4488,17 @@ int64 floatx80_to_int64( floatx80 a STATUS_PARAM ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the extended double-precision floating- > -| point value `a' to the 64-bit two's complement integer format. The > -| conversion is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic, except that the conversion is always rounded > -| toward zero. If `a' is a NaN, the largest positive integer is returned. > -| Otherwise, if the conversion overflows, the largest integer with the > same > -| sign as `a' is returned. > > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Returns the result of converting the extended double-precision floating- > +point value `a' to the 64-bit two's complement integer format. The > +conversion is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic, except that the conversion is always rounded > +toward zero. If `a' is a NaN, the largest positive integer is returned. > +Otherwise, if the conversion overflows, the largest integer with the same > +sign as `a' is returned. > > +------------------------------------------------------------------------------- > +*/ > int64 floatx80_to_int64_round_to_zero( floatx80 a STATUS_PARAM ) > { > flag aSign; > @@ -4383,13 +4533,14 @@ int64 floatx80_to_int64_round_to_zero( floatx80 a > STATUS_PARAM ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the extended double-precision floating- > -| point value `a' to the single-precision floating-point format. 
The > -| conversion is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Returns the result of converting the extended double-precision floating- > +point value `a' to the single-precision floating-point format. The > +conversion is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. > > +------------------------------------------------------------------------------- > +*/ > float32 floatx80_to_float32( floatx80 a STATUS_PARAM ) > { > flag aSign; > @@ -4411,13 +4562,14 @@ float32 floatx80_to_float32( floatx80 a > STATUS_PARAM ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the extended double-precision floating- > -| point value `a' to the double-precision floating-point format. The > -| conversion is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Returns the result of converting the extended double-precision floating- > +point value `a' to the double-precision floating-point format. The > +conversion is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. 
> > +------------------------------------------------------------------------------- > +*/ > float64 floatx80_to_float64( floatx80 a STATUS_PARAM ) > { > flag aSign; > @@ -4439,13 +4591,14 @@ float64 floatx80_to_float64( floatx80 a > STATUS_PARAM ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the extended double-precision floating- > -| point value `a' to the quadruple-precision floating-point format. The > -| conversion is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Returns the result of converting the extended double-precision floating- > +point value `a' to the quadruple-precision floating-point format. The > +conversion is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. > > +------------------------------------------------------------------------------- > +*/ > float128 floatx80_to_float128( floatx80 a STATUS_PARAM ) > { > flag aSign; > @@ -4463,13 +4616,14 @@ float128 floatx80_to_float128( floatx80 a > STATUS_PARAM ) > > } > > > -/*---------------------------------------------------------------------------- > -| Rounds the extended double-precision floating-point value `a' to an > integer, > -| and returns the result as an extended quadruple-precision floating-point > -| value. The operation is performed according to the IEC/IEEE Standard > for > -| Binary Floating-Point Arithmetic. 
> > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Rounds the extended double-precision floating-point value `a' to an > integer, > +and returns the result as an extended quadruple-precision floating-point > +value. The operation is performed according to the IEC/IEEE Standard for > +Binary Floating-Point Arithmetic. > > +------------------------------------------------------------------------------- > +*/ > floatx80 floatx80_round_to_int( floatx80 a STATUS_PARAM ) > { > flag aSign; > @@ -4536,14 +4690,15 @@ floatx80 floatx80_round_to_int( floatx80 a > STATUS_PARAM ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns the result of adding the absolute values of the extended double- > -| precision floating-point values `a' and `b'. If `zSign' is 1, the sum > is > -| negated before being returned. `zSign' is ignored if the result is a > NaN. > -| The addition is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Returns the result of adding the absolute values of the extended double- > +precision floating-point values `a' and `b'. If `zSign' is 1, the sum is > +negated before being returned. `zSign' is ignored if the result is a NaN. > +The addition is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. 
> > +------------------------------------------------------------------------------- > +*/ > static floatx80 addFloatx80Sigs( floatx80 a, floatx80 b, flag zSign > STATUS_PARAM) > { > int32 aExp, bExp, zExp; > @@ -4602,14 +4757,15 @@ static floatx80 addFloatx80Sigs( floatx80 a, > floatx80 b, flag zSign STATUS_PARAM > > } > > > -/*---------------------------------------------------------------------------- > -| Returns the result of subtracting the absolute values of the extended > -| double-precision floating-point values `a' and `b'. If `zSign' is 1, > the > -| difference is negated before being returned. `zSign' is ignored if the > -| result is a NaN. The subtraction is performed according to the IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic. > > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Returns the result of subtracting the absolute values of the extended > +double-precision floating-point values `a' and `b'. If `zSign' is 1, the > +difference is negated before being returned. `zSign' is ignored if the > +result is a NaN. The subtraction is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic. > > +------------------------------------------------------------------------------- > +*/ > static floatx80 subFloatx80Sigs( floatx80 a, floatx80 b, flag zSign > STATUS_PARAM ) > { > int32 aExp, bExp, zExp; > @@ -4670,12 +4826,13 @@ static floatx80 subFloatx80Sigs( floatx80 a, > floatx80 b, flag zSign STATUS_PARAM > > } > > > -/*---------------------------------------------------------------------------- > -| Returns the result of adding the extended double-precision > floating-point > -| values `a' and `b'. The operation is performed according to the > IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic. 
> > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Returns the result of adding the extended double-precision floating-point > +values `a' and `b'. The operation is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic. > > +------------------------------------------------------------------------------- > +*/ > floatx80 floatx80_add( floatx80 a, floatx80 b STATUS_PARAM ) > { > flag aSign, bSign; > @@ -4691,12 +4848,13 @@ floatx80 floatx80_add( floatx80 a, floatx80 b > STATUS_PARAM ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns the result of subtracting the extended double-precision > floating- > -| point values `a' and `b'. The operation is performed according to the > -| IEC/IEEE Standard for Binary Floating-Point Arithmetic. > > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Returns the result of subtracting the extended double-precision floating- > +point values `a' and `b'. The operation is performed according to the > +IEC/IEEE Standard for Binary Floating-Point Arithmetic. > > +------------------------------------------------------------------------------- > +*/ > floatx80 floatx80_sub( floatx80 a, floatx80 b STATUS_PARAM ) > { > flag aSign, bSign; > @@ -4712,12 +4870,13 @@ floatx80 floatx80_sub( floatx80 a, floatx80 b > STATUS_PARAM ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns the result of multiplying the extended double-precision > floating- > -| point values `a' and `b'. The operation is performed according to the > -| IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Returns the result of multiplying the extended double-precision floating- > +point values `a' and `b'. The operation is performed according to the > +IEC/IEEE Standard for Binary Floating-Point Arithmetic. > > +------------------------------------------------------------------------------- > +*/ > floatx80 floatx80_mul( floatx80 a, floatx80 b STATUS_PARAM ) > { > flag aSign, bSign, zSign; > @@ -4771,12 +4930,13 @@ floatx80 floatx80_mul( floatx80 a, floatx80 b > STATUS_PARAM ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns the result of dividing the extended double-precision > floating-point > -| value `a' by the corresponding value `b'. The operation is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Returns the result of dividing the extended double-precision > floating-point > +value `a' by the corresponding value `b'. The operation is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > > +------------------------------------------------------------------------------- > +*/ > floatx80 floatx80_div( floatx80 a, floatx80 b STATUS_PARAM ) > { > flag aSign, bSign, zSign; > @@ -4851,12 +5011,13 @@ floatx80 floatx80_div( floatx80 a, floatx80 b > STATUS_PARAM ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns the remainder of the extended double-precision floating-point > value > -| `a' with respect to the corresponding value `b'. 
The operation is > performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Returns the remainder of the extended double-precision floating-point > value > +`a' with respect to the corresponding value `b'. The operation is > performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > > +------------------------------------------------------------------------------- > +*/ > floatx80 floatx80_rem( floatx80 a, floatx80 b STATUS_PARAM ) > { > flag aSign, zSign; > @@ -4947,12 +5108,13 @@ floatx80 floatx80_rem( floatx80 a, floatx80 b > STATUS_PARAM ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns the square root of the extended double-precision floating-point > -| value `a'. The operation is performed according to the IEC/IEEE > Standard > -| for Binary Floating-Point Arithmetic. > > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Returns the square root of the extended double-precision floating-point > +value `a'. The operation is performed according to the IEC/IEEE Standard > +for Binary Floating-Point Arithmetic. > > +------------------------------------------------------------------------------- > +*/ > floatx80 floatx80_sqrt( floatx80 a STATUS_PARAM ) > { > flag aSign; > @@ -5017,12 +5179,14 @@ floatx80 floatx80_sqrt( floatx80 a STATUS_PARAM ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the extended double-precision floating-point value `a' is > equal > -| to the corresponding value `b', and 0 otherwise. The invalid exception > is > -| raised if either operand is a NaN. 
Otherwise, the comparison is > performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Returns 1 if the extended double-precision floating-point value `a' is > equal > +to the corresponding value `b', and 0 otherwise. The invalid exception is > +raised if either operand is a NaN. Otherwise, the comparison is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > > +------------------------------------------------------------------------------- > +*/ > > int floatx80_eq( floatx80 a, floatx80 b STATUS_PARAM ) > { > @@ -5044,13 +5208,15 @@ int floatx80_eq( floatx80 a, floatx80 b > STATUS_PARAM ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the extended double-precision floating-point value `a' is > -| less than or equal to the corresponding value `b', and 0 otherwise. The > -| invalid exception is raised if either operand is a NaN. The comparison > is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Returns 1 if the extended double-precision floating-point value `a' is > +less than or equal to the corresponding value `b', and 0 otherwise. The > +invalid exception is raised if either operand is a NaN. The comparison is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic. 
> > +------------------------------------------------------------------------------- > +*/ > > int floatx80_le( floatx80 a, floatx80 b STATUS_PARAM ) > { > @@ -5078,12 +5244,14 @@ int floatx80_le( floatx80 a, floatx80 b > STATUS_PARAM ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the extended double-precision floating-point value `a' is > -| less than the corresponding value `b', and 0 otherwise. The invalid > -| exception is raised if either operand is a NaN. The comparison is > performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Returns 1 if the extended double-precision floating-point value `a' is > +less than the corresponding value `b', and 0 otherwise. The invalid > +exception is raised if either operand is a NaN. The comparison is > performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > > +------------------------------------------------------------------------------- > +*/ > > int floatx80_lt( floatx80 a, floatx80 b STATUS_PARAM ) > { > @@ -5111,12 +5279,14 @@ int floatx80_lt( floatx80 a, floatx80 b > STATUS_PARAM ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the extended double-precision floating-point values `a' > and `b' > -| cannot be compared, and 0 otherwise. The invalid exception is raised if > -| either operand is a NaN. The comparison is performed according to the > -| IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Returns 1 if the extended double-precision floating-point values `a' and > `b' > +cannot be compared, and 0 otherwise. The invalid exception is raised if > +either operand is a NaN. The comparison is performed according to the > +IEC/IEEE Standard for Binary Floating-Point Arithmetic. > > +------------------------------------------------------------------------------- > +*/ > int floatx80_unordered( floatx80 a, floatx80 b STATUS_PARAM ) > { > if ( ( ( extractFloatx80Exp( a ) == 0x7FFF ) > @@ -5130,12 +5300,14 @@ int floatx80_unordered( floatx80 a, floatx80 b > STATUS_PARAM ) > return 0; > } > > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the extended double-precision floating-point value `a' is > -| equal to the corresponding value `b', and 0 otherwise. Quiet NaNs do > not > -| cause an exception. The comparison is performed according to the > IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Returns 1 if the extended double-precision floating-point value `a' is > +equal to the corresponding value `b', and 0 otherwise. Quiet NaNs do not > +cause an exception. The comparison is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic. 
> > +------------------------------------------------------------------------------- > +*/ > > int floatx80_eq_quiet( floatx80 a, floatx80 b STATUS_PARAM ) > { > @@ -5160,12 +5332,14 @@ int floatx80_eq_quiet( floatx80 a, floatx80 b > STATUS_PARAM ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the extended double-precision floating-point value `a' is > less > -| than or equal to the corresponding value `b', and 0 otherwise. Quiet > NaNs > -| do not cause an exception. Otherwise, the comparison is performed > according > -| to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Returns 1 if the extended double-precision floating-point value `a' is > less > +than or equal to the corresponding value `b', and 0 otherwise. Quiet NaNs > +do not cause an exception. Otherwise, the comparison is performed > according > +to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > > +------------------------------------------------------------------------------- > +*/ > > int floatx80_le_quiet( floatx80 a, floatx80 b STATUS_PARAM ) > { > @@ -5196,12 +5370,14 @@ int floatx80_le_quiet( floatx80 a, floatx80 b > STATUS_PARAM ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the extended double-precision floating-point value `a' is > less > -| than the corresponding value `b', and 0 otherwise. Quiet NaNs do not > cause > -| an exception. Otherwise, the comparison is performed according to the > -| IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Returns 1 if the extended double-precision floating-point value `a' is > less > +than the corresponding value `b', and 0 otherwise. Quiet NaNs do not > cause > +an exception. Otherwise, the comparison is performed according to the > +IEC/IEEE Standard for Binary Floating-Point Arithmetic. > > +------------------------------------------------------------------------------- > +*/ > > int floatx80_lt_quiet( floatx80 a, floatx80 b STATUS_PARAM ) > { > @@ -5232,12 +5408,14 @@ int floatx80_lt_quiet( floatx80 a, floatx80 b > STATUS_PARAM ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the extended double-precision floating-point values `a' > and `b' > -| cannot be compared, and 0 otherwise. Quiet NaNs do not cause an > exception. > -| The comparison is performed according to the IEC/IEEE Standard for > Binary > -| Floating-Point Arithmetic. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Returns 1 if the extended double-precision floating-point values `a' and > `b' > +cannot be compared, and 0 otherwise. Quiet NaNs do not cause an > exception. > +The comparison is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. 
> > +------------------------------------------------------------------------------- > +*/ > int floatx80_unordered_quiet( floatx80 a, floatx80 b STATUS_PARAM ) > { > if ( ( ( extractFloatx80Exp( a ) == 0x7FFF ) > @@ -5254,16 +5432,17 @@ int floatx80_unordered_quiet( floatx80 a, floatx80 > b STATUS_PARAM ) > return 0; > } > > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the quadruple-precision floating-point > -| value `a' to the 32-bit two's complement integer format. The conversion > -| is performed according to the IEC/IEEE Standard for Binary > Floating-Point > -| Arithmetic---which means in particular that the conversion is rounded > -| according to the current rounding mode. If `a' is a NaN, the largest > -| positive integer is returned. Otherwise, if the conversion overflows, > the > -| largest integer with the same sign as `a' is returned. > > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Returns the result of converting the quadruple-precision floating-point > +value `a' to the 32-bit two's complement integer format. The conversion > +is performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic---which means in particular that the conversion is rounded > +according to the current rounding mode. If `a' is a NaN, the largest > +positive integer is returned. Otherwise, if the conversion overflows, the > +largest integer with the same sign as `a' is returned. 
> > +------------------------------------------------------------------------------- > +*/ > int32 float128_to_int32( float128 a STATUS_PARAM ) > { > flag aSign; > @@ -5283,16 +5462,17 @@ int32 float128_to_int32( float128 a STATUS_PARAM ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the quadruple-precision floating-point > -| value `a' to the 32-bit two's complement integer format. The conversion > -| is performed according to the IEC/IEEE Standard for Binary > Floating-Point > -| Arithmetic, except that the conversion is always rounded toward zero. > If > -| `a' is a NaN, the largest positive integer is returned. Otherwise, if > the > -| conversion overflows, the largest integer with the same sign as `a' is > -| returned. > > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Returns the result of converting the quadruple-precision floating-point > +value `a' to the 32-bit two's complement integer format. The conversion > +is performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic, except that the conversion is always rounded toward zero. If > +`a' is a NaN, the largest positive integer is returned. Otherwise, if the > +conversion overflows, the largest integer with the same sign as `a' is > +returned. > > +------------------------------------------------------------------------------- > +*/ > int32 float128_to_int32_round_to_zero( float128 a STATUS_PARAM ) > { > flag aSign; > @@ -5331,16 +5511,17 @@ int32 float128_to_int32_round_to_zero( float128 a > STATUS_PARAM ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the quadruple-precision floating-point > -| value `a' to the 64-bit two's complement integer format. 
The conversion > -| is performed according to the IEC/IEEE Standard for Binary > Floating-Point > -| Arithmetic---which means in particular that the conversion is rounded > -| according to the current rounding mode. If `a' is a NaN, the largest > -| positive integer is returned. Otherwise, if the conversion overflows, > the > -| largest integer with the same sign as `a' is returned. > > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Returns the result of converting the quadruple-precision floating-point > +value `a' to the 64-bit two's complement integer format. The conversion > +is performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic---which means in particular that the conversion is rounded > +according to the current rounding mode. If `a' is a NaN, the largest > +positive integer is returned. Otherwise, if the conversion overflows, the > +largest integer with the same sign as `a' is returned. > > +------------------------------------------------------------------------------- > +*/ > int64 float128_to_int64( float128 a STATUS_PARAM ) > { > flag aSign; > @@ -5374,16 +5555,17 @@ int64 float128_to_int64( float128 a STATUS_PARAM ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the quadruple-precision floating-point > -| value `a' to the 64-bit two's complement integer format. The conversion > -| is performed according to the IEC/IEEE Standard for Binary > Floating-Point > -| Arithmetic, except that the conversion is always rounded toward zero. > -| If `a' is a NaN, the largest positive integer is returned. Otherwise, > if > -| the conversion overflows, the largest integer with the same sign as `a' > is > -| returned. 
> > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Returns the result of converting the quadruple-precision floating-point > +value `a' to the 64-bit two's complement integer format. The conversion > +is performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic, except that the conversion is always rounded toward zero. > +If `a' is a NaN, the largest positive integer is returned. Otherwise, if > +the conversion overflows, the largest integer with the same sign as `a' is > +returned. > > +------------------------------------------------------------------------------- > +*/ > int64 float128_to_int64_round_to_zero( float128 a STATUS_PARAM ) > { > flag aSign; > @@ -5435,13 +5617,14 @@ int64 float128_to_int64_round_to_zero( float128 a > STATUS_PARAM ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the quadruple-precision floating-point > -| value `a' to the single-precision floating-point format. The conversion > -| is performed according to the IEC/IEEE Standard for Binary > Floating-Point > -| Arithmetic. > > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Returns the result of converting the quadruple-precision floating-point > +value `a' to the single-precision floating-point format. The conversion > +is performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic. 
> > +------------------------------------------------------------------------------- > +*/ > float32 float128_to_float32( float128 a STATUS_PARAM ) > { > flag aSign; > @@ -5470,13 +5653,14 @@ float32 float128_to_float32( float128 a > STATUS_PARAM ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the quadruple-precision floating-point > -| value `a' to the double-precision floating-point format. The conversion > -| is performed according to the IEC/IEEE Standard for Binary > Floating-Point > -| Arithmetic. > > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Returns the result of converting the quadruple-precision floating-point > +value `a' to the double-precision floating-point format. The conversion > +is performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic. > > +------------------------------------------------------------------------------- > +*/ > float64 float128_to_float64( float128 a STATUS_PARAM ) > { > flag aSign; > @@ -5503,13 +5687,14 @@ float64 float128_to_float64( float128 a > STATUS_PARAM ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the quadruple-precision floating-point > -| value `a' to the extended double-precision floating-point format. The > -| conversion is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Returns the result of converting the quadruple-precision floating-point > +value `a' to the extended double-precision floating-point format. 
The > +conversion is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. > > +------------------------------------------------------------------------------- > +*/ > floatx80 float128_to_floatx80( float128 a STATUS_PARAM ) > { > flag aSign; > @@ -5538,13 +5723,14 @@ floatx80 float128_to_floatx80( float128 a > STATUS_PARAM ) > > } > > > -/*---------------------------------------------------------------------------- > -| Rounds the quadruple-precision floating-point value `a' to an integer, > and > -| returns the result as a quadruple-precision floating-point value. The > -| operation is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Rounds the quadruple-precision floating-point value `a' to an integer, and > +returns the result as a quadruple-precision floating-point value. The > +operation is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. > > +------------------------------------------------------------------------------- > +*/ > float128 float128_round_to_int( float128 a STATUS_PARAM ) > { > flag aSign; > @@ -5641,14 +5827,15 @@ float128 float128_round_to_int( float128 a > STATUS_PARAM ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns the result of adding the absolute values of the > quadruple-precision > -| floating-point values `a' and `b'. If `zSign' is 1, the sum is negated > -| before being returned. `zSign' is ignored if the result is a NaN. > -| The addition is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. 
> > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Returns the result of adding the absolute values of the > quadruple-precision > +floating-point values `a' and `b'. If `zSign' is 1, the sum is negated > +before being returned. `zSign' is ignored if the result is a NaN. > +The addition is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. > > +------------------------------------------------------------------------------- > +*/ > static float128 addFloat128Sigs( float128 a, float128 b, flag zSign > STATUS_PARAM) > { > int32 aExp, bExp, zExp; > @@ -5727,14 +5914,15 @@ static float128 addFloat128Sigs( float128 a, > float128 b, flag zSign STATUS_PARAM > > } > > > -/*---------------------------------------------------------------------------- > -| Returns the result of subtracting the absolute values of the quadruple- > -| precision floating-point values `a' and `b'. If `zSign' is 1, the > -| difference is negated before being returned. `zSign' is ignored if the > -| result is a NaN. The subtraction is performed according to the IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic. > > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Returns the result of subtracting the absolute values of the quadruple- > +precision floating-point values `a' and `b'. If `zSign' is 1, the > +difference is negated before being returned. `zSign' is ignored if the > +result is a NaN. The subtraction is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic. 
> > +------------------------------------------------------------------------------- > +*/ > static float128 subFloat128Sigs( float128 a, float128 b, flag zSign > STATUS_PARAM) > { > int32 aExp, bExp, zExp; > @@ -5811,12 +5999,13 @@ static float128 subFloat128Sigs( float128 a, > float128 b, flag zSign STATUS_PARAM > > } > > > -/*---------------------------------------------------------------------------- > -| Returns the result of adding the quadruple-precision floating-point > values > -| `a' and `b'. The operation is performed according to the IEC/IEEE > Standard > -| for Binary Floating-Point Arithmetic. > > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Returns the result of adding the quadruple-precision floating-point values > +`a' and `b'. The operation is performed according to the IEC/IEEE > Standard > +for Binary Floating-Point Arithmetic. > > +------------------------------------------------------------------------------- > +*/ > float128 float128_add( float128 a, float128 b STATUS_PARAM ) > { > flag aSign, bSign; > @@ -5832,12 +6021,13 @@ float128 float128_add( float128 a, float128 b > STATUS_PARAM ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns the result of subtracting the quadruple-precision floating-point > -| values `a' and `b'. The operation is performed according to the > IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic. > > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Returns the result of subtracting the quadruple-precision floating-point > +values `a' and `b'. The operation is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic. 
> > +------------------------------------------------------------------------------- > +*/ > float128 float128_sub( float128 a, float128 b STATUS_PARAM ) > { > flag aSign, bSign; > @@ -5853,12 +6043,13 @@ float128 float128_sub( float128 a, float128 b > STATUS_PARAM ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns the result of multiplying the quadruple-precision floating-point > -| values `a' and `b'. The operation is performed according to the > IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic. > > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Returns the result of multiplying the quadruple-precision floating-point > +values `a' and `b'. The operation is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic. > > +------------------------------------------------------------------------------- > +*/ > float128 float128_mul( float128 a, float128 b STATUS_PARAM ) > { > flag aSign, bSign, zSign; > @@ -5917,12 +6108,13 @@ float128 float128_mul( float128 a, float128 b > STATUS_PARAM ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns the result of dividing the quadruple-precision floating-point > value > -| `a' by the corresponding value `b'. The operation is performed > according to > -| the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Returns the result of dividing the quadruple-precision floating-point > value > +`a' by the corresponding value `b'. The operation is performed according > to > +the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> > +------------------------------------------------------------------------------- > +*/ > float128 float128_div( float128 a, float128 b STATUS_PARAM ) > { > flag aSign, bSign, zSign; > @@ -6001,12 +6193,13 @@ float128 float128_div( float128 a, float128 b > STATUS_PARAM ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns the remainder of the quadruple-precision floating-point value > `a' > -| with respect to the corresponding value `b'. The operation is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Returns the remainder of the quadruple-precision floating-point value `a' > +with respect to the corresponding value `b'. The operation is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > > +------------------------------------------------------------------------------- > +*/ > float128 float128_rem( float128 a, float128 b STATUS_PARAM ) > { > flag aSign, zSign; > @@ -6110,12 +6303,13 @@ float128 float128_rem( float128 a, float128 b > STATUS_PARAM ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns the square root of the quadruple-precision floating-point value > `a'. > -| The operation is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > > -*----------------------------------------------------------------------------*/ > - > +/* > > +------------------------------------------------------------------------------- > +Returns the square root of the quadruple-precision floating-point value > `a'. > +The operation is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. 
> > +------------------------------------------------------------------------------- > +*/ > float128 float128_sqrt( float128 a STATUS_PARAM ) > { > flag aSign; > @@ -6179,12 +6373,14 @@ float128 float128_sqrt( float128 a STATUS_PARAM ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the quadruple-precision floating-point value `a' is equal > to > -| the corresponding value `b', and 0 otherwise. The invalid exception is > -| raised if either operand is a NaN. Otherwise, the comparison is > performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Returns 1 if the quadruple-precision floating-point value `a' is equal to > +the corresponding value `b', and 0 otherwise. The invalid exception is > +raised if either operand is a NaN. Otherwise, the comparison is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > > +------------------------------------------------------------------------------- > +*/ > > int float128_eq( float128 a, float128 b STATUS_PARAM ) > { > @@ -6206,12 +6402,14 @@ int float128_eq( float128 a, float128 b > STATUS_PARAM ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the quadruple-precision floating-point value `a' is less > than > -| or equal to the corresponding value `b', and 0 otherwise. The invalid > -| exception is raised if either operand is a NaN. The comparison is > performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Returns 1 if the quadruple-precision floating-point value `a' is less than > +or equal to the corresponding value `b', and 0 otherwise. The invalid > +exception is raised if either operand is a NaN. The comparison is > performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > > +------------------------------------------------------------------------------- > +*/ > > int float128_le( float128 a, float128 b STATUS_PARAM ) > { > @@ -6239,12 +6437,14 @@ int float128_le( float128 a, float128 b > STATUS_PARAM ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the quadruple-precision floating-point value `a' is less > than > -| the corresponding value `b', and 0 otherwise. The invalid exception is > -| raised if either operand is a NaN. The comparison is performed > according > -| to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Returns 1 if the quadruple-precision floating-point value `a' is less than > +the corresponding value `b', and 0 otherwise. The invalid exception is > +raised if either operand is a NaN. The comparison is performed according > +to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> > +------------------------------------------------------------------------------- > +*/ > > int float128_lt( float128 a, float128 b STATUS_PARAM ) > { > @@ -6272,12 +6472,14 @@ int float128_lt( float128 a, float128 b > STATUS_PARAM ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the quadruple-precision floating-point values `a' and `b' > cannot > -| be compared, and 0 otherwise. The invalid exception is raised if either > -| operand is a NaN. The comparison is performed according to the IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Returns 1 if the quadruple-precision floating-point values `a' and `b' > cannot > +be compared, and 0 otherwise. The invalid exception is raised if either > +operand is a NaN. The comparison is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic. > > +------------------------------------------------------------------------------- > +*/ > > int float128_unordered( float128 a, float128 b STATUS_PARAM ) > { > @@ -6292,12 +6494,14 @@ int float128_unordered( float128 a, float128 b > STATUS_PARAM ) > return 0; > } > > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the quadruple-precision floating-point value `a' is equal > to > -| the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause > an > -| exception. The comparison is performed according to the IEC/IEEE > Standard > -| for Binary Floating-Point Arithmetic. 
> > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Returns 1 if the quadruple-precision floating-point value `a' is equal to > +the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an > +exception. The comparison is performed according to the IEC/IEEE Standard > +for Binary Floating-Point Arithmetic. > > +------------------------------------------------------------------------------- > +*/ > > int float128_eq_quiet( float128 a, float128 b STATUS_PARAM ) > { > @@ -6322,12 +6526,14 @@ int float128_eq_quiet( float128 a, float128 b > STATUS_PARAM ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the quadruple-precision floating-point value `a' is less > than > -| or equal to the corresponding value `b', and 0 otherwise. Quiet NaNs > do not > -| cause an exception. Otherwise, the comparison is performed according > to the > -| IEC/IEEE Standard for Binary Floating-Point Arithmetic. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Returns 1 if the quadruple-precision floating-point value `a' is less than > +or equal to the corresponding value `b', and 0 otherwise. Quiet NaNs do > not > +cause an exception. Otherwise, the comparison is performed according to > the > +IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> > +------------------------------------------------------------------------------- > +*/ > > int float128_le_quiet( float128 a, float128 b STATUS_PARAM ) > { > @@ -6358,12 +6564,14 @@ int float128_le_quiet( float128 a, float128 b > STATUS_PARAM ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the quadruple-precision floating-point value `a' is less > than > -| the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause > an > -| exception. Otherwise, the comparison is performed according to the > IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Returns 1 if the quadruple-precision floating-point value `a' is less than > +the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an > +exception. Otherwise, the comparison is performed according to the > IEC/IEEE > +Standard for Binary Floating-Point Arithmetic. > > +------------------------------------------------------------------------------- > +*/ > > int float128_lt_quiet( float128 a, float128 b STATUS_PARAM ) > { > @@ -6394,12 +6602,14 @@ int float128_lt_quiet( float128 a, float128 b > STATUS_PARAM ) > > } > > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the quadruple-precision floating-point values `a' and `b' > cannot > -| be compared, and 0 otherwise. Quiet NaNs do not cause an exception. > The > -| comparison is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. 
> > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Returns 1 if the quadruple-precision floating-point values `a' and `b' > cannot > +be compared, and 0 otherwise. Quiet NaNs do not cause an exception. The > +comparison is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. > > +------------------------------------------------------------------------------- > +*/ > > int float128_unordered_quiet( float128 a, float128 b STATUS_PARAM ) > { > diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h > index f3927e2..b646621 100644 > --- a/include/fpu/softfloat.h > +++ b/include/fpu/softfloat.h > @@ -4,10 +4,11 @@ > * Derived from SoftFloat. > */ > > > -/*============================================================================ > +/* > > +============================================================================ > > -This C header file is part of the SoftFloat IEC/IEEE Floating-point > Arithmetic > -Package, Release 2b. > +This C header file is part of the SoftFloat IEC/IEEE Floating-point > +Arithmetic Package, Release 2a. > > Written by John R. Hauser. This work was made possible in part by the > International Computer Science Institute, located at Suite 600, 1947 > Center > @@ -16,24 +17,22 @@ National Science Foundation under grant MIP-9311980. > The original version > of this code was written as part of a project to build a fixed-point > vector > processor in collaboration with the University of California at Berkeley, > overseen by Profs. Nelson Morgan and John Wawrzynek. More information > -is available through the Web page `http://www.cs.berkeley.edu/~jhauser/ > +is available through the Web page `http://HTTP.CS.Berkeley.EDU/~jhauser/ > arithmetic/SoftFloat.html'. > > -THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. 
Although reasonable effort > has > -been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT TIMES > -RESULT IN INCORRECT BEHAVIOR. USE OF THIS SOFTWARE IS RESTRICTED TO > PERSONS > -AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ALL > LOSSES, > -COSTS, OR OTHER PROBLEMS THEY INCUR DUE TO THE SOFTWARE, AND WHO > FURTHERMORE > -EFFECTIVELY INDEMNIFY JOHN HAUSER AND THE INTERNATIONAL COMPUTER SCIENCE > -INSTITUTE (possibly via similar legal warning) AGAINST ALL LOSSES, COSTS, > OR > -OTHER PROBLEMS INCURRED BY THEIR CUSTOMERS AND CLIENTS DUE TO THE > SOFTWARE. > +THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort > +has been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT > +TIMES RESULT IN INCORRECT BEHAVIOR. USE OF THIS SOFTWARE IS RESTRICTED TO > +PERSONS AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR > ANY > +AND ALL LOSSES, COSTS, OR OTHER PROBLEMS ARISING FROM ITS USE. > > Derivative works are acceptable, even for commercial purposes, so long as > -(1) the source code for the derivative work includes prominent notice that > -the work is derivative, and (2) the source code includes prominent notice > with > -these four paragraphs for those parts of this code that are retained. > +(1) they include prominent notice that the work is derivative, and (2) > they > +include prominent notice akin to these four paragraphs for those parts of > +this code that are retained. > > > -=============================================================================*/ > > +=============================================================================== > +*/ > > #ifndef SOFTFLOAT_H > #define SOFTFLOAT_H > @@ -46,14 +45,16 @@ these four paragraphs for those parts of this code > that are retained. 
> #include "config-host.h" > #include "qemu/osdep.h" > > > -/*---------------------------------------------------------------------------- > -| Each of the following `typedef's defines the most convenient type that > holds > -| integers of at least as many bits as specified. For example, `uint8' > should > -| be the most convenient type that can hold unsigned integers of as many > as > -| 8 bits. The `flag' type must be able to hold either a 0 or 1. For most > -| implementations of C, `flag', `uint8', and `int8' should all be > `typedef'ed > -| to the same as `int'. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Each of the following `typedef's defines the most convenient type that > holds > +integers of at least as many bits as specified. For example, `uint8' > should > +be the most convenient type that can hold unsigned integers of as many as > +8 bits. The `flag' type must be able to hold either a 0 or 1. For most > +implementations of C, `flag', `uint8', and `int8' should all be > `typedef'ed > +to the same as `int'. 
> > +------------------------------------------------------------------------------- > +*/ > typedef uint8_t flag; > typedef uint8_t uint8; > typedef int8_t int8; > @@ -69,9 +70,11 @@ typedef int64_t int64; > #define STATUS(field) status->field > #define STATUS_VAR , status > > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE floating-point ordering relations > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Software IEC/IEEE floating-point ordering relations > > +------------------------------------------------------------------------------- > +*/ > enum { > float_relation_less = -1, > float_relation_equal = 0, > @@ -79,9 +82,11 @@ enum { > float_relation_unordered = 2 > }; > > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE floating-point types. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Software IEC/IEEE floating-point types. > > +------------------------------------------------------------------------------- > +*/ > /* Use structures for soft-float types. This prevents accidentally mixing > them with native int/float types. A sufficiently clever compiler and > sane ABI should be able to see though these structs. However > @@ -137,17 +142,21 @@ typedef struct { > #define make_float128(high_, low_) ((float128) { .high = high_, .low = > low_ }) > #define make_float128_init(high_, low_) { .high = high_, .low = low_ } > > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE floating-point underflow tininess-detection mode. 
> > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Software IEC/IEEE floating-point underflow tininess-detection mode. > > +------------------------------------------------------------------------------- > +*/ > enum { > float_tininess_after_rounding = 0, > float_tininess_before_rounding = 1 > }; > > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE floating-point rounding mode. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Software IEC/IEEE floating-point rounding mode. > > +------------------------------------------------------------------------------- > +*/ > enum { > float_round_nearest_even = 0, > float_round_down = 1, > @@ -155,9 +164,11 @@ enum { > float_round_to_zero = 3 > }; > > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE floating-point exception flags. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Software IEC/IEEE floating-point exception flags. 
> > +------------------------------------------------------------------------------- > +*/ > enum { > float_flag_invalid = 1, > float_flag_divbyzero = 4, > @@ -167,7 +178,6 @@ enum { > float_flag_input_denormal = 64, > float_flag_output_denormal = 128 > }; > - > typedef struct float_status { > signed char float_detect_tininess; > signed char float_rounding_mode; > @@ -204,27 +214,33 @@ INLINE int get_float_exception_flags(float_status > *status) > } > void set_floatx80_rounding_precision(int val STATUS_PARAM); > > > -/*---------------------------------------------------------------------------- > -| Routine to raise any or all of the software IEC/IEEE floating-point > -| exception flags. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Routine to raise any or all of the software IEC/IEEE floating-point > +exception flags. > > +------------------------------------------------------------------------------- > +*/ > void float_raise( int8 flags STATUS_PARAM); > > > -/*---------------------------------------------------------------------------- > -| Options to indicate which negations to perform in float*_muladd() > -| Using these differs from negating an input or output before calling > -| the muladd function in that this means that a NaN doesn't have its > -| sign bit inverted before it is propagated. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Options to indicate which negations to perform in float*_muladd() > +Using these differs from negating an input or output before calling > +the muladd function in that this means that a NaN doesn't have its > +sign bit inverted before it is propagated. 
> > +------------------------------------------------------------------------------- > +*/ > enum { > float_muladd_negate_c = 1, > float_muladd_negate_product = 2, > float_muladd_negate_result = 4, > }; > > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE integer-to-floating-point conversion routines. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Software IEC/IEEE integer-to-floating-point conversion routines. > > +------------------------------------------------------------------------------- > +*/ > float32 int32_to_float32( int32 STATUS_PARAM ); > float64 int32_to_float64( int32 STATUS_PARAM ); > float32 uint32_to_float32( uint32 STATUS_PARAM ); > @@ -239,15 +255,19 @@ floatx80 int64_to_floatx80( int64 STATUS_PARAM ); > float128 int64_to_float128( int64 STATUS_PARAM ); > float128 uint64_to_float128( uint64 STATUS_PARAM ); > > > -/*---------------------------------------------------------------------------- > -| Software half-precision conversion routines. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Software half-precision conversion routines. > > +*---------------------------------------------------------------------------- > +*/ > float16 float32_to_float16( float32, flag STATUS_PARAM ); > float32 float16_to_float32( float16, flag STATUS_PARAM ); > > > -/*---------------------------------------------------------------------------- > -| Software half-precision operations. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Software half-precision operations. 
> > +------------------------------------------------------------------------------- > +*/ > int float16_is_quiet_nan( float16 ); > int float16_is_signaling_nan( float16 ); > float16 float16_maybe_silence_nan( float16 ); > @@ -257,14 +277,18 @@ INLINE int float16_is_any_nan(float16 a) > return ((float16_val(a) & ~0x8000) > 0x7c00); > } > > > -/*---------------------------------------------------------------------------- > -| The pattern for a default generated half-precision NaN. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +The pattern for a default generated half-precision NaN. > > +------------------------------------------------------------------------------- > +*/ > extern const float16 float16_default_nan; > > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE single-precision conversion routines. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Software IEC/IEEE single-precision conversion routines. > > +------------------------------------------------------------------------------- > +*/ > int_fast16_t float32_to_int16_round_to_zero(float32 STATUS_PARAM); > uint_fast16_t float32_to_uint16_round_to_zero(float32 STATUS_PARAM); > int32 float32_to_int32( float32 STATUS_PARAM ); > @@ -277,9 +301,11 @@ float64 float32_to_float64( float32 STATUS_PARAM ); > floatx80 float32_to_floatx80( float32 STATUS_PARAM ); > float128 float32_to_float128( float32 STATUS_PARAM ); > > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE single-precision operations. 
> > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Software IEC/IEEE single-precision operations. > > +------------------------------------------------------------------------------- > +*/ > float32 float32_round_to_int( float32 STATUS_PARAM ); > float32 float32_add( float32, float32 STATUS_PARAM ); > float32 float32_sub( float32, float32 STATUS_PARAM ); > @@ -361,14 +387,18 @@ INLINE float32 float32_set_sign(float32 a, int sign) > #define float32_infinity make_float32(0x7f800000) > > > > -/*---------------------------------------------------------------------------- > -| The pattern for a default generated single-precision NaN. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +The pattern for a default generated single-precision NaN. > > +------------------------------------------------------------------------------- > +*/ > extern const float32 float32_default_nan; > > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE double-precision conversion routines. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Software IEC/IEEE double-precision conversion routines. 
> > +------------------------------------------------------------------------------- > +*/ > int_fast16_t float64_to_int16_round_to_zero(float64 STATUS_PARAM); > uint_fast16_t float64_to_uint16_round_to_zero(float64 STATUS_PARAM); > int32 float64_to_int32( float64 STATUS_PARAM ); > @@ -383,9 +413,11 @@ float32 float64_to_float32( float64 STATUS_PARAM ); > floatx80 float64_to_floatx80( float64 STATUS_PARAM ); > float128 float64_to_float128( float64 STATUS_PARAM ); > > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE double-precision operations. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Software IEC/IEEE double-precision operations. > > +------------------------------------------------------------------------------- > +*/ > float64 float64_round_to_int( float64 STATUS_PARAM ); > float64 float64_trunc_to_int( float64 STATUS_PARAM ); > float64 float64_add( float64, float64 STATUS_PARAM ); > @@ -467,14 +499,18 @@ INLINE float64 float64_set_sign(float64 a, int sign) > #define float64_half make_float64(0x3fe0000000000000LL) > #define float64_infinity make_float64(0x7ff0000000000000LL) > > > -/*---------------------------------------------------------------------------- > -| The pattern for a default generated double-precision NaN. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +The pattern for a default generated double-precision NaN. > > +------------------------------------------------------------------------------- > +*/ > extern const float64 float64_default_nan; > > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE extended double-precision conversion routines. 
> > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Software IEC/IEEE extended double-precision conversion routines. > > +------------------------------------------------------------------------------- > +*/ > int32 floatx80_to_int32( floatx80 STATUS_PARAM ); > int32 floatx80_to_int32_round_to_zero( floatx80 STATUS_PARAM ); > int64 floatx80_to_int64( floatx80 STATUS_PARAM ); > @@ -483,9 +519,11 @@ float32 floatx80_to_float32( floatx80 STATUS_PARAM ); > float64 floatx80_to_float64( floatx80 STATUS_PARAM ); > float128 floatx80_to_float128( floatx80 STATUS_PARAM ); > > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE extended double-precision operations. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Software IEC/IEEE extended double-precision operations. > > +------------------------------------------------------------------------------- > +*/ > floatx80 floatx80_round_to_int( floatx80 STATUS_PARAM ); > floatx80 floatx80_add( floatx80, floatx80 STATUS_PARAM ); > floatx80 floatx80_sub( floatx80, floatx80 STATUS_PARAM ); > @@ -552,14 +590,18 @@ INLINE int floatx80_is_any_nan(floatx80 a) > #define floatx80_half make_floatx80(0x3ffe, 0x8000000000000000LL) > #define floatx80_infinity make_floatx80(0x7fff, 0x8000000000000000LL) > > > -/*---------------------------------------------------------------------------- > -| The pattern for a default generated extended double-precision NaN. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +The pattern for a default generated extended double-precision NaN. 
> > +------------------------------------------------------------------------------- > +*/ > extern const floatx80 floatx80_default_nan; > > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE quadruple-precision conversion routines. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Software IEC/IEEE quadruple-precision conversion routines. > > +------------------------------------------------------------------------------- > +*/ > int32 float128_to_int32( float128 STATUS_PARAM ); > int32 float128_to_int32_round_to_zero( float128 STATUS_PARAM ); > int64 float128_to_int64( float128 STATUS_PARAM ); > @@ -568,9 +610,11 @@ float32 float128_to_float32( float128 STATUS_PARAM ); > float64 float128_to_float64( float128 STATUS_PARAM ); > floatx80 float128_to_floatx80( float128 STATUS_PARAM ); > > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE quadruple-precision operations. > > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +Software IEC/IEEE quadruple-precision operations. > > +------------------------------------------------------------------------------- > +*/ > float128 float128_round_to_int( float128 STATUS_PARAM ); > float128 float128_add( float128, float128 STATUS_PARAM ); > float128 float128_sub( float128, float128 STATUS_PARAM ); > @@ -633,9 +677,11 @@ INLINE int float128_is_any_nan(float128 a) > > #define float128_zero make_float128(0, 0) > > > -/*---------------------------------------------------------------------------- > -| The pattern for a default generated quadruple-precision NaN. 
> > -*----------------------------------------------------------------------------*/ > +/* > > +------------------------------------------------------------------------------- > +The pattern for a default generated quadruple-precision NaN. > > +------------------------------------------------------------------------------- > +*/ > extern const float128 float128_default_nan; > > #endif /* !SOFTFLOAT_H */ > -- > 1.8.0 > >
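Editorial sketch: the patch description states that this is a comment-only change. One rough way to spot-check such a claim on a pair of C files is to strip the comments from both versions and compare what is left, ignoring whitespace. The helper names below (`strip_c_comments`, `same_code`) are hypothetical and are not part of QEMU or of this patch, and the check is only an approximation: it also normalizes whitespace inside string literals, and it knows nothing about the preprocessor.

```python
def strip_c_comments(src):
    """Return src with /* ... */ and // comments removed.

    String and character literals are copied through unchanged, so
    comment-like text inside them is preserved.
    """
    out = []
    i, n = 0, len(src)
    while i < n:
        ch = src[i]
        if src.startswith("/*", i):
            # Block comment: skip to the closing marker, leave one space
            # so that tokens on either side do not get glued together.
            end = src.find("*/", i + 2)
            i = n if end == -1 else end + 2
            out.append(" ")
        elif src.startswith("//", i):
            # Line comment: skip up to (not including) the newline.
            end = src.find("\n", i)
            i = n if end == -1 else end
        elif ch in "\"'":
            # String or character literal: copy verbatim, honoring escapes.
            j = i + 1
            while j < n and src[j] != ch:
                j += 2 if src[j] == "\\" else 1
            out.append(src[i:j + 1])
            i = j + 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def same_code(old_src, new_src):
    """True if both sources are identical once comments and whitespace
    differences are ignored (an approximation, see the note above)."""
    return strip_c_comments(old_src).split() == strip_c_comments(new_src).split()
```

Reading the old and new revision of a file into two strings and passing them to `same_code` gives a quick yes/no answer. Comparing the object files produced by building both revisions would be a stronger check than this textual one.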
> N.B. If you are on CC, see after the '---' for a requested action! > > The license of SoftFloat-2b is claimed to be GPLv2 incompatible by > the FSF due to an indemnification clause. The previous release, > SoftFloat-2a, did not contain this clause. The only changes between > these two versions as far as QEMU is concerned is the license change > and a global modification of the comment structure. This patch rebases > our softfloat code to SoftFloat-2a in order to have a GPLv2 compatible > license. > > Please note, this is a comment-only change. The resulting binary should > be the same. > > I created this patch using the following strategy: > > 1) Create a branch using the original import of softfloat code: > $ git checkout 158142c2c2df728cfa3b5320c65534921a764f26 > > 2) Remove carriage returns from Softfloat-2b > > 3) Compare each of the softfloat files against Softfloat-2b using the > following mapping to generate Fabrice's original softfloat changes: > > - fpu/softfloat.c -> softfloat/bits64/softfloat.c > - fpu/softfloat.h -> softfloat/bits64/386-Win32-gcc/softfloat.h > - fpu/softfloat-macros.h -> softfloat/bits64/softfloat-macros > - fpu/softfloat-specialize.h -> > softfloat/bits64/386-Win32-gcc/softfloat-specialize > > 4) Replace our softfloat files with the corresponding files from > Softfloat-2a > > 5) Apply the diffs from (3) to (4) and commit > > 6) Create a diff between (5) and 158142c2c2df728cfa3b5320c65534921a764f26 > - This diff consists 100% of licensing change + comment reformating > > 7) Checkout the latest master branch, apply the diff from (6) > - There were a lot of comment rejects, confirmed this was only comments > and then used an emacs macro to rewrite the comments to the > Softfloat-2a > form. 
> > Cc: Andreas Färber <afaerber@suse.de> > Cc: Aurelien Jarno <aurelien@aurel32.net> > Cc: Avi Kivity <avi.kivity@gmail.com> > Cc: Ben Taylor <bentaylor.solx86@gmail.com> > Cc: Blue Swirl <blauwirbel@gmail.com> > Cc: Christophe Lyon <christophe.lyon@st.com> > Cc: Fabrice Bellard <fabrice@bellard.org> > Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> > Cc: Jocelyn Mayer <l_indien@magic.fr> > Cc: Juan Quintela <quintela@redhat.com> > Cc: malc <av1474@comtv.ru> > Cc: Max Filippov <jcmvbkbc@gmail.com> > Cc: Paolo Bonzini <pbonzini@redhat.com> > Cc: Paul Brook <paul@codesourcery.com> > Cc: Peter Maydell <peter.maydell@linaro.org> > Cc: Richard Henderson <rth@twiddle.net> > Cc: Richard Sandiford <rdsandiford@googlemail.com> > Cc: Stefan Weil <weil@mail.berlios.de> > Cc: Thiemo Seufer <ths@networkno.de> > Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> Acked-by: Guan Xuetao <gxt@mprc.pku.edu.cn>
On Mon, Apr 29, 2013 at 01:05:03PM -0500, Anthony Liguori wrote: > N.B. If you are on CC, see after the '---' for a requested action! > > The license of SoftFloat-2b is claimed to be GPLv2 incompatible by > the FSF due to an indemnification clause. The previous release, > SoftFloat-2a, did not contain this clause. The only changes between > these two versions as far as QEMU is concerned is the license change > and a global modification of the comment structure. This patch rebases > our softfloat code to SoftFloat-2a in order to have a GPLv2 compatible > license. Thanks for your work! > Please note, this is a comment-only change. The resulting binary should > be the same. > > I created this patch using the following strategy: > > 1) Create a branch using the original import of softfloat code: > $ git checkout 158142c2c2df728cfa3b5320c65534921a764f26 > > 2) Remove carriage returns from Softfloat-2b > > 3) Compare each of the softfloat files against Softfloat-2b using the > following mapping to generate Fabrice's original softfloat changes: > > - fpu/softfloat.c -> softfloat/bits64/softfloat.c > - fpu/softfloat.h -> softfloat/bits64/386-Win32-gcc/softfloat.h > - fpu/softfloat-macros.h -> softfloat/bits64/softfloat-macros > - fpu/softfloat-specialize.h -> softfloat/bits64/386-Win32-gcc/softfloat-specialize > > 4) Replace our softfloat files with the corresponding files from Softfloat-2a > > 5) Apply the diffs from (3) to (4) and commit > > 6) Create a diff between (5) and 158142c2c2df728cfa3b5320c65534921a764f26 > - This diff consists 100% of licensing change + comment reformating > > 7) Checkout the latest master branch, apply the diff from (6) > - There were a lot of comment rejects, confirmed this was only comments > and then used an emacs macro to rewrite the comments to the Softfloat-2a > form. I guess this last step is the reason why comments added long after the original code are also modified by your patch. Right? 
> Cc: Andreas Färber <afaerber@suse.de> > Cc: Aurelien Jarno <aurelien@aurel32.net> > Cc: Avi Kivity <avi.kivity@gmail.com> > Cc: Ben Taylor <bentaylor.solx86@gmail.com> > Cc: Blue Swirl <blauwirbel@gmail.com> > Cc: Christophe Lyon <christophe.lyon@st.com> > Cc: Fabrice Bellard <fabrice@bellard.org> > Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> > Cc: Jocelyn Mayer <l_indien@magic.fr> > Cc: Juan Quintela <quintela@redhat.com> > Cc: malc <av1474@comtv.ru> > Cc: Max Filippov <jcmvbkbc@gmail.com> > Cc: Paolo Bonzini <pbonzini@redhat.com> > Cc: Paul Brook <paul@codesourcery.com> > Cc: Peter Maydell <peter.maydell@linaro.org> > Cc: Richard Henderson <rth@twiddle.net> > Cc: Richard Sandiford <rdsandiford@googlemail.com> > Cc: Stefan Weil <weil@mail.berlios.de> > Cc: Thiemo Seufer <ths@networkno.de> > Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> > --- > In order to make this change, we need to relicense all contributions > from initial import of the SoftFloat code to match the license of > SoftFloat-2a (instead of the implied SoftFloat-2b license). > > If you are on CC, it is because you have contributed to the softfloat > code in QEMU. Please response to this note with: > > Acked-by: Your Name <your@email.com> > > To significant that you are able and willing to relicense your changes > to the SoftFloat-1a license (or a GPL compatible license). > > Please respond no later than May 6th, 2013. If we are unable to confirm > relicense from an author, changes from that author will be reverted. Acked-by: Aurelien Jarno <aurelien@aurel32.net>
Aurelien Jarno <aurelien@aurel32.net> writes: > On Mon, Apr 29, 2013 at 01:05:03PM -0500, Anthony Liguori wrote: >> 7) Checkout the latest master branch, apply the diff from (6) >> - There were a lot of comment rejects, confirmed this was only comments >> and then used an emacs macro to rewrite the comments to the Softfloat-2a >> form. > > I guess this last step is the reason why comments added long after the > original code are also modified by your patch. Right? Indeed. This is also why I provided a detailed explanation of the methodology :-) >> In order to make this change, we need to relicense all contributions >> from initial import of the SoftFloat code to match the license of >> SoftFloat-2a (instead of the implied SoftFloat-2b license). >> >> If you are on CC, it is because you have contributed to the softfloat >> code in QEMU. Please response to this note with: >> >> Acked-by: Your Name <your@email.com> >> >> To significant that you are able and willing to relicense your changes >> to the SoftFloat-1a license (or a GPL compatible license). >> >> Please respond no later than May 6th, 2013. If we are unable to confirm >> relicense from an author, changes from that author will be reverted. > > Acked-by: Aurelien Jarno <aurelien@aurel32.net> Thanks to everyone that has responded so far! Regards, Anthony Liguori > > -- > Aurelien Jarno GPG: 1024D/F1BCDB73 > aurelien@aurel32.net http://www.aurel32.net
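Editorial sketch: the comment rewrite mentioned in step 7 was done with an emacs macro that is not shown in the thread. Purely as an illustration of the transformation visible in the diff, the following script turns a SoftFloat-2b style block comment (an opening `/*----` rule, body lines prefixed with `| `, a closing `*----*/` rule) into the SoftFloat-2a layout (`/*`, a rule of dashes, the bare body lines, another rule, `*/`). The function names and the default rule width of 79 are assumptions. The sketch does not re-wrap the comment text, which the real patch does in a few places.

```python
import re

RULE_WIDTH = 79  # assumed width of the 2a-style dash rule


def convert_comment_block(lines, rule_width=RULE_WIDTH):
    """Convert one 2b-style comment (list of lines, rules included)
    into the 2a-style layout."""
    rule = "-" * rule_width
    out = ["/*", rule]
    for line in lines[1:-1]:
        # Drop the leading "| " (or a bare "|") from each body line.
        if line.startswith("| "):
            out.append(line[2:])
        elif line.startswith("|"):
            out.append(line[1:])
        else:
            out.append(line)
    out += [rule, "*/"]
    return out


def convert_source(text, rule_width=RULE_WIDTH):
    """Rewrite every 2b-style block comment in text; other lines pass
    through unchanged."""
    out = []
    block = None  # lines of the comment currently being collected
    for line in text.splitlines():
        if block is None:
            if re.fullmatch(r"/\*-{10,}", line):
                block = [line]
            else:
                out.append(line)
        else:
            block.append(line)
            if re.fullmatch(r"\*-{10,}\*/", line):
                out.extend(convert_comment_block(block, rule_width))
                block = None
    if block is not None:
        # Unterminated comment: leave it exactly as it was.
        out.extend(block)
    return "\n".join(out) + ("\n" if text.endswith("\n") else "")
```

Because only the comment delimiters and the `| ` prefixes are touched, the code between comments is passed through byte for byte, which matches the intent that the resulting binary should be the same.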
On 29.04.2013 20:05, Anthony Liguori wrote: > N.B. If you are on CC, see after the '---' for a requested action! > > The license of SoftFloat-2b is claimed to be GPLv2 incompatible by > the FSF due to an indemnification clause. The previous release, > SoftFloat-2a, did not contain this clause. The only changes between > these two versions as far as QEMU is concerned is the license change > and a global modification of the comment structure. This patch rebases > our softfloat code to SoftFloat-2a in order to have a GPLv2 compatible > license. > > Please note, this is a comment-only change. The resulting binary should > be the same. > > I created this patch using the following strategy: > > 1) Create a branch using the original import of softfloat code: > $ git checkout 158142c2c2df728cfa3b5320c65534921a764f26 > > 2) Remove carriage returns from Softfloat-2b > > 3) Compare each of the softfloat files against Softfloat-2b using the > following mapping to generate Fabrice's original softfloat changes: > > - fpu/softfloat.c -> softfloat/bits64/softfloat.c > - fpu/softfloat.h -> softfloat/bits64/386-Win32-gcc/softfloat.h > - fpu/softfloat-macros.h -> softfloat/bits64/softfloat-macros > - fpu/softfloat-specialize.h -> softfloat/bits64/386-Win32-gcc/softfloat-specialize > > 4) Replace our softfloat files with the corresponding files from Softfloat-2a > > 5) Apply the diffs from (3) to (4) and commit > > 6) Create a diff between (5) and 158142c2c2df728cfa3b5320c65534921a764f26 > - This diff consists 100% of licensing change + comment reformating > > 7) Checkout the latest master branch, apply the diff from (6) > - There were a lot of comment rejects, confirmed this was only comments > and then used an emacs macro to rewrite the comments to the Softfloat-2a > form.
> > Cc: Andreas Färber <afaerber@suse.de> > Cc: Aurelien Jarno <aurelien@aurel32.net> > Cc: Avi Kivity <avi.kivity@gmail.com> > Cc: Ben Taylor <bentaylor.solx86@gmail.com> > Cc: Blue Swirl <blauwirbel@gmail.com> > Cc: Christophe Lyon <christophe.lyon@st.com> > Cc: Fabrice Bellard <fabrice@bellard.org> > Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> > Cc: Jocelyn Mayer <l_indien@magic.fr> > Cc: Juan Quintela <quintela@redhat.com> > Cc: malc <av1474@comtv.ru> > Cc: Max Filippov <jcmvbkbc@gmail.com> > Cc: Paolo Bonzini <pbonzini@redhat.com> > Cc: Paul Brook <paul@codesourcery.com> > Cc: Peter Maydell <peter.maydell@linaro.org> > Cc: Richard Henderson <rth@twiddle.net> > Cc: Richard Sandiford <rdsandiford@googlemail.com> > Cc: Stefan Weil <weil@mail.berlios.de> > Cc: Thiemo Seufer <ths@networkno.de> > Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> > --- > In order to make this change, we need to relicense all contributions > from initial import of the SoftFloat code to match the license of > SoftFloat-2a (instead of the implied SoftFloat-2b license). > > If you are on CC, it is because you have contributed to the softfloat > code in QEMU. Please response to this note with: > > Acked-by: Your Name <your@email.com> > > To significant that you are able and willing to relicense your changes > to the SoftFloat-1a license (or a GPL compatible license). Including my pre-SUSE contributions, Acked-by: Andreas Färber <afaerber@suse.de> for changing to SoftFloat-2a license. Thanks for looking into this, Andreas > Please respond no later than May 6th, 2013. If we are unable to confirm > relicense from an author, changes from that author will be reverted. 
> ---
> For completeness, here is the full listing of contributions:
>
> Andreas Färber <afaerber@suse.de>
> be45f06 Silence softfloat warnings on OpenSolaris
> 5aea4c5 softfloat: Replace uint16 type with uint_fast16_t
> 94a49d8 softfloat: Replace int16 type with int_fast16_t
> c969654 softfloat: Fix mixups of int and int16
> 38641f8 softfloat: Use uint16 consistently
> 87b8cc3 softfloat: Resolve type mismatches between declaration and implementation
> 8d725fa softfloat: Prepend QEMU-style header with derivation notice
> 9f8d2a0 softfloat: Use uint32 consistently
> bb98fe4 softfloat: Drop [s]bits{8, 16, 32, 64} types in favor of [u]int{8, 16, 32, 64}_t
> [snip]
On 29.04.2013 22:18, Peter Maydell wrote:
> On 29 April 2013 19:53, Anthony Liguori <aliguori@us.ibm.com> wrote:
>> Anthony Liguori <aliguori@us.ibm.com> writes:
>>
>>> Thiemo Seufer <ths@networkno.de>
[...]
>>> 5fafdf2 find -type f | xargs sed -i 's/[\t ]$//g' # on most files
>>> 63a654b trunc() for Solaris 9 / SPARC, by Juergen Keil.
>>> fc81ba5 Check that HOST_SOLARIS is defined before relying on its
>>> value. Spotted by Joachim Henke.
>
> These three are all changes to files that have subsequently been
> deleted (the softfloat-native support was dropped altogether).

Further, ", by ..." used to indicate that the committer applied a patch
authored by someone else in pre-Git days. :) CC'ing Juergen Keil.

Regards,
Andreas

Full patch: http://patchwork.ozlabs.org/patch/240431/
On Mon, Apr 29, 2013 at 6:05 PM, Anthony Liguori <aliguori@us.ibm.com> wrote: > N.B. If you are on CC, see after the '---' for a requested action! > > The license of SoftFloat-2b is claimed to be GPLv2 incompatible by > the FSF due to an indemnification clause. The previous release, > SoftFloat-2a, did not contain this clause. The only changes between > these two versions as far as QEMU is concerned is the license change > and a global modification of the comment structure. This patch rebases > our softfloat code to SoftFloat-2a in order to have a GPLv2 compatible > license. > > Please note, this is a comment-only change. The resulting binary should > be the same. > > I created this patch using the following strategy: > > 1) Create a branch using the original import of softfloat code: > $ git checkout 158142c2c2df728cfa3b5320c65534921a764f26 > > 2) Remove carriage returns from Softfloat-2b > > 3) Compare each of the softfloat files against Softfloat-2b using the > following mapping to generate Fabrice's original softfloat changes: > > - fpu/softfloat.c -> softfloat/bits64/softfloat.c > - fpu/softfloat.h -> softfloat/bits64/386-Win32-gcc/softfloat.h > - fpu/softfloat-macros.h -> softfloat/bits64/softfloat-macros > - fpu/softfloat-specialize.h -> softfloat/bits64/386-Win32-gcc/softfloat-specialize > > 4) Replace our softfloat files with the corresponding files from Softfloat-2a > > 5) Apply the diffs from (3) to (4) and commit > > 6) Create a diff between (5) and 158142c2c2df728cfa3b5320c65534921a764f26 > - This diff consists 100% of licensing change + comment reformating > > 7) Checkout the latest master branch, apply the diff from (6) > - There were a lot of comment rejects, confirmed this was only comments > and then used an emacs macro to rewrite the comments to the Softfloat-2a > form. 
> > Cc: Andreas Färber <afaerber@suse.de> > Cc: Aurelien Jarno <aurelien@aurel32.net> > Cc: Avi Kivity <avi.kivity@gmail.com> > Cc: Ben Taylor <bentaylor.solx86@gmail.com> > Cc: Blue Swirl <blauwirbel@gmail.com> > Cc: Christophe Lyon <christophe.lyon@st.com> > Cc: Fabrice Bellard <fabrice@bellard.org> > Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> > Cc: Jocelyn Mayer <l_indien@magic.fr> > Cc: Juan Quintela <quintela@redhat.com> > Cc: malc <av1474@comtv.ru> > Cc: Max Filippov <jcmvbkbc@gmail.com> > Cc: Paolo Bonzini <pbonzini@redhat.com> > Cc: Paul Brook <paul@codesourcery.com> > Cc: Peter Maydell <peter.maydell@linaro.org> > Cc: Richard Henderson <rth@twiddle.net> > Cc: Richard Sandiford <rdsandiford@googlemail.com> > Cc: Stefan Weil <weil@mail.berlios.de> > Cc: Thiemo Seufer <ths@networkno.de> > Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> > --- > In order to make this change, we need to relicense all contributions > from initial import of the SoftFloat code to match the license of > SoftFloat-2a (instead of the implied SoftFloat-2b license). > > If you are on CC, it is because you have contributed to the softfloat > code in QEMU. Please response to this note with: > > Acked-by: Your Name <your@email.com> > > To significant that you are able and willing to relicense your changes > to the SoftFloat-1a license (or a GPL compatible license). > > Please respond no later than May 6th, 2013. If we are unable to confirm > relicense from an author, changes from that author will be reverted. 
> --- > For completeness, here is the full listing of contributions: > > Andreas Färber <afaerber@suse.de> > be45f06 Silence softfloat warnings on OpenSolaris > 5aea4c5 softfloat: Replace uint16 type with uint_fast16_t > 94a49d8 softfloat: Replace int16 type with int_fast16_t > c969654 softfloat: Fix mixups of int and int16 > 38641f8 softfloat: Use uint16 consistently > 87b8cc3 softfloat: Resolve type mismatches between declaration and implementation > 8d725fa softfloat: Prepend QEMU-style header with derivation notice > 9f8d2a0 softfloat: Use uint32 consistently > bb98fe4 softfloat: Drop [s]bits{8, 16, 32, 64} types in favor of [u]int{8, 16, 32, 64}_t > > Aurelien Jarno <aurelien@aurel32.net> > 1020160 softfloat: fix default-NaN mode > 084d19b target-mips: Implement correct NaN propagation rules > 196cfc8 softfloat: add a 1.0 constant for float32 and float64 > 1b2ad2e softfloat-native: fix *nan() > 1f398e0 softfloat: use float{32,64,x80,128}_maybe_silence_nan() > 211315f softfloat: rename float*_eq() into float*_eq_quiet() > 2657d0f softfloat: rename float*_eq_signaling() into float*_eq() > 30e7a22 Use float_relation_* constants > 326b9e9 softfloat: fix float*_scalnb() corner cases > 34d2386 softfloat: remove HPPA specific code > 374dfc3 soft-float: add float32_log2() and float64_log2() > 4cc5383 softfloat-native: add float*_is_any_nan() functions > 587eabf softfloat: add float*_is_zero_or_denormal() > 629bd74 softfloat-native: add float32_is_nan() > 67b7861 softfloat: add float*_unordered_{,quiet}() functions > 8229c99 softfloat: add float32_exp2() > 85016c9 Assortment of soft-float fixes, by Aurelien Jarno. 
> 8d6c92b softfloat-native: improve correctness of floatXX_is_neg() > 93ae1c6 softfloat: fix float{32,64}_maybe_silence_nan() for MIPS > a167ba5 Add support for GNU/kFreeBSD > b3b4c7f softfloat: use GCC builtins to count the leading zeros > b4a0ef7 softfloat-native: add float*_unordered_quiet() functions > b689362 softfloat: move float*_eq and float*_eq_quiet > b76235e softfloat: fix floatx80_is_infinity() > bbc1ded softfloat: implement fused multiply-add NaN propagation for MIPS > be22a9a softfloat: always enable floatx80 and float128 support > c4b4c77 softfloat: add pi constants > c52ab6f fp: add floatXX_is_infinity(), floatXX_is_neg(), floatXX_is_zero() > cf67c6b softfloat-native: remove > d2b1027 softfloat-native: add a few constant values > d6882cf softfloat-native: fix float*_scalbn() functions > d735d69 softfloat: rename *IsNaN variables to *IsQuietNaN > dadd71a fp: fix float32_is_infinity() > de4af5f softfloat: fix floatx80_is_{quiet,signaling}_nan() > e024e88 target-ppc: Implement correct NaN propagation rules > e2f4220 softfloat: fix floatx80 handling of NaN > e872aa8 softfloat-native: fix type of float_rounding_mode > e908775 softfloat: SH4 has the sNaN bit set > f3218a8 softfloat: add floatx80 constants > f5a6425 softfloat: improve description of comparison functions > f6714d3 softfloat: add floatx80_compare*() functions > f6a7d92 softfloat: add float{x80,128}_maybe_silence_nan() > > Avi Kivity <avi.kivity@gmail.com> > 3bf7e40 softfloat: fix for C99 > > Ben Taylor <bentaylor.solx86@gmail.com> > 0475a5c Solaris 9/x86 support, by Ben Taylor. > c94655b Updated Solaris isinf support, by Juergen Keil and Ben Taylor. > > Blue Swirl <blauwirbel@gmail.com> > 128ab2f Preliminary OpenBSD host support (based on OpenBSD patches by Todd T. 
Fries)
> 14d483e Fix OpenSolaris softfloat warnings
> 179a2c1 Rename _BSD to HOST_BSD so that it's more obvious that it's defined by configure
> 1d6198c Remove unnecessary trailing newlines
> 1f58732 128-bit float support for user mode
> 2734c70 Rename one more _BSD to HOST_BSD (spotted by Hasso Tepper)
> 3f4cb3d Fix OpenSolaris gcc4 warnings: iovec type mismatches, missing 'static'
> 70c1470 Sparse fixes: dubious mixing of bitwise and logical operations
> 7c2a9d0 Fix math warnings on OpenBSD -current
> b1d8e52 Fix undeclared symbol warnings from sparse
> b55266b Suppress gcc 4.x -Wpointer-sign (included in -Wall) warnings
> cd8a253 Fix more typos in softloat code (Eduardo Felipe)
> d07cca0 Add native softfloat fpu functions (Christoph Egger)
> ed086f3 softfloat: remove dead assignments, spotted by clang

d07cca0 was supplied by Christoph Egger (cc'd):
http://lists.nongnu.org/archive/html/qemu-devel/2008-11/msg00939.html

Otherwise it's fine to relicense the above in Softfloat-1x,
Softfloat-2x, GPLv2+ or LGPLv2+ licences.

Acked-by: Blue Swirl <blauwirbel@gmail.com>

>
> Christophe Lyon <christophe.lyon@st.com>
> 8559666 softfloat: move all default NaN definitions to softfloat.h.
> bcd4d9a softfloat: Honour default_nan_mode for float-to-float conversions
> c30fe7d softfloat: add _set_sign(), _infinity and _half for 32 and 64 bits floats.
>
> Fabrice Bellard <fabrice@bellard.org>
> 158142c soft float support
> 1b2b0af 64 bit fix
> 1d6bda3 added abs, chs and compare functions
> 38cfa06 Solaris port (Ben Taylor)
> 750afe9 avoid using char when it is not necessary
> b109f9f more native FPU comparison functions - native FPU remainder
> ec530c8 Solaris port (Ben Taylor)
> fdbb469 Solaris/SPARC host port (Ben Taylor)
>
> Guan Xuetao <gxt@mprc.pku.edu.cn>
> d2fbca9 unicore32: necessary modifications for other files to support unicore32
>
> Jocelyn Mayer <l_indien@magic.fr>
> 3430b0b Ooops... Typo.
> 75d62a5 Add missing softfloat helpers.
> > Juan Quintela <quintela@redhat.com> > 0eb4fc8 softfloat: make USE_SOFTFLOAT_STRUCT_TYPES compile > 71e72a1 rename HOST_BSD to CONFIG_BSD > 75b5a69 rename NEEDS_LIBSUNMATH to CONFIG_NEEDS_LIBSUNMATH > dfe5fff change HOST_SOLARIS to CONFIG_SOLARIS{_VERSION} > e2542fe rename WORDS_BIGENDIAN to HOST_WORDS_BIGENDIAN > > malc <av1474@comtv.ru> > 947f5fc Add static qualifier to local functions > e58ffeb Remove all traces of __powerpc__ > > Max Filippov <jcmvbkbc@gmail.com> > 6617680 softfloat: make float_muladd_negate_* flags independent > 213ff4e softfloat: add NO_SIGNALING_NANS > b81fe82 target-xtensa: specialize softfloat NaN rules > > Paolo Bonzini <pbonzini@redhat.com> > 1de7afc misc: move include files to include/qemu/ > 6b4c305 fpu: move public header file to include/fpu > 789ec7c softfloat: change default nan definitions to variables > > Paul Brook <paul@codesourcery.com> > 6001149 ARM FP16 support > 6939754 Correctly normalize values and handle zero inputs to scalbn functions. > 3598ecb Remove missing include. > 5c7908e Implement default-NaN mode. > 7918bf4 Fix typo in BSD FP rounding mode names. > 9027db8 Fix ARM default NaN. > 9ee6e8b ARMv7 support. > a1b91bb Fix typo in softfloat code. > e6e5906 ColdFire target. > f090c9d Add strict checking mode for softfp code. > fe76d97 Implement flush-to-zero mode (denormal results are replaced with zero). 
> > Peter Maydell <peter.maydell@linaro.org> > 1856987 softfloat: Rename float*_is_nan() functions to float*_is_quiet_nan() > 760e141 softfloat: roundAndPackInt{32, 64}: Don't assume int32 is 32 bits > 011da61 target-arm: Implement correct NaN propagation rules > 21d6ebd softfloat: Add float*_is_any_nan() functions > 274f1b0 softfloat: Add float*_min() and float*_max() functions > 2ac8bd0 softfloat: Reinstate accidentally disabled target-specific NaN handling > 2bed652 softfloat: Implement floatx80_is_any_nan() and float128_is_any_nan() > 354f211 softfloat: abstract out target-specific NaN propagation rules > 369be8f softfloat: Implement fused multiply-add > 37d1866 softfloat: Implement flushing input denormals to zero > 4be8eea fpu/softfloat.c: Remove pointless shift of always-zero value > 600e30d softfloat: Fix single-to-half precision float conversions > 6f3300a softfloat: Add float32_is_zero_or_denormal() function > b3a6a2e softfloat: float*_to_int32_round_to_zero: don't assume int32 is 32 bits > b408dbd softfloat: Add float*_maybe_silence_nan() functions > bb4d4bb softfloat: Add float16 type and float16 NaN handling functions > c29aca4 softfloat: Add setter function for tininess detection mode > cbcef45 softfloat: Add float/double to 16 bit integer conversion functions > d5138cf softfloat: Fix compilation failures with USE_SOFTFLOAT_STRUCT_TYPES > e3d142d fpu: Correct edgecase in float64_muladd > e6afc87 softfloat: Add new flag for when denormal result is flushed to zero > e744c06 fpu/softfloat.c: Return correctly signed values from uint64_to_float32 > f591e1b softfloat: Correctly handle NaNs in float16_to_float32() > > Richard Henderson <rth@twiddle.net> > 17ed229 softfloat: Fix uint64_to_float64 > 1e397ea softfloat: Implement uint64_to_float128 > 8443eff target-alpha: Split up FPCR value into separate fields. > 990b3e1 target-alpha: Enable softfloat. > ba0e276 target-alpha: Fixes for alpha-linux syscalls. 
> > Richard Sandiford <rdsandiford@googlemail.com> > a6e7c18 softfloat: Handle float_muladd_negate_c when product is zero > > Stefan Weil <weil@mail.berlios.de> > bc4347b arm host: fix compiler warning > > Thiemo Seufer <ths@networkno.de> > 5a6932d Fix NaN handling for MIPS and HPPA. > 5fafdf2 find -type f | xargs sed -i 's/[\t ]$//g' # on most files > 63a654b trunc() for Solaris 9 / SPARC, by Juergen Keil. > 924b2c0 Add proper float*_is_nan prototypes. > b645bb4 Fix softfloat NaN handling. > fc81ba5 Check that HOST_SOLARIS is defined before relying on its value. Spotted by Joachim Henke. > --- > fpu/softfloat-macros.h | 430 ++++---- > fpu/softfloat-specialize.h | 494 +++++---- > fpu/softfloat.c | 2436 ++++++++++++++++++++++++-------------------- > include/fpu/softfloat.h | 242 +++-- > 4 files changed, 1981 insertions(+), 1621 deletions(-) > > diff --git a/fpu/softfloat-macros.h b/fpu/softfloat-macros.h > index b5164af..2009315 100644 > --- a/fpu/softfloat-macros.h > +++ b/fpu/softfloat-macros.h > @@ -4,10 +4,11 @@ > * Derived from SoftFloat. > */ > > -/*============================================================================ > +/* > +=============================================================================== > > This C source fragment is part of the SoftFloat IEC/IEEE Floating-point > -Arithmetic Package, Release 2b. > +Arithmetic Package, Release 2a. > > Written by John R. Hauser. This work was made possible in part by the > International Computer Science Institute, located at Suite 600, 1947 Center > @@ -16,28 +17,27 @@ National Science Foundation under grant MIP-9311980. The original version > of this code was written as part of a project to build a fixed-point vector > processor in collaboration with the University of California at Berkeley, > overseen by Profs. Nelson Morgan and John Wawrzynek. 
More information > -is available through the Web page `http://www.cs.berkeley.edu/~jhauser/ > +is available through the Web page `http://HTTP.CS.Berkeley.EDU/~jhauser/ > arithmetic/SoftFloat.html'. > > -THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort has > -been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT TIMES > -RESULT IN INCORRECT BEHAVIOR. USE OF THIS SOFTWARE IS RESTRICTED TO PERSONS > -AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ALL LOSSES, > -COSTS, OR OTHER PROBLEMS THEY INCUR DUE TO THE SOFTWARE, AND WHO FURTHERMORE > -EFFECTIVELY INDEMNIFY JOHN HAUSER AND THE INTERNATIONAL COMPUTER SCIENCE > -INSTITUTE (possibly via similar legal notice) AGAINST ALL LOSSES, COSTS, OR > -OTHER PROBLEMS INCURRED BY THEIR CUSTOMERS AND CLIENTS DUE TO THE SOFTWARE. > +THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort > +has been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT > +TIMES RESULT IN INCORRECT BEHAVIOR. USE OF THIS SOFTWARE IS RESTRICTED TO > +PERSONS AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ANY > +AND ALL LOSSES, COSTS, OR OTHER PROBLEMS ARISING FROM ITS USE. > > Derivative works are acceptable, even for commercial purposes, so long as > -(1) the source code for the derivative work includes prominent notice that > -the work is derivative, and (2) the source code includes prominent notice with > -these four paragraphs for those parts of this code that are retained. > +(1) they include prominent notice that the work is derivative, and (2) they > +include prominent notice akin to these four paragraphs for those parts of > +this code that are retained. > > =============================================================================*/ > > -/*---------------------------------------------------------------------------- > -| This macro tests for minimum version of the GNU C compiler. 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +This macro tests for minimum version of the GNU C compiler. > +------------------------------------------------------------------------------- > +*/ > #if defined(__GNUC__) && defined(__GNUC_MINOR__) > # define SOFTFLOAT_GNUC_PREREQ(maj, min) \ > ((__GNUC__ << 16) + __GNUC_MINOR__ >= ((maj) << 16) + (min)) > @@ -46,14 +46,16 @@ these four paragraphs for those parts of this code that are retained. > #endif > > > -/*---------------------------------------------------------------------------- > -| Shifts `a' right by the number of bits given in `count'. If any nonzero > -| bits are shifted off, they are ``jammed'' into the least significant bit of > -| the result by setting the least significant bit to 1. The value of `count' > -| can be arbitrarily large; in particular, if `count' is greater than 32, the > -| result will be either 0 or 1, depending on whether `a' is zero or nonzero. > -| The result is stored in the location pointed to by `zPtr'. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Shifts `a' right by the number of bits given in `count'. If any nonzero > +bits are shifted off, they are ``jammed'' into the least significant bit of > +the result by setting the least significant bit to 1. The value of `count' > +can be arbitrarily large; in particular, if `count' is greater than 32, the > +result will be either 0 or 1, depending on whether `a' is zero or nonzero. > +The result is stored in the location pointed to by `zPtr'. 
> +------------------------------------------------------------------------------- > +*/ > > INLINE void shift32RightJamming(uint32_t a, int_fast16_t count, uint32_t *zPtr) > { > @@ -72,14 +74,16 @@ INLINE void shift32RightJamming(uint32_t a, int_fast16_t count, uint32_t *zPtr) > > } > > -/*---------------------------------------------------------------------------- > -| Shifts `a' right by the number of bits given in `count'. If any nonzero > -| bits are shifted off, they are ``jammed'' into the least significant bit of > -| the result by setting the least significant bit to 1. The value of `count' > -| can be arbitrarily large; in particular, if `count' is greater than 64, the > -| result will be either 0 or 1, depending on whether `a' is zero or nonzero. > -| The result is stored in the location pointed to by `zPtr'. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Shifts `a' right by the number of bits given in `count'. If any nonzero > +bits are shifted off, they are ``jammed'' into the least significant bit of > +the result by setting the least significant bit to 1. The value of `count' > +can be arbitrarily large; in particular, if `count' is greater than 64, the > +result will be either 0 or 1, depending on whether `a' is zero or nonzero. > +The result is stored in the location pointed to by `zPtr'. > +------------------------------------------------------------------------------- > +*/ > > INLINE void shift64RightJamming(uint64_t a, int_fast16_t count, uint64_t *zPtr) > { > @@ -98,23 +102,24 @@ INLINE void shift64RightJamming(uint64_t a, int_fast16_t count, uint64_t *zPtr) > > } > > -/*---------------------------------------------------------------------------- > -| Shifts the 128-bit value formed by concatenating `a0' and `a1' right by 64 > -| _plus_ the number of bits given in `count'. 
The shifted result is at most > -| 64 nonzero bits; this is stored at the location pointed to by `z0Ptr'. The > -| bits shifted off form a second 64-bit result as follows: The _last_ bit > -| shifted off is the most-significant bit of the extra result, and the other > -| 63 bits of the extra result are all zero if and only if _all_but_the_last_ > -| bits shifted off were all zero. This extra result is stored in the location > -| pointed to by `z1Ptr'. The value of `count' can be arbitrarily large. > -| (This routine makes more sense if `a0' and `a1' are considered to form > -| a fixed-point value with binary point between `a0' and `a1'. This fixed- > -| point value is shifted right by the number of bits given in `count', and > -| the integer part of the result is returned at the location pointed to by > -| `z0Ptr'. The fractional part of the result may be slightly corrupted as > -| described above, and is returned at the location pointed to by `z1Ptr'.) > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Shifts the 128-bit value formed by concatenating `a0' and `a1' right by 64 > +_plus_ the number of bits given in `count'. The shifted result is at most > +64 nonzero bits; this is stored at the location pointed to by `z0Ptr'. The > +bits shifted off form a second 64-bit result as follows: The _last_ bit > +shifted off is the most-significant bit of the extra result, and the other > +63 bits of the extra result are all zero if and only if _all_but_the_last_ > +bits shifted off were all zero. This extra result is stored in the location > +pointed to by `z1Ptr'. The value of `count' can be arbitrarily large. > + (This routine makes more sense if `a0' and `a1' are considered to form a > +fixed-point value with binary point between `a0' and `a1'. 
This fixed-point > +value is shifted right by the number of bits given in `count', and the > +integer part of the result is returned at the location pointed to by > +`z0Ptr'. The fractional part of the result may be slightly corrupted as > +described above, and is returned at the location pointed to by `z1Ptr'.) > +------------------------------------------------------------------------------- > +*/ > INLINE void > shift64ExtraRightJamming( > uint64_t a0, uint64_t a1, int_fast16_t count, uint64_t *z0Ptr, uint64_t *z1Ptr) > @@ -144,14 +149,15 @@ INLINE void > > } > > -/*---------------------------------------------------------------------------- > -| Shifts the 128-bit value formed by concatenating `a0' and `a1' right by the > -| number of bits given in `count'. Any bits shifted off are lost. The value > -| of `count' can be arbitrarily large; in particular, if `count' is greater > -| than 128, the result will be 0. The result is broken into two 64-bit pieces > -| which are stored at the locations pointed to by `z0Ptr' and `z1Ptr'. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Shifts the 128-bit value formed by concatenating `a0' and `a1' right by the > +number of bits given in `count'. Any bits shifted off are lost. The value > +of `count' can be arbitrarily large; in particular, if `count' is greater > +than 128, the result will be 0. The result is broken into two 64-bit pieces > +which are stored at the locations pointed to by `z0Ptr' and `z1Ptr'. 
> +------------------------------------------------------------------------------- > +*/ > INLINE void > shift128Right( > uint64_t a0, uint64_t a1, int_fast16_t count, uint64_t *z0Ptr, uint64_t *z1Ptr) > @@ -176,17 +182,18 @@ INLINE void > > } > > -/*---------------------------------------------------------------------------- > -| Shifts the 128-bit value formed by concatenating `a0' and `a1' right by the > -| number of bits given in `count'. If any nonzero bits are shifted off, they > -| are ``jammed'' into the least significant bit of the result by setting the > -| least significant bit to 1. The value of `count' can be arbitrarily large; > -| in particular, if `count' is greater than 128, the result will be either > -| 0 or 1, depending on whether the concatenation of `a0' and `a1' is zero or > -| nonzero. The result is broken into two 64-bit pieces which are stored at > -| the locations pointed to by `z0Ptr' and `z1Ptr'. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Shifts the 128-bit value formed by concatenating `a0' and `a1' right by the > +number of bits given in `count'. If any nonzero bits are shifted off, they > +are ``jammed'' into the least significant bit of the result by setting the > +least significant bit to 1. The value of `count' can be arbitrarily large; > +in particular, if `count' is greater than 128, the result will be either > +0 or 1, depending on whether the concatenation of `a0' and `a1' is zero or > +nonzero. The result is broken into two 64-bit pieces which are stored at > +the locations pointed to by `z0Ptr' and `z1Ptr'. 
> +------------------------------------------------------------------------------- > +*/ > INLINE void > shift128RightJamming( > uint64_t a0, uint64_t a1, int_fast16_t count, uint64_t *z0Ptr, uint64_t *z1Ptr) > @@ -219,25 +226,26 @@ INLINE void > > } > > -/*---------------------------------------------------------------------------- > -| Shifts the 192-bit value formed by concatenating `a0', `a1', and `a2' right > -| by 64 _plus_ the number of bits given in `count'. The shifted result is > -| at most 128 nonzero bits; these are broken into two 64-bit pieces which are > -| stored at the locations pointed to by `z0Ptr' and `z1Ptr'. The bits shifted > -| off form a third 64-bit result as follows: The _last_ bit shifted off is > -| the most-significant bit of the extra result, and the other 63 bits of the > -| extra result are all zero if and only if _all_but_the_last_ bits shifted off > -| were all zero. This extra result is stored in the location pointed to by > -| `z2Ptr'. The value of `count' can be arbitrarily large. > -| (This routine makes more sense if `a0', `a1', and `a2' are considered > -| to form a fixed-point value with binary point between `a1' and `a2'. This > -| fixed-point value is shifted right by the number of bits given in `count', > -| and the integer part of the result is returned at the locations pointed to > -| by `z0Ptr' and `z1Ptr'. The fractional part of the result may be slightly > -| corrupted as described above, and is returned at the location pointed to by > -| `z2Ptr'.) > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Shifts the 192-bit value formed by concatenating `a0', `a1', and `a2' right > +by 64 _plus_ the number of bits given in `count'. The shifted result is > +at most 128 nonzero bits; these are broken into two 64-bit pieces which are > +stored at the locations pointed to by `z0Ptr' and `z1Ptr'. 
The bits shifted > +off form a third 64-bit result as follows: The _last_ bit shifted off is > +the most-significant bit of the extra result, and the other 63 bits of the > +extra result are all zero if and only if _all_but_the_last_ bits shifted off > +were all zero. This extra result is stored in the location pointed to by > +`z2Ptr'. The value of `count' can be arbitrarily large. > + (This routine makes more sense if `a0', `a1', and `a2' are considered > +to form a fixed-point value with binary point between `a1' and `a2'. This > +fixed-point value is shifted right by the number of bits given in `count', > +and the integer part of the result is returned at the locations pointed to > +by `z0Ptr' and `z1Ptr'. The fractional part of the result may be slightly > +corrupted as described above, and is returned at the location pointed to by > +`z2Ptr'.) > +------------------------------------------------------------------------------- > +*/ > INLINE void > shift128ExtraRightJamming( > uint64_t a0, > @@ -289,13 +297,14 @@ INLINE void > > } > > -/*---------------------------------------------------------------------------- > -| Shifts the 128-bit value formed by concatenating `a0' and `a1' left by the > -| number of bits given in `count'. Any bits shifted off are lost. The value > -| of `count' must be less than 64. The result is broken into two 64-bit > -| pieces which are stored at the locations pointed to by `z0Ptr' and `z1Ptr'. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Shifts the 128-bit value formed by concatenating `a0' and `a1' left by the > +number of bits given in `count'. Any bits shifted off are lost. The value > +of `count' must be less than 64. The result is broken into two 64-bit > +pieces which are stored at the locations pointed to by `z0Ptr' and `z1Ptr'. 
> +------------------------------------------------------------------------------- > +*/ > INLINE void > shortShift128Left( > uint64_t a0, uint64_t a1, int_fast16_t count, uint64_t *z0Ptr, uint64_t *z1Ptr) > @@ -307,14 +316,15 @@ INLINE void > > } > > -/*---------------------------------------------------------------------------- > -| Shifts the 192-bit value formed by concatenating `a0', `a1', and `a2' left > -| by the number of bits given in `count'. Any bits shifted off are lost. > -| The value of `count' must be less than 64. The result is broken into three > -| 64-bit pieces which are stored at the locations pointed to by `z0Ptr', > -| `z1Ptr', and `z2Ptr'. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Shifts the 192-bit value formed by concatenating `a0', `a1', and `a2' left > +by the number of bits given in `count'. Any bits shifted off are lost. > +The value of `count' must be less than 64. The result is broken into three > +64-bit pieces which are stored at the locations pointed to by `z0Ptr', > +`z1Ptr', and `z2Ptr'. > +------------------------------------------------------------------------------- > +*/ > INLINE void > shortShift192Left( > uint64_t a0, > @@ -343,13 +353,14 @@ INLINE void > > } > > -/*---------------------------------------------------------------------------- > -| Adds the 128-bit value formed by concatenating `a0' and `a1' to the 128-bit > -| value formed by concatenating `b0' and `b1'. Addition is modulo 2^128, so > -| any carry out is lost. The result is broken into two 64-bit pieces which > -| are stored at the locations pointed to by `z0Ptr' and `z1Ptr'. 
> -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Adds the 128-bit value formed by concatenating `a0' and `a1' to the 128-bit > +value formed by concatenating `b0' and `b1'. Addition is modulo 2^128, so > +any carry out is lost. The result is broken into two 64-bit pieces which > +are stored at the locations pointed to by `z0Ptr' and `z1Ptr'. > +------------------------------------------------------------------------------- > +*/ > INLINE void > add128( > uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1, uint64_t *z0Ptr, uint64_t *z1Ptr ) > @@ -362,14 +373,15 @@ INLINE void > > } > > -/*---------------------------------------------------------------------------- > -| Adds the 192-bit value formed by concatenating `a0', `a1', and `a2' to the > -| 192-bit value formed by concatenating `b0', `b1', and `b2'. Addition is > -| modulo 2^192, so any carry out is lost. The result is broken into three > -| 64-bit pieces which are stored at the locations pointed to by `z0Ptr', > -| `z1Ptr', and `z2Ptr'. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Adds the 192-bit value formed by concatenating `a0', `a1', and `a2' to the > +192-bit value formed by concatenating `b0', `b1', and `b2'. Addition is > +modulo 2^192, so any carry out is lost. The result is broken into three > +64-bit pieces which are stored at the locations pointed to by `z0Ptr', > +`z1Ptr', and `z2Ptr'. 
> +------------------------------------------------------------------------------- > +*/ > INLINE void > add192( > uint64_t a0, > @@ -400,14 +412,15 @@ INLINE void > > } > > -/*---------------------------------------------------------------------------- > -| Subtracts the 128-bit value formed by concatenating `b0' and `b1' from the > -| 128-bit value formed by concatenating `a0' and `a1'. Subtraction is modulo > -| 2^128, so any borrow out (carry out) is lost. The result is broken into two > -| 64-bit pieces which are stored at the locations pointed to by `z0Ptr' and > -| `z1Ptr'. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Subtracts the 128-bit value formed by concatenating `b0' and `b1' from the > +128-bit value formed by concatenating `a0' and `a1'. Subtraction is modulo > +2^128, so any borrow out (carry out) is lost. The result is broken into two > +64-bit pieces which are stored at the locations pointed to by `z0Ptr' and > +`z1Ptr'. > +------------------------------------------------------------------------------- > +*/ > INLINE void > sub128( > uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1, uint64_t *z0Ptr, uint64_t *z1Ptr ) > @@ -418,14 +431,15 @@ INLINE void > > } > > -/*---------------------------------------------------------------------------- > -| Subtracts the 192-bit value formed by concatenating `b0', `b1', and `b2' > -| from the 192-bit value formed by concatenating `a0', `a1', and `a2'. > -| Subtraction is modulo 2^192, so any borrow out (carry out) is lost. The > -| result is broken into three 64-bit pieces which are stored at the locations > -| pointed to by `z0Ptr', `z1Ptr', and `z2Ptr'. 
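The sub128 comment quoted above describes the mirror image of the addition case: subtraction modulo 2^128 with the borrow out discarded. A minimal sketch under the same assumptions (high halves in `a0'/`b0'; sub128_sketch is a hypothetical name and not the code being patched):

```c
#include <stdint.h>

/* Hypothetical name: illustrates the sub128 contract from the comment only. */
static void sub128_sketch(uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1,
                          uint64_t *z0Ptr, uint64_t *z1Ptr)
{
    uint64_t borrow = (a1 < b1);  /* 1 when the low subtraction wraps */

    *z1Ptr = a1 - b1;             /* low halves; wraps modulo 2^64 */
    *z0Ptr = a0 - b0 - borrow;    /* any borrow out of the high half is lost */
}
```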
> -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Subtracts the 192-bit value formed by concatenating `b0', `b1', and `b2' > +from the 192-bit value formed by concatenating `a0', `a1', and `a2'. > +Subtraction is modulo 2^192, so any borrow out (carry out) is lost. The > +result is broken into three 64-bit pieces which are stored at the locations > +pointed to by `z0Ptr', `z1Ptr', and `z2Ptr'. > +------------------------------------------------------------------------------- > +*/ > INLINE void > sub192( > uint64_t a0, > @@ -456,11 +470,13 @@ INLINE void > > } > > -/*---------------------------------------------------------------------------- > -| Multiplies `a' by `b' to obtain a 128-bit product. The product is broken > -| into two 64-bit pieces which are stored at the locations pointed to by > -| `z0Ptr' and `z1Ptr'. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Multiplies `a' by `b' to obtain a 128-bit product. The product is broken > +into two 64-bit pieces which are stored at the locations pointed to by > +`z0Ptr' and `z1Ptr'. > +------------------------------------------------------------------------------- > +*/ > > INLINE void mul64To128( uint64_t a, uint64_t b, uint64_t *z0Ptr, uint64_t *z1Ptr ) > { > @@ -485,13 +501,14 @@ INLINE void mul64To128( uint64_t a, uint64_t b, uint64_t *z0Ptr, uint64_t *z1Ptr > > } > > -/*---------------------------------------------------------------------------- > -| Multiplies the 128-bit value formed by concatenating `a0' and `a1' by > -| `b' to obtain a 192-bit product. The product is broken into three 64-bit > -| pieces which are stored at the locations pointed to by `z0Ptr', `z1Ptr', and > -| `z2Ptr'. 
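The mul64To128 comment quoted above only states the contract (a 64x64 multiply giving a 128-bit product in two pieces). One common way to meet that contract without a 128-bit integer type is to split each operand into 32-bit halves and sum the four partial products. The sketch below assumes that approach; it is not claimed to be the routine in the patched file, and mul64To128_sketch is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical name: 64x64 -> 128-bit multiply via 32-bit partial products. */
static void mul64To128_sketch(uint64_t a, uint64_t b,
                              uint64_t *z0Ptr, uint64_t *z1Ptr)
{
    uint64_t aLow = a & 0xFFFFFFFFu, aHigh = a >> 32;
    uint64_t bLow = b & 0xFFFFFFFFu, bHigh = b >> 32;

    uint64_t low  = aLow * bLow;    /* weight 2^0  */
    uint64_t midA = aLow * bHigh;   /* weight 2^32 */
    uint64_t midB = aHigh * bLow;   /* weight 2^32 */
    uint64_t high = aHigh * bHigh;  /* weight 2^64 */

    /* Fold in the middle terms one at a time; neither sum can overflow,
       because a 32x32 product is at most 2^64 - 2^33 + 1. */
    uint64_t mid  = midA + (low >> 32);
    uint64_t mid2 = midB + (mid & 0xFFFFFFFFu);

    *z1Ptr = (mid2 << 32) | (low & 0xFFFFFFFFu);  /* low 64 bits of product  */
    *z0Ptr = high + (mid >> 32) + (mid2 >> 32);   /* high 64 bits of product */
}
```

As a spot check, squaring the all-ones 64-bit value gives 2^128 - 2^65 + 1, i.e. a high piece of 0xFFFFFFFFFFFFFFFE and a low piece of 1.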
> -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Multiplies the 128-bit value formed by concatenating `a0' and `a1' by > +`b' to obtain a 192-bit product. The product is broken into three 64-bit > +pieces which are stored at the locations pointed to by `z0Ptr', `z1Ptr', and > +`z2Ptr'. > +------------------------------------------------------------------------------- > +*/ > INLINE void > mul128By64To192( > uint64_t a0, > @@ -513,13 +530,14 @@ INLINE void > > } > > -/*---------------------------------------------------------------------------- > -| Multiplies the 128-bit value formed by concatenating `a0' and `a1' to the > -| 128-bit value formed by concatenating `b0' and `b1' to obtain a 256-bit > -| product. The product is broken into four 64-bit pieces which are stored at > -| the locations pointed to by `z0Ptr', `z1Ptr', `z2Ptr', and `z3Ptr'. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Multiplies the 128-bit value formed by concatenating `a0' and `a1' to the > +128-bit value formed by concatenating `b0' and `b1' to obtain a 256-bit > +product. The product is broken into four 64-bit pieces which are stored at > +the locations pointed to by `z0Ptr', `z1Ptr', `z2Ptr', and `z3Ptr'. > +------------------------------------------------------------------------------- > +*/ > INLINE void > mul128To256( > uint64_t a0, > @@ -550,14 +568,16 @@ INLINE void > > } > > -/*---------------------------------------------------------------------------- > -| Returns an approximation to the 64-bit integer quotient obtained by dividing > -| `b' into the 128-bit value formed by concatenating `a0' and `a1'. The > -| divisor `b' must be at least 2^63. 
If q is the exact quotient truncated > -| toward zero, the approximation returned lies between q and q + 2 inclusive. > -| If the exact quotient q is larger than 64 bits, the maximum positive 64-bit > -| unsigned integer is returned. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns an approximation to the 64-bit integer quotient obtained by dividing > +`b' into the 128-bit value formed by concatenating `a0' and `a1'. The > +divisor `b' must be at least 2^63. If q is the exact quotient truncated > +toward zero, the approximation returned lies between q and q + 2 inclusive. > +If the exact quotient q is larger than 64 bits, the maximum positive 64-bit > +unsigned integer is returned. > +------------------------------------------------------------------------------- > +*/ > > static uint64_t estimateDiv128To64( uint64_t a0, uint64_t a1, uint64_t b ) > { > @@ -581,15 +601,17 @@ static uint64_t estimateDiv128To64( uint64_t a0, uint64_t a1, uint64_t b ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns an approximation to the square root of the 32-bit significand given > -| by `a'. Considered as an integer, `a' must be at least 2^31. If bit 0 of > -| `aExp' (the least significant bit) is 1, the integer returned approximates > -| 2^31*sqrt(`a'/2^31), where `a' is considered an integer. If bit 0 of `aExp' > -| is 0, the integer returned approximates 2^31*sqrt(`a'/2^30). In either > -| case, the approximation returned lies strictly within +/-2 of the exact > -| value. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns an approximation to the square root of the 32-bit significand given > +by `a'. Considered as an integer, `a' must be at least 2^31. 
If bit 0 of > +`aExp' (the least significant bit) is 1, the integer returned approximates > +2^31*sqrt(`a'/2^31), where `a' is considered an integer. If bit 0 of `aExp' > +is 0, the integer returned approximates 2^31*sqrt(`a'/2^30). In either > +case, the approximation returned lies strictly within +/-2 of the exact > +value. > +------------------------------------------------------------------------------- > +*/ > > static uint32_t estimateSqrt32(int_fast16_t aExp, uint32_t a) > { > @@ -620,10 +642,12 @@ static uint32_t estimateSqrt32(int_fast16_t aExp, uint32_t a) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the number of leading 0 bits before the most-significant 1 bit of > -| `a'. If `a' is zero, 32 is returned. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the number of leading 0 bits before the most-significant 1 bit of > +`a'. If `a' is zero, 32 is returned. > +------------------------------------------------------------------------------- > +*/ > > static int8 countLeadingZeros32( uint32_t a ) > { > @@ -668,10 +692,12 @@ static int8 countLeadingZeros32( uint32_t a ) > #endif > } > > -/*---------------------------------------------------------------------------- > -| Returns the number of leading 0 bits before the most-significant 1 bit of > -| `a'. If `a' is zero, 64 is returned. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the number of leading 0 bits before the most-significant 1 bit of > +`a'. If `a' is zero, 64 is returned. 
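The two countLeadingZeros comments quoted here fix the edge cases: a zero input yields the full width, 32 or 64. The hunk context shows an `#endif' inside each function, so the real routines evidently have conditionally compiled paths that are not visible in this diff. The following is a plain portable stand-in that satisfies the documented contract, with hypothetical names, and makes no claim about how the patched file implements it.

```c
#include <stdint.h>

/* Hypothetical name: leading-zero count of a 32-bit value; 32 for zero. */
static int countLeadingZeros32_sketch(uint32_t a)
{
    int count = 0;

    if (a == 0) {
        return 32;
    }
    while ((a & 0x80000000u) == 0) {  /* shift until the top bit is set */
        a <<= 1;
        count++;
    }
    return count;
}

/* Hypothetical name: 64-bit variant built from the 32-bit one; 64 for zero. */
static int countLeadingZeros64_sketch(uint64_t a)
{
    uint32_t high = (uint32_t)(a >> 32);

    if (high != 0) {
        return countLeadingZeros32_sketch(high);
    }
    return 32 + countLeadingZeros32_sketch((uint32_t)a);
}
```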
> +------------------------------------------------------------------------------- > +*/ > > static int8 countLeadingZeros64( uint64_t a ) > { > @@ -696,11 +722,13 @@ static int8 countLeadingZeros64( uint64_t a ) > #endif > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the 128-bit value formed by concatenating `a0' and `a1' > -| is equal to the 128-bit value formed by concatenating `b0' and `b1'. > -| Otherwise, returns 0. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the 128-bit value formed by concatenating `a0' and `a1' > +is equal to the 128-bit value formed by concatenating `b0' and `b1'. > +Otherwise, returns 0. > +------------------------------------------------------------------------------- > +*/ > > INLINE flag eq128( uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1 ) > { > @@ -709,11 +737,13 @@ INLINE flag eq128( uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1 ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the 128-bit value formed by concatenating `a0' and `a1' is less > -| than or equal to the 128-bit value formed by concatenating `b0' and `b1'. > -| Otherwise, returns 0. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the 128-bit value formed by concatenating `a0' and `a1' is less > +than or equal to the 128-bit value formed by concatenating `b0' and `b1'. > +Otherwise, returns 0. 
> +------------------------------------------------------------------------------- > +*/ > > INLINE flag le128( uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1 ) > { > @@ -722,11 +752,13 @@ INLINE flag le128( uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1 ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the 128-bit value formed by concatenating `a0' and `a1' is less > -| than the 128-bit value formed by concatenating `b0' and `b1'. Otherwise, > -| returns 0. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the 128-bit value formed by concatenating `a0' and `a1' is less > +than the 128-bit value formed by concatenating `b0' and `b1'. Otherwise, > +returns 0. > +------------------------------------------------------------------------------- > +*/ > > INLINE flag lt128( uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1 ) > { > @@ -735,11 +767,13 @@ INLINE flag lt128( uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1 ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the 128-bit value formed by concatenating `a0' and `a1' is > -| not equal to the 128-bit value formed by concatenating `b0' and `b1'. > -| Otherwise, returns 0. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the 128-bit value formed by concatenating `a0' and `a1' is > +not equal to the 128-bit value formed by concatenating `b0' and `b1'. > +Otherwise, returns 0. 
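The eq128/le128/lt128 comments above all describe comparisons of 128-bit unsigned values held as two 64-bit halves. A minimal sketch of the ordering comparisons, assuming `a0'/`b0' are the most-significant halves; the *_sketch names are hypothetical and the bodies are written for illustration rather than copied from the patched file:

```c
#include <stdint.h>

/* Hypothetical name: 1 if (a0:a1) < (b0:b1) as 128-bit unsigned values. */
static int lt128_sketch(uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1)
{
    /* The high halves decide unless they are equal. */
    return (a0 < b0) || ((a0 == b0) && (a1 < b1));
}

/* Hypothetical name: 1 if (a0:a1) <= (b0:b1) as 128-bit unsigned values. */
static int le128_sketch(uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1)
{
    return (a0 < b0) || ((a0 == b0) && (a1 <= b1));
}
```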
> +------------------------------------------------------------------------------- > +*/ > > INLINE flag ne128( uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1 ) > { > diff --git a/fpu/softfloat-specialize.h b/fpu/softfloat-specialize.h > index 518f694..ba9bfeb 100644 > --- a/fpu/softfloat-specialize.h > +++ b/fpu/softfloat-specialize.h > @@ -4,10 +4,11 @@ > * Derived from SoftFloat. > */ > > -/*============================================================================ > +/* > +=============================================================================== > > This C source fragment is part of the SoftFloat IEC/IEEE Floating-point > -Arithmetic Package, Release 2b. > +Arithmetic Package, Release 2a. > > Written by John R. Hauser. This work was made possible in part by the > International Computer Science Institute, located at Suite 600, 1947 Center > @@ -16,22 +17,19 @@ National Science Foundation under grant MIP-9311980. The original version > of this code was written as part of a project to build a fixed-point vector > processor in collaboration with the University of California at Berkeley, > overseen by Profs. Nelson Morgan and John Wawrzynek. More information > -is available through the Web page `http://www.cs.berkeley.edu/~jhauser/ > +is available through the Web page `http://HTTP.CS.Berkeley.EDU/~jhauser/ > arithmetic/SoftFloat.html'. > > -THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort has > -been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT TIMES > -RESULT IN INCORRECT BEHAVIOR. 
USE OF THIS SOFTWARE IS RESTRICTED TO PERSONS > -AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ALL LOSSES, > -COSTS, OR OTHER PROBLEMS THEY INCUR DUE TO THE SOFTWARE, AND WHO FURTHERMORE > -EFFECTIVELY INDEMNIFY JOHN HAUSER AND THE INTERNATIONAL COMPUTER SCIENCE > -INSTITUTE (possibly via similar legal warning) AGAINST ALL LOSSES, COSTS, OR > -OTHER PROBLEMS INCURRED BY THEIR CUSTOMERS AND CLIENTS DUE TO THE SOFTWARE. > +THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort > +has been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT > +TIMES RESULT IN INCORRECT BEHAVIOR. USE OF THIS SOFTWARE IS RESTRICTED TO > +PERSONS AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ANY > +AND ALL LOSSES, COSTS, OR OTHER PROBLEMS ARISING FROM ITS USE. > > Derivative works are acceptable, even for commercial purposes, so long as > -(1) the source code for the derivative work includes prominent notice that > -the work is derivative, and (2) the source code includes prominent notice with > -these four paragraphs for those parts of this code that are retained. > +(1) they include prominent notice that the work is derivative, and (2) they > +include prominent notice akin to these four paragraphs for those parts of > +this code that are retained. > > =============================================================================*/ > > @@ -48,9 +46,11 @@ these four paragraphs for those parts of this code that are retained. > #define NO_SIGNALING_NANS 1 > #endif > > -/*---------------------------------------------------------------------------- > -| The pattern for a default generated half-precision NaN. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +The pattern for a default generated half-precision NaN. 
> +------------------------------------------------------------------------------- > +*/ > #if defined(TARGET_ARM) > const float16 float16_default_nan = const_float16(0x7E00); > #elif SNAN_BIT_IS_ONE > @@ -59,9 +59,11 @@ const float16 float16_default_nan = const_float16(0x7DFF); > const float16 float16_default_nan = const_float16(0xFE00); > #endif > > -/*---------------------------------------------------------------------------- > -| The pattern for a default generated single-precision NaN. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +The pattern for a default generated single-precision NaN. > +------------------------------------------------------------------------------- > +*/ > #if defined(TARGET_SPARC) > const float32 float32_default_nan = const_float32(0x7FFFFFFF); > #elif defined(TARGET_PPC) || defined(TARGET_ARM) || defined(TARGET_ALPHA) || \ > @@ -73,9 +75,11 @@ const float32 float32_default_nan = const_float32(0x7FBFFFFF); > const float32 float32_default_nan = const_float32(0xFFC00000); > #endif > > -/*---------------------------------------------------------------------------- > -| The pattern for a default generated double-precision NaN. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +The pattern for a default generated double-precision NaN. 
> +------------------------------------------------------------------------------- > +*/ > #if defined(TARGET_SPARC) > const float64 float64_default_nan = const_float64(LIT64( 0x7FFFFFFFFFFFFFFF )); > #elif defined(TARGET_PPC) || defined(TARGET_ARM) || defined(TARGET_ALPHA) > @@ -86,9 +90,11 @@ const float64 float64_default_nan = const_float64(LIT64( 0x7FF7FFFFFFFFFFFF )); > const float64 float64_default_nan = const_float64(LIT64( 0xFFF8000000000000 )); > #endif > > -/*---------------------------------------------------------------------------- > -| The pattern for a default generated extended double-precision NaN. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +The pattern for a default generated extended double-precision NaN. > +------------------------------------------------------------------------------- > +*/ > #if SNAN_BIT_IS_ONE > #define floatx80_default_nan_high 0x7FFF > #define floatx80_default_nan_low LIT64( 0xBFFFFFFFFFFFFFFF ) > @@ -100,10 +106,12 @@ const float64 float64_default_nan = const_float64(LIT64( 0xFFF8000000000000 )); > const floatx80 floatx80_default_nan > = make_floatx80_init(floatx80_default_nan_high, floatx80_default_nan_low); > > -/*---------------------------------------------------------------------------- > -| The pattern for a default generated quadruple-precision NaN. The `high' and > -| `low' values hold the most- and least-significant bits, respectively. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +The pattern for a default generated quadruple-precision NaN. The `high' and > +`low' values hold the most- and least-significant bits, respectively. 
> +------------------------------------------------------------------------------- > +*/ > #if SNAN_BIT_IS_ONE > #define float128_default_nan_high LIT64( 0x7FFF7FFFFFFFFFFF ) > #define float128_default_nan_low LIT64( 0xFFFFFFFFFFFFFFFF ) > @@ -115,21 +123,25 @@ const floatx80 floatx80_default_nan > const float128 float128_default_nan > = make_float128_init(float128_default_nan_high, float128_default_nan_low); > > -/*---------------------------------------------------------------------------- > -| Raises the exceptions specified by `flags'. Floating-point traps can be > -| defined here if desired. It is currently not possible for such a trap > -| to substitute a result value. If traps are not implemented, this routine > -| should be simply `float_exception_flags |= flags;'. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Raises the exceptions specified by `flags'. Floating-point traps can be > +defined here if desired. It is currently not possible for such a trap > +to substitute a result value. If traps are not implemented, this routine > +should be simply `float_exception_flags |= flags;'. > +------------------------------------------------------------------------------- > +*/ > > void float_raise( int8 flags STATUS_PARAM ) > { > STATUS(float_exception_flags) |= flags; > } > > -/*---------------------------------------------------------------------------- > -| Internal canonical NaN format. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Internal canonical NaN format. 
> +------------------------------------------------------------------------------- > +*/ > typedef struct { > flag sign; > uint64_t high, low; > @@ -146,10 +158,12 @@ int float16_is_signaling_nan(float16 a_) > return 0; > } > #else > -/*---------------------------------------------------------------------------- > -| Returns 1 if the half-precision floating-point value `a' is a quiet > -| NaN; otherwise returns 0. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the half-precision floating-point value `a' is a quiet > +NaN; otherwise returns 0. > +------------------------------------------------------------------------------- > +*/ > > int float16_is_quiet_nan(float16 a_) > { > @@ -161,10 +175,12 @@ int float16_is_quiet_nan(float16 a_) > #endif > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the half-precision floating-point value `a' is a signaling > -| NaN; otherwise returns 0. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the half-precision floating-point value `a' is a signaling > +NaN; otherwise returns 0. > +------------------------------------------------------------------------------- > +*/ > > int float16_is_signaling_nan(float16 a_) > { > @@ -177,10 +193,12 @@ int float16_is_signaling_nan(float16 a_) > } > #endif > > -/*---------------------------------------------------------------------------- > -| Returns a quiet NaN if the half-precision floating point value `a' is a > -| signaling NaN; otherwise returns `a'. 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns a quiet NaN if the half-precision floating point value `a' is a > +signaling NaN; otherwise returns `a'. > +------------------------------------------------------------------------------- > +*/ > float16 float16_maybe_silence_nan(float16 a_) > { > if (float16_is_signaling_nan(a_)) { > @@ -199,11 +217,13 @@ float16 float16_maybe_silence_nan(float16 a_) > return a_; > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the half-precision floating-point NaN > -| `a' to the canonical NaN format. If `a' is a signaling NaN, the invalid > -| exception is raised. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the half-precision floating-point NaN > +`a' to the canonical NaN format. If `a' is a signaling NaN, the invalid > +exception is raised. > +------------------------------------------------------------------------------- > +*/ > > static commonNaNT float16ToCommonNaN( float16 a STATUS_PARAM ) > { > @@ -216,10 +236,12 @@ static commonNaNT float16ToCommonNaN( float16 a STATUS_PARAM ) > return z; > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the canonical NaN `a' to the half- > -| precision floating-point format. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the canonical NaN `a' to the half- > +precision floating-point format. 
> +------------------------------------------------------------------------------- > +*/ > > static float16 commonNaNToFloat16(commonNaNT a STATUS_PARAM) > { > @@ -248,10 +270,12 @@ int float32_is_signaling_nan(float32 a_) > return 0; > } > #else > -/*---------------------------------------------------------------------------- > -| Returns 1 if the single-precision floating-point value `a' is a quiet > -| NaN; otherwise returns 0. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the single-precision floating-point value `a' is a quiet > +NaN; otherwise returns 0. > +------------------------------------------------------------------------------- > +*/ > > int float32_is_quiet_nan( float32 a_ ) > { > @@ -263,10 +287,12 @@ int float32_is_quiet_nan( float32 a_ ) > #endif > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the single-precision floating-point value `a' is a signaling > -| NaN; otherwise returns 0. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the single-precision floating-point value `a' is a signaling > +NaN; otherwise returns 0. > +------------------------------------------------------------------------------- > +*/ > > int float32_is_signaling_nan( float32 a_ ) > { > @@ -279,10 +305,12 @@ int float32_is_signaling_nan( float32 a_ ) > } > #endif > > -/*---------------------------------------------------------------------------- > -| Returns a quiet NaN if the single-precision floating point value `a' is a > -| signaling NaN; otherwise returns `a'. 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns a quiet NaN if the single-precision floating point value `a' is a > +signaling NaN; otherwise returns `a'. > +------------------------------------------------------------------------------- > +*/ > > float32 float32_maybe_silence_nan( float32 a_ ) > { > @@ -302,12 +330,13 @@ float32 float32_maybe_silence_nan( float32 a_ ) > return a_; > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the single-precision floating-point NaN > -| `a' to the canonical NaN format. If `a' is a signaling NaN, the invalid > -| exception is raised. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the single-precision floating-point NaN > +`a' to the canonical NaN format. If `a' is a signaling NaN, the invalid > +exception is raised. > +------------------------------------------------------------------------------- > +*/ > static commonNaNT float32ToCommonNaN( float32 a STATUS_PARAM ) > { > commonNaNT z; > @@ -319,10 +348,12 @@ static commonNaNT float32ToCommonNaN( float32 a STATUS_PARAM ) > return z; > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the canonical NaN `a' to the single- > -| precision floating-point format. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the canonical NaN `a' to the single- > +precision floating-point format. 
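The single-precision comments above distinguish quiet from signaling NaNs and describe silencing a signaling NaN. As a rough illustration, the sketch below classifies raw 32-bit patterns under the assumption that the top fraction bit (bit 22) set means "quiet", which corresponds to the case where SNAN_BIT_IS_ONE is 0. Targets that define SNAN_BIT_IS_ONE use the opposite sense and are not covered. The f32_* names are hypothetical and operate on plain uint32_t rather than the float32 type used in the patch.

```c
#include <stdint.h>

/* All helpers below ASSUME the "quiet bit set means quiet NaN" convention. */

/* Hypothetical name: exponent all ones and a nonzero fraction. */
static int f32_is_nan_sketch(uint32_t bits)
{
    return (bits & 0x7F800000u) == 0x7F800000u && (bits & 0x007FFFFFu) != 0;
}

/* Hypothetical name: NaN with the top fraction bit set. */
static int f32_is_quiet_nan_sketch(uint32_t bits)
{
    return f32_is_nan_sketch(bits) && (bits & 0x00400000u) != 0;
}

/* Hypothetical name: NaN with the top fraction bit clear. */
static int f32_is_signaling_nan_sketch(uint32_t bits)
{
    return f32_is_nan_sketch(bits) && (bits & 0x00400000u) == 0;
}

/* Hypothetical name: set the quiet bit on a signaling NaN, else pass through. */
static uint32_t f32_maybe_silence_nan_sketch(uint32_t bits)
{
    return f32_is_signaling_nan_sketch(bits) ? (bits | 0x00400000u) : bits;
}
```

Under this convention the 0xFFC00000 default NaN shown earlier in the diff classifies as quiet, while infinities (fraction of zero) are neither kind of NaN.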
> +------------------------------------------------------------------------------- > +*/ > > static float32 commonNaNToFloat32( commonNaNT a STATUS_PARAM) > { > @@ -339,22 +370,24 @@ static float32 commonNaNToFloat32( commonNaNT a STATUS_PARAM) > return float32_default_nan; > } > > -/*---------------------------------------------------------------------------- > -| Select which NaN to propagate for a two-input operation. > -| IEEE754 doesn't specify all the details of this, so the > -| algorithm is target-specific. > -| The routine is passed various bits of information about the > -| two NaNs and should return 0 to select NaN a and 1 for NaN b. > -| Note that signalling NaNs are always squashed to quiet NaNs > -| by the caller, by calling floatXX_maybe_silence_nan() before > -| returning them. > -| > -| aIsLargerSignificand is only valid if both a and b are NaNs > -| of some kind, and is true if a has the larger significand, > -| or if both a and b have the same significand but a is > -| positive but b is negative. It is only needed for the x87 > -| tie-break rule. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Select which NaN to propagate for a two-input operation. > +IEEE754 doesn't specify all the details of this, so the > +algorithm is target-specific. > +The routine is passed various bits of information about the > +two NaNs and should return 0 to select NaN a and 1 for NaN b. > +Note that signalling NaNs are always squashed to quiet NaNs > +by the caller, by calling floatXX_maybe_silence_nan() before > +returning them. > + > +aIsLargerSignificand is only valid if both a and b are NaNs > +of some kind, and is true if a has the larger significand, > +or if both a and b have the same significand but a is > +positive but b is negative. It is only needed for the x87 > +tie-break rule. 
> +------------------------------------------------------------------------------- > +*/ > > #if defined(TARGET_ARM) > static int pickNaN(flag aIsQNaN, flag aIsSNaN, flag bIsQNaN, flag bIsSNaN, > @@ -451,12 +484,14 @@ static int pickNaN(flag aIsQNaN, flag aIsSNaN, flag bIsQNaN, flag bIsSNaN, > } > #endif > > -/*---------------------------------------------------------------------------- > -| Select which NaN to propagate for a three-input operation. > -| For the moment we assume that no CPU needs the 'larger significand' > -| information. > -| Return values : 0 : a; 1 : b; 2 : c; 3 : default-NaN > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Select which NaN to propagate for a three-input operation. > +For the moment we assume that no CPU needs the 'larger significand' > +information. > +Return values : 0 : a; 1 : b; 2 : c; 3 : default-NaN > +------------------------------------------------------------------------------- > +*/ > #if defined(TARGET_ARM) > static int pickNaNMulAdd(flag aIsQNaN, flag aIsSNaN, flag bIsQNaN, flag bIsSNaN, > flag cIsQNaN, flag cIsSNaN, flag infzero STATUS_PARAM) > @@ -554,12 +589,13 @@ static int pickNaNMulAdd(flag aIsQNaN, flag aIsSNaN, flag bIsQNaN, flag bIsSNaN, > } > #endif > > -/*---------------------------------------------------------------------------- > -| Takes two single-precision floating-point values `a' and `b', one of which > -| is a NaN, and returns the appropriate NaN result. If either `a' or `b' is a > -| signaling NaN, the invalid exception is raised. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Takes two single-precision floating-point values `a' and `b', one of which > +is a NaN, and returns the appropriate NaN result. 
If either `a' or `b' is a > +signaling NaN, the invalid exception is raised. > +------------------------------------------------------------------------------- > +*/ > static float32 propagateFloat32NaN( float32 a, float32 b STATUS_PARAM) > { > flag aIsQuietNaN, aIsSignalingNaN, bIsQuietNaN, bIsSignalingNaN; > @@ -594,14 +630,16 @@ static float32 propagateFloat32NaN( float32 a, float32 b STATUS_PARAM) > } > } > > -/*---------------------------------------------------------------------------- > -| Takes three single-precision floating-point values `a', `b' and `c', one of > -| which is a NaN, and returns the appropriate NaN result. If any of `a', > -| `b' or `c' is a signaling NaN, the invalid exception is raised. > -| The input infzero indicates whether a*b was 0*inf or inf*0 (in which case > -| obviously c is a NaN, and whether to propagate c or some other NaN is > -| implementation defined). > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Takes three single-precision floating-point values `a', `b' and `c', one of > +which is a NaN, and returns the appropriate NaN result. If any of `a', > +`b' or `c' is a signaling NaN, the invalid exception is raised. > +The input infzero indicates whether a*b was 0*inf or inf*0 (in which case > +obviously c is a NaN, and whether to propagate c or some other NaN is > +implementation defined). > +------------------------------------------------------------------------------- > +*/ > > static float32 propagateFloat32MulAddNaN(float32 a, float32 b, > float32 c, flag infzero STATUS_PARAM) > @@ -656,10 +694,12 @@ int float64_is_signaling_nan(float64 a_) > return 0; > } > #else > -/*---------------------------------------------------------------------------- > -| Returns 1 if the double-precision floating-point value `a' is a quiet > -| NaN; otherwise returns 0. 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the double-precision floating-point value `a' is a quiet > +NaN; otherwise returns 0. > +------------------------------------------------------------------------------- > +*/ > > int float64_is_quiet_nan( float64 a_ ) > { > @@ -673,10 +713,12 @@ int float64_is_quiet_nan( float64 a_ ) > #endif > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the double-precision floating-point value `a' is a signaling > -| NaN; otherwise returns 0. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the double-precision floating-point value `a' is a signaling > +NaN; otherwise returns 0. > +------------------------------------------------------------------------------- > +*/ > > int float64_is_signaling_nan( float64 a_ ) > { > @@ -691,10 +733,12 @@ int float64_is_signaling_nan( float64 a_ ) > } > #endif > > -/*---------------------------------------------------------------------------- > -| Returns a quiet NaN if the double-precision floating point value `a' is a > -| signaling NaN; otherwise returns `a'. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns a quiet NaN if the double-precision floating point value `a' is a > +signaling NaN; otherwise returns `a'. 
> +------------------------------------------------------------------------------- > +*/ > > float64 float64_maybe_silence_nan( float64 a_ ) > { > @@ -714,12 +758,13 @@ float64 float64_maybe_silence_nan( float64 a_ ) > return a_; > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the double-precision floating-point NaN > -| `a' to the canonical NaN format. If `a' is a signaling NaN, the invalid > -| exception is raised. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the double-precision floating-point NaN > +`a' to the canonical NaN format. If `a' is a signaling NaN, the invalid > +exception is raised. > +------------------------------------------------------------------------------- > +*/ > static commonNaNT float64ToCommonNaN( float64 a STATUS_PARAM) > { > commonNaNT z; > @@ -731,10 +776,12 @@ static commonNaNT float64ToCommonNaN( float64 a STATUS_PARAM) > return z; > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the canonical NaN `a' to the double- > -| precision floating-point format. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the canonical NaN `a' to the double- > +precision floating-point format. 
> +------------------------------------------------------------------------------- > +*/ > > static float64 commonNaNToFloat64( commonNaNT a STATUS_PARAM) > { > @@ -753,12 +800,13 @@ static float64 commonNaNToFloat64( commonNaNT a STATUS_PARAM) > return float64_default_nan; > } > > -/*---------------------------------------------------------------------------- > -| Takes two double-precision floating-point values `a' and `b', one of which > -| is a NaN, and returns the appropriate NaN result. If either `a' or `b' is a > -| signaling NaN, the invalid exception is raised. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Takes two double-precision floating-point values `a' and `b', one of which > +is a NaN, and returns the appropriate NaN result. If either `a' or `b' is a > +signaling NaN, the invalid exception is raised. > +------------------------------------------------------------------------------- > +*/ > static float64 propagateFloat64NaN( float64 a, float64 b STATUS_PARAM) > { > flag aIsQuietNaN, aIsSignalingNaN, bIsQuietNaN, bIsSignalingNaN; > @@ -793,14 +841,16 @@ static float64 propagateFloat64NaN( float64 a, float64 b STATUS_PARAM) > } > } > > -/*---------------------------------------------------------------------------- > -| Takes three double-precision floating-point values `a', `b' and `c', one of > -| which is a NaN, and returns the appropriate NaN result. If any of `a', > -| `b' or `c' is a signaling NaN, the invalid exception is raised. > -| The input infzero indicates whether a*b was 0*inf or inf*0 (in which case > -| obviously c is a NaN, and whether to propagate c or some other NaN is > -| implementation defined). 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Takes three double-precision floating-point values `a', `b' and `c', one of > +which is a NaN, and returns the appropriate NaN result. If any of `a', > +`b' or `c' is a signaling NaN, the invalid exception is raised. > +The input infzero indicates whether a*b was 0*inf or inf*0 (in which case > +obviously c is a NaN, and whether to propagate c or some other NaN is > +implementation defined). > +------------------------------------------------------------------------------- > +*/ > > static float64 propagateFloat64MulAddNaN(float64 a, float64 b, > float64 c, flag infzero STATUS_PARAM) > @@ -855,11 +905,13 @@ int floatx80_is_signaling_nan(floatx80 a_) > return 0; > } > #else > -/*---------------------------------------------------------------------------- > -| Returns 1 if the extended double-precision floating-point value `a' is a > -| quiet NaN; otherwise returns 0. This slightly differs from the same > -| function for other types as floatx80 has an explicit bit. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the extended double-precision floating-point value `a' is a > +quiet NaN; otherwise returns 0. This slightly differs from the same > +function for other types as floatx80 has an explicit bit. > +------------------------------------------------------------------------------- > +*/ > > int floatx80_is_quiet_nan( floatx80 a ) > { > @@ -877,11 +929,13 @@ int floatx80_is_quiet_nan( floatx80 a ) > #endif > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the extended double-precision floating-point value `a' is a > -| signaling NaN; otherwise returns 0. 
This slightly differs from the same > -| function for other types as floatx80 has an explicit bit. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the extended double-precision floating-point value `a' is a > +signaling NaN; otherwise returns 0. This slightly differs from the same > +function for other types as floatx80 has an explicit bit. > +------------------------------------------------------------------------------- > +*/ > > int floatx80_is_signaling_nan( floatx80 a ) > { > @@ -900,10 +954,12 @@ int floatx80_is_signaling_nan( floatx80 a ) > } > #endif > > -/*---------------------------------------------------------------------------- > -| Returns a quiet NaN if the extended double-precision floating point value > -| `a' is a signaling NaN; otherwise returns `a'. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns a quiet NaN if the extended double-precision floating point value > +`a' is a signaling NaN; otherwise returns `a'. > +------------------------------------------------------------------------------- > +*/ > > floatx80 floatx80_maybe_silence_nan( floatx80 a ) > { > @@ -923,12 +979,13 @@ floatx80 floatx80_maybe_silence_nan( floatx80 a ) > return a; > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the extended double-precision floating- > -| point NaN `a' to the canonical NaN format. If `a' is a signaling NaN, the > -| invalid exception is raised. 
> -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the extended double-precision floating- > +point NaN `a' to the canonical NaN format. If `a' is a signaling NaN, the > +invalid exception is raised. > +------------------------------------------------------------------------------- > +*/ > static commonNaNT floatx80ToCommonNaN( floatx80 a STATUS_PARAM) > { > commonNaNT z; > @@ -946,10 +1003,12 @@ static commonNaNT floatx80ToCommonNaN( floatx80 a STATUS_PARAM) > return z; > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the canonical NaN `a' to the extended > -| double-precision floating-point format. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the canonical NaN `a' to the extended > +double-precision floating-point format. > +------------------------------------------------------------------------------- > +*/ > > static floatx80 commonNaNToFloatx80( commonNaNT a STATUS_PARAM) > { > @@ -972,12 +1031,13 @@ static floatx80 commonNaNToFloatx80( commonNaNT a STATUS_PARAM) > return z; > } > > -/*---------------------------------------------------------------------------- > -| Takes two extended double-precision floating-point values `a' and `b', one > -| of which is a NaN, and returns the appropriate NaN result. If either `a' or > -| `b' is a signaling NaN, the invalid exception is raised. 
> -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Takes two extended double-precision floating-point values `a' and `b', one > +of which is a NaN, and returns the appropriate NaN result. If either `a' or > +`b' is a signaling NaN, the invalid exception is raised. > +------------------------------------------------------------------------------- > +*/ > static floatx80 propagateFloatx80NaN( floatx80 a, floatx80 b STATUS_PARAM) > { > flag aIsQuietNaN, aIsSignalingNaN, bIsQuietNaN, bIsSignalingNaN; > @@ -1023,10 +1083,12 @@ int float128_is_signaling_nan(float128 a_) > return 0; > } > #else > -/*---------------------------------------------------------------------------- > -| Returns 1 if the quadruple-precision floating-point value `a' is a quiet > -| NaN; otherwise returns 0. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the quadruple-precision floating-point value `a' is a quiet > +NaN; otherwise returns 0. > +------------------------------------------------------------------------------- > +*/ > > int float128_is_quiet_nan( float128 a ) > { > @@ -1041,10 +1103,12 @@ int float128_is_quiet_nan( float128 a ) > #endif > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the quadruple-precision floating-point value `a' is a > -| signaling NaN; otherwise returns 0. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the quadruple-precision floating-point value `a' is a > +signaling NaN; otherwise returns 0. 
> +------------------------------------------------------------------------------- > +*/ > > int float128_is_signaling_nan( float128 a ) > { > @@ -1060,10 +1124,12 @@ int float128_is_signaling_nan( float128 a ) > } > #endif > > -/*---------------------------------------------------------------------------- > -| Returns a quiet NaN if the quadruple-precision floating point value `a' is > -| a signaling NaN; otherwise returns `a'. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns a quiet NaN if the quadruple-precision floating point value `a' is > +a signaling NaN; otherwise returns `a'. > +------------------------------------------------------------------------------- > +*/ > > float128 float128_maybe_silence_nan( float128 a ) > { > @@ -1083,12 +1149,13 @@ float128 float128_maybe_silence_nan( float128 a ) > return a; > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the quadruple-precision floating-point NaN > -| `a' to the canonical NaN format. If `a' is a signaling NaN, the invalid > -| exception is raised. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the quadruple-precision floating-point NaN > +`a' to the canonical NaN format. If `a' is a signaling NaN, the invalid > +exception is raised. 
> +------------------------------------------------------------------------------- > +*/ > static commonNaNT float128ToCommonNaN( float128 a STATUS_PARAM) > { > commonNaNT z; > @@ -1099,10 +1166,12 @@ static commonNaNT float128ToCommonNaN( float128 a STATUS_PARAM) > return z; > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the canonical NaN `a' to the quadruple- > -| precision floating-point format. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the canonical NaN `a' to the quadruple- > +precision floating-point format. > +------------------------------------------------------------------------------- > +*/ > > static float128 commonNaNToFloat128( commonNaNT a STATUS_PARAM) > { > @@ -1119,12 +1188,13 @@ static float128 commonNaNToFloat128( commonNaNT a STATUS_PARAM) > return z; > } > > -/*---------------------------------------------------------------------------- > -| Takes two quadruple-precision floating-point values `a' and `b', one of > -| which is a NaN, and returns the appropriate NaN result. If either `a' or > -| `b' is a signaling NaN, the invalid exception is raised. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Takes two quadruple-precision floating-point values `a' and `b', one of > +which is a NaN, and returns the appropriate NaN result. If either `a' or > +`b' is a signaling NaN, the invalid exception is raised. 
> +------------------------------------------------------------------------------- > +*/ > static float128 propagateFloat128NaN( float128 a, float128 b STATUS_PARAM) > { > flag aIsQuietNaN, aIsSignalingNaN, bIsQuietNaN, bIsSignalingNaN; > diff --git a/fpu/softfloat.c b/fpu/softfloat.c > index 7ba51b6..9145582 100644 > --- a/fpu/softfloat.c > +++ b/fpu/softfloat.c > @@ -4,10 +4,11 @@ > * Derived from SoftFloat. > */ > > -/*============================================================================ > +/* > +=============================================================================== > > -This C source file is part of the SoftFloat IEC/IEEE Floating-point Arithmetic > -Package, Release 2b. > +This C source file is part of the SoftFloat IEC/IEEE Floating-point > +Arithmetic Package, Release 2a. > > Written by John R. Hauser. This work was made possible in part by the > International Computer Science Institute, located at Suite 600, 1947 Center > @@ -16,24 +17,22 @@ National Science Foundation under grant MIP-9311980. The original version > of this code was written as part of a project to build a fixed-point vector > processor in collaboration with the University of California at Berkeley, > overseen by Profs. Nelson Morgan and John Wawrzynek. More information > -is available through the Web page `http://www.cs.berkeley.edu/~jhauser/ > +is available through the Web page `http://HTTP.CS.Berkeley.EDU/~jhauser/ > arithmetic/SoftFloat.html'. > > -THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort has > -been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT TIMES > -RESULT IN INCORRECT BEHAVIOR. 
USE OF THIS SOFTWARE IS RESTRICTED TO PERSONS > -AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ALL LOSSES, > -COSTS, OR OTHER PROBLEMS THEY INCUR DUE TO THE SOFTWARE, AND WHO FURTHERMORE > -EFFECTIVELY INDEMNIFY JOHN HAUSER AND THE INTERNATIONAL COMPUTER SCIENCE > -INSTITUTE (possibly via similar legal warning) AGAINST ALL LOSSES, COSTS, OR > -OTHER PROBLEMS INCURRED BY THEIR CUSTOMERS AND CLIENTS DUE TO THE SOFTWARE. > +THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort > +has been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT > +TIMES RESULT IN INCORRECT BEHAVIOR. USE OF THIS SOFTWARE IS RESTRICTED TO > +PERSONS AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ANY > +AND ALL LOSSES, COSTS, OR OTHER PROBLEMS ARISING FROM ITS USE. > > Derivative works are acceptable, even for commercial purposes, so long as > -(1) the source code for the derivative work includes prominent notice that > -the work is derivative, and (2) the source code includes prominent notice with > -these four paragraphs for those parts of this code that are retained. > +(1) they include prominent notice that the work is derivative, and (2) they > +include prominent notice akin to these four paragraphs for those parts of > +this code that are retained. > > -=============================================================================*/ > +=============================================================================== > +*/ > > /* softfloat (and in particular the code in softfloat-specialize.h) is > * target-dependent and needs the TARGET_* macros. > @@ -42,21 +41,25 @@ these four paragraphs for those parts of this code that are retained. > > #include "fpu/softfloat.h" > > -/*---------------------------------------------------------------------------- > -| Primitive arithmetic functions, including multi-word arithmetic, and > -| division and square root approximations. (Can be specialized to target if > -| desired.) 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Primitive arithmetic functions, including multi-word arithmetic, and > +division and square root approximations. (Can be specialized to target if > +desired.) > +------------------------------------------------------------------------------- > +*/ > #include "softfloat-macros.h" > > -/*---------------------------------------------------------------------------- > -| Functions and definitions to determine: (1) whether tininess for underflow > -| is detected before or after rounding by default, (2) what (if anything) > -| happens when exceptions are raised, (3) how signaling NaNs are distinguished > -| from quiet NaNs, (4) the default generated quiet NaNs, and (5) how NaNs > -| are propagated from function inputs to output. These details are target- > -| specific. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Functions and definitions to determine: (1) whether tininess for underflow > +is detected before or after rounding by default, (2) what (if anything) > +happens when exceptions are raised, (3) how signaling NaNs are distinguished > +from quiet NaNs, (4) the default generated quiet NaNs, and (5) how NaNs > +are propagated from function inputs to output. These details are target- > +specific. > +------------------------------------------------------------------------------- > +*/ > #include "softfloat-specialize.h" > > void set_float_rounding_mode(int val STATUS_PARAM) > @@ -74,43 +77,51 @@ void set_floatx80_rounding_precision(int val STATUS_PARAM) > STATUS(floatx80_rounding_precision) = val; > } > > -/*---------------------------------------------------------------------------- > -| Returns the fraction bits of the half-precision floating-point value `a'. 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the fraction bits of the half-precision floating-point value `a'. > +------------------------------------------------------------------------------- > +*/ > > INLINE uint32_t extractFloat16Frac(float16 a) > { > return float16_val(a) & 0x3ff; > } > > -/*---------------------------------------------------------------------------- > -| Returns the exponent bits of the half-precision floating-point value `a'. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the exponent bits of the half-precision floating-point value `a'. > +------------------------------------------------------------------------------- > +*/ > > INLINE int_fast16_t extractFloat16Exp(float16 a) > { > return (float16_val(a) >> 10) & 0x1f; > } > > -/*---------------------------------------------------------------------------- > -| Returns the sign bit of the single-precision floating-point value `a'. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the sign bit of the single-precision floating-point value `a'. > +------------------------------------------------------------------------------- > +*/ > > INLINE flag extractFloat16Sign(float16 a) > { > return float16_val(a)>>15; > } > > -/*---------------------------------------------------------------------------- > -| Takes a 64-bit fixed-point value `absZ' with binary point between bits 6 > -| and 7, and returns the properly rounded 32-bit integer corresponding to the > -| input. If `zSign' is 1, the input is negated before being converted to an > -| integer. Bit 63 of `absZ' must be zero. 
Ordinarily, the fixed-point input > -| is simply rounded to an integer, with the inexact exception raised if the > -| input cannot be represented exactly as an integer. However, if the fixed- > -| point input is too large, the invalid exception is raised and the largest > -| positive or negative integer is returned. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Takes a 64-bit fixed-point value `absZ' with binary point between bits 6 > +and 7, and returns the properly rounded 32-bit integer corresponding to the > +input. If `zSign' is 1, the input is negated before being converted to an > +integer. Bit 63 of `absZ' must be zero. Ordinarily, the fixed-point input > +is simply rounded to an integer, with the inexact exception raised if the > +input cannot be represented exactly as an integer. However, if the fixed- > +point input is too large, the invalid exception is raised and the largest > +positive or negative integer is returned. > +------------------------------------------------------------------------------- > +*/ > > static int32 roundAndPackInt32( flag zSign, uint64_t absZ STATUS_PARAM) > { > @@ -150,17 +161,19 @@ static int32 roundAndPackInt32( flag zSign, uint64_t absZ STATUS_PARAM) > > } > > -/*---------------------------------------------------------------------------- > -| Takes the 128-bit fixed-point value formed by concatenating `absZ0' and > -| `absZ1', with binary point between bits 63 and 64 (between the input words), > -| and returns the properly rounded 64-bit integer corresponding to the input. > -| If `zSign' is 1, the input is negated before being converted to an integer. > -| Ordinarily, the fixed-point input is simply rounded to an integer, with > -| the inexact exception raised if the input cannot be represented exactly as > -| an integer. 
However, if the fixed-point input is too large, the invalid > -| exception is raised and the largest positive or negative integer is > -| returned. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Takes the 128-bit fixed-point value formed by concatenating `absZ0' and > +`absZ1', with binary point between bits 63 and 64 (between the input words), > +and returns the properly rounded 64-bit integer corresponding to the input. > +If `zSign' is 1, the input is negated before being converted to an integer. > +Ordinarily, the fixed-point input is simply rounded to an integer, with > +the inexact exception raised if the input cannot be represented exactly as > +an integer. However, if the fixed-point input is too large, the invalid > +exception is raised and the largest positive or negative integer is > +returned. > +------------------------------------------------------------------------------- > +*/ > > static int64 roundAndPackInt64( flag zSign, uint64_t absZ0, uint64_t absZ1 STATUS_PARAM) > { > @@ -203,9 +216,11 @@ static int64 roundAndPackInt64( flag zSign, uint64_t absZ0, uint64_t absZ1 STATU > > } > > -/*---------------------------------------------------------------------------- > -| Returns the fraction bits of the single-precision floating-point value `a'. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the fraction bits of the single-precision floating-point value `a'. 
> +------------------------------------------------------------------------------- > +*/ > > INLINE uint32_t extractFloat32Frac( float32 a ) > { > @@ -214,9 +229,11 @@ INLINE uint32_t extractFloat32Frac( float32 a ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the exponent bits of the single-precision floating-point value `a'. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the exponent bits of the single-precision floating-point value `a'. > +------------------------------------------------------------------------------- > +*/ > > INLINE int_fast16_t extractFloat32Exp(float32 a) > { > @@ -225,10 +242,11 @@ INLINE int_fast16_t extractFloat32Exp(float32 a) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the sign bit of the single-precision floating-point value `a'. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the sign bit of the single-precision floating-point value `a'. > +------------------------------------------------------------------------------- > +*/ > INLINE flag extractFloat32Sign( float32 a ) > { > > @@ -236,10 +254,12 @@ INLINE flag extractFloat32Sign( float32 a ) > > } > > -/*---------------------------------------------------------------------------- > -| If `a' is denormal and we are in flush-to-zero mode then set the > -| input-denormal exception and return zero. Otherwise just return the value. 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +If `a' is denormal and we are in flush-to-zero mode then set the > +input-denormal exception and return zero. Otherwise just return the value. > +------------------------------------------------------------------------------- > +*/ > static float32 float32_squash_input_denormal(float32 a STATUS_PARAM) > { > if (STATUS(flush_inputs_to_zero)) { > @@ -251,13 +271,14 @@ static float32 float32_squash_input_denormal(float32 a STATUS_PARAM) > return a; > } > > -/*---------------------------------------------------------------------------- > -| Normalizes the subnormal single-precision floating-point value represented > -| by the denormalized significand `aSig'. The normalized exponent and > -| significand are stored at the locations pointed to by `zExpPtr' and > -| `zSigPtr', respectively. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Normalizes the subnormal single-precision floating-point value represented > +by the denormalized significand `aSig'. The normalized exponent and > +significand are stored at the locations pointed to by `zExpPtr' and > +`zSigPtr', respectively. > +------------------------------------------------------------------------------- > +*/ > static void > normalizeFloat32Subnormal(uint32_t aSig, int_fast16_t *zExpPtr, uint32_t *zSigPtr) > { > @@ -269,16 +290,18 @@ static void > > } > > -/*---------------------------------------------------------------------------- > -| Packs the sign `zSign', exponent `zExp', and significand `zSig' into a > -| single-precision floating-point value, returning the result. After being > -| shifted into the proper positions, the three fields are simply added > -| together to form the result. 
This means that any integer portion of `zSig' > -| will be added into the exponent. Since a properly normalized significand > -| will have an integer portion equal to 1, the `zExp' input should be 1 less > -| than the desired result exponent whenever `zSig' is a complete, normalized > -| significand. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Packs the sign `zSign', exponent `zExp', and significand `zSig' into a > +single-precision floating-point value, returning the result. After being > +shifted into the proper positions, the three fields are simply added > +together to form the result. This means that any integer portion of `zSig' > +will be added into the exponent. Since a properly normalized significand > +will have an integer portion equal to 1, the `zExp' input should be 1 less > +than the desired result exponent whenever `zSig' is a complete, normalized > +significand. > +------------------------------------------------------------------------------- > +*/ > > INLINE float32 packFloat32(flag zSign, int_fast16_t zExp, uint32_t zSig) > { > @@ -288,27 +311,29 @@ INLINE float32 packFloat32(flag zSign, int_fast16_t zExp, uint32_t zSig) > > } > > -/*---------------------------------------------------------------------------- > -| Takes an abstract floating-point value having sign `zSign', exponent `zExp', > -| and significand `zSig', and returns the proper single-precision floating- > -| point value corresponding to the abstract input. Ordinarily, the abstract > -| value is simply rounded and packed into the single-precision format, with > -| the inexact exception raised if the abstract input cannot be represented > -| exactly. However, if the abstract value is too large, the overflow and > -| inexact exceptions are raised and an infinity or maximal finite value is > -| returned. 
If the abstract value is too small, the input value is rounded to > -| a subnormal number, and the underflow and inexact exceptions are raised if > -| the abstract input cannot be represented exactly as a subnormal single- > -| precision floating-point number. > -| The input significand `zSig' has its binary point between bits 30 > -| and 29, which is 7 bits to the left of the usual location. This shifted > -| significand must be normalized or smaller. If `zSig' is not normalized, > -| `zExp' must be 0; in that case, the result returned is a subnormal number, > -| and it must not require rounding. In the usual case that `zSig' is > -| normalized, `zExp' must be 1 less than the ``true'' floating-point exponent. > -| The handling of underflow and overflow follows the IEC/IEEE Standard for > -| Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Takes an abstract floating-point value having sign `zSign', exponent `zExp', > +and significand `zSig', and returns the proper single-precision floating- > +point value corresponding to the abstract input. Ordinarily, the abstract > +value is simply rounded and packed into the single-precision format, with > +the inexact exception raised if the abstract input cannot be represented > +exactly. However, if the abstract value is too large, the overflow and > +inexact exceptions are raised and an infinity or maximal finite value is > +returned. If the abstract value is too small, the input value is rounded to > +a subnormal number, and the underflow and inexact exceptions are raised if > +the abstract input cannot be represented exactly as a subnormal single- > +precision floating-point number. > + The input significand `zSig' has its binary point between bits 30 > +and 29, which is 7 bits to the left of the usual location. 
This shifted > +significand must be normalized or smaller. If `zSig' is not normalized, > +`zExp' must be 0; in that case, the result returned is a subnormal number, > +and it must not require rounding. In the usual case that `zSig' is > +normalized, `zExp' must be 1 less than the ``true'' floating-point exponent. > +The handling of underflow and overflow follows the IEC/IEEE Standard for > +Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > static float32 roundAndPackFloat32(flag zSign, int_fast16_t zExp, uint32_t zSig STATUS_PARAM) > { > @@ -366,15 +391,16 @@ static float32 roundAndPackFloat32(flag zSign, int_fast16_t zExp, uint32_t zSig > > } > > -/*---------------------------------------------------------------------------- > -| Takes an abstract floating-point value having sign `zSign', exponent `zExp', > -| and significand `zSig', and returns the proper single-precision floating- > -| point value corresponding to the abstract input. This routine is just like > -| `roundAndPackFloat32' except that `zSig' does not have to be normalized. > -| Bit 31 of `zSig' must be zero, and `zExp' must be 1 less than the ``true'' > -| floating-point exponent. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Takes an abstract floating-point value having sign `zSign', exponent `zExp', > +and significand `zSig', and returns the proper single-precision floating- > +point value corresponding to the abstract input. This routine is just like > +`roundAndPackFloat32' except that `zSig' does not have to be normalized. > +Bit 31 of `zSig' must be zero, and `zExp' must be 1 less than the ``true'' > +floating-point exponent. 
> +------------------------------------------------------------------------------- > +*/ > static float32 > normalizeRoundAndPackFloat32(flag zSign, int_fast16_t zExp, uint32_t zSig STATUS_PARAM) > { > @@ -385,9 +411,11 @@ static float32 > > } > > -/*---------------------------------------------------------------------------- > -| Returns the fraction bits of the double-precision floating-point value `a'. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the fraction bits of the double-precision floating-point value `a'. > +------------------------------------------------------------------------------- > +*/ > > INLINE uint64_t extractFloat64Frac( float64 a ) > { > @@ -396,9 +424,11 @@ INLINE uint64_t extractFloat64Frac( float64 a ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the exponent bits of the double-precision floating-point value `a'. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the exponent bits of the double-precision floating-point value `a'. > +------------------------------------------------------------------------------- > +*/ > > INLINE int_fast16_t extractFloat64Exp(float64 a) > { > @@ -407,10 +437,11 @@ INLINE int_fast16_t extractFloat64Exp(float64 a) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the sign bit of the double-precision floating-point value `a'. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the sign bit of the double-precision floating-point value `a'. 
> +------------------------------------------------------------------------------- > +*/ > INLINE flag extractFloat64Sign( float64 a ) > { > > @@ -418,10 +449,12 @@ INLINE flag extractFloat64Sign( float64 a ) > > } > > -/*---------------------------------------------------------------------------- > -| If `a' is denormal and we are in flush-to-zero mode then set the > -| input-denormal exception and return zero. Otherwise just return the value. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +If `a' is denormal and we are in flush-to-zero mode then set the > +input-denormal exception and return zero. Otherwise just return the value. > +------------------------------------------------------------------------------- > +*/ > static float64 float64_squash_input_denormal(float64 a STATUS_PARAM) > { > if (STATUS(flush_inputs_to_zero)) { > @@ -433,13 +466,14 @@ static float64 float64_squash_input_denormal(float64 a STATUS_PARAM) > return a; > } > > -/*---------------------------------------------------------------------------- > -| Normalizes the subnormal double-precision floating-point value represented > -| by the denormalized significand `aSig'. The normalized exponent and > -| significand are stored at the locations pointed to by `zExpPtr' and > -| `zSigPtr', respectively. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Normalizes the subnormal double-precision floating-point value represented > +by the denormalized significand `aSig'. The normalized exponent and > +significand are stored at the locations pointed to by `zExpPtr' and > +`zSigPtr', respectively. 
> +------------------------------------------------------------------------------- > +*/ > static void > normalizeFloat64Subnormal(uint64_t aSig, int_fast16_t *zExpPtr, uint64_t *zSigPtr) > { > @@ -451,16 +485,18 @@ static void > > } > > -/*---------------------------------------------------------------------------- > -| Packs the sign `zSign', exponent `zExp', and significand `zSig' into a > -| double-precision floating-point value, returning the result. After being > -| shifted into the proper positions, the three fields are simply added > -| together to form the result. This means that any integer portion of `zSig' > -| will be added into the exponent. Since a properly normalized significand > -| will have an integer portion equal to 1, the `zExp' input should be 1 less > -| than the desired result exponent whenever `zSig' is a complete, normalized > -| significand. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Packs the sign `zSign', exponent `zExp', and significand `zSig' into a > +double-precision floating-point value, returning the result. After being > +shifted into the proper positions, the three fields are simply added > +together to form the result. This means that any integer portion of `zSig' > +will be added into the exponent. Since a properly normalized significand > +will have an integer portion equal to 1, the `zExp' input should be 1 less > +than the desired result exponent whenever `zSig' is a complete, normalized > +significand. 
> +------------------------------------------------------------------------------- > +*/ > > INLINE float64 packFloat64(flag zSign, int_fast16_t zExp, uint64_t zSig) > { > @@ -470,27 +506,29 @@ INLINE float64 packFloat64(flag zSign, int_fast16_t zExp, uint64_t zSig) > > } > > -/*---------------------------------------------------------------------------- > -| Takes an abstract floating-point value having sign `zSign', exponent `zExp', > -| and significand `zSig', and returns the proper double-precision floating- > -| point value corresponding to the abstract input. Ordinarily, the abstract > -| value is simply rounded and packed into the double-precision format, with > -| the inexact exception raised if the abstract input cannot be represented > -| exactly. However, if the abstract value is too large, the overflow and > -| inexact exceptions are raised and an infinity or maximal finite value is > -| returned. If the abstract value is too small, the input value is rounded > -| to a subnormal number, and the underflow and inexact exceptions are raised > -| if the abstract input cannot be represented exactly as a subnormal double- > -| precision floating-point number. > -| The input significand `zSig' has its binary point between bits 62 > -| and 61, which is 10 bits to the left of the usual location. This shifted > -| significand must be normalized or smaller. If `zSig' is not normalized, > -| `zExp' must be 0; in that case, the result returned is a subnormal number, > -| and it must not require rounding. In the usual case that `zSig' is > -| normalized, `zExp' must be 1 less than the ``true'' floating-point exponent. > -| The handling of underflow and overflow follows the IEC/IEEE Standard for > -| Binary Floating-Point Arithmetic. 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Takes an abstract floating-point value having sign `zSign', exponent `zExp', > +and significand `zSig', and returns the proper double-precision floating- > +point value corresponding to the abstract input. Ordinarily, the abstract > +value is simply rounded and packed into the double-precision format, with > +the inexact exception raised if the abstract input cannot be represented > +exactly. However, if the abstract value is too large, the overflow and > +inexact exceptions are raised and an infinity or maximal finite value is > +returned. If the abstract value is too small, the input value is rounded > +to a subnormal number, and the underflow and inexact exceptions are raised > +if the abstract input cannot be represented exactly as a subnormal double- > +precision floating-point number. > + The input significand `zSig' has its binary point between bits 62 > +and 61, which is 10 bits to the left of the usual location. This shifted > +significand must be normalized or smaller. If `zSig' is not normalized, > +`zExp' must be 0; in that case, the result returned is a subnormal number, > +and it must not require rounding. In the usual case that `zSig' is > +normalized, `zExp' must be 1 less than the ``true'' floating-point exponent. > +The handling of underflow and overflow follows the IEC/IEEE Standard for > +Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > > static float64 roundAndPackFloat64(flag zSign, int_fast16_t zExp, uint64_t zSig STATUS_PARAM) > { > @@ -548,15 +586,16 @@ static float64 roundAndPackFloat64(flag zSign, int_fast16_t zExp, uint64_t zSig > > } > > -/*---------------------------------------------------------------------------- > -| Takes an abstract floating-point value having sign `zSign', exponent `zExp', > -| and significand `zSig', and returns the proper double-precision floating- > -| point value corresponding to the abstract input. This routine is just like > -| `roundAndPackFloat64' except that `zSig' does not have to be normalized. > -| Bit 63 of `zSig' must be zero, and `zExp' must be 1 less than the ``true'' > -| floating-point exponent. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Takes an abstract floating-point value having sign `zSign', exponent `zExp', > +and significand `zSig', and returns the proper double-precision floating- > +point value corresponding to the abstract input. This routine is just like > +`roundAndPackFloat64' except that `zSig' does not have to be normalized. > +Bit 63 of `zSig' must be zero, and `zExp' must be 1 less than the ``true'' > +floating-point exponent. > +------------------------------------------------------------------------------- > +*/ > static float64 > normalizeRoundAndPackFloat64(flag zSign, int_fast16_t zExp, uint64_t zSig STATUS_PARAM) > { > @@ -567,10 +606,12 @@ static float64 > > } > > -/*---------------------------------------------------------------------------- > -| Returns the fraction bits of the extended double-precision floating-point > -| value `a'. 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the fraction bits of the extended double-precision floating-point > +value `a'. > +------------------------------------------------------------------------------- > +*/ > > INLINE uint64_t extractFloatx80Frac( floatx80 a ) > { > @@ -579,11 +620,12 @@ INLINE uint64_t extractFloatx80Frac( floatx80 a ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the exponent bits of the extended double-precision floating-point > -| value `a'. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the exponent bits of the extended double-precision floating-point > +value `a'. > +------------------------------------------------------------------------------- > +*/ > INLINE int32 extractFloatx80Exp( floatx80 a ) > { > > @@ -591,11 +633,12 @@ INLINE int32 extractFloatx80Exp( floatx80 a ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the sign bit of the extended double-precision floating-point value > -| `a'. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the sign bit of the extended double-precision floating-point value > +`a'. 
> +------------------------------------------------------------------------------- > +*/ > INLINE flag extractFloatx80Sign( floatx80 a ) > { > > @@ -603,13 +646,14 @@ INLINE flag extractFloatx80Sign( floatx80 a ) > > } > > -/*---------------------------------------------------------------------------- > -| Normalizes the subnormal extended double-precision floating-point value > -| represented by the denormalized significand `aSig'. The normalized exponent > -| and significand are stored at the locations pointed to by `zExpPtr' and > -| `zSigPtr', respectively. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Normalizes the subnormal extended double-precision floating-point value > +represented by the denormalized significand `aSig'. The normalized exponent > +and significand are stored at the locations pointed to by `zExpPtr' and > +`zSigPtr', respectively. > +------------------------------------------------------------------------------- > +*/ > static void > normalizeFloatx80Subnormal( uint64_t aSig, int32 *zExpPtr, uint64_t *zSigPtr ) > { > @@ -621,10 +665,12 @@ static void > > } > > -/*---------------------------------------------------------------------------- > -| Packs the sign `zSign', exponent `zExp', and significand `zSig' into an > -| extended double-precision floating-point value, returning the result. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Packs the sign `zSign', exponent `zExp', and significand `zSig' into an > +extended double-precision floating-point value, returning the result. 
> +------------------------------------------------------------------------------- > +*/ > > INLINE floatx80 packFloatx80( flag zSign, int32 zExp, uint64_t zSig ) > { > @@ -636,30 +682,31 @@ INLINE floatx80 packFloatx80( flag zSign, int32 zExp, uint64_t zSig ) > > } > > -/*---------------------------------------------------------------------------- > -| Takes an abstract floating-point value having sign `zSign', exponent `zExp', > -| and extended significand formed by the concatenation of `zSig0' and `zSig1', > -| and returns the proper extended double-precision floating-point value > -| corresponding to the abstract input. Ordinarily, the abstract value is > -| rounded and packed into the extended double-precision format, with the > -| inexact exception raised if the abstract input cannot be represented > -| exactly. However, if the abstract value is too large, the overflow and > -| inexact exceptions are raised and an infinity or maximal finite value is > -| returned. If the abstract value is too small, the input value is rounded to > -| a subnormal number, and the underflow and inexact exceptions are raised if > -| the abstract input cannot be represented exactly as a subnormal extended > -| double-precision floating-point number. > -| If `roundingPrecision' is 32 or 64, the result is rounded to the same > -| number of bits as single or double precision, respectively. Otherwise, the > -| result is rounded to the full precision of the extended double-precision > -| format. > -| The input significand must be normalized or smaller. If the input > -| significand is not normalized, `zExp' must be 0; in that case, the result > -| returned is a subnormal number, and it must not require rounding. The > -| handling of underflow and overflow follows the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. 
> -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Takes an abstract floating-point value having sign `zSign', exponent `zExp', > +and extended significand formed by the concatenation of `zSig0' and `zSig1', > +and returns the proper extended double-precision floating-point value > +corresponding to the abstract input. Ordinarily, the abstract value is > +rounded and packed into the extended double-precision format, with the > +inexact exception raised if the abstract input cannot be represented > +exactly. However, if the abstract value is too large, the overflow and > +inexact exceptions are raised and an infinity or maximal finite value is > +returned. If the abstract value is too small, the input value is rounded to > +a subnormal number, and the underflow and inexact exceptions are raised if > +the abstract input cannot be represented exactly as a subnormal extended > +double-precision floating-point number. > + If `roundingPrecision' is 32 or 64, the result is rounded to the same > +number of bits as single or double precision, respectively. Otherwise, the > +result is rounded to the full precision of the extended double-precision > +format. > + The input significand must be normalized or smaller. If the input > +significand is not normalized, `zExp' must be 0; in that case, the result > +returned is a subnormal number, and it must not require rounding. The > +handling of underflow and overflow follows the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > static floatx80 > roundAndPackFloatx80( > int8 roundingPrecision, flag zSign, int32 zExp, uint64_t zSig0, uint64_t zSig1 > @@ -823,15 +870,16 @@ static floatx80 > > } > > -/*---------------------------------------------------------------------------- > -| Takes an abstract floating-point value having sign `zSign', exponent > -| `zExp', and significand formed by the concatenation of `zSig0' and `zSig1', > -| and returns the proper extended double-precision floating-point value > -| corresponding to the abstract input. This routine is just like > -| `roundAndPackFloatx80' except that the input significand does not have to be > -| normalized. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Takes an abstract floating-point value having sign `zSign', exponent > +`zExp', and significand formed by the concatenation of `zSig0' and `zSig1', > +and returns the proper extended double-precision floating-point value > +corresponding to the abstract input. This routine is just like > +`roundAndPackFloatx80' except that the input significand does not have to be > +normalized. > +------------------------------------------------------------------------------- > +*/ > static floatx80 > normalizeRoundAndPackFloatx80( > int8 roundingPrecision, flag zSign, int32 zExp, uint64_t zSig0, uint64_t zSig1 > @@ -852,10 +900,12 @@ static floatx80 > > } > > -/*---------------------------------------------------------------------------- > -| Returns the least-significant 64 fraction bits of the quadruple-precision > -| floating-point value `a'. 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the least-significant 64 fraction bits of the quadruple-precision > +floating-point value `a'. > +------------------------------------------------------------------------------- > +*/ > > INLINE uint64_t extractFloat128Frac1( float128 a ) > { > @@ -864,10 +914,12 @@ INLINE uint64_t extractFloat128Frac1( float128 a ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the most-significant 48 fraction bits of the quadruple-precision > -| floating-point value `a'. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the most-significant 48 fraction bits of the quadruple-precision > +floating-point value `a'. > +------------------------------------------------------------------------------- > +*/ > > INLINE uint64_t extractFloat128Frac0( float128 a ) > { > @@ -876,11 +928,12 @@ INLINE uint64_t extractFloat128Frac0( float128 a ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the exponent bits of the quadruple-precision floating-point value > -| `a'. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the exponent bits of the quadruple-precision floating-point value > +`a'. 
> +------------------------------------------------------------------------------- > +*/ > INLINE int32 extractFloat128Exp( float128 a ) > { > > @@ -888,10 +941,11 @@ INLINE int32 extractFloat128Exp( float128 a ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the sign bit of the quadruple-precision floating-point value `a'. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the sign bit of the quadruple-precision floating-point value `a'. > +------------------------------------------------------------------------------- > +*/ > INLINE flag extractFloat128Sign( float128 a ) > { > > @@ -899,16 +953,17 @@ INLINE flag extractFloat128Sign( float128 a ) > > } > > -/*---------------------------------------------------------------------------- > -| Normalizes the subnormal quadruple-precision floating-point value > -| represented by the denormalized significand formed by the concatenation of > -| `aSig0' and `aSig1'. The normalized exponent is stored at the location > -| pointed to by `zExpPtr'. The most significant 49 bits of the normalized > -| significand are stored at the location pointed to by `zSig0Ptr', and the > -| least significant 64 bits of the normalized significand are stored at the > -| location pointed to by `zSig1Ptr'. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Normalizes the subnormal quadruple-precision floating-point value > +represented by the denormalized significand formed by the concatenation of > +`aSig0' and `aSig1'. The normalized exponent is stored at the location > +pointed to by `zExpPtr'. 
The most significant 49 bits of the normalized > +significand are stored at the location pointed to by `zSig0Ptr', and the > +least significant 64 bits of the normalized significand are stored at the > +location pointed to by `zSig1Ptr'. > +------------------------------------------------------------------------------- > +*/ > static void > normalizeFloat128Subnormal( > uint64_t aSig0, > @@ -940,19 +995,20 @@ static void > > } > > -/*---------------------------------------------------------------------------- > -| Packs the sign `zSign', the exponent `zExp', and the significand formed > -| by the concatenation of `zSig0' and `zSig1' into a quadruple-precision > -| floating-point value, returning the result. After being shifted into the > -| proper positions, the three fields `zSign', `zExp', and `zSig0' are simply > -| added together to form the most significant 32 bits of the result. This > -| means that any integer portion of `zSig0' will be added into the exponent. > -| Since a properly normalized significand will have an integer portion equal > -| to 1, the `zExp' input should be 1 less than the desired result exponent > -| whenever `zSig0' and `zSig1' concatenated form a complete, normalized > -| significand. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Packs the sign `zSign', the exponent `zExp', and the significand formed > +by the concatenation of `zSig0' and `zSig1' into a quadruple-precision > +floating-point value, returning the result. After being shifted into the > +proper positions, the three fields `zSign', `zExp', and `zSig0' are simply > +added together to form the most significant 32 bits of the result. This > +means that any integer portion of `zSig0' will be added into the exponent. 
> +Since a properly normalized significand will have an integer portion equal > +to 1, the `zExp' input should be 1 less than the desired result exponent > +whenever `zSig0' and `zSig1' concatenated form a complete, normalized > +significand. > +------------------------------------------------------------------------------- > +*/ > INLINE float128 > packFloat128( flag zSign, int32 zExp, uint64_t zSig0, uint64_t zSig1 ) > { > @@ -964,27 +1020,28 @@ INLINE float128 > > } > > -/*---------------------------------------------------------------------------- > -| Takes an abstract floating-point value having sign `zSign', exponent `zExp', > -| and extended significand formed by the concatenation of `zSig0', `zSig1', > -| and `zSig2', and returns the proper quadruple-precision floating-point value > -| corresponding to the abstract input. Ordinarily, the abstract value is > -| simply rounded and packed into the quadruple-precision format, with the > -| inexact exception raised if the abstract input cannot be represented > -| exactly. However, if the abstract value is too large, the overflow and > -| inexact exceptions are raised and an infinity or maximal finite value is > -| returned. If the abstract value is too small, the input value is rounded to > -| a subnormal number, and the underflow and inexact exceptions are raised if > -| the abstract input cannot be represented exactly as a subnormal quadruple- > -| precision floating-point number. > -| The input significand must be normalized or smaller. If the input > -| significand is not normalized, `zExp' must be 0; in that case, the result > -| returned is a subnormal number, and it must not require rounding. In the > -| usual case that the input significand is normalized, `zExp' must be 1 less > -| than the ``true'' floating-point exponent. The handling of underflow and > -| overflow follows the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Takes an abstract floating-point value having sign `zSign', exponent `zExp', > +and extended significand formed by the concatenation of `zSig0', `zSig1', > +and `zSig2', and returns the proper quadruple-precision floating-point value > +corresponding to the abstract input. Ordinarily, the abstract value is > +simply rounded and packed into the quadruple-precision format, with the > +inexact exception raised if the abstract input cannot be represented > +exactly. However, if the abstract value is too large, the overflow and > +inexact exceptions are raised and an infinity or maximal finite value is > +returned. If the abstract value is too small, the input value is rounded to > +a subnormal number, and the underflow and inexact exceptions are raised if > +the abstract input cannot be represented exactly as a subnormal quadruple- > +precision floating-point number. > + The input significand must be normalized or smaller. If the input > +significand is not normalized, `zExp' must be 0; in that case, the result > +returned is a subnormal number, and it must not require rounding. In the > +usual case that the input significand is normalized, `zExp' must be 1 less > +than the ``true'' floating-point exponent. The handling of underflow and > +overflow follows the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > static float128 > roundAndPackFloat128( > flag zSign, int32 zExp, uint64_t zSig0, uint64_t zSig1, uint64_t zSig2 STATUS_PARAM) > @@ -1079,16 +1136,17 @@ static float128 > > } > > -/*---------------------------------------------------------------------------- > -| Takes an abstract floating-point value having sign `zSign', exponent `zExp', > -| and significand formed by the concatenation of `zSig0' and `zSig1', and > -| returns the proper quadruple-precision floating-point value corresponding > -| to the abstract input. This routine is just like `roundAndPackFloat128' > -| except that the input significand has fewer bits and does not have to be > -| normalized. In all cases, `zExp' must be 1 less than the ``true'' floating- > -| point exponent. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Takes an abstract floating-point value having sign `zSign', exponent `zExp', > +and significand formed by the concatenation of `zSig0' and `zSig1', and > +returns the proper quadruple-precision floating-point value corresponding > +to the abstract input. This routine is just like `roundAndPackFloat128' > +except that the input significand has fewer bits and does not have to be > +normalized. In all cases, `zExp' must be 1 less than the ``true'' floating- > +point exponent. > +------------------------------------------------------------------------------- > +*/ > static float128 > normalizeRoundAndPackFloat128( > flag zSign, int32 zExp, uint64_t zSig0, uint64_t zSig1 STATUS_PARAM) > @@ -1115,13 +1173,14 @@ static float128 > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the 32-bit two's complement integer `a' > -| to the single-precision floating-point format. 
The conversion is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > -float32 int32_to_float32( int32 a STATUS_PARAM ) > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the 32-bit two's complement integer `a' > +to the single-precision floating-point format. The conversion is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > +float32 int32_to_float32( int32 a STATUS_PARAM) > { > flag zSign; > > @@ -1132,13 +1191,14 @@ float32 int32_to_float32( int32 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the 32-bit two's complement integer `a' > -| to the double-precision floating-point format. The conversion is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > -float64 int32_to_float64( int32 a STATUS_PARAM ) > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the 32-bit two's complement integer `a' > +to the double-precision floating-point format. The conversion is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > +float64 int32_to_float64( int32 a STATUS_PARAM) > { > flag zSign; > uint32 absA; > @@ -1154,13 +1214,14 @@ float64 int32_to_float64( int32 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the 32-bit two's complement integer `a' > -| to the extended double-precision floating-point format. The conversion > -| is performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the 32-bit two's complement integer `a' > +to the extended double-precision floating-point format. The conversion > +is performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic. > +------------------------------------------------------------------------------- > +*/ > floatx80 int32_to_floatx80( int32 a STATUS_PARAM ) > { > flag zSign; > @@ -1177,12 +1238,13 @@ floatx80 int32_to_floatx80( int32 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the 32-bit two's complement integer `a' to > -| the quadruple-precision floating-point format. The conversion is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the 32-bit two's complement integer `a' to > +the quadruple-precision floating-point format. The conversion is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > float128 int32_to_float128( int32 a STATUS_PARAM ) > { > flag zSign; > @@ -1199,12 +1261,13 @@ float128 int32_to_float128( int32 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the 64-bit two's complement integer `a' > -| to the single-precision floating-point format. The conversion is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the 64-bit two's complement integer `a' > +to the single-precision floating-point format. The conversion is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float32 int64_to_float32( int64 a STATUS_PARAM ) > { > flag zSign; > @@ -1252,12 +1315,13 @@ float32 uint64_to_float32( uint64 a STATUS_PARAM ) > } > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the 64-bit two's complement integer `a' > -| to the double-precision floating-point format. The conversion is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the 64-bit two's complement integer `a' > +to the double-precision floating-point format. The conversion is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > float64 int64_to_float64( int64 a STATUS_PARAM ) > { > flag zSign; > @@ -1285,13 +1349,14 @@ float64 uint64_to_float64(uint64 a STATUS_PARAM) > return normalizeRoundAndPackFloat64(0, exp, a STATUS_VAR); > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the 64-bit two's complement integer `a' > -| to the extended double-precision floating-point format. The conversion > -| is performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the 64-bit two's complement integer `a' > +to the extended double-precision floating-point format. The conversion > +is performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic. > +------------------------------------------------------------------------------- > +*/ > floatx80 int64_to_floatx80( int64 a STATUS_PARAM ) > { > flag zSign; > @@ -1306,12 +1371,13 @@ floatx80 int64_to_floatx80( int64 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the 64-bit two's complement integer `a' to > -| the quadruple-precision floating-point format. The conversion is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the 64-bit two's complement integer `a' to > +the quadruple-precision floating-point format. 
The conversion is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float128 int64_to_float128( int64 a STATUS_PARAM ) > { > flag zSign; > @@ -1347,16 +1413,17 @@ float128 uint64_to_float128(uint64 a STATUS_PARAM) > return normalizeRoundAndPackFloat128(0, 0x406E, a, 0 STATUS_VAR); > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the single-precision floating-point value > -| `a' to the 32-bit two's complement integer format. The conversion is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic---which means in particular that the conversion is rounded > -| according to the current rounding mode. If `a' is a NaN, the largest > -| positive integer is returned. Otherwise, if the conversion overflows, the > -| largest integer with the same sign as `a' is returned. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the single-precision floating-point value > +`a' to the 32-bit two's complement integer format. The conversion is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic---which means in particular that the conversion is rounded > +according to the current rounding mode. If `a' is a NaN, the largest > +positive integer is returned. Otherwise, if the conversion overflows, the > +largest integer with the same sign as `a' is returned. 
> +------------------------------------------------------------------------------- > +*/ > int32 float32_to_int32( float32 a STATUS_PARAM ) > { > flag aSign; > @@ -1378,16 +1445,17 @@ int32 float32_to_int32( float32 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the single-precision floating-point value > -| `a' to the 32-bit two's complement integer format. The conversion is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic, except that the conversion is always rounded toward zero. > -| If `a' is a NaN, the largest positive integer is returned. Otherwise, if > -| the conversion overflows, the largest integer with the same sign as `a' is > -| returned. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the single-precision floating-point value > +`a' to the 32-bit two's complement integer format. The conversion is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic, except that the conversion is always rounded toward zero. > +If `a' is a NaN, the largest positive integer is returned. Otherwise, if > +the conversion overflows, the largest integer with the same sign as `a' is > +returned. > +------------------------------------------------------------------------------- > +*/ > int32 float32_to_int32_round_to_zero( float32 a STATUS_PARAM ) > { > flag aSign; > @@ -1421,15 +1489,17 @@ int32 float32_to_int32_round_to_zero( float32 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the single-precision floating-point value > -| `a' to the 16-bit two's complement integer format. 
The conversion is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic, except that the conversion is always rounded toward zero. > -| If `a' is a NaN, the largest positive integer is returned. Otherwise, if > -| the conversion overflows, the largest integer with the same sign as `a' is > -| returned. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the single-precision floating-point value > +`a' to the 16-bit two's complement integer format. The conversion is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic, except that the conversion is always rounded toward zero. > +If `a' is a NaN, the largest positive integer is returned. Otherwise, if > +the conversion overflows, the largest integer with the same sign as `a' is > +returned. > +------------------------------------------------------------------------------- > +*/ > > int_fast16_t float32_to_int16_round_to_zero(float32 a STATUS_PARAM) > { > @@ -1470,16 +1540,17 @@ int_fast16_t float32_to_int16_round_to_zero(float32 a STATUS_PARAM) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the single-precision floating-point value > -| `a' to the 64-bit two's complement integer format. The conversion is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic---which means in particular that the conversion is rounded > -| according to the current rounding mode. If `a' is a NaN, the largest > -| positive integer is returned. Otherwise, if the conversion overflows, the > -| largest integer with the same sign as `a' is returned. 
> -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the single-precision floating-point value > +`a' to the 64-bit two's complement integer format. The conversion is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic---which means in particular that the conversion is rounded > +according to the current rounding mode. If `a' is a NaN, the largest > +positive integer is returned. Otherwise, if the conversion overflows, the > +largest integer with the same sign as `a' is returned. > +------------------------------------------------------------------------------- > +*/ > int64 float32_to_int64( float32 a STATUS_PARAM ) > { > flag aSign; > @@ -1507,16 +1578,17 @@ int64 float32_to_int64( float32 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the single-precision floating-point value > -| `a' to the 64-bit two's complement integer format. The conversion is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic, except that the conversion is always rounded toward zero. If > -| `a' is a NaN, the largest positive integer is returned. Otherwise, if the > -| conversion overflows, the largest integer with the same sign as `a' is > -| returned. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the single-precision floating-point value > +`a' to the 64-bit two's complement integer format. The conversion is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic, except that the conversion is always rounded toward zero. 
If > +`a' is a NaN, the largest positive integer is returned. Otherwise, if the > +conversion overflows, the largest integer with the same sign as `a' is > +returned. > +------------------------------------------------------------------------------- > +*/ > int64 float32_to_int64_round_to_zero( float32 a STATUS_PARAM ) > { > flag aSign; > @@ -1554,13 +1626,14 @@ int64 float32_to_int64_round_to_zero( float32 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the single-precision floating-point value > -| `a' to the double-precision floating-point format. The conversion is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the single-precision floating-point value > +`a' to the double-precision floating-point format. The conversion is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float64 float32_to_float64( float32 a STATUS_PARAM ) > { > flag aSign; > @@ -1584,13 +1657,14 @@ float64 float32_to_float64( float32 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the single-precision floating-point value > -| `a' to the extended double-precision floating-point format. The conversion > -| is performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic. 
> -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the single-precision floating-point value > +`a' to the extended double-precision floating-point format. The conversion > +is performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic. > +------------------------------------------------------------------------------- > +*/ > floatx80 float32_to_floatx80( float32 a STATUS_PARAM ) > { > flag aSign; > @@ -1614,13 +1688,14 @@ floatx80 float32_to_floatx80( float32 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the single-precision floating-point value > -| `a' to the double-precision floating-point format. The conversion is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the single-precision floating-point value > +`a' to the double-precision floating-point format. The conversion is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float128 float32_to_float128( float32 a STATUS_PARAM ) > { > flag aSign; > @@ -1644,14 +1719,15 @@ float128 float32_to_float128( float32 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Rounds the single-precision floating-point value `a' to an integer, and > -| returns the result as a single-precision floating-point value. 
The > -| operation is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > -float32 float32_round_to_int( float32 a STATUS_PARAM) > +/* > +------------------------------------------------------------------------------- > +Rounds the single-precision floating-point value `a' to an integer, and > +returns the result as a single-precision floating-point value. The > +operation is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > +float32 float32_round_to_int( float32 a STATUS_PARAM ) > { > flag aSign; > int_fast16_t aExp; > @@ -1704,15 +1780,16 @@ float32 float32_round_to_int( float32 a STATUS_PARAM) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of adding the absolute values of the single-precision > -| floating-point values `a' and `b'. If `zSign' is 1, the sum is negated > -| before being returned. `zSign' is ignored if the result is a NaN. > -| The addition is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > -static float32 addFloat32Sigs( float32 a, float32 b, flag zSign STATUS_PARAM) > +/* > +------------------------------------------------------------------------------- > +Returns the result of adding the absolute values of the single-precision > +floating-point values `a' and `b'. If `zSign' is 1, the sum is negated > +before being returned. `zSign' is ignored if the result is a NaN. > +The addition is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > +static float32 addFloat32Sigs( float32 a, float32 b, flag zSign STATUS_PARAM ) > { > int_fast16_t aExp, bExp, zExp; > uint32_t aSig, bSig, zSig; > @@ -1783,15 +1860,16 @@ static float32 addFloat32Sigs( float32 a, float32 b, flag zSign STATUS_PARAM) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of subtracting the absolute values of the single- > -| precision floating-point values `a' and `b'. If `zSign' is 1, the > -| difference is negated before being returned. `zSign' is ignored if the > -| result is a NaN. The subtraction is performed according to the IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > -static float32 subFloat32Sigs( float32 a, float32 b, flag zSign STATUS_PARAM) > +/* > +------------------------------------------------------------------------------- > +Returns the result of subtracting the absolute values of the single- > +precision floating-point values `a' and `b'. If `zSign' is 1, the > +difference is negated before being returned. `zSign' is ignored if the > +result is a NaN. The subtraction is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > +static float32 subFloat32Sigs( float32 a, float32 b, flag zSign STATUS_PARAM ) > { > int_fast16_t aExp, bExp, zExp; > uint32_t aSig, bSig, zSig; > @@ -1858,12 +1936,13 @@ static float32 subFloat32Sigs( float32 a, float32 b, flag zSign STATUS_PARAM) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of adding the single-precision floating-point values `a' > -| and `b'. 
The operation is performed according to the IEC/IEEE Standard for > -| Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of adding the single-precision floating-point values `a' > +and `b'. The operation is performed according to the IEC/IEEE Standard for > +Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float32 float32_add( float32 a, float32 b STATUS_PARAM ) > { > flag aSign, bSign; > @@ -1881,12 +1960,13 @@ float32 float32_add( float32 a, float32 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of subtracting the single-precision floating-point values > -| `a' and `b'. The operation is performed according to the IEC/IEEE Standard > -| for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of subtracting the single-precision floating-point values > +`a' and `b'. The operation is performed according to the IEC/IEEE Standard > +for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float32 float32_sub( float32 a, float32 b STATUS_PARAM ) > { > flag aSign, bSign; > @@ -1904,12 +1984,13 @@ float32 float32_sub( float32 a, float32 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of multiplying the single-precision floating-point values > -| `a' and `b'. The operation is performed according to the IEC/IEEE Standard > -| for Binary Floating-Point Arithmetic. 
> -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of multiplying the single-precision floating-point values > +`a' and `b'. The operation is performed according to the IEC/IEEE Standard > +for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float32 float32_mul( float32 a, float32 b STATUS_PARAM ) > { > flag aSign, bSign, zSign; > @@ -1967,12 +2048,13 @@ float32 float32_mul( float32 a, float32 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of dividing the single-precision floating-point value `a' > -| by the corresponding value `b'. The operation is performed according to the > -| IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of dividing the single-precision floating-point value `a' > +by the corresponding value `b'. The operation is performed according to the > +IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float32 float32_div( float32 a, float32 b STATUS_PARAM ) > { > flag aSign, bSign, zSign; > @@ -2031,12 +2113,13 @@ float32 float32_div( float32 a, float32 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the remainder of the single-precision floating-point value `a' > -| with respect to the corresponding value `b'. The operation is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the remainder of the single-precision floating-point value `a' > +with respect to the corresponding value `b'. The operation is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float32 float32_rem( float32 a, float32 b STATUS_PARAM ) > { > flag aSign, zSign; > @@ -2132,16 +2215,18 @@ float32 float32_rem( float32 a, float32 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of multiplying the single-precision floating-point values > -| `a' and `b' then adding 'c', with no intermediate rounding step after the > -| multiplication. The operation is performed according to the IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic 754-2008. > -| The flags argument allows the caller to select negation of the > -| addend, the intermediate product, or the final result. (The difference > -| between this and having the caller do a separate negation is that negating > -| externally will flip the sign bit on NaNs.) > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the result of multiplying the single-precision floating-point values > +`a' and `b' then adding 'c', with no intermediate rounding step after the > +multiplication. The operation is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic 754-2008. > +The flags argument allows the caller to select negation of the > +addend, the intermediate product, or the final result. 
(The difference
> +between this and having the caller do a separate negation is that negating
> +externally will flip the sign bit on NaNs.)
> +-------------------------------------------------------------------------------
> +*/
>
>  float32 float32_muladd(float32 a, float32 b, float32 c, int flags STATUS_PARAM)
>  {
> @@ -2339,12 +2424,13 @@ float32 float32_muladd(float32 a, float32 b, float32 c, int flags STATUS_PARAM)
>  }
>
>
> -/*----------------------------------------------------------------------------
> -| Returns the square root of the single-precision floating-point value `a'.
> -| The operation is performed according to the IEC/IEEE Standard for Binary
> -| Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the square root of the single-precision floating-point value `a'.
> +The operation is performed according to the IEC/IEEE Standard for Binary
> +Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
>  float32 float32_sqrt( float32 a STATUS_PARAM )
>  {
>      flag aSign;
> @@ -2394,23 +2480,25 @@ float32 float32_sqrt( float32 a STATUS_PARAM )
>
>  }
>
> -/*----------------------------------------------------------------------------
> -| Returns the binary exponential of the single-precision floating-point value
> -| `a'. The operation is performed according to the IEC/IEEE Standard for
> -| Binary Floating-Point Arithmetic.
> -|
> -| Uses the following identities:
> -|
> -| 1. -------------------------------------------------------------------------
> -|      x        x*ln(2)
> -|     2     =  e
> -|
> -| 2. -------------------------------------------------------------------------
> -|                      2     3     4     5           n
> -|      x        x     x     x     x     x           x
> -|     e  = 1 + --- + --- + --- + --- + --- + ... + --- + ...
> -|               1!    2!    3!    4!    5!          n!
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns the binary exponential of the single-precision floating-point value
> +`a'. The operation is performed according to the IEC/IEEE Standard for
> +Binary Floating-Point Arithmetic.
> +
> +Uses the following identities:
> +
> +1. -------------------------------------------------------------------------
> +     x        x*ln(2)
> +    2     =  e
> +
> +2. -------------------------------------------------------------------------
> +                     2     3     4     5           n
> +     x        x     x     x     x     x           x
> +    e  = 1 + --- + --- + --- + --- + --- + ... + --- + ...
> +              1!    2!    3!    4!    5!          n!
> +-------------------------------------------------------------------------------
> +*/
>
>  static const float64 float32_exp2_coefficients[15] =
>  {
> @@ -2474,11 +2562,13 @@ float32 float32_exp2( float32 a STATUS_PARAM )
>      return float64_to_float32(r, status);
>  }
>
> -/*----------------------------------------------------------------------------
> -| Returns the binary log of the single-precision floating-point value `a'.
> -| The operation is performed according to the IEC/IEEE Standard for Binary
> -| Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns the binary log of the single-precision floating-point value `a'.
> +The operation is performed according to the IEC/IEEE Standard for Binary
> +Floating-Point Arithmetic.
> +------------------------------------------------------------------------------- > +*/ > float32 float32_log2( float32 a STATUS_PARAM ) > { > flag aSign, zSign; > @@ -2522,12 +2612,14 @@ float32 float32_log2( float32 a STATUS_PARAM ) > return normalizeRoundAndPackFloat32( zSign, 0x85, zSig STATUS_VAR ); > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the single-precision floating-point value `a' is equal to > -| the corresponding value `b', and 0 otherwise. The invalid exception is > -| raised if either operand is a NaN. Otherwise, the comparison is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the single-precision floating-point value `a' is equal to > +the corresponding value `b', and 0 otherwise. The invalid exception is > +raised if either operand is a NaN. Otherwise, the comparison is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float32_eq( float32 a, float32 b STATUS_PARAM ) > { > @@ -2546,12 +2638,14 @@ int float32_eq( float32 a, float32 b STATUS_PARAM ) > return ( av == bv ) || ( (uint32_t) ( ( av | bv )<<1 ) == 0 ); > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the single-precision floating-point value `a' is less than > -| or equal to the corresponding value `b', and 0 otherwise. The invalid > -| exception is raised if either operand is a NaN. The comparison is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the single-precision floating-point value `a' is less than > +or equal to the corresponding value `b', and 0 otherwise. The invalid > +exception is raised if either operand is a NaN. The comparison is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float32_le( float32 a, float32 b STATUS_PARAM ) > { > @@ -2575,12 +2669,14 @@ int float32_le( float32 a, float32 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the single-precision floating-point value `a' is less than > -| the corresponding value `b', and 0 otherwise. The invalid exception is > -| raised if either operand is a NaN. The comparison is performed according > -| to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the single-precision floating-point value `a' is less than > +the corresponding value `b', and 0 otherwise. The invalid exception is > +raised if either operand is a NaN. The comparison is performed according > +to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float32_lt( float32 a, float32 b STATUS_PARAM ) > { > @@ -2604,12 +2700,14 @@ int float32_lt( float32 a, float32 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the single-precision floating-point values `a' and `b' cannot > -| be compared, and 0 otherwise. 
The invalid exception is raised if either > -| operand is a NaN. The comparison is performed according to the IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the single-precision floating-point values `a' and `b' cannot > +be compared, and 0 otherwise. The invalid exception is raised if either > +operand is a NaN. The comparison is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float32_unordered( float32 a, float32 b STATUS_PARAM ) > { > @@ -2625,12 +2723,14 @@ int float32_unordered( float32 a, float32 b STATUS_PARAM ) > return 0; > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the single-precision floating-point value `a' is equal to > -| the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an > -| exception. The comparison is performed according to the IEC/IEEE Standard > -| for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the single-precision floating-point value `a' is equal to > +the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an > +exception. The comparison is performed according to the IEC/IEEE Standard > +for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > > int float32_eq_quiet( float32 a, float32 b STATUS_PARAM ) > { > @@ -2649,12 +2749,14 @@ int float32_eq_quiet( float32 a, float32 b STATUS_PARAM ) > ( (uint32_t) ( ( float32_val(a) | float32_val(b) )<<1 ) == 0 ); > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the single-precision floating-point value `a' is less than or > -| equal to the corresponding value `b', and 0 otherwise. Quiet NaNs do not > -| cause an exception. Otherwise, the comparison is performed according to the > -| IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the single-precision floating-point value `a' is less than or > +equal to the corresponding value `b', and 0 otherwise. Quiet NaNs do not > +cause an exception. Otherwise, the comparison is performed according to the > +IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float32_le_quiet( float32 a, float32 b STATUS_PARAM ) > { > @@ -2680,12 +2782,14 @@ int float32_le_quiet( float32 a, float32 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the single-precision floating-point value `a' is less than > -| the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an > -| exception. Otherwise, the comparison is performed according to the IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic. 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the single-precision floating-point value `a' is less than > +the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an > +exception. Otherwise, the comparison is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float32_lt_quiet( float32 a, float32 b STATUS_PARAM ) > { > @@ -2711,12 +2815,14 @@ int float32_lt_quiet( float32 a, float32 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the single-precision floating-point values `a' and `b' cannot > -| be compared, and 0 otherwise. Quiet NaNs do not cause an exception. The > -| comparison is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the single-precision floating-point values `a' and `b' cannot > +be compared, and 0 otherwise. Quiet NaNs do not cause an exception. The > +comparison is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float32_unordered_quiet( float32 a, float32 b STATUS_PARAM ) > { > @@ -2734,16 +2840,17 @@ int float32_unordered_quiet( float32 a, float32 b STATUS_PARAM ) > return 0; > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the double-precision floating-point value > -| `a' to the 32-bit two's complement integer format. 
The conversion is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic---which means in particular that the conversion is rounded > -| according to the current rounding mode. If `a' is a NaN, the largest > -| positive integer is returned. Otherwise, if the conversion overflows, the > -| largest integer with the same sign as `a' is returned. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the double-precision floating-point value > +`a' to the 32-bit two's complement integer format. The conversion is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic---which means in particular that the conversion is rounded > +according to the current rounding mode. If `a' is a NaN, the largest > +positive integer is returned. Otherwise, if the conversion overflows, the > +largest integer with the same sign as `a' is returned. > +------------------------------------------------------------------------------- > +*/ > int32 float64_to_int32( float64 a STATUS_PARAM ) > { > flag aSign; > @@ -2762,16 +2869,17 @@ int32 float64_to_int32( float64 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the double-precision floating-point value > -| `a' to the 32-bit two's complement integer format. The conversion is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic, except that the conversion is always rounded toward zero. > -| If `a' is a NaN, the largest positive integer is returned. Otherwise, if > -| the conversion overflows, the largest integer with the same sign as `a' is > -| returned. 
> -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the double-precision floating-point value > +`a' to the 32-bit two's complement integer format. The conversion is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic, except that the conversion is always rounded toward zero. > +If `a' is a NaN, the largest positive integer is returned. Otherwise, if > +the conversion overflows, the largest integer with the same sign as `a' is > +returned. > +------------------------------------------------------------------------------- > +*/ > int32 float64_to_int32_round_to_zero( float64 a STATUS_PARAM ) > { > flag aSign; > @@ -2809,15 +2917,17 @@ int32 float64_to_int32_round_to_zero( float64 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the double-precision floating-point value > -| `a' to the 16-bit two's complement integer format. The conversion is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic, except that the conversion is always rounded toward zero. > -| If `a' is a NaN, the largest positive integer is returned. Otherwise, if > -| the conversion overflows, the largest integer with the same sign as `a' is > -| returned. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the double-precision floating-point value > +`a' to the 16-bit two's complement integer format. The conversion is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic, except that the conversion is always rounded toward zero. > +If `a' is a NaN, the largest positive integer is returned. 
Otherwise, if > +the conversion overflows, the largest integer with the same sign as `a' is > +returned. > +------------------------------------------------------------------------------- > +*/ > > int_fast16_t float64_to_int16_round_to_zero(float64 a STATUS_PARAM) > { > @@ -2860,16 +2970,17 @@ int_fast16_t float64_to_int16_round_to_zero(float64 a STATUS_PARAM) > return z; > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the double-precision floating-point value > -| `a' to the 64-bit two's complement integer format. The conversion is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic---which means in particular that the conversion is rounded > -| according to the current rounding mode. If `a' is a NaN, the largest > -| positive integer is returned. Otherwise, if the conversion overflows, the > -| largest integer with the same sign as `a' is returned. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the double-precision floating-point value > +`a' to the 64-bit two's complement integer format. The conversion is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic---which means in particular that the conversion is rounded > +according to the current rounding mode. If `a' is a NaN, the largest > +positive integer is returned. Otherwise, if the conversion overflows, the > +largest integer with the same sign as `a' is returned. 
> +------------------------------------------------------------------------------- > +*/ > int64 float64_to_int64( float64 a STATUS_PARAM ) > { > flag aSign; > @@ -2903,16 +3014,17 @@ int64 float64_to_int64( float64 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the double-precision floating-point value > -| `a' to the 64-bit two's complement integer format. The conversion is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic, except that the conversion is always rounded toward zero. > -| If `a' is a NaN, the largest positive integer is returned. Otherwise, if > -| the conversion overflows, the largest integer with the same sign as `a' is > -| returned. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the double-precision floating-point value > +`a' to the 64-bit two's complement integer format. The conversion is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic, except that the conversion is always rounded toward zero. > +If `a' is a NaN, the largest positive integer is returned. Otherwise, if > +the conversion overflows, the largest integer with the same sign as `a' is > +returned. > +------------------------------------------------------------------------------- > +*/ > int64 float64_to_int64_round_to_zero( float64 a STATUS_PARAM ) > { > flag aSign; > @@ -2956,13 +3068,14 @@ int64 float64_to_int64_round_to_zero( float64 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the double-precision floating-point value > -| `a' to the single-precision floating-point format. 
The conversion is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the double-precision floating-point value > +`a' to the single-precision floating-point format. The conversion is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float32 float64_to_float32( float64 a STATUS_PARAM ) > { > flag aSign; > @@ -2989,16 +3102,18 @@ float32 float64_to_float32( float64 a STATUS_PARAM ) > } > > > -/*---------------------------------------------------------------------------- > -| Packs the sign `zSign', exponent `zExp', and significand `zSig' into a > -| half-precision floating-point value, returning the result. After being > -| shifted into the proper positions, the three fields are simply added > -| together to form the result. This means that any integer portion of `zSig' > -| will be added into the exponent. Since a properly normalized significand > -| will have an integer portion equal to 1, the `zExp' input should be 1 less > -| than the desired result exponent whenever `zSig' is a complete, normalized > -| significand. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Packs the sign `zSign', exponent `zExp', and significand `zSig' into a > +half-precision floating-point value, returning the result. After being > +shifted into the proper positions, the three fields are simply added > +together to form the result. This means that any integer portion of `zSig' > +will be added into the exponent. 
Since a properly normalized significand > +will have an integer portion equal to 1, the `zExp' input should be 1 less > +than the desired result exponent whenever `zSig' is a complete, normalized > +significand. > +------------------------------------------------------------------------------- > +*/ > static float16 packFloat16(flag zSign, int_fast16_t zExp, uint16_t zSig) > { > return make_float16( > @@ -3132,13 +3247,14 @@ float16 float32_to_float16(float32 a, flag ieee STATUS_PARAM) > return packFloat16(aSign, aExp + 14, aSig >> 13); > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the double-precision floating-point value > -| `a' to the extended double-precision floating-point format. The conversion > -| is performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the double-precision floating-point value > +`a' to the extended double-precision floating-point format. The conversion > +is performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic. > +------------------------------------------------------------------------------- > +*/ > floatx80 float64_to_floatx80( float64 a STATUS_PARAM ) > { > flag aSign; > @@ -3163,13 +3279,14 @@ floatx80 float64_to_floatx80( float64 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the double-precision floating-point value > -| `a' to the quadruple-precision floating-point format. The conversion is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic. 
> -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the double-precision floating-point value > +`a' to the quadruple-precision floating-point format. The conversion is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float128 float64_to_float128( float64 a STATUS_PARAM ) > { > flag aSign; > @@ -3194,13 +3311,14 @@ float128 float64_to_float128( float64 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Rounds the double-precision floating-point value `a' to an integer, and > -| returns the result as a double-precision floating-point value. The > -| operation is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Rounds the double-precision floating-point value `a' to an integer, and > +returns the result as a double-precision floating-point value. The > +operation is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float64 float64_round_to_int( float64 a STATUS_PARAM ) > { > flag aSign; > @@ -3267,14 +3385,15 @@ float64 float64_trunc_to_int( float64 a STATUS_PARAM) > return res; > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of adding the absolute values of the double-precision > -| floating-point values `a' and `b'. If `zSign' is 1, the sum is negated > -| before being returned. 
`zSign' is ignored if the result is a NaN. > -| The addition is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of adding the absolute values of the double-precision > +floating-point values `a' and `b'. If `zSign' is 1, the sum is negated > +before being returned. `zSign' is ignored if the result is a NaN. > +The addition is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > static float64 addFloat64Sigs( float64 a, float64 b, flag zSign STATUS_PARAM ) > { > int_fast16_t aExp, bExp, zExp; > @@ -3346,14 +3465,15 @@ static float64 addFloat64Sigs( float64 a, float64 b, flag zSign STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of subtracting the absolute values of the double- > -| precision floating-point values `a' and `b'. If `zSign' is 1, the > -| difference is negated before being returned. `zSign' is ignored if the > -| result is a NaN. The subtraction is performed according to the IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of subtracting the absolute values of the double- > +precision floating-point values `a' and `b'. If `zSign' is 1, the > +difference is negated before being returned. `zSign' is ignored if the > +result is a NaN. The subtraction is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > static float64 subFloat64Sigs( float64 a, float64 b, flag zSign STATUS_PARAM ) > { > int_fast16_t aExp, bExp, zExp; > @@ -3421,12 +3541,13 @@ static float64 subFloat64Sigs( float64 a, float64 b, flag zSign STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of adding the double-precision floating-point values `a' > -| and `b'. The operation is performed according to the IEC/IEEE Standard for > -| Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of adding the double-precision floating-point values `a' > +and `b'. The operation is performed according to the IEC/IEEE Standard for > +Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float64 float64_add( float64 a, float64 b STATUS_PARAM ) > { > flag aSign, bSign; > @@ -3444,12 +3565,13 @@ float64 float64_add( float64 a, float64 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of subtracting the double-precision floating-point values > -| `a' and `b'. The operation is performed according to the IEC/IEEE Standard > -| for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of subtracting the double-precision floating-point values > +`a' and `b'. The operation is performed according to the IEC/IEEE Standard > +for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > float64 float64_sub( float64 a, float64 b STATUS_PARAM ) > { > flag aSign, bSign; > @@ -3467,12 +3589,13 @@ float64 float64_sub( float64 a, float64 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of multiplying the double-precision floating-point values > -| `a' and `b'. The operation is performed according to the IEC/IEEE Standard > -| for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of multiplying the double-precision floating-point values > +`a' and `b'. The operation is performed according to the IEC/IEEE Standard > +for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float64 float64_mul( float64 a, float64 b STATUS_PARAM ) > { > flag aSign, bSign, zSign; > @@ -3528,12 +3651,13 @@ float64 float64_mul( float64 a, float64 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of dividing the double-precision floating-point value `a' > -| by the corresponding value `b'. The operation is performed according to > -| the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of dividing the double-precision floating-point value `a' > +by the corresponding value `b'. The operation is performed according to > +the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > float64 float64_div( float64 a, float64 b STATUS_PARAM ) > { > flag aSign, bSign, zSign; > @@ -3600,12 +3724,13 @@ float64 float64_div( float64 a, float64 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the remainder of the double-precision floating-point value `a' > -| with respect to the corresponding value `b'. The operation is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the remainder of the double-precision floating-point value `a' > +with respect to the corresponding value `b'. The operation is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float64 float64_rem( float64 a, float64 b STATUS_PARAM ) > { > flag aSign, zSign; > @@ -3686,16 +3811,18 @@ float64 float64_rem( float64 a, float64 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of multiplying the double-precision floating-point values > -| `a' and `b' then adding 'c', with no intermediate rounding step after the > -| multiplication. The operation is performed according to the IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic 754-2008. > -| The flags argument allows the caller to select negation of the > -| addend, the intermediate product, or the final result. (The difference > -| between this and having the caller do a separate negation is that negating > -| externally will flip the sign bit on NaNs.) 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the result of multiplying the double-precision floating-point values > +`a' and `b' then adding 'c', with no intermediate rounding step after the > +multiplication. The operation is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic 754-2008. > +The flags argument allows the caller to select negation of the > +addend, the intermediate product, or the final result. (The difference > +between this and having the caller do a separate negation is that negating > +externally will flip the sign bit on NaNs.) > +------------------------------------------------------------------------------- > +*/ > > float64 float64_muladd(float64 a, float64 b, float64 c, int flags STATUS_PARAM) > { > @@ -3912,12 +4039,13 @@ float64 float64_muladd(float64 a, float64 b, float64 c, int flags STATUS_PARAM) > } > } > > -/*---------------------------------------------------------------------------- > -| Returns the square root of the double-precision floating-point value `a'. > -| The operation is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the square root of the double-precision floating-point value `a'. > +The operation is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > float64 float64_sqrt( float64 a STATUS_PARAM ) > { > flag aSign; > @@ -3964,11 +4092,13 @@ float64 float64_sqrt( float64 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the binary log of the double-precision floating-point value `a'. > -| The operation is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns the binary log of the double-precision floating-point value `a'. > +The operation is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float64 float64_log2( float64 a STATUS_PARAM ) > { > flag aSign, zSign; > @@ -4011,12 +4141,14 @@ float64 float64_log2( float64 a STATUS_PARAM ) > return normalizeRoundAndPackFloat64( zSign, 0x408, zSig STATUS_VAR ); > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the double-precision floating-point value `a' is equal to the > -| corresponding value `b', and 0 otherwise. The invalid exception is raised > -| if either operand is a NaN. Otherwise, the comparison is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the double-precision floating-point value `a' is equal to the > +corresponding value `b', and 0 otherwise. The invalid exception is raised > +if either operand is a NaN. 
Otherwise, the comparison is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float64_eq( float64 a, float64 b STATUS_PARAM ) > { > @@ -4036,12 +4168,14 @@ int float64_eq( float64 a, float64 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the double-precision floating-point value `a' is less than or > -| equal to the corresponding value `b', and 0 otherwise. The invalid > -| exception is raised if either operand is a NaN. The comparison is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the double-precision floating-point value `a' is less than or > +equal to the corresponding value `b', and 0 otherwise. The invalid > +exception is raised if either operand is a NaN. The comparison is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float64_le( float64 a, float64 b STATUS_PARAM ) > { > @@ -4065,12 +4199,14 @@ int float64_le( float64 a, float64 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the double-precision floating-point value `a' is less than > -| the corresponding value `b', and 0 otherwise. The invalid exception is > -| raised if either operand is a NaN. The comparison is performed according > -| to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the double-precision floating-point value `a' is less than > +the corresponding value `b', and 0 otherwise. The invalid exception is > +raised if either operand is a NaN. The comparison is performed according > +to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float64_lt( float64 a, float64 b STATUS_PARAM ) > { > @@ -4094,12 +4230,14 @@ int float64_lt( float64 a, float64 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the double-precision floating-point values `a' and `b' cannot > -| be compared, and 0 otherwise. The invalid exception is raised if either > -| operand is a NaN. The comparison is performed according to the IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the double-precision floating-point values `a' and `b' cannot > +be compared, and 0 otherwise. The invalid exception is raised if either > +operand is a NaN. The comparison is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float64_unordered( float64 a, float64 b STATUS_PARAM ) > { > @@ -4115,12 +4253,14 @@ int float64_unordered( float64 a, float64 b STATUS_PARAM ) > return 0; > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the double-precision floating-point value `a' is equal to the > -| corresponding value `b', and 0 otherwise. 
Quiet NaNs do not cause an > -| exception.The comparison is performed according to the IEC/IEEE Standard > -| for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the double-precision floating-point value `a' is equal to the > +corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an > +exception.The comparison is performed according to the IEC/IEEE Standard > +for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float64_eq_quiet( float64 a, float64 b STATUS_PARAM ) > { > @@ -4142,12 +4282,14 @@ int float64_eq_quiet( float64 a, float64 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the double-precision floating-point value `a' is less than or > -| equal to the corresponding value `b', and 0 otherwise. Quiet NaNs do not > -| cause an exception. Otherwise, the comparison is performed according to the > -| IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the double-precision floating-point value `a' is less than or > +equal to the corresponding value `b', and 0 otherwise. Quiet NaNs do not > +cause an exception. Otherwise, the comparison is performed according to the > +IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > > int float64_le_quiet( float64 a, float64 b STATUS_PARAM ) > { > @@ -4173,12 +4315,14 @@ int float64_le_quiet( float64 a, float64 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the double-precision floating-point value `a' is less than > -| the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an > -| exception. Otherwise, the comparison is performed according to the IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the double-precision floating-point value `a' is less than > +the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an > +exception. Otherwise, the comparison is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float64_lt_quiet( float64 a, float64 b STATUS_PARAM ) > { > @@ -4204,12 +4348,14 @@ int float64_lt_quiet( float64 a, float64 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the double-precision floating-point values `a' and `b' cannot > -| be compared, and 0 otherwise. Quiet NaNs do not cause an exception. The > -| comparison is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the double-precision floating-point values `a' and `b' cannot > +be compared, and 0 otherwise. Quiet NaNs do not cause an exception. 
The > +comparison is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float64_unordered_quiet( float64 a, float64 b STATUS_PARAM ) > { > @@ -4227,16 +4373,17 @@ int float64_unordered_quiet( float64 a, float64 b STATUS_PARAM ) > return 0; > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the extended double-precision floating- > -| point value `a' to the 32-bit two's complement integer format. The > -| conversion is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic---which means in particular that the conversion > -| is rounded according to the current rounding mode. If `a' is a NaN, the > -| largest positive integer is returned. Otherwise, if the conversion > -| overflows, the largest integer with the same sign as `a' is returned. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the extended double-precision floating- > +point value `a' to the 32-bit two's complement integer format. The > +conversion is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic---which means in particular that the conversion > +is rounded according to the current rounding mode. If `a' is a NaN, the > +largest positive integer is returned. Otherwise, if the conversion > +overflows, the largest integer with the same sign as `a' is returned. 
> +------------------------------------------------------------------------------- > +*/ > int32 floatx80_to_int32( floatx80 a STATUS_PARAM ) > { > flag aSign; > @@ -4254,16 +4401,17 @@ int32 floatx80_to_int32( floatx80 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the extended double-precision floating- > -| point value `a' to the 32-bit two's complement integer format. The > -| conversion is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic, except that the conversion is always rounded > -| toward zero. If `a' is a NaN, the largest positive integer is returned. > -| Otherwise, if the conversion overflows, the largest integer with the same > -| sign as `a' is returned. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the extended double-precision floating- > +point value `a' to the 32-bit two's complement integer format. The > +conversion is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic, except that the conversion is always rounded > +toward zero. If `a' is a NaN, the largest positive integer is returned. > +Otherwise, if the conversion overflows, the largest integer with the same > +sign as `a' is returned. > +------------------------------------------------------------------------------- > +*/ > int32 floatx80_to_int32_round_to_zero( floatx80 a STATUS_PARAM ) > { > flag aSign; > @@ -4299,16 +4447,17 @@ int32 floatx80_to_int32_round_to_zero( floatx80 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the extended double-precision floating- > -| point value `a' to the 64-bit two's complement integer format. 
The > -| conversion is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic---which means in particular that the conversion > -| is rounded according to the current rounding mode. If `a' is a NaN, > -| the largest positive integer is returned. Otherwise, if the conversion > -| overflows, the largest integer with the same sign as `a' is returned. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the extended double-precision floating- > +point value `a' to the 64-bit two's complement integer format. The > +conversion is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic---which means in particular that the conversion > +is rounded according to the current rounding mode. If `a' is a NaN, > +the largest positive integer is returned. Otherwise, if the conversion > +overflows, the largest integer with the same sign as `a' is returned. > +------------------------------------------------------------------------------- > +*/ > int64 floatx80_to_int64( floatx80 a STATUS_PARAM ) > { > flag aSign; > @@ -4339,16 +4488,17 @@ int64 floatx80_to_int64( floatx80 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the extended double-precision floating- > -| point value `a' to the 64-bit two's complement integer format. The > -| conversion is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic, except that the conversion is always rounded > -| toward zero. If `a' is a NaN, the largest positive integer is returned. > -| Otherwise, if the conversion overflows, the largest integer with the same > -| sign as `a' is returned. 
> -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the extended double-precision floating- > +point value `a' to the 64-bit two's complement integer format. The > +conversion is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic, except that the conversion is always rounded > +toward zero. If `a' is a NaN, the largest positive integer is returned. > +Otherwise, if the conversion overflows, the largest integer with the same > +sign as `a' is returned. > +------------------------------------------------------------------------------- > +*/ > int64 floatx80_to_int64_round_to_zero( floatx80 a STATUS_PARAM ) > { > flag aSign; > @@ -4383,13 +4533,14 @@ int64 floatx80_to_int64_round_to_zero( floatx80 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the extended double-precision floating- > -| point value `a' to the single-precision floating-point format. The > -| conversion is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the extended double-precision floating- > +point value `a' to the single-precision floating-point format. The > +conversion is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > float32 floatx80_to_float32( floatx80 a STATUS_PARAM ) > { > flag aSign; > @@ -4411,13 +4562,14 @@ float32 floatx80_to_float32( floatx80 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the extended double-precision floating- > -| point value `a' to the double-precision floating-point format. The > -| conversion is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the extended double-precision floating- > +point value `a' to the double-precision floating-point format. The > +conversion is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float64 floatx80_to_float64( floatx80 a STATUS_PARAM ) > { > flag aSign; > @@ -4439,13 +4591,14 @@ float64 floatx80_to_float64( floatx80 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the extended double-precision floating- > -| point value `a' to the quadruple-precision floating-point format. The > -| conversion is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the extended double-precision floating- > +point value `a' to the quadruple-precision floating-point format. 
The > +conversion is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float128 floatx80_to_float128( floatx80 a STATUS_PARAM ) > { > flag aSign; > @@ -4463,13 +4616,14 @@ float128 floatx80_to_float128( floatx80 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Rounds the extended double-precision floating-point value `a' to an integer, > -| and returns the result as an extended quadruple-precision floating-point > -| value. The operation is performed according to the IEC/IEEE Standard for > -| Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Rounds the extended double-precision floating-point value `a' to an integer, > +and returns the result as an extended quadruple-precision floating-point > +value. The operation is performed according to the IEC/IEEE Standard for > +Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > floatx80 floatx80_round_to_int( floatx80 a STATUS_PARAM ) > { > flag aSign; > @@ -4536,14 +4690,15 @@ floatx80 floatx80_round_to_int( floatx80 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of adding the absolute values of the extended double- > -| precision floating-point values `a' and `b'. If `zSign' is 1, the sum is > -| negated before being returned. `zSign' is ignored if the result is a NaN. > -| The addition is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. 
> -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of adding the absolute values of the extended double- > +precision floating-point values `a' and `b'. If `zSign' is 1, the sum is > +negated before being returned. `zSign' is ignored if the result is a NaN. > +The addition is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > static floatx80 addFloatx80Sigs( floatx80 a, floatx80 b, flag zSign STATUS_PARAM) > { > int32 aExp, bExp, zExp; > @@ -4602,14 +4757,15 @@ static floatx80 addFloatx80Sigs( floatx80 a, floatx80 b, flag zSign STATUS_PARAM > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of subtracting the absolute values of the extended > -| double-precision floating-point values `a' and `b'. If `zSign' is 1, the > -| difference is negated before being returned. `zSign' is ignored if the > -| result is a NaN. The subtraction is performed according to the IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of subtracting the absolute values of the extended > +double-precision floating-point values `a' and `b'. If `zSign' is 1, the > +difference is negated before being returned. `zSign' is ignored if the > +result is a NaN. The subtraction is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > static floatx80 subFloatx80Sigs( floatx80 a, floatx80 b, flag zSign STATUS_PARAM ) > { > int32 aExp, bExp, zExp; > @@ -4670,12 +4826,13 @@ static floatx80 subFloatx80Sigs( floatx80 a, floatx80 b, flag zSign STATUS_PARAM > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of adding the extended double-precision floating-point > -| values `a' and `b'. The operation is performed according to the IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of adding the extended double-precision floating-point > +values `a' and `b'. The operation is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > floatx80 floatx80_add( floatx80 a, floatx80 b STATUS_PARAM ) > { > flag aSign, bSign; > @@ -4691,12 +4848,13 @@ floatx80 floatx80_add( floatx80 a, floatx80 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of subtracting the extended double-precision floating- > -| point values `a' and `b'. The operation is performed according to the > -| IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of subtracting the extended double-precision floating- > +point values `a' and `b'. The operation is performed according to the > +IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > floatx80 floatx80_sub( floatx80 a, floatx80 b STATUS_PARAM ) > { > flag aSign, bSign; > @@ -4712,12 +4870,13 @@ floatx80 floatx80_sub( floatx80 a, floatx80 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of multiplying the extended double-precision floating- > -| point values `a' and `b'. The operation is performed according to the > -| IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of multiplying the extended double-precision floating- > +point values `a' and `b'. The operation is performed according to the > +IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > floatx80 floatx80_mul( floatx80 a, floatx80 b STATUS_PARAM ) > { > flag aSign, bSign, zSign; > @@ -4771,12 +4930,13 @@ floatx80 floatx80_mul( floatx80 a, floatx80 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of dividing the extended double-precision floating-point > -| value `a' by the corresponding value `b'. The operation is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of dividing the extended double-precision floating-point > +value `a' by the corresponding value `b'. The operation is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > floatx80 floatx80_div( floatx80 a, floatx80 b STATUS_PARAM ) > { > flag aSign, bSign, zSign; > @@ -4851,12 +5011,13 @@ floatx80 floatx80_div( floatx80 a, floatx80 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the remainder of the extended double-precision floating-point value > -| `a' with respect to the corresponding value `b'. The operation is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the remainder of the extended double-precision floating-point value > +`a' with respect to the corresponding value `b'. The operation is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > floatx80 floatx80_rem( floatx80 a, floatx80 b STATUS_PARAM ) > { > flag aSign, zSign; > @@ -4947,12 +5108,13 @@ floatx80 floatx80_rem( floatx80 a, floatx80 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the square root of the extended double-precision floating-point > -| value `a'. The operation is performed according to the IEC/IEEE Standard > -| for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the square root of the extended double-precision floating-point > +value `a'. The operation is performed according to the IEC/IEEE Standard > +for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > floatx80 floatx80_sqrt( floatx80 a STATUS_PARAM ) > { > flag aSign; > @@ -5017,12 +5179,14 @@ floatx80 floatx80_sqrt( floatx80 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the extended double-precision floating-point value `a' is equal > -| to the corresponding value `b', and 0 otherwise. The invalid exception is > -| raised if either operand is a NaN. Otherwise, the comparison is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the extended double-precision floating-point value `a' is equal > +to the corresponding value `b', and 0 otherwise. The invalid exception is > +raised if either operand is a NaN. Otherwise, the comparison is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int floatx80_eq( floatx80 a, floatx80 b STATUS_PARAM ) > { > @@ -5044,13 +5208,15 @@ int floatx80_eq( floatx80 a, floatx80 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the extended double-precision floating-point value `a' is > -| less than or equal to the corresponding value `b', and 0 otherwise. The > -| invalid exception is raised if either operand is a NaN. The comparison is > -| performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic. 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the extended double-precision floating-point value `a' is > +less than or equal to the corresponding value `b', and 0 otherwise. The > +invalid exception is raised if either operand is a NaN. The comparison is > +performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int floatx80_le( floatx80 a, floatx80 b STATUS_PARAM ) > { > @@ -5078,12 +5244,14 @@ int floatx80_le( floatx80 a, floatx80 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the extended double-precision floating-point value `a' is > -| less than the corresponding value `b', and 0 otherwise. The invalid > -| exception is raised if either operand is a NaN. The comparison is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the extended double-precision floating-point value `a' is > +less than the corresponding value `b', and 0 otherwise. The invalid > +exception is raised if either operand is a NaN. The comparison is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > > int floatx80_lt( floatx80 a, floatx80 b STATUS_PARAM ) > { > @@ -5111,12 +5279,14 @@ int floatx80_lt( floatx80 a, floatx80 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the extended double-precision floating-point values `a' and `b' > -| cannot be compared, and 0 otherwise. The invalid exception is raised if > -| either operand is a NaN. The comparison is performed according to the > -| IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the extended double-precision floating-point values `a' and `b' > +cannot be compared, and 0 otherwise. The invalid exception is raised if > +either operand is a NaN. The comparison is performed according to the > +IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > int floatx80_unordered( floatx80 a, floatx80 b STATUS_PARAM ) > { > if ( ( ( extractFloatx80Exp( a ) == 0x7FFF ) > @@ -5130,12 +5300,14 @@ int floatx80_unordered( floatx80 a, floatx80 b STATUS_PARAM ) > return 0; > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the extended double-precision floating-point value `a' is > -| equal to the corresponding value `b', and 0 otherwise. Quiet NaNs do not > -| cause an exception. The comparison is performed according to the IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic. 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the extended double-precision floating-point value `a' is > +equal to the corresponding value `b', and 0 otherwise. Quiet NaNs do not > +cause an exception. The comparison is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int floatx80_eq_quiet( floatx80 a, floatx80 b STATUS_PARAM ) > { > @@ -5160,12 +5332,14 @@ int floatx80_eq_quiet( floatx80 a, floatx80 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the extended double-precision floating-point value `a' is less > -| than or equal to the corresponding value `b', and 0 otherwise. Quiet NaNs > -| do not cause an exception. Otherwise, the comparison is performed according > -| to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the extended double-precision floating-point value `a' is less > +than or equal to the corresponding value `b', and 0 otherwise. Quiet NaNs > +do not cause an exception. Otherwise, the comparison is performed according > +to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > > int floatx80_le_quiet( floatx80 a, floatx80 b STATUS_PARAM ) > { > @@ -5196,12 +5370,14 @@ int floatx80_le_quiet( floatx80 a, floatx80 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the extended double-precision floating-point value `a' is less > -| than the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause > -| an exception. Otherwise, the comparison is performed according to the > -| IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the extended double-precision floating-point value `a' is less > +than the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause > +an exception. Otherwise, the comparison is performed according to the > +IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int floatx80_lt_quiet( floatx80 a, floatx80 b STATUS_PARAM ) > { > @@ -5232,12 +5408,14 @@ int floatx80_lt_quiet( floatx80 a, floatx80 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the extended double-precision floating-point values `a' and `b' > -| cannot be compared, and 0 otherwise. Quiet NaNs do not cause an exception. > -| The comparison is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the extended double-precision floating-point values `a' and `b' > +cannot be compared, and 0 otherwise. 
Quiet NaNs do not cause an exception. > +The comparison is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > int floatx80_unordered_quiet( floatx80 a, floatx80 b STATUS_PARAM ) > { > if ( ( ( extractFloatx80Exp( a ) == 0x7FFF ) > @@ -5254,16 +5432,17 @@ int floatx80_unordered_quiet( floatx80 a, floatx80 b STATUS_PARAM ) > return 0; > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the quadruple-precision floating-point > -| value `a' to the 32-bit two's complement integer format. The conversion > -| is performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic---which means in particular that the conversion is rounded > -| according to the current rounding mode. If `a' is a NaN, the largest > -| positive integer is returned. Otherwise, if the conversion overflows, the > -| largest integer with the same sign as `a' is returned. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the quadruple-precision floating-point > +value `a' to the 32-bit two's complement integer format. The conversion > +is performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic---which means in particular that the conversion is rounded > +according to the current rounding mode. If `a' is a NaN, the largest > +positive integer is returned. Otherwise, if the conversion overflows, the > +largest integer with the same sign as `a' is returned. 
> +------------------------------------------------------------------------------- > +*/ > int32 float128_to_int32( float128 a STATUS_PARAM ) > { > flag aSign; > @@ -5283,16 +5462,17 @@ int32 float128_to_int32( float128 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the quadruple-precision floating-point > -| value `a' to the 32-bit two's complement integer format. The conversion > -| is performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic, except that the conversion is always rounded toward zero. If > -| `a' is a NaN, the largest positive integer is returned. Otherwise, if the > -| conversion overflows, the largest integer with the same sign as `a' is > -| returned. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the quadruple-precision floating-point > +value `a' to the 32-bit two's complement integer format. The conversion > +is performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic, except that the conversion is always rounded toward zero. If > +`a' is a NaN, the largest positive integer is returned. Otherwise, if the > +conversion overflows, the largest integer with the same sign as `a' is > +returned. > +------------------------------------------------------------------------------- > +*/ > int32 float128_to_int32_round_to_zero( float128 a STATUS_PARAM ) > { > flag aSign; > @@ -5331,16 +5511,17 @@ int32 float128_to_int32_round_to_zero( float128 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the quadruple-precision floating-point > -| value `a' to the 64-bit two's complement integer format. 
The conversion > -| is performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic---which means in particular that the conversion is rounded > -| according to the current rounding mode. If `a' is a NaN, the largest > -| positive integer is returned. Otherwise, if the conversion overflows, the > -| largest integer with the same sign as `a' is returned. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the quadruple-precision floating-point > +value `a' to the 64-bit two's complement integer format. The conversion > +is performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic---which means in particular that the conversion is rounded > +according to the current rounding mode. If `a' is a NaN, the largest > +positive integer is returned. Otherwise, if the conversion overflows, the > +largest integer with the same sign as `a' is returned. > +------------------------------------------------------------------------------- > +*/ > int64 float128_to_int64( float128 a STATUS_PARAM ) > { > flag aSign; > @@ -5374,16 +5555,17 @@ int64 float128_to_int64( float128 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the quadruple-precision floating-point > -| value `a' to the 64-bit two's complement integer format. The conversion > -| is performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic, except that the conversion is always rounded toward zero. > -| If `a' is a NaN, the largest positive integer is returned. Otherwise, if > -| the conversion overflows, the largest integer with the same sign as `a' is > -| returned. 
> -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the quadruple-precision floating-point > +value `a' to the 64-bit two's complement integer format. The conversion > +is performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic, except that the conversion is always rounded toward zero. > +If `a' is a NaN, the largest positive integer is returned. Otherwise, if > +the conversion overflows, the largest integer with the same sign as `a' is > +returned. > +------------------------------------------------------------------------------- > +*/ > int64 float128_to_int64_round_to_zero( float128 a STATUS_PARAM ) > { > flag aSign; > @@ -5435,13 +5617,14 @@ int64 float128_to_int64_round_to_zero( float128 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the quadruple-precision floating-point > -| value `a' to the single-precision floating-point format. The conversion > -| is performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the quadruple-precision floating-point > +value `a' to the single-precision floating-point format. The conversion > +is performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > float32 float128_to_float32( float128 a STATUS_PARAM ) > { > flag aSign; > @@ -5470,13 +5653,14 @@ float32 float128_to_float32( float128 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the quadruple-precision floating-point > -| value `a' to the double-precision floating-point format. The conversion > -| is performed according to the IEC/IEEE Standard for Binary Floating-Point > -| Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the quadruple-precision floating-point > +value `a' to the double-precision floating-point format. The conversion > +is performed according to the IEC/IEEE Standard for Binary Floating-Point > +Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float64 float128_to_float64( float128 a STATUS_PARAM ) > { > flag aSign; > @@ -5503,13 +5687,14 @@ float64 float128_to_float64( float128 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of converting the quadruple-precision floating-point > -| value `a' to the extended double-precision floating-point format. The > -| conversion is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of converting the quadruple-precision floating-point > +value `a' to the extended double-precision floating-point format. 
The > +conversion is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > floatx80 float128_to_floatx80( float128 a STATUS_PARAM ) > { > flag aSign; > @@ -5538,13 +5723,14 @@ floatx80 float128_to_floatx80( float128 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Rounds the quadruple-precision floating-point value `a' to an integer, and > -| returns the result as a quadruple-precision floating-point value. The > -| operation is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Rounds the quadruple-precision floating-point value `a' to an integer, and > +returns the result as a quadruple-precision floating-point value. The > +operation is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float128 float128_round_to_int( float128 a STATUS_PARAM ) > { > flag aSign; > @@ -5641,14 +5827,15 @@ float128 float128_round_to_int( float128 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of adding the absolute values of the quadruple-precision > -| floating-point values `a' and `b'. If `zSign' is 1, the sum is negated > -| before being returned. `zSign' is ignored if the result is a NaN. > -| The addition is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. 
> -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of adding the absolute values of the quadruple-precision > +floating-point values `a' and `b'. If `zSign' is 1, the sum is negated > +before being returned. `zSign' is ignored if the result is a NaN. > +The addition is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > static float128 addFloat128Sigs( float128 a, float128 b, flag zSign STATUS_PARAM) > { > int32 aExp, bExp, zExp; > @@ -5727,14 +5914,15 @@ static float128 addFloat128Sigs( float128 a, float128 b, flag zSign STATUS_PARAM > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of subtracting the absolute values of the quadruple- > -| precision floating-point values `a' and `b'. If `zSign' is 1, the > -| difference is negated before being returned. `zSign' is ignored if the > -| result is a NaN. The subtraction is performed according to the IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of subtracting the absolute values of the quadruple- > +precision floating-point values `a' and `b'. If `zSign' is 1, the > +difference is negated before being returned. `zSign' is ignored if the > +result is a NaN. The subtraction is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > static float128 subFloat128Sigs( float128 a, float128 b, flag zSign STATUS_PARAM) > { > int32 aExp, bExp, zExp; > @@ -5811,12 +5999,13 @@ static float128 subFloat128Sigs( float128 a, float128 b, flag zSign STATUS_PARAM > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of adding the quadruple-precision floating-point values > -| `a' and `b'. The operation is performed according to the IEC/IEEE Standard > -| for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of adding the quadruple-precision floating-point values > +`a' and `b'. The operation is performed according to the IEC/IEEE Standard > +for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float128 float128_add( float128 a, float128 b STATUS_PARAM ) > { > flag aSign, bSign; > @@ -5832,12 +6021,13 @@ float128 float128_add( float128 a, float128 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of subtracting the quadruple-precision floating-point > -| values `a' and `b'. The operation is performed according to the IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of subtracting the quadruple-precision floating-point > +values `a' and `b'. The operation is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > float128 float128_sub( float128 a, float128 b STATUS_PARAM ) > { > flag aSign, bSign; > @@ -5853,12 +6043,13 @@ float128 float128_sub( float128 a, float128 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of multiplying the quadruple-precision floating-point > -| values `a' and `b'. The operation is performed according to the IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of multiplying the quadruple-precision floating-point > +values `a' and `b'. The operation is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float128 float128_mul( float128 a, float128 b STATUS_PARAM ) > { > flag aSign, bSign, zSign; > @@ -5917,12 +6108,13 @@ float128 float128_mul( float128 a, float128 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the result of dividing the quadruple-precision floating-point value > -| `a' by the corresponding value `b'. The operation is performed according to > -| the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the result of dividing the quadruple-precision floating-point value > +`a' by the corresponding value `b'. The operation is performed according to > +the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > float128 float128_div( float128 a, float128 b STATUS_PARAM ) > { > flag aSign, bSign, zSign; > @@ -6001,12 +6193,13 @@ float128 float128_div( float128 a, float128 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the remainder of the quadruple-precision floating-point value `a' > -| with respect to the corresponding value `b'. The operation is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the remainder of the quadruple-precision floating-point value `a' > +with respect to the corresponding value `b'. The operation is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > float128 float128_rem( float128 a, float128 b STATUS_PARAM ) > { > flag aSign, zSign; > @@ -6110,12 +6303,13 @@ float128 float128_rem( float128 a, float128 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns the square root of the quadruple-precision floating-point value `a'. > -| The operation is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > - > +/* > +------------------------------------------------------------------------------- > +Returns the square root of the quadruple-precision floating-point value `a'. > +The operation is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > float128 float128_sqrt( float128 a STATUS_PARAM ) > { > flag aSign; > @@ -6179,12 +6373,14 @@ float128 float128_sqrt( float128 a STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the quadruple-precision floating-point value `a' is equal to > -| the corresponding value `b', and 0 otherwise. The invalid exception is > -| raised if either operand is a NaN. Otherwise, the comparison is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the quadruple-precision floating-point value `a' is equal to > +the corresponding value `b', and 0 otherwise. The invalid exception is > +raised if either operand is a NaN. Otherwise, the comparison is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float128_eq( float128 a, float128 b STATUS_PARAM ) > { > @@ -6206,12 +6402,14 @@ int float128_eq( float128 a, float128 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the quadruple-precision floating-point value `a' is less than > -| or equal to the corresponding value `b', and 0 otherwise. The invalid > -| exception is raised if either operand is a NaN. The comparison is performed > -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the quadruple-precision floating-point value `a' is less than > +or equal to the corresponding value `b', and 0 otherwise. The invalid > +exception is raised if either operand is a NaN. The comparison is performed > +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float128_le( float128 a, float128 b STATUS_PARAM ) > { > @@ -6239,12 +6437,14 @@ int float128_le( float128 a, float128 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the quadruple-precision floating-point value `a' is less than > -| the corresponding value `b', and 0 otherwise. The invalid exception is > -| raised if either operand is a NaN. The comparison is performed according > -| to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the quadruple-precision floating-point value `a' is less than > +the corresponding value `b', and 0 otherwise. The invalid exception is > +raised if either operand is a NaN. The comparison is performed according > +to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
> +------------------------------------------------------------------------------- > +*/ > > int float128_lt( float128 a, float128 b STATUS_PARAM ) > { > @@ -6272,12 +6472,14 @@ int float128_lt( float128 a, float128 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the quadruple-precision floating-point values `a' and `b' cannot > -| be compared, and 0 otherwise. The invalid exception is raised if either > -| operand is a NaN. The comparison is performed according to the IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the quadruple-precision floating-point values `a' and `b' cannot > +be compared, and 0 otherwise. The invalid exception is raised if either > +operand is a NaN. The comparison is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float128_unordered( float128 a, float128 b STATUS_PARAM ) > { > @@ -6292,12 +6494,14 @@ int float128_unordered( float128 a, float128 b STATUS_PARAM ) > return 0; > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the quadruple-precision floating-point value `a' is equal to > -| the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an > -| exception. The comparison is performed according to the IEC/IEEE Standard > -| for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the quadruple-precision floating-point value `a' is equal to > +the corresponding value `b', and 0 otherwise. 
Quiet NaNs do not cause an > +exception. The comparison is performed according to the IEC/IEEE Standard > +for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float128_eq_quiet( float128 a, float128 b STATUS_PARAM ) > { > @@ -6322,12 +6526,14 @@ int float128_eq_quiet( float128 a, float128 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the quadruple-precision floating-point value `a' is less than > -| or equal to the corresponding value `b', and 0 otherwise. Quiet NaNs do not > -| cause an exception. Otherwise, the comparison is performed according to the > -| IEC/IEEE Standard for Binary Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the quadruple-precision floating-point value `a' is less than > +or equal to the corresponding value `b', and 0 otherwise. Quiet NaNs do not > +cause an exception. Otherwise, the comparison is performed according to the > +IEC/IEEE Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float128_le_quiet( float128 a, float128 b STATUS_PARAM ) > { > @@ -6358,12 +6564,14 @@ int float128_le_quiet( float128 a, float128 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the quadruple-precision floating-point value `a' is less than > -| the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an > -| exception. Otherwise, the comparison is performed according to the IEC/IEEE > -| Standard for Binary Floating-Point Arithmetic. 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the quadruple-precision floating-point value `a' is less than > +the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an > +exception. Otherwise, the comparison is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float128_lt_quiet( float128 a, float128 b STATUS_PARAM ) > { > @@ -6394,12 +6602,14 @@ int float128_lt_quiet( float128 a, float128 b STATUS_PARAM ) > > } > > -/*---------------------------------------------------------------------------- > -| Returns 1 if the quadruple-precision floating-point values `a' and `b' cannot > -| be compared, and 0 otherwise. Quiet NaNs do not cause an exception. The > -| comparison is performed according to the IEC/IEEE Standard for Binary > -| Floating-Point Arithmetic. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Returns 1 if the quadruple-precision floating-point values `a' and `b' cannot > +be compared, and 0 otherwise. Quiet NaNs do not cause an exception. The > +comparison is performed according to the IEC/IEEE Standard for Binary > +Floating-Point Arithmetic. > +------------------------------------------------------------------------------- > +*/ > > int float128_unordered_quiet( float128 a, float128 b STATUS_PARAM ) > { > diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h > index f3927e2..b646621 100644 > --- a/include/fpu/softfloat.h > +++ b/include/fpu/softfloat.h > @@ -4,10 +4,11 @@ > * Derived from SoftFloat. 
> */ > > -/*============================================================================ > +/* > +============================================================================ > > -This C header file is part of the SoftFloat IEC/IEEE Floating-point Arithmetic > -Package, Release 2b. > +This C header file is part of the SoftFloat IEC/IEEE Floating-point > +Arithmetic Package, Release 2a. > > Written by John R. Hauser. This work was made possible in part by the > International Computer Science Institute, located at Suite 600, 1947 Center > @@ -16,24 +17,22 @@ National Science Foundation under grant MIP-9311980. The original version > of this code was written as part of a project to build a fixed-point vector > processor in collaboration with the University of California at Berkeley, > overseen by Profs. Nelson Morgan and John Wawrzynek. More information > -is available through the Web page `http://www.cs.berkeley.edu/~jhauser/ > +is available through the Web page `http://HTTP.CS.Berkeley.EDU/~jhauser/ > arithmetic/SoftFloat.html'. > > -THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort has > -been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT TIMES > -RESULT IN INCORRECT BEHAVIOR. USE OF THIS SOFTWARE IS RESTRICTED TO PERSONS > -AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ALL LOSSES, > -COSTS, OR OTHER PROBLEMS THEY INCUR DUE TO THE SOFTWARE, AND WHO FURTHERMORE > -EFFECTIVELY INDEMNIFY JOHN HAUSER AND THE INTERNATIONAL COMPUTER SCIENCE > -INSTITUTE (possibly via similar legal warning) AGAINST ALL LOSSES, COSTS, OR > -OTHER PROBLEMS INCURRED BY THEIR CUSTOMERS AND CLIENTS DUE TO THE SOFTWARE. > +THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort > +has been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT > +TIMES RESULT IN INCORRECT BEHAVIOR. 
USE OF THIS SOFTWARE IS RESTRICTED TO > +PERSONS AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ANY > +AND ALL LOSSES, COSTS, OR OTHER PROBLEMS ARISING FROM ITS USE. > > Derivative works are acceptable, even for commercial purposes, so long as > -(1) the source code for the derivative work includes prominent notice that > -the work is derivative, and (2) the source code includes prominent notice with > -these four paragraphs for those parts of this code that are retained. > +(1) they include prominent notice that the work is derivative, and (2) they > +include prominent notice akin to these four paragraphs for those parts of > +this code that are retained. > > -=============================================================================*/ > +=============================================================================== > +*/ > > #ifndef SOFTFLOAT_H > #define SOFTFLOAT_H > @@ -46,14 +45,16 @@ these four paragraphs for those parts of this code that are retained. > #include "config-host.h" > #include "qemu/osdep.h" > > -/*---------------------------------------------------------------------------- > -| Each of the following `typedef's defines the most convenient type that holds > -| integers of at least as many bits as specified. For example, `uint8' should > -| be the most convenient type that can hold unsigned integers of as many as > -| 8 bits. The `flag' type must be able to hold either a 0 or 1. For most > -| implementations of C, `flag', `uint8', and `int8' should all be `typedef'ed > -| to the same as `int'. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Each of the following `typedef's defines the most convenient type that holds > +integers of at least as many bits as specified. For example, `uint8' should > +be the most convenient type that can hold unsigned integers of as many as > +8 bits. 
The `flag' type must be able to hold either a 0 or 1. For most > +implementations of C, `flag', `uint8', and `int8' should all be `typedef'ed > +to the same as `int'. > +------------------------------------------------------------------------------- > +*/ > typedef uint8_t flag; > typedef uint8_t uint8; > typedef int8_t int8; > @@ -69,9 +70,11 @@ typedef int64_t int64; > #define STATUS(field) status->field > #define STATUS_VAR , status > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE floating-point ordering relations > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Software IEC/IEEE floating-point ordering relations > +------------------------------------------------------------------------------- > +*/ > enum { > float_relation_less = -1, > float_relation_equal = 0, > @@ -79,9 +82,11 @@ enum { > float_relation_unordered = 2 > }; > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE floating-point types. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Software IEC/IEEE floating-point types. > +------------------------------------------------------------------------------- > +*/ > /* Use structures for soft-float types. This prevents accidentally mixing > them with native int/float types. A sufficiently clever compiler and > sane ABI should be able to see though these structs. 
However > @@ -137,17 +142,21 @@ typedef struct { > #define make_float128(high_, low_) ((float128) { .high = high_, .low = low_ }) > #define make_float128_init(high_, low_) { .high = high_, .low = low_ } > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE floating-point underflow tininess-detection mode. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Software IEC/IEEE floating-point underflow tininess-detection mode. > +------------------------------------------------------------------------------- > +*/ > enum { > float_tininess_after_rounding = 0, > float_tininess_before_rounding = 1 > }; > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE floating-point rounding mode. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Software IEC/IEEE floating-point rounding mode. > +------------------------------------------------------------------------------- > +*/ > enum { > float_round_nearest_even = 0, > float_round_down = 1, > @@ -155,9 +164,11 @@ enum { > float_round_to_zero = 3 > }; > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE floating-point exception flags. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Software IEC/IEEE floating-point exception flags. 
> +------------------------------------------------------------------------------- > +*/ > enum { > float_flag_invalid = 1, > float_flag_divbyzero = 4, > @@ -167,7 +178,6 @@ enum { > float_flag_input_denormal = 64, > float_flag_output_denormal = 128 > }; > - > typedef struct float_status { > signed char float_detect_tininess; > signed char float_rounding_mode; > @@ -204,27 +214,33 @@ INLINE int get_float_exception_flags(float_status *status) > } > void set_floatx80_rounding_precision(int val STATUS_PARAM); > > -/*---------------------------------------------------------------------------- > -| Routine to raise any or all of the software IEC/IEEE floating-point > -| exception flags. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Routine to raise any or all of the software IEC/IEEE floating-point > +exception flags. > +------------------------------------------------------------------------------- > +*/ > void float_raise( int8 flags STATUS_PARAM); > > -/*---------------------------------------------------------------------------- > -| Options to indicate which negations to perform in float*_muladd() > -| Using these differs from negating an input or output before calling > -| the muladd function in that this means that a NaN doesn't have its > -| sign bit inverted before it is propagated. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Options to indicate which negations to perform in float*_muladd() > +Using these differs from negating an input or output before calling > +the muladd function in that this means that a NaN doesn't have its > +sign bit inverted before it is propagated. 
> +------------------------------------------------------------------------------- > +*/ > enum { > float_muladd_negate_c = 1, > float_muladd_negate_product = 2, > float_muladd_negate_result = 4, > }; > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE integer-to-floating-point conversion routines. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Software IEC/IEEE integer-to-floating-point conversion routines. > +------------------------------------------------------------------------------- > +*/ > float32 int32_to_float32( int32 STATUS_PARAM ); > float64 int32_to_float64( int32 STATUS_PARAM ); > float32 uint32_to_float32( uint32 STATUS_PARAM ); > @@ -239,15 +255,19 @@ floatx80 int64_to_floatx80( int64 STATUS_PARAM ); > float128 int64_to_float128( int64 STATUS_PARAM ); > float128 uint64_to_float128( uint64 STATUS_PARAM ); > > -/*---------------------------------------------------------------------------- > -| Software half-precision conversion routines. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Software half-precision conversion routines. > +*---------------------------------------------------------------------------- > +*/ > float16 float32_to_float16( float32, flag STATUS_PARAM ); > float32 float16_to_float32( float16, flag STATUS_PARAM ); > > -/*---------------------------------------------------------------------------- > -| Software half-precision operations. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Software half-precision operations. 
> +------------------------------------------------------------------------------- > +*/ > int float16_is_quiet_nan( float16 ); > int float16_is_signaling_nan( float16 ); > float16 float16_maybe_silence_nan( float16 ); > @@ -257,14 +277,18 @@ INLINE int float16_is_any_nan(float16 a) > return ((float16_val(a) & ~0x8000) > 0x7c00); > } > > -/*---------------------------------------------------------------------------- > -| The pattern for a default generated half-precision NaN. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +The pattern for a default generated half-precision NaN. > +------------------------------------------------------------------------------- > +*/ > extern const float16 float16_default_nan; > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE single-precision conversion routines. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Software IEC/IEEE single-precision conversion routines. > +------------------------------------------------------------------------------- > +*/ > int_fast16_t float32_to_int16_round_to_zero(float32 STATUS_PARAM); > uint_fast16_t float32_to_uint16_round_to_zero(float32 STATUS_PARAM); > int32 float32_to_int32( float32 STATUS_PARAM ); > @@ -277,9 +301,11 @@ float64 float32_to_float64( float32 STATUS_PARAM ); > floatx80 float32_to_floatx80( float32 STATUS_PARAM ); > float128 float32_to_float128( float32 STATUS_PARAM ); > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE single-precision operations. 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Software IEC/IEEE single-precision operations. > +------------------------------------------------------------------------------- > +*/ > float32 float32_round_to_int( float32 STATUS_PARAM ); > float32 float32_add( float32, float32 STATUS_PARAM ); > float32 float32_sub( float32, float32 STATUS_PARAM ); > @@ -361,14 +387,18 @@ INLINE float32 float32_set_sign(float32 a, int sign) > #define float32_infinity make_float32(0x7f800000) > > > -/*---------------------------------------------------------------------------- > -| The pattern for a default generated single-precision NaN. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +The pattern for a default generated single-precision NaN. > +------------------------------------------------------------------------------- > +*/ > extern const float32 float32_default_nan; > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE double-precision conversion routines. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Software IEC/IEEE double-precision conversion routines. 
> +------------------------------------------------------------------------------- > +*/ > int_fast16_t float64_to_int16_round_to_zero(float64 STATUS_PARAM); > uint_fast16_t float64_to_uint16_round_to_zero(float64 STATUS_PARAM); > int32 float64_to_int32( float64 STATUS_PARAM ); > @@ -383,9 +413,11 @@ float32 float64_to_float32( float64 STATUS_PARAM ); > floatx80 float64_to_floatx80( float64 STATUS_PARAM ); > float128 float64_to_float128( float64 STATUS_PARAM ); > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE double-precision operations. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Software IEC/IEEE double-precision operations. > +------------------------------------------------------------------------------- > +*/ > float64 float64_round_to_int( float64 STATUS_PARAM ); > float64 float64_trunc_to_int( float64 STATUS_PARAM ); > float64 float64_add( float64, float64 STATUS_PARAM ); > @@ -467,14 +499,18 @@ INLINE float64 float64_set_sign(float64 a, int sign) > #define float64_half make_float64(0x3fe0000000000000LL) > #define float64_infinity make_float64(0x7ff0000000000000LL) > > -/*---------------------------------------------------------------------------- > -| The pattern for a default generated double-precision NaN. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +The pattern for a default generated double-precision NaN. > +------------------------------------------------------------------------------- > +*/ > extern const float64 float64_default_nan; > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE extended double-precision conversion routines. 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Software IEC/IEEE extended double-precision conversion routines. > +------------------------------------------------------------------------------- > +*/ > int32 floatx80_to_int32( floatx80 STATUS_PARAM ); > int32 floatx80_to_int32_round_to_zero( floatx80 STATUS_PARAM ); > int64 floatx80_to_int64( floatx80 STATUS_PARAM ); > @@ -483,9 +519,11 @@ float32 floatx80_to_float32( floatx80 STATUS_PARAM ); > float64 floatx80_to_float64( floatx80 STATUS_PARAM ); > float128 floatx80_to_float128( floatx80 STATUS_PARAM ); > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE extended double-precision operations. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Software IEC/IEEE extended double-precision operations. > +------------------------------------------------------------------------------- > +*/ > floatx80 floatx80_round_to_int( floatx80 STATUS_PARAM ); > floatx80 floatx80_add( floatx80, floatx80 STATUS_PARAM ); > floatx80 floatx80_sub( floatx80, floatx80 STATUS_PARAM ); > @@ -552,14 +590,18 @@ INLINE int floatx80_is_any_nan(floatx80 a) > #define floatx80_half make_floatx80(0x3ffe, 0x8000000000000000LL) > #define floatx80_infinity make_floatx80(0x7fff, 0x8000000000000000LL) > > -/*---------------------------------------------------------------------------- > -| The pattern for a default generated extended double-precision NaN. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +The pattern for a default generated extended double-precision NaN. 
> +------------------------------------------------------------------------------- > +*/ > extern const floatx80 floatx80_default_nan; > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE quadruple-precision conversion routines. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Software IEC/IEEE quadruple-precision conversion routines. > +------------------------------------------------------------------------------- > +*/ > int32 float128_to_int32( float128 STATUS_PARAM ); > int32 float128_to_int32_round_to_zero( float128 STATUS_PARAM ); > int64 float128_to_int64( float128 STATUS_PARAM ); > @@ -568,9 +610,11 @@ float32 float128_to_float32( float128 STATUS_PARAM ); > float64 float128_to_float64( float128 STATUS_PARAM ); > floatx80 float128_to_floatx80( float128 STATUS_PARAM ); > > -/*---------------------------------------------------------------------------- > -| Software IEC/IEEE quadruple-precision operations. > -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +Software IEC/IEEE quadruple-precision operations. > +------------------------------------------------------------------------------- > +*/ > float128 float128_round_to_int( float128 STATUS_PARAM ); > float128 float128_add( float128, float128 STATUS_PARAM ); > float128 float128_sub( float128, float128 STATUS_PARAM ); > @@ -633,9 +677,11 @@ INLINE int float128_is_any_nan(float128 a) > > #define float128_zero make_float128(0, 0) > > -/*---------------------------------------------------------------------------- > -| The pattern for a default generated quadruple-precision NaN. 
> -*----------------------------------------------------------------------------*/ > +/* > +------------------------------------------------------------------------------- > +The pattern for a default generated quadruple-precision NaN. > +------------------------------------------------------------------------------- > +*/ > extern const float128 float128_default_nan; > > #endif /* !SOFTFLOAT_H */ > -- > 1.8.0 >
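The float128_lt_quiet and float128_unordered_quiet comments reformatted above describe "quiet" comparisons, where a quiet NaN operand yields a result without raising an exception. As a rough native-C analogy only (not QEMU or SoftFloat code), the standard <math.h> macros isless() and isunordered() behave the same way for host doubles. The helper names below are hypothetical and exist purely for illustration.

```c
#include <math.h>

/*
 * Hypothetical helpers, not part of QEMU or SoftFloat.  They mirror the
 * documented return values of the *_lt_quiet and *_unordered_quiet
 * routines using host doubles instead of float128.
 */

/* Returns 1 if a < b, and 0 otherwise (including when either is a NaN). */
static int lt_quiet_sketch(double a, double b)
{
    return isless(a, b) ? 1 : 0;
}

/* Returns 1 if a and b cannot be compared (either is a NaN), else 0. */
static int unordered_quiet_sketch(double a, double b)
{
    return isunordered(a, b) ? 1 : 0;
}
```

With these definitions, lt_quiet_sketch(1.0, 2.0) is 1, while lt_quiet_sketch(NAN, 2.0) is 0 and unordered_quiet_sketch(NAN, 2.0) is 1.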
On 1 May 2013 18:53, Blue Swirl <blauwirbel@gmail.com> wrote:
> On Mon, Apr 29, 2013 at 6:05 PM, Anthony Liguori <aliguori@us.ibm.com> wrote:
>> d07cca0 Add native softfloat fpu functions (Christoph Egger)
>
> d07cca0 was supplied by Christoph Egger (cc'd):
> http://lists.nongnu.org/archive/html/qemu-devel/2008-11/msg00939.html

As it happens the only fpu file that patch touches is
the now-deleted softfloat-native.h, so I think it's ok
anyway?

-- PMM
On Wed, May 1, 2013 at 5:57 PM, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 1 May 2013 18:53, Blue Swirl <blauwirbel@gmail.com> wrote:
>> On Mon, Apr 29, 2013 at 6:05 PM, Anthony Liguori <aliguori@us.ibm.com> wrote:
>>> d07cca0 Add native softfloat fpu functions (Christoph Egger)
>
>> d07cca0 was supplied by Christoph Egger (cc'd):
>> http://lists.nongnu.org/archive/html/qemu-devel/2008-11/msg00939.html
>
> As it happens the only fpu file that patch touches is
> the now-deleted softfloat-native.h, so I think it's ok
> anyway?

Right, that should be fine too.

>
> -- PMM
> The license of SoftFloat-2b is claimed to be GPLv2 incompatible by
> the FSF due to an indemnification clause.  The previous release,
> SoftFloat-2a, did not contain this clause.  The only changes between
> these two versions as far as QEMU is concerned is the license change
> and a global modification of the comment structure.  This patch rebases
> our softfloat code to SoftFloat-2a in order to have a GPLv2 compatible
> license.

Acked-by: Paul Brook <paul@codesourcery.com>
On 29.04.2013 20:05, Anthony Liguori wrote:
> The license of SoftFloat-2b is claimed to be GPLv2 incompatible by
> the FSF due to an indemnification clause.  The previous release,
> SoftFloat-2a, did not contain this clause.  The only changes between
> these two versions as far as QEMU is concerned is the license change
> and a global modification of the comment structure.  This patch rebases
> our softfloat code to SoftFloat-2a in order to have a GPLv2 compatible
> license.

Acked-by: Christophe Lyon <christophe.lyon@st.com>
On 29 April 2013 19:05, Anthony Liguori <aliguori@us.ibm.com> wrote:
> N.B. If you are on CC, see after the '---' for a requested action!
>
> The license of SoftFloat-2b is claimed to be GPLv2 incompatible by
> the FSF due to an indemnification clause.  The previous release,
> SoftFloat-2a, did not contain this clause.  The only changes between
> these two versions as far as QEMU is concerned is the license change
> and a global modification of the comment structure.  This patch rebases
> our softfloat code to SoftFloat-2a in order to have a GPLv2 compatible
> license.

Acked-by: Peter Maydell <peter.maydell@linaro.org>

Linaro is happy to relicense our softfloat changes under the
Softfloat-2a license.

thanks
-- PMM
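Several of the fpu/softfloat-macros.h comments touched by this patch describe "jamming" right shifts: any nonzero bits shifted off are folded into the least significant bit of the result, and the shift count may be arbitrarily large. A minimal standalone sketch of that documented behaviour for the 64-bit case might look like the following. It is not the QEMU implementation; the function name is hypothetical, and it returns the result rather than storing it through a `zPtr' argument.

```c
#include <stdint.h>

/*
 * Hypothetical sketch of a 64-bit right shift with "jamming"; not the
 * code from fpu/softfloat-macros.h.  The least significant bit of the
 * result is set to 1 if any nonzero bit was shifted off.
 */
static uint64_t shift64_right_jamming_sketch(uint64_t a, unsigned count)
{
    if (count == 0) {
        /* Nothing is shifted off, so the value is unchanged. */
        return a;
    }
    if (count < 64) {
        /* Collect the bits that fall off the right-hand end. */
        uint64_t shifted_off = a & ((UINT64_C(1) << count) - 1);
        return (a >> count) | (uint64_t)(shifted_off != 0);
    }
    /* Every bit is shifted off: the result is 1 if a was nonzero. */
    return a != 0;
}
```

For example, shifting 0x20 right by 4 gives 2 because only zero bits are lost, while shifting 0x21 right by 4 gives 3 because the lost low bit is jammed into bit 0. This keeps the "sticky" information that rounding code needs in order to tell an exact result from an inexact one.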
diff --git a/fpu/softfloat-macros.h b/fpu/softfloat-macros.h index b5164af..2009315 100644 --- a/fpu/softfloat-macros.h +++ b/fpu/softfloat-macros.h @@ -4,10 +4,11 @@ * Derived from SoftFloat. */ -/*============================================================================ +/* +=============================================================================== This C source fragment is part of the SoftFloat IEC/IEEE Floating-point -Arithmetic Package, Release 2b. +Arithmetic Package, Release 2a. Written by John R. Hauser. This work was made possible in part by the International Computer Science Institute, located at Suite 600, 1947 Center @@ -16,28 +17,27 @@ National Science Foundation under grant MIP-9311980. The original version of this code was written as part of a project to build a fixed-point vector processor in collaboration with the University of California at Berkeley, overseen by Profs. Nelson Morgan and John Wawrzynek. More information -is available through the Web page `http://www.cs.berkeley.edu/~jhauser/ +is available through the Web page `http://HTTP.CS.Berkeley.EDU/~jhauser/ arithmetic/SoftFloat.html'. -THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort has -been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT TIMES -RESULT IN INCORRECT BEHAVIOR. USE OF THIS SOFTWARE IS RESTRICTED TO PERSONS -AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ALL LOSSES, -COSTS, OR OTHER PROBLEMS THEY INCUR DUE TO THE SOFTWARE, AND WHO FURTHERMORE -EFFECTIVELY INDEMNIFY JOHN HAUSER AND THE INTERNATIONAL COMPUTER SCIENCE -INSTITUTE (possibly via similar legal notice) AGAINST ALL LOSSES, COSTS, OR -OTHER PROBLEMS INCURRED BY THEIR CUSTOMERS AND CLIENTS DUE TO THE SOFTWARE. +THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort +has been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT +TIMES RESULT IN INCORRECT BEHAVIOR. 
USE OF THIS SOFTWARE IS RESTRICTED TO +PERSONS AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ANY +AND ALL LOSSES, COSTS, OR OTHER PROBLEMS ARISING FROM ITS USE. Derivative works are acceptable, even for commercial purposes, so long as -(1) the source code for the derivative work includes prominent notice that -the work is derivative, and (2) the source code includes prominent notice with -these four paragraphs for those parts of this code that are retained. +(1) they include prominent notice that the work is derivative, and (2) they +include prominent notice akin to these four paragraphs for those parts of +this code that are retained. =============================================================================*/ -/*---------------------------------------------------------------------------- -| This macro tests for minimum version of the GNU C compiler. -*----------------------------------------------------------------------------*/ +/* +------------------------------------------------------------------------------- +This macro tests for minimum version of the GNU C compiler. +------------------------------------------------------------------------------- +*/ #if defined(__GNUC__) && defined(__GNUC_MINOR__) # define SOFTFLOAT_GNUC_PREREQ(maj, min) \ ((__GNUC__ << 16) + __GNUC_MINOR__ >= ((maj) << 16) + (min)) @@ -46,14 +46,16 @@ these four paragraphs for those parts of this code that are retained. #endif -/*---------------------------------------------------------------------------- -| Shifts `a' right by the number of bits given in `count'. If any nonzero -| bits are shifted off, they are ``jammed'' into the least significant bit of -| the result by setting the least significant bit to 1. The value of `count' -| can be arbitrarily large; in particular, if `count' is greater than 32, the -| result will be either 0 or 1, depending on whether `a' is zero or nonzero. -| The result is stored in the location pointed to by `zPtr'. 
-*----------------------------------------------------------------------------*/ +/* +------------------------------------------------------------------------------- +Shifts `a' right by the number of bits given in `count'. If any nonzero +bits are shifted off, they are ``jammed'' into the least significant bit of +the result by setting the least significant bit to 1. The value of `count' +can be arbitrarily large; in particular, if `count' is greater than 32, the +result will be either 0 or 1, depending on whether `a' is zero or nonzero. +The result is stored in the location pointed to by `zPtr'. +------------------------------------------------------------------------------- +*/ INLINE void shift32RightJamming(uint32_t a, int_fast16_t count, uint32_t *zPtr) { @@ -72,14 +74,16 @@ INLINE void shift32RightJamming(uint32_t a, int_fast16_t count, uint32_t *zPtr) } -/*---------------------------------------------------------------------------- -| Shifts `a' right by the number of bits given in `count'. If any nonzero -| bits are shifted off, they are ``jammed'' into the least significant bit of -| the result by setting the least significant bit to 1. The value of `count' -| can be arbitrarily large; in particular, if `count' is greater than 64, the -| result will be either 0 or 1, depending on whether `a' is zero or nonzero. -| The result is stored in the location pointed to by `zPtr'. -*----------------------------------------------------------------------------*/ +/* +------------------------------------------------------------------------------- +Shifts `a' right by the number of bits given in `count'. If any nonzero +bits are shifted off, they are ``jammed'' into the least significant bit of +the result by setting the least significant bit to 1. The value of `count' +can be arbitrarily large; in particular, if `count' is greater than 64, the +result will be either 0 or 1, depending on whether `a' is zero or nonzero. 
+The result is stored in the location pointed to by `zPtr'. +------------------------------------------------------------------------------- +*/ INLINE void shift64RightJamming(uint64_t a, int_fast16_t count, uint64_t *zPtr) { @@ -98,23 +102,24 @@ INLINE void shift64RightJamming(uint64_t a, int_fast16_t count, uint64_t *zPtr) } -/*---------------------------------------------------------------------------- -| Shifts the 128-bit value formed by concatenating `a0' and `a1' right by 64 -| _plus_ the number of bits given in `count'. The shifted result is at most -| 64 nonzero bits; this is stored at the location pointed to by `z0Ptr'. The -| bits shifted off form a second 64-bit result as follows: The _last_ bit -| shifted off is the most-significant bit of the extra result, and the other -| 63 bits of the extra result are all zero if and only if _all_but_the_last_ -| bits shifted off were all zero. This extra result is stored in the location -| pointed to by `z1Ptr'. The value of `count' can be arbitrarily large. -| (This routine makes more sense if `a0' and `a1' are considered to form -| a fixed-point value with binary point between `a0' and `a1'. This fixed- -| point value is shifted right by the number of bits given in `count', and -| the integer part of the result is returned at the location pointed to by -| `z0Ptr'. The fractional part of the result may be slightly corrupted as -| described above, and is returned at the location pointed to by `z1Ptr'.) -*----------------------------------------------------------------------------*/ - +/* +------------------------------------------------------------------------------- +Shifts the 128-bit value formed by concatenating `a0' and `a1' right by 64 +_plus_ the number of bits given in `count'. The shifted result is at most +64 nonzero bits; this is stored at the location pointed to by `z0Ptr'. 
The +bits shifted off form a second 64-bit result as follows: The _last_ bit +shifted off is the most-significant bit of the extra result, and the other +63 bits of the extra result are all zero if and only if _all_but_the_last_ +bits shifted off were all zero. This extra result is stored in the location +pointed to by `z1Ptr'. The value of `count' can be arbitrarily large. + (This routine makes more sense if `a0' and `a1' are considered to form a +fixed-point value with binary point between `a0' and `a1'. This fixed-point +value is shifted right by the number of bits given in `count', and the +integer part of the result is returned at the location pointed to by +`z0Ptr'. The fractional part of the result may be slightly corrupted as +described above, and is returned at the location pointed to by `z1Ptr'.) +------------------------------------------------------------------------------- +*/ INLINE void shift64ExtraRightJamming( uint64_t a0, uint64_t a1, int_fast16_t count, uint64_t *z0Ptr, uint64_t *z1Ptr) @@ -144,14 +149,15 @@ INLINE void } -/*---------------------------------------------------------------------------- -| Shifts the 128-bit value formed by concatenating `a0' and `a1' right by the -| number of bits given in `count'. Any bits shifted off are lost. The value -| of `count' can be arbitrarily large; in particular, if `count' is greater -| than 128, the result will be 0. The result is broken into two 64-bit pieces -| which are stored at the locations pointed to by `z0Ptr' and `z1Ptr'. -*----------------------------------------------------------------------------*/ - +/* +------------------------------------------------------------------------------- +Shifts the 128-bit value formed by concatenating `a0' and `a1' right by the +number of bits given in `count'. Any bits shifted off are lost. The value +of `count' can be arbitrarily large; in particular, if `count' is greater +than 128, the result will be 0. 
The result is broken into two 64-bit pieces +which are stored at the locations pointed to by `z0Ptr' and `z1Ptr'. +------------------------------------------------------------------------------- +*/ INLINE void shift128Right( uint64_t a0, uint64_t a1, int_fast16_t count, uint64_t *z0Ptr, uint64_t *z1Ptr) @@ -176,17 +182,18 @@ INLINE void } -/*---------------------------------------------------------------------------- -| Shifts the 128-bit value formed by concatenating `a0' and `a1' right by the -| number of bits given in `count'. If any nonzero bits are shifted off, they -| are ``jammed'' into the least significant bit of the result by setting the -| least significant bit to 1. The value of `count' can be arbitrarily large; -| in particular, if `count' is greater than 128, the result will be either -| 0 or 1, depending on whether the concatenation of `a0' and `a1' is zero or -| nonzero. The result is broken into two 64-bit pieces which are stored at -| the locations pointed to by `z0Ptr' and `z1Ptr'. -*----------------------------------------------------------------------------*/ - +/* +------------------------------------------------------------------------------- +Shifts the 128-bit value formed by concatenating `a0' and `a1' right by the +number of bits given in `count'. If any nonzero bits are shifted off, they +are ``jammed'' into the least significant bit of the result by setting the +least significant bit to 1. The value of `count' can be arbitrarily large; +in particular, if `count' is greater than 128, the result will be either +0 or 1, depending on whether the concatenation of `a0' and `a1' is zero or +nonzero. The result is broken into two 64-bit pieces which are stored at +the locations pointed to by `z0Ptr' and `z1Ptr'. 
+-------------------------------------------------------------------------------
+*/
 INLINE void
 shift128RightJamming(
 uint64_t a0, uint64_t a1, int_fast16_t count, uint64_t *z0Ptr, uint64_t *z1Ptr)
@@ -219,25 +226,26 @@ INLINE void
 }
-/*----------------------------------------------------------------------------
-| Shifts the 192-bit value formed by concatenating `a0', `a1', and `a2' right
-| by 64 _plus_ the number of bits given in `count'. The shifted result is
-| at most 128 nonzero bits; these are broken into two 64-bit pieces which are
-| stored at the locations pointed to by `z0Ptr' and `z1Ptr'. The bits shifted
-| off form a third 64-bit result as follows: The _last_ bit shifted off is
-| the most-significant bit of the extra result, and the other 63 bits of the
-| extra result are all zero if and only if _all_but_the_last_ bits shifted off
-| were all zero. This extra result is stored in the location pointed to by
-| `z2Ptr'. The value of `count' can be arbitrarily large.
-| (This routine makes more sense if `a0', `a1', and `a2' are considered
-| to form a fixed-point value with binary point between `a1' and `a2'. This
-| fixed-point value is shifted right by the number of bits given in `count',
-| and the integer part of the result is returned at the locations pointed to
-| by `z0Ptr' and `z1Ptr'. The fractional part of the result may be slightly
-| corrupted as described above, and is returned at the location pointed to by
-| `z2Ptr'.)
-*----------------------------------------------------------------------------*/
-
+/*
+-------------------------------------------------------------------------------
+Shifts the 192-bit value formed by concatenating `a0', `a1', and `a2' right
+by 64 _plus_ the number of bits given in `count'. The shifted result is
+at most 128 nonzero bits; these are broken into two 64-bit pieces which are
+stored at the locations pointed to by `z0Ptr' and `z1Ptr'.
The bits shifted
+off form a third 64-bit result as follows: The _last_ bit shifted off is
+the most-significant bit of the extra result, and the other 63 bits of the
+extra result are all zero if and only if _all_but_the_last_ bits shifted off
+were all zero. This extra result is stored in the location pointed to by
+`z2Ptr'. The value of `count' can be arbitrarily large.
+ (This routine makes more sense if `a0', `a1', and `a2' are considered
+to form a fixed-point value with binary point between `a1' and `a2'. This
+fixed-point value is shifted right by the number of bits given in `count',
+and the integer part of the result is returned at the locations pointed to
+by `z0Ptr' and `z1Ptr'. The fractional part of the result may be slightly
+corrupted as described above, and is returned at the location pointed to by
+`z2Ptr'.)
+-------------------------------------------------------------------------------
+*/
 INLINE void
 shift128ExtraRightJamming(
 uint64_t a0,
@@ -289,13 +297,14 @@ INLINE void
 }
-/*----------------------------------------------------------------------------
-| Shifts the 128-bit value formed by concatenating `a0' and `a1' left by the
-| number of bits given in `count'. Any bits shifted off are lost. The value
-| of `count' must be less than 64. The result is broken into two 64-bit
-| pieces which are stored at the locations pointed to by `z0Ptr' and `z1Ptr'.
-*----------------------------------------------------------------------------*/
-
+/*
+-------------------------------------------------------------------------------
+Shifts the 128-bit value formed by concatenating `a0' and `a1' left by the
+number of bits given in `count'. Any bits shifted off are lost. The value
+of `count' must be less than 64. The result is broken into two 64-bit
+pieces which are stored at the locations pointed to by `z0Ptr' and `z1Ptr'.
+-------------------------------------------------------------------------------
+*/
 INLINE void
 shortShift128Left(
 uint64_t a0, uint64_t a1, int_fast16_t count, uint64_t *z0Ptr, uint64_t *z1Ptr)
@@ -307,14 +316,15 @@ INLINE void
 }
-/*----------------------------------------------------------------------------
-| Shifts the 192-bit value formed by concatenating `a0', `a1', and `a2' left
-| by the number of bits given in `count'. Any bits shifted off are lost.
-| The value of `count' must be less than 64. The result is broken into three
-| 64-bit pieces which are stored at the locations pointed to by `z0Ptr',
-| `z1Ptr', and `z2Ptr'.
-*----------------------------------------------------------------------------*/
-
+/*
+-------------------------------------------------------------------------------
+Shifts the 192-bit value formed by concatenating `a0', `a1', and `a2' left
+by the number of bits given in `count'. Any bits shifted off are lost.
+The value of `count' must be less than 64. The result is broken into three
+64-bit pieces which are stored at the locations pointed to by `z0Ptr',
+`z1Ptr', and `z2Ptr'.
+-------------------------------------------------------------------------------
+*/
 INLINE void
 shortShift192Left(
 uint64_t a0,
@@ -343,13 +353,14 @@ INLINE void
 }
-/*----------------------------------------------------------------------------
-| Adds the 128-bit value formed by concatenating `a0' and `a1' to the 128-bit
-| value formed by concatenating `b0' and `b1'. Addition is modulo 2^128, so
-| any carry out is lost. The result is broken into two 64-bit pieces which
-| are stored at the locations pointed to by `z0Ptr' and `z1Ptr'.
-*----------------------------------------------------------------------------*/
-
+/*
+-------------------------------------------------------------------------------
+Adds the 128-bit value formed by concatenating `a0' and `a1' to the 128-bit
+value formed by concatenating `b0' and `b1'.
Addition is modulo 2^128, so
+any carry out is lost. The result is broken into two 64-bit pieces which
+are stored at the locations pointed to by `z0Ptr' and `z1Ptr'.
+-------------------------------------------------------------------------------
+*/
 INLINE void
 add128(
 uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1, uint64_t *z0Ptr, uint64_t *z1Ptr )
@@ -362,14 +373,15 @@ INLINE void
 }
-/*----------------------------------------------------------------------------
-| Adds the 192-bit value formed by concatenating `a0', `a1', and `a2' to the
-| 192-bit value formed by concatenating `b0', `b1', and `b2'. Addition is
-| modulo 2^192, so any carry out is lost. The result is broken into three
-| 64-bit pieces which are stored at the locations pointed to by `z0Ptr',
-| `z1Ptr', and `z2Ptr'.
-*----------------------------------------------------------------------------*/
-
+/*
+-------------------------------------------------------------------------------
+Adds the 192-bit value formed by concatenating `a0', `a1', and `a2' to the
+192-bit value formed by concatenating `b0', `b1', and `b2'. Addition is
+modulo 2^192, so any carry out is lost. The result is broken into three
+64-bit pieces which are stored at the locations pointed to by `z0Ptr',
+`z1Ptr', and `z2Ptr'.
+-------------------------------------------------------------------------------
+*/
 INLINE void
 add192(
 uint64_t a0,
@@ -400,14 +412,15 @@ INLINE void
 }
-/*----------------------------------------------------------------------------
-| Subtracts the 128-bit value formed by concatenating `b0' and `b1' from the
-| 128-bit value formed by concatenating `a0' and `a1'. Subtraction is modulo
-| 2^128, so any borrow out (carry out) is lost. The result is broken into two
-| 64-bit pieces which are stored at the locations pointed to by `z0Ptr' and
-| `z1Ptr'.
-*----------------------------------------------------------------------------*/
-
+/*
+-------------------------------------------------------------------------------
+Subtracts the 128-bit value formed by concatenating `b0' and `b1' from the
+128-bit value formed by concatenating `a0' and `a1'. Subtraction is modulo
+2^128, so any borrow out (carry out) is lost. The result is broken into two
+64-bit pieces which are stored at the locations pointed to by `z0Ptr' and
+`z1Ptr'.
+-------------------------------------------------------------------------------
+*/
 INLINE void
 sub128(
 uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1, uint64_t *z0Ptr, uint64_t *z1Ptr )
@@ -418,14 +431,15 @@ INLINE void
 }
-/*----------------------------------------------------------------------------
-| Subtracts the 192-bit value formed by concatenating `b0', `b1', and `b2'
-| from the 192-bit value formed by concatenating `a0', `a1', and `a2'.
-| Subtraction is modulo 2^192, so any borrow out (carry out) is lost. The
-| result is broken into three 64-bit pieces which are stored at the locations
-| pointed to by `z0Ptr', `z1Ptr', and `z2Ptr'.
-*----------------------------------------------------------------------------*/
-
+/*
+-------------------------------------------------------------------------------
+Subtracts the 192-bit value formed by concatenating `b0', `b1', and `b2'
+from the 192-bit value formed by concatenating `a0', `a1', and `a2'.
+Subtraction is modulo 2^192, so any borrow out (carry out) is lost. The
+result is broken into three 64-bit pieces which are stored at the locations
+pointed to by `z0Ptr', `z1Ptr', and `z2Ptr'.
+-------------------------------------------------------------------------------
+*/
 INLINE void
 sub192(
 uint64_t a0,
@@ -456,11 +470,13 @@ INLINE void
 }
-/*----------------------------------------------------------------------------
-| Multiplies `a' by `b' to obtain a 128-bit product.
The product is broken
-| into two 64-bit pieces which are stored at the locations pointed to by
-| `z0Ptr' and `z1Ptr'.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Multiplies `a' by `b' to obtain a 128-bit product. The product is broken
+into two 64-bit pieces which are stored at the locations pointed to by
+`z0Ptr' and `z1Ptr'.
+-------------------------------------------------------------------------------
+*/
 INLINE void mul64To128( uint64_t a, uint64_t b, uint64_t *z0Ptr, uint64_t *z1Ptr )
 {
@@ -485,13 +501,14 @@ INLINE void mul64To128( uint64_t a, uint64_t b, uint64_t *z0Ptr, uint64_t *z1Ptr
 }
-/*----------------------------------------------------------------------------
-| Multiplies the 128-bit value formed by concatenating `a0' and `a1' by
-| `b' to obtain a 192-bit product. The product is broken into three 64-bit
-| pieces which are stored at the locations pointed to by `z0Ptr', `z1Ptr', and
-| `z2Ptr'.
-*----------------------------------------------------------------------------*/
-
+/*
+-------------------------------------------------------------------------------
+Multiplies the 128-bit value formed by concatenating `a0' and `a1' by
+`b' to obtain a 192-bit product. The product is broken into three 64-bit
+pieces which are stored at the locations pointed to by `z0Ptr', `z1Ptr', and
+`z2Ptr'.
+-------------------------------------------------------------------------------
+*/
 INLINE void
 mul128By64To192(
 uint64_t a0,
@@ -513,13 +530,14 @@ INLINE void
 }
-/*----------------------------------------------------------------------------
-| Multiplies the 128-bit value formed by concatenating `a0' and `a1' to the
-| 128-bit value formed by concatenating `b0' and `b1' to obtain a 256-bit
-| product. The product is broken into four 64-bit pieces which are stored at
-| the locations pointed to by `z0Ptr', `z1Ptr', `z2Ptr', and `z3Ptr'.
-*----------------------------------------------------------------------------*/
-
+/*
+-------------------------------------------------------------------------------
+Multiplies the 128-bit value formed by concatenating `a0' and `a1' to the
+128-bit value formed by concatenating `b0' and `b1' to obtain a 256-bit
+product. The product is broken into four 64-bit pieces which are stored at
+the locations pointed to by `z0Ptr', `z1Ptr', `z2Ptr', and `z3Ptr'.
+-------------------------------------------------------------------------------
+*/
 INLINE void
 mul128To256(
 uint64_t a0,
@@ -550,14 +568,16 @@ INLINE void
 }
-/*----------------------------------------------------------------------------
-| Returns an approximation to the 64-bit integer quotient obtained by dividing
-| `b' into the 128-bit value formed by concatenating `a0' and `a1'. The
-| divisor `b' must be at least 2^63. If q is the exact quotient truncated
-| toward zero, the approximation returned lies between q and q + 2 inclusive.
-| If the exact quotient q is larger than 64 bits, the maximum positive 64-bit
-| unsigned integer is returned.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Returns an approximation to the 64-bit integer quotient obtained by dividing
+`b' into the 128-bit value formed by concatenating `a0' and `a1'. The
+divisor `b' must be at least 2^63. If q is the exact quotient truncated
+toward zero, the approximation returned lies between q and q + 2 inclusive.
+If the exact quotient q is larger than 64 bits, the maximum positive 64-bit
+unsigned integer is returned.
+-------------------------------------------------------------------------------
+*/
 static uint64_t estimateDiv128To64( uint64_t a0, uint64_t a1, uint64_t b )
 {
@@ -581,15 +601,17 @@ static uint64_t estimateDiv128To64( uint64_t a0, uint64_t a1, uint64_t b )
 }
-/*----------------------------------------------------------------------------
-| Returns an approximation to the square root of the 32-bit significand given
-| by `a'. Considered as an integer, `a' must be at least 2^31. If bit 0 of
-| `aExp' (the least significant bit) is 1, the integer returned approximates
-| 2^31*sqrt(`a'/2^31), where `a' is considered an integer. If bit 0 of `aExp'
-| is 0, the integer returned approximates 2^31*sqrt(`a'/2^30). In either
-| case, the approximation returned lies strictly within +/-2 of the exact
-| value.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Returns an approximation to the square root of the 32-bit significand given
+by `a'. Considered as an integer, `a' must be at least 2^31. If bit 0 of
+`aExp' (the least significant bit) is 1, the integer returned approximates
+2^31*sqrt(`a'/2^31), where `a' is considered an integer. If bit 0 of `aExp'
+is 0, the integer returned approximates 2^31*sqrt(`a'/2^30). In either
+case, the approximation returned lies strictly within +/-2 of the exact
+value.
+-------------------------------------------------------------------------------
+*/
 static uint32_t estimateSqrt32(int_fast16_t aExp, uint32_t a)
 {
@@ -620,10 +642,12 @@ static uint32_t estimateSqrt32(int_fast16_t aExp, uint32_t a)
 }
-/*----------------------------------------------------------------------------
-| Returns the number of leading 0 bits before the most-significant 1 bit of
-| `a'. If `a' is zero, 32 is returned.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Returns the number of leading 0 bits before the most-significant 1 bit of
+`a'. If `a' is zero, 32 is returned.
+-------------------------------------------------------------------------------
+*/
 static int8 countLeadingZeros32( uint32_t a )
 {
@@ -668,10 +692,12 @@ static int8 countLeadingZeros32( uint32_t a )
 #endif
 }
-/*----------------------------------------------------------------------------
-| Returns the number of leading 0 bits before the most-significant 1 bit of
-| `a'. If `a' is zero, 64 is returned.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Returns the number of leading 0 bits before the most-significant 1 bit of
+`a'. If `a' is zero, 64 is returned.
+-------------------------------------------------------------------------------
+*/
 static int8 countLeadingZeros64( uint64_t a )
 {
@@ -696,11 +722,13 @@ static int8 countLeadingZeros64( uint64_t a )
 #endif
 }
-/*----------------------------------------------------------------------------
-| Returns 1 if the 128-bit value formed by concatenating `a0' and `a1'
-| is equal to the 128-bit value formed by concatenating `b0' and `b1'.
-| Otherwise, returns 0.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Returns 1 if the 128-bit value formed by concatenating `a0' and `a1'
+is equal to the 128-bit value formed by concatenating `b0' and `b1'.
+Otherwise, returns 0.
+-------------------------------------------------------------------------------
+*/
 INLINE flag eq128( uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1 )
 {
@@ -709,11 +737,13 @@ INLINE flag eq128( uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1 )
 }
-/*----------------------------------------------------------------------------
-| Returns 1 if the 128-bit value formed by concatenating `a0' and `a1' is less
-| than or equal to the 128-bit value formed by concatenating `b0' and `b1'.
-| Otherwise, returns 0.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Returns 1 if the 128-bit value formed by concatenating `a0' and `a1' is less
+than or equal to the 128-bit value formed by concatenating `b0' and `b1'.
+Otherwise, returns 0.
+-------------------------------------------------------------------------------
+*/
 INLINE flag le128( uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1 )
 {
@@ -722,11 +752,13 @@ INLINE flag le128( uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1 )
 }
-/*----------------------------------------------------------------------------
-| Returns 1 if the 128-bit value formed by concatenating `a0' and `a1' is less
-| than the 128-bit value formed by concatenating `b0' and `b1'. Otherwise,
-| returns 0.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Returns 1 if the 128-bit value formed by concatenating `a0' and `a1' is less
+than the 128-bit value formed by concatenating `b0' and `b1'. Otherwise,
+returns 0.
+-------------------------------------------------------------------------------
+*/
 INLINE flag lt128( uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1 )
 {
@@ -735,11 +767,13 @@ INLINE flag lt128( uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1 )
 }
-/*----------------------------------------------------------------------------
-| Returns 1 if the 128-bit value formed by concatenating `a0' and `a1' is
-| not equal to the 128-bit value formed by concatenating `b0' and `b1'.
-| Otherwise, returns 0.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Returns 1 if the 128-bit value formed by concatenating `a0' and `a1' is
+not equal to the 128-bit value formed by concatenating `b0' and `b1'.
+Otherwise, returns 0.
+-------------------------------------------------------------------------------
+*/
 INLINE flag ne128( uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1 )
 {
diff --git a/fpu/softfloat-specialize.h b/fpu/softfloat-specialize.h
index 518f694..ba9bfeb 100644
--- a/fpu/softfloat-specialize.h
+++ b/fpu/softfloat-specialize.h
@@ -4,10 +4,11 @@
 * Derived from SoftFloat.
 */
-/*============================================================================
+/*
+===============================================================================
 This C source fragment is part of the SoftFloat IEC/IEEE Floating-point
-Arithmetic Package, Release 2b.
+Arithmetic Package, Release 2a.
 Written by John R. Hauser. This work was made possible in part by the
 International Computer Science Institute, located at Suite 600, 1947 Center
@@ -16,22 +17,19 @@ National Science Foundation under grant MIP-9311980. The original version
 of this code was written as part of a project to build a fixed-point vector
 processor in collaboration with the University of California at Berkeley,
 overseen by Profs. Nelson Morgan and John Wawrzynek.
More information
-is available through the Web page `http://www.cs.berkeley.edu/~jhauser/
+is available through the Web page `http://HTTP.CS.Berkeley.EDU/~jhauser/
 arithmetic/SoftFloat.html'.
-THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort has
-been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT TIMES
-RESULT IN INCORRECT BEHAVIOR. USE OF THIS SOFTWARE IS RESTRICTED TO PERSONS
-AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ALL LOSSES,
-COSTS, OR OTHER PROBLEMS THEY INCUR DUE TO THE SOFTWARE, AND WHO FURTHERMORE
-EFFECTIVELY INDEMNIFY JOHN HAUSER AND THE INTERNATIONAL COMPUTER SCIENCE
-INSTITUTE (possibly via similar legal warning) AGAINST ALL LOSSES, COSTS, OR
-OTHER PROBLEMS INCURRED BY THEIR CUSTOMERS AND CLIENTS DUE TO THE SOFTWARE.
+THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort
+has been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT
+TIMES RESULT IN INCORRECT BEHAVIOR. USE OF THIS SOFTWARE IS RESTRICTED TO
+PERSONS AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ANY
+AND ALL LOSSES, COSTS, OR OTHER PROBLEMS ARISING FROM ITS USE.
 Derivative works are acceptable, even for commercial purposes, so long as
-(1) the source code for the derivative work includes prominent notice that
-the work is derivative, and (2) the source code includes prominent notice with
-these four paragraphs for those parts of this code that are retained.
+(1) they include prominent notice that the work is derivative, and (2) they
+include prominent notice akin to these four paragraphs for those parts of
+this code that are retained.
 =============================================================================*/
@@ -48,9 +46,11 @@ these four paragraphs for those parts of this code that are retained.
 #define NO_SIGNALING_NANS 1
 #endif
-/*----------------------------------------------------------------------------
-| The pattern for a default generated half-precision NaN.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+The pattern for a default generated half-precision NaN.
+-------------------------------------------------------------------------------
+*/
 #if defined(TARGET_ARM)
 const float16 float16_default_nan = const_float16(0x7E00);
 #elif SNAN_BIT_IS_ONE
@@ -59,9 +59,11 @@ const float16 float16_default_nan = const_float16(0x7DFF);
 const float16 float16_default_nan = const_float16(0xFE00);
 #endif
-/*----------------------------------------------------------------------------
-| The pattern for a default generated single-precision NaN.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+The pattern for a default generated single-precision NaN.
+-------------------------------------------------------------------------------
+*/
 #if defined(TARGET_SPARC)
 const float32 float32_default_nan = const_float32(0x7FFFFFFF);
 #elif defined(TARGET_PPC) || defined(TARGET_ARM) || defined(TARGET_ALPHA) || \
@@ -73,9 +75,11 @@ const float32 float32_default_nan = const_float32(0x7FBFFFFF);
 const float32 float32_default_nan = const_float32(0xFFC00000);
 #endif
-/*----------------------------------------------------------------------------
-| The pattern for a default generated double-precision NaN.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+The pattern for a default generated double-precision NaN.
+-------------------------------------------------------------------------------
+*/
 #if defined(TARGET_SPARC)
 const float64 float64_default_nan = const_float64(LIT64( 0x7FFFFFFFFFFFFFFF ));
 #elif defined(TARGET_PPC) || defined(TARGET_ARM) || defined(TARGET_ALPHA)
@@ -86,9 +90,11 @@ const float64 float64_default_nan = const_float64(LIT64( 0x7FF7FFFFFFFFFFFF ));
 const float64 float64_default_nan = const_float64(LIT64( 0xFFF8000000000000 ));
 #endif
-/*----------------------------------------------------------------------------
-| The pattern for a default generated extended double-precision NaN.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+The pattern for a default generated extended double-precision NaN.
+-------------------------------------------------------------------------------
+*/
 #if SNAN_BIT_IS_ONE
 #define floatx80_default_nan_high 0x7FFF
 #define floatx80_default_nan_low LIT64( 0xBFFFFFFFFFFFFFFF )
@@ -100,10 +106,12 @@ const float64 float64_default_nan = const_float64(LIT64( 0xFFF8000000000000 ));
 const floatx80 floatx80_default_nan
 = make_floatx80_init(floatx80_default_nan_high, floatx80_default_nan_low);
-/*----------------------------------------------------------------------------
-| The pattern for a default generated quadruple-precision NaN. The `high' and
-| `low' values hold the most- and least-significant bits, respectively.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+The pattern for a default generated quadruple-precision NaN. The `high' and
+`low' values hold the most- and least-significant bits, respectively.
+-------------------------------------------------------------------------------
+*/
 #if SNAN_BIT_IS_ONE
 #define float128_default_nan_high LIT64( 0x7FFF7FFFFFFFFFFF )
 #define float128_default_nan_low LIT64( 0xFFFFFFFFFFFFFFFF )
@@ -115,21 +123,25 @@ const floatx80 floatx80_default_nan
 const float128 float128_default_nan
 = make_float128_init(float128_default_nan_high, float128_default_nan_low);
-/*----------------------------------------------------------------------------
-| Raises the exceptions specified by `flags'. Floating-point traps can be
-| defined here if desired. It is currently not possible for such a trap
-| to substitute a result value. If traps are not implemented, this routine
-| should be simply `float_exception_flags |= flags;'.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Raises the exceptions specified by `flags'. Floating-point traps can be
+defined here if desired. It is currently not possible for such a trap
+to substitute a result value. If traps are not implemented, this routine
+should be simply `float_exception_flags |= flags;'.
+-------------------------------------------------------------------------------
+*/
 void float_raise( int8 flags STATUS_PARAM )
 {
 STATUS(float_exception_flags) |= flags;
 }
-/*----------------------------------------------------------------------------
-| Internal canonical NaN format.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Internal canonical NaN format.
+-------------------------------------------------------------------------------
+*/
 typedef struct {
 flag sign;
 uint64_t high, low;
@@ -146,10 +158,12 @@ int float16_is_signaling_nan(float16 a_)
 return 0;
 }
 #else
-/*----------------------------------------------------------------------------
-| Returns 1 if the half-precision floating-point value `a' is a quiet
-| NaN; otherwise returns 0.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Returns 1 if the half-precision floating-point value `a' is a quiet
+NaN; otherwise returns 0.
+-------------------------------------------------------------------------------
+*/
 int float16_is_quiet_nan(float16 a_)
 {
@@ -161,10 +175,12 @@ int float16_is_quiet_nan(float16 a_)
 #endif
 }
-/*----------------------------------------------------------------------------
-| Returns 1 if the half-precision floating-point value `a' is a signaling
-| NaN; otherwise returns 0.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Returns 1 if the half-precision floating-point value `a' is a signaling
+NaN; otherwise returns 0.
+-------------------------------------------------------------------------------
+*/
 int float16_is_signaling_nan(float16 a_)
 {
@@ -177,10 +193,12 @@ int float16_is_signaling_nan(float16 a_)
 }
 #endif
-/*----------------------------------------------------------------------------
-| Returns a quiet NaN if the half-precision floating point value `a' is a
-| signaling NaN; otherwise returns `a'.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Returns a quiet NaN if the half-precision floating point value `a' is a
+signaling NaN; otherwise returns `a'.
+-------------------------------------------------------------------------------
+*/
 float16 float16_maybe_silence_nan(float16 a_)
 {
 if (float16_is_signaling_nan(a_)) {
@@ -199,11 +217,13 @@ float16 float16_maybe_silence_nan(float16 a_)
 return a_;
 }
-/*----------------------------------------------------------------------------
-| Returns the result of converting the half-precision floating-point NaN
-| `a' to the canonical NaN format. If `a' is a signaling NaN, the invalid
-| exception is raised.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Returns the result of converting the half-precision floating-point NaN
+`a' to the canonical NaN format. If `a' is a signaling NaN, the invalid
+exception is raised.
+-------------------------------------------------------------------------------
+*/
 static commonNaNT float16ToCommonNaN( float16 a STATUS_PARAM )
 {
@@ -216,10 +236,12 @@ static commonNaNT float16ToCommonNaN( float16 a STATUS_PARAM )
 return z;
 }
-/*----------------------------------------------------------------------------
-| Returns the result of converting the canonical NaN `a' to the half-
-| precision floating-point format.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Returns the result of converting the canonical NaN `a' to the half-
+precision floating-point format.
+-------------------------------------------------------------------------------
+*/
 static float16 commonNaNToFloat16(commonNaNT a STATUS_PARAM)
 {
@@ -248,10 +270,12 @@ int float32_is_signaling_nan(float32 a_)
 return 0;
 }
 #else
-/*----------------------------------------------------------------------------
-| Returns 1 if the single-precision floating-point value `a' is a quiet
-| NaN; otherwise returns 0.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Returns 1 if the single-precision floating-point value `a' is a quiet
+NaN; otherwise returns 0.
+-------------------------------------------------------------------------------
+*/
 int float32_is_quiet_nan( float32 a_ )
 {
@@ -263,10 +287,12 @@ int float32_is_quiet_nan( float32 a_ )
 #endif
 }
-/*----------------------------------------------------------------------------
-| Returns 1 if the single-precision floating-point value `a' is a signaling
-| NaN; otherwise returns 0.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Returns 1 if the single-precision floating-point value `a' is a signaling
+NaN; otherwise returns 0.
+-------------------------------------------------------------------------------
+*/
 int float32_is_signaling_nan( float32 a_ )
 {
@@ -279,10 +305,12 @@ int float32_is_signaling_nan( float32 a_ )
 }
 #endif
-/*----------------------------------------------------------------------------
-| Returns a quiet NaN if the single-precision floating point value `a' is a
-| signaling NaN; otherwise returns `a'.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Returns a quiet NaN if the single-precision floating point value `a' is a
+signaling NaN; otherwise returns `a'.
+-------------------------------------------------------------------------------
+*/
 
 float32 float32_maybe_silence_nan( float32 a_ )
 {
@@ -302,12 +330,13 @@ float32 float32_maybe_silence_nan( float32 a_ )
     return a_;
 }
 
-/*----------------------------------------------------------------------------
-| Returns the result of converting the single-precision floating-point NaN
-| `a' to the canonical NaN format. If `a' is a signaling NaN, the invalid
-| exception is raised.
-*----------------------------------------------------------------------------*/
-
+/*
+-------------------------------------------------------------------------------
+Returns the result of converting the single-precision floating-point NaN
+`a' to the canonical NaN format. If `a' is a signaling NaN, the invalid
+exception is raised.
+-------------------------------------------------------------------------------
+*/
 static commonNaNT float32ToCommonNaN( float32 a STATUS_PARAM )
 {
     commonNaNT z;
@@ -319,10 +348,12 @@ static commonNaNT float32ToCommonNaN( float32 a STATUS_PARAM )
     return z;
 }
 
-/*----------------------------------------------------------------------------
-| Returns the result of converting the canonical NaN `a' to the single-
-| precision floating-point format.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Returns the result of converting the canonical NaN `a' to the single-
+precision floating-point format.
+-------------------------------------------------------------------------------
+*/
 
 static float32 commonNaNToFloat32( commonNaNT a STATUS_PARAM)
 {
@@ -339,22 +370,24 @@ static float32 commonNaNToFloat32( commonNaNT a STATUS_PARAM)
         return float32_default_nan;
 }
 
-/*----------------------------------------------------------------------------
-| Select which NaN to propagate for a two-input operation.
-| IEEE754 doesn't specify all the details of this, so the
-| algorithm is target-specific.
-| The routine is passed various bits of information about the
-| two NaNs and should return 0 to select NaN a and 1 for NaN b.
-| Note that signalling NaNs are always squashed to quiet NaNs
-| by the caller, by calling floatXX_maybe_silence_nan() before
-| returning them.
-|
-| aIsLargerSignificand is only valid if both a and b are NaNs
-| of some kind, and is true if a has the larger significand,
-| or if both a and b have the same significand but a is
-| positive but b is negative. It is only needed for the x87
-| tie-break rule.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Select which NaN to propagate for a two-input operation.
+IEEE754 doesn't specify all the details of this, so the
+algorithm is target-specific.
+The routine is passed various bits of information about the
+two NaNs and should return 0 to select NaN a and 1 for NaN b.
+Note that signalling NaNs are always squashed to quiet NaNs
+by the caller, by calling floatXX_maybe_silence_nan() before
+returning them.
+
+aIsLargerSignificand is only valid if both a and b are NaNs
+of some kind, and is true if a has the larger significand,
+or if both a and b have the same significand but a is
+positive but b is negative. It is only needed for the x87
+tie-break rule.
+-------------------------------------------------------------------------------
+*/
 
 #if defined(TARGET_ARM)
 static int pickNaN(flag aIsQNaN, flag aIsSNaN, flag bIsQNaN, flag bIsSNaN,
@@ -451,12 +484,14 @@ static int pickNaN(flag aIsQNaN, flag aIsSNaN, flag bIsQNaN, flag bIsSNaN,
 }
 #endif
 
-/*----------------------------------------------------------------------------
-| Select which NaN to propagate for a three-input operation.
-| For the moment we assume that no CPU needs the 'larger significand'
-| information.
-| Return values : 0 : a; 1 : b; 2 : c; 3 : default-NaN
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Select which NaN to propagate for a three-input operation.
+For the moment we assume that no CPU needs the 'larger significand'
+information.
+Return values : 0 : a; 1 : b; 2 : c; 3 : default-NaN
+-------------------------------------------------------------------------------
+*/
 #if defined(TARGET_ARM)
 static int pickNaNMulAdd(flag aIsQNaN, flag aIsSNaN, flag bIsQNaN, flag bIsSNaN,
                          flag cIsQNaN, flag cIsSNaN, flag infzero STATUS_PARAM)
@@ -554,12 +589,13 @@ static int pickNaNMulAdd(flag aIsQNaN, flag aIsSNaN, flag bIsQNaN, flag bIsSNaN,
 }
 #endif
 
-/*----------------------------------------------------------------------------
-| Takes two single-precision floating-point values `a' and `b', one of which
-| is a NaN, and returns the appropriate NaN result. If either `a' or `b' is a
-| signaling NaN, the invalid exception is raised.
-*----------------------------------------------------------------------------*/
-
+/*
+-------------------------------------------------------------------------------
+Takes two single-precision floating-point values `a' and `b', one of which
+is a NaN, and returns the appropriate NaN result. If either `a' or `b' is a
+signaling NaN, the invalid exception is raised.
+-------------------------------------------------------------------------------
+*/
 static float32 propagateFloat32NaN( float32 a, float32 b STATUS_PARAM)
 {
     flag aIsQuietNaN, aIsSignalingNaN, bIsQuietNaN, bIsSignalingNaN;
@@ -594,14 +630,16 @@ static float32 propagateFloat32NaN( float32 a, float32 b STATUS_PARAM)
     }
 }
 
-/*----------------------------------------------------------------------------
-| Takes three single-precision floating-point values `a', `b' and `c', one of
-| which is a NaN, and returns the appropriate NaN result. If any of `a',
-| `b' or `c' is a signaling NaN, the invalid exception is raised.
-| The input infzero indicates whether a*b was 0*inf or inf*0 (in which case
-| obviously c is a NaN, and whether to propagate c or some other NaN is
-| implementation defined).
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Takes three single-precision floating-point values `a', `b' and `c', one of
+which is a NaN, and returns the appropriate NaN result. If any of `a',
+`b' or `c' is a signaling NaN, the invalid exception is raised.
+The input infzero indicates whether a*b was 0*inf or inf*0 (in which case
+obviously c is a NaN, and whether to propagate c or some other NaN is
+implementation defined).
+-------------------------------------------------------------------------------
+*/
 
 static float32 propagateFloat32MulAddNaN(float32 a, float32 b,
                                          float32 c, flag infzero STATUS_PARAM)
@@ -656,10 +694,12 @@ int float64_is_signaling_nan(float64 a_)
     return 0;
 }
 #else
-/*----------------------------------------------------------------------------
-| Returns 1 if the double-precision floating-point value `a' is a quiet
-| NaN; otherwise returns 0.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Returns 1 if the double-precision floating-point value `a' is a quiet
+NaN; otherwise returns 0.
+-------------------------------------------------------------------------------
+*/
 
 int float64_is_quiet_nan( float64 a_ )
 {
@@ -673,10 +713,12 @@ int float64_is_quiet_nan( float64 a_ )
 #endif
 }
 
-/*----------------------------------------------------------------------------
-| Returns 1 if the double-precision floating-point value `a' is a signaling
-| NaN; otherwise returns 0.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Returns 1 if the double-precision floating-point value `a' is a signaling
+NaN; otherwise returns 0.
+-------------------------------------------------------------------------------
+*/
 
 int float64_is_signaling_nan( float64 a_ )
 {
@@ -691,10 +733,12 @@ int float64_is_signaling_nan( float64 a_ )
 }
 #endif
 
-/*----------------------------------------------------------------------------
-| Returns a quiet NaN if the double-precision floating point value `a' is a
-| signaling NaN; otherwise returns `a'.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Returns a quiet NaN if the double-precision floating point value `a' is a
+signaling NaN; otherwise returns `a'.
+-------------------------------------------------------------------------------
+*/
 
 float64 float64_maybe_silence_nan( float64 a_ )
 {
@@ -714,12 +758,13 @@ float64 float64_maybe_silence_nan( float64 a_ )
     return a_;
 }
 
-/*----------------------------------------------------------------------------
-| Returns the result of converting the double-precision floating-point NaN
-| `a' to the canonical NaN format. If `a' is a signaling NaN, the invalid
-| exception is raised.
-*----------------------------------------------------------------------------*/
-
+/*
+-------------------------------------------------------------------------------
+Returns the result of converting the double-precision floating-point NaN
+`a' to the canonical NaN format. If `a' is a signaling NaN, the invalid
+exception is raised.
+-------------------------------------------------------------------------------
+*/
 static commonNaNT float64ToCommonNaN( float64 a STATUS_PARAM)
 {
     commonNaNT z;
@@ -731,10 +776,12 @@ static commonNaNT float64ToCommonNaN( float64 a STATUS_PARAM)
     return z;
 }
 
-/*----------------------------------------------------------------------------
-| Returns the result of converting the canonical NaN `a' to the double-
-| precision floating-point format.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Returns the result of converting the canonical NaN `a' to the double-
+precision floating-point format.
+-------------------------------------------------------------------------------
+*/
 
 static float64 commonNaNToFloat64( commonNaNT a STATUS_PARAM)
 {
@@ -753,12 +800,13 @@ static float64 commonNaNToFloat64( commonNaNT a STATUS_PARAM)
         return float64_default_nan;
 }
 
-/*----------------------------------------------------------------------------
-| Takes two double-precision floating-point values `a' and `b', one of which
-| is a NaN, and returns the appropriate NaN result. If either `a' or `b' is a
-| signaling NaN, the invalid exception is raised.
-*----------------------------------------------------------------------------*/
-
+/*
+-------------------------------------------------------------------------------
+Takes two double-precision floating-point values `a' and `b', one of which
+is a NaN, and returns the appropriate NaN result. If either `a' or `b' is a
+signaling NaN, the invalid exception is raised.
+-------------------------------------------------------------------------------
+*/
 static float64 propagateFloat64NaN( float64 a, float64 b STATUS_PARAM)
 {
     flag aIsQuietNaN, aIsSignalingNaN, bIsQuietNaN, bIsSignalingNaN;
@@ -793,14 +841,16 @@ static float64 propagateFloat64NaN( float64 a, float64 b STATUS_PARAM)
     }
 }
 
-/*----------------------------------------------------------------------------
-| Takes three double-precision floating-point values `a', `b' and `c', one of
-| which is a NaN, and returns the appropriate NaN result. If any of `a',
-| `b' or `c' is a signaling NaN, the invalid exception is raised.
-| The input infzero indicates whether a*b was 0*inf or inf*0 (in which case
-| obviously c is a NaN, and whether to propagate c or some other NaN is
-| implementation defined).
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Takes three double-precision floating-point values `a', `b' and `c', one of
+which is a NaN, and returns the appropriate NaN result. If any of `a',
+`b' or `c' is a signaling NaN, the invalid exception is raised.
+The input infzero indicates whether a*b was 0*inf or inf*0 (in which case
+obviously c is a NaN, and whether to propagate c or some other NaN is
+implementation defined).
+-------------------------------------------------------------------------------
+*/
 
 static float64 propagateFloat64MulAddNaN(float64 a, float64 b,
                                          float64 c, flag infzero STATUS_PARAM)
@@ -855,11 +905,13 @@ int floatx80_is_signaling_nan(floatx80 a_)
     return 0;
 }
 #else
-/*----------------------------------------------------------------------------
-| Returns 1 if the extended double-precision floating-point value `a' is a
-| quiet NaN; otherwise returns 0. This slightly differs from the same
-| function for other types as floatx80 has an explicit bit.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Returns 1 if the extended double-precision floating-point value `a' is a
+quiet NaN; otherwise returns 0. This slightly differs from the same
+function for other types as floatx80 has an explicit bit.
+-------------------------------------------------------------------------------
+*/
 
 int floatx80_is_quiet_nan( floatx80 a )
 {
@@ -877,11 +929,13 @@ int floatx80_is_quiet_nan( floatx80 a )
 #endif
 }
 
-/*----------------------------------------------------------------------------
-| Returns 1 if the extended double-precision floating-point value `a' is a
-| signaling NaN; otherwise returns 0. This slightly differs from the same
-| function for other types as floatx80 has an explicit bit.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Returns 1 if the extended double-precision floating-point value `a' is a
+signaling NaN; otherwise returns 0. This slightly differs from the same
+function for other types as floatx80 has an explicit bit.
+-------------------------------------------------------------------------------
+*/
 
 int floatx80_is_signaling_nan( floatx80 a )
 {
@@ -900,10 +954,12 @@ int floatx80_is_signaling_nan( floatx80 a )
 }
 #endif
 
-/*----------------------------------------------------------------------------
-| Returns a quiet NaN if the extended double-precision floating point value
-| `a' is a signaling NaN; otherwise returns `a'.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Returns a quiet NaN if the extended double-precision floating point value
+`a' is a signaling NaN; otherwise returns `a'.
+-------------------------------------------------------------------------------
+*/
 
 floatx80 floatx80_maybe_silence_nan( floatx80 a )
 {
@@ -923,12 +979,13 @@ floatx80 floatx80_maybe_silence_nan( floatx80 a )
     return a;
 }
 
-/*----------------------------------------------------------------------------
-| Returns the result of converting the extended double-precision floating-
-| point NaN `a' to the canonical NaN format. If `a' is a signaling NaN, the
-| invalid exception is raised.
-*----------------------------------------------------------------------------*/
-
+/*
+-------------------------------------------------------------------------------
+Returns the result of converting the extended double-precision floating-
+point NaN `a' to the canonical NaN format. If `a' is a signaling NaN, the
+invalid exception is raised.
+-------------------------------------------------------------------------------
+*/
 static commonNaNT floatx80ToCommonNaN( floatx80 a STATUS_PARAM)
 {
     commonNaNT z;
@@ -946,10 +1003,12 @@ static commonNaNT floatx80ToCommonNaN( floatx80 a STATUS_PARAM)
     return z;
 }
 
-/*----------------------------------------------------------------------------
-| Returns the result of converting the canonical NaN `a' to the extended
-| double-precision floating-point format.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Returns the result of converting the canonical NaN `a' to the extended
+double-precision floating-point format.
+-------------------------------------------------------------------------------
+*/
 
 static floatx80 commonNaNToFloatx80( commonNaNT a STATUS_PARAM)
 {
@@ -972,12 +1031,13 @@ static floatx80 commonNaNToFloatx80( commonNaNT a STATUS_PARAM)
     return z;
 }
 
-/*----------------------------------------------------------------------------
-| Takes two extended double-precision floating-point values `a' and `b', one
-| of which is a NaN, and returns the appropriate NaN result. If either `a' or
-| `b' is a signaling NaN, the invalid exception is raised.
-*----------------------------------------------------------------------------*/
-
+/*
+-------------------------------------------------------------------------------
+Takes two extended double-precision floating-point values `a' and `b', one
+of which is a NaN, and returns the appropriate NaN result. If either `a' or
+`b' is a signaling NaN, the invalid exception is raised.
+-------------------------------------------------------------------------------
+*/
 static floatx80 propagateFloatx80NaN( floatx80 a, floatx80 b STATUS_PARAM)
 {
     flag aIsQuietNaN, aIsSignalingNaN, bIsQuietNaN, bIsSignalingNaN;
@@ -1023,10 +1083,12 @@ int float128_is_signaling_nan(float128 a_)
     return 0;
 }
 #else
-/*----------------------------------------------------------------------------
-| Returns 1 if the quadruple-precision floating-point value `a' is a quiet
-| NaN; otherwise returns 0.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Returns 1 if the quadruple-precision floating-point value `a' is a quiet
+NaN; otherwise returns 0.
+-------------------------------------------------------------------------------
+*/
 
 int float128_is_quiet_nan( float128 a )
 {
@@ -1041,10 +1103,12 @@ int float128_is_quiet_nan( float128 a )
 #endif
 }
 
-/*----------------------------------------------------------------------------
-| Returns 1 if the quadruple-precision floating-point value `a' is a
-| signaling NaN; otherwise returns 0.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Returns 1 if the quadruple-precision floating-point value `a' is a
+signaling NaN; otherwise returns 0.
+-------------------------------------------------------------------------------
+*/
 
 int float128_is_signaling_nan( float128 a )
 {
@@ -1060,10 +1124,12 @@ int float128_is_signaling_nan( float128 a )
 }
 #endif
 
-/*----------------------------------------------------------------------------
-| Returns a quiet NaN if the quadruple-precision floating point value `a' is
-| a signaling NaN; otherwise returns `a'.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Returns a quiet NaN if the quadruple-precision floating point value `a' is
+a signaling NaN; otherwise returns `a'.
+-------------------------------------------------------------------------------
+*/
 
 float128 float128_maybe_silence_nan( float128 a )
 {
@@ -1083,12 +1149,13 @@ float128 float128_maybe_silence_nan( float128 a )
     return a;
 }
 
-/*----------------------------------------------------------------------------
-| Returns the result of converting the quadruple-precision floating-point NaN
-| `a' to the canonical NaN format. If `a' is a signaling NaN, the invalid
-| exception is raised.
-*----------------------------------------------------------------------------*/
-
+/*
+-------------------------------------------------------------------------------
+Returns the result of converting the quadruple-precision floating-point NaN
+`a' to the canonical NaN format. If `a' is a signaling NaN, the invalid
+exception is raised.
+-------------------------------------------------------------------------------
+*/
 static commonNaNT float128ToCommonNaN( float128 a STATUS_PARAM)
 {
     commonNaNT z;
@@ -1099,10 +1166,12 @@ static commonNaNT float128ToCommonNaN( float128 a STATUS_PARAM)
     return z;
 }
 
-/*----------------------------------------------------------------------------
-| Returns the result of converting the canonical NaN `a' to the quadruple-
-| precision floating-point format.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Returns the result of converting the canonical NaN `a' to the quadruple-
+precision floating-point format.
+-------------------------------------------------------------------------------
+*/
 
 static float128 commonNaNToFloat128( commonNaNT a STATUS_PARAM)
 {
@@ -1119,12 +1188,13 @@ static float128 commonNaNToFloat128( commonNaNT a STATUS_PARAM)
     return z;
 }
 
-/*----------------------------------------------------------------------------
-| Takes two quadruple-precision floating-point values `a' and `b', one of
-| which is a NaN, and returns the appropriate NaN result. If either `a' or
-| `b' is a signaling NaN, the invalid exception is raised.
-*----------------------------------------------------------------------------*/
-
+/*
+-------------------------------------------------------------------------------
+Takes two quadruple-precision floating-point values `a' and `b', one of
+which is a NaN, and returns the appropriate NaN result. If either `a' or
+`b' is a signaling NaN, the invalid exception is raised.
+-------------------------------------------------------------------------------
+*/
 static float128 propagateFloat128NaN( float128 a, float128 b STATUS_PARAM)
 {
     flag aIsQuietNaN, aIsSignalingNaN, bIsQuietNaN, bIsSignalingNaN;
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 7ba51b6..9145582 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -4,10 +4,11 @@
  * Derived from SoftFloat.
  */
 
-/*============================================================================
+/*
+===============================================================================
 
-This C source file is part of the SoftFloat IEC/IEEE Floating-point Arithmetic
-Package, Release 2b.
+This C source file is part of the SoftFloat IEC/IEEE Floating-point
+Arithmetic Package, Release 2a.
 
 Written by John R. Hauser. This work was made possible in part by the
 International Computer Science Institute, located at Suite 600, 1947 Center
@@ -16,24 +17,22 @@ National Science Foundation under grant MIP-9311980. The original version
 of this code was written as part of a project to build a fixed-point vector
 processor in collaboration with the University of California at Berkeley,
 overseen by Profs. Nelson Morgan and John Wawrzynek. More information
-is available through the Web page `http://www.cs.berkeley.edu/~jhauser/
+is available through the Web page `http://HTTP.CS.Berkeley.EDU/~jhauser/
 arithmetic/SoftFloat.html'.
 
-THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort has
-been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT TIMES
-RESULT IN INCORRECT BEHAVIOR. USE OF THIS SOFTWARE IS RESTRICTED TO PERSONS
-AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ALL LOSSES,
-COSTS, OR OTHER PROBLEMS THEY INCUR DUE TO THE SOFTWARE, AND WHO FURTHERMORE
-EFFECTIVELY INDEMNIFY JOHN HAUSER AND THE INTERNATIONAL COMPUTER SCIENCE
-INSTITUTE (possibly via similar legal warning) AGAINST ALL LOSSES, COSTS, OR
-OTHER PROBLEMS INCURRED BY THEIR CUSTOMERS AND CLIENTS DUE TO THE SOFTWARE.
+THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort
+has been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT
+TIMES RESULT IN INCORRECT BEHAVIOR. USE OF THIS SOFTWARE IS RESTRICTED TO
+PERSONS AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ANY
+AND ALL LOSSES, COSTS, OR OTHER PROBLEMS ARISING FROM ITS USE.
 
 Derivative works are acceptable, even for commercial purposes, so long as
-(1) the source code for the derivative work includes prominent notice that
-the work is derivative, and (2) the source code includes prominent notice with
-these four paragraphs for those parts of this code that are retained.
+(1) they include prominent notice that the work is derivative, and (2) they
+include prominent notice akin to these four paragraphs for those parts of
+this code that are retained.
 
-=============================================================================*/
+===============================================================================
+*/
 
 /* softfloat (and in particular the code in softfloat-specialize.h) is
  * target-dependent and needs the TARGET_* macros.
@@ -42,21 +41,25 @@ these four paragraphs for those parts of this code that are retained.
 
 #include "fpu/softfloat.h"
 
-/*----------------------------------------------------------------------------
-| Primitive arithmetic functions, including multi-word arithmetic, and
-| division and square root approximations. (Can be specialized to target if
-| desired.)
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Primitive arithmetic functions, including multi-word arithmetic, and
+division and square root approximations. (Can be specialized to target if
+desired.)
+-------------------------------------------------------------------------------
+*/
 #include "softfloat-macros.h"
 
-/*----------------------------------------------------------------------------
-| Functions and definitions to determine: (1) whether tininess for underflow
-| is detected before or after rounding by default, (2) what (if anything)
-| happens when exceptions are raised, (3) how signaling NaNs are distinguished
-| from quiet NaNs, (4) the default generated quiet NaNs, and (5) how NaNs
-| are propagated from function inputs to output. These details are target-
-| specific.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Functions and definitions to determine: (1) whether tininess for underflow
+is detected before or after rounding by default, (2) what (if anything)
+happens when exceptions are raised, (3) how signaling NaNs are distinguished
+from quiet NaNs, (4) the default generated quiet NaNs, and (5) how NaNs
+are propagated from function inputs to output. These details are target-
+specific.
+-------------------------------------------------------------------------------
+*/
 #include "softfloat-specialize.h"
 
 void set_float_rounding_mode(int val STATUS_PARAM)
@@ -74,43 +77,51 @@ void set_floatx80_rounding_precision(int val STATUS_PARAM)
     STATUS(floatx80_rounding_precision) = val;
 }
 
-/*----------------------------------------------------------------------------
-| Returns the fraction bits of the half-precision floating-point value `a'.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Returns the fraction bits of the half-precision floating-point value `a'.
+-------------------------------------------------------------------------------
+*/
 
 INLINE uint32_t extractFloat16Frac(float16 a)
 {
     return float16_val(a) & 0x3ff;
 }
 
-/*----------------------------------------------------------------------------
-| Returns the exponent bits of the half-precision floating-point value `a'.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Returns the exponent bits of the half-precision floating-point value `a'.
+-------------------------------------------------------------------------------
+*/
 
 INLINE int_fast16_t extractFloat16Exp(float16 a)
 {
     return (float16_val(a) >> 10) & 0x1f;
 }
 
-/*----------------------------------------------------------------------------
-| Returns the sign bit of the single-precision floating-point value `a'.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Returns the sign bit of the single-precision floating-point value `a'.
+-------------------------------------------------------------------------------
+*/
 
 INLINE flag extractFloat16Sign(float16 a)
 {
     return float16_val(a)>>15;
 }
 
-/*----------------------------------------------------------------------------
-| Takes a 64-bit fixed-point value `absZ' with binary point between bits 6
-| and 7, and returns the properly rounded 32-bit integer corresponding to the
-| input. If `zSign' is 1, the input is negated before being converted to an
-| integer. Bit 63 of `absZ' must be zero. Ordinarily, the fixed-point input
-| is simply rounded to an integer, with the inexact exception raised if the
-| input cannot be represented exactly as an integer. However, if the fixed-
-| point input is too large, the invalid exception is raised and the largest
-| positive or negative integer is returned.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Takes a 64-bit fixed-point value `absZ' with binary point between bits 6
+and 7, and returns the properly rounded 32-bit integer corresponding to the
+input. If `zSign' is 1, the input is negated before being converted to an
+integer. Bit 63 of `absZ' must be zero. Ordinarily, the fixed-point input
+is simply rounded to an integer, with the inexact exception raised if the
+input cannot be represented exactly as an integer. However, if the fixed-
+point input is too large, the invalid exception is raised and the largest
+positive or negative integer is returned.
+-------------------------------------------------------------------------------
+*/
 
 static int32 roundAndPackInt32( flag zSign, uint64_t absZ STATUS_PARAM)
 {
@@ -150,17 +161,19 @@ static int32 roundAndPackInt32( flag zSign, uint64_t absZ STATUS_PARAM)
 
 }
 
-/*----------------------------------------------------------------------------
-| Takes the 128-bit fixed-point value formed by concatenating `absZ0' and
-| `absZ1', with binary point between bits 63 and 64 (between the input words),
-| and returns the properly rounded 64-bit integer corresponding to the input.
-| If `zSign' is 1, the input is negated before being converted to an integer.
-| Ordinarily, the fixed-point input is simply rounded to an integer, with
-| the inexact exception raised if the input cannot be represented exactly as
-| an integer. However, if the fixed-point input is too large, the invalid
-| exception is raised and the largest positive or negative integer is
-| returned.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Takes the 128-bit fixed-point value formed by concatenating `absZ0' and
+`absZ1', with binary point between bits 63 and 64 (between the input words),
+and returns the properly rounded 64-bit integer corresponding to the input.
+If `zSign' is 1, the input is negated before being converted to an integer.
+Ordinarily, the fixed-point input is simply rounded to an integer, with
+the inexact exception raised if the input cannot be represented exactly as
+an integer. However, if the fixed-point input is too large, the invalid
+exception is raised and the largest positive or negative integer is
+returned.
+-------------------------------------------------------------------------------
+*/
 
 static int64 roundAndPackInt64( flag zSign, uint64_t absZ0, uint64_t absZ1 STATUS_PARAM)
 {
@@ -203,9 +216,11 @@ static int64 roundAndPackInt64( flag zSign, uint64_t absZ0, uint64_t absZ1 STATU
 
 }
 
-/*----------------------------------------------------------------------------
-| Returns the fraction bits of the single-precision floating-point value `a'.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Returns the fraction bits of the single-precision floating-point value `a'.
+-------------------------------------------------------------------------------
+*/
 
 INLINE uint32_t extractFloat32Frac( float32 a )
 {
@@ -214,9 +229,11 @@ INLINE uint32_t extractFloat32Frac( float32 a )
 
 }
 
-/*----------------------------------------------------------------------------
-| Returns the exponent bits of the single-precision floating-point value `a'.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Returns the exponent bits of the single-precision floating-point value `a'.
+-------------------------------------------------------------------------------
+*/
 
 INLINE int_fast16_t extractFloat32Exp(float32 a)
 {
@@ -225,10 +242,11 @@ INLINE int_fast16_t extractFloat32Exp(float32 a)
 
 }
 
-/*----------------------------------------------------------------------------
-| Returns the sign bit of the single-precision floating-point value `a'.
-*----------------------------------------------------------------------------*/
-
+/*
+-------------------------------------------------------------------------------
+Returns the sign bit of the single-precision floating-point value `a'.
+-------------------------------------------------------------------------------
+*/
 INLINE flag extractFloat32Sign( float32 a )
 {
 
@@ -236,10 +254,12 @@ INLINE flag extractFloat32Sign( float32 a )
 
 }
 
-/*----------------------------------------------------------------------------
-| If `a' is denormal and we are in flush-to-zero mode then set the
-| input-denormal exception and return zero. Otherwise just return the value.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+If `a' is denormal and we are in flush-to-zero mode then set the
+input-denormal exception and return zero. Otherwise just return the value.
+-------------------------------------------------------------------------------
+*/
 static float32 float32_squash_input_denormal(float32 a STATUS_PARAM)
 {
     if (STATUS(flush_inputs_to_zero)) {
@@ -251,13 +271,14 @@ static float32 float32_squash_input_denormal(float32 a STATUS_PARAM)
     return a;
 }
 
-/*----------------------------------------------------------------------------
-| Normalizes the subnormal single-precision floating-point value represented
-| by the denormalized significand `aSig'. The normalized exponent and
-| significand are stored at the locations pointed to by `zExpPtr' and
-| `zSigPtr', respectively.
-*----------------------------------------------------------------------------*/
-
+/*
+-------------------------------------------------------------------------------
+Normalizes the subnormal single-precision floating-point value represented
+by the denormalized significand `aSig'. The normalized exponent and
+significand are stored at the locations pointed to by `zExpPtr' and
+`zSigPtr', respectively.
+-------------------------------------------------------------------------------
+*/
 static void
  normalizeFloat32Subnormal(uint32_t aSig, int_fast16_t *zExpPtr, uint32_t *zSigPtr)
 {
@@ -269,16 +290,18 @@ static void
 
 }
 
-/*----------------------------------------------------------------------------
-| Packs the sign `zSign', exponent `zExp', and significand `zSig' into a
-| single-precision floating-point value, returning the result. After being
-| shifted into the proper positions, the three fields are simply added
-| together to form the result. This means that any integer portion of `zSig'
-| will be added into the exponent. Since a properly normalized significand
-| will have an integer portion equal to 1, the `zExp' input should be 1 less
-| than the desired result exponent whenever `zSig' is a complete, normalized
-| significand.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Packs the sign `zSign', exponent `zExp', and significand `zSig' into a
+single-precision floating-point value, returning the result. After being
+shifted into the proper positions, the three fields are simply added
+together to form the result. This means that any integer portion of `zSig'
+will be added into the exponent. Since a properly normalized significand
+will have an integer portion equal to 1, the `zExp' input should be 1 less
+than the desired result exponent whenever `zSig' is a complete, normalized
+significand.
+-------------------------------------------------------------------------------
+*/
 
 INLINE float32 packFloat32(flag zSign, int_fast16_t zExp, uint32_t zSig)
 {
@@ -288,27 +311,29 @@ INLINE float32 packFloat32(flag zSign, int_fast16_t zExp, uint32_t zSig)
 
 }
 
-/*----------------------------------------------------------------------------
-| Takes an abstract floating-point value having sign `zSign', exponent `zExp',
-| and significand `zSig', and returns the proper single-precision floating-
-| point value corresponding to the abstract input. Ordinarily, the abstract
-| value is simply rounded and packed into the single-precision format, with
-| the inexact exception raised if the abstract input cannot be represented
-| exactly. However, if the abstract value is too large, the overflow and
-| inexact exceptions are raised and an infinity or maximal finite value is
-| returned. If the abstract value is too small, the input value is rounded to
-| a subnormal number, and the underflow and inexact exceptions are raised if
-| the abstract input cannot be represented exactly as a subnormal single-
-| precision floating-point number.
-| The input significand `zSig' has its binary point between bits 30
-| and 29, which is 7 bits to the left of the usual location. This shifted
-| significand must be normalized or smaller. If `zSig' is not normalized,
-| `zExp' must be 0; in that case, the result returned is a subnormal number,
-| and it must not require rounding. In the usual case that `zSig' is
-| normalized, `zExp' must be 1 less than the ``true'' floating-point exponent.
-| The handling of underflow and overflow follows the IEC/IEEE Standard for
-| Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Takes an abstract floating-point value having sign `zSign', exponent `zExp',
+and significand `zSig', and returns the proper single-precision floating-
+point value corresponding to the abstract input. Ordinarily, the abstract
+value is simply rounded and packed into the single-precision format, with
+the inexact exception raised if the abstract input cannot be represented
+exactly. However, if the abstract value is too large, the overflow and
+inexact exceptions are raised and an infinity or maximal finite value is
+returned. If the abstract value is too small, the input value is rounded to
+a subnormal number, and the underflow and inexact exceptions are raised if
+the abstract input cannot be represented exactly as a subnormal single-
+precision floating-point number.
+ The input significand `zSig' has its binary point between bits 30
+and 29, which is 7 bits to the left of the usual location. This shifted
+significand must be normalized or smaller. If `zSig' is not normalized,
+`zExp' must be 0; in that case, the result returned is a subnormal number,
+and it must not require rounding. In the usual case that `zSig' is
+normalized, `zExp' must be 1 less than the ``true'' floating-point exponent.
+The handling of underflow and overflow follows the IEC/IEEE Standard for
+Binary Floating-Point Arithmetic.
+-------------------------------------------------------------------------------
+*/
 
 static float32 roundAndPackFloat32(flag zSign, int_fast16_t zExp, uint32_t zSig STATUS_PARAM)
 {
@@ -366,15 +391,16 @@ static float32 roundAndPackFloat32(flag zSign, int_fast16_t zExp, uint32_t zSig
 
 }
 
-/*----------------------------------------------------------------------------
-| Takes an abstract floating-point value having sign `zSign', exponent `zExp',
-| and significand `zSig', and returns the proper single-precision floating-
-| point value corresponding to the abstract input. This routine is just like
-| `roundAndPackFloat32' except that `zSig' does not have to be normalized.
-| Bit 31 of `zSig' must be zero, and `zExp' must be 1 less than the ``true''
-| floating-point exponent.
-*----------------------------------------------------------------------------*/
-
+/*
+-------------------------------------------------------------------------------
+Takes an abstract floating-point value having sign `zSign', exponent `zExp',
+and significand `zSig', and returns the proper single-precision floating-
+point value corresponding to the abstract input. This routine is just like
+`roundAndPackFloat32' except that `zSig' does not have to be normalized.
+Bit 31 of `zSig' must be zero, and `zExp' must be 1 less than the ``true''
+floating-point exponent.
+-------------------------------------------------------------------------------
+*/
 static float32
  normalizeRoundAndPackFloat32(flag zSign, int_fast16_t zExp, uint32_t zSig STATUS_PARAM)
 {
@@ -385,9 +411,11 @@ static float32
 
 }
 
-/*----------------------------------------------------------------------------
-| Returns the fraction bits of the double-precision floating-point value `a'.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Returns the fraction bits of the double-precision floating-point value `a'.
+-------------------------------------------------------------------------------
+*/
 
 INLINE uint64_t extractFloat64Frac( float64 a )
 {
@@ -396,9 +424,11 @@ INLINE uint64_t extractFloat64Frac( float64 a )
 
 }
 
-/*----------------------------------------------------------------------------
-| Returns the exponent bits of the double-precision floating-point value `a'.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Returns the exponent bits of the double-precision floating-point value `a'.
+-------------------------------------------------------------------------------
+*/
 
 INLINE int_fast16_t extractFloat64Exp(float64 a)
 {
@@ -407,10 +437,11 @@ INLINE int_fast16_t extractFloat64Exp(float64 a)
 
 }
 
-/*----------------------------------------------------------------------------
-| Returns the sign bit of the double-precision floating-point value `a'.
-*----------------------------------------------------------------------------*/
-
+/*
+-------------------------------------------------------------------------------
+Returns the sign bit of the double-precision floating-point value `a'.
+-------------------------------------------------------------------------------
+*/
 INLINE flag extractFloat64Sign( float64 a )
 {
 
@@ -418,10 +449,12 @@ INLINE flag extractFloat64Sign( float64 a )
 
 }
 
-/*----------------------------------------------------------------------------
-| If `a' is denormal and we are in flush-to-zero mode then set the
-| input-denormal exception and return zero. Otherwise just return the value.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+If `a' is denormal and we are in flush-to-zero mode then set the
+input-denormal exception and return zero. Otherwise just return the value.
+-------------------------------------------------------------------------------
+*/
 static float64 float64_squash_input_denormal(float64 a STATUS_PARAM)
 {
     if (STATUS(flush_inputs_to_zero)) {
@@ -433,13 +466,14 @@ static float64 float64_squash_input_denormal(float64 a STATUS_PARAM)
     return a;
 }
 
-/*----------------------------------------------------------------------------
-| Normalizes the subnormal double-precision floating-point value represented
-| by the denormalized significand `aSig'. The normalized exponent and
-| significand are stored at the locations pointed to by `zExpPtr' and
-| `zSigPtr', respectively.
-*----------------------------------------------------------------------------*/
-
+/*
+-------------------------------------------------------------------------------
+Normalizes the subnormal double-precision floating-point value represented
+by the denormalized significand `aSig'. The normalized exponent and
+significand are stored at the locations pointed to by `zExpPtr' and
+`zSigPtr', respectively.
+-------------------------------------------------------------------------------
+*/
 static void
  normalizeFloat64Subnormal(uint64_t aSig, int_fast16_t *zExpPtr, uint64_t *zSigPtr)
 {
@@ -451,16 +485,18 @@ static void
 
 }
 
-/*----------------------------------------------------------------------------
-| Packs the sign `zSign', exponent `zExp', and significand `zSig' into a
-| double-precision floating-point value, returning the result. After being
-| shifted into the proper positions, the three fields are simply added
-| together to form the result. This means that any integer portion of `zSig'
-| will be added into the exponent. Since a properly normalized significand
-| will have an integer portion equal to 1, the `zExp' input should be 1 less
-| than the desired result exponent whenever `zSig' is a complete, normalized
-| significand.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Packs the sign `zSign', exponent `zExp', and significand `zSig' into a
+double-precision floating-point value, returning the result. After being
+shifted into the proper positions, the three fields are simply added
+together to form the result. This means that any integer portion of `zSig'
+will be added into the exponent. Since a properly normalized significand
+will have an integer portion equal to 1, the `zExp' input should be 1 less
+than the desired result exponent whenever `zSig' is a complete, normalized
+significand.
+-------------------------------------------------------------------------------
+*/
 
 INLINE float64 packFloat64(flag zSign, int_fast16_t zExp, uint64_t zSig)
 {
@@ -470,27 +506,29 @@ INLINE float64 packFloat64(flag zSign, int_fast16_t zExp, uint64_t zSig)
 
 }
 
-/*----------------------------------------------------------------------------
-| Takes an abstract floating-point value having sign `zSign', exponent `zExp',
-| and significand `zSig', and returns the proper double-precision floating-
-| point value corresponding to the abstract input. Ordinarily, the abstract
-| value is simply rounded and packed into the double-precision format, with
-| the inexact exception raised if the abstract input cannot be represented
-| exactly. However, if the abstract value is too large, the overflow and
-| inexact exceptions are raised and an infinity or maximal finite value is
-| returned. If the abstract value is too small, the input value is rounded
-| to a subnormal number, and the underflow and inexact exceptions are raised
-| if the abstract input cannot be represented exactly as a subnormal double-
-| precision floating-point number.
-| The input significand `zSig' has its binary point between bits 62
-| and 61, which is 10 bits to the left of the usual location. This shifted
-| significand must be normalized or smaller. If `zSig' is not normalized,
-| `zExp' must be 0; in that case, the result returned is a subnormal number,
-| and it must not require rounding. In the usual case that `zSig' is
-| normalized, `zExp' must be 1 less than the ``true'' floating-point exponent.
-| The handling of underflow and overflow follows the IEC/IEEE Standard for
-| Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Takes an abstract floating-point value having sign `zSign', exponent `zExp',
+and significand `zSig', and returns the proper double-precision floating-
+point value corresponding to the abstract input. Ordinarily, the abstract
+value is simply rounded and packed into the double-precision format, with
+the inexact exception raised if the abstract input cannot be represented
+exactly. However, if the abstract value is too large, the overflow and
+inexact exceptions are raised and an infinity or maximal finite value is
+returned. If the abstract value is too small, the input value is rounded
+to a subnormal number, and the underflow and inexact exceptions are raised
+if the abstract input cannot be represented exactly as a subnormal double-
+precision floating-point number.
+ The input significand `zSig' has its binary point between bits 62
+and 61, which is 10 bits to the left of the usual location. This shifted
+significand must be normalized or smaller. If `zSig' is not normalized,
+`zExp' must be 0; in that case, the result returned is a subnormal number,
+and it must not require rounding. In the usual case that `zSig' is
+normalized, `zExp' must be 1 less than the ``true'' floating-point exponent.
+The handling of underflow and overflow follows the IEC/IEEE Standard for
+Binary Floating-Point Arithmetic.
+-------------------------------------------------------------------------------
+*/
 
 static float64 roundAndPackFloat64(flag zSign, int_fast16_t zExp, uint64_t zSig STATUS_PARAM)
 {
@@ -548,15 +586,16 @@ static float64 roundAndPackFloat64(flag zSign, int_fast16_t zExp, uint64_t zSig
 
 }
 
-/*----------------------------------------------------------------------------
-| Takes an abstract floating-point value having sign `zSign', exponent `zExp',
-| and significand `zSig', and returns the proper double-precision floating-
-| point value corresponding to the abstract input. This routine is just like
-| `roundAndPackFloat64' except that `zSig' does not have to be normalized.
-| Bit 63 of `zSig' must be zero, and `zExp' must be 1 less than the ``true''
-| floating-point exponent.
-*----------------------------------------------------------------------------*/
-
+/*
+-------------------------------------------------------------------------------
+Takes an abstract floating-point value having sign `zSign', exponent `zExp',
+and significand `zSig', and returns the proper double-precision floating-
+point value corresponding to the abstract input. This routine is just like
+`roundAndPackFloat64' except that `zSig' does not have to be normalized.
+Bit 63 of `zSig' must be zero, and `zExp' must be 1 less than the ``true''
+floating-point exponent.
+-------------------------------------------------------------------------------
+*/
 static float64
  normalizeRoundAndPackFloat64(flag zSign, int_fast16_t zExp, uint64_t zSig STATUS_PARAM)
 {
@@ -567,10 +606,12 @@ static float64
 
 }
 
-/*----------------------------------------------------------------------------
-| Returns the fraction bits of the extended double-precision floating-point
-| value `a'.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Returns the fraction bits of the extended double-precision floating-point
+value `a'.
+-------------------------------------------------------------------------------
+*/
 
 INLINE uint64_t extractFloatx80Frac( floatx80 a )
 {
@@ -579,11 +620,12 @@ INLINE uint64_t extractFloatx80Frac( floatx80 a )
 
 }
 
-/*----------------------------------------------------------------------------
-| Returns the exponent bits of the extended double-precision floating-point
-| value `a'.
-*----------------------------------------------------------------------------*/
-
+/*
+-------------------------------------------------------------------------------
+Returns the exponent bits of the extended double-precision floating-point
+value `a'.
+-------------------------------------------------------------------------------
+*/
 INLINE int32 extractFloatx80Exp( floatx80 a )
 {
 
@@ -591,11 +633,12 @@ INLINE int32 extractFloatx80Exp( floatx80 a )
 
 }
 
-/*----------------------------------------------------------------------------
-| Returns the sign bit of the extended double-precision floating-point value
-| `a'.
-*----------------------------------------------------------------------------*/
-
+/*
+-------------------------------------------------------------------------------
+Returns the sign bit of the extended double-precision floating-point value
+`a'.
+-------------------------------------------------------------------------------
+*/
 INLINE flag extractFloatx80Sign( floatx80 a )
 {
 
@@ -603,13 +646,14 @@ INLINE flag extractFloatx80Sign( floatx80 a )
 
 }
 
-/*----------------------------------------------------------------------------
-| Normalizes the subnormal extended double-precision floating-point value
-| represented by the denormalized significand `aSig'. The normalized exponent
-| and significand are stored at the locations pointed to by `zExpPtr' and
-| `zSigPtr', respectively.
-*----------------------------------------------------------------------------*/
-
+/*
+-------------------------------------------------------------------------------
+Normalizes the subnormal extended double-precision floating-point value
+represented by the denormalized significand `aSig'. The normalized exponent
+and significand are stored at the locations pointed to by `zExpPtr' and
+`zSigPtr', respectively.
+-------------------------------------------------------------------------------
+*/
 static void
  normalizeFloatx80Subnormal( uint64_t aSig, int32 *zExpPtr, uint64_t *zSigPtr )
 {
@@ -621,10 +665,12 @@ static void
 
 }
 
-/*----------------------------------------------------------------------------
-| Packs the sign `zSign', exponent `zExp', and significand `zSig' into an
-| extended double-precision floating-point value, returning the result.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Packs the sign `zSign', exponent `zExp', and significand `zSig' into an
+extended double-precision floating-point value, returning the result.
+-------------------------------------------------------------------------------
+*/
 
 INLINE floatx80 packFloatx80( flag zSign, int32 zExp, uint64_t zSig )
 {
@@ -636,30 +682,31 @@ INLINE floatx80 packFloatx80( flag zSign, int32 zExp, uint64_t zSig )
 
 }
 
-/*----------------------------------------------------------------------------
-| Takes an abstract floating-point value having sign `zSign', exponent `zExp',
-| and extended significand formed by the concatenation of `zSig0' and `zSig1',
-| and returns the proper extended double-precision floating-point value
-| corresponding to the abstract input. Ordinarily, the abstract value is
-| rounded and packed into the extended double-precision format, with the
-| inexact exception raised if the abstract input cannot be represented
-| exactly. However, if the abstract value is too large, the overflow and
-| inexact exceptions are raised and an infinity or maximal finite value is
-| returned. If the abstract value is too small, the input value is rounded to
-| a subnormal number, and the underflow and inexact exceptions are raised if
-| the abstract input cannot be represented exactly as a subnormal extended
-| double-precision floating-point number.
-| If `roundingPrecision' is 32 or 64, the result is rounded to the same
-| number of bits as single or double precision, respectively. Otherwise, the
-| result is rounded to the full precision of the extended double-precision
-| format.
-| The input significand must be normalized or smaller. If the input
-| significand is not normalized, `zExp' must be 0; in that case, the result
-| returned is a subnormal number, and it must not require rounding. The
-| handling of underflow and overflow follows the IEC/IEEE Standard for Binary
-| Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
+/*
+-------------------------------------------------------------------------------
+Takes an abstract floating-point value having sign `zSign', exponent `zExp',
+and extended significand formed by the concatenation of `zSig0' and `zSig1',
+and returns the proper extended double-precision floating-point value
+corresponding to the abstract input. Ordinarily, the abstract value is
+rounded and packed into the extended double-precision format, with the
+inexact exception raised if the abstract input cannot be represented
+exactly. However, if the abstract value is too large, the overflow and
+inexact exceptions are raised and an infinity or maximal finite value is
+returned. If the abstract value is too small, the input value is rounded to
+a subnormal number, and the underflow and inexact exceptions are raised if
+the abstract input cannot be represented exactly as a subnormal extended
+double-precision floating-point number.
+ If `roundingPrecision' is 32 or 64, the result is rounded to the same
+number of bits as single or double precision, respectively. Otherwise, the
+result is rounded to the full precision of the extended double-precision
+format.
+ The input significand must be normalized or smaller. If the input
+significand is not normalized, `zExp' must be 0; in that case, the result
+returned is a subnormal number, and it must not require rounding. The
+handling of underflow and overflow follows the IEC/IEEE Standard for Binary
+Floating-Point Arithmetic.
+-------------------------------------------------------------------------------
+*/
 static floatx80
  roundAndPackFloatx80(
      int8 roundingPrecision, flag zSign, int32 zExp, uint64_t zSig0, uint64_t zSig1
@@ -823,15 +870,16 @@ static floatx80
 
 }
 
-/*----------------------------------------------------------------------------
-| Takes an abstract floating-point value having sign `zSign', exponent
-| `zExp', and significand formed by the concatenation of `zSig0' and `zSig1',
-| and returns the proper extended double-precision floating-point value
-| corresponding to the abstract input. This routine is just like
-| `roundAndPackFloatx80' except that the input significand does not have to be
-| normalized.
-*----------------------------------------------------------------------------*/
-
+/*
+-------------------------------------------------------------------------------
+Takes an abstract floating-point value having sign `zSign', exponent
+`zExp', and significand formed by the concatenation of `zSig0' and `zSig1',
+and returns the proper extended double-precision floating-point value
+corresponding to the abstract input. This routine is just like
+`roundAndPackFloatx80' except that the input significand does not have to be
+normalized.
+-------------------------------------------------------------------------------
+*/
 static floatx80
  normalizeRoundAndPackFloatx80(
      int8 roundingPrecision, flag zSign, int32 zExp, uint64_t zSig0, uint64_t zSig1
@@ -852,10 +900,12 @@ static floatx80
 
 }
 
-/*----------------------------------------------------------------------------
-| Returns the least-significant 64 fraction bits of the quadruple-precision
-| floating-point value `a'.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Returns the least-significant 64 fraction bits of the quadruple-precision
+floating-point value `a'.
+-------------------------------------------------------------------------------
+*/
 
 INLINE uint64_t extractFloat128Frac1( float128 a )
 {
@@ -864,10 +914,12 @@ INLINE uint64_t extractFloat128Frac1( float128 a )
 
 }
 
-/*----------------------------------------------------------------------------
-| Returns the most-significant 48 fraction bits of the quadruple-precision
-| floating-point value `a'.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Returns the most-significant 48 fraction bits of the quadruple-precision
+floating-point value `a'.
+-------------------------------------------------------------------------------
+*/
 
 INLINE uint64_t extractFloat128Frac0( float128 a )
 {
@@ -876,11 +928,12 @@ INLINE uint64_t extractFloat128Frac0( float128 a )
 
 }
 
-/*----------------------------------------------------------------------------
-| Returns the exponent bits of the quadruple-precision floating-point value
-| `a'.
-*----------------------------------------------------------------------------*/
-
+/*
+-------------------------------------------------------------------------------
+Returns the exponent bits of the quadruple-precision floating-point value
+`a'.
+-------------------------------------------------------------------------------
+*/
 INLINE int32 extractFloat128Exp( float128 a )
 {
 
@@ -888,10 +941,11 @@ INLINE int32 extractFloat128Exp( float128 a )
 
 }
 
-/*----------------------------------------------------------------------------
-| Returns the sign bit of the quadruple-precision floating-point value `a'.
-*----------------------------------------------------------------------------*/
-
+/*
+-------------------------------------------------------------------------------
+Returns the sign bit of the quadruple-precision floating-point value `a'.
+-------------------------------------------------------------------------------
+*/
 INLINE flag extractFloat128Sign( float128 a )
 {
 
@@ -899,16 +953,17 @@ INLINE flag extractFloat128Sign( float128 a )
 
 }
 
-/*----------------------------------------------------------------------------
-| Normalizes the subnormal quadruple-precision floating-point value
-| represented by the denormalized significand formed by the concatenation of
-| `aSig0' and `aSig1'. The normalized exponent is stored at the location
-| pointed to by `zExpPtr'. The most significant 49 bits of the normalized
-| significand are stored at the location pointed to by `zSig0Ptr', and the
-| least significant 64 bits of the normalized significand are stored at the
-| location pointed to by `zSig1Ptr'.
-*----------------------------------------------------------------------------*/
-
+/*
+-------------------------------------------------------------------------------
+Normalizes the subnormal quadruple-precision floating-point value
+represented by the denormalized significand formed by the concatenation of
+`aSig0' and `aSig1'. The normalized exponent is stored at the location
+pointed to by `zExpPtr'. The most significant 49 bits of the normalized
+significand are stored at the location pointed to by `zSig0Ptr', and the
+least significant 64 bits of the normalized significand are stored at the
+location pointed to by `zSig1Ptr'.
+-------------------------------------------------------------------------------
+*/
 static void
  normalizeFloat128Subnormal(
      uint64_t aSig0,
@@ -940,19 +995,20 @@ static void
 
 }
 
-/*----------------------------------------------------------------------------
-| Packs the sign `zSign', the exponent `zExp', and the significand formed
-| by the concatenation of `zSig0' and `zSig1' into a quadruple-precision
-| floating-point value, returning the result. After being shifted into the
-| proper positions, the three fields `zSign', `zExp', and `zSig0' are simply
-| added together to form the most significant 32 bits of the result. This
-| means that any integer portion of `zSig0' will be added into the exponent.
-| Since a properly normalized significand will have an integer portion equal
-| to 1, the `zExp' input should be 1 less than the desired result exponent
-| whenever `zSig0' and `zSig1' concatenated form a complete, normalized
-| significand.
-*----------------------------------------------------------------------------*/
-
+/*
+-------------------------------------------------------------------------------
+Packs the sign `zSign', the exponent `zExp', and the significand formed
+by the concatenation of `zSig0' and `zSig1' into a quadruple-precision
+floating-point value, returning the result. After being shifted into the
+proper positions, the three fields `zSign', `zExp', and `zSig0' are simply
+added together to form the most significant 32 bits of the result. This
+means that any integer portion of `zSig0' will be added into the exponent.
+Since a properly normalized significand will have an integer portion equal +to 1, the `zExp' input should be 1 less than the desired result exponent +whenever `zSig0' and `zSig1' concatenated form a complete, normalized +significand. +------------------------------------------------------------------------------- +*/ INLINE float128 packFloat128( flag zSign, int32 zExp, uint64_t zSig0, uint64_t zSig1 ) { @@ -964,27 +1020,28 @@ INLINE float128 } -/*---------------------------------------------------------------------------- -| Takes an abstract floating-point value having sign `zSign', exponent `zExp', -| and extended significand formed by the concatenation of `zSig0', `zSig1', -| and `zSig2', and returns the proper quadruple-precision floating-point value -| corresponding to the abstract input. Ordinarily, the abstract value is -| simply rounded and packed into the quadruple-precision format, with the -| inexact exception raised if the abstract input cannot be represented -| exactly. However, if the abstract value is too large, the overflow and -| inexact exceptions are raised and an infinity or maximal finite value is -| returned. If the abstract value is too small, the input value is rounded to -| a subnormal number, and the underflow and inexact exceptions are raised if -| the abstract input cannot be represented exactly as a subnormal quadruple- -| precision floating-point number. -| The input significand must be normalized or smaller. If the input -| significand is not normalized, `zExp' must be 0; in that case, the result -| returned is a subnormal number, and it must not require rounding. In the -| usual case that the input significand is normalized, `zExp' must be 1 less -| than the ``true'' floating-point exponent. The handling of underflow and -| overflow follows the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
-*----------------------------------------------------------------------------*/ - +/* +------------------------------------------------------------------------------- +Takes an abstract floating-point value having sign `zSign', exponent `zExp', +and extended significand formed by the concatenation of `zSig0', `zSig1', +and `zSig2', and returns the proper quadruple-precision floating-point value +corresponding to the abstract input. Ordinarily, the abstract value is +simply rounded and packed into the quadruple-precision format, with the +inexact exception raised if the abstract input cannot be represented +exactly. However, if the abstract value is too large, the overflow and +inexact exceptions are raised and an infinity or maximal finite value is +returned. If the abstract value is too small, the input value is rounded to +a subnormal number, and the underflow and inexact exceptions are raised if +the abstract input cannot be represented exactly as a subnormal quadruple- +precision floating-point number. + The input significand must be normalized or smaller. If the input +significand is not normalized, `zExp' must be 0; in that case, the result +returned is a subnormal number, and it must not require rounding. In the +usual case that the input significand is normalized, `zExp' must be 1 less +than the ``true'' floating-point exponent. The handling of underflow and +overflow follows the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
+------------------------------------------------------------------------------- +*/ static float128 roundAndPackFloat128( flag zSign, int32 zExp, uint64_t zSig0, uint64_t zSig1, uint64_t zSig2 STATUS_PARAM) @@ -1079,16 +1136,17 @@ static float128 } -/*---------------------------------------------------------------------------- -| Takes an abstract floating-point value having sign `zSign', exponent `zExp', -| and significand formed by the concatenation of `zSig0' and `zSig1', and -| returns the proper quadruple-precision floating-point value corresponding -| to the abstract input. This routine is just like `roundAndPackFloat128' -| except that the input significand has fewer bits and does not have to be -| normalized. In all cases, `zExp' must be 1 less than the ``true'' floating- -| point exponent. -*----------------------------------------------------------------------------*/ - +/* +------------------------------------------------------------------------------- +Takes an abstract floating-point value having sign `zSign', exponent `zExp', +and significand formed by the concatenation of `zSig0' and `zSig1', and +returns the proper quadruple-precision floating-point value corresponding +to the abstract input. This routine is just like `roundAndPackFloat128' +except that the input significand has fewer bits and does not have to be +normalized. In all cases, `zExp' must be 1 less than the ``true'' floating- +point exponent. +------------------------------------------------------------------------------- +*/ static float128 normalizeRoundAndPackFloat128( flag zSign, int32 zExp, uint64_t zSig0, uint64_t zSig1 STATUS_PARAM) @@ -1115,13 +1173,14 @@ static float128 } -/*---------------------------------------------------------------------------- -| Returns the result of converting the 32-bit two's complement integer `a' -| to the single-precision floating-point format. The conversion is performed -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
-*----------------------------------------------------------------------------*/ - -float32 int32_to_float32( int32 a STATUS_PARAM ) +/* +------------------------------------------------------------------------------- +Returns the result of converting the 32-bit two's complement integer `a' +to the single-precision floating-point format. The conversion is performed +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. +------------------------------------------------------------------------------- +*/ +float32 int32_to_float32( int32 a STATUS_PARAM) { flag zSign; @@ -1132,13 +1191,14 @@ float32 int32_to_float32( int32 a STATUS_PARAM ) } -/*---------------------------------------------------------------------------- -| Returns the result of converting the 32-bit two's complement integer `a' -| to the double-precision floating-point format. The conversion is performed -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. -*----------------------------------------------------------------------------*/ - -float64 int32_to_float64( int32 a STATUS_PARAM ) +/* +------------------------------------------------------------------------------- +Returns the result of converting the 32-bit two's complement integer `a' +to the double-precision floating-point format. The conversion is performed +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. +------------------------------------------------------------------------------- +*/ +float64 int32_to_float64( int32 a STATUS_PARAM) { flag zSign; uint32 absA; @@ -1154,13 +1214,14 @@ float64 int32_to_float64( int32 a STATUS_PARAM ) } -/*---------------------------------------------------------------------------- -| Returns the result of converting the 32-bit two's complement integer `a' -| to the extended double-precision floating-point format. The conversion -| is performed according to the IEC/IEEE Standard for Binary Floating-Point -| Arithmetic. 
-*----------------------------------------------------------------------------*/ - +/* +------------------------------------------------------------------------------- +Returns the result of converting the 32-bit two's complement integer `a' +to the extended double-precision floating-point format. The conversion +is performed according to the IEC/IEEE Standard for Binary Floating-Point +Arithmetic. +------------------------------------------------------------------------------- +*/ floatx80 int32_to_floatx80( int32 a STATUS_PARAM ) { flag zSign; @@ -1177,12 +1238,13 @@ floatx80 int32_to_floatx80( int32 a STATUS_PARAM ) } -/*---------------------------------------------------------------------------- -| Returns the result of converting the 32-bit two's complement integer `a' to -| the quadruple-precision floating-point format. The conversion is performed -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. -*----------------------------------------------------------------------------*/ - +/* +------------------------------------------------------------------------------- +Returns the result of converting the 32-bit two's complement integer `a' to +the quadruple-precision floating-point format. The conversion is performed +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. +------------------------------------------------------------------------------- +*/ float128 int32_to_float128( int32 a STATUS_PARAM ) { flag zSign; @@ -1199,12 +1261,13 @@ float128 int32_to_float128( int32 a STATUS_PARAM ) } -/*---------------------------------------------------------------------------- -| Returns the result of converting the 64-bit two's complement integer `a' -| to the single-precision floating-point format. The conversion is performed -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
-*----------------------------------------------------------------------------*/ - +/* +------------------------------------------------------------------------------- +Returns the result of converting the 64-bit two's complement integer `a' +to the single-precision floating-point format. The conversion is performed +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. +------------------------------------------------------------------------------- +*/ float32 int64_to_float32( int64 a STATUS_PARAM ) { flag zSign; @@ -1252,12 +1315,13 @@ float32 uint64_to_float32( uint64 a STATUS_PARAM ) } } -/*---------------------------------------------------------------------------- -| Returns the result of converting the 64-bit two's complement integer `a' -| to the double-precision floating-point format. The conversion is performed -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. -*----------------------------------------------------------------------------*/ - +/* +------------------------------------------------------------------------------- +Returns the result of converting the 64-bit two's complement integer `a' +to the double-precision floating-point format. The conversion is performed +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. +------------------------------------------------------------------------------- +*/ float64 int64_to_float64( int64 a STATUS_PARAM ) { flag zSign; @@ -1285,13 +1349,14 @@ float64 uint64_to_float64(uint64 a STATUS_PARAM) return normalizeRoundAndPackFloat64(0, exp, a STATUS_VAR); } -/*---------------------------------------------------------------------------- -| Returns the result of converting the 64-bit two's complement integer `a' -| to the extended double-precision floating-point format. The conversion -| is performed according to the IEC/IEEE Standard for Binary Floating-Point -| Arithmetic. 
-*----------------------------------------------------------------------------*/ - +/* +------------------------------------------------------------------------------- +Returns the result of converting the 64-bit two's complement integer `a' +to the extended double-precision floating-point format. The conversion +is performed according to the IEC/IEEE Standard for Binary Floating-Point +Arithmetic. +------------------------------------------------------------------------------- +*/ floatx80 int64_to_floatx80( int64 a STATUS_PARAM ) { flag zSign; @@ -1306,12 +1371,13 @@ floatx80 int64_to_floatx80( int64 a STATUS_PARAM ) } -/*---------------------------------------------------------------------------- -| Returns the result of converting the 64-bit two's complement integer `a' to -| the quadruple-precision floating-point format. The conversion is performed -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. -*----------------------------------------------------------------------------*/ - +/* +------------------------------------------------------------------------------- +Returns the result of converting the 64-bit two's complement integer `a' to +the quadruple-precision floating-point format. The conversion is performed +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. +------------------------------------------------------------------------------- +*/ float128 int64_to_float128( int64 a STATUS_PARAM ) { flag zSign; @@ -1347,16 +1413,17 @@ float128 uint64_to_float128(uint64 a STATUS_PARAM) return normalizeRoundAndPackFloat128(0, 0x406E, a, 0 STATUS_VAR); } -/*---------------------------------------------------------------------------- -| Returns the result of converting the single-precision floating-point value -| `a' to the 32-bit two's complement integer format. 
The conversion is -| performed according to the IEC/IEEE Standard for Binary Floating-Point -| Arithmetic---which means in particular that the conversion is rounded -| according to the current rounding mode. If `a' is a NaN, the largest -| positive integer is returned. Otherwise, if the conversion overflows, the -| largest integer with the same sign as `a' is returned. -*----------------------------------------------------------------------------*/ - +/* +------------------------------------------------------------------------------- +Returns the result of converting the single-precision floating-point value +`a' to the 32-bit two's complement integer format. The conversion is +performed according to the IEC/IEEE Standard for Binary Floating-Point +Arithmetic---which means in particular that the conversion is rounded +according to the current rounding mode. If `a' is a NaN, the largest +positive integer is returned. Otherwise, if the conversion overflows, the +largest integer with the same sign as `a' is returned. +------------------------------------------------------------------------------- +*/ int32 float32_to_int32( float32 a STATUS_PARAM ) { flag aSign; @@ -1378,16 +1445,17 @@ int32 float32_to_int32( float32 a STATUS_PARAM ) } -/*---------------------------------------------------------------------------- -| Returns the result of converting the single-precision floating-point value -| `a' to the 32-bit two's complement integer format. The conversion is -| performed according to the IEC/IEEE Standard for Binary Floating-Point -| Arithmetic, except that the conversion is always rounded toward zero. -| If `a' is a NaN, the largest positive integer is returned. Otherwise, if -| the conversion overflows, the largest integer with the same sign as `a' is -| returned. 
-*----------------------------------------------------------------------------*/ - +/* +------------------------------------------------------------------------------- +Returns the result of converting the single-precision floating-point value +`a' to the 32-bit two's complement integer format. The conversion is +performed according to the IEC/IEEE Standard for Binary Floating-Point +Arithmetic, except that the conversion is always rounded toward zero. +If `a' is a NaN, the largest positive integer is returned. Otherwise, if +the conversion overflows, the largest integer with the same sign as `a' is +returned. +------------------------------------------------------------------------------- +*/ int32 float32_to_int32_round_to_zero( float32 a STATUS_PARAM ) { flag aSign; @@ -1421,15 +1489,17 @@ int32 float32_to_int32_round_to_zero( float32 a STATUS_PARAM ) } -/*---------------------------------------------------------------------------- -| Returns the result of converting the single-precision floating-point value -| `a' to the 16-bit two's complement integer format. The conversion is -| performed according to the IEC/IEEE Standard for Binary Floating-Point -| Arithmetic, except that the conversion is always rounded toward zero. -| If `a' is a NaN, the largest positive integer is returned. Otherwise, if -| the conversion overflows, the largest integer with the same sign as `a' is -| returned. -*----------------------------------------------------------------------------*/ +/* +------------------------------------------------------------------------------- +Returns the result of converting the single-precision floating-point value +`a' to the 16-bit two's complement integer format. The conversion is +performed according to the IEC/IEEE Standard for Binary Floating-Point +Arithmetic, except that the conversion is always rounded toward zero. +If `a' is a NaN, the largest positive integer is returned. 
Otherwise, if +the conversion overflows, the largest integer with the same sign as `a' is +returned. +------------------------------------------------------------------------------- +*/ int_fast16_t float32_to_int16_round_to_zero(float32 a STATUS_PARAM) { @@ -1470,16 +1540,17 @@ int_fast16_t float32_to_int16_round_to_zero(float32 a STATUS_PARAM) } -/*---------------------------------------------------------------------------- -| Returns the result of converting the single-precision floating-point value -| `a' to the 64-bit two's complement integer format. The conversion is -| performed according to the IEC/IEEE Standard for Binary Floating-Point -| Arithmetic---which means in particular that the conversion is rounded -| according to the current rounding mode. If `a' is a NaN, the largest -| positive integer is returned. Otherwise, if the conversion overflows, the -| largest integer with the same sign as `a' is returned. -*----------------------------------------------------------------------------*/ - +/* +------------------------------------------------------------------------------- +Returns the result of converting the single-precision floating-point value +`a' to the 64-bit two's complement integer format. The conversion is +performed according to the IEC/IEEE Standard for Binary Floating-Point +Arithmetic---which means in particular that the conversion is rounded +according to the current rounding mode. If `a' is a NaN, the largest +positive integer is returned. Otherwise, if the conversion overflows, the +largest integer with the same sign as `a' is returned. 
+------------------------------------------------------------------------------- +*/ int64 float32_to_int64( float32 a STATUS_PARAM ) { flag aSign; @@ -1507,16 +1578,17 @@ int64 float32_to_int64( float32 a STATUS_PARAM ) } -/*---------------------------------------------------------------------------- -| Returns the result of converting the single-precision floating-point value -| `a' to the 64-bit two's complement integer format. The conversion is -| performed according to the IEC/IEEE Standard for Binary Floating-Point -| Arithmetic, except that the conversion is always rounded toward zero. If -| `a' is a NaN, the largest positive integer is returned. Otherwise, if the -| conversion overflows, the largest integer with the same sign as `a' is -| returned. -*----------------------------------------------------------------------------*/ - +/* +------------------------------------------------------------------------------- +Returns the result of converting the single-precision floating-point value +`a' to the 64-bit two's complement integer format. The conversion is +performed according to the IEC/IEEE Standard for Binary Floating-Point +Arithmetic, except that the conversion is always rounded toward zero. If +`a' is a NaN, the largest positive integer is returned. Otherwise, if the +conversion overflows, the largest integer with the same sign as `a' is +returned. +------------------------------------------------------------------------------- +*/ int64 float32_to_int64_round_to_zero( float32 a STATUS_PARAM ) { flag aSign; @@ -1554,13 +1626,14 @@ int64 float32_to_int64_round_to_zero( float32 a STATUS_PARAM ) } -/*---------------------------------------------------------------------------- -| Returns the result of converting the single-precision floating-point value -| `a' to the double-precision floating-point format. The conversion is -| performed according to the IEC/IEEE Standard for Binary Floating-Point -| Arithmetic. 
-*----------------------------------------------------------------------------*/ - +/* +------------------------------------------------------------------------------- +Returns the result of converting the single-precision floating-point value +`a' to the double-precision floating-point format. The conversion is +performed according to the IEC/IEEE Standard for Binary Floating-Point +Arithmetic. +------------------------------------------------------------------------------- +*/ float64 float32_to_float64( float32 a STATUS_PARAM ) { flag aSign; @@ -1584,13 +1657,14 @@ float64 float32_to_float64( float32 a STATUS_PARAM ) } -/*---------------------------------------------------------------------------- -| Returns the result of converting the single-precision floating-point value -| `a' to the extended double-precision floating-point format. The conversion -| is performed according to the IEC/IEEE Standard for Binary Floating-Point -| Arithmetic. -*----------------------------------------------------------------------------*/ - +/* +------------------------------------------------------------------------------- +Returns the result of converting the single-precision floating-point value +`a' to the extended double-precision floating-point format. The conversion +is performed according to the IEC/IEEE Standard for Binary Floating-Point +Arithmetic. +------------------------------------------------------------------------------- +*/ floatx80 float32_to_floatx80( float32 a STATUS_PARAM ) { flag aSign; @@ -1614,13 +1688,14 @@ floatx80 float32_to_floatx80( float32 a STATUS_PARAM ) } -/*---------------------------------------------------------------------------- -| Returns the result of converting the single-precision floating-point value -| `a' to the double-precision floating-point format. The conversion is -| performed according to the IEC/IEEE Standard for Binary Floating-Point -| Arithmetic. 
-*----------------------------------------------------------------------------*/ - +/* +------------------------------------------------------------------------------- +Returns the result of converting the single-precision floating-point value +`a' to the double-precision floating-point format. The conversion is +performed according to the IEC/IEEE Standard for Binary Floating-Point +Arithmetic. +------------------------------------------------------------------------------- +*/ float128 float32_to_float128( float32 a STATUS_PARAM ) { flag aSign; @@ -1644,14 +1719,15 @@ float128 float32_to_float128( float32 a STATUS_PARAM ) } -/*---------------------------------------------------------------------------- -| Rounds the single-precision floating-point value `a' to an integer, and -| returns the result as a single-precision floating-point value. The -| operation is performed according to the IEC/IEEE Standard for Binary -| Floating-Point Arithmetic. -*----------------------------------------------------------------------------*/ - -float32 float32_round_to_int( float32 a STATUS_PARAM) +/* +------------------------------------------------------------------------------- +Rounds the single-precision floating-point value `a' to an integer, and +returns the result as a single-precision floating-point value. The +operation is performed according to the IEC/IEEE Standard for Binary +Floating-Point Arithmetic. +------------------------------------------------------------------------------- +*/ +float32 float32_round_to_int( float32 a STATUS_PARAM ) { flag aSign; int_fast16_t aExp; @@ -1704,15 +1780,16 @@ float32 float32_round_to_int( float32 a STATUS_PARAM) } -/*---------------------------------------------------------------------------- -| Returns the result of adding the absolute values of the single-precision -| floating-point values `a' and `b'. If `zSign' is 1, the sum is negated -| before being returned. `zSign' is ignored if the result is a NaN. 
-| The addition is performed according to the IEC/IEEE Standard for Binary -| Floating-Point Arithmetic. -*----------------------------------------------------------------------------*/ - -static float32 addFloat32Sigs( float32 a, float32 b, flag zSign STATUS_PARAM) +/* +------------------------------------------------------------------------------- +Returns the result of adding the absolute values of the single-precision +floating-point values `a' and `b'. If `zSign' is 1, the sum is negated +before being returned. `zSign' is ignored if the result is a NaN. +The addition is performed according to the IEC/IEEE Standard for Binary +Floating-Point Arithmetic. +------------------------------------------------------------------------------- +*/ +static float32 addFloat32Sigs( float32 a, float32 b, flag zSign STATUS_PARAM ) { int_fast16_t aExp, bExp, zExp; uint32_t aSig, bSig, zSig; @@ -1783,15 +1860,16 @@ static float32 addFloat32Sigs( float32 a, float32 b, flag zSign STATUS_PARAM) } -/*---------------------------------------------------------------------------- -| Returns the result of subtracting the absolute values of the single- -| precision floating-point values `a' and `b'. If `zSign' is 1, the -| difference is negated before being returned. `zSign' is ignored if the -| result is a NaN. The subtraction is performed according to the IEC/IEEE -| Standard for Binary Floating-Point Arithmetic. -*----------------------------------------------------------------------------*/ - -static float32 subFloat32Sigs( float32 a, float32 b, flag zSign STATUS_PARAM) +/* +------------------------------------------------------------------------------- +Returns the result of subtracting the absolute values of the single- +precision floating-point values `a' and `b'. If `zSign' is 1, the +difference is negated before being returned. `zSign' is ignored if the +result is a NaN. The subtraction is performed according to the IEC/IEEE +Standard for Binary Floating-Point Arithmetic. 
+------------------------------------------------------------------------------- +*/ +static float32 subFloat32Sigs( float32 a, float32 b, flag zSign STATUS_PARAM ) { int_fast16_t aExp, bExp, zExp; uint32_t aSig, bSig, zSig; @@ -1858,12 +1936,13 @@ static float32 subFloat32Sigs( float32 a, float32 b, flag zSign STATUS_PARAM) } -/*---------------------------------------------------------------------------- -| Returns the result of adding the single-precision floating-point values `a' -| and `b'. The operation is performed according to the IEC/IEEE Standard for -| Binary Floating-Point Arithmetic. -*----------------------------------------------------------------------------*/ - +/* +------------------------------------------------------------------------------- +Returns the result of adding the single-precision floating-point values `a' +and `b'. The operation is performed according to the IEC/IEEE Standard for +Binary Floating-Point Arithmetic. +------------------------------------------------------------------------------- +*/ float32 float32_add( float32 a, float32 b STATUS_PARAM ) { flag aSign, bSign; @@ -1881,12 +1960,13 @@ float32 float32_add( float32 a, float32 b STATUS_PARAM ) } -/*---------------------------------------------------------------------------- -| Returns the result of subtracting the single-precision floating-point values -| `a' and `b'. The operation is performed according to the IEC/IEEE Standard -| for Binary Floating-Point Arithmetic. -*----------------------------------------------------------------------------*/ - +/* +------------------------------------------------------------------------------- +Returns the result of subtracting the single-precision floating-point values +`a' and `b'. The operation is performed according to the IEC/IEEE Standard +for Binary Floating-Point Arithmetic. 
+------------------------------------------------------------------------------- +*/ float32 float32_sub( float32 a, float32 b STATUS_PARAM ) { flag aSign, bSign; @@ -1904,12 +1984,13 @@ float32 float32_sub( float32 a, float32 b STATUS_PARAM ) } -/*---------------------------------------------------------------------------- -| Returns the result of multiplying the single-precision floating-point values -| `a' and `b'. The operation is performed according to the IEC/IEEE Standard -| for Binary Floating-Point Arithmetic. -*----------------------------------------------------------------------------*/ - +/* +------------------------------------------------------------------------------- +Returns the result of multiplying the single-precision floating-point values +`a' and `b'. The operation is performed according to the IEC/IEEE Standard +for Binary Floating-Point Arithmetic. +------------------------------------------------------------------------------- +*/ float32 float32_mul( float32 a, float32 b STATUS_PARAM ) { flag aSign, bSign, zSign; @@ -1967,12 +2048,13 @@ float32 float32_mul( float32 a, float32 b STATUS_PARAM ) } -/*---------------------------------------------------------------------------- -| Returns the result of dividing the single-precision floating-point value `a' -| by the corresponding value `b'. The operation is performed according to the -| IEC/IEEE Standard for Binary Floating-Point Arithmetic. -*----------------------------------------------------------------------------*/ - +/* +------------------------------------------------------------------------------- +Returns the result of dividing the single-precision floating-point value `a' +by the corresponding value `b'. The operation is performed according to the +IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
+------------------------------------------------------------------------------- +*/ float32 float32_div( float32 a, float32 b STATUS_PARAM ) { flag aSign, bSign, zSign; @@ -2031,12 +2113,13 @@ float32 float32_div( float32 a, float32 b STATUS_PARAM ) } -/*---------------------------------------------------------------------------- -| Returns the remainder of the single-precision floating-point value `a' -| with respect to the corresponding value `b'. The operation is performed -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. -*----------------------------------------------------------------------------*/ - +/* +------------------------------------------------------------------------------- +Returns the remainder of the single-precision floating-point value `a' +with respect to the corresponding value `b'. The operation is performed +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. +------------------------------------------------------------------------------- +*/ float32 float32_rem( float32 a, float32 b STATUS_PARAM ) { flag aSign, zSign; @@ -2132,16 +2215,18 @@ float32 float32_rem( float32 a, float32 b STATUS_PARAM ) } -/*---------------------------------------------------------------------------- -| Returns the result of multiplying the single-precision floating-point values -| `a' and `b' then adding 'c', with no intermediate rounding step after the -| multiplication. The operation is performed according to the IEC/IEEE -| Standard for Binary Floating-Point Arithmetic 754-2008. -| The flags argument allows the caller to select negation of the -| addend, the intermediate product, or the final result. (The difference -| between this and having the caller do a separate negation is that negating -| externally will flip the sign bit on NaNs.) 
-*----------------------------------------------------------------------------*/ +/* +------------------------------------------------------------------------------- +Returns the result of multiplying the single-precision floating-point values +`a' and `b' then adding 'c', with no intermediate rounding step after the +multiplication. The operation is performed according to the IEC/IEEE +Standard for Binary Floating-Point Arithmetic 754-2008. +The flags argument allows the caller to select negation of the +addend, the intermediate product, or the final result. (The difference +between this and having the caller do a separate negation is that negating +externally will flip the sign bit on NaNs.) +------------------------------------------------------------------------------- +*/ float32 float32_muladd(float32 a, float32 b, float32 c, int flags STATUS_PARAM) { @@ -2339,12 +2424,13 @@ float32 float32_muladd(float32 a, float32 b, float32 c, int flags STATUS_PARAM) } -/*---------------------------------------------------------------------------- -| Returns the square root of the single-precision floating-point value `a'. -| The operation is performed according to the IEC/IEEE Standard for Binary -| Floating-Point Arithmetic. -*----------------------------------------------------------------------------*/ - +/* +------------------------------------------------------------------------------- +Returns the square root of the single-precision floating-point value `a'. +The operation is performed according to the IEC/IEEE Standard for Binary +Floating-Point Arithmetic. +------------------------------------------------------------------------------- +*/ float32 float32_sqrt( float32 a STATUS_PARAM ) { flag aSign; @@ -2394,23 +2480,25 @@ float32 float32_sqrt( float32 a STATUS_PARAM ) } -/*---------------------------------------------------------------------------- -| Returns the binary exponential of the single-precision floating-point value -| `a'. 
The operation is performed according to the IEC/IEEE Standard for -| Binary Floating-Point Arithmetic. -| -| Uses the following identities: -| -| 1. ------------------------------------------------------------------------- -| x x*ln(2) -| 2 = e -| -| 2. ------------------------------------------------------------------------- -| 2 3 4 5 n -| x x x x x x x -| e = 1 + --- + --- + --- + --- + --- + ... + --- + ... -| 1! 2! 3! 4! 5! n! -*----------------------------------------------------------------------------*/ +/* +------------------------------------------------------------------------------- +Returns the binary exponential of the single-precision floating-point value +`a'. The operation is performed according to the IEC/IEEE Standard for +Binary Floating-Point Arithmetic. + +Uses the following identities: + +1. ------------------------------------------------------------------------- + x x*ln(2) + 2 = e + +2. ------------------------------------------------------------------------- + 2 3 4 5 n + x x x x x x x + e = 1 + --- + --- + --- + --- + --- + ... + --- + ... + 1! 2! 3! 4! 5! n! +------------------------------------------------------------------------------- +*/ static const float64 float32_exp2_coefficients[15] = { @@ -2474,11 +2562,13 @@ float32 float32_exp2( float32 a STATUS_PARAM ) return float64_to_float32(r, status); } -/*---------------------------------------------------------------------------- -| Returns the binary log of the single-precision floating-point value `a'. -| The operation is performed according to the IEC/IEEE Standard for Binary -| Floating-Point Arithmetic. -*----------------------------------------------------------------------------*/ +/* +------------------------------------------------------------------------------- +Returns the binary log of the single-precision floating-point value `a'. +The operation is performed according to the IEC/IEEE Standard for Binary +Floating-Point Arithmetic. 
+------------------------------------------------------------------------------- +*/ float32 float32_log2( float32 a STATUS_PARAM ) { flag aSign, zSign; @@ -2522,12 +2612,14 @@ float32 float32_log2( float32 a STATUS_PARAM ) return normalizeRoundAndPackFloat32( zSign, 0x85, zSig STATUS_VAR ); } -/*---------------------------------------------------------------------------- -| Returns 1 if the single-precision floating-point value `a' is equal to -| the corresponding value `b', and 0 otherwise. The invalid exception is -| raised if either operand is a NaN. Otherwise, the comparison is performed -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. -*----------------------------------------------------------------------------*/ +/* +------------------------------------------------------------------------------- +Returns 1 if the single-precision floating-point value `a' is equal to +the corresponding value `b', and 0 otherwise. The invalid exception is +raised if either operand is a NaN. Otherwise, the comparison is performed +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. +------------------------------------------------------------------------------- +*/ int float32_eq( float32 a, float32 b STATUS_PARAM ) { @@ -2546,12 +2638,14 @@ int float32_eq( float32 a, float32 b STATUS_PARAM ) return ( av == bv ) || ( (uint32_t) ( ( av | bv )<<1 ) == 0 ); } -/*---------------------------------------------------------------------------- -| Returns 1 if the single-precision floating-point value `a' is less than -| or equal to the corresponding value `b', and 0 otherwise. The invalid -| exception is raised if either operand is a NaN. The comparison is performed -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
-*----------------------------------------------------------------------------*/ +/* +------------------------------------------------------------------------------- +Returns 1 if the single-precision floating-point value `a' is less than +or equal to the corresponding value `b', and 0 otherwise. The invalid +exception is raised if either operand is a NaN. The comparison is performed +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. +------------------------------------------------------------------------------- +*/ int float32_le( float32 a, float32 b STATUS_PARAM ) { @@ -2575,12 +2669,14 @@ int float32_le( float32 a, float32 b STATUS_PARAM ) } -/*---------------------------------------------------------------------------- -| Returns 1 if the single-precision floating-point value `a' is less than -| the corresponding value `b', and 0 otherwise. The invalid exception is -| raised if either operand is a NaN. The comparison is performed according -| to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. -*----------------------------------------------------------------------------*/ +/* +------------------------------------------------------------------------------- +Returns 1 if the single-precision floating-point value `a' is less than +the corresponding value `b', and 0 otherwise. The invalid exception is +raised if either operand is a NaN. The comparison is performed according +to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. +------------------------------------------------------------------------------- +*/ int float32_lt( float32 a, float32 b STATUS_PARAM ) { @@ -2604,12 +2700,14 @@ int float32_lt( float32 a, float32 b STATUS_PARAM ) } -/*---------------------------------------------------------------------------- -| Returns 1 if the single-precision floating-point values `a' and `b' cannot -| be compared, and 0 otherwise. The invalid exception is raised if either -| operand is a NaN. 
The comparison is performed according to the IEC/IEEE -| Standard for Binary Floating-Point Arithmetic. -*----------------------------------------------------------------------------*/ +/* +------------------------------------------------------------------------------- +Returns 1 if the single-precision floating-point values `a' and `b' cannot +be compared, and 0 otherwise. The invalid exception is raised if either +operand is a NaN. The comparison is performed according to the IEC/IEEE +Standard for Binary Floating-Point Arithmetic. +------------------------------------------------------------------------------- +*/ int float32_unordered( float32 a, float32 b STATUS_PARAM ) { @@ -2625,12 +2723,14 @@ int float32_unordered( float32 a, float32 b STATUS_PARAM ) return 0; } -/*---------------------------------------------------------------------------- -| Returns 1 if the single-precision floating-point value `a' is equal to -| the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an -| exception. The comparison is performed according to the IEC/IEEE Standard -| for Binary Floating-Point Arithmetic. -*----------------------------------------------------------------------------*/ +/* +------------------------------------------------------------------------------- +Returns 1 if the single-precision floating-point value `a' is equal to +the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an +exception. The comparison is performed according to the IEC/IEEE Standard +for Binary Floating-Point Arithmetic. 
+------------------------------------------------------------------------------- +*/ int float32_eq_quiet( float32 a, float32 b STATUS_PARAM ) { @@ -2649,12 +2749,14 @@ int float32_eq_quiet( float32 a, float32 b STATUS_PARAM ) ( (uint32_t) ( ( float32_val(a) | float32_val(b) )<<1 ) == 0 ); } -/*---------------------------------------------------------------------------- -| Returns 1 if the single-precision floating-point value `a' is less than or -| equal to the corresponding value `b', and 0 otherwise. Quiet NaNs do not -| cause an exception. Otherwise, the comparison is performed according to the -| IEC/IEEE Standard for Binary Floating-Point Arithmetic. -*----------------------------------------------------------------------------*/ +/* +------------------------------------------------------------------------------- +Returns 1 if the single-precision floating-point value `a' is less than or +equal to the corresponding value `b', and 0 otherwise. Quiet NaNs do not +cause an exception. Otherwise, the comparison is performed according to the +IEC/IEEE Standard for Binary Floating-Point Arithmetic. +------------------------------------------------------------------------------- +*/ int float32_le_quiet( float32 a, float32 b STATUS_PARAM ) { @@ -2680,12 +2782,14 @@ int float32_le_quiet( float32 a, float32 b STATUS_PARAM ) } -/*---------------------------------------------------------------------------- -| Returns 1 if the single-precision floating-point value `a' is less than -| the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an -| exception. Otherwise, the comparison is performed according to the IEC/IEEE -| Standard for Binary Floating-Point Arithmetic. -*----------------------------------------------------------------------------*/ +/* +------------------------------------------------------------------------------- +Returns 1 if the single-precision floating-point value `a' is less than +the corresponding value `b', and 0 otherwise. 
Quiet NaNs do not cause an +exception. Otherwise, the comparison is performed according to the IEC/IEEE +Standard for Binary Floating-Point Arithmetic. +------------------------------------------------------------------------------- +*/ int float32_lt_quiet( float32 a, float32 b STATUS_PARAM ) { @@ -2711,12 +2815,14 @@ int float32_lt_quiet( float32 a, float32 b STATUS_PARAM ) } -/*---------------------------------------------------------------------------- -| Returns 1 if the single-precision floating-point values `a' and `b' cannot -| be compared, and 0 otherwise. Quiet NaNs do not cause an exception. The -| comparison is performed according to the IEC/IEEE Standard for Binary -| Floating-Point Arithmetic. -*----------------------------------------------------------------------------*/ +/* +------------------------------------------------------------------------------- +Returns 1 if the single-precision floating-point values `a' and `b' cannot +be compared, and 0 otherwise. Quiet NaNs do not cause an exception. The +comparison is performed according to the IEC/IEEE Standard for Binary +Floating-Point Arithmetic. +------------------------------------------------------------------------------- +*/ int float32_unordered_quiet( float32 a, float32 b STATUS_PARAM ) { @@ -2734,16 +2840,17 @@ int float32_unordered_quiet( float32 a, float32 b STATUS_PARAM ) return 0; } -/*---------------------------------------------------------------------------- -| Returns the result of converting the double-precision floating-point value -| `a' to the 32-bit two's complement integer format. The conversion is -| performed according to the IEC/IEEE Standard for Binary Floating-Point -| Arithmetic---which means in particular that the conversion is rounded -| according to the current rounding mode. If `a' is a NaN, the largest -| positive integer is returned. Otherwise, if the conversion overflows, the -| largest integer with the same sign as `a' is returned. 
-*----------------------------------------------------------------------------*/ - +/* +------------------------------------------------------------------------------- +Returns the result of converting the double-precision floating-point value +`a' to the 32-bit two's complement integer format. The conversion is +performed according to the IEC/IEEE Standard for Binary Floating-Point +Arithmetic---which means in particular that the conversion is rounded +according to the current rounding mode. If `a' is a NaN, the largest +positive integer is returned. Otherwise, if the conversion overflows, the +largest integer with the same sign as `a' is returned. +------------------------------------------------------------------------------- +*/ int32 float64_to_int32( float64 a STATUS_PARAM ) { flag aSign; @@ -2762,16 +2869,17 @@ int32 float64_to_int32( float64 a STATUS_PARAM ) } -/*---------------------------------------------------------------------------- -| Returns the result of converting the double-precision floating-point value -| `a' to the 32-bit two's complement integer format. The conversion is -| performed according to the IEC/IEEE Standard for Binary Floating-Point -| Arithmetic, except that the conversion is always rounded toward zero. -| If `a' is a NaN, the largest positive integer is returned. Otherwise, if -| the conversion overflows, the largest integer with the same sign as `a' is -| returned. -*----------------------------------------------------------------------------*/ - +/* +------------------------------------------------------------------------------- +Returns the result of converting the double-precision floating-point value +`a' to the 32-bit two's complement integer format. The conversion is +performed according to the IEC/IEEE Standard for Binary Floating-Point +Arithmetic, except that the conversion is always rounded toward zero. +If `a' is a NaN, the largest positive integer is returned. 
Otherwise, if +the conversion overflows, the largest integer with the same sign as `a' is +returned. +------------------------------------------------------------------------------- +*/ int32 float64_to_int32_round_to_zero( float64 a STATUS_PARAM ) { flag aSign; @@ -2809,15 +2917,17 @@ int32 float64_to_int32_round_to_zero( float64 a STATUS_PARAM ) } -/*---------------------------------------------------------------------------- -| Returns the result of converting the double-precision floating-point value -| `a' to the 16-bit two's complement integer format. The conversion is -| performed according to the IEC/IEEE Standard for Binary Floating-Point -| Arithmetic, except that the conversion is always rounded toward zero. -| If `a' is a NaN, the largest positive integer is returned. Otherwise, if -| the conversion overflows, the largest integer with the same sign as `a' is -| returned. -*----------------------------------------------------------------------------*/ +/* +------------------------------------------------------------------------------- +Returns the result of converting the double-precision floating-point value +`a' to the 16-bit two's complement integer format. The conversion is +performed according to the IEC/IEEE Standard for Binary Floating-Point +Arithmetic, except that the conversion is always rounded toward zero. +If `a' is a NaN, the largest positive integer is returned. Otherwise, if +the conversion overflows, the largest integer with the same sign as `a' is +returned. +------------------------------------------------------------------------------- +*/ int_fast16_t float64_to_int16_round_to_zero(float64 a STATUS_PARAM) { @@ -2860,16 +2970,17 @@ int_fast16_t float64_to_int16_round_to_zero(float64 a STATUS_PARAM) return z; } -/*---------------------------------------------------------------------------- -| Returns the result of converting the double-precision floating-point value -| `a' to the 64-bit two's complement integer format. 
The conversion is -| performed according to the IEC/IEEE Standard for Binary Floating-Point -| Arithmetic---which means in particular that the conversion is rounded -| according to the current rounding mode. If `a' is a NaN, the largest -| positive integer is returned. Otherwise, if the conversion overflows, the -| largest integer with the same sign as `a' is returned. -*----------------------------------------------------------------------------*/ - +/* +------------------------------------------------------------------------------- +Returns the result of converting the double-precision floating-point value +`a' to the 64-bit two's complement integer format. The conversion is +performed according to the IEC/IEEE Standard for Binary Floating-Point +Arithmetic---which means in particular that the conversion is rounded +according to the current rounding mode. If `a' is a NaN, the largest +positive integer is returned. Otherwise, if the conversion overflows, the +largest integer with the same sign as `a' is returned. +------------------------------------------------------------------------------- +*/ int64 float64_to_int64( float64 a STATUS_PARAM ) { flag aSign; @@ -2903,16 +3014,17 @@ int64 float64_to_int64( float64 a STATUS_PARAM ) } -/*---------------------------------------------------------------------------- -| Returns the result of converting the double-precision floating-point value -| `a' to the 64-bit two's complement integer format. The conversion is -| performed according to the IEC/IEEE Standard for Binary Floating-Point -| Arithmetic, except that the conversion is always rounded toward zero. -| If `a' is a NaN, the largest positive integer is returned. Otherwise, if -| the conversion overflows, the largest integer with the same sign as `a' is -| returned. 
-*----------------------------------------------------------------------------*/ - +/* +------------------------------------------------------------------------------- +Returns the result of converting the double-precision floating-point value +`a' to the 64-bit two's complement integer format. The conversion is +performed according to the IEC/IEEE Standard for Binary Floating-Point +Arithmetic, except that the conversion is always rounded toward zero. +If `a' is a NaN, the largest positive integer is returned. Otherwise, if +the conversion overflows, the largest integer with the same sign as `a' is +returned. +------------------------------------------------------------------------------- +*/ int64 float64_to_int64_round_to_zero( float64 a STATUS_PARAM ) { flag aSign; @@ -2956,13 +3068,14 @@ int64 float64_to_int64_round_to_zero( float64 a STATUS_PARAM ) } -/*---------------------------------------------------------------------------- -| Returns the result of converting the double-precision floating-point value -| `a' to the single-precision floating-point format. The conversion is -| performed according to the IEC/IEEE Standard for Binary Floating-Point -| Arithmetic. -*----------------------------------------------------------------------------*/ - +/* +------------------------------------------------------------------------------- +Returns the result of converting the double-precision floating-point value +`a' to the single-precision floating-point format. The conversion is +performed according to the IEC/IEEE Standard for Binary Floating-Point +Arithmetic. 
+------------------------------------------------------------------------------- +*/ float32 float64_to_float32( float64 a STATUS_PARAM ) { flag aSign; @@ -2989,16 +3102,18 @@ float32 float64_to_float32( float64 a STATUS_PARAM ) } -/*---------------------------------------------------------------------------- -| Packs the sign `zSign', exponent `zExp', and significand `zSig' into a -| half-precision floating-point value, returning the result. After being -| shifted into the proper positions, the three fields are simply added -| together to form the result. This means that any integer portion of `zSig' -| will be added into the exponent. Since a properly normalized significand -| will have an integer portion equal to 1, the `zExp' input should be 1 less -| than the desired result exponent whenever `zSig' is a complete, normalized -| significand. -*----------------------------------------------------------------------------*/ +/* +------------------------------------------------------------------------------- +Packs the sign `zSign', exponent `zExp', and significand `zSig' into a +half-precision floating-point value, returning the result. After being +shifted into the proper positions, the three fields are simply added +together to form the result. This means that any integer portion of `zSig' +will be added into the exponent. Since a properly normalized significand +will have an integer portion equal to 1, the `zExp' input should be 1 less +than the desired result exponent whenever `zSig' is a complete, normalized +significand. 
+------------------------------------------------------------------------------- +*/ static float16 packFloat16(flag zSign, int_fast16_t zExp, uint16_t zSig) { return make_float16( @@ -3132,13 +3247,14 @@ float16 float32_to_float16(float32 a, flag ieee STATUS_PARAM) return packFloat16(aSign, aExp + 14, aSig >> 13); } -/*---------------------------------------------------------------------------- -| Returns the result of converting the double-precision floating-point value -| `a' to the extended double-precision floating-point format. The conversion -| is performed according to the IEC/IEEE Standard for Binary Floating-Point -| Arithmetic. -*----------------------------------------------------------------------------*/ - +/* +------------------------------------------------------------------------------- +Returns the result of converting the double-precision floating-point value +`a' to the extended double-precision floating-point format. The conversion +is performed according to the IEC/IEEE Standard for Binary Floating-Point +Arithmetic. +------------------------------------------------------------------------------- +*/ floatx80 float64_to_floatx80( float64 a STATUS_PARAM ) { flag aSign; @@ -3163,13 +3279,14 @@ floatx80 float64_to_floatx80( float64 a STATUS_PARAM ) } -/*---------------------------------------------------------------------------- -| Returns the result of converting the double-precision floating-point value -| `a' to the quadruple-precision floating-point format. The conversion is -| performed according to the IEC/IEEE Standard for Binary Floating-Point -| Arithmetic. -*----------------------------------------------------------------------------*/ - +/* +------------------------------------------------------------------------------- +Returns the result of converting the double-precision floating-point value +`a' to the quadruple-precision floating-point format. 
The conversion is +performed according to the IEC/IEEE Standard for Binary Floating-Point +Arithmetic. +------------------------------------------------------------------------------- +*/ float128 float64_to_float128( float64 a STATUS_PARAM ) { flag aSign; @@ -3194,13 +3311,14 @@ float128 float64_to_float128( float64 a STATUS_PARAM ) } -/*---------------------------------------------------------------------------- -| Rounds the double-precision floating-point value `a' to an integer, and -| returns the result as a double-precision floating-point value. The -| operation is performed according to the IEC/IEEE Standard for Binary -| Floating-Point Arithmetic. -*----------------------------------------------------------------------------*/ - +/* +------------------------------------------------------------------------------- +Rounds the double-precision floating-point value `a' to an integer, and +returns the result as a double-precision floating-point value. The +operation is performed according to the IEC/IEEE Standard for Binary +Floating-Point Arithmetic. +------------------------------------------------------------------------------- +*/ float64 float64_round_to_int( float64 a STATUS_PARAM ) { flag aSign; @@ -3267,14 +3385,15 @@ float64 float64_trunc_to_int( float64 a STATUS_PARAM) return res; } -/*---------------------------------------------------------------------------- -| Returns the result of adding the absolute values of the double-precision -| floating-point values `a' and `b'. If `zSign' is 1, the sum is negated -| before being returned. `zSign' is ignored if the result is a NaN. -| The addition is performed according to the IEC/IEEE Standard for Binary -| Floating-Point Arithmetic. -*----------------------------------------------------------------------------*/ - +/* +------------------------------------------------------------------------------- +Returns the result of adding the absolute values of the double-precision +floating-point values `a' and `b'. 
If `zSign' is 1, the sum is negated +before being returned. `zSign' is ignored if the result is a NaN. +The addition is performed according to the IEC/IEEE Standard for Binary +Floating-Point Arithmetic. +------------------------------------------------------------------------------- +*/ static float64 addFloat64Sigs( float64 a, float64 b, flag zSign STATUS_PARAM ) { int_fast16_t aExp, bExp, zExp; @@ -3346,14 +3465,15 @@ static float64 addFloat64Sigs( float64 a, float64 b, flag zSign STATUS_PARAM ) } -/*---------------------------------------------------------------------------- -| Returns the result of subtracting the absolute values of the double- -| precision floating-point values `a' and `b'. If `zSign' is 1, the -| difference is negated before being returned. `zSign' is ignored if the -| result is a NaN. The subtraction is performed according to the IEC/IEEE -| Standard for Binary Floating-Point Arithmetic. -*----------------------------------------------------------------------------*/ - +/* +------------------------------------------------------------------------------- +Returns the result of subtracting the absolute values of the double- +precision floating-point values `a' and `b'. If `zSign' is 1, the +difference is negated before being returned. `zSign' is ignored if the +result is a NaN. The subtraction is performed according to the IEC/IEEE +Standard for Binary Floating-Point Arithmetic. +------------------------------------------------------------------------------- +*/ static float64 subFloat64Sigs( float64 a, float64 b, flag zSign STATUS_PARAM ) { int_fast16_t aExp, bExp, zExp; @@ -3421,12 +3541,13 @@ static float64 subFloat64Sigs( float64 a, float64 b, flag zSign STATUS_PARAM ) } -/*---------------------------------------------------------------------------- -| Returns the result of adding the double-precision floating-point values `a' -| and `b'. The operation is performed according to the IEC/IEEE Standard for -| Binary Floating-Point Arithmetic. 
-*----------------------------------------------------------------------------*/ - +/* +------------------------------------------------------------------------------- +Returns the result of adding the double-precision floating-point values `a' +and `b'. The operation is performed according to the IEC/IEEE Standard for +Binary Floating-Point Arithmetic. +------------------------------------------------------------------------------- +*/ float64 float64_add( float64 a, float64 b STATUS_PARAM ) { flag aSign, bSign; @@ -3444,12 +3565,13 @@ float64 float64_add( float64 a, float64 b STATUS_PARAM ) } -/*---------------------------------------------------------------------------- -| Returns the result of subtracting the double-precision floating-point values -| `a' and `b'. The operation is performed according to the IEC/IEEE Standard -| for Binary Floating-Point Arithmetic. -*----------------------------------------------------------------------------*/ - +/* +------------------------------------------------------------------------------- +Returns the result of subtracting the double-precision floating-point values +`a' and `b'. The operation is performed according to the IEC/IEEE Standard +for Binary Floating-Point Arithmetic. +------------------------------------------------------------------------------- +*/ float64 float64_sub( float64 a, float64 b STATUS_PARAM ) { flag aSign, bSign; @@ -3467,12 +3589,13 @@ float64 float64_sub( float64 a, float64 b STATUS_PARAM ) } -/*---------------------------------------------------------------------------- -| Returns the result of multiplying the double-precision floating-point values -| `a' and `b'. The operation is performed according to the IEC/IEEE Standard -| for Binary Floating-Point Arithmetic. 
-*----------------------------------------------------------------------------*/ - +/* +------------------------------------------------------------------------------- +Returns the result of multiplying the double-precision floating-point values +`a' and `b'. The operation is performed according to the IEC/IEEE Standard +for Binary Floating-Point Arithmetic. +------------------------------------------------------------------------------- +*/ float64 float64_mul( float64 a, float64 b STATUS_PARAM ) { flag aSign, bSign, zSign; @@ -3528,12 +3651,13 @@ float64 float64_mul( float64 a, float64 b STATUS_PARAM ) } -/*---------------------------------------------------------------------------- -| Returns the result of dividing the double-precision floating-point value `a' -| by the corresponding value `b'. The operation is performed according to -| the IEC/IEEE Standard for Binary Floating-Point Arithmetic. -*----------------------------------------------------------------------------*/ - +/* +------------------------------------------------------------------------------- +Returns the result of dividing the double-precision floating-point value `a' +by the corresponding value `b'. The operation is performed according to +the IEC/IEEE Standard for Binary Floating-Point Arithmetic. +------------------------------------------------------------------------------- +*/ float64 float64_div( float64 a, float64 b STATUS_PARAM ) { flag aSign, bSign, zSign; @@ -3600,12 +3724,13 @@ float64 float64_div( float64 a, float64 b STATUS_PARAM ) } -/*---------------------------------------------------------------------------- -| Returns the remainder of the double-precision floating-point value `a' -| with respect to the corresponding value `b'. The operation is performed -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
-*----------------------------------------------------------------------------*/ - +/* +------------------------------------------------------------------------------- +Returns the remainder of the double-precision floating-point value `a' +with respect to the corresponding value `b'. The operation is performed +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. +------------------------------------------------------------------------------- +*/ float64 float64_rem( float64 a, float64 b STATUS_PARAM ) { flag aSign, zSign; @@ -3686,16 +3811,18 @@ float64 float64_rem( float64 a, float64 b STATUS_PARAM ) } -/*---------------------------------------------------------------------------- -| Returns the result of multiplying the double-precision floating-point values -| `a' and `b' then adding 'c', with no intermediate rounding step after the -| multiplication. The operation is performed according to the IEC/IEEE -| Standard for Binary Floating-Point Arithmetic 754-2008. -| The flags argument allows the caller to select negation of the -| addend, the intermediate product, or the final result. (The difference -| between this and having the caller do a separate negation is that negating -| externally will flip the sign bit on NaNs.) -*----------------------------------------------------------------------------*/ +/* +------------------------------------------------------------------------------- +Returns the result of multiplying the double-precision floating-point values +`a' and `b' then adding 'c', with no intermediate rounding step after the +multiplication. The operation is performed according to the IEC/IEEE +Standard for Binary Floating-Point Arithmetic 754-2008. +The flags argument allows the caller to select negation of the +addend, the intermediate product, or the final result. (The difference +between this and having the caller do a separate negation is that negating +externally will flip the sign bit on NaNs.) 
+------------------------------------------------------------------------------- +*/ float64 float64_muladd(float64 a, float64 b, float64 c, int flags STATUS_PARAM) { @@ -3912,12 +4039,13 @@ float64 float64_muladd(float64 a, float64 b, float64 c, int flags STATUS_PARAM) } } -/*---------------------------------------------------------------------------- -| Returns the square root of the double-precision floating-point value `a'. -| The operation is performed according to the IEC/IEEE Standard for Binary -| Floating-Point Arithmetic. -*----------------------------------------------------------------------------*/ - +/* +------------------------------------------------------------------------------- +Returns the square root of the double-precision floating-point value `a'. +The operation is performed according to the IEC/IEEE Standard for Binary +Floating-Point Arithmetic. +------------------------------------------------------------------------------- +*/ float64 float64_sqrt( float64 a STATUS_PARAM ) { flag aSign; @@ -3964,11 +4092,13 @@ float64 float64_sqrt( float64 a STATUS_PARAM ) } -/*---------------------------------------------------------------------------- -| Returns the binary log of the double-precision floating-point value `a'. -| The operation is performed according to the IEC/IEEE Standard for Binary -| Floating-Point Arithmetic. -*----------------------------------------------------------------------------*/ +/* +------------------------------------------------------------------------------- +Returns the binary log of the double-precision floating-point value `a'. +The operation is performed according to the IEC/IEEE Standard for Binary +Floating-Point Arithmetic. 
+------------------------------------------------------------------------------- +*/ float64 float64_log2( float64 a STATUS_PARAM ) { flag aSign, zSign; @@ -4011,12 +4141,14 @@ float64 float64_log2( float64 a STATUS_PARAM ) return normalizeRoundAndPackFloat64( zSign, 0x408, zSig STATUS_VAR ); } -/*---------------------------------------------------------------------------- -| Returns 1 if the double-precision floating-point value `a' is equal to the -| corresponding value `b', and 0 otherwise. The invalid exception is raised -| if either operand is a NaN. Otherwise, the comparison is performed -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. -*----------------------------------------------------------------------------*/ +/* +------------------------------------------------------------------------------- +Returns 1 if the double-precision floating-point value `a' is equal to the +corresponding value `b', and 0 otherwise. The invalid exception is raised +if either operand is a NaN. Otherwise, the comparison is performed +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. +------------------------------------------------------------------------------- +*/ int float64_eq( float64 a, float64 b STATUS_PARAM ) { @@ -4036,12 +4168,14 @@ int float64_eq( float64 a, float64 b STATUS_PARAM ) } -/*---------------------------------------------------------------------------- -| Returns 1 if the double-precision floating-point value `a' is less than or -| equal to the corresponding value `b', and 0 otherwise. The invalid -| exception is raised if either operand is a NaN. The comparison is performed -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
-*----------------------------------------------------------------------------*/ +/* +------------------------------------------------------------------------------- +Returns 1 if the double-precision floating-point value `a' is less than or +equal to the corresponding value `b', and 0 otherwise. The invalid +exception is raised if either operand is a NaN. The comparison is performed +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. +------------------------------------------------------------------------------- +*/ int float64_le( float64 a, float64 b STATUS_PARAM ) { @@ -4065,12 +4199,14 @@ int float64_le( float64 a, float64 b STATUS_PARAM ) } -/*---------------------------------------------------------------------------- -| Returns 1 if the double-precision floating-point value `a' is less than -| the corresponding value `b', and 0 otherwise. The invalid exception is -| raised if either operand is a NaN. The comparison is performed according -| to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. -*----------------------------------------------------------------------------*/ +/* +------------------------------------------------------------------------------- +Returns 1 if the double-precision floating-point value `a' is less than +the corresponding value `b', and 0 otherwise. The invalid exception is +raised if either operand is a NaN. The comparison is performed according +to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. +------------------------------------------------------------------------------- +*/ int float64_lt( float64 a, float64 b STATUS_PARAM ) { @@ -4094,12 +4230,14 @@ int float64_lt( float64 a, float64 b STATUS_PARAM ) } -/*---------------------------------------------------------------------------- -| Returns 1 if the double-precision floating-point values `a' and `b' cannot -| be compared, and 0 otherwise. The invalid exception is raised if either -| operand is a NaN. 
The comparison is performed according to the IEC/IEEE -| Standard for Binary Floating-Point Arithmetic. -*----------------------------------------------------------------------------*/ +/* +------------------------------------------------------------------------------- +Returns 1 if the double-precision floating-point values `a' and `b' cannot +be compared, and 0 otherwise. The invalid exception is raised if either +operand is a NaN. The comparison is performed according to the IEC/IEEE +Standard for Binary Floating-Point Arithmetic. +------------------------------------------------------------------------------- +*/ int float64_unordered( float64 a, float64 b STATUS_PARAM ) { @@ -4115,12 +4253,14 @@ int float64_unordered( float64 a, float64 b STATUS_PARAM ) return 0; } -/*---------------------------------------------------------------------------- -| Returns 1 if the double-precision floating-point value `a' is equal to the -| corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an -| exception.The comparison is performed according to the IEC/IEEE Standard -| for Binary Floating-Point Arithmetic. -*----------------------------------------------------------------------------*/ +/* +------------------------------------------------------------------------------- +Returns 1 if the double-precision floating-point value `a' is equal to the +corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an +exception.The comparison is performed according to the IEC/IEEE Standard +for Binary Floating-Point Arithmetic. +------------------------------------------------------------------------------- +*/ int float64_eq_quiet( float64 a, float64 b STATUS_PARAM ) { @@ -4142,12 +4282,14 @@ int float64_eq_quiet( float64 a, float64 b STATUS_PARAM ) } -/*---------------------------------------------------------------------------- -| Returns 1 if the double-precision floating-point value `a' is less than or -| equal to the corresponding value `b', and 0 otherwise. 
Quiet NaNs do not -| cause an exception. Otherwise, the comparison is performed according to the -| IEC/IEEE Standard for Binary Floating-Point Arithmetic. -*----------------------------------------------------------------------------*/ +/* +------------------------------------------------------------------------------- +Returns 1 if the double-precision floating-point value `a' is less than or +equal to the corresponding value `b', and 0 otherwise. Quiet NaNs do not +cause an exception. Otherwise, the comparison is performed according to the +IEC/IEEE Standard for Binary Floating-Point Arithmetic. +------------------------------------------------------------------------------- +*/ int float64_le_quiet( float64 a, float64 b STATUS_PARAM ) { @@ -4173,12 +4315,14 @@ int float64_le_quiet( float64 a, float64 b STATUS_PARAM ) } -/*---------------------------------------------------------------------------- -| Returns 1 if the double-precision floating-point value `a' is less than -| the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an -| exception. Otherwise, the comparison is performed according to the IEC/IEEE -| Standard for Binary Floating-Point Arithmetic. -*----------------------------------------------------------------------------*/ +/* +------------------------------------------------------------------------------- +Returns 1 if the double-precision floating-point value `a' is less than +the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an +exception. Otherwise, the comparison is performed according to the IEC/IEEE +Standard for Binary Floating-Point Arithmetic. 
+------------------------------------------------------------------------------- +*/ int float64_lt_quiet( float64 a, float64 b STATUS_PARAM ) { @@ -4204,12 +4348,14 @@ int float64_lt_quiet( float64 a, float64 b STATUS_PARAM ) } -/*---------------------------------------------------------------------------- -| Returns 1 if the double-precision floating-point values `a' and `b' cannot -| be compared, and 0 otherwise. Quiet NaNs do not cause an exception. The -| comparison is performed according to the IEC/IEEE Standard for Binary -| Floating-Point Arithmetic. -*----------------------------------------------------------------------------*/ +/* +------------------------------------------------------------------------------- +Returns 1 if the double-precision floating-point values `a' and `b' cannot +be compared, and 0 otherwise. Quiet NaNs do not cause an exception. The +comparison is performed according to the IEC/IEEE Standard for Binary +Floating-Point Arithmetic. +------------------------------------------------------------------------------- +*/ int float64_unordered_quiet( float64 a, float64 b STATUS_PARAM ) { @@ -4227,16 +4373,17 @@ int float64_unordered_quiet( float64 a, float64 b STATUS_PARAM ) return 0; } -/*---------------------------------------------------------------------------- -| Returns the result of converting the extended double-precision floating- -| point value `a' to the 32-bit two's complement integer format. The -| conversion is performed according to the IEC/IEEE Standard for Binary -| Floating-Point Arithmetic---which means in particular that the conversion -| is rounded according to the current rounding mode. If `a' is a NaN, the -| largest positive integer is returned. Otherwise, if the conversion -| overflows, the largest integer with the same sign as `a' is returned. 
-*----------------------------------------------------------------------------*/ - +/* +------------------------------------------------------------------------------- +Returns the result of converting the extended double-precision floating- +point value `a' to the 32-bit two's complement integer format. The +conversion is performed according to the IEC/IEEE Standard for Binary +Floating-Point Arithmetic---which means in particular that the conversion +is rounded according to the current rounding mode. If `a' is a NaN, the +largest positive integer is returned. Otherwise, if the conversion +overflows, the largest integer with the same sign as `a' is returned. +------------------------------------------------------------------------------- +*/ int32 floatx80_to_int32( floatx80 a STATUS_PARAM ) { flag aSign; @@ -4254,16 +4401,17 @@ int32 floatx80_to_int32( floatx80 a STATUS_PARAM ) } -/*---------------------------------------------------------------------------- -| Returns the result of converting the extended double-precision floating- -| point value `a' to the 32-bit two's complement integer format. The -| conversion is performed according to the IEC/IEEE Standard for Binary -| Floating-Point Arithmetic, except that the conversion is always rounded -| toward zero. If `a' is a NaN, the largest positive integer is returned. -| Otherwise, if the conversion overflows, the largest integer with the same -| sign as `a' is returned. -*----------------------------------------------------------------------------*/ - +/* +------------------------------------------------------------------------------- +Returns the result of converting the extended double-precision floating- +point value `a' to the 32-bit two's complement integer format. The +conversion is performed according to the IEC/IEEE Standard for Binary +Floating-Point Arithmetic, except that the conversion is always rounded +toward zero. If `a' is a NaN, the largest positive integer is returned. 
+Otherwise, if the conversion overflows, the largest integer with the same +sign as `a' is returned. +------------------------------------------------------------------------------- +*/ int32 floatx80_to_int32_round_to_zero( floatx80 a STATUS_PARAM ) { flag aSign; @@ -4299,16 +4447,17 @@ int32 floatx80_to_int32_round_to_zero( floatx80 a STATUS_PARAM ) } -/*---------------------------------------------------------------------------- -| Returns the result of converting the extended double-precision floating- -| point value `a' to the 64-bit two's complement integer format. The -| conversion is performed according to the IEC/IEEE Standard for Binary -| Floating-Point Arithmetic---which means in particular that the conversion -| is rounded according to the current rounding mode. If `a' is a NaN, -| the largest positive integer is returned. Otherwise, if the conversion -| overflows, the largest integer with the same sign as `a' is returned. -*----------------------------------------------------------------------------*/ - +/* +------------------------------------------------------------------------------- +Returns the result of converting the extended double-precision floating- +point value `a' to the 64-bit two's complement integer format. The +conversion is performed according to the IEC/IEEE Standard for Binary +Floating-Point Arithmetic---which means in particular that the conversion +is rounded according to the current rounding mode. If `a' is a NaN, +the largest positive integer is returned. Otherwise, if the conversion +overflows, the largest integer with the same sign as `a' is returned. 
+------------------------------------------------------------------------------- +*/ int64 floatx80_to_int64( floatx80 a STATUS_PARAM ) { flag aSign; @@ -4339,16 +4488,17 @@ int64 floatx80_to_int64( floatx80 a STATUS_PARAM ) } -/*---------------------------------------------------------------------------- -| Returns the result of converting the extended double-precision floating- -| point value `a' to the 64-bit two's complement integer format. The -| conversion is performed according to the IEC/IEEE Standard for Binary -| Floating-Point Arithmetic, except that the conversion is always rounded -| toward zero. If `a' is a NaN, the largest positive integer is returned. -| Otherwise, if the conversion overflows, the largest integer with the same -| sign as `a' is returned. -*----------------------------------------------------------------------------*/ - +/* +------------------------------------------------------------------------------- +Returns the result of converting the extended double-precision floating- +point value `a' to the 64-bit two's complement integer format. The +conversion is performed according to the IEC/IEEE Standard for Binary +Floating-Point Arithmetic, except that the conversion is always rounded +toward zero. If `a' is a NaN, the largest positive integer is returned. +Otherwise, if the conversion overflows, the largest integer with the same +sign as `a' is returned. +------------------------------------------------------------------------------- +*/ int64 floatx80_to_int64_round_to_zero( floatx80 a STATUS_PARAM ) { flag aSign; @@ -4383,13 +4533,14 @@ int64 floatx80_to_int64_round_to_zero( floatx80 a STATUS_PARAM ) } -/*---------------------------------------------------------------------------- -| Returns the result of converting the extended double-precision floating- -| point value `a' to the single-precision floating-point format. The -| conversion is performed according to the IEC/IEEE Standard for Binary -| Floating-Point Arithmetic. 
-*----------------------------------------------------------------------------*/ - +/* +------------------------------------------------------------------------------- +Returns the result of converting the extended double-precision floating- +point value `a' to the single-precision floating-point format. The +conversion is performed according to the IEC/IEEE Standard for Binary +Floating-Point Arithmetic. +------------------------------------------------------------------------------- +*/ float32 floatx80_to_float32( floatx80 a STATUS_PARAM ) { flag aSign; @@ -4411,13 +4562,14 @@ float32 floatx80_to_float32( floatx80 a STATUS_PARAM ) } -/*---------------------------------------------------------------------------- -| Returns the result of converting the extended double-precision floating- -| point value `a' to the double-precision floating-point format. The -| conversion is performed according to the IEC/IEEE Standard for Binary -| Floating-Point Arithmetic. -*----------------------------------------------------------------------------*/ - +/* +------------------------------------------------------------------------------- +Returns the result of converting the extended double-precision floating- +point value `a' to the double-precision floating-point format. The +conversion is performed according to the IEC/IEEE Standard for Binary +Floating-Point Arithmetic. +------------------------------------------------------------------------------- +*/ float64 floatx80_to_float64( floatx80 a STATUS_PARAM ) { flag aSign; @@ -4439,13 +4591,14 @@ float64 floatx80_to_float64( floatx80 a STATUS_PARAM ) } -/*---------------------------------------------------------------------------- -| Returns the result of converting the extended double-precision floating- -| point value `a' to the quadruple-precision floating-point format. The -| conversion is performed according to the IEC/IEEE Standard for Binary -| Floating-Point Arithmetic. 
-*----------------------------------------------------------------------------*/ - +/* +------------------------------------------------------------------------------- +Returns the result of converting the extended double-precision floating- +point value `a' to the quadruple-precision floating-point format. The +conversion is performed according to the IEC/IEEE Standard for Binary +Floating-Point Arithmetic. +------------------------------------------------------------------------------- +*/ float128 floatx80_to_float128( floatx80 a STATUS_PARAM ) { flag aSign; @@ -4463,13 +4616,14 @@ float128 floatx80_to_float128( floatx80 a STATUS_PARAM ) } -/*---------------------------------------------------------------------------- -| Rounds the extended double-precision floating-point value `a' to an integer, -| and returns the result as an extended quadruple-precision floating-point -| value. The operation is performed according to the IEC/IEEE Standard for -| Binary Floating-Point Arithmetic. -*----------------------------------------------------------------------------*/ - +/* +------------------------------------------------------------------------------- +Rounds the extended double-precision floating-point value `a' to an integer, +and returns the result as an extended quadruple-precision floating-point +value. The operation is performed according to the IEC/IEEE Standard for +Binary Floating-Point Arithmetic. +------------------------------------------------------------------------------- +*/ floatx80 floatx80_round_to_int( floatx80 a STATUS_PARAM ) { flag aSign; @@ -4536,14 +4690,15 @@ floatx80 floatx80_round_to_int( floatx80 a STATUS_PARAM ) } -/*---------------------------------------------------------------------------- -| Returns the result of adding the absolute values of the extended double- -| precision floating-point values `a' and `b'. If `zSign' is 1, the sum is -| negated before being returned. `zSign' is ignored if the result is a NaN. 
-| The addition is performed according to the IEC/IEEE Standard for Binary -| Floating-Point Arithmetic. -*----------------------------------------------------------------------------*/ - +/* +------------------------------------------------------------------------------- +Returns the result of adding the absolute values of the extended double- +precision floating-point values `a' and `b'. If `zSign' is 1, the sum is +negated before being returned. `zSign' is ignored if the result is a NaN. +The addition is performed according to the IEC/IEEE Standard for Binary +Floating-Point Arithmetic. +------------------------------------------------------------------------------- +*/ static floatx80 addFloatx80Sigs( floatx80 a, floatx80 b, flag zSign STATUS_PARAM) { int32 aExp, bExp, zExp; @@ -4602,14 +4757,15 @@ static floatx80 addFloatx80Sigs( floatx80 a, floatx80 b, flag zSign STATUS_PARAM } -/*---------------------------------------------------------------------------- -| Returns the result of subtracting the absolute values of the extended -| double-precision floating-point values `a' and `b'. If `zSign' is 1, the -| difference is negated before being returned. `zSign' is ignored if the -| result is a NaN. The subtraction is performed according to the IEC/IEEE -| Standard for Binary Floating-Point Arithmetic. -*----------------------------------------------------------------------------*/ - +/* +------------------------------------------------------------------------------- +Returns the result of subtracting the absolute values of the extended +double-precision floating-point values `a' and `b'. If `zSign' is 1, the +difference is negated before being returned. `zSign' is ignored if the +result is a NaN. The subtraction is performed according to the IEC/IEEE +Standard for Binary Floating-Point Arithmetic. 
+------------------------------------------------------------------------------- +*/ static floatx80 subFloatx80Sigs( floatx80 a, floatx80 b, flag zSign STATUS_PARAM ) { int32 aExp, bExp, zExp; @@ -4670,12 +4826,13 @@ static floatx80 subFloatx80Sigs( floatx80 a, floatx80 b, flag zSign STATUS_PARAM } -/*---------------------------------------------------------------------------- -| Returns the result of adding the extended double-precision floating-point -| values `a' and `b'. The operation is performed according to the IEC/IEEE -| Standard for Binary Floating-Point Arithmetic. -*----------------------------------------------------------------------------*/ - +/* +------------------------------------------------------------------------------- +Returns the result of adding the extended double-precision floating-point +values `a' and `b'. The operation is performed according to the IEC/IEEE +Standard for Binary Floating-Point Arithmetic. +------------------------------------------------------------------------------- +*/ floatx80 floatx80_add( floatx80 a, floatx80 b STATUS_PARAM ) { flag aSign, bSign; @@ -4691,12 +4848,13 @@ floatx80 floatx80_add( floatx80 a, floatx80 b STATUS_PARAM ) } -/*---------------------------------------------------------------------------- -| Returns the result of subtracting the extended double-precision floating- -| point values `a' and `b'. The operation is performed according to the -| IEC/IEEE Standard for Binary Floating-Point Arithmetic. -*----------------------------------------------------------------------------*/ - +/* +------------------------------------------------------------------------------- +Returns the result of subtracting the extended double-precision floating- +point values `a' and `b'. The operation is performed according to the +IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
+------------------------------------------------------------------------------- +*/ floatx80 floatx80_sub( floatx80 a, floatx80 b STATUS_PARAM ) { flag aSign, bSign; @@ -4712,12 +4870,13 @@ floatx80 floatx80_sub( floatx80 a, floatx80 b STATUS_PARAM ) } -/*---------------------------------------------------------------------------- -| Returns the result of multiplying the extended double-precision floating- -| point values `a' and `b'. The operation is performed according to the -| IEC/IEEE Standard for Binary Floating-Point Arithmetic. -*----------------------------------------------------------------------------*/ - +/* +------------------------------------------------------------------------------- +Returns the result of multiplying the extended double-precision floating- +point values `a' and `b'. The operation is performed according to the +IEC/IEEE Standard for Binary Floating-Point Arithmetic. +------------------------------------------------------------------------------- +*/ floatx80 floatx80_mul( floatx80 a, floatx80 b STATUS_PARAM ) { flag aSign, bSign, zSign; @@ -4771,12 +4930,13 @@ floatx80 floatx80_mul( floatx80 a, floatx80 b STATUS_PARAM ) } -/*---------------------------------------------------------------------------- -| Returns the result of dividing the extended double-precision floating-point -| value `a' by the corresponding value `b'. The operation is performed -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. -*----------------------------------------------------------------------------*/ - +/* +------------------------------------------------------------------------------- +Returns the result of dividing the extended double-precision floating-point +value `a' by the corresponding value `b'. The operation is performed +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
+------------------------------------------------------------------------------- +*/ floatx80 floatx80_div( floatx80 a, floatx80 b STATUS_PARAM ) { flag aSign, bSign, zSign; @@ -4851,12 +5011,13 @@ floatx80 floatx80_div( floatx80 a, floatx80 b STATUS_PARAM ) } -/*---------------------------------------------------------------------------- -| Returns the remainder of the extended double-precision floating-point value -| `a' with respect to the corresponding value `b'. The operation is performed -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. -*----------------------------------------------------------------------------*/ - +/* +------------------------------------------------------------------------------- +Returns the remainder of the extended double-precision floating-point value +`a' with respect to the corresponding value `b'. The operation is performed +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. +------------------------------------------------------------------------------- +*/ floatx80 floatx80_rem( floatx80 a, floatx80 b STATUS_PARAM ) { flag aSign, zSign; @@ -4947,12 +5108,13 @@ floatx80 floatx80_rem( floatx80 a, floatx80 b STATUS_PARAM ) } -/*---------------------------------------------------------------------------- -| Returns the square root of the extended double-precision floating-point -| value `a'. The operation is performed according to the IEC/IEEE Standard -| for Binary Floating-Point Arithmetic. -*----------------------------------------------------------------------------*/ - +/* +------------------------------------------------------------------------------- +Returns the square root of the extended double-precision floating-point +value `a'. The operation is performed according to the IEC/IEEE Standard +for Binary Floating-Point Arithmetic. 
+------------------------------------------------------------------------------- +*/ floatx80 floatx80_sqrt( floatx80 a STATUS_PARAM ) { flag aSign; @@ -5017,12 +5179,14 @@ floatx80 floatx80_sqrt( floatx80 a STATUS_PARAM ) } -/*---------------------------------------------------------------------------- -| Returns 1 if the extended double-precision floating-point value `a' is equal -| to the corresponding value `b', and 0 otherwise. The invalid exception is -| raised if either operand is a NaN. Otherwise, the comparison is performed -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. -*----------------------------------------------------------------------------*/ +/* +------------------------------------------------------------------------------- +Returns 1 if the extended double-precision floating-point value `a' is equal +to the corresponding value `b', and 0 otherwise. The invalid exception is +raised if either operand is a NaN. Otherwise, the comparison is performed +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. +------------------------------------------------------------------------------- +*/ int floatx80_eq( floatx80 a, floatx80 b STATUS_PARAM ) { @@ -5044,13 +5208,15 @@ int floatx80_eq( floatx80 a, floatx80 b STATUS_PARAM ) } -/*---------------------------------------------------------------------------- -| Returns 1 if the extended double-precision floating-point value `a' is -| less than or equal to the corresponding value `b', and 0 otherwise. The -| invalid exception is raised if either operand is a NaN. The comparison is -| performed according to the IEC/IEEE Standard for Binary Floating-Point -| Arithmetic. 
-*----------------------------------------------------------------------------*/ +/* +------------------------------------------------------------------------------- +Returns 1 if the extended double-precision floating-point value `a' is +less than or equal to the corresponding value `b', and 0 otherwise. The +invalid exception is raised if either operand is a NaN. The comparison is +performed according to the IEC/IEEE Standard for Binary Floating-Point +Arithmetic. +------------------------------------------------------------------------------- +*/ int floatx80_le( floatx80 a, floatx80 b STATUS_PARAM ) { @@ -5078,12 +5244,14 @@ int floatx80_le( floatx80 a, floatx80 b STATUS_PARAM ) } -/*---------------------------------------------------------------------------- -| Returns 1 if the extended double-precision floating-point value `a' is -| less than the corresponding value `b', and 0 otherwise. The invalid -| exception is raised if either operand is a NaN. The comparison is performed -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. -*----------------------------------------------------------------------------*/ +/* +------------------------------------------------------------------------------- +Returns 1 if the extended double-precision floating-point value `a' is +less than the corresponding value `b', and 0 otherwise. The invalid +exception is raised if either operand is a NaN. The comparison is performed +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. +------------------------------------------------------------------------------- +*/ int floatx80_lt( floatx80 a, floatx80 b STATUS_PARAM ) { @@ -5111,12 +5279,14 @@ int floatx80_lt( floatx80 a, floatx80 b STATUS_PARAM ) } -/*---------------------------------------------------------------------------- -| Returns 1 if the extended double-precision floating-point values `a' and `b' -| cannot be compared, and 0 otherwise. 
The invalid exception is raised if -| either operand is a NaN. The comparison is performed according to the -| IEC/IEEE Standard for Binary Floating-Point Arithmetic. -*----------------------------------------------------------------------------*/ +/* +------------------------------------------------------------------------------- +Returns 1 if the extended double-precision floating-point values `a' and `b' +cannot be compared, and 0 otherwise. The invalid exception is raised if +either operand is a NaN. The comparison is performed according to the +IEC/IEEE Standard for Binary Floating-Point Arithmetic. +------------------------------------------------------------------------------- +*/ int floatx80_unordered( floatx80 a, floatx80 b STATUS_PARAM ) { if ( ( ( extractFloatx80Exp( a ) == 0x7FFF ) @@ -5130,12 +5300,14 @@ int floatx80_unordered( floatx80 a, floatx80 b STATUS_PARAM ) return 0; } -/*---------------------------------------------------------------------------- -| Returns 1 if the extended double-precision floating-point value `a' is -| equal to the corresponding value `b', and 0 otherwise. Quiet NaNs do not -| cause an exception. The comparison is performed according to the IEC/IEEE -| Standard for Binary Floating-Point Arithmetic. -*----------------------------------------------------------------------------*/ +/* +------------------------------------------------------------------------------- +Returns 1 if the extended double-precision floating-point value `a' is +equal to the corresponding value `b', and 0 otherwise. Quiet NaNs do not +cause an exception. The comparison is performed according to the IEC/IEEE +Standard for Binary Floating-Point Arithmetic. 
+-------------------------------------------------------------------------------
+*/
 int floatx80_eq_quiet( floatx80 a, floatx80 b STATUS_PARAM )
 {
 
@@ -5160,12 +5332,14 @@ int floatx80_eq_quiet( floatx80 a, floatx80 b STATUS_PARAM )
 
 }
 
-/*----------------------------------------------------------------------------
-| Returns 1 if the extended double-precision floating-point value `a' is less
-| than or equal to the corresponding value `b', and 0 otherwise. Quiet NaNs
-| do not cause an exception. Otherwise, the comparison is performed according
-| to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Returns 1 if the extended double-precision floating-point value `a' is less
+than or equal to the corresponding value `b', and 0 otherwise. Quiet NaNs
+do not cause an exception. Otherwise, the comparison is performed according
+to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
+-------------------------------------------------------------------------------
+*/
 int floatx80_le_quiet( floatx80 a, floatx80 b STATUS_PARAM )
 {
 
@@ -5196,12 +5370,14 @@ int floatx80_le_quiet( floatx80 a, floatx80 b STATUS_PARAM )
 
 }
 
-/*----------------------------------------------------------------------------
-| Returns 1 if the extended double-precision floating-point value `a' is less
-| than the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause
-| an exception. Otherwise, the comparison is performed according to the
-| IEC/IEEE Standard for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Returns 1 if the extended double-precision floating-point value `a' is less
+than the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause
+an exception. Otherwise, the comparison is performed according to the
+IEC/IEEE Standard for Binary Floating-Point Arithmetic.
+-------------------------------------------------------------------------------
+*/
 int floatx80_lt_quiet( floatx80 a, floatx80 b STATUS_PARAM )
 {
 
@@ -5232,12 +5408,14 @@ int floatx80_lt_quiet( floatx80 a, floatx80 b STATUS_PARAM )
 
 }
 
-/*----------------------------------------------------------------------------
-| Returns 1 if the extended double-precision floating-point values `a' and `b'
-| cannot be compared, and 0 otherwise. Quiet NaNs do not cause an exception.
-| The comparison is performed according to the IEC/IEEE Standard for Binary
-| Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Returns 1 if the extended double-precision floating-point values `a' and `b'
+cannot be compared, and 0 otherwise. Quiet NaNs do not cause an exception.
+The comparison is performed according to the IEC/IEEE Standard for Binary
+Floating-Point Arithmetic.
+-------------------------------------------------------------------------------
+*/
 int floatx80_unordered_quiet( floatx80 a, floatx80 b STATUS_PARAM )
 {
     if ( ( ( extractFloatx80Exp( a ) == 0x7FFF )
@@ -5254,16 +5432,17 @@ int floatx80_unordered_quiet( floatx80 a, floatx80 b STATUS_PARAM )
     return 0;
 }
 
-/*----------------------------------------------------------------------------
-| Returns the result of converting the quadruple-precision floating-point
-| value `a' to the 32-bit two's complement integer format. The conversion
-| is performed according to the IEC/IEEE Standard for Binary Floating-Point
-| Arithmetic---which means in particular that the conversion is rounded
-| according to the current rounding mode. If `a' is a NaN, the largest
-| positive integer is returned. Otherwise, if the conversion overflows, the
-| largest integer with the same sign as `a' is returned.
-*----------------------------------------------------------------------------*/
-
+/*
+-------------------------------------------------------------------------------
+Returns the result of converting the quadruple-precision floating-point
+value `a' to the 32-bit two's complement integer format. The conversion
+is performed according to the IEC/IEEE Standard for Binary Floating-Point
+Arithmetic---which means in particular that the conversion is rounded
+according to the current rounding mode. If `a' is a NaN, the largest
+positive integer is returned. Otherwise, if the conversion overflows, the
+largest integer with the same sign as `a' is returned.
+-------------------------------------------------------------------------------
+*/
 int32 float128_to_int32( float128 a STATUS_PARAM )
 {
     flag aSign;
@@ -5283,16 +5462,17 @@ int32 float128_to_int32( float128 a STATUS_PARAM )
 
 }
 
-/*----------------------------------------------------------------------------
-| Returns the result of converting the quadruple-precision floating-point
-| value `a' to the 32-bit two's complement integer format. The conversion
-| is performed according to the IEC/IEEE Standard for Binary Floating-Point
-| Arithmetic, except that the conversion is always rounded toward zero. If
-| `a' is a NaN, the largest positive integer is returned. Otherwise, if the
-| conversion overflows, the largest integer with the same sign as `a' is
-| returned.
-*----------------------------------------------------------------------------*/
-
+/*
+-------------------------------------------------------------------------------
+Returns the result of converting the quadruple-precision floating-point
+value `a' to the 32-bit two's complement integer format. The conversion
+is performed according to the IEC/IEEE Standard for Binary Floating-Point
+Arithmetic, except that the conversion is always rounded toward zero. If
+`a' is a NaN, the largest positive integer is returned. Otherwise, if the
+conversion overflows, the largest integer with the same sign as `a' is
+returned.
+-------------------------------------------------------------------------------
+*/
 int32 float128_to_int32_round_to_zero( float128 a STATUS_PARAM )
 {
     flag aSign;
@@ -5331,16 +5511,17 @@ int32 float128_to_int32_round_to_zero( float128 a STATUS_PARAM )
 
 }
 
-/*----------------------------------------------------------------------------
-| Returns the result of converting the quadruple-precision floating-point
-| value `a' to the 64-bit two's complement integer format. The conversion
-| is performed according to the IEC/IEEE Standard for Binary Floating-Point
-| Arithmetic---which means in particular that the conversion is rounded
-| according to the current rounding mode. If `a' is a NaN, the largest
-| positive integer is returned. Otherwise, if the conversion overflows, the
-| largest integer with the same sign as `a' is returned.
-*----------------------------------------------------------------------------*/
-
+/*
+-------------------------------------------------------------------------------
+Returns the result of converting the quadruple-precision floating-point
+value `a' to the 64-bit two's complement integer format. The conversion
+is performed according to the IEC/IEEE Standard for Binary Floating-Point
+Arithmetic---which means in particular that the conversion is rounded
+according to the current rounding mode. If `a' is a NaN, the largest
+positive integer is returned. Otherwise, if the conversion overflows, the
+largest integer with the same sign as `a' is returned.
+-------------------------------------------------------------------------------
+*/
 int64 float128_to_int64( float128 a STATUS_PARAM )
 {
     flag aSign;
@@ -5374,16 +5555,17 @@ int64 float128_to_int64( float128 a STATUS_PARAM )
 
 }
 
-/*----------------------------------------------------------------------------
-| Returns the result of converting the quadruple-precision floating-point
-| value `a' to the 64-bit two's complement integer format. The conversion
-| is performed according to the IEC/IEEE Standard for Binary Floating-Point
-| Arithmetic, except that the conversion is always rounded toward zero.
-| If `a' is a NaN, the largest positive integer is returned. Otherwise, if
-| the conversion overflows, the largest integer with the same sign as `a' is
-| returned.
-*----------------------------------------------------------------------------*/
-
+/*
+-------------------------------------------------------------------------------
+Returns the result of converting the quadruple-precision floating-point
+value `a' to the 64-bit two's complement integer format. The conversion
+is performed according to the IEC/IEEE Standard for Binary Floating-Point
+Arithmetic, except that the conversion is always rounded toward zero.
+If `a' is a NaN, the largest positive integer is returned. Otherwise, if
+the conversion overflows, the largest integer with the same sign as `a' is
+returned.
+-------------------------------------------------------------------------------
+*/
 int64 float128_to_int64_round_to_zero( float128 a STATUS_PARAM )
 {
     flag aSign;
@@ -5435,13 +5617,14 @@ int64 float128_to_int64_round_to_zero( float128 a STATUS_PARAM )
 
 }
 
-/*----------------------------------------------------------------------------
-| Returns the result of converting the quadruple-precision floating-point
-| value `a' to the single-precision floating-point format. The conversion
-| is performed according to the IEC/IEEE Standard for Binary Floating-Point
-| Arithmetic.
-*----------------------------------------------------------------------------*/
-
+/*
+-------------------------------------------------------------------------------
+Returns the result of converting the quadruple-precision floating-point
+value `a' to the single-precision floating-point format. The conversion
+is performed according to the IEC/IEEE Standard for Binary Floating-Point
+Arithmetic.
+-------------------------------------------------------------------------------
+*/
 float32 float128_to_float32( float128 a STATUS_PARAM )
 {
     flag aSign;
@@ -5470,13 +5653,14 @@ float32 float128_to_float32( float128 a STATUS_PARAM )
 
 }
 
-/*----------------------------------------------------------------------------
-| Returns the result of converting the quadruple-precision floating-point
-| value `a' to the double-precision floating-point format. The conversion
-| is performed according to the IEC/IEEE Standard for Binary Floating-Point
-| Arithmetic.
-*----------------------------------------------------------------------------*/
-
+/*
+-------------------------------------------------------------------------------
+Returns the result of converting the quadruple-precision floating-point
+value `a' to the double-precision floating-point format. The conversion
+is performed according to the IEC/IEEE Standard for Binary Floating-Point
+Arithmetic.
+-------------------------------------------------------------------------------
+*/
 float64 float128_to_float64( float128 a STATUS_PARAM )
 {
     flag aSign;
@@ -5503,13 +5687,14 @@ float64 float128_to_float64( float128 a STATUS_PARAM )
 
 }
 
-/*----------------------------------------------------------------------------
-| Returns the result of converting the quadruple-precision floating-point
-| value `a' to the extended double-precision floating-point format. The
-| conversion is performed according to the IEC/IEEE Standard for Binary
-| Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
+/*
+-------------------------------------------------------------------------------
+Returns the result of converting the quadruple-precision floating-point
+value `a' to the extended double-precision floating-point format. The
+conversion is performed according to the IEC/IEEE Standard for Binary
+Floating-Point Arithmetic.
+-------------------------------------------------------------------------------
+*/
 floatx80 float128_to_floatx80( float128 a STATUS_PARAM )
 {
     flag aSign;
@@ -5538,13 +5723,14 @@ floatx80 float128_to_floatx80( float128 a STATUS_PARAM )
 
 }
 
-/*----------------------------------------------------------------------------
-| Rounds the quadruple-precision floating-point value `a' to an integer, and
-| returns the result as a quadruple-precision floating-point value. The
-| operation is performed according to the IEC/IEEE Standard for Binary
-| Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
+/*
+-------------------------------------------------------------------------------
+Rounds the quadruple-precision floating-point value `a' to an integer, and
+returns the result as a quadruple-precision floating-point value. The
+operation is performed according to the IEC/IEEE Standard for Binary
+Floating-Point Arithmetic.
+-------------------------------------------------------------------------------
+*/
 float128 float128_round_to_int( float128 a STATUS_PARAM )
 {
     flag aSign;
@@ -5641,14 +5827,15 @@ float128 float128_round_to_int( float128 a STATUS_PARAM )
 
 }
 
-/*----------------------------------------------------------------------------
-| Returns the result of adding the absolute values of the quadruple-precision
-| floating-point values `a' and `b'. If `zSign' is 1, the sum is negated
-| before being returned. `zSign' is ignored if the result is a NaN.
-| The addition is performed according to the IEC/IEEE Standard for Binary
-| Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
+/*
+-------------------------------------------------------------------------------
+Returns the result of adding the absolute values of the quadruple-precision
+floating-point values `a' and `b'. If `zSign' is 1, the sum is negated
+before being returned. `zSign' is ignored if the result is a NaN.
+The addition is performed according to the IEC/IEEE Standard for Binary
+Floating-Point Arithmetic.
+-------------------------------------------------------------------------------
+*/
 static float128 addFloat128Sigs( float128 a, float128 b, flag zSign STATUS_PARAM)
 {
     int32 aExp, bExp, zExp;
@@ -5727,14 +5914,15 @@ static float128 addFloat128Sigs( float128 a, float128 b, flag zSign STATUS_PARAM
 
 }
 
-/*----------------------------------------------------------------------------
-| Returns the result of subtracting the absolute values of the quadruple-
-| precision floating-point values `a' and `b'. If `zSign' is 1, the
-| difference is negated before being returned. `zSign' is ignored if the
-| result is a NaN. The subtraction is performed according to the IEC/IEEE
-| Standard for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
+/*
+-------------------------------------------------------------------------------
+Returns the result of subtracting the absolute values of the quadruple-
+precision floating-point values `a' and `b'. If `zSign' is 1, the
+difference is negated before being returned. `zSign' is ignored if the
+result is a NaN. The subtraction is performed according to the IEC/IEEE
+Standard for Binary Floating-Point Arithmetic.
+-------------------------------------------------------------------------------
+*/
 static float128 subFloat128Sigs( float128 a, float128 b, flag zSign STATUS_PARAM)
 {
     int32 aExp, bExp, zExp;
@@ -5811,12 +5999,13 @@ static float128 subFloat128Sigs( float128 a, float128 b, flag zSign STATUS_PARAM
 
 }
 
-/*----------------------------------------------------------------------------
-| Returns the result of adding the quadruple-precision floating-point values
-| `a' and `b'. The operation is performed according to the IEC/IEEE Standard
-| for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
+/*
+-------------------------------------------------------------------------------
+Returns the result of adding the quadruple-precision floating-point values
+`a' and `b'. The operation is performed according to the IEC/IEEE Standard
+for Binary Floating-Point Arithmetic.
+-------------------------------------------------------------------------------
+*/
 float128 float128_add( float128 a, float128 b STATUS_PARAM )
 {
     flag aSign, bSign;
@@ -5832,12 +6021,13 @@ float128 float128_add( float128 a, float128 b STATUS_PARAM )
 
 }
 
-/*----------------------------------------------------------------------------
-| Returns the result of subtracting the quadruple-precision floating-point
-| values `a' and `b'. The operation is performed according to the IEC/IEEE
-| Standard for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
+/*
+-------------------------------------------------------------------------------
+Returns the result of subtracting the quadruple-precision floating-point
+values `a' and `b'. The operation is performed according to the IEC/IEEE
+Standard for Binary Floating-Point Arithmetic.
+-------------------------------------------------------------------------------
+*/
 float128 float128_sub( float128 a, float128 b STATUS_PARAM )
 {
     flag aSign, bSign;
@@ -5853,12 +6043,13 @@ float128 float128_sub( float128 a, float128 b STATUS_PARAM )
 
 }
 
-/*----------------------------------------------------------------------------
-| Returns the result of multiplying the quadruple-precision floating-point
-| values `a' and `b'. The operation is performed according to the IEC/IEEE
-| Standard for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
+/*
+-------------------------------------------------------------------------------
+Returns the result of multiplying the quadruple-precision floating-point
+values `a' and `b'. The operation is performed according to the IEC/IEEE
+Standard for Binary Floating-Point Arithmetic.
+-------------------------------------------------------------------------------
+*/
 float128 float128_mul( float128 a, float128 b STATUS_PARAM )
 {
     flag aSign, bSign, zSign;
@@ -5917,12 +6108,13 @@ float128 float128_mul( float128 a, float128 b STATUS_PARAM )
 
 }
 
-/*----------------------------------------------------------------------------
-| Returns the result of dividing the quadruple-precision floating-point value
-| `a' by the corresponding value `b'. The operation is performed according to
-| the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
+/*
+-------------------------------------------------------------------------------
+Returns the result of dividing the quadruple-precision floating-point value
+`a' by the corresponding value `b'. The operation is performed according to
+the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
+-------------------------------------------------------------------------------
+*/
 float128 float128_div( float128 a, float128 b STATUS_PARAM )
 {
     flag aSign, bSign, zSign;
@@ -6001,12 +6193,13 @@ float128 float128_div( float128 a, float128 b STATUS_PARAM )
 
 }
 
-/*----------------------------------------------------------------------------
-| Returns the remainder of the quadruple-precision floating-point value `a'
-| with respect to the corresponding value `b'. The operation is performed
-| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
+/*
+-------------------------------------------------------------------------------
+Returns the remainder of the quadruple-precision floating-point value `a'
+with respect to the corresponding value `b'. The operation is performed
+according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
+-------------------------------------------------------------------------------
+*/
 float128 float128_rem( float128 a, float128 b STATUS_PARAM )
 {
     flag aSign, zSign;
@@ -6110,12 +6303,13 @@ float128 float128_rem( float128 a, float128 b STATUS_PARAM )
 
 }
 
-/*----------------------------------------------------------------------------
-| Returns the square root of the quadruple-precision floating-point value `a'.
-| The operation is performed according to the IEC/IEEE Standard for Binary
-| Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
+/*
+-------------------------------------------------------------------------------
+Returns the square root of the quadruple-precision floating-point value `a'.
+The operation is performed according to the IEC/IEEE Standard for Binary
+Floating-Point Arithmetic.
+-------------------------------------------------------------------------------
+*/
 float128 float128_sqrt( float128 a STATUS_PARAM )
 {
     flag aSign;
@@ -6179,12 +6373,14 @@ float128 float128_sqrt( float128 a STATUS_PARAM )
 
 }
 
-/*----------------------------------------------------------------------------
-| Returns 1 if the quadruple-precision floating-point value `a' is equal to
-| the corresponding value `b', and 0 otherwise. The invalid exception is
-| raised if either operand is a NaN. Otherwise, the comparison is performed
-| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Returns 1 if the quadruple-precision floating-point value `a' is equal to
+the corresponding value `b', and 0 otherwise. The invalid exception is
+raised if either operand is a NaN. Otherwise, the comparison is performed
+according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
+-------------------------------------------------------------------------------
+*/
 int float128_eq( float128 a, float128 b STATUS_PARAM )
 {
 
@@ -6206,12 +6402,14 @@ int float128_eq( float128 a, float128 b STATUS_PARAM )
 
 }
 
-/*----------------------------------------------------------------------------
-| Returns 1 if the quadruple-precision floating-point value `a' is less than
-| or equal to the corresponding value `b', and 0 otherwise. The invalid
-| exception is raised if either operand is a NaN. The comparison is performed
-| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Returns 1 if the quadruple-precision floating-point value `a' is less than
+or equal to the corresponding value `b', and 0 otherwise. The invalid
+exception is raised if either operand is a NaN. The comparison is performed
+according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
+-------------------------------------------------------------------------------
+*/
 int float128_le( float128 a, float128 b STATUS_PARAM )
 {
 
@@ -6239,12 +6437,14 @@ int float128_le( float128 a, float128 b STATUS_PARAM )
 
 }
 
-/*----------------------------------------------------------------------------
-| Returns 1 if the quadruple-precision floating-point value `a' is less than
-| the corresponding value `b', and 0 otherwise. The invalid exception is
-| raised if either operand is a NaN. The comparison is performed according
-| to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Returns 1 if the quadruple-precision floating-point value `a' is less than
+the corresponding value `b', and 0 otherwise. The invalid exception is
+raised if either operand is a NaN. The comparison is performed according
+to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
+-------------------------------------------------------------------------------
+*/
 int float128_lt( float128 a, float128 b STATUS_PARAM )
 {
 
@@ -6272,12 +6472,14 @@ int float128_lt( float128 a, float128 b STATUS_PARAM )
 
 }
 
-/*----------------------------------------------------------------------------
-| Returns 1 if the quadruple-precision floating-point values `a' and `b' cannot
-| be compared, and 0 otherwise. The invalid exception is raised if either
-| operand is a NaN. The comparison is performed according to the IEC/IEEE
-| Standard for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Returns 1 if the quadruple-precision floating-point values `a' and `b' cannot
+be compared, and 0 otherwise. The invalid exception is raised if either
+operand is a NaN. The comparison is performed according to the IEC/IEEE
+Standard for Binary Floating-Point Arithmetic.
+-------------------------------------------------------------------------------
+*/
 int float128_unordered( float128 a, float128 b STATUS_PARAM )
 {
 
@@ -6292,12 +6494,14 @@ int float128_unordered( float128 a, float128 b STATUS_PARAM )
     return 0;
 }
 
-/*----------------------------------------------------------------------------
-| Returns 1 if the quadruple-precision floating-point value `a' is equal to
-| the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an
-| exception. The comparison is performed according to the IEC/IEEE Standard
-| for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Returns 1 if the quadruple-precision floating-point value `a' is equal to
+the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an
+exception. The comparison is performed according to the IEC/IEEE Standard
+for Binary Floating-Point Arithmetic.
+-------------------------------------------------------------------------------
+*/
 int float128_eq_quiet( float128 a, float128 b STATUS_PARAM )
 {
 
@@ -6322,12 +6526,14 @@ int float128_eq_quiet( float128 a, float128 b STATUS_PARAM )
 
 }
 
-/*----------------------------------------------------------------------------
-| Returns 1 if the quadruple-precision floating-point value `a' is less than
-| or equal to the corresponding value `b', and 0 otherwise. Quiet NaNs do not
-| cause an exception. Otherwise, the comparison is performed according to the
-| IEC/IEEE Standard for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Returns 1 if the quadruple-precision floating-point value `a' is less than
+or equal to the corresponding value `b', and 0 otherwise. Quiet NaNs do not
+cause an exception. Otherwise, the comparison is performed according to the
+IEC/IEEE Standard for Binary Floating-Point Arithmetic.
+-------------------------------------------------------------------------------
+*/
 int float128_le_quiet( float128 a, float128 b STATUS_PARAM )
 {
 
@@ -6358,12 +6564,14 @@ int float128_le_quiet( float128 a, float128 b STATUS_PARAM )
 
 }
 
-/*----------------------------------------------------------------------------
-| Returns 1 if the quadruple-precision floating-point value `a' is less than
-| the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an
-| exception. Otherwise, the comparison is performed according to the IEC/IEEE
-| Standard for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Returns 1 if the quadruple-precision floating-point value `a' is less than
+the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an
+exception. Otherwise, the comparison is performed according to the IEC/IEEE
+Standard for Binary Floating-Point Arithmetic.
+-------------------------------------------------------------------------------
+*/
 int float128_lt_quiet( float128 a, float128 b STATUS_PARAM )
 {
 
@@ -6394,12 +6602,14 @@ int float128_lt_quiet( float128 a, float128 b STATUS_PARAM )
 
 }
 
-/*----------------------------------------------------------------------------
-| Returns 1 if the quadruple-precision floating-point values `a' and `b' cannot
-| be compared, and 0 otherwise. Quiet NaNs do not cause an exception. The
-| comparison is performed according to the IEC/IEEE Standard for Binary
-| Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Returns 1 if the quadruple-precision floating-point values `a' and `b' cannot
+be compared, and 0 otherwise. Quiet NaNs do not cause an exception. The
+comparison is performed according to the IEC/IEEE Standard for Binary
+Floating-Point Arithmetic.
+-------------------------------------------------------------------------------
+*/
 int float128_unordered_quiet( float128 a, float128 b STATUS_PARAM )
 {
 
diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index f3927e2..b646621 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -4,10 +4,11 @@
  * Derived from SoftFloat.
  */
 
-/*============================================================================
+/*
+============================================================================
 
-This C header file is part of the SoftFloat IEC/IEEE Floating-point Arithmetic
-Package, Release 2b.
+This C header file is part of the SoftFloat IEC/IEEE Floating-point
+Arithmetic Package, Release 2a.
 
 Written by John R. Hauser. This work was made possible in part by the
 International Computer Science Institute, located at Suite 600, 1947 Center
@@ -16,24 +17,22 @@ National Science Foundation under grant MIP-9311980. The original version
 of this code was written as part of a project to build a fixed-point vector
 processor in collaboration with the University of California at Berkeley,
 overseen by Profs. Nelson Morgan and John Wawrzynek. More information
-is available through the Web page `http://www.cs.berkeley.edu/~jhauser/
+is available through the Web page `http://HTTP.CS.Berkeley.EDU/~jhauser/
 arithmetic/SoftFloat.html'.
 
-THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort has
-been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT TIMES
-RESULT IN INCORRECT BEHAVIOR. USE OF THIS SOFTWARE IS RESTRICTED TO PERSONS
-AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ALL LOSSES,
-COSTS, OR OTHER PROBLEMS THEY INCUR DUE TO THE SOFTWARE, AND WHO FURTHERMORE
-EFFECTIVELY INDEMNIFY JOHN HAUSER AND THE INTERNATIONAL COMPUTER SCIENCE
-INSTITUTE (possibly via similar legal warning) AGAINST ALL LOSSES, COSTS, OR
-OTHER PROBLEMS INCURRED BY THEIR CUSTOMERS AND CLIENTS DUE TO THE SOFTWARE.
+THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort
+has been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT
+TIMES RESULT IN INCORRECT BEHAVIOR. USE OF THIS SOFTWARE IS RESTRICTED TO
+PERSONS AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ANY
+AND ALL LOSSES, COSTS, OR OTHER PROBLEMS ARISING FROM ITS USE.
 
 Derivative works are acceptable, even for commercial purposes, so long as
-(1) the source code for the derivative work includes prominent notice that
-the work is derivative, and (2) the source code includes prominent notice with
-these four paragraphs for those parts of this code that are retained.
+(1) they include prominent notice that the work is derivative, and (2) they
+include prominent notice akin to these four paragraphs for those parts of
+this code that are retained.
 
-=============================================================================*/
+===============================================================================
+*/
 
 #ifndef SOFTFLOAT_H
 #define SOFTFLOAT_H
@@ -46,14 +45,16 @@ these four paragraphs for those parts of this code that are retained.
 #include "config-host.h"
 #include "qemu/osdep.h"
 
-/*----------------------------------------------------------------------------
-| Each of the following `typedef's defines the most convenient type that holds
-| integers of at least as many bits as specified. For example, `uint8' should
-| be the most convenient type that can hold unsigned integers of as many as
-| 8 bits. The `flag' type must be able to hold either a 0 or 1. For most
-| implementations of C, `flag', `uint8', and `int8' should all be `typedef'ed
-| to the same as `int'.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Each of the following `typedef's defines the most convenient type that holds
+integers of at least as many bits as specified. For example, `uint8' should
+be the most convenient type that can hold unsigned integers of as many as
+8 bits. The `flag' type must be able to hold either a 0 or 1. For most
+implementations of C, `flag', `uint8', and `int8' should all be `typedef'ed
+to the same as `int'.
+-------------------------------------------------------------------------------
+*/
 typedef uint8_t flag;
 typedef uint8_t uint8;
 typedef int8_t int8;
@@ -69,9 +70,11 @@ typedef int64_t int64;
 #define STATUS(field) status->field
 #define STATUS_VAR , status
 
-/*----------------------------------------------------------------------------
-| Software IEC/IEEE floating-point ordering relations
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Software IEC/IEEE floating-point ordering relations
+-------------------------------------------------------------------------------
+*/
 enum {
     float_relation_less = -1,
     float_relation_equal = 0,
@@ -79,9 +82,11 @@ enum {
     float_relation_unordered = 2
 };
 
-/*----------------------------------------------------------------------------
-| Software IEC/IEEE floating-point types.
-*----------------------------------------------------------------------------*/
+/*
+-------------------------------------------------------------------------------
+Software IEC/IEEE floating-point types.
+-------------------------------------------------------------------------------
+*/
 /* Use structures for soft-float types. This prevents accidentally mixing
    them with native int/float types. A sufficiently clever compiler and
    sane ABI should be able to see though these structs. However
@@ -137,17 +142,21 @@ typedef struct {
 #define make_float128(high_, low_) ((float128) { .high = high_, .low = low_ })
 #define make_float128_init(high_, low_) { .high = high_, .low = low_ }
 
-/*----------------------------------------------------------------------------
-| Software IEC/IEEE floating-point underflow tininess-detection mode.
-*----------------------------------------------------------------------------*/ +/* +------------------------------------------------------------------------------- +Software IEC/IEEE floating-point underflow tininess-detection mode. +------------------------------------------------------------------------------- +*/ enum { float_tininess_after_rounding = 0, float_tininess_before_rounding = 1 }; -/*---------------------------------------------------------------------------- -| Software IEC/IEEE floating-point rounding mode. -*----------------------------------------------------------------------------*/ +/* +------------------------------------------------------------------------------- +Software IEC/IEEE floating-point rounding mode. +------------------------------------------------------------------------------- +*/ enum { float_round_nearest_even = 0, float_round_down = 1, @@ -155,9 +164,11 @@ enum { float_round_to_zero = 3 }; -/*---------------------------------------------------------------------------- -| Software IEC/IEEE floating-point exception flags. -*----------------------------------------------------------------------------*/ +/* +------------------------------------------------------------------------------- +Software IEC/IEEE floating-point exception flags. +------------------------------------------------------------------------------- +*/ enum { float_flag_invalid = 1, float_flag_divbyzero = 4, @@ -167,7 +178,6 @@ enum { float_flag_input_denormal = 64, float_flag_output_denormal = 128 }; - typedef struct float_status { signed char float_detect_tininess; signed char float_rounding_mode; @@ -204,27 +214,33 @@ INLINE int get_float_exception_flags(float_status *status) } void set_floatx80_rounding_precision(int val STATUS_PARAM); -/*---------------------------------------------------------------------------- -| Routine to raise any or all of the software IEC/IEEE floating-point -| exception flags. 
-*----------------------------------------------------------------------------*/ +/* +------------------------------------------------------------------------------- +Routine to raise any or all of the software IEC/IEEE floating-point +exception flags. +------------------------------------------------------------------------------- +*/ void float_raise( int8 flags STATUS_PARAM); -/*---------------------------------------------------------------------------- -| Options to indicate which negations to perform in float*_muladd() -| Using these differs from negating an input or output before calling -| the muladd function in that this means that a NaN doesn't have its -| sign bit inverted before it is propagated. -*----------------------------------------------------------------------------*/ +/* +------------------------------------------------------------------------------- +Options to indicate which negations to perform in float*_muladd() +Using these differs from negating an input or output before calling +the muladd function in that this means that a NaN doesn't have its +sign bit inverted before it is propagated. +------------------------------------------------------------------------------- +*/ enum { float_muladd_negate_c = 1, float_muladd_negate_product = 2, float_muladd_negate_result = 4, }; -/*---------------------------------------------------------------------------- -| Software IEC/IEEE integer-to-floating-point conversion routines. -*----------------------------------------------------------------------------*/ +/* +------------------------------------------------------------------------------- +Software IEC/IEEE integer-to-floating-point conversion routines. 
+------------------------------------------------------------------------------- +*/ float32 int32_to_float32( int32 STATUS_PARAM ); float64 int32_to_float64( int32 STATUS_PARAM ); float32 uint32_to_float32( uint32 STATUS_PARAM ); @@ -239,15 +255,19 @@ floatx80 int64_to_floatx80( int64 STATUS_PARAM ); float128 int64_to_float128( int64 STATUS_PARAM ); float128 uint64_to_float128( uint64 STATUS_PARAM ); -/*---------------------------------------------------------------------------- -| Software half-precision conversion routines. -*----------------------------------------------------------------------------*/ +/* +------------------------------------------------------------------------------- +Software half-precision conversion routines. +------------------------------------------------------------------------------- +*/ float16 float32_to_float16( float32, flag STATUS_PARAM ); float32 float16_to_float32( float16, flag STATUS_PARAM ); -/*---------------------------------------------------------------------------- -| Software half-precision operations. -*----------------------------------------------------------------------------*/ +/* +------------------------------------------------------------------------------- +Software half-precision operations. +------------------------------------------------------------------------------- +*/ int float16_is_quiet_nan( float16 ); int float16_is_signaling_nan( float16 ); float16 float16_maybe_silence_nan( float16 ); @@ -257,14 +277,18 @@ INLINE int float16_is_any_nan(float16 a) return ((float16_val(a) & ~0x8000) > 0x7c00); } -/*---------------------------------------------------------------------------- -| The pattern for a default generated half-precision NaN. -*----------------------------------------------------------------------------*/ +/* +------------------------------------------------------------------------------- +The pattern for a default generated half-precision NaN.
+------------------------------------------------------------------------------- +*/ extern const float16 float16_default_nan; -/*---------------------------------------------------------------------------- -| Software IEC/IEEE single-precision conversion routines. -*----------------------------------------------------------------------------*/ +/* +------------------------------------------------------------------------------- +Software IEC/IEEE single-precision conversion routines. +------------------------------------------------------------------------------- +*/ int_fast16_t float32_to_int16_round_to_zero(float32 STATUS_PARAM); uint_fast16_t float32_to_uint16_round_to_zero(float32 STATUS_PARAM); int32 float32_to_int32( float32 STATUS_PARAM ); @@ -277,9 +301,11 @@ float64 float32_to_float64( float32 STATUS_PARAM ); floatx80 float32_to_floatx80( float32 STATUS_PARAM ); float128 float32_to_float128( float32 STATUS_PARAM ); -/*---------------------------------------------------------------------------- -| Software IEC/IEEE single-precision operations. -*----------------------------------------------------------------------------*/ +/* +------------------------------------------------------------------------------- +Software IEC/IEEE single-precision operations. +------------------------------------------------------------------------------- +*/ float32 float32_round_to_int( float32 STATUS_PARAM ); float32 float32_add( float32, float32 STATUS_PARAM ); float32 float32_sub( float32, float32 STATUS_PARAM ); @@ -361,14 +387,18 @@ INLINE float32 float32_set_sign(float32 a, int sign) #define float32_infinity make_float32(0x7f800000) -/*---------------------------------------------------------------------------- -| The pattern for a default generated single-precision NaN. 
-*----------------------------------------------------------------------------*/ +/* +------------------------------------------------------------------------------- +The pattern for a default generated single-precision NaN. +------------------------------------------------------------------------------- +*/ extern const float32 float32_default_nan; -/*---------------------------------------------------------------------------- -| Software IEC/IEEE double-precision conversion routines. -*----------------------------------------------------------------------------*/ +/* +------------------------------------------------------------------------------- +Software IEC/IEEE double-precision conversion routines. +------------------------------------------------------------------------------- +*/ int_fast16_t float64_to_int16_round_to_zero(float64 STATUS_PARAM); uint_fast16_t float64_to_uint16_round_to_zero(float64 STATUS_PARAM); int32 float64_to_int32( float64 STATUS_PARAM ); @@ -383,9 +413,11 @@ float32 float64_to_float32( float64 STATUS_PARAM ); floatx80 float64_to_floatx80( float64 STATUS_PARAM ); float128 float64_to_float128( float64 STATUS_PARAM ); -/*---------------------------------------------------------------------------- -| Software IEC/IEEE double-precision operations. -*----------------------------------------------------------------------------*/ +/* +------------------------------------------------------------------------------- +Software IEC/IEEE double-precision operations. 
+------------------------------------------------------------------------------- +*/ float64 float64_round_to_int( float64 STATUS_PARAM ); float64 float64_trunc_to_int( float64 STATUS_PARAM ); float64 float64_add( float64, float64 STATUS_PARAM ); @@ -467,14 +499,18 @@ INLINE float64 float64_set_sign(float64 a, int sign) #define float64_half make_float64(0x3fe0000000000000LL) #define float64_infinity make_float64(0x7ff0000000000000LL) -/*---------------------------------------------------------------------------- -| The pattern for a default generated double-precision NaN. -*----------------------------------------------------------------------------*/ +/* +------------------------------------------------------------------------------- +The pattern for a default generated double-precision NaN. +------------------------------------------------------------------------------- +*/ extern const float64 float64_default_nan; -/*---------------------------------------------------------------------------- -| Software IEC/IEEE extended double-precision conversion routines. -*----------------------------------------------------------------------------*/ +/* +------------------------------------------------------------------------------- +Software IEC/IEEE extended double-precision conversion routines. +------------------------------------------------------------------------------- +*/ int32 floatx80_to_int32( floatx80 STATUS_PARAM ); int32 floatx80_to_int32_round_to_zero( floatx80 STATUS_PARAM ); int64 floatx80_to_int64( floatx80 STATUS_PARAM ); @@ -483,9 +519,11 @@ float32 floatx80_to_float32( floatx80 STATUS_PARAM ); float64 floatx80_to_float64( floatx80 STATUS_PARAM ); float128 floatx80_to_float128( floatx80 STATUS_PARAM ); -/*---------------------------------------------------------------------------- -| Software IEC/IEEE extended double-precision operations. 
-*----------------------------------------------------------------------------*/ +/* +------------------------------------------------------------------------------- +Software IEC/IEEE extended double-precision operations. +------------------------------------------------------------------------------- +*/ floatx80 floatx80_round_to_int( floatx80 STATUS_PARAM ); floatx80 floatx80_add( floatx80, floatx80 STATUS_PARAM ); floatx80 floatx80_sub( floatx80, floatx80 STATUS_PARAM ); @@ -552,14 +590,18 @@ INLINE int floatx80_is_any_nan(floatx80 a) #define floatx80_half make_floatx80(0x3ffe, 0x8000000000000000LL) #define floatx80_infinity make_floatx80(0x7fff, 0x8000000000000000LL) -/*---------------------------------------------------------------------------- -| The pattern for a default generated extended double-precision NaN. -*----------------------------------------------------------------------------*/ +/* +------------------------------------------------------------------------------- +The pattern for a default generated extended double-precision NaN. +------------------------------------------------------------------------------- +*/ extern const floatx80 floatx80_default_nan; -/*---------------------------------------------------------------------------- -| Software IEC/IEEE quadruple-precision conversion routines. -*----------------------------------------------------------------------------*/ +/* +------------------------------------------------------------------------------- +Software IEC/IEEE quadruple-precision conversion routines. 
+------------------------------------------------------------------------------- +*/ int32 float128_to_int32( float128 STATUS_PARAM ); int32 float128_to_int32_round_to_zero( float128 STATUS_PARAM ); int64 float128_to_int64( float128 STATUS_PARAM ); @@ -568,9 +610,11 @@ float32 float128_to_float32( float128 STATUS_PARAM ); float64 float128_to_float64( float128 STATUS_PARAM ); floatx80 float128_to_floatx80( float128 STATUS_PARAM ); -/*---------------------------------------------------------------------------- -| Software IEC/IEEE quadruple-precision operations. -*----------------------------------------------------------------------------*/ +/* +------------------------------------------------------------------------------- +Software IEC/IEEE quadruple-precision operations. +------------------------------------------------------------------------------- +*/ float128 float128_round_to_int( float128 STATUS_PARAM ); float128 float128_add( float128, float128 STATUS_PARAM ); float128 float128_sub( float128, float128 STATUS_PARAM ); @@ -633,9 +677,11 @@ INLINE int float128_is_any_nan(float128 a) #define float128_zero make_float128(0, 0) -/*---------------------------------------------------------------------------- -| The pattern for a default generated quadruple-precision NaN. -*----------------------------------------------------------------------------*/ +/* +------------------------------------------------------------------------------- +The pattern for a default generated quadruple-precision NaN. +------------------------------------------------------------------------------- +*/ extern const float128 float128_default_nan; #endif /* !SOFTFLOAT_H */