diff mbox series

[11/11] cutils: Improve qemu_strtosz handling of fractions

Message ID 20230508200343.791450-12-eblake@redhat.com
State New
Headers show
Series Fix qemu_strtosz() read-out-of-bounds | expand

Commit Message

Eric Blake May 8, 2023, 8:03 p.m. UTC
We have several limitations and bugs worth fixing; they are
inter-related enough that it is not worth splitting this patch into
smaller pieces:

* ".5k" should work to specify 512, just as "0.5k" does
* "1.9999k" and "1." + "9"*50 + "k" should both produce the same
  result of 2048 after rounding
* "1." + "0"*350 + "1B" should not be treated the same as "1.0B";
  underflow in the fraction should not be lost
* "7.99e99" and "7.99e999" look similar, but our code was doing a
  read-out-of-bounds on the latter because it was not expecting ERANGE
  due to overflow. While we document that scientific notation is not
  supported, and the previous patch actually fixed
  qemu_strtod_finite() to no longer return ERANGE overflows, it is
  easier to pre-filter than to try and determine after the fact if
  strtod() consumed more than we wanted.  Note that this is a
  low-level semantic change (when endptr is not NULL, we can now
  successfully parse with a scale of 'E' and then report trailing
  junk, instead of failing outright with EINVAL); but an earlier
  commit already argued that this is not a high-level semantic change
  since the only caller passing in a non-NULL endptr also checks that
  the tail is whitespace-only.

Fixes: https://gitlab.com/qemu-project/qemu/-/issues/1629
Signed-off-by: Eric Blake <eblake@redhat.com>
---
 tests/unit/test-cutils.c | 51 +++++++++++------------
 util/cutils.c            | 89 ++++++++++++++++++++++++++++------------
 2 files changed, 88 insertions(+), 52 deletions(-)

Comments

Eric Blake May 8, 2023, 9:21 p.m. UTC | #1
On Mon, May 08, 2023 at 03:03:43PM -0500, Eric Blake wrote:
> We have several limitations and bugs worth fixing; they are
> inter-related enough that it is not worth splitting this patch into
> smaller pieces:
> 
> * ".5k" should work to specify 512, just as "0.5k" does
> * "1.9999k" and "1." + "9"*50 + "k" should both produce the same
>   result of 2048 after rounding
> * "1." + "0"*350 + "1B" should not be treated the same as "1.0B";
>   underflow in the fraction should not be lost
> * "7.99e99" and "7.99e999" look similar, but our code was doing a
>   read-out-of-bounds on the latter because it was not expecting ERANGE
>   due to overflow. While we document that scientific notation is not
>   supported, and the previous patch actually fixed
>   qemu_strtod_finite() to no longer return ERANGE overflows, it is
>   easier to pre-filter than to try and determine after the fact if
>   strtod() consumed more than we wanted.  Note that this is a
>   low-level semantic change (when endptr is not NULL, we can now
>   successfully parse with a scale of 'E' and then report trailing
>   junk, instead of failing outright with EINVAL); but an earlier
>   commit already argued that this is not a high-level semantic change
>   since the only caller passing in a non-NULL endptr also checks that
>   the tail is whitespace-only.
> 
> Fixes: https://gitlab.com/qemu-project/qemu/-/issues/1629

Also,

Fixes: cf923b78 ("utils: Improve qemu_strtosz() to have 64 bits of precision", 6.0.0)
Fixes: 7625a1ed ("utils: Use fixed-point arithmetic in qemu_strtosz", 6.0.0)

> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
>  tests/unit/test-cutils.c | 51 +++++++++++------------
>  util/cutils.c            | 89 ++++++++++++++++++++++++++++------------
>  2 files changed, 88 insertions(+), 52 deletions(-)
>
Hanna Czenczek May 9, 2023, 5:54 p.m. UTC | #2
On 08.05.23 22:03, Eric Blake wrote:
> We have several limitations and bugs worth fixing; they are
> inter-related enough that it is not worth splitting this patch into
> smaller pieces:
>
> * ".5k" should work to specify 512, just as "0.5k" does
> * "1.9999k" and "1." + "9"*50 + "k" should both produce the same
>    result of 2048 after rounding
> * "1." + "0"*350 + "1B" should not be treated the same as "1.0B";
>    underflow in the fraction should not be lost
> * "7.99e99" and "7.99e999" look similar, but our code was doing a
>    read-out-of-bounds on the latter because it was not expecting ERANGE
>    due to overflow. While we document that scientific notation is not
>    supported, and the previous patch actually fixed
>    qemu_strtod_finite() to no longer return ERANGE overflows, it is
>    easier to pre-filter than to try and determine after the fact if
>    strtod() consumed more than we wanted.  Note that this is a
>    low-level semantic change (when endptr is not NULL, we can now
>    successfully parse with a scale of 'E' and then report trailing
>    junk, instead of failing outright with EINVAL); but an earlier
>    commit already argued that this is not a high-level semantic change
>    since the only caller passing in a non-NULL endptr also checks that
>    the tail is whitespace-only.
>
> Fixes: https://gitlab.com/qemu-project/qemu/-/issues/1629
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
>   tests/unit/test-cutils.c | 51 +++++++++++------------
>   util/cutils.c            | 89 ++++++++++++++++++++++++++++------------
>   2 files changed, 88 insertions(+), 52 deletions(-)

[...]

> diff --git a/util/cutils.c b/util/cutils.c
> index 0e056a27a44..d1dfbc69d16 100644
> --- a/util/cutils.c
> +++ b/util/cutils.c

[...]

> @@ -246,27 +244,66 @@ static int do_strtosz(const char *nptr, const char **end,
>               retval = -EINVAL;
>               goto out;
>           }
> -    } else if (*endptr == '.') {
> +    } else if (*endptr == '.' || (endptr == nptr && strchr(nptr, '.'))) {

What case is there where we have a fraction but *endptr != '.'?

>           /*
>            * Input looks like a fraction.  Make sure even 1.k works
> -         * without fractional digits.  If we see an exponent, treat
> -         * the entire input as invalid instead.
> +         * without fractional digits.  strtod tries to treat 'e' as an
> +         * exponent, but we want to treat it as a scaling suffix;
> +         * doing this requires modifying a copy of the fraction.
>            */
> -        double fraction;
> +        double fraction = 0.0;
>
> -        f = endptr;
> -        retval = qemu_strtod_finite(f, &endptr, &fraction);
> -        if (retval) {
> +        if (retval == 0 && *endptr == '.' && !isdigit(endptr[1])) {
> +            /* If we got here, we parsed at least one digit already. */
>               endptr++;
> -        } else if (memchr(f, 'e', endptr - f) || memchr(f, 'E', endptr - f)) {
> -            endptr = nptr;
> -            retval = -EINVAL;
> -            goto out;
>           } else {
> -            /* Extract into a 64-bit fixed-point fraction. */
> +            char *e;
> +            const char *tail;
> +            g_autofree char *copy = g_strdup(endptr);
> +
> +            e = strchr(copy, 'e');
> +            if (e) {
> +                *e = '\0';
> +            }
> +            e = strchr(copy, 'E');
> +            if (e) {
> +                *e = '\0';
> +            }
> +            /*
> +             * If this is a floating point, we are guaranteed that '.'
> +             * appears before any possible digits in copy.  If it is
> +             * not a floating point, strtod will fail.  Either way,
> +             * there is now no exponent in copy, so if it parses, we
> +             * know 0.0 <= abs(result) <= 1.0 (after rounding), and
> +             * ERANGE is only possible on underflow which is okay.
> +             */
> +            retval = qemu_strtod_finite(copy, &tail, &fraction);
> +            endptr += tail - copy;
> +        }
> +
> +        /* Extract into a 64-bit fixed-point fraction. */
> +        if (fraction == 1.0) {
> +            if (val == UINT64_MAX) {
> +                retval = -ERANGE;
> +                goto out;
> +            }
> +            val++;
> +        } else if (retval == -ERANGE) {
> +            /* See comments above about underflow */
> +            valf = 1;

It doesn’t really matter because even an EiB is just 2^60, and so 1 EiB 
* 2^-64 (the resolution of our fractional part) is still less than 1, but:

DBL_MIN * 0x1p64 is 2^-(1022-64) == 2^-958, i.e. much less than 1, so 
I’d set valf to 0 here.

(If you put “.00000000000000000001” into this, there won’t be an 
underflow, but the value is so small that valf ends up 0.  But if you 
put `.$(yes 0 | head -n 307 | tr -d '\n')1` into this, there will be an 
underflow, setting valf to 1, even though the value is smaller.)

Hanna

> +            retval = 0;
> +        } else {
>               valf = (uint64_t)(fraction * 0x1p64);
>           }
>       }
> +    if (retval) {
> +        goto out;
> +    }
> +    if (memchr(nptr, '-', endptr - nptr) != NULL) {
> +        endptr = nptr;
> +        retval = -EINVAL;
> +        goto out;
> +    }
>       c = *endptr;
>       mul = suffix_mul(c, unit);
>       if (mul > 0) {
Eric Blake May 9, 2023, 9:28 p.m. UTC | #3
On Tue, May 09, 2023 at 07:54:30PM +0200, Hanna Czenczek wrote:
> On 08.05.23 22:03, Eric Blake wrote:
> > We have several limitations and bugs worth fixing; they are
> > inter-related enough that it is not worth splitting this patch into
> > smaller pieces:
> > 
> > * ".5k" should work to specify 512, just as "0.5k" does
> > * "1.9999k" and "1." + "9"*50 + "k" should both produce the same
> >    result of 2048 after rounding
> > * "1." + "0"*350 + "1B" should not be treated the same as "1.0B";
> >    underflow in the fraction should not be lost
> > * "7.99e99" and "7.99e999" look similar, but our code was doing a
> >    read-out-of-bounds on the latter because it was not expecting ERANGE
> >    due to overflow. While we document that scientific notation is not
> >    supported, and the previous patch actually fixed
> >    qemu_strtod_finite() to no longer return ERANGE overflows, it is
> >    easier to pre-filter than to try and determine after the fact if
> >    strtod() consumed more than we wanted.  Note that this is a
> >    low-level semantic change (when endptr is not NULL, we can now
> >    successfully parse with a scale of 'E' and then report trailing
> >    junk, instead of failing outright with EINVAL); but an earlier
> >    commit already argued that this is not a high-level semantic change
> >    since the only caller passing in a non-NULL endptr also checks that
> >    the tail is whitespace-only.
> > 
> > Fixes: https://gitlab.com/qemu-project/qemu/-/issues/1629
> > Signed-off-by: Eric Blake <eblake@redhat.com>
> > ---
> >   tests/unit/test-cutils.c | 51 +++++++++++------------
> >   util/cutils.c            | 89 ++++++++++++++++++++++++++++------------
> >   2 files changed, 88 insertions(+), 52 deletions(-)
> 
> [...]
> 
> > diff --git a/util/cutils.c b/util/cutils.c
> > index 0e056a27a44..d1dfbc69d16 100644
> > --- a/util/cutils.c
> > +++ b/util/cutils.c
> 
> [...]
> 
> > @@ -246,27 +244,66 @@ static int do_strtosz(const char *nptr, const char **end,
> >               retval = -EINVAL;
> >               goto out;
> >           }
> > -    } else if (*endptr == '.') {
> > +    } else if (*endptr == '.' || (endptr == nptr && strchr(nptr, '.'))) {
> 
> What case is there where we have a fraction but *endptr != '.'?

Bigger context:

result = qemu_strtou64(nptr, &endptr, 10, &val);
// at this point, result is one of:
//  a. 0 - we parsed a decimal string, endptr points to any slop
//  b. -EINVAL - we could not recognize a decimal string: multiple reasons
//  b.1. nptr was NULL (endptr is NULL)
//  b.2. nptr was "" or otherwise whitespace only (endptr is nptr)
//  b.3. the first non-whitespace in nptr was not a sign or digit (endptr is nptr)
//  c. -ERANGE - we saw a decimal string, but it didn't fit in uint64 (endptr is
//    past first digit)
if (retval == -ERANGE || !nptr) {
    // filter out c. and b.1
    goto out;
}
if (retval == 0 && val == 0 && (*endptr == 'x' || *endptr == 'X')) {
    // a, where we must decipher between "0x", "00x", "0xjunk", "0x1", ...
    // not changed by this patch, and where we give -EINVAL if we see any trailing
    // slop like "0x1." or "0x1p"
} else  if (*endptr == '.' || (endptr == nptr && strchr(nptr, '.'))) {
    // The left half is possible in both a. (such as "1.5k")
    // and b.3. when '.' was the first slop byte (such as ".5k")
    // The right half is possible only for b.3 when '.' was not the first slop
    // (needed for covering " +.5k")
    // At this point, b.2. has been filtered out

...

> 
> >           /*
> >            * Input looks like a fraction.  Make sure even 1.k works
> > -         * without fractional digits.  If we see an exponent, treat
> > -         * the entire input as invalid instead.
> > +         * without fractional digits.  strtod tries to treat 'e' as an
> > +         * exponent, but we want to treat it as a scaling suffix;
> > +         * doing this requires modifying a copy of the fraction.
> >            */
> > -        double fraction;
> > +        double fraction = 0.0;
> > 
> > -        f = endptr;
> > -        retval = qemu_strtod_finite(f, &endptr, &fraction);
> > -        if (retval) {
> > +        if (retval == 0 && *endptr == '.' && !isdigit(endptr[1])) {
> > +            /* If we got here, we parsed at least one digit already. */
> >               endptr++;

The 'retval == 0' check could equally be written 'endptr > nptr' (the
two are synonymous based on the conditions of a.; we cannot get here
under b.3); the '*endptr == '.' is necessary so that if nptr=="1junk",
we use 'j' as the scaling suffix rather than trying to skip past a
non-present '.'; and the '!isdigit(endptr[1])' is necessary so that
"1.k" does not result in us trying to call strtod(".k") which would
fail for an unexpected EINVAL.  Basically, this branch handles all
cases where we've seen at least one digit and the only thing between
digits and a possible scaling suffix is a single '.', so strtod is not
worth using.

> > -        } else if (memchr(f, 'e', endptr - f) || memchr(f, 'E', endptr - f)) {
> > -            endptr = nptr;
> > -            retval = -EINVAL;
> > -            goto out;
> >           } else {
> > -            /* Extract into a 64-bit fixed-point fraction. */
> > +            char *e;
> > +            const char *tail;
> > +            g_autofree char *copy = g_strdup(endptr);
> > +

If we get into this branch, we could be in condition a. (such as
"1.1k" where endptr is ".1k") or in b.3 (such as ".5k" where endptr is
".5k", but also thinks like " junk." where endptr is " junk." or even
".k").  But we've already proven we don't need to worry about "0x1p1"
(filtered above in the hex code), and at this point we strip all
exponents (if endptr is ".9e999", copy is ".9")...

> > +            e = strchr(copy, 'e');
> > +            if (e) {
> > +                *e = '\0';
> > +            }
> > +            e = strchr(copy, 'E');
> > +            if (e) {
> > +                *e = '\0';
> > +            }
> > +            /*
> > +             * If this is a floating point, we are guaranteed that '.'
> > +             * appears before any possible digits in copy.  If it is
> > +             * not a floating point, strtod will fail.  Either way,
> > +             * there is now no exponent in copy, so if it parses, we
> > +             * know 0.0 <= abs(result) <= 1.0 (after rounding), and
> > +             * ERANGE is only possible on underflow which is okay.
> > +             */
> > +            retval = qemu_strtod_finite(copy, &tail, &fraction);

...so that by the time we do try qemu_strtod_finite(), it is either a
valid floating point fraction with at least one digit and no exponent
and possibly some slop (such as ".5k" or " +.5" - will produce retval
= 0 or -ERANGE for underflow, based on the previous patch to
qemu_strtod_finite) or complete junk with retval = -EINVAL where there
was a '.' but other characters appeared first (such as " junk.") or
where there are no digits but also no scaling suffix (such as ". ";
hmm, looks like I could get more coverage if I add ". " and ".k" to my
unit tests in v2).

> > +            endptr += tail - copy;
> > +        }
> > +
> > +        /* Extract into a 64-bit fixed-point fraction. */
> > +        if (fraction == 1.0) {
> > +            if (val == UINT64_MAX) {
> > +                retval = -ERANGE;
> > +                goto out;
> > +            }
> > +            val++;
> > +        } else if (retval == -ERANGE) {
> > +            /* See comments above about underflow */
> > +            valf = 1;
> 
> It doesn’t really matter because even an EiB is just 2^60, and so 1 EiB *
> 2^-64 (the resolution of our fractional part) is still less than 1, but:
> 
> DBL_MIN * 0x1p64 is 2^-(1022-64) == 2^-958, i.e. much less than 1, so I’d
> set valf to 0 here.
> 
> (If you put “.00000000000000000001” into this, there won’t be an underflow,
> but the value is so small that valf ends up 0.  But if you put `.$(yes 0 |
> head -n 307 | tr -d '\n')1` into this, there will be an underflow, setting
> valf to 1, even though the value is smaller.)

Oh, good point.  I was trying to say that "1.000B" (for any amount of
zeroes) is okay (all zeroes in the fraction is needless typing but not
an ambiguous value regardless of rounding), while "1.0001B" (for any
amount of zeroes) is not (you can't request a non-zero fraction
without a scale larger than bytes, even if the fraction you do request
rounds to zero at your chosen scale).  But you have come up with a
counter-case where I didn't quite achieve that.

But I think the solution to that is not to treat underflow as valf =
0, but rather to alter this snippet:

-            valf = (uint64_t)(fraction * 0x1p64);
+            /*
+             * If fraction was non-zero, add slop small enough that it doesn't
+             * impact rounding, but does let us reject "1..00000000000000000001B".
+             */
+            valf = (uint64_t)(fraction * 0x1p64) | !fraction;

so that between the ERANGE branch and this slop, valf is guaranteed
non-zero if fraction contained any non-zero digits.  It looks like I
need to add yet another unit test before posting v2.
Hanna Czenczek May 10, 2023, 7:46 a.m. UTC | #4
On 09.05.23 23:28, Eric Blake wrote:
> On Tue, May 09, 2023 at 07:54:30PM +0200, Hanna Czenczek wrote:
>> On 08.05.23 22:03, Eric Blake wrote:
>>> We have several limitations and bugs worth fixing; they are
>>> inter-related enough that it is not worth splitting this patch into
>>> smaller pieces:
>>>
>>> * ".5k" should work to specify 512, just as "0.5k" does
>>> * "1.9999k" and "1." + "9"*50 + "k" should both produce the same
>>>     result of 2048 after rounding
>>> * "1." + "0"*350 + "1B" should not be treated the same as "1.0B";
>>>     underflow in the fraction should not be lost
>>> * "7.99e99" and "7.99e999" look similar, but our code was doing a
>>>     read-out-of-bounds on the latter because it was not expecting ERANGE
>>>     due to overflow. While we document that scientific notation is not
>>>     supported, and the previous patch actually fixed
>>>     qemu_strtod_finite() to no longer return ERANGE overflows, it is
>>>     easier to pre-filter than to try and determine after the fact if
>>>     strtod() consumed more than we wanted.  Note that this is a
>>>     low-level semantic change (when endptr is not NULL, we can now
>>>     successfully parse with a scale of 'E' and then report trailing
>>>     junk, instead of failing outright with EINVAL); but an earlier
>>>     commit already argued that this is not a high-level semantic change
>>>     since the only caller passing in a non-NULL endptr also checks that
>>>     the tail is whitespace-only.
>>>
>>> Fixes: https://gitlab.com/qemu-project/qemu/-/issues/1629
>>> Signed-off-by: Eric Blake <eblake@redhat.com>
>>> ---
>>>    tests/unit/test-cutils.c | 51 +++++++++++------------
>>>    util/cutils.c            | 89 ++++++++++++++++++++++++++++------------
>>>    2 files changed, 88 insertions(+), 52 deletions(-)
>> [...]
>>
>>> diff --git a/util/cutils.c b/util/cutils.c
>>> index 0e056a27a44..d1dfbc69d16 100644
>>> --- a/util/cutils.c
>>> +++ b/util/cutils.c
>> [...]
>>
>>> @@ -246,27 +244,66 @@ static int do_strtosz(const char *nptr, const char **end,
>>>                retval = -EINVAL;
>>>                goto out;
>>>            }
>>> -    } else if (*endptr == '.') {
>>> +    } else if (*endptr == '.' || (endptr == nptr && strchr(nptr, '.'))) {
>> What case is there where we have a fraction but *endptr != '.'?
> Bigger context:
>
> result = qemu_strtou64(nptr, &endptr, 10, &val);
> // at this point, result is one of:
> //  a. 0 - we parsed a decimal string, endptr points to any slop
> //  b. -EINVAL - we could not recognize a decimal string: multiple reasons
> //  b.1. nptr was NULL (endptr is NULL)
> //  b.2. nptr was "" or otherwise whitespace only (endptr is nptr)
> //  b.3. the first non-whitespace in nptr was not a sign or digit (endptr is nptr)
> //  c. -ERANGE - we saw a decimal string, but it didn't fit in uint64 (endptr is
> //    past first digit)
> if (retval == -ERANGE || !nptr) {
>      // filter out c. and b.1
>      goto out;
> }
> if (retval == 0 && val == 0 && (*endptr == 'x' || *endptr == 'X')) {
>      // a, where we must decipher between "0x", "00x", "0xjunk", "0x1", ...
>      // not changed by this patch, and where we give -EINVAL if we see any trailing
>      // slop like "0x1." or "0x1p"
> } else  if (*endptr == '.' || (endptr == nptr && strchr(nptr, '.'))) {
>      // The left half is possible in both a. (such as "1.5k")
>      // and b.3. when '.' was the first slop byte (such as ".5k")
>      // The right half is possible only for b.3 when '.' was not the first slop
>      // (needed for covering " +.5k")

Ah, I see.  Yes, I couldn’t come up with parseable non-digit characters 
(whitespace, '+').  Thanks, that makes sense!

>      // At this point, b.2. has been filtered out
>
> ...
>
>>>            /*
>>>             * Input looks like a fraction.  Make sure even 1.k works
>>> -         * without fractional digits.  If we see an exponent, treat
>>> -         * the entire input as invalid instead.
>>> +         * without fractional digits.  strtod tries to treat 'e' as an
>>> +         * exponent, but we want to treat it as a scaling suffix;
>>> +         * doing this requires modifying a copy of the fraction.
>>>             */
>>> -        double fraction;
>>> +        double fraction = 0.0;
>>>
>>> -        f = endptr;
>>> -        retval = qemu_strtod_finite(f, &endptr, &fraction);
>>> -        if (retval) {
>>> +        if (retval == 0 && *endptr == '.' && !isdigit(endptr[1])) {
>>> +            /* If we got here, we parsed at least one digit already. */
>>>                endptr++;
> The 'retval == 0' check could equally be written 'endptr > nptr' (the
> two are synonymous based on the conditions of a.; we cannot get here
> under b.3); the '*endptr == '.' is necessary so that if nptr=="1junk",
> we use 'j' as the scaling suffix rather than trying to skip past a
> non-present '.'; and the '!isdigit(endptr[1])' is necessary so that
> "1.k" does not result in us trying to call strtod(".k") which would
> fail for an unexpected EINVAL.  Basically, this branch handles all
> cases where we've seen at least one digit and the only thing between
> digits and a possible scaling suffix is a single '.', so strtod is not
> worth using.
>
>>> -        } else if (memchr(f, 'e', endptr - f) || memchr(f, 'E', endptr - f)) {
>>> -            endptr = nptr;
>>> -            retval = -EINVAL;
>>> -            goto out;
>>>            } else {
>>> -            /* Extract into a 64-bit fixed-point fraction. */
>>> +            char *e;
>>> +            const char *tail;
>>> +            g_autofree char *copy = g_strdup(endptr);
>>> +
> If we get into this branch, we could be in condition a. (such as
> "1.1k" where endptr is ".1k") or in b.3 (such as ".5k" where endptr is
> ".5k", but also thinks like " junk." where endptr is " junk." or even
> ".k").  But we've already proven we don't need to worry about "0x1p1"
> (filtered above in the hex code), and at this point we strip all
> exponents (if endptr is ".9e999", copy is ".9")...
>
>>> +            e = strchr(copy, 'e');
>>> +            if (e) {
>>> +                *e = '\0';
>>> +            }
>>> +            e = strchr(copy, 'E');
>>> +            if (e) {
>>> +                *e = '\0';
>>> +            }
>>> +            /*
>>> +             * If this is a floating point, we are guaranteed that '.'
>>> +             * appears before any possible digits in copy.  If it is
>>> +             * not a floating point, strtod will fail.  Either way,
>>> +             * there is now no exponent in copy, so if it parses, we
>>> +             * know 0.0 <= abs(result) <= 1.0 (after rounding), and
>>> +             * ERANGE is only possible on underflow which is okay.
>>> +             */
>>> +            retval = qemu_strtod_finite(copy, &tail, &fraction);
> ...so that by the time we do try qemu_strtod_finite(), it is either a
> valid floating point fraction with at least one digit and no exponent
> and possibly some slop (such as ".5k" or " +.5" - will produce retval
> = 0 or -ERANGE for underflow, based on the previous patch to
> qemu_strtod_finite) or complete junk with retval = -EINVAL where there
> was a '.' but other characters appeared first (such as " junk.") or
> where there are no digits but also no scaling suffix (such as ". ";
> hmm, looks like I could get more coverage if I add ". " and ".k" to my
> unit tests in v2).
>
>>> +            endptr += tail - copy;
>>> +        }
>>> +
>>> +        /* Extract into a 64-bit fixed-point fraction. */
>>> +        if (fraction == 1.0) {
>>> +            if (val == UINT64_MAX) {
>>> +                retval = -ERANGE;
>>> +                goto out;
>>> +            }
>>> +            val++;
>>> +        } else if (retval == -ERANGE) {
>>> +            /* See comments above about underflow */
>>> +            valf = 1;
>> It doesn’t really matter because even an EiB is just 2^60, and so 1 EiB *
>> 2^-64 (the resolution of our fractional part) is still less than 1, but:
>>
>> DBL_MIN * 0x1p64 is 2^-(1022-64) == 2^-958, i.e. much less than 1, so I’d
>> set valf to 0 here.
>>
>> (If you put “.00000000000000000001” into this, there won’t be an underflow,
>> but the value is so small that valf ends up 0.  But if you put `.$(yes 0 |
>> head -n 307 | tr -d '\n')1` into this, there will be an underflow, setting
>> valf to 1, even though the value is smaller.)
> Oh, good point.  I was trying to say that "1.000B" (for any amount of
> zeroes) is okay (all zeroes in the fraction is needless typing but not
> an ambiguous value regardless of rounding), while "1.0001B" (for any
> amount of zeroes) is not (you can't request a non-zero fraction
> without a scale larger than bytes, even if the fraction you do request
> rounds to zero at your chosen scale).  But you have come up with a
> counter-case where I didn't quite achieve that.
>
> But I think the solution to that is not to treat underflow as valf =
> 0, but rather to alter this snippet:
>
> -            valf = (uint64_t)(fraction * 0x1p64);
> +            /*
> +             * If fraction was non-zero, add slop small enough that it doesn't
> +             * impact rounding, but does let us reject "1..00000000000000000001B".
> +             */
> +            valf = (uint64_t)(fraction * 0x1p64) | !fraction;
>
> so that between the ERANGE branch and this slop, valf is guaranteed
> non-zero if fraction contained any non-zero digits.

I’d make it a logical || instead of |, but that sounds good, yes.

Hanna
Hanna Czenczek May 10, 2023, 7:48 a.m. UTC | #5
On 10.05.23 09:46, Hanna Czenczek wrote:
> On 09.05.23 23:28, Eric Blake wrote:

[...]

>> But I think the solution to that is not to treat underflow as valf =
>> 0, but rather to alter this snippet:
>>
>> -            valf = (uint64_t)(fraction * 0x1p64);
>> +            /*
>> +             * If fraction was non-zero, add slop small enough that 
>> it doesn't
>> +             * impact rounding, but does let us reject 
>> "1..00000000000000000001B".
>> +             */
>> +            valf = (uint64_t)(fraction * 0x1p64) | !fraction;
>>
>> so that between the ERANGE branch and this slop, valf is guaranteed
>> non-zero if fraction contained any non-zero digits.
>
> I’d make it a logical || instead of |, but that sounds good, yes.

Sent too soon – exactly when sending I realized that I’m very wrong.  
Still, I wouldn’t just | the 1 on, and rather make take the LoC to make it

valf = (uint64_t)(fraction * 0x1p64);
if (!valf && fraction) {
     /*
      * If fraction was non-zero, add slop small enough that it doesn't
      * impact rounding, but does let us reject "1..00000000000000000001B".
      */
     valf = 1;
}
diff mbox series

Patch

diff --git a/tests/unit/test-cutils.c b/tests/unit/test-cutils.c
index f781997aef7..1fb9d5323ab 100644
--- a/tests/unit/test-cutils.c
+++ b/tests/unit/test-cutils.c
@@ -2693,14 +2693,14 @@  static void test_qemu_strtosz_float(void)
     g_assert_cmpuint(res, ==, 1024);
     g_assert_true(endptr == str + 4);

-    /* FIXME An empty fraction head should be tolerated */
+    /* An empty fraction head is tolerated */
     str = " .5k";
     endptr = str;
     res = 0xbaadf00d;
     err = qemu_strtosz(str, &endptr, &res);
-    g_assert_cmpint(err, ==, -EINVAL /* FIXME 0 */);
-    g_assert_cmpuint(res, ==, 0 /* FIXME 512 */);
-    g_assert_true(endptr == str /* FIXME + 4 */);
+    g_assert_cmpint(err, ==, 0);
+    g_assert_cmpuint(res, ==, 512);
+    g_assert_true(endptr == str + 4);

     /* For convenience, we permit values that are not byte-exact */
     str = "12.345M";
@@ -2711,16 +2711,16 @@  static void test_qemu_strtosz_float(void)
     g_assert_cmpuint(res, ==, (uint64_t) (12.345 * MiB + 0.5));
     g_assert_true(endptr == str + 7);

-    /* FIXME Fraction tail should round correctly */
+    /* Fraction tail can round up */
     str = "1.9999999999999999999999999999999999999999999999999999k";
     endptr = str;
     res = 0xbaadf00d;
     err = qemu_strtosz(str, &endptr, &res);
     g_assert_cmpint(err, ==, 0);
-    g_assert_cmpint(res, ==, 1024 /* FIXME 2048 */);
+    g_assert_cmpuint(res, ==, 2048);
     g_assert_true(endptr == str + 55);

-    /* FIXME ERANGE underflow in the fraction tail should not matter for 'k' */
+    /* ERANGE underflow in the fraction tail does not matter for 'k' */
     str = "1."
         "00000000000000000000000000000000000000000000000000"
         "00000000000000000000000000000000000000000000000000"
@@ -2734,7 +2734,7 @@  static void test_qemu_strtosz_float(void)
     res = 0xbaadf00d;
     err = qemu_strtosz(str, &endptr, &res);
     g_assert_cmpint(err, ==, 0);
-    g_assert_cmpuint(res, ==, 1 /* FIXME 1024 */);
+    g_assert_cmpuint(res, ==, 1024);
     g_assert_true(endptr == str + 354);
 }

@@ -2826,16 +2826,16 @@  static void test_qemu_strtosz_invalid(void)
     g_assert_cmpuint(res, ==, 0);
     g_assert_true(endptr == str);

-    /* FIXME Fraction tail can cause ERANGE overflow */
+    /* Fraction tail can cause ERANGE overflow */
     str = "15.9999999999999999999999999999999999999999999999999999E";
     endptr = str;
     res = 0xbaadf00d;
     err = qemu_strtosz(str, &endptr, &res);
-    g_assert_cmpint(err, ==, 0 /* FIXME -ERANGE */);
-    g_assert_cmpuint(res, ==, 15ULL * EiB /* FIXME 0 */);
-    g_assert_true(endptr == str + 56 /* FIXME str */);
+    g_assert_cmpint(err, ==, -ERANGE);
+    g_assert_cmpuint(res, ==, 0);
+    g_assert_true(endptr == str + 56);

-    /* FIXME ERANGE underflow in the fraction tail should matter for 'B' */
+    /* ERANGE underflow in the fraction tail matters for 'B' */
     str = "1."
         "00000000000000000000000000000000000000000000000000"
         "00000000000000000000000000000000000000000000000000"
@@ -2848,9 +2848,9 @@  static void test_qemu_strtosz_invalid(void)
     endptr = str;
     res = 0xbaadf00d;
     err = qemu_strtosz(str, &endptr, &res);
-    g_assert_cmpint(err, ==, 0 /* FIXME -EINVAL */);
-    g_assert_cmpuint(res, ==, 1 /* FIXME 0 */);
-    g_assert_true(endptr == str + 354 /* FIXME str */);
+    g_assert_cmpint(err, ==, -EINVAL);
+    g_assert_cmpuint(res, ==, 0);
+    g_assert_true(endptr == str);

     /* No hex fractions */
     str = "0x1.8k";
@@ -3045,14 +3045,14 @@  static void test_qemu_strtosz_trailing(void)
     g_assert_cmpint(err, ==, -EINVAL);
     g_assert_cmpuint(res, ==, 0);

-    /* FIXME should stop parse after 'e'. No floating point exponents */
+    /* Parse stops at 'e', which is not a floating point exponent */
     str = "1.5e1k";
     endptr = NULL;
     res = 0xbaadf00d;
     err = qemu_strtosz(str, &endptr, &res);
-    g_assert_cmpint(err, ==, -EINVAL /* FIXME 0 */);
-    g_assert_cmpuint(res, ==, 0 /* FIXME EiB * 1.5 */);
-    g_assert_true(endptr == str /* FIXME + 4 */);
+    g_assert_cmpint(err, ==, 0);
+    g_assert_cmpuint(res, ==, EiB * 1.5);
+    g_assert_true(endptr == str + 4);

     res = 0xbaadf00d;
     err = qemu_strtosz(str, NULL, &res);
@@ -3063,23 +3063,22 @@  static void test_qemu_strtosz_trailing(void)
     endptr = NULL;
     res = 0xbaadf00d;
     err = qemu_strtosz(str, &endptr, &res);
-    g_assert_cmpint(err, ==, -EINVAL /* FIXME 0 */);
-    g_assert_cmpuint(res, ==, 0 /* FIXME EiB * 1.5 */);
-    g_assert_true(endptr == str /* FIXME + 4 */);
+    g_assert_cmpint(err, ==, 0);
+    g_assert_cmpuint(res, ==, EiB * 1.5);
+    g_assert_true(endptr == str + 4);

     res = 0xbaadf00d;
     err = qemu_strtosz(str, NULL, &res);
     g_assert_cmpint(err, ==, -EINVAL);
     g_assert_cmpuint(res, ==, 0);

-    /* FIXME overflow in fraction is buggy */
     str = "1.5E999";
     endptr = NULL;
     res = 0xbaadf00d;
     err = qemu_strtosz(str, &endptr, &res);
     g_assert_cmpint(err, ==, 0);
-    g_assert_cmpuint(res, ==, 1 /* FIXME EiB * 1.5 */);
-    g_assert(endptr == str + 2 /* FIXME + 4 */);
+    g_assert_cmpuint(res, ==, EiB * 1.5);
+    g_assert_true(endptr == str + 4);

     res = 0xbaadf00d;
     err = qemu_strtosz(str, NULL, &res);
diff --git a/util/cutils.c b/util/cutils.c
index 0e056a27a44..d1dfbc69d16 100644
--- a/util/cutils.c
+++ b/util/cutils.c
@@ -194,14 +194,17 @@  static int64_t suffix_mul(char suffix, int64_t unit)
  * - 12345 - decimal, scale determined by @default_suffix and @unit
  * - 12345{bBkKmMgGtTpPeE} - decimal, scale determined by suffix and @unit
  * - 12345.678{kKmMgGtTpPeE} - decimal, scale determined by suffix, and
- *   fractional portion is truncated to byte
+ *   fractional portion is truncated to byte, either side of . may be empty
  * - 0x7fEE - hexadecimal, unit determined by @default_suffix
  *
  * The following are intentionally not supported
- * - hex with scaling suffix, such as 0x20M
- * - octal, such as 08
- * - fractional hex, such as 0x1.8
- * - floating point exponents, such as 1e3
+ * - hex with scaling suffix, such as 0x20M (0x1b is 27, not 1)
+ * - octal, such as 08 (parsed as decimal instead)
+ * - binary, such as 0b1000 (parsed as 0b with trailing garbage "1000")
+ * - fractional hex, such as 0x1.8 (parsed as 0 with trailing garbage "x1.8")
+ * - floating point exponents, such as 1e3 (parsed as 1e with trailing
+ *   garbage "3") or 0x1p3 (parsed as 1 with trailing garbage "p3")
+ * - non-finite values, such as inf or NaN
  *
  * The end pointer will be returned in *end, if not NULL.  If there is
  * no fraction, the input can be decimal or hexadecimal; if there is a
@@ -220,22 +223,17 @@  static int do_strtosz(const char *nptr, const char **end,
                       uint64_t *result)
 {
     int retval;
-    const char *endptr, *f;
+    const char *endptr;
     unsigned char c;
-    uint64_t val, valf = 0;
+    uint64_t val = 0, valf = 0;
     int64_t mul;

     /* Parse integral portion as decimal. */
     retval = qemu_strtou64(nptr, &endptr, 10, &val);
-    if (retval) {
+    if (retval == -ERANGE || !nptr) {
         goto out;
     }
-    if (memchr(nptr, '-', endptr - nptr) != NULL) {
-        endptr = nptr;
-        retval = -EINVAL;
-        goto out;
-    }
-    if (val == 0 && (*endptr == 'x' || *endptr == 'X')) {
+    if (retval == 0 && val == 0 && (*endptr == 'x' || *endptr == 'X')) {
         /* Input looks like hex; reparse, and insist on no fraction or suffix. */
         retval = qemu_strtou64(nptr, &endptr, 16, &val);
         if (retval) {
@@ -246,27 +244,66 @@  static int do_strtosz(const char *nptr, const char **end,
             retval = -EINVAL;
             goto out;
         }
-    } else if (*endptr == '.') {
+    } else if (*endptr == '.' || (endptr == nptr && strchr(nptr, '.'))) {
         /*
          * Input looks like a fraction.  Make sure even 1.k works
-         * without fractional digits.  If we see an exponent, treat
-         * the entire input as invalid instead.
+         * without fractional digits.  strtod tries to treat 'e' as an
+         * exponent, but we want to treat it as a scaling suffix;
+         * doing this requires modifying a copy of the fraction.
          */
-        double fraction;
+        double fraction = 0.0;

-        f = endptr;
-        retval = qemu_strtod_finite(f, &endptr, &fraction);
-        if (retval) {
+        if (retval == 0 && *endptr == '.' && !isdigit(endptr[1])) {
+            /* If we got here, we parsed at least one digit already. */
             endptr++;
-        } else if (memchr(f, 'e', endptr - f) || memchr(f, 'E', endptr - f)) {
-            endptr = nptr;
-            retval = -EINVAL;
-            goto out;
         } else {
-            /* Extract into a 64-bit fixed-point fraction. */
+            char *e;
+            const char *tail;
+            g_autofree char *copy = g_strdup(endptr);
+
+            e = strchr(copy, 'e');
+            if (e) {
+                *e = '\0';
+            }
+            e = strchr(copy, 'E');
+            if (e) {
+                *e = '\0';
+            }
+            /*
+             * If this is a floating point, we are guaranteed that '.'
+             * appears before any possible digits in copy.  If it is
+             * not a floating point, strtod will fail.  Either way,
+             * there is now no exponent in copy, so if it parses, we
+             * know 0.0 <= abs(result) <= 1.0 (after rounding), and
+             * ERANGE is only possible on underflow which is okay.
+             */
+            retval = qemu_strtod_finite(copy, &tail, &fraction);
+            endptr += tail - copy;
+        }
+
+        /* Extract into a 64-bit fixed-point fraction. */
+        if (fraction == 1.0) {
+            if (val == UINT64_MAX) {
+                retval = -ERANGE;
+                goto out;
+            }
+            val++;
+        } else if (retval == -ERANGE) {
+            /* See comments above about underflow */
+            valf = 1;
+            retval = 0;
+        } else {
             valf = (uint64_t)(fraction * 0x1p64);
         }
     }
+    if (retval) {
+        goto out;
+    }
+    if (memchr(nptr, '-', endptr - nptr) != NULL) {
+        endptr = nptr;
+        retval = -EINVAL;
+        goto out;
+    }
     c = *endptr;
     mul = suffix_mul(c, unit);
     if (mul > 0) {