diff mbox series

[1/2] localedef: Fix handling of empty mon_decimal_point

Message ID 20220131053442.3995804-2-carlos@redhat.com
State New
Headers show
Series Make C/POSIX and C.UTF-8 consistent. | expand

Commit Message

Carlos O'Donell Jan. 31, 2022, 5:34 a.m. UTC
The handling of mon_decimal_point is incorrect when it comes to
handling the empty "" value.  The existing parser in monetary_read()
will correctly handle setting the non-wide-character value and the
wide-character value e.g. STR_ELEM_WC(mon_decimal_point) if they are
set in the locale definition.  However, in monetary_finish() we have
conflicting TEST_ELEM() which sets a default value (if the locale
definition doesn't include one), and subsequent code which looks for
mon_decimal_point to be NULL to issue a specific error message and set
the defaults. The latter is unused because TEST_ELEM() always sets a
default.  The simplest solution is to remove the TEST_ELEM() check,
and allow the existing check to look to see if mon_decimal_point is
NULL and set an appropriate default.  The final fix is to move the
setting of mon_decimal_point_wc so it occurs only when
mon_decimal_point is being set to a default, keeping both values
consistent. There is no way to tell the difference between
mon_decimal_point_wc having been set to the empty string and not
having been defined at all, for that distinction we must use
mon_decimal_point being NULL or "", and so we must logically set
the default together with mon_decimal_point.

Lastly, there are more fixes similar to this that could be made to
ld-monetary.c, but we avoid that in order to fix just the code
required for mon_decimal_point, which impacts the ability for C.UTF-8
to set mon_decimal_point to "", since without this fix we end up with
an inconsistent setting of mon_decimal_point set to "", but
mon_decimal_point_wc set to "." which is incorrect.

Tested on x86_64 and i686 without regression.
---
 locale/programs/ld-monetary.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

Comments

Florian Weimer Jan. 31, 2022, 3:26 p.m. UTC | #1
* Carlos O'Donell:

> diff --git a/locale/programs/ld-monetary.c b/locale/programs/ld-monetary.c
> index 277b9ff042..3b0412b405 100644
> --- a/locale/programs/ld-monetary.c
> +++ b/locale/programs/ld-monetary.c
> @@ -207,7 +207,6 @@ No definition for %s category found"), "LC_MONETARY");
>  
>    TEST_ELEM (int_curr_symbol, "");
>    TEST_ELEM (currency_symbol, "");
> -  TEST_ELEM (mon_decimal_point, ".");
>    TEST_ELEM (mon_thousands_sep, "");
>    TEST_ELEM (positive_sign, "");
>    TEST_ELEM (negative_sign, "");
> @@ -257,6 +256,7 @@ not correspond to a valid name in ISO 4217 [--no-warnings=intcurrsym]"),
>  	record_error (0, 0, _("%s: field `%s' not defined"),
>  		      "LC_MONETARY", "mon_decimal_point");
>        monetary->mon_decimal_point = ".";
> +      monetary->mon_decimal_point_wc = L'.';
>      }
>    else if (monetary->mon_decimal_point[0] == '\0' && ! be_quiet && ! nothing)
>      {
> @@ -264,8 +264,6 @@ not correspond to a valid name in ISO 4217 [--no-warnings=intcurrsym]"),
>  %s: value for field `%s' must not be an empty string"),
>  		    "LC_MONETARY", "mon_decimal_point");
>      }
> -  if (monetary->mon_decimal_point_wc == L'\0')
> -    monetary->mon_decimal_point_wc = L'.';
>  
>    if (monetary->mon_grouping_len == 0)
>      {

There's an existing comment

  /* The decimal point must not be empty.  This is not said explicitly
     in POSIX but ANSI C (ISO/IEC 9899) says in 4.4.2.1 it has to be
     != "".  */

that says that empty strings/null characters are invalid.
The comment was clearly copied from locale/programs/ld-numeric.c.

*However* we have got this code in stdio-common/printf_fp.c:

      decimal = _nl_lookup (loc, LC_MONETARY, MON_DECIMAL_POINT);
      if (*decimal == '\0')
	decimal = _nl_lookup (loc, LC_NUMERIC, DECIMAL_POINT);
      decimalwc = _nl_lookup_word (loc, LC_MONETARY,
				    _NL_MONETARY_DECIMAL_POINT_WC);
      if (decimalwc == L'\0')
	decimalwc = _nl_lookup_word (loc, LC_NUMERIC,
				      _NL_NUMERIC_DECIMAL_POINT_WC);

So we use LC_NUMERIC as the fallback, and our strfmon implementation is
okay with it.  But our localeconv implementation lacks this fallback,
which looks like a bug because the built-in C locale uses an empty
string/a null character.

Still I think simplifying the locale data is the right direction here.

Thanks,
Florian
Andreas Schwab Jan. 31, 2022, 4:09 p.m. UTC | #2
On Jan 31 2022, Florian Weimer via Libc-alpha wrote:

> There's an existing comment
>
>   /* The decimal point must not be empty.  This is not said explicitly
>      in POSIX but ANSI C (ISO/IEC 9899) says in 4.4.2.1 it has to be
>      != "".  */
>
> that says that empty strings/null characters are invalid.

This is only about decimal_point, mon_decimal_point can be empty.
Florian Weimer Jan. 31, 2022, 4:20 p.m. UTC | #3
* Andreas Schwab:

> On Jan 31 2022, Florian Weimer via Libc-alpha wrote:
>
>> There's an existing comment
>>
>>   /* The decimal point must not be empty.  This is not said explicitly
>>      in POSIX but ANSI C (ISO/IEC 9899) says in 4.4.2.1 it has to be
>>      != "".  */
>>
>> that says that empty strings/null characters are invalid.
>
> This is only about decimal_point, mon_decimal_point can be empty.

Hmm, I'll take your word for it.

So the comment should definitely go, and the Carlos' change is the right
way to do it?

Thanks,
Florian
Andreas Schwab Jan. 31, 2022, 4:30 p.m. UTC | #4
On Jan 31 2022, Florian Weimer wrote:

> * Andreas Schwab:
>
>> On Jan 31 2022, Florian Weimer via Libc-alpha wrote:
>>
>>> There's an existing comment
>>>
>>>   /* The decimal point must not be empty.  This is not said explicitly
>>>      in POSIX but ANSI C (ISO/IEC 9899) says in 4.4.2.1 it has to be
>>>      != "".  */
>>>
>>> that says that empty strings/null characters are invalid.
>>
>> This is only about decimal_point, mon_decimal_point can be empty.
>
> Hmm, I'll take your word for it.

See 7.11.2.1, paragraph 3 and 10.
Florian Weimer Jan. 31, 2022, 4:37 p.m. UTC | #5
* Andreas Schwab:

> On Jan 31 2022, Florian Weimer wrote:
>
>> * Andreas Schwab:
>>
>>> On Jan 31 2022, Florian Weimer via Libc-alpha wrote:
>>>
>>>> There's an existing comment
>>>>
>>>>   /* The decimal point must not be empty.  This is not said explicitly
>>>>      in POSIX but ANSI C (ISO/IEC 9899) says in 4.4.2.1 it has to be
>>>>      != "".  */
>>>>
>>>> that says that empty strings/null characters are invalid.
>>>
>>> This is only about decimal_point, mon_decimal_point can be empty.
>>
>> Hmm, I'll take your word for it.
>
> See 7.11.2.1, paragraph 3 and 10.

That is fairly conclusive indeed (numbers match C11).

Are you okay with Carlos' patch with a comment update?

Thanks,
Florian
Florian Weimer Feb. 1, 2022, 11:47 a.m. UTC | #6
* Carlos O'Donell:

> The handling of mon_decimal_point is incorrect when it comes to
> handling the empty "" value.  The existing parser in monetary_read()
> will correctly handle setting the non-wide-character value and the
> wide-character value e.g. STR_ELEM_WC(mon_decimal_point) if they are
> set in the locale definition.  However, in monetary_finish() we have
> conflicting TEST_ELEM() which sets a default value (if the locale
> definition doesn't include one), and subsequent code which looks for
> mon_decimal_point to be NULL to issue a specific error message and set
> the defaults. The latter is unused because TEST_ELEM() always sets a
> default.  The simplest solution is to remove the TEST_ELEM() check,
> and allow the existing check to look to see if mon_decimal_point is
> NULL and set an appropriate default.  The final fix is to move the
> setting of mon_decimal_point_wc so it occurs only when
> mon_decimal_point is being set to a default, keeping both values
> consistent. There is no way to tell the difference between
> mon_decimal_point_wc having been set to the empty string and not
> having been defined at all, for that distinction we must use
> mon_decimal_point being NULL or "", and so we must logically set
> the default together with mon_decimal_point.
>
> Lastly, there are more fixes similar to this that could be made to
> ld-monetary.c, but we avoid that in order to fix just the code
> required for mon_decimal_point, which impacts the ability for C.UTF-8
> to set mon_decimal_point to "", since without this fix we end up with
> an inconsistent setting of mon_decimal_point set to "", but
> mon_decimal_point_wc set to "." which is incorrect.
>
> Tested on x86_64 and i686 without regression.
> ---
>  locale/programs/ld-monetary.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/locale/programs/ld-monetary.c b/locale/programs/ld-monetary.c
> index 277b9ff042..3b0412b405 100644
> --- a/locale/programs/ld-monetary.c
> +++ b/locale/programs/ld-monetary.c
> @@ -207,7 +207,6 @@ No definition for %s category found"), "LC_MONETARY");
>  
>    TEST_ELEM (int_curr_symbol, "");
>    TEST_ELEM (currency_symbol, "");
> -  TEST_ELEM (mon_decimal_point, ".");
>    TEST_ELEM (mon_thousands_sep, "");
>    TEST_ELEM (positive_sign, "");
>    TEST_ELEM (negative_sign, "");
> @@ -257,6 +256,7 @@ not correspond to a valid name in ISO 4217 [--no-warnings=intcurrsym]"),
>  	record_error (0, 0, _("%s: field `%s' not defined"),
>  		      "LC_MONETARY", "mon_decimal_point");
>        monetary->mon_decimal_point = ".";
> +      monetary->mon_decimal_point_wc = L'.';
>      }
>    else if (monetary->mon_decimal_point[0] == '\0' && ! be_quiet && ! nothing)
>      {
> @@ -264,8 +264,6 @@ not correspond to a valid name in ISO 4217 [--no-warnings=intcurrsym]"),
>  %s: value for field `%s' must not be an empty string"),
>  		    "LC_MONETARY", "mon_decimal_point");
>      }
> -  if (monetary->mon_decimal_point_wc == L'\0')
> -    monetary->mon_decimal_point_wc = L'.';
>  
>    if (monetary->mon_grouping_len == 0)
>      {

I have verified that this does not change the localedef output for the
existing locales created by install-locale-files.

I think we need further cleanups in the comments and checks (which were
coped from LC_NUMERIC, but should not apply to LC_MONETARY).  But I
think we can release with this version.

Reviewed-by: Florian Weimer <fweimer@redhat.com>

Thanks,
Florian
Carlos O'Donell Feb. 1, 2022, 4 p.m. UTC | #7
On 2/1/22 06:47, Florian Weimer wrote:
> * Carlos O'Donell:
> 
>> The handling of mon_decimal_point is incorrect when it comes to
>> handling the empty "" value.  The existing parser in monetary_read()
>> will correctly handle setting the non-wide-character value and the
>> wide-character value e.g. STR_ELEM_WC(mon_decimal_point) if they are
>> set in the locale definition.  However, in monetary_finish() we have
>> conflicting TEST_ELEM() which sets a default value (if the locale
>> definition doesn't include one), and subsequent code which looks for
>> mon_decimal_point to be NULL to issue a specific error message and set
>> the defaults. The latter is unused because TEST_ELEM() always sets a
>> default.  The simplest solution is to remove the TEST_ELEM() check,
>> and allow the existing check to look to see if mon_decimal_point is
>> NULL and set an appropriate default.  The final fix is to move the
>> setting of mon_decimal_point_wc so it occurs only when
>> mon_decimal_point is being set to a default, keeping both values
>> consistent. There is no way to tell the difference between
>> mon_decimal_point_wc having been set to the empty string and not
>> having been defined at all, for that distinction we must use
>> mon_decimal_point being NULL or "", and so we must logically set
>> the default together with mon_decimal_point.
>>
>> Lastly, there are more fixes similar to this that could be made to
>> ld-monetary.c, but we avoid that in order to fix just the code
>> required for mon_decimal_point, which impacts the ability for C.UTF-8
>> to set mon_decimal_point to "", since without this fix we end up with
>> an inconsistent setting of mon_decimal_point set to "", but
>> mon_decimal_point_wc set to "." which is incorrect.
>>
>> Tested on x86_64 and i686 without regression.
>> ---
>>  locale/programs/ld-monetary.c | 4 +---
>>  1 file changed, 1 insertion(+), 3 deletions(-)
>>
>> diff --git a/locale/programs/ld-monetary.c b/locale/programs/ld-monetary.c
>> index 277b9ff042..3b0412b405 100644
>> --- a/locale/programs/ld-monetary.c
>> +++ b/locale/programs/ld-monetary.c
>> @@ -207,7 +207,6 @@ No definition for %s category found"), "LC_MONETARY");
>>  
>>    TEST_ELEM (int_curr_symbol, "");
>>    TEST_ELEM (currency_symbol, "");
>> -  TEST_ELEM (mon_decimal_point, ".");
>>    TEST_ELEM (mon_thousands_sep, "");
>>    TEST_ELEM (positive_sign, "");
>>    TEST_ELEM (negative_sign, "");
>> @@ -257,6 +256,7 @@ not correspond to a valid name in ISO 4217 [--no-warnings=intcurrsym]"),
>>  	record_error (0, 0, _("%s: field `%s' not defined"),
>>  		      "LC_MONETARY", "mon_decimal_point");
>>        monetary->mon_decimal_point = ".";
>> +      monetary->mon_decimal_point_wc = L'.';
>>      }
>>    else if (monetary->mon_decimal_point[0] == '\0' && ! be_quiet && ! nothing)
>>      {
>> @@ -264,8 +264,6 @@ not correspond to a valid name in ISO 4217 [--no-warnings=intcurrsym]"),
>>  %s: value for field `%s' must not be an empty string"),
>>  		    "LC_MONETARY", "mon_decimal_point");
>>      }
>> -  if (monetary->mon_decimal_point_wc == L'\0')
>> -    monetary->mon_decimal_point_wc = L'.';
>>  
>>    if (monetary->mon_grouping_len == 0)
>>      {
> 
> I have verified that this does not change the localedef output for the
> existing locales created by install-locale-files.
> 
> I think we need further cleanups in the comments and checks (which were
> coped from LC_NUMERIC, but should not apply to LC_MONETARY).  But I
> think we can release with this version.

I filed this bug to track that:
Bug 28845 - ld-monetary.c should be updated to match ISO C and other standards.
https://sourceware.org/bugzilla/show_bug.cgi?id=28845

Thanks for the review!


> Reviewed-by: Florian Weimer <fweimer@redhat.com>
> 
> Thanks,
> Florian
>
Carlos O'Donell Feb. 1, 2022, 4:14 p.m. UTC | #8
On 2/1/22 11:00, Carlos O'Donell wrote:
> On 2/1/22 06:47, Florian Weimer wrote:
>> * Carlos O'Donell:
>>
>>> The handling of mon_decimal_point is incorrect when it comes to
>>> handling the empty "" value.  The existing parser in monetary_read()
>>> will correctly handle setting the non-wide-character value and the
>>> wide-character value e.g. STR_ELEM_WC(mon_decimal_point) if they are
>>> set in the locale definition.  However, in monetary_finish() we have
>>> conflicting TEST_ELEM() which sets a default value (if the locale
>>> definition doesn't include one), and subsequent code which looks for
>>> mon_decimal_point to be NULL to issue a specific error message and set
>>> the defaults. The latter is unused because TEST_ELEM() always sets a
>>> default.  The simplest solution is to remove the TEST_ELEM() check,
>>> and allow the existing check to look to see if mon_decimal_point is
>>> NULL and set an appropriate default.  The final fix is to move the
>>> setting of mon_decimal_point_wc so it occurs only when
>>> mon_decimal_point is being set to a default, keeping both values
>>> consistent. There is no way to tell the difference between
>>> mon_decimal_point_wc having been set to the empty string and not
>>> having been defined at all, for that distinction we must use
>>> mon_decimal_point being NULL or "", and so we must logically set
>>> the default together with mon_decimal_point.
>>>
>>> Lastly, there are more fixes similar to this that could be made to
>>> ld-monetary.c, but we avoid that in order to fix just the code
>>> required for mon_decimal_point, which impacts the ability for C.UTF-8
>>> to set mon_decimal_point to "", since without this fix we end up with
>>> an inconsistent setting of mon_decimal_point set to "", but
>>> mon_decimal_point_wc set to "." which is incorrect.
>>>
>>> Tested on x86_64 and i686 without regression.
>>> ---
>>>  locale/programs/ld-monetary.c | 4 +---
>>>  1 file changed, 1 insertion(+), 3 deletions(-)
>>>
>>> diff --git a/locale/programs/ld-monetary.c b/locale/programs/ld-monetary.c
>>> index 277b9ff042..3b0412b405 100644
>>> --- a/locale/programs/ld-monetary.c
>>> +++ b/locale/programs/ld-monetary.c
>>> @@ -207,7 +207,6 @@ No definition for %s category found"), "LC_MONETARY");
>>>  
>>>    TEST_ELEM (int_curr_symbol, "");
>>>    TEST_ELEM (currency_symbol, "");
>>> -  TEST_ELEM (mon_decimal_point, ".");
>>>    TEST_ELEM (mon_thousands_sep, "");
>>>    TEST_ELEM (positive_sign, "");
>>>    TEST_ELEM (negative_sign, "");
>>> @@ -257,6 +256,7 @@ not correspond to a valid name in ISO 4217 [--no-warnings=intcurrsym]"),
>>>  	record_error (0, 0, _("%s: field `%s' not defined"),
>>>  		      "LC_MONETARY", "mon_decimal_point");
>>>        monetary->mon_decimal_point = ".";
>>> +      monetary->mon_decimal_point_wc = L'.';
>>>      }
>>>    else if (monetary->mon_decimal_point[0] == '\0' && ! be_quiet && ! nothing)
>>>      {
>>> @@ -264,8 +264,6 @@ not correspond to a valid name in ISO 4217 [--no-warnings=intcurrsym]"),
>>>  %s: value for field `%s' must not be an empty string"),
>>>  		    "LC_MONETARY", "mon_decimal_point");
>>>      }
>>> -  if (monetary->mon_decimal_point_wc == L'\0')
>>> -    monetary->mon_decimal_point_wc = L'.';
>>>  
>>>    if (monetary->mon_grouping_len == 0)
>>>      {
>>
>> I have verified that this does not change the localedef output for the
>> existing locales created by install-locale-files.
>>
>> I think we need further cleanups in the comments and checks (which were
>> coped from LC_NUMERIC, but should not apply to LC_MONETARY).  But I
>> think we can release with this version.
> 
> I filed this bug to track that:
> Bug 28845 - ld-monetary.c should be updated to match ISO C and other standards.
> https://sourceware.org/bugzilla/show_bug.cgi?id=28845

And I filed one more bug to track the original bug, which I'll close after push:
https://sourceware.org/bugzilla/show_bug.cgi?id=28847
diff mbox series

Patch

diff --git a/locale/programs/ld-monetary.c b/locale/programs/ld-monetary.c
index 277b9ff042..3b0412b405 100644
--- a/locale/programs/ld-monetary.c
+++ b/locale/programs/ld-monetary.c
@@ -207,7 +207,6 @@  No definition for %s category found"), "LC_MONETARY");
 
   TEST_ELEM (int_curr_symbol, "");
   TEST_ELEM (currency_symbol, "");
-  TEST_ELEM (mon_decimal_point, ".");
   TEST_ELEM (mon_thousands_sep, "");
   TEST_ELEM (positive_sign, "");
   TEST_ELEM (negative_sign, "");
@@ -257,6 +256,7 @@  not correspond to a valid name in ISO 4217 [--no-warnings=intcurrsym]"),
 	record_error (0, 0, _("%s: field `%s' not defined"),
 		      "LC_MONETARY", "mon_decimal_point");
       monetary->mon_decimal_point = ".";
+      monetary->mon_decimal_point_wc = L'.';
     }
   else if (monetary->mon_decimal_point[0] == '\0' && ! be_quiet && ! nothing)
     {
@@ -264,8 +264,6 @@  not correspond to a valid name in ISO 4217 [--no-warnings=intcurrsym]"),
 %s: value for field `%s' must not be an empty string"),
 		    "LC_MONETARY", "mon_decimal_point");
     }
-  if (monetary->mon_decimal_point_wc == L'\0')
-    monetary->mon_decimal_point_wc = L'.';
 
   if (monetary->mon_grouping_len == 0)
     {