Message ID | 876132lbic.fsf@gnu.org |
---|---|
State | New |
Headers | show |
On Tue, Sep 22, 2015 at 05:27:55PM +0200, Ludovic Courtès wrote: > With libc 2.22 people are starting to realize that libc does not > guarantee that it can load locale data built with another libc version, > but they learn it the hard way: > > loadlocale.c:130: _nl_intern_locale_data: Assertion `cnt < (sizeof (_nl_value_type_LC_COLLATE) / sizeof (_nl_value_type_LC_COLLATE[0]))' failed. > > This patch changes such conditions to return EINVAL instead of aborting. > > WDYT? > While that assert is quite cryptic I dont see why just returning EINVAL is better. How do you distinguish that its wrong locale version versus not installed?
Ondřej Bílka <neleai@seznam.cz> skribis: > On Tue, Sep 22, 2015 at 05:27:55PM +0200, Ludovic Courtès wrote: >> With libc 2.22 people are starting to realize that libc does not >> guarantee that it can load locale data built with another libc version, >> but they learn it the hard way: >> >> loadlocale.c:130: _nl_intern_locale_data: Assertion `cnt < (sizeof (_nl_value_type_LC_COLLATE) / sizeof (_nl_value_type_LC_COLLATE[0]))' failed. >> >> This patch changes such conditions to return EINVAL instead of aborting. >> >> WDYT? >> > While that assert is quite cryptic I dont see why just returning EINVAL is > better. How do you distinguish that its wrong locale version versus not > installed? The rest of this function already returns EINVAL when something is fishy. This patch makes the behavior more consistent. Ludo’. c
On Tue, Sep 22, 2015 at 11:22:40PM +0200, Ludovic Courtès wrote: > Ondřej Bílka <neleai@seznam.cz> skribis: > > > On Tue, Sep 22, 2015 at 05:27:55PM +0200, Ludovic Courtès wrote: > >> With libc 2.22 people are starting to realize that libc does not > >> guarantee that it can load locale data built with another libc version, > >> but they learn it the hard way: > >> > >> loadlocale.c:130: _nl_intern_locale_data: Assertion `cnt < (sizeof (_nl_value_type_LC_COLLATE) / sizeof (_nl_value_type_LC_COLLATE[0]))' failed. > >> > >> This patch changes such conditions to return EINVAL instead of aborting. > >> > >> WDYT? > >> > > While that assert is quite cryptic I dont see why just returning EINVAL is > > better. How do you distinguish that its wrong locale version versus not > > installed? > > The rest of this function already returns EINVAL when something is > fishy. This patch makes the behavior more consistent. > Then I take that back. But I don't see how this is reliable assertion to detect different libc version. So could you as followup patch add version field and check that instead this assert?
ludo@gnu.org (Ludovic Courtès) writes: > diff --git a/locale/loadlocale.c b/locale/loadlocale.c > index fdba6e9..e04e720 100644 > --- a/locale/loadlocale.c > +++ b/locale/loadlocale.c > @@ -122,8 +122,9 @@ _nl_intern_locale_data (int category, const void *data, size_t datasize) > { > #define CATTEST(cat) \ > case LC_##cat: \ > - assert (cnt < (sizeof (_nl_value_type_LC_##cat) \ > - / sizeof (_nl_value_type_LC_##cat[0]))); \ > + if (cnt >= (sizeof (_nl_value_type_LC_##cat) \ > + / sizeof (_nl_value_type_LC_##cat[0]))) \ > + goto puntdata; \ > break > CATTEST (NUMERIC); > CATTEST (TIME); What about the other assertion? Andreas.
Andreas Schwab <schwab@suse.de> writes:
> What about the other assertion?
Ignore that. This one is a real internal consistency check.
Andreas.
Ondřej Bílka <neleai@seznam.cz> skribis: > On Tue, Sep 22, 2015 at 11:22:40PM +0200, Ludovic Courtès wrote: >> Ondřej Bílka <neleai@seznam.cz> skribis: >> >> > On Tue, Sep 22, 2015 at 05:27:55PM +0200, Ludovic Courtès wrote: >> >> With libc 2.22 people are starting to realize that libc does not >> >> guarantee that it can load locale data built with another libc version, >> >> but they learn it the hard way: >> >> >> >> loadlocale.c:130: _nl_intern_locale_data: Assertion `cnt < (sizeof (_nl_value_type_LC_COLLATE) / sizeof (_nl_value_type_LC_COLLATE[0]))' failed. >> >> >> >> This patch changes such conditions to return EINVAL instead of aborting. >> >> >> >> WDYT? >> >> >> > While that assert is quite cryptic I dont see why just returning EINVAL is >> > better. How do you distinguish that its wrong locale version versus not >> > installed? >> >> The rest of this function already returns EINVAL when something is >> fishy. This patch makes the behavior more consistent. >> > Then I take that back. But I don't see how this is reliable assertion to > detect different libc version. The goal is not to detect a different libc version, but rather to gracefully handle the situation where incompatible (or broken) locale data is found. > So could you as followup patch add version field and check that > instead this assert? It would be inaccurate since sometimes different libc versions produce and consume the same binary data (typically when no locale category elements are added, as was the case between ~2.19 to 2.21.) In addition, there is no errno value to report this specific version-mismatch diagnostic. Ludo’.
On Wed, Sep 23, 2015 at 11:45:52PM +0200, Ludovic Courtès wrote: > Ondřej Bílka <neleai@seznam.cz> skribis: > > > On Tue, Sep 22, 2015 at 11:22:40PM +0200, Ludovic Courtès wrote: > >> Ondřej Bílka <neleai@seznam.cz> skribis: > >> > >> > On Tue, Sep 22, 2015 at 05:27:55PM +0200, Ludovic Courtès wrote: > >> >> With libc 2.22 people are starting to realize that libc does not > >> >> guarantee that it can load locale data built with another libc version, > >> >> but they learn it the hard way: > >> >> > >> >> loadlocale.c:130: _nl_intern_locale_data: Assertion `cnt < (sizeof (_nl_value_type_LC_COLLATE) / sizeof (_nl_value_type_LC_COLLATE[0]))' failed. > >> >> > >> >> This patch changes such conditions to return EINVAL instead of aborting. > >> >> > >> >> WDYT? > >> >> > >> > While that assert is quite cryptic I dont see why just returning EINVAL is > >> > better. How do you distinguish that its wrong locale version versus not > >> > installed? > >> > >> The rest of this function already returns EINVAL when something is > >> fishy. This patch makes the behavior more consistent. > >> > > Then I take that back. But I don't see how this is reliable assertion to > > detect different libc version. > > The goal is not to detect a different libc version, but rather to > gracefully handle the situation where incompatible (or broken) locale > data is found. > But my point is that this assert doesn't do that reliably. There could be incompatible change that doesn't trigger that assert. > > So could you as followup patch add version field and check that > > instead this assert? > > It would be inaccurate since sometimes different libc versions produce > and consume the same binary data (typically when no locale category > elements are added, as was the case between ~2.19 to 2.21.) > Its better to keep it simple, you couldn't use locale data between versions period. There could be also changes of algorithm so its harder to tell if it will work or not. > In addition, there is no errno value to report this specific > version-mismatch diagnostic. > We could also reject your patch based on this as you dont report specific value for assert so this doesn't matter.
Ondřej, I think we have been miscommunicating. I noticed that a program linked against 2.21 or earlier would abort with an assertion failure when it stumbles upon 2.22 locale data. All the patch tries to do is change the abort to EINVAL (and skip locale data) when that happens. I’m not claiming this is perfect, and I agree with you on that point. I’m just saying that ignoring the faulty locale data and returning EINVAL (which the application can choose to take into account or not) is preferable to aborting. Does that make sense? Thanks, Ludo’.
On 09/24/2015 12:12 PM, Ludovic Courtès wrote: > Ondřej, I think we have been miscommunicating. > > I noticed that a program linked against 2.21 or earlier would abort with > an assertion failure when it stumbles upon 2.22 locale data. > > All the patch tries to do is change the abort to EINVAL (and skip locale > data) when that happens. > > I’m not claiming this is perfect, and I agree with you on that point. > I’m just saying that ignoring the faulty locale data and returning > EINVAL (which the application can choose to take into account or not) is > preferable to aborting. > > Does that make sense? Despite Roland saying "LGTM", I think this is not a good change. Firstly, it's not the community consensus as Ondrej is pointing out. https://sourceware.org/glibc/wiki/Style_and_Conventions#Error_Handling It is a fundamental system misconfiguration issue not to have upgraded the binary locale data from one release to another. The community consensus was that user errors like this should fail immediately, but in ways which the user understands the failure, and fixes the system. Returning an error code simply leads to the user ignoring the serious configuration issue. Worse is that in a locale archive file we now skip such bad binary archives (_nl_load_locale_from_archive), and this hides the problem. Worse we might also skip locale files in directories like /usr/lib/locale/C.utf8, which we might want to always be loaded as a default UTF-8 locale. I'd rather see an error message in Fedora than allow that to continue by skipping that locale with no error given. We should abort, but the abort error message should be much clearer about what's going wrong. Therefore I would accept a patch that gives a clearer error message in this case, but not one that removes the assert. Cheers, Carlos.
"Carlos O'Donell" <carlos@redhat.com> skribis: > Despite Roland saying "LGTM", I think this is not a good change. > > Firstly, it's not the community consensus as Ondrej is pointing out. > > https://sourceware.org/glibc/wiki/Style_and_Conventions#Error_Handling “Assertions are for internal consistency checking only.” I would argue that what this patch changes is not an internal consistency check. Furthermore, the function in question returns EINVAL in other similar cases–e.g., when libc 2.22 loads LC_COLLATE data from libc 2.21. So it is not clear to me that the patch would be a violation of these rules. WDYT? > It is a fundamental system misconfiguration issue not to have upgraded > the binary locale data from one release to another. I would not call it “misconfiguration.” With GNU Guix, unprivileged users can install packages in their “profile” and they are free to choose whether and when to upgrade those packages. Consequently, they might be using a libc version different from the one the system administrator used to build the system-wide locale data. Currently, the assertion failure greatly penalizes this use case: https://lists.gnu.org/archive/html/guix-devel/2015-09/msg00717.html Returning EINVAL instead of aborting would make things easier. But it’s not perfect. IMO, a discussion on improving the coexistence of different libc/locale versions on the same system is in order. But it’s beyond the scope of this two-line patch. Thanks, Ludo’.
On 09/26/2015 06:24 AM, Ludovic Courtès wrote: > Furthermore, the function in question returns EINVAL in other similar > cases–e.g., when libc 2.22 loads LC_COLLATE data from libc 2.21. If you change this particular case to EINVAL, what does the user see as a result of this change? Do they get a non-zero exit code from `localedef --list-archive` along with an error written out to stderr? This is the kind of change I'm expecting. If we are removing an assertion, we should be replacing it with something meaningful and verifying that meaningful change. You need not change any of the other cases you've found that return EINVAL, we can update those incrementally, but for this one change you're making we should fix it as best we can. Cheers, Carlos.
"Carlos O'Donell" <carlos@redhat.com> skribis: > On 09/26/2015 06:24 AM, Ludovic Courtès wrote: >> Furthermore, the function in question returns EINVAL in other similar >> cases–e.g., when libc 2.22 loads LC_COLLATE data from libc 2.21. > > If you change this particular case to EINVAL, what does the user see > as a result of this change? The user-visible change is that, if incompatible or broken locale data is found, a call like: setlocale (LC_ALL, ""); returns EINVAL instead of aborting. > Do they get a non-zero exit code from `localedef --list-archive` along > with an error written out to stderr? ‘localedef’ starts with: setlocale (LC_MESSAGES, ""); setlocale (LC_CTYPE, ""); so it will no longer abort when invalid locale data is found (although in the 2.21 → 2.22 transition, only the LC_COLLATE data format differs anyway.) Apart from that, ‘localedef --list-archive’ simply opens the locale archive (typically /usr/lib/locale/locale-archive, regardless of the ‘LOCPATH’ environment variable value), so its behavior is unchanged. Am I overlooking something? > This is the kind of change I'm expecting. If we are removing an assertion, > we should be replacing it with something meaningful and verifying that > meaningful change. Yes, agreed. The function that is changed, ‘_nl_intern_locale_data’, has only two callers in libc, and both check whether it returns NULL. So it seems to me that the code is not introducing anything new in the API contract. WDYT? Thank you, Ludo’.
Hello! A friendly ping so the discussion does not die out. :-) https://sourceware.org/ml/libc-alpha/2015-09/threads.html#00575 https://sourceware.org/ml/libc-alpha/2015-09/msg00727.html TIA, Ludo’.
On 29/09/15 06:54, Carlos O'Donell wrote: > On 09/26/2015 06:24 AM, Ludovic Courtès wrote: >> Furthermore, the function in question returns EINVAL in other similar >> cases–e.g., when libc 2.22 loads LC_COLLATE data from libc 2.21. > > If you change this particular case to EINVAL, what does the user see > as a result of this change? Do they get a non-zero exit code from > `localedef --list-archive` along with an error written out to stderr? > > This is the kind of change I'm expecting. If we are removing an assertion, > we should be replacing it with something meaningful and verifying that > meaningful change. > > You need not change any of the other cases you've found that return EINVAL, > we can update those incrementally, but for this one change you're making > we should fix it as best we can. > If I am reading this correctly, the change to from an abort to EINVAL would be fine if it is accompanied by a change to localedef --list-archive. Is that correct? A solution to this would be great given we now run into this assert with locale archives built with different glibc builds along the 2.22 release branch. Allan
Allan McRae <allan@archlinux.org> skribis: > On 29/09/15 06:54, Carlos O'Donell wrote: >> On 09/26/2015 06:24 AM, Ludovic Courtès wrote: >>> Furthermore, the function in question returns EINVAL in other similar >>> cases–e.g., when libc 2.22 loads LC_COLLATE data from libc 2.21. >> >> If you change this particular case to EINVAL, what does the user see >> as a result of this change? Do they get a non-zero exit code from >> `localedef --list-archive` along with an error written out to stderr? >> >> This is the kind of change I'm expecting. If we are removing an assertion, >> we should be replacing it with something meaningful and verifying that >> meaningful change. >> >> You need not change any of the other cases you've found that return EINVAL, >> we can update those incrementally, but for this one change you're making >> we should fix it as best we can. >> > > If I am reading this correctly, the change to from an abort to EINVAL > would be fine if it is accompanied by a change to localedef > --list-archive. Is that correct? My understanding is that no such change is needed, but I’m waiting for confirmation or clarification: https://sourceware.org/ml/libc-alpha/2015-09/msg00727.html > A solution to this would be great given we now run into this assert with > locale archives built with different glibc builds along the 2.22 release > branch. I’m glad you value the practical benefits. ;-) Ludo’.
On 09/29/2015 04:08 AM, Ludovic Courtès wrote: > "Carlos O'Donell" <carlos@redhat.com> skribis: > >> On 09/26/2015 06:24 AM, Ludovic Courtès wrote: >>> Furthermore, the function in question returns EINVAL in other similar >>> cases–e.g., when libc 2.22 loads LC_COLLATE data from libc 2.21. >> >> If you change this particular case to EINVAL, what does the user see >> as a result of this change? > > The user-visible change is that, if incompatible or broken locale data > is found, a call like: > > setlocale (LC_ALL, ""); > > returns EINVAL instead of aborting. Perfect. >> Do they get a non-zero exit code from `localedef --list-archive` along >> with an error written out to stderr? > > ‘localedef’ starts with: > > setlocale (LC_MESSAGES, ""); > setlocale (LC_CTYPE, ""); > > so it will no longer abort when invalid locale data is found (although > in the 2.21 → 2.22 transition, only the LC_COLLATE data format differs > anyway.) OK. > Apart from that, ‘localedef --list-archive’ simply opens the locale > archive (typically /usr/lib/locale/locale-archive, regardless of the > ‘LOCPATH’ environment variable value), so its behavior is unchanged. > > Am I overlooking something? If the locale-archive is upgraded to the new format with LC_COLLATE changed what happens when you run localedef --list-archive? Does it list zero locales and exit with an exit code of zero? I would hope that it prints something about the broken locale, because after the removal of the assertion you won't know anything is wrong with the archive? >> This is the kind of change I'm expecting. If we are removing an assertion, >> we should be replacing it with something meaningful and verifying that >> meaningful change. > > Yes, agreed. > > The function that is changed, ‘_nl_intern_locale_data’, has only two > callers in libc, and both check whether it returns NULL. So it seems to > me that the code is not introducing anything new in the API contract. > WDYT? It isn't introducing anything new, but you are removing an assert and we need to make sure that the intent of the design remains: Warn the user something is wrong with the locale data. I see two cases: - What does setlocale() return? - Verified. You say it returns EINVAL for the affected locale and that's perfect. - What does localedef --list-archive return? - The new LC_COLLATE format will make it's way into the binary locale archive and that means glibc can't read the locale-archive? Does it fail? exit code? If we cover those two cases with some kind of error message then I think we're done. Cheers, Carlos.
On 10/12/2015 08:49 PM, Allan McRae wrote: > On 29/09/15 06:54, Carlos O'Donell wrote: >> On 09/26/2015 06:24 AM, Ludovic Courtès wrote: >>> Furthermore, the function in question returns EINVAL in other similar >>> cases–e.g., when libc 2.22 loads LC_COLLATE data from libc 2.21. >> >> If you change this particular case to EINVAL, what does the user see >> as a result of this change? Do they get a non-zero exit code from >> `localedef --list-archive` along with an error written out to stderr? >> >> This is the kind of change I'm expecting. If we are removing an assertion, >> we should be replacing it with something meaningful and verifying that >> meaningful change. >> >> You need not change any of the other cases you've found that return EINVAL, >> we can update those incrementally, but for this one change you're making >> we should fix it as best we can. >> > > If I am reading this correctly, the change to from an abort to EINVAL > would be fine if it is accompanied by a change to localedef > --list-archive. Is that correct? > > A solution to this would be great given we now run into this assert with > locale archives built with different glibc builds along the 2.22 release > branch. Yes. I'll make some general comments in the thread you started about the patch, rather than here. Cheers, Carlos.
"Carlos O'Donell" <carlos@redhat.com> skribis: > On 09/29/2015 04:08 AM, Ludovic Courtès wrote: >> "Carlos O'Donell" <carlos@redhat.com> skribis: [...] >> Apart from that, ‘localedef --list-archive’ simply opens the locale >> archive (typically /usr/lib/locale/locale-archive, regardless of the >> ‘LOCPATH’ environment variable value), so its behavior is unchanged. >> >> Am I overlooking something? > > If the locale-archive is upgraded to the new format with LC_COLLATE changed > what happens when you run localedef --list-archive? Does it list zero locales > and exit with an exit code of zero? The patch does not change archive loading; it changes locale data loading, which is unrelated (loadlocale.c vs. loadarchive.c.) > - What does localedef --list-archive return? > > - The new LC_COLLATE format will make it's way into the binary locale archive > and that means glibc can't read the locale-archive? Does it fail? exit code? The patch does not change how locale archives are handled. I think we’re confusing locale archive and locale data; or am I simply missing something? :-) Thanks, Ludo’.
Hello, Ludovic Courtès, le Tue 22 Sep 2015 17:27:55 +0200, a écrit : > loadlocale.c:130: _nl_intern_locale_data: Assertion `cnt < (sizeof (_nl_value_type_LC_COLLATE) / sizeof (_nl_value_type_LC_COLLATE[0]))' failed. > > This patch changes such conditions to return EINVAL instead of aborting. Just like it does for the __glibc_unlikely (idx > (size_t) newdata->filesize) test above, so it doesn't actually introduce any new error condition. I thus commited the change, thanks! Samuel
On 10/13/2015 10:45 AM, Ludovic Courtès wrote: >> - What does localedef --list-archive return? >> >> - The new LC_COLLATE format will make it's way into the binary locale archive >> and that means glibc can't read the locale-archive? Does it fail? exit code? > > The patch does not change how locale archives are handled. > > I think we’re confusing locale archive and locale data; or am I simply > missing something? :-) Your patch is OK. Notes: (1) Do we return NULL and EINVAL? Yes. Loading locale data from the locale archive uses _nl_load_locale_from_archive. The function _nl_load_locale_from_archive calls _nl_intern_locale_data which can trigger the assert on invalid type sizes. ~~~ locale/loadarchive.c ~~~ 134 _nl_load_locale_from_archive (int category, const char **namep) ... 478 lia->data[cnt] = _nl_intern_locale_data (cnt, 479 results[cnt].addr, 480 results[cnt].len); ~~~ Which seems like it can trigger the assertion when loading the larger LC_COLLATE data from the archive. Now we return NULL, ignore the failed load, and potentially return NULL again since `lia->data[category]` is now NULL. This means `_nl_find_locale` returns NULL, and then functions like `setlocale` appear to return NULL to indicate no data was loaded with errno set to EINVAL. (2) Does localedef --list-archive work? Yes. It is unaffected by the LC_COLLATE changes since the locale archive records have explicit length and can be listed even when they can't be loaded. This is wrong IMO, and we should have done something to detect the invalid LC_COLLATE and print a warning, but that's another QoI issue unrelated to the patch you're trying to apply. c.
On 10/27/2015 11:30 AM, Samuel Thibault wrote: > Hello, > > Ludovic Courtès, le Tue 22 Sep 2015 17:27:55 +0200, a écrit : >> loadlocale.c:130: _nl_intern_locale_data: Assertion `cnt < (sizeof (_nl_value_type_LC_COLLATE) / sizeof (_nl_value_type_LC_COLLATE[0]))' failed. >> >> This patch changes such conditions to return EINVAL instead of aborting. > > Just like it does for the __glibc_unlikely (idx > (size_t) > newdata->filesize) test above, so it doesn't actually introduce any new > error condition. > > I thus commited the change, thanks! > > Samuel > Thanks Samuel! c.
diff --git a/locale/loadlocale.c b/locale/loadlocale.c index fdba6e9..e04e720 100644 --- a/locale/loadlocale.c +++ b/locale/loadlocale.c @@ -122,8 +122,9 @@ _nl_intern_locale_data (int category, const void *data, size_t datasize) { #define CATTEST(cat) \ case LC_##cat: \ - assert (cnt < (sizeof (_nl_value_type_LC_##cat) \ - / sizeof (_nl_value_type_LC_##cat[0]))); \ + if (cnt >= (sizeof (_nl_value_type_LC_##cat) \ + / sizeof (_nl_value_type_LC_##cat[0]))) \ + goto puntdata; \ break CATTEST (NUMERIC); CATTEST (TIME);