Message ID | 5571B8C2.8000108@redhat.com |
---|---|
State | New |
On Fri, Jun 05, 2015 at 05:57:06PM +0300, Marko Myllynen wrote: > Hi, > > make country_isbn definitions consistent across locales by using > Unicode code points not numerals everywhere. The code in > locale/categories.def and locale/programs/ld-address.c already > handles strings. > > Please apply. > Possible but why, when these are numbers which are easier to read than strings?
Hi, On 2015-06-09 10:11, Ondřej Bílka wrote: > On Fri, Jun 05, 2015 at 05:57:06PM +0300, Marko Myllynen wrote: >> >> make country_isbn definitions consistent across locales by using >> Unicode code points not numerals everywhere. The code in >> locale/categories.def and locale/programs/ld-address.c already >> handles strings. >> >> Please apply. >> > Possible but why, when these are numbers which are easier to read than > strings? that's true, and I don't feel too strongly about this, but currently some locales are using numbers and some are using Unicode code points so there's a bit of inconsistency, also it's not that hard to read these once one sees that e.g. 12 becomes "<U0031><U0032>" i.e. only the last digit matters. Thanks,
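The mapping described above (12 becomes "<U0031><U0032>") can be illustrated with a small C sketch. The function name `ascii_to_ucs_symbols` is made up for this illustration and is not part of glibc or localedef; it only assumes that each ASCII byte is written as `<U` plus four uppercase hex digits plus `>`. Because the digits 0 to 9 sit at U+0030 to U+0039, the last hex digit of each symbol equals the decimal digit.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write each byte of an ASCII string as a <UXXXX>
   symbol.  Returns 0 on success, -1 if the input is not pure ASCII or
   the output buffer is too small.  */
int
ascii_to_ucs_symbols (const char *in, char *out, size_t outsz)
{
  size_t used = 0;

  if (outsz == 0)
    return -1;
  out[0] = '\0';
  for (; *in != '\0'; in++)
    {
      unsigned char c = (unsigned char) *in;

      if (c > 0x7f)
        return -1;
      /* One symbol such as "<U0031>" is 7 characters plus the NUL.  */
      if (outsz - used < 8)
        return -1;
      snprintf (out + used, outsz - used, "<U%04X>", (unsigned int) c);
      used += 7;
    }
  return 0;
}
```

Calling `ascii_to_ucs_symbols ("12", buf, sizeof buf)` with a large enough `buf` leaves the string `<U0031><U0032>` in it.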
On 09 Jun 2015 13:12, Marko Myllynen wrote: > On 2015-06-09 10:11, Ondřej Bílka wrote: > > On Fri, Jun 05, 2015 at 05:57:06PM +0300, Marko Myllynen wrote: > >> make country_isbn definitions consistent across locales by using > >> Unicode code points not numerals everywhere. The code in > >> locale/categories.def and locale/programs/ld-address.c already > >> handles strings. > >> > >> Please apply. > > > > Possible but why, when these are numbers which are easier to read than > > strings? > > that's true, and I don't feel too strongly about this, but currently > some locales are using numbers and some are using Unicode code points so > there's a bit of inconsistency, also it's not that hard to read these > once one sees that e.g. 12 becomes "<U0031><U0032>" i.e. only the last > digit matters. i find many of the U markers pointlessly obscure, especially when they're used for characters that are in the ASCII standard. if we're standardizing on UTF8 encodings in general, why can't we convert these files as well ? keep in mind that i'm ignorant of the tooling around these files ;). -mike
On Tue, Jul 21, 2015 at 04:18:40AM -0400, Mike Frysinger wrote: > On 09 Jun 2015 13:12, Marko Myllynen wrote: > > On 2015-06-09 10:11, Ondřej Bílka wrote: > > > On Fri, Jun 05, 2015 at 05:57:06PM +0300, Marko Myllynen wrote: > > >> make country_isbn definitions consistent across locales by using > > >> Unicode code points not numerals everywhere. The code in > > >> locale/categories.def and locale/programs/ld-address.c already > > >> handles strings. > > >> > > >> Please apply. > > > > > > Possible but why, when these are numbers which are easier to read than > > > strings? > > > > that's true, and I don't feel too strongly about this, but currently > > some locales are using numbers and some are using Unicode code points so > > there's a bit of inconsistency, also it's not that hard to read these > > once one sees that e.g. 12 becomes "<U0031><U0032>" i.e. only the last > > digit matters. > > i find many of the U markers pointlessly obscure, especially when they're used > for characters that are in the ASCII standard. if we're standardizing on UTF8 > encodings in general, why can't we convert these files as well ? keep in mind > that i'm ignorant of the tooling around these files ;). The use of Unicode points helps making the locales portable, eg. when crosscompiling for different architectures, including embedded systems, ebcdic systems, utf-16 systems and utf8 systems, when you are on a different host platform. For the ASCII characters one could use the symbolic character name from the POSIX locale. They are much more readable than the Unicode code points, IMHO. Best regards Keld
On 07/21/2015 10:40 AM, keld@keldix.com wrote: > The use of Unicode points helps making the locales portable, eg. > when crosscompiling for different architectures, including embedded systems, ebcdic > systems, utf-16 systems and utf8 systems, when you are on a different host platform. Is this really a relevant use case? Cross-compiling glibc to an EBCDIC system?
On Tue, Jul 21, 2015 at 10:54:21AM +0200, Florian Weimer wrote: > On 07/21/2015 10:40 AM, keld@keldix.com wrote: > > > The use of Unicode points helps making the locales portable, eg. > > when crosscompiling for different architectures, including embedded systems, ebcdic > > systems, utf-16 systems and utf8 systems, when you are on a different host platform. > > Is this really a relevant use case? Cross-compiling glibc to an EBCDIC > system? I also mentioned other cases, which may be more relevant.. Best regards keld
On 07/21/2015 11:02 AM, keld@keldix.com wrote: > On Tue, Jul 21, 2015 at 10:54:21AM +0200, Florian Weimer wrote: >> On 07/21/2015 10:40 AM, keld@keldix.com wrote: >> >>> The use of Unicode points helps making the locales portable, eg. >>> when crosscompiling for different architectures, including embedded systems, ebcdic >>> systems, utf-16 systems and utf8 systems, when you are on a different host platform. >> >> Is this really a relevant use case? Cross-compiling glibc to an EBCDIC >> system? > > I also mentioned other cases, which may be more relevant.. I can't see how. Unless someone maintains this code and processing pipeline, it's not going to work with the current code. Is anyone doing it? I doubt it. We don't use trigraphs in C sources, so I really don't get why we have to use an equivalent construct in the locale definitions. Unless the goal is to raise the bar for new contributors for some reason, but I think the project has long walked away from that approach.
On 21 Jul 2015 10:40, keld@keldix.com wrote: > On Tue, Jul 21, 2015 at 04:18:40AM -0400, Mike Frysinger wrote: > > On 09 Jun 2015 13:12, Marko Myllynen wrote: > > > On 2015-06-09 10:11, Ondřej Bílka wrote: > > > > On Fri, Jun 05, 2015 at 05:57:06PM +0300, Marko Myllynen wrote: > > > >> make country_isbn definitions consistent across locales by using > > > >> Unicode code points not numerals everywhere. The code in > > > >> locale/categories.def and locale/programs/ld-address.c already > > > >> handles strings. > > > >> > > > >> Please apply. > > > > > > > > Possible but why, when these are numbers which are easier to read than > > > > strings? > > > > > > that's true, and I don't feel too strongly about this, but currently > > > some locales are using numbers and some are using Unicode code points so > > > there's a bit of inconsistency, also it's not that hard to read these > > > once one sees that e.g. 12 becomes "<U0031><U0032>" i.e. only the last > > > digit matters. > > > > i find many of the U markers pointlessly obscure, especially when they're used > > for characters that are in the ASCII standard. if we're standardizing on UTF8 > > encodings in general, why can't we convert these files as well ? keep in mind > > that i'm ignorant of the tooling around these files ;). > > The use of Unicode points helps making the locales portable, eg. > when crosscompiling for different architectures, including embedded systems, ebcdic > systems, utf-16 systems and utf8 systems, when you are on a different host platform. i'm referring to the tools we use -- either inside of the source repo (i.e. ones we wrote/maintain), or external ones that operate on our files directly (i.e. gcc). what actual problems do you see here ? vague references like "cross-compiling is magic" aren't really that interesting. keep in mind we already use (and agreed to standardize on) UTF8 in things like *.c and *.h and ChangeLog and READMEs and info pages. -mike
On Tue, Jul 21, 2015 at 05:22:17AM -0400, Mike Frysinger wrote: > On 21 Jul 2015 10:40, keld@keldix.com wrote: > > On Tue, Jul 21, 2015 at 04:18:40AM -0400, Mike Frysinger wrote: > > > On 09 Jun 2015 13:12, Marko Myllynen wrote: > > > > On 2015-06-09 10:11, Ondřej Bílka wrote: > > > > > On Fri, Jun 05, 2015 at 05:57:06PM +0300, Marko Myllynen wrote: > > > > >> make country_isbn definitions consistent across locales by using > > > > >> Unicode code points not numerals everywhere. The code in > > > > >> locale/categories.def and locale/programs/ld-address.c already > > > > >> handles strings. > > > > >> > > > > >> Please apply. > > > > > > > > > > Possible but why, when these are numbers which are easier to read than > > > > > strings? > > > > > > > > that's true, and I don't feel too strongly about this, but currently > > > > some locales are using numbers and some are using Unicode code points so > > > > there's a bit of inconsistency, also it's not that hard to read these > > > > once one sees that e.g. 12 becomes "<U0031><U0032>" i.e. only the last > > > > digit matters. > > > > > > i find many of the U markers pointlessly obscure, especially when they're used > > > for characters that are in the ASCII standard. if we're standardizing on UTF8 > > > encodings in general, why can't we convert these files as well ? keep in mind > > > that i'm ignorant of the tooling around these files ;). > > > > The use of Unicode points helps making the locales portable, eg. > > when crosscompiling for different architectures, including embedded systems, ebcdic > > systems, utf-16 systems and utf8 systems, when you are on a different host platform. > > i'm referring to the tools we use -- either inside of the source repo > (i.e. ones we wrote/maintain), or external ones that operate on our > files directly (i.e. gcc). what actual problems do you see here ? > vague references like "cross-compiling is magic" aren't really that > interesting.
It would mean that you cannot use the locale sources for crosscompiling when using some different character sets on the hosting and the target machines. Eg if you are making embedded systems on IOS or Windows or other utf16 machines for an utf8 target, or making stuff for android. Or the other way round if you are on an utf8 host and generate locales for a utf16 target such as a utf16 embedded system or an iphone or ipad system. I suggest you use the POSIX character names instead, eg 12 becomes "<1><2>" > keep in mind we already use (and agreed to standardize on) UTF8 in > things like *.c and *.h and ChangeLog and READMEs and info pages. That is not related. Of course we have our sources in a specific encoding, and when sources are moved between platforms (aka portability) the source text may be converted from one representation to another, which happens eg. when you move our sources to an IOS or Windows platform. Best regards Keld
On Tue, Jul 21, 2015 at 01:58:52PM +0200, Keld Simonsen wrote:
> I suggest you use the POSIX character names instead, eg 12 becomes "<1><2>"
I am sorry, in POSIX this would be "<one><two>". I would then suggest that you
use the character naming of ISO TR 14652 or ISO TR 30112, which would be the
above-mentioned "<1><2>", or the POSIX names, but the 14652/30112 names are more readable, IMHO.
Best regards
keld
On Tue, 21 Jul 2015, Keld Simonsen wrote: > It would mean that you cannot use the locale sources for crosscompiling > when using some different character sets on the hosting and the target > machines. Eg if you are making embedded systems on IOS or Windows or > other utf16 machines for an utf8 target, or making stuff for android. Or > the other way round if you are omn an utf8 host and generate locales for > a utf16 target such as a utf16 embedded system or an iphone or ipad > system. On the build system on which glibc is built, we can always assume that the glibc sources are the exact sequences of octets provided by the glibc project, not converted into another character set and without any conversions of line endings. Furthermore, on any system using glibc and executing tools such as localedef with the installed locale source files, it can be assumed that those source files are the files shipped with glibc, not those files after conversion into another character set. Use of glibc source files after conversion into another character set is outside the scope of the glibc project - glibc is not expected to build with such converted source files. Now, it's true that the installed localedef utility should be usable in locale A to generate locale B, for any pair (A, B) of installed locales - rather than only being able to generate locales as part of the glibc build / install process. If localedef interprets locale sources in the character set of the locale in which it runs, that may mean the installed locale sources do need to be in ASCII. How does localedef determine the character set in which to interpret the textual locale source files?
On Wed, Jul 22, 2015 at 05:25:04PM +0000, Joseph Myers wrote: > On Tue, 21 Jul 2015, Keld Simonsen wrote: > > > It would mean that you cannot use the locale sources for crosscompiling > > when using some different character sets on the hosting and the target > > machines. Eg if you are making embedded systems on IOS or Windows or > > other utf16 machines for an utf8 target, or making stuff for android. Or > > the other way round if you are omn an utf8 host and generate locales for > > a utf16 target such as a utf16 embedded system or an iphone or ipad > > system. > > On the build system on which glibc is built, we can always assume that the > glibc sources are the exact sequences of octets provided by the glibc > project, not converted into another character set and without any > conversions of line endings. Furthermore, on any system using glibc and > executing tools such as localedef with the installed locale source files, > it can be assumed that those source files are the files shipped with > glibc, not those files after conversion into another character set. Use > of glibc source files after conversion into another character set is > outside the scope of the glibc project - glibc is not expected to build > with such converted source files. Sounds strange. glibc is the library for the GNU C language. Standard ISO C is coded character set independent, as is also POSIX. Why would the glibc project not follow ISO C and POSIX design goals? Why would glibc exclude itself from Apple and Microsoft (utf16) and non-utf8 Linux and UNIX systems? Maybe we should clone glibc to make it available on other platforms than those using utf8. Or maybe you are not correct. I have not been watching the glibc project close enough to tell. > Now, it's true that the installed localedef utility should be usable in > locale A to generate locale B, for any pair (A, B) of installed locales - > rather than only being able to generate locales as part of the glibc build > / install process. 
If localedef interprets locale sources in the > character set of the locale in which it runs, that may mean the installed > locale sources do need to be in ASCII. How does localedef determine the > character set in which to interpret the textual locale source files? Yes, that is why we use UCS symbolic code points. I would then rather to be fully consistent use UCS symbolic code points all the way thru a locale source, it is a bit more cumbersome, but I would rather be consistent. And it would facilitate the crosscompiling I wrote about. I don't think there is a mix of locales where it matters on Linux boxes. Oh well, some thinkable scenarios: Apple or Windows users on a linux box, linux users on apple or Windows boxes, Some mix with EBCDIC - more unlikely, but still thinkable is a big mainframe and number cruncher environment, the mainframe being IBM mainframe running VM/CMS and the number cruncher being a linux supercomputer, eg in a financial institution. Keld
On Wed, 22 Jul 2015, Keld Simonsen wrote: > > On the build system on which glibc is built, we can always assume that the > > glibc sources are the exact sequences of octets provided by the glibc > > project, not converted into another character set and without any > > conversions of line endings. Furthermore, on any system using glibc and > > executing tools such as localedef with the installed locale source files, > > it can be assumed that those source files are the files shipped with > > glibc, not those files after conversion into another character set. Use > > of glibc source files after conversion into another character set is > > outside the scope of the glibc project - glibc is not expected to build > > with such converted source files. > > Sounds strange. glibc is the library for the GNU C language. Standard No it's not. It's the C library for the GNU system. glibc has a range of requirements, including ELF, TLS, an MMU, two's complement integers, 32-bit int, 32-bit or 64-bit long, 32-bit UTF-32 wchar_t, IEEE binary32 float, IEEE binary64 double, various GNU tools present on the build system as documented in install.texi, .... > ISO C is coded character set independent, as is also POSIX. Why would > the glibc project not follow ISO C and POSIX design goals? Why would Because glibc makes particular implementation choices in areas that are implementation-defined. It's an implementation, not a meta-implementation that tries to cover the range of permitted implementation choices. Meta-implementations (at least of the language part of ISO C) exist, but they exist in the field of formal systems used to reason about C programs. > glibc exclude itself from Apple and Microsoft (utf16) and non-utf8 Linux > and UNIX systems? It's about 15-20 years since glibc was usable as a replacement C library for systems with an existing native non-free C library. 
Those systems are not relevant to glibc nowadays (Apple and Microsoft systems fail the basic requirement of using ELF, which is assumed all over glibc). UTF-16 is supported in iconv (only), just like EBCDIC. Non-UTF-8 locales are supported, but deprecated (new non-UTF-8 locales should not be added, and any existing non-UTF-8 locales should have a UTF-8 counterpart), and to be usable in a POSIX-compliant way must have a character set that includes ASCII. Given sufficiently many GNU tools built on a non-GNU build system, it should be possible to cross-compile glibc there - but localedef itself is only ever linked against glibc and run on a system using glibc (the cross-localedef functionality checked in to glibc is limited to allowing one glibc system to generate locales for another system with the same glibc version but a different endianness). > > Now, it's true that the installed localedef utility should be usable in > > locale A to generate locale B, for any pair (A, B) of installed locales - > > rather than only being able to generate locales as part of the glibc build > > / install process. If localedef interprets locale sources in the > > character set of the locale in which it runs, that may mean the installed > > locale sources do need to be in ASCII. How does localedef determine the > > character set in which to interpret the textual locale source files? > > Yes, that is why we use UCS symbolic code points. I would then rather to be "Yes" does not answer my question about how localedef determines the character set of its input. > fully consistent use UCS symbolic code points all the way thru a locale > source, it is a bit more cumbersome, but I would rather be consistent. I'd rather have some extension to allow a locale source file to declare that it is in UTF-8, and then use UTF-8 throughout except for control characters or combining characters used in isolation.
On Wed, Jul 22, 2015 at 08:02:23PM +0000, Joseph Myers wrote: > > > Now, it's true that the installed localedef utility should be usable in > > > locale A to generate locale B, for any pair (A, B) of installed locales - > > > rather than only being able to generate locales as part of the glibc build > > > / install process. If localedef interprets locale sources in the > > > character set of the locale in which it runs, that may mean the installed > > > locale sources do need to be in ASCII. How does localedef determine the > > > character set in which to interpret the textual locale source files? > > > > Yes, that is why we use UCS symbolic code points. I would then rather to be > > "Yes" does not answer my question about how localedef determines the > character set of its input. > > > fully consistent use UCS symbolic code points all the way thru a locale > > source, it is a bit more cumbersome, but I would rather be consistent. > > I'd rather have some extension to allow a locale source file to declare > that it is in UTF-8, and then use UTF-8 throughout except for control > characters or combining characters used in isolation. > I second that. It would be technically easy to do, so it's mostly a matter of selecting a proper interface. We would require some UTF-8 locale (C.UTF8 if we decide to add it, otherwise for example en_US.UTF8). Then it would be a matter of selecting a different locale for files marked, say, by having UTF8 in the first line. A sample implementation would be:

    fgets (first_line, 5, locale);
    if (!memcmp (first_line, "UTF8", 4))
      setlocale (LC_ALL, "en_US.UTF8");
    else
      /* unget first line. */
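The marker idea in the sample above can be made concrete with a short, self-contained C sketch. Everything here is an assumption for illustration only: the `UTF8` first-line marker is just the one floated in this message, the function name `source_declares_utf8` is invented, and nothing like it exists in localedef. The sketch handles the "unget" step by remembering the starting offset and seeking back when the first line is not the marker.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical check: return 1 if the first line of FP is exactly the
   marker "UTF8" (the line is then consumed).  Otherwise seek back so
   the caller re-reads the line, and return 0.  */
int
source_declares_utf8 (FILE *fp)
{
  char first_line[16];
  long start = ftell (fp);

  /* On a non-seekable stream we cannot undo the read, so do not try.  */
  if (start >= 0 && fgets (first_line, sizeof first_line, fp) != NULL)
    {
      /* Strip the line terminator before comparing.  */
      first_line[strcspn (first_line, "\r\n")] = '\0';
      if (strcmp (first_line, "UTF8") == 0)
        return 1;
    }
  /* Not a marker line: "unget" it by seeking back to the start.  */
  if (start >= 0)
    fseek (fp, start, SEEK_SET);
  return 0;
}
```

A caller could then pick the locale used for parsing based on the return value, for example with `setlocale (LC_CTYPE, ...)`. Which UTF-8 locale name is available for that depends on the installation, which is the interface question raised in this message.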
On 07/23/2015 06:27 PM, Ondřej Bílka wrote: >> I'd rather have some extension to allow a locale source file to declare >> that it is in UTF-8, and then use UTF-8 throughout except for control >> characters or combining characters used in isolation. >> > I second that. It would be technically easy to do, so its mostly matter > of selecting proper interface. If we require some utf8 locale (if we > decide for C.UTF8 then use it otherwise for example en_US. > > Then it would be matter of selecting different locale on files marked > say by having UTF8 in first line. Sample implementation would be: > > fgets (first_line, 5, locale); > if (!memcmp (first_line, "UTF8", 4)) > setlocale(LC_ALL,"en_US.UTF8"); > else > /* unget first line. */ > I agree with Joseph's position here. Further to that, my primary goal is to make contribution for these files easier. I have no interest in the abstract cases that are not being supported by anyone at the present moment. Cheers, Carlos.
On 06/09/2015 03:11 AM, Ondřej Bílka wrote: > On Fri, Jun 05, 2015 at 05:57:06PM +0300, Marko Myllynen wrote: >> Hi, >> >> make country_isbn definitions consistent across locales by using >> Unicode code points not numerals everywhere. The code in >> locale/categories.def and locale/programs/ld-address.c already >> handles strings. >> >> Please apply. >> > Possible but why, when these are numbers which are easier to read than > strings? > I agree with Ondrej. Why? The question we should all be asking ourselves here is: "What can we do to make it *easier* to maintain these files?" Making everyone write in Unicode code points is not easier. Joseph, Ondrej, and myself agree that we should find a way to just make these files UTF-8. I expect that a precondition is going to be to add an unremovable C.UTF-8 locale, which I think is important. Cheers, Carlos.
Carlos O'Donell wrote: > Joseph, Ondrej, and myself agree that we should find a way to just make > these files UTF-8. I expect that a precondition is going to be to add > an unremovable C.UTF-8 locale, which I think is important. I also like the idea of having these files be UTF-8. Why is an unremovable C.UTF-8 locale a precondition, though? We should be able to assume a properly-installed localedef and a minimum set of locales for running development tools like localedef. The minimum set could include en_US.UTF-8.
On 07/23/2015 10:16 PM, Paul Eggert wrote: > Carlos O'Donell wrote: >> Joseph, Ondrej, and myself agree that we should find a way to just >> make these files UTF-8. I expect that a precondition is going to be >> to add an unremovable C.UTF-8 locale, which I think is important. > > I also like the idea of having these files be UTF-8. > > Why is an unremovable C.UTF-8 locale a precondition, though? We > should be able to assume a properly-installed localedef and a minimum > set of locales for running development tools like localedef. The > minimum set could include en_US.UTF-8. Agreed. I should not have said "precondition" when I really meant "nice to have" since it simplifies some of the error handling if you know you have a fallback UTF-8 locale you can use. c.
On Thu, Jul 23, 2015 at 07:16:45PM -0700, Paul Eggert wrote: > Carlos O'Donell wrote: > >Joseph, Ondrej, and myself agree that we should find a way to just make > >these files UTF-8. I expect that a precondition is going to be to add > >an unremovable C.UTF-8 locale, which I think is important. > > I also like the idea of having these files be UTF-8. > > Why is an unremovable C.UTF-8 locale a precondition, though? We > should be able to assume a properly-installed localedef and a > minimum set of locales for running development tools like localedef. > The minimum set could include en_US.UTF-8. You are then going to deviate from POSIX and ISO TR 30112 practice. As you may know, I am involved in POSIX and 30112 standardization, and I have tried to align 30112 with glibc practice. If you are deviating from POSIX guidelines, I have trouble unifying the two goals. I also would like to have the 30112 standard implemented more broadly than just glibc. Rumour has it that this is done somewhere else. I would like to then use the glibc locales for the ISO 15897 registry, and give maximum usability to those - that is the locales should be fully charset independent. Also SC35 is looking at revising 30112 and in that revision we would like to also update the character info to a new revision of 10646, aligned to what is used in the sorting standard ISO 14651, and I would like to use the glibc LC_CTYPE for that. I agree that it should be easier to create locales. One could do that with a GUI that helped create and proofread and test locales. Best regards Keld
On Thu, Jul 23, 2015 at 07:16:45PM -0700, Paul Eggert wrote: > Carlos O'Donell wrote: > >Joseph, Ondrej, and myself agree that we should find a way to just make > >these files UTF-8. I expect that a precondition is going to be to add > >an unremovable C.UTF-8 locale, which I think is important. > > I also like the idea of having these files be UTF-8. > > Why is an unremovable C.UTF-8 locale a precondition, though? We > should be able to assume a properly-installed localedef and a > minimum set of locales for running development tools like localedef. > The minimum set could include en_US.UTF-8. I would recommend using the i18n locale - the purpose of the i18n locale is to be the locale to build all the other locales from. Best regards Keld
On Wed, Jul 22, 2015 at 08:02:23PM +0000, Joseph Myers wrote: > On Wed, 22 Jul 2015, Keld Simonsen wrote: > > > > On the build system on which glibc is built, we can always assume that the > > > glibc sources are the exact sequences of octets provided by the glibc > > > project, not converted into another character set and without any > > > conversions of line endings. Furthermore, on any system using glibc and > > > executing tools such as localedef with the installed locale source files, > > > it can be assumed that those source files are the files shipped with > > > glibc, not those files after conversion into another character set. Use > > > of glibc source files after conversion into another character set is > > > outside the scope of the glibc project - glibc is not expected to build > > > with such converted source files. > > > > Sounds strange. glibc is the library for the GNU C language. Standard > > No it's not. It's the C library for the GNU system. glibc has a range of > requirements, including ELF, TLS, an MMU, two's complement integers, > 32-bit int, 32-bit or 64-bit long, 32-bit UTF-32 wchar_t, IEEE binary32 > float, IEEE binary64 double, various GNU tools present on the build system > as documented in install.texi, .... Yes, understood, but I don't think any of these requirements influences the locales part. > > ISO C is coded character set independent, as is also POSIX. Why would > > the glibc project not follow ISO C and POSIX design goals? Why would > > Because glibc makes particular implementation choices in areas that are > implementation-defined. It's an implementation, not a meta-implementation > that tries to cover the range of permitted implementation choices. > Meta-implementations (at least of the language part of ISO C) exist, but > they exist in the field of formal systems used to reason about C programs. I am also active in C standardization. I think it is a good goal to not deviate and restrict an implementation more than necessary.
And at least not restrict it further than already implemented. That would lead to a loss of functionality. > > glibc exclude itself from Apple and Microsoft (utf16) and non-utf8 Linux > > and UNIX systems? > > It's about 15-20 years since glibc was usable as a replacement C library > for systems with an existing native non-free C library. Those systems are > not relevant to glibc nowadays (Apple and Microsoft systems fail the basic > requirement of using ELF, which is assumed all over glibc). UTF-16 is > supported in iconv (only), just like EBCDIC. Non-UTF-8 locales are > supported, but deprecated (new non-UTF-8 locales should not be added, and > any existing non-UTF-8 locales should have a UTF-8 counterpart), and to be > usable in a POSIX-compliant way must have a character set that includes > ASCII. I thought cygwin was a GNU implementation for windows, and that it also implemented glibc. I now understand that the cygwin libc is different from glibc. But how different? Do they use glibc locales, or are they able to? I would like the glibc locales to also be usable in other libc environments. Most of all because they IMHO are the most comprehensive set of locales available. So that would benefit users also outside glibc. Why not have this in mind also for our project? > Given sufficiently many GNU tools built on a non-GNU build system, it > should be possible to cross-compile glibc there - but localedef itself is > only ever linked against glibc and run on a system using glibc (the > cross-localedef functionality checked in to glibc is limited to allowing > one glibc system to generate locales for another system with the same > glibc version but a different endianness). > > > > Now, it's true that the installed localedef utility should be usable in > > > locale A to generate locale B, for any pair (A, B) of installed locales - > > > rather than only being able to generate locales as part of the glibc build > > > / install process. 
If localedef interprets locale sources in the > > > character set of the locale in which it runs, that may mean the installed > > > locale sources do need to be in ASCII. How does localedef determine the > > > character set in which to interpret the textual locale source files? > > > > Yes, that is why we use UCS symbolic code points. I would then rather to be > > "Yes" does not answer my question about how localedef determines the > character set of its input. My understanding is that the charset of the source is the charset of the locale of the environment that localedef is running in. If the locale then is ASCII only then there is no need for conversion of it - except for conversion into UTF16. Restricting the source further to invariant-ASCII also makes the source portable to EBCDIC systems. Unicode restricts its sources to ASCII, possibly also for this reason. Unicode do not publish their data in Unicode. > > fully consistent use UCS symbolic code points all the way thru a locale > > source, it is a bit more cumbersome, but I would rather be consistent. > > I'd rather have some extension to allow a locale source file to declare > that it is in UTF-8, and then use UTF-8 throughout except for control > characters or combining characters used in isolation. That would make it difficult to maintain in environments that is not using utf8. Using ASCII only would make the locales maintainable on all systems. Best regards Keld
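For readers unfamiliar with how a program learns the character set of the locale it runs in, the following C sketch shows the standard POSIX route. It is only an illustration of the mechanism this message refers to, and it does not claim to show what localedef actually does with its input, which is the question left open in this thread. The function name `current_codeset` is invented here; `setlocale` and `nl_langinfo (CODESET)` are standard calls.

```c
#include <langinfo.h>
#include <locale.h>

/* Return the codeset name of the locale selected by the environment,
   for example "UTF-8".  The exact spelling is system-dependent.  */
const char *
current_codeset (void)
{
  /* Adopt the LC_CTYPE locale named by the environment (LC_ALL,
     LC_CTYPE, LANG).  If that fails, the process stays in "C".  */
  setlocale (LC_CTYPE, "");
  return nl_langinfo (CODESET);
}
```

A tool that interpreted its input in this codeset would read the same bytes differently under different environments, which is why the thread discusses either restricting the sources to ASCII or letting a source file declare its own encoding.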
On Fri, 24 Jul 2015, Keld Simonsen wrote: > > Because glibc makes particular implementation choices in areas that are > > implementation-defined. It's an implementation, not a meta-implementation > > that tries to cover the range of permitted implementation choices. > > Meta-implementations (at least of the language part of ISO C) exist, but > > they exist in the field of formal systems used to reason about C programs. > > I am also active in C standardization. I think it is a good goal to not > deviate and restrict an implementalton more than necessary. And at least > not restrict it further than already implemented. That would lead to a loss > of functionality. The point of things being implementation-defined is to allow implementations flexibility in what is convenient for those implementations. glibc duly uses that flexibility to adopt particular choices for implementation-defined behavior (some depending on the architecture, but most being globally fixed for all glibc configurations, so that all glibc code is free to rely on those choices). > I thought cygwin was a GNU implementation for windows, and that it also > implemented glibc. I now understand that the cygwin libc is different from > glibc. But how different? Do they use glibc locales, or are they able to? I don't think there's any use of glibc locales by newlib as Cygwin's libc. > I would like the glibc locales to also be usable in other libc environments. > Most of all because they IMHO are the most comprehensive set of locales available. > So that would benefit users also outside glibc. Why not have this in mind > also for our project? I think CLDR is more likely to be the most comprehensive set of locales (it certainly claims to be "the largest and most extensive standard repository of locale data available"), and unlike glibc's locales is intended for wider use. 
Even if we did want wider use for glibc's locales (beyond use by glibc's locale-dependent functions after having been compiled into binary form by glibc's localedef program from the same version of glibc) I think we should still say: UTF-8 is the way of the present and future, other multibyte character sets are legacy. And, just as we require a range of GNU tools to build glibc, so we can rely on features of one part of the GNU system when working on another part, so we should require GNU localedef to build glibc's locales. > > I'd rather have some extension to allow a locale source file to declare > > that it is in UTF-8, and then use UTF-8 throughout except for control > > characters or combining characters used in isolation. > > That would make it difficult to maintain in environments that is not using utf8. It would make the locales easier to maintain for people using UTF-8, the number of which (among people concerned with i18n) can be presumed to be much greater than the number using legacy character sets.
Keld Simonsen wrote: > it should be easier to create locales. > One could do that with a GUI that helped create and proofread and test locales. I'm not aware of any such GUI, and even if one existed people would have to be trained to use it. In contrast, we already have GUIs (e.g., Emacs) that people already know how to use and that work reasonably well with UTF-8 localedef sources. Although the other goals you mention are laudable ones, surely they could be achieved by an automatic transformation of UTF-8 localedef sources into a less-readable equivalent with angle brackets, an equivalent that could be processed even by hypothetical tools operating in legacy multibyte locales. This shouldn't require a fancy GUI; it should be a relatively simple batch program. Any engineering effort in this area would likely need this kind of transformation anyway, and any software developers in this specialized area should be able to take on this relatively minor extra task.
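For illustration only, the "relatively simple batch program" described above could look roughly like the following Python sketch. It is not an existing glibc tool; the function name `to_ucs_symbolic` is hypothetical, and the choice of four hex digits for BMP code points and eight for anything beyond is an assumption based on how the existing locale sources appear to be written.

```python
def to_ucs_symbolic(text: str) -> str:
    """Rewrite every character of `text` as a <Uxxxx> symbolic code point.

    Assumption: four uppercase hex digits for code points up to U+FFFF,
    eight for code points above that.
    """
    parts = []
    for ch in text:
        cp = ord(ch)
        width = 4 if cp <= 0xFFFF else 8
        parts.append(f"<U{cp:0{width}X}>")
    return "".join(parts)


if __name__ == "__main__":
    # The lang_name value for gd_GB, written directly in UTF-8.
    print(to_ucs_symbolic("Gàidhlig"))
```

A real converter would have to apply this only inside quoted string values and leave keywords, comments and escape characters alone, which is why doing it inside localedef (which already parses the syntax) was suggested as an alternative.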
On Fri, Jul 24, 2015 at 09:50:46AM -0700, Paul Eggert wrote: > Keld Simonsen wrote: > >it should be easier to create locales. > >One could do that with a GUI that helped create and proofread and test > >locales. > > I'm not aware of any such GUI, and even if one existed people would have to > be trained to use it. In contrast, we already have GUIs (e.g., Emacs) that > people already know how to use and that work reasonably well with UTF-8 > localedef sources. > > Although the other goals you mention are laudable ones, surely they could > be achieved by an automatic transformation of UTF-8 localedef sources into > a less-readable equivalent with angle brackets, an equivalent that could be > processed even by hypothetical tools operating in legacy multibyte locales. > This shouldn't require a fancy GUI; it should be a relatively simple batch > program. Any engineering effort in this area would likely need this kind > of transformation anyway, and any software developers in this specialized > area should be able to take on this relatively minor extra task. We could have a utility to do that, and probably there was one developed when Ulrich converted the mnemonic style to UCS codepoints. But maybe that is lost. Or it could be part of the localedef utility, given that localedef understands the full syntax of locales, then a conversion option to and from different charsets and symbolic representations could be done, with some better chances of being maintained and updated for new features, and not lost. I was also thinking of testing, how would a date be output with this date format? I think there may be something like that lying around, e.g. for KDE localization, which I think is based on some other data and formats than glibc locales, but it is a much bigger work than just doing some conversion of characters. Keld
On Fri, Jul 24, 2015 at 03:11:15PM +0000, Joseph Myers wrote: > On Fri, 24 Jul 2015, Keld Simonsen wrote: > > > > Because glibc makes particular implementation choices in areas that are > > > implementation-defined. It's an implementation, not a meta-implementation > > > that tries to cover the range of permitted implementation choices. > > > Meta-implementations (at least of the language part of ISO C) exist, but > > > they exist in the field of formal systems used to reason about C programs. > > > > I am also active in C standardization. I think it is a good goal to not > > deviate and restrict an implementation more than necessary. And at least > > not restrict it further than already implemented. That would lead to a loss > > of functionality. > > The point of things being implementation-defined is to allow > implementations flexibility in what is convenient for those > implementations. glibc duly uses that flexibility to adopt particular > choices for implementation-defined behavior (some depending on the > architecture, but most being globally fixed for all glibc configurations, > so that all glibc code is free to rely on those choices). Yes, of course implementation defined allowance is to be used. I then have another hat on, as I am involved in writing the standards. I have to have a generic point of view, and also from the user's point of view implementation defined items are no good for portability, so you cannot be sure of your independence. You are bound to the implementation of which you used the implementation defined specs. I don't know about the goals of the glibc project, but there are a number of possibilities to get out to a bigger audience. Actually the locales are mostly used for end user apps, and glibc has an end user audience, that could be made bigger. Eg both the Apple end user community and the Android user community are way bigger than the glibc end user community. And they could be a target for at least glibc locales. 
I believe both Apple and Google use POSIX derived localization, including the locale model. I, at least as the editor of ISO TR 30112, need to have those communities in sight. I have been cooperating with the glibc community, especially Ulrich, but also with FSF as I have donated many locale and charmap specs to them. And I am using glibc i18n locale as the locale source in the standard. So I would welcome if glibc adhered to the design goals of character set independence, that both POSIX and 30112 have, a design goal also shared by Unicode Inc. > > I thought cygwin was a GNU implementation for windows, and that it also > > implemented glibc. I now understand that the cygwin libc is different from > > glibc. But how different? Do they use glibc locales, or are they able to? > > I don't think there's any use of glibc locales by newlib as Cygwin's libc. I believe if that is true, then they use something based on my earlier locales, that I released to X/Open many years ago. Those were widely used in the industry, as they were the only and most comprehensive locales around, freely available. They also were the basis for many of the glibc locales. I think there is a potential for glibc locales to take that position today. > > I would like the glibc locales to also be usable in other libc environments. > > Most of all because they IMHO are the most comprehensive set of locales available. > > So that would benefit users also outside glibc. Why not have this in mind > > also for our project? > > I think CLDR is more likely to be the most comprehensive set of locales > (it certainly claims to be "the largest and most extensive standard > repository of locale data available"), and unlike glibc's locales is > intended for wider use. 
Even if we did want wider use for glibc's locales > (beyond use by glibc's locale-dependent functions after having been > compiled into binary form by glibc's localedef program from the same > version of glibc) I think we should still say: UTF-8 is the way of the > present and future, other multibyte character sets are legacy. And, just > as we require a range of GNU tools to build glibc, so we can rely on > features of one part of the GNU system when working on another part, so we > should require GNU localedef to build glibc's locales. CLDR is not POSIX like locales, they are in XML. Also I believe they are not in the same quality as the glibc locales. I for one had an experience with Unicode that they would not take my specs, even if I represented Danish Standards. The result was that their Danish spec did not adhere to Danish Standards and to Danish official orthography rules. I then gave up contact with them. > > > I'd rather have some extension to allow a locale source file to declare > > > that it is in UTF-8, and then use UTF-8 throughout except for control > > > characters or combining characters used in isolation. > > > > That would make it difficult to maintain in environments that is not using utf8. > > It would make the locales easier to maintain for people using UTF-8, the > number of which (among people concerned with i18n) can be presumed to be > much greater than the number using legacy character sets. Yes, but you are excluding some communities. So: easier for the majority, impossible for a number of diverse minorities, which actually has the potential to be much larger than the current user base. Best regards Keld
On Sat, 25 Jul 2015, Keld Simonsen wrote: > > It would make the locales easier to maintain for people using UTF-8, the > > number of which (among people concerned with i18n) can be presumed to be > > much greater than the number using legacy character sets. > > Yes, but you are excluding some communities. So: easier for the majority, > impossible for a number of diverse minorities, which actually has the potential > to be much larger than the current user base. I think it's appropriate to say: if you want to use the glibc locales outside of glibc, you are responsible for maintaining the tools required to do so (e.g. for converting the encoding of locale source files). I don't think such tools for conversion of encodings would be hard to write or need much maintenance when written (and in one direction - converting the ASCII files to UTF-8 - they might even be written by the glibc project as part of the initial conversion work).
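As a rough illustration of the "ASCII files to UTF-8" direction mentioned above, here is a minimal Python sketch. It is not part of glibc; the function name `from_ucs_symbolic` is hypothetical, and it assumes symbolic code points are written as `<U` followed by exactly four or eight hex digits (upper or lower case, since both occur in the existing sources) and `>`.

```python
import re

# Assumed notation: <Uxxxx> or <Uxxxxxxxx>, hex digits in either case.
_UCS_SYMBOL = re.compile(r"<U([0-9A-Fa-f]{4}|[0-9A-Fa-f]{8})>")


def from_ucs_symbolic(text: str) -> str:
    """Replace each <Uxxxx> symbol in `text` with the character it names."""
    return _UCS_SYMBOL.sub(lambda m: chr(int(m.group(1), 16)), text)


if __name__ == "__main__":
    line = 'lang_name "<U0073><U0075><U006F><U006D><U0069>"'
    print(from_ucs_symbolic(line))
```

Applied blindly to a whole file this would also decode control characters and combining characters used in isolation, which the proposal above wanted to keep in symbolic form, so a real tool would need an exclusion list.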
Hi, On 2015-07-24 03:23, Carlos O'Donell wrote: > On 06/09/2015 03:11 AM, Ondřej Bílka wrote: >> On Fri, Jun 05, 2015 at 05:57:06PM +0300, Marko Myllynen wrote: >>> >>> make country_isbn definitions consistent across locales by using >>> Unicode code points not numerals everywhere. The code in >>> locale/categories.def and locale/programs/ld-address.c already >>> handles strings. >>> >> Possible but why, when these are numbers which are easier to read than >> strings? > > I agree with Ondrej. Why? see above, for consistency. > The question we should all be asking ourselves here is: > > "What can we do to make it *easier* to maintain these files?" Currently the definitions of this particular key across locales are inconsistent and it doesn't make things easier as one can get confused which form should be used for country_isbn. > Making everyone write in Unicode code points is not easier. The patch was only about making one individual key consistent, it's not like this patch would add any additional generic burden. Thanks,
On Mon, Aug 10, 2015 at 01:31:30PM +0300, Marko Myllynen wrote: > Hi, > > On 2015-07-24 03:23, Carlos O'Donell wrote: > > On 06/09/2015 03:11 AM, Ondřej Bílka wrote: > >> On Fri, Jun 05, 2015 at 05:57:06PM +0300, Marko Myllynen wrote: > >>> > >>> make country_isbn definitions consistent across locales by using > >>> Unicode code points not numerals everywhere. The code in > >>> locale/categories.def and locale/programs/ld-address.c already > >>> handles strings. > >>> > >> Possible but why, when these are numbers which are easier to read than > >> strings? > > > > I agree with Ondrej. Why? > > see above, for consistency. > > > The question we should all be asking ourselves here is: > > > > "What can we do to make it *easier* to maintain these files?" > > Currently the definitions of this particular key across locales are > inconsistent and it doesn't make things easier as one can get confused > which form should be used for country_isbn. > > > Making everyone write in Unicode code points is not easier. > > The patch was only about making one individual key consistent, it's not > like this patch would add any additional generic burden. Why not continue to use the UCS codepoints, as we do for all other strings in locales. That would also add to consistency, and for portability (which I understand is not a goal amongst glibc developers - but anyway...) Best regards keld
Hi, On 2015-08-10 14:05, Keld Simonsen wrote: > On Mon, Aug 10, 2015 at 01:31:30PM +0300, Marko Myllynen wrote: >> On 2015-07-24 03:23, Carlos O'Donell wrote: >>> On 06/09/2015 03:11 AM, Ondřej Bílka wrote: >>>> On Fri, Jun 05, 2015 at 05:57:06PM +0300, Marko Myllynen wrote: >>>>> >>>>> make country_isbn definitions consistent across locales by using >>>>> Unicode code points not numerals everywhere. The code in >>>>> locale/categories.def and locale/programs/ld-address.c already >>>>> handles strings. >>>>> >>>> Possible but why, when these are numbers which are easier to read than >>>> strings? >>> >>> I agree with Ondrej. Why? >> >> see above, for consistency. >> >>> The question we should all be asking ourselves here is: >>> >>> "What can we do to make it *easier* to maintain these files?" >> >> Currently the definitions of this particular key across locales are >> inconsistent and it doesn't make things easier as one can get confused >> which form should be used for country_isbn. >> >>> Making everyone write in Unicode code points is not easier. >> >> The patch was only about making one individual key consistent, it's not >> like this patch would add any additional generic burden. > > Why not continue to use the UCS codepoints, as we do for all other strings in locales. > That would also add to consistency, and for portability (which I > understand is not a goal amongst glibc developers - but anyway...) that is exactly what my patch proposal was doing, nothing more, nothing less: switch those locales using plain numbers for country_isbn to use Unicode code points for country_isbn to make things consistent across all locales. In the long term we could look for alternatives for creating and maintaining locales easier in general but in the short term I think the best solution is to keep things consistent. Thanks,
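To make the intended change concrete, the rewrite of a single country_isbn line can be sketched in Python as below. This is only an illustration of the transformation, not the script (if any) used to produce the patch; the function name `normalize_country_isbn` is hypothetical, and it assumes the value is either a bare number or a quoted string of ASCII digits and commas, with the line passed in without its trailing newline.

```python
import re

# Matches a country_isbn line whose value is bare or quoted ASCII digits/commas.
_ISBN_LINE = re.compile(r'^(country_isbn\s+)"?([0-9,]+)"?\s*$')


def normalize_country_isbn(line: str) -> str:
    """Return `line` with a numeric country_isbn value in <Uxxxx> form.

    Lines for other keys, commented-out lines, and values that are already
    symbolic do not match the pattern and are returned unchanged.
    """
    m = _ISBN_LINE.match(line)
    if m is None:
        return line
    encoded = "".join(f"<U{ord(c):04X}>" for c in m.group(2))
    return f'{m.group(1)}"{encoded}"'


if __name__ == "__main__":
    for sample in ('country_isbn 9964', 'country_isbn "9930,9977,9968"'):
        print(normalize_country_isbn(sample))
```

Each digit d maps to `<U003d>`, so only the last hex digit of each symbol changes, and the comma in a multi-prefix value becomes `<U002C>`.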
diff --git a/localedata/locales/af_ZA b/localedata/locales/af_ZA index 143ad75..29223d5 100644 --- a/localedata/locales/af_ZA +++ b/localedata/locales/af_ZA @@ -275,7 +275,7 @@ country_car "<U005A><U0041>" % ISO 2108 % http://www.isbn-international.org/html/prefix/prefa.htm -country_isbn 0 +country_isbn "<U0030>" % ISO 639 language abbreviations: % 639-1 2 letter, 639-2 3 letter terminology diff --git a/localedata/locales/ak_GH b/localedata/locales/ak_GH index 159acc8..0f5a667 100644 --- a/localedata/locales/ak_GH +++ b/localedata/locales/ak_GH @@ -195,7 +195,7 @@ country_ab2 "<U0047><U0048>" % GHA country_ab3 "<U0047><U0048><U0041>" country_num 288 -country_isbn 9964 +country_isbn "<U0039><U0039><U0036><U0034>" % Akan lang_name "<U0041><U006B><U0061><U006E>" % ak diff --git a/localedata/locales/bg_BG b/localedata/locales/bg_BG index 74e5ad4..4a62159 100644 --- a/localedata/locales/bg_BG +++ b/localedata/locales/bg_BG @@ -266,7 +266,7 @@ country_ab2 "<U0042><U0047>" country_ab3 "<U0042><U0047><U0052>" country_num 100 country_car "<U0042><U0047>" -country_isbn 954 +country_isbn "<U0039><U0035><U0034>" % български език lang_name "<U0431><U044A><U043B><U0433><U0430><U0440><U0441><U043A><U0438><U0020><U0435><U0437><U0438><U043A>" lang_ab "<U0062><U0067>" diff --git a/localedata/locales/cmn_TW b/localedata/locales/cmn_TW index a332659..01838ed 100644 --- a/localedata/locales/cmn_TW +++ b/localedata/locales/cmn_TW @@ -200,7 +200,7 @@ country_ab2 "<U0054><U0057>" % TWN country_ab3 "<U0054><U0057><U004E>" country_num 158 -country_isbn 957 +country_isbn "<U0039><U0035><U0037>" % 漢語官話 lang_name "<U6F22><U8A9E><U5B98><U8A71>" % cmn diff --git a/localedata/locales/cy_GB b/localedata/locales/cy_GB index 66298e0..31e1e89 100644 --- a/localedata/locales/cy_GB +++ b/localedata/locales/cy_GB @@ -40,7 +40,7 @@ country_name "<U0043><U0079><U006D><U0072><U0075>" country_ab2 "<U0047><U0042>" country_ab3 "<U0047><U0042><U0052>" country_num 826 -country_isbn 0 +country_isbn "<U0030>" 
country_car "<U0047><U0042>" lang_name "<U0043><U0079><U006D><U0072><U0061><U0065><U0067>" lang_ab "<U0063><U0079>" diff --git a/localedata/locales/de_DE b/localedata/locales/de_DE index e2704a7..26d83c8 100644 --- a/localedata/locales/de_DE +++ b/localedata/locales/de_DE @@ -193,7 +193,7 @@ country_ab2 "<U0044><U0045>" country_ab3 "<U0044><U0045><U0055>" country_num 276 country_car "<U0044>" -country_isbn 3 +country_isbn "<U0033>" % Deutsch lang_name "<U0044><U0065><U0075><U0074><U0073><U0063><U0068>" % de diff --git a/localedata/locales/en_NG b/localedata/locales/en_NG index 364b549..f6d4005 100644 --- a/localedata/locales/en_NG +++ b/localedata/locales/en_NG @@ -270,7 +270,7 @@ country_car "<U0057><U0041><U004E>" % ISO 2108 % http://www.isbn-international.org/ -country_isbn 978 +country_isbn "<U0039><U0037><U0038>" % ISO 639 language abbreviations: % 639-1 2 letter, 639-2 3 letter terminology diff --git a/localedata/locales/en_US b/localedata/locales/en_US index d79c228..08154bc 100644 --- a/localedata/locales/en_US +++ b/localedata/locales/en_US @@ -164,7 +164,7 @@ country_ab3 "<U0055><U0053><U0041>" country_num 840 % USA country_car "<U0055><U0053><U0041>" -country_isbn 0 +country_isbn "<U0030>" % English lang_name "<U0045><U006E><U0067><U006C><U0069><U0073><U0068>" % en diff --git a/localedata/locales/en_ZA b/localedata/locales/en_ZA index 294b0a3..263c718 100644 --- a/localedata/locales/en_ZA +++ b/localedata/locales/en_ZA @@ -338,7 +338,7 @@ country_car "<U005A><U0041>" % ISO 2108 % http://www.isbn-international.org/html/prefix/prefa.htm -country_isbn 0 +country_isbn "<U0030>" % ISO 639 language abbreviations: % 639-1 2 letter, 639-2 3 letter terminology diff --git a/localedata/locales/es_CR b/localedata/locales/es_CR index b5dec84..18b10f9 100644 --- a/localedata/locales/es_CR +++ b/localedata/locales/es_CR @@ -155,7 +155,7 @@ postal_fmt "<U0025><U0066><U0025><U004E><U0025><U0061><U0025><U004E>/ country_name 
"<U0043><U006F><U0073><U0074><U0061><U0020><U0052><U0069><U0063><U0061>" country_post "<U0043><U0052>" country_car "<U0043><U0052>" -country_isbn "9930,9977,9968" +country_isbn "<U0039><U0039><U0033><U0030><U002C><U0039><U0039><U0037><U0037><U002C><U0039><U0039><U0036><U0038>" country_ab2 "<U0043><U0052>" country_ab3 "<U0043><U0052><U0049>" country_num 188 diff --git a/localedata/locales/es_US b/localedata/locales/es_US index 6b808d5..357102e 100644 --- a/localedata/locales/es_US +++ b/localedata/locales/es_US @@ -208,7 +208,7 @@ country_ab2 "<U0055><U0053>" country_ab3 "<U0055><U0053><U0041>" country_num 840 country_car "<U0055><U0053><U0041>" -country_isbn 0 +country_isbn "<U0030>" % Español lang_name "<U0045><U0073><U0070><U0061><U00F1><U006F><U006C>" % es diff --git a/localedata/locales/fi_FI b/localedata/locales/fi_FI index e87878c..6ba91ba 100644 --- a/localedata/locales/fi_FI +++ b/localedata/locales/fi_FI @@ -253,7 +253,7 @@ country_num 246 country_name "<U0053><U0075><U006F><U006D><U0069>" country_post "<U0046><U0049>" country_car "<U0046><U0049><U004E>" -country_isbn 952 +country_isbn "<U0039><U0035><U0032>" % suomi lang_name "<U0073><U0075><U006F><U006D><U0069>" lang_ab "<U0066><U0069>" diff --git a/localedata/locales/fy_DE b/localedata/locales/fy_DE index 046d775..e68ed7d 100644 --- a/localedata/locales/fy_DE +++ b/localedata/locales/fy_DE @@ -48,7 +48,7 @@ country_ab3 "<U0044><U0045><U0055>" % D country_car "<U0044>" country_num 276 -country_isbn "3" +country_isbn "<U0033>" % FIXME country_name in Low Saxon ? 
% Frysk lang_name "<U0046><U0072><U0079><U0073><U006B>" diff --git a/localedata/locales/gd_GB b/localedata/locales/gd_GB index 41943f5..765f9df 100644 --- a/localedata/locales/gd_GB +++ b/localedata/locales/gd_GB @@ -148,7 +148,7 @@ country_ab3 "<U0047><U0042><U0052>" country_num 826 % GB country_car "<U0047><U0042>" -country_isbn 0 +country_isbn "<U0030>" % Gàidhlig lang_name "<U0047><U00E0><U0069><U0064><U0068><U006C><U0069><U0067>" % gd diff --git a/localedata/locales/ha_NG b/localedata/locales/ha_NG index 6ea1a88..c5d1f77 100644 --- a/localedata/locales/ha_NG +++ b/localedata/locales/ha_NG @@ -287,7 +287,7 @@ country_car "<U0057><U0041><U004E>" % ISO 2108 % http://www.isbn-international.org/ -country_isbn 978 +country_isbn "<U0039><U0037><U0038>" % ISO 639 language abbreviations: % 639-1 2 letter, 639-2 3 letter terminology diff --git a/localedata/locales/hak_TW b/localedata/locales/hak_TW index 454ebad..543206a 100644 --- a/localedata/locales/hak_TW +++ b/localedata/locales/hak_TW @@ -199,7 +199,7 @@ country_ab2 "<U0054><U0057>" % TWN country_ab3 "<U0054><U0057><U004E>" country_num 158 -country_isbn 957 +country_isbn "<U0039><U0035><U0037>" % 漢語客家語 lang_name "<U6F22><U8A9E><U5BA2><U5BB6><U8A9E>" % hak diff --git a/localedata/locales/hsb_DE b/localedata/locales/hsb_DE index db130fd..b177663 100644 --- a/localedata/locales/hsb_DE +++ b/localedata/locales/hsb_DE @@ -2212,7 +2212,7 @@ country_ab2 "<U0044><U0045>" country_ab3 "<U0044><U0045><U0055>" country_num 276 country_car "<U0044>" -country_isbn 3 +country_isbn "<U0033>" lang_name "<U0048><U006F><U0072><U006E><U006A><U006F><U0073><U0065>/ <U0072><U0062><U0161><U0107><U0069><U006E><U0061>" lang_ab "" diff --git a/localedata/locales/ht_HT b/localedata/locales/ht_HT index 66ae10b..8f12153 100644 --- a/localedata/locales/ht_HT +++ b/localedata/locales/ht_HT @@ -193,7 +193,7 @@ country_ab2 "<U0048><U0054>" % HTI country_ab3 "<U0048><U0054><U0049>" country_num 332 -country_isbn 99935 +country_isbn 
"<U0039><U0039><U0039><U0033><U0035>" % RH country_car "<U0052><U0048>" % diff --git a/localedata/locales/ia_FR b/localedata/locales/ia_FR index 722cc6e..64248c8 100644 --- a/localedata/locales/ia_FR +++ b/localedata/locales/ia_FR @@ -128,7 +128,7 @@ country_post "<U0046>" country_ab2 "<U0046><U0052>" country_ab3 "<U0046><U0052><U0041>" country_num 250 -country_isbn 2 +country_isbn "<U0032>" country_car "<U0046>" lang_name "<U0049><U006E><U0074><U0065><U0072><U006C><U0069><U006E><U0067><U0075><U0061>" diff --git a/localedata/locales/ig_NG b/localedata/locales/ig_NG index 8b1a48b..32f0f08 100644 --- a/localedata/locales/ig_NG +++ b/localedata/locales/ig_NG @@ -484,7 +484,7 @@ country_car "<U0057><U0041><U004E>" % ISO 2108 % http://www.isbn-international.org/ -country_isbn 978 +country_isbn "<U0039><U0037><U0038>" % ISO 639 language abbreviations: % 639-1 2 letter, 639-2 3 letter terminology diff --git a/localedata/locales/ka_GE b/localedata/locales/ka_GE index 459c467..ad47bae 100644 --- a/localedata/locales/ka_GE +++ b/localedata/locales/ka_GE @@ -45,7 +45,7 @@ country_ab3 "GEO" country_num 268 % GE country_car "<U0047><U0045>" -country_isbn "99928" +country_isbn "<U0039><U0039><U0039><U0032><U0038>" % ქართული lang_name "<U10E5><U10D0><U10E0><U10D7><U10E3><U10DA><U10D8>" % ka diff --git a/localedata/locales/ku_TR b/localedata/locales/ku_TR index d974bfb..44e2528 100644 --- a/localedata/locales/ku_TR +++ b/localedata/locales/ku_TR @@ -206,7 +206,7 @@ country_post "TR" country_ab2 "TR" country_ab3 "TUR" country_num 792 -country_isbn 975 +country_isbn "<U0039><U0037><U0035>" % TR country_car "<U0054><U0052>" % "kurdi" diff --git a/localedata/locales/lb_LU b/localedata/locales/lb_LU index a74e162..36cb98e 100644 --- a/localedata/locales/lb_LU +++ b/localedata/locales/lb_LU @@ -175,7 +175,7 @@ country_ab2 "<U004C><U0055>" country_ab3 "<U004C><U0055><U0058>" country_num 442 country_car "<U004C>" -country_isbn 2 +country_isbn "<U0032>" lang_name 
"<U004C><U00EB><U0074><U007A><U0065><U0062><U0075><U0065>/ <U0072><U0067><U0065><U0073><U0063><U0068>" lang_ab "<U006C><U0062>" diff --git a/localedata/locales/li_BE b/localedata/locales/li_BE index 5a89754..e917802 100644 --- a/localedata/locales/li_BE +++ b/localedata/locales/li_BE @@ -47,7 +47,7 @@ country_ab2 "<U0042><U0045>" country_ab3 "<U0042><U0045><U004C>" country_car "<U0042>" country_num 56 -%FIXME country_isbn "2" +country_isbn "<U0032>" % Lèmbörgs lang_name "<U004C><U00E8><U006D><U0062><U00F6><U0072><U0067><U0073>" lang_ab "<U006C><U0069>" diff --git a/localedata/locales/li_NL b/localedata/locales/li_NL index b07c4a4..b92acbf 100644 --- a/localedata/locales/li_NL +++ b/localedata/locales/li_NL @@ -47,7 +47,7 @@ country_ab2 "<U004E><U004C>" country_ab3 "<U004E><U004C><U0044>" country_car "<U004E><U004C>" country_num 528 -%FIXME country_isbn "2" +country_isbn "<U0033>" % Lèmbörgs lang_name "<U004C><U00E8><U006D><U0062><U00F6><U0072><U0067><U0073>" lang_ab "<U006C><U0069>" diff --git a/localedata/locales/lzh_TW b/localedata/locales/lzh_TW index 73b4897..0f26ecf 100644 --- a/localedata/locales/lzh_TW +++ b/localedata/locales/lzh_TW @@ -234,7 +234,7 @@ country_ab2 "<U0054><U0057>" % TWN country_ab3 "<U0054><U0057><U004E>" country_num 158 -country_isbn 957 +country_isbn "<U0039><U0035><U0037>" % 漢語文言 lang_name "<U6F22><U8A9E><U6587><U8A00>" % lzh diff --git a/localedata/locales/mk_MK b/localedata/locales/mk_MK index b751679..31653e7 100644 --- a/localedata/locales/mk_MK +++ b/localedata/locales/mk_MK @@ -152,7 +152,7 @@ country_ab2 "<U004d><U004b>" country_ab3 "<U004d><U004b><U0044>" country_car "<U004d><U004b>" country_num 807 -country_isbn "9989" +country_isbn "<U0039><U0039><U0038><U0039>" % македонски јазик lang_name "<U043C><U0430><U043A><U0435><U0434><U043E><U043D><U0441><U043A>/<U0438><U0020><U0458><U0430><U0437><U0438><U043A>" lang_ab "<U006d><U006b>" diff --git a/localedata/locales/mn_MN b/localedata/locales/mn_MN index 6649537..acb32da 100644 --- 
a/localedata/locales/mn_MN +++ b/localedata/locales/mn_MN @@ -254,7 +254,7 @@ country_ab2 "<U004D><U004E>" country_ab3 "<U004D><U004E><U0047>" country_num 496 country_car "<U004D><U0047><U004C>" -country_isbn 99929 +country_isbn "<U0039><U0039><U0039><U0032><U0039>" % Монгол хэл lang_name "<U041C><U043E><U043D><U0433><U043E><U043B><U0020><U0445><U044D><U043B>" lang_ab "<U006D><U006E>" diff --git a/localedata/locales/nan_TW b/localedata/locales/nan_TW index 0c11174..08bbb2d 100644 --- a/localedata/locales/nan_TW +++ b/localedata/locales/nan_TW @@ -200,7 +200,7 @@ country_ab2 "<U0054><U0057>" % TWN country_ab3 "<U0054><U0057><U004E>" country_num 158 -country_isbn 957 +country_isbn "<U0039><U0035><U0037>" % 漢語閩南語 lang_name "<U6F22><U8A9E><U95A9><U5357><U8A9E>" % nan diff --git a/localedata/locales/nds_DE b/localedata/locales/nds_DE index e1ab6e0..81d0ad4 100644 --- a/localedata/locales/nds_DE +++ b/localedata/locales/nds_DE @@ -46,7 +46,7 @@ country_ab2 "<U0044><U0045>" country_ab3 "<U0044><U0045><U0055>" country_car "<U0044>" country_num 276 -country_isbn "3" +country_isbn "<U0033>" lang_name "<U004E><U0065><U0064><U0064><U0065><U0072><U0073><U0061><U0073><U0073><U0069><U0073><U0063><U0068>" %lang_ab lang_term "<U006E><U0064><U0073>" diff --git a/localedata/locales/nds_NL b/localedata/locales/nds_NL index 14051f6..c59d3e6 100644 --- a/localedata/locales/nds_NL +++ b/localedata/locales/nds_NL @@ -45,7 +45,7 @@ country_ab2 "<U004E><U004C>" country_ab3 "<U004E><U004C><U0044>" country_car "<U004E><U004C>" country_num 528 -country_isbn "3" +country_isbn "<U0033>" lang_name "<U004E><U0065><U0064><U0064><U0065><U0072><U0073><U0061><U0073><U0073><U0069><U0073><U0063><U0068>" %lang_ab lang_term "<U006E><U0064><U0073>" diff --git a/localedata/locales/oc_FR b/localedata/locales/oc_FR index 10e3a03..5a9fca6 100644 --- a/localedata/locales/oc_FR +++ b/localedata/locales/oc_FR @@ -44,7 +44,7 @@ country_post "F" country_ab2 "FR" country_ab3 "FRA" country_num 250 -country_isbn "2" 
+country_isbn "<U0032>" country_car "F" % Occitan lang_name "<U004F><U0063><U0063><U0069><U0074><U0061><U006E>" diff --git a/localedata/locales/pap_AN b/localedata/locales/pap_AN index 63262a5..f3c5a96 100644 --- a/localedata/locales/pap_AN +++ b/localedata/locales/pap_AN @@ -49,7 +49,7 @@ postal_fmt "<U0025><U0064><U0025><U004E><U0025><U0066><U0025><U004E><U0025><U006 country_ab2 "<U0041><U004E>" country_ab3 "<U0041><U004E><U0054>" country_num 530 -country_isbn "99904" +country_isbn "<U0039><U0039><U0039><U0030><U0034>" country_car "<U004E><U0041>" % lang_ab lang_term "<U0070><U0061><U0070>" diff --git a/localedata/locales/ro_RO b/localedata/locales/ro_RO index 610f071..ab41ab7 100644 --- a/localedata/locales/ro_RO +++ b/localedata/locales/ro_RO @@ -377,7 +377,7 @@ country_car "<U0052><U004F>" % ISBN code is 973 % see: http://homepages.cwi.nl/~dik/english/codes/isbn.html % and other sources -country_isbn 973 +country_isbn "<U0039><U0037><U0033>" % FIXME: is it really RO? country_post "<U0052><U004F>" % language names are not capitalized in Romanian ( roma>na( ) diff --git a/localedata/locales/sq_MK b/localedata/locales/sq_MK index 9d6aef7..9d3957e 100644 --- a/localedata/locales/sq_MK +++ b/localedata/locales/sq_MK @@ -100,7 +100,7 @@ country_ab2 "<U004d><U004b>" country_ab3 "<U004d><U004b><U0044>" country_car "<U004d><U004b>" country_num 807 -country_isbn "9989" +country_isbn "<U0039><U0039><U0038><U0039>" % shqip lang_name "<U0073><U0068><U0071><U0069><U0070>" % sq diff --git a/localedata/locales/sv_FI b/localedata/locales/sv_FI index fca2935..007828e 100644 --- a/localedata/locales/sv_FI +++ b/localedata/locales/sv_FI @@ -143,7 +143,7 @@ country_num 246 country_name "<U0046><U0069><U006E><U006C><U0061><U006E><U0064>" country_post "<U0046><U0049>" country_car "<U0046><U0049><U004E>" -country_isbn 952 +country_isbn "<U0039><U0035><U0032>" % svenska lang_name "<U0073><U0076><U0065><U006E><U0073><U006B><U0061>" lang_ab "<U0073><U0076>" diff --git 
a/localedata/locales/tr_CY b/localedata/locales/tr_CY index e2e6936..8665dfa 100644 --- a/localedata/locales/tr_CY +++ b/localedata/locales/tr_CY @@ -98,7 +98,7 @@ country_name "<U004E><U006F><U0072><U0074><U0068><U0065><U0072><U006E>/ country_post "<U0054><U0052>" % TR country_car "<U0054><U0052>" -country_isbn 975 +country_isbn "<U0039><U0037><U0035>" country_num 792 % TR country_ab2 "<U0054><U0052>" diff --git a/localedata/locales/tr_TR b/localedata/locales/tr_TR index f54be2c..82c8699 100644 --- a/localedata/locales/tr_TR +++ b/localedata/locales/tr_TR @@ -3587,7 +3587,7 @@ country_name "<U0054><U0075><U0072><U006B><U0065><U0079>" country_post "<U0054><U0052>" % TR country_car "<U0054><U0052>" -country_isbn 975 +country_isbn "<U0039><U0037><U0035>" country_num 792 % TR country_ab2 "<U0054><U0052>" diff --git a/localedata/locales/uk_UA b/localedata/locales/uk_UA index 511f004..a910ec6 100644 --- a/localedata/locales/uk_UA +++ b/localedata/locales/uk_UA @@ -1246,7 +1246,7 @@ country_num 804 country_car "<U0055><U0041>" % ISBN code, for books. -country_isbn 966 +country_isbn "<U0039><U0036><U0036>" % Two-letter abbreviation of the language, see ISO 639. 
lang_ab "<U0075><U006B>" diff --git a/localedata/locales/unm_US b/localedata/locales/unm_US index 482a7da..3467b8c 100644 --- a/localedata/locales/unm_US +++ b/localedata/locales/unm_US @@ -150,7 +150,7 @@ country_ab3 "<U0055><U0053><U0041>" country_num 840 % USA country_car "<U0055><U0053><U0041>" -country_isbn 0 +country_isbn "<U0030>" % lang_name "" % lang_ab "" % unm diff --git a/localedata/locales/wa_BE b/localedata/locales/wa_BE index a2fb3be..21979c5 100644 --- a/localedata/locales/wa_BE +++ b/localedata/locales/wa_BE @@ -42,7 +42,7 @@ country_post "B" country_ab2 "BE" country_ab3 "BEL" country_num 56 -country_isbn "2" +country_isbn "<U0032>" % B country_car "<U0042>" lang_name "<U0057><U0061><U006C><U006F><U006E>" diff --git a/localedata/locales/wae_CH b/localedata/locales/wae_CH index 5f11613..264aa63 100644 --- a/localedata/locales/wae_CH +++ b/localedata/locales/wae_CH @@ -236,6 +236,6 @@ postal_fmt "<U0025><U0066><U0025><U004E><U0025><U0061><U0025><U004E>/ country_ab2 "<U0043><U0048>" country_ab3 "<U0043><U0048><U0045>" country_num 756 -country_isbn 3 +country_isbn "<U0033>" END LC_ADDRESS diff --git a/localedata/locales/yi_US b/localedata/locales/yi_US index 97ed218..7c2259b 100644 --- a/localedata/locales/yi_US +++ b/localedata/locales/yi_US @@ -50,7 +50,7 @@ country_num 840 % USA country_car "<U0055><U0053><U0041>" % FIXME Check which isbn for Yiddish in USA -country_isbn "0" +country_isbn "<U0030>" lang_name "<U05D9><U05D9><U05B4><U05D3><U05D9><U05E9>" % yi lang_ab "<U0079><U0069>" diff --git a/localedata/locales/yo_NG b/localedata/locales/yo_NG index c88ca6e..37a948e 100644 --- a/localedata/locales/yo_NG +++ b/localedata/locales/yo_NG @@ -491,7 +491,7 @@ country_car "<U0057><U0041><U004E>" % ISO 2108 % http://www.isbn-international.org/ -country_isbn 978 +country_isbn "<U0039><U0037><U0038>" % ISO 639 language abbreviations: % 639-1 2 letter, 639-2 3 letter terminology