Message ID | 5616D304.8010909@redhat.com |
---|---|
State | New |
Headers | show |
On 10/08/2015 04:33 PM, Carlos O'Donell wrote: > The optimization introduced in commit > f13c2a8dff2329c6692a80176262ceaaf8a6f74e, causes regressions in > sorting for languages that have digraphs that change sort order, like > cs_CZ which sorts ch between h and i. > > My analysis shows the fast-forwarding optimization in STRCOLL advances > through a digraph while possibly stopping in the middle which results > in a subsequent skipping of the digraph and incorrect sorting. The > optimization is incorrect as implemented and because of that I'm > removing it for 2.23, and I will also commit this fix for 2.22 where > it was originally introduced. > > This patch reverts the optimization, introduces a new bug-strcoll2.c > regression test that tests both cs_CZ.UTF-8 and da_DK.ISO-8859-1 and > ensures they sort one digraph each correctly. The optimization can't be > applied without regressing this test. > > Checked on x86_64, bug-strcoll2.c fails without this patch and passes > after. > > Checked in for 2.23. Forgot to add string/Makefile. Checking that in now. c.
On 10/08/2015 04:50 PM, Carlos O'Donell wrote: > On 10/08/2015 04:33 PM, Carlos O'Donell wrote: >> The optimization introduced in commit >> f13c2a8dff2329c6692a80176262ceaaf8a6f74e, causes regressions in >> sorting for languages that have digraphs that change sort order, like >> cs_CZ which sorts ch between h and i. >> >> My analysis shows the fast-forwarding optimization in STRCOLL advances >> through a digraph while possibly stopping in the middle which results >> in a subsequent skipping of the digraph and incorrect sorting. The >> optimization is incorrect as implemented and because of that I'm >> removing it for 2.23, and I will also commit this fix for 2.22 where >> it was originally introduced. >> >> This patch reverts the optimization, introduces a new bug-strcoll2.c >> regression test that tests both cs_CZ.UTF-8 and da_DK.ISO-8859-1 and >> ensures they sort one digraph each correctly. The optimization can't be >> applied without regressing this test. >> >> Checked on x86_64, bug-strcoll2.c fails without this patch and passes >> after. >> >> Checked in for 2.23. > > Forgot to add string/Makefile. Checking that in now. > > c. > Pushed. commit 233127a79e74c1490cae021877c0213337893dcf Author: Carlos O'Donell <carlos@systemhalted.org> Date: Thu Oct 8 16:54:30 2015 -0400 strcoll: Add bug-strcoll2 to testsuite. Adds bug-strcoll2 to the string tests, along with the generation of required locales. Cheers, Carlos.
Sorry for coming up late with that, but the UTF-8 detection part is needed in the patch for #18441. How can I handle this correctly? Maybe it is also possible to fix the STRDIFF-patch. Leonhard Am 08.10.2015 um 22:33 schrieb Carlos O'Donell: > The optimization introduced in commit > f13c2a8dff2329c6692a80176262ceaaf8a6f74e, causes regressions in > sorting for languages that have digraphs that change sort order, like > cs_CZ which sorts ch between h and i. > > My analysis shows the fast-forwarding optimization in STRCOLL advances > through a digraph while possibly stopping in the middle which results > in a subsequent skipping of the digraph and incorrect sorting. The > optimization is incorrect as implemented and because of that I'm > removing it for 2.23, and I will also commit this fix for 2.22 where > it was originally introduced. > > This patch reverts the optimization, introduces a new bug-strcoll2.c > regression test that tests both cs_CZ.UTF-8 and da_DK.ISO-8859-1 and > ensures they sort one digraph each correctly. The optimization can't be > applied without regressing this test. > > Checked on x86_64, bug-strcoll2.c fails without this patch and passes > after. > > Checked in for 2.23. > > 2015-10-08 Carlos O'Donell <carlos@redhat.com> > > [BZ #18589] > * string/bug-strcoll2.c: New file. > * locale/categories.def: Revert commit > f13c2a8dff2329c6692a80176262ceaaf8a6f74e. > * locale/langinfo.h: Likewise. > * locale/localeinfo.h: Likewise. > * locale/C-collate.c: Likewise. > * programs/ld-collate.c (collate_output): Likewise. > * string/strcoll_l.c (STRDIFF): Likewise. > (STRCOLL): Likewise. > * wcsmbs/wcscoll_l.c: Likewise. >
On 10/13/2015 01:41 AM, Leonhard Holz wrote: > Sorry for coming up late with that, but the UTF-8 detection part is > needed in the patch for #18441. How can I handle this correctly? > > Maybe it is also possible to fix the STRDIFF-patch. To fix the STRDIFF patch you would have to advance by units of comparison including advancing by entire digraphs. It would slow down the fast-forward pass, and you would have to see if it makes it feasible to use. Cheers, Carlos.
diff --git a/string/bug-strcoll2.c b/string/bug-strcoll2.c new file mode 100644 index 0000000..5ce2f94 --- /dev/null +++ b/string/bug-strcoll2.c @@ -0,0 +1,93 @@ +/* Bug 18589: sort-test.sh fails at random. + Copyright (C) 1998-2015 Free Software Foundation, Inc. + This file is part of the GNU C Library. + Contributed by Ulrich Drepper <drepper@cygnus.com>, 1998. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <http://www.gnu.org/licenses/>. */ + +#include <stdio.h> +#include <string.h> +#include <locale.h> + +/* An incorrect strcoll optimization resulted in incorrect + results from strcoll for cs_CZ and da_DK. */ + +int +test_cs_CZ (void) +{ + const char t1[] = "config"; + const char t2[] = "choose"; + if (setlocale (LC_ALL, "cs_CZ.UTF-8") == NULL) + { + perror ("setlocale"); + return 1; + } + /* In Czech the digraph ch sorts after c, therefore we expect + config to sort before choose. */ + int a = strcoll (t1, t2); + int b = strcoll (t2, t1); + printf ("strcoll (\"%s\", \"%s\") = %d\n", t1, t2, a); + printf ("strcoll (\"%s\", \"%s\") = %d\n", t2, t1, b); + if (a < 0 && b > 0) + { + puts ("PASS: config < choose"); + return 0; + } + else + { + puts ("FAIL: Wrong sorting in cz_CZ.UTF-8."); + return 1; + } +} + +int +test_da_DK (void) +{ + const char t1[] = "AS"; + const char t2[] = "AA"; + if (setlocale (LC_ALL, "da_DK.ISO-8859-1") == NULL) + { + perror ("setlocale"); + return 1; + } + /* AA should be treated as the last letter of the Danish alphabet, + hence sorting after AS. */ + int a = strcoll (t1, t2); + int b = strcoll (t2, t1); + printf ("strcoll (\"%s\", \"%s\") = %d\n", t1, t2, a); + printf ("strcoll (\"%s\", \"%s\") = %d\n", t2, t1, b); + if (a < 0 && b > 0) + { + puts ("PASS: AS < AA"); + return 0; + } + else + { + puts ("FAIL: Wrong sorting in da_DK.ISO-8859-1"); + return 1; + } +} + +static int +do_test (void) +{ + int err = 0; + err |= test_cs_CZ (); + err |= test_da_DK (); + return err; +} + +#define TEST_FUNCTION do_test () +#include "../test-skeleton.c"