Message ID | 1308087182-26577-1-git-send-email-brobecker@adacore.com |
---|---|
State | New |
Looks OK to me.
On Tue, Jun 14, 2011 at 2:33 PM, Joel Brobecker <brobecker@adacore.com> wrote: > Hello, > > HFS+, the FS on Darwin, is case insensitive. So this patch adjusts > filename_cmp.c to ignore the casing when comparing filenames on Darwin. This is wrong as not all FSs are case insensitive. In fact HFS+ can be case sensitive too. I think you need better check than just saying all Darwin is case insensitive. This is just like using FAT32 on Linux. In fact I think HAVE_DOS_BASED_FILE_SYSTEM is incorrect also for NTFS as it can also be case sensitive. Thanks, Andrew Pinski
> This is wrong as not all FSs are case insensitive. In fact HFS+ can
> be case sensitive too. I think you need better check than just
> saying all Darwin is case insensitive. This is just like using
> FAT32 on Linux. In fact I think HAVE_DOS_BASED_FILE_SYSTEM is
> incorrect also for NTFS as it can also be case sensitive.

There's a difference between case preserving and case sensitive, though, and we really don't have a portable way to detect case-sensitivity on a per-directory basis, so how can we do better?
On Jun 15, 2011, at 12:01 AM, DJ Delorie wrote:

>> This is wrong as not all FSs are case insensitive. In fact HFS+ can
>> be case sensitive too. I think you need better check than just
>> saying all Darwin is case insensitive. This is just like using
>> FAT32 on Linux. In fact I think HAVE_DOS_BASED_FILE_SYSTEM is
>> incorrect also for NTFS as it can also be case sensitive.
>
> There's a difference between case preserving and case sensitive,
> though, and we really don't have a portable way to detect
> case-sensitivity on a per-directory basis, so how can we do better?

Seconded.

I have mixed feelings about this issue. Handling HFS as case insensitive might not be user friendly, but on the other hand it might be boring for developers working on a case sensitive HFS with two files that differ only in case.

We can do something better using pathconf(2) with _PC_CASE_SENSITIVE, but this would result in an API change and maybe a performance issue.

Tristan.
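A minimal sketch of the pathconf(2) idea mentioned above, under stated assumptions: `path_is_case_sensitive` is a hypothetical helper name, not anything in libiberty, and `_PC_CASE_SENSITIVE` is a Darwin-specific selector, so the code is guarded and reports "unknown" everywhere else. It also shows why the API would change: the answer depends on a path, which `filename_cmp (s1, s2)` does not receive as a separate argument.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical helper, not part of libiberty.
   Returns 1 if PATH lives on a case-sensitive volume, 0 if it does not,
   and -1 if the platform or the filesystem cannot answer.  */
int
path_is_case_sensitive (const char *path)
{
  assert (path != NULL);
#ifdef _PC_CASE_SENSITIVE
  long r = pathconf (path, _PC_CASE_SENSITIVE);
  if (r == -1)
    return -1;          /* error, or no answer for this path */
  return r != 0;
#else
  (void) path;
  return -1;            /* selector not available on this system */
#endif
}
```

Each query is a system call on a concrete path, which is presumably the performance concern raised in the message.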
> There's a difference between case preserving and case sensitive, > though, and we really don't have a portable way to detect > case-sensitivity on a per-directory basis, sow how can we do better? That's roughly my thoughts on this issue. It's true that this property is not an OS property, and we could try devising some ways to test it in various ways. But I think that the current solution is good enough for practical purposes. A better solution is, in my opinion, more effort than it is worth.
On Jun 14 18:01, DJ Delorie wrote:
> > This is wrong as not all FSs are case insensitive. In fact HFS+ can
> > be case sensitive too. I think you need better check than just
> > saying all Darwin is case insensitive. This is just like using
> > FAT32 on Linux. In fact I think HAVE_DOS_BASED_FILE_SYSTEM is
> > incorrect also for NTFS as it can also be case sensitive.
>
> There's a difference between case preserving and case sensitive,
> though, and we really don't have a portable way to detect
> case-sensitivity on a per-directory basis, so how can we do better?

As Andrew points out, NTFS can be case-sensitive as well, and on Windows the case-sensitivity vs. case-preserving behaviour can be chosen for each file or directory descriptor at the time the file is opened.

IMHO it's actually a pity that the filename comparison behaves differently on different systems. I think it would make sense to behave identically on all systems. What about this: Always search case-sensitive. If the file has been found, return. Otherwise, search case-insensitive.

Talking about case-insensitive comparison, the filename_cmp and filename_ncmp functions don't work for multibyte codesets, only for singlebyte codesets. Given that UTF-8 is standard nowadays, shouldn't these functions be replaced with multibyte-aware versions? Along the same lines, the entire set of safe-ctype functions only work for ASCII and EBCDIC...

Corinna
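The two-pass search proposed above could look roughly like the following. This is only an illustration over an in-memory list of names: `find_name` is a hypothetical function, and the second pass uses POSIX `strcasecmp`, which folds case byte by byte and therefore shares the multibyte limitation raised in the same message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* Hypothetical helper: return the index of WANTED in NAMES, or -1.
   Pass 1 requires an exact match; pass 2 ignores case.  */
int
find_name (const char *const *names, size_t count, const char *wanted)
{
  size_t i;

  assert (names != NULL && wanted != NULL);

  /* Pass 1: case-sensitive.  */
  for (i = 0; i < count; i++)
    if (strcmp (names[i], wanted) == 0)
      return (int) i;

  /* Pass 2: case-insensitive fallback, first hit wins.  */
  for (i = 0; i < count; i++)
    if (strcasecmp (names[i], wanted) == 0)
      return (int) i;

  return -1;
}
```

When two entries differ only in case and neither matches exactly, the fallback simply returns the first one it meets, so the result depends on list order.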
> Date: Wed, 15 Jun 2011 10:22:36 +0200 > From: Corinna Vinschen <vinschen@redhat.com> > > On Jun 14 18:01, DJ Delorie wrote: > > > > > This is wrong as not all FSs are case insensitive. In fact HFS+ can > > > be case sensitive too. I think you need better check than just > > > saying all Darwin is case insensitive. This is just like using > > > FAT32 on Linux. In fact I think HAVE_DOS_BASED_FILE_SYSTEM is > > > incorrect also for NTFS as it can also be case sensitive. > > > > There's a difference between case preserving and case sensitive, > > though, and we really don't have a portable way to detect > > case-sensitivity on a per-directory basis, sow how can we do better? > > As Andrew points out, NTFS can be case-sensitive as well, and on Windows > the case-sensitivity vs. case-preserving behaviour can be chosen for > each file or directory descriptor at the time the file is opened. > > IMHO it's actually a pity that the filename comparison behaves differently > on different systems. I think it would make sense to behave identical on > all systems. What about this: Always search case-sensitive. If file has > been found, return. Otherwise, search case-insensitive. Over my dead body. On a proper operating system filenames are case-sensitive. Your suggestion would create spurious matches. Even on case-preserving filesystems I'd argue that treating them as case-sensitive is still the right approach. If that creates problems, it means somebody was sloppy and didn't type the proper name of the file or some piece of code in the toolchain arbitrarily changed the case of a filename. I don't mind punishing people for that. They have to learn that on a proper operating system file names are case-sensitive! If you're still using an operating system with fully case-insensitive filesystems, I feel very, very sorry for you. 
> Talking about case-insensitive comparison, the filename_cmp and > filename_ncmp functions don't work for multibyte codesets, only for > singlebyte codesets. Given that UTF-8 is standard nowadays, shouldn't > these functions be replaced with multibyte-aware versions? For UTF-8, that isn't necessary. Normal string manipulation functions work just fine on them, since UTF-8 strings don't contain any embedded NUL characters. It's only when you try to be too clever about case-insensitivity that you run into problems. > Along the same lines, the entire set of safe-ctype functions only > work for ASCII and EBCDIC... That really should only matter for displaying filenames. Anyway. I really don't care how deep a hole people have dug for themselves in trying to support Windows in all its various broken configurations. But on a native debugger for a UNIX-like system, or a cross debugger between such systems, filename_cmp() should simply do a strcmp().
On Jun 15 11:58, Mark Kettenis wrote:
> > Date: Wed, 15 Jun 2011 10:22:36 +0200
> > From: Corinna Vinschen <...>

Please do not quote my email address in the body of your message. Thank you.

> > IMHO it's actually a pity that the filename comparison behaves differently
> > on different systems. I think it would make sense to behave identical on
> > all systems. What about this: Always search case-sensitive. If file has
> > been found, return. Otherwise, search case-insensitive.
>
> Over my dead body. On a proper operating system filenames are
> case-sensitive. Your suggestion would create spurious matches.

Indeed. Probably the case sensitivity should not be hardcoded in a low-level function at all. The application should decide if it wants case-sensitive or case-insensitive filename comparison. This way, the comparison could be based on OS, filesystem, or user choice.

> Even on case-preserving filesystems I'd argue that treating them as
> case-sensitive is still the right approach. If that creates problems,
> it means somebody was sloppy and didn't type the proper name of the
> file or some piece of code in the toolchain arbitrarily changed the
> case of a filename. I don't mind punishing people for that. They
> have to learn that on a proper operating system file names are
> case-sensitive!

I wasn't aware that gcc, gdb, and other tools using libiberty are supposed to punish people for the features of the OS they are working on. At one point I actually thought they were supposed to *help* developers. I seem to be wrong.

> > Talking about case-insensitive comparison, the filename_cmp and
> > filename_ncmp functions don't work for multibyte codesets, only for
> > singlebyte codesets. Given that UTF-8 is standard nowadays, shouldn't
> > these functions be replaced with multibyte-aware versions?
>
> For UTF-8, that isn't necessary. Normal string manipulation functions
> work just fine on them, since UTF-8 strings don't contain any embedded
> NUL characters. It's only when you try to be too clever about
> case-insensitivity that you run into problems.

If you read the text you're replying to once more, you see that I'm explicitly talking about case-insensitive comparison. In that case, the functions won't work correctly, unless you use a singlebyte codeset. The tolower function on a single byte just doesn't make sense in multibyte charsets. The right thing to do would be something along the lines of

    mbstowcs (wide_a, a);
    mbstowcs (wide_b, b);
    return wcscasecmp (wide_a, wide_b);

> > Along the same lines, the entire set of safe-ctype functions only
> > work for ASCII and EBCDIC...
>
> That really should only matter for displaying filenames.

It matters for case-insensitive filename comparison as well.

> Anyway. I really don't care how deep a hole people have dug for
> themselves in trying to support Windows in all its various broken
> configurations.

I can't help but notice that you seem to have a strained relationship to Windows. However, if you read the OP again, you'll notice that the patch was supposed to help developers on MacOS, not Windows. For Windows the function already performs case-insensitive comparison, albeit wrong.

Corinna
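The three-line outline in the message above leaves out buffer sizes and error handling (`mbstowcs` takes a destination, a source, and a maximum count). A fuller sketch follows, with these assumptions spelled out: `filename_casecmp_mb` is a hypothetical name, a `towlower` loop is swapped in for `wcscasecmp` to stay within standard C, and input that cannot be decoded in the current locale falls back to a plain `strcmp`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: compare two multibyte strings without regard to
   case, decoding them according to the current LC_CTYPE locale.  */
int
filename_casecmp_mb (const char *a, const char *b)
{
  size_t na, nb, i;
  wchar_t *wa, *wb;
  int result = 0;

  assert (a != NULL && b != NULL);

  /* A string never decodes to more wide characters than it has bytes.  */
  na = strlen (a) + 1;
  nb = strlen (b) + 1;
  wa = malloc (na * sizeof *wa);
  wb = malloc (nb * sizeof *wb);

  if (wa == NULL || wb == NULL
      || mbstowcs (wa, a, na) == (size_t) -1
      || mbstowcs (wb, b, nb) == (size_t) -1)
    {
      /* Out of memory or invalid sequence: compare raw bytes instead.  */
      free (wa);
      free (wb);
      return strcmp (a, b);
    }

  for (i = 0; result == 0; i++)
    {
      wint_t ca = towlower ((wint_t) wa[i]);
      wint_t cb = towlower ((wint_t) wb[i]);

      if (ca != cb)
        result = ca < cb ? -1 : 1;
      else if (ca == 0)
        break;                  /* both strings ended together */
    }

  free (wa);
  free (wb);
  return result;
}
```

For non-ASCII names the caller would first need something like `setlocale (LC_CTYPE, "")`; in the default "C" locale only ASCII input is guaranteed to decode.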
On Wed, 15 Jun 2011, Corinna Vinschen wrote: > these functions be replaced with multibyte-aware versions? Along the > same lines, the entire set of safe-ctype functions only work for ASCII > and EBCDIC... That's the whole point of safe-ctype: that code that is processing things such as C source code whose semantics do not depend on the host locale can examine character properties in a locale-independent way. Where C source code has multibyte characters, the correct handling depends in detail on the version of C and cannot be done by generic code.
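To make "locale-independent" concrete, here is a small stand-in for that style of character folding. It is not the safe-ctype implementation; `ascii_tolower` is a hypothetical name, and the range test assumes an ASCII host (it would not be valid on EBCDIC, where the letters are not contiguous).

```c
#include <assert.h>

/* Hypothetical stand-in for a locale-independent TOLOWER: folds only
   ASCII 'A'..'Z' and returns every other byte value unchanged, whatever
   the host locale is set to.  */
int
ascii_tolower (int c)
{
  assert (c >= 0 && c <= 255);
  return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
}
```

Because bytes of 0x80 and above pass through untouched, the UTF-8 encodings of an accented letter in upper and lower case stay different after folding.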
On Wednesday 15 June 2011 11:44:19, Corinna Vinschen wrote: > Indeed. Probably the case sensitivity should not be hardcoded in a > low-level function at all. The application should decide if it wants > case-sensitive or case-insensitive filename comparison. This way, > the comparison could be based on OS, filesystem, or user choice. <http://sourceware.org/ml/gdb-patches/2010-12/msg00343.html> (that only handles filename comparison, not file opening)
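One way to picture comparison policy chosen by the application rather than at compile time is a variant that takes flags. This is a sketch only: the flag names and `filename_cmp_policy` are hypothetical and are not taken from libiberty or from the GDB patch linked above; the loop body mirrors the folding and separator steps of the patch under review.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical policy bits; the names are illustrative only.  */
enum
{
  FN_FOLD_CASE = 1,     /* ignore the case of ASCII letters */
  FN_DOS_SLASHES = 2    /* treat '/' and '\\' as the same separator */
};

static int
fold_ascii (int c)
{
  return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
}

/* Compare S1 and S2 under the policy selected by FLAGS.  */
int
filename_cmp_policy (const char *s1, const char *s2, unsigned flags)
{
  assert (s1 != NULL && s2 != NULL);

  for (;;)
    {
      int c1 = (unsigned char) *s1;
      int c2 = (unsigned char) *s2;

      if (flags & FN_FOLD_CASE)
        {
          c1 = fold_ascii (c1);
          c2 = fold_ascii (c2);
        }
      if (flags & FN_DOS_SLASHES)
        {
          if (c1 == '/')
            c1 = '\\';
          if (c2 == '/')
            c2 = '\\';
        }

      if (c1 != c2)
        return c1 - c2;
      if (c1 == '\0')
        return 0;

      s1++;
      s2++;
    }
}
```

A caller could derive the flags from the host OS, from a per-volume query, or from a user setting, which are the three sources named in the quoted message.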
On 6/15/2011 5:58 AM, Mark Kettenis wrote:

> Over my dead body. On a proper operating system filenames are
> case-sensitive. Your suggestion would create spurious matches.

Yes, we all know that Unix systems chose case sensitive, and are happy to have files differing only by case in the same directory. Obviously any proper software has to fully support such systems (if I was in the same mode as you and adding gratuitous flames to my comments, I would have preceded the word systems by brain-dead).

> Even on case-preserving filesystems I'd argue that treating them as
> case-sensitive is still the right approach.

Absolutely not, please don't visit your unix-borne prejudices on non-unix systems. There is nothing worse for Windows users than having to put up with silly decisions like this that visit unix nonsense (and it is nonsense in a windows environment) on windows software.

> If that creates problems,
> it means somebody was sloppy and didn't type the proper name of the
> file

The whole point in a system like Windows which is case preserving but not case sensitive is that you are NOT expected to type in the "proper" capitalization. In English, we recognize the words English and ENGLISH as equivalent, and windows users expect the same treatment.

So the normal expectation in windows systems is that, yes, you can make nice capitalization like MyFile if you like, and it will be properly displayed. But any software that requires me to type MyFile rather than myfile is junk!

> If you're still using an operating system with fully case-insensitive
> filesystems, I feel very, very sorry for you.

You are allowed to have this opinion, I feel the same about people who have to tolerate case-sensitive file systems, but I am quite happy with software for Unix systems fully behaving (I would agree that any software that EVER did case insensitive matching, as suggested earlier in this thread would be broken on Unix). But following your suggestion would be equally broken in Windows.

> or some piece of code in the toolchain arbitrarily changed the
> case of a filename. I don't mind punishing people for that. They
> have to learn that on a proper operating system file names are
> case-sensitive!

This kind of unix arrogance leads to junk unusable software on windows. It's really important not to visit your unix prejudices on windows users. After all we feel the same way in return, I find Unix systems complete junk for many reasons, one of which is the very annoying case sensitive viewpoint, but I do not translate my feelings into silly suggestions for making software malfunction on Unix. You should not make this mistake in a reverse direction.
On Jun 15 10:45, Joseph S. Myers wrote: > On Wed, 15 Jun 2011, Corinna Vinschen wrote: > > > these functions be replaced with multibyte-aware versions? Along the > > same lines, the entire set of safe-ctype functions only work for ASCII > > and EBCDIC... > > That's the whole point of safe-ctype: that code that is processing things > such as C source code whose semantics do not depend on the host locale can > examine character properties in a locale-independent way. Where C source > code has multibyte characters, the correct handling depends in detail on > the version of C and cannot be done by generic code. Ok, I see. Just in this specific case it's about filenames, not C source. I don't think it makes sense to restrict filenames to ASCII or EBCDIC chars. Corinna
> Date: Wed, 15 Jun 2011 10:22:36 +0200 > From: Corinna Vinschen <vinschen@redhat.com> > Cc: Andrew Pinski <pinskia@gmail.com>, brobecker@adacore.com, gcc-patches@gcc.gnu.org, gdb-patches@sourceware.org > > Talking about case-insensitive comparison, the filename_cmp and > filename_ncmp functions don't work for multibyte codesets, only for > singlebyte codesets. Given that UTF-8 is standard nowadays, shouldn't > these functions be replaced with multibyte-aware versions? I agree, but if we go that way, shouldn't we support UTF-16, which is used by the native Windows APIs? Windows does not use UTF-8 for file names.
> Date: Wed, 15 Jun 2011 06:59:11 -0400 > From: Robert Dewar <dewar@adacore.com> > CC: vinschen@redhat.com, dj@redhat.com, pinskia@gmail.com, brobecker@adacore.com, gcc-patches@gcc.gnu.org, gdb-patches@sourceware.org > > > or some piece of code in the toolchain arbitrarily changed the > > case of a filename. I don't mind punishing people for that. They > > have to learn that on a proper operating system file names are > > case-sensitive! > > This kind of unix arrogance leads to junk unusable software on > windows. It's really important not to visit your unix predjudices > on windows users. After all we feel the same way in return, I > find Unix systems complete junk for many reasons, one of which > is the very annoying case sensitive viewpoint, but I do not > translate my feelings into silly suggestions for making > software malfunction on Unix. You should not make this mistake > in a reverse direction. I cannot agree more.
On Jun 15 20:27, Eli Zaretskii wrote:
> > Date: Wed, 15 Jun 2011 10:22:36 +0200
> > From: Corinna Vinschen <...>
> > Talking about case-insensitive comparison, the filename_cmp and
> > filename_ncmp functions don't work for multibyte codesets, only for
> > singlebyte codesets. Given that UTF-8 is standard nowadays, shouldn't
> > these functions be replaced with multibyte-aware versions?
>
> I agree, but if we go that way, shouldn't we support UTF-16, which is
> used by the native Windows APIs? Windows does not use UTF-8 for file
> names.

I don't think so. UTF-16 is Windows' wchar_t (or WCHAR) codeset, but the applications calling the libiberty functions are using the char datatype with single- or multibyte codesets. If the filename_cmp function converts the multibyte input strings to wchar_t and compares the wide char strings case insensitive(*), they would use UTF-16 under the hood on Windows anyway.

(*) As proposed in http://sourceware.org/ml/gdb-patches/2011-06/msg00210.html, basically like this:

    #ifdef _WIN32
    #define wcscasecmp _wcsicmp
    #endif

    mbstowcs (wide_a, a);
    mbstowcs (wide_b, b);
    return wcscasecmp (wide_a, wide_b);

Corinna
> Looks OK to me.
Thanks, DJ. I've just checked the patch in on the GCC side.
I will push it on the src/GDB CVS momentarily.
diff --git a/include/filenames.h b/include/filenames.h
index d4955df..75ec330 100644
--- a/include/filenames.h
+++ b/include/filenames.h
@@ -34,10 +34,18 @@ extern "C" {
 # ifndef HAVE_DOS_BASED_FILE_SYSTEM
 # define HAVE_DOS_BASED_FILE_SYSTEM 1
 # endif
+# ifndef HAVE_CASE_INSENSITIVE_FILE_SYSTEM
+# define HAVE_CASE_INSENSITIVE_FILE_SYSTEM 1
+# endif
 # define HAS_DRIVE_SPEC(f) HAS_DOS_DRIVE_SPEC (f)
 # define IS_DIR_SEPARATOR(c) IS_DOS_DIR_SEPARATOR (c)
 # define IS_ABSOLUTE_PATH(f) IS_DOS_ABSOLUTE_PATH (f)
 #else /* not DOSish */
+# if defined(__APPLE__)
+# ifndef HAVE_CASE_INSENSITIVE_FILE_SYSTEM
+# define HAVE_CASE_INSENSITIVE_FILE_SYSTEM 1
+# endif
+# endif /* __APPLE__ */
 # define HAS_DRIVE_SPEC(f) (0)
 # define IS_DIR_SEPARATOR(c) IS_UNIX_DIR_SEPARATOR (c)
 # define IS_ABSOLUTE_PATH(f) IS_UNIX_ABSOLUTE_PATH (f)
diff --git a/libiberty/filename_cmp.c b/libiberty/filename_cmp.c
index 0eed120..5179f8d 100644
--- a/libiberty/filename_cmp.c
+++ b/libiberty/filename_cmp.c
@@ -50,19 +50,27 @@ and backward slashes are equal.
 int
 filename_cmp (const char *s1, const char *s2)
 {
-#ifndef HAVE_DOS_BASED_FILE_SYSTEM
+#if !defined(HAVE_DOS_BASED_FILE_SYSTEM) \
+    && !defined(HAVE_CASE_INSENSITIVE_FILE_SYSTEM)
   return strcmp(s1, s2);
 #else
   for (;;)
     {
-      int c1 = TOLOWER (*s1);
-      int c2 = TOLOWER (*s2);
+      int c1 = *s1;
+      int c2 = *s2;
 
+#if defined (HAVE_CASE_INSENSITIVE_FILE_SYSTEM)
+      c1 = TOLOWER (c1);
+      c2 = TOLOWER (c2);
+#endif
+
+#if defined (HAVE_DOS_BASED_FILE_SYSTEM)
       /* On DOS-based file systems, the '/' and the '\' are equivalent. */
       if (c1 == '/')
         c1 = '\\';
       if (c2 == '/')
         c2 = '\\';
+#endif
 
       if (c1 != c2)
         return (c1 - c2);
@@ -100,21 +108,29 @@ and backward slashes are equal.
 int
 filename_ncmp (const char *s1, const char *s2, size_t n)
 {
-#ifndef HAVE_DOS_BASED_FILE_SYSTEM
+#if !defined(HAVE_DOS_BASED_FILE_SYSTEM) \
+    && !defined(HAVE_CASE_INSENSITIVE_FILE_SYSTEM)
   return strncmp(s1, s2, n);
 #else
   if (!n)
     return 0;
   for (; n > 0; --n)
   {
-      int c1 = TOLOWER (*s1);
-      int c2 = TOLOWER (*s2);
+      int c1 = *s1;
+      int c2 = *s2;
 
+#if defined (HAVE_CASE_INSENSITIVE_FILE_SYSTEM)
+      c1 = TOLOWER (c1);
+      c2 = TOLOWER (c2);
+#endif
+
+#if defined (HAVE_DOS_BASED_FILE_SYSTEM)
       /* On DOS-based file systems, the '/' and the '\' are equivalent. */
       if (c1 == '/')
         c1 = '\\';
       if (c2 == '/')
         c2 = '\\';
+#endif
 
       if (c1 == '\0' || c1 != c2)
         return (c1 - c2);