[RFA/libiberty] Darwin has case-insensitive filesystems

Message ID	1308087182-26577-1-git-send-email-brobecker@adacore.com
State	New
Headers
    Return-Path: <gcc-patches-return-294429-incoming=patchwork.ozlabs.org@gcc.gnu.org>
    From: Joel Brobecker <brobecker@adacore.com>
    To: gcc-patches@gcc.gnu.org
    Cc: gdb-patches@sourceware.org, Joel Brobecker <brobecker@adacore.com>
    Subject: [RFA/libiberty] Darwin has case-insensitive filesystems
    Date: Tue, 14 Jun 2011 14:33:02 -0700
    Message-Id: <1308087182-26577-1-git-send-email-brobecker@adacore.com>
    Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
    Precedence: bulk
    Sender: gcc-patches-owner@gcc.gnu.org

Joel Brobecker June 14, 2011, 9:33 p.m. UTC

Hello,

HFS+, the FS on Darwin, is case insensitive. So this patch adjusts
filename_cmp.c to ignore the casing when comparing filenames on Darwin.

This is visible in GDB when trying to break on a file whose name
is, say 'Mixed_Case.adb', but was compiled using 'mixed_case.adb'
as the filename.  In that case, GDB says it cannot find 'Mixed_Case.adb'.

There are two parts:
  1. in include/filenames.h: I add a new macro
     HAVE_CASE_INSENSITIVE_FILE_SYSTEM, which is defined on systems
     that have DOS-like file systems, as well as on Darwin.
  2. Adjust filename_cmp and filename_ncmp to take it into account.

I am also wondering whether it makes sense to keep a separate code
path for systems that are case-sensitive and have no DOS-like features,
or whether we should handle them using the same code as on Windows/Darwin.
In other words, does it make a difference to be using strcmp/strncmp
when we can, versus always using our loop that compares character by
character?
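
For readers following along, here is a minimal sketch of the shape of the change described above. It is not the patch itself: the function name filename_cmp_sketch is hypothetical, the macro is defined unconditionally only so that the folding path gets compiled, the standard tolower stands in for libiberty's own character macros, and the DOS-specific handling is left out.

```c
#include <ctype.h>
#include <string.h>

/* Assumption for this sketch: defined unconditionally so that the
   case-folding path is the one compiled.  Per the description above,
   the patch defines it in include/filenames.h only on systems with
   DOS-like file systems and on Darwin.  */
#define HAVE_CASE_INSENSITIVE_FILE_SYSTEM 1

/* Hypothetical stand-in for filename_cmp.  Returns zero when the two
   names are considered equal, nonzero otherwise.  */
int
filename_cmp_sketch (const char *s1, const char *s2)
{
#ifndef HAVE_CASE_INSENSITIVE_FILE_SYSTEM
  /* Case-sensitive systems: a plain strcmp is enough.  */
  return strcmp (s1, s2);
#else
  /* Otherwise compare character by character, ignoring case.  */
  for (;;)
    {
      int c1 = tolower ((unsigned char) *s1);
      int c2 = tolower ((unsigned char) *s2);

      if (c1 != c2)
        return c1 - c2;
      if (c1 == '\0')
        return 0;

      s1++;
      s2++;
    }
#endif
}
```

With the macro defined, 'Mixed_Case.adb' and 'mixed_case.adb' compare equal, which is the GDB case mentioned above. The question about strcmp versus the loop is about whether the #ifndef branch is worth keeping at all.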

include/ChangeLog:

        * filenames.h (HAVE_CASE_INSENSITIVE_FILE_SYSTEM): Define
        on Darwin, as well as on the systems that use a DOS-like
        filesystem.

libiberty/ChangeLog:

        * filename_cmp.c (filename_cmp, filename_ncmp): Add handling of
        HAVE_CASE_INSENSITIVE_FILE_SYSTEM.

Tested on x86_64-darwin as well as on x86_64-linux. I haven't tested
on a Windows system yet, but I will get our gdb-testsuite's daily
results before tomorrow.

Does this look OK to commit?

Thanks!

DJ Delorie June 14, 2011, 9:38 p.m. UTC | #1

Looks OK to me.

Andrew Pinski June 14, 2011, 9:56 p.m. UTC | #2

On Tue, Jun 14, 2011 at 2:33 PM, Joel Brobecker <brobecker@adacore.com> wrote:
> Hello,
>
> HFS+, the FS on Darwin, is case insensitive. So this patch adjusts
> filename_cmp.c to ignore the casing when comparing filenames on Darwin.

This is wrong as not all FSs are case insensitive.  In fact HFS+ can
be case sensitive too.  I think you need a better check than just saying
all Darwin is case insensitive.  This is just like using FAT32 on
Linux.  In fact I think HAVE_DOS_BASED_FILE_SYSTEM is incorrect also
for NTFS as it can also be case sensitive.

Thanks,
Andrew Pinski

DJ Delorie June 14, 2011, 10:01 p.m. UTC | #3

> This is wrong as not all FSs are case insensitive.  In fact HFS+ can
> be case sensitive too.  I think you need better check than just
> saying all Darwin is case insensitive.  This is just like using
> FAT32 on Linux.  In fact I think HAVE_DOS_BASED_FILE_SYSTEM is
> incorrect also for NTFS as it can also be case sensitive.

There's a difference between case preserving and case sensitive,
though, and we really don't have a portable way to detect
case-sensitivity on a per-directory basis, so how can we do better?

Tristan Gingold June 15, 2011, 2:53 a.m. UTC | #4

On Jun 15, 2011, at 12:01 AM, DJ Delorie wrote:

> 
>> This is wrong as not all FSs are case insensitive.  In fact HFS+ can
>> be case sensitive too.  I think you need better check than just
>> saying all Darwin is case insensitive.  This is just like using
>> FAT32 on Linux.  In fact I think HAVE_DOS_BASED_FILE_SYSTEM is
>> incorrect also for NTFS as it can also be case sensitive.
> 
> There's a difference between case preserving and case sensitive,
> though, and we really don't have a portable way to detect
> case-sensitivity on a per-directory basis, so how can we do better?

Seconded.

I have mixed feelings about this issue.

Handling HFS as case insensitive might not be user friendly, but on the other hand
it might be annoying for developers working on a case sensitive HFS with two files that
differ only in case.

We can do something better using pathconf(2) with _PC_CASE_SENSITIVE, but this would
result in an API change and possibly a performance issue.
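
As a rough illustration of the pathconf(2) idea, a per-directory query could look like the sketch below. The helper name dir_is_case_sensitive is hypothetical, and _PC_CASE_SENSITIVE is assumed to be available only on Darwin, so on other systems the helper simply reports "unknown".

```c
#include <unistd.h>

/* Hypothetical helper: returns 1 if DIR is on a case-sensitive file
   system, 0 if it is not, and -1 if this cannot be determined.  */
int
dir_is_case_sensitive (const char *dir)
{
#ifdef _PC_CASE_SENSITIVE
  /* Darwin: ask the file system that holds DIR.  */
  long r = pathconf (dir, _PC_CASE_SENSITIVE);

  if (r < 0)
    return -1;
  return r != 0;
#else
  /* No such query on this system; the caller must pick a default.  */
  (void) dir;
  return -1;
#endif
}
```

The API change comes from the extra input: the comparison would need to know which directory the names live in, and the performance concern comes from making a system call for it.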

Tristan.

Joel Brobecker June 15, 2011, 4:24 a.m. UTC | #5

> There's a difference between case preserving and case sensitive,
> though, and we really don't have a portable way to detect
> case-sensitivity on a per-directory basis, so how can we do better?

That's roughly my thoughts on this issue. It's true that this property
is not an OS property, and we could try devising some ways to test it
in various ways.  But I think that the current solution is good enough
for practical purposes. A better solution is, in my opinion, more
effort than it is worth.

Corinna Vinschen June 15, 2011, 8:22 a.m. UTC | #6

On Jun 14 18:01, DJ Delorie wrote:
> 
> > This is wrong as not all FSs are case insensitive.  In fact HFS+ can
> > be case sensitive too.  I think you need better check than just
> > saying all Darwin is case insensitive.  This is just like using
> > FAT32 on Linux.  In fact I think HAVE_DOS_BASED_FILE_SYSTEM is
> > incorrect also for NTFS as it can also be case sensitive.
> 
> There's a difference between case preserving and case sensitive,
> though, and we really don't have a portable way to detect
> case-sensitivity on a per-directory basis, so how can we do better?

As Andrew points out, NTFS can be case-sensitive as well, and on Windows
the case-sensitivity vs. case-preserving behaviour can be chosen for
each file or directory descriptor at the time the file is opened.

IMHO it's actually a pity that the filename comparison behaves differently
on different systems.  I think it would make sense to behave identically on
all systems.  What about this:  Always search case-sensitive.  If file has
been found, return.  Otherwise, search case-insensitive.
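
To make the two-pass idea concrete, here is a small sketch over an in-memory list of names. The function find_filename and its parameters are hypothetical, and strcasecmp is the POSIX function from <strings.h>, so the fallback only folds case the way the current locale does for single-byte characters.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>    /* strcasecmp (POSIX) */

/* Hypothetical lookup: return the entry of NAMES that matches WANTED,
   preferring an exact match and falling back to a case-insensitive
   one.  Returns NULL if nothing matches either way.  */
const char *
find_filename (const char *const *names, size_t count, const char *wanted)
{
  size_t i;

  /* First pass: case-sensitive.  */
  for (i = 0; i < count; i++)
    if (strcmp (names[i], wanted) == 0)
      return names[i];

  /* Second pass: case-insensitive, only reached when the first fails.  */
  for (i = 0; i < count; i++)
    if (strcasecmp (names[i], wanted) == 0)
      return names[i];

  return NULL;
}
```

An exact match always wins, so two files that differ only in case stay distinguishable as long as the name is typed exactly.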

Talking about case-insensitive comparison, the filename_cmp and
filename_ncmp functions don't work for multibyte codesets, only for
singlebyte codesets.  Given that UTF-8 is standard nowadays, shouldn't
these functions be replaced with multibyte-aware versions?  Along the
same lines, the entire set of safe-ctype functions only work for ASCII
and EBCDIC...

Corinna

Mark Kettenis June 15, 2011, 9:58 a.m. UTC | #7

> Date: Wed, 15 Jun 2011 10:22:36 +0200
> From: Corinna Vinschen <vinschen@redhat.com>
> 
> On Jun 14 18:01, DJ Delorie wrote:
> > 
> > > This is wrong as not all FSs are case insensitive.  In fact HFS+ can
> > > be case sensitive too.  I think you need better check than just
> > > saying all Darwin is case insensitive.  This is just like using
> > > FAT32 on Linux.  In fact I think HAVE_DOS_BASED_FILE_SYSTEM is
> > > incorrect also for NTFS as it can also be case sensitive.
> > 
> > There's a difference between case preserving and case sensitive,
> > though, and we really don't have a portable way to detect
> > case-sensitivity on a per-directory basis, so how can we do better?
> 
> As Andrew points out, NTFS can be case-sensitive as well, and on Windows
> the case-sensitivity vs. case-preserving behaviour can be chosen for
> each file or directory descriptor at the time the file is opened.
> 
> IMHO it's actually a pity that the filename comparison behaves differently
> on different systems.  I think it would make sense to behave identical on
> all systems.  What about this:  Always search case-sensitive.  If file has
> been found, return.  Otherwise, search case-insensitive.

Over my dead body.  On a proper operating system filenames are
case-sensitive.  Your suggestion would create spurious matches.

Even on case-preserving filesystems I'd argue that treating them as
case-sensitive is still the right approach.  If that creates problems,
it means somebody was sloppy and didn't type the proper name of the
file or some piece of code in the toolchain arbitrarily changed the
case of a filename.  I don't mind punishing people for that.  They
have to learn that on a proper operating system file names are
case-sensitive!

If you're still using an operating system with fully case-insensitive
filesystems, I feel very, very sorry for you.

> Talking about case-insensitive comparison, the filename_cmp and
> filename_ncmp functions don't work for multibyte codesets, only for
> singlebyte codesets.  Given that UTF-8 is standard nowadays, shouldn't
> these functions be replaced with multibyte-aware versions?

For UTF-8, that isn't necessary.  Normal string manipulation functions
work just fine on them, since UTF-8 strings don't contain any embedded
NUL characters.  It's only when you try to be too clever about
case-insensitivity that you run into problems.

> Along the same lines, the entire set of safe-ctype functions only
> work for ASCII and EBCDIC...

That really should only matter for displaying filenames.

Anyway.  I really don't care how deep a hole people have dug for
themselves in trying to support Windows in all its various broken
configurations.  But on a native debugger for a UNIX-like system, or a
cross debugger between such systems, filename_cmp() should simply do a
strcmp().

Corinna Vinschen June 15, 2011, 10:44 a.m. UTC | #8

On Jun 15 11:58, Mark Kettenis wrote:
> > Date: Wed, 15 Jun 2011 10:22:36 +0200
> > From: Corinna Vinschen <...>

Please do not quote my email address in the body of your message.
Thank you.

> > IMHO it's actually a pity that the filename comparison behaves differently
> > on different systems.  I think it would make sense to behave identical on
> > all systems.  What about this:  Always search case-sensitive.  If file has
> > been found, return.  Otherwise, search case-insensitive.
> 
> Over my dead body.  On a proper operating system filenames are
> case-sensitive.  Your suggestion would create spurious matches.

Indeed.  Probably the case sensitivity should not be hardcoded in a
low-level function at all.  The application should decide if it wants
case-sensitive or case-insensitive filename comparison.  This way,
the comparison could be based on OS, filesystem, or user choice.

> Even on case-preserving filesystems I'd argue that treating them as
> case-sensitive is still the right approach.  If that creates problems,
> it means somebody was sloppy and didn't type the proper name of the
> file or some piece of code in the toolchain arbitrarily changed the
> case of a filename.  I don't mind punishing people for that.  They
> have to learn that on a proper operating system file names are
> case-sensitive!

I wasn't aware that gcc, gdb, and other tools using libiberty are
supposed to punish people for the features of the OS they are working
on.  At one point I actually thought they were supposed to *help*
developers.  I seem to be wrong.

> > Talking about case-insensitive comparison, the filename_cmp and
> > filename_ncmp functions don't work for multibyte codesets, only for
> > singlebyte codesets.  Given that UTF-8 is standard nowadays, shouldn't
> > these functions be replaced with multibyte-aware versions?
> 
> For UTF-8, that isn't necessary.  Normal string manipulation functions
> work just fine on them, since UTF-8 strings don't contain any embedded
> NUL characters.  It's only when you try to be too clever about
> case-insensitivity that you run into problems.

If you read the text you're replying to once more, you see that I'm
explicitly talking about case-insensitive comparison.  In that case,
the functions won't work correctly, unless you use a singlebyte codeset.
The tolower function on a single byte just doesn't make sense in
multibyte charsets.  The right thing to do would be something along
the lines of

    /* wide_a and wide_b are wchar_t buffers of n elements each.  */
    mbstowcs (wide_a, a, n);
    mbstowcs (wide_b, b, n);
    return wcscasecmp (wide_a, wide_b);
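
Spelled out a little further, that approach could look like the sketch below. The function name filename_casecmp_mb is hypothetical. It swaps wcscasecmp for a towlower loop so that it needs only ISO C, it falls back to strcmp when a name cannot be converted, and it assumes the caller has already selected the locale with setlocale.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical multibyte-aware, case-insensitive comparison.  Returns
   zero if A and B are equal when case is ignored.  */
int
filename_casecmp_mb (const char *a, const char *b)
{
  /* A multibyte string never yields more wide characters than it has
     bytes, so strlen + 1 elements always hold the converted string.  */
  size_t na = strlen (a) + 1;
  size_t nb = strlen (b) + 1;
  wchar_t *wa = malloc (na * sizeof *wa);
  wchar_t *wb = malloc (nb * sizeof *wb);
  int result;

  if (wa == NULL || wb == NULL
      || mbstowcs (wa, a, na) == (size_t) -1
      || mbstowcs (wb, b, nb) == (size_t) -1)
    {
      /* Out of memory or invalid multibyte sequence: compare bytes.  */
      result = strcmp (a, b);
    }
  else
    {
      size_t i = 0;
      wint_t ca, cb;

      /* Skip the common prefix, ignoring case.  */
      while (wa[i] != L'\0' && towlower (wa[i]) == towlower (wb[i]))
        i++;

      ca = towlower (wa[i]);
      cb = towlower (wb[i]);
      result = (ca > cb) - (ca < cb);
    }

  free (wa);
  free (wb);
  return result;
}
```

The conversion and allocation on every call are where the cost of this approach lies compared with a byte-wise loop.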

> > Along the same lines, the entire set of safe-ctype functions only
> > work for ASCII and EBCDIC...
> 
> That really should only matter for displaying filenames.

It matters for case-insensitive filename comparison as well.

> Anyway.  I really don't care how deep a hole people have dug for
> themselves in trying to support Windows in all its various broken
> configurations.

I can't help but notice that you seem to have a strained relationship with
Windows.  However, if you read the OP again, you'll notice that the patch
was supposed to help developers on MacOS, not Windows.  For Windows the
function already performs case-insensitive comparison, albeit incorrectly.

Corinna

Joseph Myers June 15, 2011, 10:45 a.m. UTC | #9

On Wed, 15 Jun 2011, Corinna Vinschen wrote:

> these functions be replaced with multibyte-aware versions?  Along the
> same lines, the entire set of safe-ctype functions only work for ASCII
> and EBCDIC...

That's the whole point of safe-ctype: that code that is processing things 
such as C source code whose semantics do not depend on the host locale can 
examine character properties in a locale-independent way.  Where C source 
code has multibyte characters, the correct handling depends in detail on 
the version of C and cannot be done by generic code.

Pedro Alves June 15, 2011, 10:55 a.m. UTC | #10

On Wednesday 15 June 2011 11:44:19, Corinna Vinschen wrote:
> Indeed.  Probably the case sensitivity should not be hardcoded in a
> low-level function at all.  The application should decide if it wants
> case-sensitive or case-insensitive filename comparison.  This way,
> the comparison could be based on OS, filesystem, or user choice.

<http://sourceware.org/ml/gdb-patches/2010-12/msg00343.html>

(that only handles filename comparison, not file opening)

Robert Dewar June 15, 2011, 10:59 a.m. UTC | #11

On 6/15/2011 5:58 AM, Mark Kettenis wrote:

> Over my dead body.  On a proper operating system filenames are
> case-sensitive.  Your suggestion would create spurious matches.

Yes, we all know that Unix systems chose case sensitive, and
are happy to have files differing only by case in the same
directory.

Obviously any proper software has to fully support such
systems (if I was in the same mode as you and adding
gratuitous flames to my comments, I would have
preceded the word systems by brain-dead).
>
> Even on case-preserving filesystems I'd argue that treating them as
> case-sensitive is still the right approach.

Absolutely not, please don't visit your unix-borne prejudices
on non-unix systems. There is nothing worse for Windows users
than having to put up with silly decisions like this that
visit unix nonsense (and it is nonsense in a windows environment)
on windows software.

> If that creates problems,
> it means somebody was sloppy and didn't type the proper name of the
> file

The whole point in a system like Windows which is case preserving
but not case sensitive is that you are NOT expected to type in
the "proper" capitalization. In English, we recognize the words
English and ENGLISH as equivalent, and windows users expect the
same treatment.

So the normal expectation in windows systems is that, yes, you can
make nice capitalization like MyFile if you like, and it will be
properly displayed.

But any software that requires me to type MyFile rather than
myfile is junk!

> If you're still using an operating system with fully case-insensitive
> filesystems, I feel very, very sorry for you.

You are allowed to have this opinion, I feel the same about people
who have to tolerate case-sensitive file systems, but I am quite
happy with software for Unix systems fully behaving (I would agree
that any software that EVER did case insensitive matching, as
suggested earlier in this thread would be broken on Unix). But
following your suggestion would be equally broken in Windows.

>  or some piece of code in the toolchain arbitrarily changed the
> case of a filename.  I don't mind punishing people for that.  They
> have to learn that on a proper operating system file names are
> case-sensitive!

This kind of unix arrogance leads to junk unusable software on
windows. It's really important not to visit your unix prejudices
on windows users. After all we feel the same way in return, I
find Unix systems complete junk for many reasons, one of which
is the very annoying case sensitive viewpoint, but I do not
translate my feelings into silly suggestions for making
software malfunction on Unix. You should not make this mistake
in a reverse direction.

Corinna Vinschen June 15, 2011, 10:59 a.m. UTC | #12

On Jun 15 10:45, Joseph S. Myers wrote:
> On Wed, 15 Jun 2011, Corinna Vinschen wrote:
> 
> > these functions be replaced with multibyte-aware versions?  Along the
> > same lines, the entire set of safe-ctype functions only work for ASCII
> > and EBCDIC...
> 
> That's the whole point of safe-ctype: that code that is processing things 
> such as C source code whose semantics do not depend on the host locale can 
> examine character properties in a locale-independent way.  Where C source 
> code has multibyte characters, the correct handling depends in detail on 
> the version of C and cannot be done by generic code.

Ok, I see.

Just in this specific case it's about filenames, not C source.  I don't
think it makes sense to restrict filenames to ASCII or EBCDIC chars.


Corinna

Eli Zaretskii June 15, 2011, 5:27 p.m. UTC | #13

> Date: Wed, 15 Jun 2011 10:22:36 +0200
> From: Corinna Vinschen <vinschen@redhat.com>
> Cc: Andrew Pinski <pinskia@gmail.com>, brobecker@adacore.com,	gcc-patches@gcc.gnu.org, gdb-patches@sourceware.org
> 
> Talking about case-insensitive comparison, the filename_cmp and
> filename_ncmp functions don't work for multibyte codesets, only for
> singlebyte codesets.  Given that UTF-8 is standard nowadays, shouldn't
> these functions be replaced with multibyte-aware versions?

I agree, but if we go that way, shouldn't we support UTF-16, which is
used by the native Windows APIs?  Windows does not use UTF-8 for file
names.

Eli Zaretskii June 15, 2011, 5:28 p.m. UTC | #14

> Date: Wed, 15 Jun 2011 06:59:11 -0400
> From: Robert Dewar <dewar@adacore.com>
> CC: vinschen@redhat.com, dj@redhat.com, pinskia@gmail.com,  brobecker@adacore.com, gcc-patches@gcc.gnu.org,  gdb-patches@sourceware.org
> 
> >  or some piece of code in the toolchain arbitrarily changed the
> > case of a filename.  I don't mind punishing people for that.  They
> > have to learn that on a proper operating system file names are
> > case-sensitive!
> 
> This kind of unix arrogance leads to junk unusable software on
> windows. It's really important not to visit your unix prejudices
> on windows users. After all we feel the same way in return, I
> find Unix systems complete junk for many reasons, one of which
> is the very annoying case sensitive viewpoint, but I do not
> translate my feelings into silly suggestions for making
> software malfunction on Unix. You should not make this mistake
> in a reverse direction.

I cannot agree more.

Corinna Vinschen June 15, 2011, 7:41 p.m. UTC | #15

On Jun 15 20:27, Eli Zaretskii wrote:
> > Date: Wed, 15 Jun 2011 10:22:36 +0200
> > From: Corinna Vinschen <...>
> > Talking about case-insensitive comparison, the filename_cmp and
> > filename_ncmp functions don't work for multibyte codesets, only for
> > singlebyte codesets.  Given that UTF-8 is standard nowadays, shouldn't
> > these functions be replaced with multibyte-aware versions?
> 
> I agree, but if we go that way, shouldn't we support UTF-16, which is
> used by the native Windows APIs?  Windows does not use UTF-8 for file
> names.

I don't think so.  UTF-16 is Windows' wchar_t (or WCHAR) codeset, but
the applications calling the libiberty functions are using the char
datatype with single- or multibyte codesets.

If the filename_cmp function converts the multibyte input strings
to wchar_t and compares the wide char strings case insensitive(*),
they would use UTF-16 under the hood on Windows anyway.  

(*) As proposed in
    http://sourceware.org/ml/gdb-patches/2011-06/msg00210.html,
    basically like this:

    #ifdef _WIN32
    #define wcscasecmp _wcsicmp
    #endif
    /* wide_a and wide_b are wchar_t buffers of n elements each.  */
    mbstowcs (wide_a, a, n);
    mbstowcs (wide_b, b, n);
    return wcscasecmp (wide_a, wide_b);

Corinna

Joel Brobecker July 1, 2011, 5:58 p.m. UTC | #16

> Looks OK to me.

Thanks, DJ. I've just checked the patch in on the GCC side.
I will push it on the src/GDB CVS momentarily.
