diff mbox

glob: Avoid copying the d_name field of struct dirent [BZ #19779]

Message ID 56E339A7.7060704@redhat.com
State New
Headers show

Commit Message

Florian Weimer March 11, 2016, 9:33 p.m. UTC
Introducing a new struct makes it unnecessary to copy the name out of
the dirent structure.   The name is subsequently allocated on the heap
anyway.

I extended the existing test with a long name.  The test now crashes
before the fix.

We should split sysdeps/unix/sysv/linux/i386/glob64.c into multiple
files, rather than including posix/glob.c multiple times.

DIRENT_MUST_BE can be removed, but that's an unrelated cleanup (it was
dead before).

Florian

Comments

Paul Eggert March 11, 2016, 10 p.m. UTC | #1
I merely read the patch. Comments:

 > +static void
 > +convert_dirent (const struct dirent *source, struct abstract_dirent *target)

Since this is all inlined, how about if making it a pure function, with a 
signature like this instead?

static struct abstract_dirent
convert_dirent (struct dirent const *source)

That should be a bit cleaner.

Otherwise, it looks good.
Florian Weimer March 11, 2016, 10:02 p.m. UTC | #2
On 03/11/2016 11:00 PM, Paul Eggert wrote:
> I merely read the patch. Comments:
> 
>> +static void
>> +convert_dirent (const struct dirent *source, struct abstract_dirent
> *target)
> 
> Since this is all inlined, how about if making it a pure function, with
> a signature like this instead?
> 
> static struct abstract_dirent
> convert_dirent (struct dirent const *source)
> 
> That should be a bit cleaner.

The file is in gnulib, I feared that gnulib has to support compilers
which cannot return structs.

> Otherwise, it looks good.

Thanks.

Florian
Roland McGrath March 11, 2016, 10:27 p.m. UTC | #3
> --- a/posix/bug-glob2.c
> +++ b/posix/bug-glob2.c
[...]
> @@ -75,7 +87,7 @@ typedef struct
>    int level;
>    int idx;
>    struct dirent d;
> -  char room_for_dirent[NAME_MAX];
> +  char room_for_dirent[1000];
>  } my_DIR;

There should be some comment about where this arbitrary size comes from.

> +/* A representation of a directory entry which does not depend on the
> +   layout of struct dirent, or the size of ino_t.  */
> +struct abstract_dirent
> +{
> +  const char *d_name;
> +  int d_type;
> +  bool skip_entry;
> +};

I'm not convinced that this is a good name for the struct.  It's not an
abstract form of 'struct dirent', because the name member points into some
other buffer whose lifetime is not intrinsically related to this struct.

I would tend away from using d_ prefixes on any member names here.  For
d_name it specifically gives the wrong impression that it is the same as
dirent.d_name, which is an array rather than a pointer.  The DIRENT_MIGHT_*
macros are now used only with the internal struct, so they can just use the
unprefixed member name 'type', and their names can lose the DIRENT_ prefix.

uint8_t is big enough for d_type, and will make the struct smaller on
ILP32.  All else being equal, I would use __typeof (((struct dirent)
{}).d_type) but just using uint8_t is adequately simpler for the gnulib
case concerned with sticking to standard C (and doesn't need any
#ifdef _DIRENT_HAVE_D_TYPE check).

> +/* Extract name and type from directory entry.  No copy of the name is
> +   made.  */
> +static void
> +convert_dirent (const struct dirent *source, struct abstract_dirent *target)

I like Paul's suggestion of using a struct-returning function here.  The
comment should be explicit that the result will point to SOURCE->d_name and
so SOURCE must be kept live.

> +#if defined _DIRENT_HAVE_D_TYPE || defined HAVE_STRUCT_DIRENT_D_TYPE
> +  target->d_type = source->d_type;
> +#else
> +  target->d_type = 0;
> +#endif
> +#if (defined POSIX || defined WINDOWS32) && !defined __GNU_LIBRARY__
> +  /* Posix does not require that the d_ino field be present, and some
> +     systems do not provide it. */
> +  target->skip_entry = false;
> +#else
> +  target->skip_entry = source->d_ino == 0;
> +#endif

For both of these I'd prefer to see an inline or macro that encapsulates
the #if stuff without other repetition, e.g.:

	target->type = D_TYPE (source);
	target->skip_entry = D_SKIP_ME (source);

> +	      struct abstract_dirent e;
> +	      {
> +		struct dirent *d = (__builtin_expect (flags & GLOB_ALTDIRFUNC, 0)
> +				    ? ((struct dirent *)
> +				       (*pglob->gl_readdir) (stream))
> +				    : __readdir (stream));

Convert to __glibc_unlikely while you're here (unless the gnulib sharing
rules say not to, not clear on that off hand).

> +		if (d == NULL)
> +		  break;
> +		convert_dirent (d, &e);
> +	      }

This could all be nicely broken out into a function that takes STREAM,
FLAGS, PGLOB, and returns E.

> -		      len = NAMLEN (d);
> -		      names->name[cur] = (char *) malloc (len + 1);
> +		      names->name[cur] = strdup (e.d_name);
>  		      if (names->name[cur] == NULL)
>  			goto memory_error;
> -		      *((char *) mempcpy (names->name[cur++], name, len))
> -			= '\0';
> +		      ++cur;

In the _DIRENT_HAVE_D_NAMLEN case, this assumes that strdup is as efficient
as malloc+mempcpy (with no strlen required).  Perhaps it's close enough,
but that is a subtle change you didn't mention as intended.

> --- a/sysdeps/gnu/glob64.c
> +++ b/sysdeps/gnu/glob64.c
> @@ -1,3 +1,20 @@
> +/* Copyright (C) 1998-2016 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.

The top line must be a descriptive comment.  
Copyright goes on the second line.

> --- a/sysdeps/unix/sysv/linux/i386/glob64.c
> +++ b/sysdeps/unix/sysv/linux/i386/glob64.c
> @@ -1,3 +1,20 @@
> +/* Copyright (C) 1998-2016 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.

Ditto.


Thanks,
Roland
Florian Weimer March 12, 2016, 7:19 a.m. UTC | #4
This patch is incorrect.  We need to call readdir64 even for the 32-bit
glob.  I removed too much code.

Florian
Paul Eggert March 13, 2016, 7:02 a.m. UTC | #5
Florian Weimer wrote:
> I feared that gnulib has to support compilers
> which cannot return structs.

No, gnulib assumes C89 or better so it's OK to return a struct.
diff mbox

Patch

2016-03-11  Florian Weimer  <fweimer@redhat.com>

	[BZ #19779]
	Avoid copying names of directory entries.
	* posix/glob.c (NAMELEN, CONVERT_D_NAMLEN, CONVERT_D_INO)
	(CONVERT_D_TYPE CONVERT_DIRENT_DIRENT64, REAL_DIR_ENTRY)
	(NAME_MAX): Remove macros.
	(struct abstract_dirent): New type.
	(convert_dirent): New function.
	(glob_in_dir): Use struct abstract_dirent.  Call convert_dirent.
	Adjust references to the dirent object.  Use strdup instead of
	NAMELEN and mempcpy.
	* sysdeps/gnu/glob64.c (COMPILE_GLOB64): Remove macro.
	* sysdeps/unix/sysv/linux/i386/glob64.c (COMPILE_GLOB64): Likewise.
	(convert_dirent): Redefine for second file inclusion.
	* posix/bug-glob2.c (LONG_NAME): Define.
	(filesystem): Add LONG_NAME.
	(my_DIR): Increase the size of room_for_dirent.

diff --git a/posix/bug-glob2.c b/posix/bug-glob2.c
index ddf5ec9..581f28e 100644
--- a/posix/bug-glob2.c
+++ b/posix/bug-glob2.c
@@ -40,6 +40,17 @@ 
 # define PRINTF(fmt, args...)
 #endif
 
+#define LONG_NAME \
+  "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" \
+  "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" \
+  "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" \
+  "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" \
+  "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" \
+  "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" \
+  "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" \
+  "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" \
+  "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" \
+  "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
 
 static struct
 {
@@ -58,6 +69,7 @@  static struct
       { ".", 3, DT_DIR, 0755 },
       { "..", 3, DT_DIR, 0755 },
       { "a", 3, DT_REG, 0644 },
+      { LONG_NAME, 3, DT_REG, 0644 },
     { "unreadable", 2, DT_DIR, 0111 },
       { ".", 3, DT_DIR, 0111 },
       { "..", 3, DT_DIR, 0755 },
@@ -75,7 +87,7 @@  typedef struct
   int level;
   int idx;
   struct dirent d;
-  char room_for_dirent[NAME_MAX];
+  char room_for_dirent[1000];
 } my_DIR;
 
 
diff --git a/posix/glob.c b/posix/glob.c
index 0c04c3c..aea1b71 100644
--- a/posix/glob.c
+++ b/posix/glob.c
@@ -24,6 +24,7 @@ 
 #include <errno.h>
 #include <sys/types.h>
 #include <sys/stat.h>
+#include <stdbool.h>
 #include <stddef.h>
 
 /* Outcomment the following line for production quality code.  */
@@ -57,10 +58,8 @@ 
 
 #if defined HAVE_DIRENT_H || defined __GNU_LIBRARY__
 # include <dirent.h>
-# define NAMLEN(dirent) strlen((dirent)->d_name)
 #else
 # define dirent direct
-# define NAMLEN(dirent) (dirent)->d_namlen
 # ifdef HAVE_SYS_NDIR_H
 #  include <sys/ndir.h>
 # endif
@@ -75,13 +74,6 @@ 
 # endif /* HAVE_VMSDIR_H */
 #endif
 
-
-/* In GNU systems, <dirent.h> defines this macro for us.  */
-#ifdef _D_NAMLEN
-# undef NAMLEN
-# define NAMLEN(d) _D_NAMLEN(d)
-#endif
-
 /* When used in the GNU libc the symbol _DIRENT_HAVE_D_TYPE is available
    if the `d_type' member for `struct dirent' is available.
    HAVE_STRUCT_DIRENT_D_TYPE plays the same role in GNULIB.  */
@@ -103,54 +95,8 @@ 
 # define DIRENT_MIGHT_BE_DIR(d)		true
 #endif /* HAVE_D_TYPE */
 
-/* If the system has the `struct dirent64' type we use it internally.  */
-#if defined _LIBC && !defined COMPILE_GLOB64
-# if defined HAVE_DIRENT_H || defined __GNU_LIBRARY__
-#  define CONVERT_D_NAMLEN(d64, d32)
-# else
-#  define CONVERT_D_NAMLEN(d64, d32) \
-  (d64)->d_namlen = (d32)->d_namlen;
-# endif
-
-# if (defined POSIX || defined WINDOWS32) && !defined __GNU_LIBRARY__
-#  define CONVERT_D_INO(d64, d32)
-# else
-#  define CONVERT_D_INO(d64, d32) \
-  (d64)->d_ino = (d32)->d_ino;
-# endif
-
-# ifdef _DIRENT_HAVE_D_TYPE
-#  define CONVERT_D_TYPE(d64, d32) \
-  (d64)->d_type = (d32)->d_type;
-# else
-#  define CONVERT_D_TYPE(d64, d32)
-# endif
-
-# define CONVERT_DIRENT_DIRENT64(d64, d32) \
-  memcpy ((d64)->d_name, (d32)->d_name, NAMLEN (d32) + 1);		      \
-  CONVERT_D_NAMLEN (d64, d32)						      \
-  CONVERT_D_INO (d64, d32)						      \
-  CONVERT_D_TYPE (d64, d32)
-#endif
-
-
-#if (defined POSIX || defined WINDOWS32) && !defined __GNU_LIBRARY__
-/* Posix does not require that the d_ino field be present, and some
-   systems do not provide it. */
-# define REAL_DIR_ENTRY(dp) 1
-#else
-# define REAL_DIR_ENTRY(dp) (dp->d_ino != 0)
-#endif /* POSIX */
-
 #include <stdlib.h>
 #include <string.h>
-
-/* NAME_MAX is usually defined in <dirent.h> or <limits.h>.  */
-#include <limits.h>
-#ifndef NAME_MAX
-# define NAME_MAX (sizeof (((struct dirent *) 0)->d_name))
-#endif
-
 #include <alloca.h>
 
 #ifdef _LIBC
@@ -195,8 +141,37 @@ 
 
 static const char *next_brace_sub (const char *begin, int flags) __THROWNL;
 
+/* A representation of a directory entry which does not depend on the
+   layout of struct dirent, or the size of ino_t.  */
+struct abstract_dirent
+{
+  const char *d_name;
+  int d_type;
+  bool skip_entry;
+};
+
 #endif /* !defined _LIBC || !defined GLOB_ONLY_P */
 
+/* Extract name and type from directory entry.  No copy of the name is
+   made.  */
+static void
+convert_dirent (const struct dirent *source, struct abstract_dirent *target)
+{
+  target->d_name = source->d_name;
+#if defined _DIRENT_HAVE_D_TYPE || defined HAVE_STRUCT_DIRENT_D_TYPE
+  target->d_type = source->d_type;
+#else
+  target->d_type = 0;
+#endif
+#if (defined POSIX || defined WINDOWS32) && !defined __GNU_LIBRARY__
+  /* Posix does not require that the d_ino field be present, and some
+     systems do not provide it. */
+  target->skip_entry = false;
+#else
+  target->skip_entry = source->d_ino == 0;
+#endif
+}
+
 #ifndef attribute_hidden
 # define attribute_hidden
 #endif
@@ -1553,56 +1528,31 @@  glob_in_dir (const char *pattern, const char *directory, int flags,
 
 	  while (1)
 	    {
-	      const char *name;
-	      size_t len;
-#if defined _LIBC && !defined COMPILE_GLOB64
-	      struct dirent64 *d;
-	      union
-		{
-		  struct dirent64 d64;
-		  char room [offsetof (struct dirent64, d_name[0])
-			     + NAME_MAX + 1];
-		}
-	      d64buf;
-
-	      if (__glibc_unlikely (flags & GLOB_ALTDIRFUNC))
-		{
-		  struct dirent *d32 = (*pglob->gl_readdir) (stream);
-		  if (d32 != NULL)
-		    {
-		      CONVERT_DIRENT_DIRENT64 (&d64buf.d64, d32);
-		      d = &d64buf.d64;
-		    }
-		  else
-		    d = NULL;
-		}
-	      else
-		d = __readdir64 (stream);
-#else
-	      struct dirent *d = (__builtin_expect (flags & GLOB_ALTDIRFUNC, 0)
-				  ? ((struct dirent *)
-				     (*pglob->gl_readdir) (stream))
-				  : __readdir (stream));
-#endif
-	      if (d == NULL)
-		break;
-	      if (! REAL_DIR_ENTRY (d))
+	      struct abstract_dirent e;
+	      {
+		struct dirent *d = (__builtin_expect (flags & GLOB_ALTDIRFUNC, 0)
+				    ? ((struct dirent *)
+				       (*pglob->gl_readdir) (stream))
+				    : __readdir (stream));
+		if (d == NULL)
+		  break;
+		convert_dirent (d, &e);
+	      }
+	      if (e.skip_entry)
 		continue;
 
 	      /* If we shall match only directories use the information
 		 provided by the dirent call if possible.  */
-	      if ((flags & GLOB_ONLYDIR) && !DIRENT_MIGHT_BE_DIR (d))
+	      if ((flags & GLOB_ONLYDIR) && !DIRENT_MIGHT_BE_DIR (&e))
 		continue;
 
-	      name = d->d_name;
-
-	      if (fnmatch (pattern, name, fnm_flags) == 0)
+	      if (fnmatch (pattern, e.d_name, fnm_flags) == 0)
 		{
 		  /* If the file we found is a symlink we have to
 		     make sure the target file exists.  */
-		  if (!DIRENT_MIGHT_BE_SYMLINK (d)
-		      || link_exists_p (dfd, directory, dirlen, name, pglob,
-					flags))
+		  if (!DIRENT_MIGHT_BE_SYMLINK (&e)
+		      || link_exists_p (dfd, directory, dirlen, e.d_name,
+					pglob, flags))
 		    {
 		      if (cur == names->count)
 			{
@@ -1622,12 +1572,10 @@  glob_in_dir (const char *pattern, const char *directory, int flags,
 			  names = newnames;
 			  cur = 0;
 			}
-		      len = NAMLEN (d);
-		      names->name[cur] = (char *) malloc (len + 1);
+		      names->name[cur] = strdup (e.d_name);
 		      if (names->name[cur] == NULL)
 			goto memory_error;
-		      *((char *) mempcpy (names->name[cur++], name, len))
-			= '\0';
+		      ++cur;
 		      ++nfound;
 		    }
 		}
diff --git a/sysdeps/gnu/glob64.c b/sysdeps/gnu/glob64.c
index d1e4e6f..d53d16d 100644
--- a/sysdeps/gnu/glob64.c
+++ b/sysdeps/gnu/glob64.c
@@ -1,3 +1,20 @@ 
+/* Copyright (C) 1998-2016 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
 #include <dirent.h>
 #include <glob.h>
 #include <sys/stat.h>
@@ -17,8 +34,6 @@ 
 
 #define NO_GLOB_PATTERN_P 1
 
-#define COMPILE_GLOB64	1
-
 #include <posix/glob.c>
 
 libc_hidden_def (glob64)
diff --git a/sysdeps/unix/sysv/linux/i386/glob64.c b/sysdeps/unix/sysv/linux/i386/glob64.c
index b4fcd1a..727a468 100644
--- a/sysdeps/unix/sysv/linux/i386/glob64.c
+++ b/sysdeps/unix/sysv/linux/i386/glob64.c
@@ -1,3 +1,20 @@ 
+/* Copyright (C) 1998-2016 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
 #include <dirent.h>
 #include <glob.h>
 #include <sys/stat.h>
@@ -17,8 +34,6 @@ 
 
 #define NO_GLOB_PATTERN_P 1
 
-#define COMPILE_GLOB64	1
-
 #include <posix/glob.c>
 
 #include "shlib-compat.h"
@@ -43,6 +58,7 @@  int __old_glob64 (const char *__pattern, int __flags,
 #undef glob
 #define glob(pattern, flags, errfunc, pglob) \
   __old_glob64 (pattern, flags, errfunc, pglob)
+#define convert_dirent __old_convert_dirent
 #define glob_in_dir __old_glob_in_dir
 #define GLOB_ATTRIBUTE attribute_compat_text_section