From patchwork Sat Nov 12 11:45:20 2011
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Nathan Sidwell <nathan@acm.org>
X-Patchwork-Id: 125338
Return-Path: 
 <gcc-patches-return-307698-incoming=patchwork.ozlabs.org@gcc.gnu.org>
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Received: from sourceware.org (server1.sourceware.org [209.132.180.131])
	by ozlabs.org (Postfix) with SMTP id B1DBC1007D8
	for <incoming@patchwork.ozlabs.org>;
	Sat, 12 Nov 2011 22:45:48 +1100 (EST)
Received: (qmail 9534 invoked by alias); 12 Nov 2011 11:45:45 -0000
Received: (qmail 9521 invoked by uid 22791); 12 Nov 2011 11:45:42 -0000
X-SWARE-Spam-Status: No, hits=-1.8 required=5.0	tests=AWL, BAYES_00,
	RCVD_IN_DNSWL_LOW, TW_CP
X-Spam-Check-By: sourceware.org
Received: from mail-wy0-f175.google.com (HELO mail-wy0-f175.google.com)
	(74.125.82.175) by sourceware.org (qpsmtpd/0.43rc1) with
	ESMTP; Sat, 12 Nov 2011 11:45:25 +0000
Received: by wyg8 with SMTP id 8so152898wyg.20 for <gcc-patches@gcc.gnu.org>;
	Sat, 12 Nov 2011 03:45:24 -0800 (PST)
Received: by 10.216.14.22 with SMTP id c22mr10063wec.80.1321098323608;
	Sat, 12 Nov 2011 03:45:23 -0800 (PST)
Received: by 10.216.14.22 with SMTP id c22mr10058wec.80.1321098323421;
	Sat, 12 Nov 2011 03:45:23 -0800 (PST)
Received: from [192.168.44.105] (5ac3c889.bb.sky.com. [90.195.200.137]) by
	mx.google.com with ESMTPS id
	en10sm9457187wbb.0.2011.11.12.03.45.21 (version=SSLv3
	cipher=OTHER); Sat, 12 Nov 2011 03:45:22 -0800 (PST)
Message-ID: <4EBE5C50.1020804@acm.org>
Date: Sat, 12 Nov 2011 11:45:20 +0000
From: Nathan Sidwell <nathan@acm.org>
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US;
	rv:1.9.2.23) Gecko/20110922 Lightning/1.0b2 Thunderbird/3.1.15
MIME-Version: 1.0
To: GCC Patches <gcc-patches@gcc.gnu.org>
Subject: gcov patch, multiple paths
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Unsubscribe: 
 <mailto:gcc-patches-unsubscribe-incoming=patchwork.ozlabs.org@gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org

Hi,
I've committed this patch to gcov.  It deals with cases where the same header 
file, containing inline functions etc., is included via multiple different 
pathnames.  The current behavior treats these as distinct sources, which (a) is 
misleading and (b) in the absence of -p means that all but one instance of the 
resulting .gcov file will be overwritten.

We now canonicalize the names of source files by always eliding '.' components 
and eliding 'dir/..' components where possible.  We can't resolve a 'dir/..' 
component when 'dir' is a symlink.  We also canonicalize '\\' path separators 
to '/'.
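
As a rough illustration of the canonicalization described above, here is a 
minimal, purely lexical sketch.  It is not the code from the patch: the name 
canon_sketch and the helper is_sep are hypothetical, and unlike 
canonicalize_name in gcov.c it never consults the file system, so it would 
wrongly collapse 'dir/..' when 'dir' is a symlink.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: treat both '/' and '\\' as separators.  */
static int
is_sep (char c)
{
  return c == '/' || c == '\\';
}

/* Illustrative only: lexically canonicalize NAME.  Elides '.' components,
   collapses 'dir/..' pairs, and writes '/' for every separator.
   Assumption: no symlink check is made (the real patch does one).
   Returns a malloc'd string, or NULL on allocation failure.  */
static char *
canon_sketch (const char *name)
{
  char *result, *ptr, *floor;
  const char *base, *probe;

  assert (name != NULL);
  /* The canonical form is never longer than the input.  */
  result = malloc (strlen (name) + 1);
  if (!result)
    return NULL;

  ptr = result;
  base = name;
  if (is_sep (*base))
    *ptr++ = '/';		/* Keep a leading root separator.  */
  floor = ptr;			/* Nothing before FLOOR may be removed.  */

  while (*base)
    {
      size_t len;

      while (is_sep (*base))
	base++;
      for (probe = base; *probe && !is_sep (*probe); probe++)
	continue;
      len = (size_t) (probe - base);
      if (len == 0)
	break;			/* Only trailing separators were left.  */

      if (len == 1 && base[0] == '.')
	;			/* Elide a '.' component.  */
      else if (len == 2 && base[0] == '.' && base[1] == '.' && ptr != floor)
	{
	  /* Drop the previous component and its joining separator.  */
	  while (ptr != floor && ptr[-1] != '/')
	    ptr--;
	  if (ptr != floor)
	    ptr--;
	}
      else
	{
	  /* Regular component, or a '..' that cannot be collapsed.  */
	  if (ptr != result && ptr[-1] != '/')
	    *ptr++ = '/';
	  memcpy (ptr, base, len);
	  ptr += len;
	  if (len == 2 && base[0] == '.' && base[1] == '.')
	    floor = ptr;	/* A kept '..' must never be removed.  */
	}
      base = probe;
    }
  *ptr = 0;
  return result;
}
```

With this sketch, "a/./b/../x.h" and "a/x.h" both map to "a/x.h", which is 
the property that lets the two spellings share one source entry.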

I discovered some faults with the preserve-pathname mangling, in that it didn't 
match the documented behaviour: '.' wasn't elided, and /../ turned into '#..#^'. 
These are fixed with this patch too.
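
For reference, the documented -p mangling (each '/' becomes '#', and each 
remaining '..' component becomes '^') can be sketched as follows.  This is a 
simplified illustration, not the patch's mangle_name: the name mangle_sketch 
is hypothetical, the input is assumed to be already canonicalized with '/' 
separators, and DOS drive letters are not handled.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative only: mangle an already canonicalized NAME the way the
   -p documentation describes.  Returns a malloc'd string, or NULL on
   allocation failure.  */
static char *
mangle_sketch (const char *name)
{
  char *result, *ptr;
  const char *base, *probe;

  assert (name != NULL);
  /* The mangled form is never longer than the input.  */
  result = malloc (strlen (name) + 1);
  if (!result)
    return NULL;

  ptr = result;
  for (base = name; *base; base = probe)
    {
      size_t len;

      /* Find the end of the current component.  */
      for (probe = base; *probe && *probe != '/'; probe++)
	continue;
      len = (size_t) (probe - base);

      if (len == 2 && base[0] == '.' && base[1] == '.')
	*ptr++ = '^';		/* '..' becomes '^'.  */
      else
	{
	  memcpy (ptr, base, len);
	  ptr += len;
	}

      if (*probe)
	{
	  *ptr++ = '#';		/* '/' becomes '#'.  */
	  probe++;
	}
    }
  *ptr = 0;
  return result;
}
```

Under these assumptions "../inc/x.h" mangles to "^#inc#x.h", to which gcov 
would then append the ".gcov" suffix.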

I updated the documentation to make it clear you can provide either source or 
object filenames to gcov.  Historically it accepted source files, but with 
current inlining and C++ features, it makes more sense to provide the object 
file names, especially in multi-directory builds.

Tested on i686-pc-linux-gnu.

2011-11-12  Nathan Sidwell  <nathan@acm.org>

	* gcov.c (struct name_map): New.
	(names, n_names, a_names): New global vars.
	(print_usage): Adjust usage.
	(generate_results): Canonicalize main file name.
	(release_structures): Adjust.
	(name_search, name_sort): New callbacks.
	(find_source): Look for and create a canonical name.
	(canonicalize_name): New.
	(make_gcov_file_name): Reimplement and fix mangling.
	(mangle_name): New.
	* doc/gcov.texi: Update documentation about path preservation.

	testsuite/
	* gcc.misc-tests/gcov-15.c: New.

Index: doc/gcov.texi
===================================================================
--- doc/gcov.texi	(revision 181300)
+++ doc/gcov.texi	(working copy)
@@ -113,7 +113,7 @@ compatible with any other profiling or t
 @section Invoking @command{gcov}
 
 @smallexample
-gcov @r{[}@var{options}@r{]} @var{sourcefiles}
+gcov @r{[}@var{options}@r{]} @var{files}
 @end smallexample
 
 @command{gcov} accepts the following options:
@@ -176,11 +176,12 @@ Do not create the @command{gcov} output
 @itemx --long-file-names
 Create long file names for included source files.  For example, if the
 header file @file{x.h} contains code, and was included in the file
-@file{a.c}, then running @command{gcov} on the file @file{a.c} will produce
-an output file called @file{a.c##x.h.gcov} instead of @file{x.h.gcov}.
-This can be useful if @file{x.h} is included in multiple source
-files.  If you use the @samp{-p} option, both the including and
-included file names will be complete path names.
+@file{a.c}, then running @command{gcov} on the file @file{a.c} will
+produce an output file called @file{a.c##x.h.gcov} instead of
+@file{x.h.gcov}.  This can be useful if @file{x.h} is included in
+multiple source files and you want to see the individual
+contributions.  If you use the @samp{-p} option, both the including
+and included file names will be complete path names.
 
 @item -p
 @itemx --preserve-paths
@@ -188,9 +189,9 @@ Preserve complete path information in th
 @file{.gcov} files.  Without this option, just the filename component is
 used.  With this option, all directories are used, with @samp{/} characters
 translated to @samp{#} characters, @file{.} directory components
-removed and @file{..}
+removed and unremoveable @file{..}
 components renamed to @samp{^}.  This is useful if sourcefiles are in several
-different directories.  It also affects the @samp{-l} option.
+different directories.
 
 @item -f
 @itemx --function-summaries
@@ -203,9 +204,8 @@ Specify either the directory containing
 object path name.  The @file{.gcno}, and
 @file{.gcda} data files are searched for using this option.  If a directory
 is specified, the data files are in that directory and named after the
-source file name, without its extension.  If a file is specified here,
-the data files are named after that file, without its extension.  If this
-option is not supplied, it defaults to the current directory.
+input file name, without its extension.  If a file is specified here,
+the data files are named after that file, without its extension.
 
 @item -u
 @itemx --unconditional-branches
@@ -223,12 +223,17 @@ when you invoked the compiler.  Otherwis
 the source files.  @command{gcov} produces files called
 @file{@var{mangledname}.gcov} in the current directory.  These contain
 the coverage information of the source file they correspond to.
-One @file{.gcov} file is produced for each source file containing code,
+One @file{.gcov} file is produced for each source (or header) file
+containing code,
 which was compiled to produce the data files.  The @var{mangledname} part
 of the output file name is usually simply the source file name, but can
 be something more complicated if the @samp{-l} or @samp{-p} options are
 given.  Refer to those options for details.
 
+If you invoke @command{gcov} with multiple input files, the
+contributions from each input file are summed.  Typically you would
+invoke it with the same list of files as the final link of your executable.
+
 The @file{.gcov} files contain the @samp{:} separated fields along with
 program source code.  The format is
 
Index: gcov.c
===================================================================
--- gcov.c	(revision 181300)
+++ gcov.c	(working copy)
@@ -231,7 +231,7 @@ typedef struct line_info
 
 typedef struct source_info
 {
-  /* Name of source file.  */
+  /* Canonical name of source file.  */
   char *name;
   time_t file_time;
 
@@ -246,6 +246,12 @@ typedef struct source_info
   function_t *functions;
 } source_t;
 
+typedef struct name_map
+{
+  char *name;  /* Source file name */
+  unsigned src;  /* Source file */
+} name_map_t;
+
 /* Holds a list of function basic block graphs.  */
 
 static function_t *functions;
@@ -255,6 +261,10 @@ static source_t *sources;   /* Array of
 static unsigned n_sources;  /* Number of sources */
 static unsigned a_sources;  /* Allocated sources */
 
+static name_map_t *names;   /* Mapping of file names to sources */
+static unsigned n_names;    /* Number of names */
+static unsigned a_names;    /* Allocated names */
+
 /* This holds data summary information.  */
 
 static unsigned object_runs;
@@ -341,6 +351,9 @@ static void print_version (void) ATTRIBU
 static void process_file (const char *);
 static void generate_results (const char *);
 static void create_file_names (const char *);
+static int name_search (const void *, const void *);
+static int name_sort (const void *, const void *);
+static char *canonicalize_name (const char *);
 static unsigned find_source (const char *);
 static function_t *read_graph_file (void);
 static int read_count_file (function_t *);
@@ -353,6 +366,7 @@ static void accumulate_line_counts (sour
 static int output_branch_count (FILE *, int, const arc_t *);
 static void output_lines (FILE *, const source_t *);
 static char *make_gcov_file_name (const char *, const char *);
+static char *mangle_name (const char *, char *);
 static void release_structures (void);
 static void release_function (function_t *);
 extern int main (int, char **);
@@ -414,7 +428,7 @@ print_usage (int error_p)
   FILE *file = error_p ? stderr : stdout;
   int status = error_p ? FATAL_EXIT_CODE : SUCCESS_EXIT_CODE;
 
-  fnotice (file, "Usage: gcov [OPTION]... SOURCEFILE...\n\n");
+  fnotice (file, "Usage: gcov [OPTION]... SOURCE|OBJ...\n\n");
   fnotice (file, "Print code coverage information.\n\n");
   fnotice (file, "  -h, --help                      Print this help, then exit\n");
   fnotice (file, "  -v, --version                   Print version number, then exit\n");
@@ -524,7 +538,7 @@ process_args (int argc, char **argv)
   return optind;
 }
 
-/* Process a single source file.  */
+/* Process a single input file.  */
 
 static void
 process_file (const char *file_name)
@@ -622,6 +636,16 @@ generate_results (const char *file_name)
 	}
     }
 
+  if (file_name)
+    {
+      name_map_t *name_map = (name_map_t *)bsearch
+	(file_name, names, n_names, sizeof (*names), name_search);
+      if (name_map)
+	file_name = sources[name_map->src].name;
+      else
+	file_name = canonicalize_name (file_name);
+    }
+  
   for (ix = n_sources, src = sources; ix--; src++)
     {
       accumulate_line_counts (src);
@@ -681,10 +705,12 @@ release_structures (void)
   function_t *fn;
 
   for (ix = n_sources; ix--;)
-    {
-      free (sources[ix].name);
-      free (sources[ix].lines);
-    }
+    free (sources[ix].lines);
+  free (sources);
+  
+  for (ix = n_names; ix--;)
+    free (names[ix].name);
+  free (names);
 
   while ((fn = functions))
     {
@@ -761,28 +787,75 @@ create_file_names (const char *file_name
   return;
 }
 
+/* A is a string and B is a pointer to name_map_t.  Compare for file
+   name orderability.  */
+
+static int
+name_search (const void *a_, const void *b_)
+{
+  const char *a = (const char *)a_;
+  const name_map_t *b = (const name_map_t *)b_;
+
+#if HAVE_DOS_BASED_FILE_SYSTEM
+  return strcasecmp (a, b->name);
+#else
+  return strcmp (a, b->name);
+#endif
+}
+
+/* A and B are a pointer to name_map_t.  Compare for file name
+   orderability.  */
+
+static int
+name_sort (const void *a_, const void *b_)
+{
+  const name_map_t *a = (const name_map_t *)a_;
+  return name_search (a->name, b_);
+}
+
 /* Find or create a source file structure for FILE_NAME. Copies
    FILE_NAME on creation */
 
 static unsigned
 find_source (const char *file_name)
 {
-  unsigned ix;
-  source_t *src = 0;
+  name_map_t *name_map;
+  char *canon;
+  unsigned idx;
   struct stat status;
 
   if (!file_name)
     file_name = "<unknown>";
-
-  for (ix = n_sources; ix--;)
-    if (!filename_cmp (file_name, sources[ix].name))
-      {
-	src = &sources[ix];
-	break;
-      }
-
-  if (!src)
-    {
+  name_map = (name_map_t *)bsearch
+    (file_name, names, n_names, sizeof (*names), name_search);
+  if (name_map)
+    {
+      idx = name_map->src;
+      goto check_date;
+    }
+
+  if (n_names + 2 > a_names)
+    {
+      /* Extend the name map array -- we'll be inserting one or two
+	 entries.  */
+      if (!a_names)
+	a_names = 10;
+      a_names *= 2;
+      name_map = XNEWVEC (name_map_t, a_names);
+      memcpy (name_map, names, n_names * sizeof (*names));
+      free (names);
+      names = name_map;
+    }
+  
+  /* Not found, try the canonical name. */
+  canon = canonicalize_name (file_name);
+  name_map = (name_map_t *)bsearch
+    (canon, names, n_names, sizeof (*names), name_search);
+  if (!name_map)
+    {
+      /* Not found with canonical name, create a new source.  */
+      source_t *src;
+      
       if (n_sources == a_sources)
 	{
 	  if (!a_sources)
@@ -793,31 +866,51 @@ find_source (const char *file_name)
 	  free (sources);
 	  sources = src;
 	}
-      ix = n_sources;
-      src = &sources[ix];
-      src->name = xstrdup (file_name);
+
+      idx = n_sources;
+
+      name_map = &names[n_names++];
+      name_map->name = canon;
+      name_map->src = idx;
+
+      src = &sources[n_sources++];
+      memset (src, 0, sizeof (*src));
+      src->name = canon;
       src->coverage.name = src->name;
-      n_sources++;
-      if (!stat (file_name, &status))
+      if (!stat (src->name, &status))
 	src->file_time = status.st_mtime;
     }
+  else
+    idx = name_map->src;
+
+  if (name_search (file_name, name_map))
+    {
+      /* Append the non-canonical name.  */
+      name_map = &names[n_names++];
+      name_map->name = xstrdup (file_name);
+      name_map->src = idx;
+    }
 
-  if (src->file_time > bbg_file_time)
+  /* Resort the name map.  */
+  qsort (names, n_names, sizeof (*names), name_sort);
+  
+ check_date:
+  if (sources[idx].file_time > bbg_file_time)
     {
       static int info_emitted;
 
       fnotice (stderr, "%s:source file is newer than graph file '%s'\n",
-	       src->name, bbg_file_name);
+	       file_name, bbg_file_name);
       if (!info_emitted)
 	{
 	  fnotice (stderr,
 		   "(the message is only displayed one per source file)\n");
 	  info_emitted = 1;
 	}
-      src->file_time = 0;
+      sources[idx].file_time = 0;
     }
 
-  return ix;
+  return idx;
 }
 
 /* Read the graph file.  Return list of functions read -- in reverse order.  */
@@ -1510,97 +1603,169 @@ function_summary (const coverage_t *cove
     }
 }
 
-/* Generate an output file name. LONG_OUTPUT_NAMES and PRESERVE_PATHS
-   affect name generation. With preserve_paths we create a filename
-   from all path components of the source file, replacing '/' with
-   '#', without it we simply take the basename component. With
+/* Canonicalize the filename NAME by canonicalizing directory
+   separators, eliding . components and resolving .. components
+   appropriately.  Always returns a unique string.  */
+
+static char *
+canonicalize_name (const char *name)
+{
+  /* The canonical name cannot be longer than the incoming name.  */
+  char *result = XNEWVEC (char, strlen (name) + 1);
+  const char *base = name, *probe;
+  char *ptr = result;
+  char *dd_base;
+  int slash = 0;
+
+#if HAVE_DOS_BASED_FILE_SYSTEM
+  if (base[0] && base[1] == ':')
+    {
+      result[0] = base[0];
+      result[1] = ':';
+      base += 2;
+      ptr += 2;
+    }
+#endif
+  for (dd_base = ptr; *base; base = probe)
+    {
+      size_t len;
+      
+      for (probe = base; *probe; probe++)
+	if (IS_DIR_SEPARATOR (*probe))
+	  break;
+
+      len = probe - base;
+      if (len == 1 && base[0] == '.')
+	/* Elide a '.' directory */
+	;
+      else if (len == 2 && base[0] == '.' && base[1] == '.')
+	{
+	  /* '..', we can only elide it and the previous directory, if
+	     we're not a symlink.  */
+	  struct stat buf;
+	  
+	  *ptr = 0;
+	  if (dd_base == ptr || stat (result, &buf) || S_ISLNK (buf.st_mode))
+	    {
+	      /* Cannot elide, or unreadable or a symlink.  */
+	      dd_base = ptr + 2 + slash;
+	      goto regular;
+	    }
+	  while (ptr != dd_base && *ptr != '/')
+	    ptr--;
+	  slash = ptr != result;
+	}
+      else
+	{
+	regular:
+	  /* Regular pathname component.  */
+	  if (slash)
+	    *ptr++ = '/';
+	  memcpy (ptr, base, len);
+	  ptr += len;
+	  slash = 1;
+	}
+
+      for (; IS_DIR_SEPARATOR (*probe); probe++)
+	continue;
+    }
+  *ptr = 0;
+
+  return result;
+}
+
+/* Generate an output file name. INPUT_NAME is the canonicalized main
+   input file and SRC_NAME is the canonicalized file name.
+   LONG_OUTPUT_NAMES and PRESERVE_PATHS affect name generation.  With
    long_output_names we prepend the processed name of the input file
    to each output name (except when the current source file is the
    input file, so you don't get a double concatenation). The two
-   components are separated by '##'. Also '.' filename components are
-   removed and '..'  components are renamed to '^'.  */
+   components are separated by '##'.  With preserve_paths we create a
+   filename from all path components of the source file, replacing '/'
+   with '#', and .. with '^', without it we simply take the basename
+   component.  (Remember, the canonicalized name will already have
+   elided '.' components and converted \\ separators.)  */
 
 static char *
 make_gcov_file_name (const char *input_name, const char *src_name)
 {
-  const char *cptr;
-  char *name;
+  char *ptr;
+  char *result;
 
   if (flag_long_names && input_name && strcmp (src_name, input_name))
     {
-      name = XNEWVEC (char, strlen (src_name) + strlen (input_name) + 10);
-      name[0] = 0;
       /* Generate the input filename part.  */
-      cptr = flag_preserve_paths ? NULL : lbasename (input_name);
-      strcat (name, cptr ? cptr : input_name);
-      strcat (name, "##");
+      result = XNEWVEC (char, strlen (input_name) + strlen (src_name) + 10);
+  
+      ptr = result;
+      ptr = mangle_name (input_name, ptr);
+      ptr[0] = ptr[1] = '#';
+      ptr += 2;
     }
   else
     {
-      name = XNEWVEC (char, strlen (src_name) + 10);
-      name[0] = 0;
+      result = XNEWVEC (char, strlen (src_name) + 10);
+      ptr = result;
     }
 
-  /* Generate the source filename part.  */
-
-  cptr = flag_preserve_paths ? NULL : lbasename (src_name);
-  strcat (name, cptr ? cptr : src_name);
+  ptr = mangle_name (src_name, ptr);
+  strcpy (ptr, ".gcov");
+  
+  return result;
+}
 
-  if (flag_preserve_paths)
+static char *
+mangle_name (char const *base, char *ptr)
+{
+  size_t len;
+  
+  /* Generate the source filename part.  */
+  if (!flag_preserve_paths)
     {
-      /* Convert '/' and '\' to '#', remove '/./', convert '/../' to '#^#',
+      base = lbasename (base);
+      len = strlen (base);
+      memcpy (ptr, base, len);
+      ptr += len;
+    }
+  else
+    {
+      /* Convert '/' to '#', convert '..' to '^',
 	 convert ':' to '~' on DOS based file system.  */
-      char *pnew = name, *pold = name;
-
-      /* First check for leading drive separator.  */
+      const char *probe;
 
-      while (*pold != '\0')
+#if HAVE_DOS_BASED_FILE_SYSTEM
+      if (base[0] && base[1] == ':')
 	{
-#if defined (HAVE_DOS_BASED_FILE_SYSTEM)
-	  if (*pold == ':')
-	    {
-	      *pnew++ = '~';
-	      pold++;
-	    }
-	  else
+	  ptr[0] = base[0];
+	  ptr[1] = '~';
+	  ptr += 2;
+	  base += 2;
+	}
 #endif
-	  if ((*pold == '/'
-		    && (strstr (pold, "/./") == pold
-		        || strstr (pold, "/.\\") == pold))
-		   || (*pold == '\\'
-		       && (strstr (pold, "\\.\\") == pold
-		           || strstr (pold, "\\./") == pold)))
-	      pold += 3;
-	  else if (*pold == '/'
-		   && (strstr (pold, "/../") == pold
-		       || strstr (pold, "/..\\") == pold))
-	    {
-	      strcpy (pnew, "#^#");
-	      pnew += 3;
-	      pold += 4;
-	    }
-	  else if (*pold == '\\'
-		   && (strstr (pold, "\\..\\") == pold
-		       || strstr (pold, "\\../") == pold))
+      for (; *base; base = probe)
+	{
+	  size_t len;
+
+	  for (probe = base; *probe; probe++)
+	    if (*probe == '/')
+	      break;
+	  len = probe - base;
+	  if (len == 2 && base[0] == '.' && base[1] == '.')
+	    *ptr++ = '^';
+	  else
 	    {
-	      strcpy (pnew, "#^#");
-	      pnew += 3;
-	      pold += 4;
+	      memcpy (ptr, base, len);
+	      ptr += len;
 	    }
-	  else if (*pold == '/' || *pold == '\\')
+	  if (*probe)
 	    {
-	      *pnew++ = '#';
-	      pold++;
+	      *ptr++ = '#';
+	      probe++;
 	    }
-	  else
-	    *pnew++ = *pold++;
 	}
-
-      *pnew = '\0';
     }
-
-  strcat (name, ".gcov");
-  return name;
+  
+  return ptr;
 }
 
 /* Scan through the bb_data for each line in the block, increment
Index: testsuite/gcc.misc-tests/gcov-15.c
===================================================================
--- testsuite/gcc.misc-tests/gcov-15.c	(revision 0)
+++ testsuite/gcc.misc-tests/gcov-15.c	(revision 0)
@@ -0,0 +1,30 @@
+/* Test gcov multiple paths to file.  */
+
+/* { dg-options "-fprofile-arcs -ftest-coverage" } */
+/* { dg-do run { target native } } */
+
+#if !RECURSIVE
+#define RECURSIVE 1
+#include "./gcov-15.c"
+#undef RECURSIVE
+#endif
+
+static void __attribute__ ((noinline)) Recursive (void);
+
+
+#if RECURSIVE
+static void __attribute__ ((noinline))
+Recursive ()
+{
+  return; /* count(1) */
+}
+
+#else
+int main ()
+{
+  Recursive (); /* count(1) */
+  return 0;
+}
+#endif
+
+/* { dg-final { run-gcov { -a gcov-15.c } } } */