From patchwork Sat Nov 12 11:45:20 2011 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nathan Sidwell X-Patchwork-Id: 125338 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) by ozlabs.org (Postfix) with SMTP id B1DBC1007D8 for ; Sat, 12 Nov 2011 22:45:48 +1100 (EST) Received: (qmail 9534 invoked by alias); 12 Nov 2011 11:45:45 -0000 Received: (qmail 9521 invoked by uid 22791); 12 Nov 2011 11:45:42 -0000 X-SWARE-Spam-Status: No, hits=-1.8 required=5.0 tests=AWL, BAYES_00, RCVD_IN_DNSWL_LOW, TW_CP X-Spam-Check-By: sourceware.org Received: from mail-wy0-f175.google.com (HELO mail-wy0-f175.google.com) (74.125.82.175) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Sat, 12 Nov 2011 11:45:25 +0000 Received: by wyg8 with SMTP id 8so152898wyg.20 for ; Sat, 12 Nov 2011 03:45:24 -0800 (PST) Received: by 10.216.14.22 with SMTP id c22mr10063wec.80.1321098323608; Sat, 12 Nov 2011 03:45:23 -0800 (PST) Received: by 10.216.14.22 with SMTP id c22mr10058wec.80.1321098323421; Sat, 12 Nov 2011 03:45:23 -0800 (PST) Received: from [192.168.44.105] (5ac3c889.bb.sky.com. [90.195.200.137]) by mx.google.com with ESMTPS id en10sm9457187wbb.0.2011.11.12.03.45.21 (version=SSLv3 cipher=OTHER); Sat, 12 Nov 2011 03:45:22 -0800 (PST) Message-ID: <4EBE5C50.1020804@acm.org> Date: Sat, 12 Nov 2011 11:45:20 +0000 From: Nathan Sidwell User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.23) Gecko/20110922 Lightning/1.0b2 Thunderbird/3.1.15 MIME-Version: 1.0 To: GCC Patches Subject: gcov patch, multiple paths Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Hi, I've committed this patch to gcov, It deals with cases where the same header file, containing inline functions etc, is included by multiple different pathnames. The current behavior treats these as distinct sources, which is (a) misleading and (b) in the absence of -p all but one instance of the resulting .gcov file will be overwritten. We now canonicalize the names of source files by eliding '.' components always and eliding 'dir/..' components where possible. We can't resolve a 'dir/..' component when 'dir' is a symlink. We also canonicalize \\ path separators. I discovered some faults with the preserve-pathname mangling, in that it didn't match the documented behaviour ('.' wasn't elided, and /../ turned into '#..#^'. These are fixed with this patch too. I updated the documentation to make it clear you can provide either source or object filenames to gcov. Historically it accepted source files, but with current inlining and C++ features, it makes more sense to provide the object file names, especially in multi-directory builds. tested on i686-pc-linux-gnu 2011-11-12 Nathan Sidwell * gcov.c (struct name_map): New. (names, n_names, a_names): New global vars. (print_usage): Adjust usage. (generate_results): Canonicalize main file name. (release_structures): Adjust. (name_search, name_sort): New callbacks. (find_source): Look for and create a canonical name. (canonicalize_name): New. (make_gcov_file_name): Reimplement and fix mangling. (mangle_name): New. * doc/gcov.texi: Update documentation about path preservation. testsuite/ * gcc.misc-tests/gcov-15.c: New. Index: doc/gcov.texi =================================================================== --- doc/gcov.texi (revision 181300) +++ doc/gcov.texi (working copy) @@ -113,7 +113,7 @@ compatible with any other profiling or t @section Invoking @command{gcov} @smallexample -gcov @r{[}@var{options}@r{]} @var{sourcefiles} +gcov @r{[}@var{options}@r{]} @var{files} @end smallexample @command{gcov} accepts the following options: @@ -176,11 +176,12 @@ Do not create the @command{gcov} output @itemx --long-file-names Create long file names for included source files. For example, if the header file @file{x.h} contains code, and was included in the file -@file{a.c}, then running @command{gcov} on the file @file{a.c} will produce -an output file called @file{a.c##x.h.gcov} instead of @file{x.h.gcov}. -This can be useful if @file{x.h} is included in multiple source -files. If you use the @samp{-p} option, both the including and -included file names will be complete path names. +@file{a.c}, then running @command{gcov} on the file @file{a.c} will +produce an output file called @file{a.c##x.h.gcov} instead of +@file{x.h.gcov}. This can be useful if @file{x.h} is included in +multiple source files and you want to see the individual +contributions. If you use the @samp{-p} option, both the including +and included file names will be complete path names. @item -p @itemx --preserve-paths @@ -188,9 +189,9 @@ Preserve complete path information in th @file{.gcov} files. Without this option, just the filename component is used. With this option, all directories are used, with @samp{/} characters translated to @samp{#} characters, @file{.} directory components -removed and @file{..} +removed and unremoveable @file{..} components renamed to @samp{^}. This is useful if sourcefiles are in several -different directories. It also affects the @samp{-l} option. +different directories. @item -f @itemx --function-summaries @@ -203,9 +204,8 @@ Specify either the directory containing object path name. The @file{.gcno}, and @file{.gcda} data files are searched for using this option. If a directory is specified, the data files are in that directory and named after the -source file name, without its extension. If a file is specified here, -the data files are named after that file, without its extension. If this -option is not supplied, it defaults to the current directory. +input file name, without its extension. If a file is specified here, +the data files are named after that file, without its extension. @item -u @itemx --unconditional-branches @@ -223,12 +223,17 @@ when you invoked the compiler. Otherwis the source files. @command{gcov} produces files called @file{@var{mangledname}.gcov} in the current directory. These contain the coverage information of the source file they correspond to. -One @file{.gcov} file is produced for each source file containing code, +One @file{.gcov} file is produced for each source (or header) file +containing code, which was compiled to produce the data files. The @var{mangledname} part of the output file name is usually simply the source file name, but can be something more complicated if the @samp{-l} or @samp{-p} options are given. Refer to those options for details. +If you invoke @command{gcov} with multiple input files, the +contributions from each input file are summed. Typically you would +invoke it with the same list of files as the final link of your executable. + The @file{.gcov} files contain the @samp{:} separated fields along with program source code. The format is Index: gcov.c =================================================================== --- gcov.c (revision 181300) +++ gcov.c (working copy) @@ -231,7 +231,7 @@ typedef struct line_info typedef struct source_info { - /* Name of source file. */ + /* Canonical name of source file. */ char *name; time_t file_time; @@ -246,6 +246,12 @@ typedef struct source_info function_t *functions; } source_t; +typedef struct name_map +{ + char *name; /* Source file name */ + unsigned src; /* Source file */ +} name_map_t; + /* Holds a list of function basic block graphs. */ static function_t *functions; @@ -255,6 +261,10 @@ static source_t *sources; /* Array of static unsigned n_sources; /* Number of sources */ static unsigned a_sources; /* Allocated sources */ +static name_map_t *names; /* Mapping of file names to sources */ +static unsigned n_names; /* Number of names */ +static unsigned a_names; /* Allocated names */ + /* This holds data summary information. */ static unsigned object_runs; @@ -341,6 +351,9 @@ static void print_version (void) ATTRIBU static void process_file (const char *); static void generate_results (const char *); static void create_file_names (const char *); +static int name_search (const void *, const void *); +static int name_sort (const void *, const void *); +static char *canonicalize_name (const char *); static unsigned find_source (const char *); static function_t *read_graph_file (void); static int read_count_file (function_t *); @@ -353,6 +366,7 @@ static void accumulate_line_counts (sour static int output_branch_count (FILE *, int, const arc_t *); static void output_lines (FILE *, const source_t *); static char *make_gcov_file_name (const char *, const char *); +static char *mangle_name (const char *, char *); static void release_structures (void); static void release_function (function_t *); extern int main (int, char **); @@ -414,7 +428,7 @@ print_usage (int error_p) FILE *file = error_p ? stderr : stdout; int status = error_p ? FATAL_EXIT_CODE : SUCCESS_EXIT_CODE; - fnotice (file, "Usage: gcov [OPTION]... SOURCEFILE...\n\n"); + fnotice (file, "Usage: gcov [OPTION]... SOURCE|OBJ...\n\n"); fnotice (file, "Print code coverage information.\n\n"); fnotice (file, " -h, --help Print this help, then exit\n"); fnotice (file, " -v, --version Print version number, then exit\n"); @@ -524,7 +538,7 @@ process_args (int argc, char **argv) return optind; } -/* Process a single source file. */ +/* Process a single input file. */ static void process_file (const char *file_name) @@ -622,6 +636,16 @@ generate_results (const char *file_name) } } + if (file_name) + { + name_map_t *name_map = (name_map_t *)bsearch + (file_name, names, n_names, sizeof (*names), name_search); + if (name_map) + file_name = sources[name_map->src].name; + else + file_name = canonicalize_name (file_name); + } + for (ix = n_sources, src = sources; ix--; src++) { accumulate_line_counts (src); @@ -681,10 +705,12 @@ release_structures (void) function_t *fn; for (ix = n_sources; ix--;) - { - free (sources[ix].name); - free (sources[ix].lines); - } + free (sources[ix].lines); + free (sources); + + for (ix = n_names; ix--;) + free (names[ix].name); + free (names); while ((fn = functions)) { @@ -761,28 +787,75 @@ create_file_names (const char *file_name return; } +/* A is a string and B is a pointer to name_map_t. Compare for file + name orderability. */ + +static int +name_search (const void *a_, const void *b_) +{ + const char *a = (const char *)a_; + const name_map_t *b = (const name_map_t *)b_; + +#if HAVE_DOS_BASED_FILE_SYSTEM + return strcasecmp (a, b->name); +#else + return strcmp (a, b->name); +#endif +} + +/* A and B are a pointer to name_map_t. Compare for file name + orderability. */ + +static int +name_sort (const void *a_, const void *b_) +{ + const name_map_t *a = (const name_map_t *)a_; + return name_search (a->name, b_); +} + /* Find or create a source file structure for FILE_NAME. Copies FILE_NAME on creation */ static unsigned find_source (const char *file_name) { - unsigned ix; - source_t *src = 0; + name_map_t *name_map; + char *canon; + unsigned idx; struct stat status; if (!file_name) file_name = ""; - - for (ix = n_sources; ix--;) - if (!filename_cmp (file_name, sources[ix].name)) - { - src = &sources[ix]; - break; - } - - if (!src) - { + name_map = (name_map_t *)bsearch + (file_name, names, n_names, sizeof (*names), name_search); + if (name_map) + { + idx = name_map->src; + goto check_date; + } + + if (n_names + 2 > a_names) + { + /* Extend the name map array -- we'll be inserting one or two + entries. */ + if (!a_names) + a_names = 10; + a_names *= 2; + name_map = XNEWVEC (name_map_t, a_names); + memcpy (name_map, names, n_names * sizeof (*names)); + free (names); + names = name_map; + } + + /* Not found, try the canonical name. */ + canon = canonicalize_name (file_name); + name_map = (name_map_t *)bsearch + (canon, names, n_names, sizeof (*names), name_search); + if (!name_map) + { + /* Not found with canonical name, create a new source. */ + source_t *src; + if (n_sources == a_sources) { if (!a_sources) @@ -793,31 +866,51 @@ find_source (const char *file_name) free (sources); sources = src; } - ix = n_sources; - src = &sources[ix]; - src->name = xstrdup (file_name); + + idx = n_sources; + + name_map = &names[n_names++]; + name_map->name = canon; + name_map->src = idx; + + src = &sources[n_sources++]; + memset (src, 0, sizeof (*src)); + src->name = canon; src->coverage.name = src->name; - n_sources++; - if (!stat (file_name, &status)) + if (!stat (src->name, &status)) src->file_time = status.st_mtime; } + else + idx = name_map->src; + + if (name_search (file_name, name_map)) + { + /* Append the non-canonical name. */ + name_map = &names[n_names++]; + name_map->name = xstrdup (file_name); + name_map->src = idx; + } - if (src->file_time > bbg_file_time) + /* Resort the name map. */ + qsort (names, n_names, sizeof (*names), name_sort); + + check_date: + if (sources[idx].file_time > bbg_file_time) { static int info_emitted; fnotice (stderr, "%s:source file is newer than graph file '%s'\n", - src->name, bbg_file_name); + file_name, bbg_file_name); if (!info_emitted) { fnotice (stderr, "(the message is only displayed one per source file)\n"); info_emitted = 1; } - src->file_time = 0; + sources[idx].file_time = 0; } - return ix; + return idx; } /* Read the graph file. Return list of functions read -- in reverse order. */ @@ -1510,97 +1603,169 @@ function_summary (const coverage_t *cove } } -/* Generate an output file name. LONG_OUTPUT_NAMES and PRESERVE_PATHS - affect name generation. With preserve_paths we create a filename - from all path components of the source file, replacing '/' with - '#', without it we simply take the basename component. With +/* Canonicalize the filename NAME by canonicalizing directory + separators, eliding . components and resolving .. components + appropriately. Always returns a unique string. */ + +static char * +canonicalize_name (const char *name) +{ + /* The canonical name cannot be longer than the incoming name. */ + char *result = XNEWVEC (char, strlen (name) + 1); + const char *base = name, *probe; + char *ptr = result; + char *dd_base; + int slash = 0; + +#if HAVE_DOS_BASED_FILE_SYSTEM + if (base[0] && base[1] == ':') + { + result[0] = base[0]; + result[1] = ':'; + base += 2; + ptr += 2; + } +#endif + for (dd_base = ptr; *base; base = probe) + { + size_t len; + + for (probe = base; *probe; probe++) + if (IS_DIR_SEPARATOR (*probe)) + break; + + len = probe - base; + if (len == 1 && base[0] == '.') + /* Elide a '.' directory */ + ; + else if (len == 2 && base[0] == '.' && base[1] == '.') + { + /* '..', we can only elide it and the previous directory, if + we're not a symlink. */ + struct stat buf; + + *ptr = 0; + if (dd_base == ptr || stat (result, &buf) || S_ISLNK (buf.st_mode)) + { + /* Cannot elide, or unreadable or a symlink. */ + dd_base = ptr + 2 + slash; + goto regular; + } + while (ptr != dd_base && *ptr != '/') + ptr--; + slash = ptr != result; + } + else + { + regular: + /* Regular pathname component. */ + if (slash) + *ptr++ = '/'; + memcpy (ptr, base, len); + ptr += len; + slash = 1; + } + + for (; IS_DIR_SEPARATOR (*probe); probe++) + continue; + } + *ptr = 0; + + return result; +} + +/* Generate an output file name. INPUT_NAME is the canonicalized main + input file and SRC_NAME is the canonicalized file name. + LONG_OUTPUT_NAMES and PRESERVE_PATHS affect name generation. With long_output_names we prepend the processed name of the input file to each output name (except when the current source file is the input file, so you don't get a double concatenation). The two - components are separated by '##'. Also '.' filename components are - removed and '..' components are renamed to '^'. */ + components are separated by '##'. With preserve_paths we create a + filename from all path components of the source file, replacing '/' + with '#', and .. with '^', without it we simply take the basename + component. (Remember, the canonicalized name will already have + elided '.' components and converted \\ separators.) */ static char * make_gcov_file_name (const char *input_name, const char *src_name) { - const char *cptr; - char *name; + char *ptr; + char *result; if (flag_long_names && input_name && strcmp (src_name, input_name)) { - name = XNEWVEC (char, strlen (src_name) + strlen (input_name) + 10); - name[0] = 0; /* Generate the input filename part. */ - cptr = flag_preserve_paths ? NULL : lbasename (input_name); - strcat (name, cptr ? cptr : input_name); - strcat (name, "##"); + result = XNEWVEC (char, strlen (input_name) + strlen (src_name) + 10); + + ptr = result; + ptr = mangle_name (input_name, ptr); + ptr[0] = ptr[1] = '#'; + ptr += 2; } else { - name = XNEWVEC (char, strlen (src_name) + 10); - name[0] = 0; + result = XNEWVEC (char, strlen (src_name) + 10); + ptr = result; } - /* Generate the source filename part. */ - - cptr = flag_preserve_paths ? NULL : lbasename (src_name); - strcat (name, cptr ? cptr : src_name); + ptr = mangle_name (src_name, ptr); + strcpy (ptr, ".gcov"); + + return result; +} - if (flag_preserve_paths) +static char * +mangle_name (char const *base, char *ptr) +{ + size_t len; + + /* Generate the source filename part. */ + if (!flag_preserve_paths) { - /* Convert '/' and '\' to '#', remove '/./', convert '/../' to '#^#', + base = lbasename (base); + len = strlen (base); + memcpy (ptr, base, len); + ptr += len; + } + else + { + /* Convert '/' to '#', convert '..' to '^', convert ':' to '~' on DOS based file system. */ - char *pnew = name, *pold = name; - - /* First check for leading drive separator. */ + const char *probe; - while (*pold != '\0') +#if HAVE_DOS_BASED_FILE_SYSTEM + if (base[0] && base[1] == ':') { -#if defined (HAVE_DOS_BASED_FILE_SYSTEM) - if (*pold == ':') - { - *pnew++ = '~'; - pold++; - } - else + ptr[0] = base[0]; + ptr[1] = '~'; + ptr += 2; + base += 2; + } #endif - if ((*pold == '/' - && (strstr (pold, "/./") == pold - || strstr (pold, "/.\\") == pold)) - || (*pold == '\\' - && (strstr (pold, "\\.\\") == pold - || strstr (pold, "\\./") == pold))) - pold += 3; - else if (*pold == '/' - && (strstr (pold, "/../") == pold - || strstr (pold, "/..\\") == pold)) - { - strcpy (pnew, "#^#"); - pnew += 3; - pold += 4; - } - else if (*pold == '\\' - && (strstr (pold, "\\..\\") == pold - || strstr (pold, "\\../") == pold)) + for (; *base; base = probe) + { + size_t len; + + for (probe = base; *probe; probe++) + if (*probe == '/') + break; + len = probe - base; + if (len == 2 && base[0] == '.' && base[1] == '.') + *ptr++ = '^'; + else { - strcpy (pnew, "#^#"); - pnew += 3; - pold += 4; + memcpy (ptr, base, len); + ptr += len; } - else if (*pold == '/' || *pold == '\\') + if (*probe) { - *pnew++ = '#'; - pold++; + *ptr++ = '#'; + probe++; } - else - *pnew++ = *pold++; } - - *pnew = '\0'; } - - strcat (name, ".gcov"); - return name; + + return ptr; } /* Scan through the bb_data for each line in the block, increment Index: testsuite/gcc.misc-tests/gcov-15.c =================================================================== --- testsuite/gcc.misc-tests/gcov-15.c (revision 0) +++ testsuite/gcc.misc-tests/gcov-15.c (revision 0) @@ -0,0 +1,30 @@ +/* Test gcov multiple paths to file. */ + +/* { dg-options "-fprofile-arcs -ftest-coverage" } */ +/* { dg-do run { target native } } */ + +#if !RECURSIVE +#define RECURSIVE 1 +#include "./gcov-15.c" +#undef RECURSIVE +#endif + +static void __attribute__ ((noinline)) Recursive (void); + + +#if RECURSIVE +static void __attribute__ ((noinline)) +Recursive () +{ + return; /* count(1) */ +} + +#else +int main () +{ + Recursive (); /* count(1) */ + return 0; +} +#endif + +/* { dg-final { run-gcov { -a gcov-15.c } } } */