[2/2] Prime path coverage in gcc/gcov

Message ID 20240711071042.2484895-2-j@lambda.is
State New
Series [1/2] gcov: Cache source files

Commit Message

Jørgen Kvalsvik July 11, 2024, 7:10 a.m. UTC
These are the main highlights since v2:

1. Significant performance improvement in finding prime paths --
   primarily from reducing work when merging tries, but also smaller
   optimizations.

2. JSON output

3. Much improved gcov output, including inlining-aware reporting. See
   the demo.

4. Flag for giving up after #-of-paths -- see discussion.

5. Minor bugfixes, destructors, refactoring, helpers etc.

Compile times are still brutal, but much better than before. It doesn't
seem like I emit more instructions than I absolutely have to (but I
would like a counterexample here). I don't know if there is much that
can be done about this other than general speed improvements in
verify-ssa, verify-gimple, verify-control-flow and the like.

The feature is shaping up nicely, and I think it would be ok to review
now. After some feedback I will write the manual entry for it.

--

This patch adds prime path coverage to gcc/gcov. First, a quick
introduction to path coverage, then a few words on the pieces of the
patch.

PRIME PATHS

Path coverage records the paths taken through the program. Here is a
simple example:

if (cond1)  BB 1
  then1 ()  BB 2
else
  else1 ()  BB 3

if (cond2)  BB 4
  then2 ()  BB 5
else
  else2 ()  BB 6

_           BB 7

To cover all paths you must run {then1 then2}, {then1 else2}, {else1 then2},
{else1 else2}. This is in contrast with line/statement coverage, where it is
sufficient to execute then2, and it does not matter whether it was reached
through then1 or else1.

1 2 4 5 7
1 2 4 6 7
1 3 4 5 7
1 3 4 6 7

This gets more complicated with loops, because 0, 1, 2, ..., N iterations are
all different paths. There are different ways of addressing this, a promising
one being prime paths. A prime path is a simple path (a path with no repeated
vertices except for the first/last in a cycle) that does not appear as a subpath
of any other simple path. Prime paths seem to strike a decent balance between
the number of tests, path growth, and loop coverage. Of course, the number of
paths still grows very fast with program complexity -- for example, this
program has 14 prime paths:

  while (a)
    {
      if (b)
        return;
      while (c--)
        a++;
    }

--

ALGORITHM

Since the number of paths grows so fast, we need a good algorithm. The naive
approach of generating all paths and discarding redundancies (see
reference_prime_paths in the diff) simply doesn't complete for even pretty
simple functions with a few tens of thousands of paths (granted, the
implementation is also poor, but it only serves as a reference). Fazli &
Afsharchi, in their paper "Time and Space-Efficient Compositional Method for
Prime and Test Paths Generation", describe a neat algorithm which drastically
improves on this and brings the complexity down to something manageable. This
patch implements that algorithm with a few minor tweaks.
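
To make the naive approach concrete, here is a rough standalone sketch
(an illustration only, not the patch's reference_prime_paths): enumerate
every simple path of the first example graph with a DFS, then keep only
the paths that do not occur as a subpath of another. It prints the four
prime paths listed above.

  #include <algorithm>
  #include <cstdio>
  #include <map>
  #include <vector>

  using path = std::vector<int>;

  /* Collect PREFIX and every simple path extending it.  */
  static void
  walk (const std::map<int, std::vector<int>> &succs, path &prefix,
        std::vector<path> &out)
  {
    out.push_back (prefix);
    auto it = succs.find (prefix.back ());
    if (it == succs.end ())
      return;
    for (int next : it->second)
      {
        /* Simple path: no repeated vertices (a full implementation would
           also allow next == prefix.front () to close a cycle).  */
        if (std::find (prefix.begin (), prefix.end (), next) != prefix.end ())
          continue;
        prefix.push_back (next);
        walk (succs, prefix, out);
        prefix.pop_back ();
      }
  }

  /* Check if NEEDLE occurs as a contiguous subpath of HAYSTACK.  */
  static bool
  subpath_p (const path &needle, const path &haystack)
  {
    return std::search (haystack.begin (), haystack.end (),
                        needle.begin (), needle.end ()) != haystack.end ();
  }

  int
  main ()
  {
    /* The first example: 1 -> {2,3} -> 4 -> {5,6} -> 7.  */
    std::map<int, std::vector<int>> succs
      = { {1, {2, 3}}, {2, {4}}, {3, {4}}, {4, {5, 6}}, {5, {7}}, {6, {7}} };

    std::vector<path> simple;
    for (const auto &kv : succs)
      {
        path p { kv.first };
        walk (succs, p, simple);
      }
    simple.push_back (path { 7 });

    /* A prime path is a simple path that is not a subpath of any other.  */
    for (const path &candidate : simple)
      {
        bool sub = false;
        for (const path &other : simple)
          if (&candidate != &other && subpath_p (candidate, other))
            {
              sub = true;
              break;
            }
        if (sub)
          continue;
        for (int bb : candidate)
          printf ("%d ", bb);
        printf ("\n");
      }
  }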

Fazli & Afsharchi's algorithm first finds the strongly connected components
(SCCs) of the graph and creates a new graph where the vertices are the SCCs of
the CFG. Within these vertices different paths are found -- regular prime
paths, paths that start in the SCC's entries, and paths that end in the SCC's
exits. These per-SCC paths are combined with paths through the CFG, which
greatly reduces the number of paths that have to be evaluated only to be
thrown away.

Using this algorithm we can generate the prime paths for somewhat complicated
functions in a reasonable time. This is the prime_paths function. Note that
some functions don't benefit from this at all: we still need to find the prime
paths within each SCC, so if a single SCC is very large the function
degenerates to the naive implementation. Improving on this is a later project.
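
Since prime-paths.cc is too long to walk through here, this is a small
standalone sketch of only the first step (the SCC condensation), on a
made-up four-block graph with one loop; the per-SCC path generation and
the composition step from the paper are not shown.

  #include <cstdio>
  #include <set>
  #include <utility>
  #include <vector>

  /* First pass of Kosaraju's algorithm: record vertices in order of
     completed DFS.  */
  static void
  order_dfs (int v, const std::vector<std::vector<int>> &succs,
             std::vector<bool> &seen, std::vector<int> &order)
  {
    seen[v] = true;
    for (int w : succs[v])
      if (!seen[w])
        order_dfs (w, succs, seen, order);
    order.push_back (v);
  }

  /* Second pass: everything reachable from V in the reversed graph (and
     not yet assigned) belongs to V's strongly connected component.  */
  static void
  assign_dfs (int v, int scc, const std::vector<std::vector<int>> &preds,
              std::vector<int> &component)
  {
    component[v] = scc;
    for (int w : preds[v])
      if (component[w] == -1)
        assign_dfs (w, scc, preds, component);
  }

  int
  main ()
  {
    /* A hypothetical CFG with a loop: 0 -> 1 -> 2 -> 1, 2 -> 3.  */
    const int n = 4;
    const int edges[][2] = { {0, 1}, {1, 2}, {2, 1}, {2, 3} };
    std::vector<std::vector<int>> succs (n), preds (n);
    for (const auto &e : edges)
      {
        succs[e[0]].push_back (e[1]);
        preds[e[1]].push_back (e[0]);
      }

    std::vector<bool> seen (n, false);
    std::vector<int> order;
    for (int v = 0; v < n; ++v)
      if (!seen[v])
        order_dfs (v, succs, seen, order);

    std::vector<int> component (n, -1);
    int nscc = 0;
    for (int i = n - 1; i >= 0; --i)
      if (component[order[i]] == -1)
        assign_dfs (order[i], nscc++, preds, component);

    /* The condensation: one vertex per SCC, and an edge whenever the
       original graph has an edge between two different components.  */
    std::set<std::pair<int, int>> condensed;
    for (const auto &e : edges)
      if (component[e[0]] != component[e[1]])
        condensed.insert ({component[e[0]], component[e[1]]});

    for (int v = 0; v < n; ++v)
      printf ("block %d is in SCC %d\n", v, component[v]);
    for (const auto &e : condensed)
      printf ("SCC edge %d -> %d\n", e.first, e.second);
  }

The real algorithm then finds the paths within each SCC (including the
ones anchored at the SCC entries and exits) and composes them along the
condensation; see prime_paths and the selftests in prime-paths.cc.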

--

OVERALL ARCHITECTURE

Like the other coverages in gcc, this operates on the CFG in the profiling
phase, just after branch and condition coverage, in phases:

1. All prime paths are generated, counted, and enumerated from the CFG
2. The paths are evaluated and counter instructions and accumulators are
   emitted
3. gcov reads the CFG and computes the prime paths (same as step 1)
4. gcov prints a report

Simply writing out all the paths in the .gcno file is not really viable;
the files would be too big. Additionally, there are limits to the
practicality of measuring (and reporting) on millions of paths, so for
most programs where coverage is feasible, computing the paths should be
plenty fast. As a result, path coverage really only adds 1 bit per path
to the counters, rounded up to the nearest 64 (a "bucket"), so 64 paths
take up 8 bytes and 65 paths take up 16 bytes.

Recording paths is really just massaging large bitsets. Per function,
ceil(paths/64) buckets (gcov_type) are allocated (/32 on 32-bit
targets). Paths are sorted, so the first path maps to the lowest bit,
the second path to the second lowest bit, and so on. On taking an edge
and entering a basic block, a few bitmasks are applied to unset the bits
of the paths that do not go through that block and set the bits of the
paths that start in that block. Finally, the right buckets are masked
and written to the global accumulators for the paths that end in the
block. Full coverage is achieved when all bits are set.
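
As a rough model of what the emitted code does at run time (hand-written
C++ rather than the generated gimple, with made-up masks and a single
bucket):

  #include <cstdint>
  #include <cstdio>

  /* A hypothetical function with 5 prime paths needs one 64-bit bucket.  */
  enum { NPATHS = 5, NBUCKETS = (NPATHS + 63) / 64 };

  /* Global accumulators (the equivalent of the path counters); bit n set
     means path n has been covered at some point.  */
  static uint64_t accumulators[NBUCKETS];

  /* Paths still consistent with the blocks taken so far in this run.  */
  static uint64_t live[NBUCKETS];

  /* On entering a basic block: drop the live paths that do not go through
     it (AND), start the paths that begin here (IOR), and flush the paths
     that end here into the global accumulators.  */
  static void
  enter_block (const uint64_t *and_mask, const uint64_t *ior_mask,
               const uint64_t *flush_mask)
  {
    for (int i = 0; i < NBUCKETS; ++i)
      {
        live[i] &= and_mask[i];
        live[i] |= ior_mask[i];
        accumulators[i] |= live[i] & flush_mask[i];
      }
  }

  int
  main ()
  {
    /* Pretend the run entered a block that keeps paths 0 and 3 alive,
       starts path 0, and is the final vertex of path 0.  In the real
       instrumentation the masks are per-block and per-edge constants
       computed when the paths are enumerated.  */
    const uint64_t and_mask[NBUCKETS] = { 0b01001 };
    const uint64_t ior_mask[NBUCKETS] = { 0b00001 };
    const uint64_t flush_mask[NBUCKETS] = { 0b00001 };
    enter_block (and_mask, ior_mask, flush_mask);

    unsigned covered = 0;
    for (int i = 0; i < NBUCKETS; ++i)
      covered += __builtin_popcountll (accumulators[i]);
    printf ("paths covered %u of %u\n", covered, NPATHS);
  }

In the patch the masks are attached to edges and blocks when the paths
are enumerated, many of the operations fold away, and only the buckets a
block actually participates in are touched (see path-coverage.cc).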

gcc does not really inform gcov of abnormal edges, so paths involving
abnormal edges are ignored. Supporting them is probably possible, but
requires some changes to the graph gcc writes to the .gcno file.

--

IMPLEMENTATION

In order to remove non-prime paths (subpaths) I use a non-clever suffix tree:
all subpaths are inserted into a trie. Fazli & Afsharchi do not discuss how
duplicates or subpaths are removed, and using the trie turned out to work
really well. The same prime_paths function is used both in gcc and in gcov,
which meant adding some more objects in Makefile.in.
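
The trie itself lives in prime-paths.cc and is not shown in this
excerpt, but the idea can be sketched roughly like this (a deliberately
simple illustration, not the patch's actual data structure): insert
every proper subpath of every simple path into a trie, and a path is
prime exactly when it is not found in it.

  #include <cstdio>
  #include <map>
  #include <vector>

  using path = std::vector<int>;

  /* A simple trie over sequences of basic block ids.  */
  struct trie
  {
    /* Child nodes; libstdc++ is fine with the recursive element type.  */
    std::map<int, trie> children;
    bool present = false;

    void insert (const int *begin, const int *end)
    {
      if (begin == end)
        {
          present = true;
          return;
        }
      children[*begin].insert (begin + 1, end);
    }

    bool contains (const path &p) const
    {
      const trie *node = this;
      for (int bb : p)
        {
          auto it = node->children.find (bb);
          if (it == node->children.end ())
            return false;
          node = &it->second;
        }
      return node->present;
    }
  };

  int
  main ()
  {
    /* A few of the simple paths from the first example; the patch would
       insert all simple paths of the CFG.  */
    std::vector<path> simple = {
      {1, 2, 4, 5, 7}, {2, 4, 5, 7}, {1, 2, 4}, {4, 5, 7}, {1, 3, 4, 5, 7},
    };

    /* Insert every proper (strictly shorter) subpath of every path.  */
    trie subpaths;
    for (const path &p : simple)
      for (size_t len = 1; len < p.size (); ++len)
        for (size_t i = 0; i + len <= p.size (); ++i)
          subpaths.insert (&p[i], &p[i] + len);

    /* A path is prime exactly when it is not a proper subpath of any
       other (an equal-length subpath would be the path itself).  */
    for (const path &p : simple)
      if (!subpaths.contains (p))
        {
          printf ("prime:");
          for (int bb : p)
            printf (" %d", bb);
          printf ("\n");
        }
  }

Inserting every subpath is quadratic in the path length, which is
presumably where the suffix array/tree experiments mentioned under OPEN
QUESTIONS could pay off.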

As for speed, I would say that it is acceptable (but see the knobs for
limiting the number of paths). It is a problem that is combinatorial in
its very nature, so if you enable this feature you can reasonably expect
it to take a while. My main benchmark tree.c generates approx. 2M paths
across the 20 or so functions in it (most functions have fewer than 1500
paths, and two have around a million each). Finding the paths takes
3.5-4s, but the instrumentation phase takes approx. 2.5 minutes and
generates a 32M binary. Not bad for a 1429-line source file.

There are some selftests which deconstruct the algorithm so it can
easily be cross-referenced with Fazli & Afsharchi. I hope that including
them helps catch regressions, clarifies the assumptions, and makes the
algorithm easier to understand by breaking up the phases.

DEMO

This is the denser, line-aware (grep-friendlier) output. Every missing
path is summarized as the lines you need to run, and in what order,
annotated with the true/false/throw decisions.

$ gcc -fpath-coverage --coverage bs.c -c -o bs
$ gcov -et bs.o
bs.gcda:cannot open data file, assuming not executed
        -:    0:Source:bs.c
        -:    0:Graph:bs.gcno
        -:    0:Data:-
        -:    0:Runs:0
paths covered 0 of 17
path  0 not covered: lines 6 6(true) 11(true) 12
path  1 not covered: lines 6 6(true) 11(false) 13(true) 14
path  2 not covered: lines 6 6(true) 11(false) 13(false) 16
path  3 not covered: lines 6 6(false) 18
path  4 not covered: lines 11(true) 12 6(true) 11
path  5 not covered: lines 11(true) 12 6(false) 18
path  6 not covered: lines 11(false) 13(true) 14 6(true) 11
path  7 not covered: lines 11(false) 13(true) 14 6(false) 18
path  8 not covered: lines 12 6(true) 11(true) 12
path  9 not covered: lines 12 6(true) 11(false) 13(true) 14
path 10 not covered: lines 12 6(true) 11(false) 13(false) 16
path 11 not covered: lines 13(true) 14 6(true) 11(true) 12
path 12 not covered: lines 13(true) 14 6(true) 11(false) 13
path 13 not covered: lines 14 6(true) 11(false) 13(true) 14
path 14 not covered: lines 14 6(true) 11(false) 13(false) 16
path 15 not covered: lines 6(true) 11(true) 12 6
path 16 not covered: lines 6(true) 11(false) 13(true) 14 6
    #####:    1:int binary_search(int a[], int len, int from, int to, int key)
        -:    2:{
    #####:    3:    int low = from;
    #####:    4:    int high = to - 1;
        -:    5:
    #####:    6:    while (low <= high)
        -:    7:    {
    #####:    8:        int mid = (low + high) >> 1;
    #####:    9:        long midVal = a[mid];
        -:   10:
    #####:   11:        if (midVal < key)
    #####:   12:            low = mid + 1;
    #####:   13:        else if (midVal > key)
    #####:   14:            high = mid - 1;
        -:   15:        else
    #####:   16:            return mid; // key found
        -:   17:    }
    #####:   18:    return -1;
        -:   19:}

Then there's this mode, which I personally like quite a lot for
understanding paths. Because it is so verbose I have limited the demo to
2 paths. In this mode gcov prints the sequence of *lines* through the
program, in the order they must be executed to cover the path, including
which basic block each line is a part of. Like its denser sibling, this
also prints the true/false/throw decision, if there is one.

$ gcov -t --prime-paths-source bs.o
bs.gcda:cannot open data file, assuming not executed
        -:    0:Source:bs.c
        -:    0:Graph:bs.gcno
        -:    0:Data:-
        -:    0:Runs:0
paths covered 0 of 17
path 0:
BB  2:           1:int binary_search(int a[], int len, int from, int to, int key)
BB  2:           3:    int low = from;
BB  2:           4:    int high = to - 1;
BB  2:           6:    while (low <= high)
BB  8: (true)    6:    while (low <= high)
BB  3:           8:     int mid = (low + high) >> 1;
BB  3:           9:     long midVal = a[mid];
BB  3: (true)   11:     if (midVal < key)
BB  4:          12:         low = mid + 1;

path 1:
BB  2:           1:int binary_search(int a[], int len, int from, int to, int key)
BB  2:           3:    int low = from;
BB  2:           4:    int high = to - 1;
BB  2:           6:    while (low <= high)
BB  8: (true)    6:    while (low <= high)
BB  3:           8:     int mid = (low + high) >> 1;
BB  3:           9:     long midVal = a[mid];
BB  3: (false)  11:     if (midVal < key)
BB  5: (true)   13:     else if (midVal > key)
BB  6:          14:         high = mid - 1;

The listing is also aware of inlining:

hello.c:

    #include <stdio.h>
    #include "hello.h"

    int notmain(const char *entity)
    {
      return hello (entity);
    }

hello.h:

    #include <stdio.h>

    inline __attribute__((always_inline))
    int hello (const char *s)
    {
      if (s)
        printf ("hello, %s!\n", s);
      else
        printf ("hello, world!\n");
      return 0;
    }

$ gcov -t --prime-paths-source hello

paths covered 0 of 2
path 0:
BB  2: (true)    4:int notmain(const char *entity)
 == inlined from hello.h ==
BB  2: (true)    6:  if (s)
BB  3:           7:    printf ("hello, %s!\n", s);
BB  5:          10:  return 0;
-------------------------
BB  7:           6:  return hello (entity);
BB  8:           6:  return hello (entity);

path 1:
BB  2: (false)   4:int notmain(const char *entity)
 == inlined from hello.h ==
BB  2: (false)   6:  if (s)
BB  4:           9:    printf ("hello, world!\n");
BB  5:          10:  return 0;
-------------------------
BB  7:           6:  return hello (entity);
BB  8:           6:  return hello (entity);

And finally, JSON (abbreviated). It is quite sparse and very nested, but
it is mostly a JSON version of the source listing. It has to be this
nested in order to consistently capture multiple locations. It always
includes the file name per location for consistency, even though this is
very much redundant in almost all cases. This format is in no way set in
stone, and without targeting it with other tooling I am not sure if it
does the job well.

  "gcc_version": "15.0.0 20240704 (experimental)",
  "current_working_directory": "dir",
  "data_file": "hello.o",
  "files": [
    {
      "file": "hello.c",
      "functions": [
        {
          "name": "notmain",
          "demangled_name": "notmain",
          "start_line": 4,
          "start_column": 5,
          "end_line": 7,
          "end_column": 1,
          "blocks": 7,
          "blocks_executed": 0,
          "execution_count": 0,
          "total_prime_paths": 2,
          "covered_prime_paths": 0,
          "prime_path_coverage": [
            {
              "id": 0,
              "sequence": [
                {
                  "block_id": 2,
                  "locations": [
                    {
                      "file": "hello.c",
                      "line_numbers": [
                        4
                      ]
                    },
                    {
                      "file": "hello.h",
                      "line_numbers": [
                        6
                      ]
                    }
                  ],
                  "edge_kind": "fallthru"
                },
                ...

--

LIMITING NUMBER OF PATHS

The -fpath-coverage-limit flag controls when gcc gives up on path
coverage. To be fast it uses an approximation: it tracks the worst-case
number of paths by counting inserts into the partial tries (before
merging), which means it also counts effectively redundant paths. In
practice this is a fuzzy upper limit (estimating the number of paths is
very hard and has no immediate relationship to the number of edges or
vertices), and it is typically set to a high value. The idea is to avoid
instrumenting the absolute worst functions and keep compile times
reasonable, not so much to reject functions with 101357 paths while
accepting those with 101356.
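
For illustration (the default limit is 250000; 50000 below is just an
arbitrary example value):

$ gcc -fpath-coverage -fpath-coverage-limit=50000 --coverage bs.c -c -o bs

Functions that blow past the limit are skipped, and the new
-Wcoverage-too-many-paths warning (on by default) reports when that
happens.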

OPEN QUESTIONS

Suffix arrays or suffix tree algorithms. I experimented with an
implementation of the skew suffix array construction algorithm I found
online, but could not get it to perform well enough to warrant swapping
to it. I don't think working on this matters too much right now,
considering that the compile time added by the instrumentation phase
dominates the running time -- when finding paths adds seconds,
instrumentation adds minutes.

The compile times are brutal. Obviously finding prime paths could be
faster, but right now the real bottleneck is actually instrumentation,
or rather, the work created by emitting a bunch of SSAs and gimple
assigns.
---
 gcc/Makefile.in                        |    6 +-
 gcc/builtins.cc                        |    2 +-
 gcc/collect2.cc                        |    5 +-
 gcc/common.opt                         |   14 +
 gcc/gcc.cc                             |    4 +-
 gcc/gcov-counter.def                   |    3 +
 gcc/gcov-io.h                          |    3 +
 gcc/gcov.cc                            |  426 +++++-
 gcc/gimple-iterator.cc                 |    2 +
 gcc/ipa-inline.cc                      |    2 +-
 gcc/passes.cc                          |    4 +-
 gcc/path-coverage.cc                   |  627 ++++++++
 gcc/prime-paths.cc                     | 1939 ++++++++++++++++++++++++
 gcc/profile.cc                         |    6 +-
 gcc/selftest-run-tests.cc              |    1 +
 gcc/selftest.h                         |    1 +
 gcc/testsuite/g++.dg/gcov/gcov-22.C    |   68 +
 gcc/testsuite/gcc.misc-tests/gcov-29.c |  861 +++++++++++
 gcc/testsuite/lib/gcov.exp             |   92 +-
 gcc/tree-profile.cc                    |   11 +-
 20 files changed, 4060 insertions(+), 17 deletions(-)
 create mode 100644 gcc/path-coverage.cc
 create mode 100644 gcc/prime-paths.cc
 create mode 100644 gcc/testsuite/g++.dg/gcov/gcov-22.C
 create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-29.c

Patch

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 7d3ea27a6ab..fb904493d6c 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1622,6 +1622,8 @@  OBJS = \
 	opts-global.o \
 	ordered-hash-map-tests.o \
 	passes.o \
+	prime-paths.o \
+	path-coverage.o \
 	plugin.o \
 	pointer-query.o \
 	postreload-gcse.o \
@@ -2878,6 +2880,8 @@  GTFILES = $(CPPLIB_H) $(srcdir)/input.h $(srcdir)/coretypes.h \
   $(srcdir)/tree-scalar-evolution.cc \
   $(srcdir)/tree-ssa-operands.h \
   $(srcdir)/tree-profile.cc $(srcdir)/tree-nested.cc \
+  $(srcdir)/path-coverage.cc \
+  $(srcdir)/prime-paths.cc \
   $(srcdir)/omp-offload.h \
   $(srcdir)/omp-general.cc \
   $(srcdir)/omp-low.cc \
@@ -3254,7 +3258,7 @@  s-version: build/genversion$(build_exeext)
 # gcov.o needs $(ZLIBINC) added to the include flags.
 CFLAGS-gcov.o += $(ZLIBINC)
 
-GCOV_OBJS = gcov.o json.o
+GCOV_OBJS = gcov.o json.o graphds.o prime-paths.o bitmap.o sbitmap.o
 gcov$(exeext): $(GCOV_OBJS) $(LIBDEPS)
 	+$(LINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) $(GCOV_OBJS) \
 		hash-table.o ggc-none.o $(LIBS) $(ZLIB) -o $@
diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index 0b902896ddd..cccaad469da 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -6323,7 +6323,7 @@  expand_builtin_fork_or_exec (tree fn, tree exp, rtx target, int ignore)
   tree call;
 
   /* If we are not profiling, just call the function.  */
-  if (!profile_arc_flag && !condition_coverage_flag)
+  if (!profile_arc_flag && !condition_coverage_flag && !path_coverage_flag)
     return NULL_RTX;
 
   /* Otherwise call the wrapper.  This should be equivalent for the rest of
diff --git a/gcc/collect2.cc b/gcc/collect2.cc
index 902014a9cc1..8be949d117a 100644
--- a/gcc/collect2.cc
+++ b/gcc/collect2.cc
@@ -1035,9 +1035,9 @@  main (int argc, char **argv)
       lto_mode = LTO_MODE_LTO;
   }
 
-  /* -fno-profile-arcs -fno-condition-coverage -fno-test-coverage
+  /* -fno-profile-arcs -fno-condition-coverage -fno-path-coverage -fno-test-coverage
      -fno-branch-probabilities -fno-exceptions -w -fno-whole-program */
-  num_c_args += 7;
+  num_c_args += 8;
 
   c_argv = XCNEWVEC (char *, num_c_args);
   c_ptr = CONST_CAST2 (const char **, char **, c_argv);
@@ -1234,6 +1234,7 @@  main (int argc, char **argv)
   obstack_free (&temporary_obstack, temporary_firstobj);
   *c_ptr++ = "-fno-profile-arcs";
   *c_ptr++ = "-fno-condition-coverage";
+  *c_ptr++ = "-fno-path-coverage";
   *c_ptr++ = "-fno-test-coverage";
   *c_ptr++ = "-fno-branch-probabilities";
   *c_ptr++ = "-fno-exceptions";
diff --git a/gcc/common.opt b/gcc/common.opt
index a300470bbc5..0c3064a252c 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -881,6 +881,16 @@  Common Var(warn_too_many_conditions) Init(1) Warning
 Warn when a conditional has too many terms and condition coverage profiling
 gives up instrumenting the expression.
 
+fpath-coverage-limit=
+Common RejectNegative Joined UInteger Var(path_coverage_limit) Init(250000)
+-fpath-coverage-limit=<number>	Don't instrument functions whose path count exceeds <number>.
+
+Wcoverage-too-many-paths
+Common Var(warn_too_many_paths) Init(1) Warning
+Warn when a function exceeds the path count limit (controlled by
+-fpath-coverage-limit) and path coverage gives up instrumenting the
+function.
+
 Wmissing-profile
 Common Var(warn_missing_profile) Init(1) Warning
 Warn in case profiles in -fprofile-use do not exist.
@@ -2403,6 +2413,10 @@  foptimize-sibling-calls
 Common Var(flag_optimize_sibling_calls) Optimization
 Optimize sibling and tail recursive calls.
 
+fpath-coverage
+Common Var(path_coverage_flag)
+Insert path profiling code.
+
 fpartial-inlining
 Common Var(flag_partial_inlining) Optimization
 Perform partial inlining.
diff --git a/gcc/gcc.cc b/gcc/gcc.cc
index d80b604a48d..4c361276aef 100644
--- a/gcc/gcc.cc
+++ b/gcc/gcc.cc
@@ -1165,7 +1165,7 @@  proper position among the other output files.  */
 	%:include(libgomp.spec)%(link_gomp)}\
     %{fgnu-tm:%:include(libitm.spec)%(link_itm)}\
     " STACK_SPLIT_SPEC "\
-    %{fprofile-arcs|fcondition-coverage|fprofile-generate*|coverage:-lgcov} " SANITIZER_SPEC " \
+    %{fprofile-arcs|fcondition-coverage|fpath-coverage|fprofile-generate*|coverage:-lgcov} " SANITIZER_SPEC " \
     %{!nostdlib:%{!r:%{!nodefaultlibs:%(link_ssp) %(link_gcc_c_sequence)}}}\
     %{!nostdlib:%{!r:%{!nostartfiles:%E}}} %{T*}  \n%(post_link) }}}}}}"
 #endif
@@ -1288,7 +1288,7 @@  static const char *cc1_options =
  %{!fsyntax-only:%{S:%W{o*}%{!o*:-o %w%b.s}}}\
  %{fsyntax-only:-o %j} %{-param*}\
  %{coverage:-fprofile-arcs -ftest-coverage}\
- %{fprofile-arcs|fcondition-coverage|fprofile-generate*|coverage:\
+ %{fprofile-arcs|fcondition-coverage|fpath-coverage|fprofile-generate*|coverage:\
    %{!fprofile-update=single:\
      %{pthread:-fprofile-update=prefer-atomic}}}";
 
diff --git a/gcc/gcov-counter.def b/gcc/gcov-counter.def
index 45d4b3eb0c8..65404135c81 100644
--- a/gcc/gcov-counter.def
+++ b/gcc/gcov-counter.def
@@ -52,3 +52,6 @@  DEF_GCOV_COUNTER(GCOV_TIME_PROFILER, "time_profiler", _time_profile)
 
 /* Conditions.  The counter is interpreted as a bit-set.  */
 DEF_GCOV_COUNTER(GCOV_COUNTER_CONDS, "conditions", _ior)
+
+/* Paths.  The counter is interpreted as a bit-set.  */
+DEF_GCOV_COUNTER(GCOV_COUNTER_PATHS, "ior", _ior)
diff --git a/gcc/gcov-io.h b/gcc/gcov-io.h
index 5dc467c92b1..623ca56899a 100644
--- a/gcc/gcov-io.h
+++ b/gcc/gcov-io.h
@@ -264,6 +264,9 @@  typedef uint64_t gcov_type_unsigned;
 #define GCOV_TAG_CONDS		   ((gcov_unsigned_t)0x01470000)
 #define GCOV_TAG_CONDS_LENGTH(NUM) ((NUM) * 2 * GCOV_WORD_SIZE)
 #define GCOV_TAG_CONDS_NUM(LENGTH) (((LENGTH) / GCOV_WORD_SIZE) / 2)
+#define GCOV_TAG_PATHS		   ((gcov_unsigned_t)0x01490000)
+#define GCOV_TAG_PATHS_LENGTH(NUM) ((NUM) * GCOV_WORD_SIZE)
+#define GCOV_TAG_PATHS_NUM(LENGTH) (((LENGTH) / GCOV_WORD_SIZE))
 #define GCOV_TAG_LINES		 ((gcov_unsigned_t)0x01450000)
 #define GCOV_TAG_COUNTER_BASE 	 ((gcov_unsigned_t)0x01a10000)
 #define GCOV_TAG_COUNTER_LENGTH(NUM) ((NUM) * 2 * GCOV_WORD_SIZE)
diff --git a/gcc/gcov.cc b/gcc/gcov.cc
index 055fa7e78ba..37ba060060c 100644
--- a/gcc/gcov.cc
+++ b/gcc/gcov.cc
@@ -48,6 +48,7 @@  along with Gcov; see the file COPYING3.  If not see
 #include "json.h"
 #include "hwint.h"
 #include "xregex.h"
+#include "graphds.h"
 
 #include <zlib.h>
 #include <getopt.h>
@@ -84,6 +85,7 @@  class function_info;
 class block_info;
 class source_info;
 class condition_info;
+class path_info;
 
 /* Describes an arc between two basic blocks.  */
 
@@ -129,6 +131,45 @@  struct arc_info
   struct arc_info *pred_next;
 };
 
+/* Describes (prime) path coverage.  */
+class path_info
+{
+public:
+  path_info () : paths (), covered () {}
+
+  /* The prime paths of a function.  The paths will be
+     lexicographically ordered and identified by their index.  */
+  vector<vector<unsigned>> paths;
+
+  /* The covered paths.  This is really a large bitset partitioned
+     into buckets of gcov_type_unsigned, and bit n is set if the nth
+     path is covered.  */
+  vector<gcov_type_unsigned> covered;
+
+  /* The size (in bits) of each bucket.  */
+  static const size_t
+  bucketsize = sizeof (gcov_type_unsigned) * BITS_PER_UNIT;
+
+  /* Count the covered paths.  */
+  unsigned covered_paths () const
+  {
+    unsigned cnt = 0;
+    for (gcov_type_unsigned v : covered)
+      cnt += popcount_hwi (v);
+    return cnt;
+  }
+
+  /* Check if the nth path is covered.  */
+  bool covered_p (size_t n) const
+  {
+    if (covered.empty ())
+      return false;
+    const size_t bucket = n / bucketsize;
+    const size_t bit = n % bucketsize;
+    return covered[bucket] & (gcov_type_unsigned (1) << bit);
+  }
+};
+
 /* Describes which locations (lines and files) are associated with
    a basic block.  */
 
@@ -317,6 +358,9 @@  public:
 
   vector<condition_info*> conditions;
 
+  /* Path coverage information.  */
+  path_info paths;
+
   /* Raw arc coverage counts.  */
   vector<gcov_type> counts;
 
@@ -400,6 +444,9 @@  struct coverage_info
   int calls_executed;
 
   char *name;
+
+  unsigned paths;
+  unsigned paths_covered;
 };
 
 /* Describes a file mentioned in the block graph.  Contains an array
@@ -607,6 +654,14 @@  static bool flag_conditions = 0;
 /* Show unconditional branches too.  */
 static int flag_unconditional = 0;
 
+/* Output path coverage.  */
+static bool flag_prime_paths = false;
+
+static bool flag_prime_paths_edges = false;
+static bool flag_prime_paths_source = false;
+
+static unsigned flag_prime_paths_limit = (unsigned)-1;
+
 /* Output a gcov file if this is true.  This is on by default, and can
    be turned off by the -n option.  */
 
@@ -750,9 +805,11 @@  static unsigned find_source (const char *);
 static void read_graph_file (void);
 static int read_count_file (void);
 static void solve_flow_graph (function_info *);
+static void find_prime_paths (function_info *fn);
 static void find_exception_blocks (function_info *);
 static void add_branch_counts (coverage_info *, const arc_info *);
 static void add_condition_counts (coverage_info *, const block_info *);
+static void add_path_counts (coverage_info *, const function_info *);
 static void add_line_counts (coverage_info *, function_info *);
 static void executed_summary (unsigned, unsigned);
 static void function_summary (const coverage_info *);
@@ -767,6 +824,9 @@  static string make_gcov_file_name (const char *, const char *);
 static char *mangle_name (const char *);
 static void release_structures (void);
 extern int main (int, char **);
+static const vector<const char *>& slurp (const source_info &src,
+					  FILE *gcov_file,
+					  const char *line_start);
 
 function_info::function_info (): m_name (NULL), m_demangled_name (NULL),
   ident (0), lineno_checksum (0), cfg_checksum (0), has_catch (0),
@@ -799,6 +859,17 @@  bool function_info::group_line_p (unsigned n, unsigned src_idx)
   return is_group && src == src_idx && start_line <= n && n <= end_line;
 }
 
+/* Find the arc that connects BLOCK to the block with id DEST.  This
+   edge must exist.  */
+static const arc_info&
+find_arc (const block_info &block, unsigned dest)
+{
+  for (const arc_info *arc = block.succ; arc; arc = arc->succ_next)
+    if (arc->dst->id == dest)
+      return *arc;
+  gcc_assert (false);
+}
+
 /* Cycle detection!
    There are a bajillion algorithms that do this.  Boost's function is named
    hawick_cycles, so I used the algorithm by K. A. Hawick and H. A. James in
@@ -1028,6 +1099,7 @@  print_usage (int error_p)
                                     rather than percentages\n");
   fnotice (file, "  -g, --conditions                Include modified condition/decision\n\
                                     coverage (masking MC/DC) in output\n");
+  fnotice (file, "  -e, --prime-paths               Show prime path coverage\n");
   fnotice (file, "  -d, --display-progress          Display progress information\n");
   fnotice (file, "  -D, --debug			    Display debugging dumps\n");
   fnotice (file, "  -f, --function-summaries        Output summaries for each function\n");
@@ -1086,6 +1158,12 @@  static const struct option options[] =
   { "branch-probabilities", no_argument,       NULL, 'b' },
   { "branch-counts",        no_argument,       NULL, 'c' },
   { "conditions",	    no_argument,       NULL, 'g' },
+  /* The easier implementation is --prime-paths-{report} */
+  { "prime-paths",	    no_argument,       NULL, 'e' },
+  { "prime-paths-edges",    no_argument,       NULL, 900 },
+  { "prime-paths-source",   no_argument,       NULL, 902 },
+  { "prime-paths-all",	    no_argument,       NULL, 903 },
+  { "prime-paths-limit",    required_argument, NULL, 904 },
   { "json-format",	    no_argument,       NULL, 'j' },
   { "include",              required_argument, NULL, 'I' },
   { "exclude",              required_argument, NULL, 'E' },
@@ -1117,7 +1195,7 @@  process_args (int argc, char **argv)
 {
   int opt;
 
-  const char *opts = "abcdDfghHijklmMno:pqrs:tuvwx";
+  const char *opts = "abcdDefghHijklmMno:pqrs:tuvwx";
   while ((opt = getopt_long (argc, argv, opts, options, NULL)) != -1)
     {
       switch (opt)
@@ -1131,6 +1209,26 @@  process_args (int argc, char **argv)
 	case 'c':
 	  flag_counts = 1;
 	  break;
+	case 'e':
+	case 900:
+	  flag_prime_paths = true;
+	  flag_prime_paths_edges = true;
+	  break;
+	case 901:
+	  flag_prime_paths = true;
+	  break;
+	case 902:
+	  flag_prime_paths = true;
+	  flag_prime_paths_source = true;
+	  break;
+	case 903:
+	  flag_prime_paths = true;
+	  flag_prime_paths_edges = true;
+	  flag_prime_paths_source = true;
+	  break;
+	case 904:
+	  flag_prime_paths_limit = atoi (optarg);
+	  break;
 	case 'f':
 	  flag_function_summary = 1;
 	  break;
@@ -1383,6 +1481,68 @@  get_gcov_intermediate_filename (const char *input_file_name)
   return str.c_str ();
 }
 
+/* Add prime path coverage from INFO to FUNCTION.  */
+static void
+json_set_prime_path_coverage (json::object &function, function_info &info)
+{
+  json::array *jpaths = new json::array ();
+  function.set_integer ("total_prime_paths", info.paths.paths.size ());
+  function.set_integer ("covered_prime_paths", info.paths.covered_paths ());
+  function.set ("prime_path_coverage", jpaths);
+
+  size_t pathno = 0;
+  for (const vector<unsigned> &path : info.paths.paths)
+    {
+      if (info.paths.covered_p (pathno++))
+	continue;
+
+      gcc_assert (!path.empty ());
+
+      json::object *jpath = new json::object ();
+      jpaths->append (jpath);
+      jpath->set_integer ("id", pathno - 1);
+
+      json::array *jlist = new json::array ();
+      jpath->set ("sequence", jlist);
+
+      for (size_t i = 0; i != path.size (); ++i)
+	{
+	  const unsigned bb = path[i];
+	  const block_info &block = info.blocks[bb];
+	  const char *edge_kind = "";
+	  if (i + 1 != path.size ())
+	    {
+	      const arc_info &arc = find_arc (block, path[i+1]);
+	      if (arc.true_value)
+		edge_kind = "true";
+	      else if (arc.false_value)
+		edge_kind = "false";
+	      else if (arc.fall_through)
+		edge_kind = "fallthru";
+	      else if (arc.is_throw)
+		edge_kind = "throw";
+	    }
+
+	  json::object *jblock = new json::object ();
+	  json::array *jlocs = new json::array ();
+	  jblock->set_integer ("block_id", block.id);
+	  jblock->set ("locations", jlocs);
+	  jblock->set_string ("edge_kind", edge_kind);
+	  jlist->append (jblock);
+	  for (const block_location_info &loc : block.locations)
+	    {
+	      json::object *jloc = new json::object ();
+	      json::array *jline_numbers = new json::array ();
+	      jlocs->append (jloc);
+	      jloc->set_string ("file", sources[loc.source_file_idx].name);
+	      jloc->set ("line_numbers", jline_numbers);
+	      for (unsigned line : loc.lines)
+		jline_numbers->append (new json::integer_number (line));
+	    }
+	}
+    }
+}
+
 /* Output the result in JSON intermediate format.
    Source info SRC is dumped into JSON_FILES which is JSON array.  */
 
@@ -1413,6 +1573,7 @@  output_json_intermediate_file (json::array *json_files, source_info *src)
       function->set_integer ("blocks_executed", (*it)->blocks_executed);
       function->set_integer ("execution_count", (*it)->blocks[0].count);
 
+      json_set_prime_path_coverage (*function, **it);
       functions->append (function);
     }
 
@@ -1627,6 +1788,9 @@  process_all_functions (void)
 	  solve_flow_graph (fn);
 	  if (fn->has_catch)
 	    find_exception_blocks (fn);
+
+	  /* For path coverage.  */
+	  find_prime_paths (fn);
 	}
       else
 	{
@@ -2190,6 +2354,13 @@  read_graph_file (void)
 	      info->n_terms = gcov_read_unsigned ();
 	      fn->conditions[i] = info;
 	    }
+    }
+      else if (fn && tag == GCOV_TAG_PATHS)
+	{
+	  const unsigned npaths = gcov_read_unsigned ();
+	  const size_t nbits = path_info::bucketsize;
+	  const size_t nbuckets = (npaths + (nbits - 1)) / nbits;
+	  fn->paths.covered.assign (nbuckets, 0);
 	}
       else if (fn && tag == GCOV_TAG_LINES)
 	{
@@ -2346,6 +2517,17 @@  read_count_file (void)
 	    for (ix = 0; ix != fn->counts.size (); ix++)
 	      fn->counts[ix] += gcov_read_counter ();
 	}
+      else if (tag == GCOV_TAG_FOR_COUNTER (GCOV_COUNTER_PATHS) && fn)
+	{
+	  vector<gcov_type_unsigned> &covered = fn->paths.covered;
+	  length = abs (read_length);
+	  if (length != GCOV_TAG_COUNTER_LENGTH (covered.size ()))
+	    goto mismatch;
+
+	  if (read_length > 0)
+	    for (ix = 0; ix != covered.size (); ix++)
+	      covered[ix] = gcov_read_counter ();
+	}
       if (read_length < 0)
 	read_length = 0;
       gcov_sync (base, read_length);
@@ -2629,6 +2811,68 @@  solve_flow_graph (function_info *fn)
       }
 }
 
+/* Find the prime paths of the function from the CFG and add to FN
+   using the same function as gcc.  It relies on gcc recording the CFG
+   faithfully.  Storing the paths explicitly takes up way too much
+   space to be practical, but this means we need to recompute the
+   (exact) same paths in gcov.  This should give paths in
+   lexicographical order so that the nth path in gcc is the nth path
+   in gcov.  ENTRY_BLOCK and EXIT_BLOCK are both removed from all
+   paths.  */
+static void
+find_prime_paths (function_info *fn)
+{
+  if (!flag_prime_paths)
+    return;
+
+  /* If paths.covered is empty then this function was not
+     instrumented, probably because it exceeded the #-of-paths limit.  In
+     this case we don't want to find the prime paths as it will take
+     too long, and covered paths are not measured.  */
+  if (fn->paths.covered.empty ())
+    return;
+
+  struct graph *cfg = new_graph (fn->blocks.size ());
+  for (block_info &block : fn->blocks)
+    {
+      cfg->vertices[block.id].data = &block;
+      for (arc_info *arc = block.succ; arc; arc = arc->succ_next)
+	if (!arc->fake)
+	  add_edge (cfg, arc->src->id, arc->dst->id)->data = arc;
+    }
+
+  vec<vec<int>> prime_paths (struct graph*, size_t);
+  /* TODO: Pass extra information in the PATH_TAG section.  In case
+     that is empty this might still need to be tunable should the
+     coverage be requested without instrumentation.  */
+  vec<vec<int>> paths = prime_paths (cfg, (size_t)-1);
+  fn->paths.paths.reserve (paths.length ());
+  for (vec<int> &path : paths)
+    {
+      const int *begin = path.begin ();
+      const int *end = path.end ();
+      if (begin != end && path.last () == EXIT_BLOCK)
+	--end;
+      if (begin != end && *begin == ENTRY_BLOCK)
+	++begin;
+
+      if (begin == end)
+	continue;
+
+      /* If this is an isolated vertex because abnormal edges and fake
+	 edges are removed, don't include it.  */
+      if (end - begin == 1 && !cfg->vertices[*begin].succ
+	  && !cfg->vertices[*begin].pred)
+	continue;
+
+      fn->paths.paths.emplace_back (begin, end);
+    }
+
+  gcc_checking_assert (fn->paths.paths.size () == paths.length ());
+  release_vec_vec (paths);
+  free_graph (cfg);
+}
+
 /* Mark all the blocks only reachable via an incoming catch.  */
 
 static void
@@ -2689,6 +2933,16 @@  add_condition_counts (coverage_info *coverage, const block_info *block)
   coverage->conditions_covered += block->conditions.popcount ();
 }
 
+/* Increment path totals, number of paths and number of covered paths,
+   in COVERAGE according to FN.  */
+
+static void
+add_path_counts (coverage_info *coverage, const function_info *fn)
+{
+  coverage->paths += fn->paths.paths.size ();
+  coverage->paths_covered = fn->paths.covered_paths ();
+}
+
 /* Format COUNT, if flag_human_readable_numbers is set, return it human
    readable format.  */
 
@@ -2805,6 +3059,16 @@  file_summary (const coverage_info *coverage)
       else
 	fnotice (stdout, "No conditions\n");
     }
+
+  if (flag_prime_paths)
+    {
+      if (coverage->paths)
+	fnotice (stdout, "Prime paths covered:%s of %d\n",
+		 format_gcov (coverage->paths_covered,
+			      coverage->paths, 2), coverage->paths);
+      else
+	fnotice (stdout, "No paths\n");
+    }
 }
 
 /* Canonicalize the filename NAME by canonicalizing directory
@@ -3092,6 +3356,8 @@  accumulate_line_counts (source_info *src)
     {
       function_info *fn = *it;
 
+      add_path_counts (&src->coverage, fn);
+
       if (fn->src != src->index || !fn->is_group)
 	continue;
 
@@ -3223,6 +3489,162 @@  output_branch_count (FILE *gcov_file, int ix, const arc_info *arc)
   return 1;
 }
 
+static void
+print_source_line (FILE *f, const vector<const char *> &source_lines,
+		   unsigned line);
+
+
+/* Print a dense coverage report for PATH of FN to GCOV_FILE.  PATH should be
+   number PATHNO in the sorted set of paths.  This function prints a dense form
+   where only the line numbers, and optionally the source file the line comes
+   from, in the order they need to be executed to achieve coverage.  This
+   produces very long lines for large functions, but is a useful and greppable
+   output.
+
+   Returns 0 if the path is covered, and 1 if not covered.  */
+static unsigned
+print_prime_path_edges (FILE *gcov_file, const function_info &fn,
+			const vector<unsigned> &path, size_t pathno)
+{
+  if (fn.paths.covered_p (pathno))
+    return 0;
+
+  fprintf (gcov_file, "path %2zu not covered: lines", pathno);
+  for (size_t k = 0; k != path.size (); ++k)
+    {
+      const block_info &block = fn.blocks[path[k]];
+      const char *edge_kind = "";
+      if (k + 1 != path.size ())
+	{
+	  gcc_checking_assert (block.id == path[k]);
+	  const arc_info &arc = find_arc (block, path[k+1]);
+	  if (arc.true_value)
+	    edge_kind = "(true)";
+	  else if (arc.false_value)
+	    edge_kind = "(false)";
+	  else if (arc.is_throw)
+	    edge_kind = "(throw)";
+	}
+
+      for (const block_location_info &loc : block.locations)
+	{
+	  if (loc.source_file_idx == fn.src)
+	    fprintf (gcov_file, " %u%s", loc.lines.back (), edge_kind);
+	  else
+	    fprintf (gcov_file, " %s:%u%s", sources[loc.source_file_idx].name,
+		     loc.lines.back (), edge_kind);
+	}
+    }
+
+  fprintf (gcov_file, "\n");
+  return 1;
+}
+
+/* TODO: rename */
+static unsigned
+print_inlined_info (FILE *gcov_file, unsigned current_index,
+		    const block_location_info &loc,
+		    const function_info &fn)
+{
+  if (loc.source_file_idx != current_index && loc.source_file_idx == fn.src)
+    fprintf (gcov_file, "-------------------------\n");
+  if (loc.source_file_idx != current_index && loc.source_file_idx != fn.src)
+    fprintf (gcov_file, " == inlined from %s ==\n",
+	     sources[loc.source_file_idx].name);
+  return loc.source_file_idx;
+}
+
+/* Print a coverage report for PATH of FN to GCOV_FILE.  PATH should be number
+   PATHNO in the sorted set of paths.  This function prints the lines that need
+   to be executed (and in what order) to cover it.
+
+   Returns 0 if the path is covered, and 1 if not covered.  */
+static unsigned
+print_prime_path_source (FILE *gcov_file, const function_info &fn,
+			 const vector<unsigned> &path, size_t pathno)
+{
+  if (fn.paths.covered_p (pathno))
+    return 0;
+  fprintf (gcov_file, "path %zu:\n", pathno);
+  unsigned current = fn.src;
+  for (size_t k = 0; k != path.size (); ++k)
+    {
+      const unsigned bb = path[k];
+      const block_info &block = fn.blocks[bb];
+      gcc_checking_assert (block.id == bb);
+
+      const char *edge_kind = "";
+      if (k + 1 != path.size ())
+	{
+	  const arc_info &arc = find_arc (block, path[k+1]);
+	  if (arc.true_value)
+	    edge_kind = "(true)";
+	  else if (arc.false_value)
+	    edge_kind = "(false)";
+	  else if (arc.is_throw)
+	    edge_kind = "(throw)";
+	}
+
+      for (const block_location_info &loc : block.locations)
+	{
+	  const source_info &src = sources[loc.source_file_idx];
+	  const vector<const char *> &lines = slurp (src, gcov_file, "");
+	  current = print_inlined_info (gcov_file, current, loc, fn);
+	  for (unsigned i = 0; i != loc.lines.size () - 1; ++i)
+	    {
+	      const unsigned line = loc.lines[i];
+	      fprintf (gcov_file, "BB %2d: %-7s %3d", bb, "", line);
+	      print_source_line (gcov_file, lines, line);
+	    }
+
+	  const unsigned line = loc.lines.back ();
+	  fprintf (gcov_file, "BB %2d: %-7s %3d", bb, edge_kind, line);
+	  print_source_line (gcov_file, lines, line);
+	}
+    }
+
+  fputc ('\n', gcov_file);
+  return 1;
+}
+
+/* Print path coverage counts for FN to GCOV_FILE.  LINES is the vector of
+   source lines for FN.  Note that unlike statements, branch counts, and
+   conditions, this is not anchored to source lines but the function root.  */
+static int
+output_path_coverage (FILE *gcov_file, const function_info *fn)
+{
+  if (!flag_prime_paths)
+    return 0;
+
+  fnotice (gcov_file, "paths covered %u of %zu\n", fn->paths.covered_paths (),
+	   fn->paths.paths.size ());
+
+  if (flag_prime_paths_edges)
+    {
+      size_t pathno = 0;
+      unsigned limit = flag_prime_paths_limit;
+      for (const vector<unsigned> &path : fn->paths.paths)
+	{
+	  limit -= print_prime_path_edges (gcov_file, *fn, path, pathno++);
+	  if (limit == 0)
+	    break;
+	}
+    }
+
+  if (flag_prime_paths_source)
+    {
+      size_t pathno = 0;
+      unsigned limit = flag_prime_paths_limit;
+      for (const vector<unsigned> &path : fn->paths.paths)
+	{
+	  limit -= print_prime_path_source (gcov_file, *fn, path, pathno++);
+	  if (limit == 0)
+	    break;
+	}
+    }
+  return 1;
+}
+
 static const char *
 read_line (FILE *file)
 {
@@ -3557,6 +3979,7 @@  output_lines (FILE *gcov_file, const source_info *src)
 	    {
 	      function_info *fn = (*fns)[0];
 	      output_function_details (gcov_file, fn);
+	      output_path_coverage (gcov_file, fn);
 
 	      /* If functions are filtered, only the matching functions will be in
 		 fns and there is no need for extra checking.  */
@@ -3602,6 +4025,7 @@  output_lines (FILE *gcov_file, const source_info *src)
 	      fprintf (gcov_file, "%s:\n", fn_name.c_str ());
 
 	      output_function_details (gcov_file, fn);
+	      output_path_coverage (gcov_file, fn);
 
 	      /* Print all lines covered by the function.  */
 	      for (unsigned i = 0; i < lines.size (); i++)
diff --git a/gcc/gimple-iterator.cc b/gcc/gimple-iterator.cc
index 93646262eac..d2c248defb8 100644
--- a/gcc/gimple-iterator.cc
+++ b/gcc/gimple-iterator.cc
@@ -978,6 +978,8 @@  edge_before_returns_twice_call (basic_block bb)
 	split = true;
       other_edge = e;
     }
+  if (!ad_edge)
+    printf ("return-twice in %d\n", bb->index);
   gcc_checking_assert (ad_edge);
   if (other_edge == NULL)
     split = true;
diff --git a/gcc/ipa-inline.cc b/gcc/ipa-inline.cc
index 9fc41b7696d..f04db904ce5 100644
--- a/gcc/ipa-inline.cc
+++ b/gcc/ipa-inline.cc
@@ -699,7 +699,7 @@  can_early_inline_edge_p (struct cgraph_edge *e)
     }
   gcc_assert (gimple_in_ssa_p (DECL_STRUCT_FUNCTION (e->caller->decl))
 	      && gimple_in_ssa_p (DECL_STRUCT_FUNCTION (callee->decl)));
-  if ((profile_arc_flag || condition_coverage_flag)
+  if ((profile_arc_flag || condition_coverage_flag || path_coverage_flag)
       && ((lookup_attribute ("no_profile_instrument_function",
 			    DECL_ATTRIBUTES (caller->decl)) == NULL_TREE)
 	  != (lookup_attribute ("no_profile_instrument_function",
diff --git a/gcc/passes.cc b/gcc/passes.cc
index d73f8ba97b6..010a98d9bca 100644
--- a/gcc/passes.cc
+++ b/gcc/passes.cc
@@ -352,8 +352,8 @@  finish_optimization_passes (void)
   gcc::dump_manager *dumps = m_ctxt->get_dumps ();
 
   timevar_push (TV_DUMP);
-  if (profile_arc_flag || condition_coverage_flag || flag_test_coverage
-      || flag_branch_probabilities)
+  if (profile_arc_flag || condition_coverage_flag || path_coverage_flag
+      || flag_test_coverage || flag_branch_probabilities)
     {
       dumps->dump_start (pass_profile_1->static_pass_number, NULL);
       end_branch_prob ();
diff --git a/gcc/path-coverage.cc b/gcc/path-coverage.cc
new file mode 100644
index 00000000000..c4b03731863
--- /dev/null
+++ b/gcc/path-coverage.cc
@@ -0,0 +1,627 @@ 
+/* Calculate prime path coverage.
+   Copyright (C) 2024-2024 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "diagnostic-core.h"
+#include "target.h"
+#include "function.h"
+#include "basic-block.h"
+#include "tree.h"
+#include "tree-cfg.h"
+#include "cfg.h"
+#include "gimple.h"
+#include "gimple-iterator.h"
+#include "gimplify.h"
+#include "coverage.h"
+#include "ssa.h"
+#include "vec.h"
+#include "sbitmap.h"
+#include "graphds.h"
+#include "hash-map.h"
+
+bool mark_dfs_back_edges (struct function *);
+vec<vec<int>> prime_paths (struct graph*, size_t);
+
+namespace
+{
+
+/* Check if all the successors of BB are abnormal, e.g. setjmp.  */
+bool
+all_abnormal_succs_p (basic_block bb)
+{
+  for (edge e : bb->succs)
+    if (!(e->flags & EDGE_ABNORMAL))
+      return false;
+  return true;
+}
+
+/* Build a struct graph equivalent to the CFG for the function FN.  The caller
+   must free the returned graph with free_graph.  The data field of every
+   vertex and edge point to the basic blocks and edges in the CFG.
+
+   The CFG recording and gcov is not aware of abnormal edges, so they are
+   ignored here, too.  This means some paths are lost, e.g. those that involve
+   setjmp/longjmp.  They are still paths but would need more support from
+   profile.cc and gcov.cc to work.  */
+struct graph*
+cfg_as_graph (struct function* fn)
+{
+  struct graph *g = new_graph (n_basic_blocks_for_fn (fn));
+  basic_block entry = ENTRY_BLOCK_PTR_FOR_FN (fn);
+  basic_block exit = EXIT_BLOCK_PTR_FOR_FN (fn);
+
+  g->vertices[entry->index].data = entry;
+  g->vertices[exit->index].data = exit;
+  basic_block top = single_succ (entry);
+  add_edge (g, entry->index, top->index)->data = single_succ_edge (entry);
+
+  const unsigned ignore = EDGE_FAKE | EDGE_ABNORMAL | EDGE_ABNORMAL_CALL;
+  basic_block bb;
+  FOR_EACH_BB_FN (bb, fn)
+    {
+      g->vertices[bb->index].data = bb;
+      for (edge e : bb->succs)
+	if (!(e->flags & ignore))
+	  add_edge (g, e->src->index, e->dest->index)->data = e;
+    }
+  return g;
+}
+
+/* Find the prime paths for the function FN with the ENTRY and EXIT blocks
+   removed.  This can lead to empty paths when there is a fake edge to the exit
+   block, for example for abort functions or infinite loops.  Empty paths are
+   removed because the length of the returned vec is important.  */
+vec<vec<int>>
+find_prime_paths (struct function *fn)
+{
+  struct graph *cfg = cfg_as_graph (fn);
+  vec<vec<int>> paths = prime_paths (cfg, path_coverage_limit);
+
+  bool any_empty = false;
+  for (vec<int> &path : paths)
+    {
+      /* setjmp calls will be isolated basic blocks when ABNORMAL_EDGEs are
+	 removed.  If a path is made up of just such a vertex it is pointless
+	 and can be removed.  */
+      if (path.length () == 1
+	  && all_abnormal_succs_p (BASIC_BLOCK_FOR_FN (fn, path[0])))
+	path.pop ();
+      if (!path.is_empty () && path[0] == ENTRY_BLOCK)
+	path.ordered_remove (0);
+      if (!path.is_empty () && path.last () == EXIT_BLOCK)
+	path.pop ();
+
+      if (path.is_empty ())
+	{
+	  any_empty = true;
+	  path.release ();
+	}
+    }
+
+  unsigned ix1, ix2;
+  vec<int> *ptr;
+  if (any_empty)
+    VEC_ORDERED_REMOVE_IF (paths, ix1, ix2, ptr, ptr->is_empty ());
+
+  return paths;
+}
+
+/* Return the edge between SRC and DST.  */
+edge
+edge_between (struct function *fn, int src, int dst)
+{
+  basic_block bbsrc = BASIC_BLOCK_FOR_FN (fn, src);
+  basic_block bbdst = BASIC_BLOCK_FOR_FN (fn, dst);
+  for (edge e : bbsrc->succs)
+    if (e->dest == bbdst)
+      return e;
+  gcc_checking_assert (false);
+  return NULL;
+}
+
+/* Visitor for topsort.  */
+void
+topsort1 (basic_block b, vec<basic_block>& L, sbitmap visited)
+{
+  if (bitmap_bit_p (visited, b->index))
+    return;
+
+  for (edge e : b->succs)
+    if (!(e->flags & EDGE_DFS_BACK))
+      topsort1 (e->dest, L, visited);
+
+  bitmap_set_bit (visited, b->index);
+  L.quick_push (b);
+}
+
+/* Get the basic blocks of FN in topological order so that all predecessors
+   come before the vertex, while ignoring back edges.  This is a depth-first
+   approach similar to the algorithm described in Cormen et al (2001).  */
+vec<basic_block>
+topsort (struct function *fn)
+{
+  vec<basic_block> L {};
+  L.reserve (n_basic_blocks_for_fn (fn));
+
+  auto_sbitmap visited (n_basic_blocks_for_fn (fn));
+  bitmap_clear (visited);
+  bitmap_set_bit (visited, EXIT_BLOCK);
+
+  basic_block bb;
+  FOR_EACH_BB_FN (bb, fn)
+    topsort1 (bb, L, visited);
+
+  L.reverse ();
+  return L;
+}
+
+/* Hasher for maps where the key is a pair of edge/basic_block and bucket id
+   (size_t).  */
+template <typename T>
+using bucket_hash = pair_hash <nofree_ptr_hash <T>,
+			       int_hash <size_t, -1>>;
+
+/* Hasher for {edge, bucket-id} keys.  */
+using edge_hash = bucket_hash <edge_def>;
+/* Hasher for {basic_block, bucket-id} keys.  */
+using block_hash = bucket_hash <basic_block_def>;
+
+/* Find the union of bits possibly preserved by the bitwise-ANDs for bucket
+   BUCKET of BB.  If this returns zero no paths go through BB at BUCKET.  */
+uint64_t
+union_incoming_bit_and (hash_map <edge_hash, uint64_t> &ands,
+			const basic_block bb, size_t bucket)
+{
+  uint64_t inc = 0;
+  for (edge e : bb->preds)
+    {
+      const uint64_t *val = ands.get ({e, bucket});
+      inc |= (val ? *val : 0);
+    }
+  return inc;
+}
+
+
+/* Check if the incoming edges have the power to modify the paths in flight for
+   BUCKET.  If all the incoming edges would apply the same bitmask the
+   BB+BUCKET will not actually update the path set, and we don't need to emit
+   an AND_EXPR and have a later fork distinguish the paths.  */
+bool
+can_change_path_p (hash_map <edge_hash, uint64_t> &ands,
+		   const basic_block bb, size_t bucket, uint64_t all_and)
+{
+  for (edge e : bb->preds)
+    {
+      const uint64_t *e_and = ands.get ({e, bucket});
+      if (!e_and || *e_and != all_and)
+	return true;
+    }
+  return false;
+}
+
+/* Check if all bits in BITSET are 1 for the target size TARGET_SIZE.  For a
+   32-bit target only the bits in the lower half should be set, and this should
+   return true when all bits in the lower half are set, even if the bitset type
+   have room for more bits.  */
+bool
+all_bits_set_p (uint64_t bitset, size_t target_size)
+{
+  return (size_t)popcount_hwi (bitset) == target_size;
+}
+
+/* Create a constant or SSA name of GCOV_TYPE_NODE type and zero-assign to it
+   safely on the edge E.  If the edge is abnormal it is assumed the phi is
+   abnormal and we need an SSA name, otherwise fall back to a constant.  The
+   returned value is safe to use with add_phi_arg ().
+
+   If the edge is abnormal we cannot insert on it directly, and instead
+   carefully add the assignment on the source block.  If that source block ends
+   with control flow (like those produced by _setjmp) we must insert before to
+   not break that invariant, otherwise insert after so that things like the
+   setjmp receiver is the first element of the basic block.  Doing the assign
+   is necessary as phis cannot resolve to constants.  */
+tree
+safe_insert_zero_phi_arg (edge e, tree gcov_type_node)
+{
+  tree cst = build_zero_cst (gcov_type_node);
+  if (!(e->flags & (EDGE_ABNORMAL | EDGE_ABNORMAL_CALL)))
+    return cst;
+
+  tree zero = make_ssa_name (gcov_type_node);
+  gassign *put = gimple_build_assign (zero, cst);
+  gimple_stmt_iterator gsi = gsi_after_labels (e->src);
+  if (stmt_ends_bb_p (*gsi))
+    gsi_insert_before (&gsi, put, GSI_NEW_STMT);
+  else
+    gsi_insert_after (&gsi, put, GSI_NEW_STMT);
+
+  return zero;
+}
+
+/* Check if SSA is a constant value (created by build_int_cst) and can be
+   folded.  */
+bool
+constant_p (tree ssa)
+{
+  return tree_fits_uhwi_p (ssa);
+}
+
+/* A fixup task.  When resolving the exit SSA for a back edge arg to a phi
+   node, the exit SSA has not been created yet.  Record what needs to be done
+   when it is created, and tie the phi to the right SSA name once it is
+   guaranteed it is created.  If MASK is nullptr the predecessor's SSA should
+   be used as-is, and does not need an AND.  This should only be created with
+   the helpers fixup_noop and fixup_and.  */
+struct fixup
+{
+  gphi *phi;
+  edge e;
+  tree lhs;
+  tree mask;
+  size_t bucket;
+};
+
+/* Create a fixup with a no-op for the PHI in BUCKET. Use this when the edge E
+   does not need an AND applied and should rather propagate the predecessor's
+   SSA name.  */
+fixup
+fixup_noop (gphi *phi, edge e, size_t bucket)
+{
+  fixup todo;
+  todo.phi = phi;
+  todo.e = e;
+  todo.bucket = bucket;
+  todo.lhs = nullptr;
+  todo.mask = nullptr;
+  return todo;
+}
+
+/* Create a fixup for PHI through BUCKET with the exit SSA name E->src ANDed
+   with MASK (E->src & MASK).  GCOV_TYPE_NODE should be a tree of the gcov type
+   node for creating SSA names.  */
+fixup
+fixup_and (gphi *phi, edge e, size_t bucket, uint64_t mask,
+	   tree gcov_type_node)
+{
+  fixup todo;
+  todo.phi = phi;
+  todo.e = e;
+  todo.bucket = bucket;
+  todo.lhs = make_ssa_name (gcov_type_node);
+  todo.mask = build_int_cst (gcov_type_node, mask);
+  return todo;
+}
+
+} // namespace
+
+/* Instrument FN for prime path coverage, enabled by -fpath-coverage.  The
+   number of paths grows very fast with the complexity (control flow) which
+   explodes compile times and object file size.  Giving up is controlled by the
+   -fpath-coverage-limit flag.  The paths are sorted lexicographically by basic
+   block id, and each path is identified by its index in the sorted set of
+   paths, which in turn corresponds to a bit in a large bitset associated with
+   FN.  The monitoring code is a few bitwise operations added to edges and
+   basic blocks to maintain a set of live paths (note that many paths may
+   overlap and that many paths may be covered by the same input).  When
+   reaching the final vertex of a path the global counters are updated.
+
+   This is made more complicated by the gcov type having very limited size.  To
+   support functions with more than 32/64 paths the bitmap is implemented on
+   top of a sequence of gcov integers, "buckets", where path N is recorded as
+   bit N%64 in bucket N/64.  For large functions, an individual basic block
+   will only be part of a small subset of paths, and by extension buckets and
+   local counters.  Only the necessary counters are read and written.  */
+void
+find_paths (struct function *fn)
+{
+  mark_dfs_back_edges (fn);
+  vec<vec<int>> paths = find_prime_paths (fn);
+
+  if (paths.is_empty ())
+    {
+      warning (OPT_Wcoverage_too_many_paths,
+	       "paths exceeding limit, giving up path coverage");
+      release_vec_vec (paths);
+      return;
+    }
+
+  tree gcov_type_node = get_gcov_type ();
+  const size_t bucketsize = TYPE_PRECISION (gcov_type_node);
+  const size_t nbuckets = (paths.length () + (bucketsize - 1)) / bucketsize;
+  gcc_checking_assert (sizeof (uint64_t) * BITS_PER_UNIT >= bucketsize);
+
+  if (!coverage_counter_alloc (GCOV_COUNTER_PATHS, nbuckets))
+    {
+      release_vec_vec (paths);
+      return;
+    }
+
+  gcov_position_t offset {};
+  offset = gcov_write_tag (GCOV_TAG_PATHS);
+  gcov_write_unsigned (paths.length ());
+  gcov_write_length (offset);
+
+  hash_map <edge_hash, uint64_t> ands;
+  hash_map <block_hash, uint64_t> iors;
+  hash_map <block_hash, uint64_t> flushes;
+
+  /* Now that we have all paths we must figure out what bitmasks must be
+     applied on the edges.  We also record in which basic block the path starts
+     (e.g.  accumulator resets) and ends (accumulator flushes).  */
+  for (size_t pathno = 0; pathno != paths.length (); ++pathno)
+    {
+      const vec<int> &path = paths[pathno];
+      const size_t bucket = pathno / bucketsize;
+      const size_t bit = uint64_t (1) << (pathno % bucketsize);
+
+      basic_block first = BASIC_BLOCK_FOR_FN (fn, path[0]);
+      basic_block last = BASIC_BLOCK_FOR_FN (fn, path[path.length () - 1]);
+
+      for (unsigned i = 1; i != path.length (); ++i)
+	{
+	  edge e = edge_between (fn, path[i-1], path[i]);
+	  ands.get_or_insert ({e, bucket}) |= bit;
+	}
+
+      iors.get_or_insert ({first, bucket}) |= bit;
+      flushes.get_or_insert ({last, bucket}) |= bit;
+    }
+
+  /* The basic blocks (except entry, exit) for this function, in topological
+     order.  Processing in this order means that the predecessor(s) exit SSAs
+     will have been created by the time a block is processed, unless it is a
+     loop/back edge.  This simplifies processing a bit.  */
+  vec<basic_block> blocks = topsort (fn);
+
+  /* The exit SSA name for each BLOCK, i.e. the SSA name the BLOCK's successors
+     should use as input.  */
+  hash_map<block_hash, tree> SSAex;
+  /* The entry SSA name for the BLOCK.  This name forms the basis for the
+     flushing to the global accumulators.  In the presence of phi nodes this is
+     the resolved phi, otherwise it is some predecessor's exit SSA name.  */
+  hash_map<block_hash, tree> SSAen;
+
+  auto_vec<fixup, 4> todos;
+
+  /* Generate the operations for each basic block.  */
+  for (basic_block bb : blocks)
+    {
+      for (size_t bucket = 0; bucket != nbuckets; ++bucket)
+	{
+	  tree ssa = nullptr;
+	  gphi *phi = nullptr;
+	  uint64_t all_and = union_incoming_bit_and (ands, bb, bucket);
+
+	  if (all_and && single_pred_p (bb))
+	    {
+	      /* There is a path on this edge through the bucket, but since
+		 there is only one predecessor it has no decisive power.
+		 Just push the predecessor's exit and have later ANDs sort it
+		 out.  */
+	      tree *prev = SSAex.get ({single_pred (bb), bucket});
+	      gcc_checking_assert (prev);
+	      ssa = *prev;
+	    }
+	  else if (all_and)
+	    {
+	      /* There are multiple predecessors, so we need a phi.  */
+	      ssa = make_ssa_name (gcov_type_node);
+	      phi = create_phi_node (ssa, bb);
+	    }
+
+	  if (ssa)
+	    SSAen.put ({bb, bucket}, ssa);
+
+	  if (single_pred_p (bb) && single_succ_p (bb))
+	    {
+	      /* Straight line -- the AND mask will already have been applied
+		 to the first ancestor of this chain, so we don't need to apply
+		 it here.  */
+	    }
+	  else if (!can_change_path_p (ands, bb, bucket, all_and))
+	    {
+	      /* All incoming edges would apply the same mask, so applying the
+		 AND here would not actually distinguish paths.  Such an AND
+		 will be applied later anyway so we don't need to apply it
+		 here.  This is a huge improvement for large programs.  */
+	    }
+	  else for (edge e : bb->preds)
+	    {
+	      const uint64_t *bitmask = ands.get ({e, bucket});
+	      /* There is no phi, and there are no paths through this bucket.
+		 Set the SSA name to nullptr so we don't contaminate it by
+		 pushing unrelated predecessors.  */
+	      if (!bitmask && !phi)
+		ssa = nullptr;
+	      else if (!bitmask && phi)
+		{
+		  tree zero = safe_insert_zero_phi_arg (e, gcov_type_node);
+		  add_phi_arg (phi, zero, e, UNKNOWN_LOCATION);
+		}
+	      else if (all_bits_set_p (*bitmask, bucketsize) && !phi)
+		{
+		  /* This reduces to a no-op (x & ~0) and there is no phi
+		     selection, so just push the SSA.  */
+		  gcc_checking_assert (ssa);
+		}
+	      else if (all_bits_set_p (*bitmask, bucketsize) && phi)
+		{
+		  /* This reduces to a no-op (x & ~0).  Reusing the SSA and not
+		     emitting an unnecessary AND is a big improvement for large
+		     programs.  */
+		  tree *prev = SSAex.get ({e->src, bucket});
+		  if (prev)
+		    add_phi_arg (phi, *prev, e, UNKNOWN_LOCATION);
+		  else
+		    todos.safe_push (fixup_noop (phi, e, bucket));
+		}
+	      else if (SSAex.get ({e->src, bucket}))
+		{
+		  /* We need to apply a mask, folding if possible.  If there is
+		     a phi it is already the latest-touched ssa.  */
+		  tree prev = *SSAex.get ({e->src, bucket});
+		  gcc_assert (prev);
+		  if (constant_p (prev))
+		    {
+		      const uint64_t x = tree_to_uhwi (prev);
+		      tree cst = build_int_cst (gcov_type_node, x & *bitmask);
+		      if (phi)
+			add_phi_arg (phi, cst, e, UNKNOWN_LOCATION);
+		      else
+			ssa = cst;
+		    }
+		  else
+		    {
+		      tree lhs = make_ssa_name (gcov_type_node);
+		      tree mask = build_int_cst (gcov_type_node, *bitmask);
+		      gassign *put = gimple_build_assign (lhs, BIT_AND_EXPR, prev, mask);
+		      gsi_insert_on_edge (e, put);
+		      if (phi)
+			add_phi_arg (phi, lhs, e, UNKNOWN_LOCATION);
+		      else
+			ssa = lhs;
+		    }
+		}
+	      else
+		{
+		  /* There is a phi and this edge is a back edge,
+		     which means the predecessor (and descendant) exit
+		     SSA has not been created yet.  */
+		  gcc_assert (phi);
+		  gcc_assert (e->flags & EDGE_DFS_BACK);
+		  fixup todo = fixup_and (phi, e, bucket, *bitmask,
+					  gcov_type_node);
+		  todos.safe_push (todo);
+		}
+	    }
+
+	  /* Bitwise IOR.  Unlike the AND this assignment can always be created
+	     right away as this should be applied to the result of the phi,
+	     AND, or single predecessor's exit SSA, and all of those have
+	     already been created.  */
+	  const uint64_t *ior = iors.get ({bb, bucket});
+	  if (ior && !ssa)
+	    {
+	      /* In case there was no predecessor, the IOR/initial state can
+		 just be a constant.  In this case, the IOR also becomes the
+		 block's entry node which means it will be considered for
+		 flushing in single-vertex paths.  */
+	      ssa = build_int_cst (gcov_type_node, *ior);
+	      SSAen.put ({bb, bucket}, ssa);
+	    }
+	  else if (ior && all_bits_set_p (*ior, bucketsize))
+	    ssa = build_all_ones_cst (gcov_type_node);
+	  else if (ior)
+	    {
+	      gcc_assert (ssa);
+	      tree next = make_ssa_name (gcov_type_node);
+	      tree mask = build_int_cst (gcov_type_node, *ior);
+	      gassign *put = gimple_build_assign (next, BIT_IOR_EXPR, ssa, mask);
+	      gimple_stmt_iterator gsi = gsi_after_labels (bb);
+	      gsi_insert_before (&gsi, put, GSI_NEW_STMT);
+	      ssa = next;
+	    }
+
+	  if (ssa)
+	    SSAex.put ({bb, bucket}, ssa);
+	}
+    }
+
+  /* Apply fixups -- now that all exit SSA names are created we can properly
+     set the phi argument if there is a phi node, and emit the (x & mask)
+     instruction if necessary.  */
+  for (fixup &todo : todos)
+    {
+      tree *exit = SSAex.get ({todo.e->src, todo.bucket});
+      gcc_assert (exit && *exit);
+      gcc_checking_assert (todo.phi);
+      if (todo.mask)
+	{
+	  gassign *put = gimple_build_assign (todo.lhs, BIT_AND_EXPR, *exit,
+					      todo.mask);
+	  gsi_insert_on_edge (todo.e, put);
+	}
+
+      add_phi_arg (todo.phi, todo.lhs ? todo.lhs : *exit, todo.e,
+		   UNKNOWN_LOCATION);
+    }
+
+  /* Finally, add instructions to update the global counters.  */
+  for (basic_block bb : blocks)
+    {
+      gimple_stmt_iterator gsi = gsi_after_labels (bb);
+      for (size_t bucket = 0; bucket != nbuckets; ++bucket)
+	{
+	  const uint64_t *bitmask = flushes.get ({bb, bucket});
+	  if (!bitmask || !*bitmask)
+	    continue;
+
+	  tree *exit = SSAen.get ({bb, bucket});
+	  gcc_checking_assert (exit);
+	  if (!*exit)
+	    continue;
+
+	  gcc_assert (*bitmask);
+	  tree counter = tree_coverage_counter_ref (GCOV_COUNTER_PATHS, bucket);
+	  if (!all_bits_set_p (*bitmask, bucketsize))
+	    {
+	      /* We need to apply an AND to only write the paths in the bucket
+		 that end in this vertex.
+
+		 global_accu |= (local_accu & paths);  */
+	      tree cst = build_int_cst (gcov_type_node, *bitmask);
+	      tree tmp1 = make_ssa_name (gcov_type_node);
+	      tree tmp2 = make_ssa_name (gcov_type_node);
+	      tree tmp3 = make_ssa_name (gcov_type_node);
+
+	      gassign *ga1 = gimple_build_assign (tmp1, counter);
+	      gassign *ga2 = gimple_build_assign (tmp2, BIT_AND_EXPR, *exit, cst);
+	      gassign *ga3 = gimple_build_assign (tmp3, BIT_IOR_EXPR, tmp1, tmp2);
+	      gassign *ga4 = gimple_build_assign (unshare_expr (counter), tmp3);
+	      gsi_insert_before (&gsi, ga1, GSI_SAME_STMT);
+	      gsi_insert_before (&gsi, ga2, GSI_SAME_STMT);
+	      gsi_insert_before (&gsi, ga3, GSI_SAME_STMT);
+	      gsi_insert_before (&gsi, ga4, GSI_SAME_STMT);
+	    }
+	  else
+	    {
+	      /* Every path in this bucket ends in this vertex, so we just
+		 apply the IOR and don't need to mask out anything.
+
+		 global_accu |= local_accu;  */
+	      tree tmp1 = make_ssa_name (gcov_type_node);
+	      tree tmp2 = make_ssa_name (gcov_type_node);
+
+	      gassign *ga1 = gimple_build_assign (tmp1, counter);
+	      gassign *ga2 = gimple_build_assign (tmp2, BIT_IOR_EXPR,
+						  tmp1, *exit);
+	      gassign *ga3 = gimple_build_assign (unshare_expr
+						  (counter), tmp2);
+	      gsi_insert_before (&gsi, ga1, GSI_SAME_STMT);
+	      gsi_insert_before (&gsi, ga2, GSI_SAME_STMT);
+	      gsi_insert_before (&gsi, ga3, GSI_SAME_STMT);
+	    }
+	}
+    }
+
+  blocks.release ();
+  release_vec_vec (paths);
+}
diff --git a/gcc/prime-paths.cc b/gcc/prime-paths.cc
new file mode 100644
index 00000000000..326859bd62c
--- /dev/null
+++ b/gcc/prime-paths.cc
@@ -0,0 +1,1939 @@ 
+#define INCLUDE_ALGORITHM
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "obstack.h"
+#include "sbitmap.h"
+#include "vec.h"
+#include "graphds.h"
+#include "selftest.h"
+
+namespace
+{
+
+/* A silly RAII wrapper for struct graph.  The prime_paths function has multiple
+   returns, and this helps reliably clean up.  */
+struct auto_graph
+{
+  auto_graph (struct graph *graph) : ptr (graph) {}
+  auto_graph (const auto_graph &) = delete;
+  ~auto_graph () { free_graph (ptr); }
+  operator struct graph* () { return ptr; }
+  struct graph* operator -> () { return ptr; }
+  graph *ptr;
+};
+
+/* A silly RAII wrapper for an sbitmap vector.  The prime_paths function has
+   multiple returns, and this helps reliably clean up.  */
+struct auto_sbitmap_vector
+{
+  auto_sbitmap_vector (sbitmap *s) : ptr (s) {}
+  auto_sbitmap_vector (const auto_sbitmap_vector &) = delete;
+  ~auto_sbitmap_vector () { sbitmap_vector_free (ptr); }
+  operator sbitmap* () { return ptr; }
+  sbitmap* ptr;
+};
+
+/* A silly RAII wrapper for automatically releasing a vec<vec<int>>.  */
+struct auto_vec_vec : vec<vec<int>>
+{
+  ~auto_vec_vec () { release_vec_vec (*this); }
+};
+
+/* A silly RAII wrapper for automatically releasing a vec<vec<vec<int>>>.  */
+struct auto_vec_vec_vec : vec<vec<vec<int>>>
+{
+  ~auto_vec_vec_vec ()
+  {
+    for (vec<vec<int>> &v : *this)
+      release_vec_vec (v);
+    release ();
+  }
+};
+
+/* A trivial key/value pair for a short linear map type.  */
+struct xpair
+{
+  int key;
+  unsigned val;
+};
+
+/* A node in a trie, optimized for mid-sized alphabets possibly larger than 256
+   but not much more.  Finding the prime paths ends up creating a large amount
+   of these nodes so space and access costs matters a lot.
+
+   The node does not explicitly store its own key (CFG vertex ID/basic block
+   index), nor does it store pointers to its successors. Rather, it stores the
+   key+offset pairs for its successors in the root trie object, and in a sense
+   they behave like near pointers.  This makes the trie vertices small and
+   relocatable, and removes the need for pointer chasing when releasing the
+   trie.
+
+   The union of near/far is essentially a short-vector optimization, switching
+   to a heap-allocated vector when necessary. This happens relatively rarely
+   (usually maxes out at 1-2%), and the vertices that have more than 2 successors
+   also tend to have more than 4. The root vertex tends to use the dynamic
+   vector because the subpaths are recorded as the successors of the root.
+
+   Conceptually, this is a small map from vertex-id -> index and the API is
+   modelled as such.  The insert and search functions are unrolled by hand when
+   using the small vector.  This has a noticeable performance impact on insert
+   in particular, and is not too complex since we know we are limited to 2
+   elements.
+
+   Vertices are tagged with endofpath and inserted.  If endofpath is set, the
+   path from the root to this vertex is a complete path.  If inserted is set
+   then the vertex is a part of a proper path (one given to insert) and not
+   generated as a suffix.  For example:
+
+   insert ([2 4 6])
+   insert ([9 7 2 4 6])
+   insert ([2 4 6 8])
+
+   The inserted flags for [2 4 6] are not cleared, because otherwise [2 4 6 8]
+   would be dropped when only following inserted vertices.  The endofpath flag
+   in [2 4 6] is cleared when the suffixes of [9 7 2 4 6] are inserted.
+
+   The node will be inserted into a vec, and should be trivial.  Instances
+   should be value-initialized to zero-initialized state.  */
+struct trie_node
+{
+  unsigned length () const
+  { return !heaped ? len : far.length (); }
+
+  const xpair *begin () const
+  { return !heaped ? near : far.begin (); }
+
+  const xpair *end () const
+  { return !heaped ? (near + len) : far.end(); }
+
+  /* Get the ith successor.  This is used for traversal and not lookup, and
+     should only be used by the iterator.  */
+  const xpair &at (unsigned i) const
+  { return !heaped ? near[i] : far[i]; }
+
+  const xpair *get (int key) const;
+  void put (int key, unsigned val);
+
+  unsigned near_lower_bound (int key) const;
+
+  /* Experimentally I found that using a union with 2 elements in the near array
+     to be faster than 4 or without the union (very slightly).  A lot of trie
+     vertices will be created, and vast majority of vertices will have 1 or 2
+     successors (straight line or if-then), and the cost of size and copying
+     adds up.  */
+  union
+  {
+    xpair near[2];
+    vec<xpair> far;
+  };
+  unsigned len : 8;
+  unsigned endofpath : 1;
+  unsigned inserted : 1;
+  unsigned heaped : 1;
+};
+
+/* Compare LHS.key < RHS.key, for use with vec.lower_bound.  */
+bool
+xpair_less (const xpair& lhs, const xpair& rhs)
+{
+  return lhs.key < rhs.key;
+}
+
+/* Compare LHS.key to RHS.key, for use with vec.bsearch.  */
+int
+xpair_cmp (const void *lhs, const void *rhs)
+{
+  return ((const xpair*)lhs)->key - ((const xpair*)rhs)->key;
+}
+
+/* Get a pointer to the element at KEY if it exists, otherwise NULL.  */
+const xpair*
+trie_node::get (int key) const
+{
+  if (!heaped)
+    {
+      if (len == 0) return NULL;
+      if (len >= 1 && key == near[0].key) return near + 0;
+      if (len >= 2 && key == near[1].key) return near + 1;
+      return NULL;
+    }
+  else
+    {
+      xpair kv;
+      kv.key = key;
+      return const_cast <vec<xpair>&> (far).bsearch (&kv, xpair_cmp);
+    }
+}
+
+/* Put ("emplace") VAL at KEY, extending the paths that pass through this
+   vertex.  This function assumes that KEY is not already a successor, and does
+   not perform this check.  get () should be called and checked for NULL before
+   putting with this function.  Put maintains the order of the successors.  */
+void
+trie_node::put (int key, unsigned val)
+{
+  xpair kv;
+  kv.key = key;
+  kv.val = val;
+  if (!heaped)
+    {
+      const unsigned i = near_lower_bound (key);
+      if (len < 2)
+	{
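+	  /* Shift the (at most one) existing element up and place the new
+	     pair at its sorted position.  When I is 1 the shifted copy in
+	     near[1] is immediately overwritten, and when LEN is 0 the copy
+	     reads an unused, value-initialized slot, which is harmless.  */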
+	  near[1] = near[0];
+	  near[i] = kv;
+	  len += 1;
+	}
+      else
+	{
+	  /* This insert is the 3rd element, which does not fit in the embedded
+	     storage, so we must create a vector and convert to a far node.  */
+	  vec<xpair> xs {};
+	  xs.reserve (13);
+	  xs.quick_grow (3);
+	  gcc_checking_assert (i <= 2);
+	  if (i == 0)
+	    {
+	      xs[0] = kv;
+	      xs[1] = near[0];
+	      xs[2] = near[1];
+	    }
+	  else if (i == 1)
+	    {
+	      xs[0] = near[0];
+	      xs[1] = kv;
+	      xs[2] = near[1];
+	    }
+	  else
+	    {
+	      xs[0] = near[0];
+	      xs[1] = near[1];
+	      xs[2] = kv;
+	    }
+
+	  far = xs;
+	  heaped = 1;
+	}
+    }
+  else
+    {
+      const unsigned i = far.lower_bound (kv, xpair_less);
+      far.safe_insert (i, kv);
+    }
+}
+
+/* Get the index of the first successor that does not compare less than KEY,
+   i.e. the insertion point for KEY, similar to vec.lower_bound.  This assumes
+   the near vector is active, and is for internal use.  */
+unsigned
+trie_node::near_lower_bound (int key) const
+{
+  gcc_checking_assert (!heaped);
+  if (len == 0) return 0;
+  if (len >= 1 && key < near[0].key) return 0;
+  if (len >= 2 && key < near[1].key) return 1;
+  return len;
+}
+
+/* The trie is a major workhorse for this algorithm.  It has two key properties
+   - set-like subpath elimination and sorted output.
+
+   Many evaluated paths will be non-prime, that is, a sequence of vertices that
+   is also fully embedded in a longer sequence of vertices.  For example the
+   path [3 4 5 8] is a subpath of both [2 3 4 5 8] and [3 4 5 8 10].  The
+   insert_with_suffix function maintains this property so that inserting a
+   subpath into the trie is effectively a no-op, and inserting a superpath will
+   effectively remove (unmark) the subpath.  Sometimes it can be guaranteed that
+   no redundant paths (subpaths) will be generated, in which case the plain
+   insert function can be used.  insert is really just a set insert, only
+   becoming a no-op for identical paths, and is a lot faster.
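+
+   For example (illustration only):
+
+     const int p1[] = { 2, 3, 4, 5, 8 };
+     const int p2[] = { 3, 4, 5, 8 };
+     const int p3[] = { 3, 4, 5, 8, 10 };
+     trie t;
+     t.insert_with_suffix (p1);
+     t.insert_with_suffix (p2);  // subpath of p1, effectively a no-op
+     t.insert_with_suffix (p3);  // not a subpath of anything, kept
+
+   after which iterating over t yields [2 3 4 5 8] and [3 4 5 8 10].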
+
+   Paths can be extracted with an iterator, which will output paths in
+   lexicographically sorted order.  This is an important property because the
+   index of a path in the sorted set will be used by the coverage to record when
+   a path is taken and completed.  The iterator has different behavior than the
+   standard C++ iterators, and to avoid mixups the interface is deliberately
+   different.  The iterator has a (large) stack which is not cheap to copy, and
+   if the stack is shallow copied it would mean iterator copies have non-local
+   effects.  */
+struct trie
+{
+  struct iter;
+  trie ();
+  trie (const trie &o);
+  trie (trie &&o);
+  ~trie ();
+
+  bool insert (const vec<int>&);
+  bool insert (const array_slice<const int>);
+  bool hard_insert (const array_slice<const int>);
+  bool insert_with_suffix (const array_slice<const int>);
+  bool insert_suffix (const array_slice<const int>);
+
+  void merge (const trie&);
+
+  iter paths (vec<int>&) const;
+  iter paths (vec<int>&, int from) const;
+
+  vec<vec<int>> paths () const;
+
+  size_t size () const { return len; }
+
+  vec<trie_node> vertices;
+  size_t len;
+
+  /* An iterator for the paths of the trie.  The iterator yields all paths in
+     lexicographical order.  The iterator will be invalidated on any insertion
+     into the trie.  The iterator should not be constructed directly, but
+     through the paths functions on the trie.  It is essentially an explicit
+     stack depth-first traversal.
+
+     The iter fills a user-provided buffer which should only be read when the
+     iter is active.  Whenever next returns true the buffer is filled with
+     the current path.  Uses will generally look like this:
+
+     vec<int> path {};
+     auto iter = trie.paths (path);
+     while (iter.next ())
+       use_path (path);
+*/
+  struct iter
+  {
+    iter (vec<int>&, const vec<trie_node>&);
+    iter (int first, vec<int>& path, const vec<trie_node> &vertices);
+    ~iter ()
+    { stack.release (); }
+
+    bool next ();
+    bool next (int);
+    bool next (bool);
+
+    /* This is the analog of the stack frame when implementing a recursive
+       depth-first path traversal and collecting paths to the leafs:
+
+       for (auto successor : vertex[ix])
+	 {
+	   path.push (successor.value);
+	   collect (successor.ix, successor.begin, successor.end, path)
+	   path.pop ();
+	 }
+
+       Using size_t + 2x unsigned helped make the frame more compact and faster
+       than pointers.  */
+    struct frame
+    {
+      /* The index of this frame's vertex, so that vertices[ix].  */
+      size_t ix;
+      /* The index of the current active successor of vertices[ix].  */
+      unsigned itr;
+      /* The end of vertices[ix] successors.  When itr == end, vertex[ix] is
+	 exhausted.  */
+      unsigned end;
+    };
+
+    /* User provided buffer to fill with the paths.  */
+    vec<int> &path;
+    /* Direct reference to the trie vertices vector.  */
+    const vec<trie_node> &vertices;
+    /* The call stack.  */
+    vec<frame> stack;
+    /* Yield flag.  If this is true then next () is permitted to find and
+       return a new value.  If this is false, a value has already been yielded
+       and next must first reset the state before building the next value.  */
+    bool yield = true;
+
+    iter (const iter& o) : path (o.path), vertices (o.vertices),
+      stack (o.stack.copy ()), yield (o.yield)
+    {
+    }
+
+    /* Delete the copy assignment, as the iter stores references and
+       assignment would cause subtle bugs.  It is not necessary for the
+       iterator to work.  To support it the references would need to be
+       (explicit) pointers.  */
+    iter& operator = (const iter& o) = delete;
+  };
+};
+
+/* Construct an iterator filling BUFFER.  */
+trie::iter
+trie::paths (vec<int> &buffer) const
+{
+  buffer.truncate (0);
+  return iter (buffer, vertices);
+}
+
+/* Construct an iterator filling BUFFER for paths starting at FROM.  */
+trie::iter
+trie::paths (vec<int>& buffer, int from) const
+{
+  buffer.truncate (0);
+  return iter (from, buffer, vertices);
+}
+
+/* Default construct a new trie.  */
+trie::trie () : vertices (vec<trie_node> {}), len (0)
+{
+  vertices.safe_push (trie_node {});
+  vertices[0].inserted = true;
+}
+
+/* Copy construct a new trie.  */
+trie::trie (const trie &o) : vertices (o.vertices.copy ()), len (o.len)
+{
+}
+
+/* Move construct a new trie.  */
+trie::trie (trie &&o) : vertices (o.vertices), len (o.len)
+{
+  o.vertices = {};
+  o.len = 0;
+}
+
+/* Destroy a trie and release all the heaped resources.  */
+trie::~trie ()
+{
+  for (trie_node &v : vertices)
+    if (v.heaped)
+      v.far.release ();
+  vertices.release ();
+}
+
+/* Insert PATH into the trie.  */
+bool
+trie::insert (const vec<int>& path)
+{
+  return insert (array_slice <const int> (path));
+}
+
+/* Insert PATH into the trie.  Duplicate entries will not be entered twice.
+   If PATH is a subpath of a previously inserted path, or a previously inserted
+   path is a subpath of PATH, the redundancy will not be eliminated.  For that
+   behavior, call insert_with_suffix.  */
+bool
+trie::insert (array_slice<const int> path)
+{
+  size_t index = 0;
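+  /* Number of elements of PATH consumed so far; when the walk falls off the
+     existing trie the rest of the path is created in a single grow.  */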
+  size_t partition = 0;
+  for (const int v : path)
+    {
+      trie_node &current = vertices[index];
+      current.inserted = true;
+      partition++;
+
+      const auto *xp = current.get (v);
+      if (xp)
+	{
+	  index = xp->val;
+	}
+      else
+	{
+	  /* A new vertex on this path has been created, which means the rest of
+	     the path will also have to be created.  Drain the path and create
+	     the remaining vertices in a single operation.  */
+	  unsigned ix = vertices.length ();
+	  current.put (v, ix);
+	  current.endofpath = false;
+
+	  array_slice<const int> tail (path.begin () + partition,
+				       path.size () - partition);
+	  vertices.safe_grow_cleared (1 + ix + tail.size ());
+
+	  for (const int v : tail)
+	    {
+	      trie_node &last = vertices[ix];
+	      ix += 1;
+	      last.put (v, ix);
+	      last.inserted = true;
+	    }
+
+	  vertices.last ().endofpath = true;
+	  vertices.last ().inserted = true;
+	  len += 1;
+	  return true;
+	}
+    }
+
+  return false;
+}
+
+/* hard_insert is like insert, except it does not overwrite any endofpath flags,
+   and records the endofpath flag even when a superpath of PATH has been
+   inserted previously.  This effectively disables subpath elimination.  */
+bool
+trie::hard_insert (array_slice<const int> path)
+{
+  size_t index = 0;
+  size_t partition = 0;
+  for (const int v : path)
+    {
+      trie_node &current = vertices[index];
+      current.inserted = true;
+      partition++;
+
+      const auto *xp = current.get (v);
+      if (xp)
+	{
+	  index = xp->val;
+	}
+      else
+	{
+	  unsigned ix = vertices.length ();
+	  current.put (v, ix);
+
+	  array_slice<const int> tail (path.begin () + partition,
+				       path.size () - partition);
+	  vertices.safe_grow_cleared (1 + ix + tail.size ());
+
+	  for (const int v : tail)
+	    {
+	      trie_node &last = vertices[ix];
+	      ix += 1;
+	      last.put (v, ix);
+	      last.inserted = true;
+	    }
+
+	  vertices.last ().endofpath = true;
+	  vertices.last ().inserted = true;
+	  len += 1;
+	  return true;
+	}
+    }
+
+  vertices[index].endofpath = true;
+  return false;
+}
+
+/* Insert the suffix PATH into the trie.  This is identical to insert, except
+   that PATH is assumed to be the suffix of an already-inserted path, so the
+   vertices on it get their endofpath flag cleared and newly created vertices
+   do not get the inserted flag set.  This function should only be called by
+   insert_with_suffix.  */
+bool
+trie::insert_suffix (array_slice<const int> path)
+{
+  size_t index = 0;
+  size_t partition = 0;
+  for (const int v : path)
+    {
+      trie_node &current = vertices[index];
+      current.endofpath = false;
+      partition++;
+
+      const auto *xp = current.get (v);
+      if (xp)
+	{
+	  index = xp->val;
+	}
+      else
+	{
+	  /* A new vertex on this path has been created, which means the rest of
+	     the path will also have to be created.  Drain the path and create
+	     the remaining vertices in a single operation.  */
+	  unsigned ix = vertices.length ();
+	  current.put (v, ix);
+
+	  array_slice<const int> tail (path.begin () + partition,
+				       path.size () - partition);
+	  vertices.safe_grow_cleared (1 + ix + tail.size ());
+
+	  for (const int v : tail)
+	    {
+	      vertices[ix].put (v, ix + 1);
+	      ix += 1;
+	    }
+
+	  return true;
+	}
+    }
+
+  vertices[index].endofpath = false;
+  return false;
+}
+
+/* Insert the paths from OTHER into this trie.  */
+void
+trie::merge (const trie& other)
+{
+  auto_vec<int, 32> p {};
+  iter itr = other.paths (p);
+  while (itr.next ())
+    insert_with_suffix (p);
+}
+
+/* Insert PATH and all its suffixes into the trie.  This function implements
+   the redundancy property of the trie - if an inserted path is either a
+   subpath or
+   superpath of some other path then only the longest will keep its inserted
+   flag.  */
+bool
+trie::insert_with_suffix (array_slice<const int> path)
+{
+  bool inserted = insert (path);
+  path = array_slice<const int> (path.begin () + 1, path.size () - 1);
+  while (inserted && !path.empty ())
+    {
+      inserted = insert_suffix (path);
+      path = array_slice<const int> (path.begin () + 1, path.size () - 1);
+    }
+  return inserted;
+}
+
+/* Convert the paths of a trie to a vec-of-vec.  */
+vec<vec<int>>
+trie::paths () const
+{
+  vec<int> path {};
+  vec<vec<int>> all {};
+  auto iter = paths (path);
+  while (iter.next ())
+    all.safe_push (path.copy ());
+  return all;
+}
+
+/* Create an iterator over VERTICES with the caller-provided buffer PATH.  */
+trie::iter::iter (vec<int> &path, const vec<trie_node> &vertices) : path (path),
+  vertices (vertices), stack (vec<frame> {})
+{
+  gcc_checking_assert (!vertices.is_empty ());
+  stack.reserve (13);
+  frame f;
+  f.ix = 0;
+  f.itr = 0;
+  f.end = vertices[0].length ();
+  stack.quick_push (f);
+}
+
+/* Create an iterator over VERTICES with the caller-provided buffer PATH for all
+   paths and subpaths (suffixes) starting in FROM.  Note that FROM will not be
+   in the output buffer PATH, mainly because non-rooted paths are used when
+   splicing with paths that end in FROM.  */
+trie::iter::iter (int from, vec<int> &path, const vec<trie_node> &vertices) :
+  path (path), vertices (vertices), stack (vec<frame> {})
+{
+  gcc_checking_assert (!vertices.is_empty ());
+  stack.reserve (13);
+
+  auto *xp = vertices[0].get (from);
+  if (!xp)
+    {
+      /* No paths start with FROM, so construct an iterator where next () always
+	 returns false.  */
+      frame f;
+      f.ix = 0;
+      f.itr = 0;
+      f.end = 0;
+      stack.safe_push (f);
+      return;
+    }
+
+  frame f;
+  f.ix = xp->val;
+  f.itr = 0;
+  f.end = vertices[f.ix].length ();
+  stack.safe_push (f);
+}
+
+/* Find the next complete prime path in the trie and write it to the caller's
+   buffer.  Returns true if a path is written and false if the iterator is
+   exhausted, in which case no path is written and the contents of the buffer
+   are garbage.  */
+bool
+trie::iter::next ()
+{
+  while (true)
+    {
+      frame &top = stack.last ();
+      const trie_node &vertex = vertices[top.ix];
+
+      if (vertex.endofpath && yield
+	  && (top.itr != top.end || vertex.length () == 0))
+	{
+	  yield = false;
+	  return true;
+	}
+
+      yield = true;
+
+      if (top.itr != top.end)
+	{
+	  const xpair succ = vertex.at (top.itr);
+	  const trie_node &next = vertices[succ.val];
+	  top.itr++;
+
+	  if (!next.inserted)
+	    continue;
+
+	  frame f {};
+	  f.ix = succ.val;
+	  f.itr = 0;
+	  f.end = next.length ();
+	  path.safe_push (succ.key);
+	  stack.safe_push (f);
+	}
+      else
+	{
+	  stack.pop ();
+	  if (stack.is_empty ())
+	    return false;
+	  path.pop ();
+	}
+    }
+}
+
+/* Find the next path in the trie that would continue (but does not include)
+   LIMIT.  If the trie contains the paths [2 4 6 8 9] [2 4 6 8 10] and [2 4 5
+   8], iter.next (8) would yield [2 4 6] and [2 4 5].  Returns true if a path is
+   written and false if the iterator is exhausted, in which case no path is
+   written and the contents of the buffer are garbage.  */
+bool
+trie::iter::next (int limit)
+{
+  while (true)
+    {
+      frame &top = stack.last ();
+      const trie_node &vertex = vertices[top.ix];
+
+      if (yield && top.itr != top.end)
+	{
+	  const xpair succ = vertex.at (top.itr);
+	  const trie_node &next = vertices[succ.val];
+	  const int key = succ.key;
+	  const int val = succ.val;
+	  top.itr++;
+
+	  if (!next.inserted)
+	    continue;
+
+	  if (key == limit)
+	    {
+	      if (path.is_empty ())
+		continue;
+	      yield = false;
+	      return true;
+	    }
+
+	  frame f {};
+	  f.ix = val;
+	  f.itr = 0;
+	  f.end = next.length ();
+	  path.safe_push (key);
+	  stack.safe_push (f);
+	}
+      else
+	{
+	  yield = true;
+	  stack.pop ();
+	  if (stack.is_empty ())
+	    return false;
+	  path.pop ();
+	}
+    }
+}
+
+/* Find the next path among all paths, including subpaths/suffixes.  This is
+   mainly useful in combination with trie.paths (from) for finding the paths
+   that go through some vertex.  */
+bool
+trie::iter::next (bool)
+{
+  while (true)
+    {
+      frame &top = stack.last ();
+      const trie_node &vertex = vertices[top.ix];
+
+      if (yield && vertex.length () == 0)
+	{
+	  yield = false;
+	  return true;
+	}
+
+      yield = true;
+
+      if (top.itr != top.end)
+	{
+	  const xpair succ = vertex.at (top.itr);
+	  const trie_node &next = vertices[succ.val];
+	  top.itr++;
+
+	  frame f {};
+	  f.ix = succ.val;
+	  f.itr = 0;
+	  f.end = next.length ();
+	  path.safe_push (succ.key);
+	  stack.safe_push (f);
+	}
+      else
+	{
+	  stack.pop ();
+	  if (stack.is_empty ())
+	    return false;
+	  path.pop ();
+	}
+    }
+}
+
+/* Return the index of NEEDLE in HAYSTACK, or (size_t)-1 if not found.  */
+template <typename T>
+size_t
+index_of (T needle, const vec <T> &haystack)
+{
+  size_t len = haystack.length ();
+  for (size_t i = 0; i != len; ++i)
+    if (haystack[i] == needle)
+      return i;
+  return (size_t)-1;
+}
+
+/* Check if there is an edge in GRAPH from SRC to DEST.  */
+bool
+edge_p (const struct graph *graph, int src, int dest)
+{
+  for (struct graph_edge *e = graph->vertices[src].succ; e; e = e->succ_next)
+    if (e->dest == dest)
+      return true;
+  return false;
+}
+
+/* Check if PATH is a cycle starting (and ending) with V.  */
+bool
+cycle_p (const vec<int>& path, int v)
+{
+  return path[0] == v && path[path.length ()-1] == v;
+}
+
+/* Find the SCC entry-exit paths, the simple paths from ENTRY to EXIT, and add
+   them to OUT.  INTERNAL_PP are the prime paths internal to the SCC.  Paths
+   are hard inserted into OUT, which disables subpath elimination and
+   essentially makes OUT a compact set.  This is important to not eliminate
+   paths from ENTRY to
+   EXIT which are traversed by other ENTRY/EXIT pairs.  Duplicated entries are
+   removed.  */
+void
+scc_entry_exit_paths (const vec<vec<int>> &internal_pp, int entry, int exit,
+		      trie &out)
+{
+  if (entry == exit)
+    {
+      out.hard_insert (array_slice <const int> (&entry, 1));
+      return;
+    }
+
+  for (const vec<int> &path : internal_pp)
+    {
+      const size_t Ven = index_of (entry, path);
+      const size_t Vex = index_of (exit, path);
+
+      if (Ven == (size_t)-1 || Vex == (size_t)-1 || Vex <= Ven)
+	continue;
+
+      const size_t len = (Vex + 1) - Ven;
+      array_slice <const int> p (path.begin () + Ven, len);
+      out.hard_insert (p);
+    }
+}
+
+/* Find the SCC exit paths, the simple paths that start in a non-entry vertex
+   in the SCC, end in EXIT, and are not cycles.  PRIME_PATHS are the internal
+   prime paths for the SCC with EXIT as an exit vertex.
+
+   Fazli claims the path must not be a subpath of another exit path in the SCC,
+   but this is only half true: see gcov-29.c/pathcov005a.  Subpaths must survive
+   if they end in a different exit vertex than the superpath, so the hard_insert
+   is important.  */
+void
+scc_exit_paths (const vec<vec<int>> &prime_paths, int exit, trie &out)
+{
+  trie trie;
+  for (const vec<int> &path : prime_paths)
+    {
+      const size_t Vex = index_of (exit, path);
+      if (Vex == (size_t)-1 || cycle_p (path, exit))
+	continue;
+      array_slice <const int> p (path.begin (), Vex + 1);
+      trie.insert_with_suffix (p);
+    }
+
+  auto_vec<int> path {};
+  auto iter = trie.paths (path);
+  while (iter.next ())
+    out.hard_insert (path);
+}
+
+/* Find the SCC entry paths, the simple paths that start in the entry vertex
+   ENTRY and are not cycles.  INTERNAL_PP are the internal prime paths for the
+   SCC with ENTRY as an entry vertex.  The paths are inserted into TRIE.  */
+void
+scc_entry_paths (const vec<vec<int>> &internal_pp, int entry, trie &trie)
+{
+  for (const vec<int> &path : internal_pp)
+    {
+      const size_t Ven = index_of (entry, path);
+      if (Ven == (size_t)-1 || cycle_p (path, entry))
+	continue;
+      array_slice <const int> p (path.begin () + Ven, path.length () - Ven);
+      trie.insert (p);
+    }
+}
+
+/* Worker for cfg_complete_prime_paths.  ITR points at the next SCC id of the
+   current ccfg path and END is the end of that path; when ITR == END the walk
+   is completed.
+   EDGES is the matrix of edges where EDGES[src][dst] is set if there is an edge
+   from src to dest.  PATH is the vertices that make up this walk so far.  TRIE
+   is the output trie where paths are inserted.  SCC_ENEX_PATHS are the
+   entry-exit paths found by the scc_entry_exit_paths function.  */
+void
+cfg_complete_prime_paths1 (const int *itr, const int *end,
+			   const sbitmap *edges,
+			   const vec<vec<vec<int>>> &scc_enex_paths,
+			   vec<int> &path, trie &trie)
+{
+  if (itr == end)
+    {
+      trie.insert_with_suffix (path);
+      return;
+    }
+
+  const unsigned pathlen = path.length ();
+  const sbitmap succs = edges[path.last ()];
+  for (const vec<int> &enex : scc_enex_paths[*itr])
+    {
+      if (!bitmap_bit_p (succs, enex[0]))
+	continue;
+
+      path.safe_splice (enex);
+      cfg_complete_prime_paths1 (itr + 1, end, edges, scc_enex_paths,
+				 path, trie);
+      path.truncate (pathlen);
+    }
+}
+
+/* Find the complete prime paths of the CFG, the prime paths that start in the
+   entry vertex and end in the exit vertex.  */
+trie
+cfg_complete_prime_paths (const sbitmap *edges,
+			  const vec<trie> &scc_entry_exit_paths,
+			  const trie &ccfg_prime_paths)
+{
+  trie trie;
+  auto_vec<int, 16> path {};
+  auto_vec<int, 16> cfgpp {};
+  auto_vec<vec<vec<int>>, 8> scc_enex (scc_entry_exit_paths.length ());
+
+  for (size_t i = 0; i != scc_entry_exit_paths.length (); ++i)
+    {
+      scc_enex.quick_push (vec<vec<int>> {});
+      auto iter = scc_entry_exit_paths[i].paths (path);
+      while (iter.next ())
+	scc_enex[i].safe_push (path.copy ());
+    }
+  path.truncate (0);
+
+  auto iter = ccfg_prime_paths.paths (cfgpp);
+  while (iter.next ())
+    {
+      for (const vec<int> &enex : scc_enex[cfgpp[0]])
+	{
+	  path.truncate (0);
+	  path.safe_splice (enex);
+	  cfg_complete_prime_paths1 (cfgpp.begin () + 1, cfgpp.end (),
+				     edges, scc_enex, path, trie);
+	}
+    }
+
+  for (vec<vec<int>> &v : scc_enex)
+    release_vec_vec (v);
+  return trie;
+}
+
+/* Find the SCC exit prime paths, the prime paths that start in a strongly
+   connected component and end in the exit vertex.  CFG is the function's CFG
+   with the strongly connected components computed.  SCC_EXIT_PATHS is the
+   output of scc_exit_paths ().  COMPLETE_PRIME_PATHS is the output of
+   cfg_complete_prime_paths ().  */
+trie
+scc_exit_prime_paths (const struct graph *cfg, const trie &scc_exit_paths,
+		      const trie &complete_prime_paths)
+{
+  trie trie;
+  auto_vec<int, 8> path {};
+  auto_vec<int, 8> r {};
+  auto_vec<int, 8> q {};
+
+  auto exiter = scc_exit_paths.paths (q);
+  while (exiter.next ())
+    {
+      const int Vex = q.last ();
+      auto iter = complete_prime_paths.paths (r, Vex);
+      while (iter.next (true))
+	{
+	  /* There could be multiple Vex in the SCC. Even if scc_exit_paths did
+	     not kill the subpaths, this trie probably would. It can be assumed
+	     that all vertices in q are in the same SCC.
+
+	     This is an important note, as the Fazli and Afsharchi paper does
+	     not properly capture this subtlety.  */
+	  const int p0 = Vex;
+	  const int p1 = r[0];
+
+	  if (cfg->vertices[p0].component == cfg->vertices[p1].component)
+	    continue;
+
+	  path.truncate (0);
+	  path.reserve (q.length () + r.length ());
+	  path.splice (q);
+	  path.splice (r);
+	  /* This can probably insert without subpath elimination because:
+	     1. Conflicts are *really* rare (see patmatch in tree.c), but they
+		do happen.
+	     2. The output of this function is "filtered" through another trie
+		anyway so the redundant paths generated here will be eliminated
+		in the consumers at a very low extra cost.  */
+	  trie.insert (path);
+	}
+    }
+
+  return trie;
+}
+
+/* Check if PATH in CFG enters the VERTEX's SCC through VERTEX.  */
+bool
+enters_through_p (const struct graph *cfg, const vec<int> &path, int vertex)
+{
+  gcc_checking_assert (!path.is_empty ());
+  const int last = path[path.length () - 1];
+  if (cfg->vertices[last].component == cfg->vertices[vertex].component)
+    return false;
+  return edge_p (cfg, last, vertex);
+}
+
+/* Worker for scc_entry_prime_paths.  CFG is the CFG for the function,
+   SCC_ENTRY_PATHS the accumulated scc_entry_paths for all the SCCs, PRIME_PATHS
+   is either the result of cfg_complete_prime_paths or exit_prime_paths, and OUT
+   the output trie.  */
+void
+scc_entry_prime_paths1 (const struct graph *cfg, const trie &scc_entry_paths,
+			const trie &prime_paths, trie &out)
+{
+  auto_vec<int, 8> p {};
+  auto_vec<int, 8> q {};
+  auto_vec<int, 8> path {};
+  auto itr = scc_entry_paths.paths (q);
+  while (itr.next ())
+    {
+      const int Ven = q[0];
+      /* TODO: This might benefit from a reversed trie lookup.  */
+      auto xitr = prime_paths.paths (p);
+      while (xitr.next (Ven))
+	{
+	  if (!enters_through_p (cfg, p, Ven))
+	    continue;
+
+	  path.truncate (0);
+	  path.reserve (p.length () + q.length ());
+	  path.splice (p);
+	  path.splice (q);
+	  out.insert_with_suffix (path);
+	}
+    }
+}
+
+/* Find the entry prime paths - the prime paths that start in the root and end
+   in a strongly connected component.  CFG is the CFG for this function,
+   SCC_ENTRY_PATHS the accumulated scc_entry_paths for all the SCCs,
+   COMPLETE_PRIME_PATHS the result of cfg_complete_prime_paths, and
+   EXIT_PRIME_PATHS the result of scc_exit_prime_paths.  */
+trie
+scc_entry_prime_paths (const struct graph *cfg,
+		       const trie &scc_entry_paths,
+		       const trie &complete_prime_paths,
+		       const trie &exit_prime_paths)
+{
+  trie trie;
+  scc_entry_prime_paths1 (cfg, scc_entry_paths, complete_prime_paths, trie);
+  scc_entry_prime_paths1 (cfg, scc_entry_paths, exit_prime_paths, trie);
+  return trie;
+}
+
+/* Build a new control flow graph from the strongly connected components, so
+   that every node in the CCFG is a strongly connected component in the original
+   CFG.  NSCC is the number of vertices in the new graph, and the return value
+   of graphds_scc.  */
+struct graph*
+build_ccfg (struct graph *cfg, int nscc)
+{
+  struct graph *ccfg = new_graph (nscc);
+  for (int i = 0; i != cfg->n_vertices; ++i)
+    {
+      struct vertex v = cfg->vertices[i];
+      for (struct graph_edge *e = v.succ; e; e = e->succ_next)
+	{
+	  int src = v.component;
+	  int dest = cfg->vertices[e->dest].component;
+	  if (src != dest && !edge_p (ccfg, src, dest))
+	    add_edge (ccfg, src, dest);
+	}
+    }
+
+  return ccfg;
+}
+
+/* Create a new graph from CFG where the edges between strongly connected
+   components are removed.  */
+struct graph*
+disconnect_sccs (struct graph *cfg)
+{
+  struct graph *ccfg = new_graph (cfg->n_vertices);
+  const struct vertex *vertices = cfg->vertices;
+  for (int i = 0; i != cfg->n_vertices; ++i)
+    {
+      ccfg->vertices[i].data = &cfg->vertices[i];
+      for (struct graph_edge *e = vertices[i].succ; e; e = e->succ_next)
+	if (vertices[e->src].component == vertices[e->dest].component)
+	  add_edge (ccfg, e->src, e->dest)->data = e;
+    }
+  return ccfg;
+}
+
+/* Check if vertex I in CFG is the entry vertex of a strongly connected
+   component.  A vertex is an entry vertex if 1) there are no predecessors
+   (i.e. the root vertex is always an entry vertex) or 2) a predecessor belongs
+   to a different SCC.  */
+bool
+scc_entry_vertex_p (struct graph *cfg, size_t i)
+{
+  if (!cfg->vertices[i].pred)
+    return true;
+  const int scc = cfg->vertices[i].component;
+  for (struct graph_edge *e = cfg->vertices[i].pred; e; e = e->pred_next)
+    if (cfg->vertices[e->src].component != scc)
+      return true;
+  return false;
+}
+
+/* Check if vertex I in CFG is an exit vertex of a strongly connected component.
+   A vertex is an exit vertex if 1) there are no successors (i.e. the sink is
+   always an exit vertex) or 2) if a successor belongs to a different SCC.  */
+bool
+scc_exit_vertex_p (struct graph *cfg, size_t i)
+{
+  if (!cfg->vertices[i].succ)
+    return true;
+  const int scc = cfg->vertices[i].component;
+  for (struct graph_edge *e = cfg->vertices[i].succ; e; e = e->succ_next)
+    if (cfg->vertices[e->dest].component != scc)
+      return true;
+  return false;
+}
+
+/* Worker for simple_paths.  Find all the simple paths in CFG starting at NODE
+   and insert into OUT.  This is a DFS where the search stops when entering a
+   vertex already in SEEN.  PATH is the sequence of ids for the vertices taken
+   from the root to NODE.  */
+void
+simple_paths1 (const struct graph *cfg, int node, sbitmap seen, vec<int> &path,
+	       trie &out)
+{
+  if (!bitmap_set_bit (seen, node))
+    {
+      if (path[0] == node)
+	path.quick_push (node);
+      out.insert (path);
+      if (path[0] == node)
+	path.pop ();
+      return;
+    }
+  path.quick_push (node);
+
+  struct graph_edge *succs = cfg->vertices[node].succ;
+  if (!succs)
+    {
+      out.insert (path);
+      bitmap_clear_bit (seen, node);
+      path.pop ();
+      return;
+    }
+
+  for (struct graph_edge *e = succs; e; e = e->succ_next)
+    simple_paths1 (cfg, e->dest, seen, path, out);
+
+  bitmap_clear_bit (seen, node);
+  path.pop ();
+}
+
+/* Find all the simple paths in CFG starting at ROOT and insert into OUT.  A
+   simple path is a sequence of vertices without any duplicated vertices (i.e.
+   no loops).  SEEN should be an sbitmap of CFG->n_vertices size.  PATH and SEEN
+   will be cleared on entry and are reused between calls.  */
+void
+simple_paths (struct graph *cfg, int root, sbitmap seen, vec<int> &path,
+	      trie &out)
+{
+  bitmap_clear (seen);
+  path.reserve (cfg->n_vertices);
+  path.truncate (0);
+  simple_paths1 (cfg, root, seen, path, out);
+}
+
+/* Merge the tries T1, T2, T3, and the set of paths VECS into the largest of
+   the tries.  Returns a reference to the trie merged into.  Merging tries and
+   resolving
+   redundant paths is the slowest step (at least in the sense it works on the
+   largest input), and merging into a partial result reduces the work
+   accordingly.  For large problems this is a massive improvement, which in the
+   worst cases (where all tries but one are empty or almost empty) speeds
+   things up by 30-40%.  */
+trie&
+merge (trie &t1, trie &t2, trie &t3, vec<vec<vec<int>>>& vecs)
+{
+  trie *dst = nullptr;
+  const size_t s1 = t1.size ();
+  const size_t s2 = t2.size ();
+  const size_t s3 = t3.size ();
+
+  if (s1 >= s2 && s1 >= s3)
+    {
+      dst = &t1;
+      t1.merge (t2);
+      t1.merge (t3);
+    }
+  else if (s2 >= s1 && s2 >= s3)
+    {
+      dst = &t2;
+      t2.merge (t1);
+      t2.merge (t3);
+    }
+  else
+    {
+      dst = &t3;
+      t3.merge (t1);
+      t3.merge (t2);
+    }
+
+  gcc_checking_assert (dst);
+  for (const vec<vec<int>> &v2 : vecs)
+    for (const vec<int> &v1 : v2)
+      dst->insert_with_suffix (v1);
+  return *dst;
+}
+
+/* Store the edges of CFG in a matrix of bitmaps so that
+   bitmap_bit_p (edges[src], dest) is true if there is an edge from src to
+   dest.  This is faster and more
+   convenient than walking the linked list of successors in hot loops.  The
+   vector will have N bitmaps of N bits where N is the number of vertices in
+   CFG.  */
+sbitmap*
+edge_matrix (const struct graph *cfg)
+{
+  sbitmap *edges = sbitmap_vector_alloc (cfg->n_vertices, cfg->n_vertices);
+  bitmap_vector_clear (edges, cfg->n_vertices);
+  for (int i = 0; i != cfg->n_vertices; ++i)
+    for (graph_edge *e = cfg->vertices[i].succ; e; e = e->succ_next)
+      bitmap_set_bit (edges[e->src], e->dest);
+  return edges;
+}
+
+} // namespace
+
+/* Find the prime paths of CFG using the compositional method of Fazli &
+   Afsharchi.  Gives up and returns an empty vector when the (approximate)
+   number of paths exceeds PATHLIMIT.  */
+vec<vec<int>>
+prime_paths (struct graph *cfg, size_t pathlimit)
+{
+    const int nscc = graphds_scc (cfg, NULL);
+    auto_graph disconnected = disconnect_sccs (cfg);
+    auto_graph ccfg = build_ccfg (cfg, nscc);
+    auto_sbitmap_vector edges (edge_matrix (cfg));
+
+    auto_sbitmap seen (cfg->n_vertices);
+    auto_vec<int, 8> pathbuf {};
+
+    size_t approxpaths = 0;
+
+    /* Store an SCC-ID -> vertices mapping to quickly find the vertices that
+       make up a strongly connected component.  */
+    auto_vec_vec sccs {};
+    sccs.safe_grow_cleared (ccfg->n_vertices);
+    for (int i = 0; i != cfg->n_vertices; ++i)
+      sccs[cfg->vertices[i].component].safe_push (i);
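+
+    /* For illustration (see the selftests): for the binary_search graph the
+       non-trivial SCC {2 4 5 6 7 9} becomes one component and every other
+       vertex is a singleton component.  */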
+
+    auto_vec_vec_vec scc_internal_pp {};
+    scc_internal_pp.safe_grow_cleared (nscc);
+    for (int i = 0; i != nscc; ++i)
+      {
+	trie internal_pp;
+	for (int V : sccs[i])
+	  simple_paths (disconnected, V, seen, pathbuf, internal_pp);
+	scc_internal_pp[i] = internal_pp.paths ();
+	approxpaths += scc_internal_pp[i].length ();
+	if (approxpaths > pathlimit)
+	  return {};
+      }
+
+    auto_vec<trie, 8> scc_enex_paths (nscc);
+    scc_enex_paths.safe_grow_cleared (nscc);
+    trie scc_en_paths;
+    trie scc_ex_paths;
+
+    for (int i = 0; i != ccfg->n_vertices; ++i)
+      {
+	for (int Ven : sccs[i])
+	  {
+	    if (!scc_entry_vertex_p (cfg, Ven))
+	      continue;
+
+	    for (int Vex : sccs[i])
+	      {
+		if (!scc_exit_vertex_p (cfg, Vex))
+		  continue;
+		scc_entry_exit_paths (scc_internal_pp[i], Ven, Vex,
+				      scc_enex_paths[i]);
+	      }
+	  }
+      }
+
+    for (int i = 0; i != cfg->n_vertices; ++i)
+      {
+	const int scc = cfg->vertices[i].component;
+	if (scc_entry_vertex_p (cfg, i))
+	  scc_entry_paths (scc_internal_pp[scc], i, scc_en_paths);
+
+	if (scc_exit_vertex_p (cfg, i))
+	  scc_exit_paths (scc_internal_pp[scc], i, scc_ex_paths);
+      }
+
+    /* In the presence of abnormal edges (like longjmp) it is possible to have
+       multiple "entry points" in function -- build ccfg prime paths starting at
+       any vertex without predecessor.  For most graphs this will only be the
+       ENTRY_BLOCK.  */
+    trie ccfg_prime_paths;
+    for (int i = 0; i != ccfg->n_vertices; ++i)
+      if (!ccfg->vertices[i].pred)
+	simple_paths (ccfg, i, seen, pathbuf, ccfg_prime_paths);
+
+    trie complete_prime_paths = cfg_complete_prime_paths (edges, scc_enex_paths,
+							  ccfg_prime_paths);
+    approxpaths += complete_prime_paths.size ();
+    if (approxpaths > pathlimit)
+      return {};
+    trie exit_prime_paths = scc_exit_prime_paths (cfg, scc_ex_paths,
+						  complete_prime_paths);
+    approxpaths += exit_prime_paths.size ();
+    if (approxpaths > pathlimit)
+      return {};
+    trie entry_prime_paths = scc_entry_prime_paths (cfg, scc_en_paths,
+						    complete_prime_paths,
+						    exit_prime_paths);
+    approxpaths += entry_prime_paths.size ();
+    if (approxpaths > pathlimit)
+      return {};
+
+    trie &merged = merge (complete_prime_paths, entry_prime_paths,
+			  exit_prime_paths, scc_internal_pp);
+    if (merged.size () > pathlimit)
+      return {};
+
+    return merged.paths ();
+}
+
+#if CHECKING_P
+
+namespace selftest
+{
+
+/* Check if the trie contains PATH.  */
+static bool
+contains (const trie &trie, array_slice<const int> path)
+{
+  size_t index = 0;
+  for (int id : path)
+    {
+      const trie_node &current = trie.vertices[index];
+      if (!current.inserted)
+	return false;
+      const auto *xp = current.get (id);
+      if (!xp)
+	return false;
+      index = xp->val;
+    }
+  return trie.vertices[index].inserted && trie.vertices[index].endofpath;
+}
+
+static bool
+equal_p (array_slice<const int> lhs, array_slice<const int> rhs)
+{
+  if (lhs.size () != rhs.size ())
+    return false;
+
+  size_t length = lhs.size();
+  for (size_t i = 0; i != length; ++i)
+    if (lhs[i] != rhs[i])
+      return false;
+  return true;
+}
+
+static bool
+any_equal_p (const array_slice<const int> &needle,
+	     const vec<vec<int>> &haystack)
+{
+  for (const vec<int> &x : haystack)
+    if (equal_p (needle, array_slice <const int> (x)))
+      return true;
+  return false;
+}
+
+static size_t
+count (const trie &trie)
+{
+  size_t n = 0;
+  auto_vec<int> path {};
+  auto iter = trie.paths (path);
+  while (iter.next ())
+    n += 1;
+  return n;
+}
+
+static vec<vec<int>>
+simple_paths (struct graph *cfg, trie &trie, int root = 0)
+{
+  auto_sbitmap seen (cfg->n_vertices);
+  auto_vec<int> path;
+  simple_paths (cfg, root, seen, path, trie);
+  return trie.paths ();
+}
+
+/* Create a CFG that roughly corresponds to this program:
+
+int binary_search(int a[], int len, int from, int to, int key)
+{
+    int low = from;
+    int high = to - 1;
+
+    while (low <= high)
+    {
+	int mid = (low + high) >> 1;
+	long midVal = a[mid];
+
+	if (midVal < key)
+	    low = mid + 1;
+	else if (midVal > key)
+	    high = mid - 1;
+	else
+	    return mid; // key found
+    }
+    return -1;
+}
+
+*/
+static struct graph*
+binary_search_cfg ()
+{
+    struct graph *g = new_graph (11);
+    add_edge (g, 0, 1);
+    add_edge (g, 1, 2);
+    add_edge (g, 2, 3);
+    add_edge (g, 2, 4);
+    add_edge (g, 3, 10);
+    add_edge (g, 4, 5);
+    add_edge (g, 4, 6);
+    add_edge (g, 5, 7);
+    add_edge (g, 6, 8);
+    add_edge (g, 6, 9);
+    add_edge (g, 7, 2);
+    add_edge (g, 8, 10);
+    add_edge (g, 9, 7);
+    graphds_scc (g, NULL);
+    return g;
+}
+
+/* Test a full run of the algorithm against a known graph (binary-search).  */
+static void
+test_prime_paths ()
+{
+    auto_graph g = binary_search_cfg ();
+    vec<vec<int>> paths = prime_paths (g, 100);
+    const int p01[] = { 0, 1, 2, 3, 10 };
+    const int p02[] = { 0, 1, 2, 4, 6, 8, 10 };
+    const int p03[] = { 5, 7, 2, 4, 6, 9 };
+    const int p04[] = { 4, 6, 9, 7, 2, 4 };
+    const int p05[] = { 2, 4, 6, 9, 7, 2 };
+    const int p06[] = { 6, 9, 7, 2, 4, 6 };
+    const int p07[] = { 9, 7, 2, 4, 6, 9 };
+    const int p08[] = { 7, 2, 4, 6, 9, 7 };
+    const int p09[] = { 6, 9, 7, 2, 4, 5 };
+    const int p10[] = { 4, 5, 7, 2, 4 };
+    const int p11[] = { 2, 4, 5, 7, 2 };
+    const int p12[] = { 5, 7, 2, 4, 5 };
+    const int p13[] = { 7, 2, 4, 5, 7 };
+    const int p14[] = { 4, 6, 9, 7, 2, 3, 10 };
+    const int p15[] = { 5, 7, 2, 4, 6, 8, 10 };
+    const int p16[] = { 9, 7, 2, 4, 6, 8, 10 };
+    const int p17[] = { 4, 5, 7, 2, 3, 10 };
+    const int p18[] = { 0, 1, 2, 4, 6, 9, 7 };
+    const int p19[] = { 0, 1, 2, 4, 5, 7 };
+
+    ASSERT_EQ (paths.length (), 19);
+    ASSERT_TRUE (any_equal_p (p01, paths));
+    ASSERT_TRUE (any_equal_p (p02, paths));
+    ASSERT_TRUE (any_equal_p (p03, paths));
+    ASSERT_TRUE (any_equal_p (p04, paths));
+    ASSERT_TRUE (any_equal_p (p05, paths));
+    ASSERT_TRUE (any_equal_p (p06, paths));
+    ASSERT_TRUE (any_equal_p (p07, paths));
+    ASSERT_TRUE (any_equal_p (p08, paths));
+    ASSERT_TRUE (any_equal_p (p09, paths));
+    ASSERT_TRUE (any_equal_p (p10, paths));
+    ASSERT_TRUE (any_equal_p (p11, paths));
+    ASSERT_TRUE (any_equal_p (p12, paths));
+    ASSERT_TRUE (any_equal_p (p13, paths));
+    ASSERT_TRUE (any_equal_p (p14, paths));
+    ASSERT_TRUE (any_equal_p (p15, paths));
+    ASSERT_TRUE (any_equal_p (p16, paths));
+    ASSERT_TRUE (any_equal_p (p17, paths));
+    ASSERT_TRUE (any_equal_p (p18, paths));
+    ASSERT_TRUE (any_equal_p (p19, paths));
+    release_vec_vec (paths);
+}
+
+/* The strongly connected component graph for binary_search looks like
+    this, using the vertex numbers from the original graph:
+
+    START
+      |
+      1
+      |
+      2 (SCC)
+     / \
+    3   8
+     \ /
+     END
+
+  The components get renumbered by graphds_scc, so the ccfg looks like
+  this.  The actual numbers don't matter as long as the structure of the
+  graph is preserved, and this test is now sensitive to the numbering set
+  by graphds_scc.  It does not have to be - if that function should reverse
+  the numbering this test must be updated.
+
+      5
+      |
+      4
+      |
+      3 (SCC)
+     / \
+    2   1
+     \ /
+      0
+*/
+static void
+test_build_ccfg ()
+{
+  auto_graph cfg = binary_search_cfg ();
+  const int nscc = graphds_scc (cfg, NULL);
+  auto_graph ccfg = build_ccfg (cfg, nscc);
+  ASSERT_EQ (6, nscc);
+
+  ASSERT_TRUE (edge_p (ccfg, 5, 4));
+  ASSERT_TRUE (edge_p (ccfg, 4, 3));
+  ASSERT_TRUE (edge_p (ccfg, 3, 2));
+  ASSERT_TRUE (edge_p (ccfg, 3, 1));
+  ASSERT_TRUE (edge_p (ccfg, 2, 0));
+  ASSERT_TRUE (edge_p (ccfg, 1, 0));
+}
+
+/* This test checks some basic assumptions on finding the strongly connected
+   components and disconnecting the graph by removing all edges between SCCs.
+   Creating a single auxiliary graph simplifies the bookkeeping.  */
+static void
+test_split_components ()
+{
+  auto_graph cfg = binary_search_cfg ();
+  int nscc = graphds_scc (cfg, NULL);
+  auto_graph ccfg = disconnect_sccs (cfg);
+
+  vec<vec<int>> entries {};
+  vec<vec<int>> exits {};
+  entries.safe_grow_cleared (nscc);
+  exits.safe_grow_cleared (nscc);
+
+  for (int i = 0; i != cfg->n_vertices; ++i)
+    {
+      if (scc_entry_vertex_p (cfg, i))
+	entries[cfg->vertices[i].component].safe_push (i);
+      if (scc_exit_vertex_p (cfg, i))
+	exits[cfg->vertices[i].component].safe_push (i);
+    }
+
+  const int p10[] = { 10 };
+  const int p08[] = { 8 };
+  const int p03[] = { 3 };
+  const int p02[] = { 2 };
+  const int p01[] = { 1 };
+  const int p00[] = { 0 };
+  const int p26[] = { 2, 6 };
+
+  ASSERT_EQ (entries.length (), 6);
+  ASSERT_TRUE (any_equal_p (p10, entries));
+  ASSERT_TRUE (any_equal_p (p08, entries));
+  ASSERT_TRUE (any_equal_p (p03, entries));
+  ASSERT_TRUE (any_equal_p (p02, entries));
+  ASSERT_TRUE (any_equal_p (p01, entries));
+  ASSERT_TRUE (any_equal_p (p00, entries));
+
+  ASSERT_EQ (exits.length (), 6);
+  ASSERT_TRUE (any_equal_p (p10, exits));
+  ASSERT_TRUE (any_equal_p (p08, exits));
+  ASSERT_TRUE (any_equal_p (p03, exits));
+  ASSERT_TRUE (any_equal_p (p26, exits));
+  ASSERT_TRUE (any_equal_p (p01, exits));
+  ASSERT_TRUE (any_equal_p (p00, exits));
+
+  /* disconnect_sccs () splits the graph into its strongly
+     connected components.  The subgraphs are either singletons (a single
+     vertex with no edges) or graphs with cycles.  The SCC internal prime
+     paths can be found by running a DFS from every SCC vertex, terminating
+     on a duplicated vertex.  This may create some redundant paths still,
+     which must be filtered out.
+
+     Singletons can either be detected and skipped (requires counting the
+     components) or filtered after.  For this test case they are skipped
+     because other graph inconsistencies are easier to detect.  */
+
+  /* Count and check singleton components.  */
+  vec<int> scc_size {};
+  scc_size.safe_grow_cleared (nscc);
+  for (int i = 0; i != cfg->n_vertices; ++i)
+    scc_size[cfg->vertices[i].component]++;
+  ASSERT_EQ (nscc, 6);
+  ASSERT_EQ (scc_size[0], 1);
+  ASSERT_EQ (scc_size[1], 1);
+  ASSERT_EQ (scc_size[2], 1);
+  ASSERT_EQ (scc_size[3], 6);
+  ASSERT_EQ (scc_size[4], 1);
+  ASSERT_EQ (scc_size[5], 1);
+
+  /* Manually unroll the loop finding the simple paths starting at the
+     vertices in the SCCs.  In this case there is only the one SCC.  */
+  trie ccfg_paths;
+  simple_paths (ccfg, ccfg_paths, 2);
+  simple_paths (ccfg, ccfg_paths, 4);
+  simple_paths (ccfg, ccfg_paths, 5);
+  simple_paths (ccfg, ccfg_paths, 6);
+  simple_paths (ccfg, ccfg_paths, 7);
+  simple_paths (ccfg, ccfg_paths, 9);
+  /* Round-trip through a second trie to filter out redundant sub-paths.  */
+  vec<vec<int>> xscc_internal_pp = ccfg_paths.paths ();
+  trie scc_internal_pp;
+  for (auto &p : xscc_internal_pp)
+    scc_internal_pp.insert_with_suffix (p);
+
+  const int pp01[] = { 5, 7, 2, 4, 6, 9 };
+  const int pp02[] = { 4, 5, 7, 2, 4 };
+  const int pp03[] = { 4, 6, 9, 7, 2, 4 };
+  const int pp04[] = { 2, 4, 5, 7, 2 };
+  const int pp05[] = { 2, 4, 6, 9, 7, 2 };
+  const int pp06[] = { 5, 7, 2, 4, 5 };
+  const int pp07[] = { 6, 9, 7, 2, 4, 6 };
+  const int pp08[] = { 7, 2, 4, 5, 7 };
+  const int pp09[] = { 9, 7, 2, 4, 6, 9 };
+  const int pp10[] = { 7, 2, 4, 6, 9, 7 };
+  const int pp11[] = { 6, 9, 7, 2, 4, 5 };
+
+  ASSERT_EQ (count (scc_internal_pp), 11);
+  ASSERT_TRUE (contains (scc_internal_pp, pp01));
+  ASSERT_TRUE (contains (scc_internal_pp, pp02));
+  ASSERT_TRUE (contains (scc_internal_pp, pp03));
+  ASSERT_TRUE (contains (scc_internal_pp, pp04));
+  ASSERT_TRUE (contains (scc_internal_pp, pp05));
+  ASSERT_TRUE (contains (scc_internal_pp, pp06));
+  ASSERT_TRUE (contains (scc_internal_pp, pp07));
+  ASSERT_TRUE (contains (scc_internal_pp, pp08));
+  ASSERT_TRUE (contains (scc_internal_pp, pp09));
+  ASSERT_TRUE (contains (scc_internal_pp, pp10));
+  ASSERT_TRUE (contains (scc_internal_pp, pp11));
+}
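
The DFS described in the comment above can be sketched standalone.  This is
only an illustration, not the patch's simple_paths (): it uses plain
std::vector adjacency lists instead of GCC's struct graph and trie.  A branch
closes when the next vertex equals the first vertex on the path (a cycle) and
is abandoned when it would repeat an interior vertex; paths that cannot be
extended are recorded as-is.

  #include <algorithm>
  #include <vector>

  /* Collect candidate prime paths that start at path.back ().  */
  static void
  simple_paths_sketch (const std::vector<std::vector<int>> &succ,
                       std::vector<int> &path,
                       std::vector<std::vector<int>> &out)
  {
    const int last = path.back ();
    bool extended = false;
    for (int next : succ[last])
      {
        if (next == path.front ())
          {
            /* Closing the cycle back to the start vertex.  */
            path.push_back (next);
            out.push_back (path);
            path.pop_back ();
            extended = true;
          }
        else if (std::find (path.begin (), path.end (), next) != path.end ())
          continue; /* Would repeat an interior vertex.  */
        else
          {
            path.push_back (next);
            simple_paths_sketch (succ, path, out);
            path.pop_back ();
            extended = true;
          }
      }
    if (!extended)
      out.push_back (path); /* Maximal simple path, no cycle closed.  */
  }

Running this once for each vertex of the loop SCC above (2, 4, 5, 6, 7, 9) and
then discarding paths contained in other paths should yield the eleven paths
asserted in this test.
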
+
+/* The remaining tests deconstruct the algorithm and run only a single phase
+   with known-good inputs for that phase.  This makes it easier to pinpoint
+   where things go wrong, and helps show step by step how the algorithm works
+   and how the phases relate.
+
+   The phases and their inputs and outputs are based on Fazli & Afsharchi.  */
+
+static void
+test_scc_internal_prime_paths ()
+{
+  /* This graph has only the SCC subgraph.  The result of running prime-paths
+     on it should be the SCC internal prime paths of the full graph.  */
+  auto_graph scc = new_graph (11);
+  add_edge (scc, 0, 2);
+  add_edge (scc, 2, 4);
+  add_edge (scc, 4, 5);
+  add_edge (scc, 4, 6);
+  add_edge (scc, 5, 7);
+  add_edge (scc, 6, 9);
+  add_edge (scc, 9, 7);
+  add_edge (scc, 7, 2);
+
+  vec<vec<int>> paths = prime_paths (scc, 100);
+  const int p01[] = { 5, 7, 2, 4, 6, 9 };
+  const int p02[] = { 4, 6, 9, 7, 2, 4 };
+  const int p03[] = { 2, 4, 6, 9, 7, 2 };
+  const int p04[] = { 6, 9, 7, 2, 4, 6 };
+  const int p05[] = { 9, 7, 2, 4, 6, 9 };
+  const int p06[] = { 7, 2, 4, 6, 9, 7 };
+  const int p07[] = { 6, 9, 7, 2, 4, 5 };
+  const int p08[] = { 4, 5, 7, 2, 4 };
+  const int p09[] = { 2, 4, 5, 7, 2 };
+  const int p10[] = { 5, 7, 2, 4, 5 };
+  const int p11[] = { 7, 2, 4, 5, 7 };
+
+  ASSERT_TRUE (any_equal_p (p01, paths));
+  ASSERT_TRUE (any_equal_p (p02, paths));
+  ASSERT_TRUE (any_equal_p (p03, paths));
+  ASSERT_TRUE (any_equal_p (p04, paths));
+  ASSERT_TRUE (any_equal_p (p05, paths));
+  ASSERT_TRUE (any_equal_p (p06, paths));
+  ASSERT_TRUE (any_equal_p (p07, paths));
+  ASSERT_TRUE (any_equal_p (p08, paths));
+  ASSERT_TRUE (any_equal_p (p09, paths));
+  ASSERT_TRUE (any_equal_p (p10, paths));
+  ASSERT_TRUE (any_equal_p (p11, paths));
+  release_vec_vec (paths);
+}
+
+/* Test the entry/exit path helpers for the strongly connected component in
+   binary_search.  The SCC has one entry (2, the loop header) and two exits (2,
+   6, the loop exit and return).  */
+static void
+test_scc_entry_exit_paths ()
+{
+  auto_graph scc = new_graph (11);
+  add_edge (scc, 2, 4);
+  add_edge (scc, 4, 5);
+  add_edge (scc, 4, 6);
+  add_edge (scc, 5, 7);
+  add_edge (scc, 6, 9);
+  add_edge (scc, 9, 7);
+  add_edge (scc, 7, 2);
+
+  trie scc_internal_trie;
+  simple_paths (scc, scc_internal_trie, 2);
+  simple_paths (scc, scc_internal_trie, 4);
+  simple_paths (scc, scc_internal_trie, 5);
+  simple_paths (scc, scc_internal_trie, 6);
+  simple_paths (scc, scc_internal_trie, 7);
+  simple_paths (scc, scc_internal_trie, 9);
+  vec<vec<int>> scc_prime_paths = scc_internal_trie.paths ();
+
+  trie entry_exits {};
+  scc_entry_exit_paths (scc_prime_paths, 2, 2, entry_exits);
+  scc_entry_exit_paths (scc_prime_paths, 2, 6, entry_exits);
+
+  const int p01[] = { 2 };
+  const int p02[] = { 2, 4, 6 };
+
+  ASSERT_EQ (count (entry_exits), 2);
+  ASSERT_TRUE (contains (entry_exits, p01));
+  ASSERT_TRUE (contains (entry_exits, p02));
+
+  trie exits;
+  scc_exit_paths (scc_prime_paths, 2, exits);
+  scc_exit_paths (scc_prime_paths, 6, exits);
+
+  const int p03[] = { 4, 6, 9, 7, 2 };
+  const int p04[] = { 5, 7, 2, 4, 6 };
+  const int p05[] = { 9, 7, 2, 4, 6 };
+  const int p06[] = { 4, 5, 7, 2 };
+
+  ASSERT_EQ (count (exits), 4);
+  ASSERT_TRUE (contains (exits, p03));
+  ASSERT_TRUE (contains (exits, p04));
+  ASSERT_TRUE (contains (exits, p05));
+  ASSERT_TRUE (contains (exits, p06));
+
+  trie entries;
+  scc_entry_paths (scc_prime_paths, 2, entries);
+
+  const int p07[] = { 2, 4, 6, 9, 7 };
+  const int p08[] = { 2, 4, 5, 7 };
+  ASSERT_EQ (count (entries), 2);
+  ASSERT_TRUE (contains (entries, p07));
+  ASSERT_TRUE (contains (entries, p08));
+
+  release_vec_vec (scc_prime_paths);
+}
+
+static void
+test_complete_prime_paths ()
+{
+  const int ccfgpp0[] = { 0, 1, 2, 3, 5 };
+  const int ccfgpp1[] = { 0, 1, 2, 4, 5 };
+  trie ccfg_prime_paths {};
+  ccfg_prime_paths.insert (ccfgpp0);
+  ccfg_prime_paths.insert (ccfgpp1);
+
+  const int scc0[] = { 2 };
+  const int scc1[] = { 2, 4, 6 };
+
+  const int ccfg_single[] = { 0, 1, 3, 8, 10 };
+
+  auto_graph cfg = binary_search_cfg ();
+  auto_sbitmap_vector edges = sbitmap_vector_alloc (cfg->n_vertices,
+						    cfg->n_vertices);
+  bitmap_vector_clear (edges, cfg->n_vertices);
+  for (int i = 0; i != cfg->n_vertices; ++i)
+    for (graph_edge *e = cfg->vertices[i].succ; e; e = e->succ_next)
+      bitmap_set_bit (edges[e->src], e->dest);
+
+  vec<trie> ccfg_paths {};
+  ccfg_paths.safe_grow_cleared (6);
+  ccfg_paths[0].insert (array_slice <const int> (ccfg_single + 0, 1));
+  ccfg_paths[1].insert (array_slice <const int> (ccfg_single + 1, 1));
+  ccfg_paths[3].insert (array_slice <const int> (ccfg_single + 2, 1));
+  ccfg_paths[4].insert (array_slice <const int> (ccfg_single + 3, 1));
+  ccfg_paths[5].insert (array_slice <const int> (ccfg_single + 4, 1));
+
+  /* The paths for the SCC would need to be updated in a ccfg pre-pass.  This
+     feels like a clumsy interface and should maybe be decoupled from the
+     struct graph.  */
+  ccfg_paths[2].hard_insert (scc0);
+  ccfg_paths[2].hard_insert (scc1);
+
+  trie cpp = cfg_complete_prime_paths (edges, ccfg_paths,
+  				       ccfg_prime_paths);
+
+  const int p01[] = { 0, 1, 2, 3, 10 };
+  const int p02[] = { 0, 1, 2, 4, 6, 8, 10 };
+
+  ASSERT_EQ (count (cpp), 2);
+  ASSERT_TRUE (contains (cpp, p01));
+  ASSERT_TRUE (contains (cpp, p02));
+}
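
As a rough standalone illustration of the composition step exercised above
(inferred from the test's inputs and expected outputs; the patch's
cfg_complete_prime_paths works on GCC's sbitmap and trie types and may differ
in detail), the sketch below splices per-SCC candidate sub-paths into a path
over SCC ids, keeping a combination only when consecutive pieces are connected
by a CFG edge.

  #include <vector>

  /* EDGE is a CFG adjacency matrix, SCC_PATHS maps an SCC id to its candidate
     sub-paths (sequences of CFG vertices), and CCFG_PATH is a path over SCC
     ids.  Emit every splice whose consecutive pieces are connected.  */
  static void
  splice (const std::vector<std::vector<bool>> &edge,
          const std::vector<std::vector<std::vector<int>>> &scc_paths,
          const std::vector<int> &ccfg_path, size_t i,
          std::vector<int> &acc, std::vector<std::vector<int>> &out)
  {
    if (i == ccfg_path.size ())
      {
        out.push_back (acc);
        return;
      }
    for (const std::vector<int> &piece : scc_paths[ccfg_path[i]])
      {
        if (!acc.empty () && !edge[acc.back ()][piece.front ()])
          continue; /* The pieces are not connected in the CFG.  */
        const size_t n = acc.size ();
        acc.insert (acc.end (), piece.begin (), piece.end ());
        splice (edge, scc_paths, ccfg_path, i + 1, acc, out);
        acc.resize (n);
      }
  }

Assuming the binary_search CFG also has the edges 0 -> 1, 1 -> 2, 2 -> 3,
6 -> 8, 3 -> 10 and 8 -> 10 (reconstructed from the expected paths, not taken
from the patch), splicing { 2 } and { 2, 4, 6 } into the two CCFG paths
reproduces exactly { 0, 1, 2, 3, 10 } and { 0, 1, 2, 4, 6, 8, 10 }.
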
+
+static vec<int>
+binary_search_scc_map ()
+{
+  vec<int> sccs {};
+  sccs.safe_grow (11);
+  sccs[0] = 0;
+  sccs[1] = 1;
+  sccs[2] = 2;
+  sccs[3] = 3;
+  sccs[4] = 2;
+  sccs[5] = 2;
+  sccs[6] = 2;
+  sccs[7] = 2;
+  sccs[8] = 4;
+  sccs[9] = 2;
+  sccs[10] = 5;
+  return sccs;
+}
+
+static void
+test_exit_prime_paths ()
+{
+  const int cpp01[] = { 0, 1, 2, 3, 10 };
+  const int cpp02[] = { 0, 1, 2, 4, 6, 8, 10 };
+  trie complete_prime_paths {};
+  complete_prime_paths.insert_with_suffix (cpp01);
+  complete_prime_paths.insert_with_suffix (cpp02);
+
+  const int ep01[] = { 4, 6, 9, 7, 2 };
+  const int ep02[] = { 5, 7, 2, 4, 6 };
+  const int ep03[] = { 9, 7, 2, 4, 6 };
+  const int ep04[] = { 4, 5, 7, 2 };
+  trie exit_paths;
+  exit_paths.insert (ep01);
+  exit_paths.insert (ep02);
+  exit_paths.insert (ep03);
+  exit_paths.insert (ep04);
+
+  auto_graph cfg = binary_search_cfg ();
+  trie epp = scc_exit_prime_paths (cfg, exit_paths, complete_prime_paths);
+
+  const int pp01[] = { 4, 6, 9, 7, 2, 3, 10 };
+  const int pp02[] = { 5, 7, 2, 4, 6, 8, 10 };
+  const int pp03[] = { 9, 7, 2, 4, 6, 8, 10 };
+  const int pp04[] = { 4, 5, 7, 2, 3, 10 };
+
+  ASSERT_EQ (count (epp), 4);
+  ASSERT_TRUE (contains (epp, pp01));
+  ASSERT_TRUE (contains (epp, pp02));
+  ASSERT_TRUE (contains (epp, pp03));
+  ASSERT_TRUE (contains (epp, pp04));
+}
+
+static void
+test_entry_prime_paths ()
+{
+  auto_graph cfg = binary_search_cfg ();
+
+  static int sccep01[] = { 2, 4, 6, 9, 7 };
+  static int sccep02[] = { 2, 4, 5, 7 };
+  trie scc_entry_paths;
+  scc_entry_paths.insert (sccep01);
+  scc_entry_paths.insert (sccep02);
+
+  const int cpp01[] = { 0, 1, 2, 3, 10 };
+  const int cpp02[] = { 0, 1, 2, 4, 6, 8, 10 };
+  trie complete_prime_paths {};
+  complete_prime_paths.insert (cpp01);
+  complete_prime_paths.insert (cpp02);
+
+  const int ep01[] = { 4, 6, 9, 7, 2, 3, 10 };
+  const int ep02[] = { 5, 7, 2, 4, 6, 8, 10 };
+  const int ep03[] = { 9, 7, 2, 4, 6, 8, 10 };
+  const int ep04[] = { 4, 5, 7, 2, 3, 10 };
+  trie exit_prime_paths {};
+  exit_prime_paths.insert (ep01);
+  exit_prime_paths.insert (ep02);
+  exit_prime_paths.insert (ep03);
+  exit_prime_paths.insert (ep04);
+
+  vec<int> sccs = binary_search_scc_map ();
+
+  trie epp = scc_entry_prime_paths (cfg, scc_entry_paths,
+				    complete_prime_paths,
+				    exit_prime_paths);
+
+  /* The 0 (start node) does not show up in the Fazli & Afsharchi paper,
+     but it has no real impact on the result.  The prime-paths functions
+     do not care about these vertices, and the path-coverage instrumentation
+     filters out the ENTRY/EXIT blocks from all the paths.  */
+  const int pp01[] = { 0, 1, 2, 4, 6, 9, 7 };
+  const int pp02[] = { 0, 1, 2, 4, 5, 7 };
+  ASSERT_EQ (count (epp), 2);
+  ASSERT_TRUE (contains (epp, pp01));
+  ASSERT_TRUE (contains (epp, pp02));
+}
+
+/* A straight-line graph with a single non-entry, non-exit vertex should
+   yield exactly one prime path: the entry -> vertex -> exit path checked
+   below.  */
+static void
+test_singleton_path ()
+{
+  auto_graph cfg = new_graph (3);
+  add_edge (cfg, 0, 2);
+  add_edge (cfg, 2, 1);
+  vec<vec<int>> paths = prime_paths (cfg, 100);
+
+  ASSERT_EQ (paths.length (), 1);
+  ASSERT_EQ (paths[0].length (), 3);
+  ASSERT_EQ (paths[0][0], 0);
+  ASSERT_EQ (paths[0][1], 2);
+  ASSERT_EQ (paths[0][2], 1);
+  release_vec_vec (paths);
+}
+
+void
+path_coverage_cc_tests ()
+{
+  test_prime_paths ();
+  test_build_ccfg ();
+  test_split_components ();
+  test_scc_internal_prime_paths ();
+  test_scc_entry_exit_paths ();
+  test_complete_prime_paths ();
+  test_exit_prime_paths ();
+  test_entry_prime_paths ();
+  test_singleton_path ();
+}
+
+} // namespace selftest
+
+#endif /* #if CHECKING_P */
diff --git a/gcc/profile.cc b/gcc/profile.cc
index 25d4f4a4b86..c3d06c4d387 100644
--- a/gcc/profile.cc
+++ b/gcc/profile.cc
@@ -1545,7 +1545,7 @@  branch_prob (bool thunk)
 
   remove_fake_edges ();
 
-  if (condition_coverage_flag || profile_arc_flag)
+  if (condition_coverage_flag || path_coverage_flag || profile_arc_flag)
       gimple_init_gcov_profiler ();
 
   if (condition_coverage_flag)
@@ -1597,6 +1597,10 @@  branch_prob (bool thunk)
 	instrument_values (values);
     }
 
+  void find_paths (struct function *);
+  if (path_coverage_flag)
+    find_paths (cfun);
+
   free_aux_for_edges ();
 
   values.release ();
diff --git a/gcc/selftest-run-tests.cc b/gcc/selftest-run-tests.cc
index e6779206c47..9361e43ccdf 100644
--- a/gcc/selftest-run-tests.cc
+++ b/gcc/selftest-run-tests.cc
@@ -105,6 +105,7 @@  selftest::run_tests ()
   diagnostic_path_cc_tests ();
   simple_diagnostic_path_cc_tests ();
   attribs_cc_tests ();
+  path_coverage_cc_tests ();
 
   /* This one relies on most of the above.  */
   function_tests_cc_tests ();
diff --git a/gcc/selftest.h b/gcc/selftest.h
index dcb1463ed90..4df9c0dd836 100644
--- a/gcc/selftest.h
+++ b/gcc/selftest.h
@@ -241,6 +241,7 @@  extern void json_cc_tests ();
 extern void optinfo_emit_json_cc_tests ();
 extern void opts_cc_tests ();
 extern void ordered_hash_map_tests_cc_tests ();
+extern void path_coverage_cc_tests ();
 extern void predict_cc_tests ();
 extern void pretty_print_cc_tests ();
 extern void range_tests ();
diff --git a/gcc/testsuite/g++.dg/gcov/gcov-22.C b/gcc/testsuite/g++.dg/gcov/gcov-22.C
new file mode 100644
index 00000000000..bc290714abd
--- /dev/null
+++ b/gcc/testsuite/g++.dg/gcov/gcov-22.C
@@ -0,0 +1,68 @@ 
+/* { dg-options "--coverage -fpath-coverage" } */
+/* { dg-do compile } */
+
+#include <stdexcept>
+
+/* This test is a collection of odd CFG shapes which have been shown to
+   trigger ICEs.  */
+
+/* A hard abort-like exit could lead to paths with only the exit node,
+   which would then be an empty path after entry/exit pruning.  */
+void xabort () { __builtin_exit (0); }
+
+/* Unconditional exceptions have the same effect as aborts
+   w.r.t. empty paths.  */
+int throws (int) { throw std::runtime_error("exception"); }
+
+/* Bad instrumentation would give
+   'error: returns_twice call is not first in basic block 3'.  */
+int _setjmp (void **);
+void set_jmp ()
+{
+  _setjmp (0);
+}
+
+/* From g++.dg/torture/pr83482.C  */
+void a();
+struct b {
+  virtual long c() { return 0L; }
+  void m_fn2() { c(); }
+} d;
+
+void e() {
+  d.m_fn2();
+  try {
+    a();
+    _setjmp(0);
+  } catch (...) {
+  }
+}
+
+void multiple_setjmp ()
+{
+  _setjmp (0);
+  _setjmp (0);
+}
+
+void multiple_conditional_setjmp (int a)
+{
+  if (a)
+    _setjmp (0);
+  else
+    _setjmp (0);
+
+  _setjmp (0);
+}
+
+int id (int x) { return x; }
+int infinite1 ()
+{
+  for (int c = 0; ; c++) id (c);
+}
+
+void infinite2 ()
+{
+  for (;;) {}
+}
+
+/* { dg-final { run-gcov gcov-22.C } } */
diff --git a/gcc/testsuite/gcc.misc-tests/gcov-29.c b/gcc/testsuite/gcc.misc-tests/gcov-29.c
new file mode 100644
index 00000000000..3419b922398
--- /dev/null
+++ b/gcc/testsuite/gcc.misc-tests/gcov-29.c
@@ -0,0 +1,861 @@ 
+/* { dg-options "--coverage -fpath-coverage" } */
+/* { dg-do run { target native } } */
+
+void
+pathcov001a ()
+{
+  /* Empty functions should not be problematic.  */
+}
+
+/* Straight line, which should have only one path.  */
+/* BEGIN paths
+   summary: 1/1
+*/
+void
+pathcov001b ()
+/* END */
+{
+  int a = 0;
+  ++a;
+}
+
+/* Same as b, but not executed.  */
+/* BEGIN paths
+   summary: 0/1
+   expect: 33
+*/
+void
+pathcov001c ()
+/* END */
+{
+  int a = 0;
+  ++a;
+}
+
+/* 002 is a simple baseline test, with no complicated control flow and no
+   loops, run with different combinations of inputs that tests the paths in
+   isolation.  */
+/* BEGIN paths
+   summary: 0/2
+   expect: 48(true) 49 52
+   expect: 48(false) 51 52
+*/
+void
+pathcov002a (int a)
+/* END */
+{
+  int v = 0;
+  if (a)
+    v++;
+  else
+    v--;
+}
+
+/* BEGIN paths
+    summary: 1/2
+    expect: 63(false) 66 67
+*/
+void
+pathcov002c (int a)
+/* END */
+{
+  int v = 0;
+  if (a)
+    v++;
+  else
+    v--;
+}
+
+/* BEGIN paths
+   summary: 1/2
+   expect: 78(true) 79 82
+*/
+void
+pathcov002b (int a)
+/* END */
+{
+  int v = 0;
+  if (a)
+    v++;
+  else
+    v--;
+}
+
+/* Identical to 002*, but run for both inputs.  This should achieve full
+   coverage.
+
+   BEGIN paths
+   summary: 2/2
+*/
+void
+pathcov002d (int a)
+/* END */
+{
+  int v = 0;
+  if (a)
+    v++;
+  else
+    v--;
+}
+
+/* Test individual control flow structures in isolation.  */
+
+/* BEGIN paths
+   summary: 0/2
+   expect: 112(true) 113 114
+   expect: 112(false) 114
+*/
+void
+pathcov003a (int a)
+/* END */
+{
+  if (a)
+    a++;
+}
+
+/* BEGIN paths
+   summary: 0/2
+   expect: 125(true) 126 129
+   expect: 125(false) 128 129
+*/
+void
+pathcov003b (int a)
+/* END */
+{
+  if (a)
+    a++;
+  else
+    a--;
+}
+
+/* BEGIN paths
+   summary: 0/3
+   expect: 141(true) 142 147
+   expect: 141(false) 143(true) 144 147
+   expect: 141(false) 143(false) 146 147
+*/
+void
+pathcov003c (int a)
+/* END */
+{
+  if (a > 10)
+    a++;
+  else if (a > 20)
+    a--;
+  else
+    a += 2;
+}
+
+/* BEGIN paths
+   summary: 0/5
+   expect: 162 162(true) 163
+   expect: 162 162(false) 164
+   expect: 163 162(true) 163
+   expect: 163 162(false) 164
+   expect: 162(true) 163 162
+*/
+void
+pathcov003d (int a)
+/* END */
+{
+  int x = 0;
+  for (int i = 0; i < a; ++i)
+    ++x;
+}
+
+/* BEGIN paths
+   summary: 0/5
+   expect: 180 180(true) 181
+   expect: 180 180(false) 182
+   expect: 181 180(true) 181
+   expect: 181 180(false) 182
+   expect: 180(true) 181 180
+*/
+void
+pathcov003e (int a)
+/* END */
+{
+  int x = 0;
+  int i = 0;
+  while (i++ < a)
+    x++;
+}
+
+/* BEGIN paths
+   summary: 0/2
+   expect: 194 197(false) 198
+   expect: 197(true) 197
+*/
+void
+pathcov003f (int a)
+/* END */
+{
+  int x = 0;
+  int i = 0;
+  do {
+      x++;
+  } while (i++ < a);
+}
+
+/* BEGIN paths
+   summary: 0/5
+   expect: 213 216(true) 220
+   expect: 213 216(false) 222
+   expect: 216(true) 220 216
+   expect: 220 216(true) 220
+   expect: 220 216(false) 222
+*/
+void
+pathcov003g (int a)
+/* END */
+{
+  int i = 0;
+  int x = 0;
+
+top:
+  if (i < a)
+    {
+      x++;
+      i++;
+      goto top;
+    }
+}
+
+/* This example has a good mix of control flow structures, which makes it
+   good at exposing problems with the bookkeeping/instrumentation.  */
+
+/* BEGIN paths
+   summary: 0/9
+   expect: 243(false) 247 247(true) 247(true) 249(true) 250 253
+   expect: 243(false) 247 247(true) 247(false) 253
+   expect: 243(false) 247 247(false) 253
+   expect: 243(true) 253
+   expect: 249(false) 247(true) 247(true) 249
+   expect: 249(false) 247(true) 247(false) 253
+   expect: 247(true) 247(true) 249(false) 247
+   expect: 247(true) 249(false) 247(true) 247
+   expect: 247(true) 249(false) 247(false) 253
+*/
+void
+pathcov004a (int a, int b, int c, int d)
+/* END */
+{
+  if (a)
+    {}
+  else
+    {
+      while (b-- > 0 && c-- > 0)
+	{
+	  if (d)
+	    break;
+	}
+    }
+}
+
+/* BEGIN paths
+   args: (1, 0, 0, 0)
+   summary: 1/9 */
+void
+pathcov004b (int a, int b, int c, int d)
+/* END */
+{
+  if (a)
+    {}
+  else
+    {
+      while (b-- > 0 && c-- > 0)
+	{
+	  if (d)
+	    break;
+	}
+    }
+}
+
+/* BEGIN paths
+   args: (0, 0, 0, 0)
+   summary: 1/9 */
+void
+pathcov004c (int a, int b, int c, int d)
+/* END */
+{
+  if (a)
+    {}
+  else
+    {
+      while (b-- > 0 && c-- > 0)
+	{
+	  if (d)
+	    break;
+	}
+    }
+}
+
+/* BEGIN paths
+   args: (0, 1, 0, 0)
+   summary: 1/9 */
+void
+pathcov004d (int a, int b, int c, int d)
+/* END */
+{
+  if (a)
+    {}
+  else
+    {
+      while (b-- > 0 && c-- > 0)
+	{
+	  if (d)
+	    break;
+	}
+    }
+}
+
+/* BEGIN paths
+   args: (0, 1, 1, 0)
+   summary: 2/9 */
+void
+pathcov004e (int a, int b, int c, int d)
+/* END */
+{
+  if (a)
+    {}
+  else
+    {
+      while (b-- > 0 && c-- > 0)
+	{
+	  if (d)
+	    break;
+	}
+    }
+}
+
+/* BEGIN paths
+   args: (0, 2, 1, 0)
+   summary: 3/9 */
+void
+pathcov004f (int a, int b, int c, int d)
+/* END */
+{
+  if (a)
+    {}
+  else
+    {
+      while (b-- > 0 && c-- > 0)
+	{
+	  if (d)
+	    break;
+	}
+    }
+}
+
+/* Funny loop exits.  */
+
+/* BEGIN paths
+   summary: 0/14
+*/
+void
+pathcov005a (int a, int b, int c)
+/* END */
+{
+  while (a)
+    {
+      if (b)
+	return;
+      while (c--)
+	a++;
+    }
+}
+
+void
+pathcov005b (int a, int b, int c)
+/* END */
+{
+  if (a)
+    goto loop;
+
+  while (b > 0)
+    {
+      b--;
+loop:
+      while (c--)
+	a++;
+    }
+}
+
+void
+pathcov005c (int a, int b, int c)
+/* END */
+{
+  while (a-- > 0) c++;
+  while (b-- > 0) c++;
+}
+
+/* BEGIN paths
+   summary: 0/67
+
+   This is a sanity check and baseline and should not be executed.
+
+   With >64 cases we should have >64 paths, which guarantees the full bitset
+   cannot fit in a single gcov type.  We want to only update the relevant
+   counters, because extra instructions are expensive in compile time and
+   binary bloat, and to verify that only taken paths are recorded.  A
+   standalone sketch of this split-counter bookkeeping follows after the
+   function.  */
+void
+pathcov006a (int a)
+/* END */
+{
+  int x = 0;
+  switch (a)
+  {
+    case 0: x++; break;
+    case 1: x++; break;
+    case 2: x++; break;
+    case 3: x++; break;
+    case 4: x++; break;
+    case 5: x++; break;
+    case 6: x++; break;
+    case 7: x++; break;
+    case 8: x++; break;
+    case 9: x++; break;
+    case 10: x++; break;
+    case 11: x++; break;
+    case 12: x++; break;
+    case 13: x++; break;
+    case 14: x++; break;
+    case 15: x++; break;
+    case 16: x++; break;
+    case 17: x++; break;
+    case 18: x++; break;
+    case 19: x++; break;
+    case 20: x++; break;
+    case 21: x++; break;
+    case 22: x++; break;
+    case 23: x++; break;
+    case 24: x++; break;
+    case 25: x++; break;
+    case 26: x++; break;
+    case 27: x++; break;
+    case 28: x++; break;
+    case 29: x++; break;
+    case 30: x++; break;
+    case 31: x++; break;
+    case 32: x++; break;
+    case 33: x++; break;
+    case 34: x++; break;
+    case 35: x++; break;
+    case 36: x++; break;
+    case 37: x++; break;
+    case 38: x++; break;
+    case 39: x++; break;
+    case 40: x++; break;
+    case 41: x++; break;
+    case 42: x++; break;
+    case 43: x++; break;
+    case 44: x++; break;
+    case 45: x++; break;
+    case 46: x++; break;
+    case 47: x++; break;
+    case 48: x++; break;
+    case 49: x++; break;
+    case 50: x++; break;
+    case 51: x++; break;
+    case 52: x++; break;
+    case 53: x++; break;
+    case 54: x++; break;
+    case 55: x++; break;
+    case 56: x++; break;
+    case 57: x++; break;
+    case 58: x++; break;
+    case 59: x++; break;
+    case 60: x++; break;
+    case 61: x++; break;
+    case 62: x++; break;
+    case 63: x++; break;
+    case 64: x++; break;
+    case 65: x++; break;
+  }
+}
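
The comment above pathcov006a notes that with more than 64 paths the bitset no
longer fits in a single gcov type.  The following is a hypothetical standalone
illustration of that split-counter bookkeeping, assuming a 64-bit gcov type
(an assumption; the actual counter layout comes from the instrumentation in
the patch, not from this sketch): taken path number K sets bit K % 64 of
accumulator word K / 64, so only one word is touched per taken path.

  #include <cstdint>
  #include <cstdio>

  int
  main ()
  {
    const unsigned npaths = 67;                 /* As in pathcov006*.  */
    const unsigned nwords = (npaths + 63) / 64; /* Two 64-bit words.  */
    uint64_t acc[2] = { 0, 0 };

    const unsigned taken = 65;                  /* E.g. the 'case 65' path.  */
    acc[taken / 64] |= (uint64_t) 1 << (taken % 64);

    for (unsigned i = 0; i < nwords; ++i)
      printf ("word %u: %016llx\n", i, (unsigned long long) acc[i]);
    return 0;
  }
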
+
+/* BEGIN paths
+   args: (0)
+   summary: 1/67
+*/
+void
+pathcov006b (int a)
+/* END */
+{
+  int x = 0;
+  switch (a)
+  {
+    case 0: x++; break;
+    case 1: x++; break;
+    case 2: x++; break;
+    case 3: x++; break;
+    case 4: x++; break;
+    case 5: x++; break;
+    case 6: x++; break;
+    case 7: x++; break;
+    case 8: x++; break;
+    case 9: x++; break;
+    case 10: x++; break;
+    case 11: x++; break;
+    case 12: x++; break;
+    case 13: x++; break;
+    case 14: x++; break;
+    case 15: x++; break;
+    case 16: x++; break;
+    case 17: x++; break;
+    case 18: x++; break;
+    case 19: x++; break;
+    case 20: x++; break;
+    case 21: x++; break;
+    case 22: x++; break;
+    case 23: x++; break;
+    case 24: x++; break;
+    case 25: x++; break;
+    case 26: x++; break;
+    case 27: x++; break;
+    case 28: x++; break;
+    case 29: x++; break;
+    case 30: x++; break;
+    case 31: x++; break;
+    case 32: x++; break;
+    case 33: x++; break;
+    case 34: x++; break;
+    case 35: x++; break;
+    case 36: x++; break;
+    case 37: x++; break;
+    case 38: x++; break;
+    case 39: x++; break;
+    case 40: x++; break;
+    case 41: x++; break;
+    case 42: x++; break;
+    case 43: x++; break;
+    case 44: x++; break;
+    case 45: x++; break;
+    case 46: x++; break;
+    case 47: x++; break;
+    case 48: x++; break;
+    case 49: x++; break;
+    case 50: x++; break;
+    case 51: x++; break;
+    case 52: x++; break;
+    case 53: x++; break;
+    case 54: x++; break;
+    case 55: x++; break;
+    case 56: x++; break;
+    case 57: x++; break;
+    case 58: x++; break;
+    case 59: x++; break;
+    case 60: x++; break;
+    case 61: x++; break;
+    case 62: x++; break;
+    case 63: x++; break;
+    case 64: x++; break;
+    case 65: x++; break;
+  }
+}
+
+/* BEGIN paths
+   args: (64)
+   summary: 1/67 */
+void
+pathcov006c (int a)
+/* END */
+{
+  int x = 0;
+  switch (a)
+  {
+    case 0: x++; break;
+    case 1: x++; break;
+    case 2: x++; break;
+    case 3: x++; break;
+    case 4: x++; break;
+    case 5: x++; break;
+    case 6: x++; break;
+    case 7: x++; break;
+    case 8: x++; break;
+    case 9: x++; break;
+    case 10: x++; break;
+    case 11: x++; break;
+    case 12: x++; break;
+    case 13: x++; break;
+    case 14: x++; break;
+    case 15: x++; break;
+    case 16: x++; break;
+    case 17: x++; break;
+    case 18: x++; break;
+    case 19: x++; break;
+    case 20: x++; break;
+    case 21: x++; break;
+    case 22: x++; break;
+    case 23: x++; break;
+    case 24: x++; break;
+    case 25: x++; break;
+    case 26: x++; break;
+    case 27: x++; break;
+    case 28: x++; break;
+    case 29: x++; break;
+    case 30: x++; break;
+    case 31: x++; break;
+    case 32: x++; break;
+    case 33: x++; break;
+    case 34: x++; break;
+    case 35: x++; break;
+    case 36: x++; break;
+    case 37: x++; break;
+    case 38: x++; break;
+    case 39: x++; break;
+    case 40: x++; break;
+    case 41: x++; break;
+    case 42: x++; break;
+    case 43: x++; break;
+    case 44: x++; break;
+    case 45: x++; break;
+    case 46: x++; break;
+    case 47: x++; break;
+    case 48: x++; break;
+    case 49: x++; break;
+    case 50: x++; break;
+    case 51: x++; break;
+    case 52: x++; break;
+    case 53: x++; break;
+    case 54: x++; break;
+    case 55: x++; break;
+    case 56: x++; break;
+    case 57: x++; break;
+    case 58: x++; break;
+    case 59: x++; break;
+    case 60: x++; break;
+    case 61: x++; break;
+    case 62: x++; break;
+    case 63: x++; break;
+    case 64: x++; break;
+    case 65: x++; break;
+  }
+}
+
+/* BEGIN paths
+   args: (2, 65)
+   summary: 2/67
+
+   In this case we should record a path in both halves of the accumulator
+   bitsets.  Note that the paths don't overlap with the single-half examples in
+   006b and 006c to reduce the chance of accidental passes.  */
+void
+pathcov006d (int a)
+/* END */
+{
+  int x = 0;
+  switch (a)
+  {
+    case 0: x++; break;
+    case 1: x++; break;
+    case 2: x++; break;
+    case 3: x++; break;
+    case 4: x++; break;
+    case 5: x++; break;
+    case 6: x++; break;
+    case 7: x++; break;
+    case 8: x++; break;
+    case 9: x++; break;
+    case 10: x++; break;
+    case 11: x++; break;
+    case 12: x++; break;
+    case 13: x++; break;
+    case 14: x++; break;
+    case 15: x++; break;
+    case 16: x++; break;
+    case 17: x++; break;
+    case 18: x++; break;
+    case 19: x++; break;
+    case 20: x++; break;
+    case 21: x++; break;
+    case 22: x++; break;
+    case 23: x++; break;
+    case 24: x++; break;
+    case 25: x++; break;
+    case 26: x++; break;
+    case 27: x++; break;
+    case 28: x++; break;
+    case 29: x++; break;
+    case 30: x++; break;
+    case 31: x++; break;
+    case 32: x++; break;
+    case 33: x++; break;
+    case 34: x++; break;
+    case 35: x++; break;
+    case 36: x++; break;
+    case 37: x++; break;
+    case 38: x++; break;
+    case 39: x++; break;
+    case 40: x++; break;
+    case 41: x++; break;
+    case 42: x++; break;
+    case 43: x++; break;
+    case 44: x++; break;
+    case 45: x++; break;
+    case 46: x++; break;
+    case 47: x++; break;
+    case 48: x++; break;
+    case 49: x++; break;
+    case 50: x++; break;
+    case 51: x++; break;
+    case 52: x++; break;
+    case 53: x++; break;
+    case 54: x++; break;
+    case 55: x++; break;
+    case 56: x++; break;
+    case 57: x++; break;
+    case 58: x++; break;
+    case 59: x++; break;
+    case 60: x++; break;
+    case 61: x++; break;
+    case 62: x++; break;
+    case 63: x++; break;
+    case 64: x++; break;
+    case 65: x++; break;
+  }
+}
+
+/* BEGIN paths
+   args: (66)
+   summary: 1/67
+
+   This case should hit the empty default.  */
+void
+pathcov006e (int a)
+/* END */
+{
+  int x = 0;
+  switch (a)
+  {
+    case 0: x++; break;
+    case 1: x++; break;
+    case 2: x++; break;
+    case 3: x++; break;
+    case 4: x++; break;
+    case 5: x++; break;
+    case 6: x++; break;
+    case 7: x++; break;
+    case 8: x++; break;
+    case 9: x++; break;
+    case 10: x++; break;
+    case 11: x++; break;
+    case 12: x++; break;
+    case 13: x++; break;
+    case 14: x++; break;
+    case 15: x++; break;
+    case 16: x++; break;
+    case 17: x++; break;
+    case 18: x++; break;
+    case 19: x++; break;
+    case 20: x++; break;
+    case 21: x++; break;
+    case 22: x++; break;
+    case 23: x++; break;
+    case 24: x++; break;
+    case 25: x++; break;
+    case 26: x++; break;
+    case 27: x++; break;
+    case 28: x++; break;
+    case 29: x++; break;
+    case 30: x++; break;
+    case 31: x++; break;
+    case 32: x++; break;
+    case 33: x++; break;
+    case 34: x++; break;
+    case 35: x++; break;
+    case 36: x++; break;
+    case 37: x++; break;
+    case 38: x++; break;
+    case 39: x++; break;
+    case 40: x++; break;
+    case 41: x++; break;
+    case 42: x++; break;
+    case 43: x++; break;
+    case 44: x++; break;
+    case 45: x++; break;
+    case 46: x++; break;
+    case 47: x++; break;
+    case 48: x++; break;
+    case 49: x++; break;
+    case 50: x++; break;
+    case 51: x++; break;
+    case 52: x++; break;
+    case 53: x++; break;
+    case 54: x++; break;
+    case 55: x++; break;
+    case 56: x++; break;
+    case 57: x++; break;
+    case 58: x++; break;
+    case 59: x++; break;
+    case 60: x++; break;
+    case 61: x++; break;
+    case 62: x++; break;
+    case 63: x++; break;
+    case 64: x++; break;
+    case 65: x++; break;
+    default: break;
+  }
+}
+
+void *alloc (int) { return 0; }
+void *getcwd (void *, int) { return 0; }
+void release (void *) {}
+
+/* Based on gnu_getcwd from tree.c  */
+void *gnu_getcwd()
+{
+  int size = 100;
+  void *buffer = alloc (size);
+
+  while (1) {
+    void *value = getcwd (buffer, size);
+    if (value != 0) return buffer;
+    size *= 2;
+    release (buffer);
+    buffer = (void *) alloc (size);
+  }
+}
+
+void loopy (int a)
+{
+  goto mid;
+  while (1)
+    {
+      a--;
+    mid:
+      a--;
+      if (a < -5)
+	break;
+      a--;
+    }
+}
+
+int
+main ()
+{
+  pathcov001a ();
+  pathcov001b ();
+  /* not called: pathcov001c (); */
+
+  /* not called: pathcov002a (); */
+  pathcov002b (0);
+  pathcov002c (1);
+  pathcov002d (0);
+  pathcov002d (1);
+
+  pathcov004b (1, 0, 0, 0);
+  pathcov004c (0, 0, 0, 0);
+  pathcov004d (0, 1, 0, 0);
+  pathcov004e (0, 1, 1, 0);
+  pathcov004f (0, 2, 1, 0);
+
+  /* not called: pathcov006a (); */
+  pathcov006b (0);
+  pathcov006c (64);
+  pathcov006d (2);
+  pathcov006d (65);
+  pathcov006e (66);
+}
+
+/* { dg-final { run-gcov prime-paths { --prime-paths gcov-29.c } } } */
diff --git a/gcc/testsuite/lib/gcov.exp b/gcc/testsuite/lib/gcov.exp
index e49f1011884..3fc7b65bee5 100644
--- a/gcc/testsuite/lib/gcov.exp
+++ b/gcc/testsuite/lib/gcov.exp
@@ -533,6 +533,86 @@  proc verify-filters { testname testcase file expected unexpected } {
     return [expr [llength $expected] + [llength $unexpected]]
 }
 
+proc verify-prime-paths { testname testcase file } {
+    set failed 0
+    set fd [open $file r]
+
+    set expected_n -1
+    set expected_m -1
+    set recording 0
+    set expected ""
+
+    while { [gets $fd line] >= 0 } {
+	regexp "^\[^:\]+: *(\[0-9\]+):" "$line" all lineno
+	set prefix "$testname line $lineno"
+
+	if {[regexp "BEGIN *paths" $line]} {
+	    set recording 1
+	    set expected ""
+	    set expected_n -1
+	    set expected_m -1
+	    set seen ""
+	    continue
+	}
+	
+        if { $recording != 1 } {
+            continue
+        }
+
+        if [regexp {summary: *(\d+)/(\d+)} $line _ n m] {
+            set expected_n $n
+            set expected_m $m
+        }
+
+        if [regexp "expect: *(.*)" $line all ln] {
+            set cases ""
+            set ln [regsub -all {\s+} $ln " "]
+            foreach case [split  $ln " "] {
+                lappend cases $case
+            }
+            lappend expected $cases
+        }
+
+        if [regexp "END" $line] {
+            if {$recording != 1} {
+                incr failed
+                fail "unexpected END at line $lineno, missing BEGIN"
+
+                # Abort the test if there is a mismatch, to avoid creating
+                # unnecessary errors.  At this point the test itself is broken.
+                break
+            }
+            set recording 0
+
+            if {[llength $expected] > 0} {
+                incr failed
+                fail "expected: '$expected'"
+            }
+        }
+
+        if [regexp {paths covered (\d+) of (\d+)} $line _ n m] {
+            if { $n ne $expected_n || $m ne $expected_m } {
+                incr failed
+                fail "$prefix: expected $expected_n/$expected_m covered paths, was $n/$m"
+            }
+        }
+
+        if [regexp {path *\d+ not covered: lines (.*)} $line _ path] {
+            set pathl ""
+            foreach ln [split $path " "] {
+		if [regexp {\s*(.*)\s*} $ln _ key] {
+                    lappend pathl $key
+                }
+            }
+            set i [lsearch $expected $pathl]
+            set expected [lreplace $expected $i $i]
+        }
+    }
+
+    close $fd
+    return $failed
+}
+
 proc gcov-pytest-format-line { args } {
     global subdir
 
@@ -606,6 +686,7 @@  proc run-gcov { args } {
     set gcov_verify_calls 0
     set gcov_verify_branches 0
     set gcov_verify_conditions 0
+    set gcov_verify_prime_paths 0
     set gcov_verify_lines 1
     set gcov_verify_intermediate 0
     set gcov_verify_filters 0
@@ -623,6 +704,8 @@  proc run-gcov { args } {
 	  set gcov_verify_filters 1
 	  set verify_filters_expected [lindex $a 1]
 	  set verify_filters_unexpected [lindex $a 2]
+	} elseif { $a == "prime-paths" } {
+	  set gcov_verify_prime_paths 1
 	} elseif { $a == "intermediate" } {
 	  set gcov_verify_intermediate 1
 	  set gcov_verify_calls 0
@@ -703,6 +786,11 @@  proc run-gcov { args } {
     } else {
 	set cdfailed 0
     }
+    if { $gcov_verify_prime_paths } {
+	set ppfailed [verify-prime-paths $testname $testcase $testcase.gcov]
+    } else {
+	set ppfailed 0
+    }
     if { $gcov_verify_calls } {
 	set cfailed [verify-calls $testname $testcase $testcase.gcov]
     } else {
@@ -722,12 +810,12 @@  proc run-gcov { args } {
 
     # Report whether the gcov test passed or failed.  If there were
     # multiple failures then the message is a summary.
-    set tfailed [expr $lfailed + $bfailed + $cdfailed + $cfailed + $ifailed + $ffailed]
+    set tfailed [expr $lfailed + $bfailed + $cdfailed + $ppfailed + $cfailed + $ifailed + $ffailed]
     if { $xfailed } {
 	setup_xfail "*-*-*"
     }
     if { $tfailed > 0 } {
-	fail "$testname gcov: $lfailed failures in line counts, $bfailed in branch percentages, $cdfailed in condition/decision, $cfailed in return percentages, $ifailed in intermediate format, $ffailed failed in filters"
+	fail "$testname gcov: $lfailed failures in line counts, $bfailed in branch percentages, $cdfailed in condition/decision, $ppfailed in prime-paths, $cfailed in return percentages, $ifailed in intermediate format, $ffailed failed in filters"
 	if { $xfailed } {
 	    clean-gcov $testcase
 	}
diff --git a/gcc/tree-profile.cc b/gcc/tree-profile.cc
index 153c9323040..80520eecd3e 100644
--- a/gcc/tree-profile.cc
+++ b/gcc/tree-profile.cc
@@ -1908,7 +1908,7 @@  tree_profiling (void)
 	  thunk = true;
 	  /* When generate profile, expand thunk to gimple so it can be
 	     instrumented same way as other functions.  */
-	  if (profile_arc_flag || condition_coverage_flag)
+	  if (profile_arc_flag || condition_coverage_flag || path_coverage_flag)
 	    expand_thunk (node, false, true);
 	  /* Read cgraph profile but keep function as thunk at profile-use
 	     time.  */
@@ -1953,7 +1953,8 @@  tree_profiling (void)
   release_profile_file_filtering ();
 
   /* Drop pure/const flags from instrumented functions.  */
-  if (profile_arc_flag || condition_coverage_flag || flag_test_coverage)
+  if (profile_arc_flag || condition_coverage_flag || path_coverage_flag
+      || flag_test_coverage)
     FOR_EACH_DEFINED_FUNCTION (node)
       {
 	if (!gimple_has_body_p (node->decl)
@@ -1985,7 +1986,8 @@  tree_profiling (void)
 
       push_cfun (DECL_STRUCT_FUNCTION (node->decl));
 
-      if (profile_arc_flag || condition_coverage_flag || flag_test_coverage)
+      if (profile_arc_flag || condition_coverage_flag || path_coverage_flag
+	  || flag_test_coverage)
 	FOR_EACH_BB_FN (bb, cfun)
 	  {
 	    gimple_stmt_iterator gsi;
@@ -2070,7 +2072,8 @@  pass_ipa_tree_profile::gate (function *)
      disabled.  */
   return (!in_lto_p && !flag_auto_profile
 	  && (flag_branch_probabilities || flag_test_coverage
-	      || profile_arc_flag || condition_coverage_flag)
+	      || profile_arc_flag || condition_coverage_flag
+	      || path_coverage_flag)
 	  && !seen_error ());
 }