[pushed,3/4] pretty-print: reimplement pp_format with a new struct pp_token

Message ID 20240829225813.2567570-3-dmalcolm@redhat.com
State New
Series [pushed,1/4] Use std::unique_ptr for optinfo_item

Commit Message

David Malcolm Aug. 29, 2024, 10:58 p.m. UTC
The following patch rewrites the internals of pp_format.

A pretty_printer's output_buffer maintains a stack of chunk_info
instances, each one responsible for handling a call to pp_format, where
having a stack allows us to support re-entrant calls to pp_format on the
same pretty_printer.

Previously a chunk_info merely stored buffers of accumulated text
per unformatted run and per formatted argument.

This led to various special-casing for handling:

- urlifiers, needing class quoting_info to handle awkward cases where
  the run of quoted text could be split between stages 1 and 2
  of formatting

- dumpfiles, where the optinfo machinery could lead to objects being
  stashed during formatting for later replay to JSON optimization
  records

- in the C++ frontend, the format codes %H and %I can't be processed
  until we've seen both, leading to awkward code to manipulate the
  text buffers

Further, supporting URLs in messages in SARIF output (PR other/116419)
would add additional manipulations of text buffers, since our internal
pp_begin_url API gives the URL at the beginning of the wrapped text,
whereas SARIF's format for embedded URLs has the URL *after* the wrapped
text.  Also when handling "%@" we wouldn't necessarily know the URL of
an event ID until later, requiring further nasty special-case
manipulation of text buffers.

This patch rewrites pretty-print formatting by introducing a new
intermediate representation during formatting: pp_token and
pp_token_list.  Rather than simply accumulating a buffer of "char" in
the chunk_obstack during formatting, we now also accumulate a
pp_token_list, a doubly-linked list of pp_token, which can be:
- text buffers
- begin/end colorization
- begin/end quote
- begin/end URL
- "custom data" tokens

Working at the level of tokens rather than just text buffers allows the
various awkward special cases above to be replaced with uniform logic.
For example, all "urlification" is now done in phase 3 of formatting,
in one place, by looking for [..., BEGIN_QUOTE, TEXT, END_QUOTE, ...]
and injecting BEGIN_URL and END_URL wrapper tokens when the urlifier
has a URL for TEXT.  Doing so greatly simplifies the urlifier code,
allowing the removal of class quoting_info.
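
As an illustration of the phase 3 urlification described above, here is a
minimal standalone sketch.  It is not the code from this patch: "token",
"token_list", "urlifier_fn" and this "apply_urlifier" are simplified,
hypothetical stand-ins built on std::list, and the placement of the URL
tokens directly around TEXT is an assumption made for the example.

```cpp
#include <cassert>
#include <functional>
#include <iterator>
#include <list>
#include <string>

// Simplified stand-ins for pp_token / pp_token_list (hypothetical; not
// GCC's definitions).
enum class kind { text, begin_quote, end_quote, begin_url, end_url };

struct token
{
  kind m_kind;
  std::string m_value; // text for kind::text, URL for kind::begin_url
};

using token_list = std::list<token>;

// Hypothetical urlifier: returns a URL for TEXT, or "" if it has none.
using urlifier_fn = std::function<std::string (const std::string &)>;

// Look for [..., BEGIN_QUOTE, TEXT, END_QUOTE, ...] and wrap TEXT in
// BEGIN_URL/END_URL when the urlifier has a URL for it.
void
apply_urlifier (token_list &tokens, const urlifier_fn &urlifier)
{
  for (auto iter = tokens.begin (); iter != tokens.end (); ++iter)
    {
      if (iter->m_kind != kind::begin_quote)
	continue;
      auto text = std::next (iter);
      if (text == tokens.end () || text->m_kind != kind::text)
	continue;
      auto end_quote = std::next (text);
      if (end_quote == tokens.end ()
	  || end_quote->m_kind != kind::end_quote)
	continue;
      const std::string url = urlifier (text->m_value);
      if (url.empty ())
	continue;
      // std::list insertion leaves existing iterators valid.
      tokens.insert (text, token {kind::begin_url, url});
      tokens.insert (end_quote, token {kind::end_url, ""});
    }
}
```

Because the pattern match runs over tokens rather than raw characters,
it does not matter which earlier phase produced the quoted text.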

The tokens and token lists are allocated on the chunk_obstack, and so
there's no additional heap activity required, with the memory reclaimed
when the chunk_obstack is freed after phase 3 of formatting.
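
The allocation strategy above can be pictured with a toy bump allocator.
This is only a sketch under stated assumptions: "arena" is a hypothetical
stand-in for an obstack, and the operator new/delete signatures shown are
illustrative, not those of pp_token in this patch.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Toy bump allocator standing in for an obstack (hypothetical): objects
// are carved out of one block and released together, with no per-object
// free.
class arena
{
public:
  explicit arena (std::size_t capacity) : m_storage (capacity), m_used (0) {}

  void *allocate (std::size_t size, std::size_t align)
  {
    // Round the current offset up to the requested alignment.
    std::size_t start = (m_used + align - 1) / align * align;
    assert (start + size <= m_storage.size ());
    m_used = start + size;
    return m_storage.data () + start;
  }

  void release_all () { m_used = 0; }
  std::size_t used () const { return m_used; }

private:
  std::vector<char> m_storage;
  std::size_t m_used;
};

struct token
{
  explicit token (int id) : m_id (id), m_next (nullptr) {}

  // Allocate tokens from the arena rather than the general-purpose heap.
  static void *operator new (std::size_t size, arena &a)
  {
    return a.allocate (size, alignof (token));
  }
  // Matching placement delete, used only if the constructor throws.
  static void operator delete (void *, arena &) {}
  // Individual deletion is a no-op; the arena reclaims everything at once.
  static void operator delete (void *) {}

  int m_id;
  token *m_next;
};
```

With these definitions, "new (a) token (1)" costs a pointer bump, and a
single "a.release_all ()" after the final phase reclaims every token.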

New kinds of pp_token can be added as needed to support output formats.
For example, the followup patch adds a token for "%@" for event IDs, to
better support SARIF output.
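
To show why a token stream helps output formats that order URLs
differently, here is a small sketch with two printers consuming the same
tokens.  All names are hypothetical, the "<url>...</>" and "[text](url)"
renderings are placeholders chosen for the example, and none of this is
the token_printer interface added by the patch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified token model (hypothetical; not GCC's pp_token).
enum class kind { text, begin_url, end_url };

struct token
{
  kind m_kind;
  std::string m_value; // text for kind::text, URL for kind::begin_url
};

// Printer for a format that wants the URL before the wrapped text.
std::string
print_url_first (const std::vector<token> &tokens)
{
  std::string out;
  for (const token &tok : tokens)
    switch (tok.m_kind)
      {
      case kind::text:
	out += tok.m_value;
	break;
      case kind::begin_url:
	out += "<" + tok.m_value + ">";
	break;
      case kind::end_url:
	out += "</>";
	break;
      }
  return out;
}

// Printer for a format that wants the URL after the wrapped text: the
// URL is remembered at BEGIN_URL and only written out at END_URL.
std::string
print_url_last (const std::vector<token> &tokens)
{
  std::string out;
  std::string pending_url;
  for (const token &tok : tokens)
    switch (tok.m_kind)
      {
      case kind::text:
	out += tok.m_value;
	break;
      case kind::begin_url:
	pending_url = tok.m_value;
	out += "[";
	break;
      case kind::end_url:
	out += "](" + pending_url + ")";
	pending_url.clear ();
	break;
      }
  return out;
}
```

Neither printer has to rewrite an already-formatted character buffer;
each simply decides what to emit when it meets a token.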

No functional change intended.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Lightly tested with valgrind.
Pushed to trunk as r15-3311-ge31b6176996567.

gcc/c/ChangeLog:
	* c-objc-common.cc (c_tree_printer): Convert final param from
	const char ** to pp_token_list &.

gcc/cp/ChangeLog:
	* error.cc: Include "make-unique.h".
	(deferred_printed_type::m_buffer_ptr): Replace with...
	(deferred_printed_type::m_printed_text): ...this and...
	(deferred_printed_type::m_token_list): ...this.
	(deferred_printed_type::deferred_printed_type): Update ctors for
	above changes.
	(deferred_printed_type::set_text_for_token_list): New.
	(append_formatted_chunk): Pass chunk_obstack to
	chunk_info::append_formatted_chunk.
	(add_quotes): Delete.
	(cxx_format_postprocessor::handle): Reimplement to call
	deferred_printed_type::set_text_for_token_list, rather than store
	buffer pointers.
	(defer_phase_2_of_type_diff): Replace param "buffer_ptr"
	with "formatted_token_list".  Reimplement by storing
	a pointer to formatted_token_list so that the postprocessor can
	put its text there.
	(cp_printer): Convert param "buffer_ptr" to
	"formatted_token_list".  Update calls to
	defer_phase_2_of_type_diff accordingly.

gcc/ChangeLog:
	* diagnostic.cc (diagnostic_context::report_diagnostic): Don't
	pass m_urlifier to pp_format, as urlification now happens in
	phase 3.
	* dump-context.h (class dump_pretty_printer): Update leading
	comment.
	(dump_pretty_printer::emit_items): Drop decl.
	(dump_pretty_printer::set_optinfo): New.
	(class dump_pretty_printer::stashed_item): Delete class.
	(class dump_pretty_printer::custom_token_printer): New class.
	(dump_pretty_printer::format_decoder_cb): Convert param from
	const char ** to pp_token_list &.
	(dump_pretty_printer::decode_format): Likewise.
	(dump_pretty_printer::stash_item): Likewise.
	(dump_pretty_printer::emit_any_pending_textual_chunks): Drop decl.
	(dump_pretty_printer::m_stashed_items): Delete field.
	(dump_pretty_printer::m_token_printer): New member data.
	* dumpfile.cc (struct wrapped_optinfo_item): New.
	(dump_pretty_printer::dump_pretty_printer): Update for dropping
	of field m_stashed_items and new field m_token_printer.
	(dump_pretty_printer::emit_items): Delete; we now use
	pp_output_formatted_text.
	(dump_pretty_printer::emit_any_pending_textual_chunks): Delete.
	(dump_pretty_printer::stash_item): Convert param from
	const char ** to pp_token_list &.
	(dump_pretty_printer::format_decoder_cb): Likewise.
	(dump_pretty_printer::decode_format): Likewise.
	(dump_pretty_printer::custom_token_printer::print_tokens): New.
	(dump_pretty_printer::custom_token_printer::emit_any_pending_textual_chunks):
	New.
	(dump_context::dump_printf_va): Call set_optinfo on the
	dump_pretty_printer.  Replace call to emit_items with a call to
	pp_output_formatted_text.
	* opt-problem.cc (opt_problem::opt_problem): Replace call to
	emit_items with call to set_optinfo and call to
	pp_output_formatted_text.
	* pretty-print-format-impl.h (struct pp_token): New.
	(struct pp_token_text): New.
	(is_a_helper <pp_token_text *>::test): New.
	(is_a_helper <const pp_token_text *>::test): New.
	(struct pp_token_begin_color): New.
	(is_a_helper <pp_token_begin_color *>::test): New.
	(is_a_helper <const pp_token_begin_color *>::test): New.
	(struct pp_token_end_color): New.
	(struct pp_token_begin_quote): New.
	(struct pp_token_end_quote): New.
	(struct pp_token_begin_url): New.
	(is_a_helper <pp_token_begin_url*>::test): New.
	(is_a_helper <const pp_token_begin_url*>::test): New.
	(struct pp_token_end_url): New.
	(struct pp_token_custom_data): New.
	(is_a_helper <pp_token_custom_data *>::test): New.
	(is_a_helper <const pp_token_custom_data *>::test): New.
	(class pp_token_list): New.
	(chunk_info::get_args): Drop.
	(chunk_info::get_quoting_info): Drop.
	(chunk_info::get_token_lists): New accessor.
	(chunk_info::append_formatted_chunk): Add obstack & param.
	(chunk_info::dump): New decls.
	(chunk_info::m_args): Convert element type from const char * to
	pp_token_list *.  Rewrite/update comment.
	(chunk_info::m_quotes): Drop field.
	* pretty-print-markup.h (class pp_token_list): New forward decl.
	(pp_markup::context::context): Drop urlifier param; add
	formatted_token_list param.
	(pp_markup::context::push_back_any_text): New decl.
	(pp_markup::context::m_urlifier): Drop field.
	(pp_markup::context::m_formatted_token_list): New field.
	* pretty-print-urlifier.h: Update comment.
	* pretty-print.cc: Define INCLUDE_MEMORY.  Include
	"make-unique.h".
	(default_token_printer): New forward decl.
	(obstack_append_string): Delete.
	(urlify_quoted_string): Delete.
	(pp_token::pp_token): New.
	(pp_token::dump): New.
	(allocate_object): New.
	(class quoting_info): Delete.
	(pp_token::operator new): New.
	(pp_token::operator delete): New.
	(pp_token_list::operator new): New.
	(pp_token_list::operator delete): New.
	(pp_token_list::pp_token_list): New.
	(pp_token_list::~pp_token_list): New.
	(pp_token_list::push_back_text): New.
	(pp_token_list::push_back): New.
	(pp_token_list::push_back_list): New.
	(pp_token_list::pop_front): New.
	(pp_token_list::remove_token): New.
	(pp_token_list::insert_after): New.
	(pp_token_list::replace_custom_tokens): New.
	(pp_token_list::merge_consecutive_text_tokens): New.
	(pp_token_list::apply_urlifier): New.
	(pp_token_list::dump): New.
	(chunk_info::append_formatted_chunk): Add obstack & param and use
	it to reimplement in terms of token lists.
	(chunk_info::pop_from_output_buffer): Drop m_quotes.
	(chunk_info::on_begin_quote): Delete.
	(chunk_info::dump): New.
	(chunk_info::on_end_quote): Delete.
	(push_back_any_text): New.
	(pretty_printer::format): Drop "urlifier" param and quoting_info
	logic.  Convert "formatters" and "args" from const char ** to
	pp_token_list **.  Reimplement so that rather than just
	accumulating a text buffer in the chunk_obstack for each arg,
	instead also accumulate a pp_token_list and pp_tokens for each
	arg.
	(auto_obstack::operator obstack &): New.
	(quoting_info::handle_phase_3): Delete.
	(pp_output_formatted_text): Reimplement in terms of manipulations
	of pp_token_lists, rather than char buffers.  Call
	default_token_printer, or m_token_printer's print_tokens vfunc.
	(default_token_printer): New.
	(pretty_printer::pretty_printer): Initialize m_token_printer in
	both ctors.
	(pp_markup::context::begin_quote): Reimplement to use token list.
	(pp_markup::context::end_quote): Likewise.
	(pp_markup::context::begin_highlight_color): Likewise.
	(pp_markup::context::end_highlight_color): Likewise.
	(pp_markup::context::push_back_any_text): New.
	(selftest::test_merge_consecutive_text_tokens): New.
	(selftest::test_custom_tokens_1): New.
	(selftest::test_custom_tokens_2): New.
	(selftest::pp_printf_with_urlifier): Drop "urlifier" param from
	call to pp_format.
	(selftest::test_urlification): Add test of the example from
	pretty-print-format-impl.h.
	(selftest::pretty_print_cc_tests): Call the new selftest
	functions.
	* pretty-print.h (class quoting_info): Drop forward decl.
	(class pp_token_list): New forward decl.
	(printer_fn): Convert final param from const char ** to
	pp_token_list &.
	(class token_printer): New.
	(class pretty_printer): Add pp_output_formatted_text as friend.
	(pretty_printer::set_token_printer): New.
	(pretty_printer::format): Drop urlifier param as this now happens
	in phase 3.
	(pretty_printer::m_format_decoder): Update comment.
	(pretty_printer::m_token_printer): New field.
	(pp_format): Drop urlifier param.
	* tree-diagnostic.cc (default_tree_printer): Convert final param
	from const char ** to pp_token_list &.
	* tree-diagnostic.h: Likewise for decl.

gcc/fortran/ChangeLog:
	* error.cc (gfc_format_decoder): Convert final param from
	const char **buffer_ptr to pp_token_list &formatted_token_list,
	and update call to default_tree_printer accordingly.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
---
 gcc/c/c-objc-common.cc         |    4 +-
 gcc/cp/error.cc                |  105 +--
 gcc/diagnostic.cc              |    2 +-
 gcc/dump-context.h             |   40 +-
 gcc/dumpfile.cc                |  217 +++---
 gcc/fortran/error.cc           |    5 +-
 gcc/opt-problem.cc             |    3 +-
 gcc/pretty-print-format-impl.h |  407 ++++++++++-
 gcc/pretty-print-markup.h      |   10 +-
 gcc/pretty-print-urlifier.h    |    2 +-
 gcc/pretty-print.cc            | 1176 ++++++++++++++++++++++----------
 gcc/pretty-print.h             |   43 +-
 gcc/tree-diagnostic.cc         |    2 +-
 gcc/tree-diagnostic.h          |    2 +-
 14 files changed, 1483 insertions(+), 535 deletions(-)
Patch

diff --git a/gcc/c/c-objc-common.cc b/gcc/c/c-objc-common.cc
index fde9ae6ad667..9d39fcd4e442 100644
--- a/gcc/c/c-objc-common.cc
+++ b/gcc/c/c-objc-common.cc
@@ -34,7 +34,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "dwarf2.h"
 
 static bool c_tree_printer (pretty_printer *, text_info *, const char *,
-			    int, bool, bool, bool, bool *, const char **);
+			    int, bool, bool, bool, bool *, pp_token_list &);
 
 /* Info for C language features which can be queried through
    __has_{feature,extension}.  */
@@ -318,7 +318,7 @@  pp_markup::element_quoted_type::print_type (pp_markup::context &ctxt)
 static bool
 c_tree_printer (pretty_printer *pp, text_info *text, const char *spec,
 		int precision, bool wide, bool set_locus, bool hash,
-		bool *quoted, const char **)
+		bool *quoted, pp_token_list &)
 {
   tree t = NULL_TREE;
   // FIXME: the next cast should be a dynamic_cast, when it is permitted.
diff --git a/gcc/cp/error.cc b/gcc/cp/error.cc
index 3cc0dd1cdfa9..420fad26b7b7 100644
--- a/gcc/cp/error.cc
+++ b/gcc/cp/error.cc
@@ -39,6 +39,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "cp-name-hint.h"
 #include "attribs.h"
 #include "pretty-print-format-impl.h"
+#include "make-unique.h"
 
 #define pp_separate_with_comma(PP) pp_cxx_separate_with (PP, ',')
 #define pp_separate_with_semicolon(PP) pp_cxx_separate_with (PP, ';')
@@ -110,7 +111,7 @@  static void cp_print_error_function (diagnostic_context *,
 				     const diagnostic_info *);
 
 static bool cp_printer (pretty_printer *, text_info *, const char *,
-			int, bool, bool, bool, bool *, const char **);
+			int, bool, bool, bool, bool *, pp_token_list &);
 
 /* Color names for highlighting "%qH" vs "%qI" values,
    and ranges corresponding to them.  */
@@ -124,22 +125,50 @@  class deferred_printed_type
 {
 public:
   deferred_printed_type ()
-  : m_tree (NULL_TREE), m_buffer_ptr (NULL), m_verbose (false), m_quote (false)
+  : m_tree (NULL_TREE),
+    m_printed_text (),
+    m_token_list (nullptr),
+    m_verbose (false), m_quote (false)
   {}
 
-  deferred_printed_type (tree type, const char **buffer_ptr, bool verbose,
+  deferred_printed_type (tree type,
+			 pp_token_list &token_list,
+			 bool verbose,
 			 bool quote)
-  : m_tree (type), m_buffer_ptr (buffer_ptr), m_verbose (verbose),
+  : m_tree (type),
+    m_printed_text (),
+    m_token_list (&token_list),
+    m_verbose (verbose),
     m_quote (quote)
   {
     gcc_assert (type);
-    gcc_assert (buffer_ptr);
+  }
+
+  void set_text_for_token_list (const char *text, bool quote)
+  {
+    /* Replace the contents of m_token_list with a text token for TEXT,
+       possibly wrapped by BEGIN_QUOTE/END_QUOTE (if QUOTE is true).
+       This allows us to ignore any {BEGIN,END}_QUOTE tokens added
+       by %qH and %qI, and instead use the quoting from type_to_string,
+       and its logic for "aka".  */
+    while (m_token_list->m_first)
+      m_token_list->pop_front ();
+
+    if (quote)
+      m_token_list->push_back<pp_token_begin_quote> ();
+
+    // TEXT is gc-allocated, so we can borrow it
+    m_token_list->push_back_text (label_text::borrow (text));
+
+    if (quote)
+      m_token_list->push_back<pp_token_end_quote> ();
   }
 
   /* The tree is not GTY-marked: they are only non-NULL within a
      call to pp_format.  */
   tree m_tree;
-  const char **m_buffer_ptr;
+  label_text m_printed_text;
+  pp_token_list *m_token_list;
   bool m_verbose;
   bool m_quote;
 };
@@ -4402,26 +4431,7 @@  append_formatted_chunk (pretty_printer *pp, const char *content)
 {
   output_buffer *buffer = pp_buffer (pp);
   chunk_info *chunk_array = buffer->cur_chunk_array;
-  chunk_array->append_formatted_chunk (content);
-}
-
-/* Create a copy of CONTENT, with quotes added, and,
-   potentially, with colorization.
-   No escaped is performed on CONTENT.
-   The result is in a GC-allocated buffer. */
-
-static const char *
-add_quotes (const char *content, bool show_color)
-{
-  pretty_printer tmp_pp;
-  pp_show_color (&tmp_pp) = show_color;
-
-  /* We have to use "%<%s%>" rather than "%qs" here in order to avoid
-     quoting colorization bytes within the results and using either
-     pp_quote or pp_begin_quote doesn't work the same.  */
-  pp_printf (&tmp_pp, "%<%s%>", content);
-
-  return pp_ggc_formatted_text (&tmp_pp);
+  chunk_array->append_formatted_chunk (buffer->chunk_obstack, content);
 }
 
 #if __GNUC__ >= 10
@@ -4429,8 +4439,8 @@  add_quotes (const char *content, bool show_color)
 #endif
 
 /* If we had %H and %I, and hence deferred printing them,
-   print them now, storing the result into the chunk_info
-   for pp_format.  Quote them if 'q' was provided.
+   print them now, storing the result into custom_token_value
+   for the custom pp_token.  Quote them if 'q' was provided.
    Also print the difference in tree form, adding it as
    an additional chunk.  */
 
@@ -4448,13 +4458,13 @@  cxx_format_postprocessor::handle (pretty_printer *pp)
 	= show_highlight_colors ? highlight_colors::percent_i : nullptr;
       /* Avoid reentrancy issues by working with a copy of
 	 m_type_a and m_type_b, resetting them now.  */
-      deferred_printed_type type_a = m_type_a;
-      deferred_printed_type type_b = m_type_b;
+      deferred_printed_type type_a = std::move (m_type_a);
+      deferred_printed_type type_b = std::move (m_type_b);
       m_type_a = deferred_printed_type ();
       m_type_b = deferred_printed_type ();
 
-      gcc_assert (type_a.m_buffer_ptr);
-      gcc_assert (type_b.m_buffer_ptr);
+      gcc_assert (type_a.m_token_list);
+      gcc_assert (type_b.m_token_list);
 
       bool show_color = pp_show_color (pp);
 
@@ -4495,13 +4505,8 @@  cxx_format_postprocessor::handle (pretty_printer *pp)
 					percent_i);
 	}
 
-      if (type_a.m_quote)
-	type_a_text = add_quotes (type_a_text, show_color);
-      *type_a.m_buffer_ptr = type_a_text;
-
-       if (type_b.m_quote)
-	type_b_text = add_quotes (type_b_text, show_color);
-      *type_b.m_buffer_ptr = type_b_text;
+      type_a.set_text_for_token_list (type_a_text, type_a.m_quote);
+      type_b.set_text_for_token_list (type_b_text, type_b.m_quote);
    }
 }
 
@@ -4526,9 +4531,12 @@  cxx_format_postprocessor::handle (pretty_printer *pp)
    pretty_printer's m_format_postprocessor hook.
 
    This is called in phase 2 of pp_format, when it is accumulating
-   a series of formatted chunks.  We stash the location of the chunk
-   we're meant to have written to, so that we can write to it in the
-   m_format_postprocessor hook.
+   a series of pp_token lists.  Since we have to interact with the
+   fiddly quoting logic for "aka", we store the pp_token_list *
+   and in the m_format_postprocessor hook we generate text for the type
+   (possibly with quotes and colors), then replace all tokens in that token list
+   (such as [BEGIN_QUOTE, END_QUOTE]) with a text token containing the
+   freshly generated text.
 
    We also need to stash whether a 'q' prefix was provided (the QUOTE
    param)  so that we can add the quotes when writing out the delayed
@@ -4536,12 +4544,13 @@  cxx_format_postprocessor::handle (pretty_printer *pp)
 
 static void
 defer_phase_2_of_type_diff (deferred_printed_type *deferred,
-			    tree type, const char **buffer_ptr,
+			    tree type,
+			    pp_token_list &formatted_token_list,
 			    bool verbose, bool quote)
 {
   gcc_assert (deferred->m_tree == NULL_TREE);
-  gcc_assert (deferred->m_buffer_ptr == NULL);
-  *deferred = deferred_printed_type (type, buffer_ptr, verbose, quote);
+  *deferred = deferred_printed_type (type, formatted_token_list,
+				     verbose, quote);
 }
 
 /* Implementation of pp_markup::element_quoted_type::print_type
@@ -4578,7 +4587,7 @@  pp_markup::element_quoted_type::print_type (pp_markup::context &ctxt)
 static bool
 cp_printer (pretty_printer *pp, text_info *text, const char *spec,
 	    int precision, bool wide, bool set_locus, bool verbose,
-	    bool *quoted, const char **buffer_ptr)
+	    bool *quoted, pp_token_list &formatted_token_list)
 {
   gcc_assert (pp_format_postprocessor (pp));
   cxx_format_postprocessor *postprocessor
@@ -4618,11 +4627,11 @@  cp_printer (pretty_printer *pp, text_info *text, const char *spec,
     case 'F': result = fndecl_to_string (next_tree, verbose);	break;
     case 'H':
       defer_phase_2_of_type_diff (&postprocessor->m_type_a, next_tree,
-				  buffer_ptr, verbose, *quoted);
+				  formatted_token_list, verbose, *quoted);
       return true;
     case 'I':
       defer_phase_2_of_type_diff (&postprocessor->m_type_b, next_tree,
-				  buffer_ptr, verbose, *quoted);
+				  formatted_token_list, verbose, *quoted);
       return true;
     case 'L': result = language_to_string (next_lang);		break;
     case 'O': result = op_to_string (false, next_tcode);	break;
diff --git a/gcc/diagnostic.cc b/gcc/diagnostic.cc
index 381a050ab4c9..a80e16b542df 100644
--- a/gcc/diagnostic.cc
+++ b/gcc/diagnostic.cc
@@ -1407,7 +1407,7 @@  diagnostic_context::report_diagnostic (diagnostic_info *diagnostic)
     m_output_format->on_begin_group ();
   m_diagnostic_groups.m_emission_count++;
 
-  pp_format (this->printer, &diagnostic->message, m_urlifier);
+  pp_format (this->printer, &diagnostic->message);
   /* Call vfunc in the output format.  This is responsible for
      phase 3 of formatting, and for printing the result.  */
   m_output_format->on_report_diagnostic (*diagnostic, orig_diag_kind);
diff --git a/gcc/dump-context.h b/gcc/dump-context.h
index 5992956380b1..e90c4ee1d6ae 100644
--- a/gcc/dump-context.h
+++ b/gcc/dump-context.h
@@ -154,48 +154,52 @@  class dump_context
 };
 
 /* A subclass of pretty_printer for implementing dump_context::dump_printf_va.
-   In particular, the formatted chunks are captured as optinfo_item instances,
-   thus retaining metadata about the entities being dumped (e.g. source
-   locations), rather than just as plain text.  */
+   In particular, the formatted chunks are captured as optinfo_item instances
+   as pp_token_custom_data, thus retaining metadata about the entities being
+   dumped (e.g. source locations), rather than just as plain text.
+   These custom items are retained through to the end of stage 3 of formatted
+   printing; the printer uses a custom token_printer subclass to emit them to
+   the active optinfo (if any).  */
 
 class dump_pretty_printer : public pretty_printer
 {
 public:
   dump_pretty_printer (dump_context *context, dump_flags_t dump_kind);
 
-  void emit_items (optinfo *dest);
+  void set_optinfo (optinfo *info) { m_token_printer.m_optinfo = info; }
 
 private:
-  /* Information on an optinfo_item that was generated during phase 2 of
-     formatting.  */
-  class stashed_item
+  struct custom_token_printer : public token_printer
   {
-  public:
-    stashed_item (const char **buffer_ptr_, optinfo_item *item_)
-      : buffer_ptr (buffer_ptr_), item (item_) {}
-    const char **buffer_ptr;
-    optinfo_item *item;
+    custom_token_printer (dump_pretty_printer &dump_pp)
+    : m_dump_pp (dump_pp),
+      m_optinfo (nullptr)
+    {}
+    void print_tokens (pretty_printer *pp,
+		       const pp_token_list &tokens) final override;
+    void emit_any_pending_textual_chunks ();
+
+    dump_pretty_printer &m_dump_pp;
+    optinfo *m_optinfo;
   };
 
   static bool format_decoder_cb (pretty_printer *pp, text_info *text,
 				 const char *spec, int /*precision*/,
 				 bool /*wide*/, bool /*set_locus*/,
 				 bool /*verbose*/, bool */*quoted*/,
-				 const char **buffer_ptr);
+				 pp_token_list &formatted_tok_list);
 
   bool decode_format (text_info *text, const char *spec,
-		      const char **buffer_ptr);
+		      pp_token_list &formatted_tok_list);
 
-  void stash_item (const char **buffer_ptr,
+  void stash_item (pp_token_list &formatted_tok_list,
 		   std::unique_ptr<optinfo_item> item);
 
-  void emit_any_pending_textual_chunks (optinfo *dest);
-
   void emit_item (std::unique_ptr<optinfo_item> item, optinfo *dest);
 
   dump_context *m_context;
   dump_flags_t m_dump_kind;
-  auto_vec<stashed_item> m_stashed_items;
+  custom_token_printer m_token_printer;
 };
 
 /* An RAII-style class for use in debug dumpers for temporarily using a
diff --git a/gcc/dumpfile.cc b/gcc/dumpfile.cc
index eb245059210a..da3671829a21 100644
--- a/gcc/dumpfile.cc
+++ b/gcc/dumpfile.cc
@@ -791,90 +791,39 @@  make_item_for_dump_symtab_node (symtab_node *node)
   return item;
 }
 
-/* dump_pretty_printer's ctor.  */
-
-dump_pretty_printer::dump_pretty_printer (dump_context *context,
-					  dump_flags_t dump_kind)
-: pretty_printer (), m_context (context), m_dump_kind (dump_kind),
-  m_stashed_items ()
+struct wrapped_optinfo_item : public pp_token_custom_data::value
 {
-  pp_format_decoder (this) = format_decoder_cb;
-}
-
-/* Phase 3 of formatting; compare with pp_output_formatted_text.
-
-   Emit optinfo_item instances for the various formatted chunks from phases
-   1 and 2 (i.e. pp_format).
-
-   Some chunks may already have had their items built (during decode_format).
-   These chunks have been stashed into m_stashed_items; we emit them here.
-
-   For all other purely textual chunks, they are printed into
-   buffer->formatted_obstack, and then emitted as a textual optinfo_item.
-   This consolidates multiple adjacent text chunks into a single text
-   optinfo_item.  */
-
-void
-dump_pretty_printer::emit_items (optinfo *dest)
-{
-  output_buffer *buffer = pp_buffer (this);
-  chunk_info *chunk_array = buffer->cur_chunk_array;
-  const char * const *args = chunk_array->get_args ();
-
-  gcc_assert (buffer->obstack == &buffer->formatted_obstack);
-  gcc_assert (buffer->line_length == 0);
-
-  unsigned stashed_item_idx = 0;
-  for (unsigned chunk = 0; args[chunk]; chunk++)
-    {
-      if (stashed_item_idx < m_stashed_items.length ()
-	  && args[chunk] == *m_stashed_items[stashed_item_idx].buffer_ptr)
-	{
-	  emit_any_pending_textual_chunks (dest);
-	  /* This chunk has a stashed item: use it.  */
-	  std::unique_ptr <optinfo_item> item
-	    (m_stashed_items[stashed_item_idx++].item);
-	  emit_item (std::move (item), dest);
-	}
-      else
-	/* This chunk is purely textual.  Print it (to
-	   buffer->formatted_obstack), so that we can consolidate adjacent
-	   chunks into one textual optinfo_item.  */
-	pp_string (this, args[chunk]);
-    }
+  wrapped_optinfo_item (std::unique_ptr<optinfo_item> item)
+  : m_optinfo_item (std::move (item))
+  {
+    gcc_assert (m_optinfo_item.get ());
+  }
 
-  emit_any_pending_textual_chunks (dest);
+  void dump (FILE *out) const final override
+  {
+    fprintf (out, "OPTINFO(\"%s\")", m_optinfo_item->get_text ());
+  }
 
-  /* Ensure that we consumed all of stashed_items.  */
-  gcc_assert (stashed_item_idx == m_stashed_items.length ());
+  bool as_standard_tokens (pp_token_list &) final override
+  {
+    /* Keep as a custom token.  */
+    return false;
+  }
 
-  chunk_array->pop_from_output_buffer (*buffer);
-}
+  std::unique_ptr<optinfo_item> m_optinfo_item;
+};
 
-/* Subroutine of dump_pretty_printer::emit_items
-   for consolidating multiple adjacent pure-text chunks into single
-   optinfo_items (in phase 3).  */
+/* dump_pretty_printer's ctor.  */
 
-void
-dump_pretty_printer::emit_any_pending_textual_chunks (optinfo *dest)
+dump_pretty_printer::dump_pretty_printer (dump_context *context,
+					  dump_flags_t dump_kind)
+: pretty_printer (),
+  m_context (context),
+  m_dump_kind (dump_kind),
+  m_token_printer (*this)
 {
-  output_buffer *const buffer = pp_buffer (this);
-  gcc_assert (buffer->obstack == &buffer->formatted_obstack);
-
-  /* Don't emit an item if the pending text is empty.  */
-  if (output_buffer_last_position_in_text (buffer) == NULL)
-    return;
-
-  char *formatted_text = xstrdup (pp_formatted_text (this));
-  std::unique_ptr<optinfo_item> item
-    = make_unique<optinfo_item> (OPTINFO_ITEM_KIND_TEXT, UNKNOWN_LOCATION,
-				 formatted_text);
-  emit_item (std::move (item), dest);
-
-  /* Clear the pending text by unwinding formatted_text back to the start
-     of the buffer (without deallocating).  */
-  obstack_free (&buffer->formatted_obstack,
-		buffer->formatted_obstack.object_base);
+  pp_format_decoder (this) = format_decoder_cb;
+  set_token_printer (&m_token_printer);
 }
 
 /* Emit ITEM and take ownership of it.  If DEST is non-NULL, add ITEM
@@ -889,17 +838,18 @@  dump_pretty_printer::emit_item (std::unique_ptr<optinfo_item> item,
     dest->add_item (std::move (item));
 }
 
-/* Record that ITEM (generated in phase 2 of formatting) is to be used for
-   the chunk at BUFFER_PTR in phase 3 (by emit_items).  */
+/* Append a custom pp_token for ITEM (generated in phase 2 of formatting)
+   into FORMATTED_TOK_LIST, so that it can be emitted in phase 3.  */
 
 void
-dump_pretty_printer::stash_item (const char **buffer_ptr,
+dump_pretty_printer::stash_item (pp_token_list &formatted_tok_list,
 				 std::unique_ptr<optinfo_item> item)
 {
-  gcc_assert (buffer_ptr);
   gcc_assert (item.get ());
 
-  m_stashed_items.safe_push (stashed_item (buffer_ptr, item.release ()));
+  auto custom_data
+    = ::make_unique<wrapped_optinfo_item> (std::move (item));
+  formatted_tok_list.push_back<pp_token_custom_data> (std::move (custom_data));
 }
 
 /* pp_format_decoder callback for dump_pretty_printer, and thus for
@@ -912,10 +862,10 @@  dump_pretty_printer::format_decoder_cb (pretty_printer *pp, text_info *text,
 					const char *spec, int /*precision*/,
 					bool /*wide*/, bool /*set_locus*/,
 					bool /*verbose*/, bool */*quoted*/,
-					const char **buffer_ptr)
+					pp_token_list &formatted_tok_list)
 {
   dump_pretty_printer *opp = static_cast <dump_pretty_printer *> (pp);
-  return opp->decode_format (text, spec, buffer_ptr);
+  return opp->decode_format (text, spec, formatted_tok_list);
 }
 
 /* Format decoder for dump_pretty_printer, and thus for dump_printf and
@@ -942,7 +892,7 @@  dump_pretty_printer::format_decoder_cb (pretty_printer *pp, text_info *text,
 
 bool
 dump_pretty_printer::decode_format (text_info *text, const char *spec,
-				       const char **buffer_ptr)
+				    pp_token_list &formatted_tok_list)
 {
   /* Various format codes that imply making an optinfo_item and stashed it
      for later use (to capture metadata, rather than plain text).  */
@@ -954,7 +904,7 @@  dump_pretty_printer::decode_format (text_info *text, const char *spec,
 
 	/* Make an item for the node, and stash it.  */
 	auto item = make_item_for_dump_symtab_node (node);
-	stash_item (buffer_ptr, std::move (item));
+	stash_item (formatted_tok_list, std::move (item));
 	return true;
       }
 
@@ -964,7 +914,7 @@  dump_pretty_printer::decode_format (text_info *text, const char *spec,
 
 	/* Make an item for the stmt, and stash it.  */
 	auto item = make_item_for_dump_gimple_expr (stmt, 0, TDF_SLIM);
-	stash_item (buffer_ptr, std::move (item));
+	stash_item (formatted_tok_list, std::move (item));
 	return true;
       }
 
@@ -974,7 +924,7 @@  dump_pretty_printer::decode_format (text_info *text, const char *spec,
 
 	/* Make an item for the stmt, and stash it.  */
 	auto item = make_item_for_dump_gimple_stmt (stmt, 0, TDF_SLIM);
-	stash_item (buffer_ptr, std::move (item));
+	stash_item (formatted_tok_list, std::move (item));
 	return true;
       }
 
@@ -984,7 +934,7 @@  dump_pretty_printer::decode_format (text_info *text, const char *spec,
 
 	/* Make an item for the tree, and stash it.  */
 	auto item = make_item_for_dump_generic_expr (t, TDF_SLIM);
-	stash_item (buffer_ptr, std::move (item));
+	stash_item (formatted_tok_list, std::move (item));
 	return true;
       }
 
@@ -993,6 +943,87 @@  dump_pretty_printer::decode_format (text_info *text, const char *spec,
     }
 }
 
+void
+dump_pretty_printer::custom_token_printer::
+print_tokens (pretty_printer *pp,
+	      const pp_token_list &tokens)
+{
+  /* Accumulate text whilst emitting items.  */
+  for (auto iter = tokens.m_first; iter; iter = iter->m_next)
+    switch (iter->m_kind)
+      {
+      default:
+	gcc_unreachable ();
+
+      case pp_token::kind::text:
+	{
+	  pp_token_text *sub = as_a <pp_token_text *> (iter);
+	  gcc_assert (sub->m_value.get ());
+	  pp_string (pp, sub->m_value.get ());
+	}
+	break;
+
+      case pp_token::kind::begin_color:
+      case pp_token::kind::end_color:
+	/* No-op for dumpfiles.  */
+	break;
+
+      case pp_token::kind::begin_quote:
+	pp_begin_quote (pp, pp_show_color (pp));
+	break;
+      case pp_token::kind::end_quote:
+	pp_end_quote (pp, pp_show_color (pp));
+	break;
+
+      case pp_token::kind::begin_url:
+      case pp_token::kind::end_url:
+	/* No-op for dumpfiles.  */
+	break;
+
+      case pp_token::kind::custom_data:
+	{
+	  emit_any_pending_textual_chunks ();
+	  pp_token_custom_data *sub = as_a <pp_token_custom_data *> (iter);
+	  gcc_assert (sub->m_value.get ());
+	  wrapped_optinfo_item *custom_data
+	    = static_cast<wrapped_optinfo_item *> (sub->m_value.get ());
+	  m_dump_pp.emit_item (std::move (custom_data->m_optinfo_item),
+			       m_optinfo);
+	}
+	break;
+      }
+
+  emit_any_pending_textual_chunks ();
+}
+
+/* Subroutine of dump_pretty_printer::custom_token_printer::print_tokens
+   for consolidating multiple adjacent pure-text chunks into single
+   optinfo_items (in phase 3).  */
+
+void
+dump_pretty_printer::custom_token_printer::
+emit_any_pending_textual_chunks ()
+{
+  dump_pretty_printer *pp = &m_dump_pp;
+  output_buffer *const buffer = pp_buffer (pp);
+  gcc_assert (buffer->obstack == &buffer->formatted_obstack);
+
+  /* Don't emit an item if the pending text is empty.  */
+  if (output_buffer_last_position_in_text (buffer) == nullptr)
+    return;
+
+  char *formatted_text = xstrdup (pp_formatted_text (pp));
+  std::unique_ptr<optinfo_item> item
+    = make_unique<optinfo_item> (OPTINFO_ITEM_KIND_TEXT, UNKNOWN_LOCATION,
+				 formatted_text);
+  pp->emit_item (std::move (item), m_optinfo);
+
+  /* Clear the pending text by unwinding formatted_text back to the start
+     of the buffer (without deallocating).  */
+  obstack_free (&buffer->formatted_obstack,
+		buffer->formatted_obstack.object_base);
+}
+
 /* Output a formatted message using FORMAT on appropriate dump streams.  */
 
 void
@@ -1007,14 +1038,16 @@  dump_context::dump_printf_va (const dump_metadata_t &metadata,
   /* Phases 1 and 2, using pp_format.  */
   pp_format (&pp, &text);
 
-  /* Phase 3.  */
+  /* Phase 3: update the custom token_printer with any active optinfo.  */
   if (optinfo_enabled_p ())
     {
       optinfo &info = ensure_pending_optinfo (metadata);
-      pp.emit_items (&info);
+      pp.set_optinfo (&info);
     }
   else
-    pp.emit_items (NULL);
+    pp.set_optinfo (nullptr);
+
+  pp_output_formatted_text (&pp, nullptr);
 }
 
 /* Similar to dump_printf, except source location is also printed, and
diff --git a/gcc/fortran/error.cc b/gcc/fortran/error.cc
index e89667613b18..a5884620e301 100644
--- a/gcc/fortran/error.cc
+++ b/gcc/fortran/error.cc
@@ -1125,7 +1125,7 @@  gfc_notify_std (int std, const char *gmsgid, ...)
 static bool
 gfc_format_decoder (pretty_printer *pp, text_info *text, const char *spec,
 		    int precision, bool wide, bool set_locus, bool hash,
-		    bool *quoted, const char **buffer_ptr)
+		    bool *quoted, pp_token_list &formatted_token_list)
 {
   switch (*spec)
     {
@@ -1170,7 +1170,8 @@  gfc_format_decoder (pretty_printer *pp, text_info *text, const char *spec,
 	 etc. diagnostics can use the FE printer while the FE is still
 	 active.  */
       return default_tree_printer (pp, text, spec, precision, wide,
-				   set_locus, hash, quoted, buffer_ptr);
+				   set_locus, hash, quoted,
+				   formatted_token_list);
     }
 }
 
diff --git a/gcc/opt-problem.cc b/gcc/opt-problem.cc
index d76ddaf57adf..fc29333c331a 100644
--- a/gcc/opt-problem.cc
+++ b/gcc/opt-problem.cc
@@ -71,7 +71,8 @@  opt_problem::opt_problem (const dump_location_t &loc,
 
     /* Phase 3: dump the items to the "immediate" dump destinations,
        and storing them into m_optinfo for later retrieval.  */
-    pp.emit_items (&m_optinfo);
+    pp.set_optinfo (&m_optinfo);
+    pp_output_formatted_text (&pp, nullptr);
   }
 }
 
diff --git a/gcc/pretty-print-format-impl.h b/gcc/pretty-print-format-impl.h
index e05ad388963d..cffdd461a33d 100644
--- a/gcc/pretty-print-format-impl.h
+++ b/gcc/pretty-print-format-impl.h
@@ -23,6 +23,308 @@  along with GCC; see the file COPYING3.  If not see
 
 #include "pretty-print.h"
 
+/* A struct representing a pending item to be printed within
+   pp_format.
+
+   These can represent:
+   - a run of text within one of the output_buffer's obstacks
+   - begin/end named color
+   - open/close quote
+   - begin/end URL
+   - custom data (for the formatter, for the pretty_printer,
+     or the output format)
+
+   These are built into pp_token_list instances.
+
+   Doing so allows for interaction between:
+
+   - pretty_printer formatting codes (such as C++'s %H and %I,
+   which can't be printed until we've seen both)
+
+   - output formats, such as text vs SARIF (so each can handle URLs
+   and event IDs in its own way)
+
+   - optimization records, where we want to stash data into the
+   formatted messages
+
+   - urlifiers: these can be run in phase 3 of formatting
+
+   without needing lots of fragile logic on char pointers.
+
+   To avoid needing lots of heap allocation/deallocation, pp_token
+   instances are allocated in the pretty_printer's chunk_obstack:
+   they must not outlive phase 3 of formatting of the given
+   chunk_info level.  */
+
+struct pp_token
+{
+public:
+  enum class kind
+  {
+    text,
+
+    begin_color,
+    end_color,
+
+    begin_quote,
+    end_quote,
+
+    begin_url,
+    end_url,
+
+    custom_data,
+
+    NUM_KINDS
+  };
+
+  pp_token (enum kind k);
+
+  pp_token (const pp_token &) = delete;
+  pp_token (pp_token &&) = delete;
+
+  virtual ~pp_token () = default;
+
+  pp_token &operator= (const pp_token &) = delete;
+  pp_token &operator= (pp_token &&) = delete;
+
+  void dump (FILE *out) const;
+  void DEBUG_FUNCTION dump () const { dump (stderr); }
+
+  static void *operator new (size_t sz, obstack &s);
+  static void operator delete (void *);
+
+  enum kind m_kind;
+
+  // Intrusive doubly-linked list
+  pp_token *m_prev;
+  pp_token *m_next;
+};
+
+/* Subclasses of pp_token for the various kinds of token.  */
+
+struct pp_token_text : public pp_token
+{
+  pp_token_text (label_text &&value)
+  : pp_token (kind::text),
+    m_value (std::move (value))
+  {
+    gcc_assert (m_value.get ());
+  }
+
+  label_text m_value;
+};
+
+template <>
+template <>
+inline bool
+is_a_helper <pp_token_text *>::test (pp_token *tok)
+{
+  return tok->m_kind == pp_token::kind::text;
+}
+
+template <>
+template <>
+inline bool
+is_a_helper <const pp_token_text *>::test (const pp_token *tok)
+{
+  return tok->m_kind == pp_token::kind::text;
+}
+
+struct pp_token_begin_color : public pp_token
+{
+  pp_token_begin_color (label_text &&value)
+  : pp_token (kind::begin_color),
+    m_value (std::move (value))
+  {
+    gcc_assert (m_value.get ());
+  }
+
+  label_text m_value;
+};
+
+template <>
+template <>
+inline bool
+is_a_helper <pp_token_begin_color *>::test (pp_token *tok)
+{
+  return tok->m_kind == pp_token::kind::begin_color;
+}
+
+template <>
+template <>
+inline bool
+is_a_helper <const pp_token_begin_color *>::test (const pp_token *tok)
+{
+  return tok->m_kind == pp_token::kind::begin_color;
+}
+
+struct pp_token_end_color : public pp_token
+{
+  pp_token_end_color ()
+  : pp_token (kind::end_color)
+  {
+  }
+};
+
+struct pp_token_begin_quote : public pp_token
+{
+  pp_token_begin_quote ()
+  : pp_token (kind::begin_quote)
+  {
+  }
+};
+
+struct pp_token_end_quote : public pp_token
+{
+  pp_token_end_quote ()
+  : pp_token (kind::end_quote)
+  {
+  }
+};
+
+struct pp_token_begin_url : public pp_token
+{
+  pp_token_begin_url (label_text &&value)
+  : pp_token (kind::begin_url),
+    m_value (std::move (value))
+  {
+    gcc_assert (m_value.get ());
+  }
+
+  label_text m_value;
+};
+
+template <>
+template <>
+inline bool
+is_a_helper <pp_token_begin_url*>::test (pp_token *tok)
+{
+  return tok->m_kind == pp_token::kind::begin_url;
+}
+
+template <>
+template <>
+inline bool
+is_a_helper <const pp_token_begin_url*>::test (const pp_token *tok)
+{
+  return tok->m_kind == pp_token::kind::begin_url;
+}
+
+struct pp_token_end_url : public pp_token
+{
+  pp_token_end_url ()
+    : pp_token (kind::end_url)
+  {
+  }
+};
+
+struct pp_token_custom_data : public pp_token
+{
+  class value
+  {
+  public:
+    virtual ~value () {}
+    virtual void dump (FILE *out) const = 0;
+
+    /* Hook for lowering a custom_data token to standard tokens.
+       Return true and write to OUT if possible.
+       Return false for custom_data that is to be handled by
+       the token_printer.  */
+    virtual bool as_standard_tokens (pp_token_list &out) = 0;
+  };
+
+  pp_token_custom_data (std::unique_ptr<value> val)
+  : pp_token (kind::custom_data),
+    m_value (std::move (val))
+  {
+    gcc_assert (m_value.get ());
+  }
+
+  std::unique_ptr<value> m_value;
+};
+
+template <>
+template <>
+inline bool
+is_a_helper <pp_token_custom_data *>::test (pp_token *tok)
+{
+  return tok->m_kind == pp_token::kind::custom_data;
+}
+
+template <>
+template <>
+inline bool
+is_a_helper <const pp_token_custom_data *>::test (const pp_token *tok)
+{
+  return tok->m_kind == pp_token::kind::custom_data;
+}
+
+/* A list of pp_token, with ownership of the tokens, using
+   a particular obstack to allocate its tokens.  These are
+   also allocated on the obstack during formatting (or, occasionally,
+   the stack).  */
+
+class pp_token_list
+{
+public:
+  // Allocate a new pp_token_list within S.
+  static pp_token_list *make (obstack &s)
+  {
+    return new (s) pp_token_list (s);
+  }
+  static void *operator new (size_t sz, obstack &s);
+  static void operator delete (void *);
+
+  pp_token_list (obstack &s);
+  pp_token_list (const pp_token_list &) = delete;
+  pp_token_list (pp_token_list &&);
+
+  ~pp_token_list ();
+
+  pp_token_list &operator= (const pp_token_list &) = delete;
+  pp_token_list &operator= (pp_token_list &&) = delete;
+
+/* Make a pp_token of the given subclass, using the relevant obstack to provide
+   the memory.  The pp_token must therefore not outlive the current chunk_info
+   level during formatting.  */
+  template<typename Subclass, typename... Args>
+  std::unique_ptr<pp_token>
+  make_token (Args&&... args)
+  {
+    return std::unique_ptr<pp_token>
+      (new (m_obstack) Subclass (std::forward<Args> (args)...));
+  }
+
+  template<typename Subclass, typename... Args>
+  void push_back (Args&&... args)
+  {
+    auto tok = make_token<Subclass> (std::forward<Args> (args)...);
+    push_back (std::move (tok));
+  }
+  void push_back_text (label_text &&text);
+  void push_back (std::unique_ptr<pp_token> tok);
+  void push_back_list (pp_token_list &&list);
+
+  std::unique_ptr<pp_token> pop_front ();
+
+  std::unique_ptr<pp_token> remove_token (pp_token *tok);
+
+  void insert_after (std::unique_ptr<pp_token> new_tok,
+		     pp_token *relative_tok);
+
+  void replace_custom_tokens ();
+  void merge_consecutive_text_tokens ();
+  void apply_urlifier (const urlifier &urlifier);
+
+  void dump (FILE *out) const;
+  void DEBUG_FUNCTION dump () const { dump (stderr); }
+
+  obstack &m_obstack;
+
+  pp_token *m_first;
+  pp_token *m_end;
+};
+
 /* The chunk_info data structure forms a stack of the results from the
    first phase of formatting (pp_format) which have not yet been
    output (pp_output_formatted_text).  A stack is necessary because
@@ -34,13 +336,15 @@  class chunk_info
   friend class pp_markup::context;
 
 public:
-  const char * const *get_args () const { return m_args; }
-  quoting_info *get_quoting_info () const { return m_quotes; }
+  pp_token_list * const * get_token_lists () const { return m_args; }
 
-  void append_formatted_chunk (const char *content);
+  void append_formatted_chunk (obstack &s, const char *content);
 
   void pop_from_output_buffer (output_buffer &buf);
 
+  void dump (FILE *out) const;
+  void DEBUG_FUNCTION dump () const { dump (stderr); }
+
 private:
   void on_begin_quote (const output_buffer &buf,
 		       unsigned chunk_idx,
@@ -54,17 +358,100 @@  private:
   /* Pointer to previous chunk on the stack.  */
   chunk_info *m_prev;
 
-  /* Array of chunks to output.  Each chunk is a NUL-terminated string.
+  /* Array of chunks to output.  Each chunk is a doubly-linked list of
+     pp_token.
+
+     The chunks can be printed via chunk_info::dump ().
+
      In the first phase of formatting, even-numbered chunks are
      to be output verbatim, odd-numbered chunks are format specifiers.
+     For example, given:
+       pp_format (pp,
+		  "foo: %i, bar: %s, opt: %qs",
+		  42, "baz", "-foption");
+
+     after phase 1 we might have:
+       (gdb) call buffer->cur_chunk_array->dump()
+       0: [TEXT("foo: ")]
+       1: [TEXT("i")]
+       2: [TEXT(", bar: ")]
+       3: [TEXT("s")]
+       4: [TEXT(", opt: ")]
+       5: [TEXT("qs")]
+
      The second phase replaces all odd-numbered chunks with formatted
-     text, and the third phase simply emits all the chunks in sequence
-     with appropriate line-wrapping.  */
-  const char *m_args[PP_NL_ARGMAX * 2];
+     token lists.  In the above example, after phase 2 we might have:
+       (gdb) call pp->m_buffer->cur_chunk_array->dump()
+       0: [TEXT("foo: ")]
+       1: [TEXT("42")]
+       2: [TEXT(", bar: ")]
+       3: [TEXT("baz")]
+       4: [TEXT(", opt: ")]
+       5: [BEGIN_QUOTE, TEXT("-foption"), END_QUOTE]
+     For example the %qs has become the three tokens:
+       [BEGIN_QUOTE, TEXT("-foption"), END_QUOTE]
+
+     The third phase (in pp_output_formatted_text):
+
+     (1) merges the tokens from all the chunks into one list,
+     giving e.g.
+      (gdb) call tokens.dump()
+      [TEXT("foo: "), TEXT("42"), TEXT(", bar: "), TEXT("baz"),
+       TEXT(", opt: "), BEGIN_QUOTE, TEXT("-foption"), END_QUOTE]
+
+     (2) lowers some custom tokens into non-custom tokens
+
+     (3) merges consecutive text tokens, giving e.g.:
+      (gdb) call tokens.dump()
+      [TEXT("foo: 42, bar: baz, opt: "),
+       BEGIN_QUOTE, TEXT("-foption"), END_QUOTE]
+
+     (4) if provided with a urlifier, tries to apply it to quoted text,
+     giving e.g.:
+      (gdb) call tokens.dump()
+      [TEXT("foo: 42, bar: baz, opt: "), BEGIN_QUOTE,
+       BEGIN_URL("http://example.com"), TEXT("-foption"), END_URL, END_QUOTE]
+
+     (5) emits all tokens in sequence with appropriate line-wrapping.  This
+     can be overridden via the pretty_printer's token_printer, allowing for
+     output formats to e.g. override how URLs are handled, or to handle
+     custom_data that wasn't lowered in (2) above, e.g. for handling JSON
+     output of optimization records.  */
+  pp_token_list *m_args[PP_NL_ARGMAX * 2];
+
+  /* The pp_tokens, pp_token_lists, and the accumulated text buffers are
+     allocated within the output_buffer's chunk_obstack.  In the above
+     example, the in-memory layout of the chunk_obstack might look like
+     this after phase 1:
+
+      + pp_token_list for chunk 0 (m_first: *)   <--- START of chunk_info level
+      |                                     |
+      + "foo: \0"  <-------------\          |
+      |                          |          |
+      + pp_token_text (borrowed: *) <-------/
+      |
+      + pp_token_list for chunk 1
+      |
+      + "i\0" <------------------\
+      |                          |
+      + pp_token_text (borrowed: *)
+      |
+      +  ...etc for chunks 2 to 4...
+      |
+      + pp_token_list for chunk 5
+      |
+      + "qs\0" <-----------------\
+      |                          |
+      + pp_token_text (borrowed: *)
+      |
+      |
+      V
+     obstack grows this way
 
-  /* If non-null, information on quoted text runs within the chunks
-     for use by a urlifier.  */
-  quoting_info *m_quotes;
+     At each stage, allocations of additional text buffers, tokens, and lists
+     grow forwards in the obstack (though the internal pointers in linked
+     lists might point backwards to earlier objects within the same
+     chunk_info level).  */
 };
 
 #endif /* GCC_PRETTY_PRINT_FORMAT_IMPL_H */
diff --git a/gcc/pretty-print-markup.h b/gcc/pretty-print-markup.h
index b35632a79da9..ce2c5e9dbbe9 100644
--- a/gcc/pretty-print-markup.h
+++ b/gcc/pretty-print-markup.h
@@ -22,6 +22,8 @@  along with GCC; see the file COPYING3.  If not see
 
 #include "diagnostic-color.h"
 
+class pp_token_list;
+
 namespace pp_markup {
 
 class context
@@ -31,12 +33,12 @@  public:
 	   output_buffer &buf,
 	   unsigned chunk_idx,
 	   bool &quoted,
-	   const urlifier *urlifier)
+	   pp_token_list *formatted_token_list)
   : m_pp (pp),
     m_buf (buf),
     m_chunk_idx (chunk_idx),
     m_quoted (quoted),
-    m_urlifier (urlifier)
+    m_formatted_token_list (formatted_token_list)
   {
   }
 
@@ -46,11 +48,13 @@  public:
   void begin_highlight_color (const char *color_name);
   void end_highlight_color ();
 
+  void push_back_any_text ();
+
   pretty_printer &m_pp;
   output_buffer &m_buf;
   unsigned m_chunk_idx;
   bool &m_quoted;
-  const urlifier *m_urlifier;
+  pp_token_list *m_formatted_token_list;
 };
 
 /* Abstract base class for use in pp_format for handling "%e".
diff --git a/gcc/pretty-print-urlifier.h b/gcc/pretty-print-urlifier.h
index 3e63e62c41e1..3feb80921bc9 100644
--- a/gcc/pretty-print-urlifier.h
+++ b/gcc/pretty-print-urlifier.h
@@ -20,7 +20,7 @@  along with GCC; see the file COPYING3.  If not see
 #ifndef GCC_PRETTY_PRINT_URLIFIER_H
 #define GCC_PRETTY_PRINT_URLIFIER_H
 
-/* Abstract base class for optional use in pp_format for adding URLs
+/* Abstract base class for optional use in pretty-printing for adding URLs
    to quoted text strings.  */
 
 class urlifier
diff --git a/gcc/pretty-print.cc b/gcc/pretty-print.cc
index 810c629ef116..d2c0a197680c 100644
--- a/gcc/pretty-print.cc
+++ b/gcc/pretty-print.cc
@@ -19,6 +19,7 @@  along with GCC; see the file COPYING3.  If not see
 <http://www.gnu.org/licenses/>.  */
 
 #include "config.h"
+#define INCLUDE_MEMORY
 #define INCLUDE_VECTOR
 #include "system.h"
 #include "coretypes.h"
@@ -30,6 +31,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "diagnostic-color.h"
 #include "diagnostic-event-id.h"
 #include "diagnostic-highlight-colors.h"
+#include "make-unique.h"
 #include "selftest.h"
 
 #if HAVE_ICONV
@@ -710,6 +712,10 @@  static int
 decode_utf8_char (const unsigned char *, size_t len, unsigned int *);
 static void pp_quoted_string (pretty_printer *, const char *, size_t = -1);
 
+static void
+default_token_printer (pretty_printer *pp,
+		       const pp_token_list &tokens);
+
 /* Overwrite the given location/range within this text_info's rich_location.
    For use e.g. when implementing "+" in client format decoders.  */
 
@@ -1063,196 +1069,408 @@  pp_indent (pretty_printer *pp)
 
 static const char *get_end_url_string (pretty_printer *);
 
-/* Append STR to OSTACK, without a null-terminator.  */
+/* struct pp_token.  */
 
-static void
-obstack_append_string (obstack *ostack, const char *str)
+pp_token::pp_token (enum kind k)
+: m_kind (k),
+  m_prev (nullptr),
+  m_next (nullptr)
 {
-  obstack_grow (ostack, str, strlen (str));
 }
 
-/* Append STR to OSTACK, without a null-terminator.  */
-
-static void
-obstack_append_string (obstack *ostack, const char *str, size_t len)
-{
-  obstack_grow (ostack, str, len);
-}
-
-/* Given quoted text within the buffer OBSTACK
-   at the half-open interval [QUOTED_TEXT_START_IDX, QUOTED_TEXT_END_IDX),
-   potentially use URLIFIER (if non-null) to see if there's a URL for the
-   quoted text.
-
-   If so, replace the quoted part of the text in the buffer with a URLified
-   version of the text, using PP's settings.
-
-   For example, given this is the buffer:
-     "this is a test `hello worldTRAILING-CONTENT"
-     .................^~~~~~~~~~~
-   with the quoted text starting at the 'h' of "hello world", the buffer
-   becomes:
-     "this is a test `BEGIN_URL(URL)hello worldEND(URL)TRAILING-CONTENT"
-     .................^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-     .................-----------replacement-----------
-
-   Return the new offset into the buffer of the quoted text endpoint i.e.
-   the offset of "TRAILING-CONTENT" in the above.  */
-
-static size_t
-urlify_quoted_string (pretty_printer *pp,
-		      obstack *obstack,
-		      const urlifier *urlifier,
-		      size_t quoted_text_start_idx,
-		      size_t quoted_text_end_idx)
-{
-  if (!pp->supports_urls_p ())
-    return quoted_text_end_idx;
-  if (!urlifier)
-    return quoted_text_end_idx;
-
-  const size_t quoted_len = quoted_text_end_idx - quoted_text_start_idx;
-  if (quoted_len == 0)
-    /* Empty quoted string; do nothing.  */
-    return quoted_text_end_idx;
-  const char *start = (obstack->object_base + quoted_text_start_idx);
-  char *url = urlifier->get_url_for_quoted_text (start, quoted_len);
-  if (!url)
-    /* No URL for this quoted text; do nothing.  */
-    return quoted_text_end_idx;
-
-  /* Stash a copy of the remainder of the chunk.  */
-  char *text = xstrndup (start,
-			 obstack_object_size (obstack) - quoted_text_start_idx);
-
-  /* Replace quoted text...  */
-  obstack->next_free = obstack->object_base + quoted_text_start_idx;
-
-  /*  ...with URLified version of the text.  */
-  /* Begin URL.  */
-  switch (pp->get_url_format ())
+void
+pp_token::dump (FILE *out) const
+{
+  switch (m_kind)
     {
     default:
-    case URL_FORMAT_NONE:
       gcc_unreachable ();
-    case URL_FORMAT_ST:
-      obstack_append_string (obstack, "\33]8;;");
-      obstack_append_string (obstack, url);
-      obstack_append_string (obstack, "\33\\");
+    case kind::text:
+      {
+	const pp_token_text *sub = as_a <const pp_token_text *> (this);
+	gcc_assert (sub->m_value.get ());
+	fprintf (out, "TEXT(\"%s\")", sub->m_value.get ());
+      }
       break;
-    case URL_FORMAT_BEL:
-      obstack_append_string (obstack, "\33]8;;");
-      obstack_append_string (obstack, url);
-      obstack_append_string (obstack, "\a");
+    case kind::begin_color:
+      {
+	const pp_token_begin_color *sub
+	  = as_a <const pp_token_begin_color *> (this);
+	gcc_assert (sub->m_value.get ());
+	fprintf (out, "BEGIN_COLOR(\"%s\")", sub->m_value.get ());
+	break;
+      }
+    case kind::end_color:
+      fprintf (out, "END_COLOR");
+      break;
+    case kind::begin_quote:
+      fprintf (out, "BEGIN_QUOTE");
+      break;
+    case kind::end_quote:
+      fprintf (out, "END_QUOTE");
+      break;
+    case kind::begin_url:
+      {
+	const pp_token_begin_url *sub
+	  = as_a <const pp_token_begin_url *> (this);
+	gcc_assert (sub->m_value.get ());
+	fprintf (out, "BEGIN_URL(\"%s\")", sub->m_value.get ());
+      }
+      break;
+    case kind::end_url:
+      fprintf (out, "END_URL");
+      break;
+    case kind::custom_data:
+      {
+	const pp_token_custom_data *sub
+	  = as_a <const pp_token_custom_data *> (this);
+	gcc_assert (sub->m_value.get ());
+	fprintf (out, "CUSTOM(");
+	sub->m_value->dump (out);
+	fprintf (out, ")");
+      }
       break;
     }
-  /* Add back the quoted part of the text.  */
-  obstack_append_string (obstack, text, quoted_len);
-  /* End URL.  */
-  obstack_append_string (obstack,
-			 get_end_url_string (pp));
+}
 
-  size_t new_end_idx = obstack_object_size (obstack);
+/* Allocate SZ bytes within S, which must not be half-way through
+   building another object.  */
 
-  /* Add back the remainder of the text after the quoted part.  */
-  obstack_append_string (obstack, text + quoted_len);
-  free (text);
-  free (url);
-  return new_end_idx;
+static void *
+allocate_object (size_t sz, obstack &s)
+{
+  /* We must not be half-way through an object.  */
+  gcc_assert (obstack_base (&s) == obstack_next_free (&s));
+
+  obstack_blank (&s, sz);
+  void *buf = obstack_finish (&s);
+  return buf;
 }
 
-/* A class for tracking quoted text within a buffer for
-   use by a urlifier.  */
+/* Make room for a pp_token instance within obstack S.  */
 
-class quoting_info
+void *
+pp_token::operator new (size_t sz, obstack &s)
 {
-public:
-  /* Called when quoted text is begun in phase 1 or 2.  */
-  void on_begin_quote (const output_buffer &buf,
-		       unsigned chunk_idx)
-  {
-    /* Stash location of start of quoted string.  */
-    size_t byte_offset = obstack_object_size (&buf.chunk_obstack);
-    m_loc_last_open_quote = location (chunk_idx, byte_offset);
-  }
+  return allocate_object (sz, s);
+}
 
-  /* Called when quoted text is ended in phase 1 or 2.  */
-  void on_end_quote (pretty_printer *pp,
-		     output_buffer &buf,
-		     unsigned chunk_idx,
-		     const urlifier &urlifier)
-  {
-    /* If possible, do urlification now.  */
-    if (chunk_idx == m_loc_last_open_quote.m_chunk_idx)
-      {
-	urlify_quoted_string (pp,
-			      &buf.chunk_obstack,
-			      &urlifier,
-			      m_loc_last_open_quote.m_byte_offset,
-			      obstack_object_size (&buf.chunk_obstack));
-	m_loc_last_open_quote = location ();
-	return;
-      }
-    /* Otherwise the quoted text straddles multiple chunks.
-       Stash the location of end of quoted string for use in phase 3.  */
-    size_t byte_offset = obstack_object_size (&buf.chunk_obstack);
-    m_phase_3_quotes.push_back (run (m_loc_last_open_quote,
-				     location (chunk_idx, byte_offset)));
-    m_loc_last_open_quote = location ();
-  }
+void
+pp_token::operator delete (void *)
+{
+  /* No-op: pp_tokens are allocated within obstacks, so
+     the memory will be reclaimed when the obstack is freed.  */
+}
 
-  bool has_phase_3_quotes_p () const
-  {
-    return m_phase_3_quotes.size () > 0;
-  }
-  void handle_phase_3 (pretty_printer *pp,
-		       const urlifier &urlifier);
+/* class pp_token_list.  */
 
-private:
-  struct location
-  {
-    location ()
-    : m_chunk_idx (UINT_MAX),
-      m_byte_offset (SIZE_MAX)
+/* Make room for a pp_token_list instance within obstack S.  */
+
+void *
+pp_token_list::operator new (size_t sz, obstack &s)
+{
+  return allocate_object (sz, s);
+}
+
+void
+pp_token_list::operator delete (void *)
+{
+  /* No-op: pp_token_list instances allocated within obstacks don't
+     need to reclaim their own memory; it will be reclaimed when
+     the obstack is freed.  */
+}
+
+pp_token_list::pp_token_list (obstack &s)
+: m_obstack (s),
+  m_first (nullptr),
+  m_end (nullptr)
+{
+}
+
+pp_token_list::pp_token_list (pp_token_list &&other)
+: m_obstack (other.m_obstack),
+  m_first (other.m_first),
+  m_end (other.m_end)
+{
+  other.m_first = nullptr;
+  other.m_end = nullptr;
+}
+
+pp_token_list::~pp_token_list ()
+{
+  for (auto iter = m_first; iter; )
     {
+      pp_token *next = iter->m_next;
+      delete iter;
+      iter = next;
     }
+}
+
+void
+pp_token_list::push_back_text (label_text &&text)
+{
+  if (text.get ()[0] == '\0')
+    return; // pushing empty string is a no-op
+  push_back<pp_token_text> (std::move (text));
+}
 
-    location (unsigned chunk_idx,
-	      size_t byte_offset)
-    : m_chunk_idx (chunk_idx),
-      m_byte_offset (byte_offset)
+void
+pp_token_list::push_back (std::unique_ptr<pp_token> tok)
+{
+  if (!m_first)
     {
+      gcc_assert (m_end == nullptr);
+      m_first = tok.get ();
+      m_end = tok.get ();
     }
+  else
+    {
+      gcc_assert (m_end != nullptr);
+      m_end->m_next = tok.get ();
+      tok->m_prev = m_end;
+      m_end = tok.get ();
+    }
+  tok.release ();
+}
 
-    unsigned m_chunk_idx;
-    size_t m_byte_offset;
-  };
+void
+pp_token_list::push_back_list (pp_token_list &&list)
+{
+  while (auto tok = list.pop_front ())
+    push_back (std::move (tok));
+}
 
-  struct run
-  {
-    run (location start, location end)
-    : m_start (start), m_end (end)
+std::unique_ptr<pp_token>
+pp_token_list::pop_front ()
+{
+  pp_token *result = m_first;
+  if (result == nullptr)
+    return nullptr;
+
+  gcc_assert (result->m_prev == nullptr);
+  m_first = result->m_next;
+  if (result->m_next)
     {
+      gcc_assert (result != m_end);
+      m_first->m_prev = nullptr;
     }
+  else
+    {
+      gcc_assert (result == m_end);
+      m_end = nullptr;
+    }
+  result->m_next = nullptr;
+  return std::unique_ptr<pp_token> (result);
+}
 
-    location m_start;
-    location m_end;
-  };
+std::unique_ptr<pp_token>
+pp_token_list::remove_token (pp_token *tok)
+{
+  gcc_assert (tok);
+  if (tok->m_prev)
+    {
+      gcc_assert (tok != m_first);
+      tok->m_prev->m_next = tok->m_next;
+    }
+  else
+    {
+      gcc_assert (tok == m_first);
+      m_first = tok->m_next;
+    }
+  if (tok->m_next)
+    {
+      gcc_assert (tok != m_end);
+      tok->m_next->m_prev = tok->m_prev;
+    }
+  else
+    {
+      gcc_assert (tok == m_end);
+      m_end = tok->m_prev;
+    }
+  tok->m_prev = nullptr;
+  tok->m_next = nullptr;
+  gcc_assert (m_first != tok);
+  gcc_assert (m_end != tok);
+  return std::unique_ptr<pp_token> (tok);
+}
+
+/* Insert NEW_TOK after RELATIVE_TOK.  */
+
+void
+pp_token_list::insert_after (std::unique_ptr<pp_token> new_tok_up,
+			     pp_token *relative_tok)
+{
+  pp_token *new_tok = new_tok_up.release ();
+
+  gcc_assert (new_tok);
+  gcc_assert (new_tok->m_prev == nullptr);
+  gcc_assert (new_tok->m_next == nullptr);
+  gcc_assert (relative_tok);
+
+  if (relative_tok->m_next)
+    {
+      gcc_assert (relative_tok != m_end);
+      relative_tok->m_next->m_prev = new_tok;
+    }
+  else
+    {
+      gcc_assert (relative_tok == m_end);
+      m_end = new_tok;
+    }
+  new_tok->m_prev = relative_tok;
+  new_tok->m_next = relative_tok->m_next;
+  relative_tok->m_next = new_tok;
+}
+
+void
+pp_token_list::replace_custom_tokens ()
+{
+  pp_token *iter = m_first;
+  while (iter)
+    {
+      pp_token *next  = iter->m_next;
+      if (iter->m_kind == pp_token::kind::custom_data)
+	{
+	  pp_token_list tok_list (m_obstack);
+	  pp_token_custom_data *sub = as_a <pp_token_custom_data *> (iter);
+	  if (sub->m_value->as_standard_tokens (tok_list))
+	    {
+	      while (auto tok = tok_list.pop_front ())
+		{
+		  /* The resulting token list must not contain any
+		     custom data.  */
+		  gcc_assert (tok->m_kind != pp_token::kind::custom_data);
+		  insert_after (std::move (tok), iter);
+		}
+	      remove_token (iter);
+	    }
+	}
+      iter = next;
+    }
+}
+
+/* Merge any runs of consecutive text tokens within this list
+   into individual text tokens.  */
+
+void
+pp_token_list::merge_consecutive_text_tokens ()
+{
+  pp_token *start_of_run = m_first;
+  while (start_of_run)
+    {
+      if (start_of_run->m_kind != pp_token::kind::text)
+	{
+	  start_of_run = start_of_run->m_next;
+	  continue;
+	}
+      pp_token *end_of_run = start_of_run;
+      while (end_of_run->m_next
+	     && end_of_run->m_next->m_kind == pp_token::kind::text)
+	end_of_run = end_of_run->m_next;
+      if (end_of_run != start_of_run)
+	{
+	  /* start_of_run through end_of_run are a run of consecutive
+	     text tokens.  */
+
+	  /* Calculate size of buffer for merged text.  */
+	  size_t sz = 0;
+	  for (auto iter = start_of_run; iter != end_of_run->m_next;
+	       iter = iter->m_next)
+	    {
+	      pp_token_text *iter_text = static_cast<pp_token_text *> (iter);
+	      sz += strlen (iter_text->m_value.get ());
+	    }
+
+	  /* Allocate and populate buffer for merged text
+	     (within m_obstack).  */
+	  char * const buf = (char *)allocate_object (sz + 1, m_obstack);
+	  char *p = buf;
+	  for (auto iter = start_of_run; iter != end_of_run->m_next;
+	       iter = iter->m_next)
+	    {
+	      pp_token_text *iter_text = static_cast<pp_token_text *> (iter);
+	      size_t iter_sz = strlen (iter_text->m_value.get ());
+	      memcpy (p, iter_text->m_value.get (), iter_sz);
+	      p += iter_sz;
+	    }
+	  *p = '\0';
+
+	  /* Replace start_of_run's buffer pointer with the new buffer.  */
+	  static_cast<pp_token_text *> (start_of_run)->m_value
+	    = label_text::borrow (buf);
+
+	  /* Remove all the other text tokens in the run.  */
+	  pp_token * const next = end_of_run->m_next;
+	  while (start_of_run->m_next != next)
+	    remove_token (start_of_run->m_next);
+	  start_of_run = next;
+	}
+      else
+	start_of_run = end_of_run->m_next;
+    }
+}
+
+/* Apply URLIFIER to this token list.
+   Find BEGIN_QUOTE, TEXT, END_QUOTE triples, and if URLIFIER has a url
+   for the value of TEXT, then wrap TEXT in a {BEGIN,END}_URL pair.  */
+
+void
+pp_token_list::apply_urlifier (const urlifier &urlifier)
+{
+  for (pp_token *iter = m_first; iter; )
+    {
+      if (iter->m_kind == pp_token::kind::begin_quote
+	  && iter->m_next
+	  && iter->m_next->m_kind == pp_token::kind::text
+	  && iter->m_next->m_next
+	  && iter->m_next->m_next->m_kind == pp_token::kind::end_quote)
+	{
+	  pp_token *begin_quote = iter;
+	  pp_token_text *text = as_a <pp_token_text *> (begin_quote->m_next);
+	  pp_token *end_quote = text->m_next;
+	  if (char *url = urlifier.get_url_for_quoted_text
+			    (text->m_value.get (),
+			     strlen (text->m_value.get ())))
+	    {
+	      auto begin_url
+		= make_token<pp_token_begin_url> (label_text::take (url));
+	      auto end_url = make_token<pp_token_end_url> ();
+	      insert_after (std::move (begin_url), begin_quote);
+	      insert_after (std::move (end_url), text);
+	    }
+	  iter = end_quote->m_next;
+	}
+      else
+	iter = iter->m_next;
+    }
+}
+
+void
+pp_token_list::dump (FILE *out) const
+{
+  fprintf (out, "[");
+  for (auto iter = m_first; iter; iter = iter->m_next)
+    {
+      iter->dump (out);
+      if (iter->m_next)
+	fprintf (out, ", ");
+    }
+  fprintf (out, "]\n");
+}
 
-  location m_loc_last_open_quote;
-  std::vector<run> m_phase_3_quotes;
-};
 
 /* Adds a chunk to the end of formatted output, so that it
    will be printed by pp_output_formatted_text.  */
 
 void
-chunk_info::append_formatted_chunk (const char *content)
+chunk_info::append_formatted_chunk (obstack &s, const char *content)
 {
   unsigned int chunk_idx;
   for (chunk_idx = 0; m_args[chunk_idx]; chunk_idx++)
     ;
-  m_args[chunk_idx++] = content;
+  pp_token_list *tokens = pp_token_list::make (s);
+  tokens->push_back_text (label_text::borrow (content));
+  m_args[chunk_idx++] = tokens;
   m_args[chunk_idx] = nullptr;
 }
 
@@ -1262,34 +1480,33 @@  chunk_info::append_formatted_chunk (const char *content)
 void
 chunk_info::pop_from_output_buffer (output_buffer &buf)
 {
-  delete m_quotes;
   buf.cur_chunk_array = m_prev;
   obstack_free (&buf.chunk_obstack, this);
 }
 
 void
-chunk_info::on_begin_quote (const output_buffer &buf,
-			    unsigned chunk_idx,
-			    const urlifier *urlifier)
+chunk_info::dump (FILE *out) const
 {
-  if (!urlifier)
-    return;
-  if (!m_quotes)
-    m_quotes = new quoting_info ();
-  m_quotes->on_begin_quote (buf, chunk_idx);
+  for (size_t idx = 0; m_args[idx]; ++idx)
+    {
+      fprintf (out, "%i: ", (int)idx);
+      m_args[idx]->dump (out);
+    }
 }
 
-void
-chunk_info::on_end_quote (pretty_printer *pp,
-			  output_buffer &buf,
-			  unsigned chunk_idx,
-			  const urlifier *urlifier)
+/* Finish any text accumulating within CUR_OBSTACK,
+   terminating it.
+   Push a text pp_token to the end of TOK_LIST containing
+   a borrowed copy of the text in CUR_OBSTACK.  */
+
+static void
+push_back_any_text (pp_token_list *tok_list,
+		    obstack *cur_obstack)
 {
-  if (!urlifier)
-    return;
-  if (!m_quotes)
-    m_quotes = new quoting_info ();
-  m_quotes->on_end_quote (pp, buf, chunk_idx, *urlifier);
+  obstack_1grow (cur_obstack, '\0');
+  tok_list->push_back_text
+    (label_text::borrow (XOBFINISH (cur_obstack,
+				    const char *)));
 }
 
 /* The following format specifiers are recognized as being client independent:
@@ -1339,36 +1556,22 @@  chunk_info::on_end_quote (pretty_printer *pp,
 /* Implementation of pp_format.
    Formatting phases 1 and 2: render TEXT->format_spec plus
    text->m_args_ptr into a series of chunks in pp_buffer (PP)->args[].
-   Phase 3 is in pp_output_formatted_text.
-
-   If URLIFIER is non-NULL, then use it to add URLs for quoted
-   strings, so that e.g.
-     "before %<quoted%> after"
-   with a URLIFIER that has a URL for "quoted" might be emitted as:
-     "before `BEGIN_URL(http://example.com)quotedEND_URL' after"
-   This is handled here for message fragments that are:
-   - quoted entirely in phase 1 (e.g. "%<this is quoted%>"), or
-   - quoted entirely in phase 2 (e.g. "%qs"),
-   Quoted fragments that use a mixture of both phases
-   (e.g. "%<this is a mixture: %s %>")
-   are stashed into the output_buffer's m_quotes for use in phase 3.  */
+   Phase 3 is in pp_output_formatted_text.  */
 
 void
-pretty_printer::format (text_info *text,
-			const urlifier *urlifier)
+pretty_printer::format (text_info *text)
 {
   output_buffer * const buffer = m_buffer;
 
   unsigned int chunk = 0, argno;
-  const char **formatters[PP_NL_ARGMAX];
+  pp_token_list **formatters[PP_NL_ARGMAX];
 
   /* Allocate a new chunk structure.  */
   chunk_info *new_chunk_array = XOBNEW (&buffer->chunk_obstack, chunk_info);
 
   new_chunk_array->m_prev = buffer->cur_chunk_array;
-  new_chunk_array->m_quotes = nullptr;
   buffer->cur_chunk_array = new_chunk_array;
-  const char **args = new_chunk_array->m_args;
+  pp_token_list **args = new_chunk_array->m_args;
 
   /* Formatting phase 1: split up TEXT->format_spec into chunks in
      pp_buffer (PP)->args[].  Even-numbered chunks are to be output
@@ -1380,6 +1583,8 @@  pretty_printer::format (text_info *text,
 
   unsigned int curarg = 0;
   bool any_unnumbered = false, any_numbered = false;
+  pp_token_list *cur_token_list;
+  args[chunk++] = cur_token_list = pp_token_list::make (buffer->chunk_obstack);
   for (const char *p = text->m_format_spec; *p; )
     {
       while (*p != '\0' && *p != '%')
@@ -1403,44 +1608,39 @@  pretty_printer::format (text_info *text,
 
 	case '<':
 	  {
-	    obstack_grow (&buffer->chunk_obstack,
-			  open_quote, strlen (open_quote));
-	    const char *colorstr = colorize_start (m_show_color, "quote");
-	    obstack_grow (&buffer->chunk_obstack, colorstr, strlen (colorstr));
+	    push_back_any_text (cur_token_list, &buffer->chunk_obstack);
+	    cur_token_list->push_back<pp_token_begin_quote> ();
 	    p++;
-
-	    buffer->cur_chunk_array->on_begin_quote (*buffer, chunk, urlifier);
 	    continue;
 	  }
 
 	case '>':
 	  {
-	    buffer->cur_chunk_array->on_end_quote (this, *buffer, chunk, urlifier);
-
-	    const char *colorstr = colorize_stop (m_show_color);
-	    obstack_grow (&buffer->chunk_obstack, colorstr, strlen (colorstr));
+	    push_back_any_text (cur_token_list, &buffer->chunk_obstack);
+	    cur_token_list->push_back<pp_token_end_quote> ();
+	    p++;
+	    continue;
 	  }
-	  /* FALLTHRU */
 	case '\'':
-	  obstack_grow (&buffer->chunk_obstack,
-			close_quote, strlen (close_quote));
-	  p++;
+	  {
+	    push_back_any_text (cur_token_list, &buffer->chunk_obstack);
+	    cur_token_list->push_back<pp_token_end_quote> ();
+	    p++;
+	  }
 	  continue;
 
 	case '}':
 	  {
-	    const char *endurlstr = get_end_url_string (this);
-	    obstack_grow (&buffer->chunk_obstack, endurlstr,
-			  strlen (endurlstr));
+	    push_back_any_text (cur_token_list, &buffer->chunk_obstack);
+	    cur_token_list->push_back<pp_token_end_url> ();
+	    p++;
 	  }
-	  p++;
 	  continue;
 
 	case 'R':
 	  {
-	    const char *colorstr = colorize_stop (m_show_color);
-	    obstack_grow (&buffer->chunk_obstack, colorstr,
-			  strlen (colorstr));
+	    push_back_any_text (cur_token_list, &buffer->chunk_obstack);
+	    cur_token_list->push_back<pp_token_end_color> ();
 	    p++;
 	    continue;
 	  }
@@ -1455,11 +1655,14 @@  pretty_printer::format (text_info *text,
 
 	default:
 	  /* Handled in phase 2.  Terminate the plain chunk here.  */
-	  obstack_1grow (&buffer->chunk_obstack, '\0');
-	  args[chunk++] = XOBFINISH (&buffer->chunk_obstack, const char *);
+	  push_back_any_text (cur_token_list, &buffer->chunk_obstack);
 	  break;
 	}
 
+      /* Start a new token list for the formatting args.  */
+      args[chunk] = cur_token_list
+	= pp_token_list::make (buffer->chunk_obstack);
+
       if (ISDIGIT (*p))
 	{
 	  char *end;
@@ -1479,7 +1682,7 @@  pretty_printer::format (text_info *text,
 	}
       gcc_assert (argno < PP_NL_ARGMAX);
       gcc_assert (!formatters[argno]);
-      formatters[argno] = &args[chunk];
+      formatters[argno] = &args[chunk++];
       do
 	{
 	  obstack_1grow (&buffer->chunk_obstack, *p);
@@ -1531,17 +1734,24 @@  pretty_printer::format (text_info *text,
 	    }
 	}
       if (*p == '\0')
-	break;
+	{
+	  push_back_any_text (cur_token_list, &buffer->chunk_obstack);
+	  break;
+	}
 
       obstack_1grow (&buffer->chunk_obstack, '\0');
+      push_back_any_text (cur_token_list, &buffer->chunk_obstack);
+
+      /* Start a new token list for the next (non-formatted) text.  */
       gcc_assert (chunk < PP_NL_ARGMAX * 2);
-      args[chunk++] = XOBFINISH (&buffer->chunk_obstack, const char *);
+      args[chunk++] = cur_token_list
+	= pp_token_list::make (buffer->chunk_obstack);
     }
 
   obstack_1grow (&buffer->chunk_obstack, '\0');
+  push_back_any_text (cur_token_list, &buffer->chunk_obstack);
   gcc_assert (chunk < PP_NL_ARGMAX * 2);
-  args[chunk++] = XOBFINISH (&buffer->chunk_obstack, const char *);
-  args[chunk] = 0;
+  args[chunk] = nullptr;
 
   /* Set output to the argument obstack, and switch line-wrapping and
      prefixing off.  */
@@ -1549,6 +1759,15 @@  pretty_printer::format (text_info *text,
   const int old_line_length = buffer->line_length;
   const pp_wrapping_mode_t old_wrapping_mode = pp_set_verbatim_wrapping (this);
 
+  /* Note that you can debug the state of the chunk arrays here using
+       (gdb) call buffer->cur_chunk_array->dump()
+     which, given e.g. "foo: %s bar: %s" might print:
+       0: [TEXT("foo: ")]
+       1: [TEXT("s")]
+       2: [TEXT(" bar: ")]
+       3: [TEXT("s")]
+  */
+
   /* Second phase.  Replace each formatter with the formatted text it
      corresponds to.  */
 
@@ -1562,10 +1781,20 @@  pretty_printer::format (text_info *text,
 
       const char *p;
 
+      /* We expect a single text token containing the formatter.  */
+      pp_token_list *tok_list = *(formatters[argno]);
+      gcc_assert (tok_list);
+      gcc_assert (tok_list->m_first == tok_list->m_end);
+      gcc_assert (tok_list->m_first->m_kind == pp_token::kind::text);
+
+      /* Accumulate the value of the formatted text into here.  */
+      pp_token_list *formatted_tok_list
+	= pp_token_list::make (buffer->chunk_obstack);
+
       /* We do not attempt to enforce any ordering on the modifier
 	 characters.  */
 
-      for (p = *formatters[argno];; p++)
+      for (p = as_a <pp_token_text *> (tok_list->m_first)->m_value.get ();; p++)
 	{
 	  switch (*p)
 	    {
@@ -1612,16 +1841,18 @@  pretty_printer::format (text_info *text,
 
       if (quote)
 	{
-	  pp_begin_quote (this, m_show_color);
-	  buffer->cur_chunk_array->on_begin_quote (*buffer, chunk, urlifier);
+	  push_back_any_text (formatted_tok_list, &buffer->chunk_obstack);
+	  formatted_tok_list->push_back<pp_token_begin_quote> ();
 	}
 
       switch (*p)
 	{
 	case 'r':
-	  pp_string (this, colorize_start (m_show_color,
-					 va_arg (*text->m_args_ptr,
-						 const char *)));
+	  {
+	    const char *color = va_arg (*text->m_args_ptr, const char *);
+	    formatted_tok_list->push_back<pp_token_begin_color>
+	      (label_text::borrow (color));
+	  }
 	  break;
 
 	case 'c':
@@ -1763,7 +1994,11 @@  pretty_printer::format (text_info *text,
 	  break;
 
 	case '{':
-	  begin_url (va_arg (*text->m_args_ptr, const char *));
+	  {
+	    const char *url = va_arg (*text->m_args_ptr, const char *);
+	    formatted_tok_list->push_back<pp_token_begin_url>
+	      (label_text::borrow (url));
+	  }
 	  break;
 
 	case 'e':
@@ -1772,7 +2007,7 @@  pretty_printer::format (text_info *text,
 	      = va_arg (*text->m_args_ptr, pp_element *);
 	    pp_markup::context ctxt (*this, *buffer, chunk,
 				     quote, /* by reference */
-				     urlifier);
+				     formatted_tok_list);
 	    element->add_to_phase_2 (ctxt);
 	  }
 	  break;
@@ -1787,22 +2022,23 @@  pretty_printer::format (text_info *text,
 	       (e.g. when printing "'TYPEDEF' aka 'TYPE'" in the C family
 	       of frontends).  */
 	    gcc_assert (pp_format_decoder (this));
+	    gcc_assert (formatted_tok_list);
 	    ok = m_format_decoder (this, text, p,
 				   precision, wide, plus, hash, &quote,
-				   formatters[argno]);
+				   *formatted_tok_list);
 	    gcc_assert (ok);
 	  }
 	}
 
       if (quote)
 	{
-	  buffer->cur_chunk_array->on_end_quote (this, *buffer,
-						 chunk, urlifier);
-	  pp_end_quote (this, m_show_color);
+	  push_back_any_text (formatted_tok_list, &buffer->chunk_obstack);
+	  formatted_tok_list->push_back<pp_token_end_quote> ();
 	}
 
-      obstack_1grow (&buffer->chunk_obstack, '\0');
-      *formatters[argno] = XOBFINISH (&buffer->chunk_obstack, const char *);
+      push_back_any_text (formatted_tok_list, &buffer->chunk_obstack);
+      delete *formatters[argno];
+      *formatters[argno] = formatted_tok_list;
     }
 
   if (CHECKING_P)
@@ -1833,6 +2069,8 @@  struct auto_obstack
     obstack_free (&m_obstack, NULL);
   }
 
+  operator obstack & () { return m_obstack; }
+
   void grow (const void *src, size_t length)
   {
     obstack_grow (&m_obstack, src, length);
@@ -1851,130 +2089,105 @@  struct auto_obstack
   obstack m_obstack;
 };
 
-/* Subroutine of pp_output_formatted_text for the awkward case where
-   quoted text straddles multiple chunks.
-
-   Flush PP's buffer's chunks to PP's output buffer, whilst inserting
-   URLs for any quoted text that should be URLified.
-
-   For example, given:
-   |  pp_format (pp,
-   |            "unrecognized option %qs; did you mean %<-%s%>",
-   |            "foo", "foption");
-   we would have these chunks:
-   |  chunk 0: "unrecognized option "
-   |  chunk 1: "`foo'" (already checked for urlification)
-   |  chunk 2: "; did you mean `-"
-   |                           ^*
-   |  chunk 3: "foption"
-   |            *******
-   |  chunk 4: "'"
-   |            ^
-   and this quoting_info would have recorded the open quote near the end
-   of chunk 2 and close quote at the start of chunk 4; this function would
-   check the combination of the end of chunk 2 and all of chunk 3 ("-foption")
-   for urlification.  */
+/* Format of a message pointed to by TEXT.
+   If URLIFIER is non-null then use it on any quoted text that was not
+   handled in phases 1 or 2 to potentially add URLs.  */
 
 void
-quoting_info::handle_phase_3 (pretty_printer *pp,
-			      const urlifier &urlifier)
+pp_output_formatted_text (pretty_printer *pp,
+			  const urlifier *urlifier)
 {
-  unsigned int chunk;
   output_buffer * const buffer = pp_buffer (pp);
+  gcc_assert (buffer->obstack == &buffer->formatted_obstack);
+
   chunk_info *chunk_array = buffer->cur_chunk_array;
-  const char * const *args = chunk_array->get_args ();
-  quoting_info *quoting = chunk_array->get_quoting_info ();
-
-  /* We need to construct the string into an intermediate buffer
-     for this case, since using pp_string can introduce prefixes
-     and line-wrapping, and omit whitespace at the start of lines.  */
-  auto_obstack combined_buf;
-
-  /* Iterate simultaneously through both
-     - the chunks and
-     - the runs of quoted characters
-     Accumulate text from the chunks into combined_buf, and handle
-     runs of quoted characters when handling the chunks they
-     correspond to.  */
-  size_t start_of_run_byte_offset = 0;
-  std::vector<quoting_info::run>::const_iterator iter_run
-    = quoting->m_phase_3_quotes.begin ();
-  std::vector<quoting_info::run>::const_iterator end_runs
-    = quoting->m_phase_3_quotes.end ();
-  for (chunk = 0; args[chunk]; chunk++)
-    {
-      size_t start_of_chunk_idx = combined_buf.object_size ();
+  pp_token_list * const *token_lists = chunk_array->get_token_lists ();
 
-      combined_buf.grow (args[chunk], strlen (args[chunk]));
+  {
+    /* Consolidate into one token list.  */
+    pp_token_list tokens (buffer->chunk_obstack);
+    for (unsigned chunk = 0; token_lists[chunk]; chunk++)
+      {
+	tokens.push_back_list (std::move (*token_lists[chunk]));
+	delete token_lists[chunk];
+      }
 
-      if (iter_run != end_runs
-	  && chunk == iter_run->m_end.m_chunk_idx)
-	{
-	  /* A run is ending; consider for it urlification.  */
-	  const size_t end_of_run_byte_offset
-	    = start_of_chunk_idx + iter_run->m_end.m_byte_offset;
-	  const size_t end_offset
-	    = urlify_quoted_string (pp,
-				    &combined_buf.m_obstack,
-				    &urlifier,
-				    start_of_run_byte_offset,
-				    end_of_run_byte_offset);
-
-	  /* If URLification occurred it will have grown the buffer.
-	     We need to update start_of_chunk_idx so that offsets
-	     relative to it are still correct, for the case where
-	     we have a chunk that both ends a quoted run and starts
-	     another quoted run.  */
-	  gcc_assert (end_offset >= end_of_run_byte_offset);
-	  start_of_chunk_idx += end_offset - end_of_run_byte_offset;
-
-	  iter_run++;
-	}
-      if (iter_run != end_runs
-	  && chunk == iter_run->m_start.m_chunk_idx)
-	{
-	  /* Note where the run starts w.r.t. the composed buffer.  */
-	  start_of_run_byte_offset
-	    = start_of_chunk_idx + iter_run->m_start.m_byte_offset;
-	}
-    }
+    tokens.replace_custom_tokens ();
+
+    tokens.merge_consecutive_text_tokens ();
+
+    if (urlifier)
+      tokens.apply_urlifier (*urlifier);
+
+    /* This is a third phase, first 2 phases done in pp_format_args.
+       Now we actually print it.  */
+    if (pp->m_token_printer)
+      pp->m_token_printer->print_tokens (pp, tokens);
+    else
+      default_token_printer (pp, tokens);
 
-  /* Now print to PP.  */
-  const char *start
-    = static_cast <const char *> (combined_buf.object_base ());
-  pp_maybe_wrap_text (pp, start, start + combined_buf.object_size ());
+  /* Close the scope here to ensure that "tokens" above is fully cleared up
+     before popping the current chunk_info, since the latter will pop
+     the chunk_obstack, and "tokens" may be using blocks within
+     the current chunk_info's chunk_obstack level.  */
+  }
+
+  chunk_array->pop_from_output_buffer (*buffer);
 }
 
-/* Format of a message pointed to by TEXT.
-   If URLIFIER is non-null then use it on any quoted text that was not
-   handled in phases 1 or 2 to potentially add URLs.  */
+/* Default implementation of token printing.  */
 
-void
-pp_output_formatted_text (pretty_printer *pp,
-			  const urlifier *urlifier)
+static void
+default_token_printer (pretty_printer *pp,
+		       const pp_token_list &tokens)
 {
-  unsigned int chunk;
-  output_buffer * const buffer = pp_buffer (pp);
-  chunk_info *chunk_array = buffer->cur_chunk_array;
-  const char * const *args = chunk_array->get_args ();
-  quoting_info *quoting = chunk_array->get_quoting_info ();
+  /* Convert to text, possibly with colorization, URLs, etc.  */
+  for (auto iter = tokens.m_first; iter; iter = iter->m_next)
+    switch (iter->m_kind)
+      {
+      default:
+	gcc_unreachable ();
 
-  gcc_assert (buffer->obstack == &buffer->formatted_obstack);
+      case pp_token::kind::text:
+	{
+	  pp_token_text *sub = as_a <pp_token_text *> (iter);
+	  pp_string (pp, sub->m_value.get ());
+	}
+	break;
+
+      case pp_token::kind::begin_color:
+	{
+	  pp_token_begin_color *sub = as_a <pp_token_begin_color *> (iter);
+	  pp_string (pp, colorize_start (pp_show_color (pp),
+					 sub->m_value.get ()));
+	}
+	break;
+      case pp_token::kind::end_color:
+	pp_string (pp, colorize_stop (pp_show_color (pp)));
+	break;
 
-  /* This is a third phase, first 2 phases done in pp_format_args.
-     Now we actually print it.  */
+      case pp_token::kind::begin_quote:
+	pp_begin_quote (pp, pp_show_color (pp));
+	break;
+      case pp_token::kind::end_quote:
+	pp_end_quote (pp, pp_show_color (pp));
+	break;
 
-  /* If we have any deferred urlification, handle it now.  */
-  if (urlifier
-      && pp->supports_urls_p ()
-      && quoting
-      && quoting->has_phase_3_quotes_p ())
-    quoting->handle_phase_3 (pp, *urlifier);
-  else
-    for (chunk = 0; args[chunk]; chunk++)
-      pp_string (pp, args[chunk]);
+      case pp_token::kind::begin_url:
+	{
+	  pp_token_begin_url *sub = as_a <pp_token_begin_url *> (iter);
+	  pp_begin_url (pp, sub->m_value.get ());
+	}
+	break;
+      case pp_token::kind::end_url:
+	pp_end_url (pp);
+	break;
 
-  chunk_array->pop_from_output_buffer (*buffer);
+      case pp_token::kind::custom_data:
+	/* These should have been eliminated by replace_custom_tokens.  */
+	gcc_unreachable ();
+	break;
+      }
 }
 
 /* Helper subroutine of output_verbatim and verbatim. Do the appropriate
@@ -2113,6 +2326,7 @@  pretty_printer::pretty_printer (int maximum_length)
     m_wrapping (),
     m_format_decoder (nullptr),
     m_format_postprocessor (NULL),
+    m_token_printer (nullptr),
     m_emitted_prefix (false),
     m_need_newline (false),
     m_translate_identifiers (true),
@@ -2138,6 +2352,7 @@  pretty_printer::pretty_printer (const pretty_printer &other)
   m_wrapping (other.m_wrapping),
   m_format_decoder (other.m_format_decoder),
   m_format_postprocessor (NULL),
+  m_token_printer (other.m_token_printer),
   m_emitted_prefix (other.m_emitted_prefix),
   m_need_newline (other.m_need_newline),
   m_translate_identifiers (other.m_translate_identifiers),
@@ -2743,8 +2958,9 @@  void
 pp_markup::context::begin_quote ()
 {
   gcc_assert (!m_quoted);
-  pp_begin_quote (&m_pp, pp_show_color (&m_pp));
-  m_buf.cur_chunk_array->on_begin_quote (m_buf, m_chunk_idx, m_urlifier);
+  gcc_assert (m_formatted_token_list);
+  push_back_any_text ();
+  m_formatted_token_list->push_back<pp_token_begin_quote> ();
   m_quoted = true;
 }
 
@@ -2755,8 +2971,9 @@  pp_markup::context::end_quote ()
      printing a type emitting "TYPEDEF' {aka `TYPE'}".  */
   if (!m_quoted)
     return;
-  m_buf.cur_chunk_array->on_end_quote (&m_pp, m_buf, m_chunk_idx, m_urlifier);
-  pp_end_quote (&m_pp, pp_show_color (&m_pp));
+  gcc_assert (m_formatted_token_list);
+  push_back_any_text ();
+  m_formatted_token_list->push_back<pp_token_end_quote> ();
   m_quoted = false;
 }
 
@@ -2765,7 +2982,10 @@  pp_markup::context::begin_highlight_color (const char *color_name)
 {
   if (!pp_show_highlight_colors (&m_pp))
     return;
-  pp_string (&m_pp, colorize_start (pp_show_color (&m_pp), color_name));
+
+  push_back_any_text ();
+  m_formatted_token_list->push_back <pp_token_begin_color>
+    (label_text::borrow (color_name));
 }
 
 void
@@ -2773,10 +2993,20 @@  pp_markup::context::end_highlight_color ()
 {
   if (!pp_show_highlight_colors (&m_pp))
     return;
-  const char *colorstr = colorize_stop (pp_show_color (&m_pp));
-  obstack_grow (&m_buf.chunk_obstack, colorstr, strlen (colorstr));
+
+  push_back_any_text ();
+  m_formatted_token_list->push_back<pp_token_end_color> ();
 }
 
+void
+pp_markup::context::push_back_any_text ()
+{
+  obstack *cur_obstack = m_buf.obstack;
+  obstack_1grow (cur_obstack, '\0');
+  m_formatted_token_list->push_back_text
+    (label_text::borrow (XOBFINISH (cur_obstack,
+				    const char *)));
+}
 
 /* Color names for expressing "expected" vs "actual" values.  */
 const char *const highlight_colors::expected = "highlight-a";
@@ -3039,6 +3269,245 @@  test_pp_format ()
 		    1776, "second");
 }
 
+static void
+test_merge_consecutive_text_tokens ()
+{
+  auto_obstack s;
+  pp_token_list list (s);
+  list.push_back_text (label_text::borrow ("hello"));
+  list.push_back_text (label_text::borrow (" "));
+  list.push_back_text (label_text::take (xstrdup ("world")));
+  list.push_back_text (label_text::borrow ("!"));
+
+  list.merge_consecutive_text_tokens ();
+  // We expect a single text token, with concatenated text
+  ASSERT_EQ (list.m_first, list.m_end);
+  pp_token *tok = list.m_first;
+  ASSERT_NE (tok, nullptr);
+  ASSERT_EQ (tok->m_kind, pp_token::kind::text);
+  ASSERT_STREQ (as_a <pp_token_text *> (tok)->m_value.get (), "hello world!");
+}
+
+/* Verify that we can create custom tokens that can be lowered
+   in phase 3.  */
+
+static void
+test_custom_tokens_1 ()
+{
+  struct custom_token_adder : public pp_element
+  {
+  public:
+    struct value : public pp_token_custom_data::value
+    {
+      value (custom_token_adder &adder)
+      : m_adder (adder)
+      {
+	m_adder.m_num_living_values++;
+      }
+      value (const value &other)
+      : m_adder (other.m_adder)
+      {
+	m_adder.m_num_living_values++;
+      }
+      value (value &&other)
+      : m_adder (other.m_adder)
+      {
+	m_adder.m_num_living_values++;
+      }
+      value &operator= (const value &other) = delete;
+      value &operator= (value &&other) = delete;
+      ~value ()
+      {
+	m_adder.m_num_living_values--;
+      }
+
+      void dump (FILE *out) const final override
+      {
+	fprintf (out, "\"%s\"", m_adder.m_name);
+      }
+
+      bool as_standard_tokens (pp_token_list &out) final override
+      {
+	ASSERT_TRUE (m_adder.m_num_living_values > 0);
+	out.push_back<pp_token_text> (label_text::borrow (m_adder.m_name));
+	return true;
+      }
+
+      custom_token_adder &m_adder;
+    };
+
+    custom_token_adder (const char *name)
+    : m_name (name),
+      m_num_living_values (0)
+    {
+    }
+
+    void add_to_phase_2 (pp_markup::context &ctxt) final override
+    {
+      auto val_ptr = make_unique<value> (*this);
+      ctxt.m_formatted_token_list->push_back<pp_token_custom_data>
+	(std::move (val_ptr));
+    }
+
+    const char *m_name;
+    int m_num_living_values;
+  };
+
+  custom_token_adder e1 ("foo");
+  custom_token_adder e2 ("bar");
+  ASSERT_EQ (e1.m_num_living_values, 0);
+  ASSERT_EQ (e2.m_num_living_values, 0);
+
+  pretty_printer pp;
+  pp_printf (&pp, "before %e middle %e after", &e1, &e2);
+
+  /* Verify that instances were cleaned up.  */
+  ASSERT_EQ (e1.m_num_living_values, 0);
+  ASSERT_EQ (e2.m_num_living_values, 0);
+
+  ASSERT_STREQ (pp_formatted_text (&pp),
+		"before foo middle bar after");
+}
+
+/* Verify that we can create custom tokens that aren't lowered
+   in phase 3, but instead are handled by a custom token_printer.
+   Use this to verify the inputs seen by such token_printers.  */
+
+static void
+test_custom_tokens_2 ()
+{
+  struct custom_token_adder : public pp_element
+  {
+    struct value : public pp_token_custom_data::value
+    {
+    public:
+      value (custom_token_adder &adder)
+      : m_adder (adder)
+      {
+	m_adder.m_num_living_values++;
+      }
+      value (const value &other)
+      : m_adder (other.m_adder)
+      {
+	m_adder.m_num_living_values++;
+      }
+      value (value &&other)
+      : m_adder (other.m_adder)
+      {
+	m_adder.m_num_living_values++;
+      }
+      value &operator= (const value &other) = delete;
+      value &operator= (value &&other) = delete;
+      ~value ()
+      {
+	m_adder.m_num_living_values--;
+      }
+
+      void dump (FILE *out) const final override
+      {
+	fprintf (out, "\"%s\"", m_adder.m_name);
+      }
+
+      bool as_standard_tokens (pp_token_list &) final override
+      {
+	return false;
+      }
+
+      custom_token_adder &m_adder;
+    };
+
+    custom_token_adder (const char *name)
+    : m_name (name),
+      m_num_living_values (0)
+    {
+    }
+
+    void add_to_phase_2 (pp_markup::context &ctxt) final override
+    {
+      auto val_ptr = make_unique<value> (*this);
+      ctxt.m_formatted_token_list->push_back<pp_token_custom_data>
+	(std::move (val_ptr));
+    }
+
+    const char *m_name;
+    int m_num_living_values;
+  };
+
+  class custom_token_printer : public token_printer
+  {
+    void print_tokens (pretty_printer *pp,
+		       const pp_token_list &tokens) final override
+    {
+      /* Verify that TOKENS has:
+	 [TEXT("before "), CUSTOM("foo"), TEXT(" middle "), CUSTOM("bar"),
+	  TEXT(" after")]  */
+      pp_token *tok_0 = tokens.m_first;
+      ASSERT_NE (tok_0, nullptr);
+      ASSERT_EQ (tok_0->m_kind, pp_token::kind::text);
+      ASSERT_STREQ (as_a<pp_token_text *> (tok_0)->m_value.get (),
+		    "before ");
+
+      pp_token *tok_1 = tok_0->m_next;
+      ASSERT_NE (tok_1, nullptr);
+      ASSERT_EQ (tok_1->m_prev, tok_0);
+      ASSERT_EQ (tok_1->m_kind, pp_token::kind::custom_data);
+
+      custom_token_adder::value *v1
+	= static_cast <custom_token_adder::value *>
+	(as_a<pp_token_custom_data *> (tok_1)->m_value.get ());
+      ASSERT_STREQ (v1->m_adder.m_name, "foo");
+      ASSERT_TRUE (v1->m_adder.m_num_living_values > 0);
+
+      pp_token *tok_2 = tok_1->m_next;
+      ASSERT_NE (tok_2, nullptr);
+      ASSERT_EQ (tok_2->m_prev, tok_1);
+      ASSERT_EQ (tok_2->m_kind, pp_token::kind::text);
+      ASSERT_STREQ (as_a<pp_token_text *> (tok_2)->m_value.get (),
+		    " middle ");
+
+      pp_token *tok_3 = tok_2->m_next;
+      ASSERT_NE (tok_3, nullptr);
+      ASSERT_EQ (tok_3->m_prev, tok_2);
+      ASSERT_EQ (tok_3->m_kind, pp_token::kind::custom_data);
+      custom_token_adder::value *v3
+	= static_cast <custom_token_adder::value *>
+	(as_a<pp_token_custom_data *> (tok_3)->m_value.get ());
+      ASSERT_STREQ (v3->m_adder.m_name, "bar");
+      ASSERT_TRUE (v3->m_adder.m_num_living_values > 0);
+
+      pp_token *tok_4 = tok_3->m_next;
+      ASSERT_NE (tok_4, nullptr);
+      ASSERT_EQ (tok_4->m_prev, tok_3);
+      ASSERT_EQ (tok_4->m_kind, pp_token::kind::text);
+      ASSERT_STREQ (as_a<pp_token_text *> (tok_4)->m_value.get (),
+		    " after");
+      ASSERT_EQ (tok_4->m_next, nullptr);
+
+      /* Normally we'd loop over the tokens, printing them to PP
+	 and handling the custom tokens.
+	 Instead, print a message to PP to verify that we were called.  */
+      pp_string (pp, "print_tokens was called");
+    }
+  };
+
+  custom_token_adder e1 ("foo");
+  custom_token_adder e2 ("bar");
+  ASSERT_EQ (e1.m_num_living_values, 0);
+  ASSERT_EQ (e2.m_num_living_values, 0);
+
+  custom_token_printer tp;
+  pretty_printer pp;
+  pp.set_token_printer (&tp);
+  pp_printf (&pp, "before %e middle %e after", &e1, &e2);
+
+  /* Verify that instances were cleaned up.  */
+  ASSERT_EQ (e1.m_num_living_values, 0);
+  ASSERT_EQ (e2.m_num_living_values, 0);
+
+  ASSERT_STREQ (pp_formatted_text (&pp),
+		"print_tokens was called");
+}
+
 /* A subclass of pretty_printer for use by test_prefixes_and_wrapping.  */
 
 class test_pretty_printer : public pretty_printer
@@ -3248,7 +3717,7 @@  pp_printf_with_urlifier (pretty_printer *pp,
 
   va_start (ap, msg);
   text_info text (msg, &ap, errno);
-  pp_format (pp, &text, urlifier);
+  pp_format (pp, &text);
   pp_output_formatted_text (pp, urlifier);
   va_end (ap);
 }
@@ -3404,6 +3873,18 @@  test_urlification ()
       ("foo `\33]8;;http://example.com\33\\-foption\33]8;;\33\\' bar",
        pp_formatted_text (&pp));
   }
+
+  /* Test the example from pretty-print-format-impl.h.  */
+  {
+    pretty_printer pp;
+    pp.set_url_format (URL_FORMAT_ST);
+    pp_printf_with_urlifier (&pp, &urlifier,
+	       "foo: %i, bar: %s, option: %qs",
+	       42, "baz", "-foption");
+    ASSERT_STREQ (pp_formatted_text (&pp),
+		  "foo: 42, bar: baz, option:"
+		  " `\33]8;;http://example.com\33\\-foption\33]8;;\33\\'");
+  }
 }
 
 /* Test multibyte awareness.  */
@@ -3453,6 +3934,9 @@  pretty_print_cc_tests ()
 {
   test_basic_printing ();
   test_pp_format ();
+  test_merge_consecutive_text_tokens ();
+  test_custom_tokens_1 ();
+  test_custom_tokens_2 ();
   test_prefixes_and_wrapping ();
   test_urls ();
   test_urls_from_braces ();
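
For readers following the phase-3 pipeline in pp_output_formatted_text above (consolidate the chunks into one token list, merge runs of text tokens, then apply the urlifier to BEGIN_QUOTE, TEXT, END_QUOTE triples), the following is a minimal standalone sketch of the last two passes. It is not the GCC implementation: the `token`, `token_list`, `url_lookup` and `render` names are hypothetical stand-ins built on std::list and std::string rather than pp_token, obstacks and label_text, and an empty string from the lookup callback is assumed to mean "no URL".

```cpp
#include <cassert>
#include <functional>
#include <iterator>
#include <list>
#include <string>

// Hypothetical, simplified stand-ins for pp_token / pp_token_list.
enum class kind { text, begin_quote, end_quote, begin_url, end_url };

struct token
{
  kind m_kind;
  std::string m_value; // payload for text and begin_url tokens
};

using token_list = std::list<token>;

// Collapse each run of adjacent text tokens into the first token of the run.
void
merge_consecutive_text_tokens (token_list &tokens)
{
  for (auto it = tokens.begin (); it != tokens.end (); ++it)
    {
      if (it->m_kind != kind::text)
        continue;
      auto next = std::next (it);
      while (next != tokens.end () && next->m_kind == kind::text)
        {
          it->m_value += next->m_value;
          next = tokens.erase (next);
        }
    }
}

// Callback returning a URL for a quoted string, or "" if there is none
// (an assumption of this sketch; GCC's urlifier returns a char *).
using url_lookup = std::function<std::string (const std::string &)>;

// Wrap TEXT in BEGIN_URL/END_URL wherever the list has a
// BEGIN_QUOTE, TEXT, END_QUOTE triple and LOOKUP knows a URL for TEXT.
void
apply_urlifier (token_list &tokens, const url_lookup &lookup)
{
  for (auto it = tokens.begin (); it != tokens.end (); ++it)
    {
      if (it->m_kind != kind::begin_quote)
        continue;
      auto text = std::next (it);
      if (text == tokens.end () || text->m_kind != kind::text)
        continue;
      auto end_quote = std::next (text);
      if (end_quote == tokens.end () || end_quote->m_kind != kind::end_quote)
        continue;
      const std::string url = lookup (text->m_value);
      if (!url.empty ())
        {
          tokens.insert (text, token {kind::begin_url, url});
          tokens.insert (end_quote, token {kind::end_url, ""});
        }
      it = end_quote; // resume after the triple
    }
}

// Render the list as text, using readable markers instead of escapes.
std::string
render (const token_list &tokens)
{
  std::string out;
  for (const token &tok : tokens)
    switch (tok.m_kind)
      {
      case kind::text: out += tok.m_value; break;
      case kind::begin_quote: out += "`"; break;
      case kind::end_quote: out += "'"; break;
      case kind::begin_url: out += "<url:" + tok.m_value + ">"; break;
      case kind::end_url: out += "</url>"; break;
      }
  return out;
}
```

The merge has to run before the urlifier: a message such as "did you mean %<-%s%>" leaves "-" and "foption" as separate text tokens between the quote tokens, and the triple match only succeeds once they have been joined into "-foption".
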
diff --git a/gcc/pretty-print.h b/gcc/pretty-print.h
index ea81706b5d8a..e0505b2683c2 100644
--- a/gcc/pretty-print.h
+++ b/gcc/pretty-print.h
@@ -70,8 +70,8 @@  enum diagnostic_prefixing_rule_t
 };
 
 class chunk_info;
-class quoting_info;
 class output_buffer;
+class pp_token_list;
 class urlifier;
 
 namespace pp_markup {
@@ -177,7 +177,7 @@  struct pp_wrapping_mode_t
    A client-supplied formatter returns true if everything goes well,
    otherwise it returns false.  */
 typedef bool (*printer_fn) (pretty_printer *, text_info *, const char *,
-			    int, bool, bool, bool, bool *, const char **);
+			    int, bool, bool, bool, bool *, pp_token_list &);
 
 /* Base class for an optional client-supplied object for doing additional
    processing between stages 2 and 3 of formatted printing.  */
@@ -189,6 +189,18 @@  class format_postprocessor
   virtual void handle (pretty_printer *) = 0;
 };
 
+/* Abstract base class for writing formatted tokens to the pretty_printer's
+   text buffer, allowing for output formats and dumpfiles to override
+   how different kinds of tokens are handled.  */
+
+class token_printer
+{
+public:
+  virtual ~token_printer () {}
+  virtual void print_tokens (pretty_printer *pp,
+			     const pp_token_list &tokens) = 0;
+};
+
 inline bool & pp_needs_newline (pretty_printer *pp);
 
 /* True if PRETTY-PRINTER is in line-wrapping mode.  */
@@ -236,6 +248,9 @@  public:
   friend format_postprocessor *& pp_format_postprocessor (pretty_printer *pp);
   friend bool & pp_show_highlight_colors (pretty_printer *pp);
 
+  friend void pp_output_formatted_text (pretty_printer *,
+					const urlifier *);
+
   /* Default construct a pretty printer with specified
      maximum line length cut off limit.  */
   explicit pretty_printer (int = 0);
@@ -250,12 +265,16 @@  public:
     m_buffer->stream = outfile;
   }
 
+  void set_token_printer (token_printer *tp)
+  {
+    m_token_printer = tp; // borrowed
+  }
+
   void set_prefix (char *prefix);
 
   void emit_prefix ();
 
-  void format (text_info *text,
-	       const urlifier *urlifier);
+  void format (text_info *text);
 
   void maybe_space ();
 
@@ -314,8 +333,9 @@  private:
      If the BUFFER needs additional characters from the format string, it
      should advance the TEXT->format_spec as it goes.  When FORMAT_DECODER
      returns, TEXT->format_spec should point to the last character processed.
-     The QUOTE and BUFFER_PTR are passed in, to allow for deferring-handling
-     of format codes (e.g. %H and %I in the C++ frontend).  */
+     The QUOTE and FORMATTED_TOKEN_LIST are passed in, to allow for
+     deferred handling of format codes (e.g. %H and %I in
+     the C++ frontend).  */
   printer_fn m_format_decoder;
 
   /* If non-NULL, this is called by pp_format once after all format codes
@@ -324,6 +344,12 @@  private:
      format codes (which interract with each other).  */
   format_postprocessor *m_format_postprocessor;
 
+  /* This is used by pp_output_formatted_text after it has converted all
+     formatted chunks into a single list of tokens.
+     Can be nullptr.
+     Borrowed from the output format or from dump_pretty_printer.  */
+  token_printer *m_token_printer;
+
   /* Nonzero if current PREFIX was emitted at least once.  */
   bool m_emitted_prefix;
 
@@ -543,10 +569,9 @@  extern void pp_verbatim (pretty_printer *, const char *, ...)
      ATTRIBUTE_GCC_PPDIAG(2,3);
 extern void pp_flush (pretty_printer *);
 extern void pp_really_flush (pretty_printer *);
-inline void pp_format (pretty_printer *pp, text_info *text,
-		       const urlifier *urlifier = nullptr)
+inline void pp_format (pretty_printer *pp, text_info *text)
 {
-  pp->format (text, urlifier);
+  pp->format (text);
 }
 extern void pp_output_formatted_text (pretty_printer *,
 				      const urlifier * = nullptr);
diff --git a/gcc/tree-diagnostic.cc b/gcc/tree-diagnostic.cc
index fc78231dfa44..466725fdd637 100644
--- a/gcc/tree-diagnostic.cc
+++ b/gcc/tree-diagnostic.cc
@@ -55,7 +55,7 @@  default_tree_diagnostic_starter (diagnostic_context *context,
 bool
 default_tree_printer (pretty_printer *pp, text_info *text, const char *spec,
 		      int precision, bool wide, bool set_locus, bool hash,
-		      bool *, const char **)
+		      bool *, pp_token_list &)
 {
   tree t;
 
diff --git a/gcc/tree-diagnostic.h b/gcc/tree-diagnostic.h
index 6ebac381ace8..98ca654c946e 100644
--- a/gcc/tree-diagnostic.h
+++ b/gcc/tree-diagnostic.h
@@ -53,6 +53,6 @@  void diagnostic_report_current_function (diagnostic_context *,
 
 void tree_diagnostics_defaults (diagnostic_context *context);
 bool default_tree_printer (pretty_printer *, text_info *, const char *,
-			   int, bool, bool, bool, bool *, const char **);
+			   int, bool, bool, bool, bool *, pp_token_list &);
 
 #endif /* ! GCC_TREE_DIAGNOSTIC_H */
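
Not part of the patch: the token_printer hook added to pretty-print.h above
may be easier to picture with a standalone sketch.  Everything below is a
hypothetical stand-in rather than GCC code: sketch_token, sketch_token_list,
sketch_printer and json_like_token_printer are invented names, and the
default rendering in sketch_printer::output is an assumption made only for
illustration.  The one thing it tries to mirror is the shape described in
the patch: the printer holds a borrowed, possibly-null pointer to an object
with a single pure-virtual print_tokens hook, and the final output stage
hands the whole token list to that hook when one is set.

```cpp
// Standalone sketch, NOT GCC code.  All names are hypothetical stand-ins.
#include <cassert>
#include <string>
#include <vector>

enum class sketch_kind { text, begin_quote, end_quote };

struct sketch_token
{
  sketch_kind kind;
  std::string text;   // only meaningful for sketch_kind::text
};

using sketch_token_list = std::vector<sketch_token>;

class sketch_printer;

// Same shape as the patch's token_printer: one pure-virtual hook that
// receives the complete token list.
class sketch_token_printer
{
public:
  virtual ~sketch_token_printer () {}
  virtual void print_tokens (sketch_printer *pp,
                             const sketch_token_list &tokens) = 0;
};

class sketch_printer
{
public:
  void set_token_printer (sketch_token_printer *tp)
  {
    m_token_printer = tp; // borrowed, as in the patch
  }

  // Stand-in for the final output stage: delegate to the hook if one is
  // set, otherwise fall back to an (assumed) default rendering.
  void output (const sketch_token_list &tokens)
  {
    if (m_token_printer)
      {
        m_token_printer->print_tokens (this, tokens);
        return;
      }
    for (const sketch_token &tok : tokens)
      switch (tok.kind)
        {
        case sketch_kind::text:        m_buffer += tok.text; break;
        case sketch_kind::begin_quote: m_buffer += '`';      break;
        case sketch_kind::end_quote:   m_buffer += '\'';     break;
        }
  }

  std::string m_buffer;

private:
  sketch_token_printer *m_token_printer = nullptr;
};

// Example override: render both quote tokens as a plain double quote.
class json_like_token_printer : public sketch_token_printer
{
public:
  void print_tokens (sketch_printer *pp,
                     const sketch_token_list &tokens) override
  {
    assert (pp);
    for (const sketch_token &tok : tokens)
      {
        if (tok.kind == sketch_kind::text)
          pp->m_buffer += tok.text;
        else
          pp->m_buffer += '"';
      }
  }
};

// Render the same token list with or without the custom hook installed.
std::string render_example (bool use_hook)
{
  const sketch_token_list tokens = {
    { sketch_kind::text, "option: " },
    { sketch_kind::begin_quote, "" },
    { sketch_kind::text, "-foption" },
    { sketch_kind::end_quote, "" },
  };
  sketch_printer pp;
  json_like_token_printer tp;
  if (use_hook)
    pp.set_token_printer (&tp);
  pp.output (tokens);
  return pp.m_buffer;
}
```

In this sketch the formatting stage never needs to know which sink is in
use: the same token list is rendered differently depending only on whether
a hook has been installed, which is the property the patch description
relies on for output formats and dumpfiles.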