[pushed,3/4] pretty-print: reimplement pp_format with a new struct pp_token

The following patch rewrites the internals of pp_format.

A pretty_printer's output_buffer maintains a stack of chunk_info
instances, each one responsible for handling a call to pp_format, where
having a stack allows us to support re-entrant calls to pp_format on the
same pretty_printer.

Previously a chunk_info merely stored buffers of accumulated text
per unformatted run and per formatted argument.

This led to various special cases for handling:

- urlifiers, needing class quoting_info to handle awkward cases where
  the run of quoted text could be split between stages 1 and 2
  of formatting

- dumpfiles, where the optinfo machinery could lead to objects being
  stashed during formatting for later replay to JSON optimization
  records

- in the C++ frontend, the format codes %H and %I can't be processed
  until we've seen both, leading to awkward code to manipulate the
  text buffers

Further, supporting URLs in messages in SARIF output (PR other/116419)
would add additional manipulations of text buffers, since our internal
pp_begin_url API gives the URL at the beginning of the wrapped text,
whereas SARIF's format for embedded URLs has the URL *after* the wrapped
text.  Also when handling "%@" we wouldn't necessarily know the URL of
an event ID until later, requiring further nasty special-case
manipulation of text buffers.

This patch rewrites pretty-print formatting by introducing a new
intermediate representation during formatting: pp_token and
pp_token_list.  Rather than simply accumulating a buffer of "char" in
the chunk_obstack during formatting, we now also accumulate a
pp_token_list, a doubly-linked list of pp_token, which can be:
- text buffers
- begin/end colorization
- begin/end quote
- begin/end URL
- "custom data" tokens
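
As a rough illustration of the token kinds listed above, here is a
minimal sketch.  The names token_kind, token and token_list are
hypothetical simplifications, not the real pp_token classes from
pretty-print-format-impl.h, and std::list stands in for the
obstack-allocated doubly-linked list.  The helper only mirrors the idea
behind pp_token_list::merge_consecutive_text_tokens.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>

// Hypothetical, simplified stand-ins for pp_token / pp_token_list.
// std::list is used purely for brevity; the patch allocates its
// tokens on the chunk_obstack instead.
enum class token_kind
{
  text, begin_color, end_color, begin_quote, end_quote,
  begin_url, end_url, custom_data
};

struct token
{
  token_kind kind;
  std::string value; // text, color name, or URL; empty otherwise
};

using token_list = std::list<token>; // doubly-linked

// Coalesce runs of adjacent text tokens into a single text token.
inline void
merge_consecutive_text_tokens (token_list &tokens)
{
  auto it = tokens.begin ();
  while (it != tokens.end ())
    {
      auto next = std::next (it);
      if (next != tokens.end ()
          && it->kind == token_kind::text
          && next->kind == token_kind::text)
        {
          // Absorb the following text token and re-check from "it".
          it->value += next->value;
          tokens.erase (next);
        }
      else
        it = next;
    }
}
```

Non-text tokens act as boundaries here, so text on either side of a
begin/end quote is never merged across the quote.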

Working at the level of tokens rather than just text buffers allows the
various awkward special cases above to be replaced with uniform logic.
For example, all "urlification" is now done in phase 3 of formatting,
in one place, by looking for [..., BEGIN_QUOTE, TEXT, END_QUOTE, ...]
and injecting BEGIN_URL and END_URL wrapper tokens when the urlifier
has a URL for TEXT.  Doing so greatly simplifies the urlifier code,
allowing the removal of class quoting_info.
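
The pattern match described above can be sketched as follows.  This is
an illustration under simplifying assumptions, not the code of
pp_token_list::apply_urlifier: the types kind, tok and urlifier_fn are
hypothetical, the token list is a std::vector, and the urlifier is
modelled as a callback returning an empty string when it has no URL.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical simplified token model; not GCC's actual pp_token classes.
enum class kind { text, begin_quote, end_quote, begin_url, end_url };

struct tok
{
  kind k;
  std::string value; // text for kind::text, URL for kind::begin_url
};

// Stand-in for a urlifier: returns a URL for TEXT, or "" if it has none.
using urlifier_fn = std::function<std::string (const std::string &)>;

// Scan for [BEGIN_QUOTE, TEXT, END_QUOTE] and wrap TEXT in
// BEGIN_URL/END_URL whenever the urlifier knows a URL for it.
std::vector<tok>
apply_urlifier (const std::vector<tok> &in, const urlifier_fn &urlifier)
{
  std::vector<tok> out;
  for (std::size_t i = 0; i < in.size (); ++i)
    {
      const bool quoted_text
        = (in[i].k == kind::text
           && i > 0 && in[i - 1].k == kind::begin_quote
           && i + 1 < in.size () && in[i + 1].k == kind::end_quote);
      std::string url;
      if (quoted_text)
        url = urlifier (in[i].value);
      if (url.empty ())
        {
          // Not a quoted run, or no URL known: copy through unchanged.
          out.push_back (in[i]);
          continue;
        }
      out.push_back ({kind::begin_url, url});
      out.push_back (in[i]);
      out.push_back ({kind::end_url, ""});
    }
  return out;
}
```

Because the quote boundaries are explicit tokens, the check needs no
knowledge of which formatting phase produced the quoted text.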

The tokens and token lists are allocated on the chunk_obstack, and so
there's no additional heap activity required, with the memory reclaimed
when the chunk_obstack is freed after phase 3 of formatting.
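
The allocation strategy can be pictured with a toy bump allocator.
This is only a sketch of the general technique (class-specific
operator new drawing from a region that is released in one go); the
class arena and struct node are hypothetical and much simpler than
GCC's obstack and pp_token.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Toy bump allocator standing in for an obstack (hypothetical).
class arena
{
public:
  void *allocate (std::size_t size, std::size_t align)
  {
    // Round the current offset up to the requested alignment.
    const std::size_t start = (m_used + align - 1) / align * align;
    if (start + size > sizeof m_buf)
      throw std::bad_alloc ();
    m_used = start + size;
    return m_buf + start;
  }
  // Reclaim everything at once, as when an obstack is freed.
  void release_all () { m_used = 0; }
  std::size_t used () const { return m_used; }

private:
  alignas (std::max_align_t) unsigned char m_buf[1024];
  std::size_t m_used = 0;
};

// A list node whose storage comes from the arena, not the heap.
struct node
{
  int value;
  node *prev;
  node *next;

  static void *operator new (std::size_t sz, arena &a)
  {
    return a.allocate (sz, alignof (node));
  }
  // Matching placement delete, used only if construction throws.
  static void operator delete (void *, arena &) {}
  // Individual nodes are never freed; the arena is released wholesale.
  static void operator delete (void *) {}
};
```

With this in place, "new (a) node{1, nullptr, nullptr}" places the node
in the arena "a", and a single release_all call reclaims every node.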

New kinds of pp_token can be added as needed to support output formats.
For example, the followup patch adds a token for "%@" for event IDs, to
better support SARIF output.
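
To illustrate why explicit URL tokens help with output formats that
place the URL after the wrapped text, here is a small sketch of two
printers consuming the same token stream.  Everything in it is
hypothetical: the types kind and tok, both printer functions, and the
concrete output syntaxes ("<url:...>" markers and a Markdown-like
"[text](url)" form) are chosen for illustration and are not what GCC
emits.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical simplified token model; not GCC's actual classes.
enum class kind { text, begin_url, end_url };

struct tok
{
  kind k;
  std::string value; // text, or the URL for kind::begin_url
};

// URL-first rendering: the URL is emitted when the link opens.
// The "<url:...>" and "</url>" markers are made up for this sketch.
std::string
print_url_first (const std::vector<tok> &tokens)
{
  std::string out;
  for (const tok &t : tokens)
    switch (t.k)
      {
      case kind::text: out += t.value; break;
      case kind::begin_url: out += "<url:" + t.value + ">"; break;
      case kind::end_url: out += "</url>"; break;
      }
  return out;
}

// URL-last rendering, "[text](url)": remember the URL from the
// BEGIN_URL token and emit it once END_URL is reached.
std::string
print_url_last (const std::vector<tok> &tokens)
{
  std::string out;
  std::string pending_url;
  for (const tok &t : tokens)
    switch (t.k)
      {
      case kind::text: out += t.value; break;
      case kind::begin_url: pending_url = t.value; out += "["; break;
      case kind::end_url: out += "](" + pending_url + ")"; break;
      }
  return out;
}
```

Neither printer has to rewrite text that was already emitted; the
ordering difference is handled entirely when the tokens are printed.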

No functional change intended.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Lightly tested with valgrind.
Pushed to trunk as r15-3311-ge31b6176996567.

gcc/c/ChangeLog:
	* c-objc-common.cc (c_tree_printer): Convert final param from
	const char ** to pp_token_list &.

gcc/cp/ChangeLog:
	* error.cc: Include "make-unique.h".
	(deferred_printed_type::m_buffer_ptr): Replace with...
	(deferred_printed_type::m_printed_text): ...this and...
	(deferred_printed_type::m_token_list): ...this.
	(deferred_printed_type::deferred_printed_type): Update ctors for
	above changes.
	(deferred_printed_type::set_text_for_token_list): New.
	(append_formatted_chunk): Pass chunk_obstack to
	append_formatted_chunk.
	(add_quotes): Delete.
	(cxx_format_postprocessor::handle): Reimplement to call
	deferred_printed_type::set_text_for_token_list, rather than store
	buffer pointers.
	(defer_phase_2_of_type_diff): Replace param "buffer_ptr"
	with "formatted_token_list".  Reimplement by storing
	a pointer to formatted_token_list so that the postprocessor can
	put its text there.
	(cp_printer): Convert param "buffer_ptr" to
	"formatted_token_list".  Update calls to
	defer_phase_2_of_type_diff accordingly.

gcc/ChangeLog:
	* diagnostic.cc (diagnostic_context::report_diagnostic): Don't
	pass m_urlifier to pp_format, as urlification now happens in
	phase 3.
	* dump-context.h (class dump_pretty_printer): Update leading
	comment.
	(dump_pretty_printer::emit_items): Drop decl.
	(dump_pretty_printer::set_optinfo): New.
	(class dump_pretty_printer::stashed_item): Delete class.
	(class dump_pretty_printer::custom_token_printer): New class.
	(dump_pretty_printer::format_decoder_cb): Convert param from
	const char ** to pp_token_list &.
	(dump_pretty_printer::decode_format): Likewise.
	(dump_pretty_printer::stash_item): Likewise.
	(dump_pretty_printer::emit_any_pending_textual_chunks): Drop decl.
	(dump_pretty_printer::m_stashed_items): Delete field.
	(dump_pretty_printer::m_token_printer): New member data.
	* dumpfile.cc (struct wrapped_optinfo_item): New.
	(dump_pretty_printer::dump_pretty_printer): Update for dropping
	of field m_stashed_items and new field m_token_printer.
	(dump_pretty_printer::emit_items): Delete; we now use
	pp_output_formatted_text.
	(dump_pretty_printer::emit_any_pending_textual_chunks): Delete.
	(dump_pretty_printer::stash_item): Convert param from
	const char ** to pp_token_list &.
	(dump_pretty_printer::format_decoder_cb): Likewise.
	(dump_pretty_printer::decode_format): Likewise.
	(dump_pretty_printer::custom_token_printer::print_tokens): New.
	(dump_pretty_printer::custom_token_printer::emit_any_pending_textual_chunks):
	New.
	(dump_context::dump_printf_va): Call set_optinfo on the
	dump_pretty_printer.  Replace call to emit_items with a call to
	pp_output_formatted_text.
	* opt-problem.cc (opt_problem::opt_problem): Replace call to
	emit_items with call to set_optinfo and call to
	pp_output_formatted_text.
	* pretty-print-format-impl.h (struct pp_token): New.
	(struct pp_token_text): New.
	(is_a_helper <pp_token_text *>::test): New.
	(is_a_helper <const pp_token_text *>::test): New.
	(struct pp_token_begin_color): New.
	(is_a_helper <pp_token_begin_color *>::test): New.
	(is_a_helper <const pp_token_begin_color *>::test): New.
	(struct pp_token_end_color): New.
	(struct pp_token_begin_quote): New.
	(struct pp_token_end_quote): New.
	(struct pp_token_begin_url): New.
	(is_a_helper <pp_token_begin_url*>::test): New.
	(is_a_helper <const pp_token_begin_url*>::test): New.
	(struct pp_token_end_url): New.
	(struct pp_token_custom_data): New.
	(is_a_helper <pp_token_custom_data *>::test): New.
	(is_a_helper <const pp_token_custom_data *>::test): New.
	(class pp_token_list): New.
	(chunk_info::get_args): Drop.
	(chunk_info::get_quoting_info): Drop.
	(chunk_info::get_token_lists): New accessor.
	(chunk_info::append_formatted_chunk): Add obstack & param.
	(chunk_info::dump): New decls.
	(chunk_info::m_args): Convert element type from const char * to
	pp_token_list *.  Rewrite/update comment.
	(chunk_info::m_quotes): Drop field.
	* pretty-print-markup.h (class pp_token_list): New forward decl.
	(pp_markup::context::context): Drop urlifier param; add
	formatted_token_list param.
	(pp_markup::context::push_back_any_text): New decl.
	(pp_markup::context::m_urlifier): Drop field.
	(pp_markup::context::m_formatted_token_list): New field.
	* pretty-print-urlifier.h: Update comment.
	* pretty-print.cc: Define INCLUDE_MEMORY.  Include
	"make-unique.h".
	(default_token_printer): New forward decl.
	(obstack_append_string): Delete.
	(urlify_quoted_string): Delete.
	(pp_token::pp_token): New.
	(pp_token::dump): New.
	(allocate_object): New.
	(class quoting_info): Delete.
	(pp_token::operator new): New.
	(pp_token::operator delete): New.
	(pp_token_list::operator new): New.
	(pp_token_list::operator delete): New.
	(pp_token_list::pp_token_list): New.
	(pp_token_list::~pp_token_list): New.
	(pp_token_list::push_back_text): New.
	(pp_token_list::push_back): New.
	(pp_token_list::push_back_list): New.
	(pp_token_list::pop_front): New.
	(pp_token_list::remove_token): New.
	(pp_token_list::insert_after): New.
	(pp_token_list::replace_custom_tokens): New.
	(pp_token_list::merge_consecutive_text_tokens): New.
	(pp_token_list::apply_urlifier): New.
	(pp_token_list::dump): New.
	(chunk_info::append_formatted_chunk): Add obstack & param and use
	it to reimplement in terms of token lists.
	(chunk_info::pop_from_output_buffer): Drop m_quotes.
	(chunk_info::on_begin_quote): Delete.
	(chunk_info::dump): New.
	(chunk_info::on_end_quote): Delete.
	(push_back_any_text): New.
	(pretty_printer::format): Drop "urlifier" param and quoting_info
	logic.  Convert "formatters" and "args" from const ** to
	pp_token_list **.  Reimplement so that rather than just
	accumulating a text buffer in the chunk_obstack for each arg,
	instead also accumulate a pp_token_list and pp_tokens for each
	arg.
	(auto_obstack::operator obstack &): New.
	(quoting_info::handle_phase_3): Delete.
	(pp_output_formatted_text): Reimplement in terms of manipulations
	of pp_token_lists, rather than char buffers.  Call
	default_token_printer, or m_token_printer's print_tokens vfunc.
	(default_token_printer): New.
	(pretty_printer::pretty_printer): Initialize m_token_printer in
	both ctors.
	(pp_markup::context::begin_quote): Reimplement to use token list.
	(pp_markup::context::end_quote): Likewise.
	(pp_markup::context::begin_highlight_color): Likewise.
	(pp_markup::context::end_highlight_color): Likewise.
	(pp_markup::context::push_back_any_text): New.
	(selftest::test_merge_consecutive_text_tokens): New.
	(selftest::test_custom_tokens_1): New.
	(selftest::test_custom_tokens_2): New.
	(selftest::pp_printf_with_urlifier): Drop "urlifier" param from
	call to pp_format.
	(selftest::test_urlification): Add test of the example from
	pretty-print-format-impl.h.
	(selftest::pretty_print_cc_tests): Call the new selftest
	functions.
	* pretty-print.h (class quoting_info): Drop forward decl.
	(class pp_token_list): New forward decl.
	(printer_fn): Convert final param from const char ** to
	pp_token_list &.
	(class token_printer): New.
	(class pretty_printer): Add pp_output_formatted_text as friend.
	(pretty_printer::set_token_printer): New.
	(pretty_printer::format): Drop urlifier param as this now happens
	in phase 3.
	(pretty_printer::m_format_decoder): Update comment.
	(pretty_printer::m_token_printer): New field.
	(pp_format): Drop urlifier param.
	* tree-diagnostic.cc (default_tree_printer): Convert final param
	from const char ** to pp_token_list &.
	* tree-diagnostic.h: Likewise for decl.

gcc/fortran/ChangeLog:
	* error.cc (gfc_format_decoder): Convert final param from
	const char **buffer_ptr to pp_token_list &formatted_token_list,
	and update call to default_tree_printer accordingly.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
---
 gcc/c/c-objc-common.cc         |    4 +-
 gcc/cp/error.cc                |  105 +--
 gcc/diagnostic.cc              |    2 +-
 gcc/dump-context.h             |   40 +-
 gcc/dumpfile.cc                |  217 +++---
 gcc/fortran/error.cc           |    5 +-
 gcc/opt-problem.cc             |    3 +-
 gcc/pretty-print-format-impl.h |  407 ++++++++++-
 gcc/pretty-print-markup.h      |   10 +-
 gcc/pretty-print-urlifier.h    |    2 +-
 gcc/pretty-print.cc            | 1176 ++++++++++++++++++++++----------
 gcc/pretty-print.h             |   43 +-
 gcc/tree-diagnostic.cc         |    2 +-
 gcc/tree-diagnostic.h          |    2 +-
 14 files changed, 1483 insertions(+), 535 deletions(-)

Message ID	20240829225813.2567570-3-dmalcolm@redhat.com
State	New
From	David Malcolm <dmalcolm@redhat.com>
To	gcc-patches@gcc.gnu.org
Date	Thu, 29 Aug 2024 18:58:12 -0400
Series	[pushed,1/4] Use std::unique_ptr for optinfo_item
	[pushed,2/4] pretty-print: move class chunk_info into its own header
	[pushed,3/4] pretty-print: reimplement pp_format with a new struct pp_token
	[pushed,4/4] SARIF output: implement embedded URLs in messages (§3.11.6; PR other/116419)
