From patchwork Wed Nov 6 14:35:20 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Michal Jires
X-Patchwork-Id: 2007592
Date: Wed, 6 Nov 2024 15:35:20 +0100
From: Michal Jires
To: gcc-patches@gcc.gnu.org
Cc: hubicka@ucw.cz
Subject: [PATCH 3/3] dwarf: lto: Stabilize external die references.
List-Id: Gcc-patches mailing list

During Incremental LTO, the contents of LTO partitions diverge because of
external DIE references (DW_AT_abstract_origin).

External references have the form 'die_symbol+offset'.  Originally there is
only a single die_symbol for each compilation unit, so its offsets are in the
100'000s and easily diverge.

Die symbols have to be unique across compilation units.  Originally, for this
purpose, the die symbol name is computed from a hash of the entire file.
To avoid this I added flag_lto_debuginfo_assume_unique_filepaths
(-flto-debuginfo-assume-unique-filepaths), which computes the die_symbol
only from the filepath.  Unique filepaths seem a reasonable assumption for
any project using Incremental LTO.  The compilation unit's die symbol name
is then prepended to each die symbol for uniqueness.

To remove divergence of offsets in the case of C++, we have to add die
symbols to DW_TAG_subprogram (functions), DW_TAG_variable and
DW_TAG_namespace.

Benefits:
Before this patch, Incremental LTO diverges/recompiles about twice as much
with '-g'.  With it, the additional divergence with '-g' is under 10 %.

Negatives:
When the flag is set, the added die symbols survive into the final
executable.  For the `cc1` executable, the added symbols alone represent an
almost 10 % size increase.  You can strip them out, but I have not found
a simple way to remove them automatically in GCC.  However, for the purposes
of Incremental LTO it should suffice.

No compilation time increase was measured from streaming these additional
symbols/strings.

gcc/ChangeLog:

	* common.opt: New flag.
	* dwarf2out.cc (compute_comp_unit_symbol): With flag, don't checksum
	contents but filepath.
	(compute_die_symbols_from_die): New.
	(compute_die_symbols): New.
	(dwarf2out_early_finish): Call compute_die_symbols.

gcc/testsuite/ChangeLog:

	* g++.dg/lto/die_symbol_conflicts_0.C: New test.
---
 gcc/common.opt                                |   4 +
 gcc/dwarf2out.cc                              | 120 +++++++++++++++++-
 .../g++.dg/lto/die_symbol_conflicts_0.C       |  12 ++
 3 files changed, 132 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/lto/die_symbol_conflicts_0.C

diff --git a/gcc/common.opt b/gcc/common.opt
index 12b25ff486d..4aa80f0df8f 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2253,6 +2253,10 @@ flto-partition=
 Common Joined RejectNegative Enum(lto_partition_model) Var(flag_lto_partition) Init(LTO_PARTITION_BALANCED)
 Specify the algorithm to partition symbols and vars at linktime.

+flto-debuginfo-assume-unique-filepaths
+Common Var(flag_lto_debuginfo_assume_unique_filepaths) Init(0)
+Assume all linked source files have unique filepaths.
+
 ; The initial value of -1 comes from Z_DEFAULT_COMPRESSION in zlib.h.
 flto-compression-level=
 Common Joined RejectNegative UInteger Var(flag_lto_compression_level) Init(-1) IntegerRange(0, 19)
diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc
index bf1ac45ed73..af272a3a824 100644
--- a/gcc/dwarf2out.cc
+++ b/gcc/dwarf2out.cc
@@ -8015,9 +8015,17 @@ compute_comp_unit_symbol (dw_die_ref unit_die)
      the name filename of the unit.  */

   md5_init_ctx (&ctx);
-  mark = 0;
-  die_checksum (unit_die, &ctx, &mark);
-  unmark_all_dies (unit_die);
+  if (flag_lto_debuginfo_assume_unique_filepaths)
+    {
+      gcc_assert (die_name);
+      md5_process_bytes (die_name, strlen (die_name), &ctx);
+    }
+  else
+    {
+      mark = 0;
+      die_checksum (unit_die, &ctx, &mark);
+      unmark_all_dies (unit_die);
+    }
   md5_finish_ctx (&ctx, checksum);

   /* When we this for comp_unit_die () we have a DW_AT_name that might
@@ -33119,6 +33127,110 @@ ctf_debug_do_cu (dw_die_ref die)
   FOR_EACH_CHILD (die, c, ctf_do_die (c));
 }

+/* Recursively compute die symbols from DIE's attributes.
+   Not all symbols can be computed this way.  */
+static void
+compute_die_symbols_from_die (dw_die_ref die)
+{
+  dw_attr_node *a;
+  int i;
+  const char* name = NULL;
+
+  if (!die->die_attr)
+    return;
+
+  switch (die->die_tag)
+    {
+    /* Assume each parent DIE has at most one child namespace with a given
+       name.  */
+    case DW_TAG_namespace:
+    case DW_TAG_module:
+
+      FOR_EACH_VEC_ELT (*die->die_attr, i, a)
+        {
+          if (a->dw_attr == DW_AT_name)
+            name = AT_string (a);
+          /* DW_AT_abstract_origin is ignored; it leads to duplicates.  */
+        }
+      break;
+
+    default: break;
+    }
+
+  if (name)
+    {
+      gcc_assert (!die->die_id.die_symbol);
+      gcc_assert (die->die_parent);
+
+      const char* parent_symbol = die->die_parent->die_id.die_symbol;
+      /* Prefix with parent symbol to guarantee uniqueness.  Important for
+         namespaces.  Toplevel functions and variables can and do use just
+         comp_unit's symbol as prefix.  Die symbols of these toplevel symbols
+         may overlap.  Use the 'r' to differentiate.  */
+      die->die_id.die_symbol = concat (parent_symbol, ".r.", name, NULL);
+    }
+
+  /* Splitting functions has little to no benefit.  */
+  if (die->die_tag == DW_TAG_subprogram)
+    return;
+
+  dw_die_ref c;
+  if (!die->comdat_type_p && die->die_id.die_symbol)
+    FOR_EACH_CHILD (die, c, compute_die_symbols_from_die (c));
+}
+
+/* Compute die symbols and insert them into their DIEs.
+   All die symbols must be unique across compilation units.  */
+static void
+compute_die_symbols ()
+{
+  dw_die_ref comp_unit = comp_unit_die ();
+  compute_comp_unit_symbol (comp_unit);
+
+  /* For Incremental LTO we are interested in stable 'die_symbol+offset'.
+     By computing and emitting die_symbols for more DIEs we get more local
+     and stable offsets.  */
+  if (flag_generate_lto && flag_lto_debuginfo_assume_unique_filepaths)
+    {
+      /* Information about functions is not fully contained in DIEs.  Must be
+         done separately.
+         DIE might not contain DW_AT_linkage_name and DW_AT_name may be not
+         unique.  Even when contained, DW_AT_linkage_name does not have to be
+         unique.  */
+      const char* base = comp_unit->die_id.die_symbol;
+      symtab_node *node;
+      FOR_EACH_SYMBOL (node)
+        {
+          /* There may be multiple functions with the same asm_name.
+             This is rare, so we don't handle them.  */
+          if (node->next_sharing_asm_name || node->previous_sharing_asm_name)
+            continue;
+
+          const char* asm_name = node->asm_name ();
+          /* Mimic what assemble_name_raw does with a leading '*'.  */
+          if (asm_name[0] == '*')
+            asm_name = &asm_name[1];
+
+          dw_die_ref decl_die = lookup_decl_die (node->decl);
+          if (decl_die)
+            decl_die->die_id.die_symbol = concat (base, ".s.", asm_name, NULL);
+
+          tree origin = DECL_ABSTRACT_ORIGIN (node->decl);
+          dw_die_ref origin_die = origin ? lookup_decl_die (origin) : NULL;
+          if (origin_die)
+            {
+              asm_name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (origin));
+              origin_die->die_id.die_symbol = concat (base, ".s.", asm_name,
+                                                      NULL);
+            }
+        }
+
+      /* Other needed die symbols can be computed from DIE information
+         alone.  */
+      compute_die_symbols_from_die (comp_unit);
+    }
+}
+
 /* Perform any cleanups needed after the early debug generation pass
    has run.  */

@@ -33338,7 +33450,7 @@ dwarf2out_early_finish (const char *filename)
     }

   /* Stick a unique symbol to the main debuginfo section.  */
-  compute_comp_unit_symbol (comp_unit_die ());
+  compute_die_symbols ();

   /* Output the main compilation unit.  We always need it if only for
      the CU symbol.  */
diff --git a/gcc/testsuite/g++.dg/lto/die_symbol_conflicts_0.C b/gcc/testsuite/g++.dg/lto/die_symbol_conflicts_0.C
new file mode 100644
index 00000000000..4979b4350f4
--- /dev/null
+++ b/gcc/testsuite/g++.dg/lto/die_symbol_conflicts_0.C
@@ -0,0 +1,12 @@
+/* { dg-lto-do assemble } */
+/* { dg-lto-options { { -O2 -g -flto -flto-debuginfo-assume-unique-filepaths } } } */
+
+/* DIE symbols computed from names might conflict.  */
+
+namespace foo {
+  void foo () asm ("foo");
+}
+
+int main () {
+  foo::foo ();
+}
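
Illustration (not part of the patch): the naming scheme above can be sketched
outside GCC roughly as follows.  This is only a minimal sketch under stated
assumptions.  The helper names unit_symbol, symtab_die_symbol and
scope_die_symbol are hypothetical, std::hash stands in for the MD5 checksum
that dwarf2out.cc uses, and the "unit_" prefix is invented for the example.
It shows why the ".s." and ".r." infixes keep a function with assembler name
"foo" apart from a namespace named "foo", which is the case the new test
exercises.

```cpp
#include <cassert>
#include <cstdio>
#include <functional>
#include <string>

// Hypothetical stand-in for the flag-enabled path of
// compute_comp_unit_symbol: the symbol depends only on the file path,
// never on the unit's contents.  std::hash replaces MD5 for brevity.
std::string unit_symbol(const std::string &filepath) {
  assert(!filepath.empty());  // mirrors gcc_assert (die_name)
  char hex[17];
  std::snprintf(hex, sizeof hex, "%016llx",
                static_cast<unsigned long long>(
                    std::hash<std::string>{}(filepath)));
  return "unit_" + std::string(hex);
}

// Symbol for a function or variable DIE: <unit symbol>.s.<assembler name>.
// A leading '*' is dropped, as the patch does for node->asm_name ().
std::string symtab_die_symbol(const std::string &unit, std::string asm_name) {
  if (!asm_name.empty() && asm_name[0] == '*')
    asm_name.erase(0, 1);
  return unit + ".s." + asm_name;
}

// Symbol for a namespace or module DIE: <parent symbol>.r.<name>.
std::string scope_die_symbol(const std::string &parent,
                             const std::string &name) {
  return parent + ".r." + name;
}
```

With a unit symbol U, symtab_die_symbol (U, "foo") and
scope_die_symbol (U, "foo") give different strings, and nested namespaces
chain their parent's symbol as a prefix.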