From patchwork Mon Jul 1 12:32:34 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Sandiford X-Patchwork-Id: 1954657 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4WCQS51vKGz1xpc for ; Mon, 1 Jul 2024 22:33:07 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 3D58438133C3 for ; Mon, 1 Jul 2024 12:33:05 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id 4818A389941E for ; Mon, 1 Jul 2024 12:32:36 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 4818A389941E Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 4818A389941E Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=217.140.110.172 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1719837159; cv=none; b=dhfgD4Gynu0kgDerjbLTr5F74fbVTyJtBDJfQQgGLcHh1T4Df8wMHf1a9wbe1HcaWfI2DLAq/mIR1X7Uc9iKdVAtdPJ3dwR9Y6SKVP1CCpB9N+O5aAqZ+0R/XxFTRpRzCLNp8f1Pzxa+I39K0Z3wOcWzmoad9Z/DPAHaoiVn9DM= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1719837159; c=relaxed/simple; bh=DOzpXPrYtR+yLmHKfrTDDVe794b2v5UgKQ8kgiM/5rQ=; h=From:To:Subject:Date:Message-ID:MIME-Version; b=utF3GtpgZAEXO2A14FqQDWjRw92xPttOdR36KaKsYViygOeAv5syFg8CzOFey15ttRm6ZJ9ukyixdbxT/+ToyEygp+KLTAnbQBBKoqJEMbYdRmew8jMC8xXwAXG7Ul8FyFemSbAh4h9iTic1uF55wPy3hOLFktSmzVpqURNdC9Y= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id BE825339; Mon, 1 Jul 2024 05:33:00 -0700 (PDT) Received: from localhost (e121540-lin.manchester.arm.com [10.32.110.72]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 565A73F766; Mon, 1 Jul 2024 05:32:35 -0700 (PDT) From: Richard Sandiford To: gcc-patches@gcc.gnu.org Mail-Followup-To: gcc-patches@gcc.gnu.org,jlaw@ventanamicro.com, richard.sandiford@arm.com Cc: jlaw@ventanamicro.com Subject: [PATCH] Give fast DCE a separate dirty flag Date: Mon, 01 Jul 2024 13:32:34 +0100 Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) MIME-Version: 1.0 X-Spam-Status: No, score=-19.6 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_NONE, KAM_DMARC_STATUS, KAM_LAZY_DOMAIN_SECURITY, SPF_HELO_NONE, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Thomas pointed out that we sometimes failed to eliminate some dead code (specifically clobbers of otherwise unused registers) on nvptx when late-combine is enabled. This happens because: - combine is able to optimise the function in a way that exposes dead code. This leaves the df information in a "dirty" state. - late_combine calls df_analyze without DF_LR_RUN_DCE run set. This updates the df information and clears the "dirty" state. - late_combine doesn't find any extra optimisations, and so leaves the df information up-to-date. - if_after_combine (ce2) calls df_analyze with DF_LR_RUN_DCE set. Because the df information is already up-to-date, fast DCE is not run. The upshot is that running late-combine has the effect of suppressing a DCE opportunity that would have been noticed without late-combine. I think this shows that we should track the state of the DCE separately from the LR problem. Every pass updates the latter, but not all passes update the former. Bootstrapped & regression-tested on aarch64-linux-gnu. Thomas also confirms that it fixes the nvptx problem. OK to install? Richard gcc/ * df.h (DF_LR_DCE): New df_problem_id. (df_lr_dce): New macro. * df-core.cc (rest_of_handle_df_finish): Check for a null free_fun. * df-problems.cc (df_lr_finalize): Split out fast DCE handling to... (df_lr_dce_finalize): ...this new function. (problem_LR_DCE): New df_problem. (df_lr_add_problem): Register LR_DCE rather than LR itself. * dce.cc (fast_dce): Clear df_lr_dce->solutions_dirty. --- gcc/dce.cc | 3 ++ gcc/df-core.cc | 3 +- gcc/df-problems.cc | 96 ++++++++++++++++++++++++++++++++-------------- gcc/df.h | 2 + 4 files changed, 74 insertions(+), 30 deletions(-) diff --git a/gcc/dce.cc b/gcc/dce.cc index be1a2a87732..04e8d98818d 100644 --- a/gcc/dce.cc +++ b/gcc/dce.cc @@ -1182,6 +1182,9 @@ fast_dce (bool word_level) BITMAP_FREE (processed); BITMAP_FREE (redo_out); BITMAP_FREE (all_blocks); + + /* Both forms of DCE should make further DCE unnecessary. */ + df_lr_dce->solutions_dirty = false; } diff --git a/gcc/df-core.cc b/gcc/df-core.cc index b0e8a88d433..8fd778a8618 100644 --- a/gcc/df-core.cc +++ b/gcc/df-core.cc @@ -806,7 +806,8 @@ rest_of_handle_df_finish (void) for (i = 0; i < df->num_problems_defined; i++) { struct dataflow *dflow = df->problems_in_order[i]; - dflow->problem->free_fun (); + if (dflow->problem->free_fun) + dflow->problem->free_fun (); } free (df->postorder); diff --git a/gcc/df-problems.cc b/gcc/df-problems.cc index 88ee0dd67fc..bfd24bd1e86 100644 --- a/gcc/df-problems.cc +++ b/gcc/df-problems.cc @@ -1054,37 +1054,10 @@ df_lr_transfer_function (int bb_index) } -/* Run the fast dce as a side effect of building LR. */ - static void -df_lr_finalize (bitmap all_blocks) +df_lr_finalize (bitmap) { df_lr->solutions_dirty = false; - if (df->changeable_flags & DF_LR_RUN_DCE) - { - run_fast_df_dce (); - - /* If dce deletes some instructions, we need to recompute the lr - solution before proceeding further. The problem is that fast - dce is a pessimestic dataflow algorithm. In the case where - it deletes a statement S inside of a loop, the uses inside of - S may not be deleted from the dataflow solution because they - were carried around the loop. While it is conservatively - correct to leave these extra bits, the standards of df - require that we maintain the best possible (least fixed - point) solution. The only way to do that is to redo the - iteration from the beginning. See PR35805 for an - example. */ - if (df_lr->solutions_dirty) - { - df_clear_flags (DF_LR_RUN_DCE); - df_lr_alloc (all_blocks); - df_lr_local_compute (all_blocks); - df_worklist_dataflow (df_lr, all_blocks, df->postorder, df->n_blocks); - df_lr_finalize (all_blocks); - df_set_flags (DF_LR_RUN_DCE); - } - } } @@ -1266,6 +1239,69 @@ static const struct df_problem problem_LR = false /* Reset blocks on dropping out of blocks_to_analyze. */ }; +/* Run the fast DCE after building LR. This is a separate problem so that + the "dirty" flag is only cleared after a DCE pass is actually run. */ + +static void +df_lr_dce_finalize (bitmap all_blocks) +{ + if (!(df->changeable_flags & DF_LR_RUN_DCE)) + return; + + /* Also clears df_lr_dce->solutions_dirty. */ + run_fast_df_dce (); + + /* If dce deletes some instructions, we need to recompute the lr + solution before proceeding further. The problem is that fast + dce is a pessimestic dataflow algorithm. In the case where + it deletes a statement S inside of a loop, the uses inside of + S may not be deleted from the dataflow solution because they + were carried around the loop. While it is conservatively + correct to leave these extra bits, the standards of df + require that we maintain the best possible (least fixed + point) solution. The only way to do that is to redo the + iteration from the beginning. See PR35805 for an + example. */ + if (df_lr->solutions_dirty) + { + df_clear_flags (DF_LR_RUN_DCE); + df_lr_alloc (all_blocks); + df_lr_local_compute (all_blocks); + df_worklist_dataflow (df_lr, all_blocks, df->postorder, df->n_blocks); + df_lr_finalize (all_blocks); + df_set_flags (DF_LR_RUN_DCE); + } +} + +static const struct df_problem problem_LR_DCE +{ + DF_LR_DCE, /* Problem id. */ + DF_BACKWARD, /* Direction (arbitrary). */ + NULL, /* Allocate the problem specific data. */ + NULL, /* Reset global information. */ + NULL, /* Free basic block info. */ + NULL, /* Local compute function. */ + NULL, /* Init the solution specific data. */ + NULL, /* Worklist solver. */ + NULL, /* Confluence operator 0. */ + NULL, /* Confluence operator n. */ + NULL, /* Transfer function. */ + df_lr_dce_finalize, /* Finalize function. */ + NULL, /* Free all of the problem information. */ + NULL, /* Remove this problem from the stack of dataflow problems. */ + NULL, /* Debugging. */ + NULL, /* Debugging start block. */ + NULL, /* Debugging end block. */ + NULL, /* Debugging start insn. */ + NULL, /* Debugging end insn. */ + NULL, /* Incremental solution verify start. */ + NULL, /* Incremental solution verify end. */ + &problem_LR, /* Dependent problem. */ + 0, /* Size of entry of block_info array. */ + TV_DF_LR, /* Timing variable. */ + false /* Reset blocks on dropping out of blocks_to_analyze. */ +}; + /* Create a new DATAFLOW instance and add it to an existing instance of DF. The returned structure is what is used to get at the @@ -1274,7 +1310,9 @@ static const struct df_problem problem_LR = void df_lr_add_problem (void) { - df_add_problem (&problem_LR); + /* Also add the fast DCE problem. It is then conditionally enabled by + the DF_LR_RUN_DCE flag. */ + df_add_problem (&problem_LR_DCE); /* These will be initialized when df_scan_blocks processes each block. */ df_lr->out_of_date_transfer_functions = BITMAP_ALLOC (&df_bitmap_obstack); diff --git a/gcc/df.h b/gcc/df.h index c4e690b40cf..bbb4f297b47 100644 --- a/gcc/df.h +++ b/gcc/df.h @@ -47,6 +47,7 @@ enum df_problem_id { DF_SCAN, DF_LR, /* Live Registers backward. */ + DF_LR_DCE, /* Dead code elimination post-pass for LR. */ DF_LIVE, /* Live Registers & Uninitialized Registers */ DF_RD, /* Reaching Defs. */ DF_CHAIN, /* Def-Use and/or Use-Def Chains. */ @@ -940,6 +941,7 @@ extern class df_d *df; #define df_scan (df->problems_by_index[DF_SCAN]) #define df_rd (df->problems_by_index[DF_RD]) #define df_lr (df->problems_by_index[DF_LR]) +#define df_lr_dce (df->problems_by_index[DF_LR_DCE]) #define df_live (df->problems_by_index[DF_LIVE]) #define df_chain (df->problems_by_index[DF_CHAIN]) #define df_word_lr (df->problems_by_index[DF_WORD_LR])