From patchwork Mon Jul 1 11:55:40 2024
X-Patchwork-Submitter: Thomas Schwinge
X-Patchwork-Id: 1954647
From: Thomas Schwinge
To: gcc-patches@gcc.gnu.org
Cc: Richard Sandiford, jlaw@ventanamicro.com, rdapp.gcc@gmail.com, Tom de Vries, Roger Sayle
Subject: WIP Move 'pass_fast_rtl_dce' from 'pass_postreload' into 'pass_late_compilation' (was: nvptx vs.
 [PATCH] Add a late-combine pass [PR106594])
In-Reply-To: <87ed8i2ekt.fsf@euler.schwinge.ddns.net>
References: <87r0citjoy.fsf@euler.schwinge.ddns.net>
 <87r0ci2kt2.fsf@euler.schwinge.ddns.net>
 <87jzia2ict.fsf@euler.schwinge.ddns.net>
 <87ed8i2ekt.fsf@euler.schwinge.ddns.net>
User-Agent: Notmuch/0.30+8~g47a4bad (https://notmuchmail.org) Emacs/29.3 (x86_64-pc-linux-gnu)
Date: Mon, 01 Jul 2024 13:55:40 +0200
Message-ID: <87bk3huy0z.fsf@euler.schwinge.ddns.net>
MIME-Version: 1.0
List-Id: Gcc-patches mailing list

Hi!

On 2024-06-28T00:41:54+0200, I wrote:
> On 2024-06-27T23:20:18+0200, I wrote:
>> On 2024-06-27T22:27:21+0200, I wrote:
>>> On 2024-06-27T18:49:17+0200, I wrote:
>>>> On 2023-10-24T19:49:10+0100, Richard Sandiford wrote:
>>>>> This patch adds a combine pass that runs late in the pipeline.
>>>
>>> [After sending, I realized I replied to a previous thread of this work.]
>>>
>>>> I've been looking a bit through recent nvptx target code generation
>>>> changes for GCC target libraries, and thought I'd also share here my
>>>> findings for the "late-combine" changes in isolation, for nvptx target.
>>>>
>>>> First the unexpected thing:
>>>
>>> So much for "unexpected thing" -- next level of unexpected here...
>>> Appreciated if anyone feels like helping me find my way through this, but
>>> I totally understand if you've got other things to do.
>>
>> OK, I found something already.  (Unexpectedly quickly...)  ;-)
>>
>>>> there are a few cases where we now see unused
>>>> registers get declared
>
>> But in fact, for both cases
>
> Now tested: 's%both%all'.  :-)
>
>> the unexpected difference goes away if after
>> 'pass_late_combine' I inject a 'pass_fast_rtl_dce'.

The following will be unnecessary assuming that Richard's proposed
"Give fast DCE a separate dirty flag" gets accepted, but may still be
useful if we follow through with the idea to enable (parts of)
'pass_postreload' for nvptx (as I'm discussing with Roger), so, for later:

>> The following makes these two cases work, but evidently needs a lot more
>> analysis: a lot of other passes are enabled that may be anything between
>> beneficial and harmful for 'targetm.no_register_allocation'/nvptx.
>>
>>     --- gcc/passes.cc
>>     +++ gcc/passes.cc
>>     @@ -676,17 +676,17 @@ const pass_data pass_data_postreload =
>>      class pass_postreload : public rtl_opt_pass
>>      {
>>      public:
>>        pass_postreload (gcc::context *ctxt)
>>          : rtl_opt_pass (pass_data_postreload, ctxt)
>>        {}
>>
>>        /* opt_pass methods: */
>>     -  bool gate (function *) final override { return reload_completed; }
>>     +  bool gate (function *) final override { return reload_completed || targetm.no_register_allocation; }
>>     --- gcc/regcprop.cc
>>     +++ gcc/regcprop.cc
>>     @@ -1305,17 +1305,17 @@ class pass_cprop_hardreg : public rtl_opt_pass
>>      public:
>>        pass_cprop_hardreg (gcc::context *ctxt)
>>          : rtl_opt_pass (pass_data_cprop_hardreg, ctxt)
>>        {}
>>
>>        /* opt_pass methods: */
>>        bool gate (function *) final override
>>        {
>>     -    return (optimize > 0 && (flag_cprop_registers));
>>     +    return (optimize > 0 && flag_cprop_registers && !targetm.no_register_allocation);
>>        }
>
> Also, that quickly ICEs; more '[...] && !targetm.no_register_allocation'
> are needed elsewhere, at least.
>
> The following simpler thing, however, does work; move 'pass_fast_rtl_dce'
> out of 'pass_postreload':
>
>     --- gcc/passes.cc
>     +++ gcc/passes.cc
>     @@ -677,14 +677,15 @@ class pass_postreload : public rtl_opt_pass
>      {
>      public:
>        pass_postreload (gcc::context *ctxt)
>          : rtl_opt_pass (pass_data_postreload, ctxt)
>        {}
>
>        /* opt_pass methods: */
>     +  opt_pass * clone () final override { return new pass_postreload (m_ctxt); }
>        bool gate (function *) final override { return reload_completed; }
>
>      }; // class pass_postreload
>     --- gcc/passes.def
>     +++ gcc/passes.def
>     @@ -529,7 +529,10 @@ along with GCC; see the file COPYING3.  If not see
>            NEXT_PASS (pass_regrename);
>            NEXT_PASS (pass_fold_mem_offsets);
>            NEXT_PASS (pass_cprop_hardreg);
>     -      NEXT_PASS (pass_fast_rtl_dce);
>     +  POP_INSERT_PASSES ()
>     +  NEXT_PASS (pass_fast_rtl_dce);
>     +  NEXT_PASS (pass_postreload);
>     +  PUSH_INSERT_PASSES_WITHIN (pass_postreload)
>            NEXT_PASS (pass_reorder_blocks);
>            NEXT_PASS (pass_leaf_regs);
>            NEXT_PASS (pass_split_before_sched2);
>
> This (only) cleans up "the mess that 'pass_late_combine' created"; no
> further changes in GCC target libraries for nvptx.  (For avoidance of
> doubt: "mess" is a great exaggeration here.)

But that then disturbs non-nvptx targets; see (prerequisite)
"Handle 'NUM' in 'PUSH_INSERT_PASSES_WITHIN'" for why.  Then, see the
attached -- just for later, for now -- "WIP Move 'pass_fast_rtl_dce' from
'pass_postreload' into 'pass_late_compilation'" for how to make this work
properly.  (This also puts back 'pass_fast_rtl_dce' into
'pass_late_compilation' instead of running it unconditionally, in order
to not change any behavior in that regard.)


Grüße
 Thomas


>>> But: should we expect '-fno-late-combine-instructions' vs.
>>> '-flate-combine-instructions' to behave in the same way?  (After all,
>>> '%r22' remains unused also with '-flate-combine-instructions', and
>>> doesn't need to be emitted.)  This could, of course, also be a nvptx back
>>> end issue?
>>>
>>> I'm happy to supply any dump files etc.  Also, 'tmp-libc_a-lnumeric.i.xz'
>>> is attached if you'd like to reproduce this with your own nvptx target
>>> 'cc1':
>>>
>>>     $ [...]/configure --target=nvptx-none --enable-languages=c
>>>     $ make -j12 all-gcc
>>>     $ gcc/cc1 -fpreprocessed tmp-libc_a-lnumeric.i -quiet -dumpbase tmp-libc_a-lnumeric.c -dumpbase-ext .c -misa=sm_30 -g -O2 -fno-builtin -o tmp-libc_a-lnumeric.s -fdump-rtl-all # -fno-late-combine-instructions
>>>
>>>
>>> Grüße
>>>  Thomas

From ef14e15c3255059f374e04a47d838e9c98c9da2c Mon Sep 17 00:00:00 2001
From: Thomas Schwinge
Date: Fri, 28 Jun 2024 00:41:54 +0200
Subject: [PATCH] WIP Move 'pass_fast_rtl_dce' from 'pass_postreload' into
 'pass_late_compilation'

id:87ed8i2ekt.fsf@euler.schwinge.ddns.net
---
 gcc/passes.cc  | 8 ++++++++
 gcc/passes.def | 6 ++++++
 2 files changed, 14 insertions(+)

diff --git a/gcc/passes.cc b/gcc/passes.cc
index e444b462113..1cdd4a77f5b 100644
--- a/gcc/passes.cc
+++ b/gcc/passes.cc
@@ -685,6 +685,10 @@ public:
   {}
 
   /* opt_pass methods: */
+  opt_pass *clone () final override
+  {
+    return new pass_postreload (m_ctxt);
+  }
   bool gate (function *) final override
     {
       if (reload_completed)
@@ -728,6 +732,10 @@ public:
   {}
 
   /* opt_pass methods: */
+  opt_pass *clone () final override
+  {
+    return new pass_late_compilation (m_ctxt);
+  }
   bool gate (function *) final override
   {
     return reload_completed || targetm.no_register_allocation;
diff --git a/gcc/passes.def b/gcc/passes.def
index 72198bc4c4e..cb221438a1e 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -529,7 +529,13 @@ along with GCC; see the file COPYING3.  If not see
 	  NEXT_PASS (pass_regrename);
 	  NEXT_PASS (pass_fold_mem_offsets);
 	  NEXT_PASS (pass_cprop_hardreg);
+      POP_INSERT_PASSES ()
+      NEXT_PASS (pass_late_compilation);
+      PUSH_INSERT_PASSES_WITHIN (pass_late_compilation)
 	  NEXT_PASS (pass_fast_rtl_dce);
+      POP_INSERT_PASSES ()
+      NEXT_PASS (pass_postreload);
+      PUSH_INSERT_PASSES_WITHIN (pass_postreload)
 	  NEXT_PASS (pass_reorder_blocks);
 	  NEXT_PASS (pass_leaf_regs);
 	  NEXT_PASS (pass_split_before_sched2);
-- 
2.34.1
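
For illustration only (not part of the patch): a toy, standalone C++ model of why the 'passes.def' change above lets 'pass_fast_rtl_dce' run for a 'targetm.no_register_allocation' target. It assumes only what the diffs show: 'pass_postreload' is gated on 'reload_completed', 'pass_late_compilation' on 'reload_completed || targetm.no_register_allocation', and a group pass whose gate fails skips its sub-passes. All types and helpers here ('ToyState', 'ToyPass', 'leaf', 'group', 'run_layout', 'layout_before', 'layout_after') are hypothetical and are not GCC's pass manager API.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the two conditions the gates in the patch test.
struct ToyState {
  bool reload_completed;
  bool no_register_allocation;  // models 'targetm.no_register_allocation'
};

using Gate = bool (*)(const ToyState &);

// A toy pass: a name, a gate, and optionally nested sub-passes.
struct ToyPass {
  std::string name;
  Gate gate;
  std::vector<ToyPass> sub;
};

bool always(const ToyState &) { return true; }
bool postreload_gate(const ToyState &st) { return st.reload_completed; }
bool late_compilation_gate(const ToyState &st) {
  return st.reload_completed || st.no_register_allocation;
}

ToyPass leaf(std::string name) { return ToyPass{std::move(name), always, {}}; }
ToyPass group(std::string name, Gate gate, std::vector<ToyPass> sub) {
  return ToyPass{std::move(name), gate, std::move(sub)};
}

// A pass whose gate fails is skipped together with all of its sub-passes.
void run_passes(const std::vector<ToyPass> &passes, const ToyState &st,
                std::vector<std::string> &executed) {
  for (const ToyPass &p : passes) {
    if (!p.gate(st)) continue;
    executed.push_back(p.name);
    run_passes(p.sub, st, executed);
  }
}

std::vector<std::string> run_layout(const std::vector<ToyPass> &layout,
                                    const ToyState &st) {
  std::vector<std::string> executed;
  run_passes(layout, st, executed);
  return executed;
}

// Before: fast DCE sits inside the postreload group.
std::vector<ToyPass> layout_before() {
  return {group("postreload", postreload_gate,
                {leaf("cprop_hardreg"), leaf("fast_rtl_dce"),
                 leaf("reorder_blocks")})};
}

// After: the postreload group is split around a late_compilation group that
// holds fast DCE; this models the two extra instances that need 'clone ()'.
std::vector<ToyPass> layout_after() {
  return {group("postreload", postreload_gate, {leaf("cprop_hardreg")}),
          group("late_compilation", late_compilation_gate,
                {leaf("fast_rtl_dce")}),
          group("postreload", postreload_gate, {leaf("reorder_blocks")})};
}
```

Calling 'run_layout (layout_after (), ToyState{false, true})' in this model yields only the 'late_compilation' group and 'fast_rtl_dce', whereas 'layout_before ()' yields nothing for the same state; with 'reload_completed' set, the leaf passes run in the same relative order in both layouts.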