From patchwork Sat Jul 8 16:10:09 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Hubicka X-Patchwork-Id: 1805201 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=DnJmw0n9; dkim-atps=neutral Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-384) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4QywGn67T1z20Nq for ; Sun, 9 Jul 2023 02:10:39 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 1366C3858C78 for ; Sat, 8 Jul 2023 16:10:35 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 1366C3858C78 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1688832635; bh=IH/0J17/pl9gmdQnJHhxoG+ft0STjJJJFEAyfTDwU9o=; h=Date:To:Subject:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:From; b=DnJmw0n9wbQbX6yJIU+O6QbSxvMe8MmJyuqrK2hlPCK5ocA7O6SdkI9XNDLVJSAOz mrRm13zOQNWwbcutNunSow/Z8QggVy0lTp5dlMpRu5EHsLVo0pnvkYjJ1hP9GX6kPe 2VsrUhCNUZ+9EbmS1SVt367Em0ESZ6Zr72Mx6tBU= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from nikam.ms.mff.cuni.cz (nikam.ms.mff.cuni.cz [195.113.20.16]) by sourceware.org (Postfix) with ESMTPS id 3F1F23858D1E for ; Sat, 8 Jul 2023 16:10:11 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 3F1F23858D1E Received: by nikam.ms.mff.cuni.cz (Postfix, from userid 16202) id A610B281D0F; Sat, 8 Jul 2023 18:10:09 +0200 (CEST) Date: Sat, 8 Jul 2023 18:10:09 +0200 To: gcc-patches@gcc.gnu.org Subject: Fix profile update in tree-ssa/update-cunroll.c Message-ID: MIME-Version: 1.0 Content-Disposition: inline X-Spam-Status: No, score=-11.2 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Jan Hubicka via Gcc-patches From: Jan Hubicka Reply-To: Jan Hubicka Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" Fix tree-ssa/update-cunroll.c In this testcase the profile is misupdated before loop has two exits. The first exit is one eliminated by complete unrolling while second exit remains. We remove first exit but forget about fact that the source BB of other exit will then have higher frequency making other exit more likely. This patch fixes that in duplicate_loop_body_to_header_edge. While looking into resulting profiles I also noticed that in some cases scale_loop_profile may drop probabilities to 0 incorrectly either when trying to update exit from nested loop (which has similar problem) or when the profile was inconsistent as described in coment bellow. With the patch I now get on tramp3d with -O3: Profile consistency report: Pass dump id and name |static mismat|dynamic mismatch |in count |in count 127t ch | 12 +10| 0 131t dom | 20 +8| 0 134t reassoc | 22 +2| 0 136t forwprop | 26 +4| 185250 +185250 159t cddce | 38 +12| 213412 +28162 161t ldist | 39 +1| 213412 172t ifcvt | 41 +2| 369692 +156280 173t vect | 108 +67| 9508861 +9139169 176t cunroll | 102 -6| 11603578 +2094717 183t loopdone | 101 -1| 11547143 -56435 197t dom | 100 -1| 12641109 +1093966 199t threadfull | 102 +2| 12849084 +207975 200t vrp | 104 +2| 13047253 +198169 204t dce | 102 -2| 12973989 -73264 206t sink | 98 -4| 12959537 -14452 211t cddce | 102 +4| 12973989 +14452 255t optimized | 98 -4| 12959537 -14452 258r into_cfglayout | 97 -1| 12960039 259r jump | 98 +1| 12960039 262r cse1 | 97 -1| 12960039 275r loop2_unroll | 99 +2| 16090384 +3130345 312r pro_and_epilogue | 119 +20| 16191103 +100720 323r bbro | 118 -1| 15877546 -313557 So 118 instead of 160 mismatches. Bootstrapped/regtested x86_64-linux, comitted. gcc/ChangeLog: PR middle-end/110590 * cfgloopmanip.cc (scale_loop_profile): Avoid scaling exits within inner loops and be more careful about inconsistent profiles. (duplicate_loop_body_to_header_edge): Fix profile update when eliminated exit is followed by other exit. gcc/testsuite/ChangeLog: PR middle-end/110590 * gcc.dg/tree-prof/update-cunroll-2.c: Remove xfail. * gcc.dg/tree-ssa/update-cunroll.c: Likewise. diff --git a/gcc/cfgloopmanip.cc b/gcc/cfgloopmanip.cc index f56a9b87d1c..52732420787 100644 --- a/gcc/cfgloopmanip.cc +++ b/gcc/cfgloopmanip.cc @@ -580,13 +580,47 @@ scale_loop_profile (class loop *loop, profile_probability p, unadjusted_exit_count = exit_edge->count (); scale_loop_frequencies (loop, scale_prob); - if (exit_edge) + if (exit_edge && exit_edge->src->loop_father != loop) + { + fprintf (dump_file, + ";; Loop exit is in inner loop;" + " will leave exit probabilities inconsistent\n"); + } + else if (exit_edge) { profile_count old_exit_count = exit_edge->count (); profile_probability new_probability; if (iteration_bound > 0) - new_probability - = unadjusted_exit_count.probability_in (exit_edge->src->count); + { + /* It may happen that the source basic block of the exit edge is + inside in-loop condition: + + +-> header + | | + | B1 + | / \ + | | B2--exit_edge--> + | \ / + | B3 + +__/ + + If B2 count is smaller than desired exit edge count + the profile was inconsistent with the newly discovered upper bound. + Probablity of edge B1->B2 is too low. We do not attempt to fix + that (as it is hard in general) but we want to avoid dropping + count of edge B2->B3 to zero may confuse later optimizations. */ + if (unadjusted_exit_count.apply_scale (7, 8) > exit_edge->src->count) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, + ";; Source basic block of loop exit count is too small;" + " will leave exit probabilities inconsistent\n"); + exit_edge->probability = exit_edge->probability.guessed (); + return; + } + new_probability + = unadjusted_exit_count.probability_in (exit_edge->src->count); + } else new_probability = profile_probability::always (); set_edge_probability_and_rescale_others (exit_edge, new_probability); @@ -1146,8 +1180,7 @@ duplicate_loop_body_to_header_edge (class loop *loop, edge e, profile_count count_le = latch_edge->count (); profile_count count_out_orig = orig ? orig->count () : count_in - count_le; profile_probability prob_pass_thru = count_le.probability_in (count_in); - profile_probability prob_pass_wont_exit = - (count_le + count_out_orig).probability_in (count_in); + profile_count new_count_le = count_le + count_out_orig; if (orig && orig->probability.initialized_p () && !(orig->probability == profile_probability::always ())) @@ -1167,7 +1200,21 @@ duplicate_loop_body_to_header_edge (class loop *loop, edge e, && dominated_by_p (CDI_DOMINATORS, bbs[i], orig->src)) bitmap_set_bit (bbs_to_scale, i); } + /* Since we will scale up all basic blocks dominated by orig, exits + will become more likely; compensate for that. */ + if (after_exit_den.nonzero_p ()) + { + auto_vec exits = get_loop_exit_edges (loop); + for (edge ex : exits) + if (ex != orig + && dominated_by_p (CDI_DOMINATORS, ex->src, orig->src)) + new_count_le -= ex->count ().apply_scale (after_exit_num + - after_exit_den, + after_exit_den); + } } + profile_probability prob_pass_wont_exit = + new_count_le.probability_in (count_in); scale_step = XNEWVEC (profile_probability, ndupl); diff --git a/gcc/testsuite/gcc.dg/tree-prof/update-cunroll-2.c b/gcc/testsuite/gcc.dg/tree-prof/update-cunroll-2.c index 8ef3ab2b5e4..58c0fb59452 100644 --- a/gcc/testsuite/gcc.dg/tree-prof/update-cunroll-2.c +++ b/gcc/testsuite/gcc.dg/tree-prof/update-cunroll-2.c @@ -18,4 +18,4 @@ main () t (); return 0; } -/* { dg-final-use { scan-tree-dump-not "Invalid sum" "optimized" {xfail *-*-*} } } */ +/* { dg-final-use { scan-tree-dump-not "Invalid sum" "optimized" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/update-cunroll.c b/gcc/testsuite/gcc.dg/tree-ssa/update-cunroll.c index 5820423bd1c..687fe1519d8 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/update-cunroll.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/update-cunroll.c @@ -11,4 +11,4 @@ int t() } /* Currently duplicate_loop_body_to_header_edge gets wrong computation of prob_pass_wont_exit which assumes that the exit condition is last in the loop. */ -/* { dg-final { scan-tree-dump-times "Invalid sum" 0 "optimized" { xfail *-*-*}} } */ +/* { dg-final { scan-tree-dump-times "Invalid sum" 0 "optimized" } } */