From patchwork Sat Jul  8 16:10:09 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jan Hubicka <hubicka@ucw.cz>
X-Patchwork-Id: 1805201
Return-Path: <gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org>
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@legolas.ozlabs.org
Authentication-Results: legolas.ozlabs.org;
 spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org
 (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org;
 envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org;
 receiver=<UNKNOWN>)
Authentication-Results: legolas.ozlabs.org;
	dkim=pass (1024-bit key;
 unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256
 header.s=default header.b=DnJmw0n9;
	dkim-atps=neutral
Received: from server2.sourceware.org (server2.sourceware.org
 [IPv6:2620:52:3:1:0:246e:9693:128c])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature ECDSA (P-384) server-digest SHA384)
	(No client certificate requested)
	by legolas.ozlabs.org (Postfix) with ESMTPS id 4QywGn67T1z20Nq
	for <incoming@patchwork.ozlabs.org>; Sun,  9 Jul 2023 02:10:39 +1000 (AEST)
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id 1366C3858C78
	for <incoming@patchwork.ozlabs.org>; Sat,  8 Jul 2023 16:10:35 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 1366C3858C78
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1688832635;
	bh=IH/0J17/pl9gmdQnJHhxoG+ft0STjJJJFEAyfTDwU9o=;
	h=Date:To:Subject:List-Id:List-Unsubscribe:List-Archive:List-Post:
	 List-Help:List-Subscribe:From:Reply-To:From;
	b=DnJmw0n9wbQbX6yJIU+O6QbSxvMe8MmJyuqrK2hlPCK5ocA7O6SdkI9XNDLVJSAOz
	 mrRm13zOQNWwbcutNunSow/Z8QggVy0lTp5dlMpRu5EHsLVo0pnvkYjJ1hP9GX6kPe
	 2VsrUhCNUZ+9EbmS1SVt367Em0ESZ6Zr72Mx6tBU=
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from nikam.ms.mff.cuni.cz (nikam.ms.mff.cuni.cz [195.113.20.16])
 by sourceware.org (Postfix) with ESMTPS id 3F1F23858D1E
 for <gcc-patches@gcc.gnu.org>; Sat,  8 Jul 2023 16:10:11 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 3F1F23858D1E
Received: by nikam.ms.mff.cuni.cz (Postfix, from userid 16202)
 id A610B281D0F; Sat,  8 Jul 2023 18:10:09 +0200 (CEST)
Date: Sat, 8 Jul 2023 18:10:09 +0200
To: gcc-patches@gcc.gnu.org
Subject: Fix profile update in  tree-ssa/update-cunroll.c
Message-ID: <ZKmKYTnGNGB/9GFW@kam.mff.cuni.cz>
MIME-Version: 1.0
Content-Disposition: inline
X-Spam-Status: No, score=-11.2 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS,
 RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_NONE, TXREP,
 T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-Patchwork-Original-From: Jan Hubicka via Gcc-patches
 <gcc-patches@gcc.gnu.org>
From: Jan Hubicka <hubicka@ucw.cz>
Reply-To: Jan Hubicka <hubicka@ucw.cz>
Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org
Sender: "Gcc-patches"
 <gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org>

Fix tree-ssa/update-cunroll.c

In this testcase the profile is misupdated before loop has two exits.
The first exit is one eliminated by complete unrolling while second exit remains.
We remove first exit but forget about fact that the source BB of other exit will
then have higher frequency making other exit more likely.

This patch fixes that in duplicate_loop_body_to_header_edge.
While looking into resulting profiles I also noticed that in some cases
scale_loop_profile may drop probabilities to 0 incorrectly either when
trying to update exit from nested loop (which has similar problem) or when the profile
was inconsistent as described in coment bellow.

With the patch I now get on tramp3d with -O3:

Profile consistency report:

Pass dump id and name            |static mismat|dynamic mismatch          
                                 |in count     |in count                  
127t ch                          |     12   +10|            0             
131t dom                         |     20    +8|            0             
134t reassoc                     |     22    +2|            0             
136t forwprop                    |     26    +4|       185250      +185250
159t cddce                       |     38   +12|       213412       +28162
161t ldist                       |     39    +1|       213412             
172t ifcvt                       |     41    +2|       369692      +156280
173t vect                        |    108   +67|      9508861     +9139169
176t cunroll                     |    102    -6|     11603578     +2094717
183t loopdone                    |    101    -1|     11547143       -56435
197t dom                         |    100    -1|     12641109     +1093966
199t threadfull                  |    102    +2|     12849084      +207975
200t vrp                         |    104    +2|     13047253      +198169
204t dce                         |    102    -2|     12973989       -73264
206t sink                        |     98    -4|     12959537       -14452
211t cddce                       |    102    +4|     12973989       +14452
255t optimized                   |     98    -4|     12959537       -14452
258r into_cfglayout              |     97    -1|     12960039             
259r jump                        |     98    +1|     12960039             
262r cse1                        |     97    -1|     12960039             
275r loop2_unroll                |     99    +2|     16090384     +3130345
312r pro_and_epilogue            |    119   +20|     16191103      +100720
323r bbro                        |    118    -1|     15877546      -313557

So 118 instead of 160 mismatches.

Bootstrapped/regtested x86_64-linux, comitted.

gcc/ChangeLog:

	PR middle-end/110590
	* cfgloopmanip.cc (scale_loop_profile): Avoid scaling exits within
	inner loops and be more careful about inconsistent profiles.
	(duplicate_loop_body_to_header_edge): Fix profile update when eliminated
	exit is followed by other exit.

gcc/testsuite/ChangeLog:

	PR middle-end/110590
	* gcc.dg/tree-prof/update-cunroll-2.c: Remove xfail.
	* gcc.dg/tree-ssa/update-cunroll.c: Likewise.
diff --git a/gcc/cfgloopmanip.cc b/gcc/cfgloopmanip.cc
index f56a9b87d1c..52732420787 100644
--- a/gcc/cfgloopmanip.cc
+++ b/gcc/cfgloopmanip.cc
@@ -580,13 +580,47 @@ scale_loop_profile (class loop *loop, profile_probability p,
     unadjusted_exit_count = exit_edge->count ();
   scale_loop_frequencies (loop, scale_prob);
 
-  if (exit_edge)
+  if (exit_edge && exit_edge->src->loop_father != loop)
+    {
+      fprintf (dump_file,
+	       ";; Loop exit is in inner loop;"
+	       " will leave exit probabilities inconsistent\n");
+    }
+  else if (exit_edge)
     {
       profile_count old_exit_count = exit_edge->count ();
       profile_probability new_probability;
       if (iteration_bound > 0)
-	new_probability
-	  = unadjusted_exit_count.probability_in (exit_edge->src->count);
+	{
+	  /* It may happen that the source basic block of the exit edge is
+	     inside in-loop condition:
+
+		+-> header
+		|    |
+		|   B1
+		|  /  \
+		| |   B2--exit_edge-->
+		|  \  /
+		|   B3
+		+__/
+
+	      If B2 count is smaller than desired exit edge count
+	      the profile was inconsistent with the newly discovered upper bound.
+	      Probablity of edge B1->B2 is too low.  We do not attempt to fix
+	      that (as it is hard in general) but we want to avoid dropping
+	      count of edge B2->B3 to zero may confuse later optimizations.  */
+	  if (unadjusted_exit_count.apply_scale (7, 8) > exit_edge->src->count)
+	    {
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		fprintf (dump_file,
+			 ";; Source basic block of loop exit count is too small;"
+			 " will leave exit probabilities inconsistent\n");
+	      exit_edge->probability = exit_edge->probability.guessed ();
+	      return;
+	    }
+	  new_probability
+	    = unadjusted_exit_count.probability_in (exit_edge->src->count);
+	}
       else
 	new_probability = profile_probability::always ();
       set_edge_probability_and_rescale_others (exit_edge, new_probability);
@@ -1146,8 +1180,7 @@ duplicate_loop_body_to_header_edge (class loop *loop, edge e,
       profile_count count_le = latch_edge->count ();
       profile_count count_out_orig = orig ? orig->count () : count_in - count_le;
       profile_probability prob_pass_thru = count_le.probability_in (count_in);
-      profile_probability prob_pass_wont_exit =
-	      (count_le + count_out_orig).probability_in (count_in);
+      profile_count new_count_le = count_le + count_out_orig;
 
       if (orig && orig->probability.initialized_p ()
 	  && !(orig->probability == profile_probability::always ()))
@@ -1167,7 +1200,21 @@ duplicate_loop_body_to_header_edge (class loop *loop, edge e,
 		  && dominated_by_p (CDI_DOMINATORS, bbs[i], orig->src))
 		bitmap_set_bit (bbs_to_scale, i);
 	    }
+	  /* Since we will scale up all basic blocks dominated by orig, exits
+	     will become more likely; compensate for that.  */
+	  if (after_exit_den.nonzero_p ())
+	    {
+	      auto_vec<edge> exits = get_loop_exit_edges (loop);
+	      for (edge ex : exits)
+		if (ex != orig
+		    && dominated_by_p (CDI_DOMINATORS, ex->src, orig->src))
+		  new_count_le -= ex->count ().apply_scale (after_exit_num
+							    - after_exit_den,
+							    after_exit_den);
+	    }
 	}
+      profile_probability prob_pass_wont_exit =
+	      new_count_le.probability_in (count_in);
 
       scale_step = XNEWVEC (profile_probability, ndupl);
 
diff --git a/gcc/testsuite/gcc.dg/tree-prof/update-cunroll-2.c b/gcc/testsuite/gcc.dg/tree-prof/update-cunroll-2.c
index 8ef3ab2b5e4..58c0fb59452 100644
--- a/gcc/testsuite/gcc.dg/tree-prof/update-cunroll-2.c
+++ b/gcc/testsuite/gcc.dg/tree-prof/update-cunroll-2.c
@@ -18,4 +18,4 @@ main ()
     t ();
   return 0;
 }
-/* { dg-final-use { scan-tree-dump-not "Invalid sum" "optimized" {xfail *-*-*} } } */
+/* { dg-final-use { scan-tree-dump-not "Invalid sum" "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/update-cunroll.c b/gcc/testsuite/gcc.dg/tree-ssa/update-cunroll.c
index 5820423bd1c..687fe1519d8 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/update-cunroll.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/update-cunroll.c
@@ -11,4 +11,4 @@ int t()
 }
 /* Currently duplicate_loop_body_to_header_edge gets wrong computation of prob_pass_wont_exit
    which assumes that the exit condition is last in the loop.  */
-/* { dg-final { scan-tree-dump-times "Invalid sum" 0 "optimized" { xfail *-*-*}} } */
+/* { dg-final { scan-tree-dump-times "Invalid sum" 0 "optimized" } } */