From patchwork Fri Jun 30 13:55:14 2023
From: Jan Hubicka
Date: Fri, 30 Jun 2023 15:55:14 +0200
To: gcc-patches@gcc.gnu.org, jwakely@redhat.com
Subject: Fix predictions of conditionals with __builtin_expect

Hi,
while looking into the std::vector _M_realloc_insert codegen I noticed
that the call to __throw_bad_alloc is predicted with 10% probability.
This is because the conditional guarding it has __builtin_expect (cond, 0)
on it, and that hint incorrectly takes precedence over the more reliable
heuristic predicting that a call to a cold noreturn function is likely
not going to happen.
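For illustration, the problematic pattern looks roughly like this (a
minimal sketch; grow and throw_error are hypothetical stand-ins for the
libstdc++ internals):

  /* throw_error stands in for libstdc++'s __throw_bad_alloc.  */
  extern void throw_error (void) __attribute__ ((cold, noreturn));

  void
  grow (unsigned long n, unsigned long max)
  {
    /* The hint says "unlikely", but it was translated to only a 10%
       predicted probability, overriding the stronger heuristic that a
       call to a cold noreturn function does not happen.  */
    if (__builtin_expect (n > max, 0))
      throw_error ();
    /* ... reallocate ... */
  }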
So I reordered the predictors so that __builtin_expect_with_probability
comes right after the predictors that never make a mistake (so the user
can still use it to specify the outcome by hand).  I also downgraded the
malloc predictor, since I do think user-defined malloc functions and new
operators may behave in funny ways, and moved the usual __builtin_expect
after the noreturn cold predictor.

This triggered a latent bug in expr_expected_value_1 where

  if (*predictor < predictor2)
    *predictor = predictor2;

should be:

  if (predictor2 < *predictor)
    *predictor = predictor2;

which eventually triggered an ICE when combining the heuristics.

This made me notice that we can do slightly better when combining
expected values in the case where only one of the operands (such as a
in a*b when we expect a==0) suffices to determine the overall result.
Note that the new code may pick the weaker heuristic in the case where
both values are predicted.  I am not sure that scenario is worth the
extra CPU time: there is no correct way to combine the probabilities
anyway, since we do not know whether the predictions are independent,
so I think users should not rely on it.

Fixing this issue uncovered another problem.  In 2018 Martin Liska
added code predicting that malloc returns non-NULL, but what it
actually predicts is that malloc returns true (boolean 1).  This sort
of works for a testcase testing malloc (10) != NULL but, for example,
we will also predict malloc (10) == malloc (10) as true, which is not
right, and such a comparison may happen in real code (a short
illustration follows the ChangeLog below).  I think the proper way is
to update expr_expected_value_1 to work with value ranges, but that
needs greater surgery, so I decided to postpone it, only adding a
FIXME, and filed PR110499.

Bootstrapped/regtested x86_64-linux.  Will commit it shortly.

gcc/ChangeLog:

	PR middle-end/109849
	* predict.cc (estimate_bb_frequencies): Turn into a static function.
	(expr_expected_value_1): Fix handling of binary expressions with
	predicted values.
	* predict.def (PRED_MALLOC_NONNULL): Move later in the priority
	queue.
	(PRED_BUILTIN_EXPECT_WITH_PROBABILITY): Move to almost the top of
	the priority queue.
	* predict.h (estimate_bb_frequencies): No longer declare it.

gcc/testsuite/ChangeLog:

	PR middle-end/109849
	* gcc.dg/predict-18.c: Improve testcase.
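To make the malloc problem concrete, a hypothetical fragment (not part
of the patch or the testsuite; the function name is made up):

  #include <stdlib.h>

  /* Because the predictor currently models malloc's return value as the
     constant 1 rather than as "some non-NULL pointer", both calls below
     get the same expected value, so the comparison is predicted to be
     true, yet two simultaneously live allocations never compare equal.  */
  int
  both_allocations_equal (void)
  {
    return malloc (10) == malloc (10);
  }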
diff --git a/gcc/predict.cc b/gcc/predict.cc
index 5e3c1d69ca4..688c0970f1c 100644
--- a/gcc/predict.cc
+++ b/gcc/predict.cc
@@ -89,6 +90,7 @@ static void predict_paths_leading_to_edge (edge, enum br_predictor,
 static bool can_predict_insn_p (const rtx_insn *);
 static HOST_WIDE_INT get_predictor_value (br_predictor, HOST_WIDE_INT);
 static void determine_unlikely_bbs ();
+static void estimate_bb_frequencies (bool force);
 
 /* Information we hold about each branch predictor.
    Filled using information from predict.def.  */
@@ -2485,7 +2487,11 @@ expr_expected_value_1 (tree type, tree op0, enum tree_code code,
 	    {
 	      if (predictor)
 		*predictor = PRED_MALLOC_NONNULL;
-	      return boolean_true_node;
+	      /* FIXME: This is wrong and we need to convert the logic
+		 to value ranges.  This makes predictor to assume that
+		 malloc always returns (size_t)1 which is not the same
+		 as returning non-NULL.  */
+	      return fold_convert (type, boolean_true_node);
 	    }
 
 	  if (DECL_BUILT_IN_CLASS (decl) == BUILT_IN_NORMAL)
@@ -2563,7 +2569,9 @@ expr_expected_value_1 (tree type, tree op0, enum tree_code code,
 	      case BUILT_IN_REALLOC:
 		if (predictor)
 		  *predictor = PRED_MALLOC_NONNULL;
-		return boolean_true_node;
+		/* FIXME: This is wrong and we need to convert the logic
+		   to value ranges.  */
+		return fold_convert (type, boolean_true_node);
 	      default:
 		break;
 	    }
@@ -2575,18 +2583,43 @@ expr_expected_value_1 (tree type, tree op0, enum tree_code code,
   if (get_gimple_rhs_class (code) == GIMPLE_BINARY_RHS)
     {
       tree res;
+      tree nop0 = op0;
+      tree nop1 = op1;
+      if (TREE_CODE (op0) != INTEGER_CST)
+	{
+	  /* See if expected value of op0 is good enough to determine the result.  */
+	  nop0 = expr_expected_value (op0, visited, predictor, probability);
+	  if (nop0
+	      && (res = fold_build2 (code, type, nop0, op1)) != NULL
+	      && TREE_CODE (res) == INTEGER_CST)
+	    return res;
+	  if (!nop0)
+	    nop0 = op0;
+	}
       enum br_predictor predictor2;
       HOST_WIDE_INT probability2;
-      op0 = expr_expected_value (op0, visited, predictor, probability);
-      if (!op0)
-	return NULL;
-      op1 = expr_expected_value (op1, visited, &predictor2, &probability2);
-      if (!op1)
+      if (TREE_CODE (op1) != INTEGER_CST)
+	{
+	  /* See if expected value of op1 is good enough to determine the result.  */
+	  nop1 = expr_expected_value (op1, visited, &predictor2, &probability2);
+	  if (nop1
+	      && (res = fold_build2 (code, type, op0, nop1)) != NULL
+	      && TREE_CODE (res) == INTEGER_CST)
+	    {
+	      *predictor = predictor2;
+	      *probability = probability2;
+	      return res;
+	    }
+	  if (!nop1)
+	    nop1 = op1;
+	}
+      if (nop0 == op0 || nop1 == op1)
 	return NULL;
-      res = fold_build2 (code, type, op0, op1);
+      /* Finally see if we have two known values.  */
+      res = fold_build2 (code, type, nop0, nop1);
       if (TREE_CODE (res) == INTEGER_CST
-	  && TREE_CODE (op0) == INTEGER_CST
-	  && TREE_CODE (op1) == INTEGER_CST)
+	  && TREE_CODE (nop0) == INTEGER_CST
+	  && TREE_CODE (nop1) == INTEGER_CST)
 	{
 	  /* Combine binary predictions.  */
 	  if (*probability != -1 || probability2 != -1)
@@ -2596,7 +2629,7 @@ expr_expected_value_1 (tree type, tree op0, enum tree_code code,
 	      *probability = RDIV (p1 * p2, REG_BR_PROB_BASE);
 	    }
 
-	  if (*predictor < predictor2)
+	  if (predictor2 < *predictor)
 	    *predictor = predictor2;
 
 	  return res;
@@ -3894,7 +3927,7 @@ determine_unlikely_bbs ()
    probabilities.  If FORCE is true, the frequencies are used to estimate
    the counts even when there are already non-zero profile counts.  */
 
-void
+static void
 estimate_bb_frequencies (bool force)
 {
   basic_block bb;
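To make the effect of the expr_expected_value_1 change above concrete, a
user-level sketch (it mirrors the lines added to gcc.dg/predict-18.c
further down; the function name is made up):

  extern int global;

  /* m is expected to be 0 with probability 0.6; since 0 * a is 0 no
     matter what a is, the whole condition can now be predicted false
     with that probability even though a itself is unpredicted.  */
  void
  one_operand_decides (int a, short m)
  {
    if (a * __builtin_expect_with_probability (m, 0, 0.6f) > 0)
      global++;	/* Predicted taken with 40% probability.  */
  }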
diff --git a/gcc/predict.def b/gcc/predict.def
index 1f391a01e85..ae7dd8239c5 100644
--- a/gcc/predict.def
+++ b/gcc/predict.def
@@ -51,16 +51,17 @@ DEF_PREDICTOR (PRED_NO_PREDICTION, "no prediction", PROB_ALWAYS, 0)
 DEF_PREDICTOR (PRED_UNCONDITIONAL, "unconditional jump", PROB_ALWAYS,
 	       PRED_FLAG_FIRST_MATCH)
 
-/* Return value of malloc function is almost always non-null.  */
-DEF_PREDICTOR (PRED_MALLOC_NONNULL, "malloc returned non-NULL", \
-	       PROB_VERY_LIKELY, PRED_FLAG_FIRST_MATCH)
-
 /* Use number of loop iterations determined by # of iterations analysis
    to set probability.  We don't want to use Dempster-Shaffer theory
    here, as the predictions is exact.  */
 DEF_PREDICTOR (PRED_LOOP_ITERATIONS, "loop iterations", PROB_UNINITIALIZED,
 	       PRED_FLAG_FIRST_MATCH)
 
+/* Hints provided by user via __builtin_expect_with_probability.  */
+DEF_PREDICTOR (PRED_BUILTIN_EXPECT_WITH_PROBABILITY,
+	       "__builtin_expect_with_probability", PROB_UNINITIALIZED,
+	       PRED_FLAG_FIRST_MATCH)
+
 /* Assume that any given atomic operation has low contention,
    and thus the compare-and-swap operation succeeds.  */
 DEF_PREDICTOR (PRED_COMPARE_AND_SWAP, "compare and swap", PROB_VERY_LIKELY,
@@ -73,11 +74,6 @@ DEF_PREDICTOR (PRED_COMPARE_AND_SWAP, "compare and swap", PROB_VERY_LIKELY,
 DEF_PREDICTOR (PRED_BUILTIN_EXPECT, "__builtin_expect", PROB_VERY_LIKELY,
 	       PRED_FLAG_FIRST_MATCH)
 
-/* Hints provided by user via __builtin_expect_with_probability.  */
-DEF_PREDICTOR (PRED_BUILTIN_EXPECT_WITH_PROBABILITY,
-	       "__builtin_expect_with_probability", PROB_UNINITIALIZED,
-	       PRED_FLAG_FIRST_MATCH)
-
 /* Branches to hot labels are likely.  */
 DEF_PREDICTOR (PRED_HOT_LABEL, "hot label", HITRATE (90),
 	       PRED_FLAG_FIRST_MATCH)
@@ -86,6 +82,10 @@ DEF_PREDICTOR (PRED_HOT_LABEL, "hot label", HITRATE (90),
 DEF_PREDICTOR (PRED_COLD_LABEL, "cold label", HITRATE (90),
 	       PRED_FLAG_FIRST_MATCH)
 
+/* Return value of malloc function is almost always non-null.  */
+DEF_PREDICTOR (PRED_MALLOC_NONNULL, "malloc returned non-NULL", \
+	       PROB_VERY_LIKELY, PRED_FLAG_FIRST_MATCH)
+
 /* Use number of loop iterations guessed by the contents of the loop.  */
 DEF_PREDICTOR (PRED_LOOP_ITERATIONS_GUESSED, "guessed loop iterations",
 	       PROB_UNINITIALIZED, PRED_FLAG_FIRST_MATCH)
diff --git a/gcc/predict.h b/gcc/predict.h
index d9a7fc3eca1..4864b7d7113 100644
--- a/gcc/predict.h
+++ b/gcc/predict.h
@@ -93,7 +93,6 @@ extern void tree_estimate_probability (bool);
 extern void handle_missing_profiles (void);
 extern bool update_max_bb_count (void);
 extern bool expensive_function_p (int);
-extern void estimate_bb_frequencies (bool);
 extern void compute_function_frequency (void);
 extern tree build_predict_expr (enum br_predictor, enum prediction);
 extern const char *predictor_name (enum br_predictor);
diff --git a/gcc/testsuite/gcc.dg/predict-18.c b/gcc/testsuite/gcc.dg/predict-18.c
index 0c93638a971..073e742d849 100644
--- a/gcc/testsuite/gcc.dg/predict-18.c
+++ b/gcc/testsuite/gcc.dg/predict-18.c
@@ -8,6 +8,8 @@ int x;
 short v = 0;
 short expected = 0;
 short max = ~0;
+short m = 0;
+short n = 0;
 #define STRONG 0
 
 void foo (int a, int b)
@@ -23,9 +25,17 @@ void foo (int a, int b)
   if (__builtin_expect_with_probability (a < 10, 1, 0.9f)
       > __builtin_expect_with_probability (b, 0, 0.8f))
     global++;
+
+  if (a * __builtin_expect_with_probability (m, 0, 0.6f) > 0)
+    global++;
+
+  if (__builtin_expect_with_probability (n, 0, 0.65f) * a > 0)
+    global++;
 }
 
 /* { dg-final { scan-tree-dump "__builtin_expect_with_probability heuristics of edge .*->.*: 54.00%" "profile_estimate"} } */
 /* { dg-final { scan-tree-dump "__builtin_expect_with_probability heuristics of edge .*->.*: 77.70%" "profile_estimate"} } */
 /* { dg-final { scan-tree-dump "__builtin_expect_with_probability heuristics of edge .*->.*: 98.96%" "profile_estimate"} } */
 /* { dg-final { scan-tree-dump "__builtin_expect_with_probability heuristics of edge .*->.*: 71.99%" "profile_estimate"} } */
+/* { dg-final { scan-tree-dump "__builtin_expect_with_probability heuristics of edge .*->.*: 40.00%" "profile_estimate"} } */
+/* { dg-final { scan-tree-dump "__builtin_expect_with_probability heuristics of edge .*->.*: 35.01%" "profile_estimate"} } */
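For the record, the two new scan values can be checked by hand.  A
back-of-envelope sketch, assuming GCC's REG_BR_PROB_BASE of 10000 (so
probabilities are stored in 0.01% units) and a truncating conversion of
the float hint; the truncation is my guess for why the second value is
35.01% rather than 35.00%:

  #include <stdio.h>

  int
  main (void)
  {
    const int base = 10000;
    int off_m = (int) (0.6f * 10000.0);   /* 6000 */
    int off_n = (int) (0.65f * 10000.0);  /* 6499, as 0.65f is just below 0.65 */
    printf ("%.2f%%\n", (base - off_m) * 100.0 / base);  /* prints 40.00% */
    printf ("%.2f%%\n", (base - off_n) * 100.0 / base);  /* prints 35.01% */
    return 0;
  }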