From patchwork Thu Jul 20 14:00:48 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Hubicka X-Patchwork-Id: 1810482 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=qxCOFFuv; dkim-atps=neutral Received: from server2.sourceware.org (server2.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-384) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4R6Dqv0qJZz1yXp for ; Fri, 21 Jul 2023 00:01:15 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id A0F34385AF9B for ; Thu, 20 Jul 2023 14:01:12 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org A0F34385AF9B DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1689861672; bh=EDnQbNljJ9qxz1DTUGAQTgdhrXmDvl7ZSnPjrC36S3Y=; h=Date:To:Subject:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:From; b=qxCOFFuvjIHkN0m2tnpwGabvhEvfbU2lFQGgc1rDAxmTjzcoMcedBG1tqQhMqm1D8 X5v371VUnXDTeF588Z61nKrSzUMRbJMBnHVYx3yLP6a3Ys8iXUEIRp3y3CK+g2yyoJ CM9H3mMhz0zyvsoEh2Qtb2TWSQLqONceHBPlz958= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from nikam.ms.mff.cuni.cz (nikam.ms.mff.cuni.cz [195.113.20.16]) by sourceware.org (Postfix) with ESMTPS id A34733858CDB for ; Thu, 20 Jul 2023 14:00:49 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org A34733858CDB Received: by nikam.ms.mff.cuni.cz (Postfix, from userid 16202) id 344692825B5; Thu, 20 Jul 2023 16:00:48 +0200 (CEST) Date: Thu, 20 Jul 2023 16:00:48 +0200 To: gcc-patches@gcc.gnu.org Subject: Cleanup code determining number of iterations from cfg profile Message-ID: MIME-Version: 1.0 Content-Disposition: inline X-Spam-Status: No, score=-12.9 required=5.0 tests=BAYES_00, DKIM_INVALID, DKIM_SIGNED, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_NONE, KAM_DMARC_STATUS, RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Jan Hubicka via Gcc-patches From: Jan Hubicka Reply-To: Jan Hubicka Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" Hi, this patch cleanups API for determining expected loop iteraitons from profile. We started with having expected_loop_iterations and only source was the integer represented BB counts. It did some work on guessing number of iteration if profile was absent or bogus. Later we introduced loop_info and added get_estimated_loop_iterations which made expected_loop_iterations useful mostly when doing profile updates and not for loop optimization heuristics. The naming is bit ambiguous so this difference is not clear. Even later we introduced precision tracking to profile and exended the API to return reliablity of result but did not update all uses to do reasonable stuff with it. There is also some cofusion about +-1s concering latch execution counts versus header execution counts. This patch aims to obsolette expected_loop_iterations and expected_loop_iterations_unbounded (and "suceeds" modulo 1 use of each of two). It adds expected_loop_iterations_by_profile which computes sreal and does correct precision/presence tracking. Unlike old code, it is based on CFG profile only and does not attempt to provide fake answer when info is missing and does not check sanity with loop_info. We now define iterations consistently as lath execution in loop_info so I use that here too. I converted almost all calls to new API: dumps, code produing loop_info from CFG profile and profile updating. Remaining uses are in loop unrolling and prefetching that needs more TLC I will do incrementally. There are some improvements possible which I can play with incrementally. - for simple loops with one exit dominating latch we can use exit probability for easier to preserve info in loop itraionts. THis is probably not too critical since all esitmates should be recorded in loop_info and would help mostly if new loop is constructed or old loop is lost and redicovered. - We may want to avoid trusting the profile if it is obviously inconsistent on header. Bootstrapped/regtested x86_64-linux, plan to commit it later today if there are no complains. Honza gcc/ChangeLog: * cfgloop.cc: Include sreal.h. (flow_loop_dump): Dump sreal iteration exsitmate. (get_estimated_loop_iterations): Update. * cfgloop.h (expected_loop_iterations_by_profile): Declare. * cfgloopanal.cc (expected_loop_iterations_by_profile): New function. (expected_loop_iterations_unbounded): Use new API. * cfgloopmanip.cc (scale_loop_profile): Use expected_loop_iterations_by_profile * predict.cc (pass_profile::execute): Likewise. * profile.cc (branch_prob): Likewise. * tree-ssa-loop-niter.cc: Include sreal.h. (estimate_numbers_of_iterations): Likewise diff --git a/gcc/cfgloop.cc b/gcc/cfgloop.cc index ccda7415d70..11336ea45c0 100644 --- a/gcc/cfgloop.cc +++ b/gcc/cfgloop.cc @@ -33,6 +33,7 @@ along with GCC; see the file COPYING3. If not see #include "dumpfile.h" #include "tree-ssa.h" #include "tree-pretty-print.h" +#include "sreal.h" static void flow_loops_cfg_dump (FILE *); @@ -138,14 +139,11 @@ flow_loop_dump (const class loop *loop, FILE *file, loop_depth (loop), (long) (loop_outer (loop) ? loop_outer (loop)->num : -1)); - if (loop->latch) - { - bool read_profile_p; - gcov_type nit = expected_loop_iterations_unbounded (loop, &read_profile_p); - if (read_profile_p && !loop->any_estimate) - fprintf (file, ";; profile-based iteration count: %" PRIu64 "\n", - (uint64_t) nit); - } + bool reliable; + sreal iterations; + if (expected_loop_iterations_by_profile (loop, &iterations, &reliable)) + fprintf (file, ";; profile-based iteration count: %f %s\n", + iterations.to_double (), reliable ? "(reliable)" : "(unreliable)"); fprintf (file, ";; nodes:"); bbs = get_loop_body (loop); @@ -2014,10 +2012,12 @@ get_estimated_loop_iterations (class loop *loop, widest_int *nit) profile. */ if (!loop->any_estimate) { - if (loop->header->count.reliable_p ()) + sreal snit; + bool reliable; + if (expected_loop_iterations_by_profile (loop, &snit, &reliable) + && reliable) { - *nit = gcov_type_to_wide_int - (expected_loop_iterations_unbounded (loop) + 1); + *nit = (snit + 0.5).to_int (); return true; } return false; diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h index e7ac2b5f3db..4d2fd4b6af5 100644 --- a/gcc/cfgloop.h +++ b/gcc/cfgloop.h @@ -403,7 +403,10 @@ extern void verify_loop_structure (void); /* Loop analysis. */ extern bool just_once_each_iteration_p (const class loop *, const_basic_block); gcov_type expected_loop_iterations_unbounded (const class loop *, - bool *read_profile_p = NULL, bool by_profile_only = false); + bool *read_profile_p = NULL); +extern bool expected_loop_iterations_by_profile (const class loop *loop, + sreal *ret, + bool *reliable = NULL); extern unsigned expected_loop_iterations (class loop *); extern rtx doloop_condition_get (rtx_insn *); diff --git a/gcc/cfgloopanal.cc b/gcc/cfgloopanal.cc index fd867448d63..a0972387590 100644 --- a/gcc/cfgloopanal.cc +++ b/gcc/cfgloopanal.cc @@ -233,79 +233,101 @@ average_num_loop_insns (const class loop *loop) return ret; } -/* Returns expected number of iterations of LOOP, according to - measured or guessed profile. +/* Return true if BB profile can be used to determine the expected number of + iterations (that is number of executions of latch edge(s) for each + entry of the loop. If this is the case initialize RET with the number + of iterations. + + RELIABLE is set if profile indiates that the returned value should be + realistic estimate. (This is the case if we read profile and did not + messed it up yet and not the case of guessed profiles.) - This functions attempts to return "sane" value even if profile - information is not good enough to derive osmething. - If BY_PROFILE_ONLY is set, this logic is bypassed and function - return -1 in those scenarios. */ + This function uses only CFG profile. We track more reliable info in + loop_info structure and for loop optimization heuristics more relevant + is get_estimated_loop_iterations API. */ -gcov_type -expected_loop_iterations_unbounded (const class loop *loop, - bool *read_profile_p, - bool by_profile_only) +bool +expected_loop_iterations_by_profile (const class loop *loop, sreal *ret, + bool *reliable) { + profile_count header_count = loop->header->count; + if (reliable) + *reliable = false; + + /* TODO: For single exit loops we can use loop exit edge probability. + It also may be reliable while loop itself was adjusted. */ + if (!header_count.initialized_p () + || !header_count.nonzero_p ()) + return false; + + profile_count count_in = profile_count::zero (); edge e; edge_iterator ei; - gcov_type expected = -1; - - if (read_profile_p) - *read_profile_p = false; - /* If we have no profile at all, use AVG_LOOP_NITER. */ - if (profile_status_for_fn (cfun) == PROFILE_ABSENT) - { - if (by_profile_only) - return -1; - expected = param_avg_loop_niter; - } - else if (loop->latch && (loop->latch->count.initialized_p () - || loop->header->count.initialized_p ())) + /* For single-latch loops avoid querying dominators. */ + if (loop->latch) { - profile_count count_in = profile_count::zero (), - count_latch = profile_count::zero (); - + bool found = false; FOR_EACH_EDGE (e, ei, loop->header->preds) - if (e->src == loop->latch) - count_latch = e->count (); - else + if (e->src != loop->latch) count_in += e->count (); - - if (!count_latch.initialized_p ()) - { - if (by_profile_only) - return -1; - expected = param_avg_loop_niter; - } - else if (!count_in.nonzero_p ()) + else + found = true; + /* If latch is not found, loop is inconsistent. */ + gcc_checking_assert (found); + } + else + FOR_EACH_EDGE (e, ei, loop->header->preds) + if (!dominated_by_p (CDI_DOMINATORS, e->src, loop->header)) + count_in += e->count (); + + bool known; + /* Number of iterations is number of executions of latch edge. */ + *ret = (header_count - count_in).to_sreal_scale (count_in, &known); + if (!known) + return false; + if (reliable) + { + /* Header should have at least count_in many executions. + Give up on clearly inconsistent profile. */ + if (header_count < count_in && header_count.differs_from_p (count_in)) { - if (by_profile_only) - return -1; - expected = count_latch.to_gcov_type () * 2; + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "Inconsistent bb profile of loop %i\n", + loop->num); + *reliable = false; } else - { - expected = (count_latch.to_gcov_type () + count_in.to_gcov_type () - - 1) / count_in.to_gcov_type (); - if (read_profile_p - && count_latch.reliable_p () && count_in.reliable_p ()) - *read_profile_p = true; - } + *reliable = count_in.reliable_p () && header_count.reliable_p (); } + return true; +} + +/* Returns expected number of iterations of LOOP, according to + measured or guessed profile. + + This functions attempts to return "sane" value even if profile + information is not good enough to derive osmething. */ + +gcov_type +expected_loop_iterations_unbounded (const class loop *loop, + bool *read_profile_p) +{ + gcov_type expected = -1; + + if (read_profile_p) + *read_profile_p = false; + + sreal sreal_expected; + if (expected_loop_iterations_by_profile + (loop, &sreal_expected, read_profile_p)) + expected = (sreal_expected + 0.5).to_int (); else - { - if (by_profile_only) - return -1; - expected = param_avg_loop_niter; - } + expected = param_avg_loop_niter; - if (!by_profile_only) - { - HOST_WIDE_INT max = get_max_loop_iterations_int (loop); - if (max != -1 && max < expected) - return max; - } + HOST_WIDE_INT max = get_max_loop_iterations_int (loop); + if (max != -1 && max < expected) + return max; return expected; } diff --git a/gcc/cfgloopmanip.cc b/gcc/cfgloopmanip.cc index 5c0065b2f5a..3012a8d60f7 100644 --- a/gcc/cfgloopmanip.cc +++ b/gcc/cfgloopmanip.cc @@ -31,6 +31,7 @@ along with GCC; see the file COPYING3. If not see #include "gimplify-me.h" #include "tree-ssa-loop-manip.h" #include "dumpfile.h" +#include "sreal.h" static void copy_loops_to (class loop **, int, class loop *); @@ -527,16 +528,16 @@ scale_loop_profile (class loop *loop, profile_probability p, if (iteration_bound == -1) return; - gcov_type iterations = expected_loop_iterations_unbounded (loop, NULL, true); - if (iterations == -1) + sreal iterations; + if (!expected_loop_iterations_by_profile (loop, &iterations)) return; if (dump_file && (dump_flags & TDF_DETAILS)) { fprintf (dump_file, - ";; guessed iterations of loop %i:%i new upper bound %i:\n", + ";; guessed iterations of loop %i:%f new upper bound %i:\n", loop->num, - (int)iterations, + iterations.to_double (), (int)iteration_bound); } diff --git a/gcc/predict.cc b/gcc/predict.cc index 26f9f3f6a88..8f44f5b7db8 100644 --- a/gcc/predict.cc +++ b/gcc/predict.cc @@ -4142,11 +4142,11 @@ pass_profile::execute (function *fun) profile_status_for_fn (fun) = PROFILE_GUESSED; if (dump_file && (dump_flags & TDF_DETAILS)) { + sreal iterations; for (auto loop : loops_list (cfun, LI_FROM_INNERMOST)) - if (loop->header->count.initialized_p ()) - fprintf (dump_file, "Loop got predicted %d to iterate %i times.\n", - loop->num, - (int)expected_loop_iterations_unbounded (loop)); + if (expected_loop_iterations_by_profile (loop, &iterations)) + fprintf (dump_file, "Loop got predicted %d to iterate %f times.\n", + loop->num, iterations.to_double ()); } return 0; } diff --git a/gcc/profile.cc b/gcc/profile.cc index a71800d5ce6..84d47b36d47 100644 --- a/gcc/profile.cc +++ b/gcc/profile.cc @@ -1543,14 +1543,17 @@ branch_prob (bool thunk) { if (dump_file && (dump_flags & TDF_DETAILS)) report_predictor_hitrates (); + sreal nit; + bool reliable; /* At this moment we have precise loop iteration count estimates. Record them to loop structure before the profile gets out of date. */ for (auto loop : loops_list (cfun, 0)) - if (loop->header->count > 0 && loop->header->count.reliable_p ()) + if (loop->header->count.ipa ().nonzero_p () + && expected_loop_iterations_by_profile (loop, &nit, &reliable) + && reliable) { - gcov_type nit = expected_loop_iterations_unbounded (loop); - widest_int bound = gcov_type_to_wide_int (nit); + widest_int bound = (nit + 0.5).to_int (); loop->any_estimate = false; record_niter_bound (loop, bound, true, false); } diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc index 5d398b67e68..3c4e66291fb 100644 --- a/gcc/tree-ssa-loop-niter.cc +++ b/gcc/tree-ssa-loop-niter.cc @@ -44,6 +44,7 @@ along with GCC; see the file COPYING3. If not see #include "tree-dfa.h" #include "internal-fn.h" #include "gimple-range.h" +#include "sreal.h" /* The maximum number of dominator BBs we search for conditions @@ -4775,6 +4776,9 @@ estimate_numbers_of_iterations (class loop *loop) loop->estimate_state = EST_AVAILABLE; + sreal nit; + bool reliable; + /* If we have a measured profile, use it to estimate the number of iterations. Normally this is recorded by branch_prob right after reading the profile. In case we however found a new loop, record the @@ -4787,10 +4791,10 @@ estimate_numbers_of_iterations (class loop *loop) recomputing iteration bounds later in the compilation process will just introduce random roundoff errors. */ if (!loop->any_estimate - && loop->header->count.reliable_p ()) + && expected_loop_iterations_by_profile (loop, &nit, &reliable) + && reliable) { - gcov_type nit = expected_loop_iterations_unbounded (loop); - bound = gcov_type_to_wide_int (nit); + bound = (nit + 0.5).to_int (); record_niter_bound (loop, bound, true, false); }