From patchwork Mon Aug 23 18:49:14 2021
X-Patchwork-Submitter: Martin Jambor
X-Patchwork-Id: 1520243
Message-Id: <96160a5131c9e5eb302fb9f4db43c5d8b4cfe042.1629805719.git.mjambor@suse.cz>
From: Martin Jambor
Date: Mon, 23 Aug 2021 20:49:14 +0200
Subject: [PATCH 4/4] ipa-cp: Select saner profile count to base heuristics on
To: GCC Patches
Cc: Jan Hubicka, Xionghu Luo

When profile feedback is available, IPA-CP takes the count of the
hottest node and then evaluates all call contexts relative to it.  This
means that typically almost no clones for specialized contexts are ever
created, because the maximum is some special function called from
everywhere (one that is likely to get inlined anyway) and all the
examined edges look cold compared to it.

This patch changes the selection.  It simply sorts the counts of all
edges eligible for cloning in a vector, hottest first, and then picks
the count at the 90th percentile, i.e. the entry 10% of the way into the
sorted vector (the actual number is configurable via a parameter).

I also tried more complex approaches which summed the counts and picked
the edge which, together with all hotter edges, accounted for a given
portion of the total sum of all edge counts.  But first, it was not
clear to me that they make more logical sense than the simple method,
and second, in practice I always also had to ignore a few percent of the
hottest edges with really extreme counts (looking at bash and python).
And when I had to do that anyway, it seemed simpler to just "ignore"
more and take the first non-ignored count as the base.

Nevertheless, if people think some more sophisticated method should be
used anyway, I am willing to be persuaded.  But this patch is a clear
improvement over the current situation.

gcc/ChangeLog:

2021-08-23  Martin Jambor

	* params.opt (param_ipa_cp_profile_count_base): New parameter.
	* ipa-cp.c (max_count): Replace with base_count, replace all
	occurrences too, unless otherwise stated.
	(ipcp_cloning_candidate_p): Identify mostly-directly called
	functions based on their counts, not max_count.
	(compare_edge_profile_counts): New function.
	(ipcp_propagate_stage): Instead of setting max_count, find the
	appropriate edge count in a sorted vector of counts of eligible
	edges and make it the base_count.
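The selection described in the message can be sketched outside of GCC
roughly as follows.  This is only an illustration under stated
assumptions: plain 64-bit integers stand in for profile_count, the name
pick_base_count is hypothetical, a return value of 0 stands for "no
usable base", and the clamping of the index is an assumption of the
sketch rather than something taken from the patch.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for the base count selection.  Returns 0 when no
// edge has a non-zero count, i.e. when there is nothing to base the
// heuristics on.
std::uint64_t
pick_base_count (std::vector<std::uint64_t> counts, unsigned pos_percent)
{
  // Only edges with a non-zero count are considered.
  counts.erase (std::remove (counts.begin (), counts.end (),
                             std::uint64_t (0)),
                counts.end ());
  if (counts.empty ())
    return 0;

  // Sort so that the hottest edges come first.
  std::sort (counts.begin (), counts.end (),
             std::greater<std::uint64_t> ());

  // Position in the sorted vector, given as a percentage of its length.
  std::size_t pos = counts.size () * pos_percent / 100;
  // Assumption of this sketch: keep the index in range for pos_percent == 100.
  pos = std::min (pos, counts.size () - 1);
  return counts[pos];
}
```

With ten non-zero counts of which one is an extreme outlier, a position
of 10 skips the outlier and returns the second hottest count, whereas a
position of 0 degenerates to picking the maximum.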
---
 gcc/ipa-cp.c   | 82 +++++++++++++++++++++++++++++++++++++++++++++-----
 gcc/params.opt |  4 +++
 2 files changed, 78 insertions(+), 8 deletions(-)

diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index 53cca7aa804..6ab74f61e83 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -400,9 +400,9 @@ object_allocator > ipcp_sources_pool
 object_allocator ipcp_agg_lattice_pool
   ("IPA_CP aggregate lattices");
 
-/* Maximal count found in program.  */
+/* Base count to use in heuristics when using profile feedback.  */
 
-static profile_count max_count;
+static profile_count base_count;
 
 /* Original overall size of the program.  */
 
@@ -809,7 +809,8 @@ ipcp_cloning_candidate_p (struct cgraph_node *node)
   /* When profile is available and function is hot, propagate into it even if
      calls seems cold; constant propagation can improve function's speed
      significantly.  */
-  if (max_count > profile_count::zero ())
+  if (stats.count_sum > profile_count::zero ()
+      && node->count.ipa ().initialized_p ())
     {
       if (stats.count_sum > node->count.ipa ().apply_scale (90, 100))
         {
@@ -3310,10 +3311,10 @@ good_cloning_opportunity_p (struct cgraph_node *node, sreal time_benefit,
 
   ipa_node_params *info = ipa_node_params_sum->get (node);
   int eval_threshold = opt_for_fn (node->decl, param_ipa_cp_eval_threshold);
-  if (max_count > profile_count::zero ())
+  if (base_count > profile_count::zero ())
     {
 
-      sreal factor = count_sum.probability_in (max_count).to_sreal ();
+      sreal factor = count_sum.probability_in (base_count).to_sreal ();
       sreal evaluation = (time_benefit * factor) / size_cost;
       evaluation = incorporate_penalties (node, info, evaluation);
       evaluation *= 1000;
@@ -3950,6 +3951,21 @@ value_topo_info::propagate_effects ()
     }
 }
 
+/* Callback for qsort to sort counts of all edges.  */
+
+static int
+compare_edge_profile_counts (const void *a, const void *b)
+{
+  const profile_count *cnt1 = (const profile_count *) a;
+  const profile_count *cnt2 = (const profile_count *) b;
+
+  if (*cnt1 < *cnt2)
+    return 1;
+  if (*cnt1 > *cnt2)
+    return -1;
+  return 0;
+}
+
 /* Propagate constants, polymorphic contexts and their effects from the
    summaries interprocedurally.  */
 
@@ -3962,8 +3978,10 @@ ipcp_propagate_stage (class ipa_topo_info *topo)
   if (dump_file)
     fprintf (dump_file, "\n Propagating constants:\n\n");
 
-  max_count = profile_count::uninitialized ();
+  base_count = profile_count::uninitialized ();
 
+  bool compute_count_base = false;
+  unsigned base_count_pos_percent = 0;
   FOR_EACH_DEFINED_FUNCTION (node)
   {
     if (node->has_gimple_body_p ()
@@ -3981,9 +3999,57 @@ ipcp_propagate_stage (class ipa_topo_info *topo)
     ipa_size_summary *s = ipa_size_summaries->get (node);
     if (node->definition && !node->alias && s != NULL)
       overall_size += s->self_size;
-    max_count = max_count.max (node->count.ipa ());
+    if (node->count.ipa ().initialized_p ())
+      {
+        compute_count_base = true;
+        unsigned pos_percent = opt_for_fn (node->decl,
+                                           param_ipa_cp_profile_count_base);
+        base_count_pos_percent = MAX (base_count_pos_percent, pos_percent);
+      }
   }
 
+  if (compute_count_base)
+    {
+      auto_vec<profile_count> all_edge_counts;
+      all_edge_counts.reserve_exact (symtab->edges_count);
+      FOR_EACH_DEFINED_FUNCTION (node)
+        for (cgraph_edge *cs = node->callees; cs; cs = cs->next_callee)
+          {
+            profile_count count = cs->count.ipa ();
+            if (!(count > profile_count::zero ()))
+              continue;
+
+            enum availability avail;
+            cgraph_node *tgt
+              = cs->callee->function_or_virtual_thunk_symbol (&avail);
+            ipa_node_params *info = ipa_node_params_sum->get (tgt);
+            if (info && info->versionable)
+              all_edge_counts.quick_push (count);
+          }
+
+      if (!all_edge_counts.is_empty ())
+        {
+          gcc_assert (base_count_pos_percent <= 100);
+          all_edge_counts.qsort (compare_edge_profile_counts);
+
+          unsigned base_count_pos
+            = ((all_edge_counts.length () * (base_count_pos_percent)) / 100);
+          base_count = all_edge_counts[base_count_pos];
+
+          if (dump_file)
+            {
+              fprintf (dump_file, "\nSelected base_count from %u edges at "
+                       "position %u, arriving at: ", all_edge_counts.length (),
+                       base_count_pos);
+              base_count.dump (dump_file);
+              fprintf (dump_file, "\n");
+            }
+        }
+      else if (dump_file)
+        fprintf (dump_file, "\nNo candidates with non-zero call count found, "
+                 "continuing as if without profile feedback.\n");
+    }
+
   orig_overall_size = overall_size;
 
   if (dump_file)
@@ -6576,7 +6642,7 @@ make_pass_ipa_cp (gcc::context *ctxt)
 void
 ipa_cp_c_finalize (void)
 {
-  max_count = profile_count::uninitialized ();
+  base_count = profile_count::uninitialized ();
   overall_size = 0;
   orig_overall_size = 0;
   ipcp_free_transformation_sum ();
diff --git a/gcc/params.opt b/gcc/params.opt
index 8d772309407..5223f784bf0 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -290,6 +290,10 @@ The size of translation unit that IPA-CP pass considers large.
 Common Joined UInteger Var(param_ipa_cp_value_list_size) Init(8) Param Optimization
 Maximum size of a list of values associated with each parameter for interprocedural constant propagation.
 
+-param=ipa-cp-profile-count-base=
+Common Joined UInteger Var(param_ipa_cp_profile_count_base) Init(10) IntegerRange(0, 100) Param Optimization
+When using profile feedback, use the edge at this percentage position in frequency histogram as the basis for IPA-CP heuristics.
+
 -param=ipa-jump-function-lookups=
 Common Joined UInteger Var(param_ipa_jump_function_lookups) Init(8) Param Optimization
 Maximum number of statements visited during jump function offset discovery.