From patchwork Sat Jul 3 15:25:27 2010
X-Patchwork-Submitter: Jan Hubicka
X-Patchwork-Id: 57812
Date: Sat, 3 Jul 2010 17:25:27 +0200
From: Jan Hubicka
To: gcc-patches@gcc.gnu.org, jakub@redhat.com, matz@suse.de
Subject: Split cfgexpand and var-tracking timevars
Message-ID: <20100703152527.GI6378@kam.mff.cuni.cz>

Hi,
this patch breaks out the expansion and var-tracking timevars to answer some
questions I got after posting the LTO build numbers.
Now I get:

 garbage collection         : 11.54 ( 2%) usr 0.28 ( 3%) sys 11.82 ( 2%) wall      0 kB ( 0%) ggc
 callgraph optimization     :  0.54 ( 0%) usr 0.00 ( 0%) sys  0.54 ( 0%) wall      0 kB ( 0%) ggc
 varpool construction       :  0.41 ( 0%) usr 0.00 ( 0%) sys  0.42 ( 0%) wall   7046 kB ( 0%) ggc
 ipa cp                     :  0.27 ( 0%) usr 0.00 ( 0%) sys  0.27 ( 0%) wall  18172 kB ( 1%) ggc
 ipa lto gimple I/O         :  7.24 ( 1%) usr 0.94 (10%) sys  8.32 ( 2%) wall 881022 kB (28%) ggc
 ipa lto decl I/O           :  4.82 ( 1%) usr 0.20 ( 2%) sys  5.05 ( 1%) wall 249171 kB ( 8%) ggc
 ipa lto decl init I/O      :  0.40 ( 0%) usr 0.00 ( 0%) sys  0.40 ( 0%) wall  55386 kB ( 2%) ggc
 ipa lto cgraph I/O         :  0.16 ( 0%) usr 0.02 ( 0%) sys  0.18 ( 0%) wall  50866 kB ( 2%) ggc
 ipa lto decl merge         :  1.70 ( 0%) usr 0.06 ( 1%) sys  1.75 ( 0%) wall     29 kB ( 0%) ggc
 ipa lto cgraph merge       :  0.10 ( 0%) usr 0.00 ( 0%) sys  0.10 ( 0%) wall   6831 kB ( 0%) ggc
 ipa reference              :  0.53 ( 0%) usr 0.03 ( 0%) sys  0.56 ( 0%) wall      0 kB ( 0%) ggc
 ipa profile                :  0.03 ( 0%) usr 0.00 ( 0%) sys  0.03 ( 0%) wall      0 kB ( 0%) ggc
 ipa pure const             :  0.68 ( 0%) usr 0.00 ( 0%) sys  0.65 ( 0%) wall   1068 kB ( 0%) ggc
 cfg cleanup                :  8.98 ( 2%) usr 0.04 ( 0%) sys  9.20 ( 2%) wall  30697 kB ( 1%) ggc
 trivially dead code        :  2.83 ( 1%) usr 0.04 ( 0%) sys  3.09 ( 1%) wall      0 kB ( 0%) ggc
 df multiple defs           :  1.78 ( 0%) usr 0.04 ( 0%) sys  1.87 ( 0%) wall      0 kB ( 0%) ggc
 df reaching defs           :  4.72 ( 1%) usr 0.05 ( 1%) sys  4.69 ( 1%) wall      0 kB ( 0%) ggc
 df live regs               : 23.07 ( 5%) usr 0.08 ( 1%) sys 23.04 ( 5%) wall      0 kB ( 0%) ggc
 df live&initialized regs   : 12.43 ( 2%) usr 0.06 ( 1%) sys 12.48 ( 2%) wall      0 kB ( 0%) ggc
 df use-def / def-use chains:  2.53 ( 1%) usr 0.00 ( 0%) sys  2.63 ( 1%) wall      0 kB ( 0%) ggc
 df reg dead/unused notes   :  9.57 ( 2%) usr 0.05 ( 1%) sys  9.89 ( 2%) wall  72343 kB ( 2%) ggc
 register information       :  3.00 ( 1%) usr 0.01 ( 0%) sys  3.12 ( 1%) wall      0 kB ( 0%) ggc
 alias analysis             :  9.51 ( 2%) usr 0.03 ( 0%) sys  9.60 ( 2%) wall 200305 kB ( 6%) ggc
 alias stmt walking         :  4.83 ( 1%) usr 0.71 ( 8%) sys  5.69 ( 1%) wall   7104 kB ( 0%) ggc
 register scan              :  1.10 ( 0%) usr 0.00 ( 0%) sys  1.21 ( 0%) wall   1946 kB ( 0%) ggc
 rebuild jump labels        :  2.10 ( 0%) usr 0.00 ( 0%) sys  1.96 ( 0%) wall      0 kB ( 0%) ggc
 parser                     :  0.32 ( 0%) usr 0.07 ( 1%) sys  0.69 ( 0%) wall  23250 kB ( 1%) ggc
 inline heuristics          :  2.17 ( 0%) usr 0.08 ( 1%) sys  2.45 ( 0%) wall  74899 kB ( 2%) ggc
 integration                :  7.40 ( 1%) usr 0.56 ( 6%) sys  8.21 ( 2%) wall 699599 kB (22%) ggc
 tree CFG cleanup           :  6.72 ( 1%) usr 0.12 ( 1%) sys  6.48 ( 1%) wall  18567 kB ( 1%) ggc
 tree VRP                   : 11.31 ( 2%) usr 0.32 ( 3%) sys 11.43 ( 2%) wall 319191 kB (10%) ggc
 tree copy propagation      :  2.48 ( 0%) usr 0.04 ( 0%) sys  2.39 ( 0%) wall  14598 kB ( 0%) ggc
 tree PTA                   :  5.39 ( 1%) usr 0.01 ( 0%) sys  5.28 ( 1%) wall  42713 kB ( 1%) ggc
 tree SSA rewrite           :  2.77 ( 1%) usr 0.04 ( 0%) sys  3.09 ( 1%) wall  50979 kB ( 2%) ggc
 tree SSA incremental       :  5.76 ( 1%) usr 0.29 ( 3%) sys  5.55 ( 1%) wall  53118 kB ( 2%) ggc
 tree operand scan          :  3.17 ( 1%) usr 1.42 (15%) sys  3.73 ( 1%) wall 444654 kB (14%) ggc
 dominator optimization     :  5.77 ( 1%) usr 0.02 ( 0%) sys  5.87 ( 1%) wall 122530 kB ( 4%) ggc
 tree SRA                   :  0.20 ( 0%) usr 0.01 ( 0%) sys  0.26 ( 0%) wall   2961 kB ( 0%) ggc
 tree CCP                   :  2.15 ( 0%) usr 0.02 ( 0%) sys  2.41 ( 0%) wall  12129 kB ( 0%) ggc
 tree PHI const/copy prop   :  0.19 ( 0%) usr 0.01 ( 0%) sys  0.14 ( 0%) wall   1679 kB ( 0%) ggc
 tree split crit edges      :  0.63 ( 0%) usr 0.01 ( 0%) sys  0.46 ( 0%) wall  84149 kB ( 3%) ggc
 tree reassociation         :  0.83 ( 0%) usr 0.01 ( 0%) sys  0.94 ( 0%) wall  15332 kB ( 0%) ggc
 tree PRE                   : 28.01 ( 6%) usr 0.22 ( 2%) sys 28.32 ( 6%) wall 223553 kB ( 7%) ggc
 tree FRE                   :  5.65 ( 1%) usr 0.22 ( 2%) sys  6.10 ( 1%) wall  27160 kB ( 1%) ggc
 tree code sinking          :  0.66 ( 0%) usr 0.02 ( 0%) sys  0.82 ( 0%) wall   8983 kB ( 0%) ggc
 tree linearize phis        :  0.52 ( 0%) usr 0.04 ( 0%) sys  0.54 ( 0%) wall   2158 kB ( 0%) ggc
 tree forward propagate     :  0.72 ( 0%) usr 0.02 ( 0%) sys  0.79 ( 0%) wall  18358 kB ( 1%) ggc
 tree phiprop               :  0.10 ( 0%) usr 0.02 ( 0%) sys  0.08 ( 0%) wall    307 kB ( 0%) ggc
 tree conservative DCE      :  1.60 ( 0%) usr 0.24 ( 3%) sys  2.03 ( 0%) wall   1904 kB ( 0%) ggc
 tree aggressive DCE        :  1.42 ( 0%) usr 0.09 ( 1%) sys  1.41 ( 0%) wall  38880 kB ( 1%) ggc
 tree buildin call DCE      :  0.04 ( 0%) usr 0.00 ( 0%) sys  0.03 ( 0%) wall      0 kB ( 0%) ggc
 tree DSE                   :  0.88 ( 0%) usr 0.01 ( 0%) sys  0.85 ( 0%) wall   4062 kB ( 0%) ggc
 PHI merge                  :  0.57 ( 0%) usr 0.00 ( 0%) sys  0.52 ( 0%) wall   4838 kB ( 0%) ggc
 tree loop bounds           :  0.65 ( 0%) usr 0.00 ( 0%) sys  0.51 ( 0%) wall   8531 kB ( 0%) ggc
 tree loop invariant motion :  1.03 ( 0%) usr 0.00 ( 0%) sys  0.96 ( 0%) wall   1452 kB ( 0%) ggc
 tree canonical iv          :  0.22 ( 0%) usr 0.00 ( 0%) sys  0.13 ( 0%) wall   5122 kB ( 0%) ggc
 scev constant prop         :  0.46 ( 0%) usr 0.00 ( 0%) sys  0.52 ( 0%) wall  16752 kB ( 1%) ggc
 tree loop unswitching      :  0.25 ( 0%) usr 0.02 ( 0%) sys  0.30 ( 0%) wall  18321 kB ( 1%) ggc
 complete unrolling         :  0.72 ( 0%) usr 0.03 ( 0%) sys  0.93 ( 0%) wall  42447 kB ( 1%) ggc
 tree vectorization         :  0.31 ( 0%) usr 0.00 ( 0%) sys  0.35 ( 0%) wall  18424 kB ( 1%) ggc
 tree slp vectorization     :  3.48 ( 1%) usr 0.02 ( 0%) sys  3.78 ( 1%) wall 288800 kB ( 9%) ggc
 tree iv optimization       :  1.86 ( 0%) usr 0.04 ( 0%) sys  2.11 ( 0%) wall  79505 kB ( 3%) ggc
 predictive commoning       :  0.35 ( 0%) usr 0.00 ( 0%) sys  0.36 ( 0%) wall  11598 kB ( 0%) ggc
 tree loop init             :  0.75 ( 0%) usr 0.00 ( 0%) sys  0.72 ( 0%) wall  19549 kB ( 1%) ggc
 tree loop fini             :  0.01 ( 0%) usr 0.00 ( 0%) sys  0.02 ( 0%) wall      0 kB ( 0%) ggc
 tree copy headers          :  0.28 ( 0%) usr 0.01 ( 0%) sys  0.35 ( 0%) wall  28294 kB ( 1%) ggc
 tree SSA uncprop           :  0.38 ( 0%) usr 0.00 ( 0%) sys  0.29 ( 0%) wall      0 kB ( 0%) ggc
 tree NRV optimization      :  0.01 ( 0%) usr 0.00 ( 0%) sys  0.02 ( 0%) wall    278 kB ( 0%) ggc
 tree rename SSA copies     :  0.61 ( 0%) usr 0.01 ( 0%) sys  0.53 ( 0%) wall      0 kB ( 0%) ggc
 dominance frontiers        :  0.85 ( 0%) usr 0.02 ( 0%) sys  0.77 ( 0%) wall      0 kB ( 0%) ggc
 dominance computation      :  5.67 ( 1%) usr 0.07 ( 1%) sys  5.92 ( 1%) wall      0 kB ( 0%) ggc
 control dependences        :  0.10 ( 0%) usr 0.00 ( 0%) sys  0.11 ( 0%) wall      0 kB ( 0%) ggc
 out of ssa                 :  2.27 ( 0%) usr 0.01 ( 0%) sys  2.37 ( 0%) wall   1539 kB ( 0%) ggc
 expand vars                :  3.55 ( 1%) usr 0.00 ( 0%) sys  3.71 ( 1%) wall  85419 kB ( 3%) ggc
 expand                     : 38.67 ( 8%) usr 0.56 ( 6%) sys 39.27 ( 8%) wall 801014 kB (25%) ggc
 post expand cleanups       :  0.78 ( 0%) usr 0.02 ( 0%) sys  0.81 ( 0%) wall  70191 kB ( 2%) ggc
 varconst                   :  0.01 ( 0%) usr 0.00 ( 0%) sys  0.01 ( 0%) wall      0 kB ( 0%) ggc
 lower subreg               :  0.08 ( 0%) usr 0.01 ( 0%) sys  0.12 ( 0%) wall      0 kB ( 0%) ggc
 jump                       :  0.00 ( 0%) usr 0.01 ( 0%) sys  0.01 ( 0%) wall      0 kB ( 0%) ggc
 forward prop               :  5.00 ( 1%) usr 0.09 ( 1%) sys  4.92 ( 1%) wall  67389 kB ( 2%) ggc
 CSE                        : 12.53 ( 3%) usr 0.03 ( 0%) sys 12.12 ( 2%) wall  18902 kB ( 1%) ggc
 dead code elimination      :  2.47 ( 0%) usr 0.02 ( 0%) sys  2.62 ( 1%) wall      0 kB ( 0%) ggc
 dead store elim1           :  4.12 ( 1%) usr 0.02 ( 0%) sys  4.28 ( 1%) wall  42608 kB ( 1%) ggc
 dead store elim2           :  4.04 ( 1%) usr 0.02 ( 0%) sys  3.77 ( 1%) wall  50827 kB ( 2%) ggc
 loop analysis              :  0.50 ( 0%) usr 0.01 ( 0%) sys  0.45 ( 0%) wall  15084 kB ( 0%) ggc
 loop invariant motion      :  1.67 ( 0%) usr 0.01 ( 0%) sys  1.71 ( 0%) wall   1691 kB ( 0%) ggc
 loop unswitching           :  0.59 ( 0%) usr 0.01 ( 0%) sys  0.56 ( 0%) wall    484 kB ( 0%) ggc
 CPROP                      : 10.36 ( 2%) usr 0.04 ( 0%) sys 10.92 ( 2%) wall  93818 kB ( 3%) ggc
 PRE                        :  8.96 ( 2%) usr 0.04 ( 0%) sys  8.89 ( 2%) wall  13722 kB ( 0%) ggc
 code hoisting              :  0.04 ( 0%) usr 0.00 ( 0%) sys  0.02 ( 0%) wall     21 kB ( 0%) ggc
 CSE 2                      :  7.07 ( 1%) usr 0.02 ( 0%) sys  7.28 ( 1%) wall  11711 kB ( 0%) ggc
 branch prediction          :  0.12 ( 0%) usr 0.00 ( 0%) sys  0.10 ( 0%) wall     32 kB ( 0%) ggc
 combiner                   : 15.08 ( 3%) usr 0.08 ( 1%) sys 15.66 ( 3%) wall 250426 kB ( 8%) ggc
 if-conversion              :  3.01 ( 1%) usr 0.01 ( 0%) sys  2.92 ( 1%) wall  31718 kB ( 1%) ggc
 regmove                    :  1.39 ( 0%) usr 0.01 ( 0%) sys  1.53 ( 0%) wall    376 kB ( 0%) ggc
 integrated RA              : 29.25 ( 6%) usr 0.05 ( 1%) sys 28.78 ( 6%) wall 129980 kB ( 4%) ggc
 reload                     : 12.15 ( 2%) usr 0.08 ( 1%) sys 12.14 ( 2%) wall  39196 kB ( 1%) ggc
 reload CSE regs            :  8.59 ( 2%) usr 0.00 ( 0%) sys  8.44 ( 2%) wall 110975 kB ( 3%) ggc
 load CSE after reload      :  0.96 ( 0%) usr 0.00 ( 0%) sys  1.10 ( 0%) wall    606 kB ( 0%) ggc
 zee                        :  0.85 ( 0%) usr 0.03 ( 0%) sys  0.80 ( 0%) wall    400 kB ( 0%) ggc
 thread pro- & epilogue     :  1.98 ( 0%) usr 0.01 ( 0%) sys  1.84 ( 0%) wall  44591 kB ( 1%) ggc
 if-conversion 2            :  0.87 ( 0%) usr 0.01 ( 0%) sys  0.85 ( 0%) wall   8193 kB ( 0%) ggc
 combine stack adjustments  :  0.38 ( 0%) usr 0.01 ( 0%) sys  0.36 ( 0%) wall      1 kB ( 0%) ggc
 peephole 2                 :  1.35 ( 0%) usr 0.01 ( 0%) sys  1.25 ( 0%) wall  22154 kB ( 1%) ggc
 hard reg cprop             :  2.96 ( 1%) usr 0.03 ( 0%) sys  3.36 ( 1%) wall   2352 kB ( 0%) ggc
 scheduling 2               : 15.25 ( 3%) usr 0.09 ( 1%) sys 15.33 ( 3%) wall   7224 kB ( 0%) ggc
 machine dep reorg          :  2.63 ( 1%) usr 0.00 ( 0%) sys  2.34 ( 0%) wall   2880 kB ( 0%) ggc
 reorder blocks             :  2.58 ( 1%) usr 0.01 ( 0%) sys  2.89 ( 1%) wall  71277 kB ( 2%) ggc
 final                      :  9.27 ( 2%) usr 0.66 ( 7%) sys 10.19 ( 2%) wall 145100 kB ( 5%) ggc
 variable output            :  0.44 ( 0%) usr 0.02 ( 0%) sys  0.45 ( 0%) wall   5092 kB ( 0%) ggc
 symout                     :  6.91 ( 1%) usr 0.42 ( 4%) sys  7.34 ( 1%) wall 414781 kB (13%) ggc
 variable tracking          :  8.64 ( 2%) usr 0.05 ( 1%) sys  9.16 ( 2%) wall 192447 kB ( 6%) ggc
 var-tracking dataflow      : 24.20 ( 5%) usr 0.05 ( 1%) sys 23.70 ( 5%) wall      0 kB ( 0%) ggc
 var-tracking emit          : 15.30 ( 3%) usr 0.04 ( 0%) sys 14.78 ( 3%) wall 179536 kB ( 6%) ggc
 tree if-combine            :  0.06 ( 0%) usr 0.00 ( 0%) sys  0.03 ( 0%) wall     71 kB ( 0%) ggc
 uninit var anaysis         :  0.06 ( 0%) usr 0.00 ( 0%) sys  0.05 ( 0%) wall      0 kB ( 0%) ggc
 TOTAL                      : 497.36          9.38         507.43          3172254 kB

So var-tracking dataflow seems to be the most expensive part of var-tracking,
but it is only about 50% of the problem (24.20 s of the 48.14 s usr spent in
the three var-tracking timevars). In expansion, the time goes to the actual
RTL expansion (38.67 s usr), not to variable packing or the out-of-ssa code.

Bootstrapped/regtested x86_64-linux, OK?

Honza

	* timevar.def (TV_OUT_OF_SSA, TV_VAR_EXPAND, TV_POST_EXPAND,
	TV_VAR_TRACKING_DATAFLOW, TV_VAR_TRACKING_EMIT): New timevars.
	* cfgexpand.c (gimple_expand_cfg): Use new timevars.
	* var-tracking.c (vt_find_locations, variable_tracking_main_1): Likewise.

Index: timevar.def
===================================================================
--- timevar.def (revision 161774)
+++ timevar.def (working copy)
@@ -172,7 +172,10 @@ DEFTIMEVAR (TV_DOMINANCE , "
 DEFTIMEVAR (TV_CONTROL_DEPENDENCES   , "control dependences")
 DEFTIMEVAR (TV_OVERLOAD              , "overload resolution")
 DEFTIMEVAR (TV_TEMPLATE_INSTANTIATION, "template instantiation")
+DEFTIMEVAR (TV_OUT_OF_SSA            , "out of ssa")
+DEFTIMEVAR (TV_VAR_EXPAND            , "expand vars")
 DEFTIMEVAR (TV_EXPAND                , "expand")
+DEFTIMEVAR (TV_POST_EXPAND           , "post expand cleanups")
 DEFTIMEVAR (TV_VARCONST              , "varconst")
 DEFTIMEVAR (TV_LOWER_SUBREG          , "lower subreg")
 DEFTIMEVAR (TV_JUMP                  , "jump")
@@ -226,6 +229,8 @@ DEFTIMEVAR (TV_FINAL , "
 DEFTIMEVAR (TV_VAROUT                , "variable output")
 DEFTIMEVAR (TV_SYMOUT                , "symout")
 DEFTIMEVAR (TV_VAR_TRACKING          , "variable tracking")
+DEFTIMEVAR (TV_VAR_TRACKING_DATAFLOW , "var-tracking dataflow")
+DEFTIMEVAR (TV_VAR_TRACKING_EMIT     , "var-tracking emit")
 DEFTIMEVAR (TV_TREE_IFCOMBINE        , "tree if-combine")
 DEFTIMEVAR (TV_TREE_UNINIT           , "uninit var anaysis")
 DEFTIMEVAR (TV_PLUGIN_INIT           , "plugin initialization")
Index: cfgexpand.c
===================================================================
--- cfgexpand.c (revision 161774)
+++ cfgexpand.c (working copy)
@@ -3764,7 +3764,9 @@ gimple_expand_cfg (void)
   edge e;
   unsigned i;
 
+  timevar_push (TV_OUT_OF_SSA);
   rewrite_out_of_ssa (&SA);
+  timevar_pop (TV_OUT_OF_SSA);
   SA.partition_to_pseudo = (rtx *)xcalloc (SA.map->num_partitions,
                                            sizeof (rtx));
 
@@ -3807,7 +3809,9 @@ gimple_expand_cfg (void)
 
 
   /* Expand the variables recorded during gimple lowering.  */
+  timevar_push (TV_VAR_EXPAND);
   expand_used_vars ();
+  timevar_pop (TV_VAR_EXPAND);
 
   /* Honor stack protection warnings.  */
   if (warn_stack_protect)
@@ -3887,8 +3891,11 @@ gimple_expand_cfg (void)
   expand_debug_locations ();
 
   execute_free_datastructures ();
+  timevar_push (TV_OUT_OF_SSA);
   finish_out_of_ssa (&SA);
+  timevar_pop (TV_OUT_OF_SSA);
 
+  timevar_push (TV_POST_EXPAND);
   /* We are no longer in SSA form.  */
   cfun->gimple_df->in_ssa_p = false;
 
@@ -3998,6 +4005,7 @@ gimple_expand_cfg (void)
      the common parent easily.  */
   set_block_levels (DECL_INITIAL (cfun->decl), 0);
   default_rtl_profile ();
+  timevar_pop (TV_POST_EXPAND);
   return 0;
 }
 
Index: var-tracking.c
===================================================================
--- var-tracking.c (revision 161774)
+++ var-tracking.c (working copy)
@@ -5992,6 +5992,7 @@ vt_find_locations (void)
   int htabmax = PARAM_VALUE (PARAM_MAX_VARTRACK_SIZE);
   bool success = true;
 
+  timevar_push (TV_VAR_TRACKING_DATAFLOW);
   /* Compute reverse completion order of depth first search of the CFG
      so that the data-flow runs faster.  */
   rc_order = XNEWVEC (int, n_basic_blocks - NUM_FIXED_BLOCKS);
@@ -6027,6 +6028,7 @@ vt_find_locations (void)
         {
           bb = (basic_block) fibheap_extract_min (worklist);
           RESET_BIT (in_worklist, bb->index);
+          gcc_assert (!TEST_BIT (visited, bb->index));
           if (!TEST_BIT (visited, bb->index))
             {
               bool changed;
@@ -6179,6 +6181,7 @@ vt_find_locations (void)
   sbitmap_free (in_worklist);
   sbitmap_free (in_pending);
 
+  timevar_pop (TV_VAR_TRACKING_DATAFLOW);
   return success;
 }
 
@@ -8534,7 +8537,9 @@ variable_tracking_main_1 (void)
       dump_flow_info (dump_file, dump_flags);
     }
 
+  timevar_push (TV_VAR_TRACKING_EMIT);
   vt_emit_notes ();
+  timevar_pop (TV_VAR_TRACKING_EMIT);
   vt_finalize ();
   vt_debug_insns_local (false);
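
For readers unfamiliar with how nested timevars split a pass's time, here is a minimal standalone sketch of the push/pop accounting the patch relies on. It is not GCC's timevar.c: the names `tv_push`, `tv_pop`, `tv_charge`, `work` and the `enum tv_id` values are hypothetical, and a fake tick counter stands in for the real clock. It assumes that pushing a nested timer pauses the enclosing one, so each timer is charged only for time spent while it is innermost.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical timer ids, loosely modelled on the var-tracking timevars.  */
enum tv_id { TV_OUTER, TV_DATAFLOW, TV_EMIT, TV_COUNT };

long tv_elapsed[TV_COUNT];   /* Exclusive ticks charged to each timer.  */
enum tv_id tv_stack[8];      /* Open timers, innermost last.  */
int tv_depth;
long tv_now;                 /* Fake clock, advanced by work ().  */
long tv_last;                /* Clock value at the last push or pop.  */

/* Charge the ticks since the last push/pop to the innermost open timer.  */
void tv_charge (void)
{
  if (tv_depth > 0)
    tv_elapsed[tv_stack[tv_depth - 1]] += tv_now - tv_last;
  tv_last = tv_now;
}

/* Pause the enclosing timer (if any) and make ID the innermost one.  */
void tv_push (enum tv_id id)
{
  tv_charge ();
  assert (tv_depth < 8);
  tv_stack[tv_depth++] = id;
}

/* Close ID, which must be the innermost timer, and resume its parent.  */
void tv_pop (enum tv_id id)
{
  assert (tv_depth > 0 && tv_stack[tv_depth - 1] == id);
  tv_charge ();
  tv_depth--;
}

/* Stand-in for real work: just advance the fake clock.  */
void work (long ticks) { tv_now += ticks; }

/* A pass with two separately timed sub-phases, as in the patch.  */
void run_demo (void)
{
  tv_push (TV_OUTER);
  work (2);                                   /* Setup, charged to outer.  */
  tv_push (TV_DATAFLOW); work (5); tv_pop (TV_DATAFLOW);
  tv_push (TV_EMIT); work (3); tv_pop (TV_EMIT);
  work (1);                                   /* Cleanup, charged to outer.  */
  tv_pop (TV_OUTER);
}

/* Print the per-timer totals.  */
void tv_report (void)
{
  static const char *const names[TV_COUNT] = { "outer", "dataflow", "emit" };
  for (int i = 0; i < TV_COUNT; i++)
    printf ("%-10s: %ld ticks\n", names[i], tv_elapsed[i]);
}
```

Calling `run_demo ()` followed by `tv_report ()` shows the outer timer keeping only the ticks not claimed by a nested timer. Under the same assumption, the "variable tracking" row in the report above would be what remains after the dataflow and emit rows are carved out.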