Message ID | 6f695498-7f72-fe2f-b88b-4240f0d4569a@linux.ibm.com |
---|---|
State | New |
Headers | show |
Series | predcom: Adjust some unnecessary update_ssa calls | expand |
On Wed, Jun 2, 2021 at 11:29 AM Kewen.Lin <linkw@linux.ibm.com> wrote: > > Hi, > > As Richi suggested in PR100794, this patch is to remove > some unnecessary update_ssa calls with flag > TODO_update_ssa_only_virtuals, also do some refactoring. > > Bootstrapped/regtested on powerpc64le-linux-gnu P9, > x86_64-redhat-linux and aarch64-linux-gnu, built well > on Power9 ppc64le with --with-build-config=bootstrap-O3, > and passed both P8 and P9 SPEC2017 full build with > {-O3, -Ofast} + {,-funroll-loops}. > > Is it ok for trunk? LGTM, minor comment on the fancy C++: + auto cleanup = [&]() { + release_chains (chains); + free_data_refs (datarefs); + BITMAP_FREE (looparound_phis); + free_affine_expand_cache (&name_expansions); + }; + cleanup (); + return 0; so that could have been class cleanup { ~cleanup() { release_chains (chains); free_data_refs (datarefs); BITMAP_FREE (looparound_phis); free_affine_expand_cache (&name_expansions); } } cleanup; ? Or some other means of adding registering a RAII-style cleanup? I mean, we can't wrap it all in try {...} finally {...} because C++ doesn't have finally. OK with this tiny part of the C++ refactoring delayed, but we can also simply discuss best options. At least for looparound_phis a good cleanup would be to pass the bitmap around and use auto_bitmap local to tree_predictive_commoning_loop ... Thanks, Richard. > BR, > Kewen > ----- > gcc/ChangeLog: > > * tree-predcom.c (execute_pred_commoning): Remove update_ssa call. > (tree_predictive_commoning_loop): Factor some cleanup stuffs into > lambda function cleanup, remove scev_reset call, and adjust return > value. > (tree_predictive_commoning): Adjust for different changed values, > only set flag TODO_update_ssa_only_virtuals if changed. > (pass_data pass_data_predcom): Remove TODO_update_ssa_only_virtuals > from todo_flags_finish. >
On 6/7/21 8:46 AM, Richard Biener via Gcc-patches wrote: > On Wed, Jun 2, 2021 at 11:29 AM Kewen.Lin <linkw@linux.ibm.com> wrote: >> >> Hi, >> >> As Richi suggested in PR100794, this patch is to remove >> some unnecessary update_ssa calls with flag >> TODO_update_ssa_only_virtuals, also do some refactoring. >> >> Bootstrapped/regtested on powerpc64le-linux-gnu P9, >> x86_64-redhat-linux and aarch64-linux-gnu, built well >> on Power9 ppc64le with --with-build-config=bootstrap-O3, >> and passed both P8 and P9 SPEC2017 full build with >> {-O3, -Ofast} + {,-funroll-loops}. >> >> Is it ok for trunk? > > LGTM, minor comment on the fancy C++: > > + auto cleanup = [&]() { > + release_chains (chains); > + free_data_refs (datarefs); > + BITMAP_FREE (looparound_phis); > + free_affine_expand_cache (&name_expansions); > + }; > > + cleanup (); > + return 0; > > so that could have been > > class cleanup { > ~cleanup() > { > release_chains (chains); > free_data_refs (datarefs); > BITMAP_FREE (looparound_phis); > free_affine_expand_cache (&name_expansions); > } > } cleanup; > > ? Or some other means of adding registering a RAII-style cleanup? I agree this would be better than invoking the cleanup lambda explicitly. Going a step further would be to encapsulate all the data in a class (eliminating the static variables) and make tree_predictive_commoning_loop() its member function (along with whatever others it calls), and have the dtor take care of the cleanup. Of course, if the data types were classes with ctors and dtors (including, for example, auto_vec rather than vec for chains) the cleanup would just happen and none of the explicit calls would be necessary. Martin > I mean, we can't wrap it all in > > try {...} > finally {...} > > because C++ doesn't have finally. > > OK with this tiny part of the C++ refactoring delayed, but we can also simply > discuss best options. At least for looparound_phis a good cleanup would > be to pass the bitmap around and use auto_bitmap local to > tree_predictive_commoning_loop ... > > Thanks, > Richard. > >> BR, >> Kewen >> ----- >> gcc/ChangeLog: >> >> * tree-predcom.c (execute_pred_commoning): Remove update_ssa call. >> (tree_predictive_commoning_loop): Factor some cleanup stuffs into >> lambda function cleanup, remove scev_reset call, and adjust return >> value. >> (tree_predictive_commoning): Adjust for different changed values, >> only set flag TODO_update_ssa_only_virtuals if changed. >> (pass_data pass_data_predcom): Remove TODO_update_ssa_only_virtuals >> from todo_flags_finish. >>
on 2021/6/7 下午10:46, Richard Biener wrote: > On Wed, Jun 2, 2021 at 11:29 AM Kewen.Lin <linkw@linux.ibm.com> wrote: >> >> Hi, >> >> As Richi suggested in PR100794, this patch is to remove >> some unnecessary update_ssa calls with flag >> TODO_update_ssa_only_virtuals, also do some refactoring. >> >> Bootstrapped/regtested on powerpc64le-linux-gnu P9, >> x86_64-redhat-linux and aarch64-linux-gnu, built well >> on Power9 ppc64le with --with-build-config=bootstrap-O3, >> and passed both P8 and P9 SPEC2017 full build with >> {-O3, -Ofast} + {,-funroll-loops}. >> >> Is it ok for trunk? > > LGTM, minor comment on the fancy C++: > > + auto cleanup = [&]() { > + release_chains (chains); > + free_data_refs (datarefs); > + BITMAP_FREE (looparound_phis); > + free_affine_expand_cache (&name_expansions); > + }; > > + cleanup (); > + return 0; > > so that could have been > > class cleanup { > ~cleanup() > { > release_chains (chains); > free_data_refs (datarefs); > BITMAP_FREE (looparound_phis); > free_affine_expand_cache (&name_expansions); > } > } cleanup; > > ? Or some other means of adding registering a RAII-style cleanup? > I mean, we can't wrap it all in > > try {...} > finally {...} > > because C++ doesn't have finally. > > OK with this tiny part of the C++ refactoring delayed, but we can also simply > discuss best options. At least for looparound_phis a good cleanup would > be to pass the bitmap around and use auto_bitmap local to > tree_predictive_commoning_loop ... > Thanks Richi! One draft (not ready for review) is attached for the further discussion. It follows the idea of RAII-style cleanup. I noticed that Martin suggested stepping forward to make tree_predictive_commoning_loop and its callees into one class (Thanks Martin), since there are not many this kind of C++-style work functions, I want to double confirm which option do you guys prefer? One point you might have seen is that to make tree_predictive_commoning_loop and its callees as member functions of one class can avoid to pass bitmap looparound_phis all around what's in the draft. :) BR, Kewen diff --git a/gcc/tree-predcom.c b/gcc/tree-predcom.c index ac1674d5486..75acc342c5a 100644 --- a/gcc/tree-predcom.c +++ b/gcc/tree-predcom.c @@ -375,13 +375,40 @@ struct component struct component *next; }; -/* Bitmap of ssa names defined by looparound phi nodes covered by chains. */ +typedef hash_map<tree, name_expansion *> tree_expand_map_t; +static void release_chains (vec<chain_p> chains); -static bitmap looparound_phis; +/* Class for predictive commoning data structure for one LOOP. */ +class loop_pcom_info +{ +public: + loop_pcom_info (loop_p l) + : loop (l), datarefs (vNULL), dependences (vNULL), chains (vNULL), + cache (NULL) + { + dependences.create (10); + datarefs.create (10); + } -/* Cache used by tree_to_aff_combination_expand. */ + ~loop_pcom_info () + { + free_data_refs (datarefs); + free_dependence_relations (dependences); + release_chains (chains); + free_affine_expand_cache (&cache); + } -static hash_map<tree, name_expansion *> *name_expansions; + /* The pointer to the given loop. */ + loop_p loop; + /* All data references. */ + vec<data_reference_p> datarefs; + /* All data dependences. */ + vec<ddr_p> dependences; + /* All chains. */ + vec<chain_p> chains; + /* Cache used by tree_to_aff_combination_expand. */ + tree_expand_map_t *cache; +}; /* Dumps data reference REF to FILE. */ @@ -673,13 +700,13 @@ suitable_reference_p (struct data_reference *a, enum ref_step_type *ref_step) /* Stores DR_OFFSET (DR) + DR_INIT (DR) to OFFSET. */ static void -aff_combination_dr_offset (struct data_reference *dr, aff_tree *offset) +aff_combination_dr_offset (struct data_reference *dr, aff_tree *offset, + tree_expand_map_t **cache_ptr) { tree type = TREE_TYPE (DR_OFFSET (dr)); aff_tree delta; - tree_to_aff_combination_expand (DR_OFFSET (dr), type, offset, - &name_expansions); + tree_to_aff_combination_expand (DR_OFFSET (dr), type, offset, cache_ptr); aff_combination_const (&delta, type, wi::to_poly_widest (DR_INIT (dr))); aff_combination_add (offset, &delta); } @@ -692,7 +719,7 @@ aff_combination_dr_offset (struct data_reference *dr, aff_tree *offset) static bool determine_offset (struct data_reference *a, struct data_reference *b, - poly_widest_int *off) + poly_widest_int *off, tree_expand_map_t **cache_ptr) { aff_tree diff, baseb, step; tree typea, typeb; @@ -720,13 +747,13 @@ determine_offset (struct data_reference *a, struct data_reference *b, /* Compare the offsets of the addresses, and check whether the difference is a multiple of step. */ - aff_combination_dr_offset (a, &diff); - aff_combination_dr_offset (b, &baseb); + aff_combination_dr_offset (a, &diff, cache_ptr); + aff_combination_dr_offset (b, &baseb, cache_ptr); aff_combination_scale (&baseb, -1); aff_combination_add (&diff, &baseb); - tree_to_aff_combination_expand (DR_STEP (a), TREE_TYPE (DR_STEP (a)), - &step, &name_expansions); + tree_to_aff_combination_expand (DR_STEP (a), TREE_TYPE (DR_STEP (a)), &step, + cache_ptr); return aff_combination_constant_multiple_p (&diff, &step, off); } @@ -750,9 +777,9 @@ last_always_executed_block (class loop *loop) /* Splits dependence graph on DATAREFS described by DEPENDS to components. */ static struct component * -split_data_refs_to_components (class loop *loop, - vec<data_reference_p> datarefs, - vec<ddr_p> depends) +split_data_refs_to_components (class loop *loop, vec<data_reference_p> datarefs, + vec<ddr_p> depends, + tree_expand_map_t **cache_ptr) { unsigned i, n = datarefs.length (); unsigned ca, ia, ib, bad; @@ -827,7 +854,7 @@ split_data_refs_to_components (class loop *loop, if (DR_IS_READ (dra) && DR_IS_READ (drb)) { if (ia == bad || ib == bad - || !determine_offset (dra, drb, &dummy_off)) + || !determine_offset (dra, drb, &dummy_off, cache_ptr)) continue; } /* If A is read and B write or vice versa and there is unsuitable @@ -842,7 +869,7 @@ split_data_refs_to_components (class loop *loop, bitmap_set_bit (no_store_store_comps, ib); continue; } - else if (!determine_offset (dra, drb, &dummy_off)) + else if (!determine_offset (dra, drb, &dummy_off, cache_ptr)) { bitmap_set_bit (no_store_store_comps, ib); merge_comps (comp_father, comp_size, bad, ia); @@ -856,7 +883,7 @@ split_data_refs_to_components (class loop *loop, bitmap_set_bit (no_store_store_comps, ia); continue; } - else if (!determine_offset (dra, drb, &dummy_off)) + else if (!determine_offset (dra, drb, &dummy_off, cache_ptr)) { bitmap_set_bit (no_store_store_comps, ia); merge_comps (comp_father, comp_size, bad, ib); @@ -865,7 +892,7 @@ split_data_refs_to_components (class loop *loop, } else if (DR_IS_WRITE (dra) && DR_IS_WRITE (drb) && ia != bad && ib != bad - && !determine_offset (dra, drb, &dummy_off)) + && !determine_offset (dra, drb, &dummy_off, cache_ptr)) { merge_comps (comp_father, comp_size, bad, ia); merge_comps (comp_father, comp_size, bad, ib); @@ -949,7 +976,7 @@ end: loop. */ static bool -suitable_component_p (class loop *loop, struct component *comp) +suitable_component_p (class loop *loop, struct component *comp, tree_expand_map_t **cache_ptr) { unsigned i; dref a, first; @@ -980,7 +1007,7 @@ suitable_component_p (class loop *loop, struct component *comp) /* Polynomial offsets are no use, since we need to know the gap between iteration numbers at compile time. */ poly_widest_int offset; - if (!determine_offset (first->ref, a->ref, &offset) + if (!determine_offset (first->ref, a->ref, &offset, cache_ptr) || !offset.is_constant (&a->offset)) return false; @@ -1005,14 +1032,15 @@ suitable_component_p (class loop *loop, struct component *comp) the beginning of this file. LOOP is the current loop. */ static struct component * -filter_suitable_components (class loop *loop, struct component *comps) +filter_suitable_components (class loop *loop, struct component *comps, + tree_expand_map_t **cache_ptr) { struct component **comp, *act; for (comp = &comps; *comp; ) { act = *comp; - if (suitable_component_p (loop, act)) + if (suitable_component_p (loop, act, cache_ptr)) comp = &act->next; else { @@ -1206,8 +1234,8 @@ name_for_ref (dref ref) iterations of the innermost enclosing loop). */ static bool -valid_initializer_p (struct data_reference *ref, - unsigned distance, struct data_reference *root) +valid_initializer_p (struct data_reference *ref, unsigned distance, + struct data_reference *root, tree_expand_map_t **cache_ptr) { aff_tree diff, base, step; poly_widest_int off; @@ -1228,13 +1256,13 @@ valid_initializer_p (struct data_reference *ref, /* Verify that this index of REF is equal to the root's index at -DISTANCE-th iteration. */ - aff_combination_dr_offset (root, &diff); - aff_combination_dr_offset (ref, &base); + aff_combination_dr_offset (root, &diff, cache_ptr); + aff_combination_dr_offset (ref, &base, cache_ptr); aff_combination_scale (&base, -1); aff_combination_add (&diff, &base); tree_to_aff_combination_expand (DR_STEP (root), TREE_TYPE (DR_STEP (root)), - &step, &name_expansions); + &step, cache_ptr); if (!aff_combination_constant_multiple_p (&diff, &step, &off)) return false; @@ -1250,7 +1278,8 @@ valid_initializer_p (struct data_reference *ref, is the root of the current chain. */ static gphi * -find_looparound_phi (class loop *loop, dref ref, dref root) +find_looparound_phi (class loop *loop, dref ref, dref root, + tree_expand_map_t **cache_ptr) { tree name, init, init_ref; gphi *phi = NULL; @@ -1303,7 +1332,7 @@ find_looparound_phi (class loop *loop, dref ref, dref root) init_stmt)) return NULL; - if (!valid_initializer_p (&init_dr, ref->distance + 1, root->ref)) + if (!valid_initializer_p (&init_dr, ref->distance + 1, root->ref, cache_ptr)) return NULL; return phi; @@ -1339,7 +1368,8 @@ insert_looparound_copy (chain_p chain, dref ref, gphi *phi) (also, it may allow us to combine chains together). */ static void -add_looparound_copies (class loop *loop, chain_p chain) +add_looparound_copies (class loop *loop, chain_p chain, bitmap looparound_phis, + tree_expand_map_t **cache_ptr) { unsigned i; dref ref, root = get_chain_root (chain); @@ -1350,7 +1380,7 @@ add_looparound_copies (class loop *loop, chain_p chain) FOR_EACH_VEC_ELT (chain->refs, i, ref) { - phi = find_looparound_phi (loop, ref, root); + phi = find_looparound_phi (loop, ref, root, cache_ptr); if (!phi) continue; @@ -1364,9 +1394,9 @@ add_looparound_copies (class loop *loop, chain_p chain) loop. */ static void -determine_roots_comp (class loop *loop, - struct component *comp, - vec<chain_p> *chains) +determine_roots_comp (class loop *loop, struct component *comp, + vec<chain_p> *chains, bitmap looparound_phis, + tree_expand_map_t **cache_ptr) { unsigned i; dref a; @@ -1422,7 +1452,7 @@ determine_roots_comp (class loop *loop, { if (nontrivial_chain_p (chain)) { - add_looparound_copies (loop, chain); + add_looparound_copies (loop, chain, looparound_phis, cache_ptr); chains->safe_push (chain); } else @@ -1443,7 +1473,7 @@ determine_roots_comp (class loop *loop, if (nontrivial_chain_p (chain)) { - add_looparound_copies (loop, chain); + add_looparound_copies (loop, chain, looparound_phis, cache_ptr); chains->safe_push (chain); } else @@ -1454,13 +1484,14 @@ determine_roots_comp (class loop *loop, separates the references to CHAINS. LOOP is the current loop. */ static void -determine_roots (class loop *loop, - struct component *comps, vec<chain_p> *chains) +determine_roots (class loop *loop, struct component *comps, + vec<chain_p> *chains, bitmap looparound_phis, + tree_expand_map_t **cache_ptr) { struct component *comp; for (comp = comps; comp; comp = comp->next) - determine_roots_comp (loop, comp, chains); + determine_roots_comp (loop, comp, chains, looparound_phis, cache_ptr); } /* Replace the reference in statement STMT with temporary variable @@ -2028,7 +2059,7 @@ execute_load_motion (class loop *loop, chain_p chain, bitmap tmp_vars) such statement, or more statements, NULL is returned. */ static gimple * -single_nonlooparound_use (tree name) +single_nonlooparound_use (tree name, bitmap looparound_phis) { use_operand_p use; imm_use_iterator it; @@ -2063,7 +2094,7 @@ single_nonlooparound_use (tree name) used. */ static void -remove_stmt (gimple *stmt) +remove_stmt (gimple *stmt, bitmap looparound_phis) { tree name; gimple *next; @@ -2072,7 +2103,7 @@ remove_stmt (gimple *stmt) if (gimple_code (stmt) == GIMPLE_PHI) { name = PHI_RESULT (stmt); - next = single_nonlooparound_use (name); + next = single_nonlooparound_use (name, looparound_phis); reset_debug_uses (stmt); psi = gsi_for_stmt (stmt); remove_phi_node (&psi, true); @@ -2094,7 +2125,7 @@ remove_stmt (gimple *stmt) name = gimple_assign_lhs (stmt); if (TREE_CODE (name) == SSA_NAME) { - next = single_nonlooparound_use (name); + next = single_nonlooparound_use (name, looparound_phis); reset_debug_uses (stmt); } else @@ -2122,7 +2153,7 @@ remove_stmt (gimple *stmt) static void execute_pred_commoning_chain (class loop *loop, chain_p chain, - bitmap tmp_vars) + bitmap tmp_vars, bitmap looparound_phis) { unsigned i; dref a; @@ -2172,7 +2203,7 @@ execute_pred_commoning_chain (class loop *loop, chain_p chain, if (last_store_p) last_store_p = false; else - remove_stmt (a->stmt); + remove_stmt (a->stmt, looparound_phis); continue; } @@ -2252,8 +2283,8 @@ determine_unroll_factor (vec<chain_p> chains) Uids of the newly created temporary variables are marked in TMP_VARS. */ static void -execute_pred_commoning (class loop *loop, vec<chain_p> chains, - bitmap tmp_vars) +execute_pred_commoning (class loop *loop, vec<chain_p> chains, bitmap tmp_vars, + bitmap looparound_phis) { chain_p chain; unsigned i; @@ -2263,7 +2294,7 @@ execute_pred_commoning (class loop *loop, vec<chain_p> chains, if (chain->type == CT_INVARIANT) execute_load_motion (loop, chain, tmp_vars); else - execute_pred_commoning_chain (loop, chain, tmp_vars); + execute_pred_commoning_chain (loop, chain, tmp_vars, looparound_phis); } FOR_EACH_VEC_ELT (chains, i, chain) @@ -2277,7 +2308,7 @@ execute_pred_commoning (class loop *loop, vec<chain_p> chains, dref a; unsigned j; for (j = 1; chain->refs.iterate (j, &a); j++) - remove_stmt (a->stmt); + remove_stmt (a->stmt, looparound_phis); } } } @@ -2330,6 +2361,7 @@ struct epcc_data { vec<chain_p> chains; bitmap tmp_vars; + bitmap looparound_phis; }; static void @@ -2341,7 +2373,8 @@ execute_pred_commoning_cbck (class loop *loop, void *data) tree_transform_and_unroll_loop (see detailed description in tree_predictive_commoning_loop). */ replace_names_by_phis (dta->chains); - execute_pred_commoning (loop, dta->chains, dta->tmp_vars); + execute_pred_commoning (loop, dta->chains, dta->tmp_vars, + dta->looparound_phis); } /* Base NAME and all the names in the chain of phi nodes that use it @@ -2434,7 +2467,7 @@ chain_can_be_combined_p (chain_p chain) statement. */ static gimple * -find_use_stmt (tree *name) +find_use_stmt (tree *name, bitmap looparound_phis) { gimple *stmt; tree rhs, lhs; @@ -2442,7 +2475,7 @@ find_use_stmt (tree *name) /* Skip over assignments. */ while (1) { - stmt = single_nonlooparound_use (*name); + stmt = single_nonlooparound_use (*name, looparound_phis); if (!stmt) return NULL; @@ -2487,7 +2520,7 @@ may_reassociate_p (tree type, enum tree_code code) is stored in DISTANCE. */ static gimple * -find_associative_operation_root (gimple *stmt, unsigned *distance) +find_associative_operation_root (gimple *stmt, unsigned *distance, bitmap looparound_phis) { tree lhs; gimple *next; @@ -2503,7 +2536,7 @@ find_associative_operation_root (gimple *stmt, unsigned *distance) lhs = gimple_assign_lhs (stmt); gcc_assert (TREE_CODE (lhs) == SSA_NAME); - next = find_use_stmt (&lhs); + next = find_use_stmt (&lhs, looparound_phis); if (!next || gimple_assign_rhs_code (next) != code) break; @@ -2524,25 +2557,25 @@ find_associative_operation_root (gimple *stmt, unsigned *distance) NAME2. */ static gimple * -find_common_use_stmt (tree *name1, tree *name2) +find_common_use_stmt (tree *name1, tree *name2, bitmap looparound_phis) { gimple *stmt1, *stmt2; - stmt1 = find_use_stmt (name1); + stmt1 = find_use_stmt (name1, looparound_phis); if (!stmt1) return NULL; - stmt2 = find_use_stmt (name2); + stmt2 = find_use_stmt (name2, looparound_phis); if (!stmt2) return NULL; if (stmt1 == stmt2) return stmt1; - stmt1 = find_associative_operation_root (stmt1, NULL); + stmt1 = find_associative_operation_root (stmt1, NULL, looparound_phis); if (!stmt1) return NULL; - stmt2 = find_associative_operation_root (stmt2, NULL); + stmt2 = find_associative_operation_root (stmt2, NULL, looparound_phis); if (!stmt2) return NULL; @@ -2555,7 +2588,7 @@ find_common_use_stmt (tree *name1, tree *name2) static bool combinable_refs_p (dref r1, dref r2, - enum tree_code *code, bool *swap, tree *rslt_type) + enum tree_code *code, bool *swap, tree *rslt_type, bitmap looparound_phis) { enum tree_code acode; bool aswap; @@ -2567,7 +2600,7 @@ combinable_refs_p (dref r1, dref r2, name2 = name_for_ref (r2); gcc_assert (name1 != NULL_TREE && name2 != NULL_TREE); - stmt = find_common_use_stmt (&name1, &name2); + stmt = find_common_use_stmt (&name1, &name2, looparound_phis); if (!stmt /* A simple post-dominance check - make sure the combination @@ -2623,7 +2656,7 @@ remove_name_from_operation (gimple *stmt, tree op) are combined in a single statement, and returns this statement. */ static gimple * -reassociate_to_the_same_stmt (tree name1, tree name2) +reassociate_to_the_same_stmt (tree name1, tree name2, bitmap looparound_phis) { gimple *stmt1, *stmt2, *root1, *root2, *s1, *s2; gassign *new_stmt, *tmp_stmt; @@ -2633,10 +2666,10 @@ reassociate_to_the_same_stmt (tree name1, tree name2) tree type = TREE_TYPE (name1); gimple_stmt_iterator bsi; - stmt1 = find_use_stmt (&name1); - stmt2 = find_use_stmt (&name2); - root1 = find_associative_operation_root (stmt1, &dist1); - root2 = find_associative_operation_root (stmt2, &dist2); + stmt1 = find_use_stmt (&name1, looparound_phis); + stmt2 = find_use_stmt (&name2, looparound_phis); + root1 = find_associative_operation_root (stmt1, &dist1, looparound_phis); + root2 = find_associative_operation_root (stmt2, &dist2, looparound_phis); code = gimple_assign_rhs_code (stmt1); gcc_assert (root1 && root2 && root1 == root2 @@ -2651,22 +2684,22 @@ reassociate_to_the_same_stmt (tree name1, tree name2) while (dist1 > dist2) { - s1 = find_use_stmt (&r1); + s1 = find_use_stmt (&r1, looparound_phis); r1 = gimple_assign_lhs (s1); dist1--; } while (dist2 > dist1) { - s2 = find_use_stmt (&r2); + s2 = find_use_stmt (&r2, looparound_phis); r2 = gimple_assign_lhs (s2); dist2--; } while (s1 != s2) { - s1 = find_use_stmt (&r1); + s1 = find_use_stmt (&r1, looparound_phis); r1 = gimple_assign_lhs (s1); - s2 = find_use_stmt (&r2); + s2 = find_use_stmt (&r2, looparound_phis); r2 = gimple_assign_lhs (s2); } @@ -2708,25 +2741,25 @@ reassociate_to_the_same_stmt (tree name1, tree name2) the expression so that they are used in the same statement. */ static gimple * -stmt_combining_refs (dref r1, dref r2) +stmt_combining_refs (dref r1, dref r2, bitmap looparound_phis) { gimple *stmt1, *stmt2; tree name1 = name_for_ref (r1); tree name2 = name_for_ref (r2); - stmt1 = find_use_stmt (&name1); - stmt2 = find_use_stmt (&name2); + stmt1 = find_use_stmt (&name1, looparound_phis); + stmt2 = find_use_stmt (&name2, looparound_phis); if (stmt1 == stmt2) return stmt1; - return reassociate_to_the_same_stmt (name1, name2); + return reassociate_to_the_same_stmt (name1, name2, looparound_phis); } /* Tries to combine chains CH1 and CH2 together. If this succeeds, the description of the new chain is returned, otherwise we return NULL. */ static chain_p -combine_chains (chain_p ch1, chain_p ch2) +combine_chains (chain_p ch1, chain_p ch2, bitmap looparound_phis) { dref r1, r2, nw; enum tree_code op = ERROR_MARK; @@ -2749,7 +2782,7 @@ combine_chains (chain_p ch1, chain_p ch2) if (r1->distance != r2->distance) return NULL; - if (!combinable_refs_p (r1, r2, &op, &swap, &rslt_type)) + if (!combinable_refs_p (r1, r2, &op, &swap, &rslt_type, looparound_phis)) return NULL; } @@ -2768,7 +2801,7 @@ combine_chains (chain_p ch1, chain_p ch2) && ch2->refs.iterate (i, &r2)); i++) { nw = XCNEW (class dref_d); - nw->stmt = stmt_combining_refs (r1, r2); + nw->stmt = stmt_combining_refs (r1, r2, looparound_phis); nw->distance = r1->distance; new_chain->refs.safe_push (nw); @@ -2817,7 +2850,8 @@ pcom_stmt_dominates_stmt_p (gimple *s1, gimple *s2) /* Try to combine the CHAINS in LOOP. */ static void -try_combine_chains (class loop *loop, vec<chain_p> *chains) +try_combine_chains (class loop *loop, vec<chain_p> *chains, + bitmap looparound_phis) { unsigned i, j; chain_p ch1, ch2, cch; @@ -2839,7 +2873,7 @@ try_combine_chains (class loop *loop, vec<chain_p> *chains) if (!chain_can_be_combined_p (ch2)) continue; - cch = combine_chains (ch1, ch2); + cch = combine_chains (ch1, ch2, looparound_phis); if (cch) { worklist.safe_push (cch); @@ -3172,6 +3206,7 @@ insert_init_seqs (class loop *loop, vec<chain_p> chains) } } + /* Performs predictive commoning for LOOP. Sets bit 1<<1 of return value if LOOP was unrolled; Sets bit 1<<2 of return value if loop closed ssa form was corrupted. Non-zero return value indicates some changes were @@ -3180,10 +3215,7 @@ insert_init_seqs (class loop *loop, vec<chain_p> chains) static unsigned tree_predictive_commoning_loop (class loop *loop, bool allow_unroll_p) { - vec<data_reference_p> datarefs; - vec<ddr_p> dependences; struct component *components; - vec<chain_p> chains = vNULL; unsigned unroll_factor = 0; class tree_niter_desc desc; bool unroll = false, loop_closed_ssa = false; @@ -3202,31 +3234,23 @@ tree_predictive_commoning_loop (class loop *loop, bool allow_unroll_p) /* Find the data references and split them into components according to their dependence relations. */ + loop_pcom_info info (loop); auto_vec<loop_p, 3> loop_nest; - dependences.create (10); - datarefs.create (10); - if (! compute_data_dependences_for_loop (loop, true, &loop_nest, &datarefs, - &dependences)) + if (!compute_data_dependences_for_loop (loop, true, &loop_nest, + &info.datarefs, &info.dependences)) { if (dump_file && (dump_flags & TDF_DETAILS)) fprintf (dump_file, "Cannot analyze data dependencies\n"); - free_data_refs (datarefs); - free_dependence_relations (dependences); return 0; } if (dump_file && (dump_flags & TDF_DETAILS)) - dump_data_dependence_relations (dump_file, dependences); + dump_data_dependence_relations (dump_file, info.dependences); - components = split_data_refs_to_components (loop, datarefs, dependences); - loop_nest.release (); - free_dependence_relations (dependences); + components = split_data_refs_to_components (loop, info.datarefs, + info.dependences, &info.cache); if (!components) - { - free_data_refs (datarefs); - free_affine_expand_cache (&name_expansions); return 0; - } if (dump_file && (dump_flags & TDF_DETAILS)) { @@ -3235,47 +3259,40 @@ tree_predictive_commoning_loop (class loop *loop, bool allow_unroll_p) } /* Find the suitable components and split them into chains. */ - components = filter_suitable_components (loop, components); + components = filter_suitable_components (loop, components, &info.cache); auto_bitmap tmp_vars; - looparound_phis = BITMAP_ALLOC (NULL); - determine_roots (loop, components, &chains); + /* Bitmap of ssa names defined by looparound phi nodes covered by chains. */ + auto_bitmap looparound_phis; + determine_roots (loop, components, &info.chains, looparound_phis, &info.cache); release_components (components); - auto cleanup = [&]() { - release_chains (chains); - free_data_refs (datarefs); - BITMAP_FREE (looparound_phis); - free_affine_expand_cache (&name_expansions); - }; - - if (!chains.exists ()) + if (!info.chains.exists ()) { if (dump_file && (dump_flags & TDF_DETAILS)) fprintf (dump_file, "Predictive commoning failed: no suitable chains\n"); - cleanup (); return 0; } - prepare_initializers (loop, chains); - loop_closed_ssa = prepare_finalizers (loop, chains); + prepare_initializers (loop, info.chains); + loop_closed_ssa = prepare_finalizers (loop, info.chains); /* Try to combine the chains that are always worked with together. */ - try_combine_chains (loop, &chains); + try_combine_chains (loop, &info.chains, looparound_phis); - insert_init_seqs (loop, chains); + insert_init_seqs (loop, info.chains); if (dump_file && (dump_flags & TDF_DETAILS)) { fprintf (dump_file, "Before commoning:\n\n"); - dump_chains (dump_file, chains); + dump_chains (dump_file, info.chains); } if (allow_unroll_p) /* Determine the unroll factor, and if the loop should be unrolled, ensure that its number of iterations is divisible by the factor. */ - unroll_factor = determine_unroll_factor (chains); + unroll_factor = determine_unroll_factor (info.chains); if (unroll_factor > 1) unroll = can_unroll_loop_p (loop, unroll_factor, &desc); @@ -3289,8 +3306,9 @@ tree_predictive_commoning_loop (class loop *loop, bool allow_unroll_p) if (dump_file && (dump_flags & TDF_DETAILS)) fprintf (dump_file, "Unrolling %u times.\n", unroll_factor); - dta.chains = chains; + dta.chains = info.chains; dta.tmp_vars = tmp_vars; + dta.looparound_phis = looparound_phis; /* Cfg manipulations performed in tree_transform_and_unroll_loop before execute_pred_commoning_cbck is called may cause phi nodes to be @@ -3298,7 +3316,7 @@ tree_predictive_commoning_loop (class loop *loop, bool allow_unroll_p) statements. To fix this, we store the ssa names defined by the phi nodes here instead of the phi nodes themselves, and restore the phi nodes in execute_pred_commoning_cbck. A bit hacky. */ - replace_phis_by_defined_names (chains); + replace_phis_by_defined_names (info.chains); edge exit = single_dom_exit (loop); tree_transform_and_unroll_loop (loop, unroll_factor, exit, &desc, @@ -3310,11 +3328,9 @@ tree_predictive_commoning_loop (class loop *loop, bool allow_unroll_p) if (dump_file && (dump_flags & TDF_DETAILS)) fprintf (dump_file, "Executing predictive commoning without unrolling.\n"); - execute_pred_commoning (loop, chains, tmp_vars); + execute_pred_commoning (loop, info.chains, tmp_vars, looparound_phis); } - cleanup (); - return (unroll ? 2 : 1) | (loop_closed_ssa ? 4 : 1); }
On Tue, Jun 8, 2021 at 11:31 AM Kewen.Lin <linkw@linux.ibm.com> wrote: > > on 2021/6/7 下午10:46, Richard Biener wrote: > > On Wed, Jun 2, 2021 at 11:29 AM Kewen.Lin <linkw@linux.ibm.com> wrote: > >> > >> Hi, > >> > >> As Richi suggested in PR100794, this patch is to remove > >> some unnecessary update_ssa calls with flag > >> TODO_update_ssa_only_virtuals, also do some refactoring. > >> > >> Bootstrapped/regtested on powerpc64le-linux-gnu P9, > >> x86_64-redhat-linux and aarch64-linux-gnu, built well > >> on Power9 ppc64le with --with-build-config=bootstrap-O3, > >> and passed both P8 and P9 SPEC2017 full build with > >> {-O3, -Ofast} + {,-funroll-loops}. > >> > >> Is it ok for trunk? > > > > LGTM, minor comment on the fancy C++: > > > > + auto cleanup = [&]() { > > + release_chains (chains); > > + free_data_refs (datarefs); > > + BITMAP_FREE (looparound_phis); > > + free_affine_expand_cache (&name_expansions); > > + }; > > > > + cleanup (); > > + return 0; > > > > so that could have been > > > > class cleanup { > > ~cleanup() > > { > > release_chains (chains); > > free_data_refs (datarefs); > > BITMAP_FREE (looparound_phis); > > free_affine_expand_cache (&name_expansions); > > } > > } cleanup; > > > > ? Or some other means of adding registering a RAII-style cleanup? > > I mean, we can't wrap it all in > > > > try {...} > > finally {...} > > > > because C++ doesn't have finally. > > > > OK with this tiny part of the C++ refactoring delayed, but we can also simply > > discuss best options. At least for looparound_phis a good cleanup would > > be to pass the bitmap around and use auto_bitmap local to > > tree_predictive_commoning_loop ... > > > > Thanks Richi! One draft (not ready for review) is attached for the further > discussion. It follows the idea of RAII-style cleanup. I noticed that > Martin suggested stepping forward to make tree_predictive_commoning_loop > and its callees into one class (Thanks Martin), since there are not many > this kind of C++-style work functions, I want to double confirm which option > do you guys prefer? > > One point you might have seen is that to make tree_predictive_commoning_loop > and its callees as member functions of one class can avoid to pass bitmap > looparound_phis all around what's in the draft. :) Such general cleanup is of course desired - Giuliano started some of it within GSoC two years ago in the attempt to thread the compilation process. The cleanup then helps to get rid of global state which of course interferes here (and avoids unnecessary use of TLS vars). So yes, encapsulating global state into a class and making accessors member functions is something that is desired (but a lot of mechanical work). Thanks Richard. > BR, > Kewen >
On Tue, Jun 8, 2021 at 1:02 PM Richard Biener <richard.guenther@gmail.com> wrote: > > On Tue, Jun 8, 2021 at 11:31 AM Kewen.Lin <linkw@linux.ibm.com> wrote: > > > > on 2021/6/7 下午10:46, Richard Biener wrote: > > > On Wed, Jun 2, 2021 at 11:29 AM Kewen.Lin <linkw@linux.ibm.com> wrote: > > >> > > >> Hi, > > >> > > >> As Richi suggested in PR100794, this patch is to remove > > >> some unnecessary update_ssa calls with flag > > >> TODO_update_ssa_only_virtuals, also do some refactoring. > > >> > > >> Bootstrapped/regtested on powerpc64le-linux-gnu P9, > > >> x86_64-redhat-linux and aarch64-linux-gnu, built well > > >> on Power9 ppc64le with --with-build-config=bootstrap-O3, > > >> and passed both P8 and P9 SPEC2017 full build with > > >> {-O3, -Ofast} + {,-funroll-loops}. > > >> > > >> Is it ok for trunk? > > > > > > LGTM, minor comment on the fancy C++: > > > > > > + auto cleanup = [&]() { > > > + release_chains (chains); > > > + free_data_refs (datarefs); > > > + BITMAP_FREE (looparound_phis); > > > + free_affine_expand_cache (&name_expansions); > > > + }; > > > > > > + cleanup (); > > > + return 0; > > > > > > so that could have been > > > > > > class cleanup { > > > ~cleanup() > > > { > > > release_chains (chains); > > > free_data_refs (datarefs); > > > BITMAP_FREE (looparound_phis); > > > free_affine_expand_cache (&name_expansions); > > > } > > > } cleanup; > > > > > > ? Or some other means of adding registering a RAII-style cleanup? > > > I mean, we can't wrap it all in > > > > > > try {...} > > > finally {...} > > > > > > because C++ doesn't have finally. > > > > > > OK with this tiny part of the C++ refactoring delayed, but we can also simply > > > discuss best options. At least for looparound_phis a good cleanup would > > > be to pass the bitmap around and use auto_bitmap local to > > > tree_predictive_commoning_loop ... > > > > > > > Thanks Richi! One draft (not ready for review) is attached for the further > > discussion. It follows the idea of RAII-style cleanup. I noticed that > > Martin suggested stepping forward to make tree_predictive_commoning_loop > > and its callees into one class (Thanks Martin), since there are not many > > this kind of C++-style work functions, I want to double confirm which option > > do you guys prefer? > > > > One point you might have seen is that to make tree_predictive_commoning_loop > > and its callees as member functions of one class can avoid to pass bitmap > > looparound_phis all around what's in the draft. :) > > Such general cleanup is of course desired - Giuliano started some of it within > GSoC two years ago in the attempt to thread the compilation process. The > cleanup then helps to get rid of global state which of course interferes here > (and avoids unnecessary use of TLS vars). > > So yes, encapsulating global state into a class and making accessors > member functions is something that is desired (but a lot of mechanical > work). Btw, the patch you posted is OK with me as well, it achieves the global state removal, too. Richard. > Thanks > Richard. > > > BR, > > Kewen > >
On 6/8/21 3:30 AM, Kewen.Lin wrote: > on 2021/6/7 下午10:46, Richard Biener wrote: >> On Wed, Jun 2, 2021 at 11:29 AM Kewen.Lin <linkw@linux.ibm.com> wrote: >>> >>> Hi, >>> >>> As Richi suggested in PR100794, this patch is to remove >>> some unnecessary update_ssa calls with flag >>> TODO_update_ssa_only_virtuals, also do some refactoring. >>> >>> Bootstrapped/regtested on powerpc64le-linux-gnu P9, >>> x86_64-redhat-linux and aarch64-linux-gnu, built well >>> on Power9 ppc64le with --with-build-config=bootstrap-O3, >>> and passed both P8 and P9 SPEC2017 full build with >>> {-O3, -Ofast} + {,-funroll-loops}. >>> >>> Is it ok for trunk? >> >> LGTM, minor comment on the fancy C++: >> >> + auto cleanup = [&]() { >> + release_chains (chains); >> + free_data_refs (datarefs); >> + BITMAP_FREE (looparound_phis); >> + free_affine_expand_cache (&name_expansions); >> + }; >> >> + cleanup (); >> + return 0; >> >> so that could have been >> >> class cleanup { >> ~cleanup() >> { >> release_chains (chains); >> free_data_refs (datarefs); >> BITMAP_FREE (looparound_phis); >> free_affine_expand_cache (&name_expansions); >> } >> } cleanup; >> >> ? Or some other means of adding registering a RAII-style cleanup? >> I mean, we can't wrap it all in >> >> try {...} >> finally {...} >> >> because C++ doesn't have finally. >> >> OK with this tiny part of the C++ refactoring delayed, but we can also simply >> discuss best options. At least for looparound_phis a good cleanup would >> be to pass the bitmap around and use auto_bitmap local to >> tree_predictive_commoning_loop ... >> > > Thanks Richi! One draft (not ready for review) is attached for the further > discussion. It follows the idea of RAII-style cleanup. I noticed that > Martin suggested stepping forward to make tree_predictive_commoning_loop > and its callees into one class (Thanks Martin), since there are not many > this kind of C++-style work functions, I want to double confirm which option > do you guys prefer? I meant that not necessarily as something to include in this patch but as a suggestion for a future improvement. If you'd like to tackle it at any point that would be great of course :) In any event, thanks for double-checking! The attached patch looks good to me as well (more for the sake of style than anything else, declaring the class copy ctor and copy assignment = delete would make it clear it's not meant to be copied, although in this case it's unlikely to make a practical difference). > > One point you might have seen is that to make tree_predictive_commoning_loop > and its callees as member functions of one class can avoid to pass bitmap > looparound_phis all around what's in the draft. :) Yes, that would simplify the interfaces of all the functions that the info members are passed to as arguments. Martin > > BR, > Kewen >
diff --git a/gcc/tree-predcom.c b/gcc/tree-predcom.c index 5482f50198a..02f911a08bb 100644 --- a/gcc/tree-predcom.c +++ b/gcc/tree-predcom.c @@ -2280,8 +2280,6 @@ execute_pred_commoning (class loop *loop, vec<chain_p> chains, remove_stmt (a->stmt); } } - - update_ssa (TODO_update_ssa_only_virtuals); } /* For each reference in CHAINS, if its defining statement is @@ -3174,9 +3172,10 @@ insert_init_seqs (class loop *loop, vec<chain_p> chains) } } -/* Performs predictive commoning for LOOP. Sets bit 1<<0 of return value - if LOOP was unrolled; Sets bit 1<<1 of return value if loop closed ssa - form was corrupted. */ +/* Performs predictive commoning for LOOP. Sets bit 1<<1 of return value + if LOOP was unrolled; Sets bit 1<<2 of return value if loop closed ssa + form was corrupted. Non-zero return value indicates some changes were + applied to this loop. */ static unsigned tree_predictive_commoning_loop (class loop *loop) @@ -3188,7 +3187,6 @@ tree_predictive_commoning_loop (class loop *loop) unsigned unroll_factor; class tree_niter_desc desc; bool unroll = false, loop_closed_ssa = false; - edge exit; if (dump_file && (dump_flags & TDF_DETAILS)) fprintf (dump_file, "Processing loop %d\n", loop->num); @@ -3244,13 +3242,22 @@ tree_predictive_commoning_loop (class loop *loop) determine_roots (loop, components, &chains); release_components (components); + auto cleanup = [&]() { + release_chains (chains); + free_data_refs (datarefs); + BITMAP_FREE (looparound_phis); + free_affine_expand_cache (&name_expansions); + }; + if (!chains.exists ()) { if (dump_file && (dump_flags & TDF_DETAILS)) fprintf (dump_file, "Predictive commoning failed: no suitable chains\n"); - goto end; + cleanup (); + return 0; } + prepare_initializers (loop, chains); loop_closed_ssa = prepare_finalizers (loop, chains); @@ -3268,10 +3275,8 @@ tree_predictive_commoning_loop (class loop *loop) /* Determine the unroll factor, and if the loop should be unrolled, ensure that its number of iterations is divisible by the factor. */ unroll_factor = determine_unroll_factor (chains); - scev_reset (); unroll = (unroll_factor > 1 && can_unroll_loop_p (loop, unroll_factor, &desc)); - exit = single_dom_exit (loop); /* Execute the predictive commoning transformations, and possibly unroll the loop. */ @@ -3285,8 +3290,6 @@ tree_predictive_commoning_loop (class loop *loop) dta.chains = chains; dta.tmp_vars = tmp_vars; - update_ssa (TODO_update_ssa_only_virtuals); - /* Cfg manipulations performed in tree_transform_and_unroll_loop before execute_pred_commoning_cbck is called may cause phi nodes to be reallocated, which is a problem since CHAINS may point to these @@ -3295,6 +3298,7 @@ tree_predictive_commoning_loop (class loop *loop) the phi nodes in execute_pred_commoning_cbck. A bit hacky. */ replace_phis_by_defined_names (chains); + edge exit = single_dom_exit (loop); tree_transform_and_unroll_loop (loop, unroll_factor, exit, &desc, execute_pred_commoning_cbck, &dta); eliminate_temp_copies (loop, tmp_vars); @@ -3307,14 +3311,9 @@ tree_predictive_commoning_loop (class loop *loop) execute_pred_commoning (loop, chains, tmp_vars); } -end: ; - release_chains (chains); - free_data_refs (datarefs); - BITMAP_FREE (looparound_phis); + cleanup (); - free_affine_expand_cache (&name_expansions); - - return (unroll ? 1 : 0) | (loop_closed_ssa ? 2 : 0); + return (unroll ? 2 : 1) | (loop_closed_ssa ? 4 : 1); } /* Runs predictive commoning. */ @@ -3335,12 +3334,19 @@ tree_predictive_commoning (void) if (changed > 0) { - scev_reset (); + ret = TODO_update_ssa_only_virtuals; + /* Some loop(s) got unrolled. */ if (changed > 1) - rewrite_into_loop_closed_ssa (NULL, TODO_update_ssa); + { + scev_reset (); - ret = TODO_cleanup_cfg; + /* Need to fix up loop closed SSA. */ + if (changed >= 4) + rewrite_into_loop_closed_ssa (NULL, TODO_update_ssa); + + ret |= TODO_cleanup_cfg; + } } return ret; @@ -3369,7 +3375,7 @@ const pass_data pass_data_predcom = 0, /* properties_provided */ 0, /* properties_destroyed */ 0, /* todo_flags_start */ - TODO_update_ssa_only_virtuals, /* todo_flags_finish */ + 0, /* todo_flags_finish */ }; class pass_predcom : public gimple_opt_pass