Message ID | CAO2gOZWX+A3R0MGvWXYztBokQ=ezHNiS3vR0Cu9cbGPi_102ww@mail.gmail.com |
---|---|
State | New |
Headers | show |
On Sun, Sep 9, 2012 at 12:26 AM, Dehao Chen <dehao@google.com> wrote: > Hi, Diego, > > Thanks a lot for the review. I've updated the patch. > > This patch is large and may easily break builds because it reserves > more complete information for TREE_BLOCK as well as gimple_block (may > trigger bugs that was hided when these info are unavailable). I've > done more rigorous testing to ensure that most bugs are caught before > checking in. > > * Sync to the head and retest all gcc testsuite. > * Port the patch to google-4_7 branch to retest all gcc testsuite, as > well as build many large applications. > > Through these tests, I've found two additional bugs that was omitted > in the original implementation. A new patch is attached (patch.txt) to > fix these problems. After this fix, all gcc testsuites pass for both > trunk and google-4_7 branch. I've also copy pasted the new fixes > (lto.c and tree-cfg.c) below. Now I'd say this patch is in good shape. > But it may not be perfect. I'll look into build failures as soon as it > arises. > > Richard and Diego, could you help me take a look at the following two fixes? > > Thanks, > Dehao > > New fixes: > --- gcc/lto/lto.c (revision 191083) > +++ gcc/lto/lto.c (working copy) > @@ -1559,8 +1559,6 @@ lto_fixup_prevailing_decls (tree t) > { > enum tree_code code = TREE_CODE (t); > LTO_NO_PREVAIL (TREE_TYPE (t)); > - if (CODE_CONTAINS_STRUCT (code, TS_COMMON)) > - LTO_NO_PREVAIL (TREE_CHAIN (t)); That change is odd. Can you show us how it breaks? > if (DECL_P (t)) > { > LTO_NO_PREVAIL (DECL_NAME (t)); > > Index: gcc/tree-cfg.c > =================================================================== > --- gcc/tree-cfg.c (revision 191083) > +++ gcc/tree-cfg.c (working copy) > @@ -5980,9 +5974,21 @@ move_stmt_op (tree *tp, int *walk_subtrees, void * > tree t = *tp; > > if (EXPR_P (t)) > - /* We should never have TREE_BLOCK set on non-statements. */ > - gcc_assert (!TREE_BLOCK (t)); > - > + { > + tree block = TREE_BLOCK (t); > + if (p->orig_block == NULL_TREE > + || block == p->orig_block > + || block == NULL_TREE) > + TREE_SET_BLOCK (t, p->new_block); > +#ifdef ENABLE_CHECKING > + else if (block != p->new_block) > + { > + while (block && block != p->orig_block) > + block = BLOCK_SUPERCONTEXT (block); > + gcc_assert (block); > + } > +#endif I think what this means is that TREE_BLOCK on non-stmts are meaningless (thus only gimple_block is interesting on GIMPLE, not BLOCKs on trees). So instead of setting a BLOCK in some cases you should clear BLOCK if it happens to be set, or alternatively, only re-set it if there was a block associated with it. Richard. > + } > else if (DECL_P (t) || TREE_CODE (t) == SSA_NAME) > { > if (TREE_CODE (t) == SSA_NAME) > > Whole patch: > gcc/ChangeLog: > 2012-09-08 Dehao Chen <dehao@google.com> > > * toplev.c (general_init): Init block_locations. > * tree.c (tree_set_block): New. > (tree_block): Change to use LOCATION_BLOCK. > * tree.h (TREE_SET_BLOCK): New. > * final.c (reemit_insn_block_notes): Change to use LOCATION_BLOCK. > (final_start_function): Likewise. > * input.c (expand_location_1): Likewise. > * input.h (LOCATION_LOCUS): New. > (LOCATION_BLOCK): New. > (IS_UNKNOWN_LOCATION): New. > * fold-const.c (expr_location_or): Change to use new location. > * reorg.c (emit_delay_sequence): Likewise. > (try_merge_delay_insns): Likewise. > * modulo-sched.c (dump_insn_location): Likewise. > * lto-streamer-out.c (lto_output_location_bitpack): Likewise. > * jump.c (rtx_renumbered_equal_p): Likewise. > * ifcvt.c (noce_try_move): Likewise. > (noce_try_store_flag): Likewise. > (noce_try_store_flag_constants): Likewise. > (noce_try_addcc): Likewise. > (noce_try_store_flag_mask): Likewise. > (noce_try_cmove): Likewise. > (noce_try_cmove_arith): Likewise. > (noce_try_minmax): Likewise. > (noce_try_abs): Likewise. > (noce_try_sign_mask): Likewise. > (noce_try_bitop): Likewise. > (noce_process_if_block): Likewise. > (cond_move_process_if_block): Likewise. > (find_cond_trap): Likewise. > * dwarf2out.c (add_src_coords_attributes): Likewise. > * expr.c (expand_expr_real): Likewise. > * tree-parloops.c (create_loop_fn): Likewise. > * recog.c (peep2_attempt): Likewise. > * function.c (free_after_compilation): Likewise. > (expand_function_end): Likewise. > (set_insn_locations): Likewise. > (thread_prologue_and_epilogue_insns): Likewise. > * print-rtl.c (print_rtx): Likewise. > * profile.c (branch_prob): Likewise. > * trans-mem.c (ipa_tm_scan_irr_block): Likewise. > * gimplify.c (gimplify_call_expr): Likewise. > * except.c (duplicate_eh_regions_1): Likewise. > * emit-rtl.c (try_split): Likewise. > (make_insn_raw): Likewise. > (make_debug_insn_raw): Likewise. > (make_jump_insn_raw): Likewise. > (make_call_insn_raw): Likewise. > (emit_pattern_after_setloc): Likewise. > (emit_pattern_after): Likewise. > (emit_debug_insn_after): Likewise. > (emit_pattern_before): Likewise. > (emit_insn_before_setloc): Likewise. > (emit_jump_insn_before): Likewise. > (emit_call_insn_before_setloc): Likewise. > (emit_call_insn_before): Likeise. > (emit_debug_insn_before_setloc): Likewise. > (emit_copy_of_insn_after): Likewise. > (insn_locators_alloc): Remove. > (insn_locators_finalize): Remove. > (insn_locators_free): Remove. > (set_curr_insn_source_location): Remove. > (get_curr_insn_source_location): Remove. > (set_curr_insn_block): Remove. > (get_curr_insn_block): Remove. > (locator_scope): Remove. > (insn_scope): Change to use new location. > (locator_location): Remove. > (insn_line): Change to use new location. > (locator_file): Remove. > (insn_file): Change to use new location. > (locator_eq): Remove. > (insn_locations_init): New. > (insn_locations_finalize): New. > (set_curr_insn_location): New. > (curr_insn_location): New. > * cfgexpand.c (gimple_assign_rhs_to_tree): Change to use new location. > (expand_gimple_cond): Likewise. > (expand_call_stmt): Likewise. > (expand_gimple_stmt_1): Likewise. > (expand_gimple_basic_block): Likewise. > (construct_exit_block): Likewise. > (gimple_expand_cfg): Likewise. > * cfgcleanup.c (try_forward_edges): Likewise. > * tree-ssa-live.c (remove_unused_scope_block_p): Likewise. > (dump_scope_block): Likewise. > (remove_unused_locals): Likewise. > * rtl.c (rtx_equal_p_cb): Likewise. > (rtx_equal_p): Likewise. > * rtl.h (XUINT): New. > (INSN_LOCATOR): Remove. > (CURR_INSN_LOCATION): Remove. > (INSN_LOCATION): New. > (INSN_HAS_LOCATION): New. > * tree-inline.c (remap_gimple_op_r): Change to use new location. > (copy_tree_body_r): Likewise. > (copy_phis_for_bb): Likewise. > (expand_call_inline): Likewise. > * tree-streamer-in.c (lto_input_ts_exp_tree_pointers): Likewise. > * tree-streamer-out.c (write_ts_decl_minimal_tree_pointers): Likewise. > * gimple-streamer-out.c (output_gimple_stmt): Likewise. > * combine.c (try_combine): Likewise. > * tree-outof-ssa.c (set_location_for_edge): Likewise. > (insert_partition_copy_on_edge): Likewise. > (insert_value_copy_on_edge): Likewise. > (insert_rtx_to_part_on_edge): Likewise. > (insert_part_to_rtx_on_edge): Likewise. > * basic-block.h (edge_def): Remove field. > * gimple.h (gimple_statement_base): Remove field. > (gimple_bb): Change to use new location. > (gimple_set_block): Likewise. > (gimple_has_location): Likewise. > * tree-cfg.c (make_cond_expr_edges): Likewise. > (make_goto_expr_edges): Likewise. > (gimple_can_merge_blocks_p): Likewise. > (move_stmt_op): Likewise. > (move_block_to_fn): Likewise. > * config/alpha/alpha.c (alpha_output_mi_thunk_osf): Likewise. > * config/sparc/sparc.c (sparc_output_mi_thunk): Likewise. > * config/i386/i386.c (x86_output_mi_thunk): Likewise. > * config/tilegx/tilegx.c (tilegx_output_mi_thunk): Likewise. > * config/sh/sh.c (sh_output_mi_thunk): Likewise. > * config/ia64/ia64.c (ia64_output_mi_thunk): Likewise. > * config/rs6000/rs6000.c (rs6000_output_mi_thunk): Likewise. > * config/score/score.c (score_output_mi_thunk): Likewise. > * config/tilepro/tilepro.c (tilepro_asm_output_mi_thunk): Likewise. > * config/mips/mips.c (mips_output_mi_thunk): Likewise. > * cfgrtl.c (unique_locus_on_edge_between_p): Likewise. > (unique_locus_on_edge_between_p): Likewise. > (emit_nop_for_unique_locus_between): Likewise. > (force_nonfallthru_and_redirect): Likewise. > (fixup_reorder_chain): Likewise. > (cfg_layout_merge_blocks): Likewise. > * stmt.c (emit_case_nodes): Likewise. > > gcc/lto/ChangeLog: > 2012-09-08 Dehao Chen <dehao@google.com> > > * lto/lto.c (lto_fixup_prevailing_decls): Remove tree.exp.block field. > > libcpp/ChangeLog: > 2012-09-08 Dehao Chen <dehao@google.com> > > * include/line-map.h (MAX_SOURCE_LOCATION): New value. > (location_adhoc_data_init): New. > (location_adhoc_data_fini): New. > (get_combined_adhoc_loc): New. > (get_data_from_adhoc_loc): New. > (get_location_from_adhoc_loc): New. > (COMBINE_LOCATION_DATA): New. > (IS_ADHOC_LOC): New. > (expanded_location): New field. > * line-map.c (location_adhoc_data): New. > (location_adhoc_data_htab): New. > (curr_adhoc_loc): New. > (location_adhoc_data): New. > (allocated_location_adhoc_data): New. > (location_adhoc_data_hash): New. > (location_adhoc_data_eq): New. > (location_adhoc_data_update): New. > (get_combined_adhoc_loc): New. > (get_data_from_adhoc_loc): New. > (get_location_from_adhoc_loc): New. > (location_adhoc_data_init): New. > (location_adhoc_data_fini): New. > (linemap_lookup): Change to use new location. > (linemap_ordinary_map_lookup): Likewise. > (linemap_macro_map_lookup): Likewise. > (linemap_macro_map_loc_to_def_point): Likewise. > (linemap_macro_map_loc_unwind_toward_spel): Likewise. > (linemap_get_expansion_line): Likewise. > (linemap_get_expansion_filename): Likewise. > (linemap_location_in_system_header_p): Likewise. > (linemap_location_from_macro_expansion_p): Likewise. > (linemap_macro_loc_to_spelling_point): Likewise. > (linemap_macro_loc_to_def_point): Likewise. > (linemap_macro_loc_to_exp_point): Likewise. > (linemap_resolve_location): Likewise. > (linemap_unwind_toward_expansion): Likewise. > (linemap_unwind_to_first_non_reserved_loc): Likewise. > (linemap_expand_location): Likewise. > (linemap_dump_location): Likewise.
On Mon, Sep 10, 2012 at 3:01 AM, Richard Guenther <richard.guenther@gmail.com> wrote: > On Sun, Sep 9, 2012 at 12:26 AM, Dehao Chen <dehao@google.com> wrote: >> Hi, Diego, >> >> Thanks a lot for the review. I've updated the patch. >> >> This patch is large and may easily break builds because it reserves >> more complete information for TREE_BLOCK as well as gimple_block (may >> trigger bugs that was hided when these info are unavailable). I've >> done more rigorous testing to ensure that most bugs are caught before >> checking in. >> >> * Sync to the head and retest all gcc testsuite. >> * Port the patch to google-4_7 branch to retest all gcc testsuite, as >> well as build many large applications. >> >> Through these tests, I've found two additional bugs that was omitted >> in the original implementation. A new patch is attached (patch.txt) to >> fix these problems. After this fix, all gcc testsuites pass for both >> trunk and google-4_7 branch. I've also copy pasted the new fixes >> (lto.c and tree-cfg.c) below. Now I'd say this patch is in good shape. >> But it may not be perfect. I'll look into build failures as soon as it >> arises. >> >> Richard and Diego, could you help me take a look at the following two fixes? >> >> Thanks, >> Dehao >> >> New fixes: >> --- gcc/lto/lto.c (revision 191083) >> +++ gcc/lto/lto.c (working copy) >> @@ -1559,8 +1559,6 @@ lto_fixup_prevailing_decls (tree t) >> { >> enum tree_code code = TREE_CODE (t); >> LTO_NO_PREVAIL (TREE_TYPE (t)); >> - if (CODE_CONTAINS_STRUCT (code, TS_COMMON)) >> - LTO_NO_PREVAIL (TREE_CHAIN (t)); > > That change is odd. Can you show us how it breaks? This will break LTO build of gcc.c-torture/execute/pr38051.c There is data structure like: union { long int l; char c[sizeof (long int)]; } u; Once the block info is reserved for this, it'll reserve this data structure. And inside this data structure, there is VAR_DECL. Thus LTO_NO_PREVAIL assertion does not satisfy here for TREE_CHAIN (t). > >> if (DECL_P (t)) >> { >> LTO_NO_PREVAIL (DECL_NAME (t)); >> >> Index: gcc/tree-cfg.c >> =================================================================== >> --- gcc/tree-cfg.c (revision 191083) >> +++ gcc/tree-cfg.c (working copy) >> @@ -5980,9 +5974,21 @@ move_stmt_op (tree *tp, int *walk_subtrees, void * >> tree t = *tp; >> >> if (EXPR_P (t)) >> - /* We should never have TREE_BLOCK set on non-statements. */ >> - gcc_assert (!TREE_BLOCK (t)); >> - >> + { >> + tree block = TREE_BLOCK (t); >> + if (p->orig_block == NULL_TREE >> + || block == p->orig_block >> + || block == NULL_TREE) >> + TREE_SET_BLOCK (t, p->new_block); >> +#ifdef ENABLE_CHECKING >> + else if (block != p->new_block) >> + { >> + while (block && block != p->orig_block) >> + block = BLOCK_SUPERCONTEXT (block); >> + gcc_assert (block); >> + } >> +#endif > > I think what this means is that TREE_BLOCK on non-stmts are meaningless > (thus only gimple_block is interesting on GIMPLE, not BLOCKs on trees). > > So instead of setting a BLOCK in some cases you should clear BLOCK > if it happens to be set, or alternatively, only re-set it if there was > a block associated > with it. Yeah, makes sense. New change: @@ -5980,9 +5974,10 @@ tree t = *tp; if (EXPR_P (t)) - /* We should never have TREE_BLOCK set on non-statements. */ - gcc_assert (!TREE_BLOCK (t)); - + { + if (TREE_BLOCK (t)) + TREE_SET_BLOCK (t, p->new_block); + } else if (DECL_P (t) || TREE_CODE (t) == SSA_NAME) { if (TREE_CODE (t) == SSA_NAME) Thanks, Dehao > > Richard. > >> + } >> else if (DECL_P (t) || TREE_CODE (t) == SSA_NAME) >> { >> if (TREE_CODE (t) == SSA_NAME) >> >> Whole patch: >> gcc/ChangeLog: >> 2012-09-08 Dehao Chen <dehao@google.com> >> >> * toplev.c (general_init): Init block_locations. >> * tree.c (tree_set_block): New. >> (tree_block): Change to use LOCATION_BLOCK. >> * tree.h (TREE_SET_BLOCK): New. >> * final.c (reemit_insn_block_notes): Change to use LOCATION_BLOCK. >> (final_start_function): Likewise. >> * input.c (expand_location_1): Likewise. >> * input.h (LOCATION_LOCUS): New. >> (LOCATION_BLOCK): New. >> (IS_UNKNOWN_LOCATION): New. >> * fold-const.c (expr_location_or): Change to use new location. >> * reorg.c (emit_delay_sequence): Likewise. >> (try_merge_delay_insns): Likewise. >> * modulo-sched.c (dump_insn_location): Likewise. >> * lto-streamer-out.c (lto_output_location_bitpack): Likewise. >> * jump.c (rtx_renumbered_equal_p): Likewise. >> * ifcvt.c (noce_try_move): Likewise. >> (noce_try_store_flag): Likewise. >> (noce_try_store_flag_constants): Likewise. >> (noce_try_addcc): Likewise. >> (noce_try_store_flag_mask): Likewise. >> (noce_try_cmove): Likewise. >> (noce_try_cmove_arith): Likewise. >> (noce_try_minmax): Likewise. >> (noce_try_abs): Likewise. >> (noce_try_sign_mask): Likewise. >> (noce_try_bitop): Likewise. >> (noce_process_if_block): Likewise. >> (cond_move_process_if_block): Likewise. >> (find_cond_trap): Likewise. >> * dwarf2out.c (add_src_coords_attributes): Likewise. >> * expr.c (expand_expr_real): Likewise. >> * tree-parloops.c (create_loop_fn): Likewise. >> * recog.c (peep2_attempt): Likewise. >> * function.c (free_after_compilation): Likewise. >> (expand_function_end): Likewise. >> (set_insn_locations): Likewise. >> (thread_prologue_and_epilogue_insns): Likewise. >> * print-rtl.c (print_rtx): Likewise. >> * profile.c (branch_prob): Likewise. >> * trans-mem.c (ipa_tm_scan_irr_block): Likewise. >> * gimplify.c (gimplify_call_expr): Likewise. >> * except.c (duplicate_eh_regions_1): Likewise. >> * emit-rtl.c (try_split): Likewise. >> (make_insn_raw): Likewise. >> (make_debug_insn_raw): Likewise. >> (make_jump_insn_raw): Likewise. >> (make_call_insn_raw): Likewise. >> (emit_pattern_after_setloc): Likewise. >> (emit_pattern_after): Likewise. >> (emit_debug_insn_after): Likewise. >> (emit_pattern_before): Likewise. >> (emit_insn_before_setloc): Likewise. >> (emit_jump_insn_before): Likewise. >> (emit_call_insn_before_setloc): Likewise. >> (emit_call_insn_before): Likeise. >> (emit_debug_insn_before_setloc): Likewise. >> (emit_copy_of_insn_after): Likewise. >> (insn_locators_alloc): Remove. >> (insn_locators_finalize): Remove. >> (insn_locators_free): Remove. >> (set_curr_insn_source_location): Remove. >> (get_curr_insn_source_location): Remove. >> (set_curr_insn_block): Remove. >> (get_curr_insn_block): Remove. >> (locator_scope): Remove. >> (insn_scope): Change to use new location. >> (locator_location): Remove. >> (insn_line): Change to use new location. >> (locator_file): Remove. >> (insn_file): Change to use new location. >> (locator_eq): Remove. >> (insn_locations_init): New. >> (insn_locations_finalize): New. >> (set_curr_insn_location): New. >> (curr_insn_location): New. >> * cfgexpand.c (gimple_assign_rhs_to_tree): Change to use new location. >> (expand_gimple_cond): Likewise. >> (expand_call_stmt): Likewise. >> (expand_gimple_stmt_1): Likewise. >> (expand_gimple_basic_block): Likewise. >> (construct_exit_block): Likewise. >> (gimple_expand_cfg): Likewise. >> * cfgcleanup.c (try_forward_edges): Likewise. >> * tree-ssa-live.c (remove_unused_scope_block_p): Likewise. >> (dump_scope_block): Likewise. >> (remove_unused_locals): Likewise. >> * rtl.c (rtx_equal_p_cb): Likewise. >> (rtx_equal_p): Likewise. >> * rtl.h (XUINT): New. >> (INSN_LOCATOR): Remove. >> (CURR_INSN_LOCATION): Remove. >> (INSN_LOCATION): New. >> (INSN_HAS_LOCATION): New. >> * tree-inline.c (remap_gimple_op_r): Change to use new location. >> (copy_tree_body_r): Likewise. >> (copy_phis_for_bb): Likewise. >> (expand_call_inline): Likewise. >> * tree-streamer-in.c (lto_input_ts_exp_tree_pointers): Likewise. >> * tree-streamer-out.c (write_ts_decl_minimal_tree_pointers): Likewise. >> * gimple-streamer-out.c (output_gimple_stmt): Likewise. >> * combine.c (try_combine): Likewise. >> * tree-outof-ssa.c (set_location_for_edge): Likewise. >> (insert_partition_copy_on_edge): Likewise. >> (insert_value_copy_on_edge): Likewise. >> (insert_rtx_to_part_on_edge): Likewise. >> (insert_part_to_rtx_on_edge): Likewise. >> * basic-block.h (edge_def): Remove field. >> * gimple.h (gimple_statement_base): Remove field. >> (gimple_bb): Change to use new location. >> (gimple_set_block): Likewise. >> (gimple_has_location): Likewise. >> * tree-cfg.c (make_cond_expr_edges): Likewise. >> (make_goto_expr_edges): Likewise. >> (gimple_can_merge_blocks_p): Likewise. >> (move_stmt_op): Likewise. >> (move_block_to_fn): Likewise. >> * config/alpha/alpha.c (alpha_output_mi_thunk_osf): Likewise. >> * config/sparc/sparc.c (sparc_output_mi_thunk): Likewise. >> * config/i386/i386.c (x86_output_mi_thunk): Likewise. >> * config/tilegx/tilegx.c (tilegx_output_mi_thunk): Likewise. >> * config/sh/sh.c (sh_output_mi_thunk): Likewise. >> * config/ia64/ia64.c (ia64_output_mi_thunk): Likewise. >> * config/rs6000/rs6000.c (rs6000_output_mi_thunk): Likewise. >> * config/score/score.c (score_output_mi_thunk): Likewise. >> * config/tilepro/tilepro.c (tilepro_asm_output_mi_thunk): Likewise. >> * config/mips/mips.c (mips_output_mi_thunk): Likewise. >> * cfgrtl.c (unique_locus_on_edge_between_p): Likewise. >> (unique_locus_on_edge_between_p): Likewise. >> (emit_nop_for_unique_locus_between): Likewise. >> (force_nonfallthru_and_redirect): Likewise. >> (fixup_reorder_chain): Likewise. >> (cfg_layout_merge_blocks): Likewise. >> * stmt.c (emit_case_nodes): Likewise. >> >> gcc/lto/ChangeLog: >> 2012-09-08 Dehao Chen <dehao@google.com> >> >> * lto/lto.c (lto_fixup_prevailing_decls): Remove tree.exp.block field. >> >> libcpp/ChangeLog: >> 2012-09-08 Dehao Chen <dehao@google.com> >> >> * include/line-map.h (MAX_SOURCE_LOCATION): New value. >> (location_adhoc_data_init): New. >> (location_adhoc_data_fini): New. >> (get_combined_adhoc_loc): New. >> (get_data_from_adhoc_loc): New. >> (get_location_from_adhoc_loc): New. >> (COMBINE_LOCATION_DATA): New. >> (IS_ADHOC_LOC): New. >> (expanded_location): New field. >> * line-map.c (location_adhoc_data): New. >> (location_adhoc_data_htab): New. >> (curr_adhoc_loc): New. >> (location_adhoc_data): New. >> (allocated_location_adhoc_data): New. >> (location_adhoc_data_hash): New. >> (location_adhoc_data_eq): New. >> (location_adhoc_data_update): New. >> (get_combined_adhoc_loc): New. >> (get_data_from_adhoc_loc): New. >> (get_location_from_adhoc_loc): New. >> (location_adhoc_data_init): New. >> (location_adhoc_data_fini): New. >> (linemap_lookup): Change to use new location. >> (linemap_ordinary_map_lookup): Likewise. >> (linemap_macro_map_lookup): Likewise. >> (linemap_macro_map_loc_to_def_point): Likewise. >> (linemap_macro_map_loc_unwind_toward_spel): Likewise. >> (linemap_get_expansion_line): Likewise. >> (linemap_get_expansion_filename): Likewise. >> (linemap_location_in_system_header_p): Likewise. >> (linemap_location_from_macro_expansion_p): Likewise. >> (linemap_macro_loc_to_spelling_point): Likewise. >> (linemap_macro_loc_to_def_point): Likewise. >> (linemap_macro_loc_to_exp_point): Likewise. >> (linemap_resolve_location): Likewise. >> (linemap_unwind_toward_expansion): Likewise. >> (linemap_unwind_to_first_non_reserved_loc): Likewise. >> (linemap_expand_location): Likewise. >> (linemap_dump_location): Likewise.
Hi, I was curious how the patch behaves memory wise on compilling Mozilla. It actually crashes on: (gdb) bt #0 0x00007fab8cd70945 in raise () from /lib64/libc.so.6 #1 0x00007fab8cd71f21 in abort () from /lib64/libc.so.6 #2 0x0000000000b52330 in linemap_location_from_macro_expansion_p (set=0x7805, location=30725) at ../../libcpp/line-map.c:952 #3 0x0000000000b526fc in linemap_lookup (set=0x7fab8dc34000, line=0) at ../../libcpp/line-map.c:644 #4 0x0000000000776745 in maybe_unwind_expanded_macro_loc (where=0, diagnostic=<optimized out>, context=<optimized out>) at ../../gcc/tree-diagnostic.c:113 #5 virt_loc_aware_diagnostic_finalizer (context=0x11b8a80, diagnostic=0x7fff4d8adf90) at ../../gcc/tree-diagnostic.c:282 #6 0x0000000000b4aa80 in diagnostic_report_diagnostic (context=0x11b8a80, diagnostic=0x7fff4d8adf90) at ../../gcc/diagnostic.c:652 #7 0x0000000000b4acd6 in internal_error (gmsgid=<optimized out>) at ../../gcc/diagnostic.c:957 #8 0x00000000007555c0 in crash_signal (signo=11) at ../../gcc/toplev.c:335 #9 <signal handler called> #10 0x0000000000b526e8 in linemap_lookup (set=0x7fab8dc34000, line=4294967295) at ../../libcpp/line-map.c:643 #11 0x0000000000b530fa in linemap_location_in_system_header_p (set=0x7fab8dc34000, location=4294967295) at ../../libcpp/line-map.c:916 #12 0x0000000000b4a8b2 in diagnostic_report_diagnostic (context=0x11b8a80, diagnostic=0x7fff4d8ae620) at ../../gcc/diagnostic.c:513 #13 0x0000000000b4b462 in warning_at (location=<optimized out>, opt=0, gmsgid=<optimized out>) at ../../gcc/diagnostic.c:805 #14 0x0000000000699679 in lto_symtab_merge_decls_2 (diagnosed_p=<optimized out>, slot=<optimized out>) at ../../gcc/lto-symtab.c:574 #15 lto_symtab_merge_decls_1 (slot=<optimized out>, data=<optimized out>) at ../../gcc/lto-symtab.c:691 #16 0x0000000000bd32e8 in htab_traverse_noresize (htab=<optimized out>, callback=0x698ed0 <lto_symtab_merge_decls_1(void**, void*)>, info=0x0) at ../../libiberty/hashtab.c:784 #17 0x00000000004e2630 in read_cgraph_and_symbols (nfiles=2849, fnames=<optimized out>) at ../../gcc/lto/lto.c:1824 #18 0x00000000004e2b75 in lto_main () at ../../gcc/lto/lto.c:2107 It seems that warning_at is not really able to lookup the position. Honza
Thanks for helping test this. I'll try to build mozzila to check the memory consumption as well as find new bugs. Dehao On Tue, Sep 11, 2012 at 12:41 AM, Jan Hubicka <hubicka@ucw.cz> wrote: > Hi, > I was curious how the patch behaves memory wise on compilling Mozilla. It actually crashes on: > (gdb) bt > #0 0x00007fab8cd70945 in raise () from /lib64/libc.so.6 > #1 0x00007fab8cd71f21 in abort () from /lib64/libc.so.6 > #2 0x0000000000b52330 in linemap_location_from_macro_expansion_p (set=0x7805, location=30725) at ../../libcpp/line-map.c:952 > #3 0x0000000000b526fc in linemap_lookup (set=0x7fab8dc34000, line=0) at ../../libcpp/line-map.c:644 > #4 0x0000000000776745 in maybe_unwind_expanded_macro_loc (where=0, diagnostic=<optimized out>, context=<optimized out>) at ../../gcc/tree-diagnostic.c:113 > #5 virt_loc_aware_diagnostic_finalizer (context=0x11b8a80, diagnostic=0x7fff4d8adf90) at ../../gcc/tree-diagnostic.c:282 > #6 0x0000000000b4aa80 in diagnostic_report_diagnostic (context=0x11b8a80, diagnostic=0x7fff4d8adf90) at ../../gcc/diagnostic.c:652 > #7 0x0000000000b4acd6 in internal_error (gmsgid=<optimized out>) at ../../gcc/diagnostic.c:957 > #8 0x00000000007555c0 in crash_signal (signo=11) at ../../gcc/toplev.c:335 > #9 <signal handler called> > #10 0x0000000000b526e8 in linemap_lookup (set=0x7fab8dc34000, line=4294967295) at ../../libcpp/line-map.c:643 > #11 0x0000000000b530fa in linemap_location_in_system_header_p (set=0x7fab8dc34000, location=4294967295) at ../../libcpp/line-map.c:916 > #12 0x0000000000b4a8b2 in diagnostic_report_diagnostic (context=0x11b8a80, diagnostic=0x7fff4d8ae620) at ../../gcc/diagnostic.c:513 > #13 0x0000000000b4b462 in warning_at (location=<optimized out>, opt=0, gmsgid=<optimized out>) at ../../gcc/diagnostic.c:805 > #14 0x0000000000699679 in lto_symtab_merge_decls_2 (diagnosed_p=<optimized out>, slot=<optimized out>) at ../../gcc/lto-symtab.c:574 > #15 lto_symtab_merge_decls_1 (slot=<optimized out>, data=<optimized out>) at ../../gcc/lto-symtab.c:691 > #16 0x0000000000bd32e8 in htab_traverse_noresize (htab=<optimized out>, callback=0x698ed0 <lto_symtab_merge_decls_1(void**, void*)>, info=0x0) at ../../libiberty/hashtab.c:784 > #17 0x00000000004e2630 in read_cgraph_and_symbols (nfiles=2849, fnames=<optimized out>) at ../../gcc/lto/lto.c:1824 > #18 0x00000000004e2b75 in lto_main () at ../../gcc/lto/lto.c:2107 > > It seems that warning_at is not really able to lookup the position. > > Honza
On Mon, Sep 10, 2012 at 5:27 PM, Dehao Chen <dehao@google.com> wrote: > On Mon, Sep 10, 2012 at 3:01 AM, Richard Guenther > <richard.guenther@gmail.com> wrote: >> On Sun, Sep 9, 2012 at 12:26 AM, Dehao Chen <dehao@google.com> wrote: >>> Hi, Diego, >>> >>> Thanks a lot for the review. I've updated the patch. >>> >>> This patch is large and may easily break builds because it reserves >>> more complete information for TREE_BLOCK as well as gimple_block (may >>> trigger bugs that was hided when these info are unavailable). I've >>> done more rigorous testing to ensure that most bugs are caught before >>> checking in. >>> >>> * Sync to the head and retest all gcc testsuite. >>> * Port the patch to google-4_7 branch to retest all gcc testsuite, as >>> well as build many large applications. >>> >>> Through these tests, I've found two additional bugs that was omitted >>> in the original implementation. A new patch is attached (patch.txt) to >>> fix these problems. After this fix, all gcc testsuites pass for both >>> trunk and google-4_7 branch. I've also copy pasted the new fixes >>> (lto.c and tree-cfg.c) below. Now I'd say this patch is in good shape. >>> But it may not be perfect. I'll look into build failures as soon as it >>> arises. >>> >>> Richard and Diego, could you help me take a look at the following two fixes? >>> >>> Thanks, >>> Dehao >>> >>> New fixes: >>> --- gcc/lto/lto.c (revision 191083) >>> +++ gcc/lto/lto.c (working copy) >>> @@ -1559,8 +1559,6 @@ lto_fixup_prevailing_decls (tree t) >>> { >>> enum tree_code code = TREE_CODE (t); >>> LTO_NO_PREVAIL (TREE_TYPE (t)); >>> - if (CODE_CONTAINS_STRUCT (code, TS_COMMON)) >>> - LTO_NO_PREVAIL (TREE_CHAIN (t)); >> >> That change is odd. Can you show us how it breaks? > > This will break LTO build of gcc.c-torture/execute/pr38051.c > > There is data structure like: > > union { long int l; char c[sizeof (long int)]; } u; > > Once the block info is reserved for this, it'll reserve this data > structure. And inside this data structure, there is VAR_DECL. Thus > LTO_NO_PREVAIL assertion does not satisfy here for TREE_CHAIN (t). I see - the issue here is that this data structure is not reached at the time we call free_lang_data (via find_decls_types_r). But maybe I do not understand "once the block info is reserved for this". So the patch papers over an issue elsewhere I believe. Maybe Micha can add some clarification here though, how BLOCK_VARS should be visible here Richard. >> >>> if (DECL_P (t)) >>> { >>> LTO_NO_PREVAIL (DECL_NAME (t)); >>> >>> Index: gcc/tree-cfg.c >>> =================================================================== >>> --- gcc/tree-cfg.c (revision 191083) >>> +++ gcc/tree-cfg.c (working copy) >>> @@ -5980,9 +5974,21 @@ move_stmt_op (tree *tp, int *walk_subtrees, void * >>> tree t = *tp; >>> >>> if (EXPR_P (t)) >>> - /* We should never have TREE_BLOCK set on non-statements. */ >>> - gcc_assert (!TREE_BLOCK (t)); >>> - >>> + { >>> + tree block = TREE_BLOCK (t); >>> + if (p->orig_block == NULL_TREE >>> + || block == p->orig_block >>> + || block == NULL_TREE) >>> + TREE_SET_BLOCK (t, p->new_block); >>> +#ifdef ENABLE_CHECKING >>> + else if (block != p->new_block) >>> + { >>> + while (block && block != p->orig_block) >>> + block = BLOCK_SUPERCONTEXT (block); >>> + gcc_assert (block); >>> + } >>> +#endif >> >> I think what this means is that TREE_BLOCK on non-stmts are meaningless >> (thus only gimple_block is interesting on GIMPLE, not BLOCKs on trees). >> >> So instead of setting a BLOCK in some cases you should clear BLOCK >> if it happens to be set, or alternatively, only re-set it if there was >> a block associated >> with it. > > Yeah, makes sense. New change: > > @@ -5980,9 +5974,10 @@ > tree t = *tp; > > if (EXPR_P (t)) > - /* We should never have TREE_BLOCK set on non-statements. */ > - gcc_assert (!TREE_BLOCK (t)); > - > + { > + if (TREE_BLOCK (t)) > + TREE_SET_BLOCK (t, p->new_block); > + } > else if (DECL_P (t) || TREE_CODE (t) == SSA_NAME) > { > if (TREE_CODE (t) == SSA_NAME) > > Thanks, > Dehao > >> >> Richard. >> >>> + } >>> else if (DECL_P (t) || TREE_CODE (t) == SSA_NAME) >>> { >>> if (TREE_CODE (t) == SSA_NAME) >>> >>> Whole patch: >>> gcc/ChangeLog: >>> 2012-09-08 Dehao Chen <dehao@google.com> >>> >>> * toplev.c (general_init): Init block_locations. >>> * tree.c (tree_set_block): New. >>> (tree_block): Change to use LOCATION_BLOCK. >>> * tree.h (TREE_SET_BLOCK): New. >>> * final.c (reemit_insn_block_notes): Change to use LOCATION_BLOCK. >>> (final_start_function): Likewise. >>> * input.c (expand_location_1): Likewise. >>> * input.h (LOCATION_LOCUS): New. >>> (LOCATION_BLOCK): New. >>> (IS_UNKNOWN_LOCATION): New. >>> * fold-const.c (expr_location_or): Change to use new location. >>> * reorg.c (emit_delay_sequence): Likewise. >>> (try_merge_delay_insns): Likewise. >>> * modulo-sched.c (dump_insn_location): Likewise. >>> * lto-streamer-out.c (lto_output_location_bitpack): Likewise. >>> * jump.c (rtx_renumbered_equal_p): Likewise. >>> * ifcvt.c (noce_try_move): Likewise. >>> (noce_try_store_flag): Likewise. >>> (noce_try_store_flag_constants): Likewise. >>> (noce_try_addcc): Likewise. >>> (noce_try_store_flag_mask): Likewise. >>> (noce_try_cmove): Likewise. >>> (noce_try_cmove_arith): Likewise. >>> (noce_try_minmax): Likewise. >>> (noce_try_abs): Likewise. >>> (noce_try_sign_mask): Likewise. >>> (noce_try_bitop): Likewise. >>> (noce_process_if_block): Likewise. >>> (cond_move_process_if_block): Likewise. >>> (find_cond_trap): Likewise. >>> * dwarf2out.c (add_src_coords_attributes): Likewise. >>> * expr.c (expand_expr_real): Likewise. >>> * tree-parloops.c (create_loop_fn): Likewise. >>> * recog.c (peep2_attempt): Likewise. >>> * function.c (free_after_compilation): Likewise. >>> (expand_function_end): Likewise. >>> (set_insn_locations): Likewise. >>> (thread_prologue_and_epilogue_insns): Likewise. >>> * print-rtl.c (print_rtx): Likewise. >>> * profile.c (branch_prob): Likewise. >>> * trans-mem.c (ipa_tm_scan_irr_block): Likewise. >>> * gimplify.c (gimplify_call_expr): Likewise. >>> * except.c (duplicate_eh_regions_1): Likewise. >>> * emit-rtl.c (try_split): Likewise. >>> (make_insn_raw): Likewise. >>> (make_debug_insn_raw): Likewise. >>> (make_jump_insn_raw): Likewise. >>> (make_call_insn_raw): Likewise. >>> (emit_pattern_after_setloc): Likewise. >>> (emit_pattern_after): Likewise. >>> (emit_debug_insn_after): Likewise. >>> (emit_pattern_before): Likewise. >>> (emit_insn_before_setloc): Likewise. >>> (emit_jump_insn_before): Likewise. >>> (emit_call_insn_before_setloc): Likewise. >>> (emit_call_insn_before): Likeise. >>> (emit_debug_insn_before_setloc): Likewise. >>> (emit_copy_of_insn_after): Likewise. >>> (insn_locators_alloc): Remove. >>> (insn_locators_finalize): Remove. >>> (insn_locators_free): Remove. >>> (set_curr_insn_source_location): Remove. >>> (get_curr_insn_source_location): Remove. >>> (set_curr_insn_block): Remove. >>> (get_curr_insn_block): Remove. >>> (locator_scope): Remove. >>> (insn_scope): Change to use new location. >>> (locator_location): Remove. >>> (insn_line): Change to use new location. >>> (locator_file): Remove. >>> (insn_file): Change to use new location. >>> (locator_eq): Remove. >>> (insn_locations_init): New. >>> (insn_locations_finalize): New. >>> (set_curr_insn_location): New. >>> (curr_insn_location): New. >>> * cfgexpand.c (gimple_assign_rhs_to_tree): Change to use new location. >>> (expand_gimple_cond): Likewise. >>> (expand_call_stmt): Likewise. >>> (expand_gimple_stmt_1): Likewise. >>> (expand_gimple_basic_block): Likewise. >>> (construct_exit_block): Likewise. >>> (gimple_expand_cfg): Likewise. >>> * cfgcleanup.c (try_forward_edges): Likewise. >>> * tree-ssa-live.c (remove_unused_scope_block_p): Likewise. >>> (dump_scope_block): Likewise. >>> (remove_unused_locals): Likewise. >>> * rtl.c (rtx_equal_p_cb): Likewise. >>> (rtx_equal_p): Likewise. >>> * rtl.h (XUINT): New. >>> (INSN_LOCATOR): Remove. >>> (CURR_INSN_LOCATION): Remove. >>> (INSN_LOCATION): New. >>> (INSN_HAS_LOCATION): New. >>> * tree-inline.c (remap_gimple_op_r): Change to use new location. >>> (copy_tree_body_r): Likewise. >>> (copy_phis_for_bb): Likewise. >>> (expand_call_inline): Likewise. >>> * tree-streamer-in.c (lto_input_ts_exp_tree_pointers): Likewise. >>> * tree-streamer-out.c (write_ts_decl_minimal_tree_pointers): Likewise. >>> * gimple-streamer-out.c (output_gimple_stmt): Likewise. >>> * combine.c (try_combine): Likewise. >>> * tree-outof-ssa.c (set_location_for_edge): Likewise. >>> (insert_partition_copy_on_edge): Likewise. >>> (insert_value_copy_on_edge): Likewise. >>> (insert_rtx_to_part_on_edge): Likewise. >>> (insert_part_to_rtx_on_edge): Likewise. >>> * basic-block.h (edge_def): Remove field. >>> * gimple.h (gimple_statement_base): Remove field. >>> (gimple_bb): Change to use new location. >>> (gimple_set_block): Likewise. >>> (gimple_has_location): Likewise. >>> * tree-cfg.c (make_cond_expr_edges): Likewise. >>> (make_goto_expr_edges): Likewise. >>> (gimple_can_merge_blocks_p): Likewise. >>> (move_stmt_op): Likewise. >>> (move_block_to_fn): Likewise. >>> * config/alpha/alpha.c (alpha_output_mi_thunk_osf): Likewise. >>> * config/sparc/sparc.c (sparc_output_mi_thunk): Likewise. >>> * config/i386/i386.c (x86_output_mi_thunk): Likewise. >>> * config/tilegx/tilegx.c (tilegx_output_mi_thunk): Likewise. >>> * config/sh/sh.c (sh_output_mi_thunk): Likewise. >>> * config/ia64/ia64.c (ia64_output_mi_thunk): Likewise. >>> * config/rs6000/rs6000.c (rs6000_output_mi_thunk): Likewise. >>> * config/score/score.c (score_output_mi_thunk): Likewise. >>> * config/tilepro/tilepro.c (tilepro_asm_output_mi_thunk): Likewise. >>> * config/mips/mips.c (mips_output_mi_thunk): Likewise. >>> * cfgrtl.c (unique_locus_on_edge_between_p): Likewise. >>> (unique_locus_on_edge_between_p): Likewise. >>> (emit_nop_for_unique_locus_between): Likewise. >>> (force_nonfallthru_and_redirect): Likewise. >>> (fixup_reorder_chain): Likewise. >>> (cfg_layout_merge_blocks): Likewise. >>> * stmt.c (emit_case_nodes): Likewise. >>> >>> gcc/lto/ChangeLog: >>> 2012-09-08 Dehao Chen <dehao@google.com> >>> >>> * lto/lto.c (lto_fixup_prevailing_decls): Remove tree.exp.block field. >>> >>> libcpp/ChangeLog: >>> 2012-09-08 Dehao Chen <dehao@google.com> >>> >>> * include/line-map.h (MAX_SOURCE_LOCATION): New value. >>> (location_adhoc_data_init): New. >>> (location_adhoc_data_fini): New. >>> (get_combined_adhoc_loc): New. >>> (get_data_from_adhoc_loc): New. >>> (get_location_from_adhoc_loc): New. >>> (COMBINE_LOCATION_DATA): New. >>> (IS_ADHOC_LOC): New. >>> (expanded_location): New field. >>> * line-map.c (location_adhoc_data): New. >>> (location_adhoc_data_htab): New. >>> (curr_adhoc_loc): New. >>> (location_adhoc_data): New. >>> (allocated_location_adhoc_data): New. >>> (location_adhoc_data_hash): New. >>> (location_adhoc_data_eq): New. >>> (location_adhoc_data_update): New. >>> (get_combined_adhoc_loc): New. >>> (get_data_from_adhoc_loc): New. >>> (get_location_from_adhoc_loc): New. >>> (location_adhoc_data_init): New. >>> (location_adhoc_data_fini): New. >>> (linemap_lookup): Change to use new location. >>> (linemap_ordinary_map_lookup): Likewise. >>> (linemap_macro_map_lookup): Likewise. >>> (linemap_macro_map_loc_to_def_point): Likewise. >>> (linemap_macro_map_loc_unwind_toward_spel): Likewise. >>> (linemap_get_expansion_line): Likewise. >>> (linemap_get_expansion_filename): Likewise. >>> (linemap_location_in_system_header_p): Likewise. >>> (linemap_location_from_macro_expansion_p): Likewise. >>> (linemap_macro_loc_to_spelling_point): Likewise. >>> (linemap_macro_loc_to_def_point): Likewise. >>> (linemap_macro_loc_to_exp_point): Likewise. >>> (linemap_resolve_location): Likewise. >>> (linemap_unwind_toward_expansion): Likewise. >>> (linemap_unwind_to_first_non_reserved_loc): Likewise. >>> (linemap_expand_location): Likewise. >>> (linemap_dump_location): Likewise.
Hi, On Tue, 11 Sep 2012, Richard Guenther wrote: > >>> +++ gcc/lto/lto.c (working copy) > >>> @@ -1559,8 +1559,6 @@ lto_fixup_prevailing_decls (tree t) > >>> { > >>> enum tree_code code = TREE_CODE (t); > >>> LTO_NO_PREVAIL (TREE_TYPE (t)); > >>> - if (CODE_CONTAINS_STRUCT (code, TS_COMMON)) > >>> - LTO_NO_PREVAIL (TREE_CHAIN (t)); > >> > >> That change is odd. Can you show us how it breaks? > > > > This will break LTO build of gcc.c-torture/execute/pr38051.c > > > > There is data structure like: > > > > union { long int l; char c[sizeof (long int)]; } u; > > > > Once the block info is reserved for this, it'll reserve this data > > structure. And inside this data structure, there is VAR_DECL. Thus > > LTO_NO_PREVAIL assertion does not satisfy here for TREE_CHAIN (t). > > I see - the issue here is that this data structure is not reached at the > time we call free_lang_data (via find_decls_types_r). It should be reached just fine. The problem is that TREE_CHAIN of that union type contains random garbage (in this case the var_decl 'u'). This is not supposed to happen. It's set as part of reading back a BLOCK_VARS chain, so the type_decl itself is in such a chain (and 'u' is part of it via the TREE_CHAIN pointer). I have no idea why this is no problem without the patch. Possibly because of the hunk in remove_unused_scope_block_p that makes more blocks stay. > But maybe I do not understand "once the block info is reserved for > this". > > So the patch papers over an issue elsewhere I believe. Maybe Micha can > add some clarification here though, how BLOCK_VARS should be visible > here Hmm. Without the half-hearted tries to support debug info with LTO the block_vars list was no problem, it simply wouldn't be streamed. Now I think it is a problem, and we need to fix it up with the prevailing decls if there are multiple ones. I.e. instead of removing the two lines, replace LTO_NO_PREVAIL (TREE_CHAIN (t)) with LTO_SET_PREVAIL. This is quite unfortunate as we really rather want to make sure that TREE_CHAIN isn't randomly set to something. But as long as block_vars are implemented via TREE_CHAIN, and we want to preserve block_vars we don't have much choice :-( Ciao, Michael.
On Tue, Sep 11, 2012 at 3:30 PM, Michael Matz <matz@suse.de> wrote: > Hi, > > On Tue, 11 Sep 2012, Richard Guenther wrote: > >> >>> +++ gcc/lto/lto.c (working copy) >> >>> @@ -1559,8 +1559,6 @@ lto_fixup_prevailing_decls (tree t) >> >>> { >> >>> enum tree_code code = TREE_CODE (t); >> >>> LTO_NO_PREVAIL (TREE_TYPE (t)); >> >>> - if (CODE_CONTAINS_STRUCT (code, TS_COMMON)) >> >>> - LTO_NO_PREVAIL (TREE_CHAIN (t)); >> >> >> >> That change is odd. Can you show us how it breaks? >> > >> > This will break LTO build of gcc.c-torture/execute/pr38051.c >> > >> > There is data structure like: >> > >> > union { long int l; char c[sizeof (long int)]; } u; >> > >> > Once the block info is reserved for this, it'll reserve this data >> > structure. And inside this data structure, there is VAR_DECL. Thus >> > LTO_NO_PREVAIL assertion does not satisfy here for TREE_CHAIN (t). >> >> I see - the issue here is that this data structure is not reached at the >> time we call free_lang_data (via find_decls_types_r). > > It should be reached just fine. The problem is that TREE_CHAIN of that > union type contains random garbage (in this case the var_decl 'u'). This > is not supposed to happen. It's set as part of reading back a BLOCK_VARS > chain, so the type_decl itself is in such a chain (and 'u' is part of it > via the TREE_CHAIN pointer). > > I have no idea why this is no problem without the patch. Possibly because > of the hunk in remove_unused_scope_block_p that makes more blocks stay. > >> But maybe I do not understand "once the block info is reserved for >> this". >> >> So the patch papers over an issue elsewhere I believe. Maybe Micha can >> add some clarification here though, how BLOCK_VARS should be visible >> here > > Hmm. Without the half-hearted tries to support debug info with LTO the > block_vars list was no problem, it simply wouldn't be streamed. Now I > think it is a problem, and we need to fix it up with the prevailing decls > if there are multiple ones. I.e. instead of removing the two lines, > replace LTO_NO_PREVAIL (TREE_CHAIN (t)) with LTO_SET_PREVAIL. > > This is quite unfortunate as we really rather want to make sure that > TREE_CHAIN isn't randomly set to something. But as long as block_vars are > implemented via TREE_CHAIN, and we want to preserve block_vars we don't > have much choice :-( I don't think we can fixup TREE_CHAIN - the things cannot be in multiple lists after all. Unifying/fixing up would need to happen at a BLOCK level. But as you say - only TYPE_DECLs should be in BLOCK_VARS, but never global ones, so there would be nothing to replace. Which means we shouldn't even try to merge those. Hmm. Richard. > > Ciao, > Michael.
Hi, On Tue, 11 Sep 2012, Dehao Chen wrote: > Looks like we have two choices: > > 1. Stream out block info, and use LTO_SET_PREVAIL for TREE_CHAIN(t) This will actually not work correctly in some cases. The problem is, if the prevailing decl is already part of another chain (say in another block_var list) you would break the current chain. Hence block vars need special handling in the lto streamer (another reason why tree_chain is not the most clever think to use for this chain). This problem area needs to be solved somehow if block info is to be preserved correctly. > 2. Don't stream out block info for LTO, and still call LTO_NO_PREVAIL > (TREE_CHAIN (t)). That's also a large hammer as it basically will mean no debug info after LTO :-/ Sigh, at this point I have no good solution that doesn't involve quite some work, perhaps your hack is good enough for the time being, though I hate it :) Ciao, Michael.
I saw comments in tree-streamer-out.c: /* Do not stream BLOCK_SOURCE_LOCATION. We cannot handle debug information for early inlining so drop it on the floor instead of ICEing in dwarf2out.c. */ streamer_write_chain (ob, BLOCK_VARS (expr), ref_p); However, what the code is doing seemed contradictory with the comment. Or am I missing something? On Tue, Sep 11, 2012 at 8:32 AM, Michael Matz <matz@suse.de> wrote: > Hi, > > On Tue, 11 Sep 2012, Dehao Chen wrote: > >> Looks like we have two choices: >> >> 1. Stream out block info, and use LTO_SET_PREVAIL for TREE_CHAIN(t) > > This will actually not work correctly in some cases. The problem is, if > the prevailing decl is already part of another chain (say in another > block_var list) you would break the current chain. Hence block vars need > special handling in the lto streamer (another reason why tree_chain is not > the most clever think to use for this chain). This problem area needs to > be solved somehow if block info is to be preserved correctly. > >> 2. Don't stream out block info for LTO, and still call LTO_NO_PREVAIL >> (TREE_CHAIN (t)). > > That's also a large hammer as it basically will mean no debug info after > LTO :-/ Sigh, at this point I have no good solution that doesn't involve > quite some work, perhaps your hack is good enough for the time being, > though I hate it :) I got it. Then I'll keep the patch as it is (remove the LTO_NO_PREVAIL), and work with Honza to resolve the issue he had, and then we should be good to check in? Thanks, Dehao > > > Ciao, > Michael.
Now I think we are facing a more complex problem. The data structure we use to store the location_adhoc_data are file-static in linemap.c in libcpp. These data structures are not guarded by GTY(()). Meanwhile, as we have removed the block data structure from gimple.gsbase as well as tree.exp (encoding them into an location_t). This could cause block being GCed and the LOCATION_BLOCK becoming dangling pointers. I tried to manipulate GTY to make it recognize the LOCATION_BLOCK from gimple.gsbase.location. However, neigher nested_ptr nor mark_hook can help me. Another approach would be guard the location_adhoc_data and related data structures in GTY(()). However, this is non-trivial because tree is not visible in libcpp. At the same time, my implementation heavily relies on hashtable to make the code efficient, thus it's quite tricky to make "param_is" and "use_params" work. The final approach, which I'll try tomorrow, would be move all my implementation from libcpp to gcc, and guard them with GTY(()). I still haven't thought of any potential problem of this approach. Any comments? Thanks, Dehao On Tue, Sep 11, 2012 at 9:00 AM, Dehao Chen <dehao@google.com> wrote: > I saw comments in tree-streamer-out.c: > > /* Do not stream BLOCK_SOURCE_LOCATION. We cannot handle debug information > for early inlining so drop it on the floor instead of ICEing in > dwarf2out.c. */ > streamer_write_chain (ob, BLOCK_VARS (expr), ref_p); > > However, what the code is doing seemed contradictory with the comment. > Or am I missing something? > > > > On Tue, Sep 11, 2012 at 8:32 AM, Michael Matz <matz@suse.de> wrote: >> Hi, >> >> On Tue, 11 Sep 2012, Dehao Chen wrote: >> >>> Looks like we have two choices: >>> >>> 1. Stream out block info, and use LTO_SET_PREVAIL for TREE_CHAIN(t) >> >> This will actually not work correctly in some cases. The problem is, if >> the prevailing decl is already part of another chain (say in another >> block_var list) you would break the current chain. Hence block vars need >> special handling in the lto streamer (another reason why tree_chain is not >> the most clever think to use for this chain). This problem area needs to >> be solved somehow if block info is to be preserved correctly. >> >>> 2. Don't stream out block info for LTO, and still call LTO_NO_PREVAIL >>> (TREE_CHAIN (t)). >> >> That's also a large hammer as it basically will mean no debug info after >> LTO :-/ Sigh, at this point I have no good solution that doesn't involve >> quite some work, perhaps your hack is good enough for the time being, >> though I hate it :) > > I got it. Then I'll keep the patch as it is (remove the > LTO_NO_PREVAIL), and work with Honza to resolve the issue he had, and > then we should be good to check in? > > Thanks, > Dehao > >> >> >> Ciao, >> Michael.
Can you make definition of location_adhoc_data available in both libcpp and gcc? In gcc side, the data structure will be annotated with GTY(). The root global variables should also be visible in gcc side. David On Tue, Sep 11, 2012 at 10:06 PM, Dehao Chen <dehao@google.com> wrote: > Now I think we are facing a more complex problem. The data structure > we use to store the location_adhoc_data are file-static in linemap.c > in libcpp. These data structures are not guarded by GTY(()). > Meanwhile, as we have removed the block data structure from > gimple.gsbase as well as tree.exp (encoding them into an location_t). > This could cause block being GCed and the LOCATION_BLOCK becoming > dangling pointers. > > I tried to manipulate GTY to make it recognize the LOCATION_BLOCK from > gimple.gsbase.location. However, neigher nested_ptr nor mark_hook can > help me. > > Another approach would be guard the location_adhoc_data and related > data structures in GTY(()). However, this is non-trivial because tree > is not visible in libcpp. At the same time, my implementation heavily > relies on hashtable to make the code efficient, thus it's quite tricky > to make "param_is" and "use_params" work. > > The final approach, which I'll try tomorrow, would be move all my > implementation from libcpp to gcc, and guard them with GTY(()). I > still haven't thought of any potential problem of this approach. Any > comments? > > Thanks, > Dehao > > On Tue, Sep 11, 2012 at 9:00 AM, Dehao Chen <dehao@google.com> wrote: >> I saw comments in tree-streamer-out.c: >> >> /* Do not stream BLOCK_SOURCE_LOCATION. We cannot handle debug information >> for early inlining so drop it on the floor instead of ICEing in >> dwarf2out.c. */ >> streamer_write_chain (ob, BLOCK_VARS (expr), ref_p); >> >> However, what the code is doing seemed contradictory with the comment. >> Or am I missing something? >> >> >> >> On Tue, Sep 11, 2012 at 8:32 AM, Michael Matz <matz@suse.de> wrote: >>> Hi, >>> >>> On Tue, 11 Sep 2012, Dehao Chen wrote: >>> >>>> Looks like we have two choices: >>>> >>>> 1. Stream out block info, and use LTO_SET_PREVAIL for TREE_CHAIN(t) >>> >>> This will actually not work correctly in some cases. The problem is, if >>> the prevailing decl is already part of another chain (say in another >>> block_var list) you would break the current chain. Hence block vars need >>> special handling in the lto streamer (another reason why tree_chain is not >>> the most clever think to use for this chain). This problem area needs to >>> be solved somehow if block info is to be preserved correctly. >>> >>>> 2. Don't stream out block info for LTO, and still call LTO_NO_PREVAIL >>>> (TREE_CHAIN (t)). >>> >>> That's also a large hammer as it basically will mean no debug info after >>> LTO :-/ Sigh, at this point I have no good solution that doesn't involve >>> quite some work, perhaps your hack is good enough for the time being, >>> though I hate it :) >> >> I got it. Then I'll keep the patch as it is (remove the >> LTO_NO_PREVAIL), and work with Honza to resolve the issue he had, and >> then we should be good to check in? >> >> Thanks, >> Dehao >> >>> >>> >>> Ciao, >>> Michael.
On Tue, Sep 11, 2012 at 5:32 PM, Michael Matz <matz@suse.de> wrote: > Hi, > > On Tue, 11 Sep 2012, Dehao Chen wrote: > >> Looks like we have two choices: >> >> 1. Stream out block info, and use LTO_SET_PREVAIL for TREE_CHAIN(t) > > This will actually not work correctly in some cases. The problem is, if > the prevailing decl is already part of another chain (say in another > block_var list) you would break the current chain. Hence block vars need > special handling in the lto streamer (another reason why tree_chain is not > the most clever think to use for this chain). This problem area needs to > be solved somehow if block info is to be preserved correctly. Well. The issue is that at present we stream BLOCKs in the function section via its DECL_INTIAL. Which means we _never_ should get a non-prevailing DECL in BLOCK_VARS. You need to debug why that doesn't work anymore. Possibly the BLOCK leaks into decls it should not leak to via the location mechanism? >> 2. Don't stream out block info for LTO, and still call LTO_NO_PREVAIL >> (TREE_CHAIN (t)). > > That's also a large hammer as it basically will mean no debug info after > LTO :-/ Sigh, at this point I have no good solution that doesn't involve > quite some work, perhaps your hack is good enough for the time being, > though I hate it :) It hides a bug. If we replace anything in BLOCK_VARS then the risk is that you generate an infinite chain in some BLOCK_VARS list and thus get infinite loops somewhere in the compiler. So, no, that's not an option. Richard. > > Ciao, > Michael.
On Wed, Sep 12, 2012 at 7:06 AM, Dehao Chen <dehao@google.com> wrote: > Now I think we are facing a more complex problem. The data structure > we use to store the location_adhoc_data are file-static in linemap.c > in libcpp. These data structures are not guarded by GTY(()). > Meanwhile, as we have removed the block data structure from > gimple.gsbase as well as tree.exp (encoding them into an location_t). > This could cause block being GCed and the LOCATION_BLOCK becoming > dangling pointers. Uh. Note that it is quite important that we are able to garbage-collect unused BLOCKs, this is the whole point of removing unused BLOCK scopes in remove_unused_locals. So this indeed becomes much more complicated ... What would be desired is that the garbage collector can NULL an entry in the mapping table when it is not referenced in any other way (that other reference would be the BLOCK tree as stored in a FUNCTION_DECLs DECL_INITIAL). > I tried to manipulate GTY to make it recognize the LOCATION_BLOCK from > gimple.gsbase.location. However, neigher nested_ptr nor mark_hook can > help me. > > Another approach would be guard the location_adhoc_data and related > data structures in GTY(()). However, this is non-trivial because tree > is not visible in libcpp. At the same time, my implementation heavily > relies on hashtable to make the code efficient, thus it's quite tricky > to make "param_is" and "use_params" work. > > The final approach, which I'll try tomorrow, would be move all my > implementation from libcpp to gcc, and guard them with GTY(()). I > still haven't thought of any potential problem of this approach. Any > comments? I think moving the mapping to GC in a lazy manner as I described above would be the way to go. For hashtables GC already supports if_marked, not sure if similar support is available for arrays/vecs. Richard. > Thanks, > Dehao > > On Tue, Sep 11, 2012 at 9:00 AM, Dehao Chen <dehao@google.com> wrote: >> I saw comments in tree-streamer-out.c: >> >> /* Do not stream BLOCK_SOURCE_LOCATION. We cannot handle debug information >> for early inlining so drop it on the floor instead of ICEing in >> dwarf2out.c. */ >> streamer_write_chain (ob, BLOCK_VARS (expr), ref_p); >> >> However, what the code is doing seemed contradictory with the comment. >> Or am I missing something? >> >> >> >> On Tue, Sep 11, 2012 at 8:32 AM, Michael Matz <matz@suse.de> wrote: >>> Hi, >>> >>> On Tue, 11 Sep 2012, Dehao Chen wrote: >>> >>>> Looks like we have two choices: >>>> >>>> 1. Stream out block info, and use LTO_SET_PREVAIL for TREE_CHAIN(t) >>> >>> This will actually not work correctly in some cases. The problem is, if >>> the prevailing decl is already part of another chain (say in another >>> block_var list) you would break the current chain. Hence block vars need >>> special handling in the lto streamer (another reason why tree_chain is not >>> the most clever think to use for this chain). This problem area needs to >>> be solved somehow if block info is to be preserved correctly. >>> >>>> 2. Don't stream out block info for LTO, and still call LTO_NO_PREVAIL >>>> (TREE_CHAIN (t)). >>> >>> That's also a large hammer as it basically will mean no debug info after >>> LTO :-/ Sigh, at this point I have no good solution that doesn't involve >>> quite some work, perhaps your hack is good enough for the time being, >>> though I hate it :) >> >> I got it. Then I'll keep the patch as it is (remove the >> LTO_NO_PREVAIL), and work with Honza to resolve the issue he had, and >> then we should be good to check in? >> >> Thanks, >> Dehao >> >>> >>> >>> Ciao, >>> Michael.
Hi, On Wed, 12 Sep 2012, Richard Guenther wrote: > > This will actually not work correctly in some cases. The problem is, > > if the prevailing decl is already part of another chain (say in > > another block_var list) you would break the current chain. Hence > > block vars need special handling in the lto streamer (another reason > > why tree_chain is not the most clever think to use for this chain). > > This problem area needs to be solved somehow if block info is to be > > preserved correctly. > > Well. The issue is that at present we stream BLOCKs in the function > section via its DECL_INTIAL. Which means we _never_ should get a > non-prevailing DECL in BLOCK_VARS. You need to debug why that doesn't > work anymore. The assert that triggers tests that there's no var_decl in TREE_CHAIN. It doesn't test that it's a prevailing decl. So we could assert that instead of the current check. > >> 2. Don't stream out block info for LTO, and still call LTO_NO_PREVAIL > >> (TREE_CHAIN (t)). > > > > That's also a large hammer as it basically will mean no debug info after > > LTO :-/ Sigh, at this point I have no good solution that doesn't involve > > quite some work, perhaps your hack is good enough for the time being, > > though I hate it :) > > It hides a bug. If we replace anything in BLOCK_VARS then the risk is > that you generate an infinite chain in some BLOCK_VARS list and thus > get infinite loops somewhere in the compiler. That's what I said for using SET_PREVAIL. But his hack would specifically not replace anything in TREE_CHAIN (and hence BLOCK_VARS), and indeed not check anything either. Ciao, Michael.
On Wed, Sep 12, 2012 at 2:14 PM, Michael Matz <matz@suse.de> wrote: > Hi, > > On Wed, 12 Sep 2012, Richard Guenther wrote: > >> > This will actually not work correctly in some cases. The problem is, >> > if the prevailing decl is already part of another chain (say in >> > another block_var list) you would break the current chain. Hence >> > block vars need special handling in the lto streamer (another reason >> > why tree_chain is not the most clever think to use for this chain). >> > This problem area needs to be solved somehow if block info is to be >> > preserved correctly. >> >> Well. The issue is that at present we stream BLOCKs in the function >> section via its DECL_INTIAL. Which means we _never_ should get a >> non-prevailing DECL in BLOCK_VARS. You need to debug why that doesn't >> work anymore. > > The assert that triggers tests that there's no var_decl in TREE_CHAIN. > It doesn't test that it's a prevailing decl. So we could assert that > instead of the current check. > >> >> 2. Don't stream out block info for LTO, and still call LTO_NO_PREVAIL >> >> (TREE_CHAIN (t)). >> > >> > That's also a large hammer as it basically will mean no debug info after >> > LTO :-/ Sigh, at this point I have no good solution that doesn't involve >> > quite some work, perhaps your hack is good enough for the time being, >> > though I hate it :) >> >> It hides a bug. If we replace anything in BLOCK_VARS then the risk is >> that you generate an infinite chain in some BLOCK_VARS list and thus >> get infinite loops somewhere in the compiler. > > That's what I said for using SET_PREVAIL. But his hack would specifically > not replace anything in TREE_CHAIN (and hence BLOCK_VARS), and indeed not > check anything either. Hm, but we shouldn't end up streaming any BLOCKs at this point (nor local TYPE_DECLs). Those are supposed to be in the local function sections only where no fixup for prevailing decls happens. Richard. > > Ciao, > Michael.
Hi, On Wed, 12 Sep 2012, Richard Guenther wrote: > >> It hides a bug. If we replace anything in BLOCK_VARS then the risk > >> is that you generate an infinite chain in some BLOCK_VARS list and > >> thus get infinite loops somewhere in the compiler. > > > > That's what I said for using SET_PREVAIL. But his hack would > > specifically not replace anything in TREE_CHAIN (and hence > > BLOCK_VARS), and indeed not check anything either. > > Hm, but we shouldn't end up streaming any BLOCKs at this point (nor > local TYPE_DECLs). Those are supposed to be in the local function > sections only where no fixup for prevailing decls happens. That's true, something is fishy with the patch, will try to investigate. Ciao, Michael.
On Wed, Sep 12, 2012 at 2:13 AM, Richard Guenther <richard.guenther@gmail.com> wrote: > On Wed, Sep 12, 2012 at 7:06 AM, Dehao Chen <dehao@google.com> wrote: >> Now I think we are facing a more complex problem. The data structure >> we use to store the location_adhoc_data are file-static in linemap.c >> in libcpp. These data structures are not guarded by GTY(()). >> Meanwhile, as we have removed the block data structure from >> gimple.gsbase as well as tree.exp (encoding them into an location_t). >> This could cause block being GCed and the LOCATION_BLOCK becoming >> dangling pointers. > > Uh. Note that it is quite important that we are able to garbage-collect unused > BLOCKs, this is the whole point of removing unused BLOCK scopes in > remove_unused_locals. So this indeed becomes much more complicated ... > What would be desired is that the garbage collector can NULL an entry in > the mapping table when it is not referenced in any other way (that other > reference would be the BLOCK tree as stored in a FUNCTION_DECLs DECL_INITIAL). It would be nice to GC those unused BLOCKS. I wonder how many BLOCKS are created for a large C++ program. This patch saves memory by shrinking tree size, is it a net win or loss without GC those BLOCKS? thanks, David > >> I tried to manipulate GTY to make it recognize the LOCATION_BLOCK from >> gimple.gsbase.location. However, neigher nested_ptr nor mark_hook can >> help me. >> >> Another approach would be guard the location_adhoc_data and related >> data structures in GTY(()). However, this is non-trivial because tree >> is not visible in libcpp. At the same time, my implementation heavily >> relies on hashtable to make the code efficient, thus it's quite tricky >> to make "param_is" and "use_params" work. >> >> The final approach, which I'll try tomorrow, would be move all my >> implementation from libcpp to gcc, and guard them with GTY(()). I >> still haven't thought of any potential problem of this approach. Any >> comments? > > I think moving the mapping to GC in a lazy manner as I described above > would be the way to go. For hashtables GC already supports if_marked, > not sure if similar support is available for arrays/vecs. > > Richard. > >> Thanks, >> Dehao >> >> On Tue, Sep 11, 2012 at 9:00 AM, Dehao Chen <dehao@google.com> wrote: >>> I saw comments in tree-streamer-out.c: >>> >>> /* Do not stream BLOCK_SOURCE_LOCATION. We cannot handle debug information >>> for early inlining so drop it on the floor instead of ICEing in >>> dwarf2out.c. */ >>> streamer_write_chain (ob, BLOCK_VARS (expr), ref_p); >>> >>> However, what the code is doing seemed contradictory with the comment. >>> Or am I missing something? >>> >>> >>> >>> On Tue, Sep 11, 2012 at 8:32 AM, Michael Matz <matz@suse.de> wrote: >>>> Hi, >>>> >>>> On Tue, 11 Sep 2012, Dehao Chen wrote: >>>> >>>>> Looks like we have two choices: >>>>> >>>>> 1. Stream out block info, and use LTO_SET_PREVAIL for TREE_CHAIN(t) >>>> >>>> This will actually not work correctly in some cases. The problem is, if >>>> the prevailing decl is already part of another chain (say in another >>>> block_var list) you would break the current chain. Hence block vars need >>>> special handling in the lto streamer (another reason why tree_chain is not >>>> the most clever think to use for this chain). This problem area needs to >>>> be solved somehow if block info is to be preserved correctly. >>>> >>>>> 2. Don't stream out block info for LTO, and still call LTO_NO_PREVAIL >>>>> (TREE_CHAIN (t)). >>>> >>>> That's also a large hammer as it basically will mean no debug info after >>>> LTO :-/ Sigh, at this point I have no good solution that doesn't involve >>>> quite some work, perhaps your hack is good enough for the time being, >>>> though I hate it :) >>> >>> I got it. Then I'll keep the patch as it is (remove the >>> LTO_NO_PREVAIL), and work with Honza to resolve the issue he had, and >>> then we should be good to check in? >>> >>> Thanks, >>> Dehao >>> >>>> >>>> >>>> Ciao, >>>> Michael.
There are two parts that needs memory management: 1. The BLOCK structure. This is managed by GC. I originally thought that removing blocks from tree.gsbase would paralyze GC. This turned out not to be a concern because DECL_INITIAL will still mark those used tree nodes. This patch may decrease the memory consumption by removing blocks from tree/gimple. However, as it makes more blocks become used, they also increase the memory consumption. 2. The data structure in libcpp that maintains the hashtable for the location->block mapping. This is relatively minor because for the largest source I've seen, it only maintains less than 100K entries in the array (less than 1M total memory consumption). However, as it is a global data structure, it may make LTO unhappy. Honza is helping testing the memory consumption on LTO (but we first need to make this patch work for LTO). If the LTO result turns out ok, we probably don't want to put these under GC because: 1. it'll make things much more complicated. 2. using self managed memory is more efficient (as this is frequently used in many passes). 3. not using GC actually saves memory because even though the block is in the map, it can still be GCed as soon as it's not reachable from DECL_INITIAL. I've tested this on some very large C++ files (each one takes more than 10s to build), the memory consumption does not see noticeable increase/decrease. Thanks, Dehao On Wed, Sep 12, 2012 at 9:39 AM, Xinliang David Li <davidxl@google.com> wrote: > On Wed, Sep 12, 2012 at 2:13 AM, Richard Guenther > <richard.guenther@gmail.com> wrote: >> On Wed, Sep 12, 2012 at 7:06 AM, Dehao Chen <dehao@google.com> wrote: >>> Now I think we are facing a more complex problem. The data structure >>> we use to store the location_adhoc_data are file-static in linemap.c >>> in libcpp. These data structures are not guarded by GTY(()). >>> Meanwhile, as we have removed the block data structure from >>> gimple.gsbase as well as tree.exp (encoding them into an location_t). >>> This could cause block being GCed and the LOCATION_BLOCK becoming >>> dangling pointers. >> >> Uh. Note that it is quite important that we are able to garbage-collect unused >> BLOCKs, this is the whole point of removing unused BLOCK scopes in >> remove_unused_locals. So this indeed becomes much more complicated ... >> What would be desired is that the garbage collector can NULL an entry in >> the mapping table when it is not referenced in any other way (that other >> reference would be the BLOCK tree as stored in a FUNCTION_DECLs DECL_INITIAL). > > It would be nice to GC those unused BLOCKS. I wonder how many BLOCKS > are created for a large C++ program. This patch saves memory by > shrinking tree size, is it a net win or loss without GC those BLOCKS? > > thanks, > > David > > >> >>> I tried to manipulate GTY to make it recognize the LOCATION_BLOCK from >>> gimple.gsbase.location. However, neigher nested_ptr nor mark_hook can >>> help me. >>> >>> Another approach would be guard the location_adhoc_data and related >>> data structures in GTY(()). However, this is non-trivial because tree >>> is not visible in libcpp. At the same time, my implementation heavily >>> relies on hashtable to make the code efficient, thus it's quite tricky >>> to make "param_is" and "use_params" work. >>> >>> The final approach, which I'll try tomorrow, would be move all my >>> implementation from libcpp to gcc, and guard them with GTY(()). I >>> still haven't thought of any potential problem of this approach. Any >>> comments? >> >> I think moving the mapping to GC in a lazy manner as I described above >> would be the way to go. For hashtables GC already supports if_marked, >> not sure if similar support is available for arrays/vecs. >> >> Richard. >> >>> Thanks, >>> Dehao >>> >>> On Tue, Sep 11, 2012 at 9:00 AM, Dehao Chen <dehao@google.com> wrote: >>>> I saw comments in tree-streamer-out.c: >>>> >>>> /* Do not stream BLOCK_SOURCE_LOCATION. We cannot handle debug information >>>> for early inlining so drop it on the floor instead of ICEing in >>>> dwarf2out.c. */ >>>> streamer_write_chain (ob, BLOCK_VARS (expr), ref_p); >>>> >>>> However, what the code is doing seemed contradictory with the comment. >>>> Or am I missing something? >>>> >>>> >>>> >>>> On Tue, Sep 11, 2012 at 8:32 AM, Michael Matz <matz@suse.de> wrote: >>>>> Hi, >>>>> >>>>> On Tue, 11 Sep 2012, Dehao Chen wrote: >>>>> >>>>>> Looks like we have two choices: >>>>>> >>>>>> 1. Stream out block info, and use LTO_SET_PREVAIL for TREE_CHAIN(t) >>>>> >>>>> This will actually not work correctly in some cases. The problem is, if >>>>> the prevailing decl is already part of another chain (say in another >>>>> block_var list) you would break the current chain. Hence block vars need >>>>> special handling in the lto streamer (another reason why tree_chain is not >>>>> the most clever think to use for this chain). This problem area needs to >>>>> be solved somehow if block info is to be preserved correctly. >>>>> >>>>>> 2. Don't stream out block info for LTO, and still call LTO_NO_PREVAIL >>>>>> (TREE_CHAIN (t)). >>>>> >>>>> That's also a large hammer as it basically will mean no debug info after >>>>> LTO :-/ Sigh, at this point I have no good solution that doesn't involve >>>>> quite some work, perhaps your hack is good enough for the time being, >>>>> though I hate it :) >>>> >>>> I got it. Then I'll keep the patch as it is (remove the >>>> LTO_NO_PREVAIL), and work with Honza to resolve the issue he had, and >>>> then we should be good to check in? >>>> >>>> Thanks, >>>> Dehao >>>> >>>>> >>>>> >>>>> Ciao, >>>>> Michael.
On Wed, Sep 12, 2012 at 10:05 AM, Dehao Chen <dehao@google.com> wrote: > There are two parts that needs memory management: > > 1. The BLOCK structure. This is managed by GC. I originally thought > that removing blocks from tree.gsbase would paralyze GC. This turned > out not to be a concern because DECL_INITIAL will still mark those > used tree nodes. This patch may decrease the memory consumption by > removing blocks from tree/gimple. However, as it makes more blocks > become used, they also increase the memory consumption. You mean when you also make the location table GC root. Can you share the mem-stats information for the large program with and without your patch? thanks, David > 2. The data structure in libcpp that maintains the hashtable for the > location->block mapping. This is relatively minor because for the > largest source I've seen, it only maintains less than 100K entries in > the array (less than 1M total memory consumption). However, as it is a > global data structure, it may make LTO unhappy. Honza is helping > testing the memory consumption on LTO (but we first need to make this > patch work for LTO). If the LTO result turns out ok, we probably don't > want to put these under GC because: 1. it'll make things much more > complicated. 2. using self managed memory is more efficient (as this > is frequently used in many passes). 3. not using GC actually saves > memory because even though the block is in the map, it can still be > GCed as soon as it's not reachable from DECL_INITIAL. > > I've tested this on some very large C++ files (each one takes more > than 10s to build), the memory consumption does not see noticeable > increase/decrease. > > Thanks, > Dehao > > On Wed, Sep 12, 2012 at 9:39 AM, Xinliang David Li <davidxl@google.com> wrote: >> On Wed, Sep 12, 2012 at 2:13 AM, Richard Guenther >> <richard.guenther@gmail.com> wrote: >>> On Wed, Sep 12, 2012 at 7:06 AM, Dehao Chen <dehao@google.com> wrote: >>>> Now I think we are facing a more complex problem. The data structure >>>> we use to store the location_adhoc_data are file-static in linemap.c >>>> in libcpp. These data structures are not guarded by GTY(()). >>>> Meanwhile, as we have removed the block data structure from >>>> gimple.gsbase as well as tree.exp (encoding them into an location_t). >>>> This could cause block being GCed and the LOCATION_BLOCK becoming >>>> dangling pointers. >>> >>> Uh. Note that it is quite important that we are able to garbage-collect unused >>> BLOCKs, this is the whole point of removing unused BLOCK scopes in >>> remove_unused_locals. So this indeed becomes much more complicated ... >>> What would be desired is that the garbage collector can NULL an entry in >>> the mapping table when it is not referenced in any other way (that other >>> reference would be the BLOCK tree as stored in a FUNCTION_DECLs DECL_INITIAL). >> >> It would be nice to GC those unused BLOCKS. I wonder how many BLOCKS >> are created for a large C++ program. This patch saves memory by >> shrinking tree size, is it a net win or loss without GC those BLOCKS? >> >> thanks, >> >> David >> >> >>> >>>> I tried to manipulate GTY to make it recognize the LOCATION_BLOCK from >>>> gimple.gsbase.location. However, neigher nested_ptr nor mark_hook can >>>> help me. >>>> >>>> Another approach would be guard the location_adhoc_data and related >>>> data structures in GTY(()). However, this is non-trivial because tree >>>> is not visible in libcpp. At the same time, my implementation heavily >>>> relies on hashtable to make the code efficient, thus it's quite tricky >>>> to make "param_is" and "use_params" work. >>>> >>>> The final approach, which I'll try tomorrow, would be move all my >>>> implementation from libcpp to gcc, and guard them with GTY(()). I >>>> still haven't thought of any potential problem of this approach. Any >>>> comments? >>> >>> I think moving the mapping to GC in a lazy manner as I described above >>> would be the way to go. For hashtables GC already supports if_marked, >>> not sure if similar support is available for arrays/vecs. >>> >>> Richard. >>> >>>> Thanks, >>>> Dehao >>>> >>>> On Tue, Sep 11, 2012 at 9:00 AM, Dehao Chen <dehao@google.com> wrote: >>>>> I saw comments in tree-streamer-out.c: >>>>> >>>>> /* Do not stream BLOCK_SOURCE_LOCATION. We cannot handle debug information >>>>> for early inlining so drop it on the floor instead of ICEing in >>>>> dwarf2out.c. */ >>>>> streamer_write_chain (ob, BLOCK_VARS (expr), ref_p); >>>>> >>>>> However, what the code is doing seemed contradictory with the comment. >>>>> Or am I missing something? >>>>> >>>>> >>>>> >>>>> On Tue, Sep 11, 2012 at 8:32 AM, Michael Matz <matz@suse.de> wrote: >>>>>> Hi, >>>>>> >>>>>> On Tue, 11 Sep 2012, Dehao Chen wrote: >>>>>> >>>>>>> Looks like we have two choices: >>>>>>> >>>>>>> 1. Stream out block info, and use LTO_SET_PREVAIL for TREE_CHAIN(t) >>>>>> >>>>>> This will actually not work correctly in some cases. The problem is, if >>>>>> the prevailing decl is already part of another chain (say in another >>>>>> block_var list) you would break the current chain. Hence block vars need >>>>>> special handling in the lto streamer (another reason why tree_chain is not >>>>>> the most clever think to use for this chain). This problem area needs to >>>>>> be solved somehow if block info is to be preserved correctly. >>>>>> >>>>>>> 2. Don't stream out block info for LTO, and still call LTO_NO_PREVAIL >>>>>>> (TREE_CHAIN (t)). >>>>>> >>>>>> That's also a large hammer as it basically will mean no debug info after >>>>>> LTO :-/ Sigh, at this point I have no good solution that doesn't involve >>>>>> quite some work, perhaps your hack is good enough for the time being, >>>>>> though I hate it :) >>>>> >>>>> I got it. Then I'll keep the patch as it is (remove the >>>>> LTO_NO_PREVAIL), and work with Honza to resolve the issue he had, and >>>>> then we should be good to check in? >>>>> >>>>> Thanks, >>>>> Dehao >>>>> >>>>>> >>>>>> >>>>>> Ciao, >>>>>> Michael.
Attached is the memory consumption report for a very large source file. Looks like this patch actually reduced the memory consumption by 2%. Dehao On Thu, Sep 13, 2012 at 1:18 AM, Xinliang David Li <davidxl@google.com> wrote: > On Wed, Sep 12, 2012 at 10:05 AM, Dehao Chen <dehao@google.com> wrote: >> There are two parts that needs memory management: >> >> 1. The BLOCK structure. This is managed by GC. I originally thought >> that removing blocks from tree.gsbase would paralyze GC. This turned >> out not to be a concern because DECL_INITIAL will still mark those >> used tree nodes. This patch may decrease the memory consumption by >> removing blocks from tree/gimple. However, as it makes more blocks >> become used, they also increase the memory consumption. > > You mean when you also make the location table GC root. > > Can you share the mem-stats information for the large program with and > without your patch? > > thanks, > > David > >> 2. The data structure in libcpp that maintains the hashtable for the >> location->block mapping. This is relatively minor because for the >> largest source I've seen, it only maintains less than 100K entries in >> the array (less than 1M total memory consumption). However, as it is a >> global data structure, it may make LTO unhappy. Honza is helping >> testing the memory consumption on LTO (but we first need to make this >> patch work for LTO). If the LTO result turns out ok, we probably don't >> want to put these under GC because: 1. it'll make things much more >> complicated. 2. using self managed memory is more efficient (as this >> is frequently used in many passes). 3. not using GC actually saves >> memory because even though the block is in the map, it can still be >> GCed as soon as it's not reachable from DECL_INITIAL. >> >> I've tested this on some very large C++ files (each one takes more >> than 10s to build), the memory consumption does not see noticeable >> increase/decrease. >> >> Thanks, >> Dehao >> >> On Wed, Sep 12, 2012 at 9:39 AM, Xinliang David Li <davidxl@google.com> wrote: >>> On Wed, Sep 12, 2012 at 2:13 AM, Richard Guenther >>> <richard.guenther@gmail.com> wrote: >>>> On Wed, Sep 12, 2012 at 7:06 AM, Dehao Chen <dehao@google.com> wrote: >>>>> Now I think we are facing a more complex problem. The data structure >>>>> we use to store the location_adhoc_data are file-static in linemap.c >>>>> in libcpp. These data structures are not guarded by GTY(()). >>>>> Meanwhile, as we have removed the block data structure from >>>>> gimple.gsbase as well as tree.exp (encoding them into an location_t). >>>>> This could cause block being GCed and the LOCATION_BLOCK becoming >>>>> dangling pointers. >>>> >>>> Uh. Note that it is quite important that we are able to garbage-collect unused >>>> BLOCKs, this is the whole point of removing unused BLOCK scopes in >>>> remove_unused_locals. So this indeed becomes much more complicated ... >>>> What would be desired is that the garbage collector can NULL an entry in >>>> the mapping table when it is not referenced in any other way (that other >>>> reference would be the BLOCK tree as stored in a FUNCTION_DECLs DECL_INITIAL). >>> >>> It would be nice to GC those unused BLOCKS. I wonder how many BLOCKS >>> are created for a large C++ program. This patch saves memory by >>> shrinking tree size, is it a net win or loss without GC those BLOCKS? >>> >>> thanks, >>> >>> David >>> >>> >>>> >>>>> I tried to manipulate GTY to make it recognize the LOCATION_BLOCK from >>>>> gimple.gsbase.location. However, neigher nested_ptr nor mark_hook can >>>>> help me. >>>>> >>>>> Another approach would be guard the location_adhoc_data and related >>>>> data structures in GTY(()). However, this is non-trivial because tree >>>>> is not visible in libcpp. At the same time, my implementation heavily >>>>> relies on hashtable to make the code efficient, thus it's quite tricky >>>>> to make "param_is" and "use_params" work. >>>>> >>>>> The final approach, which I'll try tomorrow, would be move all my >>>>> implementation from libcpp to gcc, and guard them with GTY(()). I >>>>> still haven't thought of any potential problem of this approach. Any >>>>> comments? >>>> >>>> I think moving the mapping to GC in a lazy manner as I described above >>>> would be the way to go. For hashtables GC already supports if_marked, >>>> not sure if similar support is available for arrays/vecs. >>>> >>>> Richard. >>>> >>>>> Thanks, >>>>> Dehao >>>>> >>>>> On Tue, Sep 11, 2012 at 9:00 AM, Dehao Chen <dehao@google.com> wrote: >>>>>> I saw comments in tree-streamer-out.c: >>>>>> >>>>>> /* Do not stream BLOCK_SOURCE_LOCATION. We cannot handle debug information >>>>>> for early inlining so drop it on the floor instead of ICEing in >>>>>> dwarf2out.c. */ >>>>>> streamer_write_chain (ob, BLOCK_VARS (expr), ref_p); >>>>>> >>>>>> However, what the code is doing seemed contradictory with the comment. >>>>>> Or am I missing something? >>>>>> >>>>>> >>>>>> >>>>>> On Tue, Sep 11, 2012 at 8:32 AM, Michael Matz <matz@suse.de> wrote: >>>>>>> Hi, >>>>>>> >>>>>>> On Tue, 11 Sep 2012, Dehao Chen wrote: >>>>>>> >>>>>>>> Looks like we have two choices: >>>>>>>> >>>>>>>> 1. Stream out block info, and use LTO_SET_PREVAIL for TREE_CHAIN(t) >>>>>>> >>>>>>> This will actually not work correctly in some cases. The problem is, if >>>>>>> the prevailing decl is already part of another chain (say in another >>>>>>> block_var list) you would break the current chain. Hence block vars need >>>>>>> special handling in the lto streamer (another reason why tree_chain is not >>>>>>> the most clever think to use for this chain). This problem area needs to >>>>>>> be solved somehow if block info is to be preserved correctly. >>>>>>> >>>>>>>> 2. Don't stream out block info for LTO, and still call LTO_NO_PREVAIL >>>>>>>> (TREE_CHAIN (t)). >>>>>>> >>>>>>> That's also a large hammer as it basically will mean no debug info after >>>>>>> LTO :-/ Sigh, at this point I have no good solution that doesn't involve >>>>>>> quite some work, perhaps your hack is good enough for the time being, >>>>>>> though I hate it :) >>>>>> >>>>>> I got it. Then I'll keep the patch as it is (remove the >>>>>> LTO_NO_PREVAIL), and work with Honza to resolve the issue he had, and >>>>>> then we should be good to check in? >>>>>> >>>>>> Thanks, >>>>>> Dehao >>>>>> >>>>>>> >>>>>>> >>>>>>> Ciao, >>>>>>> Michael. Number of expanded macros: 23481 Average number of tokens per macro expansion: 12 Line Table allocations during the compilation process Number of ordinary maps used: 1451 Ordinary map used size: 56k Number of ordinary maps allocated: 1638 Ordinary maps allocated size: 63k Number of macro maps used: 0 Macro maps used size: 0 Macro maps locations size: 0 Macro maps size: 0 Duplicated maps locations size: 0 Total allocated maps size: 63k Total used maps size: 56k Memory still allocated at the end of the compilation process Size Allocated Used Overhead 8 88k 64k 2640 16 16M 4726k 356k 32 31M 11M 570k 64 60M 37M 960k 128 7296k 6833k 99k 256 29M 24M 417k 512 1224k 753k 16k 1024 21M 21M 301k 2048 600k 484k 8400 4096 2660k 2660k 36k 8192 1352k 1352k 9464 16384 2560k 2560k 8960 32768 288k 288k 504 65536 512k 512k 448 131072 256k 256k 112 262144 2048k 2048k 448 524288 2560k 2560k 280 1048576 4096k 4096k 224 2097152 2048k 2048k 56 24 48M 7165k 875k 40 66M 24M 1068k 48 21M 5121k 342k 56 14M 4309k 230k 72 30M 16M 431k 80 82M 49M 1149k 88 15M 10M 218k 104 10M 8041k 152k 112 10056k 2527k 137k 120 10M 6083k 146k 184 20M 17M 292k 152 8996k 6841k 122k 160 38M 11M 532k 168 53M 40M 754k 96 17M 6035k 251k 304 54M 49M 761k 136 35M 31M 496k Total 726M 421M 10M String pool entries 187117 identifiers 111112 (59.38%) slots 262144 deleted 67636 bytes 5237k (17592186044410M overhead) table size 2048k coll/search 1.3684 ins/search 0.0853 avg. entry 28.66 bytes (+/- 34.34) longest entry 236 ??? tree nodes created (No per-node statistics) Type hash: size 131071, 68058 elements, 1.008263 collisions DECL_DEBUG_EXPR hash: size 8191, 131 elements, 0.527951 collisions DECL_VALUE_EXPR hash: size 1021, 16 elements, 0.000000 collisions no search statistics decl_specializations: size 65521, 33767 elements, 1.169079 collisions type_specializations: size 32749, 22750 elements, 1.193797 collisions No gimple statistics Alias oracle query stats: refs_may_alias_p: 8495 disambiguations, 24866 queries ref_maybe_used_by_call_p: 152203 disambiguations, 27333 queries call_may_clobber_ref_p: 151635 disambiguations, 151635 queries PTA query stats: pt_solution_includes: 430 disambiguations, 17991 queries pt_solutions_intersect: 1291 disambiguations, 655295 queries Number of expanded macros: 23481 Average number of tokens per macro expansion: 12 Line Table allocations during the compilation process Number of ordinary maps used: 1451 Ordinary map used size: 56k Number of ordinary maps allocated: 1638 Ordinary maps allocated size: 63k Number of macro maps used: 0 Macro maps used size: 0 Macro maps locations size: 0 Macro maps size: 0 Duplicated maps locations size: 0 Total allocated maps size: 63k Total used maps size: 56k Memory still allocated at the end of the compilation process Size Allocated Used Overhead 8 88k 64k 2640 16 17M 4640k 376k 32 48M 13M 865k 64 51M 34M 820k 128 8596k 6686k 117k 256 27M 23M 385k 512 1184k 753k 16k 1024 21M 21M 299k 2048 592k 484k 8288 4096 2660k 2660k 36k 8192 1352k 1352k 9464 16384 2560k 2560k 8960 32768 288k 288k 504 65536 512k 512k 448 131072 256k 256k 112 262144 1792k 1792k 392 524288 3072k 3072k 336 1048576 4096k 4096k 224 2097152 2048k 2048k 56 24 55M 7228k 995k 40 61M 22M 990k 48 19M 6224k 318k 56 14M 4400k 238k 72 45M 17M 643k 80 65M 42M 920k 88 12M 10M 172k 104 10M 8087k 145k 112 10124k 2468k 138k 120 10M 5958k 141k 184 21M 17M 295k 152 8348k 6826k 114k 160 19M 6792k 275k 168 54M 40M 757k 96 18M 6030k 259k 304 54M 49M 762k 136 33M 31M 473k Total 710M 407M 10M String pool entries 187117 identifiers 108460 (57.96%) slots 262144 deleted 70040 bytes 5133k (17592186044410M overhead) table size 2048k coll/search 1.3759 ins/search 0.0847 avg. entry 28.09 bytes (+/- 34.18) longest entry 236 ??? tree nodes created (No per-node statistics) Type hash: size 131071, 68058 elements, 1.009220 collisions DECL_DEBUG_EXPR hash: size 8191, 131 elements, 0.457233 collisions DECL_VALUE_EXPR hash: size 1021, 15 elements, 0.033784 collisions no search statistics decl_specializations: size 65521, 33767 elements, 1.169079 collisions type_specializations: size 32749, 22750 elements, 1.193797 collisions No gimple statistics Alias oracle query stats: refs_may_alias_p: 8495 disambiguations, 24866 queries ref_maybe_used_by_call_p: 152240 disambiguations, 27333 queries call_may_clobber_ref_p: 151672 disambiguations, 151672 queries PTA query stats: pt_solution_includes: 430 disambiguations, 17991 queries pt_solutions_intersect: 1291 disambiguations, 655295 queries
For the largest bucket (size==80), the size reduction is 20%. Not bad. David On Wed, Sep 12, 2012 at 1:44 PM, Dehao Chen <dehao@google.com> wrote: > Attached is the memory consumption report for a very large source > file. Looks like this patch actually reduced the memory consumption by > 2%. > > Dehao > > On Thu, Sep 13, 2012 at 1:18 AM, Xinliang David Li <davidxl@google.com> wrote: >> On Wed, Sep 12, 2012 at 10:05 AM, Dehao Chen <dehao@google.com> wrote: >>> There are two parts that needs memory management: >>> >>> 1. The BLOCK structure. This is managed by GC. I originally thought >>> that removing blocks from tree.gsbase would paralyze GC. This turned >>> out not to be a concern because DECL_INITIAL will still mark those >>> used tree nodes. This patch may decrease the memory consumption by >>> removing blocks from tree/gimple. However, as it makes more blocks >>> become used, they also increase the memory consumption. >> >> You mean when you also make the location table GC root. >> >> Can you share the mem-stats information for the large program with and >> without your patch? >> >> thanks, >> >> David >> >>> 2. The data structure in libcpp that maintains the hashtable for the >>> location->block mapping. This is relatively minor because for the >>> largest source I've seen, it only maintains less than 100K entries in >>> the array (less than 1M total memory consumption). However, as it is a >>> global data structure, it may make LTO unhappy. Honza is helping >>> testing the memory consumption on LTO (but we first need to make this >>> patch work for LTO). If the LTO result turns out ok, we probably don't >>> want to put these under GC because: 1. it'll make things much more >>> complicated. 2. using self managed memory is more efficient (as this >>> is frequently used in many passes). 3. not using GC actually saves >>> memory because even though the block is in the map, it can still be >>> GCed as soon as it's not reachable from DECL_INITIAL. >>> >>> I've tested this on some very large C++ files (each one takes more >>> than 10s to build), the memory consumption does not see noticeable >>> increase/decrease. >>> >>> Thanks, >>> Dehao >>> >>> On Wed, Sep 12, 2012 at 9:39 AM, Xinliang David Li <davidxl@google.com> wrote: >>>> On Wed, Sep 12, 2012 at 2:13 AM, Richard Guenther >>>> <richard.guenther@gmail.com> wrote: >>>>> On Wed, Sep 12, 2012 at 7:06 AM, Dehao Chen <dehao@google.com> wrote: >>>>>> Now I think we are facing a more complex problem. The data structure >>>>>> we use to store the location_adhoc_data are file-static in linemap.c >>>>>> in libcpp. These data structures are not guarded by GTY(()). >>>>>> Meanwhile, as we have removed the block data structure from >>>>>> gimple.gsbase as well as tree.exp (encoding them into an location_t). >>>>>> This could cause block being GCed and the LOCATION_BLOCK becoming >>>>>> dangling pointers. >>>>> >>>>> Uh. Note that it is quite important that we are able to garbage-collect unused >>>>> BLOCKs, this is the whole point of removing unused BLOCK scopes in >>>>> remove_unused_locals. So this indeed becomes much more complicated ... >>>>> What would be desired is that the garbage collector can NULL an entry in >>>>> the mapping table when it is not referenced in any other way (that other >>>>> reference would be the BLOCK tree as stored in a FUNCTION_DECLs DECL_INITIAL). >>>> >>>> It would be nice to GC those unused BLOCKS. I wonder how many BLOCKS >>>> are created for a large C++ program. This patch saves memory by >>>> shrinking tree size, is it a net win or loss without GC those BLOCKS? >>>> >>>> thanks, >>>> >>>> David >>>> >>>> >>>>> >>>>>> I tried to manipulate GTY to make it recognize the LOCATION_BLOCK from >>>>>> gimple.gsbase.location. However, neigher nested_ptr nor mark_hook can >>>>>> help me. >>>>>> >>>>>> Another approach would be guard the location_adhoc_data and related >>>>>> data structures in GTY(()). However, this is non-trivial because tree >>>>>> is not visible in libcpp. At the same time, my implementation heavily >>>>>> relies on hashtable to make the code efficient, thus it's quite tricky >>>>>> to make "param_is" and "use_params" work. >>>>>> >>>>>> The final approach, which I'll try tomorrow, would be move all my >>>>>> implementation from libcpp to gcc, and guard them with GTY(()). I >>>>>> still haven't thought of any potential problem of this approach. Any >>>>>> comments? >>>>> >>>>> I think moving the mapping to GC in a lazy manner as I described above >>>>> would be the way to go. For hashtables GC already supports if_marked, >>>>> not sure if similar support is available for arrays/vecs. >>>>> >>>>> Richard. >>>>> >>>>>> Thanks, >>>>>> Dehao >>>>>> >>>>>> On Tue, Sep 11, 2012 at 9:00 AM, Dehao Chen <dehao@google.com> wrote: >>>>>>> I saw comments in tree-streamer-out.c: >>>>>>> >>>>>>> /* Do not stream BLOCK_SOURCE_LOCATION. We cannot handle debug information >>>>>>> for early inlining so drop it on the floor instead of ICEing in >>>>>>> dwarf2out.c. */ >>>>>>> streamer_write_chain (ob, BLOCK_VARS (expr), ref_p); >>>>>>> >>>>>>> However, what the code is doing seemed contradictory with the comment. >>>>>>> Or am I missing something? >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Tue, Sep 11, 2012 at 8:32 AM, Michael Matz <matz@suse.de> wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> On Tue, 11 Sep 2012, Dehao Chen wrote: >>>>>>>> >>>>>>>>> Looks like we have two choices: >>>>>>>>> >>>>>>>>> 1. Stream out block info, and use LTO_SET_PREVAIL for TREE_CHAIN(t) >>>>>>>> >>>>>>>> This will actually not work correctly in some cases. The problem is, if >>>>>>>> the prevailing decl is already part of another chain (say in another >>>>>>>> block_var list) you would break the current chain. Hence block vars need >>>>>>>> special handling in the lto streamer (another reason why tree_chain is not >>>>>>>> the most clever think to use for this chain). This problem area needs to >>>>>>>> be solved somehow if block info is to be preserved correctly. >>>>>>>> >>>>>>>>> 2. Don't stream out block info for LTO, and still call LTO_NO_PREVAIL >>>>>>>>> (TREE_CHAIN (t)). >>>>>>>> >>>>>>>> That's also a large hammer as it basically will mean no debug info after >>>>>>>> LTO :-/ Sigh, at this point I have no good solution that doesn't involve >>>>>>>> quite some work, perhaps your hack is good enough for the time being, >>>>>>>> though I hate it :) >>>>>>> >>>>>>> I got it. Then I'll keep the patch as it is (remove the >>>>>>> LTO_NO_PREVAIL), and work with Honza to resolve the issue he had, and >>>>>>> then we should be good to check in? >>>>>>> >>>>>>> Thanks, >>>>>>> Dehao >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Ciao, >>>>>>>> Michael.
On Wed, Sep 12, 2012 at 6:39 PM, Xinliang David Li <davidxl@google.com> wrote: > On Wed, Sep 12, 2012 at 2:13 AM, Richard Guenther > <richard.guenther@gmail.com> wrote: >> On Wed, Sep 12, 2012 at 7:06 AM, Dehao Chen <dehao@google.com> wrote: >>> Now I think we are facing a more complex problem. The data structure >>> we use to store the location_adhoc_data are file-static in linemap.c >>> in libcpp. These data structures are not guarded by GTY(()). >>> Meanwhile, as we have removed the block data structure from >>> gimple.gsbase as well as tree.exp (encoding them into an location_t). >>> This could cause block being GCed and the LOCATION_BLOCK becoming >>> dangling pointers. >> >> Uh. Note that it is quite important that we are able to garbage-collect unused >> BLOCKs, this is the whole point of removing unused BLOCK scopes in >> remove_unused_locals. So this indeed becomes much more complicated ... >> What would be desired is that the garbage collector can NULL an entry in >> the mapping table when it is not referenced in any other way (that other >> reference would be the BLOCK tree as stored in a FUNCTION_DECLs DECL_INITIAL). > > It would be nice to GC those unused BLOCKS. I wonder how many BLOCKS > are created for a large C++ program. This patch saves memory by > shrinking tree size, is it a net win or loss without GC those BLOCKS? Memory usage issues pop up with C++ code using expression templates (try BOOST MPL or tramp3d or some larger spirit testcases). Inlining creates tons of "empty" BLOCK trees that just wrap others. It is important to be able to GC those. Now, it might be that no expression / location which references the BLOCK survives, and if the line-table is not scanned by GC then we will just end up with never re-usable entries (the BLOCK address may get re-used - can we get false sharing here?) Richard. > thanks, > > David > > >> >>> I tried to manipulate GTY to make it recognize the LOCATION_BLOCK from >>> gimple.gsbase.location. However, neigher nested_ptr nor mark_hook can >>> help me. >>> >>> Another approach would be guard the location_adhoc_data and related >>> data structures in GTY(()). However, this is non-trivial because tree >>> is not visible in libcpp. At the same time, my implementation heavily >>> relies on hashtable to make the code efficient, thus it's quite tricky >>> to make "param_is" and "use_params" work. >>> >>> The final approach, which I'll try tomorrow, would be move all my >>> implementation from libcpp to gcc, and guard them with GTY(()). I >>> still haven't thought of any potential problem of this approach. Any >>> comments? >> >> I think moving the mapping to GC in a lazy manner as I described above >> would be the way to go. For hashtables GC already supports if_marked, >> not sure if similar support is available for arrays/vecs. >> >> Richard. >> >>> Thanks, >>> Dehao >>> >>> On Tue, Sep 11, 2012 at 9:00 AM, Dehao Chen <dehao@google.com> wrote: >>>> I saw comments in tree-streamer-out.c: >>>> >>>> /* Do not stream BLOCK_SOURCE_LOCATION. We cannot handle debug information >>>> for early inlining so drop it on the floor instead of ICEing in >>>> dwarf2out.c. */ >>>> streamer_write_chain (ob, BLOCK_VARS (expr), ref_p); >>>> >>>> However, what the code is doing seemed contradictory with the comment. >>>> Or am I missing something? >>>> >>>> >>>> >>>> On Tue, Sep 11, 2012 at 8:32 AM, Michael Matz <matz@suse.de> wrote: >>>>> Hi, >>>>> >>>>> On Tue, 11 Sep 2012, Dehao Chen wrote: >>>>> >>>>>> Looks like we have two choices: >>>>>> >>>>>> 1. Stream out block info, and use LTO_SET_PREVAIL for TREE_CHAIN(t) >>>>> >>>>> This will actually not work correctly in some cases. The problem is, if >>>>> the prevailing decl is already part of another chain (say in another >>>>> block_var list) you would break the current chain. Hence block vars need >>>>> special handling in the lto streamer (another reason why tree_chain is not >>>>> the most clever think to use for this chain). This problem area needs to >>>>> be solved somehow if block info is to be preserved correctly. >>>>> >>>>>> 2. Don't stream out block info for LTO, and still call LTO_NO_PREVAIL >>>>>> (TREE_CHAIN (t)). >>>>> >>>>> That's also a large hammer as it basically will mean no debug info after >>>>> LTO :-/ Sigh, at this point I have no good solution that doesn't involve >>>>> quite some work, perhaps your hack is good enough for the time being, >>>>> though I hate it :) >>>> >>>> I got it. Then I'll keep the patch as it is (remove the >>>> LTO_NO_PREVAIL), and work with Honza to resolve the issue he had, and >>>> then we should be good to check in? >>>> >>>> Thanks, >>>> Dehao >>>> >>>>> >>>>> >>>>> Ciao, >>>>> Michael.
On Wed, Sep 12, 2012 at 10:44 PM, Dehao Chen <dehao@google.com> wrote: > Attached is the memory consumption report for a very large source > file. Looks like this patch actually reduced the memory consumption by > 2%. Please make sure to test large C++ expression template users. Large regular programs do not stress this part. Richard. > Dehao > > On Thu, Sep 13, 2012 at 1:18 AM, Xinliang David Li <davidxl@google.com> wrote: >> On Wed, Sep 12, 2012 at 10:05 AM, Dehao Chen <dehao@google.com> wrote: >>> There are two parts that needs memory management: >>> >>> 1. The BLOCK structure. This is managed by GC. I originally thought >>> that removing blocks from tree.gsbase would paralyze GC. This turned >>> out not to be a concern because DECL_INITIAL will still mark those >>> used tree nodes. This patch may decrease the memory consumption by >>> removing blocks from tree/gimple. However, as it makes more blocks >>> become used, they also increase the memory consumption. >> >> You mean when you also make the location table GC root. >> >> Can you share the mem-stats information for the large program with and >> without your patch? >> >> thanks, >> >> David >> >>> 2. The data structure in libcpp that maintains the hashtable for the >>> location->block mapping. This is relatively minor because for the >>> largest source I've seen, it only maintains less than 100K entries in >>> the array (less than 1M total memory consumption). However, as it is a >>> global data structure, it may make LTO unhappy. Honza is helping >>> testing the memory consumption on LTO (but we first need to make this >>> patch work for LTO). If the LTO result turns out ok, we probably don't >>> want to put these under GC because: 1. it'll make things much more >>> complicated. 2. using self managed memory is more efficient (as this >>> is frequently used in many passes). 3. not using GC actually saves >>> memory because even though the block is in the map, it can still be >>> GCed as soon as it's not reachable from DECL_INITIAL. >>> >>> I've tested this on some very large C++ files (each one takes more >>> than 10s to build), the memory consumption does not see noticeable >>> increase/decrease. >>> >>> Thanks, >>> Dehao >>> >>> On Wed, Sep 12, 2012 at 9:39 AM, Xinliang David Li <davidxl@google.com> wrote: >>>> On Wed, Sep 12, 2012 at 2:13 AM, Richard Guenther >>>> <richard.guenther@gmail.com> wrote: >>>>> On Wed, Sep 12, 2012 at 7:06 AM, Dehao Chen <dehao@google.com> wrote: >>>>>> Now I think we are facing a more complex problem. The data structure >>>>>> we use to store the location_adhoc_data are file-static in linemap.c >>>>>> in libcpp. These data structures are not guarded by GTY(()). >>>>>> Meanwhile, as we have removed the block data structure from >>>>>> gimple.gsbase as well as tree.exp (encoding them into an location_t). >>>>>> This could cause block being GCed and the LOCATION_BLOCK becoming >>>>>> dangling pointers. >>>>> >>>>> Uh. Note that it is quite important that we are able to garbage-collect unused >>>>> BLOCKs, this is the whole point of removing unused BLOCK scopes in >>>>> remove_unused_locals. So this indeed becomes much more complicated ... >>>>> What would be desired is that the garbage collector can NULL an entry in >>>>> the mapping table when it is not referenced in any other way (that other >>>>> reference would be the BLOCK tree as stored in a FUNCTION_DECLs DECL_INITIAL). >>>> >>>> It would be nice to GC those unused BLOCKS. I wonder how many BLOCKS >>>> are created for a large C++ program. This patch saves memory by >>>> shrinking tree size, is it a net win or loss without GC those BLOCKS? >>>> >>>> thanks, >>>> >>>> David >>>> >>>> >>>>> >>>>>> I tried to manipulate GTY to make it recognize the LOCATION_BLOCK from >>>>>> gimple.gsbase.location. However, neigher nested_ptr nor mark_hook can >>>>>> help me. >>>>>> >>>>>> Another approach would be guard the location_adhoc_data and related >>>>>> data structures in GTY(()). However, this is non-trivial because tree >>>>>> is not visible in libcpp. At the same time, my implementation heavily >>>>>> relies on hashtable to make the code efficient, thus it's quite tricky >>>>>> to make "param_is" and "use_params" work. >>>>>> >>>>>> The final approach, which I'll try tomorrow, would be move all my >>>>>> implementation from libcpp to gcc, and guard them with GTY(()). I >>>>>> still haven't thought of any potential problem of this approach. Any >>>>>> comments? >>>>> >>>>> I think moving the mapping to GC in a lazy manner as I described above >>>>> would be the way to go. For hashtables GC already supports if_marked, >>>>> not sure if similar support is available for arrays/vecs. >>>>> >>>>> Richard. >>>>> >>>>>> Thanks, >>>>>> Dehao >>>>>> >>>>>> On Tue, Sep 11, 2012 at 9:00 AM, Dehao Chen <dehao@google.com> wrote: >>>>>>> I saw comments in tree-streamer-out.c: >>>>>>> >>>>>>> /* Do not stream BLOCK_SOURCE_LOCATION. We cannot handle debug information >>>>>>> for early inlining so drop it on the floor instead of ICEing in >>>>>>> dwarf2out.c. */ >>>>>>> streamer_write_chain (ob, BLOCK_VARS (expr), ref_p); >>>>>>> >>>>>>> However, what the code is doing seemed contradictory with the comment. >>>>>>> Or am I missing something? >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Tue, Sep 11, 2012 at 8:32 AM, Michael Matz <matz@suse.de> wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> On Tue, 11 Sep 2012, Dehao Chen wrote: >>>>>>>> >>>>>>>>> Looks like we have two choices: >>>>>>>>> >>>>>>>>> 1. Stream out block info, and use LTO_SET_PREVAIL for TREE_CHAIN(t) >>>>>>>> >>>>>>>> This will actually not work correctly in some cases. The problem is, if >>>>>>>> the prevailing decl is already part of another chain (say in another >>>>>>>> block_var list) you would break the current chain. Hence block vars need >>>>>>>> special handling in the lto streamer (another reason why tree_chain is not >>>>>>>> the most clever think to use for this chain). This problem area needs to >>>>>>>> be solved somehow if block info is to be preserved correctly. >>>>>>>> >>>>>>>>> 2. Don't stream out block info for LTO, and still call LTO_NO_PREVAIL >>>>>>>>> (TREE_CHAIN (t)). >>>>>>>> >>>>>>>> That's also a large hammer as it basically will mean no debug info after >>>>>>>> LTO :-/ Sigh, at this point I have no good solution that doesn't involve >>>>>>>> quite some work, perhaps your hack is good enough for the time being, >>>>>>>> though I hate it :) >>>>>>> >>>>>>> I got it. Then I'll keep the patch as it is (remove the >>>>>>> LTO_NO_PREVAIL), and work with Honza to resolve the issue he had, and >>>>>>> then we should be good to check in? >>>>>>> >>>>>>> Thanks, >>>>>>> Dehao >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Ciao, >>>>>>>> Michael.
On Thu, Sep 13, 2012 at 8:02 PM, Richard Guenther <richard.guenther@gmail.com> wrote: > On Wed, Sep 12, 2012 at 6:39 PM, Xinliang David Li <davidxl@google.com> wrote: >> On Wed, Sep 12, 2012 at 2:13 AM, Richard Guenther >> <richard.guenther@gmail.com> wrote: >>> On Wed, Sep 12, 2012 at 7:06 AM, Dehao Chen <dehao@google.com> wrote: >>>> Now I think we are facing a more complex problem. The data structure >>>> we use to store the location_adhoc_data are file-static in linemap.c >>>> in libcpp. These data structures are not guarded by GTY(()). >>>> Meanwhile, as we have removed the block data structure from >>>> gimple.gsbase as well as tree.exp (encoding them into an location_t). >>>> This could cause block being GCed and the LOCATION_BLOCK becoming >>>> dangling pointers. >>> >>> Uh. Note that it is quite important that we are able to garbage-collect unused >>> BLOCKs, this is the whole point of removing unused BLOCK scopes in >>> remove_unused_locals. So this indeed becomes much more complicated ... >>> What would be desired is that the garbage collector can NULL an entry in >>> the mapping table when it is not referenced in any other way (that other >>> reference would be the BLOCK tree as stored in a FUNCTION_DECLs DECL_INITIAL). >> >> It would be nice to GC those unused BLOCKS. I wonder how many BLOCKS >> are created for a large C++ program. This patch saves memory by >> shrinking tree size, is it a net win or loss without GC those BLOCKS? > > Memory usage issues pop up with C++ code using expression templates > (try BOOST MPL or tramp3d or some larger spirit testcases). Inlining I compared the memory consumption for tramp3d, the patched version has a peak of 504065kB, while non-patched version has a peak of 491853kB. > creates tons of "empty" BLOCK trees that just wrap others. It is important > to be able to GC those. Now, it might be that no expression / location > which references the BLOCK survives, and if the line-table is not scanned Those non-used blocks will still be GCed in this patch. > by GC then we will just end up with never re-usable entries (the BLOCK address > may get re-used - can we get false sharing here?) That is true (memory wasted). However, in the tramp3d case, only 409600 entries are allocated in location_adhoc_data (4.9MB, 1% of the peak mem consumption). Thus the wasted entry should not be significant. Concerning re-used BLOCK, if the block address and the location are the same, the previously allocated entry will be reused. But it'll not affect the correctness. Thanks, Dehao > > Richard. > >> thanks, >> >> David >> >> >>> >>>> I tried to manipulate GTY to make it recognize the LOCATION_BLOCK from >>>> gimple.gsbase.location. However, neigher nested_ptr nor mark_hook can >>>> help me. >>>> >>>> Another approach would be guard the location_adhoc_data and related >>>> data structures in GTY(()). However, this is non-trivial because tree >>>> is not visible in libcpp. At the same time, my implementation heavily >>>> relies on hashtable to make the code efficient, thus it's quite tricky >>>> to make "param_is" and "use_params" work. >>>> >>>> The final approach, which I'll try tomorrow, would be move all my >>>> implementation from libcpp to gcc, and guard them with GTY(()). I >>>> still haven't thought of any potential problem of this approach. Any >>>> comments? >>> >>> I think moving the mapping to GC in a lazy manner as I described above >>> would be the way to go. For hashtables GC already supports if_marked, >>> not sure if similar support is available for arrays/vecs. >>> >>> Richard. >>> >>>> Thanks, >>>> Dehao >>>> >>>> On Tue, Sep 11, 2012 at 9:00 AM, Dehao Chen <dehao@google.com> wrote: >>>>> I saw comments in tree-streamer-out.c: >>>>> >>>>> /* Do not stream BLOCK_SOURCE_LOCATION. We cannot handle debug information >>>>> for early inlining so drop it on the floor instead of ICEing in >>>>> dwarf2out.c. */ >>>>> streamer_write_chain (ob, BLOCK_VARS (expr), ref_p); >>>>> >>>>> However, what the code is doing seemed contradictory with the comment. >>>>> Or am I missing something? >>>>> >>>>> >>>>> >>>>> On Tue, Sep 11, 2012 at 8:32 AM, Michael Matz <matz@suse.de> wrote: >>>>>> Hi, >>>>>> >>>>>> On Tue, 11 Sep 2012, Dehao Chen wrote: >>>>>> >>>>>>> Looks like we have two choices: >>>>>>> >>>>>>> 1. Stream out block info, and use LTO_SET_PREVAIL for TREE_CHAIN(t) >>>>>> >>>>>> This will actually not work correctly in some cases. The problem is, if >>>>>> the prevailing decl is already part of another chain (say in another >>>>>> block_var list) you would break the current chain. Hence block vars need >>>>>> special handling in the lto streamer (another reason why tree_chain is not >>>>>> the most clever think to use for this chain). This problem area needs to >>>>>> be solved somehow if block info is to be preserved correctly. >>>>>> >>>>>>> 2. Don't stream out block info for LTO, and still call LTO_NO_PREVAIL >>>>>>> (TREE_CHAIN (t)). >>>>>> >>>>>> That's also a large hammer as it basically will mean no debug info after >>>>>> LTO :-/ Sigh, at this point I have no good solution that doesn't involve >>>>>> quite some work, perhaps your hack is good enough for the time being, >>>>>> though I hate it :) >>>>> >>>>> I got it. Then I'll keep the patch as it is (remove the >>>>> LTO_NO_PREVAIL), and work with Honza to resolve the issue he had, and >>>>> then we should be good to check in? >>>>> >>>>> Thanks, >>>>> Dehao >>>>> >>>>>> >>>>>> >>>>>> Ciao, >>>>>> Michael.
--- gcc/lto/lto.c (revision 191083) +++ gcc/lto/lto.c (working copy) @@ -1559,8 +1559,6 @@ lto_fixup_prevailing_decls (tree t) { enum tree_code code = TREE_CODE (t); LTO_NO_PREVAIL (TREE_TYPE (t)); - if (CODE_CONTAINS_STRUCT (code, TS_COMMON)) - LTO_NO_PREVAIL (TREE_CHAIN (t)); if (DECL_P (t)) { LTO_NO_PREVAIL (DECL_NAME (t)); Index: gcc/tree-cfg.c =================================================================== --- gcc/tree-cfg.c (revision 191083) +++ gcc/tree-cfg.c (working copy) @@ -5980,9 +5974,21 @@ move_stmt_op (tree *tp, int *walk_subtrees, void * tree t = *tp; if (EXPR_P (t)) - /* We should never have TREE_BLOCK set on non-statements. */ - gcc_assert (!TREE_BLOCK (t)); - + { + tree block = TREE_BLOCK (t); + if (p->orig_block == NULL_TREE + || block == p->orig_block + || block == NULL_TREE) + TREE_SET_BLOCK (t, p->new_block); +#ifdef ENABLE_CHECKING + else if (block != p->new_block) + { + while (block && block != p->orig_block) + block = BLOCK_SUPERCONTEXT (block); + gcc_assert (block); + } +#endif + } else if (DECL_P (t) || TREE_CODE (t) == SSA_NAME) { if (TREE_CODE (t) == SSA_NAME)