Message ID | 20100608071707.GB8913@kam.mff.cuni.cz |
---|---|
State | New |
Headers | show |
On 8 June 2010 09:17, Jan Hubicka <hubicka@ucw.cz> wrote: > Hi, > oprofile shows as one of highest sample count the following sanity check: > :/* Advance the iterator to the next element. */ > :static inline void > :ei_next (edge_iterator *i) > :{ > 6578 0.4915 : gcc_assert (i->index < EDGE_COUNT (ei_container (*i))); > 201 0.0150 : i->index++; > :} > > I guess it is mostly cache miss that would be later attributed elsewhere, but > it also seems to me that this kind of bounds checking is something we want to > omit in release compiler (same was as we disable VEC range checking and such). > The probability that we will get useful error here is relatively low compared > to amount of checks we spread across compiler by inlining the function. Why is not gcc_assert a NOP without ENABLE_CHECKING? I was pretty sure it was and hence I was using it very liberally. I wonder what the performance of the compiler would be in that case. Cheers, Manuel
On Tue, Jun 08, 2010 at 11:23:18AM +0200, Manuel López-Ibáñez wrote: > On 8 June 2010 09:17, Jan Hubicka <hubicka@ucw.cz> wrote: > > Hi, > > oprofile shows as one of highest sample count the following sanity check: > > :/* Advance the iterator to the next element. */ > > :static inline void > > :ei_next (edge_iterator *i) > > :{ > > 6578 0.4915 : gcc_assert (i->index < EDGE_COUNT (ei_container (*i))); > > 201 0.0150 : i->index++; > > :} > > > > I guess it is mostly cache miss that would be later attributed elsewhere, but > > it also seems to me that this kind of bounds checking is something we want to > > omit in release compiler (same was as we disable VEC range checking and such). > > The probability that we will get useful error here is relatively low compared > > to amount of checks we spread across compiler by inlining the function. > > Why is not gcc_assert a NOP without ENABLE_CHECKING? I was pretty sure > it was and hence I was using it very liberally. > > I wonder what the performance of the compiler would be in that case. Because we have --enable-checking=release and --disable-checking. --enable-checking=release keeps gcc_asserts around, --disable-checking optimizes them out. The reason we want inexpensive asserts around is it is preferrable to crash the compiler over silently generating wrong code. More expensive asserts (or ones on very hot paths) should be guarded with #ifdef ENABLE_CHECKING. Jakub
On Tue, Jun 8, 2010 at 9:17 AM, Jan Hubicka <hubicka@ucw.cz> wrote: > Hi, > oprofile shows as one of highest sample count the following sanity check: > :/* Advance the iterator to the next element. */ > :static inline void > :ei_next (edge_iterator *i) > :{ > 6578 0.4915 : gcc_assert (i->index < EDGE_COUNT (ei_container (*i))); > 201 0.0150 : i->index++; > :} > > I guess it is mostly cache miss that would be later attributed elsewhere, but > it also seems to me that this kind of bounds checking is something we want to > omit in release compiler (same was as we disable VEC range checking and such). > The probability that we will get useful error here is relatively low compared > to amount of checks we spread across compiler by inlining the function. > > I tested the attached patch. It reduce size of stripped cc1 binary (with WHOPR > build) from 11944192 to 11939616 bytes (4.5KB) and time needed to build cc1 > binary from 10m29s to 10m19s. With non-WHOPR this should be more effective > since we are worse on optimizing out unnecesary checks, but I didn't verified. > > Bootstrapped/regtested x86_64-linux. If this seems to make sense, I will look at the > other inline functions sanity checks. I think in gimple.h we also have few quite > good candidates for ENABLE_CHECKING. I thought I've gone over gimple.h at some point ... Ok. Thanks, Richard. > Honza > > * basic-block.h (single_succ_edge, single_pred_edge, ei_container, > ei_next, ei_prev): Do sanity checking with ENABLE_CHECKING only. > Index: basic-block.h > =================================================================== > --- basic-block.h (revision 160382) > +++ basic-block.h (working copy) > @@ -554,7 +554,9 @@ single_pred_p (const_basic_block bb) > static inline edge > single_succ_edge (const_basic_block bb) > { > +#ifdef ENABLE_CHECKING > gcc_assert (single_succ_p (bb)); > +#endif > return EDGE_SUCC (bb, 0); > } > > @@ -564,7 +566,9 @@ single_succ_edge (const_basic_block bb) > static inline edge > single_pred_edge (const_basic_block bb) > { > +#ifdef ENABLE_CHECKING > gcc_assert (single_pred_p (bb)); > +#endif > return EDGE_PRED (bb, 0); > } > > @@ -596,7 +600,9 @@ typedef struct { > static inline VEC(edge,gc) * > ei_container (edge_iterator i) > { > +#ifdef ENABLE_CHECKING > gcc_assert (i.container); > +#endif > return *i.container; > } > > @@ -647,7 +653,9 @@ ei_one_before_end_p (edge_iterator i) > static inline void > ei_next (edge_iterator *i) > { > +#ifdef ENABLE_CHECKING > gcc_assert (i->index < EDGE_COUNT (ei_container (*i))); > +#endif > i->index++; > } > > @@ -655,7 +663,9 @@ ei_next (edge_iterator *i) > static inline void > ei_prev (edge_iterator *i) > { > +#ifdef ENABLE_CHECKING > gcc_assert (i->index > 0); > +#endif > i->index--; > } > >
On 8 June 2010 11:32, Jakub Jelinek <jakub@redhat.com> wrote: > On Tue, Jun 08, 2010 at 11:23:18AM +0200, Manuel López-Ibáñez wrote: >> On 8 June 2010 09:17, Jan Hubicka <hubicka@ucw.cz> wrote: >> > Hi, >> > oprofile shows as one of highest sample count the following sanity check: >> > :/* Advance the iterator to the next element. */ >> > :static inline void >> > :ei_next (edge_iterator *i) >> > :{ >> > 6578 0.4915 : gcc_assert (i->index < EDGE_COUNT (ei_container (*i))); >> > 201 0.0150 : i->index++; >> > :} >> > >> > I guess it is mostly cache miss that would be later attributed elsewhere, but >> > it also seems to me that this kind of bounds checking is something we want to >> > omit in release compiler (same was as we disable VEC range checking and such). >> > The probability that we will get useful error here is relatively low compared >> > to amount of checks we spread across compiler by inlining the function. >> >> Why is not gcc_assert a NOP without ENABLE_CHECKING? I was pretty sure >> it was and hence I was using it very liberally. >> >> I wonder what the performance of the compiler would be in that case. > > Because we have --enable-checking=release and --disable-checking. > --enable-checking=release keeps gcc_asserts around, --disable-checking > optimizes them out. > The reason we want inexpensive asserts around is it is preferrable to crash > the compiler over silently generating wrong code. > More expensive asserts (or ones on very hot paths) should be > guarded with #ifdef ENABLE_CHECKING. Good to know. Thanks for the explanation! Manuel.
> > I thought I've gone over gimple.h at some point ... > those are static counts of .h asserts surviving to .optimize dumps with -O3 WHOPR build 9 fancy_abort (&"../../gcc/tree-chrec.h"[0], 143, &__FUNCTION__[0]); 9 fancy_abort (&"../../gcc/tree-chrec.h"[0], 147, &__FUNCTION__[0]); 9 fancy_abort (&"../../gcc/tree-chrec.h"[0], 149, &__FUNCTION__[0]); 9 fancy_abort (&"../../gcc/tree-flow-inline.h"[0], 51, &__FUNCTION__[0]); 10 fancy_abort (&"../../gcc/gimple.h"[0], 3218, &__FUNCTION__[0]); 12 fancy_abort (&"../../gcc/gimple.h"[0], 2711, &__FUNCTION__[0]); 12 fancy_abort (&"../../gcc/sel-sched-ir.h"[0], 1206, &__FUNCTION__[0]); 12 fancy_abort (&"../../gcc/tree-flow-inline.h"[0], 144, &__FUNCTION__[0]); 13 fancy_abort (&"../../gcc/gimple.h"[0], 2678, &__FUNCTION__[0]); 13 fancy_abort (&"../../gcc/sel-sched-ir.h"[0], 1331, &__FUNCTION__[0]); 13 fancy_abort (&"../../gcc/sel-sched-ir.h"[0], 1332, &__FUNCTION__[0]); 13 fancy_abort (&"../../gcc/tree-flow-inline.h"[0], 729, &__FUNCTION__[0]); 14 fancy_abort (&"../../gcc/gimple.h"[0], 3271, &__FUNCTION__[0]); 14 fancy_abort (&"../../gcc/sel-sched-ir.h"[0], 1067, &__FUNCTION__[0]); 14 fancy_abort (&"../../gcc/sel-sched-ir.h"[0], 1127, &__FUNCTION__[0]); 20 fancy_abort (&"../../gcc/graphite-poly.h"[0], 1053, &__FUNCTION__[0]); 20 fancy_abort (&"../../gcc/graphite-poly.h"[0], 608, &__FUNCTION__[0]); 21 fancy_abort (&"../../gcc/tree-vectorizer.h"[0], 644, &__FUNCTION__[0]); 28 fancy_abort (&"../../gcc/sel-sched-ir.h"[0], 1052, &__FUNCTION__[0]); 29 fancy_abort (&"../../gcc/basic-block.h"[0], 567, &__FUNCTION__[0]); 93 fancy_abort (&"../../gcc/basic-block.h"[0], 557, &__FUNCTION__[0]); 93 fancy_abort (&"../../gcc/basic-block.h"[0], 599, &__FUNCTION__[0]); 133 fancy_abort (&"../../gcc/gimple.h"[0], 1470, &__FUNCTION__[0]); 162 fancy_abort (&"../../gcc/gimple.h"[0], 1686, &__FUNCTION__[0]); 178 fancy_abort (&"../../gcc/tree-flow-inline.h"[0], 510, &__FUNCTION__[0]); 234 fancy_abort (&"../../gcc/basic-block.h"[0], 650, &__FUNCTION__[0]); 240 fancy_abort (&"../../gcc/tree-vectorizer.h"[0], 634, &__FUNCTION__[0]); 250 fancy_abort (&"../../gcc/gimple.h"[0], 3159, &__FUNCTION__[0]); 685 fancy_abort (&"../../gcc/gimple.h"[0], 1643, &__FUNCTION__[0]); so looks like gimple.h is major offender even if it did not show in my profile. My current oprofile is as follows: 27416 4.2304 htab_find_slot_with_hash 11987 1.8496 bitmap_ior_into 8987 1.3867 bitmap_set_bit_1 8347 1.2880 df_note_compute.40921.3735 8034 1.2397 df_worklist_dataflow 7338 1.1323 fast_dce.463781.constprop.768 7208 1.1122 lto_get_section_data 6815 1.0516 bitmap_clear_bit 6762 1.0434 bitmap_and 6146 0.9484 bitmap_copy 5993 0.9247 bitmap_ior_and_compl 5414 0.8354 ggc_alloc_stat 5248 0.8098 lto_input_tree 5123 0.7905 cleanup_cfg 5083 0.7843 record_reg_classes.102605.constprop.811.3366 5059 0.7806 htab_find_with_hash 4763 0.7349 bitmap_and_into 4650 0.7175 walk_tree_1.constprop.778 4597 0.7093 bitmap_bit_p_1 4593 0.7087 inverted_post_order_compute 4567 0.7047 lto_output_tree 4473 0.6902 extract_insn 4305 0.6643 execute_function_todo.149820.4272 4222 0.6515 htab_expand.538910 3946 0.6089 pool_alloc 3804 0.5870 post_order_compute 3785 0.5840 lto_output_1_stream 3730 0.5756 sorted_array_from_bitmap_set.307068.4314 3724 0.5746 constrain_operands This is still with inlined bitmap operations that now seem to help perofrmance, I will get more clear numbers on that. Also have some optimizations on the bitmap_ior_into and friends that however does not show much speedups overall since they are cache bound (perhaps doing rest of obstack changes will help this) fast_dce seems to be replaceable with ud-dce that has better worst case scenario (fast_dce is calling iterative dataflow several times, so part of the dataflow slowness is comming from that. I want to analyze what really makes it to iterate so many times on GCC and what can be done about it). I am still not sure what to do about df notes - those are mostly bitmap operations htab_find_slot_with_hash is I think combination of type merging and var tracking, but I did not look terribly deep into it. Honza
On Tue, Jun 8, 2010 at 10:04 PM, Jan Hubicka <hubicka@ucw.cz> wrote: >> >> I thought I've gone over gimple.h at some point ... >> > those are static counts of .h asserts surviving to .optimize dumps with -O3 WHOPR build > 9 fancy_abort (&"../../gcc/tree-chrec.h"[0], 143, &__FUNCTION__[0]); > 9 fancy_abort (&"../../gcc/tree-chrec.h"[0], 147, &__FUNCTION__[0]); > 9 fancy_abort (&"../../gcc/tree-chrec.h"[0], 149, &__FUNCTION__[0]); > 9 fancy_abort (&"../../gcc/tree-flow-inline.h"[0], 51, &__FUNCTION__[0]); > 10 fancy_abort (&"../../gcc/gimple.h"[0], 3218, &__FUNCTION__[0]); > 12 fancy_abort (&"../../gcc/gimple.h"[0], 2711, &__FUNCTION__[0]); > 12 fancy_abort (&"../../gcc/sel-sched-ir.h"[0], 1206, &__FUNCTION__[0]); > 12 fancy_abort (&"../../gcc/tree-flow-inline.h"[0], 144, &__FUNCTION__[0]); > 13 fancy_abort (&"../../gcc/gimple.h"[0], 2678, &__FUNCTION__[0]); > 13 fancy_abort (&"../../gcc/sel-sched-ir.h"[0], 1331, &__FUNCTION__[0]); > 13 fancy_abort (&"../../gcc/sel-sched-ir.h"[0], 1332, &__FUNCTION__[0]); > 13 fancy_abort (&"../../gcc/tree-flow-inline.h"[0], 729, &__FUNCTION__[0]); > 14 fancy_abort (&"../../gcc/gimple.h"[0], 3271, &__FUNCTION__[0]); > 14 fancy_abort (&"../../gcc/sel-sched-ir.h"[0], 1067, &__FUNCTION__[0]); > 14 fancy_abort (&"../../gcc/sel-sched-ir.h"[0], 1127, &__FUNCTION__[0]); > 20 fancy_abort (&"../../gcc/graphite-poly.h"[0], 1053, &__FUNCTION__[0]); > 20 fancy_abort (&"../../gcc/graphite-poly.h"[0], 608, &__FUNCTION__[0]); > 21 fancy_abort (&"../../gcc/tree-vectorizer.h"[0], 644, &__FUNCTION__[0]); > 28 fancy_abort (&"../../gcc/sel-sched-ir.h"[0], 1052, &__FUNCTION__[0]); > 29 fancy_abort (&"../../gcc/basic-block.h"[0], 567, &__FUNCTION__[0]); > 93 fancy_abort (&"../../gcc/basic-block.h"[0], 557, &__FUNCTION__[0]); > 93 fancy_abort (&"../../gcc/basic-block.h"[0], 599, &__FUNCTION__[0]); > 133 fancy_abort (&"../../gcc/gimple.h"[0], 1470, &__FUNCTION__[0]); > 162 fancy_abort (&"../../gcc/gimple.h"[0], 1686, &__FUNCTION__[0]); > 178 fancy_abort (&"../../gcc/tree-flow-inline.h"[0], 510, &__FUNCTION__[0]); > 234 fancy_abort (&"../../gcc/basic-block.h"[0], 650, &__FUNCTION__[0]); > 240 fancy_abort (&"../../gcc/tree-vectorizer.h"[0], 634, &__FUNCTION__[0]); > 250 fancy_abort (&"../../gcc/gimple.h"[0], 3159, &__FUNCTION__[0]); > 685 fancy_abort (&"../../gcc/gimple.h"[0], 1643, &__FUNCTION__[0]); > > so looks like gimple.h is major offender even if it did not show in my profile. > > My current oprofile is as follows: > 27416 4.2304 htab_find_slot_with_hash > 11987 1.8496 bitmap_ior_into > 8987 1.3867 bitmap_set_bit_1 > 8347 1.2880 df_note_compute.40921.3735 > 8034 1.2397 df_worklist_dataflow > 7338 1.1323 fast_dce.463781.constprop.768 > 7208 1.1122 lto_get_section_data > 6815 1.0516 bitmap_clear_bit > 6762 1.0434 bitmap_and > 6146 0.9484 bitmap_copy > 5993 0.9247 bitmap_ior_and_compl > 5414 0.8354 ggc_alloc_stat > 5248 0.8098 lto_input_tree > 5123 0.7905 cleanup_cfg > 5083 0.7843 record_reg_classes.102605.constprop.811.3366 > 5059 0.7806 htab_find_with_hash > 4763 0.7349 bitmap_and_into > 4650 0.7175 walk_tree_1.constprop.778 > 4597 0.7093 bitmap_bit_p_1 > 4593 0.7087 inverted_post_order_compute > 4567 0.7047 lto_output_tree > 4473 0.6902 extract_insn > 4305 0.6643 execute_function_todo.149820.4272 > 4222 0.6515 htab_expand.538910 > 3946 0.6089 pool_alloc > 3804 0.5870 post_order_compute > 3785 0.5840 lto_output_1_stream > 3730 0.5756 sorted_array_from_bitmap_set.307068.4314 > 3724 0.5746 constrain_operands > > > This is still with inlined bitmap operations that now seem to help perofrmance, I will > get more clear numbers on that. Also have some optimizations on the bitmap_ior_into > and friends that however does not show much speedups overall since they are cache > bound (perhaps doing rest of obstack changes will help this) > > fast_dce seems to be replaceable with ud-dce that has better worst case > scenario (fast_dce is calling iterative dataflow several times, so part of the > dataflow slowness is comming from that. I want to analyze what really makes it > to iterate so many times on GCC and what can be done about it). I am still not > sure what to do about df notes - those are mostly bitmap operations > > htab_find_slot_with_hash is I think combination of type merging and var tracking, > but I did not look terribly deep into it. For the type merging I have a patch, but I will delay it until next week. I posted it and there's a i386 backend fix in it, so if you want ot have a look.... Richard. > Honza >
Index: basic-block.h =================================================================== --- basic-block.h (revision 160382) +++ basic-block.h (working copy) @@ -554,7 +554,9 @@ single_pred_p (const_basic_block bb) static inline edge single_succ_edge (const_basic_block bb) { +#ifdef ENABLE_CHECKING gcc_assert (single_succ_p (bb)); +#endif return EDGE_SUCC (bb, 0); } @@ -564,7 +566,9 @@ single_succ_edge (const_basic_block bb) static inline edge single_pred_edge (const_basic_block bb) { +#ifdef ENABLE_CHECKING gcc_assert (single_pred_p (bb)); +#endif return EDGE_PRED (bb, 0); } @@ -596,7 +600,9 @@ typedef struct { static inline VEC(edge,gc) * ei_container (edge_iterator i) { +#ifdef ENABLE_CHECKING gcc_assert (i.container); +#endif return *i.container; } @@ -647,7 +653,9 @@ ei_one_before_end_p (edge_iterator i) static inline void ei_next (edge_iterator *i) { +#ifdef ENABLE_CHECKING gcc_assert (i->index < EDGE_COUNT (ei_container (*i))); +#endif i->index++; } @@ -655,7 +663,9 @@ ei_next (edge_iterator *i) static inline void ei_prev (edge_iterator *i) { +#ifdef ENABLE_CHECKING gcc_assert (i->index > 0); +#endif i->index--; }