On 05/15/2017 04:39 AM, Prathamesh Kulkarni wrote:
> Hi,
> I have attached a bare-bones prototype patch that propagates malloc attribute in
> ipa-pure-const. As far as I understand, from the doc a function could
> be annotated
> with malloc attribute if it returns a pointer to a newly allocated
> memory block (uninitialized or zeroed) and the pointer does not alias
> any other valid pointer ?
>
> * Analysis
> The analysis used by the patch in malloc_candidate_p is too conservative,
> and I would be grateful for suggestions on improving it.
> Currently it only checks if:
> 1) The function returns a value of pointer type.
> 2) SSA_NAME_DEF_STMT (return_value) is either a function call or a phi, and
>    SSA_NAME_DEF_STMT (each element of phi) is a function call.
> 3) The return-value has immediate uses only within comparisons (gcond
>    or gassign) and return_stmt (and likewise a phi arg has immediate use only
>    within comparison or the phi stmt).

ISTM the return value *must* be either the result of a call to a function
with the malloc attribute or NULL.  It can't be a mix of malloc'd objects
or something else (such as a passed-in object).

> malloc_candidate_p would determine f to be a candidate for malloc
> attribute since it returns the return-value of its callee
> (__builtin_malloc) and the return value is used only within comparison
> and return_stmt.
>
> However due to the imprecise analysis it misses this case:
> char **g(size_t n)
> {
>   char **p = malloc (sizeof(char *) * 10);
>   for (int i = 0; i < 10; i++)
>     p[i] = malloc (sizeof(char) * n);
>   return p;
> }
> I suppose g() could be annotated with malloc here ?

I think that's up to us to decide.  So the question becomes what makes
the most sense from a user and optimization standpoint.  I would start
relatively conservatively and look to add further analysis to detect more
malloc functions over time.

You've already identified comparisons as valid uses, but ISTM the pointer
could be passed as an argument, stored into memory and all kinds of other
things.  You'll probably want instrumentation to log cases where something
looked like a potential candidate, but had to be rejected for some reason.

Jeff
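To make the rejection cases Jeff describes concrete, here is a hypothetical wrapper (the global and the function name are made up for illustration) that must not get the attribute, because the pointer escapes through a store and therefore aliases another valid pointer:

  void *last_alloc;                 /* made-up global, for illustration only */

  void *h (__SIZE_TYPE__ n)
  {
    void *p = __builtin_malloc (n);
    last_alloc = p;                 /* a use other than comparison/return: reject */
    return p;
  }

An instrumented malloc_candidate_p could log exactly this kind of rejection.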
On Mon, 15 May 2017, Prathamesh Kulkarni wrote:
> Hi,
> I have attached a bare-bones prototype patch that propagates malloc attribute in
> ipa-pure-const. As far as I understand, from the doc a function could
> be annotated
> with malloc attribute if it returns a pointer to a newly allocated
> memory block (uninitialized or zeroed) and the pointer does not alias
> any other valid pointer ?
>
> * Analysis
> The analysis used by the patch in malloc_candidate_p is too conservative,
> and I would be grateful for suggestions on improving it.
> Currently it only checks if:
> 1) The function returns a value of pointer type.
> 2) SSA_NAME_DEF_STMT (return_value) is either a function call or a phi, and
>    SSA_NAME_DEF_STMT (each element of phi) is a function call.
> 3) The return-value has immediate uses only within comparisons (gcond
>    or gassign) and return_stmt (and likewise a phi arg has immediate use only
>    within comparison or the phi stmt).
>
> The intent is that malloc_candidate_p(node) returns true if node
> returns the return value of its callee, and the return value is only
> used for comparisons within node.
> Then I assume it's safe (although conservative) to decide that node
> could possibly be a malloc function if callee is found to be malloc ?
>
> for eg:
> void *f(size_t n)
> {
>   void *p = __builtin_malloc (n);
>   if (p == 0)
>     __builtin_abort ();
>   return p;
> }
>
> malloc_candidate_p would determine f to be a candidate for malloc
> attribute since it returns the return-value of its callee
> (__builtin_malloc) and the return value is used only within comparison
> and return_stmt.
>
> However due to the imprecise analysis it misses this case:
> char **g(size_t n)
> {
>   char **p = malloc (sizeof(char *) * 10);
>   for (int i = 0; i < 10; i++)
>     p[i] = malloc (sizeof(char) * n);
>   return p;
> }
> I suppose g() could be annotated with malloc here ?

It cannot, because what p points to is initialized.  If you do, then

  char **x = g(10);
  **x = 1;

will be optimized away.

Now what would be interesting is to do "poor man's IPA PTA", namely identify
functions that are a) small, b) likely return a newly allocated pointer.  At
PTA time then "inline" all such wrappers, but only for the PTA constraints.
That will give more precise points-to results (and can handle more cases,
esp. initialization) than just annotating functions with 'malloc'.

+  /* With non-call exceptions we can't say for sure if other function
+     body was not possibly optimized to still throw.  */
+  if (!non_call || node->binds_to_current_def_p ())
+    {
+      DECL_IS_MALLOC (node->decl) = true;
+      *changed = true;

I don't see how malloc throwing would be an issue.

+  FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, retval)
+    if (gcond *cond = dyn_cast<gcond *> (use_stmt))
+      {
+        enum tree_code code = gimple_cond_code (cond);
+        if (TREE_CODE_CLASS (code) != tcc_comparison)

always false

+          RETURN_FROM_IMM_USE_STMT (use_iter, false);

I think you want to disallow eq/ne_expr against sth not integer_zerop.

+    else if (gassign *ga = dyn_cast<gassign *> (use_stmt))
+      {
+        enum tree_code code = gimple_assign_rhs_code (ga);
+        if (TREE_CODE_CLASS (code) != tcc_comparison)
+          RETURN_FROM_IMM_USE_STMT (use_iter, false);

likewise.
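A minimal sketch of the stricter test Richard asks for, accepting only equality or non-equality comparisons of the returned pointer against a literal zero (the helper name is invented; the accessors are the standard gimple ones):

  static bool
  cmp_against_zero_p (enum tree_code code, tree op1, tree op2, tree retval)
  {
    /* Only "p == 0" / "p != 0" style checks are harmless uses.  */
    if (code != EQ_EXPR && code != NE_EXPR)
      return false;
    if (op1 == retval)
      return integer_zerop (op2);
    if (op2 == retval)
      return integer_zerop (op1);
    return false;
  }

For a gcond this would be called with gimple_cond_code/gimple_cond_lhs/gimple_cond_rhs, for a comparison gassign with gimple_assign_rhs_code/rhs1/rhs2, replacing the plain tcc_comparison class check in check_retval_uses.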
+static bool
+malloc_candidate_p (function *fun, vec<cgraph_node *>& callees)
+{
+  basic_block bb;
+  FOR_EACH_BB_FN (bb, fun)
+    {
+      for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
+           !gsi_end_p (gsi);
+           gsi_next (&gsi))
+        {
+          greturn *ret_stmt = dyn_cast<greturn *> (gsi_stmt (gsi));
+          if (ret_stmt)

I think you want to do this much cheaper by only walking the last stmt
of predecessor(s) of EXIT_BLOCK_FOR_FN.  (s) because we have multiple
return stmts for

  int foo (int i) { if (i) return; else return i; }

but all return stmts will be the last stmt in one of the exit block
predecessors.

+              if (!check_retval_uses (retval, ret_stmt))
+                return false;
+
+              gimple *def = SSA_NAME_DEF_STMT (retval);
+              if (gcall *call_stmt = dyn_cast<gcall *> (def))
+                {
+                  tree callee_decl = gimple_call_fndecl (call_stmt);
+                  /* FIXME: In case of indirect call, callee_decl might be NULL
+                     Not sure what to do in that case, punting for now.  */
+                  if (!callee_decl)
+                    return false;
+                  cgraph_node *callee = cgraph_node::get_create (callee_decl);
+                  callees.safe_push (callee);
+                }
+              else if (gphi *phi = dyn_cast<gphi *> (def))
+                for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
+                  {
+                    tree arg = gimple_phi_arg_def (phi, i);
+                    if (TREE_CODE (arg) != SSA_NAME)
+                      return false;
+                    if (!check_retval_uses (arg, phi))
+                      return false;
+

I think you should refactor this to a separate function and handle
recursion.  For recursion you miss simple SSA name assigns.  For PHI
args you want to allow literal zero as well (for
-fdelete-null-pointer-checks).

> The patch uses return_callees_map which is a hash_map<node, callees> such
> that the return value of node is return value of one of these callees.
> For above eg it would be <f, [__builtin_malloc]>
> The analysis phase populates return_callees_map, and the propagation
> phase uses it to take the "meet" of callees.

I think you are supposed to treat functions optimistically as "malloc"
(use the DECL_IS_MALLOC flag) and during propagation revisit that change
by propagating across callgraph edges.  Callgraph edges that are relevant
(calls feeding a pointer return value) should then be marked somehow
(set a flag you stream).  This also means that indirect calls should be
only handled during propagation.  Not sure if we have an example of an
"edge encoder".  Honza?

Bonus points for auto-detecting alloc_size and alloc_align attributes
and warning with -Wsuggest-attribute=malloc{_size,_align,}.

> * LTO and memory management
> This is a general question about LTO and memory management.
> IIUC the following sequence takes place during normal LTO:
> LGEN: generate_summary, write_summary
> WPA: read_summary, execute ipa passes, write_opt_summary
>
> So I assumed it was OK in LGEN to allocate return_callees_map in
> generate_summary and free it in write_summary and during WPA, allocate
> return_callees_map in read_summary and free it after execute (since
> write_opt_summary does not require return_callees_map).
>
> However with fat LTO, it seems the sequence changes for LGEN with
> execute phase takes place after write_summary. However since
> return_callees_map is freed in pure_const_write_summary and
> propagate_malloc() accesses it in execute stage, it results in
> segmentation fault.
>
> To work around this, I am using the following hack in pure_const_write_summary:
> // FIXME: Do not free if -ffat-lto-objects is enabled.
> if (!global_options.x_flag_fat_lto_objects)
>   free_return_callees_map ();
> Is there a better approach for handling this ?
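A rough sketch of the cheaper walk suggested above, assuming the usual CFG helpers (EXIT_BLOCK_PTR_FOR_FN, FOR_EACH_EDGE, last_stmt); the helper name is invented:

  /* Sketch only: collect the return statements of FUN by looking just at the
     last statement of each exit-block predecessor, instead of scanning every
     statement in the function.  */
  static void
  collect_return_stmts (function *fun, vec<greturn *> &rets)
  {
    edge e;
    edge_iterator ei;
    FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (fun)->preds)
      {
        gimple *last = last_stmt (e->src);
        if (greturn *ret_stmt = last ? dyn_cast<greturn *> (last) : NULL)
          rets.safe_push (ret_stmt);
      }
  }

malloc_candidate_p would then run the same retval checks on each collected statement, additionally accepting a PHI argument that is literal zero (for -fdelete-null-pointer-checks).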
>
> Also I am not sure if I have written cgraph_node::set_malloc_flag[_1] correctly.
> I tried to imitate cgraph_node::set_const_flag.
>
> The patch passes bootstrap+test on x86_64 and found a few functions in
> the source tree (attached func_names.txt) that could be annotated with
> malloc (I gave a brief look at some of the functions and didn't appear
> to be false positives but I will recheck thoroughly)
>
> Does the patch look in the right direction ?
> I would be grateful for suggestions on improving it.
>
> Thanks,
> Prathamesh
> The patch passes bootstrap+test on x86_64 and found a few functions in
> the source tree (attached func_names.txt) that could be annotated with
> malloc (I gave a brief look at some of the functions and didn't appear
> to be false positives but I will recheck thoroughly)

virtual char* libcp1::compiler::find(std::__cxx11::string&) const

The virtual on the list of your candidates gave me pause.  Consider this
completely contrived example:

  struct B {
    virtual void* f (unsigned n) {
      return new char [n];
    }
  };

  void* foo (B &b, unsigned n)
  {
    return b.f (n);
  }

Based on these definitions alone both functions are candidates for
attribute malloc.

But suppose foo is called with an object of a type derived from B that
overrides f() to do something wacky (but strictly not invalid) like:

  struct D: B {
    char buf[32];
    virtual void* f (unsigned n) {
      if (n < 32)
        return n <= 32 ? buf : B::f (n);
    }
  };

breaking foo's attribute malloc constraint.

In other words, I think virtual functions need to be excluded from the
list (unless they're defined in a class marked final, or unless we know
they're not overridden to break the constraint like above).

Martin
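A minimal sketch of the exclusion Martin proposes, assuming DECL_VIRTUAL_P and DECL_FINAL_P are the right predicates to query in malloc_candidate_p (a class-level final would additionally need a TYPE_FINAL_P check on the containing type); the follow-ups below argue the restriction may not be needed once the attribute is set on the definition itself:

  /* Sketch: refuse to mark virtual functions that can still be overridden.  */
  if (DECL_VIRTUAL_P (fun->decl) && !DECL_FINAL_P (fun->decl))
    return false;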
On Wed, 17 May 2017, Martin Sebor wrote:
> > The patch passes bootstrap+test on x86_64 and found a few functions in
> > the source tree (attached func_names.txt) that could be annotated with
> > malloc (I gave a brief look at some of the functions and didn't appear
> > to be false positives but I will recheck thoroughly)
>
> virtual char* libcp1::compiler::find(std::__cxx11::string&) const
>
> The virtual on the list of your candidates gave me pause.  Consider
> this completely contrived example:
>
> struct B {
>   virtual void* f (unsigned n) {
>     return new char [n];
>   }
> };
>
> void* foo (B &b, unsigned n)
> {
>   return b.f (n);
> }
>
> Based on these definitions alone both functions are candidates
> for attribute malloc.
>
> But suppose foo is called with an object of a type derived from
> B that overrides f() to do something wacky (but strictly not
> invalid) like:
>
> struct D: B {
>   char buf[32];
>   virtual void* f (unsigned n) {
>     if (n < 32)
>       return n <= 32 ? buf : B::f (n);
>   }
>
> Breaking foo's attribute malloc constraint.
>
> In other words, I think virtual functions need to be excluded
> from the list (unless they're defined in a class marked final,
> or unless we know they're not overridden to break the constraint
> like above).

But we are annotating the actual decl, not the type in the class
struct.

Richard.
> > struct D: B {
> >   char buf[32];
> >   virtual void* f (unsigned n) {
> >     if (n < 32)
> >       return n <= 32 ? buf : B::f (n);
> >   }
> >
> > Breaking foo's attribute malloc constraint.
> >
> > In other words, I think virtual functions need to be excluded
> > from the list (unless they're defined in a class marked final,
> > or unless we know they're not overridden to break the constraint
> > like above).
>
> But we are annotating the actual decl, not the type in the class
> struct.

Yep, and we will be able to use the information in case we devirtualize
the call.  So this seems OK to me.

Honza

>
> Richard.
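A small, hypothetical illustration of Honza's point, reusing Martin's B from above: only a call that gets devirtualized to the annotated definition can use the information.

  void *use_known_type (unsigned n)
  {
    B b;
    return b.f (n);    /* dynamic type is known: devirtualized to B::f, so
                          its malloc attribute can be used here */
  }

  void *use_unknown_type (B &b, unsigned n)
  {
    return b.f (n);    /* stays an indirect virtual call: no malloc assumption */
  }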
>
> * LTO and memory management
> This is a general question about LTO and memory management.
> IIUC the following sequence takes place during normal LTO:
> LGEN: generate_summary, write_summary
> WPA: read_summary, execute ipa passes, write_opt_summary
>
> So I assumed it was OK in LGEN to allocate return_callees_map in
> generate_summary and free it in write_summary and during WPA, allocate
> return_callees_map in read_summary and free it after execute (since
> write_opt_summary does not require return_callees_map).
>
> However with fat LTO, it seems the sequence changes for LGEN with
> execute phase takes place after write_summary. However since
> return_callees_map is freed in pure_const_write_summary and
> propagate_malloc() accesses it in execute stage, it results in
> segmentation fault.
>
> To work around this, I am using the following hack in pure_const_write_summary:
> // FIXME: Do not free if -ffat-lto-objects is enabled.
> if (!global_options.x_flag_fat_lto_objects)
>   free_return_callees_map ();
> Is there a better approach for handling this ?

I think most passes just do not free summaries with -flto.  We probably want
to fix that to make it possible to compile multiple units, i.e. from a plugin,
by adding a release_summaries method...
So I would say it is OK to do the same as others do and leak it with -flto.

> diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c
> index e457166ea39..724c26e03f6 100644
> --- a/gcc/ipa-pure-const.c
> +++ b/gcc/ipa-pure-const.c
> @@ -56,6 +56,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-scalar-evolution.h"
>  #include "intl.h"
>  #include "opts.h"
> +#include "ssa.h"
>
>  /* Lattice values for const and pure functions.  Everything starts out
>     being const, then may drop to pure and then neither depending on
> @@ -69,6 +70,15 @@ enum pure_const_state_e
>
>  const char *pure_const_names[3] = {"const", "pure", "neither"};
>
> +enum malloc_state_e
> +{
> +  PURE_CONST_MALLOC_TOP,
> +  PURE_CONST_MALLOC,
> +  PURE_CONST_MALLOC_BOTTOM
> +};

It took me a while to work out what PURE_CONST means here :)
I would just call it something like STATE_MALLOC_TOP... or so.
ipa_pure_const is an outdated name from the time the pass was doing only
those two.

> @@ -109,6 +121,10 @@ typedef struct funct_state_d * funct_state;
>
>  static vec<funct_state> funct_state_vec;
>
> +/* A map from node to subset of callees. The subset contains those callees
> + * whose return-value is returned by the node. */
> +static hash_map< cgraph_node *, vec<cgraph_node *>* > *return_callees_map;
> +

Hehe, a special case of a return jump function.  We ought to support those
more generally.  How do you keep it up to date over callgraph changes?

> @@ -921,6 +1055,23 @@ end:
>    if (TREE_NOTHROW (decl))
>      l->can_throw = false;
>
> +  if (ipa)
> +    {
> +      vec<cgraph_node *> v = vNULL;
> +      l->malloc_state = PURE_CONST_MALLOC_BOTTOM;
> +      if (DECL_IS_MALLOC (decl))
> +        l->malloc_state = PURE_CONST_MALLOC;
> +      else if (malloc_candidate_p (DECL_STRUCT_FUNCTION (decl), v))
> +        {
> +          l->malloc_state = PURE_CONST_MALLOC_TOP;
> +          vec<cgraph_node *> *callees_p = new vec<cgraph_node *> (vNULL);
> +          for (unsigned i = 0; i < v.length (); ++i)
> +            callees_p->safe_push (v[i]);
> +          return_callees_map->put (fn, callees_p);
> +        }
> +      v.release ();
> +    }
> +

I would do the non-ipa variant, too.  I think most attributes can be detected
that way as well.

The patch generally makes sense to me.
It would be nice to make it easier to write such basic propagators across
the callgraph (perhaps adding a template doing the basic propagation logic).
Also I think you need to solve the problem with keeping your summaries up to
date across callgraph node removal and duplication.

Honza
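A rough sketch of the non-IPA variant Honza suggests, reusing the patch's malloc_candidate_p from the local pass and, as a simplification, accepting only callees that are already marked so that no propagation is needed (the helper name is invented):

  static bool
  local_malloc_p (function *fun)
  {
    vec<cgraph_node *> callees = vNULL;
    bool ok = malloc_candidate_p (fun, callees);
    if (ok)
      for (unsigned i = 0; i < callees.length (); ++i)
        if (!DECL_IS_MALLOC (callees[i]->decl))
          {
            ok = false;
            break;
          }
    callees.release ();
    return ok;
  }

The local pass could then call set_malloc_flag (true) on the node, mirroring how it already sets the nothrow flag.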
On 19 May 2017 at 19:02, Jan Hubicka <hubicka@ucw.cz> wrote:
>
> The patch generally makes sense to me. It would be nice to make it easier to write such
> a basic propagators across callgraph (perhaps adding a template doing the basic
> propagation logic). Also I think you need to solve the problem with keeping your
> summaries up to date across callgraph node removal and duplications.

Thanks for the suggestions, I will try to address them in a follow-up patch.
IIUC, I would need to modify the ipa-pure-const cgraph hooks - add_new_function,
remove_node_data, duplicate_node_data - to keep return_callees_map up-to-date
across callgraph node insertions and removals ?

Also, if instead of having a separate data-structure like return_callees_map,
should we rather have a flag within cgraph_edge, which marks that the caller
may return the value of the callee ?

Thanks,
Prathamesh

>
> Honza
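One possible shape for the duplication part of that bookkeeping, as a sketch only (the body is an assumption about what keeping the map up to date would mean; registration would sit next to the existing hooks in register_hooks):

  /* When NODE is cloned into NODE2, copy its return_callees_map entry so the
     clone keeps the same "return value comes from these callees" data.  */
  static void
  return_callees_duplicate (cgraph_node *node, cgraph_node *node2,
                            void *data ATTRIBUTE_UNUSED)
  {
    if (!return_callees_map)
      return;
    vec<cgraph_node *> **callees_pp = return_callees_map->get (node);
    if (!callees_pp)
      return;
    vec<cgraph_node *> *copy = new vec<cgraph_node *> (vNULL);
    for (unsigned i = 0; i < (*callees_pp)->length (); ++i)
      copy->safe_push ((**callees_pp)[i]);
    return_callees_map->put (node2, copy);
  }

A flag on cgraph_edge, as suggested, would side-step most of this, since a clone's edges are copied along with the node.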
diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index e505b10e211..eab8216100e 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -2492,6 +2492,61 @@ cgraph_node::set_nothrow_flag (bool nothrow)
   return changed;
 }
 
+/* Worker to set malloc flag.  */
+static void
+set_malloc_flag_1 (cgraph_node *node, bool malloc_p, bool non_call,
+                   bool *changed)
+{
+  cgraph_edge *e;
+
+  if (malloc_p && !DECL_IS_MALLOC (node->decl))
+    {
+      /* With non-call exceptions we can't say for sure if other function
+         body was not possibly optimized to still throw.  */
+      if (!non_call || node->binds_to_current_def_p ())
+        {
+          DECL_IS_MALLOC (node->decl) = true;
+          *changed = true;
+        }
+    }
+  else if (!malloc_p && DECL_IS_MALLOC (node->decl))
+    {
+      DECL_IS_MALLOC (node->decl) = false;
+      *changed = true;
+    }
+
+  ipa_ref *ref;
+  FOR_EACH_ALIAS (node, ref)
+    {
+      cgraph_node *alias = dyn_cast<cgraph_node *> (ref->referring);
+      if (!malloc_p || alias->get_availability () > AVAIL_INTERPOSABLE)
+        set_malloc_flag_1 (alias, malloc_p, non_call, changed);
+    }
+}
+/* Set DECL_IS_MALLOC on NODE's decl and on NODE's aliases if any.  */
+
+bool
+cgraph_node::set_malloc_flag (bool malloc_p)
+{
+  bool changed = false;
+  bool non_call = opt_for_fn (decl, flag_non_call_exceptions);
+
+  if (!malloc_p || get_availability () > AVAIL_INTERPOSABLE)
+    set_malloc_flag_1 (this, malloc_p, non_call, &changed);
+  else
+    {
+      ipa_ref *ref;
+
+      FOR_EACH_ALIAS (this, ref)
+        {
+          cgraph_node *alias = dyn_cast<cgraph_node *> (ref->referring);
+          if (!malloc_p || alias->get_availability () > AVAIL_INTERPOSABLE)
+            set_malloc_flag_1 (alias, malloc_p, non_call, &changed);
+        }
+    }
+  return changed;
+}
+
 /* Worker to set_const_flag.  */
 
 static void
diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index be4eaee71e2..6739bb3094d 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -1116,6 +1116,10 @@ public:
      if any to NOTHROW.  */
   bool set_nothrow_flag (bool nothrow);
 
+  /* SET DECL_IS_MALLOC on cgraph_node's decl and on aliases of the node
+     if any.  */
+  bool set_malloc_flag (bool malloc_p);
+
   /* If SET_CONST is true, mark function, aliases and thunks to be
      ECF_CONST.  If SET_CONST if false, clear the flag.
 
diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c
index e457166ea39..724c26e03f6 100644
--- a/gcc/ipa-pure-const.c
+++ b/gcc/ipa-pure-const.c
@@ -56,6 +56,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-scalar-evolution.h"
 #include "intl.h"
 #include "opts.h"
+#include "ssa.h"
 
 /* Lattice values for const and pure functions.  Everything starts out
    being const, then may drop to pure and then neither depending on
@@ -69,6 +70,15 @@ enum pure_const_state_e
 
 const char *pure_const_names[3] = {"const", "pure", "neither"};
 
+enum malloc_state_e
+{
+  PURE_CONST_MALLOC_TOP,
+  PURE_CONST_MALLOC,
+  PURE_CONST_MALLOC_BOTTOM
+};
+
+const char *malloc_state_names[] = {"malloc_top", "malloc", "malloc_bottom"};
+
 /* Holder for the const_state.  There is one of these per function
    decl.  */
 struct funct_state_d
@@ -92,11 +102,13 @@ struct funct_state_d
   /* If function can call free, munmap or otherwise make previously
      non-trapping memory accesses trapping.  */
   bool can_free;
+
+  enum malloc_state_e malloc_state;
 };
 
 /* State used when we know nothing about function.  */
 static struct funct_state_d varying_state
-   = { IPA_NEITHER, IPA_NEITHER, true, true, true, true };
+   = { IPA_NEITHER, IPA_NEITHER, true, true, true, true, PURE_CONST_MALLOC_BOTTOM };
 
 typedef struct funct_state_d * funct_state;
 
@@ -109,6 +121,10 @@ typedef struct funct_state_d * funct_state;
 
 static vec<funct_state> funct_state_vec;
 
+/* A map from node to subset of callees. The subset contains those callees
+ * whose return-value is returned by the node. */
+static hash_map< cgraph_node *, vec<cgraph_node *>* > *return_callees_map;
+
 static bool gate_pure_const (void);
 
 namespace {
@@ -812,6 +828,124 @@ check_stmt (gimple_stmt_iterator *gsip, funct_state local, bool ipa)
     }
 }
 
+static bool
+check_retval_uses (tree retval, gimple *stmt)
+{
+  imm_use_iterator use_iter;
+  gimple *use_stmt;
+
+  FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, retval)
+    if (gcond *cond = dyn_cast<gcond *> (use_stmt))
+      {
+        enum tree_code code = gimple_cond_code (cond);
+        if (TREE_CODE_CLASS (code) != tcc_comparison)
+          RETURN_FROM_IMM_USE_STMT (use_iter, false);
+      }
+    else if (gassign *ga = dyn_cast<gassign *> (use_stmt))
+      {
+        enum tree_code code = gimple_assign_rhs_code (ga);
+        if (TREE_CODE_CLASS (code) != tcc_comparison)
+          RETURN_FROM_IMM_USE_STMT (use_iter, false);
+      }
+    else if (use_stmt != stmt)
+      RETURN_FROM_IMM_USE_STMT (use_iter, false);
+
+  return true;
+}
+
+/*
+ * Currently this function does a very conservative analysis to check if
+ * function could be a malloc candidate.
+ *
+ * The function is considered to be a candidate if
+ * 1) The function returns a value of pointer type.
+ * 2) SSA_NAME_DEF_STMT (return_value) is either a function call or
+ *    a phi, and SSA_NAME_DEF_STMT(each element of phi) is function call.
+ * 3) The return-value has immediate uses only within comparisons (gcond or gassign)
+ *    and return_stmt (and likewise a phi arg has immediate use only within comparison
+ *    or the phi stmt).
+ */
+
+static bool
+malloc_candidate_p (function *fun, vec<cgraph_node *>& callees)
+{
+  basic_block bb;
+  FOR_EACH_BB_FN (bb, fun)
+    {
+      for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
+           !gsi_end_p (gsi);
+           gsi_next (&gsi))
+        {
+          greturn *ret_stmt = dyn_cast<greturn *> (gsi_stmt (gsi));
+          if (ret_stmt)
+            {
+              tree retval = gimple_return_retval (ret_stmt);
+              if (!retval)
+                return false;
+
+              if (TREE_CODE (retval) != SSA_NAME
+                  || TREE_CODE (TREE_TYPE (retval)) != POINTER_TYPE)
+                return false;
+
+              if (!check_retval_uses (retval, ret_stmt))
+                return false;
+
+              gimple *def = SSA_NAME_DEF_STMT (retval);
+              if (gcall *call_stmt = dyn_cast<gcall *> (def))
+                {
+                  tree callee_decl = gimple_call_fndecl (call_stmt);
+                  /* FIXME: In case of indirect call, callee_decl might be NULL
+                     Not sure what to do in that case, punting for now.  */
+                  if (!callee_decl)
+                    return false;
+                  cgraph_node *callee = cgraph_node::get_create (callee_decl);
+                  callees.safe_push (callee);
+                }
+              else if (gphi *phi = dyn_cast<gphi *> (def))
+                for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
+                  {
+                    tree arg = gimple_phi_arg_def (phi, i);
+                    if (TREE_CODE (arg) != SSA_NAME)
+                      return false;
+                    if (!check_retval_uses (arg, phi))
+                      return false;
+
+                    gimple *arg_def = SSA_NAME_DEF_STMT (arg);
+                    gcall *call_stmt = dyn_cast<gcall *> (arg_def);
+                    if (!call_stmt)
+                      return false;
+                    tree callee_decl = gimple_call_fndecl (call_stmt);
+                    if (!callee_decl)
+                      return false;
+                    cgraph_node *callee = cgraph_node::get_create (callee_decl);
+                    callees.safe_push (callee);
+                  }
+              else
+                return false;
+            }
+        }
+    }
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    fprintf (dump_file, "\nFound %s to be candidate for malloc attribute\n",
+             IDENTIFIER_POINTER (DECL_NAME (fun->decl)));
+  return true;
+}
+
+
+static void
+free_return_callees_map (void)
+{
+  for (hash_map<cgraph_node *, vec<cgraph_node *>*>::iterator it = return_callees_map->begin ();
+       it != return_callees_map->end ();
+       ++it)
+    {
+      std::pair<cgraph_node *, vec<cgraph_node *>*> p = *it;
+      p.second->release ();
+      delete p.second;
+    }
+  delete return_callees_map;
+}
 
 /* This is the main routine for finding the reference patterns for
    global variables within a function FN.  */
@@ -921,6 +1055,23 @@ end:
   if (TREE_NOTHROW (decl))
     l->can_throw = false;
 
+  if (ipa)
+    {
+      vec<cgraph_node *> v = vNULL;
+      l->malloc_state = PURE_CONST_MALLOC_BOTTOM;
+      if (DECL_IS_MALLOC (decl))
+        l->malloc_state = PURE_CONST_MALLOC;
+      else if (malloc_candidate_p (DECL_STRUCT_FUNCTION (decl), v))
+        {
+          l->malloc_state = PURE_CONST_MALLOC_TOP;
+          vec<cgraph_node *> *callees_p = new vec<cgraph_node *> (vNULL);
+          for (unsigned i = 0; i < v.length (); ++i)
+            callees_p->safe_push (v[i]);
+          return_callees_map->put (fn, callees_p);
+        }
+      v.release ();
+    }
+
   pop_cfun ();
   if (dump_file)
     {
@@ -1003,6 +1154,7 @@ pure_const_generate_summary (void)
   pass_ipa_pure_const *pass = static_cast <pass_ipa_pure_const *> (current_pass);
   pass->register_hooks ();
 
+  return_callees_map = new hash_map<cgraph_node *, vec<cgraph_node *>*>;
 
   /* Process all of the functions.
@@ -1067,11 +1219,30 @@ pure_const_write_summary (void)
           bp_pack_value (&bp, fs->looping, 1);
           bp_pack_value (&bp, fs->can_throw, 1);
           bp_pack_value (&bp, fs->can_free, 1);
+          bp_pack_value (&bp, fs->malloc_state, 2);
           streamer_write_bitpack (&bp);
+
+          vec<cgraph_node *> **callees_pp = return_callees_map->get (node);
+          if (callees_pp)
+            {
+              vec<cgraph_node *> callees = **callees_pp;
+              streamer_write_uhwi_stream (ob->main_stream, callees.length ());
+              for (unsigned i = 0; i < callees.length (); ++i)
+                {
+                  int callee_ref = lto_symtab_encoder_encode (encoder, callees[i]);
+                  streamer_write_uhwi_stream (ob->main_stream, callee_ref);
+                }
+            }
+          else
+            streamer_write_uhwi_stream (ob->main_stream, 0);
         }
     }
 
   lto_destroy_simple_output_block (ob);
+
+  // FIXME: Do not free if -ffat-lto-objects is enabled.
+  if (!global_options.x_flag_fat_lto_objects)
+    free_return_callees_map ();
 }
 
@@ -1087,6 +1258,8 @@ pure_const_read_summary (void)
   pass_ipa_pure_const *pass = static_cast <pass_ipa_pure_const *> (current_pass);
   pass->register_hooks ();
 
+  return_callees_map = new hash_map<cgraph_node *, vec<cgraph_node *> *>;
+
   while ((file_data = file_data_vec[j++]))
     {
       const char *data;
@@ -1127,6 +1300,22 @@ pure_const_read_summary (void)
              fs->looping = bp_unpack_value (&bp, 1);
              fs->can_throw = bp_unpack_value (&bp, 1);
              fs->can_free = bp_unpack_value (&bp, 1);
+             fs->malloc_state
+               = (enum malloc_state_e) bp_unpack_value (&bp, 2);
+
+             unsigned len = streamer_read_uhwi (ib);
+             if (len)
+               {
+                 vec<cgraph_node *> *callees_p = new vec<cgraph_node *> (vNULL);
+                 for (unsigned i = 0; i < len; i++)
+                   {
+                     int callee_index = streamer_read_uhwi (ib);
+                     cgraph_node *callee = dyn_cast<cgraph_node *> (lto_symtab_encoder_deref (encoder, callee_index));
+                     callees_p->safe_push (callee);
+                   }
+                 return_callees_map->put (node, callees_p);
+               }
+
              if (dump_file)
                {
                  int flags = flags_from_decl_or_type (node->decl);
@@ -1151,6 +1340,8 @@ pure_const_read_summary (void)
                    fprintf (dump_file,"  function is locally throwing\n");
                  if (fs->can_free)
                    fprintf (dump_file,"  function can locally free\n");
+                 fprintf (dump_file, "\n malloc state: %s\n",
+                          malloc_state_names[fs->malloc_state]);
                }
            }
 
@@ -1664,6 +1855,121 @@ propagate_nothrow (void)
   free (order);
 }
 
+DEBUG_FUNCTION
+static void
+dump_malloc_lattice (FILE *dump_file, const char *s)
+{
+  if (!dump_file)
+    return;
+
+  fprintf (dump_file, "\n\nMALLOC LATTICE %s:\n", s);
+  cgraph_node *node;
+  FOR_EACH_FUNCTION (node)
+    {
+      funct_state fs = get_function_state (node);
+      malloc_state_e state = fs->malloc_state;
+      fprintf (dump_file, "%s: %s\n", node->name (), malloc_state_names[state]);
+    }
+}
+
+static void
+propagate_malloc (void)
+{
+  cgraph_node *node;
+  FOR_EACH_FUNCTION (node)
+    {
+      if (DECL_IS_MALLOC (node->decl))
+        if (!has_function_state (node))
+          {
+            funct_state l = XCNEW (struct funct_state_d);
+            *l = varying_state;
+            l->malloc_state = PURE_CONST_MALLOC;
+            set_function_state (node, l);
+          }
+    }
+
+  dump_malloc_lattice (dump_file, "Initial");
+  struct cgraph_node **order
+    = XNEWVEC (struct cgraph_node *, symtab->cgraph_count);
+  int order_pos = ipa_reverse_postorder (order);
+  bool changed = true;
+
+  while (changed)
+    {
+      changed = false;
+      /* Walk in postorder.  */
+      for (int i = order_pos - 1; i >= 0; --i)
+        {
+          cgraph_node *node = order[i];
+          if (node->alias
+              || !node->definition
+              || !has_function_state (node))
+            continue;
+
+          funct_state l = get_function_state (node);
+
+          /* FIXME: add support for indirect-calls.  */
+          if (node->indirect_calls)
+            {
+              l->malloc_state = PURE_CONST_MALLOC_BOTTOM;
+              continue;
+            }
+
+          if (node->get_availability () <= AVAIL_INTERPOSABLE)
+            {
+              l->malloc_state = PURE_CONST_MALLOC_BOTTOM;
+              continue;
+            }
+
+          if (l->malloc_state == PURE_CONST_MALLOC_BOTTOM)
+            continue;
+
+          vec<cgraph_node *> **callees_pp = return_callees_map->get (node);
+          if (!callees_pp)
+            continue;
+
+          vec<cgraph_node *> callees = **callees_pp;
+          malloc_state_e new_state = l->malloc_state;
+          for (unsigned j = 0; j < callees.length (); j++)
+            {
+              cgraph_node *callee = callees[j];
+              if (!has_function_state (callee))
+                {
+                  new_state = PURE_CONST_MALLOC_BOTTOM;
+                  break;
+                }
+              malloc_state_e callee_state = get_function_state (callee)->malloc_state;
+              if (new_state < callee_state)
+                new_state = callee_state;
+            }
+          if (new_state != l->malloc_state)
+            {
+              changed = true;
+              l->malloc_state = new_state;
+            }
+        }
+    }
+
+  FOR_EACH_DEFINED_FUNCTION (node)
+    if (has_function_state (node))
+      {
+        funct_state l = get_function_state (node);
+        if (!node->alias
+            && l->malloc_state == PURE_CONST_MALLOC
+            && !node->global.inlined_to)
+          {
+            if (dump_file && (dump_flags & TDF_DETAILS))
+              fprintf (dump_file, "Function %s found to be malloc\n",
+                       node->name ());
+            node->set_malloc_flag (true);
+          }
      }
+
+  dump_malloc_lattice (dump_file, "after propagation");
+  ipa_free_postorder_info ();
+  free (order);
+  free_return_callees_map ();
+}
 
 /* Produce the global information by preforming a transitive closure
    on the local information that was produced by generate_summary.  */
@@ -1682,6 +1988,7 @@ execute (function *)
   /* Nothrow makes more function to not lead to return and improve
      later analysis.  */
   propagate_nothrow ();
+  propagate_malloc ();
   remove_p = propagate_pure_const ();
 
   /* Cleanup.  */
diff --git a/gcc/ssa-iterators.h b/gcc/ssa-iterators.h
index c8aa77bd4f3..740cbf13cb2 100644
--- a/gcc/ssa-iterators.h
+++ b/gcc/ssa-iterators.h
@@ -93,6 +93,12 @@ struct imm_use_iterator
         break;                                          \
       }
 
+/* Similarly for return.  */
+#define RETURN_FROM_IMM_USE_STMT(ITER, VAL)     \
+  {                                             \
+    end_imm_use_stmt_traverse (&(ITER));        \
+    return (VAL);                               \
+  }
 
 /* Use this iterator in combination with FOR_EACH_IMM_USE_STMT to
    get access to each occurrence of ssavar on the stmt returned by
diff --git a/gcc/testsuite/gcc.dg/ipa/propmalloc-1.c b/gcc/testsuite/gcc.dg/ipa/propmalloc-1.c
new file mode 100644
index 00000000000..9a95f817079
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/propmalloc-1.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-ipa-pure-const-details" } */
+
+__attribute__((noinline, no_icf, used))
+static void *f(__SIZE_TYPE__ n)
+{
+  void *p = __builtin_malloc (n);
+  if (p == 0)
+    __builtin_abort ();
+  return p;
+}
+
+__attribute__((noinline, no_icf, used))
+static void *bar(__SIZE_TYPE__ n)
+{
+  void *p = f (n);
+  return p;
+}
+
+/* { dg-final { scan-ipa-dump "Function f found to be malloc" "pure-const" } } */
+/* { dg-final { scan-ipa-dump "Function bar found to be malloc" "pure-const" } } */
diff --git a/gcc/testsuite/gcc.dg/ipa/propmalloc-2.c b/gcc/testsuite/gcc.dg/ipa/propmalloc-2.c
new file mode 100644
index 00000000000..95b2fd74a7a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/propmalloc-2.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-ipa-pure-const-details" } */
+
+__attribute__((noinline, used, no_icf))
+static void *foo (__SIZE_TYPE__ n)
+{
+  return __builtin_malloc (n * 10);
+}
+
+__attribute__((noinline, used, no_icf))
+static void *bar(__SIZE_TYPE__ n, int cond)
+{
+  void *p;
+  if (cond)
+    p = foo (n);
+  else
+    p = __builtin_malloc (n);
+
+  return p;
+}
+
+/* { dg-final { scan-ipa-dump "Function foo found to be malloc" "pure-const" } } */
+/* { dg-final { scan-ipa-dump "Function bar found to be malloc" "pure-const" } } */
diff --git a/gcc/testsuite/gcc.dg/ipa/propmalloc-3.c b/gcc/testsuite/gcc.dg/ipa/propmalloc-3.c
new file mode 100644
index 00000000000..13558ddd07d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/propmalloc-3.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-ipa-pure-const-details" } */
+
+static void *foo(__SIZE_TYPE__, int) __attribute__((noinline, no_icf, used));
+
+__attribute__((noinline, used, no_icf))
+static void *bar(__SIZE_TYPE__ n, int m)
+{
+  return foo (n, m);
+}
+
+static void *foo(__SIZE_TYPE__ n, int m)
+{
+  void *p;
+  if (m > 0)
+    p = bar (n, --m);
+  else
+    p = __builtin_malloc (n);
+
+  return p;
+}
+
+/* { dg-final { scan-ipa-dump "Function foo found to be malloc" "pure-const" } } */
+/* { dg-final { scan-ipa-dump "Function bar found to be malloc" "pure-const" } } */