Message ID | 20211031101238.523164-1-keithp@keithp.com |
---|---|
State | New |
Headers | show |
Series | Add -fopt-builtin optimization option | expand |
On Sun, Oct 31, 2021 at 11:13 AM Keith Packard via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: > > This option (enabled by default) controls optimizations which convert > a sequence of operations into an equivalent sequence that includes > calls to builtin functions. Typical cases here are code which matches > memcpy, calloc, sincos. > > The -ftree-loop-distribute-patterns flag only covers converting loops > into builtin calls, not numerous other places where knowledge of > builtin function semantics changes the generated code. > > The goal is to allow built-in functions to be declared by the compiler > and used directly by the application, but to disable optimizations > which create new calls to them, and to allow this optimization > behavior to be changed for individual functions by decorating the > function definition like this: > > void > attribute((optimize("no-opt-builtin"))) > sincos(double x, double *s, double *c) > { > *s = sin(x); > *c = cos(x); > } > > This also avoids converting loops into library calls like this: > > void * > attribute((optimize("no-opt-builtin"))) > memcpy(void *__restrict__ dst, const void *__restrict__ src, size_t n) > { > char *d = dst; > const char *s = src; > > while (n--) > *d++ = *s++; > return dst; > } > > As well as disabling analysis of memory lifetimes around free as in > this example: > > void * > attribute((optimize("no-opt-builtin"))) > erase_and_free(void *ptr) > { > memset(ptr, '\0', malloc_usable_size(ptr)); > free(ptr); > } > > Clang has a more sophisticated version of this mechanism which > can disable all builtins, or disable a specific builtin: > > double > attribute((no_builtin("exp2"))) > exp2(double x) > { > return pow (2.0, x); > } I don't think it reliably works the way you implement it. It's also having more side-effects than what you document, in particular pow (2.0, x); will now clobber and use global memory (besides errno). I think you may want to instead change builtin_decl_implicit to avoid code-generating a specific builtin. Generally we'd also want sth like the clang attribute and _not_ use optimize("") for this or a global flag_*, so the behavior can be more readily encoded in the IL. In fact a flag on the call statement could be added to denote the desired effect on it. I also don't see the advantage compared to -fno-builtin[-foo]. Declaring the function should be something that's already done. Richard. > Signed-off-by: Keith Packard <keithp@keithp.com> > --- > gcc/builtins.c | 6 ++++++ > gcc/common.opt | 4 ++++ > gcc/gimple.c | 3 +++ > gcc/tree-loop-distribution.c | 2 ++ > 4 files changed, 15 insertions(+) > > diff --git a/gcc/builtins.c b/gcc/builtins.c > index 7d0f61fc98b..7aae57deab5 100644 > --- a/gcc/builtins.c > +++ b/gcc/builtins.c > @@ -1922,6 +1922,9 @@ mathfn_built_in_2 (tree type, combined_fn fn) > built_in_function fcodef64x = END_BUILTINS; > built_in_function fcodef128x = END_BUILTINS; > > + if (flag_no_opt_builtin) > + return END_BUILTINS; > + > switch (fn) > { > #define SEQ_OF_CASE_MATHFN \ > @@ -2125,6 +2128,9 @@ mathfn_built_in_type (combined_fn fn) > case CFN_BUILT_IN_##MATHFN##L_R: \ > return long_double_type_node; > > + if (flag_no_opt_builtin) > + return NULL_TREE; > + > switch (fn) > { > SEQ_OF_CASE_MATHFN > diff --git a/gcc/common.opt b/gcc/common.opt > index eeba1a727f2..d6111cc776a 100644 > --- a/gcc/common.opt > +++ b/gcc/common.opt > @@ -2142,6 +2142,10 @@ fomit-frame-pointer > Common Var(flag_omit_frame_pointer) Optimization > When possible do not generate stack frames. > > +fopt-builtin > +Common Var(flag_no_opt_builtin, 0) Optimization > +Match code sequences equivalent to builtin functions > + > fopt-info > Common Var(flag_opt_info) Optimization > Enable all optimization info dumps on stderr. > diff --git a/gcc/gimple.c b/gcc/gimple.c > index 22dd6417d19..5b82b9409c0 100644 > --- a/gcc/gimple.c > +++ b/gcc/gimple.c > @@ -2790,6 +2790,9 @@ gimple_builtin_call_types_compatible_p (const gimple *stmt, tree fndecl) > { > gcc_checking_assert (DECL_BUILT_IN_CLASS (fndecl) != NOT_BUILT_IN); > > + if (flag_no_opt_builtin) > + return false; > + > tree ret = gimple_call_lhs (stmt); > if (ret > && !useless_type_conversion_p (TREE_TYPE (ret), > diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c > index 583c01a42d8..43f22a3c7ce 100644 > --- a/gcc/tree-loop-distribution.c > +++ b/gcc/tree-loop-distribution.c > @@ -1859,6 +1859,7 @@ loop_distribution::classify_partition (loop_p loop, > > /* Perform general partition disqualification for builtins. */ > if (volatiles_p > + || flag_no_opt_builtin > || !flag_tree_loop_distribute_patterns) > return has_reduction; > > @@ -3764,6 +3765,7 @@ loop_distribution::execute (function *fun) > /* Don't distribute multiple exit edges loop, or cold loop when > not doing pattern detection. */ > if (!single_exit (loop) > + || flag_no_opt_builtin > || (!flag_tree_loop_distribute_patterns > && !optimize_loop_for_speed_p (loop))) > continue; > -- > 2.33.0 >
Richard Biener <richard.guenther@gmail.com> writes: > I don't think it reliably works the way you implement it. It's also having > more side-effects than what you document, in particular Yeah, I made a 'minimal' patch that had the effect I needed, but it's clearly in the wrong place as it disables the matching of builtins against the incoming source code instead of the generation of new builtin references from the tree. > I think you may want to instead change builtin_decl_implicit > to avoid code-generating a specific builtin. Yup, I looked at that and there are numerous places which assume that will work, so it will be a more complicated patch. > Generally we'd also want sth like the clang attribute and _not_ > use optimize("") for this or a global flag_*, so the behavior can > be more readily encoded in the IL. In fact a flag on the call > statement could be added to denote the desired effect on it. Agreed, using the existing optimize attribute was a short-cut to leverage the existing code handling that case. If we think providing something that matches the clang attribute would be useful, it makes sense to provide it using the same syntax. > I also don't see the advantage compared to -fno-builtin[-foo]. > Declaring the function should be something that's already done. The semantic of the clang option is not to completely disable access to the given builtin function, but rather to stop the optimizer from creating new builtin function references (either to a specific builtin, or to all builtins). If I could use "no-builtin" in a function attribute, I probably wouldn't have bothered looking to implement the clang semantics, but -fno-builtin isn't supported in this way. But, now that I think I understand the behavior of attribute((no_builtin)) in clang, I think it has value beyond what -fno-builtin performs as you can still gain access to builtin functions when they are directly named. I'll go implement changes in builtin_decl_implicit and all of the affected call sites and see what that looks like. Thanks much for your review!
diff --git a/gcc/builtins.c b/gcc/builtins.c index 7d0f61fc98b..7aae57deab5 100644 --- a/gcc/builtins.c +++ b/gcc/builtins.c @@ -1922,6 +1922,9 @@ mathfn_built_in_2 (tree type, combined_fn fn) built_in_function fcodef64x = END_BUILTINS; built_in_function fcodef128x = END_BUILTINS; + if (flag_no_opt_builtin) + return END_BUILTINS; + switch (fn) { #define SEQ_OF_CASE_MATHFN \ @@ -2125,6 +2128,9 @@ mathfn_built_in_type (combined_fn fn) case CFN_BUILT_IN_##MATHFN##L_R: \ return long_double_type_node; + if (flag_no_opt_builtin) + return NULL_TREE; + switch (fn) { SEQ_OF_CASE_MATHFN diff --git a/gcc/common.opt b/gcc/common.opt index eeba1a727f2..d6111cc776a 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -2142,6 +2142,10 @@ fomit-frame-pointer Common Var(flag_omit_frame_pointer) Optimization When possible do not generate stack frames. +fopt-builtin +Common Var(flag_no_opt_builtin, 0) Optimization +Match code sequences equivalent to builtin functions + fopt-info Common Var(flag_opt_info) Optimization Enable all optimization info dumps on stderr. diff --git a/gcc/gimple.c b/gcc/gimple.c index 22dd6417d19..5b82b9409c0 100644 --- a/gcc/gimple.c +++ b/gcc/gimple.c @@ -2790,6 +2790,9 @@ gimple_builtin_call_types_compatible_p (const gimple *stmt, tree fndecl) { gcc_checking_assert (DECL_BUILT_IN_CLASS (fndecl) != NOT_BUILT_IN); + if (flag_no_opt_builtin) + return false; + tree ret = gimple_call_lhs (stmt); if (ret && !useless_type_conversion_p (TREE_TYPE (ret), diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c index 583c01a42d8..43f22a3c7ce 100644 --- a/gcc/tree-loop-distribution.c +++ b/gcc/tree-loop-distribution.c @@ -1859,6 +1859,7 @@ loop_distribution::classify_partition (loop_p loop, /* Perform general partition disqualification for builtins. */ if (volatiles_p + || flag_no_opt_builtin || !flag_tree_loop_distribute_patterns) return has_reduction; @@ -3764,6 +3765,7 @@ loop_distribution::execute (function *fun) /* Don't distribute multiple exit edges loop, or cold loop when not doing pattern detection. */ if (!single_exit (loop) + || flag_no_opt_builtin || (!flag_tree_loop_distribute_patterns && !optimize_loop_for_speed_p (loop))) continue;
This option (enabled by default) controls optimizations which convert a sequence of operations into an equivalent sequence that includes calls to builtin functions. Typical cases here are code which matches memcpy, calloc, sincos. The -ftree-loop-distribute-patterns flag only covers converting loops into builtin calls, not numerous other places where knowledge of builtin function semantics changes the generated code. The goal is to allow built-in functions to be declared by the compiler and used directly by the application, but to disable optimizations which create new calls to them, and to allow this optimization behavior to be changed for individual functions by decorating the function definition like this: void attribute((optimize("no-opt-builtin"))) sincos(double x, double *s, double *c) { *s = sin(x); *c = cos(x); } This also avoids converting loops into library calls like this: void * attribute((optimize("no-opt-builtin"))) memcpy(void *__restrict__ dst, const void *__restrict__ src, size_t n) { char *d = dst; const char *s = src; while (n--) *d++ = *s++; return dst; } As well as disabling analysis of memory lifetimes around free as in this example: void * attribute((optimize("no-opt-builtin"))) erase_and_free(void *ptr) { memset(ptr, '\0', malloc_usable_size(ptr)); free(ptr); } Clang has a more sophisticated version of this mechanism which can disable all builtins, or disable a specific builtin: double attribute((no_builtin("exp2"))) exp2(double x) { return pow (2.0, x); } Signed-off-by: Keith Packard <keithp@keithp.com> --- gcc/builtins.c | 6 ++++++ gcc/common.opt | 4 ++++ gcc/gimple.c | 3 +++ gcc/tree-loop-distribution.c | 2 ++ 4 files changed, 15 insertions(+)