Message ID | 20140611081414.GA17894@msticlxl57.ims.intel.com |
---|---|
State | New |
Headers | show |
On Wed, Jun 11, 2014 at 12:14:14PM +0400, Ilya Enkovich wrote: > This patch adds instrumented code support for strlen optimization. > > Bootstrapped and tested on linux-x86_64. > > Does it look OK? Ugh, this looks terribly ugly. So you are changing the meaning of arguments of every single builtin that has pointer arguments, so all spots that work with those builtins have to take into account gimple_call_with_bounds_p and do something different? That is very error prone and maintainance nightmare. Jakub
2014-06-11 12:22 GMT+04:00 Jakub Jelinek <jakub@redhat.com>: > On Wed, Jun 11, 2014 at 12:14:14PM +0400, Ilya Enkovich wrote: >> This patch adds instrumented code support for strlen optimization. >> >> Bootstrapped and tested on linux-x86_64. >> >> Does it look OK? > > Ugh, this looks terribly ugly. So you are changing the meaning of arguments > of every single builtin that has pointer arguments, so all spots that > work with those builtins have to take into account > gimple_call_with_bounds_p and do something different? > That is very error prone and maintainance nightmare. All such builtins are had to be handled anyway. It is possible to use new codes for instrumented builtins, but this would not change the handling code much. It is also possible to use something like gimple_call_nobnd_arg to use same arg numbers for both instrumented and not instrumented calls. This would still require additional support when we replace calls. Ilya > > Jakub
On Wed, Jun 11, 2014 at 12:44:29PM +0400, Ilya Enkovich wrote: > 2014-06-11 12:22 GMT+04:00 Jakub Jelinek <jakub@redhat.com>: > > On Wed, Jun 11, 2014 at 12:14:14PM +0400, Ilya Enkovich wrote: > >> This patch adds instrumented code support for strlen optimization. > >> > >> Bootstrapped and tested on linux-x86_64. > >> > >> Does it look OK? > > > > Ugh, this looks terribly ugly. So you are changing the meaning of arguments > > of every single builtin that has pointer arguments, so all spots that > > work with those builtins have to take into account > > gimple_call_with_bounds_p and do something different? > > That is very error prone and maintainance nightmare. > > All such builtins are had to be handled anyway. It is possible to use > new codes for instrumented builtins, but this would not change the > handling code much. It is also possible to use something like > gimple_call_nobnd_arg to use same arg numbers for both instrumented > and not instrumented calls. This would still require additional > support when we replace calls. That just brings the question why is all the instrumentation introduced so early that it affects practically all GIMPLE passes? I mean, mudflap, also a bounded pointer implementation, could get away with two passes and practically no changes to the rest of the compiler, asan also has two passes and almost no changes to the rest, why does this need to affect everything? If that is because you need to clone functions, which you indeed need to do at IPA time, perhaps you can just clone the functions but not add the instrumentation to the whole IL at that point, just note you've created a bounded pointer clone, and then just shortly before expansion in a new pass add the instrumentation, plus hook into expansion that it is expanded properly? Changes all over the compiler for a rarely used feature most GCC developers will not be able to even test is going to cause substantial maintainance issues. Jakub
On Wed, Jun 11, 2014 at 12:14:14PM +0400, Ilya Enkovich wrote: > This patch adds instrumented code support for strlen optimization. > > Bootstrapped and tested on linux-x86_64. > > Does it look OK? I haven't followed the Pointer Bounds Checker development, only looked at some of the patches, but I think that if all builtins similar to BUILT_IN_STRCPY would need changes like in the patch below, we're heading down a very bad road. Adding with_bounds code to every second builtin in so many passes to properly handle its arguments will surely lead to unmaintainable mess over time. Marek
2014-06-11 13:00 GMT+04:00 Marek Polacek <polacek@redhat.com>: > On Wed, Jun 11, 2014 at 12:14:14PM +0400, Ilya Enkovich wrote: >> This patch adds instrumented code support for strlen optimization. >> >> Bootstrapped and tested on linux-x86_64. >> >> Does it look OK? > > I haven't followed the Pointer Bounds Checker development, only looked at > some of the patches, but I think that if all builtins similar to > BUILT_IN_STRCPY would need changes like in the patch below, we're > heading down a very bad road. Adding with_bounds code to every second > builtin in so many passes to properly handle its arguments will surely > lead to unmaintainable mess over time. I think talking about every second builtin and many passes is too speculative. I do not see troubles with builtins other than string functions which are handled mostly in strlen pass. And string functions were always a special case. They are too important to have them not optimized or not instrumented. In cases of troubles over time instrumentation pass may be restricted to instrument only some subset of builtin functions. It is also possible to use different codes for instrumented builtins and thus have only adjusted code to work with them. Ilya > > Marek
2014-06-11 12:55 GMT+04:00 Jakub Jelinek <jakub@redhat.com>: > On Wed, Jun 11, 2014 at 12:44:29PM +0400, Ilya Enkovich wrote: >> 2014-06-11 12:22 GMT+04:00 Jakub Jelinek <jakub@redhat.com>: >> > On Wed, Jun 11, 2014 at 12:14:14PM +0400, Ilya Enkovich wrote: >> >> This patch adds instrumented code support for strlen optimization. >> >> >> >> Bootstrapped and tested on linux-x86_64. >> >> >> >> Does it look OK? >> > >> > Ugh, this looks terribly ugly. So you are changing the meaning of arguments >> > of every single builtin that has pointer arguments, so all spots that >> > work with those builtins have to take into account >> > gimple_call_with_bounds_p and do something different? >> > That is very error prone and maintainance nightmare. >> >> All such builtins are had to be handled anyway. It is possible to use >> new codes for instrumented builtins, but this would not change the >> handling code much. It is also possible to use something like >> gimple_call_nobnd_arg to use same arg numbers for both instrumented >> and not instrumented calls. This would still require additional >> support when we replace calls. > > That just brings the question why is all the instrumentation introduced so > early that it affects practically all GIMPLE passes? > > I mean, mudflap, also a bounded pointer implementation, could get away with > two passes and practically no changes to the rest of the compiler, asan > also has two passes and almost no changes to the rest, why does this need > to affect everything? If that is because you need to clone functions, > which you indeed need to do at IPA time, perhaps you can just clone the > functions but not add the instrumentation to the whole IL at that point, > just note you've created a bounded pointer clone, and then just shortly > before expansion in a new pass add the instrumentation, plus hook into > expansion that it is expanded properly? Mudflap did not work with pointer bounds. It maintained a database of valid objects and checked each dereferenced pointer points to a valid object. Asan does similar thing but via shadow map. When we work with bounds it becomes more important which pointer is used for memory access and it forces early instrumentation before optimizations corrupt this information. Ilya > > Changes all over the compiler for a rarely used feature most GCC developers > will not be able to even test is going to cause substantial maintainance > issues. > > Jakub
2014-06-11 12:22 GMT+04:00 Jakub Jelinek <jakub@redhat.com>: > On Wed, Jun 11, 2014 at 12:14:14PM +0400, Ilya Enkovich wrote: >> This patch adds instrumented code support for strlen optimization. >> >> Bootstrapped and tested on linux-x86_64. >> >> Does it look OK? > > Ugh, this looks terribly ugly. So you are changing the meaning of arguments > of every single builtin that has pointer arguments, so all spots that > work with those builtins have to take into account > gimple_call_with_bounds_p and do something different? > That is very error prone and maintainance nightmare. > > Jakub Hi Jakub, I've tried an approach with internal functions usage as you suggested on Cauldron. Shortly about used scheme: - only a selected set of builtin functions is instrumented - during instrumentation builtin call is replaced with internal function call I tried to implement it and faced few issues: 1. On expand pass we need to expand all those internal function and since they should be expanded as regular instrumented calls, I need to create a CALL_EXPR tree for that. I have nothing to use as fndecl for this call except the same instrumented builtin fndecl I use right now. Therefore the problem of instrumented call with the same builtin function code would still exist during expand pass. 2. Some builtins actually may have body. If I instrument such call and replace it with internal call then I loose a possibility to inline it. If I do not replace it with internal call then I have to make an instrumented call and again I have to create and use instrumented builtin fndecl having the same situation I have now but this time only until call is inlined. 3. Usage of internal functions means we cannot support whole program instrumentation (or we have to declare internal function for each builtin) which may appear a very useful scheme with reduced memory and performance penalties. Now I want to try another scheme. I want to change enum built_in_function so that each builtin code has a paired code to be used for instrumented version. I'm not going to actually declare additional builtins for users, I just want to obtain special function codes to be used for instrumented builtins and thus to not make current code handling builtins to work with instrumented calls. Ilya
diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c index f55b7ee..959b376 100644 --- a/gcc/tree-ssa-strlen.c +++ b/gcc/tree-ssa-strlen.c @@ -50,6 +50,7 @@ along with GCC; see the file COPYING3. If not see #include "gimple-pretty-print.h" #include "params.h" #include "expr.h" +#include "tree-chkp.h" /* A vector indexed by SSA_NAME_VERSION. 0 means unknown, positive value is an index into strinfo vector, negative value stands for @@ -426,6 +427,7 @@ get_string_length (strinfo si) if (si->stmt) { gimple stmt = si->stmt, lenstmt; + bool with_bounds = gimple_call_with_bounds_p (stmt); tree callee, lhs, fn, tem; location_t loc; gimple_stmt_iterator gsi; @@ -449,7 +451,14 @@ get_string_length (strinfo si) fn = builtin_decl_implicit (BUILT_IN_STRLEN); gcc_assert (lhs == NULL_TREE); tem = unshare_expr (gimple_call_arg (stmt, 0)); - lenstmt = gimple_build_call (fn, 1, tem); + if (with_bounds) + { + lenstmt = gimple_build_call (chkp_maybe_create_clone (fn)->decl, + 2, tem, gimple_call_arg (stmt, 1)); + gimple_call_set_with_bounds (lenstmt, true); + } + else + lenstmt = gimple_build_call (fn, 1, tem); lhs = make_ssa_name (TREE_TYPE (TREE_TYPE (fn)), lenstmt); gimple_call_set_lhs (lenstmt, lhs); gimple_set_vuse (lenstmt, gimple_vuse (stmt)); @@ -472,10 +481,12 @@ get_string_length (strinfo si) /* FALLTHRU */ case BUILT_IN_STRCPY: case BUILT_IN_STRCPY_CHK: - if (gimple_call_num_args (stmt) == 2) + if (gimple_call_num_args (stmt) == (with_bounds ? 4 : 2)) fn = builtin_decl_implicit (BUILT_IN_STPCPY); else fn = builtin_decl_explicit (BUILT_IN_STPCPY_CHK); + if (with_bounds) + fn = chkp_maybe_create_clone (fn)->decl; gcc_assert (lhs == NULL_TREE); if (dump_file && (dump_flags & TDF_DETAILS) != 0) { @@ -780,6 +791,7 @@ adjust_last_stmt (strinfo si, gimple stmt, bool is_strcat) tree vuse, callee, len; struct laststmt_struct last = laststmt; strinfo lastsi, firstsi; + unsigned len_arg_no = 2; laststmt.stmt = NULL; laststmt.len = NULL_TREE; @@ -855,7 +867,9 @@ adjust_last_stmt (strinfo si, gimple stmt, bool is_strcat) return; } - len = gimple_call_arg (last.stmt, 2); + if (gimple_call_with_bounds_p (last.stmt)) + len_arg_no = 4; + len = gimple_call_arg (last.stmt, len_arg_no); if (tree_fits_uhwi_p (len)) { if (!tree_fits_uhwi_p (last.len) @@ -879,7 +893,7 @@ adjust_last_stmt (strinfo si, gimple stmt, bool is_strcat) else return; - gimple_call_set_arg (last.stmt, 2, last.len); + gimple_call_set_arg (last.stmt, len_arg_no, last.len); update_stmt (last.stmt); } @@ -970,11 +984,12 @@ handle_builtin_strchr (gimple_stmt_iterator *gsi) tree src; gimple stmt = gsi_stmt (*gsi); tree lhs = gimple_call_lhs (stmt); + bool with_bounds = gimple_call_with_bounds_p (stmt); if (lhs == NULL_TREE) return; - if (!integer_zerop (gimple_call_arg (stmt, 1))) + if (!integer_zerop (gimple_call_arg (stmt, with_bounds ? 2 : 1))) return; src = gimple_call_arg (stmt, 0); @@ -1081,8 +1096,9 @@ handle_builtin_strcpy (enum built_in_function bcode, gimple_stmt_iterator *gsi) gimple stmt = gsi_stmt (*gsi); strinfo si, dsi, olddsi, zsi; location_t loc; + bool with_bounds = gimple_call_with_bounds_p (stmt); - src = gimple_call_arg (stmt, 1); + src = gimple_call_arg (stmt, with_bounds ? 2 : 1); dst = gimple_call_arg (stmt, 0); lhs = gimple_call_lhs (stmt); idx = get_stridx (src); @@ -1268,14 +1284,33 @@ handle_builtin_strcpy (enum built_in_function bcode, gimple_stmt_iterator *gsi) fprintf (dump_file, "Optimizing: "); print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM); } - if (gimple_call_num_args (stmt) == 2) - success = update_gimple_call (gsi, fn, 3, dst, src, len); + if (with_bounds) + { + fn = chkp_maybe_create_clone (fn)->decl; + if (gimple_call_num_args (stmt) == 4) + success = update_gimple_call (gsi, fn, 5, dst, + gimple_call_arg (stmt, 1), + src, + gimple_call_arg (stmt, 3), + len); + else + success = update_gimple_call (gsi, fn, 6, dst, + gimple_call_arg (stmt, 1), + src, + gimple_call_arg (stmt, 3), + len, + gimple_call_arg (stmt, 4)); + } else - success = update_gimple_call (gsi, fn, 4, dst, src, len, - gimple_call_arg (stmt, 2)); + if (gimple_call_num_args (stmt) == 2) + success = update_gimple_call (gsi, fn, 3, dst, src, len); + else + success = update_gimple_call (gsi, fn, 4, dst, src, len, + gimple_call_arg (stmt, 2)); if (success) { stmt = gsi_stmt (*gsi); + gimple_call_set_with_bounds (stmt, with_bounds); update_stmt (stmt); if (dump_file && (dump_flags & TDF_DETAILS) != 0) { @@ -1303,9 +1338,10 @@ handle_builtin_memcpy (enum built_in_function bcode, gimple_stmt_iterator *gsi) tree src, dst, len, lhs, oldlen, newlen; gimple stmt = gsi_stmt (*gsi); strinfo si, dsi, olddsi; + bool with_bounds = gimple_call_with_bounds_p (stmt); - len = gimple_call_arg (stmt, 2); - src = gimple_call_arg (stmt, 1); + len = gimple_call_arg (stmt, with_bounds ? 4 : 2); + src = gimple_call_arg (stmt, with_bounds ? 2 : 1); dst = gimple_call_arg (stmt, 0); idx = get_stridx (src); if (idx == 0) @@ -1442,8 +1478,9 @@ handle_builtin_strcat (enum built_in_function bcode, gimple_stmt_iterator *gsi) gimple stmt = gsi_stmt (*gsi); strinfo si, dsi; location_t loc; + bool with_bounds = gimple_call_with_bounds_p (stmt); - src = gimple_call_arg (stmt, 1); + src = gimple_call_arg (stmt, with_bounds ? 2 : 1); dst = gimple_call_arg (stmt, 0); lhs = gimple_call_lhs (stmt); @@ -1549,7 +1586,7 @@ handle_builtin_strcat (enum built_in_function bcode, gimple_stmt_iterator *gsi) fn = builtin_decl_explicit (BUILT_IN_MEMCPY_CHK); else fn = builtin_decl_explicit (BUILT_IN_STRCPY_CHK); - objsz = gimple_call_arg (stmt, 2); + objsz = gimple_call_arg (stmt, with_bounds ? 4 : 2); break; default: gcc_unreachable (); @@ -1584,15 +1621,35 @@ handle_builtin_strcat (enum built_in_function bcode, gimple_stmt_iterator *gsi) fprintf (dump_file, "Optimizing: "); print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM); } - if (srclen != NULL_TREE) - success = update_gimple_call (gsi, fn, 3 + (objsz != NULL_TREE), - dst, src, len, objsz); + if (with_bounds) + { + fn = chkp_maybe_create_clone (fn)->decl; + if (srclen != NULL_TREE) + success = update_gimple_call (gsi, fn, 5 + (objsz != NULL_TREE), + dst, + gimple_call_arg (stmt, 1), + src, + gimple_call_arg (stmt, 3), + len, objsz); + else + success = update_gimple_call (gsi, fn, 4 + (objsz != NULL_TREE), + dst, + gimple_call_arg (stmt, 1), + src, + gimple_call_arg (stmt, 3), + objsz); + } else - success = update_gimple_call (gsi, fn, 2 + (objsz != NULL_TREE), - dst, src, objsz); + if (srclen != NULL_TREE) + success = update_gimple_call (gsi, fn, 3 + (objsz != NULL_TREE), + dst, src, len, objsz); + else + success = update_gimple_call (gsi, fn, 2 + (objsz != NULL_TREE), + dst, src, objsz); if (success) { stmt = gsi_stmt (*gsi); + gimple_call_set_with_bounds (stmt, with_bounds); update_stmt (stmt); if (dump_file && (dump_flags & TDF_DETAILS) != 0) {