diff mbox

[Pointer,Bounds,Checker,27/x] Strlen

Message ID 20140611081414.GA17894@msticlxl57.ims.intel.com
State New
Headers show

Commit Message

Ilya Enkovich June 11, 2014, 8:14 a.m. UTC
Hi,

This patch adds instrumented code support for strlen optimization.

Bootstrapped and tested on linux-x86_64.

Does it look OK?

Thanks,
Ilya
--
gcc/

2014-06-09  Ilya Enkovich  <ilya.enkovich@intel.com>

	* tree-ssa-strlen.c: include tree-chkp.h.
	(get_string_length): Handle calls with bounds.
	(adjust_last_stmt): Likewise.
	(handle_builtin_strchr): Likewise.
	(handle_builtin_strcpy): Likewise.
	(handle_builtin_memcpy): Likewise.
	(handle_builtin_strcat): Likewise.

Comments

Jakub Jelinek June 11, 2014, 8:22 a.m. UTC | #1
On Wed, Jun 11, 2014 at 12:14:14PM +0400, Ilya Enkovich wrote:
> This patch adds instrumented code support for strlen optimization.
> 
> Bootstrapped and tested on linux-x86_64.
> 
> Does it look OK?

Ugh, this looks terribly ugly.  So you are changing the meaning of arguments
of every single builtin that has pointer arguments, so all spots that
work with those builtins have to take into account
gimple_call_with_bounds_p and do something different?
That is very error prone and maintainance nightmare.

	Jakub
Ilya Enkovich June 11, 2014, 8:44 a.m. UTC | #2
2014-06-11 12:22 GMT+04:00 Jakub Jelinek <jakub@redhat.com>:
> On Wed, Jun 11, 2014 at 12:14:14PM +0400, Ilya Enkovich wrote:
>> This patch adds instrumented code support for strlen optimization.
>>
>> Bootstrapped and tested on linux-x86_64.
>>
>> Does it look OK?
>
> Ugh, this looks terribly ugly.  So you are changing the meaning of arguments
> of every single builtin that has pointer arguments, so all spots that
> work with those builtins have to take into account
> gimple_call_with_bounds_p and do something different?
> That is very error prone and maintainance nightmare.

All such builtins are had to be handled anyway. It is possible to use
new codes for instrumented builtins, but this would not change the
handling code much. It is also possible to use something like
gimple_call_nobnd_arg to use same arg numbers for both instrumented
and not instrumented calls. This would still require additional
support when we replace calls.

Ilya

>
>         Jakub
Jakub Jelinek June 11, 2014, 8:55 a.m. UTC | #3
On Wed, Jun 11, 2014 at 12:44:29PM +0400, Ilya Enkovich wrote:
> 2014-06-11 12:22 GMT+04:00 Jakub Jelinek <jakub@redhat.com>:
> > On Wed, Jun 11, 2014 at 12:14:14PM +0400, Ilya Enkovich wrote:
> >> This patch adds instrumented code support for strlen optimization.
> >>
> >> Bootstrapped and tested on linux-x86_64.
> >>
> >> Does it look OK?
> >
> > Ugh, this looks terribly ugly.  So you are changing the meaning of arguments
> > of every single builtin that has pointer arguments, so all spots that
> > work with those builtins have to take into account
> > gimple_call_with_bounds_p and do something different?
> > That is very error prone and maintainance nightmare.
> 
> All such builtins are had to be handled anyway. It is possible to use
> new codes for instrumented builtins, but this would not change the
> handling code much. It is also possible to use something like
> gimple_call_nobnd_arg to use same arg numbers for both instrumented
> and not instrumented calls. This would still require additional
> support when we replace calls.

That just brings the question why is all the instrumentation introduced so
early that it affects practically all GIMPLE passes?

I mean, mudflap, also a bounded pointer implementation, could get away with
two passes and practically no changes to the rest of the compiler, asan
also has two passes and almost no changes to the rest, why does this need
to affect everything?  If that is because you need to clone functions,
which you indeed need to do at IPA time, perhaps you can just clone the
functions but not add the instrumentation to the whole IL at that point,
just note you've created a bounded pointer clone, and then just shortly
before expansion in a new pass add the instrumentation, plus hook into
expansion that it is expanded properly?

Changes all over the compiler for a rarely used feature most GCC developers
will not be able to even test is going to cause substantial maintainance
issues.

	Jakub
Marek Polacek June 11, 2014, 9 a.m. UTC | #4
On Wed, Jun 11, 2014 at 12:14:14PM +0400, Ilya Enkovich wrote:
> This patch adds instrumented code support for strlen optimization.
> 
> Bootstrapped and tested on linux-x86_64.
> 
> Does it look OK?

I haven't followed the Pointer Bounds Checker development, only looked at
some of the patches, but I think that if all builtins similar to
BUILT_IN_STRCPY would need changes like in the patch below, we're
heading down a very bad road.  Adding with_bounds code to every second
builtin in so many passes to properly handle its arguments will surely
lead to unmaintainable mess over time.

	Marek
Ilya Enkovich June 11, 2014, 9:36 a.m. UTC | #5
2014-06-11 13:00 GMT+04:00 Marek Polacek <polacek@redhat.com>:
> On Wed, Jun 11, 2014 at 12:14:14PM +0400, Ilya Enkovich wrote:
>> This patch adds instrumented code support for strlen optimization.
>>
>> Bootstrapped and tested on linux-x86_64.
>>
>> Does it look OK?
>
> I haven't followed the Pointer Bounds Checker development, only looked at
> some of the patches, but I think that if all builtins similar to
> BUILT_IN_STRCPY would need changes like in the patch below, we're
> heading down a very bad road.  Adding with_bounds code to every second
> builtin in so many passes to properly handle its arguments will surely
> lead to unmaintainable mess over time.

I think talking about every second builtin and many passes is too
speculative. I do not see troubles with builtins other than string
functions which are handled mostly in strlen pass. And string
functions were always a special case. They are too important to have
them not optimized or not instrumented.

In cases of troubles over time instrumentation pass may be restricted
to instrument only some subset of builtin functions. It is also
possible to use different codes for instrumented builtins and thus
have only adjusted code to work with them.

Ilya

>
>         Marek
Ilya Enkovich June 11, 2014, 9:36 a.m. UTC | #6
2014-06-11 12:55 GMT+04:00 Jakub Jelinek <jakub@redhat.com>:
> On Wed, Jun 11, 2014 at 12:44:29PM +0400, Ilya Enkovich wrote:
>> 2014-06-11 12:22 GMT+04:00 Jakub Jelinek <jakub@redhat.com>:
>> > On Wed, Jun 11, 2014 at 12:14:14PM +0400, Ilya Enkovich wrote:
>> >> This patch adds instrumented code support for strlen optimization.
>> >>
>> >> Bootstrapped and tested on linux-x86_64.
>> >>
>> >> Does it look OK?
>> >
>> > Ugh, this looks terribly ugly.  So you are changing the meaning of arguments
>> > of every single builtin that has pointer arguments, so all spots that
>> > work with those builtins have to take into account
>> > gimple_call_with_bounds_p and do something different?
>> > That is very error prone and maintainance nightmare.
>>
>> All such builtins are had to be handled anyway. It is possible to use
>> new codes for instrumented builtins, but this would not change the
>> handling code much. It is also possible to use something like
>> gimple_call_nobnd_arg to use same arg numbers for both instrumented
>> and not instrumented calls. This would still require additional
>> support when we replace calls.
>
> That just brings the question why is all the instrumentation introduced so
> early that it affects practically all GIMPLE passes?
>
> I mean, mudflap, also a bounded pointer implementation, could get away with
> two passes and practically no changes to the rest of the compiler, asan
> also has two passes and almost no changes to the rest, why does this need
> to affect everything?  If that is because you need to clone functions,
> which you indeed need to do at IPA time, perhaps you can just clone the
> functions but not add the instrumentation to the whole IL at that point,
> just note you've created a bounded pointer clone, and then just shortly
> before expansion in a new pass add the instrumentation, plus hook into
> expansion that it is expanded properly?

Mudflap did not work with pointer bounds. It maintained a database of
valid objects and checked each dereferenced pointer points to a valid
object. Asan does similar thing but via shadow map. When we work with
bounds it becomes more important which pointer is used for memory
access and it forces early instrumentation before optimizations
corrupt this information.

Ilya

>
> Changes all over the compiler for a rarely used feature most GCC developers
> will not be able to even test is going to cause substantial maintainance
> issues.
>
>         Jakub
Ilya Enkovich July 31, 2014, 10:07 a.m. UTC | #7
2014-06-11 12:22 GMT+04:00 Jakub Jelinek <jakub@redhat.com>:
> On Wed, Jun 11, 2014 at 12:14:14PM +0400, Ilya Enkovich wrote:
>> This patch adds instrumented code support for strlen optimization.
>>
>> Bootstrapped and tested on linux-x86_64.
>>
>> Does it look OK?
>
> Ugh, this looks terribly ugly.  So you are changing the meaning of arguments
> of every single builtin that has pointer arguments, so all spots that
> work with those builtins have to take into account
> gimple_call_with_bounds_p and do something different?
> That is very error prone and maintainance nightmare.
>
>         Jakub

Hi Jakub,

I've tried an approach with internal functions usage as you suggested
on Cauldron.

Shortly about used scheme:
  - only a selected set of builtin functions is instrumented
  - during instrumentation builtin call is replaced with internal function call

I tried to implement it and faced few issues:

1. On expand pass we need to expand all those internal function and
since they should be expanded as regular instrumented calls, I need to
create a CALL_EXPR tree for that. I have nothing to use as fndecl for
this call except the same instrumented builtin fndecl I use right now.
Therefore the problem of instrumented call with the same builtin
function code would still exist during expand pass.
2. Some builtins actually may have body. If I instrument such call and
replace it with internal call then I loose a possibility to inline it.
If I do not replace it with internal call then I have to make an
instrumented call and again I have to create and use instrumented
builtin fndecl having the same situation I have now but this time only
until call is inlined.
3. Usage of internal functions means we cannot support whole program
instrumentation (or we have to declare internal function for each
builtin) which may appear a very useful scheme with reduced memory and
performance penalties.

Now I want to try another scheme. I want to change enum
built_in_function so that each builtin code has a paired code to be
used for instrumented version. I'm not going to actually declare
additional builtins for users, I just want to obtain special function
codes to be used for instrumented builtins and thus to not make
current code handling builtins to work with instrumented calls.

Ilya
diff mbox

Patch

diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c
index f55b7ee..959b376 100644
--- a/gcc/tree-ssa-strlen.c
+++ b/gcc/tree-ssa-strlen.c
@@ -50,6 +50,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "gimple-pretty-print.h"
 #include "params.h"
 #include "expr.h"
+#include "tree-chkp.h"
 
 /* A vector indexed by SSA_NAME_VERSION.  0 means unknown, positive value
    is an index into strinfo vector, negative value stands for
@@ -426,6 +427,7 @@  get_string_length (strinfo si)
   if (si->stmt)
     {
       gimple stmt = si->stmt, lenstmt;
+      bool with_bounds = gimple_call_with_bounds_p (stmt);
       tree callee, lhs, fn, tem;
       location_t loc;
       gimple_stmt_iterator gsi;
@@ -449,7 +451,14 @@  get_string_length (strinfo si)
 	  fn = builtin_decl_implicit (BUILT_IN_STRLEN);
 	  gcc_assert (lhs == NULL_TREE);
 	  tem = unshare_expr (gimple_call_arg (stmt, 0));
-	  lenstmt = gimple_build_call (fn, 1, tem);
+	  if (with_bounds)
+	    {
+	      lenstmt = gimple_build_call (chkp_maybe_create_clone (fn)->decl,
+					   2, tem, gimple_call_arg (stmt, 1));
+	      gimple_call_set_with_bounds (lenstmt, true);
+	    }
+	  else
+	    lenstmt = gimple_build_call (fn, 1, tem);
 	  lhs = make_ssa_name (TREE_TYPE (TREE_TYPE (fn)), lenstmt);
 	  gimple_call_set_lhs (lenstmt, lhs);
 	  gimple_set_vuse (lenstmt, gimple_vuse (stmt));
@@ -472,10 +481,12 @@  get_string_length (strinfo si)
 	  /* FALLTHRU */
 	case BUILT_IN_STRCPY:
 	case BUILT_IN_STRCPY_CHK:
-	  if (gimple_call_num_args (stmt) == 2)
+	  if (gimple_call_num_args (stmt) == (with_bounds ? 4 : 2))
 	    fn = builtin_decl_implicit (BUILT_IN_STPCPY);
 	  else
 	    fn = builtin_decl_explicit (BUILT_IN_STPCPY_CHK);
+	  if (with_bounds)
+	    fn = chkp_maybe_create_clone (fn)->decl;
 	  gcc_assert (lhs == NULL_TREE);
 	  if (dump_file && (dump_flags & TDF_DETAILS) != 0)
 	    {
@@ -780,6 +791,7 @@  adjust_last_stmt (strinfo si, gimple stmt, bool is_strcat)
   tree vuse, callee, len;
   struct laststmt_struct last = laststmt;
   strinfo lastsi, firstsi;
+  unsigned len_arg_no = 2;
 
   laststmt.stmt = NULL;
   laststmt.len = NULL_TREE;
@@ -855,7 +867,9 @@  adjust_last_stmt (strinfo si, gimple stmt, bool is_strcat)
       return;
     }
 
-  len = gimple_call_arg (last.stmt, 2);
+  if (gimple_call_with_bounds_p (last.stmt))
+    len_arg_no = 4;
+  len = gimple_call_arg (last.stmt, len_arg_no);
   if (tree_fits_uhwi_p (len))
     {
       if (!tree_fits_uhwi_p (last.len)
@@ -879,7 +893,7 @@  adjust_last_stmt (strinfo si, gimple stmt, bool is_strcat)
   else
     return;
 
-  gimple_call_set_arg (last.stmt, 2, last.len);
+  gimple_call_set_arg (last.stmt, len_arg_no, last.len);
   update_stmt (last.stmt);
 }
 
@@ -970,11 +984,12 @@  handle_builtin_strchr (gimple_stmt_iterator *gsi)
   tree src;
   gimple stmt = gsi_stmt (*gsi);
   tree lhs = gimple_call_lhs (stmt);
+  bool with_bounds = gimple_call_with_bounds_p (stmt);
 
   if (lhs == NULL_TREE)
     return;
 
-  if (!integer_zerop (gimple_call_arg (stmt, 1)))
+  if (!integer_zerop (gimple_call_arg (stmt, with_bounds ? 2 : 1)))
     return;
 
   src = gimple_call_arg (stmt, 0);
@@ -1081,8 +1096,9 @@  handle_builtin_strcpy (enum built_in_function bcode, gimple_stmt_iterator *gsi)
   gimple stmt = gsi_stmt (*gsi);
   strinfo si, dsi, olddsi, zsi;
   location_t loc;
+  bool with_bounds = gimple_call_with_bounds_p (stmt);
 
-  src = gimple_call_arg (stmt, 1);
+  src = gimple_call_arg (stmt, with_bounds ? 2 : 1);
   dst = gimple_call_arg (stmt, 0);
   lhs = gimple_call_lhs (stmt);
   idx = get_stridx (src);
@@ -1268,14 +1284,33 @@  handle_builtin_strcpy (enum built_in_function bcode, gimple_stmt_iterator *gsi)
       fprintf (dump_file, "Optimizing: ");
       print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
     }
-  if (gimple_call_num_args (stmt) == 2)
-    success = update_gimple_call (gsi, fn, 3, dst, src, len);
+  if (with_bounds)
+    {
+      fn = chkp_maybe_create_clone (fn)->decl;
+      if (gimple_call_num_args (stmt) == 4)
+	success = update_gimple_call (gsi, fn, 5, dst,
+				      gimple_call_arg (stmt, 1),
+				      src,
+				      gimple_call_arg (stmt, 3),
+				      len);
+      else
+	success = update_gimple_call (gsi, fn, 6, dst,
+				      gimple_call_arg (stmt, 1),
+				      src,
+				      gimple_call_arg (stmt, 3),
+				      len,
+				      gimple_call_arg (stmt, 4));
+    }
   else
-    success = update_gimple_call (gsi, fn, 4, dst, src, len,
-				  gimple_call_arg (stmt, 2));
+    if (gimple_call_num_args (stmt) == 2)
+      success = update_gimple_call (gsi, fn, 3, dst, src, len);
+    else
+      success = update_gimple_call (gsi, fn, 4, dst, src, len,
+				    gimple_call_arg (stmt, 2));
   if (success)
     {
       stmt = gsi_stmt (*gsi);
+      gimple_call_set_with_bounds (stmt, with_bounds);
       update_stmt (stmt);
       if (dump_file && (dump_flags & TDF_DETAILS) != 0)
 	{
@@ -1303,9 +1338,10 @@  handle_builtin_memcpy (enum built_in_function bcode, gimple_stmt_iterator *gsi)
   tree src, dst, len, lhs, oldlen, newlen;
   gimple stmt = gsi_stmt (*gsi);
   strinfo si, dsi, olddsi;
+  bool with_bounds = gimple_call_with_bounds_p (stmt);
 
-  len = gimple_call_arg (stmt, 2);
-  src = gimple_call_arg (stmt, 1);
+  len = gimple_call_arg (stmt, with_bounds ? 4 : 2);
+  src = gimple_call_arg (stmt, with_bounds ? 2 : 1);
   dst = gimple_call_arg (stmt, 0);
   idx = get_stridx (src);
   if (idx == 0)
@@ -1442,8 +1478,9 @@  handle_builtin_strcat (enum built_in_function bcode, gimple_stmt_iterator *gsi)
   gimple stmt = gsi_stmt (*gsi);
   strinfo si, dsi;
   location_t loc;
+  bool with_bounds = gimple_call_with_bounds_p (stmt);
 
-  src = gimple_call_arg (stmt, 1);
+  src = gimple_call_arg (stmt, with_bounds ? 2 : 1);
   dst = gimple_call_arg (stmt, 0);
   lhs = gimple_call_lhs (stmt);
 
@@ -1549,7 +1586,7 @@  handle_builtin_strcat (enum built_in_function bcode, gimple_stmt_iterator *gsi)
 	fn = builtin_decl_explicit (BUILT_IN_MEMCPY_CHK);
       else
 	fn = builtin_decl_explicit (BUILT_IN_STRCPY_CHK);
-      objsz = gimple_call_arg (stmt, 2);
+      objsz = gimple_call_arg (stmt, with_bounds ? 4 : 2);
       break;
     default:
       gcc_unreachable ();
@@ -1584,15 +1621,35 @@  handle_builtin_strcat (enum built_in_function bcode, gimple_stmt_iterator *gsi)
       fprintf (dump_file, "Optimizing: ");
       print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
     }
-  if (srclen != NULL_TREE)
-    success = update_gimple_call (gsi, fn, 3 + (objsz != NULL_TREE),
-				  dst, src, len, objsz);
+  if (with_bounds)
+    {
+      fn = chkp_maybe_create_clone (fn)->decl;
+      if (srclen != NULL_TREE)
+	success = update_gimple_call (gsi, fn, 5 + (objsz != NULL_TREE),
+				      dst,
+				      gimple_call_arg (stmt, 1),
+				      src,
+				      gimple_call_arg (stmt, 3),
+				      len, objsz);
+      else
+	success = update_gimple_call (gsi, fn, 4 + (objsz != NULL_TREE),
+				      dst,
+				      gimple_call_arg (stmt, 1),
+				      src,
+				      gimple_call_arg (stmt, 3),
+				      objsz);
+    }
   else
-    success = update_gimple_call (gsi, fn, 2 + (objsz != NULL_TREE),
-				  dst, src, objsz);
+    if (srclen != NULL_TREE)
+      success = update_gimple_call (gsi, fn, 3 + (objsz != NULL_TREE),
+				    dst, src, len, objsz);
+    else
+      success = update_gimple_call (gsi, fn, 2 + (objsz != NULL_TREE),
+				    dst, src, objsz);
   if (success)
     {
       stmt = gsi_stmt (*gsi);
+      gimple_call_set_with_bounds (stmt, with_bounds);
       update_stmt (stmt);
       if (dump_file && (dump_flags & TDF_DETAILS) != 0)
 	{