diff mbox series

Adjust c_getstr/c_strlen to new STRING_CST semantic

Message ID VI1PR0701MB28627F9F56CECE73AD5F6B2CE40F0@VI1PR0701MB2862.eurprd07.prod.outlook.com
State New
Headers show
Series Adjust c_getstr/c_strlen to new STRING_CST semantic | expand

Commit Message

Bernd Edlinger Aug. 31, 2018, 7:08 a.m. UTC
Hi,


when re-testing the new STRING_CST semantic patch here:
https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01569.html

I noticed that the (my) fix for PR 87053 does still use
semantic properties of the STRING_CST that is not compatible
with the new proposed STRING_CST semantics.

That means that c_strlen needs to handle the case
where strings are not zero terminated.

When this is fixed, fortran.dg/pr45636.f90 starts to regress,
because the check in gimple_fold_builtin_memory_op here:

       if (tree_fits_uhwi_p (len)
           && compare_tree_int (len, MOVE_MAX) <= 0
           /* ???  Don't transform copies from strings with known length this
              confuses the tree-ssa-strlen.c.  This doesn't handle
              the case in gcc.dg/strlenopt-8.c which is XFAILed for that
              reason.  */
           && !c_strlen (src, 2))
  
does now return NULL_TREE, because the fortran string is not null terminated.
However that allows the folding which makes the fortran test case fail.

I fixed that by pulling in the c_getstr patch again, and use
it to make another exception for string constants without any embedded NUL.
That is what tree-ssa-forwprop.c can handle, and should make this work in
fortran again.


Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
Is it OK for trunk?


Thanks
Bernd.

Comments

Jeff Law Aug. 31, 2018, 2:42 p.m. UTC | #1
On 08/31/2018 01:08 AM, Bernd Edlinger wrote:
> Hi,
> 
> 
> when re-testing the new STRING_CST semantic patch here:
> https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01569.html
> 
> I noticed that the (my) fix for PR 87053 does still use
> semantic properties of the STRING_CST that is not compatible
> with the new proposed STRING_CST semantics.
> 
> That means that c_strlen needs to handle the case
> where strings are not zero terminated.
> 
> When this is fixed, fortran.dg/pr45636.f90 starts to regress,
> because the check in gimple_fold_builtin_memory_op here:
> 
>        if (tree_fits_uhwi_p (len)
>            && compare_tree_int (len, MOVE_MAX) <= 0
>            /* ???  Don't transform copies from strings with known length this
>               confuses the tree-ssa-strlen.c.  This doesn't handle
>               the case in gcc.dg/strlenopt-8.c which is XFAILed for that
>               reason.  */
>            && !c_strlen (src, 2))
>   
> does now return NULL_TREE, because the fortran string is not null terminated.
> However that allows the folding which makes the fortran test case fail.
> 
> I fixed that by pulling in the c_getstr patch again, and use
> it to make another exception for string constants without any embedded NUL.
> That is what tree-ssa-forwprop.c can handle, and should make this work in
> fortran again.
> 
> 
> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
> Is it OK for trunk?
I've gone the rounds on pr45636 a couple times and it's part of the
reason why there's a FIXME in the bits I committed earlier this week.

I'll look at this closely in conjunction with the (unposted) patch which
addresses the FIXME.

Jeff
Bernd Edlinger Aug. 31, 2018, 5:45 p.m. UTC | #2
On 08/31/18 16:42, Jeff Law wrote:
> On 08/31/2018 01:08 AM, Bernd Edlinger wrote:
>> Hi,
>>
>>
>> when re-testing the new STRING_CST semantic patch here:
>> https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01569.html
>>
>> I noticed that the (my) fix for PR 87053 does still use
>> semantic properties of the STRING_CST that is not compatible
>> with the new proposed STRING_CST semantics.
>>
>> That means that c_strlen needs to handle the case
>> where strings are not zero terminated.
>>
>> When this is fixed, fortran.dg/pr45636.f90 starts to regress,
>> because the check in gimple_fold_builtin_memory_op here:
>>
>>         if (tree_fits_uhwi_p (len)
>>             && compare_tree_int (len, MOVE_MAX) <= 0
>>             /* ???  Don't transform copies from strings with known length this
>>                confuses the tree-ssa-strlen.c.  This doesn't handle
>>                the case in gcc.dg/strlenopt-8.c which is XFAILed for that
>>                reason.  */
>>             && !c_strlen (src, 2))
>>    
>> does now return NULL_TREE, because the fortran string is not null terminated.
>> However that allows the folding which makes the fortran test case fail.
>>
>> I fixed that by pulling in the c_getstr patch again, and use
>> it to make another exception for string constants without any embedded NUL.
>> That is what tree-ssa-forwprop.c can handle, and should make this work in
>> fortran again.
>>
>>
>> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
>> Is it OK for trunk?
> I've gone the rounds on pr45636 a couple times and it's part of the
> reason why there's a FIXME in the bits I committed earlier this week.
> 

Yes, the rationale here is that tree-ssa-forwprop will likely choke if there
is a NUL byte in the string:

           /* Neither builtin_strncpy_read_str nor builtin_memcpy_read_str
              handle embedded '\0's.  */
           if (strlen (src_buf) != src_len)
             break;

Prior to this the c_strlen (src, 2) returned 2, thus this folding was not done,
but since the string does not contain any NUL bytes this returns now NULL.
However it is easy to add an exception that triggers only in a fortran string
in this way.

> I'll look at this closely in conjunction with the (unposted) patch which
> addresses the FIXME.
> 
> Jeff
> 

Bernd.
Bernd Edlinger Sept. 13, 2018, 1:30 p.m. UTC | #3
On 08/31/18 19:45, Bernd Edlinger wrote:
> On 08/31/18 16:42, Jeff Law wrote:
>> On 08/31/2018 01:08 AM, Bernd Edlinger wrote:
>>> Hi,
>>>
>>>
>>> when re-testing the new STRING_CST semantic patch here:
>>> https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01569.html
>>>
>>> I noticed that the (my) fix for PR 87053 does still use
>>> semantic properties of the STRING_CST that is not compatible
>>> with the new proposed STRING_CST semantics.
>>>
>>> That means that c_strlen needs to handle the case
>>> where strings are not zero terminated.
>>>
>>> When this is fixed, fortran.dg/pr45636.f90 starts to regress,
>>> because the check in gimple_fold_builtin_memory_op here:
>>>
>>>         if (tree_fits_uhwi_p (len)
>>>             && compare_tree_int (len, MOVE_MAX) <= 0
>>>             /* ???  Don't transform copies from strings with known length this
>>>                confuses the tree-ssa-strlen.c.  This doesn't handle
>>>                the case in gcc.dg/strlenopt-8.c which is XFAILed for that
>>>                reason.  */
>>>             && !c_strlen (src, 2))
>>> does now return NULL_TREE, because the fortran string is not null terminated.
>>> However that allows the folding which makes the fortran test case fail.
>>>
>>> I fixed that by pulling in the c_getstr patch again, and use
>>> it to make another exception for string constants without any embedded NUL.
>>> That is what tree-ssa-forwprop.c can handle, and should make this work in
>>> fortran again.
>>>
>>>
>>> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
>>> Is it OK for trunk?
>> I've gone the rounds on pr45636 a couple times and it's part of the
>> reason why there's a FIXME in the bits I committed earlier this week.
>>
> 
> Yes, the rationale here is that tree-ssa-forwprop will likely choke if there
> is a NUL byte in the string:
> 
>            /* Neither builtin_strncpy_read_str nor builtin_memcpy_read_str
>               handle embedded '\0's.  */
>            if (strlen (src_buf) != src_len)
>              break;
> 
> Prior to this the c_strlen (src, 2) returned 2, thus this folding was not done,
> but since the string does not contain any NUL bytes this returns now NULL.
> However it is easy to add an exception that triggers only in a fortran string
> in this way.
> 
>> I'll look at this closely in conjunction with the (unposted) patch which
>> addresses the FIXME.
>>
>> Jeff
>>
> 

Hi,

this is a minor update to the previous patch version, in that it adds
the following to c_getstr, in order to be bootstrapped with or without
the other STRING_CST-v2 semantic patches:

@@ -14611,6 +14611,10 @@ c_getstr (tree src, unsigned HOST_WIDE_I
    unsigned HOST_WIDE_INT string_length = TREE_STRING_LENGTH (src);
    unsigned HOST_WIDE_INT string_size = tree_to_uhwi (mem_size);
  
+  /* Ideally this would turn into a gcc_checking_assert over time.  */
+  if (string_length > string_size)
+    string_length = string_size;
+
    const char *string = TREE_STRING_POINTER (src);
  
    if (string_length == 0


So this patch can be boot-strapped alone,
or together with the following patches:

[PATCHv2] Handle overlength strings in the C FE
https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01566.html

[PATCHv2] Handle overlength strings in C++ FE
https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01567.html
=> Apporoved, without the part in vtable-class-hierarchy.c (!)

[PATCHv2] Handle overlength string literals in the fortan FE
https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01568.html
=> Approved.

[PATCH] Check the STRING_CSTs in varasm.c
https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01569.html




Bernd.
2018-08-30  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	* builtins.c (c_strlen): Handle not zero terminated STRING_CSTs
	correctly.
	* fold-const.c (c_getstr): Fix function comment.  Remove unused third
	argument.  Fix range checks.
	* fold-const.h (c_getstr): Adjust protoype.
	* gimple-fold.c (gimple_fold_builtin_memory_op): Avoid folding when
	string is constant but contains no NUL byte.

diff -Npur gcc/builtins.c gcc/builtins.c
--- gcc/builtins.c	2018-08-30 08:21:13.000000000 +0200
+++ gcc/builtins.c	2018-08-30 21:46:11.155211333 +0200
@@ -604,12 +604,12 @@ c_strlen (tree src, int only_value, unsi
      In that case, the elements of the array after the terminating NUL are
      all NUL.  */
   HOST_WIDE_INT strelts = TREE_STRING_LENGTH (src);
-  strelts = strelts / eltsize - 1;
+  strelts = strelts / eltsize;
 
   if (!tree_fits_uhwi_p (memsize))
     return NULL_TREE;
 
-  HOST_WIDE_INT maxelts = tree_to_uhwi (memsize) / eltsize - 1;
+  HOST_WIDE_INT maxelts = tree_to_uhwi (memsize) / eltsize;
 
   /* PTR can point to the byte representation of any string type, including
      char* and wchar_t*.  */
@@ -617,10 +617,6 @@ c_strlen (tree src, int only_value, unsi
 
   if (byteoff && TREE_CODE (byteoff) != INTEGER_CST)
     {
-      /* For empty strings the result should be zero.  */
-      if (maxelts == 0)
-	return ssize_int (0);
-
       /* The code below works only for single byte character types.  */
       if (eltsize != 1)
 	return NULL_TREE;
@@ -632,9 +628,13 @@ c_strlen (tree src, int only_value, unsi
       unsigned len = string_length (ptr, eltsize, strelts);
 
       /* Return when an embedded null character is found or none at all.  */
-      if (len < strelts || len > maxelts)
+      if (len + 1 < strelts || len >= maxelts)
 	return NULL_TREE;
 
+      /* For empty strings the result should be zero.  */
+      if (len == 0)
+	return ssize_int (0);
+
       /* We don't know the starting offset, but we do know that the string
 	 has no internal zero bytes.  If the offset falls within the bounds
 	 of the string subtract the offset from the length of the string,
@@ -644,7 +644,7 @@ c_strlen (tree src, int only_value, unsi
       offsave = fold_convert (ssizetype, offsave);
       tree condexp = fold_build2_loc (loc, LE_EXPR, boolean_type_node, offsave,
 				      build_int_cst (ssizetype, len));
-      tree lenexp = size_diffop_loc (loc, ssize_int (strelts), offsave);
+      tree lenexp = size_diffop_loc (loc, ssize_int (len), offsave);
       return fold_build3_loc (loc, COND_EXPR, ssizetype, condexp, lenexp,
 			      build_zero_cst (ssizetype));
     }
@@ -663,7 +663,7 @@ c_strlen (tree src, int only_value, unsi
 
   /* If the offset is known to be out of bounds, warn, and call strlen at
      runtime.  */
-  if (eltoff < 0 || eltoff > maxelts)
+  if (eltoff < 0 || eltoff >= maxelts)
     {
      /* Suppress multiple warnings for propagated constant strings.  */
       if (only_value != 2
@@ -691,9 +691,9 @@ c_strlen (tree src, int only_value, unsi
   unsigned len = string_length (ptr + eltoff * eltsize, eltsize,
 				strelts - eltoff);
 
-  /* Don't know what to return if there was no zero termination. 
+  /* Don't know what to return if there was no zero termination.
      Ideally this would turn into a gcc_checking_assert over time.  */
-  if (len > maxelts - eltoff)
+  if (len >= maxelts - eltoff)
     return NULL_TREE;
 
   return ssize_int (len);
diff -Npur gcc/fold-const.c gcc/fold-const.c
--- gcc/fold-const.c	2018-08-30 08:21:13.000000000 +0200
+++ gcc/fold-const.c	2018-08-30 21:03:12.612460165 +0200
@@ -14576,23 +14576,20 @@ fold_build_pointer_plus_hwi_loc (locatio
 /* Return a pointer P to a NUL-terminated string representing the sequence
    of constant characters referred to by SRC (or a subsequence of such
    characters within it if SRC is a reference to a string plus some
-   constant offset).  If STRLEN is non-null, store stgrlen(P) in *STRLEN.
-   If STRSIZE is non-null, store in *STRSIZE the size of the array
-   the string is stored in; in that case, even though P points to a NUL
-   terminated string, SRC need not refer to one.  This can happen when
-   SRC refers to a constant character array initialized to all non-NUL
-   values, as in the C declaration: char a[4] = "1234";  */
+   constant offset).  If STRLEN is non-null, store the number of bytes
+   in the string constant including the terminating NUL char.  *STRLEN is
+   typically strlen(P) + 1 in the absence of embedded NUL characters.  */
 
 const char *
-c_getstr (tree src, unsigned HOST_WIDE_INT *strlen /* = NULL */,
-	  unsigned HOST_WIDE_INT *strsize /* = NULL */)
+c_getstr (tree src, unsigned HOST_WIDE_INT *strlen /* = NULL */)
 {
   tree offset_node;
+  tree mem_size;
 
   if (strlen)
     *strlen = 0;
 
-  src = string_constant (src, &offset_node, NULL, NULL);
+  src = string_constant (src, &offset_node, &mem_size, NULL);
   if (src == 0)
     return NULL;
 
@@ -14605,25 +14602,18 @@ c_getstr (tree src, unsigned HOST_WIDE_I
 	offset = tree_to_uhwi (offset_node);
     }
 
+  if (!tree_fits_uhwi_p (mem_size))
+    return NULL;
+
   /* STRING_LENGTH is the size of the string literal, including any
      embedded NULs.  STRING_SIZE is the size of the array the string
      literal is stored in.  */
   unsigned HOST_WIDE_INT string_length = TREE_STRING_LENGTH (src);
-  unsigned HOST_WIDE_INT string_size = string_length;
-  tree type = TREE_TYPE (src);
-  if (tree size = TYPE_SIZE_UNIT (type))
-    if (tree_fits_shwi_p (size))
-      string_size = tree_to_uhwi (size);
+  unsigned HOST_WIDE_INT string_size = tree_to_uhwi (mem_size);
 
-  if (strlen)
-    {
-      /* Compute and store the length of the substring at OFFSET.
-	 All offsets past the initial length refer to null strings.  */
-      if (offset <= string_length)
-	*strlen = string_length - offset;
-      else
-	*strlen = 0;
-    }
+  /* Ideally this would turn into a gcc_checking_assert over time.  */
+  if (string_length > string_size)
+    string_length = string_size;
 
   const char *string = TREE_STRING_POINTER (src);
 
@@ -14631,21 +14621,26 @@ c_getstr (tree src, unsigned HOST_WIDE_I
       || offset >= string_size)
     return NULL;
 
-  if (strsize)
+  if (strlen)
     {
-      /* Support even constant character arrays that aren't proper
-	 NUL-terminated strings.  */
-      *strsize = string_size;
+      /* Compute and store the length of the substring at OFFSET.
+	 All offsets past the initial length refer to null strings.  */
+      if (offset < string_length)
+	*strlen = string_length - offset;
+      else
+	*strlen = 1;
     }
-  else if (string[string_length - 1] != '\0')
+  else
     {
-      /* Support only properly NUL-terminated strings but handle
-	 consecutive strings within the same array, such as the six
-	 substrings in "1\0002\0003".  */
-      return NULL;
+      tree eltype = TREE_TYPE (TREE_TYPE (src));
+      /* Support only properly NUL-terminated single byte strings.  */
+      if (tree_to_uhwi (TYPE_SIZE_UNIT (eltype)) != 1)
+	return NULL;
+      if (string[string_length - 1] != '\0')
+	return NULL;
     }
 
-  return offset <= string_length ? string + offset : "";
+  return offset < string_length ? string + offset : "";
 }
 
 /* Given a tree T, compute which bits in T may be nonzero.  */
diff -Npur gcc/fold-const.h gcc/fold-const.h
--- gcc/fold-const.h	2018-08-30 08:20:44.000000000 +0200
+++ gcc/fold-const.h	2018-08-30 20:58:54.154101370 +0200
@@ -187,8 +187,7 @@ extern bool expr_not_equal_to (tree t, c
 extern tree const_unop (enum tree_code, tree, tree);
 extern tree const_binop (enum tree_code, tree, tree, tree);
 extern bool negate_mathfn_p (combined_fn);
-extern const char *c_getstr (tree, unsigned HOST_WIDE_INT * = NULL,
-			     unsigned HOST_WIDE_INT * = NULL);
+extern const char *c_getstr (tree, unsigned HOST_WIDE_INT * = NULL);
 extern wide_int tree_nonzero_bits (const_tree);
 
 /* Return OFF converted to a pointer offset type suitable as offset for
diff -Npur gcc/gimple-fold.c gcc/gimple-fold.c
--- gcc/gimple-fold.c	2018-08-28 15:44:49.000000000 +0200
+++ gcc/gimple-fold.c	2018-08-31 00:57:50.947845988 +0200
@@ -725,6 +725,8 @@ gimple_fold_builtin_memory_op (gimple_st
       tree srctype, desttype;
       unsigned int src_align, dest_align;
       tree off0;
+      const char *tmp_str;
+      unsigned HOST_WIDE_INT tmp_len;
 
       /* Build accesses at offset zero with a ref-all character type.  */
       off0 = build_int_cst (build_pointer_type_for_mode (char_type_node,
@@ -742,7 +744,9 @@ gimple_fold_builtin_memory_op (gimple_st
 	     confuses the tree-ssa-strlen.c.  This doesn't handle
 	     the case in gcc.dg/strlenopt-8.c which is XFAILed for that
 	     reason.  */
-	  && !c_strlen (src, 2))
+	  && !c_strlen (src, 2)
+	  && !((tmp_str = c_getstr (src, &tmp_len)) != NULL
+	       && memchr (tmp_str, 0, tmp_len) == NULL))
 	{
 	  unsigned ilen = tree_to_uhwi (len);
 	  if (pow2p_hwi (ilen))
Jeff Law Sept. 13, 2018, 10:02 p.m. UTC | #4
On 9/13/18 7:30 AM, Bernd Edlinger wrote:
> On 08/31/18 19:45, Bernd Edlinger wrote:
>> On 08/31/18 16:42, Jeff Law wrote:
>>> On 08/31/2018 01:08 AM, Bernd Edlinger wrote:
>>>> Hi,
>>>>
>>>>
>>>> when re-testing the new STRING_CST semantic patch here:
>>>> https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01569.html
>>>>
>>>> I noticed that the (my) fix for PR 87053 does still use
>>>> semantic properties of the STRING_CST that is not compatible
>>>> with the new proposed STRING_CST semantics.
>>>>
>>>> That means that c_strlen needs to handle the case
>>>> where strings are not zero terminated.
>>>>
>>>> When this is fixed, fortran.dg/pr45636.f90 starts to regress,
>>>> because the check in gimple_fold_builtin_memory_op here:
>>>>
>>>>         if (tree_fits_uhwi_p (len)
>>>>             && compare_tree_int (len, MOVE_MAX) <= 0
>>>>             /* ???  Don't transform copies from strings with known length this
>>>>                confuses the tree-ssa-strlen.c.  This doesn't handle
>>>>                the case in gcc.dg/strlenopt-8.c which is XFAILed for that
>>>>                reason.  */
>>>>             && !c_strlen (src, 2))
>>>> does now return NULL_TREE, because the fortran string is not null terminated.
>>>> However that allows the folding which makes the fortran test case fail.
>>>>
>>>> I fixed that by pulling in the c_getstr patch again, and use
>>>> it to make another exception for string constants without any embedded NUL.
>>>> That is what tree-ssa-forwprop.c can handle, and should make this work in
>>>> fortran again.
>>>>
>>>>
>>>> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
>>>> Is it OK for trunk?
>>> I've gone the rounds on pr45636 a couple times and it's part of the
>>> reason why there's a FIXME in the bits I committed earlier this week.
>>>
>>
>> Yes, the rationale here is that tree-ssa-forwprop will likely choke if there
>> is a NUL byte in the string:
>>
>>            /* Neither builtin_strncpy_read_str nor builtin_memcpy_read_str
>>               handle embedded '\0's.  */
>>            if (strlen (src_buf) != src_len)
>>              break;
>>
>> Prior to this the c_strlen (src, 2) returned 2, thus this folding was not done,
>> but since the string does not contain any NUL bytes this returns now NULL.
>> However it is easy to add an exception that triggers only in a fortran string
>> in this way.
>>
>>> I'll look at this closely in conjunction with the (unposted) patch which
>>> addresses the FIXME.
>>>
>>> Jeff
>>>
>>
> 
> Hi,
> 
> this is a minor update to the previous patch version, in that it adds
> the following to c_getstr, in order to be bootstrapped with or without
> the other STRING_CST-v2 semantic patches:
> 
> @@ -14611,6 +14611,10 @@ c_getstr (tree src, unsigned HOST_WIDE_I
>     unsigned HOST_WIDE_INT string_length = TREE_STRING_LENGTH (src);
>     unsigned HOST_WIDE_INT string_size = tree_to_uhwi (mem_size);
>   
> +  /* Ideally this would turn into a gcc_checking_assert over time.  */
> +  if (string_length > string_size)
> +    string_length = string_size;> +
>     const char *string = TREE_STRING_POINTER (src);
>   
>     if (string_length == 0
I've bootstrapped and regression tested this along with the other
patches in the kit to update STRING_CST semantics.   This re-fixes the
pr87053 regression.  Installed on the trunk.

Jeff
Jeff Law Sept. 14, 2018, 1:49 a.m. UTC | #5
On 9/13/18 7:30 AM, Bernd Edlinger wrote:
> On 08/31/18 19:45, Bernd Edlinger wrote:
>> On 08/31/18 16:42, Jeff Law wrote:
>>> On 08/31/2018 01:08 AM, Bernd Edlinger wrote:
>>>> Hi,
>>>>
>>>>
>>>> when re-testing the new STRING_CST semantic patch here:
>>>> https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01569.html
>>>>
>>>> I noticed that the (my) fix for PR 87053 does still use
>>>> semantic properties of the STRING_CST that is not compatible
>>>> with the new proposed STRING_CST semantics.
>>>>
>>>> That means that c_strlen needs to handle the case
>>>> where strings are not zero terminated.
>>>>
>>>> When this is fixed, fortran.dg/pr45636.f90 starts to regress,
>>>> because the check in gimple_fold_builtin_memory_op here:
>>>>
>>>>         if (tree_fits_uhwi_p (len)
>>>>             && compare_tree_int (len, MOVE_MAX) <= 0
>>>>             /* ???  Don't transform copies from strings with known length this
>>>>                confuses the tree-ssa-strlen.c.  This doesn't handle
>>>>                the case in gcc.dg/strlenopt-8.c which is XFAILed for that
>>>>                reason.  */
>>>>             && !c_strlen (src, 2))
>>>> does now return NULL_TREE, because the fortran string is not null terminated.
>>>> However that allows the folding which makes the fortran test case fail.
>>>>
>>>> I fixed that by pulling in the c_getstr patch again, and use
>>>> it to make another exception for string constants without any embedded NUL.
>>>> That is what tree-ssa-forwprop.c can handle, and should make this work in
>>>> fortran again.
>>>>
>>>>
>>>> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
>>>> Is it OK for trunk?
>>> I've gone the rounds on pr45636 a couple times and it's part of the
>>> reason why there's a FIXME in the bits I committed earlier this week.
>>>
>> Yes, the rationale here is that tree-ssa-forwprop will likely choke if there
>> is a NUL byte in the string:
>>
>>            /* Neither builtin_strncpy_read_str nor builtin_memcpy_read_str
>>               handle embedded '\0's.  */
>>            if (strlen (src_buf) != src_len)
>>              break;
>>
>> Prior to this the c_strlen (src, 2) returned 2, thus this folding was not done,
>> but since the string does not contain any NUL bytes this returns now NULL.
>> However it is easy to add an exception that triggers only in a fortran string
>> in this way.
>>
>>> I'll look at this closely in conjunction with the (unposted) patch which
>>> addresses the FIXME.
>>>
>>> Jeff
>>>
> Hi,
> 
> this is a minor update to the previous patch version, in that it adds
> the following to c_getstr, in order to be bootstrapped with or without
> the other STRING_CST-v2 semantic patches:
> 
> @@ -14611,6 +14611,10 @@ c_getstr (tree src, unsigned HOST_WIDE_I
>     unsigned HOST_WIDE_INT string_length = TREE_STRING_LENGTH (src);
>     unsigned HOST_WIDE_INT string_size = tree_to_uhwi (mem_size);
>   
> +  /* Ideally this would turn into a gcc_checking_assert over time.  */
> +  if (string_length > string_size)
> +    string_length = string_size;
> +
>     const char *string = TREE_STRING_POINTER (src);
>   
>     if (string_length == 0
> 
> 
> So this patch can be boot-strapped alone,
> or together with the following patches:
> 
> [PATCHv2] Handle overlength strings in the C FE
> https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01566.html
> 
> [PATCHv2] Handle overlength strings in C++ FE
> https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01567.html
> => Apporoved, without the part in vtable-class-hierarchy.c (!)
> 
> [PATCHv2] Handle overlength string literals in the fortan FE
> https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01568.html
> => Approved.
> 
> [PATCH] Check the STRING_CSTs in varasm.c
> https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01569.html
> 
> 
> 
> 
> Bernd.
> 
> 
> patch-c-strlen-v2.diff
> 
> 2018-08-30  Bernd Edlinger  <bernd.edlinger@hotmail.de>
> 
> 	* builtins.c (c_strlen): Handle not zero terminated STRING_CSTs
> 	correctly.
> 	* fold-const.c (c_getstr): Fix function comment.  Remove unused third
> 	argument.  Fix range checks.
> 	* fold-const.h (c_getstr): Adjust protoype.
> 	* gimple-fold.c (gimple_fold_builtin_memory_op): Avoid folding when
> 	string is constant but contains no NUL byte.
And I've committed the rest of this patch (a piece of the fold-const.c
changes were already committed).

jeff
diff mbox series

Patch

2018-08-30  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	* builtins.c (c_strlen): Handle not zero terminated STRING_CSTs
	correctly.
	* fold-const.c (c_getstr): Fix function comment.  Remove unused third
	argument.  Fix range checks.
	* fold-const.h (c_getstr): Adjust protoype.
	* gimple-fold.c (gimple_fold_builtin_memory_op): Avoid folding when
	string is constant but contains no NUL byte.

diff -Npur gcc/builtins.c gcc/builtins.c
--- gcc/builtins.c	2018-08-30 08:21:13.000000000 +0200
+++ gcc/builtins.c	2018-08-30 21:46:11.155211333 +0200
@@ -604,12 +604,12 @@  c_strlen (tree src, int only_value, unsi
      In that case, the elements of the array after the terminating NUL are
      all NUL.  */
   HOST_WIDE_INT strelts = TREE_STRING_LENGTH (src);
-  strelts = strelts / eltsize - 1;
+  strelts = strelts / eltsize;
 
   if (!tree_fits_uhwi_p (memsize))
     return NULL_TREE;
 
-  HOST_WIDE_INT maxelts = tree_to_uhwi (memsize) / eltsize - 1;
+  HOST_WIDE_INT maxelts = tree_to_uhwi (memsize) / eltsize;
 
   /* PTR can point to the byte representation of any string type, including
      char* and wchar_t*.  */
@@ -617,10 +617,6 @@  c_strlen (tree src, int only_value, unsi
 
   if (byteoff && TREE_CODE (byteoff) != INTEGER_CST)
     {
-      /* For empty strings the result should be zero.  */
-      if (maxelts == 0)
-	return ssize_int (0);
-
       /* The code below works only for single byte character types.  */
       if (eltsize != 1)
 	return NULL_TREE;
@@ -632,9 +628,13 @@  c_strlen (tree src, int only_value, unsi
       unsigned len = string_length (ptr, eltsize, strelts);
 
       /* Return when an embedded null character is found or none at all.  */
-      if (len < strelts || len > maxelts)
+      if (len + 1 < strelts || len >= maxelts)
 	return NULL_TREE;
 
+      /* For empty strings the result should be zero.  */
+      if (len == 0)
+	return ssize_int (0);
+
       /* We don't know the starting offset, but we do know that the string
 	 has no internal zero bytes.  If the offset falls within the bounds
 	 of the string subtract the offset from the length of the string,
@@ -644,7 +644,7 @@  c_strlen (tree src, int only_value, unsi
       offsave = fold_convert (ssizetype, offsave);
       tree condexp = fold_build2_loc (loc, LE_EXPR, boolean_type_node, offsave,
 				      build_int_cst (ssizetype, len));
-      tree lenexp = size_diffop_loc (loc, ssize_int (strelts), offsave);
+      tree lenexp = size_diffop_loc (loc, ssize_int (len), offsave);
       return fold_build3_loc (loc, COND_EXPR, ssizetype, condexp, lenexp,
 			      build_zero_cst (ssizetype));
     }
@@ -663,7 +663,7 @@  c_strlen (tree src, int only_value, unsi
 
   /* If the offset is known to be out of bounds, warn, and call strlen at
      runtime.  */
-  if (eltoff < 0 || eltoff > maxelts)
+  if (eltoff < 0 || eltoff >= maxelts)
     {
      /* Suppress multiple warnings for propagated constant strings.  */
       if (only_value != 2
@@ -691,9 +691,9 @@  c_strlen (tree src, int only_value, unsi
   unsigned len = string_length (ptr + eltoff * eltsize, eltsize,
 				strelts - eltoff);
 
-  /* Don't know what to return if there was no zero termination. 
+  /* Don't know what to return if there was no zero termination.
      Ideally this would turn into a gcc_checking_assert over time.  */
-  if (len > maxelts - eltoff)
+  if (len >= maxelts - eltoff)
     return NULL_TREE;
 
   return ssize_int (len);
diff -Npur gcc/fold-const.c gcc/fold-const.c
--- gcc/fold-const.c	2018-08-30 08:21:13.000000000 +0200
+++ gcc/fold-const.c	2018-08-30 21:03:12.612460165 +0200
@@ -14576,23 +14576,20 @@  fold_build_pointer_plus_hwi_loc (locatio
 /* Return a pointer P to a NUL-terminated string representing the sequence
    of constant characters referred to by SRC (or a subsequence of such
    characters within it if SRC is a reference to a string plus some
-   constant offset).  If STRLEN is non-null, store stgrlen(P) in *STRLEN.
-   If STRSIZE is non-null, store in *STRSIZE the size of the array
-   the string is stored in; in that case, even though P points to a NUL
-   terminated string, SRC need not refer to one.  This can happen when
-   SRC refers to a constant character array initialized to all non-NUL
-   values, as in the C declaration: char a[4] = "1234";  */
+   constant offset).  If STRLEN is non-null, store the number of bytes
+   in the string constant including the terminating NUL char.  *STRLEN is
+   typically strlen(P) + 1 in the absence of embedded NUL characters.  */
 
 const char *
-c_getstr (tree src, unsigned HOST_WIDE_INT *strlen /* = NULL */,
-	  unsigned HOST_WIDE_INT *strsize /* = NULL */)
+c_getstr (tree src, unsigned HOST_WIDE_INT *strlen /* = NULL */)
 {
   tree offset_node;
+  tree mem_size;
 
   if (strlen)
     *strlen = 0;
 
-  src = string_constant (src, &offset_node, NULL, NULL);
+  src = string_constant (src, &offset_node, &mem_size, NULL);
   if (src == 0)
     return NULL;
 
@@ -14605,25 +14602,14 @@  c_getstr (tree src, unsigned HOST_WIDE_I
 	offset = tree_to_uhwi (offset_node);
     }
 
+  if (!tree_fits_uhwi_p (mem_size))
+    return NULL;
+
   /* STRING_LENGTH is the size of the string literal, including any
      embedded NULs.  STRING_SIZE is the size of the array the string
      literal is stored in.  */
   unsigned HOST_WIDE_INT string_length = TREE_STRING_LENGTH (src);
-  unsigned HOST_WIDE_INT string_size = string_length;
-  tree type = TREE_TYPE (src);
-  if (tree size = TYPE_SIZE_UNIT (type))
-    if (tree_fits_shwi_p (size))
-      string_size = tree_to_uhwi (size);
-
-  if (strlen)
-    {
-      /* Compute and store the length of the substring at OFFSET.
-	 All offsets past the initial length refer to null strings.  */
-      if (offset <= string_length)
-	*strlen = string_length - offset;
-      else
-	*strlen = 0;
-    }
+  unsigned HOST_WIDE_INT string_size = tree_to_uhwi (mem_size);
 
   const char *string = TREE_STRING_POINTER (src);
 
@@ -14631,21 +14617,26 @@  c_getstr (tree src, unsigned HOST_WIDE_I
       || offset >= string_size)
     return NULL;
 
-  if (strsize)
+  if (strlen)
     {
-      /* Support even constant character arrays that aren't proper
-	 NUL-terminated strings.  */
-      *strsize = string_size;
+      /* Compute and store the length of the substring at OFFSET.
+	 All offsets past the initial length refer to null strings.  */
+      if (offset < string_length)
+	*strlen = string_length - offset;
+      else
+	*strlen = 1;
     }
-  else if (string[string_length - 1] != '\0')
+  else
     {
-      /* Support only properly NUL-terminated strings but handle
-	 consecutive strings within the same array, such as the six
-	 substrings in "1\0002\0003".  */
-      return NULL;
+      tree eltype = TREE_TYPE (TREE_TYPE (src));
+      /* Support only properly NUL-terminated single byte strings.  */
+      if (tree_to_uhwi (TYPE_SIZE_UNIT (eltype)) != 1)
+	return NULL;
+      if (string[string_length - 1] != '\0')
+	return NULL;
     }
 
-  return offset <= string_length ? string + offset : "";
+  return offset < string_length ? string + offset : "";
 }
 
 /* Given a tree T, compute which bits in T may be nonzero.  */
diff -Npur gcc/fold-const.h gcc/fold-const.h
--- gcc/fold-const.h	2018-08-30 08:20:44.000000000 +0200
+++ gcc/fold-const.h	2018-08-30 20:58:54.154101370 +0200
@@ -187,8 +187,7 @@  extern bool expr_not_equal_to (tree t, c
 extern tree const_unop (enum tree_code, tree, tree);
 extern tree const_binop (enum tree_code, tree, tree, tree);
 extern bool negate_mathfn_p (combined_fn);
-extern const char *c_getstr (tree, unsigned HOST_WIDE_INT * = NULL,
-			     unsigned HOST_WIDE_INT * = NULL);
+extern const char *c_getstr (tree, unsigned HOST_WIDE_INT * = NULL);
 extern wide_int tree_nonzero_bits (const_tree);
 
 /* Return OFF converted to a pointer offset type suitable as offset for
diff -Npur gcc/gimple-fold.c gcc/gimple-fold.c
--- gcc/gimple-fold.c	2018-08-28 15:44:49.000000000 +0200
+++ gcc/gimple-fold.c	2018-08-31 00:57:50.947845988 +0200
@@ -725,6 +725,8 @@  gimple_fold_builtin_memory_op (gimple_st
       tree srctype, desttype;
       unsigned int src_align, dest_align;
       tree off0;
+      const char *tmp_str;
+      unsigned HOST_WIDE_INT tmp_len;
 
       /* Build accesses at offset zero with a ref-all character type.  */
       off0 = build_int_cst (build_pointer_type_for_mode (char_type_node,
@@ -742,7 +744,9 @@  gimple_fold_builtin_memory_op (gimple_st
 	     confuses the tree-ssa-strlen.c.  This doesn't handle
 	     the case in gcc.dg/strlenopt-8.c which is XFAILed for that
 	     reason.  */
-	  && !c_strlen (src, 2))
+	  && !c_strlen (src, 2)
+	  && !((tmp_str = c_getstr (src, &tmp_len)) != NULL
+	       && memchr (tmp_str, 0, tmp_len) == NULL))
 	{
 	  unsigned ilen = tree_to_uhwi (len);
 	  if (pow2p_hwi (ilen))