Message ID | AM5PR0701MB26577C800216899DCF6647E6E42F0@AM5PR0701MB2657.eurprd07.prod.outlook.com |
---|---|
State | New |
Headers | show |
Series | Fix the damage done by my other patch from yesterday to strlenopt-49.c | expand |
On Mon, 30 Jul 2018, Bernd Edlinger wrote: > Hi, > > this is how I would like to handle the over length strings issue in the C FE. > If the string constant is exactly the right length and ends in one explicit > NUL character, shorten it by one character. > > I thought Martin would be working on it, but as this is a really simple fix, > I would dare to send it to gcc-patches anyway, hope you don't mind... > > The patch is relative to the other patch here: https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01800.html > > > Bootstrapped and reg-tested on x86_64-pc-linux-gnu. > Is it OK for trunk? I'll leave this to FE maintainers but can I ask you to verify the (other) FEs do not leak this kind of invalid initializers to the middle-end? I suggest to put this verification in output_constructor which otherwise happily truncates initializers with excess size. There's also gimplification which might elide a = { "abcd", "cdse" }; to a.x = "abcd"; a.y = "cdse"; but hopefully there the GIMPLE verifier (verify_gimple_assign_single) verifies this - well, it only dispatches to useless_type_conversion_p (lhs_type, rhs1_type) for this case, but non-flexarrays should be handled fine there. Richard.
On 07/30/18 15:03, Richard Biener wrote: > On Mon, 30 Jul 2018, Bernd Edlinger wrote: > >> Hi, >> >> this is how I would like to handle the over length strings issue in the C FE. >> If the string constant is exactly the right length and ends in one explicit >> NUL character, shorten it by one character. >> >> I thought Martin would be working on it, but as this is a really simple fix, >> I would dare to send it to gcc-patches anyway, hope you don't mind... >> >> The patch is relative to the other patch here: https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01800.html >> >> >> Bootstrapped and reg-tested on x86_64-pc-linux-gnu. >> Is it OK for trunk? > > I'll leave this to FE maintainers but can I ask you to verify the > (other) FEs do not leak this kind of invalid initializers to the > middle-end? I suggest to put this verification in > output_constructor which otherwise happily truncates initializers > with excess size. There's also gimplification which might elide > a = { "abcd", "cdse" }; to a.x = "abcd"; a.y = "cdse"; but > hopefully there the GIMPLE verifier (verify_gimple_assign_single) > verifies this - well, it only dispatches to useless_type_conversion_p > (lhs_type, rhs1_type) for this case, but non-flexarrays should be > handled fine there. > > Richard. > In the moment I would already be happy if all STRING_CSTs would be zero terminated. However Go does not create zero-terminated STRING_CSTs, @Ian sorry, could you look at changing this to include the terminating NUL char? Index: gcc/go/go-gcc.cc =================================================================== --- gcc/go/go-gcc.cc (revision 263068) +++ gcc/go/go-gcc.cc (working copy) @@ -1394,7 +1394,7 @@ Gcc_backend::string_constant_expression(const std: TYPE_QUAL_CONST); tree string_type = build_array_type(const_char_type, index_type); TYPE_STRING_FLAG(string_type) = 1; - tree string_val = build_string(val.length(), val.data()); + tree string_val = build_string(val.length() + 1, val.data()); TREE_TYPE(string_val) = string_type; return this->make_expression(string_val); A am pretty sure that the C++ FE allows overlength initializers with -permissive. They should be hedged in string_constant IMHO, however with the patch I am still holding back on Jeff's request I ran over a string constant in tree-ssa-strlen.c (get_min_string_length) that had a terminating NUL char but the index range type did not include the string terminator. One just needs to be careful here. A quick survey shows that Fortran creates C strings with range 1..n, which puts the pr86532 address computation again in question. Remember, you asked for array_ref_element_size but not for array_ref_low_bound, and Jeff acked the patch in this state. Thanks Bernd.
On 07/30/2018 05:51 AM, Bernd Edlinger wrote: > Hi, > > this is how I would like to handle the over length strings issue in the C FE. > If the string constant is exactly the right length and ends in one explicit > NUL character, shorten it by one character. > > I thought Martin would be working on it, but as this is a really simple fix, > I would dare to send it to gcc-patches anyway, hope you don't mind... And as I said repeatedly, I am working on it along with a number of other things in this very area. There are a number of issues to solve: 1) issue a warning for the non-nul terminated strings (bug 86552: raised in response to your comments, initial patch posted, more work in progress) 2) avoid creating overlong string literals (bug 86688: raised partly also in response to your earlier comments) 3) handle braced-initializers (as in char a[] = { 1, 2, 0 }; ) I'm testing a patch for that. I am actively working on all three of these so please, for the fourth time, let me finish my work. Submitting patches that try to handle any of these issues in a slightly or substantially different way at the same time isn't helpful. On the contrary, it's very disruptive. This has nothing to do with ownership and everything with coordination and an apparent divergence of opinions. There are over 10,000 open bugs in Bugzilla. Working on any of them that's not assigned to anyone would be helpful. This is not. Martin
On Mon, 30 Jul 2018, Bernd Edlinger wrote: > Hi, > > this is how I would like to handle the over length strings issue in the C FE. > If the string constant is exactly the right length and ends in one explicit > NUL character, shorten it by one character. I don't think shortening should be limited to that case. I think the case where the constant is longer than that (and so gets an unconditional pedwarn) should also have it shortened - any constant that doesn't fit in the object being initialized should be shortened to fit, whether diagnosed or not, we should define GENERIC / GIMPLE to disallow too-large string constants in initializers, and should add an assertion somewhere in the middle-end that no too-large string constants reach it.
On Mon, 30 Jul 2018, Bernd Edlinger wrote: > In the moment I would already be happy if all STRING_CSTs would > be zero terminated. generic.texi says they need not be. Making the STRING_CST contain only the bytes of the initializer and not the trailing NUL in the C case where the trailing NUL does not fit in the object initialized would of course mean you get non-NUL-terminated STRING_CSTs for valid C code as well.
On Mon, Jul 30, 2018 at 03:52:39PM +0000, Joseph Myers wrote: > On Mon, 30 Jul 2018, Bernd Edlinger wrote: > > > In the moment I would already be happy if all STRING_CSTs would > > be zero terminated. > > generic.texi says they need not be. Making the STRING_CST contain only > the bytes of the initializer and not the trailing NUL in the C case where > the trailing NUL does not fit in the object initialized would of course > mean you get non-NUL-terminated STRING_CSTs for valid C code as well. One thing is whether TREE_STRING_LENGTH includes the trailing NUL byte, that doesn't need to be the case e.g. for the shortened initializers. The other thing is whether we as a convenience for the compiler's internals want to waste some memory for the NUL termination; I think we could avoid some bugs that way. Jakub
On Mon, 30 Jul 2018, Jakub Jelinek wrote: > On Mon, Jul 30, 2018 at 03:52:39PM +0000, Joseph Myers wrote: > > On Mon, 30 Jul 2018, Bernd Edlinger wrote: > > > > > In the moment I would already be happy if all STRING_CSTs would > > > be zero terminated. > > > > generic.texi says they need not be. Making the STRING_CST contain only > > the bytes of the initializer and not the trailing NUL in the C case where > > the trailing NUL does not fit in the object initialized would of course > > mean you get non-NUL-terminated STRING_CSTs for valid C code as well. > > One thing is whether TREE_STRING_LENGTH includes the trailing NUL byte, > that doesn't need to be the case e.g. for the shortened initializers. > The other thing is whether we as a convenience for the compiler's internals > want to waste some memory for the NUL termination; I think we could avoid > some bugs that way. TREE_STRING_LENGTH includes the NUL if it is logically part of the object, but should not if the NUL is purely an implementation convenience.
On 07/30/18 17:57, Jakub Jelinek wrote: > On Mon, Jul 30, 2018 at 03:52:39PM +0000, Joseph Myers wrote: >> On Mon, 30 Jul 2018, Bernd Edlinger wrote: >> >>> In the moment I would already be happy if all STRING_CSTs would >>> be zero terminated. >> >> generic.texi says they need not be. Making the STRING_CST contain only >> the bytes of the initializer and not the trailing NUL in the C case where >> the trailing NUL does not fit in the object initialized would of course >> mean you get non-NUL-terminated STRING_CSTs for valid C code as well. > > One thing is whether TREE_STRING_LENGTH includes the trailing NUL byte, > that doesn't need to be the case e.g. for the shortened initializers. > The other thing is whether we as a convenience for the compiler's internals > want to waste some memory for the NUL termination; I think we could avoid > some bugs that way. > Yes, exactly, currently the middle-end tries determine if a STRING_CST is nul terminated, but that is broken for wide character, for instance c_getstr: else if (string[string_length - 1] != '\0') { /* Support only properly NUL-terminated strings but handle consecutive strings within the same array, such as the six substrings in "1\0002\0003". */ return NULL; } It would be much better if any string constant could be zero terminated. When always zero-terminated STRING_CST the check for a non-zero terminated value is much more easy: compare_tree_int (TYPE_SIZE_UNIT (TREE_TYPE (init)), TREE_STRING_LENGTH (init)) Bernd.
On 07/30/18 18:01, Joseph Myers wrote: > On Mon, 30 Jul 2018, Jakub Jelinek wrote: > >> On Mon, Jul 30, 2018 at 03:52:39PM +0000, Joseph Myers wrote: >>> On Mon, 30 Jul 2018, Bernd Edlinger wrote: >>> >>>> In the moment I would already be happy if all STRING_CSTs would >>>> be zero terminated. >>> >>> generic.texi says they need not be. Making the STRING_CST contain only >>> the bytes of the initializer and not the trailing NUL in the C case where >>> the trailing NUL does not fit in the object initialized would of course >>> mean you get non-NUL-terminated STRING_CSTs for valid C code as well. >> >> One thing is whether TREE_STRING_LENGTH includes the trailing NUL byte, >> that doesn't need to be the case e.g. for the shortened initializers. >> The other thing is whether we as a convenience for the compiler's internals >> want to waste some memory for the NUL termination; I think we could avoid >> some bugs that way. > > TREE_STRING_LENGTH includes the NUL if it is logically part of the object, > but should not if the NUL is purely an implementation convenience. > To complicate things a bit more STRING_CST that are created by the Ada FE for the purpose of ASM constraints, do not contain a NUL character. That is in TREE_STRING_LENGTH, there is of course always another NUL char after the payload. Likewise for C __attribute__ values. Thanks Bernd.
On Mon, Jul 30, 2018 at 04:28:50PM +0000, Bernd Edlinger wrote: > >>> generic.texi says they need not be. Making the STRING_CST contain only > >>> the bytes of the initializer and not the trailing NUL in the C case where > >>> the trailing NUL does not fit in the object initialized would of course > >>> mean you get non-NUL-terminated STRING_CSTs for valid C code as well. > >> > >> One thing is whether TREE_STRING_LENGTH includes the trailing NUL byte, > >> that doesn't need to be the case e.g. for the shortened initializers. > >> The other thing is whether we as a convenience for the compiler's internals > >> want to waste some memory for the NUL termination; I think we could avoid > >> some bugs that way. > > > > TREE_STRING_LENGTH includes the NUL if it is logically part of the object, > > but should not if the NUL is purely an implementation convenience. > > > > To complicate things a bit more STRING_CST that are created by the Ada FE > for the purpose of ASM constraints, do not contain a NUL character. > That is in TREE_STRING_LENGTH, there is of course always another NUL char > after the payload. Likewise for C __attribute__ values. If there is a NUL at TREE_STRING_POINTER (x) + TREE_STRING_LENGTH (x), that is ok. Jakub
On July 30, 2018 4:41:19 PM GMT+02:00, Bernd Edlinger <bernd.edlinger@hotmail.de> wrote: > > >On 07/30/18 15:03, Richard Biener wrote: >> On Mon, 30 Jul 2018, Bernd Edlinger wrote: >> >>> Hi, >>> >>> this is how I would like to handle the over length strings issue in >the C FE. >>> If the string constant is exactly the right length and ends in one >explicit >>> NUL character, shorten it by one character. >>> >>> I thought Martin would be working on it, but as this is a really >simple fix, >>> I would dare to send it to gcc-patches anyway, hope you don't >mind... >>> >>> The patch is relative to the other patch here: >https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01800.html >>> >>> >>> Bootstrapped and reg-tested on x86_64-pc-linux-gnu. >>> Is it OK for trunk? >> >> I'll leave this to FE maintainers but can I ask you to verify the >> (other) FEs do not leak this kind of invalid initializers to the >> middle-end? I suggest to put this verification in >> output_constructor which otherwise happily truncates initializers >> with excess size. There's also gimplification which might elide >> a = { "abcd", "cdse" }; to a.x = "abcd"; a.y = "cdse"; but >> hopefully there the GIMPLE verifier (verify_gimple_assign_single) >> verifies this - well, it only dispatches to useless_type_conversion_p >> (lhs_type, rhs1_type) for this case, but non-flexarrays should be >> handled fine there. >> >> Richard. >> > >In the moment I would already be happy if all STRING_CSTs would >be zero terminated. > >However Go does not create zero-terminated STRING_CSTs, @Ian sorry, >could you look at changing this to include the terminating NUL char? > >Index: gcc/go/go-gcc.cc >=================================================================== >--- gcc/go/go-gcc.cc (revision 263068) >+++ gcc/go/go-gcc.cc (working copy) >@@ -1394,7 +1394,7 @@ Gcc_backend::string_constant_expression(const >std: > TYPE_QUAL_CONST); > tree string_type = build_array_type(const_char_type, index_type); > TYPE_STRING_FLAG(string_type) = 1; >- tree string_val = build_string(val.length(), val.data()); >+ tree string_val = build_string(val.length() + 1, val.data()); > TREE_TYPE(string_val) = string_type; > > return this->make_expression(string_val); > > >A am pretty sure that the C++ FE allows overlength initializers >with -permissive. They should be hedged in string_constant IMHO, >however with the patch I am still holding back on Jeff's request >I ran over a string constant in tree-ssa-strlen.c >(get_min_string_length) >that had a terminating NUL char but the index range type did not >include the string terminator. One just needs to be careful here. > >A quick survey shows that Fortran creates C strings with range >1..n, which puts the pr86532 address computation again in question. >Remember, you asked for array_ref_element_size but not for >array_ref_low_bound, and Jeff acked the patch in this state. The (earlier) changes in this area are unfortunately in a quite messy state. It would be nice to roll back completely and attack this in smaller and more obvious issues. Richard. > > >Thanks >Bernd.
On 07/30/18 15:03, Richard Biener wrote: > On Mon, 30 Jul 2018, Bernd Edlinger wrote: > >> Hi, >> >> this is how I would like to handle the over length strings issue in the C FE. >> If the string constant is exactly the right length and ends in one explicit >> NUL character, shorten it by one character. >> >> I thought Martin would be working on it, but as this is a really simple fix, >> I would dare to send it to gcc-patches anyway, hope you don't mind... >> >> The patch is relative to the other patch here: https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01800.html >> >> >> Bootstrapped and reg-tested on x86_64-pc-linux-gnu. >> Is it OK for trunk? > > I'll leave this to FE maintainers but can I ask you to verify the > (other) FEs do not leak this kind of invalid initializers to the > middle-end? I suggest to put this verification in > output_constructor which otherwise happily truncates initializers > with excess size. There's also gimplification which might elide > a = { "abcd", "cdse" }; to a.x = "abcd"; a.y = "cdse"; but > hopefully there the GIMPLE verifier (verify_gimple_assign_single) > verifies this - well, it only dispatches to useless_type_conversion_p > (lhs_type, rhs1_type) for this case, but non-flexarrays should be > handled fine there. > Okay, this is what I am currently playing with. There is still a major fault in the fortran FE, but I believe sanitizing the middle-end is worth it.... IMHO sanitizing should have priority over new optimizations :( Thanks Bernd. Index: gcc/varasm.c =================================================================== --- gcc/varasm.c (Revision 263072) +++ gcc/varasm.c (Arbeitskopie) @@ -4774,6 +4774,29 @@ initializer_constant_valid_for_bitfield_p (tree va return false; } +/* Check if a STRING_CST fits into the field. + Tolerate only the case when the NUL termination + does not fit into the field. */ + +bool +check_string_literal (tree string, unsigned HOST_WIDE_INT size) +{ + tree eltype = TREE_TYPE (TREE_TYPE (string)); + unsigned HOST_WIDE_INT elts = tree_to_uhwi (TYPE_SIZE_UNIT (eltype)); + const char *p = TREE_STRING_POINTER (string); + int len = TREE_STRING_LENGTH (string); + + if (elts != 1 && elts != 2 && elts != 4) + return false; + if (len <= 0 || len % elts != 0) + return false; + if ((unsigned)len != size && (unsigned)len != size + elts) + return false; + if (memcmp (p + len - elts, "\0\0\0\0", elts) != 0) + return false; + return true; +} + /* output_constructor outer state of relevance in recursive calls, typically for nested aggregate bitfields. */ @@ -4942,6 +4965,7 @@ output_constant (tree exp, unsigned HOST_WIDE_INT case STRING_CST: thissize = MIN ((unsigned HOST_WIDE_INT)TREE_STRING_LENGTH (exp), size); + gcc_checking_assert (check_string_literal (exp, thissize)); assemble_string (TREE_STRING_POINTER (exp), thissize); break; case VECTOR_CST:
gcc/c: 2018-07-30 Bernd Edlinger <bernd.edlinger@hotmail.de> * c-typeck.c (digest_init): Fix overlength strings. testsuite: 2018-07-30 Bernd Edlinger <bernd.edlinger@hotmail.de> * gcc.dg/strlenopt-49.c: Adjust test expectations. diff -pur gcc/c/c-typeck.c gcc/c/c-typeck.c --- gcc/c/c-typeck.c 2018-06-20 18:35:15.000000000 +0200 +++ gcc/c/c-typeck.c 2018-07-30 12:17:34.175481372 +0200 @@ -7435,29 +7435,38 @@ digest_init (location_t init_loc, tree t } } - TREE_TYPE (inside_init) = type; if (TYPE_DOMAIN (type) != NULL_TREE && TYPE_SIZE (type) != NULL_TREE && TREE_CODE (TYPE_SIZE (type)) == INTEGER_CST) { unsigned HOST_WIDE_INT len = TREE_STRING_LENGTH (inside_init); + unsigned unit = TYPE_PRECISION (typ1) / BITS_PER_UNIT; /* Subtract the size of a single (possibly wide) character because it's ok to ignore the terminating null char that is counted in the length of the constant. */ - if (compare_tree_int (TYPE_SIZE_UNIT (type), - (len - (TYPE_PRECISION (typ1) - / BITS_PER_UNIT))) < 0) + if (compare_tree_int (TYPE_SIZE_UNIT (type), len - unit) < 0) pedwarn_init (init_loc, 0, ("initializer-string for array of chars " "is too long")); - else if (warn_cxx_compat - && compare_tree_int (TYPE_SIZE_UNIT (type), len) < 0) - warning_at (init_loc, OPT_Wc___compat, - ("initializer-string for array chars " - "is too long for C++")); + else if (compare_tree_int (TYPE_SIZE_UNIT (type), len) < 0) + { + if (warn_cxx_compat) + warning_at (init_loc, OPT_Wc___compat, + ("initializer-string for array chars " + "is too long for C++")); + if (len >= 2 * unit) + { + const char *p = TREE_STRING_POINTER (inside_init); + + len -= unit; + if (memcmp (p + len - unit, "\0\0\0\0", unit) == 0) + inside_init = build_string (len, p); + } + } } + TREE_TYPE (inside_init) = type; return inside_init; } else if (INTEGRAL_TYPE_P (typ1)) diff -pur gcc/testsuite/gcc.dg/strlenopt-49.c gcc/testsuite/gcc.dg/strlenopt-49.c --- gcc/testsuite/gcc.dg/strlenopt-49.c 2018-07-30 13:02:34.735478726 +0200 +++ gcc/testsuite/gcc.dg/strlenopt-49.c 2018-07-30 13:08:21.074859303 +0200 @@ -11,9 +11,6 @@ const char a3[3] = "12\0"; const char a8[8] = "1234567\0"; const char a9[9] = "12345678\0"; -const char ax[9] = "12345678\0\0\0\0"; /* { dg-warning "initializer-string for array of chars is too long" } */ -const char ay[9] = "\00012345678\0\0\0\0"; /* { dg-warning "initializer-string for array of chars is too long" } */ - int len1 (void) { @@ -27,27 +24,13 @@ int len (void) return len; } -int lenx (void) -{ - size_t lenx = strlen (ax); - return lenx; -} - -int leny (void) -{ - size_t leny = strlen (ay); - return leny; -} - int cmp88 (void) { int cmp88 = memcmp (a8, "1234567\0", sizeof a8); return cmp88; } -/* { dg-final { scan-tree-dump-times "strlen" 0 "gimple" { xfail *-*-* } } } - { dg-final { scan-tree-dump-times "len0 = 0;" 1 "gimple" { xfail *-*-* } } } - { dg-final { scan-tree-dump-times "len = 18;" 1 "gimple" { xfail *-*-* } } } - { dg-final { scan-tree-dump-times "lenx = 8;" 1 "gimple" { xfail *-*-* } } } - { dg-final { scan-tree-dump-times "leny = 0;" 1 "gimple" { xfail *-*-* } } } - { dg-final { scan-tree-dump-times "cmp88 = 0;" 1 "gimple" { xfail *-*-* } } } */ +/* { dg-final { scan-tree-dump-times "strlen" 0 "gimple" } } + { dg-final { scan-tree-dump-times "len0 = 0;" 1 "gimple" } } + { dg-final { scan-tree-dump-times "len = 18;" 1 "gimple" } } + { dg-final { scan-tree-dump-times "cmp88 = 0;" 1 "gimple" } } */