Message ID | DB6PR0701MB266438272BCFD277C6600399E4280@DB6PR0701MB2664.eurprd07.prod.outlook.com |
---|---|
State | New |
Headers | show |
Series | Fix wrong code with truncated string literals (PR 86711/86714) | expand |
On 07/29/2018 04:56 AM, Bernd Edlinger wrote: > Hi! > > This fixes two wrong code bugs where string_constant > returns over length string constants. Initializers > like that are rejected in C++, but valid in C. If by valid you are referring to declarations like the one in the added test: const char a[2][3] = { "1234", "xyz" }; then (as I explained), the excess elements in "1234" make the char[3] initialization and thus the test case undefined. I have resolved bug 86711 as invalid on those grounds. Bug 86711 has a valid test case that needs to be fixed, along with bug 86688 that I raised for the same underlying problem: considering the excess nul as part of the string. As has been discussed in a separate bug, rather than working around the excessively long strings in the middle-end, it would be preferable to avoid creating them to begin with. I'm already working on a fix for bug 86688, in part because I introduced the code change and also because I'm making other changes in this area -- bug 86552. Both of these in response to your comments. I would normally welcome someone else helping with my work but (as I already made clear last week) it's counteproductive to have two people working in the very same area at the same time, especially when they are working at cross purposes as you seem to be hell-bent on doing. > I have xfailed strlenopt-49.c, which tests this feature. That's not appropriate. The purpose of the test is to verify the fix for bug 86428: namely, that a call to strlen() on an array initialized with a string of the same length is folded, such as in: const char b[4] = "123\0"; That's a valid initializer that can and should continue to be folded. The test needs to continue to exercise that feature. The test also happens to exercise invalid/overlong initializers. This is because that, in my view, it's safer to fold the result of such calls to a constant than than to call the library function and have it either return an unpredictable value or perhaps even crash. > Personally I don't think that it is worth the effort to > optimize something that is per se invalid in C++. This is a C test, not C++. (I don't suppose you are actually saying that only the common subset between C and C++ is worth optimizing.) Just in case it isn't clear from the above: the point of the test exercising the behavior for overlong strings isn't optimizing undefined behavior but rather avoiding the worst consequences of it. I have already explained this once before so I'm starting to wonder if I'm being insufficiently clear or if you are not receiving or reading (or understanding) my responses. We can have a broader discussion about whether this is the best approach for GCC to take either in this instance or in general, but in the meantime I would appreciate it if you refrained from undoing my changes just because you don't agree with or don't understand the motivation behind them. Martin PS I continue to wonder about your motivation and ethics. It's rare to have someone so openly, blatantly and persistently try to undermine someone else's work.
On Sun, 29 Jul 2018, Martin Sebor wrote: > On 07/29/2018 04:56 AM, Bernd Edlinger wrote: > > Hi! > > > > This fixes two wrong code bugs where string_constant > > returns over length string constants. Initializers > > like that are rejected in C++, but valid in C. > > If by valid you are referring to declarations like the one in > the added test: > > const char a[2][3] = { "1234", "xyz" }; > > then (as I explained), the excess elements in "1234" make > the char[3] initialization and thus the test case undefined. > I have resolved bug 86711 as invalid on those grounds. > > Bug 86711 has a valid test case that needs to be fixed, along > with bug 86688 that I raised for the same underlying problem: > considering the excess nul as part of the string. As has been > discussed in a separate bug, rather than working around > the excessively long strings in the middle-end, it would be > preferable to avoid creating them to begin with. > > I'm already working on a fix for bug 86688, in part because > I introduced the code change and also because I'm making other > changes in this area -- bug 86552. Both of these in response > to your comments. > > I would normally welcome someone else helping with my work > but (as I already made clear last week) it's counteproductive > to have two people working in the very same area at the same > time, especially when they are working at cross purposes as > you seem to be hell-bent on doing. > > > I have xfailed strlenopt-49.c, which tests this feature. > > That's not appropriate. The purpose of the test is to verify > the fix for bug 86428: namely, that a call to strlen() on > an array initialized with a string of the same length is > folded, such as in: > > const char b[4] = "123\0"; > > That's a valid initializer that can and should continue to be > folded. The test needs to continue to exercise that feature. > > The test also happens to exercise invalid/overlong initializers. > This is because that, in my view, it's safer to fold the result > of such calls to a constant than than to call the library > function and have it either return an unpredictable value or > perhaps even crash. > > > Personally I don't think that it is worth the effort to > > optimize something that is per se invalid in C++. > > This is a C test, not C++. (I don't suppose you are actually > saying that only the common subset between C and C++ is worth > optimizing.) > > Just in case it isn't clear from the above: the point of > the test exercising the behavior for overlong strings isn't > optimizing undefined behavior but rather avoiding the worst > consequences of it. I have already explained this once > before so I'm starting to wonder if I'm being insufficiently > clear or if you are not receiving or reading (or understanding) > my responses. We can have a broader discussion about whether > this is the best approach for GCC to take either in this instance > or in general, but in the meantime I would appreciate it if you > refrained from undoing my changes just because you don't agree > with or don't understand the motivation behind them. > > Martin > > PS I continue to wonder about your motivation and ethics. It's > rare to have someone so openly, blatantly and persistently try > to undermine someone else's work. Martin, you are clearly the one being hostile here - Bernd is trying to help. In fact his patches are more focused, easier to undestand and thus easier to review. Get your feelings about "ownership" under control. Richard.
On 07/30/2018 12:57 AM, Richard Biener wrote: > On Sun, 29 Jul 2018, Martin Sebor wrote: > >> On 07/29/2018 04:56 AM, Bernd Edlinger wrote: >>> Hi! >>> >>> This fixes two wrong code bugs where string_constant >>> returns over length string constants. Initializers >>> like that are rejected in C++, but valid in C. >> >> If by valid you are referring to declarations like the one in >> the added test: >> >> const char a[2][3] = { "1234", "xyz" }; >> >> then (as I explained), the excess elements in "1234" make >> the char[3] initialization and thus the test case undefined. >> I have resolved bug 86711 as invalid on those grounds. >> >> Bug 86711 has a valid test case that needs to be fixed, along >> with bug 86688 that I raised for the same underlying problem: >> considering the excess nul as part of the string. As has been >> discussed in a separate bug, rather than working around >> the excessively long strings in the middle-end, it would be >> preferable to avoid creating them to begin with. >> >> I'm already working on a fix for bug 86688, in part because >> I introduced the code change and also because I'm making other >> changes in this area -- bug 86552. Both of these in response >> to your comments. >> >> I would normally welcome someone else helping with my work >> but (as I already made clear last week) it's counteproductive >> to have two people working in the very same area at the same >> time, especially when they are working at cross purposes as >> you seem to be hell-bent on doing. >> >>> I have xfailed strlenopt-49.c, which tests this feature. >> >> That's not appropriate. The purpose of the test is to verify >> the fix for bug 86428: namely, that a call to strlen() on >> an array initialized with a string of the same length is >> folded, such as in: >> >> const char b[4] = "123\0"; >> >> That's a valid initializer that can and should continue to be >> folded. The test needs to continue to exercise that feature. >> >> The test also happens to exercise invalid/overlong initializers. >> This is because that, in my view, it's safer to fold the result >> of such calls to a constant than than to call the library >> function and have it either return an unpredictable value or >> perhaps even crash. >> >>> Personally I don't think that it is worth the effort to >>> optimize something that is per se invalid in C++. >> >> This is a C test, not C++. (I don't suppose you are actually >> saying that only the common subset between C and C++ is worth >> optimizing.) >> >> Just in case it isn't clear from the above: the point of >> the test exercising the behavior for overlong strings isn't >> optimizing undefined behavior but rather avoiding the worst >> consequences of it. I have already explained this once >> before so I'm starting to wonder if I'm being insufficiently >> clear or if you are not receiving or reading (or understanding) >> my responses. We can have a broader discussion about whether >> this is the best approach for GCC to take either in this instance >> or in general, but in the meantime I would appreciate it if you >> refrained from undoing my changes just because you don't agree >> with or don't understand the motivation behind them. >> >> Martin >> >> PS I continue to wonder about your motivation and ethics. It's >> rare to have someone so openly, blatantly and persistently try >> to undermine someone else's work. > > Martin, you are clearly the one being hostile here - Bernd is trying > to help. In fact his patches are more focused, easier to undestand > and thus easier to review. As I explained, it's unhelpful for two people to making changes to the same code at the same time, especially when one is undoing the other one's changes. I have made it clear that I am working in this area -- I welcome and address valid feedback but I can't very well do that while someone is compromising my work. I appreciate test cases and suggestions for improvements but please avoid making changes to this code while I'm working on it. It is not helpful. Martin
On 07/30/18 01:05, Martin Sebor wrote: > On 07/29/2018 04:56 AM, Bernd Edlinger wrote: >> Hi! >> >> This fixes two wrong code bugs where string_constant >> returns over length string constants. Initializers >> like that are rejected in C++, but valid in C. > > If by valid you are referring to declarations like the one in > the added test: > > const char a[2][3] = { "1234", "xyz" }; > > then (as I explained), the excess elements in "1234" make > the char[3] initialization and thus the test case undefined. > I have resolved bug 86711 as invalid on those grounds. > > Bug 86711 has a valid test case that needs to be fixed, along > with bug 86688 that I raised for the same underlying problem: > considering the excess nul as part of the string. As has been > discussed in a separate bug, rather than working around > the excessively long strings in the middle-end, it would be > preferable to avoid creating them to begin with. > > I'm already working on a fix for bug 86688, in part because > I introduced the code change and also because I'm making other > changes in this area -- bug 86552. Both of these in response > to your comments. > Sorry, I must admit, I have completely lost track on how many things you are trying to work in parallel. Nevertheless I started to review you pr86552 patch here: https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01593.html But so far you did not respond to me. Well actually I doubt your patch does apply to trunk, maybe you start to re-base that one, and post it again instead? Thanks Bernd.
On 07/30/2018 09:24 AM, Bernd Edlinger wrote: > On 07/30/18 01:05, Martin Sebor wrote: >> On 07/29/2018 04:56 AM, Bernd Edlinger wrote: >>> Hi! >>> >>> This fixes two wrong code bugs where string_constant >>> returns over length string constants. Initializers >>> like that are rejected in C++, but valid in C. >> >> If by valid you are referring to declarations like the one in >> the added test: >> >> const char a[2][3] = { "1234", "xyz" }; >> >> then (as I explained), the excess elements in "1234" make >> the char[3] initialization and thus the test case undefined. >> I have resolved bug 86711 as invalid on those grounds. >> >> Bug 86711 has a valid test case that needs to be fixed, along >> with bug 86688 that I raised for the same underlying problem: >> considering the excess nul as part of the string. As has been >> discussed in a separate bug, rather than working around >> the excessively long strings in the middle-end, it would be >> preferable to avoid creating them to begin with. >> >> I'm already working on a fix for bug 86688, in part because >> I introduced the code change and also because I'm making other >> changes in this area -- bug 86552. Both of these in response >> to your comments. >> > > Sorry, I must admit, I have completely lost track on how many things > you are trying to work in parallel. > > Nevertheless I started to review you pr86552 patch here: > https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01593.html > > But so far you did not respond to me. > > Well actually I doubt your patch does apply to trunk, > maybe you start to re-base that one, and post it again > instead? I read your comments and have been busy working on enhancing the patch (among other things). There are a large number of additional contexts where constant strings are expected and where a missing nul needs to be detected. Some include additional instances of strlen calls that my initial patch didn't handle, many more others that involve other string functions. I have posted an updated patch that applies cleanly and that handles the first set. There is also a class of problems involving constant character arrays initialized by a braced list, as in char [] = { x, y, z }; Those are currently not recognized as strings even if they are nul-terminated, but they are far more likely to be a source of these problems than string literals, certainly in C++ where string initializers must fit in the array. I am testing a patch to convert those into STRING_CST so they can be handled as well. Since initializing arrays with more elements than fit is undefined in C and since the behavior is undefined at compile time it seems to me that rejecting such initializers with a hard error (as opposed to a warning) would be appropriate and obviate having to deal with them in the middle-end. Martin
On 07/30/18 21:52, Martin Sebor wrote: > On 07/30/2018 09:24 AM, Bernd Edlinger wrote: >> On 07/30/18 01:05, Martin Sebor wrote: >>> On 07/29/2018 04:56 AM, Bernd Edlinger wrote: >>>> Hi! >>>> >>>> This fixes two wrong code bugs where string_constant >>>> returns over length string constants. Initializers >>>> like that are rejected in C++, but valid in C. >>> >>> If by valid you are referring to declarations like the one in >>> the added test: >>> >>> const char a[2][3] = { "1234", "xyz" }; >>> >>> then (as I explained), the excess elements in "1234" make >>> the char[3] initialization and thus the test case undefined. >>> I have resolved bug 86711 as invalid on those grounds. >>> >>> Bug 86711 has a valid test case that needs to be fixed, along >>> with bug 86688 that I raised for the same underlying problem: >>> considering the excess nul as part of the string. As has been >>> discussed in a separate bug, rather than working around >>> the excessively long strings in the middle-end, it would be >>> preferable to avoid creating them to begin with. >>> >>> I'm already working on a fix for bug 86688, in part because >>> I introduced the code change and also because I'm making other >>> changes in this area -- bug 86552. Both of these in response >>> to your comments. >>> >> >> Sorry, I must admit, I have completely lost track on how many things >> you are trying to work in parallel. >> >> Nevertheless I started to review you pr86552 patch here: >> https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01593.html >> >> But so far you did not respond to me. >> >> Well actually I doubt your patch does apply to trunk, >> maybe you start to re-base that one, and post it again >> instead? > > I read your comments and have been busy working on enhancing > the patch (among other things). There are a large number of > additional contexts where constant strings are expected and > where a missing nul needs to be detected. Some include > additional instances of strlen calls that my initial patch > didn't handle, many more others that involve other string > functions. I have posted an updated patch that applies > cleanly and that handles the first set. > > There is also a class of problems involving constant character > arrays initialized by a braced list, as in char [] = { x, y, z }; > Those are currently not recognized as strings even if they are > nul-terminated, but they are far more likely to be a source of > these problems than string literals, certainly in C++ where > string initializers must fit in the array. I am testing a patch > to convert those into STRING_CST so they can be handled as well. > > Since initializing arrays with more elements than fit is > undefined in C and since the behavior is undefined at compile > time it seems to me that rejecting such initializers with > a hard error (as opposed to a warning) would be appropriate > and obviate having to deal with them in the middle-end. > We do not want to change what is currently accepted by the front end. period. But there is no reason why ambiguous string constants have to be passed to the middle end. For instance char c[2] = "a\0"; should look like c[1] = "a"; while c[2] = "aaa"; should look like c[2] = "aa"; varasm.c will cut the excess precision off anyway. That is TREE_STRING_LENGTH (str) == 3 and TREE_STRING_POINTER(str) = "aa\0"; I propose to have all STRING_CST always be created by the FE with explicit nul termination, but the TYPE_SIZE_UNIT (TREE_TYPE (str)) >= TREE_STRING_LENGTH(str) in normal case (null-terminated) TREE_SIZE_UNIT (TREE_TYPE (str)) < TREE_STRING_LENGTH(str) if non zero terminated, truncated in the initializer. Do you understand what I mean? Bernd.
On 07/30/2018 02:21 PM, Bernd Edlinger wrote: > On 07/30/18 21:52, Martin Sebor wrote: >> On 07/30/2018 09:24 AM, Bernd Edlinger wrote: >>> On 07/30/18 01:05, Martin Sebor wrote: >>>> On 07/29/2018 04:56 AM, Bernd Edlinger wrote: >>>>> Hi! >>>>> >>>>> This fixes two wrong code bugs where string_constant >>>>> returns over length string constants. Initializers >>>>> like that are rejected in C++, but valid in C. >>>> >>>> If by valid you are referring to declarations like the one in >>>> the added test: >>>> >>>> const char a[2][3] = { "1234", "xyz" }; >>>> >>>> then (as I explained), the excess elements in "1234" make >>>> the char[3] initialization and thus the test case undefined. >>>> I have resolved bug 86711 as invalid on those grounds. >>>> >>>> Bug 86711 has a valid test case that needs to be fixed, along >>>> with bug 86688 that I raised for the same underlying problem: >>>> considering the excess nul as part of the string. As has been >>>> discussed in a separate bug, rather than working around >>>> the excessively long strings in the middle-end, it would be >>>> preferable to avoid creating them to begin with. >>>> >>>> I'm already working on a fix for bug 86688, in part because >>>> I introduced the code change and also because I'm making other >>>> changes in this area -- bug 86552. Both of these in response >>>> to your comments. >>>> >>> >>> Sorry, I must admit, I have completely lost track on how many things >>> you are trying to work in parallel. >>> >>> Nevertheless I started to review you pr86552 patch here: >>> https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01593.html >>> >>> But so far you did not respond to me. >>> >>> Well actually I doubt your patch does apply to trunk, >>> maybe you start to re-base that one, and post it again >>> instead? >> >> I read your comments and have been busy working on enhancing >> the patch (among other things). There are a large number of >> additional contexts where constant strings are expected and >> where a missing nul needs to be detected. Some include >> additional instances of strlen calls that my initial patch >> didn't handle, many more others that involve other string >> functions. I have posted an updated patch that applies >> cleanly and that handles the first set. >> >> There is also a class of problems involving constant character >> arrays initialized by a braced list, as in char [] = { x, y, z }; >> Those are currently not recognized as strings even if they are >> nul-terminated, but they are far more likely to be a source of >> these problems than string literals, certainly in C++ where >> string initializers must fit in the array. I am testing a patch >> to convert those into STRING_CST so they can be handled as well. >> >> Since initializing arrays with more elements than fit is >> undefined in C and since the behavior is undefined at compile >> time it seems to me that rejecting such initializers with >> a hard error (as opposed to a warning) would be appropriate >> and obviate having to deal with them in the middle-end. >> > > We do not want to change what is currently accepted by the > front end. period. On whose behalf are you making such categorical statements? It was Jakub and Richard's suggestion in bug 86714 to reject the undefined excessive initializers and I happen to like the idea. I don't recall anyone making a decision about what "we" do or don't want to change. That said, if rejecting such initializers is not acceptable an alternate solution that I believe Richard preferred is to trim excess elements early on, e.g., during gimplification (or at some other point after the front-end is done). That's okay with me too. > > But there is no reason why ambiguous string constants > have to be passed to the middle end. > > For instance char c[2] = "a\0"; should look like c[1] = "a"; > while c[2] = "aaa"; should look like c[2] = "aa"; varasm.c > will cut the excess precision off anyway. > > That is TREE_STRING_LENGTH (str) == 3 and TREE_STRING_POINTER(str) = "aa\0"; > > I propose to have all STRING_CST always be created by the > FE with explicit nul termination, but the > TYPE_SIZE_UNIT (TREE_TYPE (str)) >= TREE_STRING_LENGTH(str) in normal case (null-terminated) > TREE_SIZE_UNIT (TREE_TYPE (str)) < TREE_STRING_LENGTH(str) if non zero terminated, > truncated in the initializer. > > Do you understand what I mean? I don't insist on any particular internal representation as long as it makes it possible to detect and diagnose common bugs, and (ideally) also help mitigate their worst consequences. Martin
On Mon, Jul 30, 2018 at 10:01:38PM -0600, Martin Sebor wrote: > > We do not want to change what is currently accepted by the > > front end. period. > > On whose behalf are you making such categorical statements? > It was Jakub and Richard's suggestion in bug 86714 to reject > the undefined excessive initializers and I happen to like > the idea. I don't recall anyone making a decision about what It was not my suggestion and it is already rejected with -pedantic-errors, which is all that is needed, reject it in pedantic mode. Otherwise there is a warning emitted by default which is enough. The suggestion was that if we don't reject (and no reason to change when we reject it), that we handle it right, which is what your optimization broke. Jakub
On 07/30/2018 02:21 PM, Bernd Edlinger wrote: > On 07/30/18 21:52, Martin Sebor wrote: >> On 07/30/2018 09:24 AM, Bernd Edlinger wrote: >>> On 07/30/18 01:05, Martin Sebor wrote: >>>> On 07/29/2018 04:56 AM, Bernd Edlinger wrote: >>>>> Hi! >>>>> >>>>> This fixes two wrong code bugs where string_constant >>>>> returns over length string constants. Initializers >>>>> like that are rejected in C++, but valid in C. >>>> >>>> If by valid you are referring to declarations like the one in >>>> the added test: >>>> >>>> const char a[2][3] = { "1234", "xyz" }; >>>> >>>> then (as I explained), the excess elements in "1234" make >>>> the char[3] initialization and thus the test case undefined. >>>> I have resolved bug 86711 as invalid on those grounds. >>>> >>>> Bug 86711 has a valid test case that needs to be fixed, along >>>> with bug 86688 that I raised for the same underlying problem: >>>> considering the excess nul as part of the string. As has been >>>> discussed in a separate bug, rather than working around >>>> the excessively long strings in the middle-end, it would be >>>> preferable to avoid creating them to begin with. >>>> >>>> I'm already working on a fix for bug 86688, in part because >>>> I introduced the code change and also because I'm making other >>>> changes in this area -- bug 86552. Both of these in response >>>> to your comments. >>>> >>> >>> Sorry, I must admit, I have completely lost track on how many things >>> you are trying to work in parallel. >>> >>> Nevertheless I started to review you pr86552 patch here: >>> https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01593.html >>> >>> But so far you did not respond to me. >>> >>> Well actually I doubt your patch does apply to trunk, >>> maybe you start to re-base that one, and post it again >>> instead? >> >> I read your comments and have been busy working on enhancing >> the patch (among other things). There are a large number of >> additional contexts where constant strings are expected and >> where a missing nul needs to be detected. Some include >> additional instances of strlen calls that my initial patch >> didn't handle, many more others that involve other string >> functions. I have posted an updated patch that applies >> cleanly and that handles the first set. >> >> There is also a class of problems involving constant character >> arrays initialized by a braced list, as in char [] = { x, y, z }; >> Those are currently not recognized as strings even if they are >> nul-terminated, but they are far more likely to be a source of >> these problems than string literals, certainly in C++ where >> string initializers must fit in the array. I am testing a patch >> to convert those into STRING_CST so they can be handled as well. >> >> Since initializing arrays with more elements than fit is >> undefined in C and since the behavior is undefined at compile >> time it seems to me that rejecting such initializers with >> a hard error (as opposed to a warning) would be appropriate >> and obviate having to deal with them in the middle-end. >> > > We do not want to change what is currently accepted by the > front end. period. ?!? I don't follow this. When we can detect an error, we should issue a suitable diagnostic. As is mentioned later in the thread, some diagnostics are considered "pedantic" and are conditional. But I don't see how you get to "We do not want to change what is currently accepted by the front end. period." That seems like far too strong of a statement. I'm not sure I agree with this being a pedantic error only. See below... > > But there is no reason why ambiguous string constants > have to be passed to the middle end. Agreed -- if we're not outright rejecting, then we should present the middle end with something reasonable. But.... > > For instance char c[2] = "a\0"; should look like c[1] = "a"; > while c[2] = "aaa"; should look like c[2] = "aa"; varasm.c > will cut the excess precision off anyway. I get the first case. The second is less clear. One could argue that c[1] should be NUL or one could argue that c[1] should be 'a'. It's the inability to know what is "more correct" and the security implications of leaving the string unterminated that drives my opinion that this should not be a pedantic error, but instead a hard error. > > That is TREE_STRING_LENGTH (str) == 3 and TREE_STRING_POINTER(str) = "aa\0"; See above. That's going to leave an unterminated string in the resulting code. That has a negative security impact. > > I propose to have all STRING_CST always be created by the > FE with explicit nul termination, but the > TYPE_SIZE_UNIT (TREE_TYPE (str)) >= TREE_STRING_LENGTH(str) in normal case (null-terminated) > TREE_SIZE_UNIT (TREE_TYPE (str)) < TREE_STRING_LENGTH(str) if non zero terminated, > truncated in the initializer. > > Do you understand what I mean? I think you're starting from the wrong basis, namely that we shouldn't be issuing a hard error for excess initializers like this when we don't have a clear cut best way to handle it. jeff
On 07/31/2018 12:33 AM, Jakub Jelinek wrote: > On Mon, Jul 30, 2018 at 10:01:38PM -0600, Martin Sebor wrote: >>> We do not want to change what is currently accepted by the >>> front end. period. >> >> On whose behalf are you making such categorical statements? >> It was Jakub and Richard's suggestion in bug 86714 to reject >> the undefined excessive initializers and I happen to like >> the idea. I don't recall anyone making a decision about what > > It was not my suggestion and it is already rejected with -pedantic-errors, > which is all that is needed, reject it in pedantic mode. > Otherwise there is a warning emitted by default which is enough. But I think that's a mistake. I think a hard error at this point is warranted and the safest thing to do. > > The suggestion was that if we don't reject (and no reason to change when we > reject it), that we handle it right, which is what your optimization broke. But it's not always clear what is right. That's my concern. If we had rules which were clearly right, then applying those rules and continuing is much more reasonable. jeff
On 07/30/2018 01:52 PM, Martin Sebor wrote: > On 07/30/2018 09:24 AM, Bernd Edlinger wrote: >> On 07/30/18 01:05, Martin Sebor wrote: >>> On 07/29/2018 04:56 AM, Bernd Edlinger wrote: >>>> Hi! >>>> >>>> This fixes two wrong code bugs where string_constant >>>> returns over length string constants. Initializers >>>> like that are rejected in C++, but valid in C. >>> >>> If by valid you are referring to declarations like the one in >>> the added test: >>> >>> const char a[2][3] = { "1234", "xyz" }; >>> >>> then (as I explained), the excess elements in "1234" make >>> the char[3] initialization and thus the test case undefined. >>> I have resolved bug 86711 as invalid on those grounds. >>> >>> Bug 86711 has a valid test case that needs to be fixed, along >>> with bug 86688 that I raised for the same underlying problem: >>> considering the excess nul as part of the string. As has been >>> discussed in a separate bug, rather than working around >>> the excessively long strings in the middle-end, it would be >>> preferable to avoid creating them to begin with. >>> >>> I'm already working on a fix for bug 86688, in part because >>> I introduced the code change and also because I'm making other >>> changes in this area -- bug 86552. Both of these in response >>> to your comments. >>> >> >> Sorry, I must admit, I have completely lost track on how many things >> you are trying to work in parallel. >> >> Nevertheless I started to review you pr86552 patch here: >> https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01593.html >> >> But so far you did not respond to me. >> >> Well actually I doubt your patch does apply to trunk, >> maybe you start to re-base that one, and post it again >> instead? > > I read your comments and have been busy working on enhancing > the patch (among other things). There are a large number of > additional contexts where constant strings are expected and > where a missing nul needs to be detected. Some include > additional instances of strlen calls that my initial patch > didn't handle, many more others that involve other string > functions. I have posted an updated patch that applies > cleanly and that handles the first set. So without seeing the code my worry here is we end up with a patch that gets increasingly complex because it's trying to handle a large number of cases. If at all possible let's try to make incremental improvements, focusing first on correctness issues. > > There is also a class of problems involving constant character > arrays initialized by a braced list, as in char [] = { x, y, z }; > Those are currently not recognized as strings even if they are > nul-terminated, but they are far more likely to be a source of > these problems than string literals, certainly in C++ where > string initializers must fit in the array. I am testing a patch > to convert those into STRING_CST so they can be handled as well. This feels like an independent, but very useful change. > Since initializing arrays with more elements than fit is > undefined in C and since the behavior is undefined at compile > time it seems to me that rejecting such initializers with > a hard error (as opposed to a warning) would be appropriate > and obviate having to deal with them in the middle-end. I tend to agree when there's no good set of rules we can apply to allow compilation to continue. However, I think that means getting some consensus to change existing practice where this is just a pedantic error. jeff
On Fri, Aug 03, 2018 at 03:16:41PM -0600, Jeff Law wrote: > On 07/31/2018 12:33 AM, Jakub Jelinek wrote: > > On Mon, Jul 30, 2018 at 10:01:38PM -0600, Martin Sebor wrote: > >>> We do not want to change what is currently accepted by the > >>> front end. period. > >> > >> On whose behalf are you making such categorical statements? > >> It was Jakub and Richard's suggestion in bug 86714 to reject > >> the undefined excessive initializers and I happen to like > >> the idea. I don't recall anyone making a decision about what > > > > It was not my suggestion and it is already rejected with -pedantic-errors, > > which is all that is needed, reject it in pedantic mode. > > Otherwise there is a warning emitted by default which is enough. > But I think that's a mistake. I think a hard error at this point is > warranted and the safest thing to do. The normal behavior when it isn't an error is that the initializer is truncated. That is what happens if I use struct S { char a[4]; }; struct S r = { "abc" }; struct S s = { "abcd" }; struct S t = { "abcde" }; C says that in the s.a initializer is actually just 'a', 'b', 'c', 'd' rather than 'a', 'b', 'c', 'd', '\0', so there is a silent truncation in the language naturally, so the extension that truncates even more with a warning enabled by default is IMHO natural and doesn't need to be more pedantic. We've always truncated, so there should be no surprises. > > The suggestion was that if we don't reject (and no reason to change when we > > reject it), that we handle it right, which is what your optimization broke. > But it's not always clear what is right. That's my concern. If we had > rules which were clearly right, then applying those rules and continuing > is much more reasonable. Perhaps we haven't written them down, but we've always behaved that way. clang also truncates with a warning enabled by default, only rejects it like gcc with -pedantic-errors. So does icc. Jakub
On 08/03/18 23:15, Jeff Law wrote: > On 07/30/2018 02:21 PM, Bernd Edlinger wrote: >> On 07/30/18 21:52, Martin Sebor wrote: >>> On 07/30/2018 09:24 AM, Bernd Edlinger wrote: >>>> On 07/30/18 01:05, Martin Sebor wrote: >>>>> On 07/29/2018 04:56 AM, Bernd Edlinger wrote: >>>>>> Hi! >>>>>> >>>>>> This fixes two wrong code bugs where string_constant >>>>>> returns over length string constants. Initializers >>>>>> like that are rejected in C++, but valid in C. >>>>> >>>>> If by valid you are referring to declarations like the one in >>>>> the added test: >>>>> >>>>> const char a[2][3] = { "1234", "xyz" }; >>>>> >>>>> then (as I explained), the excess elements in "1234" make >>>>> the char[3] initialization and thus the test case undefined. >>>>> I have resolved bug 86711 as invalid on those grounds. >>>>> >>>>> Bug 86711 has a valid test case that needs to be fixed, along >>>>> with bug 86688 that I raised for the same underlying problem: >>>>> considering the excess nul as part of the string. As has been >>>>> discussed in a separate bug, rather than working around >>>>> the excessively long strings in the middle-end, it would be >>>>> preferable to avoid creating them to begin with. >>>>> >>>>> I'm already working on a fix for bug 86688, in part because >>>>> I introduced the code change and also because I'm making other >>>>> changes in this area -- bug 86552. Both of these in response >>>>> to your comments. >>>>> >>>> >>>> Sorry, I must admit, I have completely lost track on how many things >>>> you are trying to work in parallel. >>>> >>>> Nevertheless I started to review you pr86552 patch here: >>>> https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01593.html >>>> >>>> But so far you did not respond to me. >>>> >>>> Well actually I doubt your patch does apply to trunk, >>>> maybe you start to re-base that one, and post it again >>>> instead? >>> >>> I read your comments and have been busy working on enhancing >>> the patch (among other things). There are a large number of >>> additional contexts where constant strings are expected and >>> where a missing nul needs to be detected. Some include >>> additional instances of strlen calls that my initial patch >>> didn't handle, many more others that involve other string >>> functions. I have posted an updated patch that applies >>> cleanly and that handles the first set. >>> >>> There is also a class of problems involving constant character >>> arrays initialized by a braced list, as in char [] = { x, y, z }; >>> Those are currently not recognized as strings even if they are >>> nul-terminated, but they are far more likely to be a source of >>> these problems than string literals, certainly in C++ where >>> string initializers must fit in the array. I am testing a patch >>> to convert those into STRING_CST so they can be handled as well. >>> >>> Since initializing arrays with more elements than fit is >>> undefined in C and since the behavior is undefined at compile >>> time it seems to me that rejecting such initializers with >>> a hard error (as opposed to a warning) would be appropriate >>> and obviate having to deal with them in the middle-end. >>> >> >> We do not want to change what is currently accepted by the >> front end. period. > ?!? I don't follow this. When we can detect an error, we should issue > a suitable diagnostic. As is mentioned later in the thread, some > diagnostics are considered "pedantic" and are conditional. But I don't > see how you get to "We do not want to change what is currently accepted > by the front end. period." That seems like far too strong of a statement. > What I mean here is: we should fix a middle-end consistency issue with the string constants. When that is done, we can decide to make change a pedantic error into a hard error, but IMHO it should be an independent patch of it's own. By the way, I found that Fortran has non-zero terminated strings with index range starting from 1. Then there are also overlength strings in Fortran, that have excess precision. Should we forbid that Fortran feature as well? Then Ada and Go do not have zero-terminated strings, but I do not know if these can hit the strlen pass. C++ has a -fpermissive option that also allows overlength strings, but all test cases are C only. etc. etc. > I'm not sure I agree with this being a pedantic error only. See below... >> >> But there is no reason why ambiguous string constants >> have to be passed to the middle end. > Agreed -- if we're not outright rejecting, then we should present the > middle end with something reasonable. But.... > >> >> For instance char c[2] = "a\0"; should look like c[1] = "a"; >> while c[2] = "aaa"; should look like c[2] = "aa"; varasm.c >> will cut the excess precision off anyway. > I get the first case. The second is less clear. One could argue that > c[1] should be NUL or one could argue that c[1] should be 'a'. It's > the inability to know what is "more correct" and the security > implications of leaving the string unterminated that drives my opinion > that this should not be a pedantic error, but instead a hard error. > Yes, in the moment I just look at how C works, and try to make that the normal. However I think those details can be discussed. >> >> That is TREE_STRING_LENGTH (str) == 3 and TREE_STRING_POINTER(str) = "aa\0"; > See above. That's going to leave an unterminated string in the > resulting code. That has a negative security impact. > >> >> I propose to have all STRING_CST always be created by the >> FE with explicit nul termination, but the >> TYPE_SIZE_UNIT (TREE_TYPE (str)) >= TREE_STRING_LENGTH(str) in normal case (null-terminated) >> TREE_SIZE_UNIT (TREE_TYPE (str)) < TREE_STRING_LENGTH(str) if non zero terminated, >> truncated in the initializer. >> >> Do you understand what I mean? > I think you're starting from the wrong basis, namely that we shouldn't > be issuing a hard error for excess initializers like this when we don't > have a clear cut best way to handle it. > > jeff >
Hi, I'd like to ping for this patch: https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01800.html I will add a new BZ entry for the (minor) regression this patch introduces in gcc.dg/strlenopt-49.c and assign it to myself. Thanks Bernd. On 07/29/18 12:56, Bernd Edlinger wrote: > Hi! > > This fixes two wrong code bugs where string_constant > returns over length string constants. Initializers > like that are rejected in C++, but valid in C. > > I have xfailed strlenopt-49.c, which tests this feature. > Personally I don't think that it is worth the effort to > optimize something that is per se invalid in C++. > > The hunk in builtins.c is unrelated, but I would like > to use tree_to_shwi here, to be in line with the > tree_fits_shwi_p check above, and the rest of that > function which uses signed HWI throughout. > > > Bootstrapped and reg-tested on x86_64-pc-linux-gnu. > Is it OK for trunk? > > > Thanks > Bernd.
As I said below, the patch for PR 86552, 86711, 86714 that was first posted on July 19 fixes both of these issues and also diagnoses calls with unterminated strings: https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00155.html On 08/12/2018 03:06 AM, Bernd Edlinger wrote: > Hi, > > I'd like to ping for this patch: https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01800.html > > I will add a new BZ entry for the (minor) regression this patch > introduces in gcc.dg/strlenopt-49.c and assign it to myself. > > > Thanks > Bernd. > > On 07/29/18 12:56, Bernd Edlinger wrote: >> Hi! >> >> This fixes two wrong code bugs where string_constant >> returns over length string constants. Initializers >> like that are rejected in C++, but valid in C. >> >> I have xfailed strlenopt-49.c, which tests this feature. >> Personally I don't think that it is worth the effort to >> optimize something that is per se invalid in C++. >> >> The hunk in builtins.c is unrelated, but I would like >> to use tree_to_shwi here, to be in line with the >> tree_fits_shwi_p check above, and the rest of that >> function which uses signed HWI throughout. >> >> >> Bootstrapped and reg-tested on x86_64-pc-linux-gnu. >> Is it OK for trunk? >> >> >> Thanks >> Bernd.
On 08/13/18 16:27, Martin Sebor wrote: > As I said below, the patch for PR 86552, 86711, 86714 that > was first posted on July 19 fixes both of these issues and > also diagnoses calls with unterminated strings: > > https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00155.html > Sorry, but you you read my comment on your patch here: https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00184.html ???? So I am strictly opposed to implementing new warnings now, which just papers over existing design issues in this area. I think the most important problem in the strlen folding is that c_strlen and get_range_strlen do not have the the information what kind of string length they should return, (as measured in char/wchar16/wchar32) Most of the time only length in char makes any sense. Only for sprintf processing a different char type may be requested, but the sprintf code in the middle end does not know which is the wchar type. I have sent a followup patch that passes char size units to c_strlen and get_range_strlen here: https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00795.html I hope you understand, that I would like to fix design issues before implementing new features, since that will obviously become more difficult when even more features are being implemented without fixing some basics first. Bernd.
On 08/03/2018 03:28 PM, Jakub Jelinek wrote: > On Fri, Aug 03, 2018 at 03:16:41PM -0600, Jeff Law wrote: >> On 07/31/2018 12:33 AM, Jakub Jelinek wrote: >>> On Mon, Jul 30, 2018 at 10:01:38PM -0600, Martin Sebor wrote: >>>>> We do not want to change what is currently accepted by the >>>>> front end. period. >>>> >>>> On whose behalf are you making such categorical statements? >>>> It was Jakub and Richard's suggestion in bug 86714 to reject >>>> the undefined excessive initializers and I happen to like >>>> the idea. I don't recall anyone making a decision about what >>> >>> It was not my suggestion and it is already rejected with -pedantic-errors, >>> which is all that is needed, reject it in pedantic mode. >>> Otherwise there is a warning emitted by default which is enough. >> But I think that's a mistake. I think a hard error at this point is >> warranted and the safest thing to do. > > The normal behavior when it isn't an error is that the initializer is > truncated. That is what happens if I use > struct S { char a[4]; }; > struct S r = { "abc" }; > struct S s = { "abcd" }; > struct S t = { "abcde" }; > > C says that in the s.a initializer is actually just 'a', 'b', > 'c', 'd' rather than 'a', 'b', 'c', 'd', '\0', so there is a silent > truncation in the language naturally, so the extension that truncates even > more with a warning enabled by default is IMHO natural and doesn't need to > be more pedantic. We've always truncated, so there should be no surprises. I wasn't aware C mandated a behavior here. With that in mind I think we truncate per the C semantics and get on with our lives :-) Jeff
On 08/03/2018 03:38 PM, Bernd Edlinger wrote: > On 08/03/18 23:15, Jeff Law wrote: >> On 07/30/2018 02:21 PM, Bernd Edlinger wrote: >>> On 07/30/18 21:52, Martin Sebor wrote: >>>> On 07/30/2018 09:24 AM, Bernd Edlinger wrote: >>>>> On 07/30/18 01:05, Martin Sebor wrote: >>>>>> On 07/29/2018 04:56 AM, Bernd Edlinger wrote: >>>>>>> Hi! >>>>>>> >>>>>>> This fixes two wrong code bugs where string_constant >>>>>>> returns over length string constants. Initializers >>>>>>> like that are rejected in C++, but valid in C. >>>>>> >>>>>> If by valid you are referring to declarations like the one in >>>>>> the added test: >>>>>> >>>>>> const char a[2][3] = { "1234", "xyz" }; >>>>>> >>>>>> then (as I explained), the excess elements in "1234" make >>>>>> the char[3] initialization and thus the test case undefined. >>>>>> I have resolved bug 86711 as invalid on those grounds. >>>>>> >>>>>> Bug 86711 has a valid test case that needs to be fixed, along >>>>>> with bug 86688 that I raised for the same underlying problem: >>>>>> considering the excess nul as part of the string. As has been >>>>>> discussed in a separate bug, rather than working around >>>>>> the excessively long strings in the middle-end, it would be >>>>>> preferable to avoid creating them to begin with. >>>>>> >>>>>> I'm already working on a fix for bug 86688, in part because >>>>>> I introduced the code change and also because I'm making other >>>>>> changes in this area -- bug 86552. Both of these in response >>>>>> to your comments. >>>>>> >>>>> >>>>> Sorry, I must admit, I have completely lost track on how many things >>>>> you are trying to work in parallel. >>>>> >>>>> Nevertheless I started to review you pr86552 patch here: >>>>> https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01593.html >>>>> >>>>> But so far you did not respond to me. >>>>> >>>>> Well actually I doubt your patch does apply to trunk, >>>>> maybe you start to re-base that one, and post it again >>>>> instead? >>>> >>>> I read your comments and have been busy working on enhancing >>>> the patch (among other things). There are a large number of >>>> additional contexts where constant strings are expected and >>>> where a missing nul needs to be detected. Some include >>>> additional instances of strlen calls that my initial patch >>>> didn't handle, many more others that involve other string >>>> functions. I have posted an updated patch that applies >>>> cleanly and that handles the first set. >>>> >>>> There is also a class of problems involving constant character >>>> arrays initialized by a braced list, as in char [] = { x, y, z }; >>>> Those are currently not recognized as strings even if they are >>>> nul-terminated, but they are far more likely to be a source of >>>> these problems than string literals, certainly in C++ where >>>> string initializers must fit in the array. I am testing a patch >>>> to convert those into STRING_CST so they can be handled as well. >>>> >>>> Since initializing arrays with more elements than fit is >>>> undefined in C and since the behavior is undefined at compile >>>> time it seems to me that rejecting such initializers with >>>> a hard error (as opposed to a warning) would be appropriate >>>> and obviate having to deal with them in the middle-end. >>>> >>> >>> We do not want to change what is currently accepted by the >>> front end. period. >> ?!? I don't follow this. When we can detect an error, we should issue >> a suitable diagnostic. As is mentioned later in the thread, some >> diagnostics are considered "pedantic" and are conditional. But I don't >> see how you get to "We do not want to change what is currently accepted >> by the front end. period." That seems like far too strong of a statement. >> > > What I mean here is: we should fix a middle-end consistency issue with > the string constants. When that is done, we can decide to make > change a pedantic error into a hard error, but IMHO it should be an independent > patch of it's own. Note Jakub has indicated why changing this to a hard error would be a mistake and what the semantics ought to be in this scenario. so ignore what I said about turning this into a hard error. I think we do truncation per the defined semantics and present the middle end with "canonicalized" literals. > > By the way, I found that Fortran has non-zero terminated strings with > index range starting from 1. > Then there are also overlength strings in Fortran, that have excess precision. > Should we forbid that Fortran feature as well? I have no clue. > > Then Ada and Go do not have zero-terminated strings, but I do not know > if these can hit the strlen pass. I wouldn't think so, but maybe in code that's meant to interoperate with C. Or if they implemented a C looking strlen and GCC recognized it and turned it into a strlen call. I don't think we do anything like that now, but you never know when someone might add it. jeff
Hi! After the other patch has been applied, I re-based this patch accordingly. Except the mechanical changes, there are a few notable differences to the previous version: In string_constant, I added a similar check for the STRING_CSTs because when callers don't use mem_size, they assume to be able to read "TREE_STRING_LENGTH (array)" bytes, but that is not always the case, for languages that don't always use zero-terminated strings, for instance hollerith strings in fortran. --- gcc/expr.c 2018-08-17 05:32:57.332211963 +0200 +++ gcc/expr.c 2018-08-16 23:08:23.544940795 +0200 @@ -11372,6 +11372,9 @@ string_constant (tree arg, tree *ptr_off *ptr_offset = fold_convert (sizetype, offset); if (mem_size) *mem_size = TYPE_SIZE_UNIT (TREE_TYPE (array)); + else if (compare_tree_int (TYPE_SIZE_UNIT (TREE_TYPE (array)), + TREE_STRING_LENGTH (array)) < 0) + return NULL_TREE; return array; } The range check in c_getstr was refined as well: This I added, because vla arrays can be initialized with string constants, especially since the 71625 patch was installed: In this case we end up with mem_size that fails to be constant. @@ -14606,25 +14603,17 @@ c_getstr (tree src, unsigned HOST_WIDE_I offset = tree_to_uhwi (offset_node); } + if (!tree_fits_uhwi_p (mem_size)) + return NULL; + /* STRING_LENGTH is the size of the string literal, including any embedded NULs. STRING_SIZE is the size of the array the string literal is stored in. */ Also the rest of the string length checks are refined, to return actually zero-terminated single byte strings when strlen is not given, and return something not necessarily zero-terminated which is suitable for memxxx-functions otherwise. Bootstrapped and reg-tested on x86_64-pc-linux-gnu. Is it OK for trunk? Thanks Bernd. gcc: 2018-08-17 Bernd Edlinger <bernd.edlinger@hotmail.de> PR middle-end/86711 PR middle-end/86714 * expr.c (string_constant): Don't return truncated string literals. * fold-const.c (c_getstr): Fix function comment. Remove unused third argument. Fix range checks. * fold-const.c (c_getstr): Adjust protoype. testsuite: 2018-08-17 Bernd Edlinger <bernd.edlinger@hotmail.de> PR middle-end/86711 PR middle-end/86714 * gcc.c-torture/execute/pr86711.c: New test. * gcc.c-torture/execute/pr86714.c: New test. diff -Npur gcc/expr.c gcc/expr.c --- gcc/expr.c 2018-08-17 05:32:57.332211963 +0200 +++ gcc/expr.c 2018-08-16 23:08:23.544940795 +0200 @@ -11372,6 +11372,9 @@ string_constant (tree arg, tree *ptr_off *ptr_offset = fold_convert (sizetype, offset); if (mem_size) *mem_size = TYPE_SIZE_UNIT (TREE_TYPE (array)); + else if (compare_tree_int (TYPE_SIZE_UNIT (TREE_TYPE (array)), + TREE_STRING_LENGTH (array)) < 0) + return NULL_TREE; return array; } @@ -11414,26 +11417,10 @@ string_constant (tree arg, tree *ptr_off if (!init || TREE_CODE (init) != STRING_CST) return NULL_TREE; - tree array_size = DECL_SIZE_UNIT (array); - if (!array_size || TREE_CODE (array_size) != INTEGER_CST) - return NULL_TREE; - - /* Avoid returning a string that doesn't fit in the array - it is stored in, like - const char a[4] = "abcde"; - but do handle those that fit even if they have excess - initializers, such as in - const char a[4] = "abc\000\000"; - The excess elements contribute to TREE_STRING_LENGTH() - but not to strlen(). */ - unsigned HOST_WIDE_INT charsize - = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (init)))); - unsigned HOST_WIDE_INT length = TREE_STRING_LENGTH (init); - length = string_length (TREE_STRING_POINTER (init), charsize, - length / charsize); if (mem_size) *mem_size = TYPE_SIZE_UNIT (TREE_TYPE (init)); - else if (compare_tree_int (array_size, length + 1) < 0) + else if (compare_tree_int (TYPE_SIZE_UNIT (TREE_TYPE (init)), + TREE_STRING_LENGTH (init)) < 0) return NULL_TREE; *ptr_offset = offset; diff -Npur gcc/fold-const.c gcc/fold-const.c --- gcc/fold-const.c 2018-07-16 08:49:39.000000000 +0200 +++ gcc/fold-const.c 2018-08-16 23:31:11.490869136 +0200 @@ -14577,23 +14577,20 @@ fold_build_pointer_plus_hwi_loc (locatio /* Return a pointer P to a NUL-terminated string representing the sequence of constant characters referred to by SRC (or a subsequence of such characters within it if SRC is a reference to a string plus some - constant offset). If STRLEN is non-null, store stgrlen(P) in *STRLEN. - If STRSIZE is non-null, store in *STRSIZE the size of the array - the string is stored in; in that case, even though P points to a NUL - terminated string, SRC need not refer to one. This can happen when - SRC refers to a constant character array initialized to all non-NUL - values, as in the C declaration: char a[4] = "1234"; */ + constant offset). If STRLEN is non-null, store the number of bytes + in the string constant including the terminating NUL char. *STRLEN is + typically strlen(P) + 1 in the absence of embedded NUL characters. */ const char * -c_getstr (tree src, unsigned HOST_WIDE_INT *strlen /* = NULL */, - unsigned HOST_WIDE_INT *strsize /* = NULL */) +c_getstr (tree src, unsigned HOST_WIDE_INT *strlen /* = NULL */) { tree offset_node; + tree mem_size; if (strlen) *strlen = 0; - src = string_constant (src, &offset_node); + src = string_constant (src, &offset_node, &mem_size); if (src == 0) return NULL; @@ -14606,25 +14603,17 @@ c_getstr (tree src, unsigned HOST_WIDE_I offset = tree_to_uhwi (offset_node); } + if (!tree_fits_uhwi_p (mem_size)) + return NULL; + /* STRING_LENGTH is the size of the string literal, including any embedded NULs. STRING_SIZE is the size of the array the string literal is stored in. */ unsigned HOST_WIDE_INT string_length = TREE_STRING_LENGTH (src); - unsigned HOST_WIDE_INT string_size = string_length; - tree type = TREE_TYPE (src); - if (tree size = TYPE_SIZE_UNIT (type)) - if (tree_fits_shwi_p (size)) - string_size = tree_to_uhwi (size); + unsigned HOST_WIDE_INT string_size = tree_to_uhwi (mem_size); - if (strlen) - { - /* Compute and store the length of the substring at OFFSET. - All offsets past the initial length refer to null strings. */ - if (offset <= string_length) - *strlen = string_length - offset; - else - *strlen = 0; - } + if (string_length > string_size) + string_length = string_size; const char *string = TREE_STRING_POINTER (src); @@ -14632,21 +14621,26 @@ c_getstr (tree src, unsigned HOST_WIDE_I || offset >= string_size) return NULL; - if (strsize) + if (strlen) { - /* Support even constant character arrays that aren't proper - NUL-terminated strings. */ - *strsize = string_size; + /* Compute and store the length of the substring at OFFSET. + All offsets past the initial length refer to null strings. */ + if (offset < string_length) + *strlen = string_length - offset; + else + *strlen = 1; } - else if (string[string_length - 1] != '\0') + else { - /* Support only properly NUL-terminated strings but handle - consecutive strings within the same array, such as the six - substrings in "1\0002\0003". */ - return NULL; + tree eltype = TREE_TYPE (TREE_TYPE (src)); + /* Support only properly NUL-terminated single byte strings. */ + if (tree_to_uhwi (TYPE_SIZE_UNIT (eltype)) != 1) + return NULL; + if (string[string_length - 1] != '\0') + return NULL; } - return offset <= string_length ? string + offset : ""; + return offset < string_length ? string + offset : ""; } /* Given a tree T, compute which bits in T may be nonzero. */ diff -Npur gcc/fold-const.h gcc/fold-const.h --- gcc/fold-const.h 2018-07-16 08:49:39.000000000 +0200 +++ gcc/fold-const.h 2018-08-16 22:38:49.962205027 +0200 @@ -187,8 +187,7 @@ extern bool expr_not_equal_to (tree t, c extern tree const_unop (enum tree_code, tree, tree); extern tree const_binop (enum tree_code, tree, tree, tree); extern bool negate_mathfn_p (combined_fn); -extern const char *c_getstr (tree, unsigned HOST_WIDE_INT * = NULL, - unsigned HOST_WIDE_INT * = NULL); +extern const char *c_getstr (tree, unsigned HOST_WIDE_INT * = NULL); extern wide_int tree_nonzero_bits (const_tree); /* Return OFF converted to a pointer offset type suitable as offset for diff -Npur gcc/testsuite/gcc.c-torture/execute/pr86711.c gcc/testsuite/gcc.c-torture/execute/pr86711.c --- gcc/testsuite/gcc.c-torture/execute/pr86711.c 1970-01-01 01:00:00.000000000 +0100 +++ gcc/testsuite/gcc.c-torture/execute/pr86711.c 2018-08-16 22:38:49.963205014 +0200 @@ -0,0 +1,11 @@ +/* PR middle-end/86711 */ + +static const char a[2][4] = { "1234", "5678" }; + +int main () +{ + void *p = __builtin_memchr (a, 0, 5); + + if (p) + __builtin_abort (); +} diff -Npur gcc/testsuite/gcc.c-torture/execute/pr86714.c gcc/testsuite/gcc.c-torture/execute/pr86714.c --- gcc/testsuite/gcc.c-torture/execute/pr86714.c 1970-01-01 01:00:00.000000000 +0100 +++ gcc/testsuite/gcc.c-torture/execute/pr86714.c 2018-08-16 22:38:49.963205014 +0200 @@ -0,0 +1,12 @@ +/* PR middle-end/86714 */ + +const char a[2][3] = { "1234", "xyz" }; +char b[6]; + +int main () +{ + __builtin_memcpy (b, a, 4); + __builtin_memset (b + 4, 'a', 2); + if (__builtin_memcmp (b, "123xaa", 6)) + __builtin_abort (); +}
On 08/17/2018 03:14 AM, Bernd Edlinger wrote: > Hi! > > > After the other patch has been applied, I re-based this patch accordingly. > > Except the mechanical changes, there are a few notable differences to the > previous version: > > In string_constant, I added a similar check for the STRING_CSTs > because when callers don't use mem_size, they assume to be > able to read "TREE_STRING_LENGTH (array)" bytes, but that is > not always the case, for languages that don't always use > zero-terminated strings, for instance hollerith strings in fortran. > > --- gcc/expr.c 2018-08-17 05:32:57.332211963 +0200 > +++ gcc/expr.c 2018-08-16 23:08:23.544940795 +0200 > @@ -11372,6 +11372,9 @@ string_constant (tree arg, tree *ptr_off > *ptr_offset = fold_convert (sizetype, offset); > if (mem_size) > *mem_size = TYPE_SIZE_UNIT (TREE_TYPE (array)); > + else if (compare_tree_int (TYPE_SIZE_UNIT (TREE_TYPE (array)), > + TREE_STRING_LENGTH (array)) < 0) > + return NULL_TREE; > return array; > } > Yes. I purposefully didn't include this change in its entirety as it was dependent upon your earlier patch. Instead I twiddled your patch so that it applied on the current trunk and didn't regress anything while keeping the core concept you were introducing. I'm also confident that change breaks one or more tests in the testsuite. I'm not sure why you didn't see the regression. But I certainly saw it with the variant shown above. Anyway, the basic idea was to carve out the basic concept of your patch to allow callers to specify how to count without regressing the trunk. That allows us to carve out an issue, resolve it and move on. The interdependent and conflicting patches are a nightmare to sort out. > > The range check in c_getstr was refined as well: > > This I added, because vla arrays can be initialized with string constants, > especially since the 71625 patch was installed: > In this case we end up with mem_size that fails to be constant. > > > @@ -14606,25 +14603,17 @@ c_getstr (tree src, unsigned HOST_WIDE_I > offset = tree_to_uhwi (offset_node); > } > > + if (!tree_fits_uhwi_p (mem_size)) > + return NULL; > + > /* STRING_LENGTH is the size of the string literal, including any > embedded NULs. STRING_SIZE is the size of the array the string > literal is stored in. */ > > Also the rest of the string length checks are refined, to return > actually zero-terminated single byte strings when strlen is not given, > and return something not necessarily zero-terminated which is > suitable for memxxx-functions otherwise. > > > Bootstrapped and reg-tested on x86_64-pc-linux-gnu. > Is it OK for trunk? Not yet. There's a lot to go over here and I haven't finished reviewing all the discussions around 86711/86714. Jeff
On 08/18/18 06:01, Jeff Law wrote: > On 08/17/2018 03:14 AM, Bernd Edlinger wrote: >> Hi! >> >> >> After the other patch has been applied, I re-based this patch accordingly. >> >> Except the mechanical changes, there are a few notable differences to the >> previous version: >> >> In string_constant, I added a similar check for the STRING_CSTs >> because when callers don't use mem_size, they assume to be >> able to read "TREE_STRING_LENGTH (array)" bytes, but that is >> not always the case, for languages that don't always use >> zero-terminated strings, for instance hollerith strings in fortran. >> >> --- gcc/expr.c 2018-08-17 05:32:57.332211963 +0200 >> +++ gcc/expr.c 2018-08-16 23:08:23.544940795 +0200 >> @@ -11372,6 +11372,9 @@ string_constant (tree arg, tree *ptr_off >> *ptr_offset = fold_convert (sizetype, offset); >> if (mem_size) >> *mem_size = TYPE_SIZE_UNIT (TREE_TYPE (array)); >> + else if (compare_tree_int (TYPE_SIZE_UNIT (TREE_TYPE (array)), >> + TREE_STRING_LENGTH (array)) < 0) >> + return NULL_TREE; >> return array; >> } >> > Yes. I purposefully didn't include this change in its entirety as it > was dependent upon your earlier patch. Instead I twiddled your patch > so that it applied on the current trunk and didn't regress anything > while keeping the core concept you were introducing. > That's a more or less theoretical possibility that I just considered useful for symmetry, and maybe Ada/Go strings. So it should be completely impossible to have this change anything in C (hopefully!!!). If it would happen, that would probably be a bug that needs to be fixed. > > I'm also confident that change breaks one or more tests in the > testsuite. I'm not sure why you didn't see the regression. But I > certainly saw it with the variant shown above. > I tested the patch against an older version of the trunk, but today, I repeated the regression test against r263644. This time I attach the test results for your reference: gcc-trunk-263644-test.txt: unchanged r263644 gcc-trunk-263644-0-test.txt: r263644 + this patch gcc-trunk-263644-1-test.txt: r263644 + this patch without string_constant As you can see, these regressions are already in r263644: FAIL: c-c++-common/torture/builtin-arith-overflow-17.c -O0 execution test FAIL: c-c++-common/torture/builtin-arith-overflow-17.c -O2 execution test FAIL: c-c++-common/torture/builtin-arith-overflow-17.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test FAIL: c-c++-common/torture/builtin-arith-overflow-p-17.c -O0 execution test FAIL: c-c++-common/torture/builtin-arith-overflow-p-17.c -O2 execution test FAIL: c-c++-common/torture/builtin-arith-overflow-p-17.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test FAIL: gcc.target/i386/20040112-1.c scan-assembler testb I have never seen those before, so they are brand new. FAIL: gnat.dg/dinst.adb scan-assembler loc [0-9] 5 [0-9]( is_stmt [0-9])? discriminator 1\\n FAIL: gnat.dg/dinst.adb scan-assembler loc [0-9] 5 [0-9]( is_stmt [0-9])? discriminator 3\\n FAIL: gnat.dg/dinst.adb scan-assembler loc [0-9] 5 [0-9]( is_stmt [0-9])? discriminator 4\\n these were already there on monday, but not last week. The test result with this patch as is does not change anything. But what's surprising, is this: --- gcc-trunk-r263644-test.txt 2018-08-18 13:49:27.214851609 +0200 +++ gcc-trunk-r263644-1-test.txt 2018-08-18 13:49:27.070851601 +0200 @@ -18,6 +18,12 @@ Running target unix +FAIL: gcc.c-torture/execute/pr86714.c -O1 execution test +FAIL: gcc.c-torture/execute/pr86714.c -O2 execution test +FAIL: gcc.c-torture/execute/pr86714.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test +FAIL: gcc.c-torture/execute/pr86714.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test +FAIL: gcc.c-torture/execute/pr86714.c -O3 -g execution test +FAIL: gcc.c-torture/execute/pr86714.c -Os execution test FAIL: c-c++-common/torture/builtin-arith-overflow-17.c -O0 execution test FAIL: c-c++-common/torture/builtin-arith-overflow-17.c -O2 execution test FAIL: c-c++-common/torture/builtin-arith-overflow-17.c -O2 -flto -fno-use-linker-plugin -flto-partiti There is another regression in gcc-trunk-r263644-1-test.txt only which is likely only noise that happens randomly and can be ignored: === libgo Summary for unix === # of expected passes 163 Running target unix/-m32 FAIL: sync The result is very interesting, and is probably what you were looking for: The test was run with both test cases, but only pr86714 failed, so the change in c_getstr Fixes pr86711, while the change in string_constant fixes pr86714 (or both). The patch could actually be split between pr86711 and pr86714. The c_getstr / p86711 should be applied before string_constant / pr86714. Thanks Bernd. > Anyway, the basic idea was to carve out the basic concept of your patch > to allow callers to specify how to count without regressing the trunk. > > That allows us to carve out an issue, resolve it and move on. The > interdependent and conflicting patches are a nightmare to sort out. > >> >> The range check in c_getstr was refined as well: >> >> This I added, because vla arrays can be initialized with string constants, >> especially since the 71625 patch was installed: >> In this case we end up with mem_size that fails to be constant. >> >> >> @@ -14606,25 +14603,17 @@ c_getstr (tree src, unsigned HOST_WIDE_I >> offset = tree_to_uhwi (offset_node); >> } >> >> + if (!tree_fits_uhwi_p (mem_size)) >> + return NULL; >> + >> /* STRING_LENGTH is the size of the string literal, including any >> embedded NULs. STRING_SIZE is the size of the array the string >> literal is stored in. */ >> >> Also the rest of the string length checks are refined, to return >> actually zero-terminated single byte strings when strlen is not given, >> and return something not necessarily zero-terminated which is >> suitable for memxxx-functions otherwise. > >> >> >> Bootstrapped and reg-tested on x86_64-pc-linux-gnu. >> Is it OK for trunk? > Not yet. There's a lot to go over here and I haven't finished reviewing > all the discussions around 86711/86714. > > > Jeff > cat <<'EOF' | === acats tests === === acats Summary === # of expected passes 2320 # of unexpected failures 0 Native configuration is x86_64-pc-linux-gnu === brig tests === Running target unix === brig Summary === # of unsupported tests 1 === gcc tests === Running target unix FAIL: c-c++-common/torture/builtin-arith-overflow-17.c -O0 execution test FAIL: c-c++-common/torture/builtin-arith-overflow-17.c -O2 execution test FAIL: c-c++-common/torture/builtin-arith-overflow-17.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test FAIL: c-c++-common/torture/builtin-arith-overflow-p-17.c -O0 execution test FAIL: c-c++-common/torture/builtin-arith-overflow-p-17.c -O2 execution test FAIL: c-c++-common/torture/builtin-arith-overflow-p-17.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test FAIL: gcc.target/i386/20040112-1.c scan-assembler testb === gcc Summary === # of expected passes 133752 # of unexpected failures 7 # of expected failures 422 # of unsupported tests 2104 /home/ed/gnu/gcc-build-0/gcc/xgcc version 9.0.0 20180818 (experimental) (GCC) === gfortran tests === Running target unix === gfortran Summary === # of expected passes 47667 # of expected failures 104 # of unsupported tests 81 /home/ed/gnu/gcc-build-0/gcc/testsuite/gfortran/../../gfortran version 9.0.0 20180818 (experimental) (GCC) === g++ tests === Running target unix FAIL: g++.dg/pr80481.C -std=gnu++11 scan-assembler-not vmovaps FAIL: g++.dg/pr80481.C -std=gnu++14 scan-assembler-not vmovaps FAIL: g++.dg/pr80481.C -std=gnu++98 scan-assembler-not vmovaps FAIL: g++.dg/pr83239.C -std=gnu++98 (test for excess errors) FAIL: c-c++-common/torture/builtin-arith-overflow-17.c -O0 execution test FAIL: c-c++-common/torture/builtin-arith-overflow-17.c -O2 execution test FAIL: c-c++-common/torture/builtin-arith-overflow-17.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test FAIL: c-c++-common/torture/builtin-arith-overflow-p-17.c -O0 execution test FAIL: c-c++-common/torture/builtin-arith-overflow-p-17.c -O2 execution test FAIL: c-c++-common/torture/builtin-arith-overflow-p-17.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test === g++ Summary === # of expected passes 126700 # of unexpected failures 10 # of expected failures 526 # of unsupported tests 4993 /home/ed/gnu/gcc-build-0/gcc/testsuite/g++/../../xg++ version 9.0.0 20180818 (experimental) (GCC) === gnat tests === Running target unix FAIL: gnat.dg/dinst.adb scan-assembler loc [0-9] 5 [0-9]( is_stmt [0-9])? discriminator 1\\n FAIL: gnat.dg/dinst.adb scan-assembler loc [0-9] 5 [0-9]( is_stmt [0-9])? discriminator 3\\n FAIL: gnat.dg/dinst.adb scan-assembler loc [0-9] 5 [0-9]( is_stmt [0-9])? discriminator 4\\n === gnat Summary === # of expected passes 2872 # of unexpected failures 3 # of expected failures 24 # of unsupported tests 3 /home/ed/gnu/gcc-build-0/gcc/gnatmake version 9.0.0 20180818 (experimental) === go tests === Running target unix FAIL: ./index0-out.go execution, -O0 -g -fno-var-tracking-assignments === go Summary === # of expected passes 7285 # of unexpected failures 1 # of expected failures 1 # of untested testcases 6 # of unsupported tests 1 /home/ed/gnu/gcc-build-0/gcc/testsuite/go/../../gccgo version 9.0.0 20180818 (experimental) (GCC) === obj-c++ tests === Running target unix === obj-c++ Summary === # of expected passes 1456 # of expected failures 10 # of unsupported tests 77 /home/ed/gnu/gcc-build-0/gcc/testsuite/obj-c++/../../xg++ version 9.0.0 20180818 (experimental) (GCC) === objc tests === Running target unix === objc Summary === # of expected passes 2797 # of expected failures 6 # of unsupported tests 68 /home/ed/gnu/gcc-build-0/gcc/xgcc version 9.0.0 20180818 (experimental) (GCC) === gotools tests === === gotools Summary === # of expected passes 389 # of untested testcases 190 /home/ed/gnu/gcc-build-0/./gcc/gccgo version 9.0.0 20180818 (experimental) (GCC) === libatomic tests === Running target unix === libatomic Summary === # of expected passes 54 === libffi tests === Running target unix === libffi Summary === # of expected passes 2214 === libgo tests === Running target unix FAIL: sync === libgo Summary for unix === # of expected passes 162 # of unexpected failures 1 Running target unix/-m32 === libgo Summary for unix/-m32 === # of expected passes 163 === libgo Summary === # of expected passes 325 # of unexpected failures 1 /home/ed/gnu/gcc-build-0/./gcc/gccgo version 9.0.0 20180818 (experimental) (GCC) === libgomp tests === Running target unix UNRESOLVED: libgomp.oacc-c++/non-scalar-data.C -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -O2 compilation failed to produce executable === libgomp Summary === # of expected passes 6052 # of expected failures 3 # of unresolved testcases 1 # of unsupported tests 322 === libitm tests === Running target unix === libitm Summary === # of expected passes 42 # of expected failures 3 # of unsupported tests 1 === libstdc++ tests === Running target unix === libstdc++ Summary === # of expected passes 12647 # of expected failures 77 # of unsupported tests 582 Compiler version: 9.0.0 20180818 (experimental) (GCC) Platform: x86_64-pc-linux-gnu configure flags: --prefix=/home/ed/gnu/install --enable-languages=all EOF Mail -s "Results for 9.0.0 20180818 (experimental) (GCC) testsuite on x86_64-pc-linux-gnu" gcc-testresults@gcc.gnu.org && true cat <<'EOF' | === acats tests === === acats Summary === # of expected passes 2320 # of unexpected failures 0 Native configuration is x86_64-pc-linux-gnu === brig tests === Running target unix === brig Summary === # of unsupported tests 1 === gcc tests === Running target unix FAIL: gcc.c-torture/execute/pr86714.c -O1 execution test FAIL: gcc.c-torture/execute/pr86714.c -O2 execution test FAIL: gcc.c-torture/execute/pr86714.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test FAIL: gcc.c-torture/execute/pr86714.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test FAIL: gcc.c-torture/execute/pr86714.c -O3 -g execution test FAIL: gcc.c-torture/execute/pr86714.c -Os execution test FAIL: c-c++-common/torture/builtin-arith-overflow-17.c -O0 execution test FAIL: c-c++-common/torture/builtin-arith-overflow-17.c -O2 execution test FAIL: c-c++-common/torture/builtin-arith-overflow-17.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test FAIL: c-c++-common/torture/builtin-arith-overflow-p-17.c -O0 execution test FAIL: c-c++-common/torture/builtin-arith-overflow-p-17.c -O2 execution test FAIL: c-c++-common/torture/builtin-arith-overflow-p-17.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test FAIL: gcc.target/i386/20040112-1.c scan-assembler testb === gcc Summary === # of expected passes 133746 # of unexpected failures 13 # of expected failures 422 # of unsupported tests 2104 /home/ed/gnu/gcc-build-1/gcc/xgcc version 9.0.0 20180818 (experimental) (GCC) === gfortran tests === Running target unix === gfortran Summary === # of expected passes 47667 # of expected failures 104 # of unsupported tests 81 /home/ed/gnu/gcc-build-1/gcc/testsuite/gfortran/../../gfortran version 9.0.0 20180818 (experimental) (GCC) === g++ tests === Running target unix FAIL: g++.dg/pr80481.C -std=gnu++11 scan-assembler-not vmovaps FAIL: g++.dg/pr80481.C -std=gnu++14 scan-assembler-not vmovaps FAIL: g++.dg/pr80481.C -std=gnu++98 scan-assembler-not vmovaps FAIL: g++.dg/pr83239.C -std=gnu++98 (test for excess errors) FAIL: c-c++-common/torture/builtin-arith-overflow-17.c -O0 execution test FAIL: c-c++-common/torture/builtin-arith-overflow-17.c -O2 execution test FAIL: c-c++-common/torture/builtin-arith-overflow-17.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test FAIL: c-c++-common/torture/builtin-arith-overflow-p-17.c -O0 execution test FAIL: c-c++-common/torture/builtin-arith-overflow-p-17.c -O2 execution test FAIL: c-c++-common/torture/builtin-arith-overflow-p-17.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test === g++ Summary === # of expected passes 126700 # of unexpected failures 10 # of expected failures 526 # of unsupported tests 4993 /home/ed/gnu/gcc-build-1/gcc/testsuite/g++/../../xg++ version 9.0.0 20180818 (experimental) (GCC) === gnat tests === Running target unix FAIL: gnat.dg/dinst.adb scan-assembler loc [0-9] 5 [0-9]( is_stmt [0-9])? discriminator 1\\n FAIL: gnat.dg/dinst.adb scan-assembler loc [0-9] 5 [0-9]( is_stmt [0-9])? discriminator 3\\n FAIL: gnat.dg/dinst.adb scan-assembler loc [0-9] 5 [0-9]( is_stmt [0-9])? discriminator 4\\n === gnat Summary === # of expected passes 2872 # of unexpected failures 3 # of expected failures 24 # of unsupported tests 3 /home/ed/gnu/gcc-build-1/gcc/gnatmake version 9.0.0 20180818 (experimental) === go tests === Running target unix FAIL: ./index0-out.go execution, -O0 -g -fno-var-tracking-assignments === go Summary === # of expected passes 7285 # of unexpected failures 1 # of expected failures 1 # of untested testcases 6 # of unsupported tests 1 /home/ed/gnu/gcc-build-1/gcc/testsuite/go/../../gccgo version 9.0.0 20180818 (experimental) (GCC) === obj-c++ tests === Running target unix === obj-c++ Summary === # of expected passes 1456 # of expected failures 10 # of unsupported tests 77 /home/ed/gnu/gcc-build-1/gcc/testsuite/obj-c++/../../xg++ version 9.0.0 20180818 (experimental) (GCC) === objc tests === Running target unix === objc Summary === # of expected passes 2797 # of expected failures 6 # of unsupported tests 68 /home/ed/gnu/gcc-build-1/gcc/xgcc version 9.0.0 20180818 (experimental) (GCC) === gotools tests === === gotools Summary === # of expected passes 389 # of untested testcases 190 /home/ed/gnu/gcc-build-1/./gcc/gccgo version 9.0.0 20180818 (experimental) (GCC) === libatomic tests === Running target unix === libatomic Summary === # of expected passes 54 === libffi tests === Running target unix === libffi Summary === # of expected passes 2214 === libgo tests === Running target unix === libgo Summary for unix === # of expected passes 163 Running target unix/-m32 FAIL: sync === libgo Summary for unix/-m32 === # of expected passes 162 # of unexpected failures 1 === libgo Summary === # of expected passes 325 # of unexpected failures 1 /home/ed/gnu/gcc-build-1/./gcc/gccgo version 9.0.0 20180818 (experimental) (GCC) === libgomp tests === Running target unix UNRESOLVED: libgomp.oacc-c++/non-scalar-data.C -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -O2 compilation failed to produce executable === libgomp Summary === # of expected passes 6052 # of expected failures 3 # of unresolved testcases 1 # of unsupported tests 322 === libitm tests === Running target unix === libitm Summary === # of expected passes 42 # of expected failures 3 # of unsupported tests 1 === libstdc++ tests === Running target unix === libstdc++ Summary === # of expected passes 12647 # of expected failures 77 # of unsupported tests 582 Compiler version: 9.0.0 20180818 (experimental) (GCC) Platform: x86_64-pc-linux-gnu configure flags: --prefix=/home/ed/gnu/install --enable-languages=all EOF Mail -s "Results for 9.0.0 20180818 (experimental) (GCC) testsuite on x86_64-pc-linux-gnu" gcc-testresults@gcc.gnu.org && true cat <<'EOF' | === acats tests === === acats Summary === # of expected passes 2320 # of unexpected failures 0 Native configuration is x86_64-pc-linux-gnu === brig tests === Running target unix === brig Summary === # of unsupported tests 1 === gcc tests === Running target unix FAIL: c-c++-common/torture/builtin-arith-overflow-17.c -O0 execution test FAIL: c-c++-common/torture/builtin-arith-overflow-17.c -O2 execution test FAIL: c-c++-common/torture/builtin-arith-overflow-17.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test FAIL: c-c++-common/torture/builtin-arith-overflow-p-17.c -O0 execution test FAIL: c-c++-common/torture/builtin-arith-overflow-p-17.c -O2 execution test FAIL: c-c++-common/torture/builtin-arith-overflow-p-17.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test FAIL: gcc.target/i386/20040112-1.c scan-assembler testb === gcc Summary === # of expected passes 133724 # of unexpected failures 7 # of expected failures 422 # of unsupported tests 2104 /home/ed/gnu/gcc-build/gcc/xgcc version 9.0.0 20180818 (experimental) (GCC) === gfortran tests === Running target unix === gfortran Summary === # of expected passes 47667 # of expected failures 104 # of unsupported tests 81 /home/ed/gnu/gcc-build/gcc/testsuite/gfortran/../../gfortran version 9.0.0 20180818 (experimental) (GCC) === g++ tests === Running target unix FAIL: g++.dg/pr80481.C -std=gnu++11 scan-assembler-not vmovaps FAIL: g++.dg/pr80481.C -std=gnu++14 scan-assembler-not vmovaps FAIL: g++.dg/pr80481.C -std=gnu++98 scan-assembler-not vmovaps FAIL: g++.dg/pr83239.C -std=gnu++98 (test for excess errors) FAIL: c-c++-common/torture/builtin-arith-overflow-17.c -O0 execution test FAIL: c-c++-common/torture/builtin-arith-overflow-17.c -O2 execution test FAIL: c-c++-common/torture/builtin-arith-overflow-17.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test FAIL: c-c++-common/torture/builtin-arith-overflow-p-17.c -O0 execution test FAIL: c-c++-common/torture/builtin-arith-overflow-p-17.c -O2 execution test FAIL: c-c++-common/torture/builtin-arith-overflow-p-17.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test === g++ Summary === # of expected passes 126700 # of unexpected failures 10 # of expected failures 526 # of unsupported tests 4993 /home/ed/gnu/gcc-build/gcc/testsuite/g++/../../xg++ version 9.0.0 20180818 (experimental) (GCC) === gnat tests === Running target unix FAIL: gnat.dg/dinst.adb scan-assembler loc [0-9] 5 [0-9]( is_stmt [0-9])? discriminator 1\\n FAIL: gnat.dg/dinst.adb scan-assembler loc [0-9] 5 [0-9]( is_stmt [0-9])? discriminator 3\\n FAIL: gnat.dg/dinst.adb scan-assembler loc [0-9] 5 [0-9]( is_stmt [0-9])? discriminator 4\\n === gnat Summary === # of expected passes 2872 # of unexpected failures 3 # of expected failures 24 # of unsupported tests 3 /home/ed/gnu/gcc-build/gcc/gnatmake version 9.0.0 20180818 (experimental) === go tests === Running target unix FAIL: ./index0-out.go execution, -O0 -g -fno-var-tracking-assignments === go Summary === # of expected passes 7285 # of unexpected failures 1 # of expected failures 1 # of untested testcases 6 # of unsupported tests 1 /home/ed/gnu/gcc-build/gcc/testsuite/go/../../gccgo version 9.0.0 20180818 (experimental) (GCC) === obj-c++ tests === Running target unix === obj-c++ Summary === # of expected passes 1456 # of expected failures 10 # of unsupported tests 77 /home/ed/gnu/gcc-build/gcc/testsuite/obj-c++/../../xg++ version 9.0.0 20180818 (experimental) (GCC) === objc tests === Running target unix === objc Summary === # of expected passes 2797 # of expected failures 6 # of unsupported tests 68 /home/ed/gnu/gcc-build/gcc/xgcc version 9.0.0 20180818 (experimental) (GCC) === gotools tests === === gotools Summary === # of expected passes 389 # of untested testcases 190 /home/ed/gnu/gcc-build/./gcc/gccgo version 9.0.0 20180818 (experimental) (GCC) === libatomic tests === Running target unix === libatomic Summary === # of expected passes 54 === libffi tests === Running target unix === libffi Summary === # of expected passes 2214 === libgo tests === Running target unix === libgo Summary for unix === # of expected passes 163 Running target unix/-m32 === libgo Summary for unix/-m32 === # of expected passes 163 === libgo Summary === # of expected passes 326 /home/ed/gnu/gcc-build/./gcc/gccgo version 9.0.0 20180818 (experimental) (GCC) === libgomp tests === Running target unix UNRESOLVED: libgomp.oacc-c++/non-scalar-data.C -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -O2 compilation failed to produce executable === libgomp Summary === # of expected passes 6052 # of expected failures 3 # of unresolved testcases 1 # of unsupported tests 322 === libitm tests === Running target unix === libitm Summary === # of expected passes 42 # of expected failures 3 # of unsupported tests 1 === libstdc++ tests === Running target unix === libstdc++ Summary === # of expected passes 12647 # of expected failures 77 # of unsupported tests 582 Compiler version: 9.0.0 20180818 (experimental) (GCC) Platform: x86_64-pc-linux-gnu configure flags: --prefix=/home/ed/gnu/install --enable-languages=all EOF Mail -s "Results for 9.0.0 20180818 (experimental) (GCC) testsuite on x86_64-pc-linux-gnu" gcc-testresults@gcc.gnu.org && true
On 08/17/2018 03:14 AM, Bernd Edlinger wrote: > Hi! > > > After the other patch has been applied, I re-based this patch accordingly. > > Except the mechanical changes, there are a few notable differences to the > previous version: > > In string_constant, I added a similar check for the STRING_CSTs > because when callers don't use mem_size, they assume to be > able to read "TREE_STRING_LENGTH (array)" bytes, but that is > not always the case, for languages that don't always use > zero-terminated strings, for instance hollerith strings in fortran. > > --- gcc/expr.c 2018-08-17 05:32:57.332211963 +0200 > +++ gcc/expr.c 2018-08-16 23:08:23.544940795 +0200 > @@ -11372,6 +11372,9 @@ string_constant (tree arg, tree *ptr_off > *ptr_offset = fold_convert (sizetype, offset); > if (mem_size) > *mem_size = TYPE_SIZE_UNIT (TREE_TYPE (array)); > + else if (compare_tree_int (TYPE_SIZE_UNIT (TREE_TYPE (array)), > + TREE_STRING_LENGTH (array)) < 0) > + return NULL_TREE; > return array; > } > > > The range check in c_getstr was refined as well: > > This I added, because vla arrays can be initialized with string constants, > especially since the 71625 patch was installed: > In this case we end up with mem_size that fails to be constant. > > > @@ -14606,25 +14603,17 @@ c_getstr (tree src, unsigned HOST_WIDE_I > offset = tree_to_uhwi (offset_node); > } > > + if (!tree_fits_uhwi_p (mem_size)) > + return NULL; > + > /* STRING_LENGTH is the size of the string literal, including any > embedded NULs. STRING_SIZE is the size of the array the string > literal is stored in. */ > > Also the rest of the string length checks are refined, to return > actually zero-terminated single byte strings when strlen is not given, > and return something not necessarily zero-terminated which is > suitable for memxxx-functions otherwise. > > > Bootstrapped and reg-tested on x86_64-pc-linux-gnu. > Is it OK for trunk? > > > Thanks > Bernd. > > > patch-pr86711.diff > > > gcc: > 2018-08-17 Bernd Edlinger <bernd.edlinger@hotmail.de> > > PR middle-end/86711 > PR middle-end/86714 > * expr.c (string_constant): Don't return truncated string literals. > * fold-const.c (c_getstr): Fix function comment. Remove unused third > argument. Fix range checks. > * fold-const.c (c_getstr): Adjust protoype. > > testsuite: > 2018-08-17 Bernd Edlinger <bernd.edlinger@hotmail.de> > > PR middle-end/86711 > PR middle-end/86714 > * gcc.c-torture/execute/pr86711.c: New test. > * gcc.c-torture/execute/pr86714.c: New test. Note that Martin's patch covers both these tests in slightly better ways. Jeff
Index: gcc/builtins.c =================================================================== --- gcc/builtins.c (revision 263045) +++ gcc/builtins.c (working copy) @@ -617,7 +617,7 @@ c_strlen (tree src, int only_value) if (tree size = TYPE_SIZE_UNIT (type)) if (tree_fits_shwi_p (size)) { - maxelts = tree_to_uhwi (size); + maxelts = tree_to_shwi (size); maxelts = maxelts / eltsize - 1; } Index: gcc/expr.c =================================================================== --- gcc/expr.c (revision 263045) +++ gcc/expr.c (working copy) @@ -11410,24 +11410,14 @@ string_constant (tree arg, tree *ptr_offset) if (!init || TREE_CODE (init) != STRING_CST) return NULL_TREE; - tree array_size = DECL_SIZE_UNIT (array); - if (!array_size || TREE_CODE (array_size) != INTEGER_CST) - return NULL_TREE; - /* Avoid returning a string that doesn't fit in the array - it is stored in, like - const char a[4] = "abcde"; - but do handle those that fit even if they have excess - initializers, such as in - const char a[4] = "abc\000\000"; - The excess elements contribute to TREE_STRING_LENGTH() - but not to strlen(). */ - unsigned HOST_WIDE_INT charsize - = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (init)))); - unsigned HOST_WIDE_INT length = TREE_STRING_LENGTH (init); - length = string_length (TREE_STRING_POINTER (init), charsize, - length / charsize); - if (compare_tree_int (array_size, length + 1) < 0) + it is stored in, like: + const char a[4] = "abcd"; + because callers expect to be able to access the string + up to the limit imposed by TREE_STRING_LENGTH which + always includes the terminating NUL char. */ + if (compare_tree_int (TYPE_SIZE_UNIT (TREE_TYPE (init)), + TREE_STRING_LENGTH (init)) < 0) return NULL_TREE; *ptr_offset = offset; Index: gcc/fold-const.c =================================================================== --- gcc/fold-const.c (revision 263045) +++ gcc/fold-const.c (working copy) @@ -14577,16 +14577,12 @@ fold_build_pointer_plus_hwi_loc (location_t loc, t /* Return a pointer P to a NUL-terminated string representing the sequence of constant characters referred to by SRC (or a subsequence of such characters within it if SRC is a reference to a string plus some - constant offset). If STRLEN is non-null, store stgrlen(P) in *STRLEN. - If STRSIZE is non-null, store in *STRSIZE the size of the array - the string is stored in; in that case, even though P points to a NUL - terminated string, SRC need not refer to one. This can happen when - SRC refers to a constant character array initialized to all non-NUL - values, as in the C declaration: char a[4] = "1234"; */ + constant offset). If STRLEN is non-null, store the number of bytes + in the string constant including the terminating NUL char. *STRLEN is + typically strlen(P) + 1 in the absence of embedded NUL characters. */ const char * -c_getstr (tree src, unsigned HOST_WIDE_INT *strlen /* = NULL */, - unsigned HOST_WIDE_INT *strsize /* = NULL */) +c_getstr (tree src, unsigned HOST_WIDE_INT *strlen /* = NULL */) { tree offset_node; @@ -14613,40 +14609,24 @@ const char * unsigned HOST_WIDE_INT string_size = string_length; tree type = TREE_TYPE (src); if (tree size = TYPE_SIZE_UNIT (type)) - if (tree_fits_shwi_p (size)) + if (tree_fits_uhwi_p (size)) string_size = tree_to_uhwi (size); + if (offset >= string_size) + return NULL; + if (strlen) { /* Compute and store the length of the substring at OFFSET. All offsets past the initial length refer to null strings. */ - if (offset <= string_length) + if (offset < string_length) *strlen = string_length - offset; else - *strlen = 0; + *strlen = 1; } const char *string = TREE_STRING_POINTER (src); - - if (string_length == 0 - || offset >= string_size) - return NULL; - - if (strsize) - { - /* Support even constant character arrays that aren't proper - NUL-terminated strings. */ - *strsize = string_size; - } - else if (string[string_length - 1] != '\0') - { - /* Support only properly NUL-terminated strings but handle - consecutive strings within the same array, such as the six - substrings in "1\0002\0003". */ - return NULL; - } - - return offset <= string_length ? string + offset : ""; + return offset < string_length ? string + offset : ""; } /* Given a tree T, compute which bits in T may be nonzero. */ Index: gcc/fold-const.h =================================================================== --- gcc/fold-const.h (revision 263045) +++ gcc/fold-const.h (working copy) @@ -187,8 +187,7 @@ extern bool expr_not_equal_to (tree t, const wide_ extern tree const_unop (enum tree_code, tree, tree); extern tree const_binop (enum tree_code, tree, tree, tree); extern bool negate_mathfn_p (combined_fn); -extern const char *c_getstr (tree, unsigned HOST_WIDE_INT * = NULL, - unsigned HOST_WIDE_INT * = NULL); +extern const char *c_getstr (tree, unsigned HOST_WIDE_INT * = NULL); extern wide_int tree_nonzero_bits (const_tree); /* Return OFF converted to a pointer offset type suitable as offset for Index: gcc/testsuite/gcc.c-torture/execute/pr86711.c =================================================================== --- gcc/testsuite/gcc.c-torture/execute/pr86711.c (revision 0) +++ gcc/testsuite/gcc.c-torture/execute/pr86711.c (working copy) @@ -0,0 +1,11 @@ +/* PR middle-end/86711 */ + +static const char a[2][4] = { "1234", "5678" }; + +int main () +{ + void *p = __builtin_memchr (a, 0, 5); + + if (p) + __builtin_abort (); +} Index: gcc/testsuite/gcc.c-torture/execute/pr86714.c =================================================================== --- gcc/testsuite/gcc.c-torture/execute/pr86714.c (revision 0) +++ gcc/testsuite/gcc.c-torture/execute/pr86714.c (working copy) @@ -0,0 +1,12 @@ +/* PR middle-end/86714 */ + +const char a[2][3] = { "1234", "xyz" }; +char b[6]; + +int main () +{ + __builtin_memcpy (b, a, 4); + __builtin_memset (b + 4, 'a', 2); + if (__builtin_memcmp (b, "123xaa", 6)) + __builtin_abort (); +} Index: gcc/testsuite/gcc.dg/strlenopt-49.c =================================================================== --- gcc/testsuite/gcc.dg/strlenopt-49.c (revision 263045) +++ gcc/testsuite/gcc.dg/strlenopt-49.c (working copy) @@ -45,9 +45,9 @@ int cmp88 (void) return cmp88; } -/* { dg-final { scan-tree-dump-times "strlen" 0 "gimple" } } - { dg-final { scan-tree-dump-times "len0 = 0;" 1 "gimple" } } - { dg-final { scan-tree-dump-times "len = 18;" 1 "gimple" } } - { dg-final { scan-tree-dump-times "lenx = 8;" 1 "gimple" } } - { dg-final { scan-tree-dump-times "leny = 0;" 1 "gimple" } } - { dg-final { scan-tree-dump-times "cmp88 = 0;" 1 "gimple" } } */ +/* { dg-final { scan-tree-dump-times "strlen" 0 "gimple" { xfail *-*-* } } } + { dg-final { scan-tree-dump-times "len0 = 0;" 1 "gimple" { xfail *-*-* } } } + { dg-final { scan-tree-dump-times "len = 18;" 1 "gimple" { xfail *-*-* } } } + { dg-final { scan-tree-dump-times "lenx = 8;" 1 "gimple" { xfail *-*-* } } } + { dg-final { scan-tree-dump-times "leny = 0;" 1 "gimple" { xfail *-*-* } } } + { dg-final { scan-tree-dump-times "cmp88 = 0;" 1 "gimple" { xfail *-*-* } } } */