Message ID | alpine.LNX.2.20.13.2001201801090.7204@monopod.intra.ispras.ru |
---|---|
State | New |
Headers | show |
Series | doc: clarify the situation with pointer arithmetic | expand |
On 1/20/20 8:08 AM, Alexander Monakov wrote: > Hi, > > we have this paragraph in the documentation that attempts to prohibit something > that is allowed by the language. Instead, I think we should say that this > generally should work and explain that a problem in GCC implementation > breaks this. > > OK to apply? > > Thanks. > Alexander > > > * doc/implement-c.texi (Arrays and Pointers): Rewrite the paragraph > about pointer-as-integer arithmetic. > > diff --git a/gcc/doc/implement-c.texi b/gcc/doc/implement-c.texi > index 692297b69c4..beee2510670 100644 > --- a/gcc/doc/implement-c.texi > +++ b/gcc/doc/implement-c.texi > @@ -401,11 +401,15 @@ pointer representation is smaller than the integer type, extends according > to the signedness of the integer type if the pointer representation > is larger than the integer type, otherwise the bits are unchanged. > > -When casting from pointer to integer and back again, the resulting > -pointer must reference the same object as the original pointer, otherwise > -the behavior is undefined. That is, one may not use integer arithmetic to > -avoid the undefined behavior of pointer arithmetic as proscribed in > -C99 and C11 6.5.6/8. > +Arithmetic on integers derived from pointers can produce a value such > +that casting it back produces a valid pointer corresponding to one of > +the original pointers. Thus, integer arithmetic allows to express > +computations that might not be expressible as pointer arithmetic without > +undefined behavior. However, at a certain point the distinction between > +pointers and integers is lost (when GCC translates from GIMPLE internal > +representation to RTL), but some optimizations still attempt to track > +pointer arithmetic beyond that point. In some cases this may cause > +valid code to be incorrectly optimized. > > @item > @cite{The size of the result of subtracting two pointers to elements > I'm not happy with this -- we shouldn't be talking about internal concepts like GIMPLE and RTL in the GCC user manual. Can we instead talk about which user-visible optimization options cause problems, so that users who feel the urgent need to write hacky code like that know what to disable? (I'd guess it would be things involving alias analysis.) Otherwise, I think we should stick with the current "don't do this" language or some variant of it that explains briefly that, while such code is technically allowed by the C standard, it confuses GCC's optimizers. -Sandra
On Mon, 20 Jan 2020, Sandra Loosemore wrote: > I'm not happy with this -- we shouldn't be talking about internal concepts > like GIMPLE and RTL in the GCC user manual. Can we instead talk about which > user-visible optimization options cause problems, so that users who feel the > urgent need to write hacky code like that know what to disable? (I'd guess it > would be things involving alias analysis.) Otherwise, I think we should stick > with the current "don't do this" language or some variant of it that explains > briefly that, while such code is technically allowed by the C standard, it > confuses GCC's optimizers. The problem is in code shared by multiple RTL optimizers, so I'm afraid we cannot give a list of optimizations to disable. Instead, we can suggest that users can place optimization barriers on problematic pointers: ptr = (void *) iptr; __asm__ ("" : "+g"(ptr)); // use *ptr i.e. prevent incorrect tracking of possible pointer targets by copying the pointer value through an asm statement. To make the intent of the asm more clear, users may wish to enumerate possible targets in input constraints of such asm: ptr = (void *) iptr; __asm__ ("" : "+g"(ptr) : "X"(&obj1), "X"(&obj2), ...); // use *ptr Alexander
On Mon, 20 Jan 2020, Alexander Monakov wrote: > Hi, > > we have this paragraph in the documentation that attempts to prohibit > something that is allowed by the language. Instead, I think we should > say that this generally should work and explain that a problem in GCC > implementation breaks this. Is this based on the proposals to adopt a PNVI model (as described in N2362 / N2363 / N2364) for C2x? Do you have a more detailed analysis of exactly which issues would need changes in GCC to follow the proposed PNVI-ae-udi semantics? Setting up a meta-bug in Bugzilla with dependencies on such issues might be useful, for example - I know there are existing bugs you've filed or commented on, but it would help to have a list in a single place of issues for implementing PNVI-ae-udi. It's not obvious that documentation relating to things that are implementation-defined in existing C standard versions, where detailed memory model questions were not defined, is an appropriate place to discuss questions of how some optimizations might not follow a more precise definition proposed for C2x.
On Mon, 20 Jan 2020, Joseph Myers wrote: > On Mon, 20 Jan 2020, Alexander Monakov wrote: > > > Hi, > > > > we have this paragraph in the documentation that attempts to prohibit > > something that is allowed by the language. Instead, I think we should > > say that this generally should work and explain that a problem in GCC > > implementation breaks this. > > Is this based on the proposals to adopt a PNVI model (as described in > N2362 / N2363 / N2364) for C2x? Do you have a more detailed analysis of > exactly which issues would need changes in GCC to follow the proposed > PNVI-ae-udi semantics? Setting up a meta-bug in Bugzilla with > dependencies on such issues might be useful, for example - I know there > are existing bugs you've filed or commented on, but it would help to have > a list in a single place of issues for implementing PNVI-ae-udi. > > It's not obvious that documentation relating to things that are > implementation-defined in existing C standard versions, where detailed > memory model questions were not defined, is an appropriate place to > discuss questions of how some optimizations might not follow a more > precise definition proposed for C2x. My intent was more basic. I observed that the paragraph can be interpreted as saying that if you have a cast 'I1 = (intptr_t) P1', then perform some computations on I1 that do not in any way depend on values of other pointers, then casting the result back can not point to a different object than P1. But that is not very helpful, because this is the case where the user might have used normal pointer arithmetic in the first place. The important cases involve computations that depend on multiple pointers to unrelated objects. I think that the case discussed in the proposed patch is already not ambiguous and does not need further clarifications to the standard. I also think we should provide the users with rationale when imposing such restrictions, and, as Sandra noted, offer a way to work around the problem. However I don't mind dropping the patch if it's not appropriate. Richard, can you share your opinion here? Thanks. Alexander
On Tue, 21 Jan 2020, Alexander Monakov wrote: > On Mon, 20 Jan 2020, Joseph Myers wrote: > > > On Mon, 20 Jan 2020, Alexander Monakov wrote: > > > > > Hi, > > > > > > we have this paragraph in the documentation that attempts to prohibit > > > something that is allowed by the language. Instead, I think we should > > > say that this generally should work and explain that a problem in GCC > > > implementation breaks this. > > > > Is this based on the proposals to adopt a PNVI model (as described in > > N2362 / N2363 / N2364) for C2x? Do you have a more detailed analysis of > > exactly which issues would need changes in GCC to follow the proposed > > PNVI-ae-udi semantics? Setting up a meta-bug in Bugzilla with > > dependencies on such issues might be useful, for example - I know there > > are existing bugs you've filed or commented on, but it would help to have > > a list in a single place of issues for implementing PNVI-ae-udi. > > > > It's not obvious that documentation relating to things that are > > implementation-defined in existing C standard versions, where detailed > > memory model questions were not defined, is an appropriate place to > > discuss questions of how some optimizations might not follow a more > > precise definition proposed for C2x. > > My intent was more basic. I observed that the paragraph can be interpreted as > saying that if you have a cast 'I1 = (intptr_t) P1', then perform some > computations on I1 that do not in any way depend on values of other pointers, > then casting the result back can not point to a different object than P1. > But that is not very helpful, because this is the case where the user might have > used normal pointer arithmetic in the first place. The important cases > involve computations that depend on multiple pointers to unrelated objects. > > I think that the case discussed in the proposed patch is already not ambiguous > and does not need further clarifications to the standard. > > I also think we should provide the users with rationale when imposing such > restrictions, and, as Sandra noted, offer a way to work around the problem. > > However I don't mind dropping the patch if it's not appropriate. Richard, > can you share your opinion here? Let me say a few things here. First I always thought the only thing C guarantees is that you can convert a pointer to an integer and back (the same value!) then you get the same pointer if the integer type has sufficient size. The C standard nowhere specifies semantics of the case where you modify the integer "representation" and then convert that to a pointer. But that may have changed. Second. Fact is RTL does not distinguish between pointers and integers and thus any attempt to make something valid when you use integers and invalid when you use pointers is not going to work. Third. GCC tracks pointed to things through integers. That means it a) assumes the integer representation cannot "leave" an object as the current documentation states b) it handles "mixing" pointers to different objects in their integer representation correctly as to that it assumes the result converted to a pointer might point to either object the tracking implementation details also track pieces and funneling pointers through FP values. Fourth. That PNVI (I assume it's the whole pointer-provenance stuff) wants to get the "best" of both which can never be done since a compiler needs to have a way to be conservative - in this area it's conflicting conservative treatment which is impossible. Fifth. GCC has issues involving equivalences and when (implicitely) turning control into data dependences. The C standard seems to have conflicting (to GCC) wording with regard to equivalences of pointers vs. integers (well, not actually sure). For all these issues I usually point at the second issue above because trying to plug holes without addressing that very thing isn't going to work (unless you want to plug the "hole" for integer code as well). And yeah, a meta-bug or even better a single WIKI page listing issues (and maybe referencing the various partially redundant and partially mixing different issues bugzillas) would be helpful here. To be the entry-point in bugzilla is PR49330, see comment#7 for the fundamental RTL representation issue. Thanks, Richard.
On Tue, 21 Jan 2020, Richard Biener wrote: > Fourth. That PNVI (I assume it's the whole pointer-provenance stuff) > wants to get the "best" of both which can never be done since a compiler > needs to have a way to be conservative - in this area it's conflicting > conservative treatment which is impossible. This paragraph is unclear, I don't immediately see what the conflicting goals are. The rest is clear enough given the previous discussions I saw. Did you mean the restriction that you cannot do arithmetic involving two integers based on pointers, get a value corresponding to one of them, cast it back and get a pointer suitable for accessing either of two originally pointed-to objects? I don't see that as a conflict because it places a restriction on users, not the compiler. Alexander
On Tue, 21 Jan 2020, Alexander Monakov wrote: > My intent was more basic. I observed that the paragraph can be interpreted as > saying that if you have a cast 'I1 = (intptr_t) P1', then perform some > computations on I1 that do not in any way depend on values of other pointers, > then casting the result back can not point to a different object than P1. Yes. I don't think the wording in the existing standard (where anything other than casting the same integer value back to pointer type - and, furthermore, the result is only specified to compare equal to the original pointer, not to be otherwise usable in place of it) can distinguish that version (PVI) from PNVI. > But that is not very helpful, because this is the case where the user > might have used normal pointer arithmetic in the first place. I think a typical use case of intptr_t might be e.g. masking off low bits of a pointer value for alignment while remaining within the same object. Another one might be to compare pointers to different objects in implementing memmove-like functions, where ordered pointer comparison would not be defined in ISO C. > I think that the case discussed in the proposed patch is already not ambiguous > and does not need further clarifications to the standard. I don't think such a requirement for PNVI can be deduced from the existing text.
On Tue, 21 Jan 2020, Richard Biener wrote: > Second. Fact is RTL does not distinguish between pointers and > integers and thus any attempt to make something valid when you > use integers and invalid when you use pointers is not going to work. That simply means that an earlier stage in the compiler would need to mark cases of integers converted from pointers in some way as "could have provenance of any address-exposed object" (that being the effect of PNVI-ae models) unless it can prove the offsets etc. applied do keep the pointer within the original object.
On Tue, 21 Jan 2020, Alexander Monakov wrote: > On Tue, 21 Jan 2020, Richard Biener wrote: > > > Fourth. That PNVI (I assume it's the whole pointer-provenance stuff) > > wants to get the "best" of both which can never be done since a compiler > > needs to have a way to be conservative - in this area it's conflicting > > conservative treatment which is impossible. > > This paragraph is unclear, I don't immediately see what the conflicting goals > are. The rest is clear enough given the previous discussions I saw. > > Did you mean the restriction that you cannot do arithmetic involving two > integers based on pointers, get a value corresponding to one of them, > cast it back and get a pointer suitable for accessing either of two > originally pointed-to objects? I don't see that as a conflict because > it places a restriction on users, not the compiler. As far as I remember the discussions PNVI requires to track provenance for correctness, you may not miss or attach wrong provenance to a pointer and there's only "single" provenance, not "many" (aka, may point to A and B). I don't see how you can ever implement that. But maybe I'm misremembering. Richard.
On Wed, 22 Jan 2020, Joseph Myers wrote: > On Tue, 21 Jan 2020, Richard Biener wrote: > > > Second. Fact is RTL does not distinguish between pointers and > > integers and thus any attempt to make something valid when you > > use integers and invalid when you use pointers is not going to work. > > That simply means that an earlier stage in the compiler would need to mark > cases of integers converted from pointers in some way as "could have > provenance of any address-exposed object" (that being the effect of > PNVI-ae models) unless it can prove the offsets etc. applied do keep the > pointer within the original object. We already have that, it's called points-to analysis which computes "points-to sets" even for integers. But even there it assumes you can't leave an object via arithmetic, doing that would be quite bad for optimization in the current state or prohibitly expensive since it would require exact offset tracking. Now invent something that transfers this to RTL. Patches to the main piece where the issue triggers (find_base_term + base_alias_check and the very similar find_base_value) welcome. Richard.
On 1/22/20 8:32 AM, Richard Biener wrote: > On Tue, 21 Jan 2020, Alexander Monakov wrote: > >> On Tue, 21 Jan 2020, Richard Biener wrote: >> >>> Fourth. That PNVI (I assume it's the whole pointer-provenance stuff) >>> wants to get the "best" of both which can never be done since a compiler >>> needs to have a way to be conservative - in this area it's conflicting >>> conservative treatment which is impossible. >> >> This paragraph is unclear, I don't immediately see what the conflicting goals >> are. The rest is clear enough given the previous discussions I saw. >> >> Did you mean the restriction that you cannot do arithmetic involving two >> integers based on pointers, get a value corresponding to one of them, >> cast it back and get a pointer suitable for accessing either of two >> originally pointed-to objects? I don't see that as a conflict because >> it places a restriction on users, not the compiler. > > As far as I remember the discussions PNVI requires to track > provenance for correctness, you may not miss or attach wrong provenance > to a pointer and there's only "single" provenance, not "many" > (aka, may point to A and B). I don't see how you can ever implement that. The PVNI variant preferred by the object model group is referred to as "PNVI-ae-udi" which stands for "PNVI exposed-address user- disambiguation." (The PNVI part stands for "Provenance Not Via Integers.) This base PVNI model basically prohibits provenance tracking via integers, making it possible for programs to derive pointers to unrelated objects via casts between pointers and integers (and modifying the integer in between the casts). This is considered a new restriction on implementations because the standard doesn't permit it (as you said upthread, all it specifies is that a pointer is equal to one obtained by casting the original to a intptr_t and back). The -ae-udi variant limits this restriction on implementations to escaped pointers and provides a means for users/programs to disambiguate between pointers to adjacent objects (i.e., a past the end pointer and one to the beginning of the object stored there). The latest proposal is in N2362, with an overview in N2378). At the last WG14 meeting there was broad discomfort with adopting the proposal for C2X because of the absence of implementation experience and concerns raised by implementers. The guidance to the study group was to target a separate technical specification for the proposal and allow time for implementation experience. If the feedback from implementers is positive (whatever that might mean) WG14 said it would consider adopting the model for a revision of C after C2X. Overall, the impact of the proposals as well as their goal is to constrain implementations to the (presumed) benefit of programs in terms of expressiveness. There are numerous examples of code that's currently treated as undefined by one or more compilers (as a consequence of optimizations) that the model makes valid. I'm not aware of any optimization opportunities the proposal might open up in GCC. Martin
On Wed, Jan 22, 2020 at 12:40 PM Martin Sebor <msebor@gmail.com> wrote: > > On 1/22/20 8:32 AM, Richard Biener wrote: > > On Tue, 21 Jan 2020, Alexander Monakov wrote: > > > >> On Tue, 21 Jan 2020, Richard Biener wrote: > >> > >>> Fourth. That PNVI (I assume it's the whole pointer-provenance stuff) > >>> wants to get the "best" of both which can never be done since a compiler > >>> needs to have a way to be conservative - in this area it's conflicting > >>> conservative treatment which is impossible. > >> > >> This paragraph is unclear, I don't immediately see what the conflicting goals > >> are. The rest is clear enough given the previous discussions I saw. > >> > >> Did you mean the restriction that you cannot do arithmetic involving two > >> integers based on pointers, get a value corresponding to one of them, > >> cast it back and get a pointer suitable for accessing either of two > >> originally pointed-to objects? I don't see that as a conflict because > >> it places a restriction on users, not the compiler. > > > > As far as I remember the discussions PNVI requires to track > > provenance for correctness, you may not miss or attach wrong provenance > > to a pointer and there's only "single" provenance, not "many" > > (aka, may point to A and B). I don't see how you can ever implement that. > > The PVNI variant preferred by the object model group is referred > to as "PNVI-ae-udi" which stands for "PNVI exposed-address user- > disambiguation." (The PNVI part stands for "Provenance Not Via > Integers.) This base PVNI model basically prohibits provenance > tracking via integers, making it possible for programs to derive > pointers to unrelated objects via casts between pointers and > integers (and modifying the integer in between the casts). This > is considered a new restriction on implementations because > the standard doesn't permit it (as you said upthread, all it > specifies is that a pointer is equal to one obtained by casting > the original to a intptr_t and back). > > The -ae-udi variant limits this restriction on implementations > to escaped pointers and provides a means for users/programs to > disambiguate between pointers to adjacent objects (i.e., a past > the end pointer and one to the beginning of the object stored > there). The latest proposal is in N2362, with an overview in > N2378). At the last WG14 meeting there was broad discomfort > with adopting the proposal for C2X because of the absence of > implementation experience and concerns raised by implementers. > The guidance to the study group was to target a separate technical > specification for the proposal and allow time for implementation > experience. If the feedback from implementers is positive > (whatever that might mean) WG14 said it would consider adopting > the model for a revision of C after C2X. > > Overall, the impact of the proposals as well as their goal is to > constrain implementations to the (presumed) benefit of programs > in terms of expressiveness. There are numerous examples of code > that's currently treated as undefined by one or more compilers > (as a consequence of optimizations) that the model makes valid. > I'm not aware of any optimization opportunities the proposal > might open up in GCC. Well, PNVI limits optimization opportunities of GCC which currently _does_ track provenance through integers and thus only allows a very limited set(*) of "unrelated" pointers to appear here (documented is that none are, implementation details differ from version to version). There are no optimization "opportunities" by making pointer <-> integer conversions lose information. (*) for recent versions we allow pointers to globals to be "invented" but place strict restrictions on automatic variables based on the idea that you cannot have reliable absolute addressing of those (but you could place objects at 0x12340 if you like via linker scripts) Then there are of course bugs in GCC (just found PR93381) when tracking provenance through integers (GCC also disregards the possibility of someone actually moving pointers in very weird ways which you could say is a bug). You also can't easily disregard aligning bitwise ands of leaving the current object if you consider overaligning so you'd again need precise tracking of pointer offsets in points-to analysis which we don't have. Richard. > > Martin
Am Donnerstag, den 23.01.2020, 14:18 +0100 schrieb Richard Biener: > On Wed, Jan 22, 2020 at 12:40 PM Martin Sebor <msebor@gmail.com> wrote: > > > > On 1/22/20 8:32 AM, Richard Biener wrote: > > > On Tue, 21 Jan 2020, Alexander Monakov wrote: > > > > > > > On Tue, 21 Jan 2020, Richard Biener wrote: > > > > > > > > > Fourth. That PNVI (I assume it's the whole pointer-provenance stuff) > > > > > wants to get the "best" of both which can never be done since a compiler > > > > > needs to have a way to be conservative - in this area it's conflicting > > > > > conservative treatment which is impossible. > > > > > > > > This paragraph is unclear, I don't immediately see what the conflicting goals > > > > are. The rest is clear enough given the previous discussions I saw. > > > > > > > > Did you mean the restriction that you cannot do arithmetic involving two > > > > integers based on pointers, get a value corresponding to one of them, > > > > cast it back and get a pointer suitable for accessing either of two > > > > originally pointed-to objects? I don't see that as a conflict because > > > > it places a restriction on users, not the compiler. > > > > > > As far as I remember the discussions PNVI requires to track > > > provenance for correctness, you may not miss or attach wrong provenance > > > to a pointer and there's only "single" provenance, not "many" > > > (aka, may point to A and B). I don't see how you can ever implement that. I have not idea how you came to that conclusion. PNVI is perfectly compatible with a naive compiler who does not track provenance at all as well as an abstract machine that actually carries run-time provenance around with each pointer and checks every operation. It was designed specifically to allow both cases and everything in between (especially compilers who track provenance during compile time but the programs then do not track provenance at run-time). You may be confused by the abstract formulation that indeed assigns a single provenance to each pointer. A compiler would track its *knowledge about provenance*, which would be a set of possible targets. > > The PVNI variant preferred by the object model group is referred > > to as "PNVI-ae-udi" which stands for "PNVI exposed-address user- > > disambiguation." (The PNVI part stands for "Provenance Not Via > > Integers.) This base PVNI model basically prohibits provenance > > tracking via integers, making it possible for programs to derive > > pointers to unrelated objects via casts between pointers and > > integers (and modifying the integer in between the casts). This > > is considered a new restriction on implementations because > > the standard doesn't permit it (as you said upthread, all it > > specifies is that a pointer is equal to one obtained by casting > > the original to a intptr_t and back). This is not entirely clear what the standard means. 7.20.1.4. In my opinion, converting the same integer back should yield a valid pointer where "same" is defined in the usual sense (i.e. via mathematical identity and not via provenance). > > The -ae-udi variant limits this restriction on implementations > > to escaped pointers and provides a means for users/programs to > > disambiguate between pointers to adjacent objects (i.e., a past > > the end pointer and one to the beginning of the object stored > > there). The latest proposal is in N2362, with an overview in > > N2378). At the last WG14 meeting there was broad discomfort > > with adopting the proposal for C2X because of the absence of > > implementation experience and concerns raised by implementers. > > The guidance to the study group was to target a separate technical > > specification for the proposal and allow time for implementation > > experience. If the feedback from implementers is positive > > (whatever that might mean) WG14 said it would consider adopting > > the model for a revision of C after C2X. > > > > Overall, the impact of the proposals as well as their goal is to > > constrain implementations to the (presumed) benefit of programs > > in terms of expressiveness. There are numerous examples of code > > that's currently treated as undefined by one or more compilers > > (as a consequence of optimizations) that the model makes valid. > > I'm not aware of any optimization opportunities the proposal > > might open up in GCC. > > Well, PNVI limits optimization opportunities of GCC which currently > _does_ track provenance through integers and thus only allows > a very limited set(*) of "unrelated" pointers to appear here (documented > is that none are, implementation details differ from version to version). > > There are no optimization "opportunities" by making pointer <-> integer > conversions lose information. You are right: It is meant to constrain optimizations. The reasoning behind this that currently all compilers behave inconsistently and this is not terrible useful to anybody. At the same time, there does not appear to be any reasonable way how integers can have provenance. Any rules we came up with really got complicated and are also fundamentally at odds with the usual mathematical properties of integers one would naively expect. > (*) for recent versions we allow pointers to globals to be "invented" but > place strict restrictions on automatic variables based on the idea that > you cannot have reliable absolute addressing of those (but you could > place objects at 0x12340 if you like via linker scripts) The idea with PVNI-ae would be that pointers could be "invented" only to "exposed" objects, i.e. address taken and the address was cast to int (or escaped compiler analysis). So pointers to other automatic variables can not be invented. > Then there are of course bugs in GCC (just found PR93381) when > tracking provenance through integers (GCC also disregards the > possibility of someone actually moving pointers in very weird ways > which you could say is a bug). Yes, nice example. With the current standard, it is not even clear whether this is a bug or not. Martin > You also can't easily disregard aligning bitwise ands of leaving the > current object if you consider overaligning so you'd again need > precise tracking of pointer offsets in points-to analysis which we don't have. > > Richard. > > > > > Martin
On Fri, Jan 24, 2020 at 12:46 AM Uecker, Martin <Martin.Uecker@med.uni-goettingen.de> wrote: > > Am Donnerstag, den 23.01.2020, 14:18 +0100 schrieb Richard Biener: > > On Wed, Jan 22, 2020 at 12:40 PM Martin Sebor <msebor@gmail.com> wrote: > > > > > > On 1/22/20 8:32 AM, Richard Biener wrote: > > > > On Tue, 21 Jan 2020, Alexander Monakov wrote: > > > > > > > > > On Tue, 21 Jan 2020, Richard Biener wrote: > > > > > > > > > > > Fourth. That PNVI (I assume it's the whole pointer-provenance stuff) > > > > > > wants to get the "best" of both which can never be done since a compiler > > > > > > needs to have a way to be conservative - in this area it's conflicting > > > > > > conservative treatment which is impossible. > > > > > > > > > > This paragraph is unclear, I don't immediately see what the conflicting goals > > > > > are. The rest is clear enough given the previous discussions I saw. > > > > > > > > > > Did you mean the restriction that you cannot do arithmetic involving two > > > > > integers based on pointers, get a value corresponding to one of them, > > > > > cast it back and get a pointer suitable for accessing either of two > > > > > originally pointed-to objects? I don't see that as a conflict because > > > > > it places a restriction on users, not the compiler. > > > > > > > > As far as I remember the discussions PNVI requires to track > > > > provenance for correctness, you may not miss or attach wrong provenance > > > > to a pointer and there's only "single" provenance, not "many" > > > > (aka, may point to A and B). I don't see how you can ever implement that. > > I have not idea how you came to that conclusion. PNVI is perfectly > compatible with a naive compiler who does not track provenance at > all as well as an abstract machine that actually carries run-time > provenance around with each pointer and checks every operation. > It was designed specifically to allow both cases and everything > in between (especially compilers who track provenance during > compile time but the programs then do not track provenance at > run-time). > > You may be confused by the abstract formulation that indeed > assigns a single provenance to each pointer. A compiler would > track its *knowledge about provenance*, which would be a set > of possible targets. Well, the question is whether PVNI allows the compiler to put any additional restriction on what the provenance of an interger is. It appears not, so any attempt to track provenance through integers is doomed until the cases are very simple. I'm not sure that's desirable (*). > > > The PVNI variant preferred by the object model group is referred > > > to as "PNVI-ae-udi" which stands for "PNVI exposed-address user- > > > disambiguation." (The PNVI part stands for "Provenance Not Via > > > Integers.) This base PVNI model basically prohibits provenance > > > tracking via integers, making it possible for programs to derive > > > pointers to unrelated objects via casts between pointers and > > > integers (and modifying the integer in between the casts). This > > > is considered a new restriction on implementations because > > > the standard doesn't permit it (as you said upthread, all it > > > specifies is that a pointer is equal to one obtained by casting > > > the original to a intptr_t and back). > > This is not entirely clear what the standard means. 7.20.1.4. > > In my opinion, converting the same integer back should yield > a valid pointer where "same" is defined in the usual sense > (i.e. via mathematical identity and not via provenance). > > > > The -ae-udi variant limits this restriction on implementations > > > to escaped pointers and provides a means for users/programs to > > > disambiguate between pointers to adjacent objects (i.e., a past > > > the end pointer and one to the beginning of the object stored > > > there). The latest proposal is in N2362, with an overview in > > > N2378). At the last WG14 meeting there was broad discomfort > > > with adopting the proposal for C2X because of the absence of > > > implementation experience and concerns raised by implementers. > > > The guidance to the study group was to target a separate technical > > > specification for the proposal and allow time for implementation > > > experience. If the feedback from implementers is positive > > > (whatever that might mean) WG14 said it would consider adopting > > > the model for a revision of C after C2X. > > > > > > Overall, the impact of the proposals as well as their goal is to > > > constrain implementations to the (presumed) benefit of programs > > > in terms of expressiveness. There are numerous examples of code > > > that's currently treated as undefined by one or more compilers > > > (as a consequence of optimizations) that the model makes valid. > > > I'm not aware of any optimization opportunities the proposal > > > might open up in GCC. > > > > Well, PNVI limits optimization opportunities of GCC which currently > > _does_ track provenance through integers and thus only allows > > a very limited set(*) of "unrelated" pointers to appear here (documented > > is that none are, implementation details differ from version to version). > > > > There are no optimization "opportunities" by making pointer <-> integer > > conversions lose information. > > You are right: It is meant to constrain optimizations. > > The reasoning behind this that currently all compilers behave > inconsistently and this is not terrible useful to anybody. > > At the same time, there does not appear to > be any reasonable way how integers can have provenance. > Any rules we came up with really got complicated and are > also fundamentally at odds with the usual mathematical > properties of integers one would naively expect. So the original point where GCC started to track provenance through non-pointers (PVNI should really be PVNNP since I guess tracking provenance through floats isn't to be done either ;)) was a testcase showing that Matlab (IIRC) generated C code funneled pointers through a pair of floats (obviously a single float isn't enough for 64bit pointers ...) and that prevents a good deal of optimization due to missed alias analysis. I fixed that and now GCC happily tracks provenance through a pair of floats ... (*) this also shows the level of "obfuscation" needed to fool compilers to lose provenance knowledge is hard to predict. > > > (*) for recent versions we allow pointers to globals to be "invented" but > > place strict restrictions on automatic variables based on the idea that > > you cannot have reliable absolute addressing of those (but you could > > place objects at 0x12340 if you like via linker scripts) > > The idea with PVNI-ae would be that pointers could be "invented" > only to "exposed" objects, i.e. address taken and the address > was cast to int (or escaped compiler analysis). So pointers to > other automatic variables can not be invented. > > > Then there are of course bugs in GCC (just found PR93381) when > > tracking provenance through integers (GCC also disregards the > > possibility of someone actually moving pointers in very weird ways > > which you could say is a bug). > > Yes, nice example. With the current standard, it is not even > clear whether this is a bug or not. > > > Martin > > > You also can't easily disregard aligning bitwise ands of leaving the > > current object if you consider overaligning so you'd again need > > precise tracking of pointer offsets in points-to analysis which we don't have. > > > > Richard. > > > > > > > > Martin
Hi Richard, thank you for your response. Am Montag, den 27.01.2020, 15:42 +0100 schrieb Richard Biener: > On Fri, Jan 24, 2020 at 12:46 AM Uecker, Martin > <Martin.Uecker@med.uni-goettingen.de> wrote: > > > > Am Donnerstag, den 23.01.2020, 14:18 +0100 schrieb Richard Biener: > > > On Wed, Jan 22, 2020 at 12:40 PM Martin Sebor <msebor@gmail.com> wrote: > > > > > > > > On 1/22/20 8:32 AM, Richard Biener wrote: > > > > > On Tue, 21 Jan 2020, Alexander Monakov wrote: > > > > > > > > > > > On Tue, 21 Jan 2020, Richard Biener wrote: > > > > > > > > > > > > > Fourth. That PNVI (I assume it's the whole pointer-provenance stuff) > > > > > > > wants to get the "best" of both which can never be done since a compiler > > > > > > > needs to have a way to be conservative - in this area it's conflicting > > > > > > > conservative treatment which is impossible. > > > > > > > > > > > > This paragraph is unclear, I don't immediately see what the conflicting goals > > > > > > are. The rest is clear enough given the previous discussions I saw. > > > > > > > > > > > > Did you mean the restriction that you cannot do arithmetic involving two > > > > > > integers based on pointers, get a value corresponding to one of them, > > > > > > cast it back and get a pointer suitable for accessing either of two > > > > > > originally pointed-to objects? I don't see that as a conflict because > > > > > > it places a restriction on users, not the compiler. > > > > > > > > > > As far as I remember the discussions PNVI requires to track > > > > > provenance for correctness, you may not miss or attach wrong provenance > > > > > to a pointer and there's only "single" provenance, not "many" > > > > > (aka, may point to A and B). I don't see how you can ever implement that. > > > > I have not idea how you came to that conclusion. PNVI is perfectly > > compatible with a naive compiler who does not track provenance at > > all as well as an abstract machine that actually carries run-time > > provenance around with each pointer and checks every operation. > > It was designed specifically to allow both cases and everything > > in between (especially compilers who track provenance during > > compile time but the programs then do not track provenance at > > run-time). > > > > You may be confused by the abstract formulation that indeed > > assigns a single provenance to each pointer. A compiler would > > track its *knowledge about provenance*, which would be a set > > of possible targets. > > Well, the question is whether PVNI allows the compiler to put any > additional restriction on what the provenance of an interger is. It > appears not, so any attempt to track provenance through integers > is doomed until the cases are very simple. I'm not sure that's desirable (*). PVNI makes something which is now implementation defined have well defined semantics. This is not possible to do without putting additional restrictions on the compiler and this will make some optimizations non-compliant. In both committees (C and C++) there is a consensus that the rules about provenance should be made clear and explicit. So far, this is the only well-developed proposal. .... > > > Well, PNVI limits optimization opportunities of GCC which currently > > > _does_ track provenance through integers and thus only allows > > > a very limited set(*) of "unrelated" pointers to appear here (documented > > > is that none are, implementation details differ from version to version). > > > > > > There are no optimization "opportunities" by making pointer <-> integer > > > conversions lose information. > > > > You are right: It is meant to constrain optimizations. > > > > The reasoning behind this that currently all compilers behave > > inconsistently and this is not terrible useful to anybody. > > > > At the same time, there does not appear to > > be any reasonable way how integers can have provenance. > > Any rules we came up with really got complicated and are > > also fundamentally at odds with the usual mathematical > > properties of integers one would naively expect. > > So the original point where GCC started to track provenance through > non-pointers (PVNI should really be PVNNP since I guess tracking > provenance through floats isn't to be done either ;)) was a testcase > showing that Matlab (IIRC) generated C code funneled pointers through > a pair of floats (obviously a single float isn't enough for 64bit pointers ...) > and that prevents a good deal of optimization due to missed alias analysis. > I fixed that and now GCC happily tracks provenance through a pair of > floats ... Yes, this is impressive. And yes, the proposal would forbid this (i.e. doing aliasing analysis based on the provenance tracked through integers). PVNI would make the Matlab code well-defined (under some conditions), but break the optimization based on tracking provenance through floats. But as the currently the code is not portable at all, I do not think it is really a problem if we compiler offers this optimization only as a non-standard compiler option. > (*) this also shows the level of "obfuscation" needed to fool compilers > to lose provenance knowledge is hard to predict. Well, this is exactly the problem we want to address by defining a clear way to do this. Casting to an integer would be the way to state: "consider the pointer as escaped and forget the provenance" and casting an integer to a pointer would mean "this pointer may point to all objects whose pointer has escaped". This would give the programmer explicit control about this aspect and make most existing code using pointer-to-integer casts well-defined. At the same time, this should be simple to add to existing points-to analysis. Best, Martin
On Tue, 28 Jan 2020, Uecker, Martin wrote: > > (*) this also shows the level of "obfuscation" needed to fool compilers > > to lose provenance knowledge is hard to predict. > > Well, this is exactly the problem we want to address by defining > a clear way to do this. Casting to an integer would be the way > to state: "consider the pointer as escaped and forget the > provenance" and casting an integer to a pointer would > mean "this pointer may point to all objects whose pointer has > escaped". This would give the programmer explicit control about > this aspect and make most existing code using pointer-to-integer > casts well-defined. At the same time, this should be simple > to add to existing points-to analysis. Can you explain why you make it required for the compiler to treat the points-to set unnecessarily broader than it could prove? In the Matlab example, there's a simple chain of computations that the compiler is following to prove that the pointer resulting from the final cast is derived from exactly one other pointer (no other pointers have participated in the computations). Or, in other words: is there an example where a programmer can distinguish between the requirement you explain above vs. the fine-grained interpretation that GIMPLE aims to implement (at least as I understand it), which is: when the program creates a pointer by means of non-pointer computations (casts, representation access, etc), the resulting pointer may point to: * any object which address could have participated in the computation (which is at worst the entire set of "exposed" objects up to that program point, but can be much narrower if the compiler can see the entire chain of computations) * any objects which is not "exposed" but could have known address, e.g. because it is placed at a specific address during linking Thanks. Alexander
On Tue, Jan 28, 2020 at 8:20 AM Alexander Monakov <amonakov@ispras.ru> wrote: > > On Tue, 28 Jan 2020, Uecker, Martin wrote: > > > > (*) this also shows the level of "obfuscation" needed to fool compilers > > > to lose provenance knowledge is hard to predict. > > > > Well, this is exactly the problem we want to address by defining > > a clear way to do this. Casting to an integer would be the way > > to state: "consider the pointer as escaped and forget the > > provenance" and casting an integer to a pointer would > > mean "this pointer may point to all objects whose pointer has > > escaped". This would give the programmer explicit control about > > this aspect and make most existing code using pointer-to-integer > > casts well-defined. At the same time, this should be simple > > to add to existing points-to analysis. > > Can you explain why you make it required for the compiler to treat the > points-to set unnecessarily broader than it could prove? In the Matlab > example, there's a simple chain of computations that the compiler is > following to prove that the pointer resulting from the final cast is > derived from exactly one other pointer (no other pointers have > participated in the computations). > > Or, in other words: > > is there an example where a programmer can distinguish between the > requirement you explain above vs. the fine-grained interpretation > that GIMPLE aims to implement (at least as I understand it), which is: > > when the program creates a pointer by means of non-pointer computations > (casts, representation access, etc), the resulting pointer may point to: > > * any object which address could have participated in the computation > (which is at worst the entire set of "exposed" objects up to that > program point, but can be much narrower if the compiler can see > the entire chain of computations) > > * any objects which is not "exposed" but could have known address, e.g. > because it is placed at a specific address during linking Note for the current PTA implementation there's almost no cases we can handle conservatively enough. Consider the simple int a[4]; int *p = &a[1]; uintptr_t pi = (uintptr_t)p; pi += 4; int *q = (int *)pi; our PTA knows that p points to a (but not the exact offset), same for pi (the cast doesn't change the value). Now you add 4 - this could lead you outside of 'a' so the points-to set becomes 'a and anything'. I'm also not sure what PVNI does to int a[4]; int *p = &a[1]; p += 10; uintptr_t pi = (uintptr_t)p; p = (int *)pi; we assume that p points to a even after p += 10 (but it of course points outside of the object - obvious here, but not necessarily in more obfuscated cases). Now, can we assume pi points to a? The cast isn't value-changing. Do we have to assume (int *)pi points to anything? So, is p = (int *)(uintptr_t)p; something like "laundering" a pointer? We don't preserve such simple round-trip casts since they are value-preserving. Are they provenance preserving? Richard. > Thanks. > Alexander
Am Dienstag, den 28.01.2020, 10:20 +0300 schrieb Alexander Monakov: > On Tue, 28 Jan 2020, Uecker, Martin wrote: > > > > (*) this also shows the level of "obfuscation" needed to fool compilers > > > to lose provenance knowledge is hard to predict. > > > > Well, this is exactly the problem we want to address by defining > > a clear way to do this. Casting to an integer would be the way > > to state: "consider the pointer as escaped and forget the > > provenance" and casting an integer to a pointer would > > mean "this pointer may point to all objects whose pointer has > > escaped". This would give the programmer explicit control about > > this aspect and make most existing code using pointer-to-integer > > casts well-defined. At the same time, this should be simple > > to add to existing points-to analysis. > > Can you explain why you make it required for the compiler to treat the > points-to set unnecessarily broader than it could prove? In the Matlab > example, there's a simple chain of computations that the compiler is > following to prove that the pointer resulting from the final cast is > derived from exactly one other pointer (no other pointers have > participated in the computations). > > Or, in other words: > > is there an example where a programmer can distinguish between the > requirement you explain above vs. the fine-grained interpretation > that GIMPLE aims to implement (at least as I understand it), which is: > > when the program creates a pointer by means of non-pointer computations > (casts, representation access, etc), the resulting pointer may point to: > > * any object which address could have participated in the computation > (which is at worst the entire set of "exposed" objects up to that > program point, but can be much narrower if the compiler can see > the entire chain of computations) > > * any objects which is not "exposed" but could have known address, e.g. > because it is placed at a specific address during linking Unfortunately, this is not as simple. It is not trivial to define the set of objects whose "address could have participated in the computation" int a = ... random number int b = &y; if (a == b) { int *p = (int*)a; Did '&y' participate in the computation? What if you output and integer using I/O and read it back in? What if you copy an integer using control flow? There are many similar questions like this. If we want to make this part of the standard, we need to formulate rules for all integer operations about how the provenance flows. There are several problems with this. A compiler needs to be able to compute the complete points-to set. If it might miss an object which is allowed to be used it has to be conservative with aliasing - the analysis become useless. So we always have the trade-off between making the rules simpler and restrict the tracking or specify more complicated rules that allow sophisticated tracking but then all compilers that want to do this optimization have to implement this complicated rules or need to fall back to being conservative. Finally, all integer operations would have a potential hidden second meaning when applied to addresses of objects, which makes it much easier to reason about. Assume you have a function int difference(int a, int b) { return a - b; } And later you replace an expression 'difference(a, a)' with '0' in the program. This seems a trivial and logical thing to do, but if this expression was applied to addresses, you might have broken a provenance chain and introduced a bug. In my opinion, integers should stay integers with simple logical properties. Gruß, Martin
Am Dienstag, den 28.01.2020, 11:01 +0100 schrieb Richard Biener: > On Tue, Jan 28, 2020 at 8:20 AM Alexander Monakov <amonakov@ispras.ru> wrote: > > > > On Tue, 28 Jan 2020, Uecker, Martin wrote: > > > > > > (*) this also shows the level of "obfuscation" needed to fool compilers > > > > to lose provenance knowledge is hard to predict. > > > > > > Well, this is exactly the problem we want to address by defining > > > a clear way to do this. Casting to an integer would be the way > > > to state: "consider the pointer as escaped and forget the > > > provenance" and casting an integer to a pointer would > > > mean "this pointer may point to all objects whose pointer has > > > escaped". This would give the programmer explicit control about > > > this aspect and make most existing code using pointer-to-integer > > > casts well-defined. At the same time, this should be simple > > > to add to existing points-to analysis. > > > > Can you explain why you make it required for the compiler to treat the > > points-to set unnecessarily broader than it could prove? In the Matlab > > example, there's a simple chain of computations that the compiler is > > following to prove that the pointer resulting from the final cast is > > derived from exactly one other pointer (no other pointers have > > participated in the computations). > > > > Or, in other words: > > > > is there an example where a programmer can distinguish between the > > requirement you explain above vs. the fine-grained interpretation > > that GIMPLE aims to implement (at least as I understand it), which is: > > > > when the program creates a pointer by means of non-pointer computations > > (casts, representation access, etc), the resulting pointer may point to: > > > > * any object which address could have participated in the computation > > (which is at worst the entire set of "exposed" objects up to that > > program point, but can be much narrower if the compiler can see > > the entire chain of computations) > > > > * any objects which is not "exposed" but could have known address, e.g. > > because it is placed at a specific address during linking > > Note for the current PTA implementation there's almost no cases we can > handle conservatively enough. Consider the simple > > int a[4]; > int *p = &a[1]; > uintptr_t pi = (uintptr_t)p; > pi += 4; > int *q = (int *)pi; > > our PTA knows that p points to a (but not the exact offset), same for pi > (the cast doesn't change the value). Now you add 4 - this could lead > you outside of 'a' so the points-to set becomes 'a and anything'. PVNI would say that 'a' gets exposed in the first case and then 'q' can point to all exposed objects because of the second cast. A correct and conservative implementation would do this: In the PTA you would need to mark the address of a escaped and then later assign a conservative points-to set (all escaped objects) to q. Yes, this limits optimizations, but I do not think this is terrible. (optimizations could be re-enabled with a compiler option) > I'm also not sure what PVNI does to > > int a[4]; > int *p = &a[1]; > p += 10; > uintptr_t pi = (uintptr_t)p; > p = (int *)pi; > > we assume that p points to a even after p += 10 (but it of course points > outside of the object - obvious here, but not necessarily in more > obfuscated cases). This is UB because a pointer to outside of the object is formed. > Now, can we assume pi points to a? The cast isn't value-changing. Do we have > to assume (int *)pi points to anything? So, is > > p = (int *)(uintptr_t)p; > > something like "laundering" a pointer? We don't preserve such simple round-trip > casts since they are value-preserving. Are they provenance preserving? Yes, such a pair would be "laundering" as it allows 'p' to then point to any exposed object provenance-wise. For such casts the FE would maybe add a marker. Maybe a calls to builtin functions 'builtin_expose(a)' and 'builting_bless'. (having those builtins would be interesting on its own, btw). Having said this, some optimizations could still be allowed using the "as-if" rule and other lines of reasoning. Specifally, PVNI states that 'p' gets assigned the provenance of the object the integer values is the address of. So if the compiler can proof that the address belongs to certain objects it can reassign the points-to set to the new 'p'. Only if there is ambiguity between which objects the address belongs to, the reasoning needs to be more conservative. For example: int a[3]; int b[3]; &b; // b also exposed int *p = (int*)(uintptr_t)&a[3]; Here, p could point to the one-after address of 'a' or the first element of 'b'. (but only because 'b' was also exposed). If the compiler can prove that something like this can not happen (e.g. by considering offsets), it can still do some tracking of points-to sets. Best, Martin > Richard. > > > Thanks. > > Alexander
On Tue, 28 Jan 2020, Uecker, Martin wrote: > Unfortunately, this is not as simple. It is not trivial to > define the set of objects whose "address could have participated > in the computation" > [...] I am not satisfied with the response, but I'm not sure I should continue arguing here. Alexander
On Tue, Jan 28, 2020 at 1:24 PM Uecker, Martin <Martin.Uecker@med.uni-goettingen.de> wrote: > > Am Dienstag, den 28.01.2020, 11:01 +0100 schrieb Richard Biener: > > On Tue, Jan 28, 2020 at 8:20 AM Alexander Monakov <amonakov@ispras.ru> wrote: > > > > > > On Tue, 28 Jan 2020, Uecker, Martin wrote: > > > > > > > > (*) this also shows the level of "obfuscation" needed to fool compilers > > > > > to lose provenance knowledge is hard to predict. > > > > > > > > Well, this is exactly the problem we want to address by defining > > > > a clear way to do this. Casting to an integer would be the way > > > > to state: "consider the pointer as escaped and forget the > > > > provenance" and casting an integer to a pointer would > > > > mean "this pointer may point to all objects whose pointer has > > > > escaped". This would give the programmer explicit control about > > > > this aspect and make most existing code using pointer-to-integer > > > > casts well-defined. At the same time, this should be simple > > > > to add to existing points-to analysis. > > > > > > Can you explain why you make it required for the compiler to treat the > > > points-to set unnecessarily broader than it could prove? In the Matlab > > > example, there's a simple chain of computations that the compiler is > > > following to prove that the pointer resulting from the final cast is > > > derived from exactly one other pointer (no other pointers have > > > participated in the computations). > > > > > > Or, in other words: > > > > > > is there an example where a programmer can distinguish between the > > > requirement you explain above vs. the fine-grained interpretation > > > that GIMPLE aims to implement (at least as I understand it), which is: > > > > > > when the program creates a pointer by means of non-pointer computations > > > (casts, representation access, etc), the resulting pointer may point to: > > > > > > * any object which address could have participated in the computation > > > (which is at worst the entire set of "exposed" objects up to that > > > program point, but can be much narrower if the compiler can see > > > the entire chain of computations) > > > > > > * any objects which is not "exposed" but could have known address, e.g. > > > because it is placed at a specific address during linking > > > > Note for the current PTA implementation there's almost no cases we can > > handle conservatively enough. Consider the simple > > > > int a[4]; > > int *p = &a[1]; > > uintptr_t pi = (uintptr_t)p; > > pi += 4; > > int *q = (int *)pi; > > > > our PTA knows that p points to a (but not the exact offset), same for pi > > (the cast doesn't change the value). Now you add 4 - this could lead > > you outside of 'a' so the points-to set becomes 'a and anything'. > > PVNI would say that 'a' gets exposed in the first case and > then 'q' can point to all exposed objects because of the > second cast. > > A correct and conservative implementation would do this: > In the PTA you would need to mark the address of a escaped > and then later assign a conservative points-to set (all > escaped objects) to q. I see (I'm asking all these questions to see what implementing -fpta-pnvi would mean). That might be a feasible implementation route. How does "escaped" as used in your answer apply when doing inter-procedural analysis? I guess we would need to assume points-to sets include automatic variables that escaped in the caller. > Yes, this limits optimizations, but I do not think this is > terrible. (optimizations could be re-enabled with a compiler > option) We'll see. > > I'm also not sure what PVNI does to > > > > int a[4]; > > int *p = &a[1]; > > p += 10; > > uintptr_t pi = (uintptr_t)p; > > p = (int *)pi; > > > > we assume that p points to a even after p += 10 (but it of course points > > outside of the object - obvious here, but not necessarily in more > > obfuscated cases). > > This is UB because a pointer to outside of the object is formed. > > > Now, can we assume pi points to a? The cast isn't value-changing. Do we have > > to assume (int *)pi points to anything? So, is > > > > p = (int *)(uintptr_t)p; > > > > something like "laundering" a pointer? We don't preserve such simple round-trip > > casts since they are value-preserving. Are they provenance preserving? > > Yes, such a pair would be "laundering" as it allows 'p' to then > point to any exposed object provenance-wise. > > For such casts the FE would maybe add a marker. Maybe a calls > to builtin functions 'builtin_expose(a)' and 'builting_bless'. > (having those builtins would be interesting on its own, btw). Uh, ok. > Having said this, some optimizations could still be allowed using > the "as-if" rule and other lines of reasoning. Specifally, PVNI > states that 'p' gets assigned the provenance of the object the > integer values is the address of. So if the compiler can proof > that the address belongs to certain objects it can reassign the > points-to set to the new 'p'. Only if there is ambiguity between > which objects the address belongs to, the reasoning needs to > be more conservative. > > For example: > int a[3]; > int b[3]; > > &b; // b also exposed > int *p = (int*)(uintptr_t)&a[3]; > > Here, p could point to the one-after address of 'a' or the > first element of 'b'. (but only because 'b' was also exposed). > > If the compiler can prove that something like this can not > happen (e.g. by considering offsets), it can still do some > tracking of points-to sets. That's probably the very case that we'll get wrong since we definitely won't be able to reliably preserve these kind of laundering points... I guess they could be obfuscated like union { void *p; long l; } u; u.p = p; u.l = u.l; p = u.p; where GCC (and IIRC also now the standard) allows the type-punning when reading u.p via u.l and vice versa. That is, the "conversions" might be hidden in the memory access types. That means (our PTA tracks points-to sets of memory) that all stores of pointers (even to automatic vars that are themselves not exposed) make them escaped and CSE would need to desperately avoid CSEing the load from u.p to p or insert laundering operations. I guess I'd me much more happy if PVNI said that when an integer is converted to a pointer and the integer is value-equivalent to pointers { p1, p2, ... } then the provenance of the resulting pointer is that of p1 (or p2, ... which is semantically equivalent) and when two pointers p1 and p2 are value-equivalent and their provenance is not the same then the behavior is undefined. That is, int a, b; int *p = &a + 1; int *q = &b; if (p == q) ... undefined ... Richard. > Best, > Martin > > > > > Richard. > > > > > Thanks. > > > Alexander
Am Mittwoch, den 29.01.2020, 09:45 +0100 schrieb Richard Biener: > On Tue, Jan 28, 2020 at 1:24 PM Uecker, Martin > <Martin.Uecker@med.uni-goettingen.de> wrote: > > > > > > Note for the current PTA implementation there's almost no cases we can > > > handle conservatively enough. Consider the simple > > > > > > int a[4]; > > > int *p = &a[1]; > > > uintptr_t pi = (uintptr_t)p; > > > pi += 4; > > > int *q = (int *)pi; > > > > > > our PTA knows that p points to a (but not the exact offset), same for pi > > > (the cast doesn't change the value). Now you add 4 - this could lead > > > you outside of 'a' so the points-to set becomes 'a and anything'. > > > > PVNI would say that 'a' gets exposed in the first case and > > then 'q' can point to all exposed objects because of the > > second cast. > > > > A correct and conservative implementation would do this: > > In the PTA you would need to mark the address of a escaped > > and then later assign a conservative points-to set (all > > escaped objects) to q. > > I see (I'm asking all these questions to see what implementing -fpta-pnvi > would mean). Thank you for looking into this. > That might be a feasible implementation route. How > does "escaped" as used in your answer apply when doing inter-procedural > analysis? I guess we would need to assume points-to sets include > automatic variables that escaped in the caller. Yes. Is this not something the compiler assumes in general? If you have code like this: int pi = foo(p); int *q = bar(pi); where 'foo' and 'bar' are unknown to the compiler, it should have the same semantics with regard to PTA as needed for the casts in PVNI. (except that one knows that q and p have the same value which could be exploited for optimization) > > Yes, this limits optimizations, but I do not think this is > > terrible. (optimizations could be re-enabled with a compiler > > option) > > We'll see. > > > > I'm also not sure what PVNI does to > > > > > > int a[4]; > > > int *p = &a[1]; > > > p += 10; > > > uintptr_t pi = (uintptr_t)p; > > > p = (int *)pi; > > > > > > we assume that p points to a even after p += 10 (but it of course points > > > outside of the object - obvious here, but not necessarily in more > > > obfuscated cases). > > > > This is UB because a pointer to outside of the object is formed. > > > > > Now, can we assume pi points to a? The cast isn't value-changing. Do we have > > > to assume (int *)pi points to anything? So, is > > > > > > p = (int *)(uintptr_t)p; > > > > > > something like "laundering" a pointer? We don't preserve such simple round-trip > > > casts since they are value-preserving. Are they provenance preserving? > > > > Yes, such a pair would be "laundering" as it allows 'p' to then > > point to any exposed object provenance-wise. > > > > For such casts the FE would maybe add a marker. Maybe a calls > > to builtin functions 'builtin_expose(a)' and 'builting_bless'. > > (having those builtins would be interesting on its own, btw). > > Uh, ok. For testing, I implemented builtin_expose by adding in 'find_func_aliases_for_builtin_call' case BUILT_IN_ESCAPE: { make_escape_constraint (gimple_call_arg (t, 0)); return true; } which seems to work, but I wasn't sure about the other function. There are concurrent algorithms which revive dead pointers. They might also need explicit control to work around provenance issues. > > Having said this, some optimizations could still be allowed using > > the "as-if" rule and other lines of reasoning. Specifally, PVNI > > states that 'p' gets assigned the provenance of the object the > > integer values is the address of. So if the compiler can proof > > that the address belongs to certain objects it can reassign the > > points-to set to the new 'p'. Only if there is ambiguity between > > which objects the address belongs to, the reasoning needs to > > be more conservative. > > > > For example: > > int a[3]; > > int b[3]; > > > > &b; // b also exposed Correction: this should be: (intptr_t)&b; as it is the cast to integers and not just the address-taken operation that marks the object exposed. > > int *p = (int*)(uintptr_t)&a[3]; > > > > Here, p could point to the one-after address of 'a' or the > > first element of 'b'. (but only because 'b' was also exposed). > > > > If the compiler can prove that something like this can not > > happen (e.g. by considering offsets), it can still do some > > tracking of points-to sets. > > That's probably the very case that we'll get wrong since > we definitely won't be able to reliably preserve these > kind of laundering points... Maybe the FE could add the builtins ? > I guess they could be obfuscated like > > union { void *p; long l; } u; > > u.p = p; > u.l = u.l; > p = u.p; > > where GCC (and IIRC also now the standard) allows the > type-punning when reading u.p via u.l and vice versa. It is allowed in C but not in C++. > That is, the "conversions" might be hidden in the > memory access types. That means (our PTA tracks > points-to sets of memory) that all stores of pointers > (even to automatic vars that are themselves not exposed) > make them escaped and CSE would need to desperately > avoid CSEing the load from u.p to p or insert laundering > operations. I am not sure I fully understand the proposal. > I guess I'd me much more happy if PVNI said that when > an integer is converted to a pointer and the integer > is value-equivalent to pointers { p1, p2, ... } then > the provenance of the resulting pointer is > that of p1 (or p2, ... which is semantically equivalent) (if the provenance is the same) > and when two pointers p1 and p2 are > value-equivalent and their provenance is not the > same then the behavior is undefined. I see. Then here.. int a[3]; int b[3]; (uintptr_t)&b[0]; // b also exposed int *p = (int*)(uintptr_t)&a[3]; ..the behavior is undefined because the two pointers have identical addresses but different provenance. I agree, from a compiler writer's point-of-view this would be a good solution. But to a programmer, this would be quite difficult to explain. The preference of the working group was that the casts should just work in all cases and do what the programmer intended, even if this prevents some optimization. But I will see that this is added to the list of options under consideration. PVNI-ae-ud assigns the provenance of an exposed object at the address. If there are two possible objects (as in the example above), the pointer could point to both but then has to be used consistently only with only one object. Essentially, we want the pointer to have exactly one provenance but we might delay the decision. The idea is that a compiler might figure out the correct provenance later, e.g. by observing accesses. It is possible to formulate some conditions about when a pointer converted from an integer could get assigned the points-to-set of a value-equivalent pointer: 1) using knowledge about object location in memory: If there is no adjacent object which was exposed, one can conclude that the provenance is the object at this address. 2) based on offsets: If the pointer points in the middle of an object, there is also no ambiguity. 3) a mix of both, to differentiate objects before and after in memory. > That is, > > int a, b; > int *p = &a + 1; > int *q = &b; > if (p == q) > ... undefined ... We considered making the comparison undefined in the specific situation where one of the pointer is one-after pointer and the other a pointer to the beginning of a different object. This would solve the problems with conditional equivalences. Others proposed to make the result of the comparison unspecified, but I think this does not help. At the moment, the consensus is that pointer comparison should be always allowed and the result should only depend on the address. Again, the idea is to make is simpler and more consistent for the programmer. But yes, this makes it more difficult for the compiler writer. Best, Martin
On Wed, Jan 29, 2020 at 3:00 PM Uecker, Martin <Martin.Uecker@med.uni-goettingen.de> wrote: > > Am Mittwoch, den 29.01.2020, 09:45 +0100 schrieb Richard Biener: > > On Tue, Jan 28, 2020 at 1:24 PM Uecker, Martin > > <Martin.Uecker@med.uni-goettingen.de> wrote: > > > > > > > > > Note for the current PTA implementation there's almost no cases we can > > > > handle conservatively enough. Consider the simple > > > > > > > > int a[4]; > > > > int *p = &a[1]; > > > > uintptr_t pi = (uintptr_t)p; > > > > pi += 4; > > > > int *q = (int *)pi; > > > > > > > > our PTA knows that p points to a (but not the exact offset), same for pi > > > > (the cast doesn't change the value). Now you add 4 - this could lead > > > > you outside of 'a' so the points-to set becomes 'a and anything'. > > > > > > PVNI would say that 'a' gets exposed in the first case and > > > then 'q' can point to all exposed objects because of the > > > second cast. > > > > > > A correct and conservative implementation would do this: > > > In the PTA you would need to mark the address of a escaped > > > and then later assign a conservative points-to set (all > > > escaped objects) to q. > > > > I see (I'm asking all these questions to see what implementing -fpta-pnvi > > would mean). > > Thank you for looking into this. > > > That might be a feasible implementation route. How > > does "escaped" as used in your answer apply when doing inter-procedural > > analysis? I guess we would need to assume points-to sets include > > automatic variables that escaped in the caller. > > Yes. Is this not something the compiler assumes in general? > > If you have code like this: > > int pi = foo(p); > int *q = bar(pi); > > where 'foo' and 'bar' are unknown to the compiler, it > should have the same semantics with regard to PTA > as needed for the casts in PVNI. > > (except that one knows that q and p have the same > value which could be exploited for optimization) > > > > Yes, this limits optimizations, but I do not think this is > > > terrible. (optimizations could be re-enabled with a compiler > > > option) > > > > We'll see. > > > > > > I'm also not sure what PVNI does to > > > > > > > > int a[4]; > > > > int *p = &a[1]; > > > > p += 10; > > > > uintptr_t pi = (uintptr_t)p; > > > > p = (int *)pi; > > > > > > > > we assume that p points to a even after p += 10 (but it of course points > > > > outside of the object - obvious here, but not necessarily in more > > > > obfuscated cases). > > > > > > This is UB because a pointer to outside of the object is formed. > > > > > > > Now, can we assume pi points to a? The cast isn't value-changing. Do we have > > > > to assume (int *)pi points to anything? So, is > > > > > > > > p = (int *)(uintptr_t)p; > > > > > > > > something like "laundering" a pointer? We don't preserve such simple round-trip > > > > casts since they are value-preserving. Are they provenance preserving? > > > > > > Yes, such a pair would be "laundering" as it allows 'p' to then > > > point to any exposed object provenance-wise. > > > > > > For such casts the FE would maybe add a marker. Maybe a calls > > > to builtin functions 'builtin_expose(a)' and 'builting_bless'. > > > (having those builtins would be interesting on its own, btw). > > > > Uh, ok. > > For testing, I implemented builtin_expose by adding > in 'find_func_aliases_for_builtin_call' > > case BUILT_IN_ESCAPE: > { > make_escape_constraint (gimple_call_arg (t, 0)); > return true; > } > > which seems to work, but I wasn't sure about > the other function. > > There are concurrent algorithms which revive dead pointers. > They might also need explicit control to work around provenance > issues. > > > > Having said this, some optimizations could still be allowed using > > > the "as-if" rule and other lines of reasoning. Specifally, PVNI > > > states that 'p' gets assigned the provenance of the object the > > > integer values is the address of. So if the compiler can proof > > > that the address belongs to certain objects it can reassign the > > > points-to set to the new 'p'. Only if there is ambiguity between > > > which objects the address belongs to, the reasoning needs to > > > be more conservative. > > > > > > For example: > > > int a[3]; > > > int b[3]; > > > > > > &b; // b also exposed > > Correction: this should be: > > (intptr_t)&b; > > as it is the cast to integers and not just the address-taken > operation that marks the object exposed. > > > > int *p = (int*)(uintptr_t)&a[3]; > > > > > > Here, p could point to the one-after address of 'a' or the > > > first element of 'b'. (but only because 'b' was also exposed). > > > > > > If the compiler can prove that something like this can not > > > happen (e.g. by considering offsets), it can still do some > > > tracking of points-to sets. > > > > That's probably the very case that we'll get wrong since > > we definitely won't be able to reliably preserve these > > kind of laundering points... > > Maybe the FE could add the builtins ? > > > I guess they could be obfuscated like > > > > union { void *p; long l; } u; > > > > u.p = p; > > u.l = u.l; > > p = u.p; > > > > where GCC (and IIRC also now the standard) allows the > > type-punning when reading u.p via u.l and vice versa. > > It is allowed in C but not in C++. > > > That is, the "conversions" might be hidden in the > > memory access types. That means (our PTA tracks > > points-to sets of memory) that all stores of pointers > > (even to automatic vars that are themselves not exposed) > > make them escaped and CSE would need to desperately > > avoid CSEing the load from u.p to p or insert laundering > > operations. > > I am not sure I fully understand the proposal. > > > I guess I'd me much more happy if PVNI said that when > > an integer is converted to a pointer and the integer > > is value-equivalent to pointers { p1, p2, ... } then > > the provenance of the resulting pointer is > > that of p1 (or p2, ... which is semantically equivalent) > > (if the provenance is the same) > > > and when two pointers p1 and p2 are > > value-equivalent and their provenance is not the > > same then the behavior is undefined. > > I see. Then here.. > > int a[3]; > int b[3]; > > (uintptr_t)&b[0]; // b also exposed > int *p = (int*)(uintptr_t)&a[3]; > > ..the behavior is undefined because the > two pointers have identical addresses > but different provenance. > > I agree, from a compiler writer's point-of-view > this would be a good solution. But to a programmer, > this would be quite difficult to explain. > The preference of the working group was that the casts > should just work in all cases and do what the > programmer intended, even if this prevents some > optimization. But I will see that this is > added to the list of options under consideration. > > > PVNI-ae-ud assigns the provenance of an > exposed object at the address. If there > are two possible objects (as in the example > above), the pointer could point to both but > then has to be used consistently only with > only one object. Essentially, we want the > pointer to have exactly one provenance but > we might delay the decision. The idea is > that a compiler might figure out the correct > provenance later, e.g. by observing accesses. I thought about alternatives to PVNI and implementation consequences. But all different kind of must-behave-like-this guarantees face serious implementation difficulties I think so the only alternative to PVNI (which I think is implementable but at a optimization opportunity cost) is one that makes two pointers with the same value always have the same provenance (and otherwise make the behavior undefined). > It is possible to formulate > some conditions about when a pointer converted > from an integer could get assigned the > points-to-set of a value-equivalent pointer: > > 1) using knowledge about object location in > memory: If there is no adjacent object which > was exposed, one can conclude that the > provenance is the object at this address. Usually at the point compilers want to know objects are not laid out. So what compilers do is simply say the user cannot possibly know so it can choose at will (even if later object layout disagrees). > 2) based on offsets: If the pointer points > in the middle of an object, there is also > no ambiguity. The difficulty here lies in the requirement of exact offset tracking which makes (some?) points-to implementations prohibitly expensive. But yes, sure. > 3) a mix of both, to differentiate objects > before and after in memory. > > > > That is, > > > > int a, b; > > int *p = &a + 1; > > int *q = &b; > > if (p == q) > > ... undefined ... > > We considered making the comparison undefined in the > specific situation where one of the pointer is one-after > pointer and the other a pointer to the beginning of a > different object. This would solve the problems with > conditional equivalences. Note my proposal doesn't make the comparison undefined but the case where both are equivalent cannot be reached at runtime without invoking undefined behavior. That means we can optimize the comparison based on provenance where p points to a and q points to b. > Others proposed to make the result of the comparison > unspecified, but I think this does not help. Indeed. It's not unspecified, it's known to evaluate to false. I think there's existing wording in the standard that allows it to evaluate to true for pointers one-after-the-object, that would need to be changed of course. > At the moment, the consensus is that pointer > comparison should be always allowed and the > result should only depend on the address. Again, > the idea is to make is simpler and more consistent > for the programmer. But yes, this makes it more > difficult for the compiler writer. it's a conflict of interest on the user side as well - users expect DWIM semantics but at the same time want fastest possible code... Richard. > Best, > Martin
Am Donnerstag, den 30.01.2020, 09:30 +0100 schrieb Richard Biener: > On Wed, Jan 29, 2020 at 3:00 PM Uecker, Martin > <Martin.Uecker@med.uni-goettingen.de> wrote: ... > > > I guess I'd me much more happy if PVNI said that when > > > an integer is converted to a pointer and the integer > > > is value-equivalent to pointers { p1, p2, ... } then > > > the provenance of the resulting pointer is > > > that of p1 (or p2, ... which is semantically equivalent) > > > > (if the provenance is the same) > > > > > and when two pointers p1 and p2 are > > > value-equivalent and their provenance is not the > > > same then the behavior is undefined. > > > > I see. Then here.. > > > > int a[3]; > > int b[3]; > > > > (uintptr_t)&b[0]; // b also exposed > > int *p = (int*)(uintptr_t)&a[3]; > > > > ..the behavior is undefined because the > > two pointers have identical addresses > > but different provenance. > > > > I agree, from a compiler writer's point-of-view > > this would be a good solution. But to a programmer, > > this would be quite difficult to explain. > > The preference of the working group was that the casts > > should just work in all cases and do what the > > programmer intended, even if this prevents some > > optimization. But I will see that this is > > added to the list of options under consideration. > > > > > > PVNI-ae-ud assigns the provenance of an > > exposed object at the address. If there > > are two possible objects (as in the example > > above), the pointer could point to both but > > then has to be used consistently only with > > only one object. Essentially, we want the > > pointer to have exactly one provenance but > > we might delay the decision. The idea is > > that a compiler might figure out the correct > > provenance later, e.g. by observing accesses. > > I thought about alternatives to PVNI and implementation > consequences. But all different kind of must-behave-like-this > guarantees face serious implementation difficulties I think > so the only alternative to PVNI (which I think is implementable > but at a optimization opportunity cost) is one that makes > two pointers with the same value always have the same > provenance (and otherwise make the behavior undefined). This would need to come with precise rules about when the occurance of two such pointers is UB, e.g. comparisons of such pointers, or that two such pointers are cast to int in the same execution. The mere existance of such pointers should be quite common and should not already be UB. But I am uncomfortable with the idea that comparison of pointers is always allowed except for some special case which then is UB. This might cause are and very difficult to find bugs. > > It is possible to formulate > > some conditions about when a pointer converted > > from an integer could get assigned the > > points-to-set of a value-equivalent pointer: > > > > 1) using knowledge about object location in > > memory: If there is no adjacent object which > > was exposed, one can conclude that the > > provenance is the object at this address. > > Usually at the point compilers want to know objects > are not laid out. So what compilers do is simply > say the user cannot possibly know so it can > choose at will (even if later object layout disagrees). The compiler is free to choose at will. But in my opinion, it then has to stick with its choice. Otherwise, this leads to really abstract and confusing semantics. The wording of the standard also implies that UB is based on actual behavior. > > 2) based on offsets: If the pointer points > > in the middle of an object, there is also > > no ambiguity. > > The difficulty here lies in the requirement of > exact offset tracking which makes (some?) > points-to implementations prohibitly expensive. > But yes, sure. Yes, but perhaps there are some low-hanging fruit where it is easy to determine. > > 3) a mix of both, to differentiate objects > > before and after in memory. > > > > > > > That is, > > > > > > int a, b; > > > int *p = &a + 1; > > > int *q = &b; > > > if (p == q) > > > ... undefined ... > > > > We considered making the comparison undefined in the > > specific situation where one of the pointer is one-after > > pointer and the other a pointer to the beginning of a > > different object. This would solve the problems with > > conditional equivalences. > > Note my proposal doesn't make the comparison undefined > but the case where both are equivalent cannot be reached > at runtime without invoking undefined behavior. That means > we can optimize the comparison based on provenance > where p points to a and q points to b. Sorry, I did not get this. What are the exact conditions for UB? > > Others proposed to make the result of the comparison > > unspecified, but I think this does not help. > > Indeed. It's not unspecified, it's known to evaluate to false. > I think there's existing wording in the standard that > allows it to evaluate to true for pointers one-after-the-object, > that would need to be changed of course. The problem is that if the comparison if not optimized and the pointers have the same address, then it would evaluate to true at run-time. If I understand correctly, you somehow want to make this case be UB, but I haven't quite understood how (if it is not the comparison of such pointers that invokes UB). > > At the moment, the consensus is that pointer > > comparison should be always allowed and the > > result should only depend on the address. Again, > > the idea is to make is simpler and more consistent > > for the programmer. But yes, this makes it more > > difficult for the compiler writer. > > it's a conflict of interest on the user side as well - users > expect DWIM semantics but at the same time want > fastest possible code... Yes. It is not just DWIM but also more easily to understand semantics which should lead to less bugs. The general feeling is that C moved a bit too much to the side of fastest possible code. My personal preference would be to put PVNI-ae-ud in the standard and have compiler options which re-enable these - then unsafe from the standard' point-of-view - optimization for those who need it. Best, Martin
Hi, On Thu, 30 Jan 2020, Uecker, Martin wrote: > > guarantees face serious implementation difficulties I think > > so the only alternative to PVNI (which I think is implementable > > but at a optimization opportunity cost) is one that makes > > two pointers with the same value always have the same > > provenance (and otherwise make the behavior undefined). > > This would need to come with precise rules about > when the occurance of two such pointers is UB, > e.g. comparisons of such pointers, or that > two such pointers are cast to int in the same > execution. > > The mere existance of such pointers should be > quite common and should not already be UB. > > But I am uncomfortable with the idea that > comparison of pointers is always allowed except > for some special case which then is UB. This > might cause are and very difficult to find bugs. As Richi said, the comparison itself wouldn't be UB, all comparisons would be allowed. But _if_ the pointers compare equal, they must have same (or overlapping) provenance (i.e. when they have not, then _that_ is UB). > > > Others proposed to make the result of the comparison unspecified, > > > but I think this does not help. > > > > Indeed. It's not unspecified, it's known to evaluate to false. I > > think there's existing wording in the standard that allows it to > > evaluate to true for pointers one-after-the-object, that would need to > > be changed of course. > > The problem is that if the comparison if not optimized > and the pointers have the same address, then it would > evaluate to true at run-time. If I understand correctly, > you somehow want to make this case be UB, but I haven't > quite understood how (if it is not the comparison of such > pointers that invokes UB). By saying something like "if two pointers compare equal they must have the same provenance, otherwise the behaviour is undefined". (I don't know if this definition would or would not help with the problems PVNI poses to compilers). Ciao, Michael.
Hi, On Thu, 30 Jan 2020, Michael Matz wrote: > > and the pointers have the same address, then it would evaluate to true > > at run-time. If I understand correctly, you somehow want to make this > > case be UB, but I haven't quite understood how (if it is not the > > comparison of such pointers that invokes UB). > > By saying something like "if two pointers compare equal they must have > the same provenance, otherwise the behaviour is undefined". > > (I don't know if this definition would or would not help with the > problems PVNI poses to compilers). Or, actually I know at least one case. The problem with allowing value-equivalent pointers to have non-overlapping provenance is the following: many of the compiler optimizations are based on as-if rules. Now, if it's very easy for users to detect certain situations, that means that the as-if rules can't be invoked as often. In this specific instance, if the user writes a program where the compiler would optimize mem accesses based on non-overlapping provenance (e.g. a stored value is propagated downwards over a store of different provenance), and then somewhere else also compares these non-overlapping pointers for equality, and then, if they are equal prints out "hah! invalid optimization detected", and the outcome of the comparison of non-overlapping pointers weren't left unspecified, then that's the reason why the compiler would have to globally disable the first optimization (at least when it can't prove that there aren't any such comparisons). Ideally we don't want that :) Ciao, Michael.
Am Donnerstag, den 30.01.2020, 16:50 +0000 schrieb Michael Matz: > Hi, > > On Thu, 30 Jan 2020, Uecker, Martin wrote: > > > > guarantees face serious implementation difficulties I think > > > so the only alternative to PVNI (which I think is implementable > > > but at a optimization opportunity cost) is one that makes > > > two pointers with the same value always have the same > > > provenance (and otherwise make the behavior undefined). > > > > This would need to come with precise rules about > > when the occurance of two such pointers is UB, > > e.g. comparisons of such pointers, or that > > two such pointers are cast to int in the same > > execution. > > > > The mere existance of such pointers should be > > quite common and should not already be UB. > > > > But I am uncomfortable with the idea that > > comparison of pointers is always allowed except > > for some special case which then is UB. This > > might cause are and very difficult to find bugs. > > As Richi said, the comparison itself wouldn't be UB, all comparisons would > be allowed. But _if_ the pointers compare equal, they must have same (or > overlapping) provenance (i.e. when they have not, then _that_ is UB). Sorry, I still don't get it. In the following example, int a[1]; int b[1]; it is often the case that &a[1] and &b[0] compare equal because they have the same address but the pointer have different provenance. Or does there need to be an actual evaluation of a comparison operations? In this case, I do not see the difference to what I said. Best, Martin > > > > Others proposed to make the result of the comparison unspecified, > > > > but I think this does not help. > > > > > > Indeed. It's not unspecified, it's known to evaluate to false. I > > > think there's existing wording in the standard that allows it to > > > evaluate to true for pointers one-after-the-object, that would need to > > > be changed of course. > > > > The problem is that if the comparison if not optimized > > and the pointers have the same address, then it would > > evaluate to true at run-time. If I understand correctly, > > you somehow want to make this case be UB, but I haven't > > quite understood how (if it is not the comparison of such > > pointers that invokes UB). > > By saying something like "if two pointers compare equal they must have the > same provenance, otherwise the behaviour is undefined". > (I don't know if this definition would or would not help with the problems > PVNI poses to compilers). > > > Ciao, > Michael.
On Thu, Jan 30, 2020 at 6:09 PM Uecker, Martin <Martin.Uecker@med.uni-goettingen.de> wrote: > > Am Donnerstag, den 30.01.2020, 16:50 +0000 schrieb Michael Matz: > > Hi, > > > > On Thu, 30 Jan 2020, Uecker, Martin wrote: > > > > > > guarantees face serious implementation difficulties I think > > > > so the only alternative to PVNI (which I think is implementable > > > > but at a optimization opportunity cost) is one that makes > > > > two pointers with the same value always have the same > > > > provenance (and otherwise make the behavior undefined). > > > > > > This would need to come with precise rules about > > > when the occurance of two such pointers is UB, > > > e.g. comparisons of such pointers, or that > > > two such pointers are cast to int in the same > > > execution. > > > > > > The mere existance of such pointers should be > > > quite common and should not already be UB. > > > > > > But I am uncomfortable with the idea that > > > comparison of pointers is always allowed except > > > for some special case which then is UB. This > > > might cause are and very difficult to find bugs. > > > > As Richi said, the comparison itself wouldn't be UB, all comparisons would > > be allowed. But _if_ the pointers compare equal, they must have same (or > > overlapping) provenance (i.e. when they have not, then _that_ is UB). > > Sorry, I still don't get it. In the following example, > > int a[1]; > int b[1]; > > it is often the case that &a[1] and &b[0] compare equal > because they have the same address but the pointer > have different provenance. > > Or does there need to be an actual evaluation of a comparison > operations? In this case, I do not see the difference to what > I said. I guess I wanted to say that if you do if (&a[1] == &b[0]) if (&a[1] != &b[0]) abort (); then the abort might happen. I'm using the term "undefined behavior" here. So whenever you create a value based on two pointers with the same value and different provenance you invoke undefined behavior. That allows the compiler to optimize int *q, *r; if (q == r) *r = 1; into if (q == r) *q = 1; which it is currently not allowed to do because of that dread one-after-the object equality compare, not because of PNVI, but similar cases obviously can be constructed with integers (and make our live difficult as we're tracking provenance through integers). Compilers fundamentally work with value-equivalences, the above example shows we may not. That's IMHO a defect in the standard. Richard. > > Best, > Martin > > > > > > Others proposed to make the result of the comparison unspecified, > > > > > but I think this does not help. > > > > > > > > Indeed. It's not unspecified, it's known to evaluate to false. I > > > > think there's existing wording in the standard that allows it to > > > > evaluate to true for pointers one-after-the-object, that would need to > > > > be changed of course. > > > > > > The problem is that if the comparison if not optimized > > > and the pointers have the same address, then it would > > > evaluate to true at run-time. If I understand correctly, > > > you somehow want to make this case be UB, but I haven't > > > quite understood how (if it is not the comparison of such > > > pointers that invokes UB). > > > > By saying something like "if two pointers compare equal they must have the > > same provenance, otherwise the behaviour is undefined". > > > (I don't know if this definition would or would not help with the problems > > PVNI poses to compilers). > > > > > > Ciao, > > Michael.
Am Freitag, den 31.01.2020, 09:02 +0100 schrieb Richard Biener: > On Thu, Jan 30, 2020 at 6:09 PM Uecker, Martin > <Martin.Uecker@med.uni-goettingen.de> wrote: > > > > Am Donnerstag, den 30.01.2020, 16:50 +0000 schrieb Michael Matz: > > > Hi, > > > > > > On Thu, 30 Jan 2020, Uecker, Martin wrote: > > > > > > > > guarantees face serious implementation difficulties I think > > > > > so the only alternative to PVNI (which I think is implementable > > > > > but at a optimization opportunity cost) is one that makes > > > > > two pointers with the same value always have the same > > > > > provenance (and otherwise make the behavior undefined). > > > > > > > > This would need to come with precise rules about > > > > when the occurance of two such pointers is UB, > > > > e.g. comparisons of such pointers, or that > > > > two such pointers are cast to int in the same > > > > execution. > > > > > > > > The mere existance of such pointers should be > > > > quite common and should not already be UB. > > > > > > > > But I am uncomfortable with the idea that > > > > comparison of pointers is always allowed except > > > > for some special case which then is UB. This > > > > might cause are and very difficult to find bugs. > > > > > > As Richi said, the comparison itself wouldn't be UB, all comparisons would > > > be allowed. But _if_ the pointers compare equal, they must have same (or > > > overlapping) provenance (i.e. when they have not, then _that_ is UB). > > > > Sorry, I still don't get it. In the following example, > > > > int a[1]; > > int b[1]; > > > > it is often the case that &a[1] and &b[0] compare equal > > because they have the same address but the pointer > > have different provenance. > > > > Or does there need to be an actual evaluation of a comparison > > operations? In this case, I do not see the difference to what > > I said. > > I guess I wanted to say that if you do > > if (&a[1] == &b[0]) > if (&a[1] != &b[0]) > abort (); > > then the abort might happen. I'm using the term "undefined behavior" > here. So whenever you create a value based on two pointers with > the same value and different provenance you invoke undefined behavior. Yes, but it is tricky because one needs to define "create a value based on two pointers with..." Assuming one does not track provenance through integers, the only way to create expressions using two pointers are comparisons, pointer subtraction, and the tertiary operator. The tertiary operator seems unproblematic. For pointer subtraction, the standard already requires same provenance. For comparisons, one could consider making this case UB. But I fear this could be the source of subtle bugs. Then there is the question about what happens if a programm inspects the representation bytes of a pointer directly... > That allows the compiler to optimize > > int *q, *r; > if (q == r) > *r = 1; > > into > > if (q == r) > *q = 1; > > which it is currently not allowed to do because of that dread one-after-the > object equality compare, not because of PNVI, but similar cases Yes, but as provenance is tracked at compile-time, you could still do the optimization if you assign the right provenance to the replaced variable, i.e. you replace 'r' with 'q' but keep the provenance of 'r'. So while this puts a burden on the compiler writers, it seems feasible. Or am I missing something? > obviously can be constructed with integers (and make our live difficult > as we're tracking provenance through integers). As in PVNI integers do not have provenance, such an optimization would always be valid for integers as would all other natural algebraic optimizations for integers. I consider this a major strength of the proposal and I kind of hoped that compiler writers would agree. > Compilers fundamentally work with value-equivalences, the above example > shows we may not. That's IMHO a defect in the standard. I consider provenance to be part of the value. Think about architectures with descriptors that actually trap if you use the wrong pointer. This nicely corresponds to a concept of abstract pointers which not simple the address of a memory location. The problems we have that we can not (cheaply) track provenance at runtime on modern CPUs and only the address part of the pointer is available ar runtime. For the standard, this implies that the rules must work both abstract pointers with provenance and address-only pointers where information about provenance is not available. Whenever there is a discrepancy between these two models, we can either make it UB or use the semantics of the address-only case. The only real problematic case we have with PVNI is comparisons for one-after-the object pointers with a pointer of different provenance. The only choices we have is to make this UB or to make the result well-defined and based on the address. Both choices have disadvantages. If we track provenance through integers, there are many other difficult problems. The reason is that you then cannot work with value-equivalences anymore even for integer expressions which are much more complex. The amount of additional problems we create here is the main reason we want to have PVNI and not track provenance through integers. Best, Martin
On Fri, Jan 31, 2020 at 1:05 PM Uecker, Martin <Martin.Uecker@med.uni-goettingen.de> wrote: > > Am Freitag, den 31.01.2020, 09:02 +0100 schrieb Richard Biener: > > On Thu, Jan 30, 2020 at 6:09 PM Uecker, Martin > > <Martin.Uecker@med.uni-goettingen.de> wrote: > > > > > > Am Donnerstag, den 30.01.2020, 16:50 +0000 schrieb Michael Matz: > > > > Hi, > > > > > > > > On Thu, 30 Jan 2020, Uecker, Martin wrote: > > > > > > > > > > guarantees face serious implementation difficulties I think > > > > > > so the only alternative to PVNI (which I think is implementable > > > > > > but at a optimization opportunity cost) is one that makes > > > > > > two pointers with the same value always have the same > > > > > > provenance (and otherwise make the behavior undefined). > > > > > > > > > > This would need to come with precise rules about > > > > > when the occurance of two such pointers is UB, > > > > > e.g. comparisons of such pointers, or that > > > > > two such pointers are cast to int in the same > > > > > execution. > > > > > > > > > > The mere existance of such pointers should be > > > > > quite common and should not already be UB. > > > > > > > > > > But I am uncomfortable with the idea that > > > > > comparison of pointers is always allowed except > > > > > for some special case which then is UB. This > > > > > might cause are and very difficult to find bugs. > > > > > > > > As Richi said, the comparison itself wouldn't be UB, all comparisons would > > > > be allowed. But _if_ the pointers compare equal, they must have same (or > > > > overlapping) provenance (i.e. when they have not, then _that_ is UB). > > > > > > Sorry, I still don't get it. In the following example, > > > > > > int a[1]; > > > int b[1]; > > > > > > it is often the case that &a[1] and &b[0] compare equal > > > because they have the same address but the pointer > > > have different provenance. > > > > > > Or does there need to be an actual evaluation of a comparison > > > operations? In this case, I do not see the difference to what > > > I said. > > > > I guess I wanted to say that if you do > > > > if (&a[1] == &b[0]) > > if (&a[1] != &b[0]) > > abort (); > > > > then the abort might happen. I'm using the term "undefined behavior" > > here. So whenever you create a value based on two pointers with > > the same value and different provenance you invoke undefined behavior. > > Yes, but it is tricky because one needs to define > "create a value based on two pointers with..." > > Assuming one does not track provenance through integers, > the only way to create expressions using two pointers > are comparisons, pointer subtraction, and the tertiary > operator. > > The tertiary operator seems unproblematic. For pointer > subtraction, the standard already requires same provenance. > > For comparisons, one could consider making this case UB. > But I fear this could be the source of subtle bugs. > > Then there is the question about what happens if a > programm inspects the representation bytes of a > pointer directly... At least the pointer is then no longer a register ;) > > That allows the compiler to optimize > > > > int *q, *r; > > if (q == r) > > *r = 1; > > > > into > > > > if (q == r) > > *q = 1; > > > > which it is currently not allowed to do because of that dread one-after-the > > object equality compare, not because of PNVI, but similar cases > > Yes, but as provenance is tracked at compile-time, you could still > do the optimization if you assign the right provenance to the > replaced variable, i.e. you replace 'r' with 'q' but keep the > provenance of 'r'. So while this puts a burden on the compiler > writers, it seems feasible. Or am I missing something? With SSA it's not easy since q before the comparison is the same so it's *q_1 = 2; if (q_1 == r_2) *r_2 = 1; -> *q_1 = 1; and we cannot change the provenance of q_1 since that affects the earlier store. We'd have to somehow attach provenance to all _operations_ where it matters (the dereference in this case). That's a much larger change. > > obviously can be constructed with integers (and make our live difficult > > as we're tracking provenance through integers). > > As in PVNI integers do not have provenance, such an optimization would > always be valid for integers as would all other natural algebraic > optimizations for integers. I consider this a major strength of > the proposal and I kind of hoped that compiler writers would agree. Yes, sure - it avoids this class of problems. PVNI is probably the very simplest approach to fix whatever problem it tries to fix ;) > > Compilers fundamentally work with value-equivalences, the above example > > shows we may not. That's IMHO a defect in the standard. > > I consider provenance to be part of the value. Think about > architectures with descriptors that actually trap if you use > the wrong pointer. This nicely corresponds to a concept > of abstract pointers which not simple the address of a > memory location. > > The problems we have that we can not (cheaply) track provenance > at runtime on modern CPUs and only the address part of the pointer > is available ar runtime. For the standard, this implies > that the rules must work both abstract pointers with provenance > and address-only pointers where information about provenance > is not available. Whenever there is a discrepancy between > these two models, we can either make it UB or use the semantics > of the address-only case. > > The only real problematic case we have with PVNI is comparisons > for one-after-the object pointers with a pointer of different > provenance. The only choices we have is to make this UB or > to make the result well-defined and based on the address. > Both choices have disadvantages. > > If we track provenance through integers, there are many > other difficult problems. The reason is that you then > cannot work with value-equivalences anymore even for > integer expressions which are much more complex. > The amount of additional problems we create here > is the main reason we want to have PVNI and not > track provenance through integers. I understand that. Richard. > Best, > Martin > >
diff --git a/gcc/doc/implement-c.texi b/gcc/doc/implement-c.texi index 692297b69c4..beee2510670 100644 --- a/gcc/doc/implement-c.texi +++ b/gcc/doc/implement-c.texi @@ -401,11 +401,15 @@ pointer representation is smaller than the integer type, extends according to the signedness of the integer type if the pointer representation is larger than the integer type, otherwise the bits are unchanged. -When casting from pointer to integer and back again, the resulting -pointer must reference the same object as the original pointer, otherwise -the behavior is undefined. That is, one may not use integer arithmetic to -avoid the undefined behavior of pointer arithmetic as proscribed in -C99 and C11 6.5.6/8. +Arithmetic on integers derived from pointers can produce a value such +that casting it back produces a valid pointer corresponding to one of +the original pointers. Thus, integer arithmetic allows to express +computations that might not be expressible as pointer arithmetic without +undefined behavior. However, at a certain point the distinction between +pointers and integers is lost (when GCC translates from GIMPLE internal +representation to RTL), but some optimizations still attempt to track +pointer arithmetic beyond that point. In some cases this may cause +valid code to be incorrectly optimized. @item @cite{The size of the result of subtracting two pointers to elements