Message ID | cover.1727861750.git.alx@kernel.org |
---|---|
Headers | show |
Series | c: Add __lengthof__ operator | expand |
Patches 1, 2 and 3 are logically nothing to do with this feature. I'll wait for them to be reviewed so that we only have a single-patch series, before doing final review of the main patch. Since the feature was accepted as _Lengthof, that's the form that should be added to GCC; no __lengthof__ variant needed. In general in GCC, although not strictly required by the standard in this case, we use pedwarn_c23 (pass OPT_Wpedantic as the option) to diagnose the use of a new C2Y feature that's not in C23 (if -pedantic with a pre-C2Y standard, or -Wc23-c2y-compat even in C2Y mode), with appropriate testcases to verify this (error with -std=c23 -pedantic-errors, warning with -std=c23 -pedantic, no diagnostic with -std=c23 -pedantic-errors -Wno-c23-c2y-compat, no diagnostic with -std=c2y -pedantic-errors, warning with -std=c2y -pedantic-errors -Wc23-c2y-compat). (pedwarn_c23 handles that logic, you just need the pedwarn_c23 call and the tests for those various cases.)
Hi Joseph, On Mon, Oct 07, 2024 at 05:35:16PM GMT, Joseph Myers wrote: > Patches 1, 2 and 3 are logically nothing to do with this feature. I'll > wait for them to be reviewed so that we only have a single-patch series, > before doing final review of the main patch. I do not fully understand. Who has to review patches 1,2,3? Also, do you want to merge them, then I resend patch 4 as a single patch, and then you review that one? If so, that looks like a good plan to me. Thanks! > Since the feature was accepted as _Lengthof, that's the form that should > be added to GCC; no __lengthof__ variant needed. Okay, I'm indiferent to choosing between both of those names; since they are equally harmful. ;) On the other hand, I'm tempted to propose a different name, and force ISO to reconsider and follow. The discussion on this list was more thorough than the short discussion at WG14, which didn't really take into consideration the dangers of harmful and error-prone nomenclature. > In general in GCC, > although not strictly required by the standard in this case, we use > pedwarn_c23 (pass OPT_Wpedantic as the option) to diagnose the use of a > new C2Y feature that's not in C23 Thanks; will do. Have a lovely night! Alex > (if -pedantic with a pre-C2Y standard, > or -Wc23-c2y-compat even in C2Y mode), with appropriate testcases to > verify this (error with -std=c23 -pedantic-errors, warning with -std=c23 > -pedantic, no diagnostic with -std=c23 -pedantic-errors > -Wno-c23-c2y-compat, no diagnostic with -std=c2y -pedantic-errors, warning > with -std=c2y -pedantic-errors -Wc23-c2y-compat). (pedwarn_c23 handles > that logic, you just need the pedwarn_c23 call and the tests for those > various cases.) > > -- > Joseph S. Myers > josmyers@redhat.com >
On Tue, Oct 08, 2024 at 02:04:39AM GMT, Alejandro Colomar wrote: > Hi Joseph, > > On Mon, Oct 07, 2024 at 05:35:16PM GMT, Joseph Myers wrote: > > Patches 1, 2 and 3 are logically nothing to do with this feature. I'll > > wait for them to be reviewed so that we only have a single-patch series, > > before doing final review of the main patch. > > I do not fully understand. Who has to review patches 1,2,3? Also, do > you want to merge them, then I resend patch 4 as a single patch, and > then you review that one? If so, that looks like a good plan to me. > > Thanks! > > > Since the feature was accepted as _Lengthof, that's the form that should > > be added to GCC; no __lengthof__ variant needed. > > Okay, I'm indiferent to choosing between both of those names; since they > are equally harmful. ;) > > On the other hand, I'm tempted to propose a different name, and force > ISO to reconsider and follow. The discussion on this list was more > thorough than the short discussion at WG14, which didn't really take > into consideration the dangers of harmful and error-prone nomenclature. > > > In general in GCC, > > although not strictly required by the standard in this case, we use > > pedwarn_c23 (pass OPT_Wpedantic as the option) to diagnose the use of a > > new C2Y feature that's not in C23 > > Thanks; will do. On the other hand, should we provide a version of the operator that is free from pedantic warnings? A GNU extension? Cheers, Alex > > Have a lovely night! > Alex > > > (if -pedantic with a pre-C2Y standard, > > or -Wc23-c2y-compat even in C2Y mode), with appropriate testcases to > > verify this (error with -std=c23 -pedantic-errors, warning with -std=c23 > > -pedantic, no diagnostic with -std=c23 -pedantic-errors > > -Wno-c23-c2y-compat, no diagnostic with -std=c2y -pedantic-errors, warning > > with -std=c2y -pedantic-errors -Wc23-c2y-compat). (pedwarn_c23 handles > > that logic, you just need the pedwarn_c23 call and the tests for those > > various cases.) > > > > -- > > Joseph S. Myers > > josmyers@redhat.com > > > > -- > <https://www.alejandro-colomar.es/>
On Tue, Oct 08, 2024 at 02:09:52AM +0200, Alejandro Colomar wrote: > On the other hand, should we provide a version of the operator that is > free from pedantic warnings? A GNU extension? No, why? One can always use (__extension__ _Lengthof (...)). Jakub
On Tue, Oct 08, 2024 at 08:45:48AM GMT, Jakub Jelinek wrote: > On Tue, Oct 08, 2024 at 02:09:52AM +0200, Alejandro Colomar wrote: > > On the other hand, should we provide a version of the operator that is > > free from pedantic warnings? A GNU extension? > > No, why? One can always use (__extension__ _Lengthof (...)). Just wondering if we should do it for symmetry with __alignof__. But yeah, no need. Cheers, Alex > > Jakub >
I think it would be beneficial to have different syntax/spelling for features still in development. That way we, as a committee, can tweak it as we please, while mitigating effect on early adopters. If what ends in final document is exactly the same as in early phrases, then great, all users are left to do is simple find and replace. Warning about using an experimental feature that is prone to changes is obviously useful, but it disappears after upgrade to latest standard. If there were diverges between early and final versions, it would be nice to have warnings about that too. ~ J.Ł. On 2024-10-07 19:35 CEST, Joseph Myers <josmyers@redhat.com> wrote: > Patches 1, 2 and 3 are logically nothing to do with this feature. I'll > wait for them to be reviewed so that we only have a single-patch series, > before doing final review of the main patch. > > Since the feature was accepted as _Lengthof, that's the form that should > be added to GCC; no __lengthof__ variant needed. In general in GCC, > although not strictly required by the standard in this case, we use > pedwarn_c23 (pass OPT_Wpedantic as the option) to diagnose the use of a > new C2Y feature that's not in C23 (if -pedantic with a pre-C2Y standard, > or -Wc23-c2y-compat even in C2Y mode), with appropriate testcases to > verify this (error with -std=c23 -pedantic-errors, warning with -std=c23 > -pedantic, no diagnostic with -std=c23 -pedantic-errors > -Wno-c23-c2y-compat, no diagnostic with -std=c2y -pedantic-errors, warning > with -std=c2y -pedantic-errors -Wc23-c2y-compat). (pedwarn_c23 handles > that logic, you just need the pedwarn_c23 call and the tests for those > various cases.)
Hi Jakub, On Tue, Oct 08, 2024 at 10:25:24AM GMT, Jakub Łukasiewicz wrote: > I think it would be beneficial to have different syntax/spelling for > features still in development. That way we, as a committee, can tweak it as > we please, while mitigating effect on early adopters. > > If what ends in final document is exactly the same as in early phrases, then > great, all users are left to do is simple find and replace. > > Warning about using an experimental feature that is prone to changes is > obviously useful, but it disappears after upgrade to latest standard. If > there were diverges between early and final versions, it would be nice to > have warnings about that too. How about adding the __lower__ version now as a GNU extension with compatible semantics, and when it's closer to an ISO C2y release add the _Upper one? That gives us more freedom. Cheers, Alex > > ~ J.Ł. > > On 2024-10-07 19:35 CEST, Joseph Myers <josmyers@redhat.com> wrote: > > Patches 1, 2 and 3 are logically nothing to do with this feature. I'll > > wait for them to be reviewed so that we only have a single-patch series, > > before doing final review of the main patch. > > > > Since the feature was accepted as _Lengthof, that's the form that should > > be added to GCC; no __lengthof__ variant needed. In general in GCC, > > although not strictly required by the standard in this case, we use > > pedwarn_c23 (pass OPT_Wpedantic as the option) to diagnose the use of a > > new C2Y feature that's not in C23 (if -pedantic with a pre-C2Y standard, > > or -Wc23-c2y-compat even in C2Y mode), with appropriate testcases to > > verify this (error with -std=c23 -pedantic-errors, warning with -std=c23 > > -pedantic, no diagnostic with -std=c23 -pedantic-errors > > -Wno-c23-c2y-compat, no diagnostic with -std=c2y -pedantic-errors, > > warning with -std=c2y -pedantic-errors -Wc23-c2y-compat). (pedwarn_c23 > > handles that logic, you just need the pedwarn_c23 call and the tests for > > those various cases.)
Yup :) For example that ~ J.Ł. On 2024-10-08 10:33 CEST, Alejandro Colomar <alx@kernel.org> wrote: > Hi Jakub, > > On Tue, Oct 08, 2024 at 10:25:24AM GMT, Jakub Łukasiewicz wrote: > > I think it would be beneficial to have different syntax/spelling for > > features still in development. That way we, as a committee, can tweak it as > > we please, while mitigating effect on early adopters. > > > > If what ends in final document is exactly the same as in early phrases, then > > great, all users are left to do is simple find and replace. > > > > Warning about using an experimental feature that is prone to changes is > > obviously useful, but it disappears after upgrade to latest standard. If > > there were diverges between early and final versions, it would be nice to > > have warnings about that too. > > How about adding the __lower__ version now as a GNU extension with > compatible semantics, and when it's closer to an ISO C2y release add the > _Upper one? > > That gives us more freedom. > > Cheers, > Alex > > > > > ~ J.Ł. > > > > On 2024-10-07 19:35 CEST, Joseph Myers <josmyers@redhat.com> wrote: > > > Patches 1, 2 and 3 are logically nothing to do with this feature. I'll > > > wait for them to be reviewed so that we only have a single-patch series, > > > before doing final review of the main patch. > > > > > > Since the feature was accepted as _Lengthof, that's the form that should > > > be added to GCC; no __lengthof__ variant needed. In general in GCC, > > > although not strictly required by the standard in this case, we use > > > pedwarn_c23 (pass OPT_Wpedantic as the option) to diagnose the use of a > > > new C2Y feature that's not in C23 (if -pedantic with a pre-C2Y standard, > > > or -Wc23-c2y-compat even in C2Y mode), with appropriate testcases to > > > verify this (error with -std=c23 -pedantic-errors, warning with -std=c23 > > > -pedantic, no diagnostic with -std=c23 -pedantic-errors > > > -Wno-c23-c2y-compat, no diagnostic with -std=c2y -pedantic-errors, > > > warning with -std=c2y -pedantic-errors -Wc23-c2y-compat). (pedwarn_c23 > > > handles that logic, you just need the pedwarn_c23 call and the tests for > > > those various cases.)
On Tue, 8 Oct 2024, Alejandro Colomar wrote: > On Mon, Oct 07, 2024 at 05:35:16PM GMT, Joseph Myers wrote: > > Patches 1, 2 and 3 are logically nothing to do with this feature. I'll > > wait for them to be reviewed so that we only have a single-patch series, > > before doing final review of the main patch. > > I do not fully understand. Who has to review patches 1,2,3? Also, do Someone who is a maintainer or reviewer of relevant parts of the compiler. Maybe 90% of the people CC:ed are not GCC maintainers or reviewers and should not be included on these patches at all. > you want to merge them, then I resend patch 4 as a single patch, and > then you review that one? If so, that looks like a good plan to me. Yes, patch 4 as a single patch. With _Lengthof. No other names, no __countof__, no __lengthof__. _Lengthof is a perfectly good name, no need to be gratuitously incompatible.
On Tue, 8 Oct 2024, Alejandro Colomar wrote: > How about adding the __lower__ version now as a GNU extension with > compatible semantics, and when it's closer to an ISO C2y release add the > _Upper one? No, we don't need two names. Just _Lengthof suffices. If semantics change in some way during C2y development, we can update the implementation to match. We have a rule that the C++ library ABI is explicitly unstable for features in a new standard where support is still under development, and only commit to a stable C++ library ABI supporting existing binaries once the support for that version is mature enough. I think it's reasonable also to say that features from C2y (including when supported in older standards modes) are unstable and liable to change when the semantics in C2y change. (Liable to change doesn't mean changing on a release branch after the first release from that branch. But it could in some cases mean changing incompatibly while mainline is in regression fixes mode, to avoid releasing something significantly incompatible with later changes.)
Hi Joseph, On Tue, Oct 08, 2024 at 01:19:06PM GMT, Joseph Myers wrote: > On Tue, 8 Oct 2024, Alejandro Colomar wrote: > > > On Mon, Oct 07, 2024 at 05:35:16PM GMT, Joseph Myers wrote: > > > Patches 1, 2 and 3 are logically nothing to do with this feature. I'll > > > wait for them to be reviewed so that we only have a single-patch series, > > > before doing final review of the main patch. > > > > I do not fully understand. Who has to review patches 1,2,3? Also, do > > Someone who is a maintainer or reviewer of relevant parts of the compiler. > Maybe 90% of the people CC:ed are not GCC maintainers or reviewers and > should not be included on these patches at all. Those are people with interest in one way or another in this feature (most of them, patch 4). While patches 1,2,3 are irrelevant to them, I kept them on the entire thread for simplicity. > > > you want to merge them, then I resend patch 4 as a single patch, and > > then you review that one? If so, that looks like a good plan to me. > > Yes, patch 4 as a single patch. With _Lengthof. No other names, no > __countof__, no __lengthof__. _Lengthof is a perfectly good name, no need > to be gratuitously incompatible. It is not gratuitously, IMO. You already know my concerns about it; please sed(1) yourself the name of the operator from these patches, and append your signature below mine, if you want to rename it. I won't do that, for I think it introduces a security problem that will slowly develop. If you wish to wait for Graz to make sure there's no incompatibility with ISO, that's another possibility. Have a lovely day! Alex
On Tue, Oct 08, 2024 at 03:28:29PM +0200, Alejandro Colomar wrote: > On Tue, Oct 08, 2024 at 01:19:06PM GMT, Joseph Myers wrote: > > On Tue, 8 Oct 2024, Alejandro Colomar wrote: > > > > > On Mon, Oct 07, 2024 at 05:35:16PM GMT, Joseph Myers wrote: > > > > Patches 1, 2 and 3 are logically nothing to do with this feature. I'll > > > > wait for them to be reviewed so that we only have a single-patch series, > > > > before doing final review of the main patch. > > > > > > I do not fully understand. Who has to review patches 1,2,3? Also, do > > > > Someone who is a maintainer or reviewer of relevant parts of the compiler. > > Maybe 90% of the people CC:ed are not GCC maintainers or reviewers and > > should not be included on these patches at all. > > Those are people with interest in one way or another in this feature > (most of them, patch 4). While patches 1,2,3 are irrelevant to them, I > kept them on the entire thread for simplicity. I'd just suggest posting the patch 4 alone adjusted so that it doesn't need 2,3. Those 2 can be handled completely independently from patch 4 and for 1 there is no dependency, it is just a random unrelated patch. > > > you want to merge them, then I resend patch 4 as a single patch, and > > > then you review that one? If so, that looks like a good plan to me. > > > > Yes, patch 4 as a single patch. With _Lengthof. No other names, no > > __countof__, no __lengthof__. _Lengthof is a perfectly good name, no need > > to be gratuitously incompatible. > > It is not gratuitously, IMO. You already know my concerns about it; > please sed(1) yourself the name of the operator from these patches, and > append your signature below mine, if you want to rename it. I won't do > that, for I think it introduces a security problem that will slowly > develop. > > If you wish to wait for Graz to make sure there's no incompatibility > with ISO, that's another possibility. I don't see why we'd need to wait for next WG14 meeting here, as the paper has been voted into the next standard draft, compilers can implement the feature, and as with anything else if the standard is changed later, it will be adjusted to match it (or withdrawn/renamed or whatever). That is always a risk of using features from unreleased standards (but even features in released standards can be tweaked with DRs later on). It doesn't make sense to have two different operators just because we fear it will change. We do have some cases of different operators for the same thing if it was first implemented as an extension and only later standardized under different name or with different rules. That is not the case here. Jakub
On Tue, 8 Oct 2024, Alejandro Colomar wrote: > If you wish to wait for Graz to make sure there's no incompatibility > with ISO, that's another possibility. The name could be changed in GCC after Graz (while in regression-fixes-only mode for GCC 15) if WG14 changes the name in Graz. It wouldn't be a good idea to add as a new feature while in regression-fixes-only mode; waiting to after Graz for that would mean only adding it in GCC 16. "length" is well-understood as referring to the number of elements of an array even if not explicitly defined as such in the standard (cf. the noun "variable" being well-understood by users of C even though the standard always says "object"). And there was along-the-lines support in Strasbourg for N3187 which, among other things, would make the use of "length" for this consistent and explicit. I have plenty of concerns with both the wording and incompatibility of various changes suggested there related to what's allowed as a sizeof operand and associated semantics but I don't think there were any concerns in that discussion about the use of the term "length".
On Tue, Oct 08, 2024 at 03:40:53PM GMT, Jakub Jelinek wrote: > On Tue, Oct 08, 2024 at 03:28:29PM +0200, Alejandro Colomar wrote: > > On Tue, Oct 08, 2024 at 01:19:06PM GMT, Joseph Myers wrote: > > > On Tue, 8 Oct 2024, Alejandro Colomar wrote: > > > > > > > On Mon, Oct 07, 2024 at 05:35:16PM GMT, Joseph Myers wrote: > > > > > Patches 1, 2 and 3 are logically nothing to do with this feature. I'll > > > > > wait for them to be reviewed so that we only have a single-patch series, > > > > > before doing final review of the main patch. > > > > > > > > I do not fully understand. Who has to review patches 1,2,3? Also, do > > > > > > Someone who is a maintainer or reviewer of relevant parts of the compiler. > > > Maybe 90% of the people CC:ed are not GCC maintainers or reviewers and > > > should not be included on these patches at all. > > > > Those are people with interest in one way or another in this feature > > (most of them, patch 4). While patches 1,2,3 are irrelevant to them, I > > kept them on the entire thread for simplicity. > > I'd just suggest posting the patch 4 alone adjusted so that it doesn't need > 2,3. Those 2 can be handled completely independently from patch 4 and for 1 > there is no dependency, it is just a random unrelated patch. 2,3 are needed for not making 4 read horribly. 1 is needed for the commit message of 4. I don't mind waiting for gcc-16, if that needs to happen. > > > > > you want to merge them, then I resend patch 4 as a single patch, and > > > > then you review that one? If so, that looks like a good plan to me. > > > > > > Yes, patch 4 as a single patch. With _Lengthof. No other names, no > > > __countof__, no __lengthof__. _Lengthof is a perfectly good name, no need > > > to be gratuitously incompatible. > > > > It is not gratuitously, IMO. You already know my concerns about it; > > please sed(1) yourself the name of the operator from these patches, and > > append your signature below mine, if you want to rename it. I won't do > > that, for I think it introduces a security problem that will slowly > > develop. > > > > If you wish to wait for Graz to make sure there's no incompatibility > > with ISO, that's another possibility. > > I don't see why we'd need to wait for next WG14 meeting here, as the paper > has been voted into the next standard draft, compilers can implement the > feature, Because I don't like the paper that has been voted into the standard. I kind of presented that paper against my will. I wish GCC merged the feature with a different name, and forced the standard to reconsider what they merged, which I consider to be a security problem. Alternatively, I wish GCC decided to do nothing, wait for Graz, where I'll try to convince WG14 to change what was voted. But merging what was voted into the standard would be nefarious, IMO. Have a lovely day! Alex > and as with anything else if the standard is changed later, it will > be adjusted to match it (or withdrawn/renamed or whatever). That is always > a risk of using features from unreleased standards (but even features > in released standards can be tweaked with DRs later on). > It doesn't make sense to have two different operators just because we fear > it will change. > We do have some cases of different operators for the same thing if it was > first implemented as an extension and only later standardized under > different name or with different rules. > That is not the case here. > > Jakub >
Hi Joseph, On Tue, Oct 08, 2024 at 02:04:12PM GMT, Joseph Myers wrote: > On Tue, 8 Oct 2024, Alejandro Colomar wrote: > > > If you wish to wait for Graz to make sure there's no incompatibility > > with ISO, that's another possibility. > > The name could be changed in GCC after Graz (while in > regression-fixes-only mode for GCC 15) if WG14 changes the name in Graz. > It wouldn't be a good idea to add as a new feature while in > regression-fixes-only mode; waiting to after Graz for that would mean only > adding it in GCC 16. I prefer waiting for Graz. Merging _Lengthof already would provide less incentive for WG14 to change their mind. I don't mind having this in GCC 16 instead of 15, if that's the only way. Alternatively, you could consider countof for GCC 15, and optionally rename it later if my arguments are dismissed. Or you can merge yourself _Lengthof already, by sed(1)ing my patches, if you wish. > "length" is well-understood as referring to the number of elements of an > array even if not explicitly defined as such in the standard (cf. the noun > "variable" being well-understood by users of C even though the standard > always says "object"). > And there was along-the-lines support in > Strasbourg for N3187 I interpret something along the lines of n3187 to mean "we want to make ISO C consistent with language" --which is an opinion I share--, rather than specifically meaning that length is the only acceptable term. > which, among other things, would make the use of > "length" for this consistent and explicit. > I have plenty of concerns with > both the wording and incompatibility of various changes suggested there > related to what's allowed as a sizeof operand and associated semantics but > I don't think there were any concerns in that discussion about the use of > the term "length". Now you've seen mine. I will oppose to n3187 as much as I can, as long as it offers length as the term. Cheers, Alex
> Because I don't like the paper that has been voted into the standard. > I kind of presented that paper against my will. I wish GCC merged the > feature with a different name, and forced the standard to reconsider > what they merged, which I consider to be a security problem. > > Alternatively, I wish GCC decided to do nothing, wait for Graz, where > I'll try to convince WG14 to change what was voted. > > But merging what was voted into the standard would be nefarious, IMO. I don't understand this security problem that you are referring to. The vast majority of strings use 'char' as the element type. Existing code might look something like this: #define A "foo" #define B "bar" #define STRING_LEN(s) (sizeof(s) - 1) char *c = malloc(STRING_LEN(A) + STRING_LEN(B) + 1); if (c) { strcpy(c, A); strcat(c, B); } Supposing that _Length gets support in GCC, the equivalent source code would be almost identical and the compiled code would be identical: #define A "foo" #define B "bar" #define STRING_LEN(s) (_Lengthof(s) - 1) char *c = malloc(STRING_LEN(A) + STRING_LEN(B) + 1); if (c) { strcpy(c, A); strcat(c, B); } Are you concerned that people will start writing new code that does something like the following? #define A "foo" #define B "bar" char *c = malloc(_Lengthof(A) + _Lengthof(B)); if (c) { strcpy(c, A); strcat(c, B); } If they do, the only consequence will be that the string buffer is longer than it needs to be; not shorter. Best regards, -- Christopher Bazley Staff Software Engineer, GPU team, Central Engineering Group ARM Ltd, 110 Fulbourn Road, Cambridge, CB1 9NJ, UK. Web: http://www.arm.com/
On Tue, 8 Oct 2024, Alejandro Colomar wrote: > > I have plenty of concerns with > > both the wording and incompatibility of various changes suggested there > > related to what's allowed as a sizeof operand and associated semantics but > > I don't think there were any concerns in that discussion about the use of > > the term "length". > > Now you've seen mine. I will oppose to n3187 as much as I can, as long > as it offers length as the term. It's "size", not "length", that has been used more inconsistently. Take for example this sentence from K&R2 (I don't have K&R1), section 4.9 (Initialization): "When the size of the array is omitted, the compiler will compute the length by counting the initializers, of which there are 12 in this case.". Length of strings is simply array length of a sub-array that doesn't include the terminating null character (a case of the sub-object issue of exactly what object is relevant in what context, not anything to do with the term "length").
Joseph wrote: > Length of strings is simply array length of a > sub-array that doesn't include the terminating null character (a case of > the sub-object issue of exactly what object is relevant in what context, > not anything to do with the term "length"). Exactly! This is what eventually persuaded me to stop doubting that 'length' is the most appropriate term (and I did sincerely doubt it, for a while). We can't eliminate such off-by-one errors from programs by choice of terminology, and certainly not by a retroactive change in terminology.
Hello everybody, can you please leave me out of this ever spinning discussion? The discussion in Minneapolis has fulfilled my needs on this for the next 10 years at least. I consider forcing people to opt out of mailings as being quite rude. Thanks Jₑₙₛ
[CC -= Jens] Hi Joseph, On Tue, Oct 08, 2024 at 03:13:19PM GMT, Joseph Myers wrote: > On Tue, 8 Oct 2024, Alejandro Colomar wrote: > > > > I have plenty of concerns with > > > both the wording and incompatibility of various changes suggested there > > > related to what's allowed as a sizeof operand and associated semantics but > > > I don't think there were any concerns in that discussion about the use of > > > the term "length". > > > > Now you've seen mine. I will oppose to n3187 as much as I can, as long > > as it offers length as the term. > > It's "size", not "length", that has been used more inconsistently. Completely agree. In fact, I've always supported using NITEMS() for things like #define STRLCPY(d, s) strlcpy(d, s, NITEMS(d)) <https://lore.kernel.org/all/2jxak5v6dfxlpbxhpm3ey7oup4g2lnr3ueurfbosf5wdo65dk4@srb3hsk72zwq/> However, at least the term size doesn't result in off-by-one issues. With char-sized elements, it's not a problem. And with wider elements, it's so obviously different from the size, that people know what they're doing. And, I don't see why the standard couldn't just standardize on "number of elements". I don't see a need for using "length" in ISO C. I think removing the phrase "number of elements" from ISO C would be negative --yes, Jens suggested he intended to do that in an eventual revision of his paper, which prompted me attempting to shoot it down with n3327--. > Take > for example this sentence from K&R2 (I don't have K&R1), section 4.9 > (Initialization): "When the size of the array is omitted, the compiler > will compute the length by counting the initializers, of which there are > 12 in this case.". Heh. :) > Length of strings is simply array length of a > sub-array that doesn't include the terminating null character (a case of > the sub-object issue of exactly what object is relevant in what context, > not anything to do with the term "length"). I thought of that, and I temporarily tried to convince myself that the term can be used to mean that. However, I don't fully convince myself that programmers won't fall into off-by-one bugs due to this. If you believe I'm wrong, I think it's okay. I wouldn't mind if you want to sed(1) the name yourself from my patch. But in good faith, I cannot send that patch. You'll have to add your signature, and add a line saying you changed the name of the operator. But please go ahead if that's what you think is best. After all, we've had our arguments, and there's no clear consensus on either side, so a maintainer has to emit the final decision. Whatever you decide, I'll be okay with it. Have a lovely night! Alex
[CC -= Jens] Hi Chris, On Tue, Oct 08, 2024 at 03:13:11PM GMT, Chris Bazley wrote: > > Because I don't like the paper that has been voted into the standard. > > I kind of presented that paper against my will. I wish GCC merged the > > feature with a different name, and forced the standard to reconsider > > what they merged, which I consider to be a security problem. > > > > Alternatively, I wish GCC decided to do nothing, wait for Graz, where > > I'll try to convince WG14 to change what was voted. > > > > But merging what was voted into the standard would be nefarious, IMO. > > I don't understand this security problem that you are referring to. > > The vast majority of strings use 'char' as the element type. > > Existing code might look something like this: > > #define A "foo" > #define B "bar" > #define STRING_LEN(s) (sizeof(s) - 1) > > char *c = malloc(STRING_LEN(A) + STRING_LEN(B) + 1); > if (c) { > strcpy(c, A); > strcat(c, B); > } > > Supposing that _Length gets support in GCC, the equivalent source code would be almost > identical and the compiled code would be identical: > > #define A "foo" > #define B "bar" > #define STRING_LEN(s) (_Lengthof(s) - 1) > > char *c = malloc(STRING_LEN(A) + STRING_LEN(B) + 1); > if (c) { > strcpy(c, A); > strcat(c, B); > } > > Are you concerned that people will start writing new code that does something like the following? > > #define A "foo" > #define B "bar" > > char *c = malloc(_Lengthof(A) + _Lengthof(B)); > if (c) { > strcpy(c, A); > strcat(c, B); > } > > If they do, the only consequence will be that the string buffer is longer than it needs to be; not shorter. Yes, off-by-one bugs on the safe side are more frequent than on the unsafe side in this case. However, I expect unsafe off-by-ones too. And even in the safe side, there's the chance of secondary problems like the following: Let's say the maximum supported size is limited by a system limit. For example, sysconf(_SC_LOGIN_NAME_MAX) or LOGIN_NAME_MAX. If you try to allocate one extra byte, so sysconf(_SC_LOGIN_NAME_MAX)+1, you may overflow something somewhere, or cause some other important issues in your system if you manage to create a user with such a long username. Or your program will just crash and cause a DoS. Or another combination of events that may cause another class of bugs. In all cases, there's an off-by-one somewhere, but will result in a different bug type. I'm not fabricating, BTW. Here's a list of off-by-one bugs in login code, precisely due to this size-length naming issue: <https://github.com/shadow-maint/shadow/commit/6551709e96b2bc6f084fdf170ad5bcc11f0038ab> <https://github.com/shadow-maint/shadow/commit/15882a5f904b3c277d73254a6953c1051db55df4> Have a lovely day! Alex > > Best regards, > -- > Christopher Bazley > Staff Software Engineer, GPU team, Central Engineering Group > ARM Ltd, 110 Fulbourn Road, Cambridge, CB1 9NJ, UK. > Web: http://www.arm.com/
On Wed, 9 Oct 2024, Alejandro Colomar wrote: > I'm not fabricating, BTW. Here's a list of off-by-one bugs in login > code, precisely due to this size-length naming issue: > <https://github.com/shadow-maint/shadow/commit/6551709e96b2bc6f084fdf170ad5bcc11f0038ab> > <https://github.com/shadow-maint/shadow/commit/15882a5f904b3c277d73254a6953c1051db55df4> Those don't look to me like they're much to do with size/length *naming* confusion. It's a conceptual confusion about whether the value needed by a particular API includes a null terminator or not, not about what you call size and what you call length. As such, a name without "length" wouldn't help, because if you say countof, there would still be the same confusion about whether the bytes you are counting are meant to include a null terminator or not. You could maybe avoid some cases of such off-by-one errors by language features that tie an array length more closely to a pointer (such as .IDENTIFIER proposals where IDENTIFIER is required to be const size_t, in cases where a pointer-to-VLA is passed, if there were appropriate constraints to require a matching pair of const size_t object and pointer to [.IDENTIFIER] VLA to be passed from caller to callee - more general versions with such strict requirements about passing matching pairs would be less likely to ensure correct sizes everywhere, and this idea about ensuring matching pairs isn't in N3188, it's an idea combining things from multiple papers). But I think naming is essentially orthogonal to any kind of language feature that might enable reliable bounded pointers.
Hi Joseph, On Wed, Oct 09, 2024 at 05:05:16PM GMT, Joseph Myers wrote: > On Wed, 9 Oct 2024, Alejandro Colomar wrote: > > > I'm not fabricating, BTW. Here's a list of off-by-one bugs in login > > code, precisely due to this size-length naming issue: > > <https://github.com/shadow-maint/shadow/commit/6551709e96b2bc6f084fdf170ad5bcc11f0038ab> > > <https://github.com/shadow-maint/shadow/commit/15882a5f904b3c277d73254a6953c1051db55df4> > > Those don't look to me like they're much to do with size/length *naming* > confusion. It's a conceptual confusion about whether the value needed by > a particular API includes a null terminator or not, not about what you > call size and what you call length. The documentation of an API starts by its prototype. void login_prompt(char *name, int len); void login_prompt(char *name, int size); The former should _not_ include a NUL terminator in the argument. The latter should. If those names are meaningless, there are more chances of being confused. The bugs were introduced in <https://github.com/shadow-maint/shadow/commit/3b7cc053872cf4ce77fd74dc037f43f34e951e83> which changed old code that was misusing the term length for referring to the size (or number of elements, pedantically), for code that used the actual size. The author of the change didn't give much meaning to the difference between size and length, and thought they were interchangeable, and so the bugs were introduced. As long as one has a clear distinction, that wouldn't have happened. > As such, a name without "length" > wouldn't help, because if you say countof, there would still be the same > confusion about whether the bytes you are counting are meant to include a > null terminator or not. There are 3 terms: - size: Size in bytes of an object (possibly an array). - length: Number of non-null characters in a string. - n: Number of elements of an array. When the array is of char, since sizeof(char)==1, size and n are interchangeable, and both obviously include the NUL terminator. If you prefer nelementsof() over countof(), I'm all-in for it. Just ask for it, and I'll send a patch using nelementsof(). countof() is a new term, so it doesn't yet have a meaning (except as given by the attribute), but it naturally fits more with number of elements. But from all of the terms that there are, length is the only one that doesn't include the NUL, so count is fine. As long as you don't use length, it should include the NUL. > > You could maybe avoid some cases of such off-by-one errors by language > features that tie an array length more closely to a pointer (such as > .IDENTIFIER proposals where IDENTIFIER is required to be const size_t, in > cases where a pointer-to-VLA is passed, if there were appropriate > constraints to require a matching pair of const size_t object and pointer > to [.IDENTIFIER] VLA to be passed from caller to callee - more general > versions with such strict requirements about passing matching pairs would > be less likely to ensure correct sizes everywhere, and this idea about > ensuring matching pairs isn't in N3188, it's an idea combining things from > multiple papers). Yeah, Martin and I have the intention of moving in that direction. countof() [or whatever the name is] will hopefully soon work on array parameters. > But I think naming is essentially orthogonal to any > kind of language feature that might enable reliable bounded pointers. I still think conceptual confusions like this one (two) start with API design and documentation, which itself starts in API naming. Have a lovely night! Alex
On Wed, 9 Oct 2024, Alejandro Colomar wrote: > The documentation of an API starts by its prototype. > > void login_prompt(char *name, int len); > void login_prompt(char *name, int size); > > The former should _not_ include a NUL terminator in the argument. > The latter should. If those names are meaningless, there are more > chances of being confused. You need actual API *documentation*, not just expecting people to guess based on a name. One of those commit messages refers to non-null-terminated utmp structures. I'd say the actual error-prone antipattern seen here is the use of such arrays (fixed width, not necessarily null-terminated) to store things that might otherwise be thought of as strings, rather than anything to do with naming.
On 2024-10-09 20:48 CEST, Alejandro Colomar <alx@kernel.org> wrote: > countof() is a new term, so it doesn't yet have a meaning (except > as given by the attribute), but it naturally fits more with number > of elements. How would you call, for example, a function that returns how many times a value is contained in a data structure (be it array, linked list, or any other)? ~ J.Ł.
Hi Joseph, On Wed, Oct 09, 2024 at 07:31:50PM GMT, Joseph Myers wrote: > On Wed, 9 Oct 2024, Alejandro Colomar wrote: > > > The documentation of an API starts by its prototype. > > > > void login_prompt(char *name, int len); > > void login_prompt(char *name, int size); > > > > The former should _not_ include a NUL terminator in the argument. > > The latter should. If those names are meaningless, there are more > > chances of being confused. > > You need actual API *documentation*, not just expecting people to guess > based on a name. Every little bit adds up. Documentation is simpler if there is naming consistency. We have SYNOPSISes in the man pages, and they're up front, because they constitute an important part of the documentation. If each manual page used a different naming convention, you'd have to carefully read each page to understand an API. And you'd have to be extra careful about every little detail. If instead there's a careful consistency across the entire Linux man-pages project (for example), including naming consistency, you can read a new page, and have a rough idea of how it works after just a few looks at the page; hopefully even just by having a look at the SYNOPSIS plus the first few lines of the DESCRIPTION. Documentation should not be surprising, but rather confirm what you already guessed by looking at the API itself. > One of those commit messages refers to non-null-terminated utmp > structures. I'd say the actual error-prone antipattern seen here is the > use of such arrays (fixed width, not necessarily null-terminated) to store > things that might otherwise be thought of as strings, rather than anything > to do with naming. Yeah, utmp(5) didn't help either. It's hard to quantify how much each problem contributed to the actual bugs, but I tend to think both factors contributed. Have a lovely night! Alex
Hi Jakub, On Wed, Oct 09, 2024 at 09:40:11PM GMT, Jakub Łukasiewicz wrote: > On 2024-10-09 20:48 CEST, Alejandro Colomar <alx@kernel.org> wrote: > > countof() is a new term, so it doesn't yet have a meaning (except as > > given by the attribute), but it naturally fits more with number of > > elements. > > How would you call, for example, a function that returns how many times a > value is contained in a data structure (be it array, linked list, or any > other)? list_count() or similar would be a good name. It's length that's dangerous to overload because (1) it's already used by strings, and (2) strings have the danger of the NUL terminator which is not counted by its length. But for example, it's not dangerous to misuse size for the number of elements of an array, because they're so obviously different that you'll not introduce a bug easily. I think it's okay to say wcslcpy() gets a size as the third parameter, even if pedantically it's a number of elements. So, using "count" for both arrays, and user-defined types such as linked lists, is just fine. The only _dangerous_ term is length. Have a lovely night! Alex > > ~ J.Ł.
On Wed, 9 Oct 2024, Alejandro Colomar wrote: > Every little bit adds up. Documentation is simpler if there is naming > consistency. We have SYNOPSISes in the man pages, and they're up front, > because they constitute an important part of the documentation. We also have a convention for future standard C interfaces to put the length before the pointer so that a VLA parameter declaration can be used that makes very clear the intent for how many elements the array has, which seems much better for that purpose than relying on the name of a parameter.
On Wed, Oct 09, 2024 at 09:11:52PM GMT, Joseph Myers wrote: > On Wed, 9 Oct 2024, Alejandro Colomar wrote: > > > Every little bit adds up. Documentation is simpler if there is naming > > consistency. We have SYNOPSISes in the man pages, and they're up front, > > because they constitute an important part of the documentation. > > We also have a convention for future standard C interfaces to put the > length before the pointer so that a VLA parameter declaration can be used > that makes very clear the intent for how many elements the array has, > which seems much better for that purpose than relying on the name of a > parameter. I doubt that this will be doable for string functions. Even newer additions to <string.h> will most likely have the size as the last element, if just for consistency with the existing APIs. And this issue is primarily a string issue, so it won't be solved. [.identifier] is more likely to help with this. Cheers, Alex > > -- > Joseph S. Myers > josmyers@redhat.com >
Hello, Could you please unCC me from this discussion? Despite I originally made this proposal, I no longer have an opinion on the subject and there is not much I can add to the discussion anyway. Thank you all very much for your efforts into improving C. Best regards, -- Xavier Del Campo Romero 9 Oct 2024, 23:20 by alx@kernel.org: > On Wed, Oct 09, 2024 at 09:11:52PM GMT, Joseph Myers wrote: > >> On Wed, 9 Oct 2024, Alejandro Colomar wrote: >> >> > Every little bit adds up. Documentation is simpler if there is naming >> > consistency. We have SYNOPSISes in the man pages, and they're up front, >> > because they constitute an important part of the documentation. >> >> We also have a convention for future standard C interfaces to put the >> length before the pointer so that a VLA parameter declaration can be used >> that makes very clear the intent for how many elements the array has, >> which seems much better for that purpose than relying on the name of a >> parameter. >> > > I doubt that this will be doable for string functions. Even newer > additions to <string.h> will most likely have the size as the last > element, if just for consistency with the existing APIs. And this issue > is primarily a string issue, so it won't be solved. > > [.identifier] is more likely to help with this. > > Cheers, > Alex > >> >> -- >> Joseph S. Myers >> josmyers@redhat.com >> > > -- > <https://www.alejandro-colomar.es/> >
Hi Joseph, On Wed, Oct 09, 2024 at 09:11:52PM GMT, Joseph Myers wrote: > On Wed, 9 Oct 2024, Alejandro Colomar wrote: > > > Every little bit adds up. Documentation is simpler if there is naming > > consistency. We have SYNOPSISes in the man pages, and they're up front, > > because they constitute an important part of the documentation. > > We also have a convention for future standard C interfaces to put the > length before the pointer so that a VLA parameter declaration can be used > that makes very clear the intent for how many elements the array has, > which seems much better for that purpose than relying on the name of a > parameter. Just as a confirmation of what I already said: none of the arguments convince me. They seem mitigations to the damage that overloading the term length can do. I stand on my proposal of either __nelementsof__(), __countof__() (with no preference), any derivative of those, or almost anything that doesn't derive from "length" (well, I also veto "dimension", "extent", and "range", for different reasons, but anything else is fair game). If you want _Lengthof, please sed(1) it yourself and sign the patch below my signature. I don't think you (or myself) can convince me of changing my mind, so it's up to you to decide what you want to do. I think it would be good to have this in GCC 15, so if you're convinced of _Lengthof(), please go ahead already. I don't think delaying this further will change the mind of any of us. Have a lovely day! Alex