[v8,0/3] c: Add __lengthof__ operator

Message ID cover.1723419712.git.alx@kernel.org

Message

Alejandro Colomar Aug. 11, 2024, 11:46 p.m. UTC
Hi!

v8:

-  Reformat (simplify) change-log entries.
-  Improve wording of documentation.
-  Add link to LLVM issue in commit message.

I've opened a GitHub issue in the LLVM project to report the existence
of this patch set:
<https://github.com/llvm/llvm-project/issues/102836>

Have a lovely night!
Alex

Alejandro Colomar (3):
  gcc/: Rename array_type_nelts() => array_type_nelts_minus_one()
  Merge definitions of array_type_nelts_top()
  c: Add __lengthof__ operator

 gcc/c-family/c-common.cc                |  26 ++++
 gcc/c-family/c-common.def               |   3 +
 gcc/c-family/c-common.h                 |   2 +
 gcc/c/c-decl.cc                         |  30 +++--
 gcc/c/c-fold.cc                         |   7 +-
 gcc/c/c-parser.cc                       |  61 +++++++---
 gcc/c/c-tree.h                          |   4 +
 gcc/c/c-typeck.cc                       | 118 ++++++++++++++++++-
 gcc/config/aarch64/aarch64.cc           |   2 +-
 gcc/config/i386/i386.cc                 |   2 +-
 gcc/cp/cp-tree.h                        |   1 -
 gcc/cp/decl.cc                          |   2 +-
 gcc/cp/init.cc                          |   8 +-
 gcc/cp/lambda.cc                        |   3 +-
 gcc/cp/operators.def                    |   1 +
 gcc/cp/tree.cc                          |  13 --
 gcc/doc/extend.texi                     |  31 +++++
 gcc/expr.cc                             |   8 +-
 gcc/fortran/trans-array.cc              |   2 +-
 gcc/fortran/trans-openmp.cc             |   4 +-
 gcc/rust/backend/rust-tree.cc           |  13 --
 gcc/rust/backend/rust-tree.h            |   2 -
 gcc/target.h                            |   3 +
 gcc/testsuite/gcc.dg/lengthof-compile.c | 115 ++++++++++++++++++
 gcc/testsuite/gcc.dg/lengthof-vla.c     |  46 ++++++++
 gcc/testsuite/gcc.dg/lengthof.c         | 150 ++++++++++++++++++++++++
 gcc/tree.cc                             |  17 ++-
 gcc/tree.h                              |   3 +-
 28 files changed, 598 insertions(+), 79 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/lengthof-compile.c
 create mode 100644 gcc/testsuite/gcc.dg/lengthof-vla.c
 create mode 100644 gcc/testsuite/gcc.dg/lengthof.c

Range-diff against v7:
1:  8b68e250503 ! 1:  a6aa38c9013 gcc/: Rename array_type_nelts() => array_type_nelts_minus_one()
    @@ Commit message
     
         gcc/ChangeLog:
     
    -            * tree.cc (array_type_nelts): Rename function ...
    -            (array_type_nelts_minus_one): ... to this name.  The old name
    -            was misleading.
    -            * tree.h (array_type_nelts): Rename function ...
    -            (array_type_nelts_minus_one): ... to this name.  The old name
    -            was misleading.
    +            * tree.cc (array_type_nelts, array_type_nelts_minus_one):
    +            * tree.h (array_type_nelts, array_type_nelts_minus_one):
                 * expr.cc (count_type_elements):
    -            Rename array_type_nelts() => array_type_nelts_minus_one()
                 * config/aarch64/aarch64.cc
    -            (pure_scalable_type_info::analyze_array): Likewise.
    -            * config/i386/i386.cc (ix86_canonical_va_list_type): Likewise.
    +            (pure_scalable_type_info::analyze_array):
    +            * config/i386/i386.cc (ix86_canonical_va_list_type):
    +            Rename array_type_nelts() => array_type_nelts_minus_one()
    +            The old name was misleading.
     
         gcc/c/ChangeLog:
     
                 * c-decl.cc (one_element_array_type_p, get_parm_array_spec):
    +            * c-fold.cc (c_fold_array_ref):
                 Rename array_type_nelts() => array_type_nelts_minus_one()
    -            * c-fold.cc (c_fold_array_ref): Likewise.
     
         gcc/cp/ChangeLog:
     
                 * decl.cc (reshape_init_array):
    +            * init.cc
    +            (build_zero_init_1):
    +            (build_value_init_noctor):
    +            (build_vec_init):
    +            (build_delete):
    +            * lambda.cc (add_capture):
    +            * tree.cc (array_type_nelts_top):
                 Rename array_type_nelts() => array_type_nelts_minus_one()
    -            * init.cc (build_zero_init_1): Likewise.
    -            (build_value_init_noctor): Likewise.
    -            (build_vec_init): Likewise.
    -            (build_delete): Likewise.
    -            * lambda.cc (add_capture): Likewise.
    -            * tree.cc (array_type_nelts_top): Likewise.
     
         gcc/fortran/ChangeLog:
     
                 * trans-array.cc (structure_alloc_comps):
    +            * trans-openmp.cc
    +            (gfc_walk_alloc_comps):
    +            (gfc_omp_clause_linear_ctor):
                 Rename array_type_nelts() => array_type_nelts_minus_one()
    -            * trans-openmp.cc (gfc_walk_alloc_comps): Likewise.
    -            (gfc_omp_clause_linear_ctor): Likewise.
     
         gcc/rust/ChangeLog:
     
2:  21433097103 ! 2:  43300a17e4a Merge definitions of array_type_nelts_top()
    @@ Commit message
         gcc/ChangeLog:
     
                 * tree.h (array_type_nelts_top):
    -            * tree.cc (array_type_nelts_top): Define function (moved from
    -            gcc/cp/).
    +            * tree.cc (array_type_nelts_top):
    +            Define function (moved from gcc/cp/).
     
         gcc/cp/ChangeLog:
     
                 * cp-tree.h (array_type_nelts_top):
    -            * tree.cc (array_type_nelts_top): Remove function (move
    -            to gcc/).
    +            * tree.cc (array_type_nelts_top):
    +            Remove function (move to gcc/).
     
         gcc/rust/ChangeLog:
     
                 * backend/rust-tree.h (array_type_nelts_top):
    -            * backend/rust-tree.cc (array_type_nelts_top): Remove function.
    +            * backend/rust-tree.cc (array_type_nelts_top):
    +            Remove function.
     
         Signed-off-by: Alejandro Colomar <alx@kernel.org>
     
3:  4bd3837d09c ! 3:  e6af87d54af c: Add __lengthof__ operator
    @@ Commit message
     
         Link: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2529.pdf
         Link: https://inbox.sourceware.org/gcc/M8S4oQy--3-2@tutanota.com/T/
    +    Link: https://github.com/llvm/llvm-project/issues/102836
         Suggested-by: Xavier Del Campo Romero <xavi.dcr@tutanota.com>
         Co-developed-by: Martin Uecker <uecker@tugraz.at>
         Signed-off-by: Alejandro Colomar <alx@kernel.org>
    @@ gcc/doc/extend.texi: If the operand of the @code{__alignof__} expression is a fu
     +The keyword @code{__lengthof__} determines the length of an array operand,
     +that is, the number of elements in the array.
     +Its syntax is similar to @code{sizeof}.
    -+The operand must be a complete array type or an expression of that type.
    ++The operand must be
    ++a parenthesized complete array type name
    ++or an expression of such a type.
     +For example:
     +
     +@smallexample
    @@ gcc/doc/extend.texi: If the operand of the @code{__alignof__} expression is a fu
     +__lengthof__ (int [7][3]);  // returns 7
     +@end smallexample
     +
    -+The operand is not evaluated
    -+if the top-level length designator is an integer constant expression
    -+(in this case, the operator results in an integer constant expression);
    -+and it is evaluated
    -+if the top-level length designator is not an integer constant expression
    -+(in this case, the operator results in a run-time value).
    ++The result of this operator is an integer constant expression,
    ++unless the top-level array is a variable-length array.
    ++The operand is only evaluated
    ++if the top-level array is a variable-length array.
     +For example:
     +
     +@smallexample

Comments

Alejandro Colomar Aug. 12, 2024, 11:34 p.m. UTC | #1
Hi David,

I want to send an updated version of n2529.  The original author didn't
respond to my mail, so I'll take over.  I've been preparing a GCC patch
set for adding the feature to GCC, and have informed Clang developers
about it too.

The title would be

    _Lengthof - New pointer-proof keyword to determine array length (v2)

Can you please assign me a number for it?  Thanks.

Cheers,
Alex
Alejandro Colomar Aug. 13, 2024, 7:33 a.m. UTC | #2
On Tue, Aug 13, 2024 at 01:34:58AM GMT, Alejandro Colomar wrote:
> Hi David,

I obviously meant Daniel.  :-)

> 
> I want to send an updated version of n2529.  The original author didn't
> respond to my mail, so I'll take over.  I've been preparing a GCC patch
> set for adding the feature to GCC, and have informed Clang developers
> about it too.
> 
> The title would be
> 
>     _Lengthof - New pointer-proof keyword to determine array length (v2)
> 
> Can you please assign me a number for it?  Thanks.
> 
> Cheers,
> Alex
> 
> 
> -- 
> <https://www.alejandro-colomar.es/>
Alejandro Colomar Aug. 13, 2024, 3:02 p.m. UTC | #3
Hi,

On Tue, Aug 13, 2024 at 01:34:58AM GMT, Alejandro Colomar wrote:
> I want to send an updated version of n2529.  The original author didn't
> respond to my mail, so I'll take over.  I've been preparing a GCC patch
> set for adding the feature to GCC, and have informed Clang developers
> about it too.
> 
> The title would be
> 
>     _Lengthof - New pointer-proof keyword to determine array length (v2)
> 
> Can you please assign me a number for it?  Thanks.

Attached is a draft for a paper (both the man(7) source and the
generated PDF).

I have only added lengthof for now, not _Lengthof, as suggested by Jens.
Depending on feedback, I'll propose the uglified version.

Cheers,
Alex
Xavier Del Campo Romero Aug. 13, 2024, 10:38 p.m. UTC | #4
I have been overseeing these last emails - thank you very much for your
efforts, Alex! I did not reply until now because I do not have prior
experience with gcc internals, so my feedback would probably have not
been that useful.

Those emails from 2020 were in fact discussing two completely different
proposals at once:

1. Add _Lengthof + #include <stdlengthof.h>
2. Allow static qualifier on compound literals

Whereas proposal #2 made it into C23 (kudos to Jens Gustedt!), and as
you already know by now, proposal #1 received some negative feedback,
suggesting _Typeof/typeof + some macro magic as a pragmatic workaround
instead.

Since the proposal did not get much traction and I would have been
unable to contribute to gcc myself, I just gave up on it. IIRC the
deadline for new proposals closed soon after, anyway.

But I am glad that someone with proper experience took the initiative.
I still think the proposal is relevant and has interesting use cases.

> I have only added lengthof for now, not _Lengthof, as suggested by Jens.
> Depending on feedback, I'll propose the uglified version.

Probably, all of us know why the uglified version is the usual approach
preferred by the C standard: we do not know how many applications would
break otherwise.

However, we see that this trend is now changing with C23, so probably
it makes sense to define lengthof directly.

As for the parentheses, I personally think lengthof should follow
similar rules compared to sizeof.

Best regards,

--
Xavier Del Campo Romero



Alejandro Colomar Aug. 13, 2024, 11:27 p.m. UTC | #5
Hi Xavier,

On Wed, Aug 14, 2024 at 12:38:53AM GMT, Xavier Del Campo Romero wrote:
> I have been overseeing these last emails -

Ahhh, good to know; thanks!  :)

> thank you very much for your
> efforts, Alex!

:-)

> I did not reply until now because I do not have prior
> experience with gcc internals, so my feedback would probably have not
> been that useful.

Ok.

> Those emails from 2020 were in fact discussing two completely different
> proposals at once:
> 
> 1. Add _Lengthof + #include <stdlengthof.h>
> 2. Allow static qualifier on compound literals

Yup.

> Whereas proposal #2 made it into C23 (kudos to Jens Gustedt!), and as
> you already know by now, proposal #1 received some negative feedback,
> suggesting _Typeof/typeof + some macro magic as a pragmatic workaround
> instead.

The original author of that negative feedback talked to me in private
a week ago, and said he likes my proposal.  We have no negative feedback
anymore.  :)

> Since the proposal did not get much traction and I would had been
> unable to contribute to gcc myself, I just gave up on it. IIRC the
> deadline for new proposals closed soon after, anyway.

Ok.

> But I am glad that someone with proper experience took the initiative.

Fun fact: this is my second non-trivial patch to GCC.  I wouldn't say I
had the proper experience with GCC internals when I started this patch
set.  But I'm unemployed at the moment, which gives me all the time I
need for learning those.  :)

> I still think the proposal is relevant and has interesting use cases.
> 
> > I have only added lengthof for now, not _Lengthof, as suggested by Jens.
> > Depending on feedback, I'll propose the uglified version.
> 
> Probably, all of us know why the uglified version is the usual approach
> preferred by the C standard: we do not know how many applications would
> break otherwise.

Yup.

> However, we see that this trend is now changing with C23, so probably
> it makes sense to define lengthof directly.

Yeah, since Jens is in WG14 and he suggested to follow this trend, maybe
we can.  If not, it's trivial to change the proposal to use the uglified
name plus a macro.

Checking <https://codesearch.debian.net>, I see that while several
projects have a lengthof() macro, all of them use it with semantics
compatible with this keyword, so it shouldn't break too much.  Maybe
those projects will start receiving diagnostics that they're redefining
a standard keyword, but that's not too bad.

> As for the parentheses, I personally think lengthof should follow
> similar rules compared to sizeof.

I think most people agree with this.

> 
> Best regards,

Have a lovely night!
Alex
Jₑₙₛ Gustedt Aug. 14, 2024, 5:58 a.m. UTC | #6
Hi, 

Am 14. August 2024 00:38:53 MESZ schrieb Xavier Del Campo Romero <xavi.dcr@tutanota.com>:
> I have been overseeing these last emails - thank you very much for your
> efforts, Alex! I did not reply until now because I do not have prior
> experience with gcc internals, so my feedback would probably have not
> been that useful.
> 
> Those emails from 2020 were in fact discussing two completely different
> proposals at once:
> 
> 1. Add _Lengthof + #include <stdlengthof.h>
> 2. Allow static qualifier on compound literals
> 
> Whereas proposal #2 made it into C23 (kudos to Jens Gustedt!), 

this was together with Alex

> and as
> you already know by now, proposal #1 received some negative feedback,
> suggesting _Typeof/typeof + some macro magic as a pragmatic workaround
> instead.
> 
> Since the proposal did not get much traction and I would had been
> unable to contribute to gcc myself, I just gave up on it. IIRC the
> deadline for new proposals closed soon after, anyway.
> 
> But I am glad that someone with proper experience took the initiative.
> I still think the proposal is relevant and has interesting use cases.
> 
> > I have only added lengthof for now, not _Lengthof, as suggested by Jens.
> > Depending on feedback, I'll propose the uglified version.
> 
> Probably, all of us know why the uglified version is the usual approach
> preferred by the C standard: we do not know how many applications would
> break otherwise.
> 
> However, we see that this trend is now changing with C23, so probably
> it makes sense to define lengthof directly.

When I suggested that the double-underscore version is sufficient, I was
not thinking that there would be a paper to WG14 so quickly.  For
integration into gcc and clang the double underscore is certainly
enough.  For standardization, that is another question.



Jens
Jₑₙₛ Gustedt Aug. 14, 2024, 6:11 a.m. UTC | #7
Am 14. August 2024 01:27:33 MESZ schrieb Alejandro Colomar <alx@kernel.org>:
> Hi Xavier,
> 
> On Wed, Aug 14, 2024 at 12:38:53AM GMT, Xavier Del Campo Romero wrote:
> > I have been overseeing these last emails -
> 
> Ahhh, good to know; thanks!  :)
> 
> > thank you very much for your
> > efforts, Alex!
> 
> :-)
> 
> > I did not reply until now because I do not have prior
> > experience with gcc internals, so my feedback would probably have not
> > been that useful.
> 
> Ok.
> 
> > Those emails from 2020 were in fact discussing two completely different
> > proposals at once:
> > 
> > 1. Add _Lengthof + #include <stdlengthof.h>
> > 2. Allow static qualifier on compound literals
> 
> Yup.
> 
> > Whereas proposal #2 made it into C23 (kudos to Jens Gustedt!), and as
> > you already know by now, proposal #1 received some negative feedback,
> > suggesting _Typeof/typeof + some macro magic as a pragmatic workaround
> > instead.
> 
> The original author of that negative feedback talked to me in private
> a week ago, and said he likes my proposal.  We have no negative feedback
> anymore.  :)
> 
> > Since the proposal did not get much traction and I would had been
> > unable to contribute to gcc myself, I just gave up on it. IIRC the
> > deadline for new proposals closed soon after, anyway.
> 
> Ok.
> 
> > But I am glad that someone with proper experience took the initiative.
> 
> Fun fact: this is my second non-trivial patch to GCC.  I wouldn't say I
> had the proper experience with GCC internals when I started this patch
> set.  But I'm unemployed at the moment, which gives me all the time I
> need for learning those.  :)
> 
> > I still think the proposal is relevant and has interesting use cases.
> > 
> > > I have only added lengthof for now, not _Lengthof, as suggested by Jens.
> > > Depending on feedback, I'll propose the uglified version.
> > 
> > Probably, all of us know why the uglified version is the usual approach
> > preferred by the C standard: we do not know how many applications would
> > break otherwise.
> 
> Yup.
> 
> > However, we see that this trend is now changing with C23, so probably
> > it makes sense to define lengthof directly.
> 
> Yeah, since Jens is in WG14 and he suggested to follow this trend, maybe
> we can.  If not, it's trivial to change the proposal to use the uglified
> name plus a macro.
> 
> Checking <https://codesearch.debian.net>, I see that while several
> projects have a lengthof() macro, all of them use it with semantics
> compatible with this keyword, so it shouldn't break too much.  Maybe
> those projects will start receiving diagnostics that they're redefining
> a standard keyword, but that's not too bad.

For a WG14 paper you should add these findings to support that choice.
Another option would be for WG14 to standardize the then existing implementation with the double underscores.

> > As for the parentheses, I personally think lengthof should follow
> > similar rules compared to sizeof.
> 
> I think most people agree with this.

I still don't, in particular not for standardisation.

We have to remember that there are many small C compilers out there. 
I would not want unnecessary burden on them. So my preferred choice would be
a standardisation as a macro, similar to offsetof.
gcc (and clang) could then just map that to their builtin, other compilers could use
whatever they have at the moment, even just the macros that you have in the paper as a starting point. 

The rest would be "quality of implementation"

What time horizon do you see to add the feature for array parameters?

Thanks
Jens


> > Best regards,
> 
> Have a lovely night!
> Alex
>
Alejandro Colomar Aug. 14, 2024, 8:41 a.m. UTC | #8
Hi Jens, Martin,

On Wed, Aug 14, 2024 at 08:11:20AM GMT, Jens Gustedt wrote:
> > Checking <https://codesearch.debian.net>, I see that while several
> > projects have a lengthof() macro, all of them use it with semantics
> > compatible with this keyword, so it shouldn't break too much.  Maybe
> > those projects will start receiving diagnostics that they're redefining
> > a standard keyword, but that's not too bad.
> 
> For a WG14 paper you should add these findings to support that choice.
> Another option would be for WG14 to standardize the then existing implementation with the double underscores.

Makes sense; I'll add that into new "Prior art" and "Backwards
compatibility" sections within the paper.

> > > As for the parentheses, I personally think lengthof should follow
> > > similar rules compared to sizeof.
> > 
> > I think most people agree with this.
> 
> I still don't, in particular not for standardisation.
> 
> We have to remember that there are many small C compilers out there. 
> I would not want unnecessary burden on them. So my preferred choice would be
> a standardisation as a macro, similar to offsetof.
> gcc (and clang) could then just map that to their builtin, other compilers could use
> whatever they have at the moment, even just the macros that you have in the paper as a starting point. 
> 
> The rest would be "quality of implementation"

Hmmm, sounds reasonable.

Some doubts:

If we allow a compiler to implement it as a predefined macro that
expands to the usual sizeof division, it might produce double evaluation
in some VLA cases.  That would be surprising to some programs, which may
expect either 0 or 1 evaluations, but not 2.  Maybe we can leave it as
unspecified behavior, and an implementation may document that double
evaluation may happen if the input is a VLA?
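
A minimal sketch of that concern, in C.  LENGTHOF_DIV(), counted() and
demo_outer_length() are hypothetical names made up for illustration;
they are not part of the patch set or of the paper.  Both sizeof
operands below have variable-length array type, so a compiler that
follows the sizeof rules for VLAs should evaluate the macro argument
once per sizeof:

```c
#include <stddef.h>

/* Hypothetical library-style macro: the usual sizeof division. */
#define LENGTHOF_DIV(a)  (sizeof(a) / sizeof((a)[0]))

/* Incremented on every call, so operand evaluations become visible. */
static int calls;

static void *
counted(void *p)
{
	calls++;
	return p;
}

/*
 * View a flat buffer as int[n][m] with run-time n and m, and ask for
 * the length of the outer dimension.  The macro argument appears in
 * sizeof(a), whose operand has type int[n][m] (a VLA), and again in
 * sizeof((a)[0]), whose operand has type int[m] (also a VLA).
 */
static size_t
demo_outer_length(void)
{
	int storage[3 * 2] = {0};
	size_t n = 3, m = 2;

	calls = 0;
	return LENGTHOF_DIV(*(int (*)[n][m]) counted(storage));
}
```

After demo_outer_length() returns, the calls counter shows how often
the operand was evaluated.  With __lengthof__ as documented in this
series, the operand would be evaluated at most once.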

> What time horizon do you see to add the feature for array parameters?

Martin, what do you think?  I think the only blocking thing for me is
what you mentioned about turning function parameters into arrays that
decay almost everywhere.  Once that's set up, my code will probably work
with them without modification, or maybe with just a little tweak.  Do
you have an idea of how much time that can take you?

I expect it to be well before C2y.  Maybe a year or two?

Have a lovely day!
Alex

> Thanks
> Jens
Ballman, Aaron Aug. 14, 2024, 11:31 a.m. UTC | #9
Sorry for top-posting, my work account is stuck on Outlook. :-/

> For a WG14 paper you should add these findings to support that choice.
> Another option would be for WG14 to standardize the then existing implementation with the double underscores.

+1, it's always good to explain prior art and existing uses as part of the paper. However, please also point out that C++ has prior art as well, which is slightly different and very much worth considering: it has one API for getting the array's rank, and another for getting a specific rank's extent. This is a general solution that doesn't require the programmer to have deep knowledge of C's declarator syntax and how it relates to multidimensional arrays.

That said, I suspect WG14 would not be keen on standardizing `lengthof` without an ugly keyword given that there are plenty of other uses of it that would break: 

https://sourcegraph.com/github.com/illumos/illumos-gate/-/blob/usr/src/cmd/mailx/names.c?L53-55
https://sourcegraph.com/github.com/Rockbox/rockbox/-/blob/tools/ipod_fw.c?L292-294
https://sourcegraph.com/github.com/OpenSmalltalk/opensmalltalk-vm/-/blob/src/spur64.stack/validImage.c?L7014-7018
(and many, many others)

>> > As for the parentheses, I personally think lengthof should follow 
>> > similar rules compared to sizeof.
>> 
>> I think most people agree with this.
>
> I still don't, in particular not for standardisation.
> 
> We have to remember that there are many small C compilers out there.

Those compilers already have to handle parsing this for sizeof, so that's not particularly compelling (even if we wanted to design C for the lowest common denominator of implementation effort, which I'm not convinced is a good approach these days). That said, if we went with a rank/extent design, I think we'd *have* to use parens because the extent interface would take two operands (the array and the rank you're interested in getting the extent of) and it would be inconsistent for the rank interface to then not require parens.
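
For comparison, a rough sketch in C of what "the extent of a given rank" means for a two-dimensional array. EXTENT0() and EXTENT1() are hypothetical macros invented for this illustration only; they are not a proposed interface and not the C++ `std::rank`/`std::extent` traits, and they accept pointers silently just like any sizeof division does:

```c
#include <stddef.h>

/* Hypothetical helpers: extent of the outermost dimension ... */
#define EXTENT0(a)  (sizeof(a) / sizeof((a)[0]))
/* ... and of the next dimension, obtained by stepping into one row. */
#define EXTENT1(a)  EXTENT0((a)[0])

/* Same shape as the int [7][3] example in the documentation patch. */
static int grid[7][3];

static size_t
grid_rows(void)
{
	return EXTENT0(grid);
}

static size_t
grid_cols(void)
{
	return EXTENT1(grid);
}
```

A single-operand operator only ever answers the EXTENT0() question; reaching an inner dimension requires the caller to index into the array first, which is the declarator knowledge mentioned above.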

~Aaron

Jₑₙₛ Gustedt Aug. 14, 2024, 12:17 p.m. UTC | #10
Hi Aaron,

Am 14. August 2024 13:31:19 MESZ schrieb "Ballman, Aaron" <aaron.ballman@intel.com>:
> Sorry for top-posting, my work account is stuck on Outlook. :-/
> 
> > For a WG14 paper you should add these findings to support that choice.
> > Another option would be for WG14 to standardize the then existing implementation with the double underscores.
> 
> +1, it's always good to explain prior art and existing uses as part of the paper. However, please also point out that C++ has prior art as well, which is slightly different and very much worth considering: it has one API for getting the array's rank, and another for getting a specific rank's extent. This is a general solution that doesn't require the programmer to have deep knowledge of C's declarator syntax and how it relates to multidimensional arrays.
> 
> That said, I suspect WG14 would not be keen on standardizing `lengthof` without an ugly keyword given that there are plenty of other uses of it that would break: 
> 
> https://sourcegraph.com/github.com/illumos/illumos-gate/-/blob/usr/src/cmd/mailx/names.c?L53-55
> https://sourcegraph.com/github.com/Rockbox/rockbox/-/blob/tools/ipod_fw.c?L292-294
> https://sourcegraph.com/github.com/OpenSmalltalk/opensmalltalk-vm/-/blob/src/spur64.stack/validImage.c?L7014-7018
> (and many, many others)
> 
> >> > As for the parentheses, I personally think lengthof should follow 
> >> > similar rules compared to sizeof.
> >> 
> >> I think most people agree with this.
> >
> > I still don't, in particular not for standardisation.
> > 
> > We have to remember that there are many small C compilers out there.
> 
> Those compilers already have to handle parsing this for sizeof, so that's not particularly compelling (even if we wanted to design C for the lowest common denominator of implementation effort, which I'm not convinced is a good approach these days). That said, if we went with a rank/extent design, I think we'd *have* to use parens because the extent interface would take two operands (the array and the rank you're interested in getting the extent of) and it would be inconsistent for the rank interface to then not require parens.

I think this argument falls short. E.g., implementations that already have
compound expressions (or lambdas ;-) may provide a quality implementation using `static_assert` and `typeof` alone, and don't have to touch their compiler at all.

We should not impose an implementation in the language where doing it in a header can be completely sufficient.

Plus, implementing it as a macro in a header (probably <stddef.h>) also provides a feature test for those applications that already have something similar.
This was basically what we did for `unreachable`, and I think it worked out fine.

Jens

> ~Aaron
Ballman, Aaron Aug. 14, 2024, 12:40 p.m. UTC | #11
> I think that this argument goes too short. E. g. implementation that already have compound expressions (or lambdas ;-) may provide a quality implementation using `static_assert` and `typeof` alone, and don't have to touch their compiler at all.
>
> We should not impose an implementation in the language where doing it in a header can be completely sufficient.

But can doing this in a header be completely sufficient in practice? e.g., the user who passes a pointer rather than an array is in for quite a surprise, or passing a struct, or passing a FAM, etc. If we want to put constraints on the interface, that may be more challenging to do from a header file than from the compiler. offsetof is a cautionary tale in that compilers that want a reasonable QoI basically all implement this as a builtin rather than the header-only version.

> Plus, implementing as a macro in a header (probably <stddef.h>) makes also a feature test, for those applications that already have something similar. 
> this was basically what we did for `unreachable` and I think it worked out fine.

True!

I'm still thinking on how important rank + extent is vs overall array length. If C had constexpr functions, then I'd almost certainly want array rank and extent to be the building blocks and then lengthof can be a constexpr function looping over rank and summing extents. But we don't have that yet, and "bird hand" vs "bird in bush"... :-D

~Aaron

Alejandro Colomar Aug. 14, 2024, 12:58 p.m. UTC | #12
Hi Aaron, Jens,

On Wed, Aug 14, 2024 at 02:17:52PM GMT, Jens Gustedt wrote:
> On 14 August 2024 13:31:19 CEST, "Ballman, Aaron" <aaron.ballman@intel.com> wrote:
> > Sorry for top-posting, my work account is stuck on Outlook. :-/
> > 
> > > For a WG14 paper you should add these findings to support that choice.
> > > Another option would be for WG14 to standardize the then existing implementation with the double underscores.
> > 
> > +1, it's always good to explain prior art and existing uses as part
> > of the paper. However, please also point out that C++ has a prior
> > art as well which is slightly different and very much worth
> > considering: they have one API for getting the array's rank,
> > and another for getting a specific rank's extent. This is a general
> > solution that doesn't require the programmer to have deep knowledge
> > of C's declarator syntax and how it relates to multidimensional
> > arrays.

I have added that to my draft.  I'll publish it soon as a reply to the
GCC mailing list.  See below for details of what I have added for now.

> > 
> > That said, I suspect WG14 would not be keen on standardizing
> > `lengthof` without an ugly keyword given that there are plenty of other uses of it that would break: 
> > 
> > https://sourcegraph.com/github.com/illumos/illumos-gate/-/blob/usr/src/cmd/mailx/names.c?L53-55
> > https://sourcegraph.com/github.com/Rockbox/rockbox/-/blob/tools/ipod_fw.c?L292-294
> > https://sourcegraph.com/github.com/OpenSmalltalk/opensmalltalk-vm/-/blob/src/spur64.stack/validImage.c?L7014-7018
> > (and many, many others)

What regex did you use for searching?

I was thinking of renaming the proposal to elementsof(), to avoid
confusion between length of an array and length of a string.  Would you
mind checking if elementsof() is ok?

> > >> > As for the parentheses, I personally think lengthof should follow 
> > >> > similar rules compared to sizeof.
> > >> 
> > >> I think most people agree with this.
> > >
> > > I still don't, in particular not for standardisation.
> > > 
> > > We have to remember that there are many small C compilers out there.
> > 
> > Those compilers already have to handle parsing this for sizeof, so
> > that's not particularly compelling

Agree.  I suspect it will be simpler for existing compilers to follow
sizeof than to have new syntax.  However, it's easy to keep it as a QoI
detail, so I've temporarily changed the wording to require parentheses,
and let implementations lift that restriction.

> > (even if we wanted to design C
> > for the lowest common denominator of implementation effort, which
> > I'm not convinced is a good approach these days).

Off-topic, but I wish that had been the approach when a few
implementations (I suspect proprietary vendors; this was never
disclosed) rejected redefining NULL as the right thing: (void *) 0.

I fixed one of the last free-software implementations of NULL that
expanded to 0, and nullptr would probably never have been added if WG14
had not accepted the pressure from such horrible implementations.

<https://github.com/cc65/cc65/issues/1823>

> > That said, if we went with a rank/extent design, I think we'd *have*
> > to use parens because the extent interface would take two operands
> > (the array and the rank you're interested in getting the extent of)
> > and it would be inconsistent for the rank interface to then not
> > require parens.

   Prior art
     C
            It is common in C programs to get the number of elements of
            an array via the usual sizeof division and  wrap  it  in  a
            macro.  Common names include:

            •  ARRAY_SIZE()
            •  NELEM()
            •  NELEMS()
            •  NITEMS()
            •  NELTS()
            •  elementsof()
            •  lengthof()

     C++
            In  C++,  there  are several standard features to determine
            the number of elements of an array:

            std::size()   (since C++17)
            std::ssize()  (since C++20)
                   The syntax of these is  identical  to  the  usual  C
                   macros named above.

                   It’s  a  bit different, since it’s a general purpose
                   sizing template, which works on non‐array types too,
                   with different semantics.

                   But when applied to an array, it has the same seman‐
                   tics as the macros above.

            std::extent  (since C++11)
                   The syntax of this is quite different.   It  uses  a
                   numeric index as a second parameter to determine the
                   dimension  in which the number of elements should be
                   counted.

                   C arrays are much simpler than C++’s many array‐like
                   types, and I don’t see a reason why  we  would  need
                   something  as  complex  as  std::extent  in C.  Cer‐
                   tainly, existing projects have not developed such  a
                   macro, even if it is technically possible:

                       #define DEREFERENCE(a, n) DEREFERENCE_ ## n (a)
                       #define DEREFERENCE_9(a)  (*********(a))
                       #define DEREFERENCE_8(a)  (********(a))
                       #define DEREFERENCE_7(a)  (*******(a))
                       #define DEREFERENCE_6(a)  (******(a))
                       #define DEREFERENCE_5(a)  (*****(a))
                       #define DEREFERENCE_4(a)  (****(a))
                       #define DEREFERENCE_3(a)  (***(a))
                       #define DEREFERENCE_2(a)  (**(a))
                       #define DEREFERENCE_1(a)  (*(a))
                       #define DEREFERENCE_0(a)  ((a))
                       #define extent(a, n)      nitems(DEREFERENCE(a, n))

                   If any project needs that syntax, they can implement
                   their  own  trivial  wrapper  macro, as demonstrated
                   above.

            Existing prior art in C seems to favour a design that  fol‐
            lows the syntax of other operators like sizeof.
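
The wrapper above can be exercised with a small sketch.  This is only an
illustration under assumptions: nitems() here is the plain sizeof
division (without the safety checks discussed elsewhere in the thread),
only levels 0 to 3 are defined, and the names grid and extent_dim0() to
extent_dim2() are hypothetical helpers that just make the results
observable.

```c
#include <stddef.h>

/* Plain sizeof division; stands in for the nitems() macro of the draft. */
#define nitems(a)         (sizeof(a) / sizeof((a)[0]))

/* Dereference `a` n times; n must be a literal digit (0 to 3 here). */
#define DEREFERENCE(a, n) DEREFERENCE_ ## n (a)
#define DEREFERENCE_3(a)  (***(a))
#define DEREFERENCE_2(a)  (**(a))
#define DEREFERENCE_1(a)  (*(a))
#define DEREFERENCE_0(a)  ((a))
#define extent(a, n)      nitems(DEREFERENCE(a, n))

/* Hypothetical example array with three dimensions. */
int grid[3][5][7];

size_t extent_dim0(void) { return extent(grid, 0); }  /* nitems of int[3][5][7] */
size_t extent_dim1(void) { return extent(grid, 1); }  /* nitems of int[5][7] */
size_t extent_dim2(void) { return extent(grid, 2); }  /* nitems of int[7] */
```

With these definitions, extent(grid, 1) expands to nitems((*(grid))),
that is, the element count of the first sub-array.  The dimension index
has to be a literal because it is pasted into the helper macro name.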

> I think that this argument goes too short. E. g. implementation that
> already have compound expressions (or lambdas ;-) may provide a
> quality implementation using `static_assert` and `typeof` alone, and
> don't have to touch their compiler at all.
> 
> We should not impose an implementation in the language where doing it
> in a header can be completely sufficient.

I have concerns about a libc (or a predefined macro) implementation:
the sizeof division causes double evaluation with any VLAs, while my
implementation for GCC has fewer cases of evaluation, and when it needs
to evaluate, it only does it once.  It would be hard to find a good
wording that would allow an implementation to implement this as a macro.

   constexpr
     The  usual  sizeof division evaluates the operand and results in a
     run‐time value in cases where it wouldn’t be  necessary.   If  the
     top‐level  array  number  of  elements is determined by an integer
     constant expression, but an internal array is a VLA,  sizeof  must
     evaluate:

            int  a[7][n];
            int  (*p)[7][n];

            p = &a;
            nitems(*p++);

     With an elementsof operator, this would result in an integer
     constant expression of value 7.

   Double evaluation
     With the sizeof‐based implementation from above, the example  from
     above causes double evaluation of *p++.
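
A minimal sketch of how the double evaluation can be observed.  Writing
*p++ twice inside one expression would itself be undefined (two
unsequenced modifications of p), so this sketch counts calls to a
function instead.  The names probe(), evaluations and
count_evaluations() are hypothetical, and nitems() is assumed to be the
plain sizeof division.

```c
#include <stddef.h>

/* Plain sizeof division, as in the sizeof-based nitems(). */
#define nitems(a)  (sizeof(a) / sizeof((a)[0]))

/* Hypothetical probe: counts how often the macro operand is evaluated. */
static int evaluations;

static void *probe(void *p)
{
    evaluations++;
    return p;
}

/* Returns how many times nitems() evaluated its operand and stores the
   element count in *count.  n must be positive. */
int count_evaluations(int n, size_t *count)
{
    int a[7][n];  /* top-level length is the constant 7; inner array is a VLA */

    evaluations = 0;
    /* Both sizeof operands have VLA type, so both are evaluated. */
    *count = nitems(*(int (*)[7][n]) probe(a));
    return evaluations;
}
```

Both sizeof operands have a variable length array type (int[7][n] and
int[n]), so the standard requires each to be evaluated, and the probe
should be called twice even though the result is always 7.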

> Plus, implementing as a macro in a header (probably <stddef.h>) makes
> also a feature test, for those applications that already have
> something similar. 

This is interesting.  But I think an implementation could just

	#define lengthof lengthof

to provide a feature-test macro.
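
On the application side, such a feature test could be used roughly like
this.  It is only a sketch: it assumes that a future <stddef.h> defines
lengthof as a macro (either the real implementation or the self-define
above), which no released standard does today, and the names array and
names_count() are hypothetical.

```c
#include <stddef.h>

/* Assumption: a future <stddef.h> defines `lengthof` as a macro, either as
   the real implementation or as `#define lengthof lengthof` on top of a
   keyword. */
#ifndef lengthof
/* Fallback for older toolchains: plain sizeof division, with no
   protection against being handed a pointer. */
# define lengthof(a)  (sizeof(a) / sizeof((a)[0]))
#endif

/* Hypothetical example data. */
const char *const names[] = { "alx", "jens", "aaron" };

size_t names_count(void)
{
    return lengthof(names);
}
```

Projects that already ship their own lengthof() macro could guard it
with the same #ifndef and keep building on both old and new toolchains.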

> this was basically what we did for `unreachable` and I think it worked
> out fine.
> 
> Jens

Have a lovely day!
Alex
Alejandro Colomar Aug. 14, 2024, 1:13 p.m. UTC | #13
Hi Aaron,

On Wed, Aug 14, 2024 at 12:40:41PM GMT, Ballman, Aaron wrote:
> > We should not impose an implementation in the language where doing
> > it in a header can be completely sufficient.
> 
> But can doing this in a header be completely sufficient in practice?
> e.g., the user who passes a pointer rather than an array is in for
> quite a surprise, or passing a struct, or passing a FAM, etc. If we
> want to put constraints on the interface, that may be more challenging
> to do from a header file than from the compiler.

I've provided a C23-portable and safe implementation of lengthof() as a
macro:

   Portability
     Prior to C23 it was impossible to do this portably, but since C23
     it is possible to portably write a macro that determines the
     number of elements of an array.

            #define must_be(e)                                      \
            (                                                       \
                0 * (int) sizeof(                                   \
                    struct {                                        \
                        static_assert(e);                           \
                        int ISO_C_forbids_a_struct_with_no_members; \
                    }                                               \
                )                                                   \
            )
            #define is_array(a)                                     \
            (                                                       \
                _Generic(&(a),                                      \
                    typeof((a)[0]) **:  0,                          \
                    default:            1                           \
                )                                                   \
            )
            #define sizeof_array(a)  (sizeof(a) + must_be(is_array(a)))
            #define nitems(a)        (sizeof_array(a) / sizeof((a)[0]))

     While diagnostics could be better, with good  helper‐macro  names,
     they are decent.
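
A short usage sketch for the nitems() macro above, to show what the
guard accepts and what it rejects.  The arrays and function names
(primes, alias, primes_count() and so on) are hypothetical, and the
typeof fallback for pre-C23 compilers is an assumption about GNU-style
toolchains, not part of the paper.

```c
#include <assert.h>   /* static_assert macro for pre-C23 compilers */
#include <stddef.h>

#if !defined(__STDC_VERSION__) || __STDC_VERSION__ < 202311L
/* Assumption: pre-C23 compilers provide the GNU spelling of typeof. */
# define typeof __typeof__
#endif

#define must_be(e)                                      \
(                                                       \
    0 * (int) sizeof(                                   \
        struct {                                        \
            static_assert(e);                           \
            int ISO_C_forbids_a_struct_with_no_members; \
        }                                               \
    )                                                   \
)
#define is_array(a)                                     \
(                                                       \
    _Generic(&(a),                                      \
        typeof((a)[0]) **:  0,                          \
        default:            1                           \
    )                                                   \
)
#define sizeof_array(a)  (sizeof(a) + must_be(is_array(a)))
#define nitems(a)        (sizeof_array(a) / sizeof((a)[0]))

/* Hypothetical example objects: a real array and a pointer to it. */
int  primes[] = { 2, 3, 5, 7, 11 };
int *alias = primes;

size_t primes_count(void)    { return nitems(primes); }    /* accepted */
int    primes_is_array(void) { return is_array(primes); }  /* &primes is int (*)[5] */
int    alias_is_array(void)  { return is_array(alias); }   /* &alias is int ** */

/* nitems(alias) would not compile: static_assert(is_array(alias)) fails. */
```

The check relies on &(a) having type "pointer to pointer to element"
only when a is itself a pointer; for a real array, &(a) is a pointer to
the array type, so the default association is selected.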

The issues with this implementation are also listed in the paper.
Here's a TL;DR:

-  It doesn't accept type names.

-  It results unnecessarily in run-time values where a keyword could
   result in an integer constant expression:

            int  a[7][n];
            int  (*p)[7][n];

            p = &a;
            nitems(*p++);

-  Double evaluation: not only does the macro evaluate in more cases than
   a keyword, it also evaluates twice (due to the two sizeof operators).

-  Fewer diagnostics.  Since there are fewer constant expressions, there
   are fewer opportunities to catch UB.

So far, we've lived with all of those issues (plus the lack of
portability, since this could only be implemented via compiler
extensions until C23).

But ideally, I'd like to avoid the wording juggling that would be
required to allow such an implementation.  Here's an example of the
difference in wording that would be required:

     The elementsof operator yields the number of elements
     of its operand.
     The number of elements is determined from the type of the operand.
     The result is an integer.
     If the number of elements of the array type is variable,
     the operand is evaluated;
    +otherwise,
    +if the operand is a variable-length array,
    +it is unspecified whether the operand is evaluated;
     otherwise,
     the operand is not evaluated and the result is an integer constant.
    +If the operand is evaluated,
    +the number of times it is evaluated is unspecified.

Which sounds very suspicious.

> I'm still thinking on how important rank + extent is vs overall array
> length. If C had constexpr functions, then I'd almost certainly want
> array rank and extent to be the building blocks and then lengthof can
> be a constexpr function looping over rank and summing extents. But we
> don't have that yet, and "bird hand" vs "bird in bush"... :-D

Or you can build it the other way around: define extent() as a macro
that wraps lengthof().

About rank, I suspect you could also develop something with _Generic(3),
but I didn't try.

Cheers,
Alex
Ballman, Aaron Aug. 14, 2024, 1:21 p.m. UTC | #14
> What regex did you use for searching?

I went cheap and easy rather than trying to narrow down:
https://sourcegraph.com/search?q=context:global+lang:C+lengthof&patternType=regexp&sm=0

> I was thinking of renaming the proposal to elementsof(), to avoid confusion between length of an array and length of a string.  Would you mind checking if elementsof() is ok?

From what I was seeing, it looks to be used more uniformly as a function-like macro accepting a single argument.

~Aaron

-----Original Message-----
From: Alejandro Colomar <alx@kernel.org> 
Sent: Wednesday, August 14, 2024 8:58 AM
To: Jens Gustedt <jens.gustedt@inria.fr>; Ballman, Aaron <aaron.ballman@intel.com>
Cc: Xavier Del Campo Romero <xavi.dcr@tutanota.com>; Gcc Patches <gcc-patches@gcc.gnu.org>; Daniel Plakosh <dplakosh@cert.org>; Martin Uecker <uecker@tugraz.at>; Joseph Myers <josmyers@redhat.com>; Gabriel Ravier <gabravier@gmail.com>; Jakub Jelinek <jakub@redhat.com>; Kees Cook <keescook@chromium.org>; Qing Zhao <qing.zhao@oracle.com>; David Brown <david.brown@hesbynett.no>; Florian Weimer <fweimer@redhat.com>; Andreas Schwab <schwab@linux-m68k.org>; Timm Baeder <tbaeder@redhat.com>; A. Jiang <de34@live.cn>; Eugene Zelenko <eugene.zelenko@gmail.com>
Subject: Re: v2.1 Draft for a lengthof paper

Hi Aaron, Jens,

On Wed, Aug 14, 2024 at 02:17:52PM GMT, Jens Gustedt wrote:
> On 14 August 2024 13:31:19 CEST, "Ballman, Aaron" <aaron.ballman@intel.com> wrote:
> > Sorry for top-posting, my work account is stuck on Outlook. :-/
> > 
> > > For a WG14 paper you should add these findings to support that choice.
> > > Another option would be for WG14 to standardize the then existing implementation with the double underscores.
> > 
> > +1, it's always good to explain prior art and existing uses as part
> > of the paper. However, please also point out that C++ has a prior 
> > art as well which is slightly different and very much worth
> > considering: they have one API for getting the array's rank, and 
> > another for getting a specific rank's extent. This is a general 
> > solution that doesn't require the programmer to have deep knowledge 
> > of C's declarator syntax and how it relates to multidimensional 
> > arrays.

I have added that to my draft.  I'll publish it soon as a reply to the GCC mailing list.  See below for details of what I have added for now.

> > 
> > That said, I suspect WG14 would not be keen on standardizing 
> > `lengthof` without an ugly keyword given that there are plenty of other uses of it that would break:
> > 
> > https://sourcegraph.com/github.com/illumos/illumos-gate/-/blob/usr/s
> > rc/cmd/mailx/names.c?L53-55
> > https://sourcegraph.com/github.com/Rockbox/rockbox/-/blob/tools/ipod
> > _fw.c?L292-294
> > https://sourcegraph.com/github.com/OpenSmalltalk/opensmalltalk-vm/-/
> > blob/src/spur64.stack/validImage.c?L7014-7018
> > (and many, many others)

What regex did you use for searching?

I was thinking of renaming the proposal to elementsof(), to avoid confusion between length of an array and length of a string.  Would you mind checking if elementsof() is ok?

> > >> > As for the parentheses, I personally think lengthof should 
> > >> > follow similar rules compared to sizeof.
> > >> 
> > >> I think most people agree with this.
> > >
> > > I still don't, in particular not for standardisation.
> > > 
> > > We have to remember that there are many small C compilers out there.
> > 
> > Those compilers already have to handle parsing this for sizeof, so 
> > that's not particularly compelling

Agree.  I suspect it will be simpler for existing compilers to follow sizeof than to have new syntax.  However, it's easy to keep it as a QoI detail, so I've temporarily changed the wording to require parentheses, and let implementations lift that restriction.

> > (even if we wanted to design C
> > for the lowest common denominator of implementation effort, which 
> > I'm not convinced is a good approach these days).

Off-topic, but I wish that had been the approach when a few implementations (I suspect proprietary vendors; this was never
disclosed) rejected redefining NULL as the right thing: (void *) 0.

I fixed one of the last free-software implementations of NULL that expanded to 0, and nullptr would probably never have been added if WG14 had not accepted the pressure from such horrible implementations.

<https://github.com/cc65/cc65/issues/1823>

> > That said, if we went with a rank/extent design, I think we'd *have* 
> > to use parens because the extent interface would take two operands 
> > (the array and the rank you're interested in getting the extent of) 
> > and it would be inconsistent for the rank interface to then not 
> > require parens.

   Prior art
     C
            It is common in C programs to get the number of elements of
            an array via the usual sizeof division and  wrap  it  in  a
            macro.  Common names include:

            •  ARRAY_SIZE()
            •  NELEM()
            •  NELEMS()
            •  NITEMS()
            •  NELTS()
            •  elementsof()
            •  lengthof()
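
For reference, the "usual sizeof division" behind all of these names can be sketched as follows.  This is only a minimal illustration; the NITEMS spelling is one of the names listed above, and the primes array and sum_primes() are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Number of elements of the array `a`: total size divided by the size
   of one element.  Nothing stops a caller from passing a pointer, in
   which case the result is meaningless. */
#define NITEMS(a)  (sizeof(a) / sizeof((a)[0]))

/* Hypothetical example data. */
static const int primes[] = { 2, 3, 5, 7, 11 };

/* The element count is an integer constant expression for non-VLA arrays. */
_Static_assert(NITEMS(primes) == 5, "primes has five elements");

int sum_primes(void)
{
    int sum = 0;

    for (size_t i = 0; i < NITEMS(primes); i++)
        sum += primes[i];
    return sum;
}
```

The macro is purely textual, which is why it cannot diagnose non-array operands and why its operand appears twice in the expansion.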

     C++
            In  C++,  there  are several standard features to determine
            the number of elements of an array:

            std::size()   (since C++17)
            std::ssize()  (since C++20)
                   The syntax of these is  identical  to  the  usual  C
                   macros named above.

                   It’s  a  bit different, since it’s a general purpose
                   sizing template, which works on non‐array types too,
                   with different semantics.

                   But when applied to an array, it has the same seman‐
                   tics as the macros above.

            std::extent  (since C++11)
                   The syntax of this is quite different.   It  uses  a
                   numeric index as a second parameter to determine the
                   dimension  in which the number of elements should be
                   counted.

                   C arrays are much simpler than C++’s many array‐like
                   types, and I don’t see a reason why  we  would  need
                   something  as  complex  as  std::extent  in C.  Cer‐
                   tainly, existing projects have not developed such  a
                   macro, even if it is technically possible:

                       #define DEREFERENCE(a, n) DEREFERENCE_ ## n (a)
                       #define DEREFERENCE_9(a)  (*********(a))
                       #define DEREFERENCE_8(a)  (********(a))
                       #define DEREFERENCE_7(a)  (*******(a))
                       #define DEREFERENCE_6(a)  (******(a))
                       #define DEREFERENCE_5(a)  (*****(a))
                       #define DEREFERENCE_4(a)  (****(a))
                       #define DEREFERENCE_3(a)  (***(a))
                       #define DEREFERENCE_2(a)  (**(a))
                       #define DEREFERENCE_1(a)  (*(a))
                       #define DEREFERENCE_0(a)  ((a))
                       #define extent(a, n)      nitems(DEREFERENCE(a, n))

                   If any project needs that syntax, they can implement
                   their  own  trivial  wrapper  macro, as demonstrated
                   above.

            Existing prior art in C seems to favour a design that  fol‐
            lows the syntax of other operators like sizeof.
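
As a sanity check of the extent() idea discussed under std::extent, here is a cut-down, compilable sketch limited to ranks 0 to 2.  The nitems() definition is assumed to be the usual sizeof division, and the array variable is invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed definition of nitems(): the usual sizeof division. */
#define nitems(a)         (sizeof(a) / sizeof((a)[0]))

/* Strip n levels of array by dereferencing n times, then count. */
#define DEREFERENCE(a, n) DEREFERENCE_ ## n (a)
#define DEREFERENCE_2(a)  (**(a))
#define DEREFERENCE_1(a)  (*(a))
#define DEREFERENCE_0(a)  ((a))
#define extent(a, n)      nitems(DEREFERENCE(a, n))

/* Hypothetical three-dimensional array. */
static double array[4][3][2];

_Static_assert(extent(array, 0) == 4, "outermost dimension");
_Static_assert(extent(array, 1) == 3, "middle dimension");
_Static_assert(extent(array, 2) == 2, "innermost dimension");
```

The second argument has to be a literal digit, since it is pasted into a macro name; that is one more way in which this differs from a real rank/extent operator.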

> I think that this argument goes too short. E. g. implementation that 
> already have compound expressions (or lambdas ;-) may provide a 
> quality implementation using `static_assert` and `typeof` alone, and 
> don't have to touch their compiler at all.
> 
> We should not impose an implementation in the language where doing it 
> in a header can be completely sufficient.

I have concerns about a libc (or a predefined macro) implementation:
the sizeof division causes double evaluation with any VLAs, while my implementation for GCC has fewer cases of evaluation, and when it needs to evaluate, it only does it once.  It would be hard to find a good wording that would allow an implementation to implement this as a macro.

   constexpr
     The  usual  sizeof division evaluates the operand and results in a
     run‐time value in cases where it wouldn’t be  necessary.   If  the
     top‐level  array  number  of  elements is determined by an integer
     constant expression, but an internal array is a VLA,  sizeof  must
     evaluate:

            int  a[7][n];
            int  (*p)[7][n];

            p = &a;
            nitems(*p++);

     With  an elementsof operator, this would result in an integer con‐
     stant expression of value 7.

   Double evaluation
     With the sizeof‐based implementation from above, the example  from
     above causes double evaluation of *p++.
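
The double evaluation can be observed with a small sketch.  To avoid modifying p twice without sequencing, it uses a function call as the side effect instead of *p++; counted() and nitems_twice() are hypothetical helpers, and NITEMS() is assumed to be the usual sizeof division.

```c
#include <assert.h>
#include <stddef.h>

#define NITEMS(a)  (sizeof(a) / sizeof((a)[0]))

static int calls;

/* Hypothetical helper: returns its argument and counts how often it runs. */
static void *counted(void *p)
{
    calls++;
    return p;
}

/* Returns the number of elements of an int[7][n] array computed through
   NITEMS(), and stores in *ncalls how many times the operand was evaluated. */
size_t nitems_twice(int n, int *ncalls)
{
    int     a[7][n];   /* the outer length is constant, the inner one is not */
    size_t  len;

    calls = 0;
    /* Both sizeof operands in the expansion have VLA type (int[7][n] and
       int[n]), so the C rules for sizeof require both to be evaluated. */
    len = NITEMS(*(int (*)[7][n]) counted(a));
    *ncalls = calls;
    return len;
}
```

The outer length is known at translation time to be 7, yet the macro still has to run the operand, and it runs it once per sizeof in the expansion.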

> Plus, implementing as a macro in a header (probably <stddef.h>) makes 
> also a feature test, for those applications that already have 
> something similar.

This is interesting.  But I think an implementation could just

	#define lengthof lengthof

to provide a feature-test macro.
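
A sketch of how an application might consume such a self-defining macro, assuming (hypothetically) that an implementation providing lengthof also does the #define above; the fallback and the names array are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* If the implementation advertises lengthof through a macro of the same
   name, use it; otherwise fall back to the usual sizeof division. */
#ifndef lengthof
# define lengthof(a)  (sizeof(a) / sizeof((a)[0]))
#endif

/* Hypothetical example data. */
static const char *const names[] = { "alpha", "beta", "gamma" };

size_t count_names(void)
{
    return lengthof(names);
}
```

Either branch gives the same result for ordinary arrays; only the diagnostics and the VLA evaluation behaviour would differ.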

> this was basically what we did for `unreachable` and I think it worked 
> out fine.
> 
> Jens

Have a lovely day!
Alex

--
<https://www.alejandro-colomar.es/>
Jₑₙₛ Gustedt Aug. 14, 2024, 1:24 p.m. UTC | #15
Am 14. August 2024 14:40:41 MESZ schrieb "Ballman, Aaron" <aaron.ballman@intel.com>:
> > I think that this argument goes too short. E. g. implementations that already have compound expressions (or lambdas ;-) may provide a quality implementation using `static_assert` and `typeof` alone, and don't have to touch their compiler at all.
> >
> > We should not impose an implementation in the language where doing it in a header can be completely sufficient.
> 
> But can doing this in a header be completely sufficient in practice? 

I think so.

> e.g., the user who passes a pointer rather than an array is in for quite a surprise, or passing a struct, or passing a FAM, etc. If we want to put constraints on the interface, that may be more challenging to do from a header file than from the compiler. offsetof is a cautionary tale in that compilers that want a reasonable QoI basically all implement this as a builtin rather than the header-only version.

Yes, with the tools that I listed and the ideas that are already in the
paper you can basically do all that, including giving valuable feedback
in case of failure.
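
To give a rough idea of what a header-only check could look like, here is a sketch (not a reference implementation, and not taken from the paper).  IS_ARRAY, MUST_BE_ARRAY and LENGTHOF_SKETCH are made-up names.  It uses the GNU `__typeof__` spelling so that it also compiles in pre-C23 modes; C23 `typeof` would serve the same purpose.  It does not handle VLAs, and a const-qualified pointer object slips through the check.

```c
#include <assert.h>
#include <stddef.h>

/* 1 if `a` is an array, 0 if it is a pointer: for an array T[N], &(a)
   has type T (*)[N]; for a pointer T *, &(a) has type T **. */
#define IS_ARRAY(a) \
    _Generic(&(a), __typeof__((a)[0]) **: 0, default: 1)

/* Evaluates to 0, but fails to compile, with a readable message,
   when `a` is not an array. */
#define MUST_BE_ARRAY(a)                                              \
    (0 * sizeof(struct {                                              \
        _Static_assert(IS_ARRAY(a), "operand must be an array");      \
        int dummy_;                                                   \
    }))

/* The usual sizeof division, guarded by the check above. */
#define LENGTHOF_SKETCH(a) \
    (sizeof(a) / sizeof((a)[0]) + MUST_BE_ARRAY(a))

/* Hypothetical example data. */
static int    vec[5];
static double mat[4][3];

_Static_assert(LENGTHOF_SKETCH(vec) == 5, "one-dimensional array");
_Static_assert(LENGTHOF_SKETCH(mat) == 4, "outer dimension only");
```

Passing an `int *` to LENGTHOF_SKETCH() is rejected at translation time with the "operand must be an array" message instead of silently yielding a wrong number.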

I am currently on a summer bike trip, so I am not able to provide
a full reference implementation. But I could do so once I am back.


> > Plus, implementing as a macro in a header (probably <stddef.h>) makes also a feature test, for those applications that already have something similar. 
> > this was basically what we did for `unreachable` and I think it worked out fine.
> 
> True!
> 
> I'm still thinking on how important rank + extent is vs overall array length. If C had constexpr functions, then I'd almost certainly want array rank and extent to be the building blocks and then lengthof can be a constexpr function looping over rank and summing extents. But we don't have that yet, and "bird hand" vs "bird in bush"... :-D

Why would you be looping? lengthof only addresses the outer dimension;
sizeof would need a loop, no?

Generally, I would be opposed to imposing a complicated solution for a simple
feature.

Jens

> 
> ~Aaron
> 
> -----Original Message-----
> From: Jens Gustedt <jens.gustedt@inria.fr> 
> Sent: Wednesday, August 14, 2024 8:18 AM
> To: Ballman, Aaron <aaron.ballman@intel.com>; Alejandro Colomar <alx@kernel.org>; Xavier Del Campo Romero <xavi.dcr@tutanota.com>
> Cc: Gcc Patches <gcc-patches@gcc.gnu.org>; Daniel Plakosh <dplakosh@cert.org>; Martin Uecker <uecker@tugraz.at>; Joseph Myers <josmyers@redhat.com>; Gabriel Ravier <gabravier@gmail.com>; Jakub Jelinek <jakub@redhat.com>; Kees Cook <keescook@chromium.org>; Qing Zhao <qing.zhao@oracle.com>; David Brown <david.brown@hesbynett.no>; Florian Weimer <fweimer@redhat.com>; Andreas Schwab <schwab@linux-m68k.org>; Timm Baeder <tbaeder@redhat.com>; A. Jiang <de34@live.cn>; Eugene Zelenko <eugene.zelenko@gmail.com>
> Subject: RE: v2.1 Draft for a lengthof paper
> 
> Hi Aaron,
> 
> Am 14. August 2024 13:31:19 MESZ schrieb "Ballman, Aaron" <aaron.ballman@intel.com>:
> > Sorry for top-posting, my work account is stuck on Outlook. :-/
> > 
> > > For a WG14 paper you should add these findings to support that choice.
> > > Another option would be for WG14 to standardize the then existing implementation with the double underscores.
> > 
> > +1, it's always good to explain prior art and existing uses as part of the paper. However, please also point out that C++ has a prior art as well which is slightly different and very much worth considering: they have one API for getting the array's rank, and another for getting a specific rank's extent. This is a general solution that doesn't require the programmer to have deep knowledge of C's declarator syntax and how it relates to multidimensional arrays.
> > 
> > That said, I suspect WG14 would not be keen on standardizing `lengthof` without an ugly keyword given that there are plenty of other uses of it that would break: 
> > 
> > https://sourcegraph.com/github.com/illumos/illumos-gate/-/blob/usr/src
> > /cmd/mailx/names.c?L53-55
> > https://sourcegraph.com/github.com/Rockbox/rockbox/-/blob/tools/ipod_f
> > w.c?L292-294
> > https://sourcegraph.com/github.com/OpenSmalltalk/opensmalltalk-vm/-/bl
> > ob/src/spur64.stack/validImage.c?L7014-7018
> > (and many, many others)
> > 
> > >> > As for the parentheses, I personally think lengthof should follow 
> > >> > similar rules compared to sizeof.
> > >> 
> > >> I think most people agree with this.
> > >
> > > I still don't, in particular not for standardisation.
> > > 
> > > We have to remember that there are many small C compilers out there.
> > 
> > Those compilers already have to handle parsing this for sizeof, so that's not particularly compelling (even if we wanted to design C for the lowest common denominator of implementation effort, which I'm not convinced is a good approach these days). That said, if we went with a rank/extent design, I think we'd *have* to use parens because the extent interface would take two operands (the array and the rank you're interested in getting the extent of) and it would be inconsistent for the rank interface to then not require parens.
> 
> I think that this argument goes too short. E. g. implementation that already have compound expressions (or lambdas ;-) may provide a quality implementation using `static_assert` and `typeof` alone, and don't have to touch their compiler at all.
> 
> We should not impose an implementation in the language where doing it in a header can be completely sufficient.
> 
> Plus, implementing as a macro in a header (probably <stddef.h>) makes also a feature test, for those applications that already have something similar. 
> this was basically what we did for `unreachable` and I think it worked out fine.
> 
> Jens
> 
> > ~Aaron
> > 
> > -----Original Message-----
> > From: Jens Gustedt <jens.gustedt@inria.fr>
> > Sent: Wednesday, August 14, 2024 2:11 AM
> > To: Alejandro Colomar <alx@kernel.org>; Xavier Del Campo Romero 
> > <xavi.dcr@tutanota.com>
> > Cc: Gcc Patches <gcc-patches@gcc.gnu.org>; Daniel Plakosh 
> > <dplakosh@cert.org>; Martin Uecker <uecker@tugraz.at>; Joseph Myers 
> > <josmyers@redhat.com>; Gabriel Ravier <gabravier@gmail.com>; Jakub 
> > Jelinek <jakub@redhat.com>; Kees Cook <keescook@chromium.org>; Qing 
> > Zhao <qing.zhao@oracle.com>; David Brown <david.brown@hesbynett.no>; 
> > Florian Weimer <fweimer@redhat.com>; Andreas Schwab 
> > <schwab@linux-m68k.org>; Timm Baeder <tbaeder@redhat.com>; A. Jiang 
> > <de34@live.cn>; Eugene Zelenko <eugene.zelenko@gmail.com>; Ballman, 
> > Aaron <aaron.ballman@intel.com>
> > Subject: Re: v2.1 Draft for a lengthof paper
> > 
> > Am 14. August 2024 01:27:33 MESZ schrieb Alejandro Colomar <alx@kernel.org>:
> > > Hi Xavier,
> > > 
> > > On Wed, Aug 14, 2024 at 12:38:53AM GMT, Xavier Del Campo Romero wrote:
> > > > I have been overseeing these last emails -
> > > 
> > > Ahhh, good to know; thanks!  :)
> > > 
> > > > thank you very much for your
> > > > efforts, Alex!
> > > 
> > > :-)
> > > 
> > > > I did not reply until now because I do not have prior experience 
> > > > with gcc internals, so my feedback would probably have not been 
> > > > that useful.
> > > 
> > > Ok.
> > > 
> > > > Those emails from 2020 were in fact discussing two completely 
> > > > different proposals at once:
> > > > 
> > > > 1. Add _Lengthof + #include <stdlengthof.h> 2. Allow static 
> > > > qualifier on compound literals
> > > 
> > > Yup.
> > > 
> > > > Whereas proposal #2 made it into C23 (kudos to Jens Gustedt!), and 
> > > > as you already know by now, proposal #1 received some negative 
> > > > feedback, suggesting _Typeof/typeof + some macro magic as a 
> > > > pragmatic workaround instead.
> > > 
> > > The original author of that negative feedback talked to me in 
> > > private a week ago, and said he likes my proposal.  We have no 
> > > negative feedback anymore.  :)
> > > 
> > > > Since the proposal did not get much traction and I would had been 
> > > > unable to contribute to gcc myself, I just gave up on it. IIRC the 
> > > > deadline for new proposals closed soon after, anyway.
> > > 
> > > Ok.
> > > 
> > > > But I am glad that someone with proper experience took the initiative.
> > > 
> > > Fun fact: this is my second non-trivial patch to GCC.  I wouldn't 
> > > say I had the proper experience with GCC internals when I started 
> > > this patch set.  But I'm unemployed at the moment, which gives me 
> > > all the time I need for learning those.  :)
> > > 
> > > > I still think the proposal is relevant and has interesting use cases.
> > > > 
> > > > > I have only added lengthof for now, not _Lengthof, as suggested by Jens.
> > > > > Depending on feedback, I'll propose the uglified version.
> > > > 
> > > > Probably, all of us know why the uglified version is the usual 
> > > > approach preferred by the C standard: we do not know how many 
> > > > applications would break otherwise.
> > > 
> > > Yup.
> > > 
> > > > However, we see that this trend is now changing with C23, so 
> > > > probably it makes sense to define lengthof directly.
> > > 
> > > Yeah, since Jens is in WG14 and he suggested to follow this trend, 
> > > maybe we can.  If not, it's trivial to change the proposal to use 
> > > the uglified name plus a macro.
> > > 
> > > Checking <https://codesearch.debian.net>, I see that while several 
> > > projects have a lengthof() macro, all of them use it with semantics 
> > > compatible with this keyword, so it shouldn't break too much.  Maybe 
> > > those projects will start receiving diagnostics that they're 
> > > redefining a standard keyword, but that's not too bad.
> > 
> > For a WG14 paper you should add these findings to support that choice.
> > Another option would be for WG14 to standardize the then existing implementation with the double underscores.
> > 
> > > > As for the parentheses, I personally think lengthof should follow 
> > > > similar rules compared to sizeof.
> > > 
> > > I think most people agree with this.
> > 
> > I still don't, in particular not for standardisation.
> > 
> > We have to remember that there are many small C compilers out there. 
> > I would not want unnecessary burden on them. So my preferred choice would be a standardisation as a macro, similar to offsetof.
> > gcc (and clang) could then just map that to their builtin, other compilers could use whatever they have at the moment, even just the macros that you have in the paper as a starting point. 
> > 
> > The rest would be "quality of implementation"
> > 
> > What time horizon do you see to add the feature for array parameters?
> > 
> > Thanks
> > Jens
> > 
> > 
> > > > Best regards,
> > > 
> > > Have a lovely night!
> > > Alex
> > > 
> > 
> > 
> > --
> > Jens Gustedt - INRIA & ICube, Strasbourg, France
> 
>
Martin Uecker Aug. 14, 2024, 1:50 p.m. UTC | #16
Am Mittwoch, dem 14.08.2024 um 12:40 +0000 schrieb Ballman, Aaron:
> > I think that this argument goes too short. E. g. implementations that already have compound expressions (or lambdas
> > ;-) may provide a quality implementation using `static_assert` and `typeof` alone, and don't have to touch their
> > compiler at all.
> > 
> > We should not impose an implementation in the language where doing it in a header can be completely sufficient.
> 
> But can doing this in a header be completely sufficient in practice? e.g., the user who passes a pointer rather than
> an array is in for quite a surprise, or passing a struct, or passing a FAM, etc. If we want to put constraints on the
> interface, that may be more challenging to do from a header file than from the compiler. offsetof is a cautionary tale
> in that compilers that want a reasonable QoI basically all implement this as a builtin rather than the header-only
> version.
> 
> > Plus, implementing as a macro in a header (probably <stddef.h>) makes also a feature test, for those applications
> > that already have something similar. 
> > this was basically what we did for `unreachable` and I think it worked out fine.
> 
> True!
> 
> I'm still thinking on how important rank + extent is vs overall array length. If C had constexpr functions, then I'd
> almost certainly want array rank and extent to be the building blocks and then lengthof can be a constexpr function
> looping over rank and summing extents. But we don't have that yet, and "bird hand" vs "bird in bush"... :-D

An operator that returns an array with all dimensions of a multi-dimensional
array would make a lot of sense to me.


double array[4][3][2];

// array_dims(array) = (constexpr size_t[3]){ 4, 3, 2 }

int dim1 = (array_dims(array))[0];
int dim2 = (array_dims(array))[1];
int dim3 = (array_dims(array))[2];
 
You can then implement lengthof in terms of this operator:

#define lengthof(x) (array_dims(x)[0])

and you can obtain the rank by applying lengthof to the array:

#define rank(x) lengthof(array_dims(x))


If the array is constexpr for regular arrays and array
indexing returns a constant again for constexpr arrays, this
would all work out.
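
Since array_dims does not exist, the idea can only be emulated today.  The following sketch hard-wires the rank to 3 with a compound literal; ARRAY_DIMS3, LENGTHOF and RANK3 are hypothetical names, and NITEMS() is assumed to be the usual sizeof division.

```c
#include <assert.h>
#include <stddef.h>

#define NITEMS(a)  (sizeof(a) / sizeof((a)[0]))

/* Stand-in for the proposed array_dims(), valid only for rank-3 arrays:
   an array holding the length of each dimension, outermost first. */
#define ARRAY_DIMS3(a) \
    ((const size_t[3]){ NITEMS(a), NITEMS((a)[0]), NITEMS((a)[0][0]) })

/* lengthof is the first entry; the rank is the length of the dims array. */
#define LENGTHOF(x)  (ARRAY_DIMS3(x)[0])
#define RANK3(x)     NITEMS(ARRAY_DIMS3(x))

static double array[4][3][2];

/* Length of dimension i of `array`, for i in 0..2. */
size_t dim(size_t i)
{
    return ARRAY_DIMS3(array)[i];
}
```

A real operator would derive the rank from the type instead of fixing it at 3, and its result would have to be usable in constant expressions, which indexing a compound literal is not.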

Martin


Jₑₙₛ Gustedt Aug. 14, 2024, 1:50 p.m. UTC | #17
Am 14. August 2024 14:58:16 MESZ schrieb Alejandro Colomar <alx@kernel.org>:
> Hi Aaron, Jens,
> 
> On Wed, Aug 14, 2024 at 02:17:52PM GMT, Jens Gustedt wrote:
> > Am 14. August 2024 13:31:19 MESZ schrieb "Ballman, Aaron" <aaron.ballman@intel.com>:
> > > Sorry for top-posting, my work account is stuck on Outlook. :-/
> > > 
> > > > For a WG14 paper you should add these findings to support that choice.
> > > > Another option would be for WG14 to standardize the then existing implementation with the double underscores.
> > > 
> > > +1, it's always good to explain prior art and existing uses as part
> > > of the paper. However, please also point out that C++ has a prior
> > > art as well which is slightly different and very much worth
> > > considering: they have one API for getting the array's rank,
> > > and another for getting a specific rank's extent. This is a general
> > > solution that doesn't require the programmer to have deep knowledge
> > > of C's declarator syntax and how it relates to multidimensional
> > > arrays.
> 
> I have added that to my draft.  I'll publish it soon as a reply to the
> GCC mailing list.  See below for details of what I have added for now.
> 
> > > 
> > > That said, I suspect WG14 would not be keen on standardizing
> > > `lengthof` without an ugly keyword given that there are plenty of other uses of it that would break: 
> > > 
> > > https://sourcegraph.com/github.com/illumos/illumos-gate/-/blob/usr/src/cmd/mailx/names.c?L53-55
> > > https://sourcegraph.com/github.com/Rockbox/rockbox/-/blob/tools/ipod_fw.c?L292-294
> > > https://sourcegraph.com/github.com/OpenSmalltalk/opensmalltalk-vm/-/blob/src/spur64.stack/validImage.c?L7014-7018
> > > (and many, many others)
> 
> What regex did you use for searching?
> 
> I was thinking of renaming the proposal to elementsof(), to avoid
> confusion between length of an array and length of a string.  Would you
> mind checking if elementsof() is ok?

No, not for me. I really want us to consistently talk about
array length for this. Consistent terminology is important.


> > > >> > As for the parentheses, I personally think lengthof should follow 
> > > >> > similar rules compared to sizeof.
> > > >> 
> > > >> I think most people agree with this.
> > > >
> > > > I still don't, in particular not for standardisation.
> > > > 
> > > > We have to remember that there are many small C compilers out there.
> > > 
> > > Those compilers already have to handle parsing this for sizeof, so
> > > that's not particularly compelling
> 
> Agree.  I suspect it will be simpler for existing compilers to follow
> sizeof than to have new syntax.  However, it's easy to keep it as a QoI
> detail, so I've temporarily changed the wording to require parentheses,
> and let implementations lift that restriction.

Great! That is a reasonable approach, I think.

> > > (even if we wanted to design C
> > > for the lowest common denominator of implementation effort, which
> > > I'm not convinced is a good approach these days).
> 
> Off-topic, but I wish that had been the approach when a few
> implementations (I suspect proprietary vendors; this was never
> disclosed) rejected redefining NULL as the right thing: (void *) 0.
> 
> I fixed one of the last free-software implementations of NULL that
> expanded to 0, and nullptr would probably never have been added if WG14
> had not accepted the pressure from such horrible implementations.
> 
> <https://github.com/cc65/cc65/issues/1823>
> 
> > > That said, if we went with a rank/extent design, I think we'd *have*
> > > to use parens because the extent interface would take two operands
> > > (the array and the rank you're interested in getting the extent of)
> > > and it would be inconsistent for the rank interface to then not
> > > require parens.
> 
>    Prior art
>      C
>             It is common in C programs to get the number of elements of
>             an array via the usual sizeof division and  wrap  it  in  a
>             macro.  Common names include:
> 
>             •  ARRAY_SIZE()
>             •  NELEM()
>             •  NELEMS()
>             •  NITEMS()
>             •  NELTS()
>             •  elementsof()
>             •  lengthof()
> 
>      C++
>             In  C++,  there  are several standard features to determine
>             the number of elements of an array:
> 
>             std::size()   (since C++17)
>             std::ssize()  (since C++20)
>                    The syntax of these is  identical  to  the  usual  C
>                    macros named above.
> 
>                    It’s  a  bit different, since it’s a general purpose
>                    sizing template, which works on non‐array types too,
>                    with different semantics.
> 
>                    But when applied to an array, it has the same seman‐
>                    tics as the macros above.
> 
>             std::extent  (since C++11)
>                    The syntax of this is quite different.   It  uses  a
>                    numeric index as a second parameter to determine the
>                    dimension  in which the number of elements should be
>                    counted.
> 
>                    C arrays are much simpler than C++’s many array‐like
>                    types, and I don’t see a reason why  we  would  need
>                    something  as  complex  as  std::extent  in C.  Cer‐
>                    tainly, existing projects have not developed such  a
>                    macro, even if it is technically possible:
> 
>                        #define DEREFERENCE(a, n) DEREFERENCE_ ## n (a)
>                        #define DEREFERENCE_9(a)  (*********(a))
>                        #define DEREFERENCE_8(a)  (********(a))
>                        #define DEREFERENCE_7(a)  (*******(a))
>                        #define DEREFERENCE_6(a)  (******(a))
>                        #define DEREFERENCE_5(a)  (*****(a))
>                        #define DEREFERENCE_4(a)  (****(a))
>                        #define DEREFERENCE_3(a)  (***(a))
>                        #define DEREFERENCE_2(a)  (**(a))
>                        #define DEREFERENCE_1(a)  (*(a))
>                        #define DEREFERENCE_0(a)  ((a))
>                        #define extent(a, n)      nitems(DEREFERENCE(a, n))
> 
>                    If any project needs that syntax, they can implement
>                    their  own  trivial  wrapper  macro, as demonstrated
>                    above.
> 
>             Existing prior art in C seems to favour a design that  fol‐
>             lows the syntax of other operators like sizeof.
> 
> > I think that this argument falls short. E.g., implementations that
> > already have compound expressions (or lambdas ;-) may provide a
> > quality implementation using `static_assert` and `typeof` alone, and
> > don't have to touch their compiler at all.
> > 
> > We should not impose an implementation in the language where doing it
> > in a header can be completely sufficient.
> 
> I have concerns about a libc (or a predefined macro) implementation:
> the sizeof division causes double evaluation with any VLAs, while my
> implementation for GCC has fewer cases of evaluation, and when it needs
> to evaluate, it only does it once.  It would be hard to find a good
> wording that would allow an implementation to implement this as a macro.

No, we should not allow double evaluation.

Putting this in a `({    })` and doing a `typedef typeof(X) _my_type;` with
the macro parameter `X` at the beginning completely avoids double
evaluation. So quality implementations are possible, but perhaps
differently and with other builtins than we are imagining. Don't impose
the view of one particular implementation onto others.
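
As a rough illustration of the single-evaluation idea (a sketch, not a reference implementation): the block below assumes GNU C, i.e. statement expressions and `__auto_type` as provided by GCC and Clang. The names `NITEMS_NAIVE`, `NITEMS_ONCE` and `demo_once` are made up for this example. Note that it captures a pointer to the operand with `__auto_type` instead of using the `typedef typeof(X)` spelling, so it only accepts lvalue arrays.

```c
#include <assert.h>
#include <stddef.h>

/* Classic sizeof division: X appears twice in the expansion. */
#define NITEMS_NAIVE(X)  (sizeof(X) / sizeof((X)[0]))

/* GNU C sketch: X is expanded once; only the captured pointer is reused. */
#define NITEMS_ONCE(X)  __extension__ ({                  \
	__auto_type nitems_p_ = &(X);                     \
	sizeof(*nitems_p_) / sizeof((*nitems_p_)[0]);     \
})

/* Mirrors the paper's example: the operand *p++ has a VLA inner dimension. */
static void demo_once(int n)
{
	int a[3][7][n];
	int (*p)[7][n] = a;
	size_t len = NITEMS_ONCE(*p++);

	assert(len == 7);    /* outer dimension of *p */
	assert(p == a + 1);  /* p was advanced exactly once */
}
```

Calling `demo_once(4)` exercises the VLA case. With `NITEMS_NAIVE(*p++)` the operand would be expanded twice, which is the double evaluation discussed above.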

Somewhere an argument involving `offsetof` was brought up.
This is exactly what we need: implementations being able to start
with a simple solution (as everybody did in the beginning of `offsetof`), and improve that implementation at their own pace when they are ready for it.

> 
>    constexpr
>      The  usual  sizeof division evaluates the operand and results in a
>      run‐time value in cases where it wouldn’t be  necessary.   If  the
>      top‐level  array  number  of  elements is determined by an integer
>      constant expression, but an internal array is a VLA,  sizeof  must
>      evaluate:
> 
>             int  a[7][n];
>             int  (*p)[7][n];
> 
>             p = &a;
>             nitems(*p++);
> 
>      With an elementsof operator, this would result in an integer
>      constant expression of value 7.
> 
>    Double evaluation
>      With the sizeof‐based implementation from above, the example  from
>      above causes double evaluation of *p++.
> 
> > Plus, implementing it as a macro in a header (probably <stddef.h>) also
> > provides a feature test, for those applications that already have
> > something similar. 
> 
> This is interesting.  But I think an implementation could just
> 
> 	#define lengthof lengthof
> 
> to provide a feature-test macro.

Sure, but leave some slack for implementations to do this in the way that's best for them.
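
To make the feature-test idea concrete, here is a small sketch of how an application that already ships its own macro could cooperate with a header-provided one. It assumes, hypothetically, that some future `<stddef.h>` defines `lengthof` as a macro; no released header is claimed to do so. The function name `demo_lengthof` is invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: if <stddef.h> ever defines lengthof as a macro, keep it;
   otherwise fall back to the usual sizeof division. */
#ifndef lengthof
#define lengthof(a)  (sizeof(a) / sizeof((a)[0]))
#endif

static void demo_lengthof(void)
{
	static const int primes[] = { 2, 3, 5, 7, 11 };

	assert(lengthof(primes) == 5);
}
```

The same `#ifndef` test works whether the implementation provides a plain macro or the `#define lengthof lengthof` form mentioned above.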

> > this was basically what we did for `unreachable` and I think it worked
> > out fine.

I still think that the different options that we had there can be used to ask the right questions for WG14. 

Jens
Ballman, Aaron Aug. 14, 2024, 1:59 p.m. UTC | #18
> I am currently on a summer bike trip, so not able to provide a full reference implementation. But could do so, once I am back.

No need (after thinking on this a bit more, I believe you're right that this can be done in a macro-only implementation; we might not go that route in Clang because of AST matching needs and whatnot, but that's not an issue), but thank you for the offer. Please enjoy your summer bike trip! 😊

> Why would you be looping? lengthof only addresses the outer dimension; sizeof would need a loop, no?

Due to poor reading comprehension, I missed in the paper that lengthof works on the outer dimension. 😉 I think having a way to get the flattened size of a multidimensional array is a useful feature.

~Aaron

-----Original Message-----
From: Jens Gustedt <jens.gustedt@inria.fr> 
Sent: Wednesday, August 14, 2024 9:25 AM
To: Ballman, Aaron <aaron.ballman@intel.com>; Alejandro Colomar <alx@kernel.org>; Xavier Del Campo Romero <xavi.dcr@tutanota.com>
Cc: Gcc Patches <gcc-patches@gcc.gnu.org>; Daniel Plakosh <dplakosh@cert.org>; Martin Uecker <uecker@tugraz.at>; Joseph Myers <josmyers@redhat.com>; Gabriel Ravier <gabravier@gmail.com>; Jakub Jelinek <jakub@redhat.com>; Kees Cook <keescook@chromium.org>; Qing Zhao <qing.zhao@oracle.com>; David Brown <david.brown@hesbynett.no>; Florian Weimer <fweimer@redhat.com>; Andreas Schwab <schwab@linux-m68k.org>; Timm Baeder <tbaeder@redhat.com>; A. Jiang <de34@live.cn>; Eugene Zelenko <eugene.zelenko@gmail.com>
Subject: RE: v2.1 Draft for a lengthof paper

Am 14. August 2024 14:40:41 MESZ schrieb "Ballman, Aaron" <aaron.ballman@intel.com>:
> > I think that this argument falls short. E.g., implementations that already have compound expressions (or lambdas ;-) may provide a quality implementation using `static_assert` and `typeof` alone, and don't have to touch their compiler at all.
> >
> > We should not impose an implementation in the language where doing it in a header can be completely sufficient.
> 
> But can doing this in a header be completely sufficient in practice? 

I think so.

> e.g., the user who passes a pointer rather than an array is in for quite a surprise, or passing a struct, or passing a FAM, etc. If we want to put constraints on the interface, that may be more challenging to do from a header file than from the compiler. offsetof is a cautionary tale in that compilers that want a reasonable QoI basically all implement this as a builtin rather than the header-only version.

Yes, with the tools that I listed and the ideas that are already in the paper you can basically do all that, including giving valuable feedback in case of failure. 
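
One possible shape of such a header-only check, sketched under the assumption of a compiler offering the GNU builtins `__typeof__` and `__builtin_types_compatible_p` (GCC and Clang do). The names `IS_ARRAY`, `CHECKED_NITEMS` and `demo_checked` are invented here, and this is not the implementation from the paper. A pointer argument is rejected at compile time with a readable message.

```c
#include <assert.h>
#include <stddef.h>

/* An array's type differs from the type of a pointer to its first element;
   for a pointer argument the two types are the same. */
#define IS_ARRAY(a)  \
	(!__builtin_types_compatible_p(__typeof__(a), __typeof__(&(a)[0])))

/* The struct exists only to host the _Static_assert inside an expression;
   multiplying its size by 0 keeps the result equal to the element count. */
#define CHECKED_NITEMS(a)  (                                              \
	0 * sizeof(struct { _Static_assert(IS_ARRAY(a), "not an array");  \
	                    char dummy_; })                               \
	+ sizeof(a) / sizeof((a)[0])                                      \
)

static void demo_checked(void)
{
	char buf[32];

	assert(CHECKED_NITEMS(buf) == 32);
	/* CHECKED_NITEMS(&buf[0]) would fail to compile: "not an array". */
}
```

This only covers the pointer case; structs, flexible array members and the like would need further checks.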

I am currently on a summer bike trip, so not able to provide a full reference implementation. But could do so, once I am back. 


> > Plus, implementing it as a macro in a header (probably <stddef.h>) also provides a feature test, for those applications that already have something similar. 
> > this was basically what we did for `unreachable` and I think it worked out fine.
> 
> True!
> 
> I'm still thinking on how important rank + extent is vs overall array 
> length. If C had constexpr functions, then I'd almost certainly want 
> array rank and extent to be the building blocks and then lengthof can 
> be a constexpr function looping over rank and summing extents. But we 
> don't have that yet, and "bird hand" vs "bird in bush"... :-D

Why would you be looping? lengthof only addresses the outer dimension; sizeof would need a loop, no?

Generally, I would be opposed to imposing a complicated solution for a simple feature.

Jens

> 
> ~Aaron
> 
> -----Original Message-----
> From: Jens Gustedt <jens.gustedt@inria.fr>
> Sent: Wednesday, August 14, 2024 8:18 AM
> To: Ballman, Aaron <aaron.ballman@intel.com>; Alejandro Colomar 
> <alx@kernel.org>; Xavier Del Campo Romero <xavi.dcr@tutanota.com>
> Cc: Gcc Patches <gcc-patches@gcc.gnu.org>; Daniel Plakosh 
> <dplakosh@cert.org>; Martin Uecker <uecker@tugraz.at>; Joseph Myers 
> <josmyers@redhat.com>; Gabriel Ravier <gabravier@gmail.com>; Jakub 
> Jelinek <jakub@redhat.com>; Kees Cook <keescook@chromium.org>; Qing 
> Zhao <qing.zhao@oracle.com>; David Brown <david.brown@hesbynett.no>; 
> Florian Weimer <fweimer@redhat.com>; Andreas Schwab 
> <schwab@linux-m68k.org>; Timm Baeder <tbaeder@redhat.com>; A. Jiang 
> <de34@live.cn>; Eugene Zelenko <eugene.zelenko@gmail.com>
> Subject: RE: v2.1 Draft for a lengthof paper
> 
> Hi Aaron,
> 
> Am 14. August 2024 13:31:19 MESZ schrieb "Ballman, Aaron" <aaron.ballman@intel.com>:
> > Sorry for top-posting, my work account is stuck on Outlook. :-/
> > 
> > > For a WG14 paper you should add these findings to support that choice.
> > > Another option would be for WG14 to standardize the then existing implementation with the double underscores.
> > 
> > +1, it's always good to explain prior art and existing uses as part of the paper. However, please also point out that C++ has a prior art as well which is slightly different and very much worth considering: they have one API for getting the array's rank, and another for getting a specific rank's extent. This is a general solution that doesn't require the programmer to have deep knowledge of C's declarator syntax and how it relates to multidimensional arrays.
> > 
> > That said, I suspect WG14 would not be keen on standardizing `lengthof` without an ugly keyword given that there are plenty of other uses of it that would break: 
> > 
> > https://sourcegraph.com/github.com/illumos/illumos-gate/-/blob/usr/src/cmd/mailx/names.c?L53-55
> > https://sourcegraph.com/github.com/Rockbox/rockbox/-/blob/tools/ipod_fw.c?L292-294
> > https://sourcegraph.com/github.com/OpenSmalltalk/opensmalltalk-vm/-/blob/src/spur64.stack/validImage.c?L7014-7018
> > (and many, many others)
> > 
> > >> > As for the parentheses, I personally think lengthof should 
> > >> > follow similar rules compared to sizeof.
> > >> 
> > >> I think most people agree with this.
> > >
> > > I still don't, in particular not for standardisation.
> > > 
> > > We have to remember that there are many small C compilers out there.
> > 
> > Those compilers already have to handle parsing this for sizeof, so that's not particularly compelling (even if we wanted to design C for the lowest common denominator of implementation effort, which I'm not convinced is a good approach these days). That said, if we went with a rank/extent design, I think we'd *have* to use parens because the extent interface would take two operands (the array and the rank you're interested in getting the extent of) and it would be inconsistent for the rank interface to then not require parens.
> 
> I think that this argument falls short. E.g., implementations that already have compound expressions (or lambdas ;-) may provide a quality implementation using `static_assert` and `typeof` alone, and don't have to touch their compiler at all.
> 
> We should not impose an implementation in the language where doing it in a header can be completely sufficient.
> 
> Plus, implementing it as a macro in a header (probably <stddef.h>) also provides a feature test, for those applications that already have something similar. 
> this was basically what we did for `unreachable` and I think it worked out fine.
> 
> Jens
> 
> > ~Aaron
> > 
> > -----Original Message-----
> > From: Jens Gustedt <jens.gustedt@inria.fr>
> > Sent: Wednesday, August 14, 2024 2:11 AM
> > To: Alejandro Colomar <alx@kernel.org>; Xavier Del Campo Romero 
> > <xavi.dcr@tutanota.com>
> > Cc: Gcc Patches <gcc-patches@gcc.gnu.org>; Daniel Plakosh 
> > <dplakosh@cert.org>; Martin Uecker <uecker@tugraz.at>; Joseph Myers 
> > <josmyers@redhat.com>; Gabriel Ravier <gabravier@gmail.com>; Jakub 
> > Jelinek <jakub@redhat.com>; Kees Cook <keescook@chromium.org>; Qing 
> > Zhao <qing.zhao@oracle.com>; David Brown <david.brown@hesbynett.no>; 
> > Florian Weimer <fweimer@redhat.com>; Andreas Schwab 
> > <schwab@linux-m68k.org>; Timm Baeder <tbaeder@redhat.com>; A. Jiang 
> > <de34@live.cn>; Eugene Zelenko <eugene.zelenko@gmail.com>; Ballman, 
> > Aaron <aaron.ballman@intel.com>
> > Subject: Re: v2.1 Draft for a lengthof paper
> > 
> > Am 14. August 2024 01:27:33 MESZ schrieb Alejandro Colomar <alx@kernel.org>:
> > > Hi Xavier,
> > > 
> > > On Wed, Aug 14, 2024 at 12:38:53AM GMT, Xavier Del Campo Romero wrote:
> > > > I have been overseeing these last emails -
> > > 
> > > Ahhh, good to know; thanks!  :)
> > > 
> > > > thank you very much for your
> > > > efforts, Alex!
> > > 
> > > :-)
> > > 
> > > > I did not reply until now because I do not have prior experience 
> > > > with gcc internals, so my feedback would probably have not been 
> > > > that useful.
> > > 
> > > Ok.
> > > 
> > > > Those emails from 2020 were in fact discussing two completely 
> > > > different proposals at once:
> > > > 
> > > > 1. Add _Lengthof + #include <stdlengthof.h>
> > > > 2. Allow static qualifier on compound literals
> > > 
> > > Yup.
> > > 
> > > > Whereas proposal #2 made it into C23 (kudos to Jens Gustedt!), 
> > > > and as you already know by now, proposal #1 received some 
> > > > negative feedback, suggesting _Typeof/typeof + some macro magic 
> > > > as a pragmatic workaround instead.
> > > 
> > > The original author of that negative feedback talked to me in 
> > > private a week ago, and said he likes my proposal.  We have no 
> > > negative feedback anymore.  :)
> > > 
> > > > Since the proposal did not get much traction and I would have 
> > > > been unable to contribute to gcc myself, I just gave up on it. 
> > > > IIRC the deadline for new proposals closed soon after, anyway.
> > > 
> > > Ok.
> > > 
> > > > But I am glad that someone with proper experience took the initiative.
> > > 
> > > Fun fact: this is my second non-trivial patch to GCC.  I wouldn't 
> > > say I had the proper experience with GCC internals when I started 
> > > this patch set.  But I'm unemployed at the moment, which gives me 
> > > all the time I need for learning those.  :)
> > > 
> > > > I still think the proposal is relevant and has interesting use cases.
> > > > 
> > > > > I have only added lengthof for now, not _Lengthof, as suggested by Jens.
> > > > > Depending on feedback, I'll propose the uglified version.
> > > > 
> > > > Probably, all of us know why the uglified version is the usual 
> > > > approach preferred by the C standard: we do not know how many 
> > > > applications would break otherwise.
> > > 
> > > Yup.
> > > 
> > > > However, we see that this trend is now changing with C23, so 
> > > > probably it makes sense to define lengthof directly.
> > > 
> > > Yeah, since Jens is in WG14 and he suggested to follow this trend, 
> > > maybe we can.  If not, it's trivial to change the proposal to use 
> > > the uglified name plus a macro.
> > > 
> > > Checking <https://codesearch.debian.net>, I see that while several 
> > > projects have a lengthof() macro, all of them use it with 
> > > semantics compatible with this keyword, so it shouldn't break too 
> > > much.  Maybe those projects will start receiving diagnostics that 
> > > they're redefining a standard keyword, but that's not too bad.
> > 
> > For a WG14 paper you should add these findings to support that choice.
> > Another option would be for WG14 to standardize the then existing implementation with the double underscores.
> > 
> > > > As for the parentheses, I personally think lengthof should 
> > > > follow similar rules compared to sizeof.
> > > 
> > > I think most people agree with this.
> > 
> > I still don't, in particular not for standardisation.
> > 
> > We have to remember that there are many small C compilers out there. 
> > I would not want unnecessary burden on them. So my preferred choice would be a standardisation as a macro, similar to offsetof.
> > gcc (and clang) could then just map that to their builtin, other compilers could use whatever they have at the moment, even just the macros that you have in the paper as a starting point. 
> > 
> > The rest would be "quality of implementation"
> > 
> > What time horizon do you see to add the feature for array parameters?
> > 
> > Thanks
> > Jens
> > 
> > 
> > > > Best regards,
> > > 
> > > Have a lovely night!
> > > Alex
> > > 
> > 
> > 
> > --
> > Jens Gustedt - INRIA & ICube, Strasbourg, France
> 
> 


--
Jens Gustedt - INRIA & ICube, Strasbourg, France
Alejandro Colomar Aug. 14, 2024, 2 p.m. UTC | #19
Hi Aaron,

On Wed, Aug 14, 2024 at 01:21:18PM GMT, Ballman, Aaron wrote:
> > What regex did you use for searching?
> 
> I went cheap and easy rather than trying to narrow down:
> https://sourcegraph.com/search?q=context:global+lang:C+lengthof&patternType=regexp&sm=0

Ahh, context:global seems to be what I wanted.  Where is that
documented?

> > I was thinking of renaming the proposal to elementsof(), to avoid confusion between length of an array and length of a string.  Would you mind checking if elementsof() is ok?
> 
> From what I was seeing, it looks to be used more uniformly as a
> function-like macro accepting a single argument.

Thanks!  I'll rename it to elementsof().

Cheers,
Alex

> ~Aaron
Ballman, Aaron Aug. 14, 2024, 2:07 p.m. UTC | #20
> Ahh, context:global seems to be what I wanted.  Where is that documented?

For me it is the default when I go to https://sourcegraph.com/search but there's documentation at https://sourcegraph.com/docs/code-search/working/search_contexts

> Thanks!  I'll rename it to elementsof().

Rather than renaming it, I'd say that the name chosen in the proposed text is a placeholder, and have a section in the prose that describes different naming choices, pros and cons, suggests a name from you as the author, but asks WG14 to pick the final name. I know Jens mentioned he doesn’t like the name `elementsof` and I suspect if we ask five more people we'll get about seven more opinions on what the name could/should be. 😝

~Aaron

Alejandro Colomar Aug. 14, 2024, 2:12 p.m. UTC | #21
Hi Martin,

On Wed, Aug 14, 2024 at 03:50:00PM GMT, Martin Uecker wrote:
> An operator that returns an array with all dimensions of a multi-dimensional
> array would make a lot of sense to me. 
> 
> 
> double array[4][3][2];
> 
> // array_dims(array) = (constexpr size_t[3]){ 4, 3, 2 }

And what if array[4][n][2]?  No constexpr anymore, which is bad.

> 
> int dim1 = (array_dims(array))[0]
> int dim2 = (array_dims(array))[1]
> int dim3 = (array_dims(array))[2]
>  
> You can then implement lengthof in terms of this operator:
> 
> #define lengthof(x) (array_dims(x)[0])

Not really.  This implementation would result in fewer constant
expressions than my proposal.  That's detrimental to diagnostics and
usability.

And the fundamental operator would be very complex, to allow users
to implement simpler wrappers.  I think the fundamental operators should
be as simple as possible, in the spirit of C, and let users build on top
of those basic tools.

This reminds me of the 'static' specifier for array parameters, which is
conflated with two meanings: nonnull and length.  I'd rather have a way
to specify nullness, and another one to specify length, and let users
compose them.
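
For readers less familiar with the feature, a short example of the `static` array-parameter syntax in question. The function names `sum4` and `demo_sum4` are made up. The point is that the single `[static 4]` annotation promises both that the pointer is non-null and that at least 4 elements are accessible.

```c
#include <assert.h>
#include <stddef.h>

/* [static 4]: callers must pass a non-null pointer to at least 4 ints.
   Both promises come from the one keyword; neither can be stated alone. */
static int sum4(const int a[static 4])
{
	int sum = 0;

	for (size_t i = 0; i < 4; i++)
		sum += a[i];
	return sum;
}

static void demo_sum4(void)
{
	const int v[6] = { 1, 2, 3, 4, 5, 6 };

	assert(sum4(v) == 10);  /* only the first 4 elements: 1 + 2 + 3 + 4 */
}
```

There is no way with this syntax to say "non-null, any length" or "length 4, may be null", which is the conflation I mean.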

At first glance I oppose this array_dims operator.

> and you can obtain the rank by applying lengthof to the array:
> 
> #define rank(x) lengthof(array_dims(x))

I'm curious to see what kind of code would be enabled by a rank()
operator in C that we can't write at the moment.

> If the array is constexpr for regular arrays and array
> indexing returns a constant again for constexpr arrays, this
> would all work out.
> 
> Martin

Have a lovely day!
Alex
Alejandro Colomar Aug. 14, 2024, 2:31 p.m. UTC | #22
Hi Aaron,

On Wed, Aug 14, 2024 at 01:59:58PM GMT, Ballman, Aaron wrote:
> > Why would you be looping? lengthof only addresses the outer dimension sizeof would need a loop, no ?
> 
> Due to poor reading comprehension, I missed in the paper that lengthof
> works on the outer dimension. 😉 I think having a way to get the
> flattened size of a multidimensional array is a useful feature.

As long as you know the type of the inner-most element, you can do it.
This excludes auto, but I think you usually know this.

double x[4][5][6][7];
size_t n = sizeof(x) / sizeof(double);

This hard-codes 'double', but should be good enough usually.
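
Spelled out as complete code, with the sizeof division for the outer dimension alongside for comparison. The `NITEMS` macro and the function name are placeholders for this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define NITEMS(a)  (sizeof(a) / sizeof((a)[0]))

static void demo_flattened(void)
{
	double x[4][5][6][7];

	/* Flattened count: total bytes divided by the innermost element size. */
	size_t n = sizeof(x) / sizeof(double);

	assert(n == 4 * 5 * 6 * 7);
	/* Outer dimension only, which is what lengthof is proposed to return. */
	assert(NITEMS(x) == 4);
}
```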

Cheers,
Alex
Martin Uecker Aug. 14, 2024, 2:37 p.m. UTC | #23
Am Mittwoch, dem 14.08.2024 um 16:12 +0200 schrieb Alejandro Colomar:
> Hi Martin,
> 
> On Wed, Aug 14, 2024 at 03:50:00PM GMT, Martin Uecker wrote:
> > An operator that returns an array with all dimensions of a multi-dimensional
> > array would make a lot of sense to me. 
> > 
> > 
> > double array[4][3][2];
> > 
> > // array_dims(array) = (constexpr size_t[3]){ 4, 3, 2 }
> 
> And what if array[4][n][2]?  No constexpr anymore, which is bad.

> > 
> > int dim1 = (array_dims(array))[0]
> > int dim2 = (array_dims(array))[1]
> > int dim3 = (array_dims(array))[2]
> >  
> > You can then implement lengthof in terms of this operator:
> > 
> > #define lengthof(x) (array_dims(x)[0])
> 
> Not really.  This implementation would result in fewer constant
> expressions than my proposal.  That's detrimental to diagnostics and
> usability.

Yes, this would be a downside when implementing lengthof
in this way.

> 
> And the fundamental operator would be very complex, to allow users
> to implement simpler wrappers.  I think the fundamental operators should
> be as simple as possible, in the spirit of C, and let users build on top
> of those basic tools.
> 
> This reminds me of the 'static' specifier for array parameters, which is
> conflated with two meanings: nonnull and length.  I'd rather have a way
> to specify nullness, and another one to specify length, and let users
> compose them.
> 
> At first glance I oppose this array_dims operator.

Opinionated as usual ;-)

> > and you can obtain the rank by applying lengthof to the array:
> > 
> > #define rank(x) lengthof(array_dims(x))
> 
> I'm curious to see what kind of code would be enabled by a rank()
> operator in C that we can't write at the moment.

There seems to be no generic way to get all dimensions from
a multi-dimensional array of arbitrary rank.
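
To illustrate the gap: with today's tools each dimension has to be spelled out by hand, so a macro like the one below only works for one fixed rank. `NITEMS`, `struct dims3` and `DIMS3` are names invented for this sketch; it is not the proposed `array_dims` operator.

```c
#include <assert.h>
#include <stddef.h>

#define NITEMS(a)  (sizeof(a) / sizeof((a)[0]))

struct dims3 {
	size_t d[3];
};

/* Rank is hard-coded: one NITEMS() per level of indexing. */
#define DIMS3(a)  \
	((struct dims3){ { NITEMS(a), NITEMS((a)[0]), NITEMS((a)[0][0]) } })

static void demo_dims3(void)
{
	double array[4][3][2];
	struct dims3 r = DIMS3(array);

	assert(r.d[0] == 4);
	assert(r.d[1] == 3);
	assert(r.d[2] == 2);
}
```

A rank-4 array would need a separate `DIMS4`, and so on.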


Martin

> 
> > If the array is constexpr for regular arrays and array
> > indexing returns a constant again for constexpr arrays, this
> > would all work out.
> > 
> > Martin
> 
> Have a lovely day!
> Alex
>
Alejandro Colomar Aug. 14, 2024, 2:47 p.m. UTC | #24
On Wed, Aug 14, 2024 at 03:50:21PM GMT, Jens Gustedt wrote:
> > > > 
> > > > That said, I suspect WG14 would not be keen on standardizing
> > > > `lengthof` without an ugly keyword given that there are plenty of other uses of it that would break: 
> > > > 
> > > > https://sourcegraph.com/github.com/illumos/illumos-gate/-/blob/usr/src/cmd/mailx/names.c?L53-55
> > > > https://sourcegraph.com/github.com/Rockbox/rockbox/-/blob/tools/ipod_fw.c?L292-294
> > > > https://sourcegraph.com/github.com/OpenSmalltalk/opensmalltalk-vm/-/blob/src/spur64.stack/validImage.c?L7014-7018
> > > > (and many, many others)
> > 
> > What regex did you use for searching?
> > 
> > I was thinking of renaming the proposal to elementsof(), to avoid
> > confusion between length of an array and length of a string.  Would you
> > mind checking if elementsof() is ok?
> 
> No, not for me. I really want us to consistently talk about
> array length for this. Consistent terminology is important.

I understand your desire for consistency.  I think your paper is a net
improvement over the status quo (which is a mix of length, size, and
number of elements).  After your proposal, there will be only length and
number of elements.  That's great.

However, strlen(3) came first, and we must respect it.

Since you haven't proposed eliminating "number of elements" from the
standard, and it would still be used alongside length, I think
elementsof() would be consistent with your view (consistent with "number
of elements").

Alternatively, you could use a new term, for example extent, for
referring to the number of elements of an array.  That would be more
respectful to strlen(3), keeping a strong distinction between string
length and array ******.

Or how about always referring to it as "number of elements"?  It's
longer to type, but would be the most consistent approach.

Also, elementsof() is free to use, while lengthof() has several
existing incompatible cases (as Aaron has shown), so we can't use that
name so freely.

> > I have concerns about a libc (or a predefined macro) implementation:
> > the sizeof division causes double evaluation with any VLAs, while my
> > implementation for GCC has fewer cases of evaluation, and when it needs
> > to evaluate, it only does it once.  It would be hard to find a good
> > wording that would allow an implementation to implement this as a macro.
> 
> No, we should not allow double evaluation.
> 
> putting this in a `({    })`

I would love to see a proposal for adding this GNU extension to ISO C.
Did nobody do it yet?  I could try to, if I find some time.  (But I'll
take a longish time for that; if anyone else does it, it would be
great.)

> and doing a `typedef typeof(X) _my_type;` with the macro parameter `X`
> at the beginning completely avoids double evaluation. So quality
> implementations are possible, but perhaps differently and with other
> builtins than we are imagining. Don't impose the view of one particular
> implementation onto others.

Ahhh, good.  I hadn't thought of that possibility.  Sure, that makes
sense now.  It gives more strength to your proposal of allowing libc
implementations, and thus requiring parens in the standard.

> Somewhere an argument involving `offsetof` was brought up.
> This is exactly what we need: implementations being able to start
> with a simple solution (as everybody did in the beginning of
> `offsetof`), and improve that implementation at their own pace when
> they are ready for it.

Agree.

> > > this was basically what we did for `unreachable` and I think it worked
> > > out fine.
> 
> I still think that the different options that we had there can be used
> to ask the right questions for WG14. 

I'm looking at it.  I've already taken some parts of it.  :)

Cheers,
Alex
Ballman, Aaron Aug. 14, 2024, 2:52 p.m. UTC | #25
> I would love to see a proposal for adding this GNU extension to ISO C.
> Did nobody do it yet?  I could try to, if I find some time.  (But I'll take a longish time for that; if anyone else does it, it would be great.)

It's been discussed but hasn't moved forward because there are design issues with it (the odd way in which it produces a resulting value, sometimes surprising behavior with how it interacts with flow control, the fact that it can't be used in all contexts, etc). The committee was leaning more towards lambdas despite those being a bit orthogonal.

~Aaron

-----Original Message-----
From: Alejandro Colomar <alx@kernel.org> 
Sent: Wednesday, August 14, 2024 10:48 AM
To: Jens Gustedt <jens.gustedt@inria.fr>
Cc: Ballman, Aaron <aaron.ballman@intel.com>; Xavier Del Campo Romero <xavi.dcr@tutanota.com>; Gcc Patches <gcc-patches@gcc.gnu.org>; Daniel Plakosh <dplakosh@cert.org>; Martin Uecker <uecker@tugraz.at>; Joseph Myers <josmyers@redhat.com>; Gabriel Ravier <gabravier@gmail.com>; Jakub Jelinek <jakub@redhat.com>; Kees Cook <keescook@chromium.org>; Qing Zhao <qing.zhao@oracle.com>; David Brown <david.brown@hesbynett.no>; Florian Weimer <fweimer@redhat.com>; Andreas Schwab <schwab@linux-m68k.org>; Timm Baeder <tbaeder@redhat.com>; A. Jiang <de34@live.cn>; Eugene Zelenko <eugene.zelenko@gmail.com>
Subject: Re: v2.1 Draft for a lengthof paper

On Wed, Aug 14, 2024 at 03:50:21PM GMT, Jens Gustedt wrote:
> > > > 
> > > > That said, I suspect WG14 would not be keen on standardizing 
> > > > `lengthof` without an ugly keyword given that there are plenty of other uses of it that would break:
> > > > 
> > > > https://sourcegraph.com/github.com/illumos/illumos-gate/-/blob/u
> > > > sr/src/cmd/mailx/names.c?L53-55
> > > > https://sourcegraph.com/github.com/Rockbox/rockbox/-/blob/tools/
> > > > ipod_fw.c?L292-294
> > > > https://sourcegraph.com/github.com/OpenSmalltalk/opensmalltalk-v
> > > > m/-/blob/src/spur64.stack/validImage.c?L7014-7018
> > > > (and many, many others)
> > 
> > What regex did you use for searching?
> > 
> > I was thinking of renaming the proposal to elementsof(), to avoid 
> > confusion between length of an array and length of a string.  Would 
> > you mind checking if elementsof() is ok?
> 
> No, not for me. I really want as to go consistently to talk about 
> array length for this. Consistent terminology is important.

I understand your desire for consistency.  I think your paper is a net improvement over the status quo (which is a mix of length, size, and number of elements).  After your proposal, there will be only length and number of elements.  That's great.

However, strlen(3) came first, and we must respect it.

Since you haven't proposed eliminating "number of elements" from the standard, and it would still be used alongside length, I think
elementsof() would be consistent with your view (consistent with "number of elements").

Alternatively, you could use a new term, for example extent, for referring to the number of elements of an array.  That would be more respectful to strlen(3), keeping a strong distinction between string length and array ******.

Or how about always referring to it as "number of elements"?  It's longer to type, but would be the most consistent approach.

Also, elementsof() is free to use, while lengthof() has a several existing incompatible cases (as Aaron has shown), so we can't use that name so freely.

> > I have concerns about a libc (or a predefined macro) implementation:
> > the sizeof division causes double evaluation with any VLAs, while my 
> > implementation for GCC has fewer cases of evaluation, and when it 
> > needs to evaluate, it only does it once.  It would be hard to find a 
> > good wording that would allow an implementation to implement this as a macro.
> 
> No, we should not allow double evaluation.
> 
> putting this in a `({    })`

I would love to see a proposal for adding this GNU extension to ISO C.
Did nobody do it yet?  I could try to, if I find some time.  (But I'll take a longish time for that; if anyone else does it, it would be
great.)

> and doing a `typedef typeof(X) _my_type;` with the macro parameter `X` 
> at the beginning completely avoids double evaluation. So quality 
> implementations are possible, but perhaps differently and with other builtins than we are imagining. Don't impose the view of one particular implementation onto others.

Ahhh, good.  I hadn't thought of that possibility.  Sure, that makes sense now.  It gives more strength to your proposal of allowing libc implementations, and thus requiring parens in the standard.
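A minimal sketch of the technique described in the quoted text, assuming GNU C statement expressions and `__typeof__` (as provided by GCC and Clang).  The macro name NELEMS_ONCE and the two helper functions are hypothetical; they are not part of this patch set or of any libc.

```c
#include <stddef.h>

/* Hypothetical macro, for illustration only.  The argument X is
   expanded exactly once, as the operand of __typeof__; everything
   else works on the typedef name.  In the uses below the operand
   of the second sizeof has type int, so it is not evaluated.  */
#define NELEMS_ONCE(X)                                                \
({                                                                    \
	typedef __typeof__(X) nelems_once_t_;                         \
	sizeof(nelems_once_t_) / sizeof((*(nelems_once_t_ *) 0)[0]);  \
})

/* Array of constant size. */
size_t
nelems_fixed(void)
{
	int  a[7];

	return NELEMS_ONCE(a);
}

/* One-dimensional VLA; n must be greater than zero. */
size_t
nelems_vla(size_t n)
{
	int  v[n];

	return NELEMS_ONCE(v);
}
```

This only covers arrays whose elements have a non-variable type.  Arrays whose elements are themselves VLAs would need more care in the second sizeof, which is one reason a compiler-provided operator may still be preferable.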

> Somewhere was brought in an argument with `offsetof`. 
> This is exactly what we need. Implementations being able to start with 
> a simple solution (as everybody did in the beginning of `offsetof`), 
> and improve that implementation at their pace when they are ready for 
> it.

Agree.
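As a reminder of the `offsetof` precedent mentioned above, here is a sketch of the kind of simple macro that was historically common, next to the standard one from <stddef.h>.  The macro name OFFSETOF_SIMPLE and the struct are hypothetical.  The simple form relies on a null-pointer idiom that the standard does not define, which is part of why implementations later moved to builtins.

```c
#include <stddef.h>

struct pair {
	char    tag;
	double  value;
};

/* Historical-style definition: take the address of the member
   in an object imagined to live at address zero.  */
#define OFFSETOF_SIMPLE(T, m)  ((size_t) &((T *) 0)->m)

size_t
simple_offset(void)
{
	return OFFSETOF_SIMPLE(struct pair, value);
}

/* The standard macro, typically backed by a compiler builtin today. */
size_t
standard_offset(void)
{
	return offsetof(struct pair, value);
}
```

On GCC and Clang both functions are expected to return the same value, so user code did not have to change when the implementation behind the macro improved.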

> > > this was basically what we did for `unreachable` and I think it 
> > > worked out fine.
> 
> I still think that the different options that we had there can be used 
> to ask the right questions for WG14.

I'm looking at it.  I've already taken some parts of it.  :)

Cheers,
Alex

--
<https://www.alejandro-colomar.es/>
Alejandro Colomar Aug. 14, 2024, 3:01 p.m. UTC | #26
Hi Aaron,

On Wed, Aug 14, 2024 at 02:07:16PM GMT, Ballman, Aaron wrote:
> > Ahh, context:global seems to be what I wanted.  Where is that documented?
> 
> For me it is the default when I go to https://sourcegraph.com/search but there's documentation at https://sourcegraph.com/docs/code-search/working/search_contexts

Ahh, no, it was a red herring.  I thought that was restricting the search
to global definitions.  There's no way to restrict to definitions,
right?  I'd like a way to discard uses, since that doesn't give much
info.

But for lengthof() it seems to quickly find incompatible cases, so we
were lucky that we don't need to restrict it.

> 
> > Thanks!  I'll rename it to elementsof().
> 
> Rather than renaming it, I'd say that the name chosen in the proposed
> text is a placeholder, and have a section in the prose that describes
> different naming choices, pros and cons, suggests a name from you as
> the author, but asks WG14 to pick the final name.
> I know Jens mentioned he doesn’t like the name `elementsof` and I
> suspect if we ask five more people we'll get about seven more opinions
> on what the name could/should be. 😝

Yup, but I want to have a placeholder that would be a name that I would
like, and a defensible one.  :-)

I'll add questions at the bottom, proposing alternatives.

Cheers,
Alex
Martin Uecker Aug. 14, 2024, 3:01 p.m. UTC | #27
Am Mittwoch, dem 14.08.2024 um 14:52 +0000 schrieb Ballman, Aaron:
> > I would love to see a proposal for adding this GNU extension to ISO C.
> > Did nobody do it yet?  I could try to, if I find some time.  (But I'll take a longish time for that; if anyone else
> > does it, it would be great.)
> 
> It's been discussed but hasn't moved forward because there are design issues with it (the odd way in which it produces
> a resulting value, sometimes surprising behavior with how it interacts with flow control, the fact that it can't be
> used in all contexts, etc). The committee was leaning more towards lambdas despite those being a bit orthogonal.

I do not think this is a fair characterization. We did not see any proposal
for ({ }), so it is not clear which way the committee is leaning.

Lambdas ultimately failed because they were too complex for not 
having any implementation and user experience in C.

I agree though that lambdas could be nicer, but I still have issues
with the last type-generic version and I do not have similar objections
against ({ }).

Martin
Jₑₙₛ Gustedt Aug. 14, 2024, 3:44 p.m. UTC | #28
Am 14. August 2024 16:47:32 MESZ schrieb Alejandro Colomar <alx@kernel.org>:
> On Wed, Aug 14, 2024 at 03:50:21PM GMT, Jens Gustedt wrote:
> > > > > 
> > > > > That said, I suspect WG14 would not be keen on standardizing
> > > > > `lengthof` without an ugly keyword given that there are plenty of other uses of it that would break: 
> > > > > 
> > > > > https://sourcegraph.com/github.com/illumos/illumos-gate/-/blob/usr/src/cmd/mailx/names.c?L53-55
> > > > > https://sourcegraph.com/github.com/Rockbox/rockbox/-/blob/tools/ipod_fw.c?L292-294
> > > > > https://sourcegraph.com/github.com/OpenSmalltalk/opensmalltalk-vm/-/blob/src/spur64.stack/validImage.c?L7014-7018
> > > > > (and many, many others)
> > > 
> > > What regex did you use for searching?
> > > 
> > > I was thinking of renaming the proposal to elementsof(), to avoid
> > > confusion between length of an array and length of a string.  Would you
> > > mind checking if elementsof() is ok?
> > 
> > > No, not for me. I really want us to talk consistently about
> > > array length for this. Consistent terminology is important.
> 
> I understand your desire for consistency.  I think your paper is a net
> improvement over the status quo (which is a mix of length, size, and
> number of elements).  After your proposal, there will be only length and
> number of elements.  That's great.
> 
> However, strlen(3) came first, and we must respect it.

Sure, string length, a dynamic feature, and array length are two different features.

But we also have VLA and not VNEA in the standard, so we should respect this ;-)

> Since you haven't proposed eliminating "number of elements" from the
> standard, and it would still be used alongside length, I think
> elementsof() would be consistent with your view (consistent with "number
> of elements").

Didn't we?  Then it is actually a good idea to do so; thanks for the idea!

"elements of" is a stretch, linguistically, because you don't mean  the
elements themselves, you are referring to their number. "elementsof" for
me would refer to a list of these elements.

> Alternatively, you could use a new term, for example extent, for
> referring to the number of elements of an array.  That would be more
> respectful to strlen(3), keeping a strong distinction between string
> length and array ******.

Except that this separation doesn't exist, even now: as said, it is called
"variable length array".

> Or how about always referring to it as "number of elements"?  It's
> longer to type, but would be the most consistent approach.
> 
> Also, elementsof() is free to use, while lengthof() has several
> existing incompatible cases (as Aaron has shown), so we can't use that
> name so freely.
> 
> > > I have concerns about a libc (or a predefined macro) implementation:
> > > the sizeof division causes double evaluation with any VLAs, while my
> > > > implementation for GCC has fewer cases of evaluation, and when it needs
> > > to evaluate, it only does it once.  It would be hard to find a good
> > > wording that would allow an implementation to implement this as a macro.
> > 
> > No, we should not allow double evaluation.
> >  
> > putting this in a `({    })`
> 
> I would love to see a proposal for adding this GNU extension to ISO C.
> Did nobody do it yet?  I could try to, if I find some time.  (But I'll
> take a longish time for that; if anyone else does it, it would be
> great.)
> 
> > and doing a `typedef typeof(X) _my_type;` with the macro parameter `X` at the beginning completely avoids double evaluation. So quality implementations are
> > possible, but perhaps differently and with other builtins than we are
> > imagining. Don't impose the view of one particular implementation onto others.
> 
> Ahhh, good.  I hadn't thought of that possibility.  Sure, that makes
> sense now.  It gives more strength to your proposal of allowing libc
> implementations, and thus requiring parens in the standard.
> 
> > Somewhere was brought in an argument with `offsetof`. 
> > This is exactly what we need. Implementations being able to start
> > with a simple solution (as everybody did in the beginning of
> > `offsetof`), and improve that implementation at their pace when they
> > are ready for it. 
> 
> Agree.
> 
> > > > this was basically what we did for `unreachable` and I think it worked
> > > > out fine.
> > 
> > I still think that the different options that we had there can be used
> > to ask the right questions for WG14. 
> 
> I'm looking at it.  I've already taken some parts of it.  :)
> 
> Cheers,
> Alex
>
Alejandro Colomar Sept. 1, 2024, 9:10 a.m. UTC | #29
Hi Jens, Martin,

On Wed, Aug 14, 2024 at 05:44:57PM GMT, Jens Gustedt wrote:
> Am 14. August 2024 16:47:32 MESZ schrieb Alejandro Colomar <alx@kernel.org>:
> > > > I was thinking of renaming the proposal to elementsof(), to avoid
> > > > confusion between length of an array and length of a string.  Would you
> > > > mind checking if elementsof() is ok?
> > > 
> > > > No, not for me. I really want us to talk consistently about
> > > > array length for this. Consistent terminology is important.
> > 
> > I understand your desire for consistency.  I think your paper is a net
> > improvement over the status quo (which is a mix of length, size, and
> > number of elements).  After your proposal, there will be only length and
> > number of elements.  That's great.
> > 
> > However, strlen(3) came first, and we must respect it.
> 
> Sure,  string length, a dynamic feature, and array length are two features.
> 
> But we also have VLA and not VNEA in the standard, So we should respect this ;-)

I hadn't thought about it until yesterday, after Martin insisted on
preferring lengthof over nelementsof or a contraction of it, and worried
about nelementsof possibly causing ambiguity with multi-dimensional
arrays.  But:

VLA is a misnomer.
~~~~~~~~~~~~~~~~~~

First, let's assume length refers to the number of elements, as we all
agree that length should not refer to the size in bytes of an array,
since we already have the term "size" for it, which is consistent with
sizeof.

	int vla[3][n];

The array from above is a so-called variable length array, according to
the standard.  But it does not have a variable length, according to the
presumed meaning of length.  It does indeed have a variable size.  The
element of vla is itself an array, which is the one that really has a
variable length (or number of elements, as is the more technical term).

So, if n3187 develops, and really intends to uniquely and unambiguously
use a term for the number of elements and another one for the size of an
array, it should also rename "variable length array" into "variable size
array".
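The argument about `int vla[3][n]` can be checked with a small sketch.  The struct and function names below are hypothetical.  The number of elements of the outer array stays at 3 for every n, while the size in bytes and the number of elements of each inner array vary.

```c
#include <stddef.h>

struct counts {
	size_t  outer;  /* number of elements of vla itself */
	size_t  inner;  /* number of elements of each vla[i] */
	size_t  bytes;  /* sizeof(vla) */
};

/* n must be greater than zero. */
struct counts
measure(size_t n)
{
	int  vla[3][n];

	return (struct counts) {
		.outer = sizeof(vla) / sizeof(vla[0]),
		.inner = sizeof(vla[0]) / sizeof(vla[0][0]),
		.bytes = sizeof(vla),
	};
}
```

For example, measure(2) and measure(5) both report outer == 3, but their bytes fields are 6 * sizeof(int) and 15 * sizeof(int) respectively.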

It is indeed due to this problematic misuse of the colloquial term
length that "length", and not "number of elements", is misleading in
multi-dimensional arrays.  The standard is very strict in using "number
of elements" for the first dimension of an array (so its true
dimension), and not for the dimensions of arrays that are elements of it.

And now you could say that this is only a problem of multi-dimensional
arrays.  It's not.  They're just the composition of arrays with elements
of type array.  The same problem arises with single dimensional arrays
in complex situations (although, admittedly, this is non-standard):

	$ cat vla.c 
	int
	main(void)
	{
		int n = 5;

		struct s {
			int  v[n];
		};

		struct s  a[3];

		return sizeof(a);
	}
	$ gcc -Wall -Wextra -Wpedantic vla.c 
	vla.c: In function ‘main’:
	vla.c:7:22: warning: a member of a structure or union cannot have a variably modified type [-Wpedantic]
	    7 |                 int  v[n];
	      |                      ^
	$ ./a.out; echo $?
	60

a is a VLA even if it is a single-dimension array of known constant
number of elements.  Huh?  :)

Terminology
~~~~~~~~~~~

Once we've determined that "length" in VLA does refer to the size and
not the number of elements, it's hard to justify a reformation of
terminology that would be based on length meaning number of elements.

Indeed, whether we base the justification for the origins of length on
strlen(3) or on VLA, we must conclude that "variable length array" must
be renamed to "variable size array".  I'm preparing a paper for that.

If that paper is eventually accepted, I'd prepare a second paper
that would reform every use of size and length with arrays so that size
always refers to the size in bytes, length is completely removed, and
number of elements stands as the only term to refer to the number of
elements.


Have a lovely day!
Alex

> > Since you haven't proposed eliminating "number of elements" from the
> > standard, and it would still be used alongside length, I think
> > elementsof() would be consistent with your view (consistent with "number
> > of elements").
> 
> didn't we ? Then this is actually a good idea to do so, thanks for the idea !
Martin Uecker Sept. 1, 2024, 9:51 a.m. UTC | #30
Alex,

I am all for making things more consistent, but there is also a cost
to changing stuff too much.  Length is the established
term in most programming languages and I would recommend sticking
to it.

Note that it is not true that the standard consistently refers to 

char a[3][n]

as a VLA. It does so in the description of sizeof but not in the
type compatibility rules, at least as understood by most compilers.
This is an inconsistency we *should* fix, but I do not think that
changing away from "length" is a good idea.

Note that "number of elements" is inherently an ambiguous term for
multi-dimensional arrays, and I am not sure how you want to avoid
this without making the wording more complex (e.g. "number of elements
of the outermost array").
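A short sketch of the ambiguity, using hypothetical function names: for a two-dimensional array, "number of elements" could be read either as the count of elements of the outermost array or as the count of scalar objects it contains.

```c
#include <stddef.h>

/* Elements of the outermost array: each element is an int[4]. */
size_t
outermost_count(void)
{
	int  a[3][4];

	return sizeof(a) / sizeof(a[0]);
}

/* Scalar objects contained in the whole array. */
size_t
scalar_count(void)
{
	int  a[3][4];

	return sizeof(a) / sizeof(a[0][0]);
}
```

Here outermost_count() yields 3 and scalar_count() yields 12; the proposed operator is about the former reading.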

So I would recommend not to go this way. You would need a really
good argument to convince me to vote for this, and I haven't seen
any such argument.

Martin


