mbox series

[v13,0/4] c: Add __lengthof__ operator

Message ID cover.1727861750.git.alx@kernel.org
Headers show
Series c: Add __lengthof__ operator | expand

Message

Alejandro Colomar Oct. 2, 2024, 9:41 a.m. UTC
Hi!

This operator is as voted in a WG14 meeting yesterday, with the only
difference that we name it __lengthof__ instead of _Lengthof, to be able
to add it without being bound by ISO bureaucracy.

No semantic changes since v12; only the rename, according to what WG14
preferred.  WG14 agreed on the semantic changes of the operator as I
implemented them in v12.

Changes since v12:

-  Rename s/__nelementsof__/__lengthof__/
-  Fix typo in documentation.

Below is a range diff against v12.

Have a lovely day!
Alex


Alejandro Colomar (4):
  contrib/: Add support for Cc: and Link: tags
  gcc/: Rename array_type_nelts() => array_type_nelts_minus_one()
  Merge definitions of array_type_nelts_top()
  c: Add __lengthof__ operator

 contrib/gcc-changelog/git_commit.py        |   5 +-
 gcc/c-family/c-common.cc                   |  26 ++++
 gcc/c-family/c-common.def                  |   3 +
 gcc/c-family/c-common.h                    |   2 +
 gcc/c/c-decl.cc                            |  32 +++--
 gcc/c/c-fold.cc                            |   7 +-
 gcc/c/c-parser.cc                          |  62 +++++++--
 gcc/c/c-tree.h                             |   4 +
 gcc/c/c-typeck.cc                          | 118 +++++++++++++++-
 gcc/config/aarch64/aarch64.cc              |   2 +-
 gcc/config/i386/i386.cc                    |   2 +-
 gcc/cp/cp-tree.h                           |   1 -
 gcc/cp/decl.cc                             |   2 +-
 gcc/cp/init.cc                             |   8 +-
 gcc/cp/lambda.cc                           |   3 +-
 gcc/cp/operators.def                       |   1 +
 gcc/cp/tree.cc                             |  13 --
 gcc/doc/extend.texi                        |  30 +++++
 gcc/expr.cc                                |   8 +-
 gcc/fortran/trans-array.cc                 |   2 +-
 gcc/fortran/trans-openmp.cc                |   4 +-
 gcc/rust/backend/rust-tree.cc              |  13 --
 gcc/rust/backend/rust-tree.h               |   2 -
 gcc/target.h                               |   3 +
 gcc/testsuite/gcc.dg/nelementsof-compile.c | 115 ++++++++++++++++
 gcc/testsuite/gcc.dg/nelementsof-vla.c     |  46 +++++++
 gcc/testsuite/gcc.dg/nelementsof.c         | 150 +++++++++++++++++++++
 gcc/tree.cc                                |  17 ++-
 gcc/tree.h                                 |   3 +-
 29 files changed, 604 insertions(+), 80 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/nelementsof-compile.c
 create mode 100644 gcc/testsuite/gcc.dg/nelementsof-vla.c
 create mode 100644 gcc/testsuite/gcc.dg/nelementsof.c

Range-diff against v12:
1:  d7fca49888a = 1:  d7fca49888a contrib/: Add support for Cc: and Link: tags
2:  e65245ac294 = 2:  e65245ac294 gcc/: Rename array_type_nelts() => array_type_nelts_minus_one()
3:  03de2d67bb1 = 3:  03de2d67bb1 Merge definitions of array_type_nelts_top()
4:  4373c48205d ! 4:  f635871da1f c: Add __nelementsof__ operator
    @@ Metadata
     Author: Alejandro Colomar <alx@kernel.org>
     
      ## Commit message ##
    -    c: Add __nelementsof__ operator
    +    c: Add __lengthof__ operator
     
         This operator is similar to sizeof but can only be applied to an array,
         and returns its number of elements.
    @@ Commit message
     
         gcc/ChangeLog:
     
    -            * doc/extend.texi: Document __nelementsof__ operator.
    -            * target.h (enum type_context_kind): Add __nelementsof__ operator.
    +            * doc/extend.texi: Document __lengthof__ operator.
    +            * target.h (enum type_context_kind): Add __lengthof__ operator.
     
         gcc/c-family/ChangeLog:
     
                 * c-common.h
                 * c-common.def:
    -            * c-common.cc (c_nelementsof_type): Add __nelementsof__ operator.
    +            * c-common.cc (c_lengthof_type): Add __lengthof__ operator.
     
         gcc/c/ChangeLog:
     
                 * c-tree.h
    -            (c_expr_nelementsof_expr, c_expr_nelementsof_type)
    +            (c_expr_lengthof_expr, c_expr_lengthof_type)
                 * c-decl.cc
                 (start_struct, finish_struct)
                 (start_enum, finish_enum)
                 * c-parser.cc
                 (c_parser_sizeof_expression)
    -            (c_parser_nelementsof_expression)
    -            (c_parser_sizeof_or_nelementsof_expression)
    +            (c_parser_lengthof_expression)
    +            (c_parser_sizeof_or_lengthof_expression)
                 (c_parser_unary_expression)
                 * c-typeck.cc
                 (build_external_ref)
                 (record_maybe_used_decl, pop_maybe_used)
                 (is_top_array_vla)
    -            (c_expr_nelementsof_expr, c_expr_nelementsof_type):
    -            Add __nelementsof__operator.
    +            (c_expr_lengthof_expr, c_expr_lengthof_type):
    +            Add __lengthof__operator.
     
         gcc/cp/ChangeLog:
     
    -            * operators.def: Add __nelementsof__ operator.
    +            * operators.def: Add __lengthof__ operator.
     
         gcc/testsuite/ChangeLog:
     
    -            * gcc.dg/nelementsof-compile.c
    -            * gcc.dg/nelementsof-vla.c
    -            * gcc.dg/nelementsof.c: Add tests for __nelementsof__ operator.
    +            * gcc.dg/lengthof-compile.c
    +            * gcc.dg/lengthof-vla.c
    +            * gcc.dg/lengthof.c: Add tests for __lengthof__ operator.
     
         Link: <https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3313.pdf>
         Link: <https://inbox.sourceware.org/gcc/M8S4oQy--3-2@tutanota.com/T/>
    @@ gcc/c-family/c-common.cc: const struct c_common_resword c_common_reswords[] =
        { "__inline",		RID_INLINE,	0 },
        { "__inline__",	RID_INLINE,	0 },
        { "__label__",	RID_LABEL,	0 },
    -+  { "__nelementsof__",	RID_NELEMENTSOF, 0 },
    ++  { "__lengthof__",	RID_LENGTHOF,	0 },
        { "__null",		RID_NULL,	0 },
        { "__real",		RID_REALPART,	0 },
        { "__real__",		RID_REALPART,	0 },
    @@ gcc/c-family/c-common.cc: c_alignof_expr (location_t loc, tree expr)
     +   Return the number of elements of an array.  */
     +
     +tree
    -+c_nelementsof_type (location_t loc, tree type)
    ++c_lengthof_type (location_t loc, tree type)
     +{
     +  enum tree_code type_code;
     +
     +  type_code = TREE_CODE (type);
     +  if (type_code != ARRAY_TYPE)
     +    {
    -+      error_at (loc, "invalid application of %<nelementsof%> to type %qT", type);
    ++      error_at (loc, "invalid application of %<lengthof%> to type %qT", type);
     +      return error_mark_node;
     +    }
     +  if (!COMPLETE_TYPE_P (type))
     +    {
     +      error_at (loc,
    -+		"invalid application of %<nelementsof%> to incomplete type %qT",
    ++		"invalid application of %<lengthof%> to incomplete type %qT",
     +		type);
     +      return error_mark_node;
     +    }
    @@ gcc/c-family/c-common.def: DEFTREECODE (EXCESS_PRECISION_EXPR, "excess_precision
         number.  */
      DEFTREECODE (USERDEF_LITERAL, "userdef_literal", tcc_exceptional, 3)
      
    -+/* Represents a 'nelementsof' expression.  */
    -+DEFTREECODE (NELEMENTSOF_EXPR, "nelementsof_expr", tcc_expression, 1)
    ++/* Represents a 'lengthof' expression.  */
    ++DEFTREECODE (LENGTHOF_EXPR, "lengthof_expr", tcc_expression, 1)
     +
      /* Represents a 'sizeof' expression during C++ template expansion,
         or for the purpose of -Wsizeof-pointer-memaccess warning.  */
    @@ gcc/c-family/c-common.h: enum rid
      
        /* C extensions */
        RID_ASM,       RID_TYPEOF,   RID_TYPEOF_UNQUAL, RID_ALIGNOF,  RID_ATTRIBUTE,
    -+  RID_NELEMENTSOF,
    ++  RID_LENGTHOF,
        RID_VA_ARG,
        RID_EXTENSION, RID_IMAGPART, RID_REALPART, RID_LABEL,    RID_CHOOSE_EXPR,
        RID_TYPES_COMPATIBLE_P,      RID_BUILTIN_COMPLEX,	   RID_BUILTIN_SHUFFLE,
    @@ gcc/c-family/c-common.h: extern tree c_common_truthvalue_conversion (location_t,
      extern void c_apply_type_quals_to_decl (int, tree);
      extern tree c_sizeof_or_alignof_type (location_t, tree, bool, bool, int);
      extern tree c_alignof_expr (location_t, tree);
    -+extern tree c_nelementsof_type (location_t, tree);
    ++extern tree c_lengthof_type (location_t, tree);
      /* Print an error message for invalid operands to arith operation CODE.
         NOP_EXPR is used as a special case (see truthvalue_conversion).  */
      extern void binary_op_error (rich_location *, enum tree_code, tree, tree);
    @@ gcc/c/c-decl.cc: start_struct (location_t loc, enum tree_code code, tree name,
           sizeof anyhow.  */
     -  if (warn_cxx_compat && (in_sizeof || in_typeof || in_alignof))
     +  if (warn_cxx_compat
    -+      && (in_sizeof || in_typeof || in_alignof || in_nelementsof))
    ++      && (in_sizeof || in_typeof || in_alignof || in_lengthof))
          warning_at (loc, OPT_Wc___compat,
      		"defining type in %qs expression is invalid in C++",
      		(in_sizeof
    @@ gcc/c/c-decl.cc: start_struct (location_t loc, enum tree_code code, tree name,
     +		    ? "typeof"
     +		    : (in_alignof
     +		       ? "alignof"
    -+		       : "nelementsof"))));
    ++		       : "lengthof"))));
      
        if (in_underspecified_init)
          error_at (loc, "%qT defined in underspecified object initializer", ref);
    @@ gcc/c/c-decl.cc: finish_struct (location_t loc, tree t, tree fieldlist, tree att
            if (warn_cxx_compat
      	  && struct_parse_info != NULL
     -	  && !in_sizeof && !in_typeof && !in_alignof)
    -+	  && !in_sizeof && !in_typeof && !in_alignof && !in_nelementsof)
    ++	  && !in_sizeof && !in_typeof && !in_alignof && !in_lengthof)
      	struct_parse_info->struct_types.safe_push (t);
           }
      
    @@ gcc/c/c-decl.cc: start_enum (location_t loc, struct c_enum_contents *the_enum, t
           as C++ doesn't permit statement exprs within sizeof anyhow.  */
     -  if (warn_cxx_compat && (in_sizeof || in_typeof || in_alignof))
     +  if (warn_cxx_compat
    -+      && (in_sizeof || in_typeof || in_alignof || in_nelementsof))
    ++      && (in_sizeof || in_typeof || in_alignof || in_lengthof))
          warning_at (loc, OPT_Wc___compat,
      		"defining type in %qs expression is invalid in C++",
      		(in_sizeof
    @@ gcc/c/c-decl.cc: start_enum (location_t loc, struct c_enum_contents *the_enum, t
     +		    ? "typeof"
     +		    : (in_alignof
     +		       ? "alignof"
    -+		       : "nelementsof"))));
    ++		       : "lengthof"))));
      
        if (in_underspecified_init)
          error_at (loc, "%qT defined in underspecified object initializer",
    @@ gcc/c/c-decl.cc: finish_enum (tree enumtype, tree values, tree attributes)
        if (warn_cxx_compat
            && struct_parse_info != NULL
     -      && !in_sizeof && !in_typeof && !in_alignof)
    -+      && !in_sizeof && !in_typeof && !in_alignof && !in_nelementsof)
    ++      && !in_sizeof && !in_typeof && !in_alignof && !in_lengthof)
          struct_parse_info->struct_types.safe_push (enumtype);
      
        /* Check for consistency with previous definition */
    @@ gcc/c/c-parser.cc: along with GCC; see the file COPYING3.  If not see
     +
     +#define c_parser_sizeof_expression(parser)                                    \
     +(                                                                             \
    -+  c_parser_sizeof_or_nelementsof_expression (parser, RID_SIZEOF)              \
    ++  c_parser_sizeof_or_lengthof_expression (parser, RID_SIZEOF)                 \
     +)
      
    -+#define c_parser_nelementsof_expression(parser)                               \
    ++#define c_parser_lengthof_expression(parser)                                  \
     +(                                                                             \
    -+  c_parser_sizeof_or_nelementsof_expression (parser, RID_NELEMENTSOF)         \
    ++  c_parser_sizeof_or_lengthof_expression (parser, RID_LENGTHOF)               \
     +)
     +
      /* We need to walk over decls with incomplete struct/union/enum types
    @@ gcc/c/c-parser.cc: static struct c_expr c_parser_binary_expression (c_parser *,
      static struct c_expr c_parser_cast_expression (c_parser *, struct c_expr *);
      static struct c_expr c_parser_unary_expression (c_parser *);
     -static struct c_expr c_parser_sizeof_expression (c_parser *);
    -+static struct c_expr c_parser_sizeof_or_nelementsof_expression (c_parser *,
    ++static struct c_expr c_parser_sizeof_or_lengthof_expression (c_parser *,
     +								enum rid);
      static struct c_expr c_parser_alignof_expression (c_parser *);
      static struct c_expr c_parser_postfix_expression (c_parser *);
    @@ gcc/c/c-parser.cc: c_parser_unary_expression (c_parser *parser)
          case CPP_KEYWORD:
            switch (c_parser_peek_token (parser)->keyword)
      	{
    -+	case RID_NELEMENTSOF:
    -+	  return c_parser_nelementsof_expression (parser);
    ++	case RID_LENGTHOF:
    ++	  return c_parser_lengthof_expression (parser);
      	case RID_SIZEOF:
      	  return c_parser_sizeof_expression (parser);
      	case RID_ALIGNOF:
    @@ gcc/c/c-parser.cc: c_parser_unary_expression (c_parser *parser)
      
      static struct c_expr
     -c_parser_sizeof_expression (c_parser *parser)
    -+c_parser_sizeof_or_nelementsof_expression (c_parser *parser, enum rid rid)
    ++c_parser_sizeof_or_lengthof_expression (c_parser *parser, enum rid rid)
      {
    -+  const char *op_name = (rid == RID_NELEMENTSOF) ? "nelementsof" : "sizeof";
    ++  const char *op_name = (rid == RID_LENGTHOF) ? "lengthof" : "sizeof";
        struct c_expr expr;
        struct c_expr result;
        location_t expr_loc;
    @@ gcc/c/c-parser.cc: c_parser_sizeof_expression (c_parser *parser)
        c_parser_consume_token (parser);
        c_inhibit_evaluation_warnings++;
     -  in_sizeof++;
    -+  if (rid == RID_NELEMENTSOF)
    -+    in_nelementsof++;
    ++  if (rid == RID_LENGTHOF)
    ++    in_lengthof++;
     +  else
     +    in_sizeof++;
        if (c_parser_next_token_is (parser, CPP_OPEN_PAREN)
    @@ gcc/c/c-parser.cc: c_parser_sizeof_expression (c_parser *parser)
      	  struct c_expr ret;
      	  c_inhibit_evaluation_warnings--;
     -	  in_sizeof--;
    -+	  if (rid == RID_NELEMENTSOF)
    -+	    in_nelementsof--;
    ++	  if (rid == RID_LENGTHOF)
    ++	    in_lengthof--;
     +	  else
     +	    in_sizeof--;
      	  ret.set_error ();
    @@ gcc/c/c-parser.cc: c_parser_sizeof_expression (c_parser *parser)
            c_inhibit_evaluation_warnings--;
     -      in_sizeof--;
     -      result = c_expr_sizeof_type (expr_loc, type_name);
    -+      if (rid == RID_NELEMENTSOF)
    ++      if (rid == RID_LENGTHOF)
     +	{
    -+	  in_nelementsof--;
    -+	  result = c_expr_nelementsof_type (expr_loc, type_name);
    ++	  in_lengthof--;
    ++	  result = c_expr_lengthof_type (expr_loc, type_name);
     +	}
     +      else
     +	{
    @@ gcc/c/c-parser.cc: c_parser_sizeof_expression (c_parser *parser)
     +    Xof_expr:
            c_inhibit_evaluation_warnings--;
     -      in_sizeof--;
    -+      if (rid == RID_NELEMENTSOF)
    -+	in_nelementsof--;
    ++      if (rid == RID_LENGTHOF)
    ++	in_lengthof--;
     +      else
     +	in_sizeof--;
            mark_exp_read (expr.value);
    @@ gcc/c/c-parser.cc: c_parser_sizeof_expression (c_parser *parser)
     -	error_at (expr_loc, "%<sizeof%> applied to a bit-field");
     -      result = c_expr_sizeof_expr (expr_loc, expr);
     +	error_at (expr_loc, "%qs applied to a bit-field", op_name);
    -+      if (rid == RID_NELEMENTSOF)
    -+	result = c_expr_nelementsof_expr (expr_loc, expr);
    ++      if (rid == RID_LENGTHOF)
    ++	result = c_expr_lengthof_expr (expr_loc, expr);
     +      else
     +	result = c_expr_sizeof_expr (expr_loc, expr);
          }
    @@ gcc/c/c-tree.h: extern int c_type_dwarf_attribute (const_tree, int);
      /* in c-typeck.cc */
      extern int in_alignof;
      extern int in_sizeof;
    -+extern int in_nelementsof;
    ++extern int in_lengthof;
      extern int in_typeof;
      extern bool c_in_omp_for;
      extern bool c_omp_array_section_p;
    @@ gcc/c/c-tree.h: extern tree build_external_ref (location_t, tree, bool, tree *);
      extern void pop_maybe_used (bool);
      extern struct c_expr c_expr_sizeof_expr (location_t, struct c_expr);
      extern struct c_expr c_expr_sizeof_type (location_t, struct c_type_name *);
    -+extern struct c_expr c_expr_nelementsof_expr (location_t, struct c_expr);
    -+extern struct c_expr c_expr_nelementsof_type (location_t loc,
    ++extern struct c_expr c_expr_lengthof_expr (location_t, struct c_expr);
    ++extern struct c_expr c_expr_lengthof_type (location_t loc,
     +					      struct c_type_name *);
      extern struct c_expr parser_build_unary_op (location_t, enum tree_code,
          					    struct c_expr);
    @@ gcc/c/c-typeck.cc: int in_alignof;
      /* The level of nesting inside "sizeof".  */
      int in_sizeof;
      
    -+/* The level of nesting inside "nelementsof".  */
    -+int in_nelementsof;
    ++/* The level of nesting inside "lengthof".  */
    ++int in_lengthof;
     +
      /* The level of nesting inside "typeof".  */
      int in_typeof;
    @@ gcc/c/c-typeck.cc: build_external_ref (location_t loc, tree id, bool fun, tree *
        if (TREE_CODE (ref) == FUNCTION_DECL && !in_alignof)
          {
     -      if (!in_sizeof && !in_typeof)
    -+      if (!in_sizeof && !in_typeof && !in_nelementsof)
    ++      if (!in_sizeof && !in_typeof && !in_lengthof)
      	C_DECL_USED (ref) = 1;
            else if (DECL_INITIAL (ref) == NULL_TREE
      	       && DECL_EXTERNAL (ref)
    @@ gcc/c/c-typeck.cc: struct maybe_used_decl
        /* The decl.  */
        tree decl;
     -  /* The level seen at (in_sizeof + in_typeof).  */
    -+  /* The level seen at (in_sizeof + in_typeof + in_nelementsof).  */
    ++  /* The level seen at (in_sizeof + in_typeof + in_lengthof).  */
        int level;
        /* The next one at this level or above, or NULL.  */
        struct maybe_used_decl *next;
    @@ gcc/c/c-typeck.cc: record_maybe_used_decl (tree decl)
        struct maybe_used_decl *t = XOBNEW (&parser_obstack, struct maybe_used_decl);
        t->decl = decl;
     -  t->level = in_sizeof + in_typeof;
    -+  t->level = in_sizeof + in_typeof + in_nelementsof;
    ++  t->level = in_sizeof + in_typeof + in_lengthof;
        t->next = maybe_used_decls;
        maybe_used_decls = t;
      }
    @@ gcc/c/c-typeck.cc: void
      {
        struct maybe_used_decl *p = maybe_used_decls;
     -  int cur_level = in_sizeof + in_typeof;
    -+  int cur_level = in_sizeof + in_typeof + in_nelementsof;
    ++  int cur_level = in_sizeof + in_typeof + in_lengthof;
        while (p && p->level > cur_level)
          {
            if (used)
    @@ gcc/c/c-typeck.cc: c_expr_sizeof_type (location_t loc, struct c_type_name *t)
     +  return var;
     +}
     +
    -+/* Return the result of nelementsof applied to EXPR.  */
    ++/* Return the result of lengthof applied to EXPR.  */
     +
     +struct c_expr
    -+c_expr_nelementsof_expr (location_t loc, struct c_expr expr)
    ++c_expr_lengthof_expr (location_t loc, struct c_expr expr)
     +{
     +  struct c_expr ret;
     +  if (expr.value == error_mark_node)
    @@ gcc/c/c-typeck.cc: c_expr_sizeof_type (location_t loc, struct c_type_name *t)
     +
     +      tree folded_expr = c_fully_fold (expr.value, require_constant_value,
     +				       &expr_const_operands);
    -+      ret.value = c_nelementsof_type (loc, TREE_TYPE (folded_expr));
    ++      ret.value = c_lengthof_type (loc, TREE_TYPE (folded_expr));
     +      c_last_sizeof_arg = expr.value;
     +      c_last_sizeof_loc = loc;
    -+      ret.original_code = NELEMENTSOF_EXPR;
    ++      ret.original_code = LENGTHOF_EXPR;
     +      ret.original_type = NULL;
     +      ret.m_decimal = 0;
     +      if (is_top_array_vla (TREE_TYPE (folded_expr)))
     +	{
    -+	  /* nelementsof is evaluated when given a vla.  */
    ++	  /* lengthof is evaluated when given a vla.  */
     +	  ret.value = build2 (C_MAYBE_CONST_EXPR, TREE_TYPE (ret.value),
     +			      folded_expr, ret.value);
     +	  C_MAYBE_CONST_EXPR_NON_CONST (ret.value) = !expr_const_operands;
    @@ gcc/c/c-typeck.cc: c_expr_sizeof_type (location_t loc, struct c_type_name *t)
     +  return ret;
     +}
     +
    -+/* Return the result of nelementsof applied to T, a structure for the type
    -+   name passed to nelementsof (rather than the type itself).  LOC is the
    ++/* Return the result of lengthof applied to T, a structure for the type
    ++   name passed to lengthof (rather than the type itself).  LOC is the
     +   location of the original expression.  */
     +
     +struct c_expr
    -+c_expr_nelementsof_type (location_t loc, struct c_type_name *t)
    ++c_expr_lengthof_type (location_t loc, struct c_type_name *t)
     +{
     +  tree type;
     +  struct c_expr ret;
     +  tree type_expr = NULL_TREE;
     +  bool type_expr_const = true;
     +  type = groktypename (t, &type_expr, &type_expr_const);
    -+  ret.value = c_nelementsof_type (loc, type);
    ++  ret.value = c_lengthof_type (loc, type);
     +  c_last_sizeof_arg = type;
     +  c_last_sizeof_loc = loc;
    -+  ret.original_code = NELEMENTSOF_EXPR;
    ++  ret.original_code = LENGTHOF_EXPR;
     +  ret.original_type = NULL;
     +  ret.m_decimal = 0;
     +  if (type == error_mark_node)
    @@ gcc/c/c-typeck.cc: c_expr_sizeof_type (location_t loc, struct c_type_name *t)
     +    {
     +      /* If the type is a [*] array, it is a VLA but is represented as
     +	 having a size of zero.  In such a case we must ensure that
    -+	 the result of nelementsof does not get folded to a constant by
    ++	 the result of lengthof does not get folded to a constant by
     +	 c_fully_fold, because if the length is evaluated the result is
     +	 not constant and so constraints on zero or negative size
    -+	 arrays must not be applied when this nelementsof call is inside
    ++	 arrays must not be applied when this lengthof call is inside
     +	 another array declarator.  */
     +      if (!type_expr)
     +	type_expr = integer_zero_node;
    @@ gcc/cp/operators.def: DEF_OPERATOR ("co_await", CO_AWAIT_EXPR, "aw", OVL_OP_FLAG
      
      /* These are extensions.  */
      DEF_OPERATOR ("alignof", ALIGNOF_EXPR, "az", OVL_OP_FLAG_UNARY)
    -+DEF_OPERATOR ("__nelementsof__", NELEMENTSOF_EXPR, "lz", OVL_OP_FLAG_UNARY)
    ++DEF_OPERATOR ("__lengthof__", LENGTHOF_EXPR, "lz", OVL_OP_FLAG_UNARY)
      DEF_OPERATOR ("__imag__", IMAGPART_EXPR, "v18__imag__", OVL_OP_FLAG_UNARY)
      DEF_OPERATOR ("__real__", REALPART_EXPR, "v18__real__", OVL_OP_FLAG_UNARY)
      
    @@ gcc/doc/extend.texi: If the operand of the @code{__alignof__} expression is a fu
      the expression evaluates to the alignment of the function which may
      be specified by attribute @code{aligned} (@pxref{Common Function Attributes}).
      
    -+@node nelementsof
    ++@node lengthof
     +@section Determining the Number of Elements of Arrays
    -+@cindex nelementsof
    ++@cindex lengthof
     +@cindex number of elements
     +
    -+The keyword @code{__elemetsf__} determines the length of an array operand,
    ++The keyword @code{__lengthof__} determines the length of an array operand,
     +that is, the number of elements in the array.
     +Its syntax is similar to @code{sizeof}.
     +The operand must be
    @@ gcc/doc/extend.texi: If the operand of the @code{__alignof__} expression is a fu
     +
     +@smallexample
     +int a[n];
    -+__elemetsf__ (a);  // returns n
    -+__elemetsf__ (int [7][3]);  // returns 7
    ++__lengthof__ (a);  // returns n
    ++__lengthof__ (int [7][3]);  // returns 7
     +@end smallexample
     +
     +The result of this operator is an integer constant expression,
    @@ gcc/doc/extend.texi: If the operand of the @code{__alignof__} expression is a fu
     +For example:
     +
     +@smallexample
    -+__elemetsf__ (int [7][n++]);  // integer constant expression
    -+__elemetsf__ (int [n++][7]);  // run-time value; n++ is evaluated
    ++__lengthof__ (int [7][n++]);  // integer constant expression
    ++__lengthof__ (int [n++][7]);  // run-time value; n++ is evaluated
     +@end smallexample
     +
      @node Inline
    @@ gcc/target.h: enum type_context_kind {
        TCTX_ALIGNOF,
      
     +  /* Directly measuring the number of elements of array T.  */
    -+  TCTX_NELEMENTSOF,
    ++  TCTX_LENGTHOF,
     +
        /* Creating objects of type T with static storage duration.  */
        TCTX_STATIC_STORAGE,
    @@ gcc/testsuite/gcc.dg/nelementsof-compile.c (new)
     +static int w[] = {1, 2, 3};
     +
     +static int z[0];
    -+static int y[__nelementsof__(z)];
    ++static int y[__lengthof__(z)];
     +
     +void
     +automatic(void)
     +{
    -+  __nelementsof__ (w);
    ++  __lengthof__ (w);
     +}
     +
     +void
     +incomplete (int p[])
     +{
    -+  __nelementsof__ (x);  /* { dg-error "incomplete" } */
    ++  __lengthof__ (x);  /* { dg-error "incomplete" } */
     +
     +  /* We want to support the following one in the future,
     +     but for now it should fail.  */
    -+  __nelementsof__ (p);  /* { dg-error "invalid" } */
    ++  __lengthof__ (p);  /* { dg-error "invalid" } */
     +}
     +
     +void
    @@ gcc/testsuite/gcc.dg/nelementsof-compile.c (new)
     +    int fam[];
     +  } s;
     +
    -+  __nelementsof__ (s.fam); /* { dg-error "incomplete" } */
    ++  __lengthof__ (s.fam); /* { dg-error "incomplete" } */
     +}
     +
    -+void fix_fix (int i, char (*a)[3][5], int (*x)[__nelementsof__ (*a)]);
    -+void fix_var (int i, char (*a)[3][i], int (*x)[__nelementsof__ (*a)]);
    -+void fix_uns (int i, char (*a)[3][*], int (*x)[__nelementsof__ (*a)]);
    ++void fix_fix (int i, char (*a)[3][5], int (*x)[__lengthof__ (*a)]);
    ++void fix_var (int i, char (*a)[3][i], int (*x)[__lengthof__ (*a)]);
    ++void fix_uns (int i, char (*a)[3][*], int (*x)[__lengthof__ (*a)]);
     +
     +void
     +func (void)
    @@ gcc/testsuite/gcc.dg/nelementsof-compile.c (new)
     +    int x[3];
     +  } s;
     +
    -+  __nelementsof__ (x); /* { dg-error "invalid" } */
    -+  __nelementsof__ (int); /* { dg-error "invalid" } */
    -+  __nelementsof__ (s); /* { dg-error "invalid" } */
    -+  __nelementsof__ (struct s); /* { dg-error "invalid" } */
    -+  __nelementsof__ (&x); /* { dg-error "invalid" } */
    -+  __nelementsof__ (p); /* { dg-error "invalid" } */
    -+  __nelementsof__ (int *); /* { dg-error "invalid" } */
    -+  __nelementsof__ (&s.x); /* { dg-error "invalid" } */
    -+  __nelementsof__ (int (*)[3]); /* { dg-error "invalid" } */
    ++  __lengthof__ (x); /* { dg-error "invalid" } */
    ++  __lengthof__ (int); /* { dg-error "invalid" } */
    ++  __lengthof__ (s); /* { dg-error "invalid" } */
    ++  __lengthof__ (struct s); /* { dg-error "invalid" } */
    ++  __lengthof__ (&x); /* { dg-error "invalid" } */
    ++  __lengthof__ (p); /* { dg-error "invalid" } */
    ++  __lengthof__ (int *); /* { dg-error "invalid" } */
    ++  __lengthof__ (&s.x); /* { dg-error "invalid" } */
    ++  __lengthof__ (int (*)[3]); /* { dg-error "invalid" } */
     +}
     +
     +static int f1();
    @@ gcc/testsuite/gcc.dg/nelementsof-compile.c (new)
     +{
     +  int b[n][n];
     +
    -+  __nelementsof__ (a[f1()]);
    -+  __nelementsof__ (b[f2()]);
    ++  __lengthof__ (a[f1()]);
    ++  __lengthof__ (b[f2()]);
     +}
     +
     +void
     +no_parens(void)
     +{
    -+  __nelementsof__ a;
    -+  __nelementsof__ *a;
    -+  __nelementsof__ (int [3]) {};
    ++  __lengthof__ a;
    ++  __lengthof__ *a;
    ++  __lengthof__ (int [3]) {};
     +
    -+  __nelementsof__ int [3]; /* { dg-error "expected expression before" } */
    ++  __lengthof__ int [3]; /* { dg-error "expected expression before" } */
     +}
     +
     +void
    @@ gcc/testsuite/gcc.dg/nelementsof-compile.c (new)
     +{
     +  int n = 7;
     +
    -+  _Static_assert (__nelementsof__ (int [3][n]) == 3);
    -+  _Static_assert (__nelementsof__ (int [n][3]) == 7); /* { dg-error "not constant" } */
    -+  _Static_assert (__nelementsof__ (int [0][3]) == 0);
    -+  _Static_assert (__nelementsof__ (int [0]) == 0);
    ++  _Static_assert (__lengthof__ (int [3][n]) == 3);
    ++  _Static_assert (__lengthof__ (int [n][3]) == 7); /* { dg-error "not constant" } */
    ++  _Static_assert (__lengthof__ (int [0][3]) == 0);
    ++  _Static_assert (__lengthof__ (int [0]) == 0);
     +
    -+  /* FIXME: nelementsof(int [0][n]) should result in a constant expression.  */
    -+  _Static_assert (__nelementsof__ (int [0][n]) == 0); /* { dg-error "not constant" } */
    ++  /* FIXME: lengthof(int [0][n]) should result in a constant expression.  */
    ++  _Static_assert (__lengthof__ (int [0][n]) == 0); /* { dg-error "not constant" } */
     +}
     
      ## gcc/testsuite/gcc.dg/nelementsof-vla.c (new) ##
    @@ gcc/testsuite/gcc.dg/nelementsof-vla.c (new)
     +
     +void fix_fix (int i,
     +	      char (*a)[3][5],
    -+	      int (*x)[__nelementsof__ (*a)]);
    ++	      int (*x)[__lengthof__ (*a)]);
     +void fix_var (int i,
     +	      char (*a)[3][i], /* dg-warn "variable" */
    -+	      int (*x)[__nelementsof__ (*a)]);
    ++	      int (*x)[__lengthof__ (*a)]);
     +void fix_uns (int i,
     +	      char (*a)[3][*],
    -+	      int (*x)[__nelementsof__ (*a)]);
    ++	      int (*x)[__lengthof__ (*a)]);
     +
     +void zro_fix (int i,
     +	      char (*a)[0][5],
    -+	      int (*x)[__nelementsof__ (*a)]);
    ++	      int (*x)[__lengthof__ (*a)]);
     +void zro_var (int i,
     +	      char (*a)[0][i], /* dg-warn "variable" */
    -+	      int (*x)[__nelementsof__ (*a)]);
    ++	      int (*x)[__lengthof__ (*a)]);
     +void zro_uns (int i,
     +	      char (*a)[0][*],
    -+	      int (*x)[__nelementsof__ (*a)]);
    ++	      int (*x)[__lengthof__ (*a)]);
     +
     +void var_fix (int i,
     +	      char (*a)[i][5], /* dg-warn "variable" */
    -+	      int (*x)[__nelementsof__ (*a)]); /* dg-warn "variable" */
    ++	      int (*x)[__lengthof__ (*a)]); /* dg-warn "variable" */
     +void var_var (int i,
     +	      char (*a)[i][i], /* dg-warn "variable" */
    -+	      int (*x)[__nelementsof__ (*a)]); /* dg-warn "variable" */
    ++	      int (*x)[__lengthof__ (*a)]); /* dg-warn "variable" */
     +void var_uns (int i,
     +	      char (*a)[i][*], /* dg-warn "variable" */
    -+	      int (*x)[__nelementsof__ (*a)]); /* dg-warn "variable" */
    ++	      int (*x)[__lengthof__ (*a)]); /* dg-warn "variable" */
     +
     +void uns_fix (int i,
     +	      char (*a)[*][5],
    -+	      int (*x)[__nelementsof__ (*a)]);
    ++	      int (*x)[__lengthof__ (*a)]);
     +void uns_var (int i,
     +	      char (*a)[*][i], /* dg-warn "variable" */
    -+	      int (*x)[__nelementsof__ (*a)]);
    ++	      int (*x)[__lengthof__ (*a)]);
     +void uns_uns (int i,
     +	      char (*a)[*][*],
    -+	      int (*x)[__nelementsof__ (*a)]);
    ++	      int (*x)[__lengthof__ (*a)]);
     +
     +// Can't test due to bug: <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116284>
     +//static int z2[0];
    -+//static int y2[__nelementsof__(z2)];
    ++//static int y2[__lengthof__(z2)];
     
      ## gcc/testsuite/gcc.dg/nelementsof.c (new) ##
     @@
    @@ gcc/testsuite/gcc.dg/nelementsof.c (new)
     +{
     +  short a[7];
     +
    -+  static_assert (__nelementsof__ (a) == 7);
    -+  static_assert (__nelementsof__ (long [0]) == 0);
    -+  static_assert (__nelementsof__ (unsigned [99]) == 99);
    ++  static_assert (__lengthof__ (a) == 7);
    ++  static_assert (__lengthof__ (long [0]) == 0);
    ++  static_assert (__lengthof__ (unsigned [99]) == 99);
     +}
     +
     +void
    @@ gcc/testsuite/gcc.dg/nelementsof.c (new)
     +  int a[] = {1, 2, 3};
     +  int z[] = {};
     +
    -+  static_assert (__nelementsof__ (a) == 3);
    -+  static_assert (__nelementsof__ (z) == 0);
    ++  static_assert (__lengthof__ (a) == 3);
    ++  static_assert (__lengthof__ (z) == 0);
     +}
     +
     +void
    @@ gcc/testsuite/gcc.dg/nelementsof.c (new)
     +  unsigned n;
     +
     +  n = 99;
    -+  assert (__nelementsof__ (short [n - 10]) == 99 - 10);
    ++  assert (__lengthof__ (short [n - 10]) == 99 - 10);
     +
     +  int v[n / 2];
    -+  assert (__nelementsof__ (v) == 99 / 2);
    ++  assert (__lengthof__ (v) == 99 / 2);
     +
     +  n = 0;
     +  int z[n];
    -+  assert (__nelementsof__ (z) == 0);
    ++  assert (__lengthof__ (z) == 0);
     +}
     +
     +void
    @@ gcc/testsuite/gcc.dg/nelementsof.c (new)
     +    int a[8];
     +  } s;
     +
    -+  static_assert (__nelementsof__ (s.a) == 8);
    ++  static_assert (__lengthof__ (s.a) == 8);
     +}
     +
     +void
    @@ gcc/testsuite/gcc.dg/nelementsof.c (new)
     +  int i;
     +
     +  i = 7;
    -+  assert (__nelementsof__ (struct {int x;}[i++]) == 7);
    ++  assert (__lengthof__ (struct {int x;}[i++]) == 7);
     +  assert (i == 7 + 1);
     +
     +  int v[i];
     +  int (*p)[i];
     +  p = &v;
    -+  assert (__nelementsof__ (*p++) == i);
    ++  assert (__lengthof__ (*p++) == i);
     +  assert (p - 1 == &v);
     +}
     +
    @@ gcc/testsuite/gcc.dg/nelementsof.c (new)
     +  int i;
     +
     +  i = 3;
    -+  static_assert (__nelementsof__ (struct {int x[i++];}[3]) == 3);
    ++  static_assert (__lengthof__ (struct {int x[i++];}[3]) == 3);
     +  assert (i == 3);
     +}
     +
    @@ gcc/testsuite/gcc.dg/nelementsof.c (new)
     +array_noeval (void)
     +{
     +  long a[5];
    -+  long (*p)[__nelementsof__ (a)];
    ++  long (*p)[__lengthof__ (a)];
     +
     +  p = &a;
    -+  static_assert (__nelementsof__ (*p++) == 5);
    ++  static_assert (__lengthof__ (*p++) == 5);
     +  assert (p == &a);
     +}
     +
    @@ gcc/testsuite/gcc.dg/nelementsof.c (new)
     +{
     +  int i;
     +
    -+  static_assert (__nelementsof__ (int [0][4]) == 0);
    ++  static_assert (__lengthof__ (int [0][4]) == 0);
     +  i = 3;
    -+  assert (__nelementsof__ (int [0][i]) == 0);
    ++  assert (__lengthof__ (int [0][i]) == 0);
     +}
     +
     +void
    @@ gcc/testsuite/gcc.dg/nelementsof.c (new)
     +{
     +  int i;
     +
    -+  static_assert (__nelementsof__ (int [7][4]) == 7);
    ++  static_assert (__lengthof__ (int [7][4]) == 7);
     +  i = 3;
    -+  static_assert (__nelementsof__ (int [7][i]) == 7);
    ++  static_assert (__lengthof__ (int [7][i]) == 7);
     +}
     +
     +void
    @@ gcc/testsuite/gcc.dg/nelementsof.c (new)
     +  int i, j;
     +
     +  i = 7;
    -+  assert (__nelementsof__ (int [i++][4]) == 7);
    ++  assert (__lengthof__ (int [i++][4]) == 7);
     +  assert (i == 7 + 1);
     +
     +  i = 9;
     +  j = 3;
    -+  assert (__nelementsof__ (int [i++][j]) == 9);
    ++  assert (__lengthof__ (int [i++][j]) == 9);
     +  assert (i == 9 + 1);
     +}
     +
    @@ gcc/testsuite/gcc.dg/nelementsof.c (new)
     +  int a[7];
     +  int v[n];
     +
    -+  static_assert (__nelementsof__ a == 7); 
    -+  assert (__nelementsof__ v == 3); 
    ++  static_assert (__lengthof__ a == 7); 
    ++  assert (__lengthof__ v == 3); 
     +}
     +
     +int

Comments

Joseph Myers Oct. 7, 2024, 5:35 p.m. UTC | #1
Patches 1, 2 and 3 are logically nothing to do with this feature.  I'll 
wait for them to be reviewed so that we only have a single-patch series, 
before doing final review of the main patch.

Since the feature was accepted as _Lengthof, that's the form that should 
be added to GCC; no __lengthof__ variant needed.  In general in GCC, 
although not strictly required by the standard in this case, we use 
pedwarn_c23 (pass OPT_Wpedantic as the option) to diagnose the use of a 
new C2Y feature that's not in C23 (if -pedantic with a pre-C2Y standard, 
or -Wc23-c2y-compat even in C2Y mode), with appropriate testcases to 
verify this (error with -std=c23 -pedantic-errors, warning with -std=c23 
-pedantic, no diagnostic with -std=c23 -pedantic-errors 
-Wno-c23-c2y-compat, no diagnostic with -std=c2y -pedantic-errors, warning 
with -std=c2y -pedantic-errors -Wc23-c2y-compat).  (pedwarn_c23 handles 
that logic, you just need the pedwarn_c23 call and the tests for those 
various cases.)
Alejandro Colomar Oct. 8, 2024, 12:04 a.m. UTC | #2
Hi Joseph,

On Mon, Oct 07, 2024 at 05:35:16PM GMT, Joseph Myers wrote:
> Patches 1, 2 and 3 are logically nothing to do with this feature.  I'll 
> wait for them to be reviewed so that we only have a single-patch series, 
> before doing final review of the main patch.

I do not fully understand.  Who has to review patches 1,2,3?  Also, do
you want to merge them, then I resend patch 4 as a single patch, and
then you review that one?  If so, that looks like a good plan to me.

Thanks!

> Since the feature was accepted as _Lengthof, that's the form that should 
> be added to GCC; no __lengthof__ variant needed.

Okay, I'm indiferent to choosing between both of those names; since they
are equally harmful.  ;)

On the other hand, I'm tempted to propose a different name, and force
ISO to reconsider and follow.  The discussion on this list was more
thorough than the short discussion at WG14, which didn't really take
into consideration the dangers of harmful and error-prone nomenclature.

> In general in GCC, 
> although not strictly required by the standard in this case, we use 
> pedwarn_c23 (pass OPT_Wpedantic as the option) to diagnose the use of a 
> new C2Y feature that's not in C23

Thanks; will do.

Have a lovely night!
Alex

> (if -pedantic with a pre-C2Y standard, 
> or -Wc23-c2y-compat even in C2Y mode), with appropriate testcases to 
> verify this (error with -std=c23 -pedantic-errors, warning with -std=c23 
> -pedantic, no diagnostic with -std=c23 -pedantic-errors 
> -Wno-c23-c2y-compat, no diagnostic with -std=c2y -pedantic-errors, warning 
> with -std=c2y -pedantic-errors -Wc23-c2y-compat).  (pedwarn_c23 handles 
> that logic, you just need the pedwarn_c23 call and the tests for those 
> various cases.)
> 
> -- 
> Joseph S. Myers
> josmyers@redhat.com
>
Alejandro Colomar Oct. 8, 2024, 12:09 a.m. UTC | #3
On Tue, Oct 08, 2024 at 02:04:39AM GMT, Alejandro Colomar wrote:
> Hi Joseph,
> 
> On Mon, Oct 07, 2024 at 05:35:16PM GMT, Joseph Myers wrote:
> > Patches 1, 2 and 3 are logically nothing to do with this feature.  I'll 
> > wait for them to be reviewed so that we only have a single-patch series, 
> > before doing final review of the main patch.
> 
> I do not fully understand.  Who has to review patches 1,2,3?  Also, do
> you want to merge them, then I resend patch 4 as a single patch, and
> then you review that one?  If so, that looks like a good plan to me.
> 
> Thanks!
> 
> > Since the feature was accepted as _Lengthof, that's the form that should 
> > be added to GCC; no __lengthof__ variant needed.
> 
> Okay, I'm indiferent to choosing between both of those names; since they
> are equally harmful.  ;)
> 
> On the other hand, I'm tempted to propose a different name, and force
> ISO to reconsider and follow.  The discussion on this list was more
> thorough than the short discussion at WG14, which didn't really take
> into consideration the dangers of harmful and error-prone nomenclature.
> 
> > In general in GCC, 
> > although not strictly required by the standard in this case, we use 
> > pedwarn_c23 (pass OPT_Wpedantic as the option) to diagnose the use of a 
> > new C2Y feature that's not in C23
> 
> Thanks; will do.

On the other hand, should we provide a version of the operator that is
free from pedantic warnings?  A GNU extension?

Cheers,
Alex

> 
> Have a lovely night!
> Alex
> 
> > (if -pedantic with a pre-C2Y standard, 
> > or -Wc23-c2y-compat even in C2Y mode), with appropriate testcases to 
> > verify this (error with -std=c23 -pedantic-errors, warning with -std=c23 
> > -pedantic, no diagnostic with -std=c23 -pedantic-errors 
> > -Wno-c23-c2y-compat, no diagnostic with -std=c2y -pedantic-errors, warning 
> > with -std=c2y -pedantic-errors -Wc23-c2y-compat).  (pedwarn_c23 handles 
> > that logic, you just need the pedwarn_c23 call and the tests for those 
> > various cases.)
> > 
> > -- 
> > Joseph S. Myers
> > josmyers@redhat.com
> > 
> 
> -- 
> <https://www.alejandro-colomar.es/>
Jakub Jelinek Oct. 8, 2024, 6:45 a.m. UTC | #4
On Tue, Oct 08, 2024 at 02:09:52AM +0200, Alejandro Colomar wrote:
> On the other hand, should we provide a version of the operator that is
> free from pedantic warnings?  A GNU extension?

No, why?  One can always use (__extension__ _Lengthof (...)).

	Jakub
Alejandro Colomar Oct. 8, 2024, 8:11 a.m. UTC | #5
On Tue, Oct 08, 2024 at 08:45:48AM GMT, Jakub Jelinek wrote:
> On Tue, Oct 08, 2024 at 02:09:52AM +0200, Alejandro Colomar wrote:
> > On the other hand, should we provide a version of the operator that is
> > free from pedantic warnings?  A GNU extension?
> 
> No, why?  One can always use (__extension__ _Lengthof (...)).

Just wondering if we should do it for symmetry with __alignof__.  But
yeah, no need.

Cheers,
Alex

> 
> 	Jakub
>
Jakub Łukasiewicz Oct. 8, 2024, 8:25 a.m. UTC | #6
I think it would be beneficial to have different syntax/spelling for 
features still in development. That way we, as a committee, can tweak 
it as we please, while mitigating effect on early adopters.

If what ends in final document is exactly the same as in early phrases, 
then great, all users are left to do is simple find and replace.

Warning about using an experimental feature that is prone to changes 
is obviously useful, but it disappears after upgrade to latest 
standard. If there were diverges between early and final versions, 
it would be nice to have warnings about that too.

~ J.Ł.

On 2024-10-07 19:35 CEST, Joseph Myers <josmyers@redhat.com> wrote:
> Patches 1, 2 and 3 are logically nothing to do with this feature.  I'll 
> wait for them to be reviewed so that we only have a single-patch series, 
> before doing final review of the main patch.
>
> Since the feature was accepted as _Lengthof, that's the form that should 
> be added to GCC; no __lengthof__ variant needed.  In general in GCC, 
> although not strictly required by the standard in this case, we use 
> pedwarn_c23 (pass OPT_Wpedantic as the option) to diagnose the use of a 
> new C2Y feature that's not in C23 (if -pedantic with a pre-C2Y standard, 
> or -Wc23-c2y-compat even in C2Y mode), with appropriate testcases to 
> verify this (error with -std=c23 -pedantic-errors, warning with -std=c23 
> -pedantic, no diagnostic with -std=c23 -pedantic-errors 
> -Wno-c23-c2y-compat, no diagnostic with -std=c2y -pedantic-errors, warning 
> with -std=c2y -pedantic-errors -Wc23-c2y-compat).  (pedwarn_c23 handles 
> that logic, you just need the pedwarn_c23 call and the tests for those 
> various cases.)
Alejandro Colomar Oct. 8, 2024, 8:33 a.m. UTC | #7
Hi Jakub,

On Tue, Oct 08, 2024 at 10:25:24AM GMT, Jakub Łukasiewicz wrote:
> I think it would be beneficial to have different syntax/spelling for
> features still in development. That way we, as a committee, can tweak it as
> we please, while mitigating effect on early adopters.
> 
> If what ends in final document is exactly the same as in early phrases, then
> great, all users are left to do is simple find and replace.
> 
> Warning about using an experimental feature that is prone to changes is
> obviously useful, but it disappears after upgrade to latest standard. If
> there were diverges between early and final versions, it would be nice to
> have warnings about that too.

How about adding the __lower__ version now as a GNU extension with
compatible semantics, and when it's closer to an ISO C2y release add the
_Upper one?

That gives us more freedom.

Cheers,
Alex

> 
> ~ J.Ł.
> 
> On 2024-10-07 19:35 CEST, Joseph Myers <josmyers@redhat.com> wrote:
> > Patches 1, 2 and 3 are logically nothing to do with this feature.  I'll
> > wait for them to be reviewed so that we only have a single-patch series,
> > before doing final review of the main patch.
> > 
> > Since the feature was accepted as _Lengthof, that's the form that should
> > be added to GCC; no __lengthof__ variant needed.  In general in GCC,
> > although not strictly required by the standard in this case, we use
> > pedwarn_c23 (pass OPT_Wpedantic as the option) to diagnose the use of a
> > new C2Y feature that's not in C23 (if -pedantic with a pre-C2Y standard,
> > or -Wc23-c2y-compat even in C2Y mode), with appropriate testcases to
> > verify this (error with -std=c23 -pedantic-errors, warning with -std=c23
> > -pedantic, no diagnostic with -std=c23 -pedantic-errors
> > -Wno-c23-c2y-compat, no diagnostic with -std=c2y -pedantic-errors,
> > warning with -std=c2y -pedantic-errors -Wc23-c2y-compat).  (pedwarn_c23
> > handles that logic, you just need the pedwarn_c23 call and the tests for
> > those various cases.)
Jakub Łukasiewicz Oct. 8, 2024, 8:59 a.m. UTC | #8
Yup :)
For example that

~ J.Ł.

On 2024-10-08 10:33 CEST, Alejandro Colomar <alx@kernel.org> wrote:
> Hi Jakub,
>
> On Tue, Oct 08, 2024 at 10:25:24AM GMT, Jakub Łukasiewicz wrote:
> > I think it would be beneficial to have different syntax/spelling for
> > features still in development. That way we, as a committee, can tweak it as
> > we please, while mitigating effect on early adopters.
> > 
> > If what ends in final document is exactly the same as in early phrases, then
> > great, all users are left to do is simple find and replace.
> > 
> > Warning about using an experimental feature that is prone to changes is
> > obviously useful, but it disappears after upgrade to latest standard. If
> > there were diverges between early and final versions, it would be nice to
> > have warnings about that too.
>
> How about adding the __lower__ version now as a GNU extension with
> compatible semantics, and when it's closer to an ISO C2y release add the
> _Upper one?
>
> That gives us more freedom.
>
> Cheers,
> Alex
>
> > 
> > ~ J.Ł.
> > 
> > On 2024-10-07 19:35 CEST, Joseph Myers <josmyers@redhat.com> wrote:
> > > Patches 1, 2 and 3 are logically nothing to do with this feature.  I'll
> > > wait for them to be reviewed so that we only have a single-patch series,
> > > before doing final review of the main patch.
> > > 
> > > Since the feature was accepted as _Lengthof, that's the form that should
> > > be added to GCC; no __lengthof__ variant needed.  In general in GCC,
> > > although not strictly required by the standard in this case, we use
> > > pedwarn_c23 (pass OPT_Wpedantic as the option) to diagnose the use of a
> > > new C2Y feature that's not in C23 (if -pedantic with a pre-C2Y standard,
> > > or -Wc23-c2y-compat even in C2Y mode), with appropriate testcases to
> > > verify this (error with -std=c23 -pedantic-errors, warning with -std=c23
> > > -pedantic, no diagnostic with -std=c23 -pedantic-errors
> > > -Wno-c23-c2y-compat, no diagnostic with -std=c2y -pedantic-errors,
> > > warning with -std=c2y -pedantic-errors -Wc23-c2y-compat).  (pedwarn_c23
> > > handles that logic, you just need the pedwarn_c23 call and the tests for
> > > those various cases.)
Joseph Myers Oct. 8, 2024, 1:19 p.m. UTC | #9
On Tue, 8 Oct 2024, Alejandro Colomar wrote:

> On Mon, Oct 07, 2024 at 05:35:16PM GMT, Joseph Myers wrote:
> > Patches 1, 2 and 3 are logically nothing to do with this feature.  I'll 
> > wait for them to be reviewed so that we only have a single-patch series, 
> > before doing final review of the main patch.
> 
> I do not fully understand.  Who has to review patches 1,2,3?  Also, do

Someone who is a maintainer or reviewer of relevant parts of the compiler.  
Maybe 90% of the people CC:ed are not GCC maintainers or reviewers and 
should not be included on these patches at all.

> you want to merge them, then I resend patch 4 as a single patch, and
> then you review that one?  If so, that looks like a good plan to me.

Yes, patch 4 as a single patch.  With _Lengthof.  No other names, no 
__countof__, no __lengthof__.  _Lengthof is a perfectly good name, no need 
to be gratuitously incompatible.
Joseph Myers Oct. 8, 2024, 1:23 p.m. UTC | #10
On Tue, 8 Oct 2024, Alejandro Colomar wrote:

> How about adding the __lower__ version now as a GNU extension with
> compatible semantics, and when it's closer to an ISO C2y release add the
> _Upper one?

No, we don't need two names.  Just _Lengthof suffices.  If semantics 
change in some way during C2y development, we can update the 
implementation to match.

We have a rule that the C++ library ABI is explicitly unstable for 
features in a new standard where support is still under development, and 
only commit to a stable C++ library ABI supporting existing binaries once 
the support for that version is mature enough.  I think it's reasonable 
also to say that features from C2y (including when supported in older 
standards modes) are unstable and liable to change when the semantics in 
C2y change.  (Liable to change doesn't mean changing on a release branch 
after the first release from that branch.  But it could in some cases mean 
changing incompatibly while mainline is in regression fixes mode, to avoid 
releasing something significantly incompatible with later changes.)
Alejandro Colomar Oct. 8, 2024, 1:28 p.m. UTC | #11
Hi Joseph,

On Tue, Oct 08, 2024 at 01:19:06PM GMT, Joseph Myers wrote:
> On Tue, 8 Oct 2024, Alejandro Colomar wrote:
> 
> > On Mon, Oct 07, 2024 at 05:35:16PM GMT, Joseph Myers wrote:
> > > Patches 1, 2 and 3 are logically nothing to do with this feature.  I'll 
> > > wait for them to be reviewed so that we only have a single-patch series, 
> > > before doing final review of the main patch.
> > 
> > I do not fully understand.  Who has to review patches 1,2,3?  Also, do
> 
> Someone who is a maintainer or reviewer of relevant parts of the compiler.  
> Maybe 90% of the people CC:ed are not GCC maintainers or reviewers and 
> should not be included on these patches at all.

Those are people with interest in one way or another in this feature
(most of them, patch 4).  While patches 1,2,3 are irrelevant to them, I
kept them on the entire thread for simplicity.

> 
> > you want to merge them, then I resend patch 4 as a single patch, and
> > then you review that one?  If so, that looks like a good plan to me.
> 
> Yes, patch 4 as a single patch.  With _Lengthof.  No other names, no 
> __countof__, no __lengthof__.  _Lengthof is a perfectly good name, no need 
> to be gratuitously incompatible.

It is not gratuitously, IMO.  You already know my concerns about it;
please sed(1) yourself the name of the operator from these patches, and
append your signature below mine, if you want to rename it.  I won't do
that, for I think it introduces a security problem that will slowly
develop.

If you wish to wait for Graz to make sure there's no incompatibility
with ISO, that's another possibility.


Have a lovely day!
Alex
Jakub Jelinek Oct. 8, 2024, 1:40 p.m. UTC | #12
On Tue, Oct 08, 2024 at 03:28:29PM +0200, Alejandro Colomar wrote:
> On Tue, Oct 08, 2024 at 01:19:06PM GMT, Joseph Myers wrote:
> > On Tue, 8 Oct 2024, Alejandro Colomar wrote:
> > 
> > > On Mon, Oct 07, 2024 at 05:35:16PM GMT, Joseph Myers wrote:
> > > > Patches 1, 2 and 3 are logically nothing to do with this feature.  I'll 
> > > > wait for them to be reviewed so that we only have a single-patch series, 
> > > > before doing final review of the main patch.
> > > 
> > > I do not fully understand.  Who has to review patches 1,2,3?  Also, do
> > 
> > Someone who is a maintainer or reviewer of relevant parts of the compiler.  
> > Maybe 90% of the people CC:ed are not GCC maintainers or reviewers and 
> > should not be included on these patches at all.
> 
> Those are people with interest in one way or another in this feature
> (most of them, patch 4).  While patches 1,2,3 are irrelevant to them, I
> kept them on the entire thread for simplicity.

I'd just suggest posting the patch 4 alone adjusted so that it doesn't need
2,3.  Those 2 can be handled completely independently from patch 4 and for 1
there is no dependency, it is just a random unrelated patch.

> > > you want to merge them, then I resend patch 4 as a single patch, and
> > > then you review that one?  If so, that looks like a good plan to me.
> > 
> > Yes, patch 4 as a single patch.  With _Lengthof.  No other names, no 
> > __countof__, no __lengthof__.  _Lengthof is a perfectly good name, no need 
> > to be gratuitously incompatible.
> 
> It is not gratuitously, IMO.  You already know my concerns about it;
> please sed(1) yourself the name of the operator from these patches, and
> append your signature below mine, if you want to rename it.  I won't do
> that, for I think it introduces a security problem that will slowly
> develop.
> 
> If you wish to wait for Graz to make sure there's no incompatibility
> with ISO, that's another possibility.

I don't see why we'd need to wait for next WG14 meeting here, as the paper
has been voted into the next standard draft, compilers can implement the
feature, and as with anything else if the standard is changed later, it will
be adjusted to match it (or withdrawn/renamed or whatever).  That is always
a risk of using features from unreleased standards (but even features
in released standards can be tweaked with DRs later on).
It doesn't make sense to have two different operators just because we fear
it will change.
We do have some cases of different operators for the same thing if it was
first implemented as an extension and only later standardized under
different name or with different rules.
That is not the case here.

	Jakub
Joseph Myers Oct. 8, 2024, 2:04 p.m. UTC | #13
On Tue, 8 Oct 2024, Alejandro Colomar wrote:

> If you wish to wait for Graz to make sure there's no incompatibility
> with ISO, that's another possibility.

The name could be changed in GCC after Graz (while in 
regression-fixes-only mode for GCC 15) if WG14 changes the name in Graz.  
It wouldn't be a good idea to add as a new feature while in 
regression-fixes-only mode; waiting to after Graz for that would mean only 
adding it in GCC 16.

"length" is well-understood as referring to the number of elements of an 
array even if not explicitly defined as such in the standard (cf. the noun 
"variable" being well-understood by users of C even though the standard 
always says "object").  And there was along-the-lines support in 
Strasbourg for N3187 which, among other things, would make the use of 
"length" for this consistent and explicit.  I have plenty of concerns with 
both the wording and incompatibility of various changes suggested there 
related to what's allowed as a sizeof operand and associated semantics but 
I don't think there were any concerns in that discussion about the use of 
the term "length".
Alejandro Colomar Oct. 8, 2024, 2:49 p.m. UTC | #14
On Tue, Oct 08, 2024 at 03:40:53PM GMT, Jakub Jelinek wrote:
> On Tue, Oct 08, 2024 at 03:28:29PM +0200, Alejandro Colomar wrote:
> > On Tue, Oct 08, 2024 at 01:19:06PM GMT, Joseph Myers wrote:
> > > On Tue, 8 Oct 2024, Alejandro Colomar wrote:
> > > 
> > > > On Mon, Oct 07, 2024 at 05:35:16PM GMT, Joseph Myers wrote:
> > > > > Patches 1, 2 and 3 are logically nothing to do with this feature.  I'll 
> > > > > wait for them to be reviewed so that we only have a single-patch series, 
> > > > > before doing final review of the main patch.
> > > > 
> > > > I do not fully understand.  Who has to review patches 1,2,3?  Also, do
> > > 
> > > Someone who is a maintainer or reviewer of relevant parts of the compiler.  
> > > Maybe 90% of the people CC:ed are not GCC maintainers or reviewers and 
> > > should not be included on these patches at all.
> > 
> > Those are people with interest in one way or another in this feature
> > (most of them, patch 4).  While patches 1,2,3 are irrelevant to them, I
> > kept them on the entire thread for simplicity.
> 
> I'd just suggest posting the patch 4 alone adjusted so that it doesn't need
> 2,3.  Those 2 can be handled completely independently from patch 4 and for 1
> there is no dependency, it is just a random unrelated patch.

2,3 are needed for not making 4 read horribly.
1 is needed for the commit message of 4.

I don't mind waiting for gcc-16, if that needs to happen.

> 
> > > > you want to merge them, then I resend patch 4 as a single patch, and
> > > > then you review that one?  If so, that looks like a good plan to me.
> > > 
> > > Yes, patch 4 as a single patch.  With _Lengthof.  No other names, no 
> > > __countof__, no __lengthof__.  _Lengthof is a perfectly good name, no need 
> > > to be gratuitously incompatible.
> > 
> > It is not gratuitously, IMO.  You already know my concerns about it;
> > please sed(1) yourself the name of the operator from these patches, and
> > append your signature below mine, if you want to rename it.  I won't do
> > that, for I think it introduces a security problem that will slowly
> > develop.
> > 
> > If you wish to wait for Graz to make sure there's no incompatibility
> > with ISO, that's another possibility.
> 
> I don't see why we'd need to wait for next WG14 meeting here, as the paper
> has been voted into the next standard draft, compilers can implement the
> feature,

Because I don't like the paper that has been voted into the standard.
I kind of presented that paper against my will.  I wish GCC merged the
feature with a different name, and forced the standard to reconsider
what they merged, which I consider to be a security problem.

Alternatively, I wish GCC decided to do nothing, wait for Graz, where
I'll try to convince WG14 to change what was voted.

But merging what was voted into the standard would be nefarious, IMO.


Have a lovely day!
Alex

> and as with anything else if the standard is changed later, it will
> be adjusted to match it (or withdrawn/renamed or whatever).  That is always
> a risk of using features from unreleased standards (but even features
> in released standards can be tweaked with DRs later on).
> It doesn't make sense to have two different operators just because we fear
> it will change.
> We do have some cases of different operators for the same thing if it was
> first implemented as an extension and only later standardized under
> different name or with different rules.
> That is not the case here.
> 
> 	Jakub
>
Alejandro Colomar Oct. 8, 2024, 2:59 p.m. UTC | #15
Hi Joseph,

On Tue, Oct 08, 2024 at 02:04:12PM GMT, Joseph Myers wrote:
> On Tue, 8 Oct 2024, Alejandro Colomar wrote:
> 
> > If you wish to wait for Graz to make sure there's no incompatibility
> > with ISO, that's another possibility.
> 
> The name could be changed in GCC after Graz (while in 
> regression-fixes-only mode for GCC 15) if WG14 changes the name in Graz.  
> It wouldn't be a good idea to add as a new feature while in 
> regression-fixes-only mode; waiting to after Graz for that would mean only 
> adding it in GCC 16.

I prefer waiting for Graz.  Merging _Lengthof already would provide less
incentive for WG14 to change their mind.  I don't mind having this in
GCC 16 instead of 15, if that's the only way.

Alternatively, you could consider countof for GCC 15, and optionally
rename it later if my arguments are dismissed.

Or you can merge yourself _Lengthof already, by sed(1)ing my patches, if
you wish.

> "length" is well-understood as referring to the number of elements of an 
> array even if not explicitly defined as such in the standard (cf. the noun 
> "variable" being well-understood by users of C even though the standard 
> always says "object").

> And there was along-the-lines support in 
> Strasbourg for N3187

I interpret something along the lines of n3187 to mean "we want to make
ISO C consistent with language" --which is an opinion I share--, rather
than specifically meaning that length is the only acceptable term.

> which, among other things, would make the use of 
> "length" for this consistent and explicit.

> I have plenty of concerns with 
> both the wording and incompatibility of various changes suggested there 
> related to what's allowed as a sizeof operand and associated semantics but 
> I don't think there were any concerns in that discussion about the use of 
> the term "length".

Now you've seen mine.  I will oppose to n3187 as much as I can, as long
as it offers length as the term.


Cheers,
Alex
Chris Bazley Oct. 8, 2024, 3:13 p.m. UTC | #16
> ​Because I don't like the paper that has been voted into the standard.
> I kind of presented that paper against my will.  I wish GCC merged the
> feature with a different name, and forced the standard to reconsider
> what they merged, which I consider to be a security problem.
>
> Alternatively, I wish GCC decided to do nothing, wait for Graz, where
> I'll try to convince WG14 to change what was voted.
>
> But merging what was voted into the standard would be nefarious, IMO.

I don't understand this security problem that you are referring to.

The vast majority of strings use 'char' as the element type.

Existing code might look something like this:

#define A "foo"
#define B "bar"
#define STRING_LEN(s) (sizeof(s) - 1)

char *c = malloc(STRING_LEN(A) + STRING_LEN(B) + 1);
if (c) {
  strcpy(c, A);
  strcat(c, B);
}

Supposing that _Length gets support in GCC, the equivalent source code would be almost
identical and the compiled code would be identical:

#define A "foo"
#define B "bar"
#define STRING_LEN(s) (_Lengthof(s) - 1)

char *c = malloc(STRING_LEN(A) + STRING_LEN(B) + 1);
if (c) {
  strcpy(c, A);
  strcat(c, B);
}

Are you concerned that people will start writing new code that does something like the following?

#define A "foo"
#define B "bar"

char *c = malloc(_Lengthof(A) + _Lengthof(B));
if (c) {
  strcpy(c, A);
  strcat(c, B);
}

If they do, the only consequence will be that the string buffer is longer than it needs to be; not shorter.

Best regards,
--
Christopher Bazley
Staff Software Engineer, GPU team, Central Engineering Group
ARM Ltd, 110 Fulbourn Road, Cambridge, CB1 9NJ, UK.
Web:   http://www.arm.com/
Joseph Myers Oct. 8, 2024, 3:13 p.m. UTC | #17
On Tue, 8 Oct 2024, Alejandro Colomar wrote:

> > I have plenty of concerns with 
> > both the wording and incompatibility of various changes suggested there 
> > related to what's allowed as a sizeof operand and associated semantics but 
> > I don't think there were any concerns in that discussion about the use of 
> > the term "length".
> 
> Now you've seen mine.  I will oppose to n3187 as much as I can, as long
> as it offers length as the term.

It's "size", not "length", that has been used more inconsistently.  Take 
for example this sentence from K&R2 (I don't have K&R1), section 4.9 
(Initialization): "When the size of the array is omitted, the compiler 
will compute the length by counting the initializers, of which there are 
12 in this case.".  Length of strings is simply array length of a 
sub-array that doesn't include the terminating null character (a case of 
the sub-object issue of exactly what object is relevant in what context, 
not anything to do with the term "length").
Chris Bazley Oct. 8, 2024, 3:21 p.m. UTC | #18
Joseph wrote:
> Length of strings is simply array length of a
> sub-array that doesn't include the terminating null character (a case of
> the sub-object issue of exactly what object is relevant in what context,
> not anything to do with the term "length").

Exactly! This is what eventually persuaded me to stop doubting that 'length'
is the most appropriate term (and I did sincerely doubt it, for a while).

We can't eliminate such off-by-one errors from programs by choice of
terminology, and certainly not by a retroactive change in terminology.
Jₑₙₛ Gustedt Oct. 8, 2024, 3:54 p.m. UTC | #19
Hello everybody,
can you please leave me out of this ever spinning discussion?
The discussion in Minneapolis has fulfilled my needs on this for the
next 10 years at least.

I consider forcing people to opt out of mailings as being quite rude.

Thanks
Jₑₙₛ
Alejandro Colomar Oct. 8, 2024, 7:14 p.m. UTC | #20
[CC -= Jens]

Hi Joseph,

On Tue, Oct 08, 2024 at 03:13:19PM GMT, Joseph Myers wrote:
> On Tue, 8 Oct 2024, Alejandro Colomar wrote:
> 
> > > I have plenty of concerns with 
> > > both the wording and incompatibility of various changes suggested there 
> > > related to what's allowed as a sizeof operand and associated semantics but 
> > > I don't think there were any concerns in that discussion about the use of 
> > > the term "length".
> > 
> > Now you've seen mine.  I will oppose to n3187 as much as I can, as long
> > as it offers length as the term.
> 
> It's "size", not "length", that has been used more inconsistently.

Completely agree.

In fact, I've always supported using NITEMS() for things like

	#define STRLCPY(d, s)  strlcpy(d, s, NITEMS(d))

<https://lore.kernel.org/all/2jxak5v6dfxlpbxhpm3ey7oup4g2lnr3ueurfbosf5wdo65dk4@srb3hsk72zwq/>

However, at least the term size doesn't result in off-by-one issues.
With char-sized elements, it's not a problem.  And with wider elements,
it's so obviously different from the size, that people know what they're
doing.

And, I don't see why the standard couldn't just standardize on "number
of elements".  I don't see a need for using "length" in ISO C.  I think
removing the phrase "number of elements" from ISO C would be negative
--yes, Jens suggested he intended to do that in an eventual revision of
his paper, which prompted me attempting to shoot it down with n3327--.

>  Take 
> for example this sentence from K&R2 (I don't have K&R1), section 4.9 
> (Initialization): "When the size of the array is omitted, the compiler 
> will compute the length by counting the initializers, of which there are 
> 12 in this case.".

Heh.  :)

>  Length of strings is simply array length of a 
> sub-array that doesn't include the terminating null character (a case of 
> the sub-object issue of exactly what object is relevant in what context, 
> not anything to do with the term "length").

I thought of that, and I temporarily tried to convince myself that the
term can be used to mean that.  However, I don't fully convince myself
that programmers won't fall into off-by-one bugs due to this.

If you believe I'm wrong, I think it's okay.  I wouldn't mind if you
want to sed(1) the name yourself from my patch.  But in good faith, I
cannot send that patch.  You'll have to add your signature, and add a
line saying you changed the name of the operator.  But please go ahead
if that's what you think is best.  After all, we've had our arguments,
and there's no clear consensus on either side, so a maintainer has to
emit the final decision.  Whatever you decide, I'll be okay with it.


Have a lovely night!
Alex
Alejandro Colomar Oct. 9, 2024, 12:11 p.m. UTC | #21
[CC -= Jens]

Hi Chris,

On Tue, Oct 08, 2024 at 03:13:11PM GMT, Chris Bazley wrote:
> > ​Because I don't like the paper that has been voted into the standard.
> > I kind of presented that paper against my will.  I wish GCC merged the
> > feature with a different name, and forced the standard to reconsider
> > what they merged, which I consider to be a security problem.
> >
> > Alternatively, I wish GCC decided to do nothing, wait for Graz, where
> > I'll try to convince WG14 to change what was voted.
> >
> > But merging what was voted into the standard would be nefarious, IMO.
> 
> I don't understand this security problem that you are referring to.
> 
> The vast majority of strings use 'char' as the element type.
> 
> Existing code might look something like this:
> 
> #define A "foo"
> #define B "bar"
> #define STRING_LEN(s) (sizeof(s) - 1)
> 
> char *c = malloc(STRING_LEN(A) + STRING_LEN(B) + 1);
> if (c) {
>   strcpy(c, A);
>   strcat(c, B);
> }
> 
> Supposing that _Length gets support in GCC, the equivalent source code would be almost
> identical and the compiled code would be identical:
> 
> #define A "foo"
> #define B "bar"
> #define STRING_LEN(s) (_Lengthof(s) - 1)
> 
> char *c = malloc(STRING_LEN(A) + STRING_LEN(B) + 1);
> if (c) {
>   strcpy(c, A);
>   strcat(c, B);
> }
> 
> Are you concerned that people will start writing new code that does something like the following?
> 
> #define A "foo"
> #define B "bar"
> 
> char *c = malloc(_Lengthof(A) + _Lengthof(B));
> if (c) {
>   strcpy(c, A);
>   strcat(c, B);
> }
> 
> If they do, the only consequence will be that the string buffer is longer than it needs to be; not shorter.

Yes, off-by-one bugs on the safe side are more frequent than on the
unsafe side in this case.  However, I expect unsafe off-by-ones too.
And even in the safe side, there's the chance of secondary problems like
the following:

Let's say the maximum supported size is limited by a system limit.
For example, sysconf(_SC_LOGIN_NAME_MAX) or LOGIN_NAME_MAX.  If you try
to allocate one extra byte, so sysconf(_SC_LOGIN_NAME_MAX)+1, you may
overflow something somewhere, or cause some other important issues in
your system if you manage to create a user with such a long username.
Or your program will just crash and cause a DoS.

Or another combination of events that may cause another class of bugs.
In all cases, there's an off-by-one somewhere, but will result in a
different bug type.


I'm not fabricating, BTW.  Here's a list of off-by-one bugs in login
code, precisely due to this size-length naming issue:
<https://github.com/shadow-maint/shadow/commit/6551709e96b2bc6f084fdf170ad5bcc11f0038ab>
<https://github.com/shadow-maint/shadow/commit/15882a5f904b3c277d73254a6953c1051db55df4>


Have a lovely day!
Alex

> 
> Best regards,
> --
> Christopher Bazley
> Staff Software Engineer, GPU team, Central Engineering Group
> ARM Ltd, 110 Fulbourn Road, Cambridge, CB1 9NJ, UK.
> Web:   http://www.arm.com/
Joseph Myers Oct. 9, 2024, 5:05 p.m. UTC | #22
On Wed, 9 Oct 2024, Alejandro Colomar wrote:

> I'm not fabricating, BTW.  Here's a list of off-by-one bugs in login
> code, precisely due to this size-length naming issue:
> <https://github.com/shadow-maint/shadow/commit/6551709e96b2bc6f084fdf170ad5bcc11f0038ab>
> <https://github.com/shadow-maint/shadow/commit/15882a5f904b3c277d73254a6953c1051db55df4>

Those don't look to me like they're much to do with size/length *naming* 
confusion.  It's a conceptual confusion about whether the value needed by 
a particular API includes a null terminator or not, not about what you 
call size and what you call length.  As such, a name without "length" 
wouldn't help, because if you say countof, there would still be the same 
confusion about whether the bytes you are counting are meant to include a 
null terminator or not.

You could maybe avoid some cases of such off-by-one errors by language 
features that tie an array length more closely to a pointer (such as 
.IDENTIFIER proposals where IDENTIFIER is required to be const size_t, in 
cases where a pointer-to-VLA is passed, if there were appropriate 
constraints to require a matching pair of const size_t object and pointer 
to [.IDENTIFIER] VLA to be passed from caller to callee - more general 
versions with such strict requirements about passing matching pairs would 
be less likely to ensure correct sizes everywhere, and this idea about 
ensuring matching pairs isn't in N3188, it's an idea combining things from 
multiple papers).  But I think naming is essentially orthogonal to any 
kind of language feature that might enable reliable bounded pointers.
Alejandro Colomar Oct. 9, 2024, 6:48 p.m. UTC | #23
Hi Joseph,

On Wed, Oct 09, 2024 at 05:05:16PM GMT, Joseph Myers wrote:
> On Wed, 9 Oct 2024, Alejandro Colomar wrote:
> 
> > I'm not fabricating, BTW.  Here's a list of off-by-one bugs in login
> > code, precisely due to this size-length naming issue:
> > <https://github.com/shadow-maint/shadow/commit/6551709e96b2bc6f084fdf170ad5bcc11f0038ab>
> > <https://github.com/shadow-maint/shadow/commit/15882a5f904b3c277d73254a6953c1051db55df4>
> 
> Those don't look to me like they're much to do with size/length *naming* 
> confusion.  It's a conceptual confusion about whether the value needed by 
> a particular API includes a null terminator or not, not about what you 
> call size and what you call length.

The documentation of an API starts by its prototype.

	void login_prompt(char *name, int len);
	void login_prompt(char *name, int size);

The former should _not_ include a NUL terminator in the argument.
The latter should.  If those names are meaningless, there are more
chances of being confused.

The bugs were introduced in
<https://github.com/shadow-maint/shadow/commit/3b7cc053872cf4ce77fd74dc037f43f34e951e83>
which changed old code that was misusing the term length for referring
to the size (or number of elements, pedantically), for code that used
the actual size.  The author of the change didn't give much meaning to
the difference between size and length, and thought they were
interchangeable, and so the bugs were introduced.

As long as one has a clear distinction, that wouldn't have happened.

>  As such, a name without "length" 
> wouldn't help, because if you say countof, there would still be the same 
> confusion about whether the bytes you are counting are meant to include a 
> null terminator or not.

There are 3 terms:

-  size:    Size in bytes of an object (possibly an array).
-  length:  Number of non-null characters in a string.
-  n:     Number of elements of an array.

When the array is of char, since sizeof(char)==1, size and n are
interchangeable, and both obviously include the NUL terminator.

If you prefer nelementsof() over countof(), I'm all-in for it.  Just ask
for it, and I'll send a patch using nelementsof().  countof() is a new
term, so it doesn't yet have a meaning (except as given by the
attribute), but it naturally fits more with number of elements.  But
from all of the terms that there are, length is the only one that
doesn't include the NUL, so count is fine.  As long as you don't use
length, it should include the NUL.

> 
> You could maybe avoid some cases of such off-by-one errors by language 
> features that tie an array length more closely to a pointer (such as 
> .IDENTIFIER proposals where IDENTIFIER is required to be const size_t, in 
> cases where a pointer-to-VLA is passed, if there were appropriate 
> constraints to require a matching pair of const size_t object and pointer 
> to [.IDENTIFIER] VLA to be passed from caller to callee - more general 
> versions with such strict requirements about passing matching pairs would 
> be less likely to ensure correct sizes everywhere, and this idea about 
> ensuring matching pairs isn't in N3188, it's an idea combining things from 
> multiple papers).

Yeah, Martin and I have the intention of moving in that direction.

countof() [or whatever the name is] will hopefully soon work on array
parameters.

>  But I think naming is essentially orthogonal to any 
> kind of language feature that might enable reliable bounded pointers.

I still think conceptual confusions like this one (two) start with API
design and documentation, which itself starts in API naming.


Have a lovely night!
Alex
Joseph Myers Oct. 9, 2024, 7:31 p.m. UTC | #24
On Wed, 9 Oct 2024, Alejandro Colomar wrote:

> The documentation of an API starts by its prototype.
> 
> 	void login_prompt(char *name, int len);
> 	void login_prompt(char *name, int size);
> 
> The former should _not_ include a NUL terminator in the argument.
> The latter should.  If those names are meaningless, there are more
> chances of being confused.

You need actual API *documentation*, not just expecting people to guess 
based on a name.

One of those commit messages refers to non-null-terminated utmp 
structures.  I'd say the actual error-prone antipattern seen here is the 
use of such arrays (fixed width, not necessarily null-terminated) to store 
things that might otherwise be thought of as strings, rather than anything 
to do with naming.
Jakub Łukasiewicz Oct. 9, 2024, 7:40 p.m. UTC | #25
On 2024-10-09 20:48 CEST, Alejandro Colomar <alx@kernel.org> wrote:
> countof() is a new term, so it doesn't yet have a meaning (except 
> as given by the attribute), but it naturally fits more with number 
> of elements.

How would you call, for example, a function that returns how many 
times a value is contained in a data structure (be it array, linked 
list, or any other)?

~ J.Ł.
Alejandro Colomar Oct. 9, 2024, 8:25 p.m. UTC | #26
Hi Joseph,

On Wed, Oct 09, 2024 at 07:31:50PM GMT, Joseph Myers wrote:
> On Wed, 9 Oct 2024, Alejandro Colomar wrote:
> 
> > The documentation of an API starts by its prototype.
> > 
> > 	void login_prompt(char *name, int len);
> > 	void login_prompt(char *name, int size);
> > 
> > The former should _not_ include a NUL terminator in the argument.
> > The latter should.  If those names are meaningless, there are more
> > chances of being confused.
> 
> You need actual API *documentation*, not just expecting people to guess 
> based on a name.

Every little bit adds up.  Documentation is simpler if there is naming
consistency.  We have SYNOPSISes in the man pages, and they're up front,
because they constitute an important part of the documentation.

If each manual page used a different naming convention, you'd have to
carefully read each page to understand an API.  And you'd have to be
extra careful about every little detail.

If instead there's a careful consistency across the entire Linux
man-pages project (for example), including naming consistency, you can
read a new page, and have a rough idea of how it works after just a few
looks at the page; hopefully even just by having a look at the SYNOPSIS
plus the first few lines of the DESCRIPTION.  Documentation should not
be surprising, but rather confirm what you already guessed by looking at
the API itself.

> One of those commit messages refers to non-null-terminated utmp 
> structures.  I'd say the actual error-prone antipattern seen here is the 
> use of such arrays (fixed width, not necessarily null-terminated) to store 
> things that might otherwise be thought of as strings, rather than anything 
> to do with naming.

Yeah, utmp(5) didn't help either.

It's hard to quantify how much each problem contributed to the actual
bugs, but I tend to think both factors contributed.


Have a lovely night!
Alex
Alejandro Colomar Oct. 9, 2024, 8:31 p.m. UTC | #27
Hi Jakub,

On Wed, Oct 09, 2024 at 09:40:11PM GMT, Jakub Łukasiewicz wrote:
> On 2024-10-09 20:48 CEST, Alejandro Colomar <alx@kernel.org> wrote:
> > countof() is a new term, so it doesn't yet have a meaning (except as
> > given by the attribute), but it naturally fits more with number of
> > elements.
> 
> How would you call, for example, a function that returns how many times a
> value is contained in a data structure (be it array, linked list, or any
> other)?

list_count() or similar would be a good name.

It's length that's dangerous to overload because (1) it's already used
by strings, and (2) strings have the danger of the NUL terminator which
is not counted by its length.

But for example, it's not dangerous to misuse size for the number of
elements of an array, because they're so obviously different that you'll
not introduce a bug easily.  I think it's okay to say wcslcpy() gets a
size as the third parameter, even if pedantically it's a number of
elements.  So, using "count" for both arrays, and user-defined types
such as linked lists, is just fine.  The only _dangerous_ term is
length.


Have a lovely night!
Alex

> 
> ~ J.Ł.
Joseph Myers Oct. 9, 2024, 9:11 p.m. UTC | #28
On Wed, 9 Oct 2024, Alejandro Colomar wrote:

> Every little bit adds up.  Documentation is simpler if there is naming
> consistency.  We have SYNOPSISes in the man pages, and they're up front,
> because they constitute an important part of the documentation.

We also have a convention for future standard C interfaces to put the 
length before the pointer so that a VLA parameter declaration can be used 
that makes very clear the intent for how many elements the array has, 
which seems much better for that purpose than relying on the name of a 
parameter.
Alejandro Colomar Oct. 9, 2024, 9:20 p.m. UTC | #29
On Wed, Oct 09, 2024 at 09:11:52PM GMT, Joseph Myers wrote:
> On Wed, 9 Oct 2024, Alejandro Colomar wrote:
> 
> > Every little bit adds up.  Documentation is simpler if there is naming
> > consistency.  We have SYNOPSISes in the man pages, and they're up front,
> > because they constitute an important part of the documentation.
> 
> We also have a convention for future standard C interfaces to put the 
> length before the pointer so that a VLA parameter declaration can be used 
> that makes very clear the intent for how many elements the array has, 
> which seems much better for that purpose than relying on the name of a 
> parameter.

I doubt that this will be doable for string functions.  Even newer
additions to <string.h> will most likely have the size as the last
element, if just for consistency with the existing APIs.  And this issue
is primarily a string issue, so it won't be solved.

[.identifier] is more likely to help with this.

Cheers,
Alex

> 
> -- 
> Joseph S. Myers
> josmyers@redhat.com
>
Xavier Del Campo Romero Oct. 10, 2024, 4:13 a.m. UTC | #30
Hello,

Could you please unCC me from this discussion? Despite I originally made this proposal, I no longer have an opinion on the subject and there is not much I can add to the discussion anyway.

Thank you all very much for your efforts into improving C.

Best regards,
--
Xavier Del Campo Romero



9 Oct 2024, 23:20 by alx@kernel.org:

> On Wed, Oct 09, 2024 at 09:11:52PM GMT, Joseph Myers wrote:
>
>> On Wed, 9 Oct 2024, Alejandro Colomar wrote:
>>
>> > Every little bit adds up.  Documentation is simpler if there is naming
>> > consistency.  We have SYNOPSISes in the man pages, and they're up front,
>> > because they constitute an important part of the documentation.
>>
>> We also have a convention for future standard C interfaces to put the 
>> length before the pointer so that a VLA parameter declaration can be used 
>> that makes very clear the intent for how many elements the array has, 
>> which seems much better for that purpose than relying on the name of a 
>> parameter.
>>
>
> I doubt that this will be doable for string functions.  Even newer
> additions to <string.h> will most likely have the size as the last
> element, if just for consistency with the existing APIs.  And this issue
> is primarily a string issue, so it won't be solved.
>
> [.identifier] is more likely to help with this.
>
> Cheers,
> Alex
>
>>
>> -- 
>> Joseph S. Myers
>> josmyers@redhat.com
>>
>
> -- 
> <https://www.alejandro-colomar.es/>
>
Alejandro Colomar Oct. 15, 2024, 10:03 a.m. UTC | #31
Hi Joseph,

On Wed, Oct 09, 2024 at 09:11:52PM GMT, Joseph Myers wrote:
> On Wed, 9 Oct 2024, Alejandro Colomar wrote:
> 
> > Every little bit adds up.  Documentation is simpler if there is naming
> > consistency.  We have SYNOPSISes in the man pages, and they're up front,
> > because they constitute an important part of the documentation.
> 
> We also have a convention for future standard C interfaces to put the 
> length before the pointer so that a VLA parameter declaration can be used 
> that makes very clear the intent for how many elements the array has, 
> which seems much better for that purpose than relying on the name of a 
> parameter.

Just as a confirmation of what I already said: none of the arguments
convince me.  They seem mitigations to the damage that overloading the
term length can do.

I stand on my proposal of either __nelementsof__(), __countof__() (with
no preference), any derivative of those, or almost anything that doesn't
derive from "length" (well, I also veto "dimension", "extent", and
"range", for different reasons, but anything else is fair game).

If you want _Lengthof, please sed(1) it yourself and sign the patch
below my signature.  I don't think you (or myself) can convince me of
changing my mind, so it's up to you to decide what you want to do.

I think it would be good to have this in GCC 15, so if you're convinced
of _Lengthof(), please go ahead already.  I don't think delaying this
further will change the mind of any of us.


Have a lovely day!
Alex