
libcpp, c++: Optimize initializers using #embed in C++

Message ID Zpd3BUloKu1g2zKK@tucnak
State New
Series libcpp, c++: Optimize initializers using #embed in C++

Commit Message

Jakub Jelinek July 17, 2024, 7:47 a.m. UTC
Hi!

This patch sits on top of the
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655012.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655013.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-July/657049.html
patches, which just introduce non-optimized support for the C23 feature
and two extensions to it, and on top of the
https://gcc.gnu.org/pipermail/gcc-patches/2024-July/657053.html
patch, which adds optimizations to C and the middle-end.  It adds similar
optimizations to the C++ FE.
The first hunk enables use of CPP_EMBED token even for C++, not just
C; the preprocessor guarantees there is always a CPP_NUMBER CPP_COMMA
before CPP_EMBED and CPP_COMMA CPP_NUMBER after it which simplifies
parsing (unless #embed is more than 2GB, in that case it could be
CPP_NUMBER CPP_COMMA CPP_EMBED CPP_COMMA CPP_EMBED CPP_COMMA CPP_EMBED
CPP_COMMA CPP_NUMBER etc. with each CPP_EMBED covering at most INT_MAX
bytes).
Similarly to the C patch, this patch parses it into a RAW_DATA_CST tree
in braced initializers (and from there peels it into INTEGER_CSTs unless
it is an initializer of a std::byte array or an integral array with CHAR_BIT
element precision).  In cp_parser_expression it parses CPP_EMBED into just
the last INTEGER_CST in it, because I think users don't need millions of
-Wunused-value warnings because they wrote something useless like
  int a = (
  #embed "megabyte.dat"
  );
and so most of the inner INTEGER_CSTs would be there just for the warning.
In the remaining contexts (template argument list, function argument
list, attribute argument list, ...) it parses it into a sequence of
INTEGER_CSTs (I wrote range/iterator classes to simplify that).
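
To make the token layout described above concrete, here is a minimal standalone sketch of how many CPP_EMBED tokens an #embed of a given size yields.  It mirrors the expression in the finish_embed hunk of this patch; the function name embed_token_count and the is_asm parameter are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Number of CPP_EMBED tokens for an #embed of `limit` bytes, following the
// expression used in the finish_embed hunk.  The first and the last byte stay
// CPP_NUMBER tokens, so only limit - 2 bytes go into CPP_EMBED tokens, and
// each such token covers at most INT_MAX bytes.  Inputs shorter than 64 bytes
// and assembler input get no CPP_EMBED token at all.
// (embed_token_count and is_asm are illustrative names, not libcpp API.)
std::uint64_t embed_token_count(std::uint64_t limit, bool is_asm = false) {
  if (is_asm || limit < 64)
    return 0;
  const std::uint64_t inner = limit - 2;
  const std::uint64_t max = INT_MAX;
  // Ceiling division of inner by INT_MAX.
  return inner / max + (inner % max != 0);
}
```

Under this sketch, the 492329008-byte cc1plus used in the timings maps to a single CPP_EMBED token, and a second token only appears once the file exceeds INT_MAX + 2 bytes.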

My dumb
cat embed-11.c
constexpr unsigned char a[] = {
#embed "cc1plus"
};
const unsigned char *b = a;
testcase where cc1plus is 492329008 bytes long when configured
--enable-checking=yes,rtl,extra against recent binutils with .base64 gas
support results in:
time ./xg++ -B ./ -S -O2 embed-11.c

real	0m4.350s
user	0m2.427s
sys	0m0.830s
time ./xg++ -B ./ -c -O2 embed-11.c

real	0m6.932s
user	0m6.034s
sys	0m0.888s
(compared to running out of memory or very long compilation).
On a shorter inclusion,
cat embed-12.c
constexpr unsigned char a[] = {
#embed "xg++"
};
const unsigned char *b = a;
where xg++ is 15225904 bytes long, GCC with the #embed patchset except
for this patch takes:
time ~/src/gcc/obj36/gcc/xg++ -B ~/src/gcc/obj36/gcc/ -S -O2 embed-12.c

real	0m33.190s
user	0m32.327s
sys	0m0.790s
and with this patch:
time ./xg++ -B ./ -S -O2 embed-12.c

real	0m0.118s
user	0m0.090s
sys	0m0.028s
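
One way to picture the difference: with this patch the braced initializer keeps the embedded bytes as a single RAW_DATA_CST element rather than one INTEGER_CST per byte, so index lookups (as in the find_array_ctor_elt and cxx_eval_array_reference changes) have to treat that one element as covering a range of indexes.  The following is a simplified standalone model of such a lookup, not GCC code; Elt and lookup are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a constructor element: a starting index plus
// either a single value (data == nullptr, len == 1) or a run of raw bytes.
struct Elt {
  std::size_t index;          // index of the first array element covered
  std::size_t len;            // number of array elements covered
  const unsigned char *data;  // raw bytes for a run, nullptr otherwise
  int value;                  // the value when this is a single element
};

// Binary search over elements sorted by index.  A run matches every index in
// [index, index + len).  Returns the value at `want`, or -1 if not covered.
int lookup(const std::vector<Elt> &elts, std::size_t want) {
  std::size_t begin = 0, end = elts.size();
  while (begin < end) {
    std::size_t middle = begin + (end - begin) / 2;
    const Elt &e = elts[middle];
    if (want < e.index)
      end = middle;
    else if (want >= e.index + e.len)
      begin = middle + 1;
    else
      return e.data ? e.data[want - e.index] : e.value;
  }
  return -1;
}
```

In this model a multi-megabyte inclusion is three elements (first byte, run, last byte) instead of millions, which is the kind of saving the timings above reflect.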

The patch doesn't change what the first patch in the series introduces,
even for C++, namely that #embed is expanded (actually or as if)
into a sequence of literals like
127,69,76,70,2,1,1,3,0,0,0,0,0,0,0,0,2,0,62,0,1,0,0,0,80,211,64,0,0,0,0,0,64,0,0,0,0,0,0,0,8,253
and so each element has int type.
That is how I believe it is in C23.  The different versions of the
C++ P1967 paper specified some casts there, P1967R12 in particular
"Otherwise, the integral constant expression is the value of std::fgetc’s return is cast
to unsigned char."
but please see the comment at
https://github.com/llvm/llvm-project/pull/97274#issuecomment-2230929277
and consider whether we really want the preprocessor to preprocess it for
C++ as (or as-if)
static_cast<unsigned char>(127),static_cast<unsigned char>(69),static_cast<unsigned char>(76),static_cast<unsigned char>(70),static_cast<unsigned char>(2),...
i.e. 9 tokens per byte rather than 2, or
(unsigned char)127,(unsigned char)69,...
or
((unsigned char)127),((unsigned char)69),...
etc.
Without a literal suffix for unsigned char constants this is horrible,
and it adds an incompatibility between C and C++.  Sure, we could use the
magic form more often for C++ to save size, use the 9 (or however many)
tokens form only for the boundary constants, and use
#embed "." __gnu__::__base64__("...")
for what is in between if there are at least 2 tokens inside of it.
The choice of form matters too, e.g. (unsigned char)127 and
static_cast<unsigned char>(127) behave differently if there is
constexpr long long p[] = { ... };
...
#embed __FILE__
[p]
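
A small sketch of that last point, as I read it: a trailing [p] after the final embedded literal binds differently depending on which cast form the expansion uses.  Only the value 127 and the name p come from the example above; the array size, its contents, and the names Result and subscript_after_cast are made up for the demonstration, and the expected truncated value assumes an 8-bit unsigned char.

```cpp
#include <cassert>
#include <cstddef>

// What a trailing "[p]" sees under two of the candidate expansions.
struct Result {
  long long c_cast;              // value of (unsigned char)127[p]
  long long static_cast_;        // value of static_cast<unsigned char>(127)[p]
  std::size_t c_cast_size;       // sizeof the first expression
  std::size_t static_cast_size;  // sizeof the second expression
};

Result subscript_after_cast() {
  long long p[128] = {};
  p[127] = 1000;
  Result r;
  // Postfix [] binds tighter than a C-style cast, so this parses as
  // (unsigned char)(127[p]): p[127] truncated to unsigned char.
  r.c_cast = (unsigned char)127[p];
  // static_cast<...>(127) is itself a postfix-expression, so [p] applies to
  // the cast result: p[(unsigned char)127], which is a long long.
  r.static_cast_ = static_cast<unsigned char>(127)[p];
  r.c_cast_size = sizeof((unsigned char)127[p]);
  r.static_cast_size = sizeof(static_cast<unsigned char>(127)[p]);
  return r;
}
```

So the two spellings give expressions of different type and, here, different value, while the plain literal form 127[p] matches the static_cast reading.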

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk if
the rest of the series is approved?

2024-07-17  Jakub Jelinek  <jakub@redhat.com>

libcpp/ChangeLog:
	* files.cc (finish_embed): Use CPP_EMBED even for C++.
gcc/cp/ChangeLog:
	* cp-tree.h (class raw_data_iterator): New type.
	(class raw_data_range): New type.
	* parser.cc (cp_parser_postfix_open_square_expression): Handle
	parsing of CPP_EMBED.
	(cp_parser_parenthesized_expression_list): Likewise.  Use
	cp_lexer_next_token_is.
	(cp_parser_expression): Handle parsing of CPP_EMBED.
	(cp_parser_template_argument_list): Likewise.
	(cp_parser_initializer_list): Likewise.
	(cp_parser_oacc_clause_tile): Likewise.
	(cp_parser_omp_tile_sizes): Likewise.
	* pt.cc (tsubst_expr): Handle RAW_DATA_CST.
	* constexpr.cc (reduced_constant_expression_p): Likewise.
	(raw_data_cst_elt): New function.
	(find_array_ctor_elt): Handle RAW_DATA_CST.
	(cxx_eval_array_reference): Likewise.
	* typeck2.cc (digest_init_r): Emit -Wnarrowing and/or -Wconversion
	diagnostics.
	(process_init_constructor_array): Handle RAW_DATA_CST.
	* decl.cc (maybe_deduce_size_from_array_init): Likewise.
	(is_direct_enum_init): Fail for RAW_DATA_CST.
	(cp_maybe_split_raw_data): New function.
	(reshape_init_array_1): Add VECTOR_P argument.  Handle RAW_DATA_CST.
	(reshape_init_array): Adjust reshape_init_array_1 caller.
	(reshape_init_vector): Likewise.
	(reshape_init_class): Handle RAW_DATA_CST.
	(reshape_init_r): Likewise.
gcc/testsuite/ChangeLog:
	* c-c++-common/cpp/embed-22.c: New test.
	* c-c++-common/cpp/embed-23.c: New test.
	* g++.dg/cpp/embed-4.C: New test.
	* g++.dg/cpp/embed-5.C: New test.
	* g++.dg/cpp/embed-6.C: New test.
	* g++.dg/cpp/embed-7.C: New test.
	* g++.dg/cpp/embed-8.C: New test.
	* g++.dg/cpp/embed-9.C: New test.
	* g++.dg/cpp/embed-10.C: New test.
	* g++.dg/cpp/embed-11.C: New test.
	* g++.dg/cpp/embed-12.C: New test.


	Jakub

Comments

Jason Merrill July 24, 2024, 1:38 a.m. UTC | #1
On 7/17/24 3:47 AM, Jakub Jelinek wrote:
> Hi!
> 
> This patch on top of the
> https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655012.html
> https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655013.html
> https://gcc.gnu.org/pipermail/gcc-patches/2024-July/657049.html
> patches which just introduce non-optimized support for the C23 feature
> and two extensions to it actually optimizes it and on top of the
> https://gcc.gnu.org/pipermail/gcc-patches/2024-July/657053.html
> patch which adds optimizations to C & middle-end adds similar
> optimizations to the C++ FE.
> The first hunk enables use of CPP_EMBED token even for C++, not just
> C; the preprocessor guarantees there is always a CPP_NUMBER CPP_COMMA
> before CPP_EMBED and CPP_COMMA CPP_NUMBER after it which simplifies
> parsing (unless #embed is more than 2GB, in that case it could be
> CPP_NUMBER CPP_COMMA CPP_EMBED CPP_COMMA CPP_EMBED CPP_COMMA CPP_EMBED
> CPP_COMMA CPP_NUMBER etc. with each CPP_EMBED covering at most INT_MAX
> bytes).
> Similarly to the C patch, this patch parses it into RAW_DATA_CST tree
> in the braced initializers (and from there peels into INTEGER_CSTs unless
> it is an initializer of an std::byte array or integral array with CHAR_BIT
> element precision), parses CPP_EMBED in cp_parser_expression into just
> the last INTEGER_CST in it because I think users don't need millions of
> -Wunused-value warnings because they did useless
>    int a = (
>    #embed "megabyte.dat"
>    );
> and so most of the inner INTEGER_CSTs would be there just for the warning,
> and in the rest of contexts like template argument list, function argument
> list, attribute argument list, ...) parse it into a sequence of INTEGER_CSTs
> (I wrote a range/iterator classes to simplify that).
> 
> My dumb
> cat embed-11.c
> constexpr unsigned char a[] = {
> #embed "cc1plus"
> };
> const unsigned char *b = a;
> testcase where cc1plus is 492329008 bytes long when configured
> --enable-checking=yes,rtl,extra against recent binutils with .base64 gas
> support results in:
> time ./xg++ -B ./ -S -O2 embed-11.c
> 
> real	0m4.350s
> user	0m2.427s
> sys	0m0.830s
> time ./xg++ -B ./ -c -O2 embed-11.c
> 
> real	0m6.932s
> user	0m6.034s
> sys	0m0.888s
> (compared to running out of memory or very long compilation).
> On a shorter inclusion,
> cat embed-12.c
> constexpr unsigned char a[] = {
> #embed "xg++"
> };
> const unsigned char *b = a;
> where xg++ is 15225904 bytes long, this takes using GCC with the #embed
> patchset except for this patch:
> time ~/src/gcc/obj36/gcc/xg++ -B ~/src/gcc/obj36/gcc/ -S -O2 embed-12.c
> 
> real	0m33.190s
> user	0m32.327s
> sys	0m0.790s
> and with this patch:
> time ./xg++ -B ./ -S -O2 embed-12.c
> 
> real	0m0.118s
> user	0m0.090s
> sys	0m0.028s
> 
> The patch doesn't change anything on what the first patch in the series
> introduces even for C++, namely that #embed is expanded (actually or as if)
> into a sequence of literals like
> 127,69,76,70,2,1,1,3,0,0,0,0,0,0,0,0,2,0,62,0,1,0,0,0,80,211,64,0,0,0,0,0,64,0,0,0,0,0,0,0,8,253
> and so each element has int type.
> That is how I believe it is in C23, and the different versions of the
> C++ P1967 paper specified there some casts, P1967R12 in particular
> "Otherwise, the integral constant expression is the value of std::fgetc’s return is cast
> to unsigned char."
> but please see
> https://github.com/llvm/llvm-project/pull/97274#issuecomment-2230929277
> comment and whether we really want the preprocessor to preprocess it for
> C++ as (or as-if)
> static_cast<unsigned char>(127),static_cast<unsigned char>(69),static_cast<unsigned char>(76),static_cast<unsigned char>(70),static_cast<unsigned char>(2),...
> i.e. 9 tokens per byte rather than 2, or
> (unsigned char)127,(unsigned char)69,...
> or
> ((unsigned char)127),((unsigned char)69),...
> etc.

The discussion at that link suggests that the author is planning to 
propose removing the cast.

> @@ -6895,16 +6918,68 @@ reshape_init_array_1 (tree elt_type, tre
>       {
>         tree elt_init;
>         constructor_elt *old_cur = d->cur;
> +      const char *old_ptr = NULL;
> +
> +      if (TREE_CODE (d->cur->value) == RAW_DATA_CST)
> +	old_ptr = RAW_DATA_POINTER (d->cur->value);

Let's call this variable old_raw_data_ptr for clarity, here and in 
reshape_init_class.

>   
>         if (d->cur->index)
>   	CONSTRUCTOR_IS_DESIGNATED_INIT (new_init) = true;
>         check_array_designated_initializer (d->cur, index);
> -      elt_init = reshape_init_r (elt_type, d,
> -				 /*first_initializer_p=*/NULL_TREE,
> -				 complain);
> +      if (TREE_CODE (d->cur->value) == RAW_DATA_CST
> +	  && (TREE_CODE (elt_type) == INTEGER_TYPE
> +	      || (TREE_CODE (elt_type) == ENUMERAL_TYPE
> +		  && TYPE_CONTEXT (TYPE_MAIN_VARIANT (elt_type)) == std_node
> +		  && strcmp (TYPE_NAME_STRING (TYPE_MAIN_VARIANT (elt_type)),
> +			     "byte") == 0))

Maybe is_byte_access_type?  Or finally factor out a function to test 
specifically for std::byte, it's odd that we don't have one yet.
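
For reference, the set of element types the quoted condition accepts can be approximated at the source level with a type trait.  This is only a sketch, assuming C++17: keeps_raw_data_v is a hypothetical name, and it approximates the front-end check (INTEGER_TYPE or std::byte, with CHAR_BIT precision) rather than reproducing it.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// True for element types whose arrays could keep the embedded bytes in raw
// form under the quoted condition: std::byte, or an integer type that is one
// byte wide.  bool is excluded because it is not an INTEGER_TYPE in the
// front end and its precision is not CHAR_BIT.
template <class T>
constexpr bool keeps_raw_data_v =
    std::is_same_v<std::remove_cv_t<T>, std::byte> ||
    (std::is_integral_v<T> && !std::is_same_v<std::remove_cv_t<T>, bool> &&
     sizeof(T) == 1);
```

Arrays of char, signed char, unsigned char and std::byte satisfy the trait; arrays of int or long long do not, and for those the data would be peeled into individual constants.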

> @@ -7158,6 +7244,7 @@ reshape_init_class (tree type, reshape_i
>   	     is initialized by the designated-initializer-list { D }, where D
>   	     is the designated- initializer-clause naming a member of the
>   	     anonymous union member."  */
> +	  gcc_checking_assert (TREE_CODE (d->cur->value) != RAW_DATA_CST);

Is there a test of trying to use #embed as a designated initializer?  I 
don't see one.

> @@ -7358,9 +7461,16 @@ reshape_init_r (tree type, reshape_iter
>   	 valid aggregate initialization.  */
>         && !first_initializer_p
>         && (same_type_ignoring_top_level_qualifiers_p (type, TREE_TYPE (init))
> -	  || can_convert_arg (type, TREE_TYPE (init), init, LOOKUP_NORMAL,
> -			      complain)))
> +	  || can_convert_arg (type, TREE_TYPE (init),
> +			      TREE_CODE (init) == RAW_DATA_CST
> +			      ? build_int_cst (integer_type_node,
> +					       *(const unsigned char *)
> +					       RAW_DATA_POINTER (init))
> +			      : init,
> +			      LOOKUP_NORMAL, complain)))
>       {
> +      if (tree raw_init = cp_maybe_split_raw_data (d))
> +	return raw_init;
>         d->cur++;
>         return init;

This split-or-++ pattern seems to repeat a lot in reshape_init_r, could 
we factor it out to avoid problems with people forgetting one or the 
other?  Maybe consume_init (d) or d->consume_init ()?
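
Roughly what I have in mind, shown on a much simplified standalone model rather than on the real reshape_iter: Init, ReshapeIter and the member layout are hypothetical, and only the split-or-advance logic follows cp_maybe_split_raw_data plus d->cur++ from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified model: each initializer is either one value (data == nullptr)
// or a run of raw bytes standing in for a RAW_DATA_CST.
struct Init {
  const unsigned char *data;  // raw bytes for a run, nullptr otherwise
  std::size_t len;            // bytes left in the run (1 for a single value)
  int value;                  // the value when this is a single element
};

struct ReshapeIter {
  std::vector<Init> inits;
  std::size_t cur = 0;

  bool done() const { return cur == inits.size(); }

  // Return the next single value.  For a raw run, peel off its first byte
  // and keep the remainder in place; otherwise advance to the next
  // initializer.  Callers never have to choose between the two themselves.
  int consume_init() {
    assert(!done());
    Init &i = inits[cur];
    if (i.data != nullptr) {
      assert(i.len >= 2);  // runs are assumed to hold at least two bytes
      int ret = *i.data;
      ++i.data;
      --i.len;
      if (i.len == 1) {  // one byte left: demote the run to a plain value
        i.value = *i.data;
        i.data = nullptr;
      }
      return ret;
    }
    ++cur;
    return i.value;
  }
};
```

With a single entry point like this, a caller that forgets the split case cannot silently skip the rest of a run.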

Jason

Patch

--- libcpp/files.cc.jj	2024-07-12 14:13:34.854093279 +0200
+++ libcpp/files.cc	2024-07-12 14:17:58.677783797 +0200
@@ -1241,8 +1241,7 @@  finish_embed (cpp_reader *pfile, _cpp_fi
     limit = params->limit;
 
   size_t embed_tokens = 0;
-  if (!CPP_OPTION (pfile, cplusplus)
-      && CPP_OPTION (pfile, lang) != CLK_ASM
+  if (CPP_OPTION (pfile, lang) != CLK_ASM
       && limit >= 64)
     embed_tokens = ((limit - 2) / INT_MAX) + (((limit - 2) % INT_MAX) != 0);
 
--- gcc/cp/cp-tree.h.jj	2024-07-12 14:03:23.863727788 +0200
+++ gcc/cp/cp-tree.h	2024-07-16 09:53:24.260884437 +0200
@@ -1000,6 +1000,54 @@  public:
   lkp_iterator end() { return lkp_iterator (NULL_TREE); }
 };
 
+/* Iterator for a RAW_DATA_CST.  */
+
+class raw_data_iterator {
+  tree t;
+  unsigned int n;
+
+ public:
+  explicit raw_data_iterator (tree t, unsigned int n)
+    : t (t), n (n)
+  {
+  }
+
+  operator bool () const
+  {
+    return n < (unsigned) RAW_DATA_LENGTH (t);
+  }
+
+  raw_data_iterator &operator++ ()
+  {
+    ++n;
+    return *this;
+  }
+
+  tree operator* () const
+  {
+    return build_int_cst (TREE_TYPE (t),
+			  ((const unsigned char *) RAW_DATA_POINTER (t))[n]);
+  }
+
+  bool operator== (const raw_data_iterator &o) const
+  {
+    return t == o.t && n == o.n;
+  }
+};
+
+/* Treat a tree as a range of raw_data_iterator, e.g.
+   for (tree f : raw_data_range (d)) { ... }  */
+
+class raw_data_range
+{
+  tree t;
+public:
+  raw_data_range (tree t) : t (t) { }
+  raw_data_iterator begin () { return raw_data_iterator (t, 0); }
+  raw_data_iterator end ()
+  { return raw_data_iterator (t, RAW_DATA_LENGTH (t)); }
+};
+
 /* hash traits for declarations.  Hashes potential overload sets via
    DECL_NAME.  */
 
--- gcc/cp/parser.cc.jj	2024-07-12 14:03:23.898727351 +0200
+++ gcc/cp/parser.cc	2024-07-16 10:12:17.573343570 +0200
@@ -8366,6 +8366,19 @@  cp_parser_postfix_open_square_expression
 	{
 	  while (true)
 	    {
+	      /* Handle #embed in the expression-list.  */
+	      if (cp_lexer_next_token_is (parser->lexer, CPP_EMBED))
+		{
+		  tree raw_data = cp_lexer_peek_token (parser->lexer)->u.value;
+		  cp_lexer_consume_token (parser->lexer);
+		  vec_safe_reserve (expression_list,
+				    RAW_DATA_LENGTH (raw_data));
+		  for (tree argument : raw_data_range (raw_data))
+		    expression_list->quick_push (argument);
+		  cp_parser_require (parser, CPP_COMMA, RT_COMMA);
+		  continue;
+		}
+
 	      cp_expr expr
 		= cp_parser_parenthesized_expression_list_elt (parser,
 							       /*cast_p=*/
@@ -8833,12 +8846,27 @@  cp_parser_parenthesized_expression_list
 	/* At the beginning of attribute lists, check to see if the
 	   next token is an identifier.  */
 	if (is_attribute_list == id_attr
-	    && cp_lexer_peek_token (parser->lexer)->type == CPP_NAME)
+	    && cp_lexer_next_token_is (parser->lexer, CPP_NAME))
 	  expr = cp_lexer_consume_token (parser->lexer)->u.value;
 	else if (is_attribute_list == assume_attr)
 	  expr = cp_parser_conditional_expression (parser);
 	else if (is_attribute_list == uneval_string_attr)
 	  expr = cp_parser_unevaluated_string_literal (parser);
+	else if (cp_lexer_next_token_is (parser->lexer, CPP_EMBED))
+	  {
+	    /* Handle #embed in the argument list.  */
+	    tree raw_data = cp_lexer_peek_token (parser->lexer)->u.value;
+	    location_t loc = cp_lexer_peek_token (parser->lexer)->location;
+	    cp_lexer_consume_token (parser->lexer);
+	    vec_safe_reserve (expression_list, RAW_DATA_LENGTH (raw_data));
+	    for (tree arg : raw_data_range (raw_data))
+	      if (wrap_locations_p)
+		expression_list->quick_push (maybe_wrap_with_location (arg,
+								       loc));
+	      else
+		expression_list->quick_push (arg);
+	    goto get_comma;
+	  }
 	else
 	  expr
 	    = cp_parser_parenthesized_expression_list_elt (parser, cast_p,
@@ -10921,8 +10949,24 @@  cp_parser_expression (cp_parser* parser,
       cp_expr assignment_expression;
 
       /* Parse the next assignment-expression.  */
-      assignment_expression
-	= cp_parser_assignment_expression (parser, pidk, cast_p, decltype_p);
+      if (cp_lexer_next_token_is (parser->lexer, CPP_EMBED))
+	{
+	  /* Users aren't interested in millions of -Wunused-value
+	     warnings when using #embed inside of a comma expression,
+	     and one CPP_NUMBER plus CPP_COMMA before it and one
+	     CPP_COMMA plus CPP_NUMBER after it is guaranteed by
+	     the preprocessor.  Thus, parse the whole CPP_EMBED just
+	     as a single INTEGER_CST, the last byte in it.  */
+	  tree raw_data = cp_lexer_peek_token (parser->lexer)->u.value;
+	  location_t loc = cp_lexer_peek_token (parser->lexer)->location;
+	  cp_lexer_consume_token (parser->lexer);
+	  assignment_expression
+	    = *raw_data_iterator (raw_data, RAW_DATA_LENGTH (raw_data) - 1);
+	  assignment_expression.set_location (loc);
+	}
+      else
+	assignment_expression
+	  = cp_parser_assignment_expression (parser, pidk, cast_p, decltype_p);
 
       /* We don't create a temporary for a call that is the immediate operand
 	 of decltype or on the RHS of a comma.  But when we see a comma, we
@@ -19575,6 +19619,17 @@  cp_parser_template_argument_list (cp_par
 	/* Consume the comma.  */
 	cp_lexer_consume_token (parser->lexer);
 
+      /* Handle #embed in the argument list.  */
+      if (cp_lexer_next_token_is (parser->lexer, CPP_EMBED))
+	{
+	  tree raw_data = cp_lexer_peek_token (parser->lexer)->u.value;
+	  cp_lexer_consume_token (parser->lexer);
+	  args.reserve (RAW_DATA_LENGTH (raw_data), false);
+	  for (tree argument : raw_data_range (raw_data))
+	    args.quick_push (argument);
+	  continue;
+	}
+
       /* Parse the template-argument.  */
       tree argument = cp_parser_template_argument (parser);
 
@@ -26598,10 +26653,17 @@  cp_parser_initializer_list (cp_parser* p
 	first_designator = designator;
 
       /* Parse the initializer.  */
-      initializer = cp_parser_initializer_clause (parser,
-						  (non_constant_p != nullptr
-						   ? &clause_non_constant_p
-						   : nullptr));
+      if (cp_lexer_next_token_is (parser->lexer, CPP_EMBED))
+	{
+	  initializer = cp_lexer_peek_token (parser->lexer)->u.value;
+	  clause_non_constant_p = false;
+	  cp_lexer_consume_token (parser->lexer);
+	}
+      else
+	initializer = cp_parser_initializer_clause (parser,
+						    (non_constant_p != nullptr
+						     ? &clause_non_constant_p
+						     : nullptr));
       /* If any clause is non-constant, so is the entire initializer.  */
       if (non_constant_p && clause_non_constant_p)
 	*non_constant_p = true;
@@ -39017,6 +39079,15 @@  cp_parser_oacc_clause_tile (cp_parser *p
 	  cp_lexer_consume_token (parser->lexer);
 	  expr = integer_zero_node;
 	}
+      else if (cp_lexer_next_token_is (parser->lexer, CPP_EMBED))
+	{
+	  /* Handle #embed in the size-expr-list.  */
+	  tree raw_data = cp_lexer_peek_token (parser->lexer)->u.value;
+	  cp_lexer_consume_token (parser->lexer);
+	  for (tree argument : raw_data_range (raw_data))
+	    tile = tree_cons (NULL_TREE, argument, tile);
+	  continue;
+	}
       else
 	expr = cp_parser_constant_expression (parser);
 
@@ -47632,6 +47703,16 @@  cp_parser_omp_tile_sizes (cp_parser *par
       if (sizes && !cp_parser_require (parser, CPP_COMMA, RT_COMMA))
 	return error_mark_node;
 
+      if (cp_lexer_next_token_is (parser->lexer, CPP_EMBED))
+	{
+	  /* Handle #embed in the size-expr-list.  */
+	  tree raw_data = cp_lexer_peek_token (parser->lexer)->u.value;
+	  cp_lexer_consume_token (parser->lexer);
+	  for (tree argument : raw_data_range (raw_data))
+	    sizes = tree_cons (NULL_TREE, argument, sizes);
+	  continue;
+	}
+
       tree expr = cp_parser_constant_expression (parser);
       if (expr == error_mark_node)
 	{
--- gcc/cp/pt.cc.jj	2024-07-12 14:03:23.908727226 +0200
+++ gcc/cp/pt.cc	2024-07-15 18:36:16.075729634 +0200
@@ -21657,6 +21657,14 @@  tsubst_expr (tree t, tree args, tsubst_f
 	RETURN (r);
       }
 
+    case RAW_DATA_CST:
+      {
+	tree type = tsubst (TREE_TYPE (t), args, complain, in_decl);
+	r = copy_node (t);
+	TREE_TYPE (r) = type;
+	RETURN (r);
+      }
+
     case PTRMEM_CST:
       /* These can sometimes show up in a partial instantiation, but never
 	 involve template parms.  */
--- gcc/cp/constexpr.cc.jj	2024-07-12 14:03:23.834728149 +0200
+++ gcc/cp/constexpr.cc	2024-07-15 17:33:16.111144086 +0200
@@ -3448,7 +3448,13 @@  reduced_constant_expression_p (tree t)
 		    return false;
 		  if (TREE_CODE (e.index) == RANGE_EXPR)
 		    cursor = TREE_OPERAND (e.index, 1);
-		  cursor = int_const_binop (PLUS_EXPR, cursor, size_one_node);
+		  if (TREE_CODE (e.value) == RAW_DATA_CST)
+		    cursor
+		      = int_const_binop (PLUS_EXPR, cursor,
+					 size_int (RAW_DATA_LENGTH (e.value)));
+		  else
+		    cursor = int_const_binop (PLUS_EXPR, cursor,
+					      size_one_node);
 		}
 	      if (find_array_ctor_elt (t, max) == -1)
 		return false;
@@ -4057,6 +4063,22 @@  array_index_cmp (tree key, tree index)
     }
 }
 
+/* Extract a single INTEGER_CST from RAW_DATA_CST RAW_DATA at
+   relative index OFF.  */
+
+static tree
+raw_data_cst_elt (tree raw_data, unsigned int off)
+{
+  return build_int_cst (TREE_TYPE (raw_data),
+			TYPE_UNSIGNED (TREE_TYPE (raw_data))
+			? (HOST_WIDE_INT)
+			  (((const unsigned char *)
+			    RAW_DATA_POINTER (raw_data))[off])
+			: (HOST_WIDE_INT)
+			  (((const signed char *)
+			    RAW_DATA_POINTER (raw_data))[off]));
+}
+
 /* Returns the index of the constructor_elt of ARY which matches DINDEX, or -1
    if none.  If INSERT is true, insert a matching element rather than fail.  */
 
@@ -4081,10 +4103,11 @@  find_array_ctor_elt (tree ary, tree dind
       if (cindex == NULL_TREE)
 	{
 	  /* Verify that if the last index is missing, all indexes
-	     are missing.  */
+	     are missing and there is no RAW_DATA_CST.  */
 	  if (flag_checking)
 	    for (unsigned int j = 0; j < len - 1; ++j)
-	      gcc_assert ((*elts)[j].index == NULL_TREE);
+	      gcc_assert ((*elts)[j].index == NULL_TREE
+			  && TREE_CODE ((*elts)[j].value) != RAW_DATA_CST);
 	  if (i < end)
 	    return i;
 	  else
@@ -4107,6 +4130,11 @@  find_array_ctor_elt (tree ary, tree dind
 	{
 	  if (i < end)
 	    return i;
+	  tree value = (*elts)[end - 1].value;
+	  if (TREE_CODE (value) == RAW_DATA_CST
+	      && wi::to_widest (dindex) < (wi::to_widest (cindex)
+					   + RAW_DATA_LENGTH (value)))
+	    begin = end - 1;
 	  else
 	    begin = end;
 	}
@@ -4120,12 +4148,59 @@  find_array_ctor_elt (tree ary, tree dind
       tree idx = elt.index;
 
       int cmp = array_index_cmp (dindex, idx);
+      if (cmp > 0
+	  && TREE_CODE (elt.value) == RAW_DATA_CST
+	  && wi::to_widest (dindex) < (wi::to_widest (idx)
+				       + RAW_DATA_LENGTH (elt.value)))
+	cmp = 0;
       if (cmp < 0)
 	end = middle;
       else if (cmp > 0)
 	begin = middle + 1;
       else
 	{
+	  if (insert && TREE_CODE (elt.value) == RAW_DATA_CST)
+	    {
+	      /* We need to split the RAW_DATA_CST elt.  */
+	      constructor_elt e;
+	      gcc_checking_assert (TREE_CODE (idx) != RANGE_EXPR);
+	      unsigned int off = (wi::to_widest (dindex)
+				  - wi::to_widest (idx)).to_uhwi ();
+	      tree value = elt.value;
+	      unsigned int len = RAW_DATA_LENGTH (value);
+	      if (off > 1 && len >= off + 3)
+		value = copy_node (elt.value);
+	      if (off)
+		{
+		  if (off > 1)
+		    RAW_DATA_LENGTH (elt.value) = off;
+		  else
+		    elt.value = raw_data_cst_elt (elt.value, 0);
+		  e.index = size_binop (PLUS_EXPR, elt.index,
+					build_int_cst (TREE_TYPE (elt.index),
+						       off));
+		  e.value = NULL_TREE;
+		  ++middle;
+		  vec_safe_insert (CONSTRUCTOR_ELTS (ary), middle, e);
+		}
+	      (*elts)[middle].value = raw_data_cst_elt (value, off);
+	      if (len >= off + 2)
+		{
+		  e.index = (*elts)[middle].index;
+		  e.index = size_binop (PLUS_EXPR, e.index,
+					build_one_cst (TREE_TYPE (e.index)));
+		  if (len >= off + 3)
+		    {
+		      RAW_DATA_LENGTH (value) -= off + 1;
+		      RAW_DATA_POINTER (value) += off + 1;
+		      e.value = value;
+		    }
+		  else
+		    e.value = raw_data_cst_elt (value, off + 1);
+		  vec_safe_insert (CONSTRUCTOR_ELTS (ary), middle + 1, e);
+		}
+	      return middle;
+	    }
 	  if (insert && TREE_CODE (idx) == RANGE_EXPR)
 	    {
 	      /* We need to split the range.  */
@@ -4481,7 +4556,17 @@  cxx_eval_array_reference (const constexp
     {
       tree r;
       if (TREE_CODE (ary) == CONSTRUCTOR)
-	r = (*CONSTRUCTOR_ELTS (ary))[i].value;
+	{
+	  r = (*CONSTRUCTOR_ELTS (ary))[i].value;
+	  if (TREE_CODE (r) == RAW_DATA_CST)
+	    {
+	      tree ridx = (*CONSTRUCTOR_ELTS (ary))[i].index;
+	      gcc_checking_assert (ridx);
+	      unsigned int off
+		= (wi::to_widest (index) - wi::to_widest (ridx)).to_uhwi ();
+	      r = raw_data_cst_elt (r, off);
+	    }
+	}
       else if (TREE_CODE (ary) == VECTOR_CST)
 	r = VECTOR_CST_ELT (ary, i);
       else
--- gcc/cp/typeck2.cc.jj	2024-07-02 22:06:53.591463682 +0200
+++ gcc/cp/typeck2.cc	2024-07-15 15:37:41.565862537 +0200
@@ -1310,6 +1310,40 @@  digest_init_r (tree type, tree init, int
 	 a parenthesized list.  */
       if (nested && !(flags & LOOKUP_AGGREGATE_PAREN_INIT))
 	flags |= LOOKUP_NO_NARROWING;
+      if (TREE_CODE (init) == RAW_DATA_CST && !TYPE_UNSIGNED (type))
+	{
+	  tree ret = init;
+	  if ((flags & LOOKUP_NO_NARROWING) || warn_conversion)
+	    for (unsigned int i = 0;
+		 i < (unsigned) RAW_DATA_LENGTH (init); ++i)
+	      if (((const signed char *)
+		   RAW_DATA_POINTER (init))[i] < 0)
+		{
+		  if ((flags & LOOKUP_NO_NARROWING))
+		    {
+		      tree elt
+			= build_int_cst (integer_type_node,
+					 ((const unsigned char *)
+					  RAW_DATA_POINTER (init))[i]);
+		      if (!check_narrowing (type, elt, complain, false))
+			{
+			  if (!(complain & tf_warning_or_error))
+			    ret = error_mark_node;
+			  continue;
+			}
+		    }
+		  if (warn_conversion)
+		    warning (OPT_Wconversion,
+			     "conversion from %qT to %qT changes value from "
+			     "%qd to %qd",
+			     integer_type_node, type,
+			     ((const unsigned char *)
+			      RAW_DATA_POINTER (init))[i],
+			     ((const signed char *)
+			      RAW_DATA_POINTER (init))[i]);
+		}
+	  return ret;
+	}
       init = convert_for_initialization (0, type, init, flags,
 					 ICR_INIT, NULL_TREE, 0,
 					 complain);
@@ -1558,7 +1592,7 @@  static int
 process_init_constructor_array (tree type, tree init, int nested, int flags,
 				tsubst_flags_t complain)
 {
-  unsigned HOST_WIDE_INT i, len = 0;
+  unsigned HOST_WIDE_INT i, j, len = 0;
   int picflags = 0;
   bool unbounded = false;
   constructor_elt *ce;
@@ -1601,11 +1635,12 @@  process_init_constructor_array (tree typ
 	return PICFLAG_ERRONEOUS;
     }
 
+  j = 0;
   FOR_EACH_VEC_SAFE_ELT (v, i, ce)
     {
       if (!ce->index)
-	ce->index = size_int (i);
-      else if (!check_array_designated_initializer (ce, i))
+	ce->index = size_int (j);
+      else if (!check_array_designated_initializer (ce, j))
 	ce->index = error_mark_node;
       gcc_assert (ce->value);
       ce->value
@@ -1627,6 +1662,10 @@  process_init_constructor_array (tree typ
 	  CONSTRUCTOR_PLACEHOLDER_BOUNDARY (init) = 1;
 	  CONSTRUCTOR_PLACEHOLDER_BOUNDARY (ce->value) = 0;
 	}
+      if (TREE_CODE (ce->value) == RAW_DATA_CST)
+	j += RAW_DATA_LENGTH (ce->value);
+      else
+	++j;
     }
 
   /* No more initializers. If the array is unbounded, we are done. Otherwise,
--- gcc/cp/decl.cc.jj	2024-07-12 14:03:23.870727700 +0200
+++ gcc/cp/decl.cc	2024-07-16 22:44:25.156545691 +0200
@@ -6471,18 +6471,22 @@  maybe_deduce_size_from_array_init (tree
 	{
 	  vec<constructor_elt, va_gc> *v = CONSTRUCTOR_ELTS (initializer);
 	  constructor_elt *ce;
-	  HOST_WIDE_INT i;
+	  HOST_WIDE_INT i, j = 0;
 	  FOR_EACH_VEC_SAFE_ELT (v, i, ce)
 	    {
 	      if (instantiation_dependent_expression_p (ce->index))
 		return;
-	      if (!check_array_designated_initializer (ce, i))
+	      if (!check_array_designated_initializer (ce, j))
 		failure = 1;
 	      /* If an un-designated initializer is type-dependent, we can't
 		 check brace elision yet.  */
 	      if (ce->index == NULL_TREE
 		  && type_dependent_expression_p (ce->value))
 		return;
+	      if (TREE_CODE (ce->value) == RAW_DATA_CST)
+		j += RAW_DATA_LENGTH (ce->value);
+	      else
+		++j;
 	    }
 	}
 
@@ -6836,6 +6840,7 @@  is_direct_enum_init (tree type, tree ini
       && TREE_CODE (init) == CONSTRUCTOR
       && CONSTRUCTOR_IS_DIRECT_INIT (init)
       && CONSTRUCTOR_NELTS (init) == 1
+      && TREE_CODE (CONSTRUCTOR_ELT (init, 0)->value) != RAW_DATA_CST
       /* DR 2374: The single element needs to be implicitly
 	 convertible to the underlying type of the enum.  */
       && !type_dependent_expression_p (CONSTRUCTOR_ELT (init, 0)->value)
@@ -6847,6 +6852,22 @@  is_direct_enum_init (tree type, tree ini
   return false;
 }
 
+/* Helper function for reshape_init*.  Split first element of
+   RAW_DATA_CST and save the rest to d->cur->value.  */
+
+static tree
+cp_maybe_split_raw_data (reshape_iter *d)
+{
+  if (TREE_CODE (d->cur->value) != RAW_DATA_CST)
+    return NULL_TREE;
+  tree ret = *raw_data_iterator (d->cur->value, 0);
+  ++RAW_DATA_POINTER (d->cur->value);
+  --RAW_DATA_LENGTH (d->cur->value);
+  if (RAW_DATA_LENGTH (d->cur->value) == 1)
+    d->cur->value = *raw_data_iterator (d->cur->value, 0);
+  return ret;
+}
+
 /* Subroutine of reshape_init_array and reshape_init_vector, which does
    the actual work. ELT_TYPE is the element type of the array. MAX_INDEX is an
    INTEGER_CST representing the size of the array minus one (the maximum index),
@@ -6855,7 +6876,8 @@  is_direct_enum_init (tree type, tree ini
 
 static tree
 reshape_init_array_1 (tree elt_type, tree max_index, reshape_iter *d,
-		      tree first_initializer_p, tsubst_flags_t complain)
+		      tree first_initializer_p, bool vector_p,
+		      tsubst_flags_t complain)
 {
   tree new_init;
   bool sized_array_p = (max_index && TREE_CONSTANT (max_index));
@@ -6888,6 +6910,7 @@  reshape_init_array_1 (tree elt_type, tre
 	max_index_cst = tree_to_uhwi (fold_convert (size_type_node, max_index));
     }
 
+  constructor_elt *first_cur = d->cur;
   /* Loop until there are no more initializers.  */
   for (index = 0;
        d->cur != d->end && (!sized_array_p || index <= max_index_cst);
@@ -6895,16 +6918,68 @@  reshape_init_array_1 (tree elt_type, tre
     {
       tree elt_init;
       constructor_elt *old_cur = d->cur;
+      const char *old_ptr = NULL;
+
+      if (TREE_CODE (d->cur->value) == RAW_DATA_CST)
+	old_ptr = RAW_DATA_POINTER (d->cur->value);
 
       if (d->cur->index)
 	CONSTRUCTOR_IS_DESIGNATED_INIT (new_init) = true;
       check_array_designated_initializer (d->cur, index);
-      elt_init = reshape_init_r (elt_type, d,
-				 /*first_initializer_p=*/NULL_TREE,
-				 complain);
+      if (TREE_CODE (d->cur->value) == RAW_DATA_CST
+	  && (TREE_CODE (elt_type) == INTEGER_TYPE
+	      || (TREE_CODE (elt_type) == ENUMERAL_TYPE
+		  && TYPE_CONTEXT (TYPE_MAIN_VARIANT (elt_type)) == std_node
+		  && strcmp (TYPE_NAME_STRING (TYPE_MAIN_VARIANT (elt_type)),
+			     "byte") == 0))
+	  && TYPE_PRECISION (elt_type) == CHAR_BIT
+	  && (!sized_array_p || index < max_index_cst)
+	  && !vector_p)
+	{
+	  elt_init = d->cur->value;
+	  if (!sized_array_p
+	      || ((unsigned) RAW_DATA_LENGTH (d->cur->value)
+		  <= max_index_cst - index + 1))
+	    d->cur++;
+	  else
+	    {
+	      unsigned int len = max_index_cst - index + 1;
+	      if ((unsigned) RAW_DATA_LENGTH (d->cur->value) == len + 1)
+		d->cur->value
+		  = build_int_cst (integer_type_node,
+				   ((const unsigned char *)
+				    RAW_DATA_POINTER (d->cur->value))[len]);
+	      else
+		{
+		  d->cur->value = copy_node (elt_init);
+		  RAW_DATA_LENGTH (d->cur->value) -= len;
+		  RAW_DATA_POINTER (d->cur->value) += len;
+		}
+	      RAW_DATA_LENGTH (elt_init) = len;
+	    }
+	  TREE_TYPE (elt_init) = elt_type;
+	}
+      else
+	elt_init = reshape_init_r (elt_type, d,
+				   /*first_initializer_p=*/NULL_TREE,
+				   complain);
       if (elt_init == error_mark_node)
 	return error_mark_node;
       tree idx = size_int (index);
+      if (reuse && old_ptr && d->cur == old_cur)
+	{
+	  /* We need to stop reusing as some RAW_DATA_CST in the original
+	     ctor had to be split.  */
+	  new_init = build_constructor (init_list_type_node, NULL);
+	  if (index)
+	    {
+	      vec_safe_grow (CONSTRUCTOR_ELTS (new_init), index);
+	      memcpy (CONSTRUCTOR_ELT (new_init, 0), first_cur,
+		      (d->cur - first_cur)
+		      * sizeof (*CONSTRUCTOR_ELT (new_init, 0)));
+	    }
+	  reuse = false;
+	}
       if (reuse)
 	{
 	  old_cur->index = idx;
@@ -6917,8 +6992,15 @@  reshape_init_array_1 (tree elt_type, tre
 	TREE_CONSTANT (new_init) = false;
 
       /* This can happen with an invalid initializer (c++/54501).  */
-      if (d->cur == old_cur && !sized_array_p)
+      if (d->cur == old_cur
+	  && !sized_array_p
+	  && (old_ptr == NULL
+	      || (TREE_CODE (d->cur->value) == RAW_DATA_CST
+		  && RAW_DATA_POINTER (d->cur->value) == old_ptr)))
 	break;
+
+      if (TREE_CODE (elt_init) == RAW_DATA_CST)
+	index += RAW_DATA_LENGTH (elt_init) - 1;
     }
 
   return new_init;
@@ -6939,7 +7021,7 @@  reshape_init_array (tree type, reshape_i
     max_index = array_type_nelts (type);
 
   return reshape_init_array_1 (TREE_TYPE (type), max_index, d,
-			       first_initializer_p, complain);
+			       first_initializer_p, false, complain);
 }
 
 /* Subroutine of reshape_init_r, processes the initializers for vectors.
@@ -6971,7 +7053,7 @@  reshape_init_vector (tree type, reshape_
     max_index = size_int (TYPE_VECTOR_SUBPARTS (type) - 1);
 
   return reshape_init_array_1 (TREE_TYPE (type), max_index, d,
-			       NULL_TREE, complain);
+			       NULL_TREE, true, complain);
 }
 
 /* Subroutine of reshape_init*: We're initializing an element with TYPE from
@@ -7044,8 +7126,12 @@  reshape_init_class (tree type, reshape_i
     {
       tree field_init;
       constructor_elt *old_cur = d->cur;
+      const char *old_ptr = NULL;
       bool direct_desig = false;
 
+      if (TREE_CODE (d->cur->value) == RAW_DATA_CST)
+	old_ptr = RAW_DATA_POINTER (d->cur->value);
+
       /* Handle C++20 designated initializers.  */
       if (d->cur->index)
 	{
@@ -7158,6 +7244,7 @@  reshape_init_class (tree type, reshape_i
 	     is initialized by the designated-initializer-list { D }, where D
 	     is the designated- initializer-clause naming a member of the
 	     anonymous union member."  */
+	  gcc_checking_assert (TREE_CODE (d->cur->value) != RAW_DATA_CST);
 	  field_init = reshape_single_init (TREE_TYPE (field),
 					    d->cur->value, complain);
 	  d->cur++;
@@ -7170,7 +7257,11 @@  reshape_init_class (tree type, reshape_i
       if (field_init == error_mark_node)
 	return error_mark_node;
 
-      if (d->cur == old_cur && d->cur->index)
+      if (d->cur == old_cur
+	  && d->cur->index
+	  && (old_ptr == NULL
+	      || (TREE_CODE (d->cur->value) == RAW_DATA_CST
+		  && RAW_DATA_POINTER (d->cur->value) == old_ptr)))
 	{
 	  /* This can happen with an invalid initializer for a flexible
 	     array member (c++/54441).  */
@@ -7205,8 +7296,11 @@  reshape_init_class (tree type, reshape_i
      correspond to all remaining elements of the initializer list (if any).  */
   if (last_was_pack_expansion)
     {
+      tree init = d->cur->value;
+      if (tree raw_init = cp_maybe_split_raw_data (d))
+	init = raw_init;
       CONSTRUCTOR_APPEND_ELT (CONSTRUCTOR_ELTS (new_init),
-			      last_was_pack_expansion, d->cur->value);
+			      last_was_pack_expansion, init);
       while (d->cur != d->end)
 	d->cur++;
     }
@@ -7258,7 +7352,10 @@  reshape_init_r (tree type, reshape_iter
     {
       /* A complex type can be initialized from one or two initializers,
 	 but braces are not elided.  */
-      d->cur++;
+      if (tree raw_init = cp_maybe_split_raw_data (d))
+	init = raw_init;
+      else
+	d->cur++;
       if (BRACE_ENCLOSED_INITIALIZER_P (stripped_init))
 	{
 	  if (CONSTRUCTOR_NELTS (stripped_init) > 2)
@@ -7273,10 +7370,13 @@  reshape_init_r (tree type, reshape_iter
 	{
 	  vec<constructor_elt, va_gc> *v = 0;
 	  CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, init);
-	  CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, d->cur->value);
+	  tree raw_init = cp_maybe_split_raw_data (d);
+	  CONSTRUCTOR_APPEND_ELT (v, NULL_TREE,
+				  raw_init ? raw_init : d->cur->value);
 	  if (has_designator_problem (d, complain))
 	    return error_mark_node;
-	  d->cur++;
+	  if (!raw_init)
+	    d->cur++;
 	  init = build_constructor (init_list_type_node, v);
 	}
       return init;
@@ -7324,6 +7424,8 @@  reshape_init_r (tree type, reshape_iter
 	  else
 	    maybe_warn_cpp0x (CPP0X_INITIALIZER_LISTS);
 	}
+      else if (tree raw_init = cp_maybe_split_raw_data (d))
+	return raw_init;
 
       d->cur++;
       return init;
@@ -7337,6 +7439,7 @@  reshape_init_r (tree type, reshape_iter
       /* But not if it's a designated init.  */
       && !d->cur->index
       && d->end - d->cur == 1
+      && TREE_CODE (init) != RAW_DATA_CST
       && reference_related_p (type, TREE_TYPE (init)))
     {
       d->cur++;
@@ -7358,9 +7461,16 @@  reshape_init_r (tree type, reshape_iter
 	 valid aggregate initialization.  */
       && !first_initializer_p
       && (same_type_ignoring_top_level_qualifiers_p (type, TREE_TYPE (init))
-	  || can_convert_arg (type, TREE_TYPE (init), init, LOOKUP_NORMAL,
-			      complain)))
+	  || can_convert_arg (type, TREE_TYPE (init),
+			      TREE_CODE (init) == RAW_DATA_CST
+			      ? build_int_cst (integer_type_node,
+					       *(const unsigned char *)
+					       RAW_DATA_POINTER (init))
+			      : init,
+			      LOOKUP_NORMAL, complain)))
     {
+      if (tree raw_init = cp_maybe_split_raw_data (d))
+	return raw_init;
       d->cur++;
       return init;
     }
@@ -7463,7 +7573,7 @@  reshape_init_r (tree type, reshape_iter
   else if (VECTOR_TYPE_P (type))
     new_init = reshape_init_vector (type, d, complain);
   else
-    gcc_unreachable();
+    gcc_unreachable ();
 
   if (braces_elided_p
       && TREE_CODE (new_init) == CONSTRUCTOR)
--- gcc/testsuite/c-c++-common/cpp/embed-22.c.jj	2024-07-15 18:57:18.013860745 +0200
+++ gcc/testsuite/c-c++-common/cpp/embed-22.c	2024-07-15 19:01:49.146451109 +0200
@@ -0,0 +1,28 @@ 
+/* { dg-do run } */
+/* { dg-options "-O2 -Wno-psabi" } */
+/* { dg-additional-options "-std=c23" { target c } } */
+
+typedef unsigned char V __attribute__((vector_size (128)));
+
+V a;
+
+void
+foo (void)
+{
+  V b = {
+    #embed __FILE__ limit (128) gnu::offset (3)
+  };
+  a = b;
+}
+
+const unsigned char c[] = {
+  #embed __FILE__ limit (128) gnu::offset (3)
+};
+
+int
+main ()
+{
+  foo ();
+  if (__builtin_memcmp (&c[0], &a, sizeof (a)))
+    __builtin_abort ();
+}
--- gcc/testsuite/c-c++-common/cpp/embed-23.c.jj	2024-07-16 12:41:11.514073178 +0200
+++ gcc/testsuite/c-c++-common/cpp/embed-23.c	2024-07-16 13:09:16.730670474 +0200
@@ -0,0 +1,36 @@ 
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+/* { dg-additional-options "-std=gnu23" { target c } } */
+
+typedef unsigned char V __attribute__((vector_size (16)));
+
+struct S { _Complex double a; V b; int c; };
+struct T { int a; struct S b; int c; struct S d; int e; unsigned char f[22]; _Complex long double g; };
+
+const unsigned char a[] = {
+  #embed __FILE__ limit (124)
+};
+const struct T b[2] = {
+  #embed __FILE__ limit (124)
+};
+
+int
+main ()
+{
+  for (int i = 0; i < 2; ++i)
+    if (b[i].a != a[i * 62]
+	|| __real__ b[i].b.a != a[i * 62 + 1]
+	|| __imag__ b[i].b.a
+	|| __builtin_memcmp (&b[i].b.b, &a[i * 62 + 2], 16)
+	|| b[i].b.c != a[i * 62 + 18]
+	|| b[i].c != a[i * 62 + 19]
+	|| __real__ b[i].d.a != a[i * 62 + 20]
+	|| __imag__ b[i].d.a
+	|| __builtin_memcmp (&b[i].d.b, &a[i * 62 + 21], 16)
+	|| b[i].d.c != a[i * 62 + 37]
+	|| b[i].e != a[i * 62 + 38]
+	|| __builtin_memcmp (&b[i].f[0], &a[i * 62 + 39], 22)
+	|| __real__ b[i].g != a[i * 62 + 61]
+	|| __imag__ b[i].g)
+      __builtin_abort ();
+}
--- gcc/testsuite/g++.dg/cpp/embed-4.C.jj	2024-07-15 17:46:54.113865890 +0200
+++ gcc/testsuite/g++.dg/cpp/embed-4.C	2024-07-15 17:48:15.000000000 +0200
@@ -0,0 +1,66 @@ 
+// { dg-do run { target c++11 } }
+// { dg-options "" }
+
+constexpr unsigned char a[] = {
+#embed __FILE__
+};
+
+constexpr unsigned char
+foo (int x)
+{
+  return a[x];
+}
+constexpr unsigned char b = a[32];
+constexpr unsigned char c = foo (42);
+
+#if __cplusplus >= 201402L
+constexpr bool
+bar ()
+{
+  unsigned char d[] = {
+  #embed __FILE__
+  };
+  d[42] = ' ';
+  d[32] = 'X';
+  d[0] = d[1] + 16;
+  d[sizeof (d) - 1] = d[42] - ' ';
+  for (int i = 0; i < sizeof (d); ++i)
+    switch (i)
+      {
+      case 0:
+	if (d[i] != a[1] + 16)
+	  return false;
+	break;
+      case 32:
+	if (d[i] != 'X')
+	  return false;
+	break;
+      case 42:
+	if (d[i] != ' ')
+	  return false;
+	break;
+      case sizeof (d) - 1:
+	if (d[i] != 0)
+	  return false;
+	break;
+      default:
+	if (d[i] != a[i])
+	  return false;
+	break;
+      }
+  return true;
+}
+
+static_assert (bar (), "");
+#endif
+
+int
+main ()
+{
+  unsigned char e[] = {
+  #embed __FILE__
+  };
+
+  if (b != e[32] || c != e[42])
+    __builtin_abort ();
+}
--- gcc/testsuite/g++.dg/cpp/embed-5.C.jj	2024-07-15 18:06:56.460845067 +0200
+++ gcc/testsuite/g++.dg/cpp/embed-5.C	2024-07-15 18:38:41.170905555 +0200
@@ -0,0 +1,72 @@ 
+// { dg-do run { target c++14 } }
+// { dg-options "" }
+
+template <typename T>
+constexpr T a[] = {
+#embed __FILE__
+};
+
+template <typename T>
+constexpr T
+foo (int x)
+{
+  return a<T>[x];
+}
+constexpr unsigned char b = a<unsigned char>[32];
+constexpr unsigned char c = foo<unsigned char> (42);
+constexpr int b2 = a<int>[32];
+constexpr int c2 = foo<int> (42);
+
+template <typename T>
+constexpr bool
+bar ()
+{
+  T d[] = {
+  #embed __FILE__
+  };
+  d[42] = ' ';
+  d[32] = 'X';
+  d[0] = d[1] + 16;
+  d[sizeof (d) / sizeof (T) - 1] = d[42] - ' ';
+  for (int i = 0; i < sizeof (d) / sizeof (T); ++i)
+    switch (i)
+      {
+      case 0:
+	if (d[i] != a<T>[1] + 16)
+	  return false;
+	break;
+      case 32:
+	if (d[i] != 'X')
+	  return false;
+	break;
+      case 42:
+	if (d[i] != ' ')
+	  return false;
+	break;
+      case sizeof (d) / sizeof (T) - 1:
+	if (d[i] != 0)
+	  return false;
+	break;
+      default:
+	if (d[i] != a<T>[i])
+	  return false;
+	break;
+      }
+  return true;
+}
+
+static_assert (bar<unsigned char> (), "");
+static_assert (bar<int> (), "");
+
+int
+main ()
+{
+  unsigned char e[] = {
+  #embed __FILE__
+  };
+
+  if (b != e[32] || c != e[42])
+    __builtin_abort ();
+  if (b2 != b || c2 != c)
+    __builtin_abort ();
+}
--- gcc/testsuite/g++.dg/cpp/embed-6.C.jj	2024-07-15 18:07:35.927349168 +0200
+++ gcc/testsuite/g++.dg/cpp/embed-6.C	2024-07-15 18:31:54.519017822 +0200
@@ -0,0 +1,72 @@ 
+// { dg-do run { target c++14 } }
+// { dg-options "" }
+
+template <typename T>
+constexpr unsigned char a[] = {
+#embed __FILE__
+};
+
+template <typename T>
+constexpr unsigned char
+foo (int x)
+{
+  return a<T>[x];
+}
+constexpr unsigned char b = a<unsigned char>[32];
+constexpr unsigned char c = foo<unsigned char> (42);
+constexpr unsigned char b2 = a<int>[32];
+constexpr unsigned char c2 = foo<int> (42);
+
+template <typename T>
+constexpr bool
+bar ()
+{
+  unsigned char d[] = {
+  #embed __FILE__
+  };
+  d[42] = ' ';
+  d[32] = 'X';
+  d[0] = d[1] + 16;
+  d[sizeof (d) - 1] = d[42] - ' ';
+  for (int i = 0; i < sizeof (d); ++i)
+    switch (i)
+      {
+      case 0:
+	if (d[i] != a<T>[1] + 16)
+	  return false;
+	break;
+      case 32:
+	if (d[i] != 'X')
+	  return false;
+	break;
+      case 42:
+	if (d[i] != ' ')
+	  return false;
+	break;
+      case sizeof (d) - 1:
+	if (d[i] != 0)
+	  return false;
+	break;
+      default:
+	if (d[i] != a<T>[i])
+	  return false;
+	break;
+      }
+  return true;
+}
+
+static_assert (bar<unsigned char> (), "");
+static_assert (bar<int> (), "");
+
+int
+main ()
+{
+  unsigned char e[] = {
+  #embed __FILE__
+  };
+
+  if (b != e[32] || c != e[42])
+    __builtin_abort ();
+  if (b2 != b || c2 != c)
+    __builtin_abort ();
+}
--- gcc/testsuite/g++.dg/cpp/embed-7.C.jj	2024-07-15 18:30:55.356761596 +0200
+++ gcc/testsuite/g++.dg/cpp/embed-7.C	2024-07-15 18:18:06.385427418 +0200
@@ -0,0 +1,7 @@ 
+// This is a comment with some UTF-8 non-ASCII characters: áéíóú.
+// { dg-do compile { target c++11 } }
+// { dg-options "" }
+
+const signed char a[] = {
+#embed __FILE__
+};	// { dg-error "narrowing conversion of '\[12]\[0-9]\[0-9]' from 'int' to 'const signed char'" }
--- gcc/testsuite/g++.dg/cpp/embed-8.C.jj	2024-07-15 18:30:58.879717302 +0200
+++ gcc/testsuite/g++.dg/cpp/embed-8.C	2024-07-15 18:34:17.199224101 +0200
@@ -0,0 +1,7 @@ 
+// This is a comment with some UTF-8 non-ASCII characters: áéíóú.
+// { dg-do compile { target c++11 } }
+// { dg-options "-Wno-narrowing -Wconversion" }
+
+const signed char a[] = {
+#embed __FILE__
+};	// { dg-warning "conversion from 'int' to 'const signed char' changes value from '\[12]\[0-9]\[0-9]' to '-\[0-9]\[0-9]*'" }
--- gcc/testsuite/g++.dg/cpp/embed-9.C.jj	2024-07-16 11:44:18.624617163 +0200
+++ gcc/testsuite/g++.dg/cpp/embed-9.C	2024-07-16 11:49:19.171768836 +0200
@@ -0,0 +1,57 @@ 
+// { dg-do run { target c++11 } }
+// { dg-options "--embed-dir=${srcdir}/c-c++-common/cpp/embed-dir" }
+
+const unsigned char m[] = {
+  #embed <magna-carta.txt> limit (131)
+};
+
+template <int ...N>
+int
+foo ()
+{
+  unsigned char a[] = { N... };
+  for (int i = 0; i < sizeof (a); ++i)
+    if (a[i] != m[i])
+      return -1;
+  return sizeof (a);
+}
+
+template <typename ...T>
+int
+bar (T... args)
+{
+  int a[] = { args... };
+  for (int i = 0; i < sizeof (a) / sizeof (a[0]); ++i)
+    if (a[i] != m[i])
+      return -1;
+  return sizeof (a) / sizeof (a[0]);
+}
+
+int
+main ()
+{
+  if (foo <
+    #embed <magna-carta.txt> limit (1)
+      > () != 1)
+    __builtin_abort ();
+  if (foo <
+    #embed <magna-carta.txt> limit (6)
+      > () != 6)
+    __builtin_abort ();
+  if (foo <
+    #embed <magna-carta.txt> limit (131)
+      > () != 131)
+    __builtin_abort ();
+  if (bar (
+    #embed <magna-carta.txt> limit (1)
+      ) != 1)
+    __builtin_abort ();
+  if (bar (
+    #embed <magna-carta.txt> limit (6)
+      ) != 6)
+    __builtin_abort ();
+  if (bar (
+    #embed <magna-carta.txt> limit (131)
+      ) != 131)
+    __builtin_abort ();
+}
--- gcc/testsuite/g++.dg/cpp/embed-10.C.jj	2024-07-16 11:50:28.571880216 +0200
+++ gcc/testsuite/g++.dg/cpp/embed-10.C	2024-07-16 11:57:46.462296213 +0200
@@ -0,0 +1,40 @@ 
+// { dg-do run { target c++23 } }
+// { dg-options "--embed-dir=${srcdir}/c-c++-common/cpp/embed-dir" }
+
+const unsigned char m[] = {
+  #embed <magna-carta.txt> limit (136)
+};
+
+struct S
+{
+  S () : a {} {}
+  template <typename ...T>
+  int &operator[] (T... args)
+  {
+    int b[] = { args... };
+    for (int i = 0; i < sizeof (b) / sizeof (b[0]); ++i)
+      if (b[i] != m[i])
+	return a[137];
+    return a[sizeof (b) / sizeof (b[0])];
+  }
+  int a[138];
+};
+
+S s;
+
+int
+main ()
+{
+  if (&s[
+      #embed <magna-carta.txt> limit (1)
+	] != &s.a[1])
+    __builtin_abort ();
+  if (&s[
+      #embed <magna-carta.txt> limit (6)
+	] != &s.a[6])
+    __builtin_abort ();
+  if (&s[
+      #embed <magna-carta.txt> limit (135)
+	] != &s.a[135])
+    __builtin_abort ();
+}
--- gcc/testsuite/g++.dg/cpp/embed-11.C.jj	2024-07-16 12:05:19.170536951 +0200
+++ gcc/testsuite/g++.dg/cpp/embed-11.C	2024-07-16 12:16:01.948346872 +0200
@@ -0,0 +1,41 @@ 
+// { dg-do run }
+// { dg-options "-Wunused-value" }
+
+#include <stdarg.h>
+
+const unsigned char a[] = {
+  #embed __FILE__ limit (128)
+};
+
+int
+foo (int x, ...)
+{
+  if (x != 42)
+    return 2;
+  va_list ap;
+  va_start (ap, x);
+  for (int i = 0; i < 128; ++i)
+    if (va_arg (ap, int) != a[i])
+      {
+	va_end (ap);
+	return 1;
+      }
+  va_end (ap);
+  return 0;
+}
+
+int b, c;
+
+int
+main ()
+{
+  if (foo (42,
+#embed __FILE__ limit (128)
+      ))
+    __builtin_abort ();
+  b = (
+#embed __FILE__ limit (128) prefix (c = 2 * ) suffix ( + 6)	// { dg-warning "right operand of comma operator has no effect" }
+  );
+  if (b != a[127] + 6 || c != 2 * a[0])
+    __builtin_abort ();
+}
--- gcc/testsuite/g++.dg/cpp/embed-12.C.jj	2024-07-16 12:07:19.451006766 +0200
+++ gcc/testsuite/g++.dg/cpp/embed-12.C	2024-07-16 12:29:27.601065723 +0200
@@ -0,0 +1,34 @@ 
+// { dg-do compile }
+// { dg-options "-Wnonnull" }
+
+#define A(n) int *p##n
+#define B(n) A(n##0), A(n##1), A(n##2), A(n##3), A(n##4), A(n##5), A(n##6), A(n##7)
+#define C(n) B(n##0), B(n##1), B(n##2), B(n##3), B(n##4), B(n##5), B(n##6), B(n##7)
+#define D C(0), C(1), C(2), C(3)
+
+void foo (D) __attribute__((nonnull (	// { dg-message "in a call to function '\[^\n\r]*' declared 'nonnull'" }
+#embed __FILE__ limit (128)
+)));
+#if __cplusplus >= 201103L
+[[gnu::nonnull (
+#embed __FILE__ limit (128)
+)]] void bar (D);	// { dg-message "in a call to function '\[^\n\r]*' declared 'nonnull'" "" { target c++11 } }
+#else
+void bar (D) __attribute__((nonnull (	// { dg-message "in a call to function '\[^\n\r]*' declared 'nonnull'" "" { target c++98_only } }
+#embed __FILE__ limit (128)
+)));
+#endif
+
+#undef A
+#if __cplusplus >= 201103L
+#define A(n) nullptr
+#else
+#define A(n) 0
+#endif
+
+void
+baz ()
+{
+  foo (D);	// { dg-warning "argument \[0-9]\+ null where non-null expected" }
+  bar (D);	// { dg-warning "argument \[0-9]\+ null where non-null expected" }
+}