Message ID | Zpg/3AiW41ccEeKL@tucnak |
---|---|
State | New |
Series | c++: Implement C++26 P2558R2 - Add @, $, and ` to the basic character set [PR110343] |
On 7/17/24 6:04 PM, Jakub Jelinek wrote:
> Hi!
>
> The following patch implements the easy parts of the paper.
> When @$` are added to the basic character set, it means that
> R"@$`()@$`" should now be valid (here I've noticed most of the
> raw string tests were tested solely with -std=c++11 or -std=gnu++11
> and I've tried to change that), and on the other side even if
> by extension $ is allowed in identifiers, \u0024 or \U00000024
> or \u{24} should not be, similarly how \u0041 is not allowed.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> The paper in 3.1 claims though that
> #include <stdio.h>
>
> #define STR(x) #x
>
> int main()
> {
>   printf("%s", STR(\u0060)); // U+0060 is ` GRAVE ACCENT
> }
> should have been accepted before this paper (and rejected after it),
> but g++ rejects it.
>
> I've tried to understand it, but am confused on what is the right
> behavior and why.
>
> Consider
> #define STR(x) #x
> const char *a = "\u00b7";
> const char *b = STR(\u00b7);
> const char *c = "\u0041";
> const char *d = STR(\u0041);
> const char *e = STR(a\u00b7);
> const char *f = STR(a\u0041);
> const char *g = STR(a \u00b7);
> const char *h = STR(a \u0041);
> const char *i = "\u066d";
> const char *j = STR(\u066d);
> const char *k = "\u0040";
> const char *l = STR(\u0040);
> const char *m = STR(a\u066d);
> const char *n = STR(a\u0040);
> const char *o = STR(a \u066d);
> const char *p = STR(a \u0040);
>
> Neither clang nor gcc emit any diagnostics on the a, c, i and k
> initializers, those are certainly valid (c is invalid in C23 though).  g++
> emits with -pedantic-errors errors on all the others, while clang++ on the
> ones with STR involving \u0041, \u0040 and a\u066d.  The chosen values are
> \u0040 '@' as something being changed by this paper, \u0041 'A' as basic
> character set char valid in identifiers before/after, \u00b7 as an example
> of character which is pedantically valid in identifiers if not at the start
> and \u066d as something pedantically not valid in identifiers.
>
> Now, https://eel.is/c++draft/lex.charset#6 says that UCN used outside of a
> string/character literal which corresponds to basic character set character
> (or control character) is ill-formed, that would make d, f, h cases invalid
> for C++ and l, n, p cases invalid for C++26.
>
> https://eel.is/c++draft/lex.name states which characters can appear at the
> start of the identifier and which can appear after the start.  And
> https://eel.is/c++draft/lex.pptoken states that preprocessing-token is
> either identifier, or tons of other things, or "each non-whitespace
> character that cannot be one of the above"
>
> Then https://eel.is/c++draft/lex.pptoken#1 says that this last category is
> invalid if the preprocessing token is being converted into token.
>
> And https://eel.is/c++draft/lex.pptoken#2 includes "If any character not in
> the basic character set matches the last category, the program is
> ill-formed."
>
> Now, e.g. for the C++23 STR(\u0040) case, \u0040 is there not in the basic
> character set, so valid outside of the literals (not the case anymore in
> C++26), but it isn't nondigit and doesn't have XID_Start property, so it
> isn't IMHO an identifier and so must be the "each non-whitespace character
> that cannot be one of the above" case.  Why doesn't the above mentioned
> https://eel.is/c++draft/lex.pptoken#2 sentence make that invalid?

Your argument makes sense to me, though...

> Ignoring
> that, I'd say it would be then stringized and that feels like it is what
> clang++ is doing.  Now, e.g. for the STR(a\u066d) case, I wonder why that
> isn't lexed as an identifier followed by \u066d "each non-whitespace
> character that cannot be one of the above" token and stringified similarly,
> clang++ rejects that.
>
> What GCC libcpp seems to be doing is that forms_identifier_p calls
> _cpp_valid_utf8 or _cpp_valid_ucn with an argument which tells it is first
> or second+ in identifier, and e.g. _cpp_valid_ucn then for UCNs valid in
> string literals calls
>   else if (identifier_pos)
>     {
>       int validity = ucn_valid_in_identifier (pfile, result, nst);
>
>       if (validity == 0)
>         cpp_error (pfile, CPP_DL_ERROR,
>                    "universal character %.*s is not valid in an identifier",
>                    (int) (str - base), base);
>       else if (validity == 2 && identifier_pos == 1)
>         cpp_error (pfile, CPP_DL_ERROR,
>    "universal character %.*s is not valid at the start of an identifier",
>                    (int) (str - base), base);
>     }
> so basically all those invalid in identifiers cases emit an error and
> pretend to be valid in identifiers, rather than what e.g. _cpp_valid_utf8
> does for C but not for C++ and only for the chars completely invalid in
> identifiers rather than just valid in identifiers but not at the start:
>           /* In C++, this is an error for invalid character in an identifier
>              because logically, the UTF-8 was converted to a UCN during
>              translation phase 1 (even though we don't physically do it that
>              way).  In C, this byte rather becomes grammatically a separate
>              token.  */
>
>           if (CPP_OPTION (pfile, cplusplus))
>             cpp_error (pfile, CPP_DL_ERROR,
>                        "extended character %.*s is not valid in an identifier",
>                        (int) (*pstr - base), base);
>           else
>             {
>               *pstr = base;
>               return false;
>             }
> The comment doesn't really match what is done in recent C++ versions because
> there UCNs are translated to characters and not the other way around.

...it seems wrong that calling forms_identifier_p gives an error and
returns true for characters that can't be part of an identifier, which I
would expect to produce a false result.  If we want to complain about
the pptoken#2 issue, that seems like it should happen in the CPP_OTHER
section of _cpp_lex_direct.

Our diagnostic for STR(\u0041) is similarly unhelpful, saying just "not
valid in an identifier" rather than anything about the basic character
set or that it should be spelled "A".

But if we're going to give an error either way, fixing this seems a low
priority.

> 2024-07-17  Jakub Jelinek  <jakub@redhat.com>
>
>     PR c++/110343
> libcpp/
>     * lex.cc: C++26 P2558R2 - Add @, $, and ` to the basic character set.
>     (lex_raw_string): For C++26 allow $@` characters in prefix.
>     * charset.cc (_cpp_valid_ucn): For C++26 reject \u0024 in identifiers.
> gcc/testsuite/
>     * c-c++-common/raw-string-1.c: Use { c || c++11 } effective target,
>     remove c++ specific dg-options.
>     * c-c++-common/raw-string-2.c: Likewise.
>     * c-c++-common/raw-string-4.c: Likewise.
>     * c-c++-common/raw-string-5.c: Likewise.  Expect some diagnostics
>     only for non-c++26, for c++26 expect different.
>     * c-c++-common/raw-string-6.c: Use { c || c++11 } effective target,
>     remove c++ specific dg-options.
>     * c-c++-common/raw-string-11.c: Likewise.
>     * c-c++-common/raw-string-13.c: Likewise.
>     * c-c++-common/raw-string-14.c: Likewise.
>     * c-c++-common/raw-string-15.c: Use { c || c++11 } effective target,
>     change c++ specific dg-options to just -Wtrigraphs.
>     * c-c++-common/raw-string-16.c: Likewise.
>     * c-c++-common/raw-string-17.c: Use { c || c++11 } effective target,
>     remove c++ specific dg-options.
>     * c-c++-common/raw-string-18.c: Use { c || c++11 } effective target,
>     remove -std=c++11 from c++ specific dg-options.
>     * c-c++-common/raw-string-19.c: Likewise.
>     * g++.dg/cpp26/raw-string1.C: New test.
>     * g++.dg/cpp26/raw-string2.C: New test.
>
> --- libcpp/lex.cc.jj 2024-07-17 11:36:49.897873247 +0200
> +++ libcpp/lex.cc 2024-07-17 20:04:43.936793506 +0200
> @@ -2718,7 +2718,10 @@ lex_raw_string (cpp_reader *pfile, cpp_t
>                || c == '*' || c == '+' || c == '-' || c == '/'
>                || c == '^' || c == '&' || c == '|' || c == '~'
>                || c == '!' || c == '=' || c == ','
> -              || c == '"' || c == '\''))
> +              || c == '"' || c == '\''
> +              || ((c == '$' || c == '@' || c == '`')
> +                  && CPP_OPTION (pfile, cplusplus)
> +                  && CPP_OPTION (pfile, lang) > CLK_CXX23)))
>              prefix[prefix_len++] = c;
>            else
>              {
> --- libcpp/charset.cc.jj 2024-01-05 08:35:13.696827331 +0100
> +++ libcpp/charset.cc 2024-07-17 20:18:13.665467035 +0200
> @@ -1808,7 +1808,12 @@ _cpp_valid_ucn (cpp_reader *pfile, const
>          result = 1;
>      }
>    else if (identifier_pos && result == 0x24
> -           && CPP_OPTION (pfile, dollars_in_ident))
> +           && CPP_OPTION (pfile, dollars_in_ident)
> +           /* In C++26 when dollars are allowed in identifiers,
> +              we should still reject \u0024 as $ is part of the basic
> +              character set.  */
> +           && !(CPP_OPTION (pfile, cplusplus)
> +                && CPP_OPTION (pfile, lang) > CLK_CXX23))

I wonder about moving $ handling into the next else, so we don't need to
worry about the basic charset here?  But the patch is OK.

Jason
On 7/17/24 6:04 PM, Jakub Jelinek wrote:
> The following patch implements the easy parts of the paper.
> [...]

I'm now seeing a -std=c++26 failure on g++.dg/cpp/ucn-1.C.

Jason
On Fri, Jul 26, 2024 at 11:43:13AM -0400, Jason Merrill wrote:
> I'm now seeing a -std=c++26 failure on g++.dg/cpp/ucn-1.C.
I don't remember seeing it when I wrote the patch, but today I see it as
well.
The following patch seems to fix that, tested on i686-linux, ok for trunk?
2024-07-26 Jakub Jelinek <jakub@redhat.com>
* g++.dg/cpp/ucn-1.C (main): Expect error on c\u0024c identifier also
for C++26.
--- gcc/testsuite/g++.dg/cpp/ucn-1.C.jj 2020-01-14 20:02:46.702611047 +0100
+++ gcc/testsuite/g++.dg/cpp/ucn-1.C 2024-07-26 17:52:33.881518790 +0200
@@ -9,7 +9,7 @@ int main()
 
   int c\u0041c; // { dg-error "not valid in an identifier" }
   // $ is OK on most targets; not part of basic source char set
-  int c\u0024c; // { dg-error "not valid in an identifier" "" { target { powerpc-ibm-aix* } } }
+  int c\u0024c; // { dg-error "not valid in an identifier" "" { target { { powerpc-ibm-aix* } || c++26 } } }
 
   U"\uD800"; // { dg-error "not a valid universal character" }
 
Jakub
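To make the rule behind the c\u0024c expectation concrete, here is a minimal standalone sketch of the [lex.charset] check discussed in this thread: a UCN outside a character or string literal is ill-formed if it names a control character or a member of the basic character set, and P2558R2 moves $, @ and ` into that set. The names Std, is_control, in_basic_charset and ucn_allowed_outside_literal are made up for illustration; this is not libcpp code, and it deliberately ignores the separate questions of surrogates and of which characters may appear in identifiers.

```cpp
// Illustrative sketch only; all names here are hypothetical, not libcpp's.
enum class Std { Cxx23, Cxx26 };

// C0 controls, DEL, and C1 controls.
constexpr bool is_control(char32_t c) {
  return c <= 0x1F || (c >= 0x7F && c <= 0x9F);
}

// Basic character set: tab, newline, vertical tab, form feed, plus the
// printable ASCII range, where $ (0x24), @ (0x40) and ` (0x60) are
// members only from C++26 on (P2558R2).
constexpr bool in_basic_charset(char32_t c, Std std) {
  if (c == 0x09 || c == 0x0A || c == 0x0B || c == 0x0C)
    return true;
  if (c < 0x20 || c > 0x7E)
    return false;
  if (c == 0x24 || c == 0x40 || c == 0x60)
    return std == Std::Cxx26;
  return true;
}

// True if \uXXXX naming code point c may appear outside a literal
// as far as this one rule is concerned.
constexpr bool ucn_allowed_outside_literal(char32_t c, Std std) {
  return !is_control(c) && !in_basic_charset(c, std);
}
```

Under this sketch \u0024 passes the check for C++23 (so whether c\u0024c is accepted is then up to the dollars-in-identifiers extension) and fails it for C++26, while \u0041 fails in both modes.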
On 7/26/24 11:55 AM, Jakub Jelinek wrote:
> On Fri, Jul 26, 2024 at 11:43:13AM -0400, Jason Merrill wrote:
>> I'm now seeing a -std=c++26 failure on g++.dg/cpp/ucn-1.C.
>
> I don't remember seeing it when I wrote the patch, but today I see it as
> well.
>
> The following patch seems to fix that, tested on i686-linux, ok for trunk?

OK.

> 2024-07-26  Jakub Jelinek  <jakub@redhat.com>
>
>     * g++.dg/cpp/ucn-1.C (main): Expect error on c\u0024c identifier also
>     for C++26.
>
> --- gcc/testsuite/g++.dg/cpp/ucn-1.C.jj 2020-01-14 20:02:46.702611047 +0100
> +++ gcc/testsuite/g++.dg/cpp/ucn-1.C 2024-07-26 17:52:33.881518790 +0200
> @@ -9,7 +9,7 @@ int main()
>
>    int c\u0041c; // { dg-error "not valid in an identifier" }
>    // $ is OK on most targets; not part of basic source char set
> -  int c\u0024c; // { dg-error "not valid in an identifier" "" { target { powerpc-ibm-aix* } } }
> +  int c\u0024c; // { dg-error "not valid in an identifier" "" { target { { powerpc-ibm-aix* } || c++26 } } }
>
>    U"\uD800"; // { dg-error "not a valid universal character" }
>
>
>     Jakub
>
--- libcpp/lex.cc.jj 2024-07-17 11:36:49.897873247 +0200
+++ libcpp/lex.cc 2024-07-17 20:04:43.936793506 +0200
@@ -2718,7 +2718,10 @@ lex_raw_string (cpp_reader *pfile, cpp_t
               || c == '*' || c == '+' || c == '-' || c == '/'
               || c == '^' || c == '&' || c == '|' || c == '~'
               || c == '!' || c == '=' || c == ','
-              || c == '"' || c == '\''))
+              || c == '"' || c == '\''
+              || ((c == '$' || c == '@' || c == '`')
+                  && CPP_OPTION (pfile, cplusplus)
+                  && CPP_OPTION (pfile, lang) > CLK_CXX23)))
             prefix[prefix_len++] = c;
           else
             {
--- libcpp/charset.cc.jj 2024-01-05 08:35:13.696827331 +0100
+++ libcpp/charset.cc 2024-07-17 20:18:13.665467035 +0200
@@ -1808,7 +1808,12 @@ _cpp_valid_ucn (cpp_reader *pfile, const
         result = 1;
     }
   else if (identifier_pos && result == 0x24
-           && CPP_OPTION (pfile, dollars_in_ident))
+           && CPP_OPTION (pfile, dollars_in_ident)
+           /* In C++26 when dollars are allowed in identifiers,
+              we should still reject \u0024 as $ is part of the basic
+              character set.  */
+           && !(CPP_OPTION (pfile, cplusplus)
+                && CPP_OPTION (pfile, lang) > CLK_CXX23))
     {
       if (CPP_OPTION (pfile, warn_dollars) && !pfile->state.skipping)
         {
--- gcc/testsuite/c-c++-common/raw-string-1.c.jj 2020-01-12 11:54:37.022404221 +0100
+++ gcc/testsuite/c-c++-common/raw-string-1.c 2024-07-17 20:31:02.272652757 +0200
@@ -1,7 +1,6 @@
-// { dg-do run }
+// { dg-do run { target { c || c++11 } } }
 // { dg-require-effective-target wchar }
 // { dg-options "-std=gnu99 -Wno-c++-compat -trigraphs" { target c } }
-// { dg-options "-std=c++0x" { target c++ } }
 
 #ifndef __cplusplus
 #include <wchar.h>
--- gcc/testsuite/c-c++-common/raw-string-2.c.jj 2020-01-12 11:54:37.023404206 +0100
+++ gcc/testsuite/c-c++-common/raw-string-2.c 2024-07-17 20:31:18.415446546 +0200
@@ -1,7 +1,6 @@
-// { dg-do run }
+// { dg-do run { target { c || c++11 } } }
 // { dg-require-effective-target wchar }
 // { dg-options "-std=gnu99 -Wno-c++-compat -trigraphs" { target c } }
-// { dg-options "-std=c++0x" { target c++ } }
 
 #ifndef __cplusplus
 #include <wchar.h>
--- gcc/testsuite/c-c++-common/raw-string-4.c.jj 2020-01-12 11:54:37.023404206 +0100
+++ gcc/testsuite/c-c++-common/raw-string-4.c 2024-07-17 20:31:51.590022777 +0200
@@ -1,7 +1,6 @@
 // R is not applicable for character literals.
-// { dg-do compile }
+// { dg-do compile { target { c || c++11 } } }
 // { dg-options "-std=gnu99" { target c } }
-// { dg-options "-std=c++0x" { target c++ } }
 
 const int i0 = R'a'; // { dg-error "was not declared|undeclared" "undeclared" }
   // { dg-error "expected ',' or ';'" "expected" { target c } .-1 }
--- gcc/testsuite/c-c++-common/raw-string-5.c.jj 2020-07-28 15:39:09.992756448 +0200
+++ gcc/testsuite/c-c++-common/raw-string-5.c 2024-07-17 20:56:46.522822013 +0200
@@ -1,6 +1,5 @@
-// { dg-do compile }
+// { dg-do compile { target { c || c++11 } } }
 // { dg-options "-std=gnu99" { target c } }
-// { dg-options "-std=c++0x" { target c++ } }
 
 const void *s0 = R"0123456789abcdefg()0123456789abcdefg" 0;
   // { dg-error "raw string delimiter longer" "longer" { target *-*-* } .-1 }
@@ -15,12 +14,18 @@ const void *s3 = R")())" 0;
   // { dg-error "invalid character" "invalid" { target *-*-* } .-1 }
   // { dg-error "stray" "stray" { target *-*-* } .-2 }
 const void *s4 = R"@()@" 0;
-  // { dg-error "invalid character" "invalid" { target *-*-* } .-1 }
-  // { dg-error "stray" "stray" { target *-*-* } .-2 }
+  // { dg-error "invalid character" "invalid" { target { c || c++23_down } } .-1 }
+  // { dg-error "stray" "stray" { target { c || c++23_down } } .-2 }
+  // { dg-error "before numeric constant" "numeric" { target c++26 } .-3 }
 const void *s5 = R"$()$" 0;
-  // { dg-error "invalid character" "invalid" { target *-*-* } .-1 }
-  // { dg-error "stray" "stray" { target *-*-* } .-2 }
-const void *s6 = R"\u0040()\u0040" 0;
+  // { dg-error "invalid character" "invalid" { target { c || c++23_down } } .-1 }
+  // { dg-error "stray" "stray" { target { c || c++23_down } } .-2 }
+  // { dg-error "before numeric constant" "numeric" { target c++26 } .-3 }
+const void *s6 = R"`()`" 0;
+  // { dg-error "invalid character" "invalid" { target { c || c++23_down } } .-1 }
+  // { dg-error "stray" "stray" { target { c || c++23_down } } .-2 }
+  // { dg-error "before numeric constant" "numeric" { target c++26 } .-3 }
+const void *s7 = R"\u0040()\u0040" 0;
   // { dg-error "invalid character" "invalid" { target *-*-* } .-1 }
   // { dg-error "stray" "stray" { target *-*-* } .-2 }
 
--- gcc/testsuite/c-c++-common/raw-string-6.c.jj 2020-12-28 12:27:32.500752614 +0100
+++ gcc/testsuite/c-c++-common/raw-string-6.c 2024-07-17 20:32:26.193580759 +0200
@@ -1,6 +1,5 @@
-// { dg-do compile }
+// { dg-do compile { target { c || c++11 } } }
 // { dg-options "-std=gnu99" { target c } }
-// { dg-options "-std=c++0x" { target c++ } }
 
 const void *s0 = R"ouch()ouCh"; // { dg-error "unterminated raw string" "unterminated" }
   // { dg-error "at end of input" "end" { target *-*-* } .-1 }
--- gcc/testsuite/c-c++-common/raw-string-11.c.jj 2020-01-12 11:54:37.022404221 +0100
+++ gcc/testsuite/c-c++-common/raw-string-11.c 2024-07-17 20:33:54.236456112 +0200
@@ -1,7 +1,7 @@
 // PR preprocessor/48740
+// { dg-do run { target { c || c++11 } } }
 // { dg-options "-std=gnu99 -trigraphs -save-temps" { target c } }
-// { dg-options "-std=c++0x -save-temps" { target c++ } }
-// { dg-do run }
+// { dg-options "-save-temps" { target c++ } }
 
 int main ()
 {
@@ -9,4 +9,3 @@ int main ()
   "foo%sbar%sfred%sbob?""?""?""?""?",
   sizeof ("foo%sbar%sfred%sbob?""?""?""?""?"));
 }
-
--- gcc/testsuite/c-c++-common/raw-string-13.c.jj 2020-01-12 11:54:37.022404221 +0100
+++ gcc/testsuite/c-c++-common/raw-string-13.c 2024-07-17 20:34:23.669080145 +0200
@@ -1,8 +1,7 @@
 // PR preprocessor/57620
-// { dg-do run }
+// { dg-do run { target { c || c++11 } } }
 // { dg-require-effective-target wchar }
 // { dg-options "-std=gnu99 -Wno-c++-compat -trigraphs" { target c } }
-// { dg-options "-std=c++11" { target c++ } }
 
 #ifndef __cplusplus
 #include <wchar.h>
--- gcc/testsuite/c-c++-common/raw-string-14.c.jj 2020-07-28 15:39:09.992756448 +0200
+++ gcc/testsuite/c-c++-common/raw-string-14.c 2024-07-17 20:34:43.507826727 +0200
@@ -1,7 +1,6 @@
 // PR preprocessor/57620
-// { dg-do compile }
+// { dg-do compile { target { c || c++11 } } }
 // { dg-options "-std=gnu99 -trigraphs" { target c } }
-// { dg-options "-std=c++11" { target c++ } }
 
 const void *s0 = R"abc\
 def()abcdef" 0;
--- gcc/testsuite/c-c++-common/raw-string-15.c.jj 2020-01-12 11:54:37.022404221 +0100
+++ gcc/testsuite/c-c++-common/raw-string-15.c 2024-07-17 20:34:58.994628892 +0200
@@ -1,8 +1,8 @@
 // PR preprocessor/57620
-// { dg-do run }
+// { dg-do run { target { c || c++11 } } }
 // { dg-require-effective-target wchar }
 // { dg-options "-std=gnu99 -Wno-c++-compat -Wtrigraphs" { target c } }
-// { dg-options "-std=gnu++11 -Wtrigraphs" { target c++ } }
+// { dg-options "-Wtrigraphs" { target c++ } }
 
 #ifndef __cplusplus
 #include <wchar.h>
--- gcc/testsuite/c-c++-common/raw-string-16.c.jj 2020-07-28 15:39:09.992756448 +0200
+++ gcc/testsuite/c-c++-common/raw-string-16.c 2024-07-17 20:35:22.387330085 +0200
@@ -1,7 +1,7 @@
 // PR preprocessor/57620
-// { dg-do compile }
+// { dg-do compile { target { c || c++11 } } }
 // { dg-options "-std=gnu99 -Wtrigraphs" { target c } }
-// { dg-options "-std=gnu++11 -Wtrigraphs" { target c++ } }
+// { dg-options "-Wtrigraphs" { target c++ } }
 
 const void *s0 = R"abc\
 def()abcdef" 0;
--- gcc/testsuite/c-c++-common/raw-string-17.c.jj 2020-01-12 11:54:37.022404221 +0100
+++ gcc/testsuite/c-c++-common/raw-string-17.c 2024-07-17 20:35:36.497149845 +0200
@@ -1,7 +1,6 @@
 /* PR preprocessor/57824 */
-/* { dg-do run } */
+/* { dg-do run { target { c || c++11 } } } */
 /* { dg-options "-std=gnu99" { target c } } */
-/* { dg-options "-std=c++11" { target c++ } } */
 
 #define S(s) s
 #define T(s) s "\n"
--- gcc/testsuite/c-c++-common/raw-string-18.c.jj 2020-01-12 11:54:37.022404221 +0100
+++ gcc/testsuite/c-c++-common/raw-string-18.c 2024-07-17 20:35:55.151911555 +0200
@@ -1,7 +1,7 @@
 /* PR preprocessor/57824 */
-/* { dg-do compile } */
+/* { dg-do compile { target { c || c++11 } } } */
 /* { dg-options "-std=gnu99 -fdump-tree-optimized-lineno" { target c } } */
-/* { dg-options "-std=c++11 -fdump-tree-optimized-lineno" { target c++ } } */
+/* { dg-options "-fdump-tree-optimized-lineno" { target c++ } } */
 
 const char x[] = R"(
 abc
--- gcc/testsuite/c-c++-common/raw-string-19.c.jj 2020-01-12 11:54:37.022404221 +0100
+++ gcc/testsuite/c-c++-common/raw-string-19.c 2024-07-17 20:36:25.445524589 +0200
@@ -1,7 +1,7 @@
 /* PR preprocessor/57824 */
-/* { dg-do compile } */
+// { dg-do compile { target { c || c++11 } } }
 /* { dg-options "-std=gnu99 -fdump-tree-optimized-lineno -save-temps" { target c } } */
-/* { dg-options "-std=c++11 -fdump-tree-optimized-lineno -save-temps" { target c++ } } */
+/* { dg-options "-fdump-tree-optimized-lineno -save-temps" { target c++ } } */
 
 const char x[] = R"(
 abc
--- gcc/testsuite/g++.dg/cpp26/raw-string1.C.jj 2024-07-17 20:46:06.878052479 +0200
+++ gcc/testsuite/g++.dg/cpp26/raw-string1.C 2024-07-17 20:47:50.761715122 +0200
@@ -0,0 +1,4 @@
+// C++26 P2558R2 - Add @, $, and ` to the basic character set
+// { dg-do compile { target c++26 } }
+
+const char *s0 = R"`@$$@`@`$()`@$$@`@`$";
--- gcc/testsuite/g++.dg/cpp26/raw-string2.C.jj 2024-07-17 20:54:53.478273235 +0200
+++ gcc/testsuite/g++.dg/cpp26/raw-string2.C 2024-07-17 20:58:46.177289931 +0200
@@ -0,0 +1,7 @@
+// C++26 P2558R2 - Add @, $, and ` to the basic character set
+// { dg-do compile { target { ! { avr*-*-* mmix*-*-* *-*-aix* } } } }
+// { dg-options "" }
+
+int a$b;
+int a\u0024c; // { dg-error "universal character \\\\u0024 is not valid in an identifier" "" { target c++26 } }
+int a\U00000024d; // { dg-error "universal character \\\\U00000024 is not valid in an identifier" "" { target c++26 } }
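For readers who want the raw-string side of the change in isolation, the following is a rough standalone model of what the raw-string-5.c and raw-string1.C tests exercise: a delimiter is at most 16 characters, drawn from the basic character set minus space, parentheses, backslash and the whitespace controls, with $, @ and ` accepted only in C++26 mode. The helper names is_d_char and valid_delimiter and the bool cxx26 flag are assumptions made for this sketch; the real check is the condition inside lex_raw_string shown in the patch.

```cpp
// Illustrative sketch only; hypothetical helpers, not the libcpp logic.

// True if c may appear in a raw string delimiter.
constexpr bool is_d_char(char c, bool cxx26) {
  if (c == ' ' || c == '(' || c == ')' || c == '\\')
    return false;
  if (c == '$' || c == '@' || c == '`')
    return cxx26;               // new in C++26 via P2558R2
  return c > 0x20 && c < 0x7F;  // remaining printable ASCII
}

// True if the NUL-terminated string s is usable as a delimiter:
// every character is a d-char and the length does not exceed 16.
constexpr bool valid_delimiter(const char *s, bool cxx26) {
  for (int len = 0; s[len] != '\0'; ++len)
    if (len >= 16 || !is_d_char(s[len], cxx26))
      return false;
  return true;
}
```

With this model the delimiter from raw-string1.C is accepted only when cxx26 is true, and the 17-character delimiter from raw-string-5.c is rejected in either mode.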