Message ID | 20240125121819.2759991-1-ak@linux.intel.com |
---|---|
State | New |
Headers | show |
Series | [v1] C++: Support constexpr strings for asm statements | expand |
On Thu, Jan 25, 2024 at 04:18:19AM -0800, Andi Kleen wrote: > Some programing styles use a lot of inline assembler, and it is common > to use very complex preprocessor macros to generate the assembler > strings for the asm statements. In C++ there would be a typesafe alternative > using templates and constexpr to generate the assembler strings, but > unfortunately the asm statement requires plain string literals, so this > doesn't work. > > This patch modifies the C++ parser to accept strings generated by > constexpr instead of just plain strings. This requires new syntax > because e.g. asm("..." : "r" (expr)) would be ambigious with a function > call. I chose () to make it unique. For example now you can write > > constexpr const char *genasm() { return "insn"; } > constexpr const char *genconstraint() { return "r"; } > > asm(genasm() :: (genconstraint()) (input)); > > The constexpr strings are allowed for the asm template, the > constraints and the clobbers (every time current asm accepts a string) > > The drawback of this scheme is that the constexpr doesn't have > full control over the input/output/clobber lists, but that can be > usually handled with a switch statement. One could imagine > more flexible ways to handle that, for example supporting constexpr > vectors for the clobber list, or similar. But even without > that it is already useful. > > Bootstrapped and full test on x86_64-linux. > --- > gcc/cp/parser.cc | 76 ++++++++++++++++++-------- > gcc/doc/extend.texi | 17 +++++- > gcc/testsuite/g++.dg/constexpr-asm-1.C | 30 ++++++++++ > 3 files changed, 99 insertions(+), 24 deletions(-) > create mode 100644 gcc/testsuite/g++.dg/constexpr-asm-1.C > > diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc > index 3748ccd49ff3..cc323dc8557a 100644 > --- a/gcc/cp/parser.cc > +++ b/gcc/cp/parser.cc > @@ -22654,6 +22654,43 @@ cp_parser_using_directive (cp_parser* parser) > cp_parser_require (parser, CPP_SEMICOLON, RT_SEMICOLON); > } > > +/* Parse a string literal or constant expression yielding a string. > + The constant expression uses extra parens to avoid ambiguity with "x" (expr). > + > + asm-string-expr: > + string-literal > + ( constant-expr ) */ > + > +static tree > +cp_parser_asm_string_expression (cp_parser *parser) > +{ > + location_t sloc = cp_lexer_peek_token (parser->lexer)->location; > + > + if (cp_lexer_next_token_is (parser->lexer, CPP_OPEN_PAREN)) Why should it be wrapped in ()s? > + { > + matching_parens parens; > + parens.consume_open (parser); > + tree string = cp_parser_constant_expression (parser); > + if (string != error_mark_node) > + string = cxx_constant_value (string, tf_error); > + if (TREE_CODE (string) == NOP_EXPR) > + string = TREE_OPERAND (string, 0); > + if (TREE_CODE (string) == ADDR_EXPR && TREE_CODE (TREE_OPERAND (string, 0)) == STRING_CST) Too long line. > + string = TREE_OPERAND (string, 0); > + if (TREE_CODE (string) == VIEW_CONVERT_EXPR) > + string = TREE_OPERAND (string, 0); > + if (TREE_CODE (string) != STRING_CST && string != error_mark_node) > + { > + error_at (sloc, "Expected string valued constant expression for %<asm%>, not type %qT", > + TREE_TYPE (string)); Again, too long line, diagnostics should never start with a capital letter, but more importantly, this will handle only a small subset of what one can construct with constexpr functions, not everything they can return even if they return const char * is necessarily a STRING_LITERAL, could be an array of chars or something similar, especially if the intent is not just to return prepared whole string literals, but construct the template etc. from substrings. Given the https://wg21.link/P2741R3 C++26 addition, I wonder if it wouldn't be much better to stay compatible with the static_assert extension there and use similar thing for inline asm. See https://gcc.gnu.org/r14-5771 and r14-5956 follow-up for the actual implementation. One issue is the character set question. The strings in inline asm right now are untranslated, i.e. remain in SOURCE_CHARSET, usually UTF-8 (in theory there is also UTF-EBCDIC support, but nobody knows if it actually works), which is presumably what the assembler expects too. Most of the string literals and character literals constexpr deals with are in the execution character set though. For static_assert we just assume the user knows what he is doing when trying to emit non-ASCII characters in the message when using say -fexec-charset=EBCDICUS , but should that be the case for inline asm too? Or should we try to translate those strings back from execution character set to source character set? Or require that it actually constructs UTF-8 string literals and for the UTF-EBCDIC case translate from UTF-8 to UTF-EBCDIC? So the user constexpr functions then would return u8"insn"; or construct from u8'i' etc. character literals... In any case, as has been said earlier, this isn't stage4 material. Jakub
Thanks Jakub. > > +/* Parse a string literal or constant expression yielding a string. > > + The constant expression uses extra parens to avoid ambiguity with "x" (expr). > > + > > + asm-string-expr: > > + string-literal > > + ( constant-expr ) */ > > + > > +static tree > > +cp_parser_asm_string_expression (cp_parser *parser) > > +{ > > + location_t sloc = cp_lexer_peek_token (parser->lexer)->location; > > + > > + if (cp_lexer_next_token_is (parser->lexer, CPP_OPEN_PAREN)) > > Why should it be wrapped in ()s? It's true it's only needed for the constraints, but not here. > Too long line. > > > + string = TREE_OPERAND (string, 0); > > + if (TREE_CODE (string) == VIEW_CONVERT_EXPR) > > + string = TREE_OPERAND (string, 0); > > + if (TREE_CODE (string) != STRING_CST && string != error_mark_node) > > + { > > + error_at (sloc, "Expected string valued constant expression for %<asm%>, not type %qT", > > + TREE_TYPE (string)); > > Again, too long line, diagnostics should never start with a capital letter, > but more importantly, this will handle only a small subset of what one can > construct with constexpr functions, not everything they can return even if > they return const char * is necessarily a STRING_LITERAL, could be an array > of chars or something similar, especially if the intent is not just to > return prepared whole string literals, but construct the template etc. from > substrings. > > Given the https://wg21.link/P2741R3 C++26 addition, I wonder if it wouldn't That's a good find. > be much better to stay compatible with the static_assert extension there and > use similar thing for inline asm. > See https://gcc.gnu.org/r14-5771 and r14-5956 follow-up for the actual > implementation. Makes sense. I will adapt the code. > One issue is the character set question. The strings in inline asm right > now are untranslated, i.e. remain in SOURCE_CHARSET, usually UTF-8 (in > theory there is also UTF-EBCDIC support, but nobody knows if it actually > works), which is presumably what the assembler expects too. > Most of the string literals and character literals constexpr deals with > are in the execution character set though. For static_assert we just assume > the user knows what he is doing when trying to emit non-ASCII characters in > the message when using say -fexec-charset=EBCDICUS , but should that be the > case for inline asm too? Or should we try to translate those strings back > from execution character set to source character set? Or require that it > actually constructs UTF-8 string literals and for the UTF-EBCDIC case > translate from UTF-8 to UTF-EBCDIC? So the user constexpr functions then > would return u8"insn"; or construct from u8'i' etc. character literals... I think keeping it untranslated is fine for now. Any translation if really needed would be a separate feature. > > In any case, as has been said earlier, this isn't stage4 material. Yes that merge can be deferred of course. -Andi
On Fri, Mar 15, 2024 at 10:20:06AM -0700, Andi Kleen wrote: > > One issue is the character set question. The strings in inline asm right > > now are untranslated, i.e. remain in SOURCE_CHARSET, usually UTF-8 (in > > theory there is also UTF-EBCDIC support, but nobody knows if it actually > > works), which is presumably what the assembler expects too. > > Most of the string literals and character literals constexpr deals with > > are in the execution character set though. For static_assert we just assume > > the user knows what he is doing when trying to emit non-ASCII characters in > > the message when using say -fexec-charset=EBCDICUS , but should that be the > > case for inline asm too? Or should we try to translate those strings back > > from execution character set to source character set? Or require that it > > actually constructs UTF-8 string literals and for the UTF-EBCDIC case > > translate from UTF-8 to UTF-EBCDIC? So the user constexpr functions then > > would return u8"insn"; or construct from u8'i' etc. character literals... > > I think keeping it untranslated is fine for now. Any translation > if really needed would be a separate feature. I mean, unless you make extra effort, it is translated. Even in your current version, try constexpr *foo () { return "nop"; } and you'll see that it actually results in "\x95\x96\x97" with -fexec-charset=EBCDICUS. What is worse, constexpr *bar () { return "%0 %1"; } results in "\x6c\xf0\x40\x6c\xf1", so the compiler will not be able to find the % special characters in there etc. The parsing of the string literal in asm definitions uses translate=false to avoid the translations. As the static_assert paper says, for static_assert it isn't that big a deal, the program is already UB if it diagnoses static assertion failure, worst case it prints garbage if one plays with -fexec-charset=. But for inline asm it would fail to compile... So, the extension really should be well defined vs. the character set, either it should be characters in the execution charset and the FE would need to ask libcpp to translate it back, or it would need to be declared to be e.g. in UTF-8 regardless of the charset (like u8'x' or u8"abc" literals are; but then shouldn't the _M_data in that case return a pointer to char8_t instead), something else? Jakub
> > I think keeping it untranslated is fine for now. Any translation > > if really needed would be a separate feature. > > I mean, unless you make extra effort, it is translated. > Even in your current version, try constexpr *foo () { return "nop"; } > and you'll see that it actually results in "\x95\x96\x97" with > -fexec-charset=EBCDICUS. Perhaps I don't understand the use case for this option. Would it ever be used on a non EBCDIC system? > What is worse, constexpr *bar () { return "%0 %1"; } > results in "\x6c\xf0\x40\x6c\xf1", so the compiler will not be able to find > the % special characters in there etc. > The parsing of the string literal in asm definitions uses translate=false > to avoid the translations. > As the static_assert paper says, for static_assert it isn't that big a deal, > the program is already UB if it diagnoses static assertion failure, worst > case it prints garbage if one plays with -fexec-charset=. But for inline > asm it would fail to compile... > > So, the extension really should be well defined vs. the character set, > either it should be characters in the execution charset and the FE would > need to ask libcpp to translate it back, or it would need to be declared > to be e.g. in UTF-8 regardless of the charset (like u8'x' or u8"abc" > literals are; but then shouldn't the _M_data in that case return a pointer > to char8_t instead), something else? Okay then we can always translate to UTF-8. -Andi
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc index 3748ccd49ff3..cc323dc8557a 100644 --- a/gcc/cp/parser.cc +++ b/gcc/cp/parser.cc @@ -22654,6 +22654,43 @@ cp_parser_using_directive (cp_parser* parser) cp_parser_require (parser, CPP_SEMICOLON, RT_SEMICOLON); } +/* Parse a string literal or constant expression yielding a string. + The constant expression uses extra parens to avoid ambiguity with "x" (expr). + + asm-string-expr: + string-literal + ( constant-expr ) */ + +static tree +cp_parser_asm_string_expression (cp_parser *parser) +{ + location_t sloc = cp_lexer_peek_token (parser->lexer)->location; + + if (cp_lexer_next_token_is (parser->lexer, CPP_OPEN_PAREN)) + { + matching_parens parens; + parens.consume_open (parser); + tree string = cp_parser_constant_expression (parser); + if (string != error_mark_node) + string = cxx_constant_value (string, tf_error); + if (TREE_CODE (string) == NOP_EXPR) + string = TREE_OPERAND (string, 0); + if (TREE_CODE (string) == ADDR_EXPR && TREE_CODE (TREE_OPERAND (string, 0)) == STRING_CST) + string = TREE_OPERAND (string, 0); + if (TREE_CODE (string) == VIEW_CONVERT_EXPR) + string = TREE_OPERAND (string, 0); + if (TREE_CODE (string) != STRING_CST && string != error_mark_node) + { + error_at (sloc, "Expected string valued constant expression for %<asm%>, not type %qT", + TREE_TYPE (string)); + string = error_mark_node; + } + parens.require_close (parser); + return string; + } + return cp_parser_string_literal (parser, false, false); +} + /* Parse an asm-definition. asm-qualifier: @@ -22666,19 +22703,19 @@ cp_parser_using_directive (cp_parser* parser) asm-qualifier-list asm-qualifier asm-definition: - asm ( string-literal ) ; + asm ( constant-expr ) ; GNU Extension: asm-definition: - asm asm-qualifier-list [opt] ( string-literal ) ; - asm asm-qualifier-list [opt] ( string-literal : asm-operand-list [opt] ) ; - asm asm-qualifier-list [opt] ( string-literal : asm-operand-list [opt] + asm asm-qualifier-list [opt] ( asm-string-expr ) ; + asm asm-qualifier-list [opt] ( asm-string-expr : asm-operand-list [opt] ) ; + asm asm-qualifier-list [opt] ( asm-string-expr : asm-operand-list [opt] : asm-operand-list [opt] ) ; - asm asm-qualifier-list [opt] ( string-literal : asm-operand-list [opt] + asm asm-qualifier-list [opt] ( asm-string-expr : asm-operand-list [opt] : asm-operand-list [opt] : asm-clobber-list [opt] ) ; - asm asm-qualifier-list [opt] ( string-literal : : asm-operand-list [opt] + asm asm-qualifier-list [opt] ( asm-string-expr : : asm-operand-list [opt] : asm-clobber-list [opt] : asm-goto-list ) ; @@ -22797,8 +22834,7 @@ cp_parser_asm_definition (cp_parser* parser) if (!cp_parser_require (parser, CPP_OPEN_PAREN, RT_OPEN_PAREN)) return; /* Look for the string. */ - tree string = cp_parser_string_literal (parser, /*translate=*/false, - /*wide_ok=*/false); + tree string = cp_parser_asm_string_expression (parser); if (string == error_mark_node) { cp_parser_skip_to_closing_parenthesis (parser, true, false, @@ -29352,7 +29388,7 @@ cp_parser_yield_expression (cp_parser* parser) /* Parse an (optional) asm-specification. asm-specification: - asm ( string-literal ) + asm ( asm-string-expr ) If the asm-specification is present, returns a STRING_CST corresponding to the string-literal. Otherwise, returns @@ -29375,9 +29411,7 @@ cp_parser_asm_specification_opt (cp_parser* parser) parens.require_open (parser); /* Look for the string-literal. */ - tree asm_specification = cp_parser_string_literal (parser, - /*translate=*/false, - /*wide_ok=*/false); + tree asm_specification = cp_parser_asm_string_expression (parser); /* Look for the `)'. */ parens.require_close (parser); @@ -29392,8 +29426,8 @@ cp_parser_asm_specification_opt (cp_parser* parser) asm-operand-list , asm-operand asm-operand: - string-literal ( expression ) - [ string-literal ] string-literal ( expression ) + asm-string-expr ( expression ) + [ asm-string-expr ] asm-string-expr ( expression ) Returns a TREE_LIST representing the operands. The TREE_VALUE of each node is the expression. The TREE_PURPOSE is itself a @@ -29426,10 +29460,8 @@ cp_parser_asm_operand_list (cp_parser* parser) } else name = NULL_TREE; - /* Look for the string-literal. */ - tree string_literal = cp_parser_string_literal (parser, - /*translate=*/false, - /*wide_ok=*/false); + /* Look for the string. */ + tree string_literal = cp_parser_asm_string_expression (parser); /* Look for the `('. */ matching_parens parens; @@ -29462,8 +29494,8 @@ cp_parser_asm_operand_list (cp_parser* parser) /* Parse an asm-clobber-list. asm-clobber-list: - string-literal - asm-clobber-list , string-literal + const-expression + asm-clobber-list , const-expression Returns a TREE_LIST, indicating the clobbers in the order that they appeared. The TREE_VALUE of each node is a STRING_CST. */ @@ -29476,9 +29508,7 @@ cp_parser_asm_clobber_list (cp_parser* parser) while (true) { /* Look for the string literal. */ - tree string_literal = cp_parser_string_literal (parser, - /*translate=*/false, - /*wide_ok=*/false); + tree string_literal = cp_parser_asm_string_expression (parser); /* Add it to the list. */ clobbers = tree_cons (NULL_TREE, string_literal, clobbers); /* If the next token is not a `,', then the list is diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index 0bc586d120e7..230b46fc6c75 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -10589,6 +10589,17 @@ contain any instructions recognized by the assembler, including directives. GCC does not parse the assembler instructions themselves and does not know what they mean or even whether they are valid assembler input. +With gnu++11 or later it can also be a compile time constant expression inside parens. + +@example +constexpr const char *genfoo() @{ return "foo"; @} + +void function() +@{ + asm((genfoo())); +@} +@end example + You may place multiple assembler instructions together in a single @code{asm} string, separated by the characters normally used in assembly code for the system. A combination that works in most places is a newline to break the @@ -10739,20 +10750,24 @@ perform a jump to one of the labels listed in the @var{GotoLabels}. @item AssemblerTemplate This is a literal string that is the template for the assembler code. It is a combination of fixed text and tokens that refer to the input, output, -and goto parameters. @xref{AssemblerTemplate}. +and goto parameters. @xref{AssemblerTemplate}. With gnu++11 or later it can +also be a constant expression inside parens. @item OutputOperands A comma-separated list of the C variables modified by the instructions in the @var{AssemblerTemplate}. An empty list is permitted. @xref{OutputOperands}. +With gnu++11 or later the strings can also be constant expressions inside parens. @item InputOperands A comma-separated list of C expressions read by the instructions in the @var{AssemblerTemplate}. An empty list is permitted. @xref{InputOperands}. +With gnu++11 or later the strings can also be constant expressions inside parens. @item Clobbers A comma-separated list of registers or other values changed by the @var{AssemblerTemplate}, beyond those listed as outputs. An empty list is permitted. @xref{Clobbers and Scratch Registers}. +With gnu++11 or later the strings can also be constant expressions inside parens. @item GotoLabels When you are using the @code{goto} form of @code{asm}, this section contains diff --git a/gcc/testsuite/g++.dg/constexpr-asm-1.C b/gcc/testsuite/g++.dg/constexpr-asm-1.C new file mode 100644 index 000000000000..b1305571a2c1 --- /dev/null +++ b/gcc/testsuite/g++.dg/constexpr-asm-1.C @@ -0,0 +1,30 @@ +/* { dg-do compile } */ +/* { dg-options "-std=gnu++11" } */ + +constexpr const char *genfoo() +{ + return "foo %1,%0"; +} + +constexpr const char *genoutput() +{ + return "=r"; +} + +constexpr const char *geninput() +{ + return "r"; +} + +constexpr const char *genclobber() +{ + return "memory"; +} + +void f() +{ + int a; + asm((genfoo()) : (genoutput()) (a) : (geninput()) (1) : (genclobber())); +} + +/* { dg-final { scan-assembler "foo" } } */