Message ID | 20230914064949.29787-17-kmatsui@gcc.gnu.org |
---|---|
State | New |
Headers | show |
Series | Optimize type traits performance | expand |
On Wed, 13 Sep 2023, Ken Matsui via Gcc-patches wrote: > diff --git a/gcc/c/c-parser.h b/gcc/c/c-parser.h > index 545f0f4d9eb..eed6deaf0f8 100644 > --- a/gcc/c/c-parser.h > +++ b/gcc/c/c-parser.h > @@ -51,14 +51,14 @@ enum c_id_kind { > /* A single C token after string literal concatenation and conversion > of preprocessing tokens to tokens. */ > struct GTY (()) c_token { > + /* If this token is a keyword, this value indicates which keyword. > + Otherwise, this value is RID_MAX. */ > + ENUM_BITFIELD (rid) keyword : 16; > /* The kind of token. */ > ENUM_BITFIELD (cpp_ttype) type : 8; > /* If this token is a CPP_NAME, this value indicates whether also > declared as some kind of type. Otherwise, it is C_ID_NONE. */ > ENUM_BITFIELD (c_id_kind) id_kind : 8; > - /* If this token is a keyword, this value indicates which keyword. > - Otherwise, this value is RID_MAX. */ > - ENUM_BITFIELD (rid) keyword : 8; > /* If this token is a CPP_PRAGMA, this indicates the pragma that > was seen. Otherwise it is PRAGMA_NONE. */ > ENUM_BITFIELD (pragma_kind) pragma_kind : 8; If you want to optimize layout, I'd expect flags to move so it can share the same 32-bit unit as the pragma_kind bit-field (not sure if any changes should be made to the declaration of flags to maximise the chance of such sharing across different host bit-field ABIs). > diff --git a/gcc/cp/parser.h b/gcc/cp/parser.h > index 6cbb9a8e031..3c3c482c6ce 100644 > --- a/gcc/cp/parser.h > +++ b/gcc/cp/parser.h > @@ -40,11 +40,11 @@ struct GTY(()) tree_check { > /* A C++ token. */ > > struct GTY (()) cp_token { > - /* The kind of token. */ > - enum cpp_ttype type : 8; > /* If this token is a keyword, this value indicates which keyword. > Otherwise, this value is RID_MAX. */ > - enum rid keyword : 8; > + enum rid keyword : 16; > + /* The kind of token. */ > + enum cpp_ttype type : 8; > /* Token flags. */ > unsigned char flags; > /* True if this token is from a context where it is implicitly extern "C" */ You're missing an update to the "3 unused bits." comment further down. > @@ -988,7 +988,7 @@ struct GTY(()) cpp_hashnode { > unsigned int directive_index : 7; /* If is_directive, > then index into directive table. > Otherwise, a NODE_OPERATOR. */ > - unsigned int rid_code : 8; /* Rid code - for front ends. */ > + unsigned int rid_code : 16; /* Rid code - for front ends. */ > unsigned int flags : 9; /* CPP flags. */ > ENUM_BITFIELD(node_type) type : 2; /* CPP node type. */ You're missing an update to the "5 bits spare." comment further down. Do you have any figures for the effects on compilation time or memory usage from the increase in size of these structures?
On Thu, Sep 14, 2023 at 10:54 AM Joseph Myers <joseph@codesourcery.com> wrote: > > On Wed, 13 Sep 2023, Ken Matsui via Gcc-patches wrote: > > > diff --git a/gcc/c/c-parser.h b/gcc/c/c-parser.h > > index 545f0f4d9eb..eed6deaf0f8 100644 > > --- a/gcc/c/c-parser.h > > +++ b/gcc/c/c-parser.h > > @@ -51,14 +51,14 @@ enum c_id_kind { > > /* A single C token after string literal concatenation and conversion > > of preprocessing tokens to tokens. */ > > struct GTY (()) c_token { > > + /* If this token is a keyword, this value indicates which keyword. > > + Otherwise, this value is RID_MAX. */ > > + ENUM_BITFIELD (rid) keyword : 16; > > /* The kind of token. */ > > ENUM_BITFIELD (cpp_ttype) type : 8; > > /* If this token is a CPP_NAME, this value indicates whether also > > declared as some kind of type. Otherwise, it is C_ID_NONE. */ > > ENUM_BITFIELD (c_id_kind) id_kind : 8; > > - /* If this token is a keyword, this value indicates which keyword. > > - Otherwise, this value is RID_MAX. */ > > - ENUM_BITFIELD (rid) keyword : 8; > > /* If this token is a CPP_PRAGMA, this indicates the pragma that > > was seen. Otherwise it is PRAGMA_NONE. */ > > ENUM_BITFIELD (pragma_kind) pragma_kind : 8; > > If you want to optimize layout, I'd expect flags to move so it can share > the same 32-bit unit as the pragma_kind bit-field (not sure if any changes > should be made to the declaration of flags to maximise the chance of such > sharing across different host bit-field ABIs). > Thank you for your review! I did not make this change aggressively, but we can do the following to minimize the fragmentation: struct GTY (()) c_token { tree value; /* pointer, depends, but 4 or 8 bytes as usual */ location_t location; /* unsigned int, at least 2 bytes, 4 bytes as usual */ ENUM_BITFIELD (rid) keyword : 16; /* 2 bytes */ ENUM_BITFIELD (cpp_ttype) type : 8; /* 1 byte */ ENUM_BITFIELD (c_id_kind) id_kind : 8; /* 1 byte */ ENUM_BITFIELD (pragma_kind) pragma_kind : 8; /* 1 byte */ unsigned char flags; /* 1 byte */ } Supposing a pointer size is 8 bytes and int is 4 bytes, the struct size would be 24 bytes. The internal fragmentation would be 0 bytes, and the external fragmentation is 6 bytes since the overall struct alignment requirement is $K_{max} = 8$ from the pointer. Here is the original struct before making keyword 16-bit. The overall struct alignment requirement is $K_{max} = 8$ from the pointer. This struct size would be 24 bytes since the internal fragmentation is 4 bytes (after location), and the external fragmentation is 3 bytes. struct GTY (()) c_token { ENUM_BITFIELD (cpp_ttype) type : 8; /* 1 byte */ ENUM_BITFIELD (c_id_kind) id_kind : 8; /* 1 byte */ ENUM_BITFIELD (rid) keyword : 8; /* 1 byte */ ENUM_BITFIELD (pragma_kind) pragma_kind : 8; /* 1 byte */ location_t location; /* unsigned int, at least 2 bytes, 4 bytes as usual */ tree value; /* pointer, depends, but 4 or 8 bytes as usual */ unsigned char flags; /* 1 byte */ } If we keep the original order with the 16-bit keyword, the struct size would be 32 bytes (my current implementation as well, I will update this patch). struct GTY (()) c_token { ENUM_BITFIELD (cpp_ttype) type : 8; /* 1 byte */ ENUM_BITFIELD (c_id_kind) id_kind : 8; /* 1 byte */ ENUM_BITFIELD (rid) keyword : 16; /* 2 bytes */ ENUM_BITFIELD (pragma_kind) pragma_kind : 8; /* 1 byte */ location_t location; /* unsigned int, at least 2 bytes, 4 bytes as usual */ tree value; /* pointer, depends, but 4 or 8 bytes as usual */ unsigned char flags; /* 1 byte */ } Likewise, the overall struct alignment requirement is $K_{max} = 8$ from the pointer. The internal fragmentation would be 7 bytes (3 bytes after pragma_kind + 4 bytes after location), and the external fragmentation would be 7 bytes. I think optimizing the size is worth doing unless this breaks GCC. > > diff --git a/gcc/cp/parser.h b/gcc/cp/parser.h > > index 6cbb9a8e031..3c3c482c6ce 100644 > > --- a/gcc/cp/parser.h > > +++ b/gcc/cp/parser.h > > @@ -40,11 +40,11 @@ struct GTY(()) tree_check { > > /* A C++ token. */ > > > > struct GTY (()) cp_token { > > - /* The kind of token. */ > > - enum cpp_ttype type : 8; > > /* If this token is a keyword, this value indicates which keyword. > > Otherwise, this value is RID_MAX. */ > > - enum rid keyword : 8; > > + enum rid keyword : 16; > > + /* The kind of token. */ > > + enum cpp_ttype type : 8; > > /* Token flags. */ > > unsigned char flags; > > /* True if this token is from a context where it is implicitly extern "C" */ > > You're missing an update to the "3 unused bits." comment further down. > > > @@ -988,7 +988,7 @@ struct GTY(()) cpp_hashnode { > > unsigned int directive_index : 7; /* If is_directive, > > then index into directive table. > > Otherwise, a NODE_OPERATOR. */ > > - unsigned int rid_code : 8; /* Rid code - for front ends. */ > > + unsigned int rid_code : 16; /* Rid code - for front ends. */ > > unsigned int flags : 9; /* CPP flags. */ > > ENUM_BITFIELD(node_type) type : 2; /* CPP node type. */ > > You're missing an update to the "5 bits spare." comment further down. > Thank you! > Do you have any figures for the effects on compilation time or memory > usage from the increase in size of these structures? > Regarding only c_token, we will have the same size if we optimize the size. Although I did not calculate the size of other structs, we might not see any significant performance change? I am taking benchmarks and will let you know once it is done. > -- > Joseph S. Myers > joseph@codesourcery.com
On Thu, Sep 14, 2023 at 10:54 AM Joseph Myers <joseph@codesourcery.com> wrote: > > On Wed, 13 Sep 2023, Ken Matsui via Gcc-patches wrote: > > > diff --git a/gcc/c/c-parser.h b/gcc/c/c-parser.h > > index 545f0f4d9eb..eed6deaf0f8 100644 > > --- a/gcc/c/c-parser.h > > +++ b/gcc/c/c-parser.h > > @@ -51,14 +51,14 @@ enum c_id_kind { > > /* A single C token after string literal concatenation and conversion > > of preprocessing tokens to tokens. */ > > struct GTY (()) c_token { > > + /* If this token is a keyword, this value indicates which keyword. > > + Otherwise, this value is RID_MAX. */ > > + ENUM_BITFIELD (rid) keyword : 16; > > /* The kind of token. */ > > ENUM_BITFIELD (cpp_ttype) type : 8; > > /* If this token is a CPP_NAME, this value indicates whether also > > declared as some kind of type. Otherwise, it is C_ID_NONE. */ > > ENUM_BITFIELD (c_id_kind) id_kind : 8; > > - /* If this token is a keyword, this value indicates which keyword. > > - Otherwise, this value is RID_MAX. */ > > - ENUM_BITFIELD (rid) keyword : 8; > > /* If this token is a CPP_PRAGMA, this indicates the pragma that > > was seen. Otherwise it is PRAGMA_NONE. */ > > ENUM_BITFIELD (pragma_kind) pragma_kind : 8; > > If you want to optimize layout, I'd expect flags to move so it can share > the same 32-bit unit as the pragma_kind bit-field (not sure if any changes > should be made to the declaration of flags to maximise the chance of such > sharing across different host bit-field ABIs). > > > diff --git a/gcc/cp/parser.h b/gcc/cp/parser.h > > index 6cbb9a8e031..3c3c482c6ce 100644 > > --- a/gcc/cp/parser.h > > +++ b/gcc/cp/parser.h > > @@ -40,11 +40,11 @@ struct GTY(()) tree_check { > > /* A C++ token. */ > > > > struct GTY (()) cp_token { > > - /* The kind of token. */ > > - enum cpp_ttype type : 8; > > /* If this token is a keyword, this value indicates which keyword. > > Otherwise, this value is RID_MAX. */ > > - enum rid keyword : 8; > > + enum rid keyword : 16; > > + /* The kind of token. */ > > + enum cpp_ttype type : 8; > > /* Token flags. */ > > unsigned char flags; > > /* True if this token is from a context where it is implicitly extern "C" */ > > You're missing an update to the "3 unused bits." comment further down. > > > @@ -988,7 +988,7 @@ struct GTY(()) cpp_hashnode { > > unsigned int directive_index : 7; /* If is_directive, > > then index into directive table. > > Otherwise, a NODE_OPERATOR. */ > > - unsigned int rid_code : 8; /* Rid code - for front ends. */ > > + unsigned int rid_code : 16; /* Rid code - for front ends. */ > > unsigned int flags : 9; /* CPP flags. */ > > ENUM_BITFIELD(node_type) type : 2; /* CPP node type. */ > > You're missing an update to the "5 bits spare." comment further down. > > Do you have any figures for the effects on compilation time or memory > usage from the increase in size of these structures? > Here is the benchmark result: https://github.com/ken-matsui/gsoc23/tree/main/reports/gcc-build I can see regression for compilation time but not for memory. * Time: +0.950995% * Memory: No difference proven at 95.0% confidence Sincerely, Ken Matsui > -- > Joseph S. Myers > joseph@codesourcery.com
diff --git a/gcc/c-family/c-indentation.h b/gcc/c-family/c-indentation.h index c0e07bf49f1..1ce8753813a 100644 --- a/gcc/c-family/c-indentation.h +++ b/gcc/c-family/c-indentation.h @@ -25,8 +25,8 @@ along with GCC; see the file COPYING3. If not see struct token_indent_info { location_t location; + ENUM_BITFIELD (rid) keyword : 16; ENUM_BITFIELD (cpp_ttype) type : 8; - ENUM_BITFIELD (rid) keyword : 8; }; /* Extract token information from TOKEN, which ought to either be a diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc index b9a1b75ca43..2086f253923 100644 --- a/gcc/c/c-parser.cc +++ b/gcc/c/c-parser.cc @@ -115,9 +115,9 @@ c_parse_init (void) tree id; int mask = 0; - /* Make sure RID_MAX hasn't grown past the 8 bits used to hold the keyword in - the c_token structure. */ - gcc_assert (RID_MAX <= 255); + /* Make sure RID_MAX hasn't grown past the 16 bits used to hold the keyword + in the c_token structure. */ + gcc_assert (RID_MAX <= 65535); mask |= D_CXXONLY; if (!flag_isoc99) diff --git a/gcc/c/c-parser.h b/gcc/c/c-parser.h index 545f0f4d9eb..eed6deaf0f8 100644 --- a/gcc/c/c-parser.h +++ b/gcc/c/c-parser.h @@ -51,14 +51,14 @@ enum c_id_kind { /* A single C token after string literal concatenation and conversion of preprocessing tokens to tokens. */ struct GTY (()) c_token { + /* If this token is a keyword, this value indicates which keyword. + Otherwise, this value is RID_MAX. */ + ENUM_BITFIELD (rid) keyword : 16; /* The kind of token. */ ENUM_BITFIELD (cpp_ttype) type : 8; /* If this token is a CPP_NAME, this value indicates whether also declared as some kind of type. Otherwise, it is C_ID_NONE. */ ENUM_BITFIELD (c_id_kind) id_kind : 8; - /* If this token is a keyword, this value indicates which keyword. - Otherwise, this value is RID_MAX. */ - ENUM_BITFIELD (rid) keyword : 8; /* If this token is a CPP_PRAGMA, this indicates the pragma that was seen. Otherwise it is PRAGMA_NONE. */ ENUM_BITFIELD (pragma_kind) pragma_kind : 8; diff --git a/gcc/cp/parser.h b/gcc/cp/parser.h index 6cbb9a8e031..3c3c482c6ce 100644 --- a/gcc/cp/parser.h +++ b/gcc/cp/parser.h @@ -40,11 +40,11 @@ struct GTY(()) tree_check { /* A C++ token. */ struct GTY (()) cp_token { - /* The kind of token. */ - enum cpp_ttype type : 8; /* If this token is a keyword, this value indicates which keyword. Otherwise, this value is RID_MAX. */ - enum rid keyword : 8; + enum rid keyword : 16; + /* The kind of token. */ + enum cpp_ttype type : 8; /* Token flags. */ unsigned char flags; /* True if this token is from a context where it is implicitly extern "C" */ @@ -101,8 +101,8 @@ struct GTY (()) cp_lexer { vec<cp_token_position> GTY ((skip)) saved_tokens; /* Saved pieces of end token we replaced with the eof token. */ + enum rid saved_keyword : 16; enum cpp_ttype saved_type : 8; - enum rid saved_keyword : 8; /* The next lexer in a linked list of lexers. */ struct cp_lexer *next; diff --git a/libcpp/include/cpplib.h b/libcpp/include/cpplib.h index fcdaf082b09..b93899cd364 100644 --- a/libcpp/include/cpplib.h +++ b/libcpp/include/cpplib.h @@ -988,7 +988,7 @@ struct GTY(()) cpp_hashnode { unsigned int directive_index : 7; /* If is_directive, then index into directive table. Otherwise, a NODE_OPERATOR. */ - unsigned int rid_code : 8; /* Rid code - for front ends. */ + unsigned int rid_code : 16; /* Rid code - for front ends. */ unsigned int flags : 9; /* CPP flags. */ ENUM_BITFIELD(node_type) type : 2; /* CPP node type. */
Now that RID_MAX has reached 255, we need to update the bit sizes of every use of the enum rid from 8 to 16 to support more keywords. gcc/c-family/ChangeLog: * c-indentation.h (struct token_indent_info): Make keyword 16 bits and move this upward to minimize memory fragmentation. gcc/c/ChangeLog: * c-parser.cc (c_parse_init): Handle RID_MAX not to exceed the max value of 16 bits. * c-parser.h (struct c_token): Make keyword 16 bits and move this upward to minimize memory fragmentation. gcc/cp/ChangeLog: * parser.h (struct cp_token): Make keyword 16 bits and move this upward to minimize memory fragmentation. (struct cp_lexer): Likewise, for saved_keyword. libcpp/ChangeLog: * include/cpplib.h (struct cpp_hashnode): Make rid_code 16 bits. Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org> --- gcc/c-family/c-indentation.h | 2 +- gcc/c/c-parser.cc | 6 +++--- gcc/c/c-parser.h | 6 +++--- gcc/cp/parser.h | 8 ++++---- libcpp/include/cpplib.h | 2 +- 5 files changed, 12 insertions(+), 12 deletions(-)