Message ID | 20241103222220.933471-1-lhyatt@gmail.com |
---|---|
Headers | show |
Series | Support for 64-bit location_t | expand |
On Sun, Nov 03, 2024 at 05:22:05PM -0500, Lewis Hyatt wrote: > Hello- > > There is no shortage of PRs complaining about things that go wrong when the > line_maps data structure in libcpp starts to run into its limits. Being > restricted to a 32-bit location_t to cover all source locations means it has > the following limitations, among others: > > -Column numbers larger than 2^12 are not tracked at all, so diagnostics > can only refer to the whole line, and things like -Wmisleading-indentation > stop working with large column numbers. [PR89549] > > -Tokens longer than 32 characters cannot be stored compactly in the line > map and have to be stored as ad-hoc locations with the associated > memory and locality overhead. > > -Sufficiently large sources can exhaust the available location_t space. > > The current limitations are not encountered too much, but it does happen > often enough, especially in testsuites or large PCH compiles that include, > e.g., every header in a large project. > > Currently, the main workaround is to use the option -flarge-source-locations. > This sets the range bits in the line_maps to 0, effectively meaning that > every source code token is stored as an ad-hoc location. This does > significantly increase the number of locations that can be stored, but it > loses all the benefits of the line_maps infrastructure, and it does not help > with the 4096 limit for column numbers. > > It seemed to me like there's no way to improve on that other than letting > location_t be 64 bits instead of 32 bits. I thought I would take a look at > what that would entail... the attached 15 (mostly small) patches are the > result. It was unfortunately necessary to touch a lot of different parts of > the compiler, since location_t is used pretty pervasively. This patch set > adds a configure option --enable-large-source-locations that makes > location_t be 64 bits. The option is off by default. > > There are decisions to be made about what to do with all those new bits. I > made some choices in this patch, but it's easy to change if different > tradeoffs are preferable. Here is what I did: > > - Only 63 bits are used, the high bit is not used so that two location_t > can be safely subtracted and stored in an int64_t. > > - As in the 32-bit case, half of the available locations are reserved > for ad-hoc locations, and the other half for macros and ordinary > locations. > > - The limit on the maximum column number is removed, so it's now set to > 2^31-1. (GCC right now pervasively assumes line and column numbers > within a file can be represented by a 32-bit signed int. That could be > changed too, but it is not part of this patch and would be another > similarly large change.) > > - The default range bits is changed from 5 to 7. This means that > tokens up to 128 chars in length get ordinary locations, while longer > ones get ad-hoc locations. I think that's a big improvement from the > current max of 32 and will result in a lot fewer ad-hoc locations > being generated. There are probably enough bits to go around, though, > that this could be made even larger too. > > All that stuff is configured in line-map.h and is easy to change. The bulk > of the patch is dealing with the fallout from making location_t's type > configurable; in particular, C++ modules needed a lot of tweaks, since they > deal with line_map internals, and I had to add a new field to RTL which > necessitated a lot of small changes. I was not previously familiar with > RTL... I think I generally grokked the idea there and I think the changes I > made are safe, but I am pretty sure some of them are unnecessary so would > appreciate feedback on that too. > > I tried to test on a variety of platforms. For each of them, I did three > bootstrap + regtest (without the patch, with the patch and without > --enable-large-source-locations, and with the patch and with > --enable-large-source-locations) with --enable-checking=yes,extra,rtl and > verified no change in the testsuite other than the expected new PASSes. Here > is what I have tested so far: > > all languages: x86_64, aarch64 (cfarm185), ppc64le (cfarm135) > c/c++ only: sparc 32-bit (cfarm216) > > The testing on sparc 32-bit was very productive and in particular identified > a few issues with padding and alignment that have not been apparent until > now. That platform differs from x86 32-bit in that uint64_t has a larger > alignment than a pointer does. > > Also it's worth noting, even if there is no interest in supporting this > feature, patches 2, 3, 4, 5, and 6 I think are worth accepting regardless; > these are small bug fixes that came up in testing and are independent of > whether location_t is changed or not... they could be triggered by other > changes in the future too. > > Regarding memory usage implications of this change. It does make a lot of > data structures larger. Here is what it does on x86-64. The base sizes of some > types change as follows: > > location_t: From 4 to 8 (+100%) > line_map_ordinary: From 24 to 32 (+33%) > line_map_macro: From 32 to 40 (+25%) > cpp_token: From 24 to 32 (+33%) > cpp_macro: From 48 to 56 (+17%) > tree_exp: From 32 to 40 (+25%) > gimple: From 40 to 48 (+20%) > > The compiled size of stdc++.h.gch increases from 210MB to 219MB (+4%). > > Compiling testsuite/g++.dg/modules/xtreme-header.h into a module, I see: > > size of the output module: From 21.9MB to 21.6MB (-1%) > peak memory usage: From 211MB to 235MB (+11%) > > The reason the module gets smaller is that it streams ordinary locations > more efficiently than adhoc locations, and with the 64-bit location_t, the > number of adhoc locations was reduced from 189,891 to 101,571. > > I'm curious to hear your thoughts on this one please. I think it could be > nice potentially to get it into GCC 15, and consider changing the default in > GCC 16 or later, if you think it's worth the trouble. If it seems too > invasive a change and not worth it, I'll certainly understand, and I have > enjoyed the tour through the compiler internals in any case. > > FWIW I feel like this issue will need to be dealt with some time or other. I > think users generally expect not to run into arbitrary limits, and compiling > a lot of headers together into one module is going to become possibly more > common as modules become available by default. I think any attempt to deal > with it will have to share at least a lot of the features of this one. > Thanks for this! I don't have any comments on the patch series itself, but thought I'd just concur about modules likely going to make the 32-bit location_t constraint more noticeable; I've found that with including headers in module interfaces it really doesn't take very long to run out of locations for imported module units. Some of that could possibly be alleviated by even more aggressive location remapping on stream-in/stream-out (especially for mergeable declarations), and if common build systems like CMake start supporting header units that can also help workaround the issue for a time, but in general as larger projects mixing modules and #includes get developed I believe this will continue to be a problem. Nathaniel > -Lewis > > 01/15: Support for 64-bit location_t: libcpp parts > 02/15: libcpp: Fix potential unaligned access in cpp_buffer > 03/15: tree-cfg: Fix call to next_discriminator_for_locus() > 04/15: tree-phinodes: Use 4 instead of 2 as the minimum number of phi args > 05/15: c++: Fix tree_contains_struct for TRAIT_EXPR > 06/15: gimple: Handle tail padding when computing gimple_ops_offset > 07/15: Support for 64-bit location_t: toplev parts > 08/15: Support for 64-bit location_t: Analyzer parts > 09/15: Support for 64-bit location_t: Frontend parts > 10/15: Support for 64-bit location_t: C++ modules parts > 11/15: Support for 64-bit location_t: RTL parts > 12/15: Support for 64-bit location_t: Backend parts > 13/15: Support for 64-bit location_t: Internal parts > 14/15: Support for 64-bit location_t: Testsuite parts > 15/15: Support for 64-bit location_t: Configury parts > > 64 files changed, 841 insertions(+), 386 deletions(-)
On Sun, Nov 3, 2024 at 11:23 PM Lewis Hyatt <lhyatt@gmail.com> wrote: > > Hello- > > There is no shortage of PRs complaining about things that go wrong when the > line_maps data structure in libcpp starts to run into its limits. Being > restricted to a 32-bit location_t to cover all source locations means it has > the following limitations, among others: > > -Column numbers larger than 2^12 are not tracked at all, so diagnostics > can only refer to the whole line, and things like -Wmisleading-indentation > stop working with large column numbers. [PR89549] > > -Tokens longer than 32 characters cannot be stored compactly in the line > map and have to be stored as ad-hoc locations with the associated > memory and locality overhead. > > -Sufficiently large sources can exhaust the available location_t space. > > The current limitations are not encountered too much, but it does happen > often enough, especially in testsuites or large PCH compiles that include, > e.g., every header in a large project. > > Currently, the main workaround is to use the option -flarge-source-locations. > This sets the range bits in the line_maps to 0, effectively meaning that > every source code token is stored as an ad-hoc location. This does > significantly increase the number of locations that can be stored, but it > loses all the benefits of the line_maps infrastructure, and it does not help > with the 4096 limit for column numbers. > > It seemed to me like there's no way to improve on that other than letting > location_t be 64 bits instead of 32 bits. I thought I would take a look at > what that would entail... the attached 15 (mostly small) patches are the > result. It was unfortunately necessary to touch a lot of different parts of > the compiler, since location_t is used pretty pervasively. This patch set > adds a configure option --enable-large-source-locations that makes > location_t be 64 bits. The option is off by default. > > There are decisions to be made about what to do with all those new bits. I > made some choices in this patch, but it's easy to change if different > tradeoffs are preferable. Here is what I did: > > - Only 63 bits are used, the high bit is not used so that two location_t > can be safely subtracted and stored in an int64_t. > > - As in the 32-bit case, half of the available locations are reserved > for ad-hoc locations, and the other half for macros and ordinary > locations. > > - The limit on the maximum column number is removed, so it's now set to > 2^31-1. (GCC right now pervasively assumes line and column numbers > within a file can be represented by a 32-bit signed int. That could be > changed too, but it is not part of this patch and would be another > similarly large change.) > > - The default range bits is changed from 5 to 7. This means that > tokens up to 128 chars in length get ordinary locations, while longer > ones get ad-hoc locations. I think that's a big improvement from the > current max of 32 and will result in a lot fewer ad-hoc locations > being generated. There are probably enough bits to go around, though, > that this could be made even larger too. > > All that stuff is configured in line-map.h and is easy to change. The bulk > of the patch is dealing with the fallout from making location_t's type > configurable; in particular, C++ modules needed a lot of tweaks, since they > deal with line_map internals, and I had to add a new field to RTL which > necessitated a lot of small changes. I was not previously familiar with > RTL... I think I generally grokked the idea there and I think the changes I > made are safe, but I am pretty sure some of them are unnecessary so would > appreciate feedback on that too. > > I tried to test on a variety of platforms. For each of them, I did three > bootstrap + regtest (without the patch, with the patch and without > --enable-large-source-locations, and with the patch and with > --enable-large-source-locations) with --enable-checking=yes,extra,rtl and > verified no change in the testsuite other than the expected new PASSes. Here > is what I have tested so far: > > all languages: x86_64, aarch64 (cfarm185), ppc64le (cfarm135) > c/c++ only: sparc 32-bit (cfarm216) > > The testing on sparc 32-bit was very productive and in particular identified > a few issues with padding and alignment that have not been apparent until > now. That platform differs from x86 32-bit in that uint64_t has a larger > alignment than a pointer does. > > Also it's worth noting, even if there is no interest in supporting this > feature, patches 2, 3, 4, 5, and 6 I think are worth accepting regardless; > these are small bug fixes that came up in testing and are independent of > whether location_t is changed or not... they could be triggered by other > changes in the future too. > > Regarding memory usage implications of this change. It does make a lot of > data structures larger. Here is what it does on x86-64. The base sizes of some > types change as follows: > > location_t: From 4 to 8 (+100%) > line_map_ordinary: From 24 to 32 (+33%) > line_map_macro: From 32 to 40 (+25%) > cpp_token: From 24 to 32 (+33%) > cpp_macro: From 48 to 56 (+17%) > tree_exp: From 32 to 40 (+25%) > gimple: From 40 to 48 (+20%) All GENERIC decls (tree_decl_minimal) will grow by 8 bytes (and we have 4 bytes unused padding). On the GIMPLE side it's probably a "hooray - more space for flags!". It does feel a bit backwards because the savings we got by making the TREE_BLOCK part of the location. I do think that there's no good way around 64bit location_t though, so thanks for tackling this. What I'm less sure is whether we really want to make this configurable - I think we're past the point of considering a 32bit host to be a platform to compile arbitrary large sources. So - thanks for proposing this. I'll propose to make the change unconditionally though - it's quite difficult to optimize structure layout when bits in it are of unknown size and alignment. Richard. > The compiled size of stdc++.h.gch increases from 210MB to 219MB (+4%). > > Compiling testsuite/g++.dg/modules/xtreme-header.h into a module, I see: > > size of the output module: From 21.9MB to 21.6MB (-1%) > peak memory usage: From 211MB to 235MB (+11%) > > The reason the module gets smaller is that it streams ordinary locations > more efficiently than adhoc locations, and with the 64-bit location_t, the > number of adhoc locations was reduced from 189,891 to 101,571. > > I'm curious to hear your thoughts on this one please. I think it could be > nice potentially to get it into GCC 15, and consider changing the default in > GCC 16 or later, if you think it's worth the trouble. If it seems too > invasive a change and not worth it, I'll certainly understand, and I have > enjoyed the tour through the compiler internals in any case. > > FWIW I feel like this issue will need to be dealt with some time or other. I > think users generally expect not to run into arbitrary limits, and compiling > a lot of headers together into one module is going to become possibly more > common as modules become available by default. I think any attempt to deal > with it will have to share at least a lot of the features of this one. > > -Lewis > > 01/15: Support for 64-bit location_t: libcpp parts > 02/15: libcpp: Fix potential unaligned access in cpp_buffer > 03/15: tree-cfg: Fix call to next_discriminator_for_locus() > 04/15: tree-phinodes: Use 4 instead of 2 as the minimum number of phi args > 05/15: c++: Fix tree_contains_struct for TRAIT_EXPR > 06/15: gimple: Handle tail padding when computing gimple_ops_offset > 07/15: Support for 64-bit location_t: toplev parts > 08/15: Support for 64-bit location_t: Analyzer parts > 09/15: Support for 64-bit location_t: Frontend parts > 10/15: Support for 64-bit location_t: C++ modules parts > 11/15: Support for 64-bit location_t: RTL parts > 12/15: Support for 64-bit location_t: Backend parts > 13/15: Support for 64-bit location_t: Internal parts > 14/15: Support for 64-bit location_t: Testsuite parts > 15/15: Support for 64-bit location_t: Configury parts > > 64 files changed, 841 insertions(+), 386 deletions(-)
On 11/4/24 2:58 AM, Richard Biener wrote: > > All GENERIC decls (tree_decl_minimal) will grow by 8 bytes (and we have 4 bytes > unused padding). On the GIMPLE side it's probably a "hooray - more space for > flags!". It does feel a bit backwards because the savings we got by making > the TREE_BLOCK part of the location. I'm sure we'll use up that padding, probably quicker than any of us would estimate. The whole point of the current scheme was to try and avoid the growth in that data structure and maybe that's inevitable at this point. So I'm not going to object. > tackling this. What I'm less sure is whether we really want to make this > configurable - I think we're past the point of considering a 32bit host to be > a platform to compile arbitrary large sources. I wouldn't let 32bit hosts drive any significant decision; so I wouldn't make it configurable. Those platforms may suffer a bit, but they're less relevant every day IMHO. > > So - thanks for proposing this. I'll propose to make the change unconditionally > though - it's quite difficult to optimize structure layout when bits in it are > of unknown size and alignment. +1 from me. jeff
On Mon, Nov 04, 2024 at 10:58:14AM +0100, Richard Biener wrote: > > Regarding memory usage implications of this change. It does make a lot of > > data structures larger. Here is what it does on x86-64. The base sizes of some > > types change as follows: > > > > location_t: From 4 to 8 (+100%) > > line_map_ordinary: From 24 to 32 (+33%) > > line_map_macro: From 32 to 40 (+25%) > > cpp_token: From 24 to 32 (+33%) > > cpp_macro: From 48 to 56 (+17%) > > tree_exp: From 32 to 40 (+25%) > > gimple: From 40 to 48 (+20%) > > All GENERIC decls (tree_decl_minimal) will grow by 8 bytes (and we have 4 bytes > unused padding). On the GIMPLE side it's probably a "hooray - more space for > flags!". It does feel a bit backwards because the savings we got by making > the TREE_BLOCK part of the location. I think we need to carefully look at each of the structs with location_t that is used widely. E.g. for tree_exp, I think we just had a padding there up to and including GCC 13, and in GCC 14+, while EXPR_COND_UID is computed unconditionally, it is ignored unless -fcondition-coverage which is not enabled by default and I think it isn't commonly used and is used solely on COND_EXPRs. Furthermore, for gimple we don't waste precious space for it and use on the side hash map, so why not use similar hash-map for COND_EXPRs during gimplification too (and only for -fcondition-coverage)? That would keep tree_exp at the same size as before. Other than that, guess gimple is the most important, especially for LTO. Except for GIMPLE_CALL/ASM/SWITCH, most GIMPLE_* codes have very low num_ops, so perhaps having 32-bit num_ops is a waste. Though we'd need 3 bits to represent those, or at least 2 if GIMPLE_COND has it somewhere else. cpp_token hurts too, at least for C++, though I'm not sure what to do that. While most of the union members are just 32 or 64-bit, there are two with 32-bit and 64-bit data (that could be in theory handled by having the 32-bit stuff in some flags field and union just the 64-bit) but then for identifiers we want 2 pointers. Jakub
On Mon, Nov 4, 2024 at 11:19 PM Jakub Jelinek <jakub@redhat.com> wrote: > > On Mon, Nov 04, 2024 at 10:58:14AM +0100, Richard Biener wrote: > > > Regarding memory usage implications of this change. It does make a lot of > > > data structures larger. Here is what it does on x86-64. The base sizes of some > > > types change as follows: > > > > > > location_t: From 4 to 8 (+100%) > > > line_map_ordinary: From 24 to 32 (+33%) > > > line_map_macro: From 32 to 40 (+25%) > > > cpp_token: From 24 to 32 (+33%) > > > cpp_macro: From 48 to 56 (+17%) > > > tree_exp: From 32 to 40 (+25%) > > > gimple: From 40 to 48 (+20%) > > > > All GENERIC decls (tree_decl_minimal) will grow by 8 bytes (and we have 4 bytes > > unused padding). On the GIMPLE side it's probably a "hooray - more space for > > flags!". It does feel a bit backwards because the savings we got by making > > the TREE_BLOCK part of the location. > > I think we need to carefully look at each of the structs with location_t > that is used widely. > E.g. for tree_exp, I think we just had a padding there up to and including > GCC 13, and in GCC 14+, while EXPR_COND_UID is computed unconditionally, it > is ignored unless -fcondition-coverage which is not enabled by default and > I think it isn't commonly used and is used solely on COND_EXPRs. > Furthermore, for gimple we don't waste precious space for it and use on the > side hash map, so why not use similar hash-map for COND_EXPRs during > gimplification too (and only for -fcondition-coverage)? > That would keep tree_exp at the same size as before. Good idea. > Other than that, guess gimple is the most important, especially for LTO. > Except for GIMPLE_CALL/ASM/SWITCH, most GIMPLE_* codes have very low num_ops, > so perhaps having 32-bit num_ops is a waste. All non-GIMPLE_ASSIGN_SINGLE have the possibility of much saving: - as you say num_ops is either 2, 3 or 4 - they are unnecessarily using gimple_statement_with_memory_ops_base, no vuse/def on non-SINGLE For GIMPLE_ASSIGN_SINGLE num_ops is 2. So - ICK - the "best" way would be to split gassign into gsingle_assign (gload, gstore, gcopy, glegacy?) and goperation. Currently gassign has gimple_statement_with_memory_ops as base, I'm not sure we can keep gassign as a base of the two new classes and thus make it have gimple_statement_with_ops_base as base. A "cheap" way would be to scrap _with_memory_ops and put vuse/vdef in two trailing ops[] positions at ops[num_ops] and ops[num_ops+1], making gimple_vuse and friends a bit more expensive and complicated to verify (we have to check for GIMPLE_ASSIGN_SINGLE and always allocate for those). Generally num_ops has to be able to be large (for calls and switch), and given num_ops is currently in the base and also accessed with just gimple * it's difficult to move it down the class hierarchy without using virtual functions to access it or make the access expensive (switch on gimple_code and up-cast). That said, for gassign num_ops is redundant information, and in general 32bits is overkill, 16bits should be OK - salvaging 'subcode' might be possible as well by moving it where needed. > Though we'd need 3 bits to represent those, or at least 2 if GIMPLE_COND > has it somewhere else. GIMPLE_COND always has two ops, num_ops is redundant. I think there's the possibility to get back the memory on the GIMPLE side but I wouldn't make this a requirement for this patch. > cpp_token hurts too, at least for C++, though I'm not sure what to do that. > While most of the union members are just 32 or 64-bit, there are two with > 32-bit and 64-bit data (that could be in theory handled by having the 32-bit > stuff in some flags field and union just the 64-bit) but then for identifiers > we want 2 pointers. We could dynamically allocate cpp_token and run-length encode location_t, making it effectively less than 64bit. For cpp_token we'd have to split it into a low 32bit at the existing location and optionally allocated upper 32bits. But I'm honestly not sure this is worth the trouble - it would also restrict how we allocate the bits of location_t. We might want to dynamically allocate cpp_token in general given the current 24 bytes are only for the largest union members - but I have no statistics on the token distribution and no idea if we use an array of cpp_token somewhere which would rule this out. Richard. > Jakub >
On Tue, Nov 05, 2024 at 09:32:23AM +0100, Richard Biener wrote: > I think there's the possibility to get back the memory on the GIMPLE > side but I wouldn't make > this a requirement for this patch. Sure. I'll I'm saying is that we should make some effort to avoid the growth, not just even without any movements accept we have tons of new bits for flags, let's use them. > We could dynamically allocate cpp_token and run-length encode location_t, > making it effectively less than 64bit. For cpp_token we'd have to split > it into a low 32bit at the existing location and optionally allocated upper > 32bits. But I'm honestly not sure this is worth the trouble - it would also > restrict how we allocate the bits of location_t. We might want to > dynamically allocate cpp_token in general given the current 24 bytes > are only for the largest union members - but I have no statistics on the > token distribution and no idea if we use an array of cpp_token somewhere > which would rule this out. Actually, I think cpp_token isn't that big deal, that should be short-lived unless using huge macros. cp_token in the C++ FE is more important, the FE uses a vector of those and there is one cp_token per token read from libcpp. Unfortunately, I'm afraid there is nothing that can be done there, the struct has currently 29 bits of various flags, then 32 bit location_t and then union with a single pointer in it, so nicely 16 bytes. Now it will be 24 bytes, with 35 spare bits for flags. And the vector is live across the whole parsing (pointer to it cleared at the end of parsing, so GC collect can use it). Jakub
On Tue, Nov 5, 2024 at 9:59 AM Jakub Jelinek <jakub@redhat.com> wrote: > > On Tue, Nov 05, 2024 at 09:32:23AM +0100, Richard Biener wrote: > > I think there's the possibility to get back the memory on the GIMPLE > > side but I wouldn't make > > this a requirement for this patch. > > Sure. I'll I'm saying is that we should make some effort to avoid the > growth, not just even without any movements accept we have tons of new bits > for flags, let's use them. True. I'm looking how difficult it is to optimize the vuse/vdef thing now. > > We could dynamically allocate cpp_token and run-length encode location_t, > > making it effectively less than 64bit. For cpp_token we'd have to split > > it into a low 32bit at the existing location and optionally allocated upper > > 32bits. But I'm honestly not sure this is worth the trouble - it would also > > restrict how we allocate the bits of location_t. We might want to > > dynamically allocate cpp_token in general given the current 24 bytes > > are only for the largest union members - but I have no statistics on the > > token distribution and no idea if we use an array of cpp_token somewhere > > which would rule this out. > > Actually, I think cpp_token isn't that big deal, that should be short-lived > unless using huge macros. > cp_token in the C++ FE is more important, the FE uses a vector of those > and there is one cp_token per token read from libcpp. > Unfortunately, I'm afraid there is nothing that can be done there, > the struct has currently 29 bits of various flags, then 32 bit location_t > and then union with a single pointer in it, so nicely 16 bytes. > Now it will be 24 bytes, with 35 spare bits for flags. > And the vector is live across the whole parsing (pointer to it cleared at > the end of parsing, so GC collect can use it). So cp_token[] could be split into two arrays to avoid the 32bit padding with the enlarged location_t. Maybe that's even more cache efficient if one 32bit field is often the only accessed one when sweeping over a chain of tokens. Richard. > > Jakub >
On Tue, Nov 05, 2024 at 10:42:10AM +0100, Richard Biener wrote: > > Actually, I think cpp_token isn't that big deal, that should be short-lived > > unless using huge macros. > > cp_token in the C++ FE is more important, the FE uses a vector of those > > and there is one cp_token per token read from libcpp. > > Unfortunately, I'm afraid there is nothing that can be done there, > > the struct has currently 29 bits of various flags, then 32 bit location_t > > and then union with a single pointer in it, so nicely 16 bytes. > > Now it will be 24 bytes, with 35 spare bits for flags. > > And the vector is live across the whole parsing (pointer to it cleared at > > the end of parsing, so GC collect can use it). > > So cp_token[] could be split into two arrays to avoid the 32bit padding with > the enlarged location_t. Maybe that's even more cache efficient if one > 32bit field is often the only accessed one when sweeping over a chain > of tokens. Not without rewriting the whole parser, cp_token is the basic structure it uses everywhere, most of the code doesn't really care whether the tokens are in a vector or where, they just peek a token or peek 2nd token or similarly. And having to rewrite all cp_token -> accesses into some inline function calls or macros that would for e.g. the 32-bit pointer to cp_token_flags look up the corresponding location/u.value/u.tree_check_value would be a nightmare, there are about 950 such cases. What is worse, not all cp_tokens live in the parser->lexer->buffer vector, some of them live in automatic variables. Jakub