[00/15] Support for 64-bit location_t

Message ID	20241103222220.933471-1-lhyatt@gmail.com
Headers	show Return-Path: <gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org> DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 662953858D33 From: Lewis Hyatt <lhyatt@gmail.com> To: gcc-patches@gcc.gnu.org Cc: Lewis Hyatt <lhyatt@gmail.com> Subject: [PATCH 00/15] Support for 64-bit location_t Date: Sun, 3 Nov 2024 17:22:05 -0500 Message-Id: <20241103222220.933471-1-lhyatt@gmail.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: list Errors-To: gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org
Series	Support for 64-bit location_t \| expand [00/15] Support for 64-bit location_t [01/15] Support for 64-bit location_t: libcpp parts [02/15] libcpp: Fix potential unaligned access in cpp_buffer [03/15] tree-cfg: Fix call to next_discriminator_for_locus() [04/15] tree-phinodes: Use 4 instead of 2 as the minimum number of phi args [05/15] c++: Fix tree_contains_struct for TRAIT_EXPR [06/15] gimple: Handle tail padding when computing gimple_ops_offset [07/15] Support for 64-bit location_t: toplev parts [08/15] Support for 64-bit location_t: Analyzer parts [09/15] Support for 64-bit location_t: Frontend parts [10/15] Support for 64-bit location_t: C++ modules parts [11/15] Support for 64-bit location_t: RTL parts [12/15] Support for 64-bit location_t: Backend parts [13/15] Support for 64-bit location_t: Internal parts [14/15] Support for 64-bit location_t: Testsuite parts [15/15] Support for 64-bit location_t: Configury parts

Lewis Hyatt Nov. 3, 2024, 10:22 p.m. UTC

Hello-

There is no shortage of PRs complaining about things that go wrong when the
line_maps data structure in libcpp starts to run into its limits. Being
restricted to a 32-bit location_t to cover all source locations means it has
the following limitations, among others:

    -Column numbers larger than 2^12 are not tracked at all, so diagnostics
     can only refer to the whole line, and things like -Wmisleading-indentation
     stop working with large column numbers. [PR89549]

    -Tokens longer than 32 characters cannot be stored compactly in the line
     map and have to be stored as ad-hoc locations with the associated
     memory and locality overhead.

    -Sufficiently large sources can exhaust the available location_t space.

The current limitations are not encountered too much, but it does happen
often enough, especially in testsuites or large PCH compiles that include,
e.g., every header in a large project.

Currently, the main workaround is to use the option -flarge-source-locations.
This sets the range bits in the line_maps to 0, effectively meaning that
every source code token is stored as an ad-hoc location. This does
significantly increase the number of locations that can be stored, but it
loses all the benefits of the line_maps infrastructure, and it does not help
with the 4096 limit for column numbers.

It seemed to me like there's no way to improve on that other than letting
location_t be 64 bits instead of 32 bits. I thought I would take a look at
what that would entail... the attached 15 (mostly small) patches are the
result. It was unfortunately necessary to touch a lot of different parts of
the compiler, since location_t is used pretty pervasively. This patch set
adds a configure option --enable-large-source-locations that makes
location_t be 64 bits. The option is off by default.

There are decisions to be made about what to do with all those new bits. I
made some choices in this patch, but it's easy to change if different
tradeoffs are preferable. Here is what I did:

    - Only 63 bits are used, the high bit is not used so that two location_t
      can be safely subtracted and stored in an int64_t.

    - As in the 32-bit case, half of the available locations are reserved
      for ad-hoc locations, and the other half for macros and ordinary
      locations.

    - The limit on the maximum column number is removed, so it's now set to
      2^31-1. (GCC right now pervasively assumes line and column numbers
      within a file can be represented by a 32-bit signed int. That could be
      changed too, but it is not part of this patch and would be another
      similarly large change.)

    - The default range bits is changed from 5 to 7. This means that
      tokens up to 128 chars in length get ordinary locations, while longer
      ones get ad-hoc locations. I think that's a big improvement from the
      current max of 32 and will result in a lot fewer ad-hoc locations
      being generated. There are probably enough bits to go around, though,
      that this could be made even larger too.

All that stuff is configured in line-map.h and is easy to change. The bulk
of the patch is dealing with the fallout from making location_t's type
configurable; in particular, C++ modules needed a lot of tweaks, since they
deal with line_map internals, and I had to add a new field to RTL which
necessitated a lot of small changes. I was not previously familiar with
RTL... I think I generally grokked the idea there and I think the changes I
made are safe, but I am pretty sure some of them are unnecessary so would
appreciate feedback on that too.

I tried to test on a variety of platforms. For each of them, I did three
bootstrap + regtest (without the patch, with the patch and without
--enable-large-source-locations, and with the patch and with
--enable-large-source-locations) with --enable-checking=yes,extra,rtl and
verified no change in the testsuite other than the expected new PASSes. Here
is what I have tested so far:

all languages: x86_64, aarch64 (cfarm185), ppc64le (cfarm135)
c/c++ only: sparc 32-bit (cfarm216)

The testing on sparc 32-bit was very productive and in particular identified
a few issues with padding and alignment that have not been apparent until
now. That platform differs from x86 32-bit in that uint64_t has a larger
alignment than a pointer does.

Also it's worth noting, even if there is no interest in supporting this
feature, patches 2, 3, 4, 5, and 6 I think are worth accepting regardless;
these are small bug fixes that came up in testing and are independent of
whether location_t is changed or not... they could be triggered by other
changes in the future too.

Regarding memory usage implications of this change. It does make a lot of
data structures larger. Here is what it does on x86-64. The base sizes of some
types change as follows:

location_t:              From 4 to 8 (+100%)
line_map_ordinary:       From 24 to 32 (+33%)
line_map_macro:          From 32 to 40 (+25%)
cpp_token:               From 24 to 32 (+33%)
cpp_macro:               From 48 to 56 (+17%)
tree_exp:                From 32 to 40 (+25%)
gimple:                  From 40 to 48 (+20%)

The compiled size of stdc++.h.gch increases from 210MB to 219MB (+4%).

Compiling testsuite/g++.dg/modules/xtreme-header.h into a module, I see:

size of the output module: From 21.9MB to 21.6MB (-1%)
peak memory usage: From 211MB to 235MB (+11%)

The reason the module gets smaller is that it streams ordinary locations
more efficiently than adhoc locations, and with the 64-bit location_t, the
number of adhoc locations was reduced from 189,891 to 101,571.

I'm curious to hear your thoughts on this one please. I think it could be
nice potentially to get it into GCC 15, and consider changing the default in
GCC 16 or later, if you think it's worth the trouble. If it seems too
invasive a change and not worth it, I'll certainly understand, and I have
enjoyed the tour through the compiler internals in any case.

FWIW I feel like this issue will need to be dealt with some time or other. I
think users generally expect not to run into arbitrary limits, and compiling
a lot of headers together into one module is going to become possibly more
common as modules become available by default. I think any attempt to deal
with it will have to share at least a lot of the features of this one.

-Lewis

01/15: Support for 64-bit location_t: libcpp parts
02/15: libcpp: Fix potential unaligned access in cpp_buffer
03/15: tree-cfg: Fix call to next_discriminator_for_locus()
04/15: tree-phinodes: Use 4 instead of 2 as the minimum number of phi args
05/15: c++: Fix tree_contains_struct for TRAIT_EXPR
06/15: gimple: Handle tail padding when computing gimple_ops_offset
07/15: Support for 64-bit location_t: toplev parts
08/15: Support for 64-bit location_t: Analyzer parts
09/15: Support for 64-bit location_t: Frontend parts
10/15: Support for 64-bit location_t: C++ modules parts
11/15: Support for 64-bit location_t: RTL parts
12/15: Support for 64-bit location_t: Backend parts
13/15: Support for 64-bit location_t: Internal parts
14/15: Support for 64-bit location_t: Testsuite parts
15/15: Support for 64-bit location_t: Configury parts

64 files changed, 841 insertions(+), 386 deletions(-)

Nathaniel Shead Nov. 4, 2024, 2:19 a.m. UTC | #1

On Sun, Nov 03, 2024 at 05:22:05PM -0500, Lewis Hyatt wrote:
> Hello-
> 
> There is no shortage of PRs complaining about things that go wrong when the
> line_maps data structure in libcpp starts to run into its limits. Being
> restricted to a 32-bit location_t to cover all source locations means it has
> the following limitations, among others:
> 
>     -Column numbers larger than 2^12 are not tracked at all, so diagnostics
>      can only refer to the whole line, and things like -Wmisleading-indentation
>      stop working with large column numbers. [PR89549]
> 
>     -Tokens longer than 32 characters cannot be stored compactly in the line
>      map and have to be stored as ad-hoc locations with the associated
>      memory and locality overhead.
> 
>     -Sufficiently large sources can exhaust the available location_t space.
> 
> The current limitations are not encountered too much, but it does happen
> often enough, especially in testsuites or large PCH compiles that include,
> e.g., every header in a large project.
> 
> Currently, the main workaround is to use the option -flarge-source-locations.
> This sets the range bits in the line_maps to 0, effectively meaning that
> every source code token is stored as an ad-hoc location. This does
> significantly increase the number of locations that can be stored, but it
> loses all the benefits of the line_maps infrastructure, and it does not help
> with the 4096 limit for column numbers.
> 
> It seemed to me like there's no way to improve on that other than letting
> location_t be 64 bits instead of 32 bits. I thought I would take a look at
> what that would entail... the attached 15 (mostly small) patches are the
> result. It was unfortunately necessary to touch a lot of different parts of
> the compiler, since location_t is used pretty pervasively. This patch set
> adds a configure option --enable-large-source-locations that makes
> location_t be 64 bits. The option is off by default.
> 
> There are decisions to be made about what to do with all those new bits. I
> made some choices in this patch, but it's easy to change if different
> tradeoffs are preferable. Here is what I did:
> 
>     - Only 63 bits are used, the high bit is not used so that two location_t
>       can be safely subtracted and stored in an int64_t.
> 
>     - As in the 32-bit case, half of the available locations are reserved
>       for ad-hoc locations, and the other half for macros and ordinary
>       locations.
> 
>     - The limit on the maximum column number is removed, so it's now set to
>       2^31-1. (GCC right now pervasively assumes line and column numbers
>       within a file can be represented by a 32-bit signed int. That could be
>       changed too, but it is not part of this patch and would be another
>       similarly large change.)
> 
>     - The default range bits is changed from 5 to 7. This means that
>       tokens up to 128 chars in length get ordinary locations, while longer
>       ones get ad-hoc locations. I think that's a big improvement from the
>       current max of 32 and will result in a lot fewer ad-hoc locations
>       being generated. There are probably enough bits to go around, though,
>       that this could be made even larger too.
> 
> All that stuff is configured in line-map.h and is easy to change. The bulk
> of the patch is dealing with the fallout from making location_t's type
> configurable; in particular, C++ modules needed a lot of tweaks, since they
> deal with line_map internals, and I had to add a new field to RTL which
> necessitated a lot of small changes. I was not previously familiar with
> RTL... I think I generally grokked the idea there and I think the changes I
> made are safe, but I am pretty sure some of them are unnecessary so would
> appreciate feedback on that too.
> 
> I tried to test on a variety of platforms. For each of them, I did three
> bootstrap + regtest (without the patch, with the patch and without
> --enable-large-source-locations, and with the patch and with
> --enable-large-source-locations) with --enable-checking=yes,extra,rtl and
> verified no change in the testsuite other than the expected new PASSes. Here
> is what I have tested so far:
> 
> all languages: x86_64, aarch64 (cfarm185), ppc64le (cfarm135)
> c/c++ only: sparc 32-bit (cfarm216)
> 
> The testing on sparc 32-bit was very productive and in particular identified
> a few issues with padding and alignment that have not been apparent until
> now. That platform differs from x86 32-bit in that uint64_t has a larger
> alignment than a pointer does.
> 
> Also it's worth noting, even if there is no interest in supporting this
> feature, patches 2, 3, 4, 5, and 6 I think are worth accepting regardless;
> these are small bug fixes that came up in testing and are independent of
> whether location_t is changed or not... they could be triggered by other
> changes in the future too.
> 
> Regarding memory usage implications of this change. It does make a lot of
> data structures larger. Here is what it does on x86-64. The base sizes of some
> types change as follows:
> 
> location_t:              From 4 to 8 (+100%)
> line_map_ordinary:       From 24 to 32 (+33%)
> line_map_macro:          From 32 to 40 (+25%)
> cpp_token:               From 24 to 32 (+33%)
> cpp_macro:               From 48 to 56 (+17%)
> tree_exp:                From 32 to 40 (+25%)
> gimple:                  From 40 to 48 (+20%)
> 
> The compiled size of stdc++.h.gch increases from 210MB to 219MB (+4%).
> 
> Compiling testsuite/g++.dg/modules/xtreme-header.h into a module, I see:
> 
> size of the output module: From 21.9MB to 21.6MB (-1%)
> peak memory usage: From 211MB to 235MB (+11%)
> 
> The reason the module gets smaller is that it streams ordinary locations
> more efficiently than adhoc locations, and with the 64-bit location_t, the
> number of adhoc locations was reduced from 189,891 to 101,571.
> 
> I'm curious to hear your thoughts on this one please. I think it could be
> nice potentially to get it into GCC 15, and consider changing the default in
> GCC 16 or later, if you think it's worth the trouble. If it seems too
> invasive a change and not worth it, I'll certainly understand, and I have
> enjoyed the tour through the compiler internals in any case.
> 
> FWIW I feel like this issue will need to be dealt with some time or other. I
> think users generally expect not to run into arbitrary limits, and compiling
> a lot of headers together into one module is going to become possibly more
> common as modules become available by default. I think any attempt to deal
> with it will have to share at least a lot of the features of this one.
> 

Thanks for this!  I don't have any comments on the patch series itself,
but thought I'd just concur about modules likely going to make the
32-bit location_t constraint more noticeable; I've found that with
including headers in module interfaces it really doesn't take very long
to run out of locations for imported module units.

Some of that could possibly be alleviated by even more aggressive
location remapping on stream-in/stream-out (especially for mergeable
declarations), and if common build systems like CMake start supporting
header units that can also help workaround the issue for a time, but in
general as larger projects mixing modules and #includes get developed I
believe this will continue to be a problem.

Nathaniel

> -Lewis
> 
> 01/15: Support for 64-bit location_t: libcpp parts
> 02/15: libcpp: Fix potential unaligned access in cpp_buffer
> 03/15: tree-cfg: Fix call to next_discriminator_for_locus()
> 04/15: tree-phinodes: Use 4 instead of 2 as the minimum number of phi args
> 05/15: c++: Fix tree_contains_struct for TRAIT_EXPR
> 06/15: gimple: Handle tail padding when computing gimple_ops_offset
> 07/15: Support for 64-bit location_t: toplev parts
> 08/15: Support for 64-bit location_t: Analyzer parts
> 09/15: Support for 64-bit location_t: Frontend parts
> 10/15: Support for 64-bit location_t: C++ modules parts
> 11/15: Support for 64-bit location_t: RTL parts
> 12/15: Support for 64-bit location_t: Backend parts
> 13/15: Support for 64-bit location_t: Internal parts
> 14/15: Support for 64-bit location_t: Testsuite parts
> 15/15: Support for 64-bit location_t: Configury parts
> 
> 64 files changed, 841 insertions(+), 386 deletions(-)

Richard Biener Nov. 4, 2024, 9:58 a.m. UTC | #2

On Sun, Nov 3, 2024 at 11:23 PM Lewis Hyatt <lhyatt@gmail.com> wrote:
>
> Hello-
>
> There is no shortage of PRs complaining about things that go wrong when the
> line_maps data structure in libcpp starts to run into its limits. Being
> restricted to a 32-bit location_t to cover all source locations means it has
> the following limitations, among others:
>
>     -Column numbers larger than 2^12 are not tracked at all, so diagnostics
>      can only refer to the whole line, and things like -Wmisleading-indentation
>      stop working with large column numbers. [PR89549]
>
>     -Tokens longer than 32 characters cannot be stored compactly in the line
>      map and have to be stored as ad-hoc locations with the associated
>      memory and locality overhead.
>
>     -Sufficiently large sources can exhaust the available location_t space.
>
> The current limitations are not encountered too much, but it does happen
> often enough, especially in testsuites or large PCH compiles that include,
> e.g., every header in a large project.
>
> Currently, the main workaround is to use the option -flarge-source-locations.
> This sets the range bits in the line_maps to 0, effectively meaning that
> every source code token is stored as an ad-hoc location. This does
> significantly increase the number of locations that can be stored, but it
> loses all the benefits of the line_maps infrastructure, and it does not help
> with the 4096 limit for column numbers.
>
> It seemed to me like there's no way to improve on that other than letting
> location_t be 64 bits instead of 32 bits. I thought I would take a look at
> what that would entail... the attached 15 (mostly small) patches are the
> result. It was unfortunately necessary to touch a lot of different parts of
> the compiler, since location_t is used pretty pervasively. This patch set
> adds a configure option --enable-large-source-locations that makes
> location_t be 64 bits. The option is off by default.
>
> There are decisions to be made about what to do with all those new bits. I
> made some choices in this patch, but it's easy to change if different
> tradeoffs are preferable. Here is what I did:
>
>     - Only 63 bits are used, the high bit is not used so that two location_t
>       can be safely subtracted and stored in an int64_t.
>
>     - As in the 32-bit case, half of the available locations are reserved
>       for ad-hoc locations, and the other half for macros and ordinary
>       locations.
>
>     - The limit on the maximum column number is removed, so it's now set to
>       2^31-1. (GCC right now pervasively assumes line and column numbers
>       within a file can be represented by a 32-bit signed int. That could be
>       changed too, but it is not part of this patch and would be another
>       similarly large change.)
>
>     - The default range bits is changed from 5 to 7. This means that
>       tokens up to 128 chars in length get ordinary locations, while longer
>       ones get ad-hoc locations. I think that's a big improvement from the
>       current max of 32 and will result in a lot fewer ad-hoc locations
>       being generated. There are probably enough bits to go around, though,
>       that this could be made even larger too.
>
> All that stuff is configured in line-map.h and is easy to change. The bulk
> of the patch is dealing with the fallout from making location_t's type
> configurable; in particular, C++ modules needed a lot of tweaks, since they
> deal with line_map internals, and I had to add a new field to RTL which
> necessitated a lot of small changes. I was not previously familiar with
> RTL... I think I generally grokked the idea there and I think the changes I
> made are safe, but I am pretty sure some of them are unnecessary so would
> appreciate feedback on that too.
>
> I tried to test on a variety of platforms. For each of them, I did three
> bootstrap + regtest (without the patch, with the patch and without
> --enable-large-source-locations, and with the patch and with
> --enable-large-source-locations) with --enable-checking=yes,extra,rtl and
> verified no change in the testsuite other than the expected new PASSes. Here
> is what I have tested so far:
>
> all languages: x86_64, aarch64 (cfarm185), ppc64le (cfarm135)
> c/c++ only: sparc 32-bit (cfarm216)
>
> The testing on sparc 32-bit was very productive and in particular identified
> a few issues with padding and alignment that have not been apparent until
> now. That platform differs from x86 32-bit in that uint64_t has a larger
> alignment than a pointer does.
>
> Also it's worth noting, even if there is no interest in supporting this
> feature, patches 2, 3, 4, 5, and 6 I think are worth accepting regardless;
> these are small bug fixes that came up in testing and are independent of
> whether location_t is changed or not... they could be triggered by other
> changes in the future too.
>
> Regarding memory usage implications of this change. It does make a lot of
> data structures larger. Here is what it does on x86-64. The base sizes of some
> types change as follows:
>
> location_t:              From 4 to 8 (+100%)
> line_map_ordinary:       From 24 to 32 (+33%)
> line_map_macro:          From 32 to 40 (+25%)
> cpp_token:               From 24 to 32 (+33%)
> cpp_macro:               From 48 to 56 (+17%)
> tree_exp:                From 32 to 40 (+25%)
> gimple:                  From 40 to 48 (+20%)

All GENERIC decls (tree_decl_minimal) will grow by 8 bytes (and we have 4 bytes
unused padding).  On the GIMPLE side it's probably a "hooray - more space for
flags!".  It does feel a bit backwards because the savings we got by making
the TREE_BLOCK part of the location.

I do think that there's no good way around 64bit location_t though, so
thanks for
tackling this.  What I'm less sure is whether we really want to make this
configurable - I think we're past the point of considering a 32bit host to be
a platform to compile arbitrary large sources.

So - thanks for proposing this.  I'll propose to make the change unconditionally
though - it's quite difficult to optimize structure layout when bits in it are
of unknown size and alignment.

Richard.

> The compiled size of stdc++.h.gch increases from 210MB to 219MB (+4%).
>
> Compiling testsuite/g++.dg/modules/xtreme-header.h into a module, I see:
>
> size of the output module: From 21.9MB to 21.6MB (-1%)
> peak memory usage: From 211MB to 235MB (+11%)
>
> The reason the module gets smaller is that it streams ordinary locations
> more efficiently than adhoc locations, and with the 64-bit location_t, the
> number of adhoc locations was reduced from 189,891 to 101,571.
>
> I'm curious to hear your thoughts on this one please. I think it could be
> nice potentially to get it into GCC 15, and consider changing the default in
> GCC 16 or later, if you think it's worth the trouble. If it seems too
> invasive a change and not worth it, I'll certainly understand, and I have
> enjoyed the tour through the compiler internals in any case.
>
> FWIW I feel like this issue will need to be dealt with some time or other. I
> think users generally expect not to run into arbitrary limits, and compiling
> a lot of headers together into one module is going to become possibly more
> common as modules become available by default. I think any attempt to deal
> with it will have to share at least a lot of the features of this one.
>
> -Lewis
>
> 01/15: Support for 64-bit location_t: libcpp parts
> 02/15: libcpp: Fix potential unaligned access in cpp_buffer
> 03/15: tree-cfg: Fix call to next_discriminator_for_locus()
> 04/15: tree-phinodes: Use 4 instead of 2 as the minimum number of phi args
> 05/15: c++: Fix tree_contains_struct for TRAIT_EXPR
> 06/15: gimple: Handle tail padding when computing gimple_ops_offset
> 07/15: Support for 64-bit location_t: toplev parts
> 08/15: Support for 64-bit location_t: Analyzer parts
> 09/15: Support for 64-bit location_t: Frontend parts
> 10/15: Support for 64-bit location_t: C++ modules parts
> 11/15: Support for 64-bit location_t: RTL parts
> 12/15: Support for 64-bit location_t: Backend parts
> 13/15: Support for 64-bit location_t: Internal parts
> 14/15: Support for 64-bit location_t: Testsuite parts
> 15/15: Support for 64-bit location_t: Configury parts
>
> 64 files changed, 841 insertions(+), 386 deletions(-)

Jeff Law Nov. 4, 2024, 8:43 p.m. UTC | #3

On 11/4/24 2:58 AM, Richard Biener wrote:

> 
> All GENERIC decls (tree_decl_minimal) will grow by 8 bytes (and we have 4 bytes
> unused padding).  On the GIMPLE side it's probably a "hooray - more space for
> flags!".  It does feel a bit backwards because the savings we got by making
> the TREE_BLOCK part of the location.
I'm sure we'll use up that padding, probably quicker than any of us 
would estimate.

The whole point of the current scheme was to try and avoid the growth in 
that data structure and maybe that's inevitable at this point.  So I'm 
not going to object.


> tackling this.  What I'm less sure is whether we really want to make this
> configurable - I think we're past the point of considering a 32bit host to be
> a platform to compile arbitrary large sources.
I wouldn't let 32bit hosts drive any significant decision; so I wouldn't 
make it configurable.  Those platforms may suffer a bit, but they're 
less relevant every day IMHO.


> 
> So - thanks for proposing this.  I'll propose to make the change unconditionally
> though - it's quite difficult to optimize structure layout when bits in it are
> of unknown size and alignment.
+1 from me.

jeff

Jakub Jelinek Nov. 4, 2024, 10:19 p.m. UTC | #4

On Mon, Nov 04, 2024 at 10:58:14AM +0100, Richard Biener wrote:
> > Regarding memory usage implications of this change. It does make a lot of
> > data structures larger. Here is what it does on x86-64. The base sizes of some
> > types change as follows:
> >
> > location_t:              From 4 to 8 (+100%)
> > line_map_ordinary:       From 24 to 32 (+33%)
> > line_map_macro:          From 32 to 40 (+25%)
> > cpp_token:               From 24 to 32 (+33%)
> > cpp_macro:               From 48 to 56 (+17%)
> > tree_exp:                From 32 to 40 (+25%)
> > gimple:                  From 40 to 48 (+20%)
> 
> All GENERIC decls (tree_decl_minimal) will grow by 8 bytes (and we have 4 bytes
> unused padding).  On the GIMPLE side it's probably a "hooray - more space for
> flags!".  It does feel a bit backwards because the savings we got by making
> the TREE_BLOCK part of the location.

I think we need to carefully look at each of the structs with location_t
that is used widely.
E.g. for tree_exp, I think we just had a padding there up to and including
GCC 13, and in GCC 14+, while EXPR_COND_UID is computed unconditionally, it
is ignored unless -fcondition-coverage which is not enabled by default and
I think it isn't commonly used and is used solely on COND_EXPRs.
Furthermore, for gimple we don't waste precious space for it and use on the
side hash map, so why not use similar hash-map for COND_EXPRs during
gimplification too (and only for -fcondition-coverage)?
That would keep tree_exp at the same size as before.

Other than that, guess gimple is the most important, especially for LTO.
Except for GIMPLE_CALL/ASM/SWITCH, most GIMPLE_* codes have very low num_ops,
so perhaps having 32-bit num_ops is a waste.
Though we'd need 3 bits to represent those, or at least 2 if GIMPLE_COND
has it somewhere else.

cpp_token hurts too, at least for C++, though I'm not sure what to do that.
While most of the union members are just 32 or 64-bit, there are two with
32-bit and 64-bit data (that could be in theory handled by having the 32-bit
stuff in some flags field and union just the 64-bit) but then for identifiers
we want 2 pointers.

	Jakub

Richard Biener Nov. 5, 2024, 8:32 a.m. UTC | #5

On Mon, Nov 4, 2024 at 11:19 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Mon, Nov 04, 2024 at 10:58:14AM +0100, Richard Biener wrote:
> > > Regarding memory usage implications of this change. It does make a lot of
> > > data structures larger. Here is what it does on x86-64. The base sizes of some
> > > types change as follows:
> > >
> > > location_t:              From 4 to 8 (+100%)
> > > line_map_ordinary:       From 24 to 32 (+33%)
> > > line_map_macro:          From 32 to 40 (+25%)
> > > cpp_token:               From 24 to 32 (+33%)
> > > cpp_macro:               From 48 to 56 (+17%)
> > > tree_exp:                From 32 to 40 (+25%)
> > > gimple:                  From 40 to 48 (+20%)
> >
> > All GENERIC decls (tree_decl_minimal) will grow by 8 bytes (and we have 4 bytes
> > unused padding).  On the GIMPLE side it's probably a "hooray - more space for
> > flags!".  It does feel a bit backwards because the savings we got by making
> > the TREE_BLOCK part of the location.
>
> I think we need to carefully look at each of the structs with location_t
> that is used widely.
> E.g. for tree_exp, I think we just had a padding there up to and including
> GCC 13, and in GCC 14+, while EXPR_COND_UID is computed unconditionally, it
> is ignored unless -fcondition-coverage which is not enabled by default and
> I think it isn't commonly used and is used solely on COND_EXPRs.
> Furthermore, for gimple we don't waste precious space for it and use on the
> side hash map, so why not use similar hash-map for COND_EXPRs during
> gimplification too (and only for -fcondition-coverage)?
> That would keep tree_exp at the same size as before.

Good idea.

> Other than that, guess gimple is the most important, especially for LTO.
> Except for GIMPLE_CALL/ASM/SWITCH, most GIMPLE_* codes have very low num_ops,
> so perhaps having 32-bit num_ops is a waste.

All non-GIMPLE_ASSIGN_SINGLE have the possibility of much saving:
 - as you say num_ops is either 2, 3 or 4
 - they are unnecessarily using gimple_statement_with_memory_ops_base,
no vuse/def on non-SINGLE

For GIMPLE_ASSIGN_SINGLE num_ops is 2.

So - ICK - the "best" way would be to split gassign into
gsingle_assign (gload, gstore, gcopy, glegacy?)
and goperation.  Currently gassign has
gimple_statement_with_memory_ops as base, I'm not sure
we can keep gassign as a base of the two new classes and thus make it
have gimple_statement_with_ops_base
as base.  A "cheap" way would be to scrap _with_memory_ops and put
vuse/vdef in two trailing ops[] positions
at ops[num_ops] and ops[num_ops+1], making gimple_vuse and friends a
bit more expensive and complicated
to verify (we have to check for GIMPLE_ASSIGN_SINGLE and always
allocate for those).

Generally num_ops has to be able to be large (for calls and switch),
and given num_ops is currently
in the base and also accessed with just gimple * it's difficult to
move it down the class hierarchy without
using virtual functions to access it or make the access expensive
(switch on gimple_code and up-cast).

That said, for gassign num_ops is redundant information, and in
general 32bits is overkill, 16bits
should be OK - salvaging 'subcode' might be possible as well by moving
it where needed.

> Though we'd need 3 bits to represent those, or at least 2 if GIMPLE_COND
> has it somewhere else.

GIMPLE_COND always has two ops, num_ops is redundant.

I think there's the possibility to get back the memory on the GIMPLE
side but I wouldn't make
this a requirement for this patch.

> cpp_token hurts too, at least for C++, though I'm not sure what to do that.
> While most of the union members are just 32 or 64-bit, there are two with
> 32-bit and 64-bit data (that could be in theory handled by having the 32-bit
> stuff in some flags field and union just the 64-bit) but then for identifiers
> we want 2 pointers.

We could dynamically allocate cpp_token and run-length encode location_t,
making it effectively less than 64bit.  For cpp_token we'd have to split
it into a low 32bit at the existing location and optionally allocated upper
32bits.  But I'm honestly not sure this is worth the trouble - it would also
restrict how we allocate the bits of location_t.  We might want to
dynamically allocate cpp_token in general given the current 24 bytes
are only for the largest union members - but I have no statistics on the
token distribution and no idea if we use an array of cpp_token somewhere
which would rule this out.

Richard.

>         Jakub
>

Jakub Jelinek Nov. 5, 2024, 8:59 a.m. UTC | #6

On Tue, Nov 05, 2024 at 09:32:23AM +0100, Richard Biener wrote:
> I think there's the possibility to get back the memory on the GIMPLE
> side but I wouldn't make
> this a requirement for this patch.

Sure.  I'll I'm saying is that we should make some effort to avoid the
growth, not just even without any movements accept we have tons of new bits
for flags, let's use them.

> We could dynamically allocate cpp_token and run-length encode location_t,
> making it effectively less than 64bit.  For cpp_token we'd have to split
> it into a low 32bit at the existing location and optionally allocated upper
> 32bits.  But I'm honestly not sure this is worth the trouble - it would also
> restrict how we allocate the bits of location_t.  We might want to
> dynamically allocate cpp_token in general given the current 24 bytes
> are only for the largest union members - but I have no statistics on the
> token distribution and no idea if we use an array of cpp_token somewhere
> which would rule this out.

Actually, I think cpp_token isn't that big deal, that should be short-lived
unless using huge macros.
cp_token in the C++ FE is more important, the FE uses a vector of those
and there is one cp_token per token read from libcpp.
Unfortunately, I'm afraid there is nothing that can be done there,
the struct has currently 29 bits of various flags, then 32 bit location_t
and then union with a single pointer in it, so nicely 16 bytes.
Now it will be 24 bytes, with 35 spare bits for flags.
And the vector is live across the whole parsing (pointer to it cleared at
the end of parsing, so GC collect can use it).

	Jakub

Richard Biener Nov. 5, 2024, 9:42 a.m. UTC | #7

On Tue, Nov 5, 2024 at 9:59 AM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Tue, Nov 05, 2024 at 09:32:23AM +0100, Richard Biener wrote:
> > I think there's the possibility to get back the memory on the GIMPLE
> > side but I wouldn't make
> > this a requirement for this patch.
>
> Sure.  I'll I'm saying is that we should make some effort to avoid the
> growth, not just even without any movements accept we have tons of new bits
> for flags, let's use them.

True.  I'm looking how difficult it is to optimize the vuse/vdef thing now.

> > We could dynamically allocate cpp_token and run-length encode location_t,
> > making it effectively less than 64bit.  For cpp_token we'd have to split
> > it into a low 32bit at the existing location and optionally allocated upper
> > 32bits.  But I'm honestly not sure this is worth the trouble - it would also
> > restrict how we allocate the bits of location_t.  We might want to
> > dynamically allocate cpp_token in general given the current 24 bytes
> > are only for the largest union members - but I have no statistics on the
> > token distribution and no idea if we use an array of cpp_token somewhere
> > which would rule this out.
>
> Actually, I think cpp_token isn't that big deal, that should be short-lived
> unless using huge macros.
> cp_token in the C++ FE is more important, the FE uses a vector of those
> and there is one cp_token per token read from libcpp.
> Unfortunately, I'm afraid there is nothing that can be done there,
> the struct has currently 29 bits of various flags, then 32 bit location_t
> and then union with a single pointer in it, so nicely 16 bytes.
> Now it will be 24 bytes, with 35 spare bits for flags.
> And the vector is live across the whole parsing (pointer to it cleared at
> the end of parsing, so GC collect can use it).

So cp_token[] could be split into two arrays to avoid the 32bit padding with
the enlarged location_t.  Maybe that's even more cache efficient if one
32bit field is often the only accessed one when sweeping over a chain
of tokens.

Richard.

>
>         Jakub
>

Jakub Jelinek Nov. 5, 2024, 9:56 a.m. UTC | #8

On Tue, Nov 05, 2024 at 10:42:10AM +0100, Richard Biener wrote:
> > Actually, I think cpp_token isn't that big deal, that should be short-lived
> > unless using huge macros.
> > cp_token in the C++ FE is more important, the FE uses a vector of those
> > and there is one cp_token per token read from libcpp.
> > Unfortunately, I'm afraid there is nothing that can be done there,
> > the struct has currently 29 bits of various flags, then 32 bit location_t
> > and then union with a single pointer in it, so nicely 16 bytes.
> > Now it will be 24 bytes, with 35 spare bits for flags.
> > And the vector is live across the whole parsing (pointer to it cleared at
> > the end of parsing, so GC collect can use it).
> 
> So cp_token[] could be split into two arrays to avoid the 32bit padding with
> the enlarged location_t.  Maybe that's even more cache efficient if one
> 32bit field is often the only accessed one when sweeping over a chain
> of tokens.

Not without rewriting the whole parser, cp_token is the basic structure it
uses everywhere, most of the code doesn't really care whether the tokens are
in a vector or where, they just peek a token or peek 2nd token or similarly.
And having to rewrite all cp_token -> accesses into some inline function
calls or macros that would for e.g. the 32-bit pointer to cp_token_flags
look up the corresponding location/u.value/u.tree_check_value would be a
nightmare, there are about 950 such cases.  What is worse, not all cp_tokens
live in the parser->lexer->buffer vector, some of them live in automatic
variables.

	Jakub

[00/15] Support for 64-bit location_t

Message

Comments