Message ID | 9dcba167-4fa0-7c7e-aee4-74a29a8b32cb@netcologne.de |
---|---|
State | New |
Headers | show |
Series | Fix PR 84487, large rodata increase in tonto and other programs | expand |
Hi Thomas, Thanks for your determination in dealing with this. It has been on my TODO list for a long time but, like you at the outset, I had no idea how to deal with it. OK on the fortran side. Paul On Sat, 13 Apr 2019 at 19:48, Thomas Koenig <tkoenig@netcologne.de> wrote: > > Hello world, > > the attached patch fixes a 8/9 regression where _def_init, an internal > Fortran variable containing only zeros, was placed into the .rodata > section. This led to a large increase in executable size. > > There should be no impact on other languages because the change to > varasm.c is guarded by lang_GNU_Fortran (). > > Regarding the test case: I did find one other test which checks > for .bss, so I suppose this is safe. > > Regression-tested with a full test (--enable-languages=all and > make -j64 -k check) on POWER9. > > I would like to apply it to both affected branches. > > OK for the general and the Fortran part? > > Regards > > Thomas > > 2019-04-13 Thomas Koenig <tkoenig@gcc.gnu.org> > > PR fortran/84487 > * trans-decl.c (gfc_get_symbol_decl): Mark _def_init as > artificial. > > 2019-04-13 Thomas Koenig <tkoenig@gcc.gnu.org> > > PR fortran/84487 > * varasm.c (bss_initializer_p): If we are compiling Fortran, the > decl is artifical and it has a size larger than 255, it can be > put into BSS. > > 2019-04-13 Thomas Koenig <tkoenig@gcc.gnu.org> > > PR fortran/84487 > * gfortran.dg/def_init_1.f90: New test. > >
On Sat, Apr 13, 2019 at 8:48 PM Thomas Koenig <tkoenig@netcologne.de> wrote: > > Hello world, > > the attached patch fixes a 8/9 regression where _def_init, an internal > Fortran variable containing only zeros, was placed into the .rodata > section. This led to a large increase in executable size. > > There should be no impact on other languages because the change to > varasm.c is guarded by lang_GNU_Fortran (). > > Regarding the test case: I did find one other test which checks > for .bss, so I suppose this is safe. > > Regression-tested with a full test (--enable-languages=all and > make -j64 -k check) on POWER9. > > I would like to apply it to both affected branches. > > OK for the general and the Fortran part? This won't work with LTO. Note we have the issue in the middle-end as well since we promote variables we see are not written to to TREE_READONLY. This can be seen with (the somewhat artificial...): int a[1024*1024] = { 0 }; int __attribute__((noinline)) foo() { return *(volatile int *)a; } int main() { return foo (); } where without -flto a gets placed into .bss while with -flto it gets into .rodata. So I believe we should add a DECL flag specifying whether for section placement we can "ignore" TREE_READONLY. We'd initialize that with the original state of TREE_READONLY so that the R/O promotion doesn't change section placement. Also the Fortran FE can then simply set this flag on variables that may live in .bss. There are 14 unused bits in tree_decl_with_vis so a patch for the middle-end part could look like the attached (w/o solving the LTO issue yet). Of course adding sth like a .robss section would be nice. Richard. > Regards > > Thomas > > 2019-04-13 Thomas Koenig <tkoenig@gcc.gnu.org> > > PR fortran/84487 > * trans-decl.c (gfc_get_symbol_decl): Mark _def_init as > artificial. > > 2019-04-13 Thomas Koenig <tkoenig@gcc.gnu.org> > > PR fortran/84487 > * varasm.c (bss_initializer_p): If we are compiling Fortran, the > decl is artifical and it has a size larger than 255, it can be > put into BSS. > > 2019-04-13 Thomas Koenig <tkoenig@gcc.gnu.org> > > PR fortran/84487 > * gfortran.dg/def_init_1.f90: New test. > >
> > This won't work with LTO. Note we have the issue in the middle-end as well > since we promote variables we see are not written to to TREE_READONLY. > This can be seen with (the somewhat artificial...): > > int a[1024*1024] = { 0 }; > > int __attribute__((noinline)) foo() { return *(volatile int *)a; } > > int main() > { > return foo (); > } > > where without -flto a gets placed into .bss while with -flto it > gets into .rodata. So I believe we should add a DECL flag > specifying whether for section placement we can "ignore" > TREE_READONLY. We'd initialize that with the original > state of TREE_READONLY so that the R/O promotion doesn't > change section placement. Also the Fortran FE can then > simply set this flag on variables that may live in .bss. > > There are 14 unused bits in tree_decl_with_vis so a > patch for the middle-end part could look like the attached > (w/o solving the LTO issue yet). > > Of course adding sth like a .robss section would be nice. Yep, but I think what you propose works well in practice (I am not sure if we are forced to put const delcared variables to readonly memory and if we can't do this as binary size optimization always). The patch looks fine to me. Would be possible to place the flags into varpool_node rather then TREE? It is a lot easier to manage flags there. Honza > > Richard. > > > Regards > > > > Thomas > > > > 2019-04-13 Thomas Koenig <tkoenig@gcc.gnu.org> > > > > PR fortran/84487 > > * trans-decl.c (gfc_get_symbol_decl): Mark _def_init as > > artificial. > > > > 2019-04-13 Thomas Koenig <tkoenig@gcc.gnu.org> > > > > PR fortran/84487 > > * varasm.c (bss_initializer_p): If we are compiling Fortran, the > > decl is artifical and it has a size larger than 255, it can be > > put into BSS. > > > > 2019-04-13 Thomas Koenig <tkoenig@gcc.gnu.org> > > > > PR fortran/84487 > > * gfortran.dg/def_init_1.f90: New test. > > > >
* Richard Biener:
> Of course adding sth like a .robss section would be nice.
I think this is strictly a link editor issue because a read-only PT_LOAD
directive with a memory size larger than the file size already produces
read-only zero pages, without requiring a file allocation.
Thanks,
Florian
On Mon, Apr 15, 2019 at 01:54:11PM +0200, Florian Weimer wrote: > * Richard Biener: > > > Of course adding sth like a .robss section would be nice. > > I think this is strictly a link editor issue because a read-only PT_LOAD > directive with a memory size larger than the file size already produces > read-only zero pages, without requiring a file allocation. But .rodata normally is not the last thing in its segment (the .eh* things are after it, and those are usually not all zero). Segher
* Segher Boessenkool: > On Mon, Apr 15, 2019 at 01:54:11PM +0200, Florian Weimer wrote: >> * Richard Biener: >> >> > Of course adding sth like a .robss section would be nice. >> >> I think this is strictly a link editor issue because a read-only PT_LOAD >> directive with a memory size larger than the file size already produces >> read-only zero pages, without requiring a file allocation. > > But .rodata normally is not the last thing in its segment (the .eh* > things are after it, and those are usually not all zero). If you don't mind the proliferation of load segments (we've add many of them in recent years), placement does not matter. Thanks, Florian
On Tue, Apr 16, 2019 at 11:33 AM Florian Weimer <fweimer@redhat.com> wrote: > > * Segher Boessenkool: > > > On Mon, Apr 15, 2019 at 01:54:11PM +0200, Florian Weimer wrote: > >> * Richard Biener: > >> > >> > Of course adding sth like a .robss section would be nice. > >> > >> I think this is strictly a link editor issue because a read-only PT_LOAD > >> directive with a memory size larger than the file size already produces > >> read-only zero pages, without requiring a file allocation. > > > > But .rodata normally is not the last thing in its segment (the .eh* > > things are after it, and those are usually not all zero). > > If you don't mind the proliferation of load segments (we've add many of > them in recent years), placement does not matter. Btw, besides being a link editor issue the issue also shows in object file size. Not sure if a similar trick exists for those (.rodata with NOBITS?) Richard. > Thanks, > Florian
On Tue, Apr 16, 2019 at 11:33:39AM +0200, Florian Weimer wrote: > * Segher Boessenkool: > > > On Mon, Apr 15, 2019 at 01:54:11PM +0200, Florian Weimer wrote: > >> * Richard Biener: > >> > >> > Of course adding sth like a .robss section would be nice. > >> > >> I think this is strictly a link editor issue because a read-only PT_LOAD > >> directive with a memory size larger than the file size already produces > >> read-only zero pages, without requiring a file allocation. > > > > But .rodata normally is not the last thing in its segment (the .eh* > > things are after it, and those are usually not all zero). > > If you don't mind the proliferation of load segments (we've add many of > them in recent years), placement does not matter. Many (or at least some) ABIs have requirements wrt placement, and number and kind of segments even. Is there any disadvantage to having a .robss section (that is mapped at the end of the text segment, or some other r/o segment, so that its data does not have to exist in object and executable files)? Segher
On Tue, Apr 16, 2019 at 11:33:39AM +0200, Florian Weimer wrote: > * Segher Boessenkool: > > > On Mon, Apr 15, 2019 at 01:54:11PM +0200, Florian Weimer wrote: > >> * Richard Biener: > >> > >> > Of course adding sth like a .robss section would be nice. > >> > >> I think this is strictly a link editor issue because a read-only PT_LOAD > >> directive with a memory size larger than the file size already produces > >> read-only zero pages, without requiring a file allocation. > > > > But .rodata normally is not the last thing in its segment (the .eh* > > things are after it, and those are usually not all zero). > > If you don't mind the proliferation of load segments (we've add many of > them in recent years), placement does not matter. That is something I really dislike, each load segment has a significant cost and from what I remember, at least one of the 4 PT_LOADs in current setup is completely useless: LOAD 0x000000 0x0000000000000000 0x0000000000000000 0x02bf60 0x02bf60 R 0x1000 LOAD 0x02c000 0x000000000002c000 0x000000000002c000 0x0a74a5 0x0a74a5 R E 0x1000 LOAD 0x0d4000 0x00000000000d4000 0x00000000000d4000 0x033fd0 0x033fd0 R 0x1000 LOAD 0x108d50 0x0000000000109d50 0x0000000000109d50 0x00b814 0x0153d8 RW 0x1000 there is no reason not to reorder at least on most targets the sections such that there is just one R, one R E and one RW segment. Jakub
On Tue, Apr 16, 2019 at 12:05:56PM +0200, Richard Biener wrote: > On Tue, Apr 16, 2019 at 11:33 AM Florian Weimer <fweimer@redhat.com> wrote: > > > > * Segher Boessenkool: > > > > > On Mon, Apr 15, 2019 at 01:54:11PM +0200, Florian Weimer wrote: > > >> * Richard Biener: > > >> > > >> > Of course adding sth like a .robss section would be nice. > > >> > > >> I think this is strictly a link editor issue because a read-only PT_LOAD > > >> directive with a memory size larger than the file size already produces > > >> read-only zero pages, without requiring a file allocation. > > > > > > But .rodata normally is not the last thing in its segment (the .eh* > > > things are after it, and those are usually not all zero). > > > > If you don't mind the proliferation of load segments (we've add many of > > them in recent years), placement does not matter. > > Btw, besides being a link editor issue the issue also shows in object > file size. Not sure if a similar trick exists for those (.rodata with NOBITS?) Section type SHT_NOBITS does the trick, yes. Segher
On Tue, Apr 16, 2019 at 12:16:16PM +0200, Jakub Jelinek wrote: > On Tue, Apr 16, 2019 at 11:33:39AM +0200, Florian Weimer wrote: > > * Segher Boessenkool: > > > > > On Mon, Apr 15, 2019 at 01:54:11PM +0200, Florian Weimer wrote: > > >> * Richard Biener: > > >> > > >> > Of course adding sth like a .robss section would be nice. > > >> > > >> I think this is strictly a link editor issue because a read-only PT_LOAD > > >> directive with a memory size larger than the file size already produces > > >> read-only zero pages, without requiring a file allocation. > > > > > > But .rodata normally is not the last thing in its segment (the .eh* > > > things are after it, and those are usually not all zero). > > > > If you don't mind the proliferation of load segments (we've add many of > > them in recent years), placement does not matter. > > That is something I really dislike, each load segment has a significant cost > and from what I remember, at least one of the 4 PT_LOADs in current setup is > completely useless: > LOAD 0x000000 0x0000000000000000 0x0000000000000000 0x02bf60 0x02bf60 R 0x1000 > LOAD 0x02c000 0x000000000002c000 0x000000000002c000 0x0a74a5 0x0a74a5 R E 0x1000 > LOAD 0x0d4000 0x00000000000d4000 0x00000000000d4000 0x033fd0 0x033fd0 R 0x1000 > LOAD 0x108d50 0x0000000000109d50 0x0000000000109d50 0x00b814 0x0153d8 RW 0x1000 > there is no reason not to reorder at least on most targets the sections such > that there is just one R, one R E and one RW segment. It is more painful if your segments are aligned to 64kB or anything else bigger, too. Segher
Hi, thanks a lot for the extensive discussion :-) How should we now proceed, first for gcc 9, snd then for backporting? Use Richard‘s patch with the corresponding Fortran FE change? Regards Thomas
On Wed, Apr 17, 2019 at 9:19 AM Thomas König <tk@tkoenig.net> wrote: > > Hi, > > thanks a lot for the extensive discussion :-) > > How should we now proceed, first for gcc 9, snd then for backporting? > Use Richard‘s patch with the corresponding Fortran FE change? Btw, for the testcase the fortran FE could also simply opt to not make def_init TREE_READONLY. Or even better, for all-zero initialization omit the explicit initialization data and instead mark it specially in the vtable (just use a NULL initializer denoting zero-initialization?). Even .bss costs (runtime) memory. But yes, my patch would be a way to solve the middle-end issue of promoting a variable TREE_READONLY, preventing .bss use. And the FE could then "abuse" this feature. Note the middle-end already special-cases variables with an explicit section so the Fortran FE can already use that feature to put the initializer into .bss explicitely (set_decl_section_name (decl, ".bss"), conditional on availability (not 100% sure how to test that...). Your testcase probably will fail on targets w/o .bss section support. Richard. > Regards > > Thomas
* Richard Biener: > On Wed, Apr 17, 2019 at 9:19 AM Thomas König <tk@tkoenig.net> wrote: >> >> Hi, >> >> thanks a lot for the extensive discussion :-) >> >> How should we now proceed, first for gcc 9, snd then for backporting? >> Use Richard‘s patch with the corresponding Fortran FE change? > > Btw, for the testcase the fortran FE could also simply opt to not > make def_init TREE_READONLY. Or even better, for all-zero > initialization omit the explicit initialization data and instead > mark it specially in the vtable (just use a NULL initializer > denoting zero-initialization?). Even .bss costs (runtime) memory. Not just that, .bss adds to the commit charge, while .rodata would not. So it's not clear that using .bss for zero constants is always a win. Thanks, Florian
On Apr 17 2019, Florian Weimer <fweimer@redhat.com> wrote:
> Not just that, .bss adds to the commit charge,
Only one page at most.
Andreas.
* Andreas Schwab: > On Apr 17 2019, Florian Weimer <fweimer@redhat.com> wrote: > >> Not just that, .bss adds to the commit charge, > > Only one page at most. That would be a bug. All of it is anonymous memory which needs backing from RAM or swap, in case the process writes to it. Thanks, Florian
On Apr 17 2019, Florian Weimer <fweimer@redhat.com> wrote: > * Andreas Schwab: > >> On Apr 17 2019, Florian Weimer <fweimer@redhat.com> wrote: >> >>> Not just that, .bss adds to the commit charge, >> >> Only one page at most. > > That would be a bug. You cannot avoid it for the page shared with .data, unless you force .bss to be page aligned. Andreas.
* Andreas Schwab: > On Apr 17 2019, Florian Weimer <fweimer@redhat.com> wrote: > >> * Andreas Schwab: >> >>> On Apr 17 2019, Florian Weimer <fweimer@redhat.com> wrote: >>> >>>> Not just that, .bss adds to the commit charge, >>> >>> Only one page at most. >> >> That would be a bug. > > You cannot avoid it for the page shared with .data, unless you force > .bss to be page aligned. Would you please elaborate? With “commit charge”, I mean address space accounted towards the commit limit, when Linux is running in vm.overcommit_memory=2 mode. Thanks, Florian
Index: fortran/trans-decl.c =================================================================== --- fortran/trans-decl.c (Revision 270182) +++ fortran/trans-decl.c (Arbeitskopie) @@ -1865,7 +1865,10 @@ gfc_get_symbol_decl (gfc_symbol * sym) if (sym->attr.vtab || (sym->name[0] == '_' && gfc_str_startswith (sym->name, "__def_init"))) - TREE_READONLY (decl) = 1; + { + TREE_READONLY (decl) = 1; + DECL_ARTIFICIAL (decl) = 1; + } return decl; } Index: varasm.c =================================================================== --- varasm.c (Revision 270182) +++ varasm.c (Arbeitskopie) @@ -1007,9 +1007,13 @@ decode_reg_name (const char *name) bool bss_initializer_p (const_tree decl, bool named) { - /* Do not put non-common constants into the .bss section, they belong in - a readonly section, except when NAMED is true. */ - return ((!TREE_READONLY (decl) || DECL_COMMON (decl) || named) + /* Do not put non-common constants into the .bss section, they + belong in a readonly section, except when NAMED is true or when + we are dealing with an artificial declaration, in the Fortran + compiler only, above a certain size. */ + return ((!TREE_READONLY (decl) || DECL_COMMON (decl) || named + || (lang_GNU_Fortran () && DECL_ARTIFICIAL (decl) + && (tree_to_uhwi (DECL_SIZE_UNIT (decl)) > 255 ))) && (DECL_INITIAL (decl) == NULL /* In LTO we have no errors in program; error_mark_node is used to mark offlined constructors. */