Message ID | 20150414223916.9B4782C3BDC@topped-with-meat.com |
---|---|
State | New |
Headers | show |
On 04/14/2015 06:39 PM, Roland McGrath wrote: > I suspect this is now safe. (It caused no apparent mischief in my > arm-linux-gnueabihf build using GCC 4.8.2.) But I don't actually know > what GCC version is implied by "added by the GCC TLS patches". I'm > guessing that 4.6 (our minimum) is new enough that this is always true > now. If that's true, it would be nice to know the appropriate GCC > version number for sure and mention that in the comment. If it is > really still the case that some supported compiler versions or > configurations do not reliably do pure PI access to static/hidden, > then I'd like to have enough information about that to write a proper > configure test. Did you do any benchmarking? I enabled this on tilegx out of curiosity, and I found it was both larger (ld.so size increased 0.3%) and slower (average of 0.1% slower when running fork/exec of a no-op program in a loop with 10,000 iterations). The timing results do vary enough that I'd want to do more extensive testing to be able to say anything definitive, but it's not very encouraging. I admit I'm not sure why this might be, but it seems like the right things were happening, e.g. _dl_start_final is missing and _dl_start is bigger in the version with PI_STATIC_AND_HIDDEN defined. All that said, if it's the more standard way, or is desirable for some other reason, I'm happy to enable it for tilegx, but...
> Did you do any benchmarking? I enabled this on tilegx out of curiosity, > and I found it was both larger (ld.so size increased 0.3%) and slower > (average of 0.1% slower when running fork/exec of a no-op program > in a loop with 10,000 iterations). The timing results do vary enough > that I'd want to do more extensive testing to be able to say anything > definitive, but it's not very encouraging. I did not. Just now I've looked at the sizes on arm-linux-gnueabihf (-mthumb), and indeed this increases ld.so's text by 256 bytes (to 93989 from 94245). For -marm the difference is only 192 bytes (to 126581 from 126389). > I admit I'm not sure why this might be, but it seems like the right > things were happening, e.g. _dl_start_final is missing and _dl_start > is bigger in the version with PI_STATIC_AND_HIDDEN defined. My guess is that there is some extra arithmetic to compute runtime addresses from PC-relative ones. Conversely, without PI_STATIC_AND_HIDDEN defined the equivalent accesses are SP-relative (and thus simpler) while the extra code to copy fields from the stack struct to the hidden global struct is less than that increase. But I didn't actually compare the code even for ARM, let alone Tile. > All that said, if it's the more standard way, or is desirable for some > other reason, I'm happy to enable it for tilegx, but... It's certainly the more standard way in the sense that most machines, including the most-used machines (x86), do it that way. The recent MIPS bug is an indication that the stack bootstrap_map code path is less tested and perhaps less generally reliable. I'd say that the PI_STATIC_AND_HIDDEN case is just more straightforward and clean, so if we can eventually get every machine into that state, that would be ideal. For x86_64, PC-relative accesses are entirely free. (Well, you forego indexed addressing modes you might sometimes use with absolute or SP-relative accesses, but there is no PC-relative overhead per se.) Not surprisingly, turning PI_STATIC_AND_HIDDEN off there adds 256 bytes to ld.so. (Actually it is slightly surprising that it's exactly +256 while ARM/Thumb is exactly -256. :-) It's just entirely expected that it's a loser on x86_64.) For i386, PC-relative accesses are relatively costly. So it's a mild surprise that turning PI_STATIC_AND_HIDDEN off there adds 188 bytes (i.e. fewer PC-relative accesses needs more code, perhaps mostly the extra copy-from-stack code). My guess is that the compiler backend for i386 has had more effort put into optimizing these cases. For example, perhaps it computes the common base address only once in the function and reuses it more effectively while the ARM and TIle backends are repeating the computation more often. Given that i386 has more register pressure than any other machine, the fact that they can win here suggests that you could too with sufficient work on the compiler side. (But this is all just guesses.) (All these examples were with GCC 4.8.2 as modified by Ubuntu. For arm-nacl with GCC 4.9.2 as modified by me, it adds 512 bytes.) Thanks, Roland
On Tue, 14 Apr 2015, Roland McGrath wrote: > I suspect this is now safe. (It caused no apparent mischief in my > arm-linux-gnueabihf build using GCC 4.8.2.) But I don't actually know > what GCC version is implied by "added by the GCC TLS patches". I'm I believe it means 4.1. > 2015-04-14 Roland McGrath <roland@hack.frob.com> > > * sysdeps/arm/configure.ac (PI_STATIC_AND_HIDDEN): Define it. > * sysdeps/arm/configure: Regenerated. OK.
Committed with improved comment. Thanks, Roland
--- a/sysdeps/arm/configure +++ b/sysdeps/arm/configure @@ -1,7 +1,8 @@ # This file is generated from configure.ac by Autoconf. DO NOT EDIT! # Local configure fragment for sysdeps/arm. -#AC_DEFINE(PI_STATIC_AND_HIDDEN) +$as_echo "#define PI_STATIC_AND_HIDDEN 1" >>confdefs.h + # We check to see if the compiler and flags are # selecting the hard-float ABI and if they are then --- a/sysdeps/arm/configure.ac +++ b/sysdeps/arm/configure.ac @@ -3,9 +3,7 @@ GLIBC_PROVIDES dnl See aclocal.m4 in the top level source directory. dnl It is always possible to access static and hidden symbols in an dnl position independent way. -dnl NOTE: This feature was added by the GCC TLS patches. We should test for -dnl it. Until we do, don't define it. -#AC_DEFINE(PI_STATIC_AND_HIDDEN) +AC_DEFINE(PI_STATIC_AND_HIDDEN) # We check to see if the compiler and flags are # selecting the hard-float ABI and if they are then