Message ID | 20150318061145.GE24573@bubble.grove.modra.org |
---|---|
State | New |
On 03/18/2015 02:11 AM, Alan Modra wrote: > Now that Alex's fixes for static TLS have gone in, I figure it's worth > revisiting an old patch of mine. > https://sourceware.org/ml/libc-alpha/2009-03/msg00053.html I'm not against this patch, but it certainly seems like you would be better served by just implementing tls descriptors? Do you have a reference to the binutils patch? > This patch is glibc support for a PowerPC TLS optimization, inspired > by Alexandre Oliva's TLS optimization for other processors, > http://www.lsd.ic.unicamp.br/~oliva/writeups/TLS/RFC-TLSDESC-x86.txt > > In essence, this optimization uses a zero module id in the TLS > descriptor to indicate that a TLS variable is allocated space in the > static TLS area. A special plt call linker stub for __tls_get_addr > checks for such a TLS descriptor and if found, returns the offset > immediately. The linker communicates the fact that the special > __tls_get_addr stub is present by setting a bit in the dynamic tag > DT_PPC64_OPT/DT_PPC_OPT. I'm confused, you write "TLS descriptor" but power doesn't have TLS DESC support yet in glibc? The code in question writes a module id of zero into the GOT entry associated with the TLS variable, not really the TLS descriptor? Speaking of which, you wouldn't happen to have a Latex contribution that describes the Power TLS support so I can add it to and update tls.pdf? :-) > tst-tlsmod2.so is built with -Wl,--no-tls-get-addr-optimize for > tst-tls-dlinfo, which otherwise would fail since it tests that no > static tls is allocated. The ld option --no-tls-get-addr-optimize has > been available since binutils-2.20 so doesn't need a configure test. OK. > Regression tested powerpc-linux and powerpc64-linux. > > * NEWS: Advertise TLS optimization. > * elf/elf.h (R_PPC_TLSGD, R_PPC_TLSLD, DT_PPC_OPT, PPC_OPT_TLS): Define. > (DT_PPC_NUM): Increment. > * elf/dynamic-link.h (HAVE_STATIC_TLS): Define, extracted from.. > (CHECK_STATIC_TLS): ..here. 
> * sysdeps/powerpc/powerpc32/dl-machine.h (elf_machine_rela): Optimize > TLS descriptors. > * sysdeps/powerpc/powerpc64/dl-machine.h (elf_machine_rela): Likewise. > * sysdeps/powerpc/dl-tls.c: New file. > * sysdeps/powerpc/Versions: Add __tls_get_addr_opt. > * sysdeps/unix/sysv/linux/powerpc/Makefile: Build tst-tlsmod2.so > with --no-tls-get-addr-optimize. > * sysdeps/unix/sysv/linux/powerpc/powerpc32/ld.abilist: Update. > * sysdeps/unix/sysv/linux/powerpc/powerpc64/ld.abilist: Likewise. > * sysdeps/unix/sysv/linux/powerpc/powerpc64/ld-le.abilist: Likewise. This absolutely needs a new ppc64-specific test case to make sure this is actually working as intended? If it requires a new binutils, then you'll need to have the test return 77 (UNSUPPORTED) if the present binutils is not new enough. The rest looks fine. Cheers, Carlos.
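The request above (report UNSUPPORTED with exit status 77 when the toolchain did not emit the optimization bit) could be sketched roughly as follows. This is only an illustration: `struct dyn_entry`, `has_opt_bit` and `tls_opt_test_status` are hypothetical names, and the tag and bit values are passed in as parameters rather than taken from `elf.h`, so nothing here should be read as the actual glibc test.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Exit status that the thread describes as UNSUPPORTED.  */
#define EXIT_UNSUPPORTED 77

/* Simplified, hypothetical stand-in for an ELF dynamic section entry.  */
struct dyn_entry
{
  int64_t tag;
  uint64_t value;
};

/* Return nonzero if the entry with tag OPT_TAG exists and has OPT_BIT set.
   OPT_TAG and OPT_BIT stand for DT_PPC64_OPT/DT_PPC_OPT and the TLS bit;
   their numeric values are deliberately not assumed here.  */
static int
has_opt_bit (const struct dyn_entry *dyn, size_t count,
             int64_t opt_tag, uint64_t opt_bit)
{
  for (size_t i = 0; i < count; ++i)
    if (dyn[i].tag == opt_tag)
      return (dyn[i].value & opt_bit) != 0;
  return 0;
}

/* Map the situation to a test exit status: 77 if the linker did not
   advertise the special stub, 0 if the fast path was observed, 1 if the
   stub was advertised but the fast path was not taken.  */
static int
tls_opt_test_status (const struct dyn_entry *dyn, size_t count,
                     int64_t opt_tag, uint64_t opt_bit, int fast_path_seen)
{
  if (!has_opt_bit (dyn, count, opt_tag, opt_bit))
    return EXIT_UNSUPPORTED;
  return fast_path_seen ? 0 : 1;
}
```

A real test would obtain the dynamic entries of the test object itself and decide `fast_path_seen` by inspecting the relocated tls_index; both steps are left out of this sketch.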
On Wed, 18 Mar 2015, Alan Modra wrote:

> diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/ld.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/ld.abilist
> index d71611f..052f311 100644
> --- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/ld.abilist
> +++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/ld.abilist
> @@ -1,3 +1,6 @@
> +GLIBC_2.22
> + GLIBC_2.22 A
> + __tls_get_addr_opt F
> GLIBC_2.0
> GLIBC_2.0 A
> __libc_memalign F

That positioning looks odd - I thought these files were alphabetical (so it would go between GLIBC_2.1 and GLIBC_2.3)?
On Wed, Mar 18, 2015 at 01:07:32PM -0400, Carlos O'Donell wrote: > On 03/18/2015 02:11 AM, Alan Modra wrote: > > Now that Alex's fixes for static TLS have gone in, I figure it's worth > > revisiting an old patch of mine. > > https://sourceware.org/ml/libc-alpha/2009-03/msg00053.html > > I'm not against this patch, but it certainly seems like you would be > better served by just implementing tls descriptors? I think this is one better than tls descriptors, because powerpc avoids the indirect function call used by tls descriptors. > Do you have a reference to the binutils patch? https://sourceware.org/ml/binutils/2009-03/msg00498.html > > This patch is glibc support for a PowerPC TLS optimization, inspired > > by Alexandre Oliva's TLS optimization for other processors, > > http://www.lsd.ic.unicamp.br/~oliva/writeups/TLS/RFC-TLSDESC-x86.txt > > > > In essence, this optimization uses a zero module id in the TLS > > descriptor to indicate that a TLS variable is allocated space in the > > static TLS area. A special plt call linker stub for __tls_get_addr > > checks for such a TLS descriptor and if found, returns the offset > > immediately. The linker communicates the fact that the special > > __tls_get_addr stub is present by setting a bit in the dynamic tag > > DT_PPC64_OPT/DT_PPC_OPT. > > I'm confused, you write "TLS descriptor" but power doesn't have TLS DESC > support yet in glibc? Oops, I meant a tls_index object as defined in Drepper's tls.pdf. The binutils reference above makes the same error.. > The code in question writes a module id of zero into the GOT entry > associated with the TLS variable, not really the TLS descriptor? Right. > Speaking of which, you wouldn't happen to have a Latex contribution > that describes the Power TLS support so I can add it to and update > tls.pdf? :-) No, sorry, I wrote the original powerpc tls abi as plain text. > This absolutely needs a new ppc64-specific test case to make sure this is > actually working as intended? 
If it requires a new binutils, then you'll need > to have the test return 77 (UNSUPPORTED) if the present binutils is not new > enough. We actually get the support turned on automatically, no gcc or ld options needed, so the existing tls tests run using the optimized __tls_get_addr support. Hmm, your comment reminded me that I need to check older binutils, because I renamed DT_PPC64_TLSOPT to DT_PPC64_OPT and changed the tag to a bitfield. On looking at that, it seems you'll need binutils-2.24 to build executables and shared libraries that work with the patch as is (well, they'll work but glibc won't provide the (0,offset) tls_index objects). glibc itself doesn't need the newer binutils to build, but you're right, I should mention this in NEWS and there should be a test that the new support is working for those that don't read NEWS.
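The (0,offset) tls_index convention discussed above can be illustrated with a small C sketch. It is an assumption-laden model, not the linker stub or the glibc code: `tls_get_addr_opt_sketch`, `slow_tls_get_addr` and `fake_dynamic_block` are hypothetical names, and adding the stored offset to a thread-pointer base is an assumption about how "returns the offset immediately" is used by the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A tls_index object as described in Drepper's tls.pdf: a module id
   followed by an offset within that module's TLS block.  */
typedef struct
{
  uintptr_t ti_module;
  uintptr_t ti_offset;
} tls_index;

/* Hypothetical stand-in for the general path (the real __tls_get_addr),
   which would look up the module's dynamically allocated TLS block.  */
static char fake_dynamic_block[64];
static int slow_path_calls;

static void *
slow_tls_get_addr (const tls_index *ti)
{
  ++slow_path_calls;
  assert (ti->ti_offset < sizeof fake_dynamic_block);
  return fake_dynamic_block + ti->ti_offset;
}

/* Model of the check made by the special stub: a zero module id means the
   dynamic linker placed the variable in static TLS and stored a
   ready-to-use offset, so no call into the general path is needed.  */
static void *
tls_get_addr_opt_sketch (const tls_index *ti, char *thread_pointer)
{
  if (ti->ti_module == 0)
    return thread_pointer + ti->ti_offset;   /* fast path */
  return slow_tls_get_addr (ti);             /* unchanged general path */
}
```

The point of the design is that objects linked without the special stub keep working: they simply never see a zero module id and always take the general path.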
On Wed, Mar 18, 2015 at 05:14:19PM +0000, Joseph Myers wrote:
> On Wed, 18 Mar 2015, Alan Modra wrote:
>
> > diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/ld.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/ld.abilist
> > index d71611f..052f311 100644
> > --- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/ld.abilist
> > +++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/ld.abilist
> > @@ -1,3 +1,6 @@
> > +GLIBC_2.22
> > + GLIBC_2.22 A
> > + __tls_get_addr_opt F
> > GLIBC_2.0
> > GLIBC_2.0 A
> > __libc_memalign F
>
> That positioning looks odd - I thought these files were alphabetical (so
> it would go between GLIBC_2.1 and GLIBC_2.3)?

I'd better test that again. I did have the location wrong the first time I changed the file, so I may have a discrepancy between what I posted here and the source actually tested.
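To see why an alphabetical ordering would put GLIBC_2.22 between GLIBC_2.1 and GLIBC_2.3, here is a small C sketch that sorts version names with a plain byte-wise `strcmp`. That the abilist files use exactly this comparison is an assumption; `sort_version_names` is a hypothetical helper.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort callback: compare two version names byte by byte.  */
static int
compare_version_names (const void *a, const void *b)
{
  return strcmp (*(const char *const *) a, *(const char *const *) b);
}

/* Sort an array of version names in place.  With byte-wise comparison,
   "GLIBC_2.22" sorts after "GLIBC_2.1" and before "GLIBC_2.3", because
   the strings first differ at the character following "2.".  */
static void
sort_version_names (const char **names, size_t count)
{
  qsort (names, count, sizeof names[0], compare_version_names);
}
```

Sorting the four names GLIBC_2.22, GLIBC_2.0, GLIBC_2.3 and GLIBC_2.1 this way yields GLIBC_2.0, GLIBC_2.1, GLIBC_2.22, GLIBC_2.3, which matches the placement suggested in the review comment.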
On 03/18/2015 10:56 PM, Alan Modra wrote: > On Wed, Mar 18, 2015 at 01:07:32PM -0400, Carlos O'Donell wrote: >> On 03/18/2015 02:11 AM, Alan Modra wrote: >>> Now that Alex's fixes for static TLS have gone in, I figure it's worth >>> revisiting an old patch of mine. >>> https://sourceware.org/ml/libc-alpha/2009-03/msg00053.html >> >> I'm not against this patch, but it certainly seems like you would be >> better served by just implementing tls descriptors? > > I think this is one better than tls descriptors, because powerpc > avoids the indirect function call used by tls descriptors. You mean to say it is "faster" than tls descriptors, but at the same time "harder" to maintain because it's a custom implementation that anyone debugging glibc has to learn about. That's not a bad thing, I just want us all to acknowledge the tradeoff. The present goal for glibc and the toolchain in general has been to move to TLS descriptors, and thus provide a way for the dozen or so packages in the distribution to stop doing this: mesa (src/mapi/u_current.h): extern __thread struct mapi_table *u_current_table __attribute__((tls_model("initial-exec"))); They would instead use TLS descriptors, and the above markings would be removed and the access would be as fast as possible without needing to specify the IE model. These packages are sometimes linked with applications, and sometimes arbitrarily dlopened. Would this present optimization you propose for power support this use case? Would it use static TLS for the above access if it could and fall back gracefully if it can't? What I want to make sure is that Power isn't left behind when we eventually transition everyone else to TLS Descriptors and remove the above markings from source programs. >> Do you have a reference to the binutils patch? 
> > https://sourceware.org/ml/binutils/2009-03/msg00498.html Excellent, that makes it much easier to review the glibc pieces since I can see what the static linker is going to do and review the stub itself. >>> In essence, this optimization uses a zero module id in the TLS >>> descriptor to indicate that a TLS variable is allocated space in the >>> static TLS area. A special plt call linker stub for __tls_get_addr >>> checks for such a TLS descriptor and if found, returns the offset >>> immediately. The linker communicates the fact that the special >>> __tls_get_addr stub is present by setting a bit in the dynamic tag >>> DT_PPC64_OPT/DT_PPC_OPT. >> >> I'm confused, you write "TLS descriptor" but power doesn't have TLS DESC >> support yet in glibc? > > Oops, I meant a tls_index object as defined in Drepper's tls.pdf. > The binutils reference above makes the same error.. No problem. Thanks for clarifying. This is part of the problem with having an alternate implementation. >> The code in question writes a module id of zero into the GOT entry >> associated with the TLS variable, not really the TLS descriptor? > > Right. OK. >> Speaking of which, you wouldn't happen to have a Latex contribution >> that describes the Power TLS support so I can add it to and update >> tls.pdf? :-) > > No, sorry, I wrote the original powerpc tls abi as plain text. Could you mail that to me privately please? I'd like a copy for my own reference. >> This absolutely needs a new ppc64-specific test case to make sure this is >> actually working as intended? If it requires a new binutils, then you'll need >> to have the test return 77 (UNSUPPORTED) if the present binutils is not new >> enough. > > We actually get the support turned on automatically, no gcc or ld > options needed, so the existing tls tests run using the optimized > __tls_get_addr support. 
OK, as long as there are binutils tests that make sure the stub is in place and being used, and static tls is allocated for the entries, then I'm fine. > Hmm, your comment reminded me that I need to check older binutils, > because I renamed DT_PPC64_TLSOPT to DT_PPC64_OPT and changed the tag > to a bitfield. On looking at that, it seems you'll need binutils-2.24 > to build executables and shared libraries that work with the patch as > is (well, they'll work but glibc won't provide the (0,offset) > tls_index objects). glibc itself doesn't need the newer binutils to > build, but you're right, I should mention this in NEWS and there > should be a test that the new support is working for those that don't > read NEWS. OK. Cheers, Carlos.
On Thu, Mar 19, 2015 at 11:33:16PM -0400, Carlos O'Donell wrote: > On 03/18/2015 10:56 PM, Alan Modra wrote: > > On Wed, Mar 18, 2015 at 01:07:32PM -0400, Carlos O'Donell wrote: > >> On 03/18/2015 02:11 AM, Alan Modra wrote: > >>> Now that Alex's fixes for static TLS have gone in, I figure it's worth > >>> revisiting an old patch of mine. > >>> https://sourceware.org/ml/libc-alpha/2009-03/msg00053.html > >> > >> I'm not against this patch, but it certainly seems like you would be > >> better served by just implementing tls descriptors? > > > > I think this is one better than tls descriptors, because powerpc > > avoids the indirect function call used by tls descriptors. > > You mean to say it is "faster" than tls descriptors, but at the same To be honest, there isn't much difference in the optimized case where static TLS is available. It boils down to an indirect call to a function that loads one value vs. a direct call to a stub that loads two values and compares one against zero. I think what I've implemented is slightly better for PowerPC, but whether that would carry over to other architectures is debatable. > time "harder" to maintain because it's a custom implementation that > anyone debugging glibc has to learn about. That's not a bad thing, > I just want us all to acknowledge the tradeoff. Well, yes, but the PowerPC implementation is all in dl-machine.h, and looks very similar to x86_64 in use of CHECK_STATIC_TLS, TRY_STATIC_TLS and modification of the tls_index entry. PowerPC doesn't have the complication and potential failure of allocating extended descriptors. We also don't need to pass extra flags to gcc to enable the optimization. 
> The present goal for glibc and the toolchain in general has been > to move to TLS descriptors, and thus provide a way for the dozen or > so packages in the distribution to stop doing this: > > mesa (src/mapi/u_current.h): > > extern __thread struct mapi_table *u_current_table > __attribute__((tls_model("initial-exec"))); > > They would instead use TLS descriptors, and the above markings would > be removed and the access would be as fast as possible without needing > to specify the IE model. > > These packages are sometimes linked with applications, and sometimes > arbitrarily dlopened. > > Would this present optimization you propose for power support this > use case? Sure. This is exactly the use case the powerpc optimization tackles, shared libraries using general dynamic or local dynamic TLS access. Like TLS descriptors, it can also handle general dynamic or local dynamic TLS access in an executable, but these will normally be optimized to IE or LE by GNU ld. > Would it use static TLS for the above access if it could and fall > back gracefully if it can't? Yes. > What I want to make sure is that Power isn't left behind when we > eventually transition everyone else to TLS Descriptors and remove > the above markings from source programs. Other architectures left behind by the PowerPC implementation might like to transition from TLS descriptors. Just kidding. :)
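The source-level marking under discussion can be shown with a minimal C sketch using the GCC `tls_model` attribute. The variables and functions below are hypothetical and are not taken from mesa. The first declaration pins the initial-exec model by hand, as the quoted mesa header does; the second leaves the choice to the toolchain, which is the form that would remain once the markings are removed.

```c
#include <assert.h>

/* Hand-marked variable: forces the initial-exec access model, which in a
   shared library relies on static TLS space being available.  */
__thread int counter_ie __attribute__ ((tls_model ("initial-exec")));

/* Unmarked variable: the compiler and linker pick the access model; in a
   shared library this is normally general dynamic, the case the
   optimization in this thread aims to make cheap.  */
__thread int counter_default;

/* Each function increments the calling thread's own copy.  */
int
bump_ie (void)
{
  return ++counter_ie;
}

int
bump_default (void)
{
  return ++counter_default;
}
```

Both functions behave identically from the caller's point of view; only the generated access sequence, and whether loading the library depends on static TLS being available, differs.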
On 03/20/2015 03:55 AM, Alan Modra wrote: > On Thu, Mar 19, 2015 at 11:33:16PM -0400, Carlos O'Donell wrote: >> On 03/18/2015 10:56 PM, Alan Modra wrote: >>> On Wed, Mar 18, 2015 at 01:07:32PM -0400, Carlos O'Donell wrote: >>>> On 03/18/2015 02:11 AM, Alan Modra wrote: >>>>> Now that Alex's fixes for static TLS have gone in, I figure it's worth >>>>> revisiting an old patch of mine. >>>>> https://sourceware.org/ml/libc-alpha/2009-03/msg00053.html >>>> >>>> I'm not against this patch, but it certainly seems like you would be >>>> better served by just implementing tls descriptors? >>> >>> I think this is one better than tls descriptors, because powerpc >>> avoids the indirect function call used by tls descriptors. >> >> You mean to say it is "faster" than tls descriptors, but at the same > > To be honest, there isn't much difference in the optimized case where > static TLS is available. It boils down to an indirect call to a > function that loads one value vs. a direct call to a stub that loads > two values and compares one against zero. I think what I've > implemented is slightly better for PowerPC, but whether that would > carry over to other architectures is debatable. I agree that what you have implemented is faster for power. >> time "harder" to maintain because it's a custom implementation that >> anyone debugging glibc has to learn about. That's not a bad thing, >> I just want us all to acknowledge the tradeoff. > > Well, yes, but the PowerPC implementation is all in dl-machine.h, and > looks very similar to x86_64 in use of CHECK_STATIC_TLS, > TRY_STATIC_TLS and modification of the tls_index entry. PowerPC > doesn't have the complication and potential failure of allocating > extended descriptors. We also don't need to pass extra flags to gcc > to enable the optimization. I also agree that your present implementation mirrors TLS DESC in the implementation and reuse of CHECK_STATIC_TLS/TRY_STATIC_TLS, and I like that aspect of the change.
>> The present goal for glibc and the toolchain in general has been >> to move to TLS descriptors, and thus provide a way for the dozen or >> so packages in the distribution to stop doing this: >> >> mesa (src/mapi/u_current.h): >> >> extern __thread struct mapi_table *u_current_table >> __attribute__((tls_model("initial-exec"))); >> >> They would instead use TLS descriptors, and the above markings would >> be removed and the access would be as fast as possible without needing >> to specify the IE model. >> >> These packages are sometimes linked with applications, and sometimes >> arbitrarily dlopened. >> >> Would this present optimization you propose for power support this >> use case? > > Sure. This is exactly the use case the powerpc optimization tackles, > shared libraries using general dynamic or local dynamic TLS access. > Like TLS descriptors, it can also handle general dynamic or local > dynamic TLS access in an executable, but these will normally be > optimized to IE or LE by GNU ld. Perfect, just making sure we were on the same page. I figured, after reading the binutils patch, that this mostly operates like TLS DESC, but slightly optimized for power. >> Would it use static TLS for the above access if it could and fall >> back gracefully if it can't? > > Yes. Good. I expected that it would simply degenerate to a call to __tls_get_addr if it can't get static tls space. >> What I want to make sure is that Power isn't left behind when we >> eventually transition everyone else to TLS Descriptors and remove >> the above markings from source programs. > > Other architectures left behind by the PowerPC implementation might > like to transition from TLS descriptors. Just kidding. :) Given your answers above I'm happy to see this go into glibc. The patch itself looks fine to me, the real magic is in binutils with yet another super-secret stub that has no debug information and must be recognized from memory by the person doing the debugging :} Cheers, Carlos.
On Fri, Mar 20, 2015 at 06:25:02PM +1030, Alan Modra wrote: > On Thu, Mar 19, 2015 at 11:33:16PM -0400, Carlos O'Donell wrote: > > On 03/18/2015 10:56 PM, Alan Modra wrote: > > > On Wed, Mar 18, 2015 at 01:07:32PM -0400, Carlos O'Donell wrote: > > >> On 03/18/2015 02:11 AM, Alan Modra wrote: > > >>> Now that Alex's fixes for static TLS have gone in, I figure it's worth > > >>> revisiting an old patch of mine. > > >>> https://sourceware.org/ml/libc-alpha/2009-03/msg00053.html > > >> > > >> I'm not against this patch, but it certainly seems like you would be > > >> better served by just implementing tls descriptors? > > > > > > I think this is one better than tls descriptors, because powerpc > > > avoids the indirect function call used by tls descriptors. > > > > You mean to say it is "faster" than tls descriptors, but at the same > > To be honest, there isn't much difference in the optimized case where > static TLS is available. It boils down to an indirect call to a > function that loads one value vs. a direct call to a stub that loads > two values and compares one against zero. I think what I've > implemented is slightly better for PowerPC, but whether that would > carry over to other architectures is debatable. If the performance difference isn't measurable in real-world applications, I would think uniformity between targets would be a lot more valuable. I also don't see how your approach is a "direct call". The function being called is in a different DSO so it has to go through a pointer in the GOT or similar, in which case it's just as "indirect" as the TLSDESC call would be. Rich
On 03/20/2015 11:27 AM, Rich Felker wrote: > On Fri, Mar 20, 2015 at 06:25:02PM +1030, Alan Modra wrote: >> On Thu, Mar 19, 2015 at 11:33:16PM -0400, Carlos O'Donell wrote: >>> On 03/18/2015 10:56 PM, Alan Modra wrote: >>>> On Wed, Mar 18, 2015 at 01:07:32PM -0400, Carlos O'Donell wrote: >>>>> On 03/18/2015 02:11 AM, Alan Modra wrote: >>>>>> Now that Alex's fixes for static TLS have gone in, I figure it's worth >>>>>> revisiting an old patch of mine. >>>>>> https://sourceware.org/ml/libc-alpha/2009-03/msg00053.html >>>>> >>>>> I'm not against this patch, but it certainly seems like you would be >>>>> better served by just implementing tls descriptors? >>>> >>>> I think this is one better than tls descriptors, because powerpc >>>> avoids the indirect function call used by tls descriptors. >>> >>> You mean to say it is "faster" than tls descriptors, but at the same >> >> To be honest, there isn't much difference in the optimized case where >> static TLS is available. It boils down to an indirect call to a >> function that loads one value vs. a direct call to a stub that loads >> two values and compares one against zero. I think what I've >> implemented is slightly better for PowerPC, but whether that would >> carry over to other architectures is debatable. > > If the performance difference isn't measurable in real-world > applications, I would think uniformity between targets would be a lot > more valuable. > > I also don't see how your approach is a "direct call". The function > being called is in a different DSO so it has to go through a pointer > in the GOT or similar, in which case it's just as "indirect" as the > TLSDESC call would be. I agree. And this was my initial inclination, but I'm not against what Alan has implemented. 
As a machine maintainer he should be allowed some leeway to argue that this implementation is "N instructions less" and therefore must be faster, and that although such a speedup is hard to show in a microbenchmark, it would on average result in, say, less CPU usage over billions of cycles. IBM has to accept that the downside to all of this is that breakage in this area may take longer to fix, and get fewer fixes, than on those arches already using TLS DESC. Cheers, Carlos.
On Fri, Mar 20, 2015 at 8:48 AM, Carlos O'Donell <carlos@redhat.com> wrote: > On 03/20/2015 11:27 AM, Rich Felker wrote: >> On Fri, Mar 20, 2015 at 06:25:02PM +1030, Alan Modra wrote: >>> On Thu, Mar 19, 2015 at 11:33:16PM -0400, Carlos O'Donell wrote: >>>> On 03/18/2015 10:56 PM, Alan Modra wrote: >>>>> On Wed, Mar 18, 2015 at 01:07:32PM -0400, Carlos O'Donell wrote: >>>>>> On 03/18/2015 02:11 AM, Alan Modra wrote: >>>>>>> Now that Alex's fixes for static TLS have gone in, I figure it's worth >>>>>>> revisiting an old patch of mine. >>>>>>> https://sourceware.org/ml/libc-alpha/2009-03/msg00053.html >>>>>> >>>>>> I'm not against this patch, but it certainly seems like you would be >>>>>> better served by just implementing tls descriptors? >>>>> >>>>> I think this is one better than tls descriptors, because powerpc >>>>> avoids the indirect function call used by tls descriptors. >>>> >>>> You mean to say it is "faster" than tls descriptors, but at the same >>> >>> To be honest, there isn't much difference in the optimized case where >>> static TLS is available. It boils down to an indirect call to a >>> function that loads one value vs. a direct call to a stub that loads >>> two values and compares one against zero. I think what I've >>> implemented is slightly better for PowerPC, but whether that would >>> carry over to other architectures is debatable. >> >> If the performance difference isn't measurable in real-world >> applications, I would think uniformity between targets would be a lot >> more valuable. >> >> I also don't see how your approach is a "direct call". The function >> being called is in a different DSO so it has to go through a pointer >> in the GOT or similar, in which case it's just as "indirect" as the >> TLSDESC call would be. > > I agree. And this was my initial inclination, but I'm not against what > Alan has implemented. 
As a machine maintainer he should be allowed some > leeway to argue this implementation is "N instructions less" and therefore > must be faster, but that such speed is harder to show in a microbenchmark, > it would in the mean result in say less CPU usage over billions of cycles. > > IBM has to accept that the downside to all of this is that breakage in > this area may take longer to fix, and get less fixes than those arches > already using TLS DESC. Speaking of TLS DESC, are there any tests for TLS DESC in glibc? I never implemented TLS DESC for x32 since I didn't find any run-time tests for TLS DESC in GCC nor glibc.
On Fri, Mar 20, 2015 at 08:51:39AM -0700, H.J. Lu wrote: > On Fri, Mar 20, 2015 at 8:48 AM, Carlos O'Donell <carlos@redhat.com> wrote: > > On 03/20/2015 11:27 AM, Rich Felker wrote: > >> On Fri, Mar 20, 2015 at 06:25:02PM +1030, Alan Modra wrote: > >>> On Thu, Mar 19, 2015 at 11:33:16PM -0400, Carlos O'Donell wrote: > >>>> On 03/18/2015 10:56 PM, Alan Modra wrote: > >>>>> On Wed, Mar 18, 2015 at 01:07:32PM -0400, Carlos O'Donell wrote: > >>>>>> On 03/18/2015 02:11 AM, Alan Modra wrote: > >>>>>>> Now that Alex's fixes for static TLS have gone in, I figure it's worth > >>>>>>> revisiting an old patch of mine. > >>>>>>> https://sourceware.org/ml/libc-alpha/2009-03/msg00053.html > >>>>>> > >>>>>> I'm not against this patch, but it certainly seems like you would be > >>>>>> better served by just implementing tls descriptors? > >>>>> > >>>>> I think this is one better than tls descriptors, because powerpc > >>>>> avoids the indirect function call used by tls descriptors. > >>>> > >>>> You mean to say it is "faster" than tls descriptors, but at the same > >>> > >>> To be honest, there isn't much difference in the optimized case where > >>> static TLS is available. It boils down to an indirect call to a > >>> function that loads one value vs. a direct call to a stub that loads > >>> two values and compares one against zero. I think what I've > >>> implemented is slightly better for PowerPC, but whether that would > >>> carry over to other architectures is debatable. > >> > >> If the performance difference isn't measurable in real-world > >> applications, I would think uniformity between targets would be a lot > >> more valuable. > >> > >> I also don't see how your approach is a "direct call". The function > >> being called is in a different DSO so it has to go through a pointer > >> in the GOT or similar, in which case it's just as "indirect" as the > >> TLSDESC call would be. > > > > I agree. 
And this was my initial inclination, but I'm not against what > > Alan has implemented. As a machine maintainer he should be allowed some > > leeway to argue this implementation is "N instructions less" and therefore > > must be faster, but that such speed is harder to show in a microbenchmark, > > it would in the mean result in say less CPU usage over billions of cycles. > > > > IBM has to accept that the downside to all of this is that breakage in > > this area may take longer to fix, and get less fixes than those arches > > already using TLS DESC. > > Speaking of TLS DESC, are there any tests for TLS DESC in > glibc? I never implemented TLS DESC for x32 since I didn't > find any run-time tests for TLS DESC in GCC nor glibc. Not that I know of. i386 TLSDESC was broken in binutils for several years and only recently fixed... Until a couple months ago nobody noticed. :-( This situation really should be set right (with proper tests and timeline for changing the default to TLSDESC) so we can put an end to the invalid use of IE-model in shared libraries. Rich
On Fri, Mar 20, 2015 at 9:14 AM, Rich Felker <dalias@libc.org> wrote: > On Fri, Mar 20, 2015 at 08:51:39AM -0700, H.J. Lu wrote: >> On Fri, Mar 20, 2015 at 8:48 AM, Carlos O'Donell <carlos@redhat.com> wrote: >> > On 03/20/2015 11:27 AM, Rich Felker wrote: >> >> On Fri, Mar 20, 2015 at 06:25:02PM +1030, Alan Modra wrote: >> >>> On Thu, Mar 19, 2015 at 11:33:16PM -0400, Carlos O'Donell wrote: >> >>>> On 03/18/2015 10:56 PM, Alan Modra wrote: >> >>>>> On Wed, Mar 18, 2015 at 01:07:32PM -0400, Carlos O'Donell wrote: >> >>>>>> On 03/18/2015 02:11 AM, Alan Modra wrote: >> >>>>>>> Now that Alex's fixes for static TLS have gone in, I figure it's worth >> >>>>>>> revisiting an old patch of mine. >> >>>>>>> https://sourceware.org/ml/libc-alpha/2009-03/msg00053.html >> >>>>>> >> >>>>>> I'm not against this patch, but it certainly seems like you would be >> >>>>>> better served by just implementing tls descriptors? >> >>>>> >> >>>>> I think this is one better than tls descriptors, because powerpc >> >>>>> avoids the indirect function call used by tls descriptors. >> >>>> >> >>>> You mean to say it is "faster" than tls descriptors, but at the same >> >>> >> >>> To be honest, there isn't much difference in the optimized case where >> >>> static TLS is available. It boils down to an indirect call to a >> >>> function that loads one value vs. a direct call to a stub that loads >> >>> two values and compares one against zero. I think what I've >> >>> implemented is slightly better for PowerPC, but whether that would >> >>> carry over to other architectures is debatable. >> >> >> >> If the performance difference isn't measurable in real-world >> >> applications, I would think uniformity between targets would be a lot >> >> more valuable. >> >> >> >> I also don't see how your approach is a "direct call". 
The function >> >> being called is in a different DSO so it has to go through a pointer >> >> in the GOT or similar, in which case it's just as "indirect" as the >> >> TLSDESC call would be. >> > >> > I agree. And this was my initial inclination, but I'm not against what >> > Alan has implemented. As a machine maintainer he should be allowed some >> > leeway to argue this implementation is "N instructions less" and therefore >> > must be faster, but that such speed is harder to show in a microbenchmark, >> > it would in the mean result in say less CPU usage over billions of cycles. >> > >> > IBM has to accept that the downside to all of this is that breakage in >> > this area may take longer to fix, and get less fixes than those arches >> > already using TLS DESC. >> >> Speaking of TLS DESC, are there any tests for TLS DESC in >> glibc? I never implemented TLS DESC for x32 since I didn't >> find any run-time tests for TLS DESC in GCC nor glibc. > > Not that I know of. i386 TLSDESC was broken in binutils for several > years and only recently fixed... Until a couple months ago nobody > noticed. :-( > > This situation really should be set right (with proper tests and > timeline for changing the default to TLSDESC) so we can put an end to > the invalid use of IE-model in shared libraries. Another thing, x86 and x86-64 TLS DESC spec should be in x86 and x86-64 psABIs, not a URL.
On 03/20/2015 12:19 PM, H.J. Lu wrote: > On Fri, Mar 20, 2015 at 9:14 AM, Rich Felker <dalias@libc.org> wrote: >> On Fri, Mar 20, 2015 at 08:51:39AM -0700, H.J. Lu wrote: >>> On Fri, Mar 20, 2015 at 8:48 AM, Carlos O'Donell <carlos@redhat.com> wrote: >>>> On 03/20/2015 11:27 AM, Rich Felker wrote: >>>>> On Fri, Mar 20, 2015 at 06:25:02PM +1030, Alan Modra wrote: >>>>>> On Thu, Mar 19, 2015 at 11:33:16PM -0400, Carlos O'Donell wrote: >>>>>>> On 03/18/2015 10:56 PM, Alan Modra wrote: >>>>>>>> On Wed, Mar 18, 2015 at 01:07:32PM -0400, Carlos O'Donell wrote: >>>>>>>>> On 03/18/2015 02:11 AM, Alan Modra wrote: >>>>>>>>>> Now that Alex's fixes for static TLS have gone in, I figure it's worth >>>>>>>>>> revisiting an old patch of mine. >>>>>>>>>> https://sourceware.org/ml/libc-alpha/2009-03/msg00053.html >>>>>>>>> >>>>>>>>> I'm not against this patch, but it certainly seems like you would be >>>>>>>>> better served by just implementing tls descriptors? >>>>>>>> >>>>>>>> I think this is one better than tls descriptors, because powerpc >>>>>>>> avoids the indirect function call used by tls descriptors. >>>>>>> >>>>>>> You mean to say it is "faster" than tls descriptors, but at the same >>>>>> >>>>>> To be honest, there isn't much difference in the optimized case where >>>>>> static TLS is available. It boils down to an indirect call to a >>>>>> function that loads one value vs. a direct call to a stub that loads >>>>>> two values and compares one against zero. I think what I've >>>>>> implemented is slightly better for PowerPC, but whether that would >>>>>> carry over to other architectures is debatable. >>>>> >>>>> If the performance difference isn't measurable in real-world >>>>> applications, I would think uniformity between targets would be a lot >>>>> more valuable. >>>>> >>>>> I also don't see how your approach is a "direct call". 
The function >>>>> being called is in a different DSO so it has to go through a pointer >>>>> in the GOT or similar, in which case it's just as "indirect" as the >>>>> TLSDESC call would be. >>>> >>>> I agree. And this was my initial inclination, but I'm not against what >>>> Alan has implemented. As a machine maintainer he should be allowed some >>>> leeway to argue this implementation is "N instructions less" and therefore >>>> must be faster, but that such speed is harder to show in a microbenchmark, >>>> it would in the mean result in say less CPU usage over billions of cycles. >>>> >>>> IBM has to accept that the downside to all of this is that breakage in >>>> this area may take longer to fix, and get less fixes than those arches >>>> already using TLS DESC. >>> >>> Speaking of TLS DESC, are there any tests for TLS DESC in >>> glibc? I never implemented TLS DESC for x32 since I didn't >>> find any run-time tests for TLS DESC in GCC nor glibc. >> >> Not that I know of. i386 TLSDESC was broken in binutils for several >> years and only recently fixed... Until a couple months ago nobody >> noticed. :-( >> >> This situation really should be set right (with proper tests and >> timeline for changing the default to TLSDESC) so we can put an end to >> the invalid use of IE-model in shared libraries. > > Another thing, x86 and x86-64 TLS DESC spec should be > in x86 and x86-64 psABIs, not a URL. Agreed. As should the TLS specification instead of a URL reference to tls.pdf which is going to get out of date. Cheers, Carlos.
On Fri, Mar 20, 2015 at 9:21 AM, Carlos O'Donell <carlos@redhat.com> wrote: > On 03/20/2015 12:19 PM, H.J. Lu wrote: >> On Fri, Mar 20, 2015 at 9:14 AM, Rich Felker <dalias@libc.org> wrote: >>> On Fri, Mar 20, 2015 at 08:51:39AM -0700, H.J. Lu wrote: >>>> On Fri, Mar 20, 2015 at 8:48 AM, Carlos O'Donell <carlos@redhat.com> wrote: >>>>> On 03/20/2015 11:27 AM, Rich Felker wrote: >>>>>> On Fri, Mar 20, 2015 at 06:25:02PM +1030, Alan Modra wrote: >>>>>>> On Thu, Mar 19, 2015 at 11:33:16PM -0400, Carlos O'Donell wrote: >>>>>>>> On 03/18/2015 10:56 PM, Alan Modra wrote: >>>>>>>>> On Wed, Mar 18, 2015 at 01:07:32PM -0400, Carlos O'Donell wrote: >>>>>>>>>> On 03/18/2015 02:11 AM, Alan Modra wrote: >>>>>>>>>>> Now that Alex's fixes for static TLS have gone in, I figure it's worth >>>>>>>>>>> revisiting an old patch of mine. >>>>>>>>>>> https://sourceware.org/ml/libc-alpha/2009-03/msg00053.html >>>>>>>>>> >>>>>>>>>> I'm not against this patch, but it certainly seems like you would be >>>>>>>>>> better served by just implementing tls descriptors? >>>>>>>>> >>>>>>>>> I think this is one better than tls descriptors, because powerpc >>>>>>>>> avoids the indirect function call used by tls descriptors. >>>>>>>> >>>>>>>> You mean to say it is "faster" than tls descriptors, but at the same >>>>>>> >>>>>>> To be honest, there isn't much difference in the optimized case where >>>>>>> static TLS is available. It boils down to an indirect call to a >>>>>>> function that loads one value vs. a direct call to a stub that loads >>>>>>> two values and compares one against zero. I think what I've >>>>>>> implemented is slightly better for PowerPC, but whether that would >>>>>>> carry over to other architectures is debatable. >>>>>> >>>>>> If the performance difference isn't measurable in real-world >>>>>> applications, I would think uniformity between targets would be a lot >>>>>> more valuable. >>>>>> >>>>>> I also don't see how your approach is a "direct call". 
The function >>>>>> being called is in a different DSO so it has to go through a pointer >>>>>> in the GOT or similar, in which case it's just as "indirect" as the >>>>>> TLSDESC call would be. >>>>> >>>>> I agree. And this was my initial inclination, but I'm not against what >>>>> Alan has implemented. As a machine maintainer he should be allowed some >>>>> leeway to argue this implementation is "N instructions less" and therefore >>>>> must be faster, but that such speed is harder to show in a microbenchmark, >>>>> it would in the mean result in say less CPU usage over billions of cycles. >>>>> >>>>> IBM has to accept that the downside to all of this is that breakage in >>>>> this area may take longer to fix, and get less fixes than those arches >>>>> already using TLS DESC. >>>> >>>> Speaking of TLS DESC, are there any tests for TLS DESC in >>>> glibc? I never implemented TLS DESC for x32 since I didn't >>>> find any run-time tests for TLS DESC in GCC nor glibc. >>> >>> Not that I know of. i386 TLSDESC was broken in binutils for several >>> years and only recently fixed... Until a couple months ago nobody >>> noticed. :-( >>> >>> This situation really should be set right (with proper tests and >>> timeline for changing the default to TLSDESC) so we can put an end to >>> the invalid use of IE-model in shared libraries. >> >> Another thing, x86 and x86-64 TLS DESC spec should be >> in x86 and x86-64 psABIs, not a URL. > > Agreed. As should the TLS specification instead of a URL reference to > tls.pdf which is going to get out of date. TLS spec is too big to be included in x86 psABIs unless Ulrich contributed patches for tex source to x86 psABIs.
On 03/20/2015 12:24 PM, H.J. Lu wrote: >>> Another thing, x86 and x86-64 TLS DESC spec should be >>> in x86 and x86-64 psABIs, not a URL. >> >> Agreed. As should the TLS specification instead of a URL reference to >> tls.pdf which is going to get out of date. > > TLS spec is too big to be included in x86 psABIs unless > Ulrich contributed patches for tex source to x86 psABIs. Yes, the whole spec is too big, the psABI would have only the portion for x86. Does that make sense? Cheers, Carlos.
On Fri, Mar 20, 2015 at 10:34 AM, Carlos O'Donell <carlos@redhat.com> wrote: > On 03/20/2015 12:24 PM, H.J. Lu wrote: >>>> Another thing, x86 and x86-64 TLS DESC spec should be >>>> in x86 and x86-64 psABIs, not a URL. >>> >>> Agreed. As should the TLS specification instead of a URL reference to >>> tls.pdf which is going to get out of date. >> >> TLS spec is too big to be included in x86 psABIs unless >> Ulrich contributed patches for tex source to x86 psABIs. > > Yes, the whole spec is too big, the psABI would have only > the portion for x86. > > Does that make sense? Sure. Patches are welcome.
On Fri, Mar 20, 2015 at 09:24:21AM -0700, H.J. Lu wrote: > >>> Not that I know of. i386 TLSDESC was broken in binutils for several > >>> years and only recently fixed... Until a couple months ago nobody > >>> noticed. :-( > >>> > >>> This situation really should be set right (with proper tests and > >>> timeline for changing the default to TLSDESC) so we can put an end to > >>> the invalid use of IE-model in shared libraries. > >> > >> Another thing, x86 and x86-64 TLS DESC spec should be > >> in x86 and x86-64 psABIs, not a URL. > > > > Agreed. As should the TLS specification instead of a URL reference to > > tls.pdf which is going to get out of date. > > TLS spec is too big to be included in x86 psABIs unless > Ulrich contributed patches for tex source to x86 psABIs. Are you sure? His TLS docs contain a lot of informative content that should not be taken as spec. There's no reason for a psABI to document optimizations a linker can make. Simply documenting the semantics of the relocation types and the actual ABI constraints they impose (mainly, location of static TLS relative to the thread-pointer) should be possible in a fairly compact text suitable for inclusion in the psABI. Of course actually writing that is a bit of work... Rich
On 03/20/2015 01:37 PM, H.J. Lu wrote: > On Fri, Mar 20, 2015 at 10:34 AM, Carlos O'Donell <carlos@redhat.com> wrote: >> On 03/20/2015 12:24 PM, H.J. Lu wrote: >>>>> Another thing, x86 and x86-64 TLS DESC spec should be >>>>> in x86 and x86-64 psABIs, not a URL. >>>> >>>> Agreed. As should the TLS specification instead of a URL reference to >>>> tls.pdf which is going to get out of date. >>> >>> TLS spec is too big to be included in x86 psABIs unless >>> Ulrich contributed patches for tex source to x86 psABIs. >> >> Yes, the whole spec is too big, the psABI would have only >> the portion for x86. >> >> Does that make sense? > > Sure. Patches are welcome. Thanks, just making sure you didn't object. c.
On Fri, Mar 20, 2015 at 11:04 AM, Carlos O'Donell <carlos@redhat.com> wrote: > On 03/20/2015 01:37 PM, H.J. Lu wrote: >> On Fri, Mar 20, 2015 at 10:34 AM, Carlos O'Donell <carlos@redhat.com> wrote: >>> On 03/20/2015 12:24 PM, H.J. Lu wrote: >>>>>> Another thing, x86 and x86-64 TLS DESC spec should be >>>>>> in x86 and x86-64 psABIs, not a URL. >>>>> >>>>> Agreed. As should the TLS specification instead of a URL reference to >>>>> tls.pdf which is going to get out of date. >>>> >>>> TLS spec is too big to be included in x86 psABIs unless >>>> Ulrich contributed patches for tex source to x86 psABIs. >>> >>> Yes, the whole spec is too big, the psABI would have only >>> the portion for x86. >>> >>> Does that make sense? >> >> Sure. Patches are welcome. > > Thanks, just making sure you didn't object. I have been trying to keep x86 psABIs up to date. Since x86-32 psABI is based on x86-64 psABI, changes like this should go into x86-64 psABI first. X86-64 psABI patches should be sent to https://groups.google.com/forum/#!forum/x86-64-abi Thanks.
On Fri, Mar 20, 2015 at 11:27:12AM -0400, Rich Felker wrote: > On Fri, Mar 20, 2015 at 06:25:02PM +1030, Alan Modra wrote: > > On Thu, Mar 19, 2015 at 11:33:16PM -0400, Carlos O'Donell wrote: > > > On 03/18/2015 10:56 PM, Alan Modra wrote: > > > > On Wed, Mar 18, 2015 at 01:07:32PM -0400, Carlos O'Donell wrote: > > > >> On 03/18/2015 02:11 AM, Alan Modra wrote: > > > >>> Now that Alex's fixes for static TLS have gone in, I figure it's worth > > > >>> revisiting an old patch of mine. > > > >>> https://sourceware.org/ml/libc-alpha/2009-03/msg00053.html > > > >> > > > >> I'm not against this patch, but it certainly seems like you would be > > > >> better served by just implementing tls descriptors? > > > > > > > > I think this is one better than tls descriptors, because powerpc > > > > avoids the indirect function call used by tls descriptors. > > > > > > You mean to say it is "faster" than tls descriptors, but at the same > > > > To be honest, there isn't much difference in the optimized case where > > static TLS is available. It boils down to an indirect call to a > > function that loads one value vs. a direct call to a stub that loads > > two values and compares one against zero. I think what I've > > implemented is slightly better for PowerPC, but whether that would > > carry over to other architectures is debatable. > > If the performance difference isn't measurable in real-world > applications, I would think uniformity between targets would be a lot > more valuable. Think of my design as "TLS descriptors version 2". I take the best features of TLS descriptors and add one trick, the special linker stub, that allows you to omit many of the nasty details of the current TLS descriptor design. 
A target that currently has TLS support but no TLS descriptor support and follows the powerpc design: 1) won't need to implement gcc changes for tls descriptors, 2) won't need to define new relocations, 3) won't need to implement linker support for tls descriptors, quite a large effort, and 4) won't need to implement dl-tlsdesc.S and tlsdesc.c in glibc, also not a simple task. Another benefit in terms of reliability (and repeatable user timing!) is that extended TLS descriptors are not needed, so the locking and mallocing in tlsdeschtab.h is avoided. Admittedly, part of the reason a port is so much easier is due to omitting lazy TLS resolution. Lazy TLS is complex. What's more, the per-target support code is non-trivial. All of tlsdesc.c and half of dl-tlsdesc.S is lazy TLS support. I question whether the added complexity provides commensurate benefit in real-world applications, apart from the degenerate case of loading a shared library that is never used. (And even then, you'd need a lot of __thread variables to make it worthwhile.) In fact, I wouldn't be surprised to find lazy TLS has a net negative benefit in real-world applications! /me dons asbestos suit. :) > I also don't see how your approach is a "direct call". The function > being called is in a different DSO so it has to go through a pointer > in the GOT or similar, in which case it's just as "indirect" as the > TLSDESC call would be. It is a direct call to the linker provided stub, which will return after a few instructions in the optimized case when static TLS is available. Control is passed to __tls_get_addr_opt only when no static TLS was available for the shared library at the time the library was dynamically relocated, ie. it was dlopen'ed and not enough spare static TLS was free. Note that __tls_get_addr_opt is currently an alias for __tls_get_addr. I believe it could be implemented as a different function with a few more bells and whistles to provide lazy TLS resolution, but I haven't proven that.
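The relocation-time decision described above can be sketched in C. This is a minimal illustration, not the glibc code: module_info and relocate_tls_index are hypothetical names, and the sketch ignores the TLS_TP_OFFSET and TLS_DTV_OFFSET biases that the real PowerPC relocation code applies. It only shows the idea that a zero module id marks a variable as living in static TLS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Two-word GOT entry whose address is passed to __tls_get_addr. */
typedef struct {
    uintptr_t ti_module;  /* module id, or 0 meaning "static TLS" */
    uintptr_t ti_offset;  /* module-relative or thread-pointer-relative offset */
} tls_index;

/* Hypothetical summary of what the dynamic linker knows about the
   module that defines the TLS symbol. */
typedef struct {
    uintptr_t modid;      /* real module ids start at 1 */
    bool has_static_tls;  /* was the module's block placed in static TLS? */
    intptr_t tp_offset;   /* block offset from the thread pointer */
} module_info;

/* Fill in the GOT pair once, at dynamic relocation time. */
static void relocate_tls_index(tls_index *got, const module_info *m,
                               uintptr_t sym_offset)
{
    assert(m->modid != 0);  /* 0 is reserved as the "static" marker */
    if (m->has_static_tls) {
        /* Optimized form: the stub can return tp + ti_offset directly. */
        got->ti_module = 0;
        got->ti_offset = (uintptr_t)(m->tp_offset + (intptr_t)sym_offset);
    } else {
        /* Ordinary form: __tls_get_addr looks the module up in the DTV. */
        got->ti_module = m->modid;
        got->ti_offset = sym_offset;
    }
}
```

Because the choice is made once when the library is relocated, a library that was dlopen'ed without enough spare static TLS simply keeps the ordinary form and every access takes the normal __tls_get_addr path.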
On Sat, Mar 21, 2015 at 01:37:02PM +1030, Alan Modra wrote: > On Fri, Mar 20, 2015 at 11:27:12AM -0400, Rich Felker wrote: > > On Fri, Mar 20, 2015 at 06:25:02PM +1030, Alan Modra wrote: > > > On Thu, Mar 19, 2015 at 11:33:16PM -0400, Carlos O'Donell wrote: > > > > On 03/18/2015 10:56 PM, Alan Modra wrote: > > > > > On Wed, Mar 18, 2015 at 01:07:32PM -0400, Carlos O'Donell wrote: > > > > >> On 03/18/2015 02:11 AM, Alan Modra wrote: > > > > >>> Now that Alex's fixes for static TLS have gone in, I figure it's worth > > > > >>> revisiting an old patch of mine. > > > > >>> https://sourceware.org/ml/libc-alpha/2009-03/msg00053.html > > > > >> > > > > >> I'm not against this patch, but it certainly seems like you would be > > > > >> better served by just implementing tls descriptors? > > > > > > > > > > I think this is one better than tls descriptors, because powerpc > > > > > avoids the indirect function call used by tls descriptors. > > > > > > > > You mean to say it is "faster" than tls descriptors, but at the same > > > > > > To be honest, there isn't much difference in the optimized case where > > > static TLS is available. It boils down to an indirect call to a > > > function that loads one value vs. a direct call to a stub that loads > > > two values and compares one against zero. I think what I've > > > implemented is slightly better for PowerPC, but whether that would > > > carry over to other architectures is debatable. > > > > If the performance difference isn't measurable in real-world > > applications, I would think uniformity between targets would be a lot > > more valuable. > > Think of my design as "TLS descriptors version 2". I take the best > features of TLS descriptors and add one trick, the special linker > stub, that allows you to omit many of the nasty details of the current > TLS descriptor design. 
A target that currently has TLS support but no > TLS descriptor support and follows the powerpc design: > 1) won't need to implement gcc changes for tls descriptors, > 2) won't need to define new relocations, > 3) won't need to implement linker support for tls descriptors, quite a > large effort, and > 4) won't need to implement dl-tlsdesc.S and tlsdesc.c in glibc, also > not a simple task. > Another benefit in terms of reliability (and repeatable user timing!) > is that extended TLS descriptors are not needed, so the locking and > mallocing in tlsdeschtab.h is avoided. If the lazy allocation stuff is removed (which it should be; it breaks AS-safety and other things), the last issue would go away. > Admittedly, part of the reason a port is so much easier is due to > omitting lazy TLS resolution. Lazy TLS is complex. What's more, the > per-target support code is non-trivial. All of tlsdesc.c and half of > dl-tlsdesc.S is lazy TLS support. I question whether the added > complexity provides commensurate benefit in real-world applications, > apart from the degenerate case of loading a shared library that is > never used. (And even then, you'd need a lot of __thread variables to > make it worthwhile.) > > In fact, I wouldn't be surprised to find lazy TLS has a net negative > benefit in real-world applications! > /me dons asbestos suit. :) I completely agree. I want to see it removed. > > I also don't see how your approach is a "direct call". The function > > being called is in a different DSO so it has to go through a pointer > > in the GOT or similar, in which case it's just as "indirect" as the > > TLSDESC call would be. > > It is a direct call to the linker provided stub, which will return > after a few instructions in the optimized case when static TLS is > available. That linker-provided stub address is loaded from a "GOT slot" of some sort, just like the tlsdesc function would be. Either way you have a PC/GP-relative load followed by a jump to the loaded address. 
There's actually one additional level of indirection to load this pointer for TLSDESC, but for static TLS, the callee returns instantly after performing a single load. With non-TLSDESC dynamic TLS on the other hand, there's an additional PC/GP-relative address computation (for the module/offset structure's address to pass) in the caller, which should equal out with the cost of the extra indirection for TLSDESC. But then there's a fair bit of additional work to be done in the callee. > Control is passed to __tls_get_addr_opt only when no static TLS was > available for the shared library at the time the library was > dynamically relocated, ie. it was dlopen'ed and not enough spare > static TLS was free. Where is control passed if static TLS was used? Maybe I'm misunderstanding your design? How would the dynamic linker resolve some calls to __tls_get_addr to different places than other calls, when there's only a single GOT entry for it? Rich
On Sat, Mar 21, 2015 at 12:36:30AM -0400, Rich Felker wrote:
> On Sat, Mar 21, 2015 at 01:37:02PM +1030, Alan Modra wrote:
> > On Fri, Mar 20, 2015 at 11:27:12AM -0400, Rich Felker wrote:
> > > I also don't see how your approach is a "direct call". The function
> > > being called is in a different DSO so it has to go through a pointer
> > > in the GOT or similar, in which case it's just as "indirect" as the
> > > TLSDESC call would be.
> >
> > It is a direct call to the linker provided stub, which will return
> > after a few instructions in the optimized case when static TLS is
> > available.
>
> That linker-provided stub address is loaded from a "GOT slot" of some
> sort,

No, it really is a direct call. The linker provided stub is local.
This ppc64 elfv2 GD sequence in a relocatable object file

        addi r3,r2,x@got@tlsgd
        bl __tls_get_addr(x@tlsgd)
        nop

results in shared library code of

        addi r3,r2,x@got@tlsgd          # r3 -> tls_index entry in GOT
        bl __tls_get_addr_opt_stub      # direct call
        nop
        .
        .
__tls_get_addr_opt_stub:
        ld r11,0(r3)                    # tls_index->ti_module
        ld r12,8(r3)                    # tls_index->ti_offset
        mr r0,r3
        cmpdi r11,0
        add r3,r12,r13                  # r13 == thread pointer
        beqlr                           # return if static TLS allocated
        mr r3,r0
        mflr r11
        std r11,8(r1)
        std r2,24(r1)
        addis r12,r2,__tls_get_addr_opt@plt@ha
        ld r12,__tls_get_addr_opt@plt@l(r12)
        mtctr r12
        bctrl                           # call __tls_get_addr_opt
        ld r2,24(r1)
        ld r11,8(r1)
        mtlr r11
        blr
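For readers who do not read PowerPC assembly, the stub's control flow can be modelled in C. This is only a sketch of the logic shown above: tls_get_addr_opt_stub_model, demo_slow_path and demo_dynamic_block are hypothetical names, the thread pointer is passed explicitly instead of living in r13, and the slow path is a stand-in for the PLT call to __tls_get_addr_opt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Two-word GOT entry: r3 points at this when the stub is entered. */
typedef struct {
    uintptr_t ti_module;
    uintptr_t ti_offset;
} tls_index;

/* Hypothetical stand-in for a dynamically allocated TLS block. */
static char demo_dynamic_block[32];

/* Hypothetical stand-in for the call through the PLT to
   __tls_get_addr_opt; a real implementation consults the DTV. */
static void *demo_slow_path(const tls_index *ti)
{
    return demo_dynamic_block + ti->ti_offset;
}

/* Model of __tls_get_addr_opt_stub: two loads, one compare. */
static void *tls_get_addr_opt_stub_model(const tls_index *ti,
                                         char *thread_pointer)
{
    assert(ti != NULL);
    uintptr_t module = ti->ti_module;  /* ld r11,0(r3) */
    uintptr_t offset = ti->ti_offset;  /* ld r12,8(r3) */
    if (module == 0)                   /* cmpdi r11,0 ... beqlr */
        return thread_pointer + (intptr_t)offset;
    return demo_slow_path(ti);         /* bctrl to __tls_get_addr_opt */
}
```

The fast path never leaves the calling library, which is the sense in which the call is "direct". Only the fallback goes through a PLT entry.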
diff --git a/NEWS b/NEWS
index 86394b8..a6a8b6d 100644
--- a/NEWS
+++ b/NEWS
@@ -17,6 +17,9 @@ Version 2.22
   18042, 18043, 18046, 18047, 18068, 18080, 18093, 18104, 18110, 18111,
   18128.

+* A powerpc and powerpc64 optimization for TLS, similar to TLS descriptors
+  for LD and GD on x86 and x86-64, has been implemented.
+
 * Character encoding and ctype tables were updated to Unicode 7.0.0, using
   new generator scripts contributed by Pravin Satpute and Mike FABIAN (Red
   Hat). These updates cause user visible changes, such as the fix for bug
diff --git a/elf/dynamic-link.h b/elf/dynamic-link.h
index 6f4a773..8d428e2 100644
--- a/elf/dynamic-link.h
+++ b/elf/dynamic-link.h
@@ -25,11 +25,14 @@
    an attempt to allocate it in surplus space on the fly. If that
    can't be done, we fall back to the error that DF_STATIC_TLS is
    intended to produce. */
+#define HAVE_STATIC_TLS(map, sym_map) \
+  (__builtin_expect ((sym_map)->l_tls_offset != NO_TLS_OFFSET \
+                     && ((sym_map)->l_tls_offset \
+                         != FORCED_DYNAMIC_TLS_OFFSET), 1))
+
 #define CHECK_STATIC_TLS(map, sym_map) \
     do { \
-      if (__builtin_expect ((sym_map)->l_tls_offset == NO_TLS_OFFSET \
-                            || ((sym_map)->l_tls_offset \
-                                == FORCED_DYNAMIC_TLS_OFFSET), 0)) \
+      if (!HAVE_STATIC_TLS (map, sym_map)) \
         _dl_allocate_static_tls (sym_map); \
     } while (0)

diff --git a/elf/elf.h b/elf/elf.h
index 496f08d..71492a2 100644
--- a/elf/elf.h
+++ b/elf/elf.h
@@ -2194,6 +2194,8 @@ enum
 #define R_PPC_GOT_DTPREL16_LO 92 /* half16* (sym+add)@got@dtprel@l */
 #define R_PPC_GOT_DTPREL16_HI 93 /* half16* (sym+add)@got@dtprel@h */
 #define R_PPC_GOT_DTPREL16_HA 94 /* half16* (sym+add)@got@dtprel@ha */
+#define R_PPC_TLSGD 95 /* none (sym+add)@tlsgd */
+#define R_PPC_TLSLD 96 /* none (sym+add)@tlsld */

 /* The remaining relocs are from the Embedded ELF ABI, and are not
    in the SVR4 ELF ABI. */
@@ -2237,7 +2239,11 @@ enum

 /* PowerPC specific values for the Dyn d_tag field. */
 #define DT_PPC_GOT (DT_LOPROC + 0)
-#define DT_PPC_NUM 1
+#define DT_PPC_OPT (DT_LOPROC + 1)
+#define DT_PPC_NUM 2
+
+/* PowerPC specific values for the DT_PPC_OPT Dyn entry. */
+#define PPC_OPT_TLS 1

 /* PowerPC64 relocations defined by the ABIs */
 #define R_PPC64_NONE R_PPC_NONE
diff --git a/sysdeps/powerpc/Versions b/sysdeps/powerpc/Versions
index 47c2c3e..2aebf7c 100644
--- a/sysdeps/powerpc/Versions
+++ b/sysdeps/powerpc/Versions
@@ -15,3 +15,9 @@ libc {
     __vmx__libc_longjmp; __vmx__libc_siglongjmp;
   }
 }
+
+ld {
+  GLIBC_2.22 {
+    __tls_get_addr_opt;
+  }
+}
diff --git a/sysdeps/powerpc/dl-tls.c b/sysdeps/powerpc/dl-tls.c
new file mode 100644
index 0000000..a18b23e
--- /dev/null
+++ b/sysdeps/powerpc/dl-tls.c
@@ -0,0 +1,24 @@
+/* Thread-local storage handling in the ELF dynamic linker. PowerPC version.
+   Copyright (C) 2009-2015 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, write to the Free
+   Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
+   02111-1307 USA. */
+
+#include "elf/dl-tls.c"
+
+#ifdef SHARED
+strong_alias(__tls_get_addr, __tls_get_addr_opt)
+#endif
diff --git a/sysdeps/powerpc/powerpc32/dl-machine.h b/sysdeps/powerpc/powerpc32/dl-machine.h
index c94674f..8b0c067 100644
--- a/sysdeps/powerpc/powerpc32/dl-machine.h
+++ b/sysdeps/powerpc/powerpc32/dl-machine.h
@@ -333,6 +333,32 @@ elf_machine_rela (struct link_map *map, const Elf32_Rela *reloc,
 # endif

     case R_PPC_DTPMOD32:
+      if (map->l_info[DT_PPC(OPT)]
+          && (map->l_info[DT_PPC(OPT)]->d_un.d_val & PPC_OPT_TLS))
+        {
+          if (!NOT_BOOTSTRAP)
+            {
+              reloc_addr[0] = 0;
+              reloc_addr[1] = (sym_map->l_tls_offset - TLS_TP_OFFSET
+                               + TLS_DTV_OFFSET);
+              break;
+            }
+          else if (sym_map != NULL)
+            {
+# ifndef SHARED
+              CHECK_STATIC_TLS (map, sym_map);
+# else
+              if (TRY_STATIC_TLS (map, sym_map))
+# endif
+                {
+                  reloc_addr[0] = 0;
+                  /* Set up for local dynamic. */
+                  reloc_addr[1] = (sym_map->l_tls_offset - TLS_TP_OFFSET
+                                   + TLS_DTV_OFFSET);
+                  break;
+                }
+            }
+        }
       if (!NOT_BOOTSTRAP)
         /* During startup the dynamic linker is always index 1. */
         *reloc_addr = 1;
@@ -342,6 +368,28 @@ elf_machine_rela (struct link_map *map, const Elf32_Rela *reloc,
         *reloc_addr = sym_map->l_tls_modid;
       break;
     case R_PPC_DTPREL32:
+      if (map->l_info[DT_PPC(OPT)]
+          && (map->l_info[DT_PPC(OPT)]->d_un.d_val & PPC_OPT_TLS))
+        {
+          if (!NOT_BOOTSTRAP)
+            {
+              *reloc_addr = TLS_TPREL_VALUE (sym_map, sym, reloc);
+              break;
+            }
+          else if (sym_map != NULL)
+            {
+              /* This reloc is always preceded by R_PPC_DTPMOD32. */
+# ifndef SHARED
+              assert (HAVE_STATIC_TLS (map, sym_map));
+# else
+              if (HAVE_STATIC_TLS (map, sym_map))
+# endif
+                {
+                  *reloc_addr = TLS_TPREL_VALUE (sym_map, sym, reloc);
+                  break;
+                }
+            }
+        }
       /* During relocation all TLS symbols are defined and used.
          Therefore the offset is already correct. */
       if (NOT_BOOTSTRAP && sym_map != NULL)
diff --git a/sysdeps/powerpc/powerpc64/dl-machine.h b/sysdeps/powerpc/powerpc64/dl-machine.h
index 5cb0087..55ac736 100644
--- a/sysdeps/powerpc/powerpc64/dl-machine.h
+++ b/sysdeps/powerpc/powerpc64/dl-machine.h
@@ -701,6 +701,32 @@ elf_machine_rela (struct link_map *map,
       return;

     case R_PPC64_DTPMOD64:
+      if (map->l_info[DT_PPC64(OPT)]
+          && (map->l_info[DT_PPC64(OPT)]->d_un.d_val & PPC64_OPT_TLS))
+        {
+#ifdef RTLD_BOOTSTRAP
+          reloc_addr[0] = 0;
+          reloc_addr[1] = (sym_map->l_tls_offset - TLS_TP_OFFSET
+                           + TLS_DTV_OFFSET);
+          return;
+#else
+          if (sym_map != NULL)
+            {
+# ifndef SHARED
+              CHECK_STATIC_TLS (map, sym_map);
+# else
+              if (TRY_STATIC_TLS (map, sym_map))
+# endif
+                {
+                  reloc_addr[0] = 0;
+                  /* Set up for local dynamic. */
+                  reloc_addr[1] = (sym_map->l_tls_offset - TLS_TP_OFFSET
+                                   + TLS_DTV_OFFSET);
+                  return;
+                }
+            }
+#endif
+        }
 #ifdef RTLD_BOOTSTRAP
       /* During startup the dynamic linker is always index 1. */
       *reloc_addr = 1;
@@ -713,6 +739,28 @@ elf_machine_rela (struct link_map *map,
       return;

     case R_PPC64_DTPREL64:
+      if (map->l_info[DT_PPC64(OPT)]
+          && (map->l_info[DT_PPC64(OPT)]->d_un.d_val & PPC64_OPT_TLS))
+        {
+#ifdef RTLD_BOOTSTRAP
+          *reloc_addr = TLS_TPREL_VALUE (sym_map, sym, reloc);
+          return;
+#else
+          if (sym_map != NULL)
+            {
+              /* This reloc is always preceded by R_PPC64_DTPMOD64. */
+# ifndef SHARED
+              assert (HAVE_STATIC_TLS (map, sym_map));
+# else
+              if (HAVE_STATIC_TLS (map, sym_map))
+# endif
+                {
+                  *reloc_addr = TLS_TPREL_VALUE (sym_map, sym, reloc);
+                  return;
+                }
+            }
+#endif
+        }
       /* During relocation all TLS symbols are defined and used.
          Therefore the offset is already correct. */
 #ifndef RTLD_BOOTSTRAP
diff --git a/sysdeps/unix/sysv/linux/powerpc/Makefile b/sysdeps/unix/sysv/linux/powerpc/Makefile
index fcf3bb5..c89ed9e 100644
--- a/sysdeps/unix/sysv/linux/powerpc/Makefile
+++ b/sysdeps/unix/sysv/linux/powerpc/Makefile
@@ -20,6 +20,8 @@ ifeq ($(build-shared),yes)
 # This is needed for DSO loading from static binaries.
 sysdep-dl-routines += dl-static
 endif
+# Otherwise tst-tls-dlinfo fails due to tst-tlsmod2.so using static tls.
+LDFLAGS-tst-tlsmod2.so += -Wl,--no-tls-get-addr-optimize
 endif

 ifeq ($(subdir),misc)
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/ld.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/ld.abilist
index d71611f..052f311 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/ld.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/ld.abilist
@@ -1,3 +1,6 @@
+GLIBC_2.22
+ GLIBC_2.22 A
+ __tls_get_addr_opt F
 GLIBC_2.0
  GLIBC_2.0 A
  __libc_memalign F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/ld-le.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/ld-le.abilist
index 3530fb4..3174e21 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/ld-le.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/ld-le.abilist
@@ -9,3 +9,6 @@ GLIBC_2.17
  free F
  malloc F
  realloc F
+GLIBC_2.22
+ GLIBC_2.22 A
+ __tls_get_addr_opt F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/ld.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/ld.abilist
index 899360e..d8c4201 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/ld.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/ld.abilist
@@ -1,3 +1,6 @@
+GLIBC_2.22
+ GLIBC_2.22 A
+ __tls_get_addr_opt F
 GLIBC_2.3
  GLIBC_2.3 A
  __libc_memalign F
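The gate that both dl-machine.h hunks test, "does this object carry DT_PPC_OPT with the PPC_OPT_TLS bit set", can be illustrated outside glibc. The sketch below is an assumption-laden model rather than the patch's code: glibc indexes a pre-parsed map->l_info[] array, whereas this version scans a dynamic section directly, and dyn_entry, the *_TAG and *_BIT macro names, and uses_tls_get_addr_opt are hypothetical. The numeric values follow the elf.h hunk (DT_LOPROC is 0x70000000 in the ELF specification).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DT_NULL_TAG     0
#define DT_LOPROC_TAG   0x70000000
#define DT_PPC_OPT_TAG  (DT_LOPROC_TAG + 1)  /* as defined by the patch */
#define PPC_OPT_TLS_BIT 1                    /* as defined by the patch */

/* Simplified dynamic-section entry (hypothetical layout for the demo). */
typedef struct {
    int64_t d_tag;
    uint64_t d_val;
} dyn_entry;

/* Return true if the linker marked this object as using the
   optimized __tls_get_addr stub. */
static bool uses_tls_get_addr_opt(const dyn_entry *dyn)
{
    assert(dyn != NULL);
    for (; dyn->d_tag != DT_NULL_TAG; ++dyn)
        if (dyn->d_tag == DT_PPC_OPT_TAG)
            return (dyn->d_val & PPC_OPT_TLS_BIT) != 0;
    return false;  /* tag absent: keep the ordinary relocation behaviour */
}
```

Objects linked with --no-tls-get-addr-optimize, such as tst-tlsmod2.so in the Makefile hunk, would not have the bit set, so the dynamic linker leaves their DTPMOD/DTPREL relocations in the ordinary form.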