Message ID | 20150603152448.GC32684@spoyarek.pnq.redhat.com |
---|---|
State | New |
Headers | show |
On Jun 3, 2015, Siddhesh Poyarekar <siddhesh@redhat.com> wrote: > On Wed, Jun 03, 2015 at 03:44:58AM -0300, Alexandre Oliva wrote: >> We used to store static TLS addrs in the DTV at module load time, but >> this required one thread to modify another thread's DTV. Now that we >> defer the DTV update to the first use in the thread, we should do so >> without taking the rtld lock if the module is already assigned to static >> TLS. Taking the lock introduces deadlocks where there weren't any >> before. >> >> This patch fixes the deadlock caused by tls_get_addr's unnecessarily >> taking the rtld lock to initialize the DTV entry for tls_dtor_list >> within __call_dtors_list, which deadlocks with a dlclose of a module >> whose finalizer joins with that thread. The patch does not, however, >> attempt to fix other potential sources of similar deadlocks, such as >> the explicit rtld locks taken by call_dtors_list, when the dtor list >> is not empty; lazy relocation of the reference to tls_dtor_list, when >> TLS Descriptors are in use; when tls dtors call functions through the >> PLT and lazy relocation needs to be performed, or any such called >> functions interact with the dynamic loader in ways that require its >> lock to be taken. >> >> Ok to install? > It's not good enough ... for what? It sure isn't good enough to enable anyone to do whatever they like in a module finalizer, including blocking waiting for other threads that interact, explicitly or implicitly, with the dynamic loader before the finalizer is unblocked. Most of these issues are preexisting, and explicitly not addressed by the patch. It is good enough, however, to fix the regression, namely, that ___tls_get_addr for an IE variable used to be able to complete even if the rtld lock was taken by another thread, while now it's slower than needed and it may deadlock. The patch is intended to fix this performance and deadlock regression, not any of the preexisting problems. > The simple patch to the test case below will cause the test > case to deadlock. I very much doubt it would have completed before the DTV race fix. ___tls_get_addr would find the l_tls_offset for tst-join7mod.so to be undecided, and would attempt to take the lock to make it forced dynamic. Deadlock. It's not like accessing TLS variables is special WRT interacting implicitly with the dynamic loader. It's just the tip of the iceberg. Any lazily-bound function called for the first time will do so as well. libdl functions will also interact with the dynamic loader, obviously, and there are various cases in which localization libraries deeply hidden in library internals will dlopen modules to translate an error message or somesuch. There are so many things that can go wrong if a module initializer or finalizer is called with the rtld lock held that I'm convinced the solution is not to go case after case after case trying to somehow make it lock-free, but rather to release the rtld lock before handing control over to user-supplied code (init and fini). This would require, before releasing the lock, placing the modules in intermediate states and making a list of things without the lock and after taking the lock again, and then, after we're done with the lockless list and take the lock back, checking that the things to do are still appropriate. We might even require additional corner-case states, such as a module that was to be finalized and released, but that got dlopened again by some other concurrent finalizer. Tricky, but doable. The alternative, IMHO, is not to go after each case that takes the rtld lock and somehow avoid that, but rather to document what can and cannot be done in module finalizers. We might want to rule out synchronization with other threads, or just document that they are called while holding a lock that various infrastructure bits have to take at unanticipated times, so any such synchronization is prone to deadlocks. > Andreas' reproducer can be fixed by simply setting > the TLS variables in cxa_thread_atexit as initial exec It's not clear to me that we want to make libc.so variables IE. If we do, there are a number of others that could benefit from this treatment. > 1. All of the lock taking and NODELETE flag clearing in > cxa_thread_atexit. As I wrote: >> The patch does not, however, >> attempt to fix other potential sources of similar deadlocks, such as >> the explicit rtld locks taken by call_dtors_list > 2. The lock taking in tls_get_addr_tail. Thanks, I failed to list this one as a remaining non-regression problem we might have to address. > That has to go and we need > to figure out another way to wait for another dlopen to complete. I don't think so, but I won't try to stop you ;-)
On Wed, Jun 03, 2015 at 01:51:32PM -0300, Alexandre Oliva wrote: > It sure isn't good enough to enable anyone to do whatever they like in a > module finalizer, including blocking waiting for other threads that > interact, explicitly or implicitly, with the dynamic loader before the > finalizer is unblocked. Most of these issues are preexisting, and > explicitly not addressed by the patch. > > It is good enough, however, to fix the regression, namely, that > ___tls_get_addr for an IE variable used to be able to complete even if > the rtld lock was taken by another thread, while now it's slower than > needed and it may deadlock. The patch is intended to fix this > performance and deadlock regression, not any of the preexisting > problems. Here's the core of my lack of understanding I guess - calling __tls_get_addr for an IE variable. Does that ever happen? An IE variable resolution does not result in a call to __tls_get_addr AFAICT. Maybe you mean a variable that could be relaxed as an IE but was compiled to use GD? > I very much doubt it would have completed before the DTV race fix. > ___tls_get_addr would find the l_tls_offset for tst-join7mod.so to be > undecided, and would attempt to take the lock to make it forced dynamic. > Deadlock. I haven't tested that, but I did test with master and it works just fine. This is because __tls_get_addr doesn't get called at all when the variables are set as IE. And for that reason, I don't see why it shouldn't work otherwise. > It's not like accessing TLS variables is special WRT interacting > implicitly with the dynamic loader. It's just the tip of the iceberg. > Any lazily-bound function called for the first time will do so as well. > libdl functions will also interact with the dynamic loader, obviously, > and there are various cases in which localization libraries deeply > hidden in library internals will dlopen modules to translate an error > message or somesuch. There are so many things that can go wrong if a > module initializer or finalizer is called with the rtld lock held that > I'm convinced the solution is not to go case after case after case > trying to somehow make it lock-free, but rather to release the rtld lock > before handing control over to user-supplied code (init and fini). This > would require, before releasing the lock, placing the modules in > intermediate states and making a list of things without the lock and > after taking the lock again, and then, after we're done with the > lockless list and take the lock back, checking that the things to do are > still appropriate. We might even require additional corner-case states, > such as a module that was to be finalized and released, but that got > dlopened again by some other concurrent finalizer. Tricky, but doable. > > The alternative, IMHO, is not to go after each case that takes the rtld > lock and somehow avoid that, but rather to document what can and cannot > be done in module finalizers. We might want to rule out synchronization > with other threads, or just document that they are called while holding > a lock that various infrastructure bits have to take at unanticipated > times, so any such synchronization is prone to deadlocks. Synchronization with other threads in a module finalizer seems like a common use case. For example, a module waiting for worker threads to exit before being deleted. The reproducer in fact is such a case. The fact that the module doesn't have its own TLS is just sheer luck. > It's not clear to me that we want to make libc.so variables IE. If we > do, there are a number of others that could benefit from this treatment. These are static variables that are only referenced inside libc.so. Also, libc.so will always be loaded before the executable is running. I don't see why they shouldn't be IE. > > That has to go and we need > > to figure out another way to wait for another dlopen to complete. > > I don't think so, but I won't try to stop you ;-) :) Siddhesh
On Jun 3, 2015, Siddhesh Poyarekar <siddhesh@redhat.com> wrote: > Here's the core of my lack of understanding I guess - calling > __tls_get_addr for an IE variable. Does that ever happen? I guess it depends on what is meant by "IE variable". I meant it as a TLS variable defined as part of the initial module set, i.e., the main executable and the transitive closure of its dependencies. Now, if you understood it as a variable declared with the initial-exec tls_model attribute, I apologize for the confusion. >> I very much doubt it would have completed before the DTV race fix. >> ___tls_get_addr would find the l_tls_offset for tst-join7mod.so to be >> undecided, and would attempt to take the lock to make it forced dynamic. >> Deadlock. > I haven't tested that, but I did test with master and it works just > fine. This is because __tls_get_addr doesn't get called at all when > the variables are set as IE. And for that reason, I don't see why it > shouldn't work otherwise. Declaring the variable with IE tls_model will indeed go through a completely different path. It is the GD tls_model, as in the patch for the testcase that you posted, that will go through __tls_get_addr and deadlock. >> It's not clear to me that we want to make libc.so variables IE. If we >> do, there are a number of others that could benefit from this treatment. > These are static variables that are only referenced inside libc.so. > Also, libc.so will always be loaded before the executable is running. > I don't see why they shouldn't be IE. The case to worry about is a static executable, that for whatever reason doesn't get these variables linked in, but that dlopens some library that depends on libc.so, which then brings these variables into the running image. If libc.so had these variables as IE, this dlopen could fail.
On Wed, Jun 03, 2015 at 06:42:28PM -0300, Alexandre Oliva wrote: > I guess it depends on what is meant by "IE variable". > > I meant it as a TLS variable defined as part of the initial module set, > i.e., the main executable and the transitive closure of its > dependencies. > > Now, if you understood it as a variable declared with the initial-exec > tls_model attribute, I apologize for the confusion. OK, that's what I thought. And I also understand now what your fix attempts to do and I agree that is a better and more general approach. > The case to worry about is a static executable, that for whatever reason > doesn't get these variables linked in, but that dlopens some library > that depends on libc.so, which then brings these variables into the > running image. If libc.so had these variables as IE, this dlopen could > fail. This seems to make a case for not having any variables in libc.so as IE, yet a number of them already are. Siddhesh
On 06/03/2015 11:24 AM, Siddhesh Poyarekar wrote: > On Wed, Jun 03, 2015 at 03:44:58AM -0300, Alexandre Oliva wrote: >> We used to store static TLS addrs in the DTV at module load time, but >> this required one thread to modify another thread's DTV. Now that we >> defer the DTV update to the first use in the thread, we should do so >> without taking the rtld lock if the module is already assigned to static >> TLS. Taking the lock introduces deadlocks where there weren't any >> before. >> >> This patch fixes the deadlock caused by tls_get_addr's unnecessarily >> taking the rtld lock to initialize the DTV entry for tls_dtor_list >> within __call_dtors_list, which deadlocks with a dlclose of a module >> whose finalizer joins with that thread. The patch does not, however, >> attempt to fix other potential sources of similar deadlocks, such as >> the explicit rtld locks taken by call_dtors_list, when the dtor list >> is not empty; lazy relocation of the reference to tls_dtor_list, when >> TLS Descriptors are in use; when tls dtors call functions through the >> PLT and lazy relocation needs to be performed, or any such called >> functions interact with the dynamic loader in ways that require its >> lock to be taken. >> >> Ok to install? > > It's not good enough and is in fact probably just dancing around the > problem. The simple patch to the test case below will cause the test > case to deadlock. Andreas' reproducer can be fixed by simply setting > the TLS variables in cxa_thread_atexit as initial exec; I've got a > patch for it that I'll post shortly. That would leave two other > problems: I agree with Siddhesh and Torvald, it's not a good enough solution. I spoke at length with Siddhesh about this, and I think we need to sit down and really fix this once and for all by reasoning through the TLS code and providing the appropriate atomic accesses, avoiding data races, and documenting synchronization. Siddhesh has the ball on this. Thanks for taking a first look, but we're going to need to spend more time fixing this than I think you've got :-( Cheers, Carlos.
On 06/03/2015 05:42 PM, Alexandre Oliva wrote: > On Jun 3, 2015, Siddhesh Poyarekar <siddhesh@redhat.com> wrote: > >> Here's the core of my lack of understanding I guess - calling >> __tls_get_addr for an IE variable. Does that ever happen? > > I guess it depends on what is meant by "IE variable". > > I meant it as a TLS variable defined as part of the initial module set, > i.e., the main executable and the transitive closure of its > dependencies. > > Now, if you understood it as a variable declared with the initial-exec > tls_model attribute, I apologize for the confusion. An IE variable is exactly a variable using initial-exec tls model. There is no other definition that I am aware of. Cheers, Carlos.
On 06/04/2015 01:38 AM, Siddhesh Poyarekar wrote: >> The case to worry about is a static executable, that for whatever reason >> doesn't get these variables linked in, but that dlopens some library >> that depends on libc.so, which then brings these variables into the >> running image. If libc.so had these variables as IE, this dlopen could >> fail. > > This seems to make a case for not having any variables in libc.so as > IE, yet a number of them already are. In practice all IE variables should be removed from dynamic libraries and be replaced with the more flexible TLS descriptors. However, presently, we allocate extra static TLS image space and surplus DTV entries to allow for some core coordinated set of DSOs to be able to be loaded with static-TLS-using IE variables. Making all core DSOs IE-free is a worthwhile but orthogonal problem. Cheers, Carlos.
On Jun 4, 2015, "Carlos O'Donell" <carlos@redhat.com> wrote:
> I agree with Siddhesh and Torvald, it's not a good enough solution.
Again, good enough for what?
It is undisputably enough to fix the performance regression that in
certain cases introduced new deadlocks to the pile of other possible
deadlocks that variants of the testcase could exercise. It fixes a
small problem, and it's a trivial patch.
Sure, it doesn't fix a much larger preexisting problem, that's being
misrepresented as a TLS issue, but that is actually a far more pervasive
dynamic loader problem. I have no interest or time in tackling that.
Sure, if you guys want to leave the regression unfixed until someone
figures out some way to fix a larger but mostly unrelated problem, it's
your choice.
But if this larger fix does not contain the very change I'm proposing,
tls_get_addr on variables assigned static TLS will remain much slower
than needed, because there's absolutely no need to take a lock when it
is already decided whether the module should use static or dynamic TLS.
So what is this simple regression fix waiting for to get installed,
again?
On Fri, Jun 05, 2015 at 01:23:45AM -0300, Alexandre Oliva wrote: > It is undisputably enough to fix the performance regression that in > certain cases introduced new deadlocks to the pile of other possible > deadlocks that variants of the testcase could exercise. It fixes a > small problem, and it's a trivial patch. > > Sure, it doesn't fix a much larger preexisting problem, that's being > misrepresented as a TLS issue, but that is actually a far more pervasive > dynamic loader problem. I have no interest or time in tackling that. > > Sure, if you guys want to leave the regression unfixed until someone > figures out some way to fix a larger but mostly unrelated problem, it's > your choice. No, IMO we should fix this regression in the interim and your approach in general seems fine to me. I intended to point out that it doesn't fix the larger problem and I think we're on the same page on that. I haven't reviewed the patch in detail nor your conversation with Torvald yet, so I can't comment on whether Torvald and I agree, although from a quick skimming of the thread ISTM that we're talking about different things. Siddhesh
On 06/05/2015 12:23 AM, Alexandre Oliva wrote: > On Jun 4, 2015, "Carlos O'Donell" <carlos@redhat.com> wrote: > >> I agree with Siddhesh and Torvald, it's not a good enough solution. > > Again, good enough for what? The solution does not meet my standards for fixing atomicity issues. It lacks proper documentation on the synchronization changes being made. I've asked Siddhesh to look into this issue and document some of the expectations and perhaps resolve some of the broader issues as well. We have received real customer reports of at least 3 kinds of P&C related failures in this code. So we are going to expand the nature of the fix and see if there isn't an overall cleaner solution. > It is undisputably enough to fix the performance regression that in > certain cases introduced new deadlocks to the pile of other possible > deadlocks that variants of the testcase could exercise. It fixes a > small problem, and it's a trivial patch. I agree. > Sure, it doesn't fix a much larger preexisting problem, that's being > misrepresented as a TLS issue, but that is actually a far more pervasive > dynamic loader problem. I have no interest or time in tackling that. I appreciate your honesty in expressing what you are able to do and what you want to do. > Sure, if you guys want to leave the regression unfixed until someone > figures out some way to fix a larger but mostly unrelated problem, it's > your choice. Siddhesh and I will be working on it right now since more than one of the problems are impacting real customer applications. > But if this larger fix does not contain the very change I'm proposing, > tls_get_addr on variables assigned static TLS will remain much slower > than needed, because there's absolutely no need to take a lock when it > is already decided whether the module should use static or dynamic TLS. I agree there is no need to take the lock, but your patch needs to be expanded in more detail to make it crystal clear exactly what synchronization is being done and why. > So what is this simple regression fix waiting for to get installed, > again? As of today only Fedora is using your previous patches which reveal these older bugs, and until we branch for 2.22, this patch can wait while we look into a more holistic fix. Cheers, Carlos.
On Jun 5, 2015, "Carlos O'Donell" <carlos@redhat.com> wrote: > The solution does not meet my standards for fixing atomicity issues. > It lacks proper documentation on the synchronization changes being made. Could it be because there aren't any synchronization changes being made? Seriously, the changes, if any, were made in the previous patch, that didn't meet such a storm of stonewalling and nitpicking. This one just fixes a tiny part of the change in the previous patch. Why is it drawing so much attention? Why are so different standards being applied to this one? It's not like our consensus has changed since then. It's not like the problem fixed by this patch is less serious than the one fixed before. >> But if this larger fix does not contain the very change I'm proposing, >> tls_get_addr on variables assigned static TLS will remain much slower >> than needed, because there's absolutely no need to take a lock when it >> is already decided whether the module should use static or dynamic TLS. > I agree there is no need to take the lock, but your patch needs to be > expanded in more detail to make it crystal clear exactly what > synchronization is being done and why. /* If the TLS block for the map is already assigned to dynamic, or to some static TLS offset, the decision is final, and no lock is required. Now, if the decision hasn't been made, take the rtld lock, so that an ongoing dlopen gets a chance to complete, possibly assigning the module to static TLS and initializing the corresponding TLS area for all threads, and then retest; if the decision is still pending, force the module to dynamic TLS. The risk that the thread accesses an earlier value in that memory location, from before it was recycled into a link map in another thread, is removed by the need for some happens before relationship between the loader that set that up and the TLS access that referenced the module id. In the presence of such a relationship, the value will be at least as recent as the initialization, and in its absence, calling tls_get_addr with its module id invokes undefined behavior. */ Seriously, what's missing from this? What's not crystal clear about it? >> So what is this simple regression fix waiting for to get installed, >> again? > As of today only Fedora is using your previous patches which reveal these > older bugs, and until we branch for 2.22, this patch can wait while we > look into a more holistic fix. And then you're going to have a single patch fixing two or more different issues? Isn't that against our conventions, that prefer separate patches to address separate issues?
On 06/05/2015 03:18 PM, Alexandre Oliva wrote: > On Jun 5, 2015, "Carlos O'Donell" <carlos@redhat.com> wrote: > >> The solution does not meet my standards for fixing atomicity issues. >> It lacks proper documentation on the synchronization changes being made. > > Could it be because there aren't any synchronization changes being made? > Seriously, the changes, if any, were made in the previous patch, that > didn't meet such a storm of stonewalling and nitpicking. This one just > fixes a tiny part of the change in the previous patch. > > Why is it drawing so much attention? Why are so different standards > being applied to this one? It's not like our consensus has changed > since then. It's not like the problem fixed by this patch is less > serious than the one fixed before. Different people review different patches. It has drawn attention because it is impacting certain use cases which I prioritize as needing to be fixed. >>> But if this larger fix does not contain the very change I'm proposing, >>> tls_get_addr on variables assigned static TLS will remain much slower >>> than needed, because there's absolutely no need to take a lock when it >>> is already decided whether the module should use static or dynamic TLS. > >> I agree there is no need to take the lock, but your patch needs to be >> expanded in more detail to make it crystal clear exactly what >> synchronization is being done and why. > > /* If the TLS block for the map is already assigned to dynamic, or > to some static TLS offset, the decision is final, and no lock is > required. Now, if the decision hasn't been made, take the rtld > lock, so that an ongoing dlopen gets a chance to complete, > possibly assigning the module to static TLS and initializing the > corresponding TLS area for all threads, and then retest; if the > decision is still pending, force the module to dynamic TLS. > > The risk that the thread accesses an earlier value in that memory > location, from before it was recycled into a link map in another > thread, is removed by the need for some happens before > relationship between the loader that set that up and the TLS > access that referenced the module id. In the presence of such a > relationship, the value will be at least as recent as the > initialization, and in its absence, calling tls_get_addr with its > module id invokes undefined behavior. */ > > Seriously, what's missing from this? What's not crystal clear about it? For synchronization issues I want to see comments at all the sites of code that coordinate to provide the synchronization, followed by a higher level comment explaining how the synchronization happens. One comment that covers all sites and fails to explain why the synchronization is correct is insufficient. >>> So what is this simple regression fix waiting for to get installed, >>> again? > >> As of today only Fedora is using your previous patches which reveal these >> older bugs, and until we branch for 2.22, this patch can wait while we >> look into a more holistic fix. > > And then you're going to have a single patch fixing two or more > different issues? Isn't that against our conventions, that prefer > separate patches to address separate issues? No. We are going to fix one issue per commit. Cheers, Carlos.
On Fri, 2015-06-05 at 16:18 -0300, Alexandre Oliva wrote: > On Jun 5, 2015, "Carlos O'Donell" <carlos@redhat.com> wrote: > > > The solution does not meet my standards for fixing atomicity issues. > > It lacks proper documentation on the synchronization changes being made. > > Could it be because there aren't any synchronization changes being made? > Seriously, the changes, if any, were made in the previous patch, I'm not aware of another one with synchronization bits in it. I probably simply didn't notice it. > that > didn't meet such a storm of stonewalling and nitpicking. This one just > fixes a tiny part of the change in the previous patch. > > Why is it drawing so much attention? Why are so different standards > being applied to this one? As a counter-example, I reviewed Szabolcs' Lazy TLSDESC relocation data race fix at the same level of detail, and requested comments at the same level of detail as for your patch. If you feel that there are other patches that haven't been reviewed closely enough, please mention them specifically so we can see whether we need to improve the affected code. > >> But if this larger fix does not contain the very change I'm proposing, > >> tls_get_addr on variables assigned static TLS will remain much slower > >> than needed, because there's absolutely no need to take a lock when it > >> is already decided whether the module should use static or dynamic TLS. > > > I agree there is no need to take the lock, but your patch needs to be > > expanded in more detail to make it crystal clear exactly what > > synchronization is being done and why. > > /* If the TLS block for the map is already assigned to dynamic, or > to some static TLS offset, the decision is final, and no lock is > required. Now, if the decision hasn't been made, take the rtld > lock, so that an ongoing dlopen gets a chance to complete, > possibly assigning the module to static TLS and initializing the > corresponding TLS area for all threads, That sounds like it is not just about reaching consensus on the final value of l_tls_offset: For example, you mention initialization in there, which seems to indicate that there is some dependency on happening after initialization (ie, wait for dlopen to complete initialization); but elsewhere in the thread you say there are no such constraints and it's really just consensus on just this value. Also, it sounds as if any thread concurrent with the dlopen could only actually create a concurrent access if dlopen has the lock already -- but that doesn't seem to be a requirement based on the "spin" example/analogy you gave. > and then retest; if the > decision is still pending, force the module to dynamic TLS. > > The risk that the thread accesses an earlier value in that memory > location, from before it was recycled into a link map in another > thread, I wouldn't point out "earlier" here, but rather specifically address which stores (eg, initialization) the code needs a happens-before for. > is removed by the need for some happens before > relationship between the loader that set that up and the TLS > access that referenced the module id. In the presence of such a > relationship, the value will be at least as recent as the > initialization, and in its absence, calling tls_get_addr with its > module id invokes undefined behavior. */ > > Seriously, what's missing from this? What's not crystal clear about it? See above. The lack of atomic accesses in the code also made this seem inconsistent.
On Jun 7, 2015, Torvald Riegel <triegel@redhat.com> wrote: > That sounds like it is not just about reaching consensus on the final > value of l_tls_offset: For example, you mention initialization in > there, Yeah, we've already covered the relocation in the tls initialization block, performed by the dynamic loader. Why do we have to go in circles? > Also, it sounds as if any thread concurrent with the dlopen could only > actually create a concurrent access if dlopen has the lock already -- > but that doesn't seem to be a requirement based on the "spin" > example/analogy you gave. There are two roles dlopen plays here: 1. loading the module that defines the TLS variable 2. applying TLS replications to modules that reference the TLS variable It might turn out that 1. and 2. are performed under a single rtld lock taking, and a subsequent dlopen could bring in additional references to TLS variables.
On Mon, 2015-06-08 at 23:30 -0300, Alexandre Oliva wrote: > On Jun 7, 2015, Torvald Riegel <triegel@redhat.com> wrote: > > > That sounds like it is not just about reaching consensus on the final > > value of l_tls_offset: For example, you mention initialization in > > there, > > Yeah, we've already covered the relocation in the tls initialization > block, performed by the dynamic loader. Why do we have to go in > circles? Your comment, that you claimed was crystal-clear, seems inconsistent with what you described elsewhere. That's the reason. > > Also, it sounds as if any thread concurrent with the dlopen could only > > actually create a concurrent access if dlopen has the lock already -- > > but that doesn't seem to be a requirement based on the "spin" > > example/analogy you gave. > > There are two roles dlopen plays here: > > 1. loading the module that defines the TLS variable > > 2. applying TLS replications to modules that reference the TLS variable > > It might turn out that 1. and 2. are performed under a single rtld lock > taking, and a subsequent dlopen could bring in additional references to > TLS variables. > I can't follow you; I don't see how it addresses my comment. But given that you'll stop working on this patch anywhere and have handed it off to Siddhesh, I'll discuss with Siddhesh.
On Jun 9, 2015, Torvald Riegel <triegel@redhat.com> wrote: > On Mon, 2015-06-08 at 23:30 -0300, Alexandre Oliva wrote: >> On Jun 7, 2015, Torvald Riegel <triegel@redhat.com> wrote: >> > Also, it sounds as if any thread concurrent with the dlopen could only >> > actually create a concurrent access if dlopen has the lock already -- >> > but that doesn't seem to be a requirement based on the "spin" >> > example/analogy you gave. >> >> There are two roles dlopen plays here: >> >> 1. loading the module that defines the TLS variable >> >> 2. applying TLS replications to modules that reference the TLS variable >> >> It might turn out that 1. and 2. are performed under a single rtld lock >> taking, and a subsequent dlopen could bring in additional references to >> TLS variables. > I can't follow you; I don't see how it addresses my comment. rtld's role 1 is analogous to init in the electron spin example. rtld's role 2 is analogous to make_positive in the electron spin example. In some cases, rtld roles 1 and 2 are performed without releasing the lock. In this case, no concurrent accesses are possible, because nobody else can have the pointer returned by init, to pass to the other functions, before init returns. Now, in case rtld's role 2 is performed separately, as in a subsequent dlopen (analogous to make_positive), there may be concurrent tls_get_addr accesses (analogous to concurrent positive_p calls). At this point, role 1 has already released the lock and set up the module defining the TLS variable and possibly processed some dynamic TLS access model relocations (analogous to init returning), so there is room for concurrency between those accesses and a subsequent dlopen, that may or may not require the variable to be placed in static TLS. Does this clarify how my response addresses your comment?
diff --git a/nptl/tst-join7mod.c b/nptl/tst-join7mod.c index a8c7bc0..a066a1f 100644 --- a/nptl/tst-join7mod.c +++ b/nptl/tst-join7mod.c @@ -4,12 +4,15 @@ static pthread_t th; static int running = 1; +static __thread int foo; + static void * test_run (void *p) { while (running) fprintf (stderr, "XXX test_run\n"); fprintf (stderr, "XXX test_run FINISHED\n"); + foo = 42; return NULL; }