Message ID | orr3psedpw.fsf@livre.home |
---|---|
State | New |
Headers | show |
On Wed, 2015-06-03 at 17:31 -0300, Alexandre Oliva wrote: > On Jun 3, 2015, Alexandre Oliva <aoliva@redhat.com> wrote: > > > How's this? > > Or rather this, that I got after updating the patch file :-) Why did you remove the atomic accesses altogether? I argued that we need *more*, not less. You can't just pretend those issues don't exist. You made good progress towards a consistent fix with the reasoning on synchronization you provided for the static case, and the information you provided for the non-static case. This doesn't add any of the documentation I want to see either. Not OK. > (the testcase is adjusted to use atomics) Test case is OK. Thanks.
On Jun 4, 2015, Torvald Riegel <triegel@redhat.com> wrote: > On Wed, 2015-06-03 at 17:31 -0300, Alexandre Oliva wrote: >> On Jun 3, 2015, Alexandre Oliva <aoliva@redhat.com> wrote: >> >> > How's this? >> >> Or rather this, that I got after updating the patch file :-) > Why did you remove the atomic accesses altogether? Because I rearranged the code so that the double-checked lock pattern is self-evident. I thought we had agreed long ago that we didn't need atomics for double-checked locks. You added an "*Correct*" in this thread, so now I guess you have to show why the proposed change is not correct. Please explain? > You made good progress towards a consistent fix with the reasoning on > synchronization you provided for the static case It was not for the static case only. It covered both cases. > This doesn't add any of the documentation I want to see either. The deal I suggested was that I'd answer your questions and you'd write the documentation. Now you're moving the goalpost. Anyway, my manager told me not to spend time on this Q&A project I had suggested, so you'll have to find some other way to get the documentation you want. Stonewalling a trivial patch on the grounds that it fails to document something that was not documented to begin with it not reasonable; it won't even get you the documentation, it will just cause the regression to be around longer.
On Fri, 2015-06-05 at 01:16 -0300, Alexandre Oliva wrote: > On Jun 4, 2015, Torvald Riegel <triegel@redhat.com> wrote: > > > On Wed, 2015-06-03 at 17:31 -0300, Alexandre Oliva wrote: > >> On Jun 3, 2015, Alexandre Oliva <aoliva@redhat.com> wrote: > >> > >> > How's this? > >> > >> Or rather this, that I got after updating the patch file :-) > > > Why did you remove the atomic accesses altogether? > > Because I rearranged the code so that the double-checked lock pattern is > self-evident. I thought we had agreed long ago that we didn't need > atomics for double-checked locks. No, not generally. See my other email. > You added an "*Correct*" in this > thread, so now I guess you have to show why the proposed change is not > correct. Please explain? See my other email. The example in there has atomic accesses for the flag, your patch doesn't have them for l_tls_offset, which takes the role of the flag. > > You made good progress towards a consistent fix with the reasoning on > > synchronization you provided for the static case > > It was not for the static case only. It covered both cases. > > > This doesn't add any of the documentation I want to see either. > > The deal I suggested was that I'd answer your questions and you'd write > the documentation. Now you're moving the goalpost. No, I was speaking about the documentation for the code you touched -- not for everything. If you make a change involving synchronization, you have to provide enough documentation to argue why it works. If it is an incremental change, this doesn't mean you have to bring all the documentation transitively required, but you have to add docs for the immediate neighborhood of your change. > Anyway, my manager > told me not to spend time on this Q&A project I had suggested, so you'll > have to find some other way to get the documentation you want. > Stonewalling a trivial patch on the grounds that it fails to document > something that was not documented to begin with it not reasonable; it > won't even get you the documentation, it will just cause the regression > to be around longer. If your patch were trivial and correct, you could trivially explain why it is correct. You haven't done that yet. You applied double-checked locking incorrectly, and in other cases you're arguing that there's actually no concurrency, which conflicts with that you want the double-checked locking in the first place. You have data races in your code, or you haven't properly explained why there are none. If we apply fixes, they need to be correct -- even if they don't solve all the issues. I don't yet see why your patch results in correct code (even when just considering the functions you touched), and I have plenty of experience reasoning about concurrent code. Carlos and Siddhesh seem to not be convinced either. Isn't this enough indication that maybe something's wrong with your patch, or you need to add more comments to the code? Also, even if we consider the potential case that you'd be much better at concurrency than all of us, you'd still need to add sufficient documentation for all of us, because it matters that the project (and the people involved) understand the code, not that just you understand the code. For example, personally, I don't need all the code comments in pthread_once that I added -- I do know this synchronization pattern. However, it may not be obvious to others that don't look at concurrent code regularly, in the same way that I'd be pretty lost in libm. Therefore, we add comments so we can raise the overall level of accessibility and understanding. Period.
diff --git a/NEWS b/NEWS index ed02de0..eac100c 100644 --- a/NEWS +++ b/NEWS @@ -19,7 +19,7 @@ Version 2.22 18047, 18049, 18068, 18080, 18093, 18100, 18104, 18110, 18111, 18116, 18125, 18128, 18138, 18185, 18196, 18197, 18206, 18210, 18211, 18217, 18220, 18221, 18234, 18244, 18247, 18287, 18319, 18333, 18346, 18397, - 18409, 18410, 18412, 18418, 18422, 18434, 18444, 18469. + 18409, 18410, 18412, 18418, 18422, 18434, 18444, 18457, 18469. * Cache information can be queried via sysconf() function on s390 e.g. with _SC_LEVEL1_ICACHE_SIZE as argument. diff --git a/elf/dl-tls.c b/elf/dl-tls.c index 20c7e33..8fc210d 100644 --- a/elf/dl-tls.c +++ b/elf/dl-tls.c @@ -755,41 +755,54 @@ tls_get_addr_tail (GET_ADDR_ARGS, dtv_t *dtv, struct link_map *the_map) the_map = listp->slotinfo[idx].map; } - /* Make sure that, if a dlopen running in parallel forces the - variable into static storage, we'll wait until the address in the - static TLS block is set up, and use that. If we're undecided - yet, make sure we make the decision holding the lock as well. */ - if (__glibc_unlikely (the_map->l_tls_offset - != FORCED_DYNAMIC_TLS_OFFSET)) + /* If the TLS block for the map is already assigned to dynamic, or + to some static TLS offset, the decision is final, and no lock is + required. Now, if the decision hasn't been made, take the rtld + lock, so that an ongoing dlopen gets a chance to complete, + possibly assigning the module to static TLS and initializing the + corresponding TLS area for all threads, and then retest; if the + decision is still pending, force the module to dynamic TLS. + + The risk that the thread accesses an earlier value in that memory + location, from before it was recycled into a link map in another + thread, is removed by the need for some happens before + relationship between the loader that set that up and the TLS + access that referenced the module id. In the presence of such a + relationship, the value will be at least as recent as the + initialization, and in its absence, calling tls_get_addr with its + module id invokes undefined behavior. */ + if (__glibc_unlikely (the_map->l_tls_offset == NO_TLS_OFFSET)) { __rtld_lock_lock_recursive (GL(dl_load_lock)); if (__glibc_likely (the_map->l_tls_offset == NO_TLS_OFFSET)) - { - the_map->l_tls_offset = FORCED_DYNAMIC_TLS_OFFSET; - __rtld_lock_unlock_recursive (GL(dl_load_lock)); - } - else if (__glibc_likely (the_map->l_tls_offset - != FORCED_DYNAMIC_TLS_OFFSET)) - { + the_map->l_tls_offset = FORCED_DYNAMIC_TLS_OFFSET; + __rtld_lock_unlock_recursive (GL(dl_load_lock)); + } + + void *p; + + if (the_map->l_tls_offset != FORCED_DYNAMIC_TLS_OFFSET) + { #if TLS_TCB_AT_TP - void *p = (char *) THREAD_SELF - the_map->l_tls_offset; + p = (char *) THREAD_SELF - the_map->l_tls_offset; #elif TLS_DTV_AT_TP - void *p = (char *) THREAD_SELF + the_map->l_tls_offset + TLS_PRE_TCB_SIZE; + p = (char *) THREAD_SELF + the_map->l_tls_offset + TLS_PRE_TCB_SIZE; #else # error "Either TLS_TCB_AT_TP or TLS_DTV_AT_TP must be defined" #endif - __rtld_lock_unlock_recursive (GL(dl_load_lock)); - - dtv[GET_ADDR_MODULE].pointer.is_static = true; - dtv[GET_ADDR_MODULE].pointer.val = p; - return (char *) p + GET_ADDR_OFFSET; - } - else - __rtld_lock_unlock_recursive (GL(dl_load_lock)); + dtv[GET_ADDR_MODULE].pointer.is_static = true; + dtv[GET_ADDR_MODULE].pointer.val = p; + } + else + { + p = allocate_and_init (the_map); + assert (!dtv[GET_ADDR_MODULE].pointer.is_static); + /* FIXME: this is AS-Unsafe. We'd have to atomically set val to + p, if it is still unallocated, or release p and set it to + val. */ + dtv[GET_ADDR_MODULE].pointer.val = p; } - void *p = dtv[GET_ADDR_MODULE].pointer.val = allocate_and_init (the_map); - assert (!dtv[GET_ADDR_MODULE].pointer.is_static); return (char *) p + GET_ADDR_OFFSET; } diff --git a/nptl/Makefile b/nptl/Makefile index 3dd2944..4ffd13c 100644 --- a/nptl/Makefile +++ b/nptl/Makefile @@ -242,7 +242,7 @@ tests = tst-typesizes \ tst-basic7 \ tst-kill1 tst-kill2 tst-kill3 tst-kill4 tst-kill5 tst-kill6 \ tst-raise1 \ - tst-join1 tst-join2 tst-join3 tst-join4 tst-join5 tst-join6 \ + tst-join1 tst-join2 tst-join3 tst-join4 tst-join5 tst-join6 tst-join7 \ tst-detach1 \ tst-eintr1 tst-eintr2 tst-eintr3 tst-eintr4 tst-eintr5 \ tst-tsd1 tst-tsd2 tst-tsd3 tst-tsd4 tst-tsd5 tst-tsd6 \ @@ -320,7 +320,8 @@ endif modules-names = tst-atfork2mod tst-tls3mod tst-tls4moda tst-tls4modb \ tst-tls5mod tst-tls5moda tst-tls5modb tst-tls5modc \ tst-tls5modd tst-tls5mode tst-tls5modf tst-stack4mod \ - tst-_res1mod1 tst-_res1mod2 tst-execstack-mod tst-fini1mod + tst-_res1mod1 tst-_res1mod2 tst-execstack-mod tst-fini1mod \ + tst-join7mod extra-test-objs += $(addsuffix .os,$(strip $(modules-names))) tst-cleanup4aux.o test-extras += $(modules-names) tst-cleanup4aux test-modules = $(addprefix $(objpfx),$(addsuffix .so,$(modules-names))) @@ -525,6 +526,11 @@ $(objpfx)tst-tls6.out: tst-tls6.sh $(objpfx)tst-tls5 \ $(evaluate-test) endif +$(objpfx)tst-join7: $(libdl) $(shared-thread-library) +$(objpfx)tst-join7.out: $(objpfx)tst-join7mod.so +$(objpfx)tst-join7mod.so: $(shared-thread-library) +LDFLAGS-tst-join7mod.so = -Wl,-soname,tst-join7mod.so + $(objpfx)tst-dlsym1: $(libdl) $(shared-thread-library) $(objpfx)tst-fini1: $(shared-thread-library) $(objpfx)tst-fini1mod.so diff --git a/nptl/tst-join7.c b/nptl/tst-join7.c new file mode 100644 index 0000000..bf6fc76 --- /dev/null +++ b/nptl/tst-join7.c @@ -0,0 +1,12 @@ +#include <dlfcn.h> + +int +do_test (void) +{ + void *f = dlopen ("tst-join7mod.so", RTLD_NOW | RTLD_GLOBAL); + if (f) dlclose (f); else return 1; + return 0; +} + +#define TEST_FUNCTION do_test () +#include "../test-skeleton.c" diff --git a/nptl/tst-join7mod.c b/nptl/tst-join7mod.c new file mode 100644 index 0000000..9960b76 --- /dev/null +++ b/nptl/tst-join7mod.c @@ -0,0 +1,30 @@ +#include <stdio.h> +#include <pthread.h> +#include <atomic.h> + +static pthread_t th; +static int running = 1; + +static void * +test_run (void *p) +{ + while (atomic_load_relaxed (&running)) + printf ("XXX test_run\n"); + printf ("XXX test_run FINISHED\n"); + return NULL; +} + +static void __attribute__ ((constructor)) +do_init (void) +{ + pthread_create (&th, NULL, test_run, NULL); +} + +static void __attribute__ ((destructor)) +do_end (void) +{ + atomic_store_relaxed (&running, 0); + printf ("thread_join...\n"); + pthread_join (th, NULL); + printf ("thread_join DONE\n"); +}
On Jun 3, 2015, Alexandre Oliva <aoliva@redhat.com> wrote: > How's this? Or rather this, that I got after updating the patch file :-) (the testcase is adjusted to use atomics) We used to store static TLS addrs in the DTV at module load time, but this required one thread to modify another thread's DTV. Now that we defer the DTV update to the first use in the thread, we should do so without taking the rtld lock if the module is already assigned to static TLS. Taking the lock without need introduces a significant performance penalty, and deadlocks where there weren't any before. This patch fixes the deadlock caused by tls_get_addr's unnecessarily taking the rtld lock to initialize the DTV entry for tls_dtor_list within __call_dtors_list, which deadlocks with a dlclose of a module whose finalizer joins with that thread, as in the testcase. The patch does not, however, attempt to fix other potential sources of similar deadlocks, such as the explicit rtld locks taken by call_dtors_list, when the dtor list is not empty; lazy relocation of the reference to tls_dtor_list, when TLS Descriptors are in use; when tls dtors call functions through the PLT and lazy relocation needs to be performed; when tls dtors access TLS variables, using dynamic access models, from modules that have not yet been resolved to static or dynamic TLS, or any other explicit or implicit interactions with the dynamic loader in ways that require its lock to be taken. for ChangeLog [PR dynamic-link/18457] * elf/dl-tls.c (tls_get_addr_tail): Don't take the rtld lock if we already have a final static TLS offset. * nptl/tst-join7.c, nptl/tst-join7mod.c: New, from Andreas Schwab's bug report. * nptl/Makefile (tests): Add tst-join7. (module-names): Add tst-join7mod. ($(objpfx)tst-join7): New. Add deps. ($(objpfx)tst-join7.out): Likewise. ($(objpfx)tst-join7mod.so): Likewise. (LDFLAGS-tst-join7mod.so): Set soname. --- NEWS | 2 +- elf/dl-tls.c | 63 +++++++++++++++++++++++++++++++-------------------- nptl/Makefile | 10 ++++++-- nptl/tst-join7.c | 12 ++++++++++ nptl/tst-join7mod.c | 30 ++++++++++++++++++++++++ 5 files changed, 89 insertions(+), 28 deletions(-) create mode 100644 nptl/tst-join7.c create mode 100644 nptl/tst-join7mod.c