Message ID | 068770faf123b7c227f5f1e130812f7976e74cef.1613390045.git.szabolcs.nagy@arm.com |
---|---|
State | New |
Headers | show |
Series | Dynamic TLS related data race fixes | expand |
On 15/02/2021 08:59, Szabolcs Nagy via Libc-alpha wrote: > From: Szabolcs Nagy <szabolcs dot nagy at arm dot com> > > Since > > commit a509eb117fac1d764b15eba64993f4bdb63d7f3c > Avoid late dlopen failure due to scope, TLS slotinfo updates [BZ #25112] > > the generation counter update is not needed in the failure path. It is not clear to me from just the commit reference why it would be safe to remove the GL(dl_tls_generation) update on _dl_add_to_slotinfo. The dl_open_worker calls update_tls_slotinfo which in turn call might call _dl_add_to_slotinfo *after* the demarcation point. Will it terminate the process? > --- > elf/dl-tls.c | 11 +---------- > 1 file changed, 1 insertion(+), 10 deletions(-) > > diff --git a/elf/dl-tls.c b/elf/dl-tls.c > index 79b93ad91b..24d00c14ef 100644 > --- a/elf/dl-tls.c > +++ b/elf/dl-tls.c > @@ -998,16 +998,7 @@ _dl_add_to_slotinfo (struct link_map *l, bool do_add) > + TLS_SLOTINFO_SURPLUS * sizeof (struct dtv_slotinfo)); > if (listp == NULL) > { > - /* We ran out of memory. We will simply fail this > - call but don't undo anything we did so far. The > - application will crash or be terminated anyway very > - soon. */ > - > - /* We have to do this since some entries in the dtv > - slotinfo array might already point to this > - generation. */ > - ++GL(dl_tls_generation); > - > + /* We ran out of memory while resizing the dtv slotinfo list. */ > _dl_signal_error (ENOMEM, "dlopen", NULL, N_("\ > cannot create TLS data structures")); > } >
The 04/02/2021 17:50, Adhemerval Zanella via Libc-alpha wrote: > On 15/02/2021 08:59, Szabolcs Nagy via Libc-alpha wrote: > > From: Szabolcs Nagy <szabolcs dot nagy at arm dot com> > > > > Since > > > > commit a509eb117fac1d764b15eba64993f4bdb63d7f3c > > Avoid late dlopen failure due to scope, TLS slotinfo updates [BZ #25112] > > > > the generation counter update is not needed in the failure path. > > It is not clear to me from just the commit reference why it would > be safe to remove the GL(dl_tls_generation) update on > _dl_add_to_slotinfo. > > The dl_open_worker calls update_tls_slotinfo which in turn call > might call _dl_add_to_slotinfo *after* the demarcation point. Will > it terminate the process? in that commit the logic got changed such that allocations happen before the demarcation point in resize_tls_slotinfo. this is the reason for the do_add bool argument in _dl_add_to_slotinfo: it's called twice and the first call with do_add==false is only there to ensure everything is allocated before the demarcation point (so module loading can be reverted, no need to bump the generation count). i guess this is not visible by just looking at the _dl_add_to_slotinfo code. note that adding some asserts to ensure there is no allocation when do_add==true does not work: rtld uses the same api, but without the do_add==false preallocation step since at startup time allocation failure is fatal anyway. > > --- > > elf/dl-tls.c | 11 +---------- > > 1 file changed, 1 insertion(+), 10 deletions(-) > > > > diff --git a/elf/dl-tls.c b/elf/dl-tls.c > > index 79b93ad91b..24d00c14ef 100644 > > --- a/elf/dl-tls.c > > +++ b/elf/dl-tls.c > > @@ -998,16 +998,7 @@ _dl_add_to_slotinfo (struct link_map *l, bool do_add) > > + TLS_SLOTINFO_SURPLUS * sizeof (struct dtv_slotinfo)); > > if (listp == NULL) > > { > > - /* We ran out of memory. We will simply fail this > > - call but don't undo anything we did so far. The > > - application will crash or be terminated anyway very > > - soon. */ > > - > > - /* We have to do this since some entries in the dtv > > - slotinfo array might already point to this > > - generation. */ > > - ++GL(dl_tls_generation); > > - > > + /* We ran out of memory while resizing the dtv slotinfo list. */ > > _dl_signal_error (ENOMEM, "dlopen", NULL, N_("\ > > cannot create TLS data structures")); > > }
On 06/04/2021 12:48, Szabolcs Nagy wrote: > The 04/02/2021 17:50, Adhemerval Zanella via Libc-alpha wrote: >> On 15/02/2021 08:59, Szabolcs Nagy via Libc-alpha wrote: >>> From: Szabolcs Nagy <szabolcs dot nagy at arm dot com> >>> >>> Since >>> >>> commit a509eb117fac1d764b15eba64993f4bdb63d7f3c >>> Avoid late dlopen failure due to scope, TLS slotinfo updates [BZ #25112] >>> >>> the generation counter update is not needed in the failure path. >> >> It is not clear to me from just the commit reference why it would >> be safe to remove the GL(dl_tls_generation) update on >> _dl_add_to_slotinfo. >> >> The dl_open_worker calls update_tls_slotinfo which in turn call >> might call _dl_add_to_slotinfo *after* the demarcation point. Will >> it terminate the process? > > in that commit the logic got changed such that allocations > happen before the demarcation point in resize_tls_slotinfo. > > this is the reason for the do_add bool argument in > _dl_add_to_slotinfo: it's called twice and the first call > with do_add==false is only there to ensure everything is > allocated before the demarcation point (so module loading > can be reverted, no need to bump the generation count). > > i guess this is not visible by just looking at the > _dl_add_to_slotinfo code. Right, so if I reading correctly once _dl_add_to_slotinfo (..., true) is called by update_tls_slotinfo, the malloc that create a new dtv_slotinfo_list won't be called (since it was already allocated previously) since the entry was already pre-allocated and thus the search part at line 978-987 will find. Is that correct? > > note that adding some asserts to ensure there is no allocation > when do_add==true does not work: rtld uses the same api, but > without the do_add==false preallocation step since at startup > time allocation failure is fatal anyway. Right, the _dl_signal_error will trigger a fatal_error since lcatch won't be override yet. Thanks for the explanation. > >>> --- >>> elf/dl-tls.c | 11 +---------- >>> 1 file changed, 1 insertion(+), 10 deletions(-) >>> >>> diff --git a/elf/dl-tls.c b/elf/dl-tls.c >>> index 79b93ad91b..24d00c14ef 100644 >>> --- a/elf/dl-tls.c >>> +++ b/elf/dl-tls.c >>> @@ -998,16 +998,7 @@ _dl_add_to_slotinfo (struct link_map *l, bool do_add) >>> + TLS_SLOTINFO_SURPLUS * sizeof (struct dtv_slotinfo)); >>> if (listp == NULL) >>> { >>> - /* We ran out of memory. We will simply fail this >>> - call but don't undo anything we did so far. The >>> - application will crash or be terminated anyway very >>> - soon. */ >>> - >>> - /* We have to do this since some entries in the dtv >>> - slotinfo array might already point to this >>> - generation. */ >>> - ++GL(dl_tls_generation); >>> - >>> + /* We ran out of memory while resizing the dtv slotinfo list. */ >>> _dl_signal_error (ENOMEM, "dlopen", NULL, N_("\ >>> cannot create TLS data structures")); >>> }
The 04/06/2021 14:47, Adhemerval Zanella wrote: > On 06/04/2021 12:48, Szabolcs Nagy wrote: > > The 04/02/2021 17:50, Adhemerval Zanella via Libc-alpha wrote: > >> On 15/02/2021 08:59, Szabolcs Nagy via Libc-alpha wrote: > >>> From: Szabolcs Nagy <szabolcs dot nagy at arm dot com> > >>> > >>> Since > >>> > >>> commit a509eb117fac1d764b15eba64993f4bdb63d7f3c > >>> Avoid late dlopen failure due to scope, TLS slotinfo updates [BZ #25112] > >>> > >>> the generation counter update is not needed in the failure path. > >> > >> It is not clear to me from just the commit reference why it would > >> be safe to remove the GL(dl_tls_generation) update on > >> _dl_add_to_slotinfo. > >> > >> The dl_open_worker calls update_tls_slotinfo which in turn call > >> might call _dl_add_to_slotinfo *after* the demarcation point. Will > >> it terminate the process? > > > > in that commit the logic got changed such that allocations > > happen before the demarcation point in resize_tls_slotinfo. > > > > this is the reason for the do_add bool argument in > > _dl_add_to_slotinfo: it's called twice and the first call > > with do_add==false is only there to ensure everything is > > allocated before the demarcation point (so module loading > > can be reverted, no need to bump the generation count). > > > > i guess this is not visible by just looking at the > > _dl_add_to_slotinfo code. > > Right, so if I reading correctly once _dl_add_to_slotinfo (..., true) > is called by update_tls_slotinfo, the malloc that create a new > dtv_slotinfo_list won't be called (since it was already allocated > previously) since the entry was already pre-allocated and thus the > search part at line 978-987 will find. Is that correct? yes if you have an idea how to make things clearer let me know. > > > > note that adding some asserts to ensure there is no allocation > > when do_add==true does not work: rtld uses the same api, but > > without the do_add==false preallocation step since at startup > > time allocation failure is fatal anyway. > > Right, the _dl_signal_error will trigger a fatal_error since > lcatch won't be override yet. > > Thanks for the explanation. > > > > >>> --- > >>> elf/dl-tls.c | 11 +---------- > >>> 1 file changed, 1 insertion(+), 10 deletions(-) > >>> > >>> diff --git a/elf/dl-tls.c b/elf/dl-tls.c > >>> index 79b93ad91b..24d00c14ef 100644 > >>> --- a/elf/dl-tls.c > >>> +++ b/elf/dl-tls.c > >>> @@ -998,16 +998,7 @@ _dl_add_to_slotinfo (struct link_map *l, bool do_add) > >>> + TLS_SLOTINFO_SURPLUS * sizeof (struct dtv_slotinfo)); > >>> if (listp == NULL) > >>> { > >>> - /* We ran out of memory. We will simply fail this > >>> - call but don't undo anything we did so far. The > >>> - application will crash or be terminated anyway very > >>> - soon. */ > >>> - > >>> - /* We have to do this since some entries in the dtv > >>> - slotinfo array might already point to this > >>> - generation. */ > >>> - ++GL(dl_tls_generation); > >>> - > >>> + /* We ran out of memory while resizing the dtv slotinfo list. */ > >>> _dl_signal_error (ENOMEM, "dlopen", NULL, N_("\ > >>> cannot create TLS data structures")); > >>> } --
On 07/04/2021 04:57, Szabolcs Nagy wrote: > The 04/06/2021 14:47, Adhemerval Zanella wrote: >> On 06/04/2021 12:48, Szabolcs Nagy wrote: >>> The 04/02/2021 17:50, Adhemerval Zanella via Libc-alpha wrote: >>>> On 15/02/2021 08:59, Szabolcs Nagy via Libc-alpha wrote: >>>>> From: Szabolcs Nagy <szabolcs dot nagy at arm dot com> >>>>> >>>>> Since >>>>> >>>>> commit a509eb117fac1d764b15eba64993f4bdb63d7f3c >>>>> Avoid late dlopen failure due to scope, TLS slotinfo updates [BZ #25112] >>>>> >>>>> the generation counter update is not needed in the failure path. >>>> >>>> It is not clear to me from just the commit reference why it would >>>> be safe to remove the GL(dl_tls_generation) update on >>>> _dl_add_to_slotinfo. >>>> >>>> The dl_open_worker calls update_tls_slotinfo which in turn call >>>> might call _dl_add_to_slotinfo *after* the demarcation point. Will >>>> it terminate the process? >>> >>> in that commit the logic got changed such that allocations >>> happen before the demarcation point in resize_tls_slotinfo. >>> >>> this is the reason for the do_add bool argument in >>> _dl_add_to_slotinfo: it's called twice and the first call >>> with do_add==false is only there to ensure everything is >>> allocated before the demarcation point (so module loading >>> can be reverted, no need to bump the generation count). >>> >>> i guess this is not visible by just looking at the >>> _dl_add_to_slotinfo code. >> >> Right, so if I reading correctly once _dl_add_to_slotinfo (..., true) >> is called by update_tls_slotinfo, the malloc that create a new >> dtv_slotinfo_list won't be called (since it was already allocated >> previously) since the entry was already pre-allocated and thus the >> search part at line 978-987 will find. Is that correct? > > yes > > if you have an idea how to make things clearer let me know. > Maybe add it on the commit list. The patch looks ok to me thanks.
diff --git a/elf/dl-tls.c b/elf/dl-tls.c index 79b93ad91b..24d00c14ef 100644 --- a/elf/dl-tls.c +++ b/elf/dl-tls.c @@ -998,16 +998,7 @@ _dl_add_to_slotinfo (struct link_map *l, bool do_add) + TLS_SLOTINFO_SURPLUS * sizeof (struct dtv_slotinfo)); if (listp == NULL) { - /* We ran out of memory. We will simply fail this - call but don't undo anything we did so far. The - application will crash or be terminated anyway very - soon. */ - - /* We have to do this since some entries in the dtv - slotinfo array might already point to this - generation. */ - ++GL(dl_tls_generation); - + /* We ran out of memory while resizing the dtv slotinfo list. */ _dl_signal_error (ENOMEM, "dlopen", NULL, N_("\ cannot create TLS data structures")); }
From: Szabolcs Nagy <szabolcs dot nagy at arm dot com> Since commit a509eb117fac1d764b15eba64993f4bdb63d7f3c Avoid late dlopen failure due to scope, TLS slotinfo updates [BZ #25112] the generation counter update is not needed in the failure path. --- elf/dl-tls.c | 11 +---------- 1 file changed, 1 insertion(+), 10 deletions(-)