aarch64: Remove ld.so __tls_get_addr plt usage

Message ID	20240405123550.1748641-1-adhemerval.zanella@linaro.org
State	New
Headers	show Return-Path: <libc-alpha-bounces+incoming=patchwork.ozlabs.org@sourceware.org> DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 93DA438460A2 From: Adhemerval Zanella <adhemerval.zanella@linaro.org> To: libc-alpha@sourceware.org Cc: Szabolcs Nagy <szabolcs.nagy@arm.com> Subject: [PATCH] aarch64: Remove ld.so __tls_get_addr plt usage Date: Fri, 5 Apr 2024 09:35:50 -0300 Message-Id: <20240405123550.1748641-1-adhemerval.zanella@linaro.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: list Errors-To: libc-alpha-bounces+incoming=patchwork.ozlabs.org@sourceware.org
Series	aarch64: Remove ld.so __tls_get_addr plt usage \| expand aarch64: Remove ld.so __tls_get_addr plt usage

Adhemerval Zanella Netto April 5, 2024, 12:35 p.m. UTC

Use the hidden alias instead.

Checked on aarch64-linux-gnu.
---
 sysdeps/aarch64/dl-tlsdesc.S                  | 3 ++-
 sysdeps/unix/sysv/linux/aarch64/localplt.data | 2 --
 2 files changed, 2 insertions(+), 3 deletions(-)

Szabolcs Nagy April 5, 2024, 2:58 p.m. UTC | #1

The 04/05/2024 09:35, Adhemerval Zanella wrote:
> Use the hidden alias instead.
> 
> Checked on aarch64-linux-gnu.

does this change behaviour in case __tls_get_addr is interposed?


> ---
>  sysdeps/aarch64/dl-tlsdesc.S                  | 3 ++-
>  sysdeps/unix/sysv/linux/aarch64/localplt.data | 2 --
>  2 files changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/sysdeps/aarch64/dl-tlsdesc.S b/sysdeps/aarch64/dl-tlsdesc.S
> index 9b253b39dd..4febf2ad21 100644
> --- a/sysdeps/aarch64/dl-tlsdesc.S
> +++ b/sysdeps/aarch64/dl-tlsdesc.S
> @@ -220,7 +220,7 @@ _dl_tlsdesc_dynamic:
>  	SAVE_Q_REGISTERS
>  
>  	mov	x0, x1
> -	bl	__tls_get_addr
> +	bl	HIDDEN_JUMPTARGET(__tls_get_addr)
>  
>  	mrs	x1, tpidr_el0
>  	sub	PTR_REG (0), PTR_REG (0), PTR_REG (1)
> @@ -246,5 +246,6 @@ _dl_tlsdesc_dynamic:
>  	b	1b
>  	cfi_endproc
>  	.size	_dl_tlsdesc_dynamic, .-_dl_tlsdesc_dynamic
> +	.hidden HIDDEN_JUMPTARGET(__tls_get_addr)
>  # undef NSAVEXREGPAIRS
>  #endif
> diff --git a/sysdeps/unix/sysv/linux/aarch64/localplt.data b/sysdeps/unix/sysv/linux/aarch64/localplt.data
> index 5d217cc50d..5dd07472df 100644
> --- a/sysdeps/unix/sysv/linux/aarch64/localplt.data
> +++ b/sysdeps/unix/sysv/linux/aarch64/localplt.data
> @@ -9,5 +9,3 @@ libm.so: matherr
>  # If outline atomics are used, libgcc (built outside of glibc) may
>  # call __getauxval using the PLT.
>  libc.so: __getauxval ?
> -# The dynamic loader needs __tls_get_addr for TLS.
> -ld.so: __tls_get_addr
> -- 
> 2.34.1
>

Adhemerval Zanella Netto April 5, 2024, 4:29 p.m. UTC | #2

On 05/04/24 11:58, Szabolcs Nagy wrote:
> The 04/05/2024 09:35, Adhemerval Zanella wrote:
>> Use the hidden alias instead.
>>
>> Checked on aarch64-linux-gnu.
> 
> does this change behaviour in case __tls_get_addr is interposed?
> 
> 

Do we support symbol interposition for symbols in implementation namespace?
If so, we will need to revert x86 to use the same semantic and document it
properly.

>> ---
>>  sysdeps/aarch64/dl-tlsdesc.S                  | 3 ++-
>>  sysdeps/unix/sysv/linux/aarch64/localplt.data | 2 --
>>  2 files changed, 2 insertions(+), 3 deletions(-)
>>
>> diff --git a/sysdeps/aarch64/dl-tlsdesc.S b/sysdeps/aarch64/dl-tlsdesc.S
>> index 9b253b39dd..4febf2ad21 100644
>> --- a/sysdeps/aarch64/dl-tlsdesc.S
>> +++ b/sysdeps/aarch64/dl-tlsdesc.S
>> @@ -220,7 +220,7 @@ _dl_tlsdesc_dynamic:
>>  	SAVE_Q_REGISTERS
>>  
>>  	mov	x0, x1
>> -	bl	__tls_get_addr
>> +	bl	HIDDEN_JUMPTARGET(__tls_get_addr)
>>  
>>  	mrs	x1, tpidr_el0
>>  	sub	PTR_REG (0), PTR_REG (0), PTR_REG (1)
>> @@ -246,5 +246,6 @@ _dl_tlsdesc_dynamic:
>>  	b	1b
>>  	cfi_endproc
>>  	.size	_dl_tlsdesc_dynamic, .-_dl_tlsdesc_dynamic
>> +	.hidden HIDDEN_JUMPTARGET(__tls_get_addr)
>>  # undef NSAVEXREGPAIRS
>>  #endif
>> diff --git a/sysdeps/unix/sysv/linux/aarch64/localplt.data b/sysdeps/unix/sysv/linux/aarch64/localplt.data
>> index 5d217cc50d..5dd07472df 100644
>> --- a/sysdeps/unix/sysv/linux/aarch64/localplt.data
>> +++ b/sysdeps/unix/sysv/linux/aarch64/localplt.data
>> @@ -9,5 +9,3 @@ libm.so: matherr
>>  # If outline atomics are used, libgcc (built outside of glibc) may
>>  # call __getauxval using the PLT.
>>  libc.so: __getauxval ?
>> -# The dynamic loader needs __tls_get_addr for TLS.
>> -ld.so: __tls_get_addr
>> -- 
>> 2.34.1
>>

Szabolcs Nagy April 6, 2024, 5:40 p.m. UTC | #3

* Adhemerval Zanella Netto <adhemerval.zanella@linaro.org> [2024-04-05 13:29:48 -0300]:

> 
> 
> On 05/04/24 11:58, Szabolcs Nagy wrote:
> > The 04/05/2024 09:35, Adhemerval Zanella wrote:
> >> Use the hidden alias instead.
> >>
> >> Checked on aarch64-linux-gnu.
> > 
> > does this change behaviour in case __tls_get_addr is interposed?
> > 
> > 
> 
> Do we support symbol interposition for symbols in implementation namespace?
> If so, we will need to revert x86 to use the same semantic and document it
> properly.

i suspect this will partially break the sanitizers
and possibly other similar tools that hook internals
so i hoped at least for better commit msg pointing
out that this is deliberate behaviour change.



> 
> >> ---
> >>  sysdeps/aarch64/dl-tlsdesc.S                  | 3 ++-
> >>  sysdeps/unix/sysv/linux/aarch64/localplt.data | 2 --
> >>  2 files changed, 2 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/sysdeps/aarch64/dl-tlsdesc.S b/sysdeps/aarch64/dl-tlsdesc.S
> >> index 9b253b39dd..4febf2ad21 100644
> >> --- a/sysdeps/aarch64/dl-tlsdesc.S
> >> +++ b/sysdeps/aarch64/dl-tlsdesc.S
> >> @@ -220,7 +220,7 @@ _dl_tlsdesc_dynamic:
> >>  	SAVE_Q_REGISTERS
> >>  
> >>  	mov	x0, x1
> >> -	bl	__tls_get_addr
> >> +	bl	HIDDEN_JUMPTARGET(__tls_get_addr)
> >>  
> >>  	mrs	x1, tpidr_el0
> >>  	sub	PTR_REG (0), PTR_REG (0), PTR_REG (1)
> >> @@ -246,5 +246,6 @@ _dl_tlsdesc_dynamic:
> >>  	b	1b
> >>  	cfi_endproc
> >>  	.size	_dl_tlsdesc_dynamic, .-_dl_tlsdesc_dynamic
> >> +	.hidden HIDDEN_JUMPTARGET(__tls_get_addr)
> >>  # undef NSAVEXREGPAIRS
> >>  #endif
> >> diff --git a/sysdeps/unix/sysv/linux/aarch64/localplt.data b/sysdeps/unix/sysv/linux/aarch64/localplt.data
> >> index 5d217cc50d..5dd07472df 100644
> >> --- a/sysdeps/unix/sysv/linux/aarch64/localplt.data
> >> +++ b/sysdeps/unix/sysv/linux/aarch64/localplt.data
> >> @@ -9,5 +9,3 @@ libm.so: matherr
> >>  # If outline atomics are used, libgcc (built outside of glibc) may
> >>  # call __getauxval using the PLT.
> >>  libc.so: __getauxval ?
> >> -# The dynamic loader needs __tls_get_addr for TLS.
> >> -ld.so: __tls_get_addr
> >> -- 
> >> 2.34.1
> >>

Cristian Rodríguez April 7, 2024, 8:29 p.m. UTC | #4

On Fri, Apr 5, 2024 at 11:59 AM Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
>
> The 04/05/2024 09:35, Adhemerval Zanella wrote:
> > Use the hidden alias instead.
> >
> > Checked on aarch64-linux-gnu.
>
> does this change behaviour in case __tls_get_addr is interposed?

Wut ? is that really supported.. I mean.. isn't that symbol prefix
reserved for the implementation and any assumption about it is either
ID or UB?

Szabolcs Nagy April 8, 2024, 7:26 a.m. UTC | #5

The 04/07/2024 16:29, Cristian Rodríguez wrote:
> On Fri, Apr 5, 2024 at 11:59 AM Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
> >
> > The 04/05/2024 09:35, Adhemerval Zanella wrote:
> > > Use the hidden alias instead.
> > >
> > > Checked on aarch64-linux-gnu.
> >
> > does this change behaviour in case __tls_get_addr is interposed?
> 
> Wut ? is that really supported.. I mean.. isn't that symbol prefix
> reserved for the implementation and any assumption about it is either
> ID or UB?

a behaviour can change even if it's not supported.
i did not try to imply that it should be supported.

i know sanitizers interpose __tls_get_addr, because
https://sourceware.org/bugzilla/show_bug.cgi?id=16291
i don't know if that hack works at all now for tlsdesc
(where the ld.so calls __tls_get_addr, not user code)

my question was if we investigated this issue since it
is useful to document then in the commit msg (or news
entry if this affects users)

Florian Weimer April 8, 2024, 8:04 a.m. UTC | #6

* Szabolcs Nagy:

> * Adhemerval Zanella Netto <adhemerval.zanella@linaro.org> [2024-04-05 13:29:48 -0300]:
>
>> 
>> 
>> On 05/04/24 11:58, Szabolcs Nagy wrote:
>> > The 04/05/2024 09:35, Adhemerval Zanella wrote:
>> >> Use the hidden alias instead.
>> >>
>> >> Checked on aarch64-linux-gnu.
>> > 
>> > does this change behaviour in case __tls_get_addr is interposed?
>> > 
>> > 
>> 
>> Do we support symbol interposition for symbols in implementation namespace?
>> If so, we will need to revert x86 to use the same semantic and document it
>> properly.
>
> i suspect this will partially break the sanitizers
> and possibly other similar tools that hook internals
> so i hoped at least for better commit msg pointing
> out that this is deliberate behaviour change.

There is a __tls_get_addr interceptor, but I'm not sure what it does,
exactly.  If the initial-exec optimization kicks in, this code path
isn't even used.  I think we should try if removing the indirect call
helps.

Thanks,
Florian

Adhemerval Zanella Netto April 8, 2024, 4:57 p.m. UTC | #7

On 08/04/24 04:26, Szabolcs Nagy wrote:
> The 04/07/2024 16:29, Cristian Rodríguez wrote:
>> On Fri, Apr 5, 2024 at 11:59 AM Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
>>>
>>> The 04/05/2024 09:35, Adhemerval Zanella wrote:
>>>> Use the hidden alias instead.
>>>>
>>>> Checked on aarch64-linux-gnu.
>>>
>>> does this change behaviour in case __tls_get_addr is interposed?
>>
>> Wut ? is that really supported.. I mean.. isn't that symbol prefix
>> reserved for the implementation and any assumption about it is either
>> ID or UB?
> 
> a behaviour can change even if it's not supported.
> i did not try to imply that it should be supported.
> 
> i know sanitizers interpose __tls_get_addr, because
> https://sourceware.org/bugzilla/show_bug.cgi?id=16291
> i don't know if that hack works at all now for tlsdesc
> (where the ld.so calls __tls_get_addr, not user code)
> 
> my question was if we investigated this issue since it
> is useful to document then in the commit msg (or news
> entry if this affects users)

This change 'breaks' the sanitizer trick to get the dynamic TLS, with
this patch I now see:

  MemorySanitizer-AARCH64 :: dtls_test.c
  SanitizerCommon-asan-aarch64-Linux :: Linux/resize_tls_dynamic.cpp
  SanitizerCommon-msan-aarch64-Linux :: Linux/resize_tls_dynamic.cpp
  SanitizerCommon-tsan-aarch64-Linux :: Linux/resize_tls_dynamic.cpp

And it does not fail on x86 only because it uses -mtls=gnu as default
(the same tests fail on x86 with -mtls=gnu2).

Now that GCC and distributions are aiming to use GNU2/DESC as the
default TLS, this hack will also break on x86.  So the question is 
whether we revert 050f7298e1ecc39887c329037575ccd972071255 and 
document that __tls_get_addr should be interposable, or move with this 
change and try to come up with a possible solution for BZ#16291.

I bringing this because we will have another two ABIs with tlsdesc
support (loongarch and riscv).

Szabolcs Nagy April 9, 2024, 8:30 a.m. UTC | #8

The 04/08/2024 13:57, Adhemerval Zanella Netto wrote:
> On 08/04/24 04:26, Szabolcs Nagy wrote:
> > The 04/07/2024 16:29, Cristian Rodríguez wrote:
> >> On Fri, Apr 5, 2024 at 11:59 AM Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
> >>> The 04/05/2024 09:35, Adhemerval Zanella wrote:
> >>>> Use the hidden alias instead.
> >>>>
> >>>> Checked on aarch64-linux-gnu.
> >>>
> >>> does this change behaviour in case __tls_get_addr is interposed?
> >>
> >> Wut ? is that really supported.. I mean.. isn't that symbol prefix
> >> reserved for the implementation and any assumption about it is either
> >> ID or UB?
> > 
> > a behaviour can change even if it's not supported.
> > i did not try to imply that it should be supported.
> > 
> > i know sanitizers interpose __tls_get_addr, because
> > https://sourceware.org/bugzilla/show_bug.cgi?id=16291
> > i don't know if that hack works at all now for tlsdesc
> > (where the ld.so calls __tls_get_addr, not user code)
> > 
> > my question was if we investigated this issue since it
> > is useful to document then in the commit msg (or news
> > entry if this affects users)
> 
> This change 'breaks' the sanitizer trick to get the dynamic TLS, with
> this patch I now see:
> 
>   MemorySanitizer-AARCH64 :: dtls_test.c
>   SanitizerCommon-asan-aarch64-Linux :: Linux/resize_tls_dynamic.cpp
>   SanitizerCommon-msan-aarch64-Linux :: Linux/resize_tls_dynamic.cpp
>   SanitizerCommon-tsan-aarch64-Linux :: Linux/resize_tls_dynamic.cpp
> 
> And it does not fail on x86 only because it uses -mtls=gnu as default
> (the same tests fail on x86 with -mtls=gnu2).
> 
> Now that GCC and distributions are aiming to use GNU2/DESC as the
> default TLS, this hack will also break on x86.  So the question is 
> whether we revert 050f7298e1ecc39887c329037575ccd972071255 and 
> document that __tls_get_addr should be interposable, or move with this 
> change and try to come up with a possible solution for BZ#16291.
> 
> I bringing this because we will have another two ABIs with tlsdesc
> support (loongarch and riscv).

adding some sanitizer committers to cc.

tl;dr: in the next glibc release tlsdesc will not call
__tls_get_addr in an interposable way in the dynamic tls
allocation case, unless somebody screems that this is needed.
(affects targets that may default to tlsdesc, but note that
the dynamic case only triggers with tlsdesc when a lot of
dlopened tls is used, otherwise static tls area is used)

i think it is also possible that we will use custom malloc
in ld.so which may be just as big change for the sanitizers.
(this can make tls access signal safe)

i'm not against the change, but if we plan to add several
interposable hooks as in
https://sourceware.org/glibc/wiki/ThreadPropertiesAPI
then we might as well keep __tls_get_addr PLT for now.

Adhemerval Zanella Netto April 9, 2024, 2:03 p.m. UTC | #9

On 09/04/24 05:30, Szabolcs Nagy wrote:
> The 04/08/2024 13:57, Adhemerval Zanella Netto wrote:
>> On 08/04/24 04:26, Szabolcs Nagy wrote:
>>> The 04/07/2024 16:29, Cristian Rodríguez wrote:
>>>> On Fri, Apr 5, 2024 at 11:59 AM Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
>>>>> The 04/05/2024 09:35, Adhemerval Zanella wrote:
>>>>>> Use the hidden alias instead.
>>>>>>
>>>>>> Checked on aarch64-linux-gnu.
>>>>>
>>>>> does this change behaviour in case __tls_get_addr is interposed?
>>>>
>>>> Wut ? is that really supported.. I mean.. isn't that symbol prefix
>>>> reserved for the implementation and any assumption about it is either
>>>> ID or UB?
>>>
>>> a behaviour can change even if it's not supported.
>>> i did not try to imply that it should be supported.
>>>
>>> i know sanitizers interpose __tls_get_addr, because
>>> https://sourceware.org/bugzilla/show_bug.cgi?id=16291
>>> i don't know if that hack works at all now for tlsdesc
>>> (where the ld.so calls __tls_get_addr, not user code)
>>>
>>> my question was if we investigated this issue since it
>>> is useful to document then in the commit msg (or news
>>> entry if this affects users)
>>
>> This change 'breaks' the sanitizer trick to get the dynamic TLS, with
>> this patch I now see:
>>
>>   MemorySanitizer-AARCH64 :: dtls_test.c
>>   SanitizerCommon-asan-aarch64-Linux :: Linux/resize_tls_dynamic.cpp
>>   SanitizerCommon-msan-aarch64-Linux :: Linux/resize_tls_dynamic.cpp
>>   SanitizerCommon-tsan-aarch64-Linux :: Linux/resize_tls_dynamic.cpp
>>
>> And it does not fail on x86 only because it uses -mtls=gnu as default
>> (the same tests fail on x86 with -mtls=gnu2).
>>
>> Now that GCC and distributions are aiming to use GNU2/DESC as the
>> default TLS, this hack will also break on x86.  So the question is 
>> whether we revert 050f7298e1ecc39887c329037575ccd972071255 and 
>> document that __tls_get_addr should be interposable, or move with this 
>> change and try to come up with a possible solution for BZ#16291.
>>
>> I bringing this because we will have another two ABIs with tlsdesc
>> support (loongarch and riscv).
> 
> adding some sanitizer committers to cc.
> 
> tl;dr: in the next glibc release tlsdesc will not call
> __tls_get_addr in an interposable way in the dynamic tls
> allocation case, unless somebody screems that this is needed.
> (affects targets that may default to tlsdesc, but note that
> the dynamic case only triggers with tlsdesc when a lot of
> dlopened tls is used, otherwise static tls area is used)

Just a note that this already true for x86 with -mtls=gnu2 since
2.21.  And now that distro are aiming to make it default, this issues
will happen more often.

> 
> i think it is also possible that we will use custom malloc
> in ld.so which may be just as big change for the sanitizers.
> (this can make tls access signal safe)
> 
> i'm not against the change, but if we plan to add several
> interposable hooks as in
> https://sourceware.org/glibc/wiki/ThreadPropertiesAPI
> then we might as well keep __tls_get_addr PLT for now.
> 

I don't have a strong opinion, but what I really want is to have
consistency over the architectures.  Meaning that if we want to keep
the __tls_get_addr PLT for sanitizer/runtime hooks, it would be good
to revert the x86 change.

It also means to document it properly somewhere and make the new
RISC-V and loongarch follow the same guidelines.

I will take a look again on the ThreadPropertiesAPI, since it is has
been more and more a demanding issue.

H.J. Lu April 9, 2024, 2:05 p.m. UTC | #10

On Tue, Apr 9, 2024 at 7:03 AM Adhemerval Zanella Netto
<adhemerval.zanella@linaro.org> wrote:
>
>
>
> On 09/04/24 05:30, Szabolcs Nagy wrote:
> > The 04/08/2024 13:57, Adhemerval Zanella Netto wrote:
> >> On 08/04/24 04:26, Szabolcs Nagy wrote:
> >>> The 04/07/2024 16:29, Cristian Rodríguez wrote:
> >>>> On Fri, Apr 5, 2024 at 11:59 AM Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
> >>>>> The 04/05/2024 09:35, Adhemerval Zanella wrote:
> >>>>>> Use the hidden alias instead.
> >>>>>>
> >>>>>> Checked on aarch64-linux-gnu.
> >>>>>
> >>>>> does this change behaviour in case __tls_get_addr is interposed?
> >>>>
> >>>> Wut ? is that really supported.. I mean.. isn't that symbol prefix
> >>>> reserved for the implementation and any assumption about it is either
> >>>> ID or UB?
> >>>
> >>> a behaviour can change even if it's not supported.
> >>> i did not try to imply that it should be supported.
> >>>
> >>> i know sanitizers interpose __tls_get_addr, because
> >>> https://sourceware.org/bugzilla/show_bug.cgi?id=16291
> >>> i don't know if that hack works at all now for tlsdesc
> >>> (where the ld.so calls __tls_get_addr, not user code)
> >>>
> >>> my question was if we investigated this issue since it
> >>> is useful to document then in the commit msg (or news
> >>> entry if this affects users)
> >>
> >> This change 'breaks' the sanitizer trick to get the dynamic TLS, with
> >> this patch I now see:
> >>
> >>   MemorySanitizer-AARCH64 :: dtls_test.c
> >>   SanitizerCommon-asan-aarch64-Linux :: Linux/resize_tls_dynamic.cpp
> >>   SanitizerCommon-msan-aarch64-Linux :: Linux/resize_tls_dynamic.cpp
> >>   SanitizerCommon-tsan-aarch64-Linux :: Linux/resize_tls_dynamic.cpp
> >>
> >> And it does not fail on x86 only because it uses -mtls=gnu as default
> >> (the same tests fail on x86 with -mtls=gnu2).
> >>
> >> Now that GCC and distributions are aiming to use GNU2/DESC as the
> >> default TLS, this hack will also break on x86.  So the question is
> >> whether we revert 050f7298e1ecc39887c329037575ccd972071255 and
> >> document that __tls_get_addr should be interposable, or move with this
> >> change and try to come up with a possible solution for BZ#16291.
> >>
> >> I bringing this because we will have another two ABIs with tlsdesc
> >> support (loongarch and riscv).
> >
> > adding some sanitizer committers to cc.
> >
> > tl;dr: in the next glibc release tlsdesc will not call
> > __tls_get_addr in an interposable way in the dynamic tls
> > allocation case, unless somebody screems that this is needed.
> > (affects targets that may default to tlsdesc, but note that
> > the dynamic case only triggers with tlsdesc when a lot of
> > dlopened tls is used, otherwise static tls area is used)
>
> Just a note that this already true for x86 with -mtls=gnu2 since
> 2.21.  And now that distro are aiming to make it default, this issues
> will happen more often.
>
> >
> > i think it is also possible that we will use custom malloc
> > in ld.so which may be just as big change for the sanitizers.
> > (this can make tls access signal safe)
> >
> > i'm not against the change, but if we plan to add several
> > interposable hooks as in
> > https://sourceware.org/glibc/wiki/ThreadPropertiesAPI
> > then we might as well keep __tls_get_addr PLT for now.
> >
>
> I don't have a strong opinion, but what I really want is to have
> consistency over the architectures.  Meaning that if we want to keep
> the __tls_get_addr PLT for sanitizer/runtime hooks, it would be good
> to revert the x86 change.

We need to add gnu2 tests to libsanitizer when making the x86 change.

> It also means to document it properly somewhere and make the new
> RISC-V and loongarch follow the same guidelines.
>
> I will take a look again on the ThreadPropertiesAPI, since it is has
> been more and more a demanding issue.

Palmer Dabbelt April 9, 2024, 2:11 p.m. UTC | #11

On Tue, 09 Apr 2024 07:03:45 PDT (-0700), adhemerval.zanella@linaro.org wrote:
>
>
> On 09/04/24 05:30, Szabolcs Nagy wrote:
>> The 04/08/2024 13:57, Adhemerval Zanella Netto wrote:
>>> On 08/04/24 04:26, Szabolcs Nagy wrote:
>>>> The 04/07/2024 16:29, Cristian Rodríguez wrote:
>>>>> On Fri, Apr 5, 2024 at 11:59 AM Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
>>>>>> The 04/05/2024 09:35, Adhemerval Zanella wrote:
>>>>>>> Use the hidden alias instead.
>>>>>>>
>>>>>>> Checked on aarch64-linux-gnu.
>>>>>>
>>>>>> does this change behaviour in case __tls_get_addr is interposed?
>>>>>
>>>>> Wut ? is that really supported.. I mean.. isn't that symbol prefix
>>>>> reserved for the implementation and any assumption about it is either
>>>>> ID or UB?
>>>>
>>>> a behaviour can change even if it's not supported.
>>>> i did not try to imply that it should be supported.
>>>>
>>>> i know sanitizers interpose __tls_get_addr, because
>>>> https://sourceware.org/bugzilla/show_bug.cgi?id=16291
>>>> i don't know if that hack works at all now for tlsdesc
>>>> (where the ld.so calls __tls_get_addr, not user code)
>>>>
>>>> my question was if we investigated this issue since it
>>>> is useful to document then in the commit msg (or news
>>>> entry if this affects users)
>>>
>>> This change 'breaks' the sanitizer trick to get the dynamic TLS, with
>>> this patch I now see:
>>>
>>>   MemorySanitizer-AARCH64 :: dtls_test.c
>>>   SanitizerCommon-asan-aarch64-Linux :: Linux/resize_tls_dynamic.cpp
>>>   SanitizerCommon-msan-aarch64-Linux :: Linux/resize_tls_dynamic.cpp
>>>   SanitizerCommon-tsan-aarch64-Linux :: Linux/resize_tls_dynamic.cpp
>>>
>>> And it does not fail on x86 only because it uses -mtls=gnu as default
>>> (the same tests fail on x86 with -mtls=gnu2).
>>>
>>> Now that GCC and distributions are aiming to use GNU2/DESC as the
>>> default TLS, this hack will also break on x86.  So the question is
>>> whether we revert 050f7298e1ecc39887c329037575ccd972071255 and
>>> document that __tls_get_addr should be interposable, or move with this
>>> change and try to come up with a possible solution for BZ#16291.
>>>
>>> I bringing this because we will have another two ABIs with tlsdesc
>>> support (loongarch and riscv).
>>
>> adding some sanitizer committers to cc.
>>
>> tl;dr: in the next glibc release tlsdesc will not call
>> __tls_get_addr in an interposable way in the dynamic tls
>> allocation case, unless somebody screems that this is needed.
>> (affects targets that may default to tlsdesc, but note that
>> the dynamic case only triggers with tlsdesc when a lot of
>> dlopened tls is used, otherwise static tls area is used)
>
> Just a note that this already true for x86 with -mtls=gnu2 since
> 2.21.  And now that distro are aiming to make it default, this issues
> will happen more often.
>
>>
>> i think it is also possible that we will use custom malloc
>> in ld.so which may be just as big change for the sanitizers.
>> (this can make tls access signal safe)
>>
>> i'm not against the change, but if we plan to add several
>> interposable hooks as in
>> https://sourceware.org/glibc/wiki/ThreadPropertiesAPI
>> then we might as well keep __tls_get_addr PLT for now.
>>
>
> I don't have a strong opinion, but what I really want is to have
> consistency over the architectures.  Meaning that if we want to keep
> the __tls_get_addr PLT for sanitizer/runtime hooks, it would be good
> to revert the x86 change.
>
> It also means to document it properly somewhere and make the new
> RISC-V and loongarch follow the same guidelines.

I also don't have a strong opinion on whether __tls_get_addr should be 
interposable, but I'm happy to try and make the RISC-V port match 
arm64/x86.  I guess we're kind of safe for now as we don't have TLSDESC 
merged, though I think we were getting pretty close there so we should 
probbaly decide before we accidentally commit to an ABI.

> I will take a look again on the ThreadPropertiesAPI, since it is has
> been more and more a demanding issue.

H.J. Lu April 9, 2024, 2:46 p.m. UTC | #12

On Tue, Apr 9, 2024 at 7:11 AM Palmer Dabbelt <palmer@dabbelt.com> wrote:
>
> On Tue, 09 Apr 2024 07:03:45 PDT (-0700), adhemerval.zanella@linaro.org wrote:
> >
> >
> > On 09/04/24 05:30, Szabolcs Nagy wrote:
> >> The 04/08/2024 13:57, Adhemerval Zanella Netto wrote:
> >>> On 08/04/24 04:26, Szabolcs Nagy wrote:
> >>>> The 04/07/2024 16:29, Cristian Rodríguez wrote:
> >>>>> On Fri, Apr 5, 2024 at 11:59 AM Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
> >>>>>> The 04/05/2024 09:35, Adhemerval Zanella wrote:
> >>>>>>> Use the hidden alias instead.
> >>>>>>>
> >>>>>>> Checked on aarch64-linux-gnu.
> >>>>>>
> >>>>>> does this change behaviour in case __tls_get_addr is interposed?
> >>>>>
> >>>>> Wut ? is that really supported.. I mean.. isn't that symbol prefix
> >>>>> reserved for the implementation and any assumption about it is either
> >>>>> ID or UB?
> >>>>
> >>>> a behaviour can change even if it's not supported.
> >>>> i did not try to imply that it should be supported.
> >>>>
> >>>> i know sanitizers interpose __tls_get_addr, because
> >>>> https://sourceware.org/bugzilla/show_bug.cgi?id=16291
> >>>> i don't know if that hack works at all now for tlsdesc
> >>>> (where the ld.so calls __tls_get_addr, not user code)
> >>>>
> >>>> my question was if we investigated this issue since it
> >>>> is useful to document then in the commit msg (or news
> >>>> entry if this affects users)
> >>>
> >>> This change 'breaks' the sanitizer trick to get the dynamic TLS, with
> >>> this patch I now see:
> >>>
> >>>   MemorySanitizer-AARCH64 :: dtls_test.c
> >>>   SanitizerCommon-asan-aarch64-Linux :: Linux/resize_tls_dynamic.cpp
> >>>   SanitizerCommon-msan-aarch64-Linux :: Linux/resize_tls_dynamic.cpp
> >>>   SanitizerCommon-tsan-aarch64-Linux :: Linux/resize_tls_dynamic.cpp
> >>>
> >>> And it does not fail on x86 only because it uses -mtls=gnu as default
> >>> (the same tests fail on x86 with -mtls=gnu2).
> >>>
> >>> Now that GCC and distributions are aiming to use GNU2/DESC as the
> >>> default TLS, this hack will also break on x86.  So the question is
> >>> whether we revert 050f7298e1ecc39887c329037575ccd972071255 and
> >>> document that __tls_get_addr should be interposable, or move with this
> >>> change and try to come up with a possible solution for BZ#16291.
> >>>
> >>> I bringing this because we will have another two ABIs with tlsdesc
> >>> support (loongarch and riscv).
> >>
> >> adding some sanitizer committers to cc.
> >>
> >> tl;dr: in the next glibc release tlsdesc will not call
> >> __tls_get_addr in an interposable way in the dynamic tls
> >> allocation case, unless somebody screems that this is needed.
> >> (affects targets that may default to tlsdesc, but note that
> >> the dynamic case only triggers with tlsdesc when a lot of
> >> dlopened tls is used, otherwise static tls area is used)
> >
> > Just a note that this already true for x86 with -mtls=gnu2 since
> > 2.21.  And now that distro are aiming to make it default, this issues
> > will happen more often.
> >
> >>
> >> i think it is also possible that we will use custom malloc
> >> in ld.so which may be just as big change for the sanitizers.
> >> (this can make tls access signal safe)
> >>
> >> i'm not against the change, but if we plan to add several
> >> interposable hooks as in
> >> https://sourceware.org/glibc/wiki/ThreadPropertiesAPI
> >> then we might as well keep __tls_get_addr PLT for now.
> >>
> >
> > I don't have a strong opinion, but what I really want is to have
> > consistency over the architectures.  Meaning that if we want to keep
> > the __tls_get_addr PLT for sanitizer/runtime hooks, it would be good
> > to revert the x86 change.
> >
> > It also means to document it properly somewhere and make the new
> > RISC-V and loongarch follow the same guidelines.
>
> I also don't have a strong opinion on whether __tls_get_addr should be
> interposable, but I'm happy to try and make the RISC-V port match
> arm64/x86.  I guess we're kind of safe for now as we don't have TLSDESC
> merged, though I think we were getting pretty close there so we should
> probbaly decide before we accidentally commit to an ABI.
>
> > I will take a look again on the ThreadPropertiesAPI, since it is has
> > been more and more a demanding issue.

We should add a __tls_get_addr glibc test if we decide it should be
called via PLT.

Fangrui Song April 9, 2024, 5:50 p.m. UTC | #13

On Tue, Apr 9, 2024 at 7:03 AM Adhemerval Zanella Netto
<adhemerval.zanella@linaro.org> wrote:
>
>
>
> On 09/04/24 05:30, Szabolcs Nagy wrote:
> > The 04/08/2024 13:57, Adhemerval Zanella Netto wrote:
> >> On 08/04/24 04:26, Szabolcs Nagy wrote:
> >>> The 04/07/2024 16:29, Cristian Rodríguez wrote:
> >>>> On Fri, Apr 5, 2024 at 11:59 AM Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
> >>>>> The 04/05/2024 09:35, Adhemerval Zanella wrote:
> >>>>>> Use the hidden alias instead.
> >>>>>>
> >>>>>> Checked on aarch64-linux-gnu.
> >>>>>
> >>>>> does this change behaviour in case __tls_get_addr is interposed?
> >>>>
> >>>> Wut ? is that really supported.. I mean.. isn't that symbol prefix
> >>>> reserved for the implementation and any assumption about it is either
> >>>> ID or UB?
> >>>
> >>> a behaviour can change even if it's not supported.
> >>> i did not try to imply that it should be supported.
> >>>
> >>> i know sanitizers interpose __tls_get_addr, because
> >>> https://sourceware.org/bugzilla/show_bug.cgi?id=16291
> >>> i don't know if that hack works at all now for tlsdesc
> >>> (where the ld.so calls __tls_get_addr, not user code)
> >>>
> >>> my question was if we investigated this issue since it
> >>> is useful to document then in the commit msg (or news
> >>> entry if this affects users)
> >>
> >> This change 'breaks' the sanitizer trick to get the dynamic TLS, with
> >> this patch I now see:
> >>
> >>   MemorySanitizer-AARCH64 :: dtls_test.c
> >>   SanitizerCommon-asan-aarch64-Linux :: Linux/resize_tls_dynamic.cpp
> >>   SanitizerCommon-msan-aarch64-Linux :: Linux/resize_tls_dynamic.cpp
> >>   SanitizerCommon-tsan-aarch64-Linux :: Linux/resize_tls_dynamic.cpp
> >>
> >> And it does not fail on x86 only because it uses -mtls=gnu as default
> >> (the same tests fail on x86 with -mtls=gnu2).
> >>
> >> Now that GCC and distributions are aiming to use GNU2/DESC as the
> >> default TLS, this hack will also break on x86.  So the question is
> >> whether we revert 050f7298e1ecc39887c329037575ccd972071255 and
> >> document that __tls_get_addr should be interposable, or move with this
> >> change and try to come up with a possible solution for BZ#16291.
> >>
> >> I bringing this because we will have another two ABIs with tlsdesc
> >> support (loongarch and riscv).

Last time I analyzed the __tls_get_addr interceptor in sanitizers, I
have made quite some notes at
https://maskray.me/blog/2021-02-14-all-about-thread-local-storage#why-does-compiler-rt-need-to-know-tls-blocks

Yes, an interceptor is needed.

> msan needs to do more than asan: the __tls_get_addr interceptor (DTLS_on_tls_get_addr) detects new dynamic TLS blocks and unpoisons the shadow. ld.so calls a non-interposable memset to clear the blocks. Otherwise, if a dynamic TLS block reuses a previous allocation with poison, there may be false positives. One way to semi reliably trigger this is (test/msan/dtls_test.cpp https://github.com/google/sanitizers/issues/547):

in a thread, write an uninitialized (poisoned) value to a dynamic TLS block
destroy the thread
create a new thread
try making the new thread reuse the poisoned dynamic TLS block.

Note: aarch64 uses TLSDESC by default and there is no interposable symbol.

---

Current TLSDESC implementations not providing an interposable symbol
is indeed an issue, but I haven't carefully analyzed it on an aarch64
machine.



> > adding some sanitizer committers to cc.
> >
> > tl;dr: in the next glibc release tlsdesc will not call
> > __tls_get_addr in an interposable way in the dynamic tls
> > allocation case, unless somebody screems that this is needed.
> > (affects targets that may default to tlsdesc, but note that
> > the dynamic case only triggers with tlsdesc when a lot of
> > dlopened tls is used, otherwise static tls area is used)
>
> Just a note that this already true for x86 with -mtls=gnu2 since
> 2.21.  And now that distro are aiming to make it default, this issues
> will happen more often.

Yes.

> > i think it is also possible that we will use custom malloc
> > in ld.so which may be just as big change for the sanitizers.
> > (this can make tls access signal safe)
> >
> > i'm not against the change, but if we plan to add several
> > interposable hooks as in
> > https://sourceware.org/glibc/wiki/ThreadPropertiesAPI
> > then we might as well keep __tls_get_addr PLT for now.
> >
>
> I don't have a strong opinion, but what I really want is to have
> consistency over the architectures.  Meaning that if we want to keep
> the __tls_get_addr PLT for sanitizer/runtime hooks, it would be good
> to revert the x86 change.
>
> It also means to document it properly somewhere and make the new
> RISC-V and loongarch follow the same guidelines.
>
> I will take a look again on the ThreadPropertiesAPI, since it is has
> been more and more a demanding issue.

Thanks!

Szabolcs Nagy April 10, 2024, 7:29 a.m. UTC | #14

The 04/09/2024 10:50, Fangrui Song wrote:
> On Tue, Apr 9, 2024 at 7:03 AM Adhemerval Zanella Netto
> <adhemerval.zanella@linaro.org> wrote:
> > On 09/04/24 05:30, Szabolcs Nagy wrote:
> > > The 04/08/2024 13:57, Adhemerval Zanella Netto wrote:
> > >> On 08/04/24 04:26, Szabolcs Nagy wrote:
> > >>> i know sanitizers interpose __tls_get_addr, because
> > >>> https://sourceware.org/bugzilla/show_bug.cgi?id=16291
> > >>> i don't know if that hack works at all now for tlsdesc
> > >>> (where the ld.so calls __tls_get_addr, not user code)
> > >>>
> > >>> my question was if we investigated this issue since it
> > >>> is useful to document then in the commit msg (or news
> > >>> entry if this affects users)
> > >>
> > >> This change 'breaks' the sanitizer trick to get the dynamic TLS, with
> > >> this patch I now see:
> > >>
> > >>   MemorySanitizer-AARCH64 :: dtls_test.c
> > >>   SanitizerCommon-asan-aarch64-Linux :: Linux/resize_tls_dynamic.cpp
> > >>   SanitizerCommon-msan-aarch64-Linux :: Linux/resize_tls_dynamic.cpp
> > >>   SanitizerCommon-tsan-aarch64-Linux :: Linux/resize_tls_dynamic.cpp
> > >>
> > >> And it does not fail on x86 only because it uses -mtls=gnu as default
> > >> (the same tests fail on x86 with -mtls=gnu2).
> > >>
> > >> Now that GCC and distributions are aiming to use GNU2/DESC as the
> > >> default TLS, this hack will also break on x86.  So the question is
> > >> whether we revert 050f7298e1ecc39887c329037575ccd972071255 and
> > >> document that __tls_get_addr should be interposable, or move with this
> > >> change and try to come up with a possible solution for BZ#16291.
> > >>
> > >> I bringing this because we will have another two ABIs with tlsdesc
> > >> support (loongarch and riscv).
> 
> Last time I analyzed the __tls_get_addr interceptor in sanitizers, I
> have made quite some notes at
> https://maskray.me/blog/2021-02-14-all-about-thread-local-storage#why-does-compiler-rt-need-to-know-tls-blocks

says

> Note: aarch64 uses TLSDESC by default and there is no interposable symbol.

not quite true: the tlsdesc call cannot be interposed, but then
ld.so internally calls __tls_get_addr via a PLT that can be.
(only for first time dynamic tls setup, but that's enough to
get dynamic tls boundaries)

you need another way to detect the static tls area, but that's
true for non-tlsdesc too.

> Current TLSDESC implementations not providing an interposable symbol
> is indeed an issue, but I haven't carefully analyzed it on an aarch64
> machine.

we are planning to remove the interposable symbol in the
next glibc version, so the current __tls_get_addr trick
won't work for x86_64, aarch64, loongarch, riscv.

Florian Weimer April 10, 2024, 8:23 a.m. UTC | #15

* Fangrui Song:

> Last time I analyzed the __tls_get_addr interceptor in sanitizers, I
> have made quite some notes at
> https://maskray.me/blog/2021-02-14-all-about-thread-local-storage#why-does-compiler-rt-need-to-know-tls-blocks
>
> Yes, an interceptor is needed.

There's no guarantuee that TLS access goes through a regular function
call, so any design that relies on such a call happening is
fundamentally broken.

Quoting from your article:

| Note: if the allocation is rtld/libc internal and not intercepted,
| there is no need to unpoison the range. The associated shadow is
| supposed to be zeros. However, if the allocation is intercepted, the
| runtime should unpoison the range in case the range reuses a previous
| allocation which happens to contain poisoned bytes.
|
| In glibc, _dl_allocate_tls and _dl_deallocate_tls call malloc/free
| functions which are internal and not intercepted, so the allocations
| are opaque to the runtime and the shadow bytes are all zeroes.

I don't think this is accurate.  We call the application malloc/free for
non-main threads after initialization.

Having an accurate description of sanitizer needs in this area would be
really helpful, but I think we are not quite there yet.  (This is
different from an API description.)

I think there are several aspects here:

(a) Avoid false errors for bounds checks for Address Sanitizer.

(b) Support pointer discovery for Leak Sanitizer (essentially conservative
    garbage collection).

(c) Avoid false data race reports for Thread Sanitizer after TLS reuse
    from one thread for a different thread (only with non-overlapping
    lifetimes).

Based on your description, I'm not sure if (a) is actually a problem.
If we don't use application malloc for TLS allocations, bounds checking
is bypassed apparently?  And if we use malloc, out-of-bounds accesses
would be actual bugs.

Aspect (b) is a real issue.  Could we address that by allocating the TCB
(with static TLS) and all dynamic TLS with application malloc (or
rather, memalign/aligned_alloc), and keep a pointer to the allocation on
the thread stack?  Then a conservative collector could find it, and scan
it for pointers.  A gap remains for the main thread, whose TCB is not
allocated using application malloc—and can't be, as the application
malloc itself very likely depends on the TCB already being there.  We
could switch TCBs after allocating another one with malloc, but that
would require some hand-off protocol, I believe.  Maybe it's better to
register early allocations with the sanitizer directly, using some
appropriate API.

For (c), we could just stop caching TCBs after thread exit.  If we call
free, and reallocate for the new thread, that should avoid the false
data race.  This issue does not affect the main thread.

Based on that, I don't think we need to support discovery of TLS areas,
or export any other internal implementation details.  We just need to
use more malloc within glibc if we detect an active sanitizer, and find
a way to make the TCB allocation of the main thread known to the
sanitizer.

Thanks,
Florian

enh April 10, 2024, 3:46 p.m. UTC | #16

bionic implemented
https://sourceware.org/glibc/wiki/ThreadPropertiesAPI in
https://android.googlesource.com/platform/bionic/+/refs/heads/main/libc/include/sys/thread_properties.h
but tbh i'm not sure that the sanitizer folks moved over to the new
api?

(i don't think we could just use malloc() because jemalloc -- which we
still haven't fully removed for very low-end users in favor of scudo
-- was itself using TLS thread locals? scudo has its own reserved
constant slot in bionic, so that should be fine, i think, but we're
not there yet.)

On Wed, Apr 10, 2024 at 1:24 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * Fangrui Song:
>
> > Last time I analyzed the __tls_get_addr interceptor in sanitizers, I
> > have made quite some notes at
> > https://maskray.me/blog/2021-02-14-all-about-thread-local-storage#why-does-compiler-rt-need-to-know-tls-blocks
> >
> > Yes, an interceptor is needed.
>
> There's no guarantuee that TLS access goes through a regular function
> call, so any design that relies on such a call happening is
> fundamentally broken.
>
> Quoting from your article:
>
> | Note: if the allocation is rtld/libc internal and not intercepted,
> | there is no need to unpoison the range. The associated shadow is
> | supposed to be zeros. However, if the allocation is intercepted, the
> | runtime should unpoison the range in case the range reuses a previous
> | allocation which happens to contain poisoned bytes.
> |
> | In glibc, _dl_allocate_tls and _dl_deallocate_tls call malloc/free
> | functions which are internal and not intercepted, so the allocations
> | are opaque to the runtime and the shadow bytes are all zeroes.
>
> I don't think this is accurate.  We call the application malloc/free for
> non-main threads after initialization.
>
> Having an accurate description of sanitizer needs in this area would be
> really helpful, but I think we are not quite there yet.  (This is
> different from an API description.)
>
> I think there are several aspects here:
>
> (a) Avoid false errors for bounds checks for Address Sanitizer.
>
> (b) Support pointer discovery for Leak Sanitizer (essentially conservative
>     garbage collection).
>
> (c) Avoid false data race reports for Thread Sanitizer after TLS reuse
>     from one thread for a different thread (only with non-overlapping
>     lifetimes).
>
> Based on your description, I'm not sure if (a) is actually a problem.
> If we don't use application malloc for TLS allocations, bounds checking
> is bypassed apparently?  And if we use malloc, out-of-bounds accesses
> would be actual bugs.
>
> Aspect (b) is a real issue.  Could we address that by allocating the TCB
> (with static TLS) and all dynamic TLS with application malloc (or
> rather, memalign/aligned_alloc), and keep a pointer to the allocation on
> the thread stack?  Then a conservative collector could find it, and scan
> it for pointers.  A gap remains for the main thread, whose TCB is not
> allocated using application malloc—and can't be, as the application
> malloc itself very likely depends on the TCB already being there.  We
> could switch TCBs after allocating another one with malloc, but that
> would require some hand-off protocol, I believe.  Maybe it's better to
> register early allocations with the sanitizer directly, using some
> appropriate API.
>
> For (c), we could just stop caching TCBs after thread exit.  If we call
> free, and reallocate for the new thread, that should avoid the false
> data race.  This issue does not affect the main thread.
>
> Based on that, I don't think we need to support discovery of TLS areas,
> or export any other internal implementation details.  We just need to
> use more malloc within glibc if we detect an active sanitizer, and find
> a way to make the TCB allocation of the main thread known to the
> sanitizer.
>
> Thanks,
> Florian
>

Florian Weimer April 15, 2024, 11:41 a.m. UTC | #17

> bionic implemented
> https://sourceware.org/glibc/wiki/ThreadPropertiesAPI in
> https://android.googlesource.com/platform/bionic/+/refs/heads/main/libc/include/sys/thread_properties.h
> but tbh i'm not sure that the sanitizer folks moved over to the new
> api?

I think I wrote about this before: The APIs are underspecified.  I think
we want to be able to change the boundaries of static TLS eventually
(allocating more backing storage before the start or the end, using
address space that previously was only reserved with PROT_NONE).

Maybe I should put it into the wiki page.

> (i don't think we could just use malloc() because jemalloc -- which we
> still haven't fully removed for very low-end users in favor of scudo
> -- was itself using TLS thread locals? scudo has its own reserved
> constant slot in bionic, so that should be fine, i think, but we're
> not there yet.)

Yes, that's the TCB bootstrap problem I mentioned.  It's only relevant
to the initial thread.  There is no functional requirement to allocate
TCBs directly on the thread they are used for.  So once we solve the TCB
issue for the main thread (and have a mode that uses malloc otherwise),
I think we are done.

Thanks,
Florian

Adhemerval Zanella Netto April 15, 2024, 8:22 p.m. UTC | #18

On 10/04/24 05:23, Florian Weimer wrote:
> * Fangrui Song:
> 
>> Last time I analyzed the __tls_get_addr interceptor in sanitizers, I
>> have made quite some notes at
>> https://maskray.me/blog/2021-02-14-all-about-thread-local-storage#why-does-compiler-rt-need-to-know-tls-blocks
>>
>> Yes, an interceptor is needed.
> 
> There's no guarantuee that TLS access goes through a regular function
> call, so any design that relies on such a call happening is
> fundamentally broken.
> 
> Quoting from your article:
> 
> | Note: if the allocation is rtld/libc internal and not intercepted,
> | there is no need to unpoison the range. The associated shadow is
> | supposed to be zeros. However, if the allocation is intercepted, the
> | runtime should unpoison the range in case the range reuses a previous
> | allocation which happens to contain poisoned bytes.
> |
> | In glibc, _dl_allocate_tls and _dl_deallocate_tls call malloc/free
> | functions which are internal and not intercepted, so the allocations
> | are opaque to the runtime and the shadow bytes are all zeroes.
> 
> I don't think this is accurate.  We call the application malloc/free for
> non-main threads after initialization.
> 
> Having an accurate description of sanitizer needs in this area would be
> really helpful, but I think we are not quite there yet.  (This is
> different from an API description.)
> 
> I think there are several aspects here:
> 
> (a) Avoid false errors for bounds checks for Address Sanitizer.
> 
> (b) Support pointer discovery for Leak Sanitizer (essentially conservative
>     garbage collection).
> 
> (c) Avoid false data race reports for Thread Sanitizer after TLS reuse
>     from one thread for a different thread (only with non-overlapping
>     lifetimes).
> 
> Based on your description, I'm not sure if (a) is actually a problem.
> If we don't use application malloc for TLS allocations, bounds checking
> is bypassed apparently?  And if we use malloc, out-of-bounds accesses
> would be actual bugs.
> 
> Aspect (b) is a real issue.  Could we address that by allocating the TCB
> (with static TLS) and all dynamic TLS with application malloc (or
> rather, memalign/aligned_alloc), and keep a pointer to the allocation on
> the thread stack?  Then a conservative collector could find it, and scan
> it for pointers.  A gap remains for the main thread, whose TCB is not
> allocated using application malloc—and can't be, as the application
> malloc itself very likely depends on the TCB already being there.  We
> could switch TCBs after allocating another one with malloc, but that
> would require some hand-off protocol, I believe.  Maybe it's better to
> register early allocations with the sanitizer directly, using some
> appropriate API.

Using malloc will also improve TCB hardening [1], so I think it would be
valuable to implement regardless of sanitizer work.  

> 
> For (c), we could just stop caching TCBs after thread exit.  If we call
> free, and reallocate for the new thread, that should avoid the false
> data race.  This issue does not affect the main thread.

We already have a tunable, glibc.pthread.stack_cache_size, which controls
the thread cached size and setting to 0 should disable it. I do not think
API to dynamically change tunables is a good approach (we might have
potential issues to adapt the code to a dynamic value), so maybe an option
would be to have interposable symbol programs could implement that can
override the tunable values at programs startup. 

> 
> Based on that, I don't think we need to support discovery of TLS areas,
> or export any other internal implementation details.  We just need to
> use more malloc within glibc if we detect an active sanitizer, and find
> a way to make the TCB allocation of the main thread known to the
> sanitizer.
> 
> Thanks,
> Florian
> 

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=22850

enh May 22, 2024, 3:40 p.m. UTC | #19

On Mon, Apr 15, 2024 at 7:41 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> > bionic implemented
> > https://sourceware.org/glibc/wiki/ThreadPropertiesAPI in
> > https://android.googlesource.com/platform/bionic/+/refs/heads/main/libc/include/sys/thread_properties.h
> > but tbh i'm not sure that the sanitizer folks moved over to the new
> > api?
>
> I think I wrote about this before: The APIs are underspecified.  I think
> we want to be able to change the boundaries of static TLS eventually
> (allocating more backing storage before the start or the end, using
> address space that previously was only reserved with PROT_NONE).
>
> Maybe I should put it into the wiki page.

TIL that the "tests" don't _build_, let alone pass. speaking of
"underspecified", the implementation and tests for
__libc_register_thread_exit_callback() disagree about whether or not
the _main_ thread exiting should cause the callbacks to be run. (the
implementation only calls them for pthread_create() threads, but the
test creates no threads and expects the callbacks to have run.)

given that there's no argument to the callback to identify _which_
thread exited, i'm not entirely sure how this was intended to be used
anyway? is it useful to know "something exited"? (afaict no llvm
sanitizer is using this.)

> > (i don't think we could just use malloc() because jemalloc -- which we
> > still haven't fully removed for very low-end users in favor of scudo
> > -- was itself using TLS thread locals? scudo has its own reserved
> > constant slot in bionic, so that should be fine, i think, but we're
> > not there yet.)
>
> Yes, that's the TCB bootstrap problem I mentioned.  It's only relevant
> to the initial thread.  There is no functional requirement to allocate
> TCBs directly on the thread they are used for.  So once we solve the TCB
> issue for the main thread (and have a mode that uses malloc otherwise),
> I think we are done.
>
> Thanks,
> Florian
>

Evgenii Stepanov May 30, 2024, 11:28 p.m. UTC | #20

On Wed, May 22, 2024 at 8:40 AM enh <enh@google.com> wrote:

> On Mon, Apr 15, 2024 at 7:41 AM Florian Weimer <fweimer@redhat.com> wrote:
> >
> > > bionic implemented
> > > https://sourceware.org/glibc/wiki/ThreadPropertiesAPI in
> > >
> https://android.googlesource.com/platform/bionic/+/refs/heads/main/libc/include/sys/thread_properties.h
> > > but tbh i'm not sure that the sanitizer folks moved over to the new
> > > api?
> >
> > I think I wrote about this before: The APIs are underspecified.  I think
> > we want to be able to change the boundaries of static TLS eventually
> > (allocating more backing storage before the start or the end, using
> > address space that previously was only reserved with PROT_NONE).
> >
> > Maybe I should put it into the wiki page.
>
> TIL that the "tests" don't _build_, let alone pass. speaking of
> "underspecified", the implementation and tests for
> __libc_register_thread_exit_callback() disagree about whether or not
> the _main_ thread exiting should cause the callbacks to be run. (the
> implementation only calls them for pthread_create() threads, but the
> test creates no threads and expects the callbacks to have run.)
>
> given that there's no argument to the callback to identify _which_
> thread exited, i'm not entirely sure how this was intended to be used
> anyway? is it useful to know "something exited"? (afaict no llvm
> sanitizer is using this.)
>

This was meant to allow tool metadata cleanup for when a secondary thread
exits, replacing the ugly PTHREAD_DESTRUCTOR_ITERATIONS hack, with a
guarantee that no instrumented code will run on that thread after. It looks
like the compiler-rt side was never implemented.

I think our primary interest in this API was always in the dynamic tls
hooks.


> > > (i don't think we could just use malloc() because jemalloc -- which we
> > > still haven't fully removed for very low-end users in favor of scudo
> > > -- was itself using TLS thread locals? scudo has its own reserved
> > > constant slot in bionic, so that should be fine, i think, but we're
> > > not there yet.)
> >
> > Yes, that's the TCB bootstrap problem I mentioned.  It's only relevant
> > to the initial thread.  There is no functional requirement to allocate
> > TCBs directly on the thread they are used for.  So once we solve the TCB
> > issue for the main thread (and have a mode that uses malloc otherwise),
> > I think we are done.
> >
> > Thanks,
> > Florian
> >
>

aarch64: Remove ld.so __tls_get_addr plt usage

Commit Message

Comments

Patch