Message ID | 878r2ifdrx.fsf@oldenburg.str.redhat.com
---|---
State | New
Series | x86-64: Stack alignment in _dl_tlsdesc_dynamic and red zone usage (bug 31501)
On Sat, Mar 16, 2024 at 7:33 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> In sysdeps/x86_64/dl-tlsdesc-dynamic.h, the stack pointer is
> realigned for some variants (notably _dl_tlsdesc_dynamic_xsavec).
> This realignment does not take into account that the function has
> already used part of the red zone at this point, thus clobbering
> the initially saved register values located there if the stack
> alignment inherited from the caller is unfortunate.
>
> (Note: I do not know how to write a good test case for this in the
> existing framework.  We saw this as a random LTO plugin crash when
> building GCC with -mtls-dialect=gnu2.  The existing tests do pass on
> x86_64-linux-gnu with this change.)

Will a different STATE_SAVE_OFFSET for TLS descriptor work?

> ---
>  sysdeps/x86_64/dl-tlsdesc-dynamic.h | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/sysdeps/x86_64/dl-tlsdesc-dynamic.h b/sysdeps/x86_64/dl-tlsdesc-dynamic.h
> index 9f02cfc3eb..8e49e7eece 100644
> --- a/sysdeps/x86_64/dl-tlsdesc-dynamic.h
> +++ b/sysdeps/x86_64/dl-tlsdesc-dynamic.h
> @@ -83,6 +83,8 @@ _dl_tlsdesc_dynamic:
>  2:
>  #if DL_RUNTIME_RESOLVE_REALIGN_STACK
>  	movq	%rbx, -24(%rsp)
> +	subq	$24, %rsp
> +	cfi_adjust_cfa_offset(24)
>  	mov	%RSP_LP, %RBX_LP
>  	cfi_def_cfa_register(%rbx)
>  	and	$-STATE_SAVE_ALIGNMENT, %RSP_LP
> @@ -153,6 +155,8 @@ _dl_tlsdesc_dynamic:
>  #if DL_RUNTIME_RESOLVE_REALIGN_STACK
>  	mov	%RBX_LP, %RSP_LP
>  	cfi_def_cfa_register(%rsp)
> +	addq	$24, %rsp
> +	cfi_adjust_cfa_offset(-24)
>  	movq	-24(%rsp), %rbx
>  	cfi_restore(%rbx)
>  #else
>
> base-commit: 5ebc24f785dc0dff494a93ca82a369497c3cdc68
On Sat, Mar 16, 2024 at 7:37 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> Will a different STATE_SAVE_OFFSET for TLS descriptor work?

Correction.  REGISTER_SAVE_AREA is for this purpose.  Will a different
value for TLS descriptor work?

--
H.J.
On Sat, Mar 16, 2024 at 7:43 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> > > (Note: I do not know how to write a good test case for this in the
> > > existing framework.  We saw this as a random LTO plugin crash when
> > > building GCC with -mtls-dialect=gnu2.  The existing tests do pass
> > > on x86_64-linux-gnu with this change.)

We should try to find a testcase.  Can you provide a backtrace when it
happens?  It should be possible to write a testcase with the backtrace.

> > Will a different STATE_SAVE_OFFSET for TLS descriptor work?
>
> Correction.  REGISTER_SAVE_AREA is for this purpose.  Will a different
> value for TLS descriptor work?

--
H.J.
* H. J. Lu:

> > Will a different STATE_SAVE_OFFSET for TLS descriptor work?
>
> Correction.  REGISTER_SAVE_AREA is for this purpose.  Will a different
> value for TLS descriptor work?

I think REGISTER_SAVE_AREA is for the later register saves?

This use of the red zone is specific to the TLS trampoline.  The lazy
binding trampoline doesn't do that.  REGISTER_SAVE_AREA is used by both.

Thanks,
Florian
On Sat, Mar 16, 2024 at 7:57 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> I think REGISTER_SAVE_AREA is for the later register saves?
>
> This use of the red zone is specific to the TLS trampoline.  The lazy
> binding trampoline doesn't do that.  REGISTER_SAVE_AREA is used by both.

# if DL_RUNTIME_RESOLVE_REALIGN_STACK
	/* STATE_SAVE_OFFSET has space for 8 integer registers.  But we
	   need space for RCX, RDX, RSI, RDI, R8, R9, R10 and R11, plus
	   RBX above.  */
	sub	$(REGISTER_SAVE_AREA + STATE_SAVE_ALIGNMENT), %RSP_LP
# else
	sub	$REGISTER_SAVE_AREA, %RSP_LP
	cfi_adjust_cfa_offset(REGISTER_SAVE_AREA)
# endif

Let's find a testcase first.
* H. J. Lu:

> We should try to find a testcase.  Can you provide a backtrace when it
> happens?  It should be possible to write a testcase with the backtrace.

In my reproducer, when %rbx is about to be clobbered, I see
(%rsp % 64) == 8 at the start of _dl_tlsdesc_dynamic_xsavec.

The %rbx register does not get clobbered if (%rsp % 64) == 56.

Does this help?

Thanks,
Florian
On Sat, Mar 16, 2024 at 8:04 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> In my reproducer, when %rbx is about to be clobbered, I see
> (%rsp % 64) == 8 at the start of _dl_tlsdesc_dynamic_xsavec.
>
> The %rbx register does not get clobbered if (%rsp % 64) == 56.
>
> Does this help?

Yes.  I am working on a testcase.
On Sat, Mar 16, 2024 at 8:18 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> Yes.  I am working on a testcase.

Hi Florian,

Please verify if this is the right testcase.

Thanks.
On Sat, Mar 16, 2024 at 9:32 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> Please verify if this is the right testcase.

This test fails only on AVX512 machines.
* H. J. Lu:

> Please verify if this is the right testcase.

Test case works (fails without my fix, succeeds with my fix).  Some
comments below.

> diff --git a/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod0.S b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod0.S
> new file mode 100644
> index 0000000000..8129b28061
> --- /dev/null
> +++ b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod0.S
> @@ -0,0 +1,57 @@

> +	.text
> +	.p2align 4
> +	.globl	apply_tls
> +	.type	apply_tls, @function
> +apply_tls:
> +	.cfi_startproc

Missing CET marker.

> +	subq	$24, %rsp
> +	.cfi_def_cfa_offset 32
> +	movdqu	(%rdi), %xmm0
> +	movq	%fs:40, %rax
> +	movq	%rax, 8(%rsp)
> +	xorl	%eax, %eax
> +	leaq	tls_var0@TLSDESC(%rip), %rax
> +	call	*tls_var0@TLSCALL(%rax)
> +	addq	%fs:0, %rax
> +	movups	%xmm0, (%rax)
> +	movdqu	16(%rdi), %xmm1
> +	movups	%xmm1, 16(%rax)
> +	movq	8(%rsp), %rdx
> +	subq	%fs:40, %rdx
> +	jne	.L5
> +	addq	$24, %rsp
> +	.cfi_remember_state
> +	.cfi_def_cfa_offset 8
> +	ret
> +.L5:
> +	.cfi_restore_state
> +	call	__stack_chk_fail@PLT

Not sure if we need this?

Maybe add some comment what exactly this subtest is exercising?

These are present in the other TLS modules as well.

> diff --git a/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod1.S b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod1.S
> new file mode 100644
> index 0000000000..af4b7ca761
> --- /dev/null
> +++ b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod1.S

> +/* Select an offset which will cause _dl_tlsdesc_dynamic_xsavec to
> +   clobber %rbx.  */
> +#define OFFSET (56 + 16 + 16 + 16)
> +
> +	.text
> +	.p2align 4
> +	.globl	apply_tls
> +	.type	apply_tls, @function
> +apply_tls:
> +	.cfi_startproc
> +	pushq	%rbp
> +	.cfi_def_cfa_offset 16
> +	.cfi_offset 6, -16
> +	movq	%rsp, %rbp
> +	.cfi_def_cfa_register 6
> +	/* Align stack to 64 bytes.  */
> +	andq	$-64, %rsp
> +	pushq	%rbx
> +	subq	$OFFSET, %rsp

The offset could be loaded from a global variable or something like
that.  We should exercise a wide range of stack alignments—the
individual tests are cheap.  And maybe check extra registers.

Thanks,
Florian
On Sat, Mar 16, 2024 at 10:42 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> Test case works (fails without my fix, succeeds with my fix).  Some
> comments below.
>
> The offset could be loaded from a global variable or something like
> that.  We should exercise a wide range of stack alignments—the
> individual tests are cheap.  And maybe check extra registers.

I will clean it up with a different fix.
On Sat, Mar 16, 2024 at 10:51 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> I will clean it up with a different fix.

I submitted a patch with a testcase:

https://patchwork.sourceware.org/project/glibc/list/?series=31963

My patch allocates 64 more bytes to avoid clobbering the saved RDI,
RSI and RBX values on the stack by xsave.  It avoids 2 stack
adjustments.  Either my fix or Florian's fix should fix the issue.
I don't have a strong preference as long as my testcase is included.

Thanks.
On Sat, Mar 16, 2024 at 3:05 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> I submitted a patch with a testcase:
>
> https://patchwork.sourceware.org/project/glibc/list/?series=31963
>
> My patch allocates 64 more bytes to avoid clobbering the saved RDI,
> RSI and RBX values on the stack by xsave.  It avoids 2 stack
> adjustments.  Either my fix or Florian's fix should fix the issue.
> I don't have a strong preference as long as my testcase is included.

I think my testcase may fail on AMD AVX CPUs without the fix.  On
Intel AVX CPUs, the state size is 960 bytes, but the last 128 bytes
may be unused.
On Sat, Mar 16, 2024 at 6:19 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> I think my testcase may fail on AMD AVX CPUs without the fix.  On
> Intel AVX CPUs, the state size is 960 bytes, but the last 128 bytes
> may be unused.

I sent out the v2 patch:

https://patchwork.sourceware.org/project/glibc/list/?series=31966

to simplify the testcase.
diff --git a/sysdeps/x86_64/dl-tlsdesc-dynamic.h b/sysdeps/x86_64/dl-tlsdesc-dynamic.h
index 9f02cfc3eb..8e49e7eece 100644
--- a/sysdeps/x86_64/dl-tlsdesc-dynamic.h
+++ b/sysdeps/x86_64/dl-tlsdesc-dynamic.h
@@ -83,6 +83,8 @@ _dl_tlsdesc_dynamic:
 2:
 #if DL_RUNTIME_RESOLVE_REALIGN_STACK
 	movq	%rbx, -24(%rsp)
+	subq	$24, %rsp
+	cfi_adjust_cfa_offset(24)
 	mov	%RSP_LP, %RBX_LP
 	cfi_def_cfa_register(%rbx)
 	and	$-STATE_SAVE_ALIGNMENT, %RSP_LP
@@ -153,6 +155,8 @@ _dl_tlsdesc_dynamic:
 #if DL_RUNTIME_RESOLVE_REALIGN_STACK
 	mov	%RBX_LP, %RSP_LP
 	cfi_def_cfa_register(%rsp)
+	addq	$24, %rsp
+	cfi_adjust_cfa_offset(-24)
 	movq	-24(%rsp), %rbx
 	cfi_restore(%rbx)
 #else