Message ID | 20240215203241.1938288-2-hjl.tools@gmail.com |
---|---|
State | New |
Series | x86: Update _dl_tlsdesc_dynamic to preserve caller-saved registers |
* H. J. Lu:

> Add APX registers to STATE_SAVE_MASK so that APX and Tile registers are
> saved in ld.so trampoline.  This fixes BZ #31371.

First APX is confusing?

What's the impact on xsave_state_size and xsave_state_full_size?
I'm worried that the loader trampoline now overflows small stacks.
We used to have such problems in the past.

Thanks,
Florian
* Florian Weimer:

> * H. J. Lu:
>
>> Add APX registers to STATE_SAVE_MASK so that APX and Tile registers are
>> saved in ld.so trampoline.  This fixes BZ #31371.
>
> First APX is confusing?
>
> What's the impact on xsave_state_size and xsave_state_full_size?
> I'm worried that the loader trampoline now overflows small stacks.
> We used to have such problems in the past.

It adds 8,256 bytes on one AMX-capable machine.  This will impact
compatibility with existing software.  Stack usage is increased even if
the application does not use AMX at all.

I suggest to use a mechanism like STO_AARCH64_VARIANT_PCS to deal with
AMX, and only add the APX flag for now.

Thanks,
Florian
On Thu, Feb 15, 2024 at 1:39 PM Florian Weimer <fweimer@redhat.com> wrote:
>
> * Florian Weimer:
>
> > * H. J. Lu:
> >
> >> Add APX registers to STATE_SAVE_MASK so that APX and Tile registers are
> >> saved in ld.so trampoline.  This fixes BZ #31371.
> >
> > First APX is confusing?

What do you suggest?

> >
> > What's the impact on xsave_state_size and xsave_state_full_size?
> > I'm worried that the loader trampoline now overflows small stacks.
> > We used to have such problems in the past.
>
> It adds 8,256 bytes on one AMX-capable machine.  This will impact
> compatibility with existing software.  Stack usage is increased even if
> the application does not use AMX at all.

Also AMX may be enabled much later.

> I suggest to use a mechanism like STO_AARCH64_VARIANT_PCS to deal with
> AMX, and only add the APX flag for now.
>

I will drop Tile registers.
diff --git a/sysdeps/x86/sysdep.h b/sysdeps/x86/sysdep.h
index 85d0a8c943..4da05f4b32 100644
--- a/sysdeps/x86/sysdep.h
+++ b/sysdeps/x86/sysdep.h
@@ -21,14 +21,56 @@
 
 #include <sysdeps/generic/sysdep.h>
 
+/* The extended state feature IDs in the state component bitmap.  */
+#define X86_XSTATE_X87_ID	0
+#define X86_XSTATE_SSE_ID	1
+#define X86_XSTATE_AVX_ID	2
+#define X86_XSTATE_BNDREGS_ID	3
+#define X86_XSTATE_BNDCFG_ID	4
+#define X86_XSTATE_K_ID		5
+#define X86_XSTATE_ZMM_H_ID	6
+#define X86_XSTATE_ZMM_ID	7
+#define X86_XSTATE_PKRU_ID	9
+#define X86_XSTATE_TILECFG_ID	17
+#define X86_XSTATE_TILEDATA_ID	18
+#define X86_XSTATE_APX_F_ID	19
+
+#ifdef __x86_64__
 /* Offset for fxsave/xsave area used by _dl_runtime_resolve.  Also need
    space to preserve RCX, RDX, RSI, RDI, R8, R9 and RAX.  It must be
-   aligned to 16 bytes for fxsave and 64 bytes for xsave.  */
-#define STATE_SAVE_OFFSET (8 * 7 + 8)
-
-/* Save SSE, AVX, AVX512, mask and bound registers.  */
-#define STATE_SAVE_MASK \
-  ((1 << 1) | (1 << 2) | (1 << 3) | (1 << 5) | (1 << 6) | (1 << 7))
+   aligned to 16 bytes for fxsave and 64 bytes for xsave.
+
+   NB: It is non-zero because of the 128-byte red-zone.  Some registers
+   are saved on stack without adjusting stack pointer first.  When we
+   update stack pointer to allocate more space, we need to take the
+   red-zone into account.  */
+# define STATE_SAVE_OFFSET (8 * 7 + 8)
+
+/* Save SSE, AVX, AVX512, mask, bound and APX registers.  Bound and APX
+   registers are mutually exclusive.  */
+# define STATE_SAVE_MASK		\
+  ((1 << X86_XSTATE_SSE_ID)		\
+   | (1 << X86_XSTATE_AVX_ID)		\
+   | (1 << X86_XSTATE_BNDREGS_ID)	\
+   | (1 << X86_XSTATE_K_ID)		\
+   | (1 << X86_XSTATE_ZMM_H_ID)		\
+   | (1 << X86_XSTATE_ZMM_ID)		\
+   | (1 << X86_XSTATE_TILECFG_ID)	\
+   | (1 << X86_XSTATE_TILEDATA_ID)	\
+   | (1 << X86_XSTATE_APX_F_ID))
+#else
+/* Offset for fxsave/xsave area used by _dl_tlsdesc_dynamic.  Since i386
+   doesn't have red-zone, use 0 here.  */
+# define STATE_SAVE_OFFSET 0
+
+/* Save SSE, AVX, AVX512, mask and bound registers.  */
+# define STATE_SAVE_MASK		\
+  ((1 << X86_XSTATE_SSE_ID)		\
+   | (1 << X86_XSTATE_AVX_ID)		\
+   | (1 << X86_XSTATE_BNDREGS_ID)	\
+   | (1 << X86_XSTATE_K_ID)		\
+   | (1 << X86_XSTATE_ZMM_H_ID))
+#endif
 
 /* Constants for bits in __x86_string_control:  */
 