Message ID | 1526571941-9816-1-git-send-email-Dave.Martin@arm.com |
---|---|
State | New |
Series | [v3] arm64: signal: Report signal frame size to userspace via auxv |
On Thu, May 17, 2018 at 04:45:41PM +0100, Dave Martin wrote:
> Stateful CPU architecture extensions may require the signal frame
> to grow to a size that exceeds the arch's MINSIGSTKSZ #define.
> However, changing this #define is an ABI break.
>
> To allow userspace the option of determining the signal frame size
> in a more forwards-compatible way, this patch adds a new auxv entry
> tagged with AT_MINSIGSTKSZ, which provides the maximum signal frame
> size that the process can observe during its lifetime.
>
> If AT_MINSIGSTKSZ is absent from the aux vector, the caller can
> assume that the MINSIGSTKSZ #define is sufficient. This allows for
> a consistent interface with older kernels that do not provide
> AT_MINSIGSTKSZ.
>
> The idea is that libc could expose this via sysconf() or some
> similar mechanism.
>
> There is deliberately no AT_SIGSTKSZ. The kernel knows nothing
> about userspace's own stack overheads and should not pretend to
> know.

I'm really not sure I follow your logic here.

POSIX requirements are here:

http://pubs.opengroup.org/onlinepubs/000095399/functions/sigaltstack.html

and the requirement there is that the MINSIGSTKSZ constant is defined
in signal.h to indicate to user programs the minimum signal stack size
that the system requires.

I don't see how passing the minimum signal stack size via AT_MINSIGSTKSZ
helps in any way, since you propose to make programs use a sysconf()
call to get that, and that is not covered by POSIX. So you're asking
programs to do something special for ARM64.

Simply increasing MINSIGSTKSZ doesn't cause an ABI break - new programs
built against an increased MINSIGSTKSZ result in more stack being
allocated, which doesn't break the ABI in any way. The problem comes
when old programs built with the old MINSIGSTKSZ are run against a
kernel requiring a larger MINSIGSTKSZ. It's almost the reverse problem -
the kernel needs to know the MINSIGSTKSZ value that the program was
built with, but we don't have that facility either.
> For arm64:
>
> The primary motivation for this interface is the Scalable Vector
> Extension, which can require at least 4KB or so of extra space
> in the signal frame for the largest hardware implementations.

Presumably you only include the SVE state if the application makes use
of SVE? Otherwise, you'd be saving and restoring a lot of state for
features that are not being used.

I suppose part of the issue is that SVE is supported but MINSIGSTKSZ
is incorrect if this state has to be saved and restored, so there are
apps out there using SVE with the too-small MINSIGSTKSZ value?
On Thu, May 17, 2018 at 05:25:32PM +0100, Russell King - ARM Linux wrote:
> On Thu, May 17, 2018 at 04:45:41PM +0100, Dave Martin wrote:
> > Stateful CPU architecture extensions may require the signal frame
> > to grow to a size that exceeds the arch's MINSIGSTKSZ #define.
> > However, changing this #define is an ABI break.
> >
> > To allow userspace the option of determining the signal frame size
> > in a more forwards-compatible way, this patch adds a new auxv entry
> > tagged with AT_MINSIGSTKSZ, which provides the maximum signal frame
> > size that the process can observe during its lifetime.
> >
> > If AT_MINSIGSTKSZ is absent from the aux vector, the caller can
> > assume that the MINSIGSTKSZ #define is sufficient. This allows for
> > a consistent interface with older kernels that do not provide
> > AT_MINSIGSTKSZ.
> >
> > The idea is that libc could expose this via sysconf() or some
> > similar mechanism.
> >
> > There is deliberately no AT_SIGSTKSZ. The kernel knows nothing
> > about userspace's own stack overheads and should not pretend to
> > know.
>
> I'm really not sure I follow your logic here.
>
> POSIX requirements are here:
>
> http://pubs.opengroup.org/onlinepubs/000095399/functions/sigaltstack.html
>
> and the requirement there is that the MINSIGSTKSZ constant is defined
> in signal.h to indicate to user programs the minimum signal stack size
> that the system requires.

At the birth of an arch, someone has to make a prescient guess about
how big the signal frame will ever grow, or risk ABI breaks or new
personalities that would require the userspace world to be rebuilt.
POSIX doesn't envisage that an arch's user register state can possibly
grow (or at least, not that much). Unfortunately, predicting the
future isn't that easy.

MINSIGSTKSZ has been wrong in the past, too. arm64's linux MINSIGSTKSZ
was 4K for quite a while even though the arm64 signal frame is always
bigger than that.
This bug was hidden by a different definition (5K) in glibc that was
subsequently backported into the kernel headers. But userspace doesn't
use that definition, so this tells us little about how much would
break out there if the definition is changed.

IIUC, x86's MINSIGSTKSZ (2K) isn't big enough for AVX-512 (possibly
not big enough even without AVX-512, though I haven't figured it out).
According to Michael Ellerman, powerpc may have a similar issue at
some point.

> I don't see how passing the minimum signal stack size via AT_MINSIGSTKSZ
> helps in any way, since you propose to make programs use a sysconf()
> call to get that, and that is not covered by POSIX. So you're asking
> programs to do something special for ARM64.

My idea is indeed to recommend that this gets hidden behind sysconf(),
so that programs can get a sensible value from there without needing
to know which architecture they are running on. I have a glibc patch
that I intend to post for discussion soon.

This would mean something like

    #include <signal.h>
    #include <unistd.h>

    long size;

    #ifdef _SC_MINSIGSTKSZ
    size = sysconf(_SC_MINSIGSTKSZ);
    #else
    size = MINSIGSTKSZ;
    #endif

Programs would of course have to migrate to this over time. I'm not
saying it's a magic bullet.

> Simply increasing MINSIGSTKSZ doesn't cause an ABI break - new programs
> built against an increased MINSIGSTKSZ result in more stack being
> allocated, which doesn't break the ABI in any way. The problem comes

Maybe not per se, but if userspace exchanges pointers to stacks across
object boundaries and assumes that they are MINSIGSTKSZ in size (say),
then a program may disagree with a library about what this size is.

And it's hard to guarantee that there is no software abusing
MINSIGSTKSZ or using it for dubious purposes such as sizing objects
that are not bare stacks.

Consider:

    struct thread {
            /* ... */
            char stack[MINSIGSTKSZ];
            /* ... */
    };

    /* lib.so */
    void dup_thread(struct thread *dest, struct thread const *src)
    {
            *dest = *src;
    }

    /* application */
            /* ... */
            struct thread t1, t2;

            dup_thread(&t1, &t2);

I'm not saying whether this kind of thing is a good idea, but POSIX
does nothing to forbid it.

If lib.so was built more recently and uses the new, larger MINSIGSTKSZ,
while the application uses the old, smaller value, then the call to
dup_thread will trigger a buffer overflow.

Changing MINSIGSTKSZ also papers over the problem of ucontext_t perhaps
not covering the whole signal frame. If ucontexts are exchanged across
object boundaries that use different definitions of the type, then
buffer overruns could easily happen. If ucontext_t is not redefined,
part of the context will fall outside it.

I plan to propose some ucontext API extensions for glibc to help
mitigate this, but again, software would need to be ported to use them.

> when old programs built with the old MINSIGSTKSZ are run against a
> kernel requiring a larger MINSIGSTKSZ. It's almost the reverse problem -
> the kernel needs to know the MINSIGSTKSZ value that the program was
> built with, but we don't have that facility either.
>
> > For arm64:
> >
> > The primary motivation for this interface is the Scalable Vector
> > Extension, which can require at least 4KB or so of extra space
> > in the signal frame for the largest hardware implementations.
>
> Presumably you only include the SVE state if the application makes use
> of SVE? Otherwise, you'd be saving and restoring a lot of state for
> features that are not being used.

Yes. A program has to actually execute an SVE instruction in order for
the full SVE register values to be context switched or included in the
signal frame.

> I suppose part of the issue is that SVE is supported but MINSIGSTKSZ
> is incorrect if this state has to be saved and restored, so there are
> apps out there using SVE with the too-small MINSIGSTKSZ value?
There's no hardware yet, so there should be no programs in the wild.

If we could simply change MINSIGSTKSZ, that would be great. But
redefining ucontext_t is more of a problem, and the two are rather
interrelated.

My current approach is to hide this from software by default, by
limiting the SVE vector length to a value small enough that the SVE
state fits in the original (5K-ish) arm64 signal frame. Only if the
distro/admin decides that it is safe to bump up this default, or if an
application explicitly asks for a larger size via a prctl(), is this
limit increased. This is not ideal, but there didn't seem to be any
ideal solution.

In practice, MINSIGSTKSZ is hard to use correctly, and most programs
use SIGSTKSZ instead. As luck would have it, arm64's SIGSTKSZ is big
enough to cover the largest possible SVE signal frame.

The first round of SVE implementations is unlikely to exceed a vector
size of 512 bits, which again hides the problem.

All this buys some time for arm64 at least. This patch is more about
trying to find a better approach for the future.

If there's a better option available, I'd love to hear about it!

Cheers
---Dave
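To put rough numbers on the vector-length trade-off discussed in this message, the size of the SVE record can be worked out by hand. The sketch below restates what we understand SVE_SIG_CONTEXT_SIZE(vq) in the arm64 uapi <asm/sigcontext.h> to compute: a 16-byte record header, then 32 Z registers of vq * 16 bytes and 16 P registers plus FFR of vq * 2 bytes each. The macro and function names are hypothetical local stand-ins; the real uapi macros remain authoritative.

```c
#include <stddef.h>

/*
 * Local restatement of the arm64 SVE signal record layout, as we
 * understand the uapi macros. All names here are hypothetical.
 */
#define VQ_BYTES      16u  /* one quadword = 128 bits */
#define NUM_ZREGS     32u  /* Z0-Z31: vq * 16 bytes each */
#define NUM_PREGS     16u  /* P0-P15: vq * 2 bytes each */
#define SVE_HDR_BYTES 16u  /* struct sve_context header, 16-byte aligned */

/* Vector length in bytes -> number of 128-bit quadwords (vq). */
static unsigned int vq_from_vl(unsigned int vl_bytes)
{
	return vl_bytes / VQ_BYTES;
}

/* Bytes needed for the SVE record at a given vq; vq == 0 is header only. */
static size_t sve_record_bytes(unsigned int vq)
{
	size_t zregs = (size_t)NUM_ZREGS * vq * VQ_BYTES;
	size_t pregs = (size_t)NUM_PREGS * vq * (VQ_BYTES / 8);
	size_t ffr   = (size_t)vq * (VQ_BYTES / 8); /* FFR is sized like a P reg */

	return SVE_HDR_BYTES + zregs + pregs + ffr;
}
```

Under these assumptions, a 512-bit vector length (64 bytes, vq = 4) needs a 2200-byte record, while the 2048-bit architectural maximum (vq = 16) needs 8752 bytes, which illustrates why capping the default vector length keeps the frame close to its original size.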
Hi Dave,

On Thu, May 17, 2018 at 04:45:41PM +0100, Dave Martin wrote:
> Stateful CPU architecture extensions may require the signal frame
> to grow to a size that exceeds the arch's MINSIGSTKSZ #define.
> However, changing this #define is an ABI break.

[...]

> For arm64 SVE:
>
> The SVE context block in the signal frame needs to be considered
> too when computing the maximum possible signal frame size.
>
> Because the size of this block depends on the vector length, this
> patch computes the size based not on the thread's current vector
> length but instead on the maximum possible vector length: this
> determines the maximum size of SVE context block that can be
> observed in any signal frame for the lifetime of the process.
>
> Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Cc: Alex Bennée <alex.bennee@linaro.org>
>
> ---
>
> Changes since v2:
>
>  * Redefine AT_MINSIGSTKSZ as 51 to avoid clash with values defined by
>    other architectures.
>
>    This turns out to be a problem for glibc; also random userspace
>    software does not necessarily check the architecture before using
>    getauxval() or otherwise parsing the aux vector, which can make
>    aliased tags problematic.
>
>    Really, the headers need cleaning up tree-wide in such a way that the
>    AT_* definitions do not appear to be arch-private. To be addressed
>    separately.
> ---
>  arch/arm64/include/asm/elf.h         | 11 ++++++++
>  arch/arm64/include/asm/processor.h   |  5 ++++
>  arch/arm64/include/uapi/asm/auxvec.h |  3 ++-
>  arch/arm64/kernel/cpufeature.c       |  1 +
>  arch/arm64/kernel/signal.c           | 51 +++++++++++++++++++++++++++++++-----
>  5 files changed, 63 insertions(+), 8 deletions(-)
>
> diff --git a/arch/arm64/include/asm/elf.h b/arch/arm64/include/asm/elf.h
> index fac1c4d..dc32adb 100644
> --- a/arch/arm64/include/asm/elf.h
> +++ b/arch/arm64/include/asm/elf.h
> @@ -24,6 +24,11 @@
>  #include <asm/ptrace.h>
>  #include <asm/user.h>
>
> +#ifndef __ASSEMBLY__
> +#include <linux/bug.h>
> +#include <asm/processor.h> /* for signal_minsigstksz, used by ARCH_DLINFO */
> +#endif

Maybe move these inside the pre-existing #ifndef __ASSEMBLY__ block.

>  /*
>   * AArch64 static relocation types.
>   */
> @@ -146,8 +151,14 @@ typedef struct user_fpsimd_state elf_fpregset_t;
>  /* update AT_VECTOR_SIZE_ARCH if the number of NEW_AUX_ENT entries changes */
>  #define ARCH_DLINFO \
>  do { \
> +	int minsigstksz = signal_minsigstksz; \
> +	\
> +	if (WARN_ON(minsigstksz <= 0)) \
> +		minsigstksz = MINSIGSTKSZ; \
> +	\

How can this happen?

>  	NEW_AUX_ENT(AT_SYSINFO_EHDR, \
>  		    (elf_addr_t)current->mm->context.vdso); \
> +	NEW_AUX_ENT(AT_MINSIGSTKSZ, minsigstksz); \
>  } while (0)
>
>  #define ARCH_HAS_SETUP_ADDITIONAL_PAGES
> diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
> index 7675989..6f60e92 100644
> --- a/arch/arm64/include/asm/processor.h
> +++ b/arch/arm64/include/asm/processor.h
> @@ -35,6 +35,8 @@
>  #ifdef __KERNEL__
>
>  #include <linux/build_bug.h>
> +#include <linux/cache.h>
> +#include <linux/init.h>
>  #include <linux/stddef.h>
>  #include <linux/string.h>
>
> @@ -244,6 +246,9 @@ void cpu_enable_pan(const struct arm64_cpu_capabilities *__unused);
>  void cpu_enable_cache_maint_trap(const struct arm64_cpu_capabilities *__unused);
>  void cpu_clear_disr(const struct arm64_cpu_capabilities *__unused);
>
> +extern int __ro_after_init signal_minsigstksz; /* user signal frame size */

Probably better as unsigned long, to be consistent with the size field
of the sigframe user layout structure.

> +extern void __init minsigstksz_setup(void);
> +
>  /* Userspace interface for PR_SVE_{SET,GET}_VL prctl()s: */
>  #define SVE_SET_VL(arg)	sve_set_current_vl(arg)
>  #define SVE_GET_VL()	sve_get_current_vl()
> diff --git a/arch/arm64/include/uapi/asm/auxvec.h b/arch/arm64/include/uapi/asm/auxvec.h
> index ec0a86d..743c0b8 100644
> --- a/arch/arm64/include/uapi/asm/auxvec.h
> +++ b/arch/arm64/include/uapi/asm/auxvec.h
> @@ -19,7 +19,8 @@
>
>  /* vDSO location */
>  #define AT_SYSINFO_EHDR	33
> +#define AT_MINSIGSTKSZ	51	/* stack needed for signal delivery */
>
> -#define AT_VECTOR_SIZE_ARCH 1 /* entries in ARCH_DLINFO */
> +#define AT_VECTOR_SIZE_ARCH 2 /* entries in ARCH_DLINFO */
>
>  #endif
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index 9d1b06d..0e0b53d 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -1619,6 +1619,7 @@ void __init setup_cpu_features(void)
>  		pr_info("emulated: Privileged Access Never (PAN) using TTBR0_EL1 switching\n");
>
>  	sve_setup();
> +	minsigstksz_setup();
>
>  	/* Advertise that we have computed the system capabilities */
>  	set_sys_caps_initialised();
> diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
> index 154b7d3..ae8d4ea 100644
> --- a/arch/arm64/kernel/signal.c
> +++ b/arch/arm64/kernel/signal.c
> @@ -17,6 +17,7 @@
>   * along with this program. If not, see <http://www.gnu.org/licenses/>.
>   */
>
> +#include <linux/cache.h>
>  #include <linux/compat.h>
>  #include <linux/errno.h>
>  #include <linux/kernel.h>
> @@ -570,8 +571,15 @@ asmlinkage long sys_rt_sigreturn(struct pt_regs *regs)
>  	return 0;
>  }
>
> -/* Determine the layout of optional records in the signal frame */
> -static int setup_sigframe_layout(struct rt_sigframe_user_layout *user)
> +/*
> + * Determine the layout of optional records in the signal frame
> + *
> + * add_all: if true, lays out the biggest possible signal frame for
> + *	this task; otherwise, generates a layout for the current state
> + *	of the task.
> + */
> +static int setup_sigframe_layout(struct rt_sigframe_user_layout *user,
> +				 bool add_all)
>  {
>  	int err;
>
> @@ -581,7 +589,7 @@ static int setup_sigframe_layout(struct rt_sigframe_user_layout *user)
>  		return err;
>
>  	/* fault information, if valid */
> -	if (current->thread.fault_code) {
> +	if (add_all || current->thread.fault_code) {
>  		err = sigframe_alloc(user, &user->esr_offset,
>  				     sizeof(struct esr_context));
>  		if (err)
> @@ -591,8 +599,18 @@ static int setup_sigframe_layout(struct rt_sigframe_user_layout *user)
>  	if (system_supports_sve()) {
>  		unsigned int vq = 0;
>
> -		if (test_thread_flag(TIF_SVE))
> -			vq = sve_vq_from_vl(current->thread.sve_vl);
> +		if (add_all || test_thread_flag(TIF_SVE)) {
> +			int vl = sve_max_vl;
> +
> +			if (!add_all)
> +				vl = current->thread.sve_vl;
> +
> +			/* Fail safe if something wasn't initialised */
> +			if (WARN_ON(!sve_vl_valid(vl)))
> +				vl = SVE_VL_MIN;

How can this happen?

> +
> +			vq = sve_vq_from_vl(vl);
> +		}
>
>  		err = sigframe_alloc(user, &user->sve_offset,
>  				     SVE_SIG_CONTEXT_SIZE(vq));
> @@ -603,7 +621,6 @@ static int setup_sigframe_layout(struct rt_sigframe_user_layout *user)
>  	return sigframe_alloc_end(user);
>  }
>
> -
>  static int setup_sigframe(struct rt_sigframe_user_layout *user,
>  			  struct pt_regs *regs, sigset_t *set)
>  {
> @@ -701,7 +718,7 @@ static int get_sigframe(struct rt_sigframe_user_layout *user,
>  	int err;
>
>  	init_user_layout(user);
> -	err = setup_sigframe_layout(user);
> +	err = setup_sigframe_layout(user, false);
>  	if (err)
>  		return err;
>
> @@ -936,3 +953,23 @@ asmlinkage void do_notify_resume(struct pt_regs *regs,
>  		thread_flags = READ_ONCE(current_thread_info()->flags);
>  	} while (thread_flags & _TIF_WORK_MASK);
>  }
> +
> +int __ro_after_init signal_minsigstksz;
> +
> +/*
> + * Determine the stack space required for guaranteed signal delivery.
> + * This function is used to populate AT_MINSIGSTKSZ at process startup.
> + */
> +void __init minsigstksz_setup(void)
> +{
> +	struct rt_sigframe_user_layout user;
> +
> +	init_user_layout(&user);
> +
> +	if (WARN_ON(setup_sigframe_layout(&user, true)))
> +		signal_minsigstksz = SIGSTKSZ;

Why not just omit the aux record in this case? Something has gone badly
wrong, so it's unlikely we're going to get much further anyway.

Will
On Tue, May 22, 2018 at 06:19:16PM +0100, Will Deacon wrote:
> Hi Dave,
>
> On Thu, May 17, 2018 at 04:45:41PM +0100, Dave Martin wrote:
> > Stateful CPU architecture extensions may require the signal frame
> > to grow to a size that exceeds the arch's MINSIGSTKSZ #define.
> > However, changing this #define is an ABI break.
>
> [...]
>
> > For arm64 SVE:
> >
> > The SVE context block in the signal frame needs to be considered
> > too when computing the maximum possible signal frame size.
> >
> > Because the size of this block depends on the vector length, this
> > patch computes the size based not on the thread's current vector
> > length but instead on the maximum possible vector length: this
> > determines the maximum size of SVE context block that can be
> > observed in any signal frame for the lifetime of the process.
> >
> > Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > Cc: Will Deacon <will.deacon@arm.com>
> > Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> > Cc: Alex Bennée <alex.bennee@linaro.org>
> >
> > ---
> >
> > Changes since v2:
> >
> >  * Redefine AT_MINSIGSTKSZ as 51 to avoid clash with values defined by
> >    other architectures.
> >
> >    This turns out to be a problem for glibc; also random userspace
> >    software does not necessarily check the architecture before using
> >    getauxval() or otherwise parsing the aux vector, which can make
> >    aliased tags problematic.
> >
> >    Really, the headers need cleaning up tree-wide in such a way that the
> >    AT_* definitions do not appear to be arch-private. To be addressed
> >    separately.
> > ---
> >  arch/arm64/include/asm/elf.h         | 11 ++++++++
> >  arch/arm64/include/asm/processor.h   |  5 ++++
> >  arch/arm64/include/uapi/asm/auxvec.h |  3 ++-
> >  arch/arm64/kernel/cpufeature.c       |  1 +
> >  arch/arm64/kernel/signal.c           | 51 +++++++++++++++++++++++++++++++-----
> >  5 files changed, 63 insertions(+), 8 deletions(-)
> >
> > diff --git a/arch/arm64/include/asm/elf.h b/arch/arm64/include/asm/elf.h
> > index fac1c4d..dc32adb 100644
> > --- a/arch/arm64/include/asm/elf.h
> > +++ b/arch/arm64/include/asm/elf.h
> > @@ -24,6 +24,11 @@
> >  #include <asm/ptrace.h>
> >  #include <asm/user.h>
> >
> > +#ifndef __ASSEMBLY__
> > +#include <linux/bug.h>
> > +#include <asm/processor.h> /* for signal_minsigstksz, used by ARCH_DLINFO */
> > +#endif
>
> Maybe move these inside the pre-existing #ifndef __ASSEMBLY__ block.

Mark made the same point. Can do.

> >  /*
> >   * AArch64 static relocation types.
> >   */
> > @@ -146,8 +151,14 @@ typedef struct user_fpsimd_state elf_fpregset_t;
> >  /* update AT_VECTOR_SIZE_ARCH if the number of NEW_AUX_ENT entries changes */
> >  #define ARCH_DLINFO \
> >  do { \
> > +	int minsigstksz = signal_minsigstksz; \
> > +	\
> > +	if (WARN_ON(minsigstksz <= 0)) \
> > +		minsigstksz = MINSIGSTKSZ; \
> > +	\
>
> How can this happen?

It can't.

minsigstksz == 0 means that it was not initialised yet. This is a
sanity-check for something that is currently guaranteed by the way the
code is structured. See the related comment on minsigstksz_setup().

Perhaps this should be a WARN_ON_ONCE(), with omission of the record.

Looking at it again, I don't think we need a WARN both here and in
minsigstksz_setup(). If minsigstksz_setup() goes wrong, we could leave
signal_minsigstksz as 0 and then we'd hit the WARN here anyway.

>
> >  	NEW_AUX_ENT(AT_SYSINFO_EHDR, \
> >  		    (elf_addr_t)current->mm->context.vdso); \
> > +	NEW_AUX_ENT(AT_MINSIGSTKSZ, minsigstksz); \
> >  } while (0)
> >
> >  #define ARCH_HAS_SETUP_ADDITIONAL_PAGES
> > diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
> > index 7675989..6f60e92 100644
> > --- a/arch/arm64/include/asm/processor.h
> > +++ b/arch/arm64/include/asm/processor.h
> > @@ -35,6 +35,8 @@
> >  #ifdef __KERNEL__
> >
> >  #include <linux/build_bug.h>
> > +#include <linux/cache.h>
> > +#include <linux/init.h>
> >  #include <linux/stddef.h>
> >  #include <linux/string.h>
> >
> > @@ -244,6 +246,9 @@ void cpu_enable_pan(const struct arm64_cpu_capabilities *__unused);
> >  void cpu_enable_cache_maint_trap(const struct arm64_cpu_capabilities *__unused);
> >  void cpu_clear_disr(const struct arm64_cpu_capabilities *__unused);
> >
> > +extern int __ro_after_init signal_minsigstksz; /* user signal frame size */
>
> Probably better as unsigned long, to be consistent with the size field
> of the sigframe user layout structure.

Yes, that probably makes sense.

I think the "int" dates back to when I had a prctl() call for returning
this to userspace, since int is the libc return type for prctl.

auxv entries are effectively unsigned long / uint64_t / whatever you
like to call it, so unsigned long would make sense here now.

>
> > +extern void __init minsigstksz_setup(void);
> > +
> >  /* Userspace interface for PR_SVE_{SET,GET}_VL prctl()s: */
> >  #define SVE_SET_VL(arg)	sve_set_current_vl(arg)
> >  #define SVE_GET_VL()	sve_get_current_vl()
> > diff --git a/arch/arm64/include/uapi/asm/auxvec.h b/arch/arm64/include/uapi/asm/auxvec.h
> > index ec0a86d..743c0b8 100644
> > --- a/arch/arm64/include/uapi/asm/auxvec.h
> > +++ b/arch/arm64/include/uapi/asm/auxvec.h
> > @@ -19,7 +19,8 @@
> >
> >  /* vDSO location */
> >  #define AT_SYSINFO_EHDR	33
> > +#define AT_MINSIGSTKSZ	51	/* stack needed for signal delivery */
> >
> > -#define AT_VECTOR_SIZE_ARCH 1 /* entries in ARCH_DLINFO */
> > +#define AT_VECTOR_SIZE_ARCH 2 /* entries in ARCH_DLINFO */
> >
> >  #endif
> > diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> > index 9d1b06d..0e0b53d 100644
> > --- a/arch/arm64/kernel/cpufeature.c
> > +++ b/arch/arm64/kernel/cpufeature.c
> > @@ -1619,6 +1619,7 @@ void __init setup_cpu_features(void)
> >  		pr_info("emulated: Privileged Access Never (PAN) using TTBR0_EL1 switching\n");
> >
> >  	sve_setup();
> > +	minsigstksz_setup();
> >
> >  	/* Advertise that we have computed the system capabilities */
> >  	set_sys_caps_initialised();
> > diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
> > index 154b7d3..ae8d4ea 100644
> > --- a/arch/arm64/kernel/signal.c
> > +++ b/arch/arm64/kernel/signal.c
> > @@ -17,6 +17,7 @@
> >   * along with this program. If not, see <http://www.gnu.org/licenses/>.
> >   */
> >
> > +#include <linux/cache.h>
> >  #include <linux/compat.h>
> >  #include <linux/errno.h>
> >  #include <linux/kernel.h>
> > @@ -570,8 +571,15 @@ asmlinkage long sys_rt_sigreturn(struct pt_regs *regs)
> >  	return 0;
> >  }
> >
> > -/* Determine the layout of optional records in the signal frame */
> > -static int setup_sigframe_layout(struct rt_sigframe_user_layout *user)
> > +/*
> > + * Determine the layout of optional records in the signal frame
> > + *
> > + * add_all: if true, lays out the biggest possible signal frame for
> > + *	this task; otherwise, generates a layout for the current state
> > + *	of the task.
> > + */
> > +static int setup_sigframe_layout(struct rt_sigframe_user_layout *user,
> > +				 bool add_all)
> >  {
> >  	int err;
> >
> > @@ -581,7 +589,7 @@ static int setup_sigframe_layout(struct rt_sigframe_user_layout *user)
> >  		return err;
> >
> >  	/* fault information, if valid */
> > -	if (current->thread.fault_code) {
> > +	if (add_all || current->thread.fault_code) {
> >  		err = sigframe_alloc(user, &user->esr_offset,
> >  				     sizeof(struct esr_context));
> >  		if (err)
> > @@ -591,8 +599,18 @@ static int setup_sigframe_layout(struct rt_sigframe_user_layout *user)
> >  	if (system_supports_sve()) {
> >  		unsigned int vq = 0;
> >
> > -		if (test_thread_flag(TIF_SVE))
> > -			vq = sve_vq_from_vl(current->thread.sve_vl);
> > +		if (add_all || test_thread_flag(TIF_SVE)) {
> > +			int vl = sve_max_vl;
> > +
> > +			if (!add_all)
> > +				vl = current->thread.sve_vl;
> > +
> > +			/* Fail safe if something wasn't initialised */
> > +			if (WARN_ON(!sve_vl_valid(vl)))
> > +				vl = SVE_VL_MIN;
>
> How can this happen?

It can't. It is a sanity-check that things were set up in the right
order. To fall foul of this, the cpufeatures setup would have to not
yet be complete.

For now, this is impossible by construction, because this should only
be called for user tasks, or from the end of cpufeatures setup via
minsigstksz_setup().

This WARN_ON() is a defence against future refactoring breaking these
assumptions.

I can elaborate the comment if you like, but I think it's a good idea
to keep some kind of check.

>
> > +
> > +			vq = sve_vq_from_vl(vl);
> > +		}
> >
> >  		err = sigframe_alloc(user, &user->sve_offset,
> >  				     SVE_SIG_CONTEXT_SIZE(vq));
> > @@ -603,7 +621,6 @@ static int setup_sigframe_layout(struct rt_sigframe_user_layout *user)
> >  	return sigframe_alloc_end(user);
> >  }
> >
> > -
> >  static int setup_sigframe(struct rt_sigframe_user_layout *user,
> >  			  struct pt_regs *regs, sigset_t *set)
> >  {
> > @@ -701,7 +718,7 @@ static int get_sigframe(struct rt_sigframe_user_layout *user,
> >  	int err;
> >
> >  	init_user_layout(user);
> > -	err = setup_sigframe_layout(user);
> > +	err = setup_sigframe_layout(user, false);
> >  	if (err)
> >  		return err;
> >
> > @@ -936,3 +953,23 @@ asmlinkage void do_notify_resume(struct pt_regs *regs,
> >  		thread_flags = READ_ONCE(current_thread_info()->flags);
> >  	} while (thread_flags & _TIF_WORK_MASK);
> >  }
> > +
> > +int __ro_after_init signal_minsigstksz;
> > +
> > +/*
> > + * Determine the stack space required for guaranteed signal delivery.
> > + * This function is used to populate AT_MINSIGSTKSZ at process startup.
> > + */
> > +void __init minsigstksz_setup(void)
> > +{
> > +	struct rt_sigframe_user_layout user;
> > +
> > +	init_user_layout(&user);
> > +
> > +	if (WARN_ON(setup_sigframe_layout(&user, true)))
> > +		signal_minsigstksz = SIGSTKSZ;
>
> Why not just omit the aux record in this case? Something has gone badly
> wrong, so it's unlikely we're going to get much further anyway.

It wasn't clear to me whether arch auxv entries can be optional.

But it looks like binfmt_elf.c:create_elf_table() fills any unused tail
of the aux vector with AT_NULL, which should fix that.

Since my recommendation would be to assume MINSIGSTKSZ if the entry
isn't there, omitting makes sense if we don't have a better guess.

Cheers
---Dave
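Dave's recommendation (treat a missing AT_MINSIGSTKSZ entry as "MINSIGSTKSZ is enough"), together with the commit message's point that the kernel does not account for userspace's own stack usage, suggests a simple pattern for sizing an alternate signal stack. This is a minimal sketch assuming a libc that provides getauxval() (for example glibc 2.16 or later). The helper names, the 8 KiB APP_HANDLER_OVERHEAD budget and the clamp to MINSIGSTKSZ are illustrative choices, not part of the proposed interface.

```c
#define _DEFAULT_SOURCE /* for sigaltstack()/stack_t under strict -std modes */
#include <signal.h>
#include <stdlib.h>
#include <sys/auxv.h>

/* Value proposed by this patch; define it if the libc headers predate it. */
#ifndef AT_MINSIGSTKSZ
#define AT_MINSIGSTKSZ 51
#endif

/* Hypothetical budget for the application's own handler frames and locals. */
#define APP_HANDLER_OVERHEAD (8u * 1024u)

/*
 * Kernel-reported minimum, or the MINSIGSTKSZ #define when the entry is
 * absent (getauxval() returns 0 for a missing entry).
 */
static unsigned long min_signal_stack_size(void)
{
	const unsigned long fallback = MINSIGSTKSZ;
	unsigned long sz = getauxval(AT_MINSIGSTKSZ);

	if (sz == 0)
		return fallback;              /* older kernel: assume the #define */
	return sz > fallback ? sz : fallback; /* defensive clamp, not required */
}

/* Allocate and install an alternate signal stack; returns 0 on success. */
static int install_alt_stack(void)
{
	stack_t ss;

	ss.ss_size = min_signal_stack_size() + APP_HANDLER_OVERHEAD;
	ss.ss_sp = malloc(ss.ss_size);
	ss.ss_flags = 0;
	if (ss.ss_sp == NULL)
		return -1;
	if (sigaltstack(&ss, NULL) != 0) {
		free(ss.ss_sp);
		return -1;
	}
	return 0; /* the stack stays allocated while it is installed */
}
```

The overhead is added in userspace because, as the patch deliberately notes, there is no AT_SIGSTKSZ: only the application knows how much stack its handlers consume.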
diff --git a/arch/arm64/include/asm/elf.h b/arch/arm64/include/asm/elf.h index fac1c4d..dc32adb 100644 --- a/arch/arm64/include/asm/elf.h +++ b/arch/arm64/include/asm/elf.h @@ -24,6 +24,11 @@ #include <asm/ptrace.h> #include <asm/user.h> +#ifndef __ASSEMBLY__ +#include <linux/bug.h> +#include <asm/processor.h> /* for signal_minsigstksz, used by ARCH_DLINFO */ +#endif + /* * AArch64 static relocation types. */ @@ -146,8 +151,14 @@ typedef struct user_fpsimd_state elf_fpregset_t; /* update AT_VECTOR_SIZE_ARCH if the number of NEW_AUX_ENT entries changes */ #define ARCH_DLINFO \ do { \ + int minsigstksz = signal_minsigstksz; \ + \ + if (WARN_ON(minsigstksz <= 0)) \ + minsigstksz = MINSIGSTKSZ; \ + \ NEW_AUX_ENT(AT_SYSINFO_EHDR, \ (elf_addr_t)current->mm->context.vdso); \ + NEW_AUX_ENT(AT_MINSIGSTKSZ, minsigstksz); \ } while (0) #define ARCH_HAS_SETUP_ADDITIONAL_PAGES diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h index 7675989..6f60e92 100644 --- a/arch/arm64/include/asm/processor.h +++ b/arch/arm64/include/asm/processor.h @@ -35,6 +35,8 @@ #ifdef __KERNEL__ #include <linux/build_bug.h> +#include <linux/cache.h> +#include <linux/init.h> #include <linux/stddef.h> #include <linux/string.h> @@ -244,6 +246,9 @@ void cpu_enable_pan(const struct arm64_cpu_capabilities *__unused); void cpu_enable_cache_maint_trap(const struct arm64_cpu_capabilities *__unused); void cpu_clear_disr(const struct arm64_cpu_capabilities *__unused); +extern int __ro_after_init signal_minsigstksz; /* user signal frame size */ +extern void __init minsigstksz_setup(void); + /* Userspace interface for PR_SVE_{SET,GET}_VL prctl()s: */ #define SVE_SET_VL(arg) sve_set_current_vl(arg) #define SVE_GET_VL() sve_get_current_vl() diff --git a/arch/arm64/include/uapi/asm/auxvec.h b/arch/arm64/include/uapi/asm/auxvec.h index ec0a86d..743c0b8 100644 --- a/arch/arm64/include/uapi/asm/auxvec.h +++ b/arch/arm64/include/uapi/asm/auxvec.h @@ -19,7 +19,8 @@ /* vDSO location */ 
#define AT_SYSINFO_EHDR 33 +#define AT_MINSIGSTKSZ 51 /* stack needed for signal delivery */ -#define AT_VECTOR_SIZE_ARCH 1 /* entries in ARCH_DLINFO */ +#define AT_VECTOR_SIZE_ARCH 2 /* entries in ARCH_DLINFO */ #endif diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index 9d1b06d..0e0b53d 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -1619,6 +1619,7 @@ void __init setup_cpu_features(void) pr_info("emulated: Privileged Access Never (PAN) using TTBR0_EL1 switching\n"); sve_setup(); + minsigstksz_setup(); /* Advertise that we have computed the system capabilities */ set_sys_caps_initialised(); diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c index 154b7d3..ae8d4ea 100644 --- a/arch/arm64/kernel/signal.c +++ b/arch/arm64/kernel/signal.c @@ -17,6 +17,7 @@ * along with this program. If not, see <http://www.gnu.org/licenses/>. */ +#include <linux/cache.h> #include <linux/compat.h> #include <linux/errno.h> #include <linux/kernel.h> @@ -570,8 +571,15 @@ asmlinkage long sys_rt_sigreturn(struct pt_regs *regs) return 0; } -/* Determine the layout of optional records in the signal frame */ -static int setup_sigframe_layout(struct rt_sigframe_user_layout *user) +/* + * Determine the layout of optional records in the signal frame + * + * add_all: if true, lays out the biggest possible signal frame for + * this task; otherwise, generates a layout for the current state + * of the task. 
+ */ +static int setup_sigframe_layout(struct rt_sigframe_user_layout *user, + bool add_all) { int err; @@ -581,7 +589,7 @@ static int setup_sigframe_layout(struct rt_sigframe_user_layout *user) return err; /* fault information, if valid */ - if (current->thread.fault_code) { + if (add_all || current->thread.fault_code) { err = sigframe_alloc(user, &user->esr_offset, sizeof(struct esr_context)); if (err) @@ -591,8 +599,18 @@ static int setup_sigframe_layout(struct rt_sigframe_user_layout *user) if (system_supports_sve()) { unsigned int vq = 0; - if (test_thread_flag(TIF_SVE)) - vq = sve_vq_from_vl(current->thread.sve_vl); + if (add_all || test_thread_flag(TIF_SVE)) { + int vl = sve_max_vl; + + if (!add_all) + vl = current->thread.sve_vl; + + /* Fail safe if something wasn't initialised */ + if (WARN_ON(!sve_vl_valid(vl))) + vl = SVE_VL_MIN; + + vq = sve_vq_from_vl(vl); + } err = sigframe_alloc(user, &user->sve_offset, SVE_SIG_CONTEXT_SIZE(vq)); @@ -603,7 +621,6 @@ static int setup_sigframe_layout(struct rt_sigframe_user_layout *user) return sigframe_alloc_end(user); } - static int setup_sigframe(struct rt_sigframe_user_layout *user, struct pt_regs *regs, sigset_t *set) { @@ -701,7 +718,7 @@ static int get_sigframe(struct rt_sigframe_user_layout *user, int err; init_user_layout(user); - err = setup_sigframe_layout(user); + err = setup_sigframe_layout(user, false); if (err) return err; @@ -936,3 +953,23 @@ asmlinkage void do_notify_resume(struct pt_regs *regs, thread_flags = READ_ONCE(current_thread_info()->flags); } while (thread_flags & _TIF_WORK_MASK); } + +int __ro_after_init signal_minsigstksz; + +/* + * Determine the stack space required for guaranteed signal devliery. + * This function is used to populate AT_MINSIGSTKSZ at process startup. 
+ */
+void __init minsigstksz_setup(void)
+{
+	struct rt_sigframe_user_layout user;
+
+	init_user_layout(&user);
+
+	if (WARN_ON(setup_sigframe_layout(&user, true)))
+		signal_minsigstksz = SIGSTKSZ;
+	else
+		signal_minsigstksz = sigframe_size(&user) +
+			round_up(sizeof(struct frame_record), 16) +
+			16; /* max alignment padding */
+}
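The vector-length selection and the final size sum in the signal.c hunks above can be modelled in plain userspace C. This makes the arithmetic easy to check. It is a hypothetical sketch, not kernel code. The model_* names are invented here, and the model assumes the system supports SVE. Two layout facts are also assumptions, based on our reading of the arm64 headers rather than anything stated in this patch. First, the SVE record is a 16-byte header followed by 32 Z registers of vq × 16 bytes, then 16 P registers and FFR of vq × 2 bytes each. Second, struct frame_record is 16 bytes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed SVE constants: vector length in bytes, 16-byte quadwords. */
#define MODEL_SVE_VQ_BYTES 16u
#define MODEL_SVE_VL_MIN   16u
#define MODEL_SVE_VL_MAX   256u

/* Model of sve_vl_valid(): a multiple of 16 bytes within [16, 256]. */
static bool model_sve_vl_valid(unsigned int vl)
{
	return vl % MODEL_SVE_VQ_BYTES == 0 &&
	       vl >= MODEL_SVE_VL_MIN && vl <= MODEL_SVE_VL_MAX;
}

/* Model of sve_vq_from_vl(): number of 128-bit quadwords in a vector. */
static unsigned int model_vq_from_vl(unsigned int vl)
{
	return vl / MODEL_SVE_VQ_BYTES;
}

/*
 * add_all ("Christmas tree" mode) sizes the SVE record for the maximum
 * vector length. Otherwise only a task with live SVE state (tif_sve)
 * gets register storage, sized for its current vector length.
 */
static unsigned int model_select_vq(bool add_all, bool tif_sve,
				    unsigned int max_vl, unsigned int cur_vl)
{
	unsigned int vl;

	if (!add_all && !tif_sve)
		return 0;		/* header-only SVE record */

	vl = add_all ? max_vl : cur_vl;
	if (!model_sve_vl_valid(vl))	/* the kernel WARN()s here */
		vl = MODEL_SVE_VL_MIN;

	return model_vq_from_vl(vl);
}

/* Assumed SVE record size: 16-byte header + Z, P and FFR storage. */
static size_t model_sve_context_size(unsigned int vq)
{
	return 16 + (size_t)vq * (32 * 16 + 16 * 2 + 2);
}

/* Round x up to a multiple of the power-of-two 'align'. */
static size_t model_round_up(size_t x, size_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/*
 * Mirrors the final sum in minsigstksz_setup(). frame_size stands for
 * sigframe_size(&user); frame_record_size is assumed to be 16.
 */
static size_t model_minsigstksz(size_t frame_size, size_t frame_record_size)
{
	return frame_size + model_round_up(frame_record_size, 16) +
	       16;			/* max alignment padding */
}
```

Under these assumptions, a 2048-bit maximum vector length (vl = 256 bytes, vq = 16) gives an SVE record of 16 + 16 × 546 = 8752 bytes. A task running with vq = 1 needs only 562 bytes. Sizing for sve_max_vl rather than the current vector length is what lets a single AT_MINSIGSTKSZ value cover the whole lifetime of the process.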
Stateful CPU architecture extensions may require the signal frame to grow to a size that exceeds the arch's MINSIGSTKSZ #define. However, changing this #define is an ABI break.

To allow userspace the option of determining the signal frame size in a more forwards-compatible way, this patch adds a new auxv entry tagged with AT_MINSIGSTKSZ, which provides the maximum signal frame size that the process can observe during its lifetime.

If AT_MINSIGSTKSZ is absent from the aux vector, the caller can assume that the MINSIGSTKSZ #define is sufficient. This allows for a consistent interface with older kernels that do not provide AT_MINSIGSTKSZ.

The idea is that libc could expose this via sysconf() or some similar mechanism.

There is deliberately no AT_SIGSTKSZ. The kernel knows nothing about userspace's own stack overheads and should not pretend to know.

For arm64:

The primary motivation for this interface is the Scalable Vector Extension, which can require 4KB or more of extra space in the signal frame on the largest hardware implementations.

To determine the correct value, a "Christmas tree" mode (via the add_all argument) is added to setup_sigframe_layout(), to simulate adding every possible record to the signal frame at its maximum possible size.

If this procedure goes wrong somehow, resulting in a stupidly large frame layout and hence failure of sigframe_alloc() to allocate a record to the frame, then this is indicative of a kernel bug: the kernel's internal SIGFRAME_MAXSZ is supposed to sanity-check against generating frames that we consider _impossibly_ large. In this case, SIGSTKSZ is returned as a "reasonable guess that is at least bigger than MINSIGSTKSZ" and we WARN().

For arm64 SVE:

The SVE context block in the signal frame needs to be considered too when computing the maximum possible signal frame size.
Because the size of this block depends on the vector length, this patch computes the size based not on the thread's current vector length but instead on the maximum possible vector length: this determines the maximum size of SVE context block that can be observed in any signal frame for the lifetime of the process.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Alex Bennée <alex.bennee@linaro.org>
---

Changes since v2:

 * Redefine AT_MINSIGSTKSZ as 51 to avoid a clash with values defined by other architectures. This turns out to be a problem for glibc; also, random userspace software does not necessarily check the architecture before using getauxval() or otherwise parsing the aux vector, which can make aliased tags problematic. Really, the headers need cleaning up tree-wide in such a way that the AT_* definitions do not appear to be arch-private. To be addressed separately.

---
 arch/arm64/include/asm/elf.h         | 11 ++++++++
 arch/arm64/include/asm/processor.h   |  5 ++++
 arch/arm64/include/uapi/asm/auxvec.h |  3 ++-
 arch/arm64/kernel/cpufeature.c       |  1 +
 arch/arm64/kernel/signal.c           | 51 +++++++++++++++++++++++++++++++-----
 5 files changed, 63 insertions(+), 8 deletions(-)
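The commit message suggests that libc could expose AT_MINSIGSTKSZ via sysconf() or a similar mechanism, and that MINSIGSTKSZ remains sufficient when the tag is absent. The C below is a minimal sketch of a program reading the aux vector directly. It assumes a Linux C library that provides getauxval() in <sys/auxv.h>, which returns 0 for an absent tag. The helpers min_sigstack_size() and install_altstack() are hypothetical names. Taking the larger of the kernel and header values is a conservative choice of ours. Because the patch deliberately offers no AT_SIGSTKSZ, the caller supplies its own handler overhead.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1	/* expose sigaltstack() and stack_t */
#endif
#include <signal.h>
#include <stdlib.h>
#include <sys/auxv.h>

/* Tag value introduced by this patch; newer C libraries may define it. */
#ifndef AT_MINSIGSTKSZ
#define AT_MINSIGSTKSZ 51
#endif

/*
 * Hypothetical helper: the smallest alternate signal stack to use.
 * getauxval() returns 0 when the tag is absent (e.g. older kernels),
 * in which case MINSIGSTKSZ is assumed to be sufficient.
 */
static size_t min_sigstack_size(void)
{
	unsigned long from_kernel = getauxval(AT_MINSIGSTKSZ);
	unsigned long from_header = (unsigned long)MINSIGSTKSZ;

	return from_kernel > from_header ? from_kernel : from_header;
}

/*
 * Hypothetical helper: allocate and install an alternate signal stack.
 * 'user_overhead' is the caller's own estimate of what its handlers
 * need on top of the kernel's frame; the kernel does not report it.
 * Returns 0 on success, -1 on failure.
 */
static int install_altstack(size_t user_overhead)
{
	stack_t ss;

	ss.ss_size = min_sigstack_size() + user_overhead;
	ss.ss_flags = 0;
	ss.ss_sp = malloc(ss.ss_size);
	if (ss.ss_sp == NULL)
		return -1;

	if (sigaltstack(&ss, NULL) != 0) {
		free(ss.ss_sp);
		return -1;
	}
	return 0;
}
```

A program would call install_altstack() once at startup, before registering handlers with SA_ONSTACK. On kernels without this patch, the helper falls back to the compile-time MINSIGSTKSZ.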