Message ID | 5a769da2dcc8e7f9b89fbdbc4bccd0b8a1660309.1731290567.git.thehajime@gmail.com |
---|---|
State | RFC |
Headers | show |
Series | nommu UML | expand |
Hi, On Mon, 2024-11-11 at 15:27 +0900, Hajime Tazaki wrote: > This commit updates the behavior of signal handling under !MMU > environment. 1) the stack preparation for the signal handlers and > 2) restoration of stack after rt_sigreturn(2) syscall. Those are needed > as the stack usage on vfork(2) syscall is different. > > It also adds the follow up routine for SIGSEGV as a signal delivery runs > in the same stack frame while we have to avoid endless SIGSEGV. > > Signed-off-by: Hajime Tazaki <thehajime@gmail.com> > --- > arch/um/include/shared/kern_util.h | 3 +++ > arch/um/kernel/trap.c | 10 ++++++++ > arch/um/os-Linux/signal.c | 18 ++++++++++++++- > arch/x86/um/signal.c | 37 +++++++++++++++++++++++++++++- > 4 files changed, 66 insertions(+), 2 deletions(-) > > [SNIP] > diff --git a/arch/um/os-Linux/signal.c b/arch/um/os-Linux/signal.c > index 52852018a3ad..a06622415d8f 100644 > --- a/arch/um/os-Linux/signal.c > +++ b/arch/um/os-Linux/signal.c > @@ -36,7 +36,15 @@ static void sig_handler_common(int sig, struct siginfo *si, mcontext_t *mc) > struct uml_pt_regs r; > int save_errno = errno; > > - r.is_user = 0; > +#ifndef CONFIG_MMU > + memset(&r, 0, sizeof(r)); > + /* mark is_user=1 when the IP is from userspace code. */ > + if (mc && (REGS_IP(mc->gregs) > uml_reserved > + && REGS_IP(mc->gregs) < high_physmem)) > + r.is_user = 1; > + else > +#endif > + r.is_user = 0; Does this work if we load modules dynamically? I suppose one could map them into a separate memory area rather than running them directly from the physical memory. Otherwise we'll also get problem with the SECCOMP filter. > if (sig == SIGSEGV) { > /* For segfaults, we want the data from the sigcontext. */ > get_regs_from_mc(&r, mc); > @@ -191,6 +199,7 @@ static void hard_handler(int sig, siginfo_t *si, void *p) > ucontext_t *uc = p; > mcontext_t *mc = &uc->uc_mcontext; > unsigned long pending = 1UL << sig; > + int is_segv = 0; > > do { > int nested, bail; > @@ -214,6 +223,7 @@ static void hard_handler(int sig, siginfo_t *si, void *p) > > while ((sig = ffs(pending)) != 0){ > sig--; > + is_segv = (sig == SIGSEGV) ? 1 : 0; > pending &= ~(1 << sig); > (*handlers[sig])(sig, (struct siginfo *)si, mc); > } > @@ -227,6 +237,12 @@ static void hard_handler(int sig, siginfo_t *si, void *p) > if (!nested) > pending = from_irq_stack(nested); > } while (pending); > + > +#ifndef CONFIG_MMU > + /* if there is SIGSEGV notified, let the userspace run w/ __noreturn */ > + if (is_segv) > + sigsegv_post_routine(); > +#endif > } I am confused, this doesn't feel quite correct to me. So, for normal UML, I think we always do an rt_sigreturn. Which means, we always go back to the corresponding *kernel* task. To schedule in response to SIGALRM, we forward the signal to the userspace process. I believe that means: 1. We cannot schedule kernel threads (that seems like a bug) 2. Scheduling for userspace happens once the signal is delivered. Then userspace() saves the state and calls interrupt_end(). Now, keep in mind that we are on the separate signal stack here. If we jump anywhere directly, we abandon the old state information stored by the host kernel into the mcontext. We can absolutely do that, but we need to be careful to not forget anything. As such, I wonder whether nommu should: 1. When entering from kernel, update "current->thread.switch_buf" from the mcontext. - If we need to schedule, push a stack frame that calls the scheduling code and returns with the correct state. 2. When entering from user, store the task registers from the mcontext. At some point (here or earlier) ensure that the "current->thread.switch_buf" is set up so that we can return to userspace by restoring the task registers. - To schedule, piggy back on 1. or add special code. 3. Always do a UML_LONGJMP() back into the "current" task. That said, I am probably not having the full picture right now. Benjamin PS: On a further note, I think the current code to enter userspace cannot handle single stepping. I suppose that is fine, but you should probably set arch_has_single_step to 0 for nommu. > void set_handler(int sig) > diff --git a/arch/x86/um/signal.c b/arch/x86/um/signal.c > index 75087e85b6fd..b7365c75a967 100644 > --- a/arch/x86/um/signal.c > +++ b/arch/x86/um/signal.c > @@ -371,6 +371,13 @@ int setup_signal_stack_si(unsigned long stack_top, struct ksignal *ksig, > round_down(stack_top - sizeof(struct rt_sigframe), 16); > > /* Add required space for math frame */ > +#ifndef CONFIG_MMU > + /* > + * the sig_frame on !MMU needs be aligned for SSE as > + * the frame is used as-is. > + */ > + math_size = round_down(math_size, 16); > +#endif > frame = (struct rt_sigframe __user *)((unsigned long)frame - math_size); > > /* Subtract 128 for a red zone and 8 for proper alignment */ > @@ -417,6 +424,18 @@ int setup_signal_stack_si(unsigned long stack_top, struct ksignal *ksig, > /* could use a vstub here */ > return err; > > +#ifndef CONFIG_MMU > + /* > + * we need to push handler address at top of stack, as > + * __kernel_vsyscall, called after this returns with ret with > + * stack contents, thus push the handler here. > + */ > + frame = (struct rt_sigframe __user *) ((unsigned long) frame - > + sizeof(unsigned long)); > + err |= __put_user((unsigned long)ksig->ka.sa.sa_handler, > + (unsigned long *)frame); > +#endif > + > if (err) > return err; > > @@ -442,9 +461,25 @@ SYSCALL_DEFINE0(rt_sigreturn) > unsigned long sp = PT_REGS_SP(¤t->thread.regs); > struct rt_sigframe __user *frame = > (struct rt_sigframe __user *)(sp - sizeof(long)); > - struct ucontext __user *uc = &frame->uc; > + struct ucontext __user *uc; > sigset_t set; > > +#ifndef CONFIG_MMU > + /** > + * we enter here with: > + * > + * __restore_rt: > + * mov $15, %rax > + * call *%rax (translated from syscall) > + * > + * (code is from musl libc) > + * so, stack needs to be popped of "call"ed address before > + * looking at rt_sigframe. > + */ > + frame = (struct rt_sigframe __user *)((unsigned long)frame + sizeof(long)); > +#endif > + uc = &frame->uc; > + > if (copy_from_user(&set, &uc->uc_sigmask, sizeof(set))) > goto segfault; >
Hello, On Thu, 28 Nov 2024 19:37:21 +0900, Benjamin Berg wrote: > > +#ifndef CONFIG_MMU > > + memset(&r, 0, sizeof(r)); > > + /* mark is_user=1 when the IP is from userspace code. */ > > + if (mc && (REGS_IP(mc->gregs) > uml_reserved > > + && REGS_IP(mc->gregs) < high_physmem)) > > + r.is_user = 1; > > + else > > +#endif > > + r.is_user = 0; > > Does this work if we load modules dynamically? > > I suppose one could map them into a separate memory area rather than > running them directly from the physical memory. > Otherwise we'll also get problem with the SECCOMP filter. currently, I thought modules use the separate area from execmem, but nommu allocator ignores this location info to map the memory; instead mixing up with area used by userspace programs. we may be able to come up with execmem_arch_setup() to fix this situation. so, no, this is_user detection doesn't work; modules also become is_user=1. MMU full allocator (normal UML and seccomp asl well ?) seems to be fine as long as using execmem. I will look into detail how we should handle. > > if (sig == SIGSEGV) { > > /* For segfaults, we want the data from the sigcontext. */ > > get_regs_from_mc(&r, mc); > > @@ -191,6 +199,7 @@ static void hard_handler(int sig, siginfo_t *si, void *p) > > ucontext_t *uc = p; > > mcontext_t *mc = &uc->uc_mcontext; > > unsigned long pending = 1UL << sig; > > + int is_segv = 0; > > > > do { > > int nested, bail; > > @@ -214,6 +223,7 @@ static void hard_handler(int sig, siginfo_t *si, void *p) > > > > while ((sig = ffs(pending)) != 0){ > > sig--; > > + is_segv = (sig == SIGSEGV) ? 1 : 0; > > pending &= ~(1 << sig); > > (*handlers[sig])(sig, (struct siginfo *)si, mc); > > } > > @@ -227,6 +237,12 @@ static void hard_handler(int sig, siginfo_t *si, void *p) > > if (!nested) > > pending = from_irq_stack(nested); > > } while (pending); > > + > > +#ifndef CONFIG_MMU > > + /* if there is SIGSEGV notified, let the userspace run w/ __noreturn */ > > + if (is_segv) > > + sigsegv_post_routine(); > > +#endif > > } > > I am confused, this doesn't feel quite correct to me. thanks for pointing this out. the above code, which I spot the working example under nommu, is indeed suspicious and doesn't look a right code. that signal handing (this patch) is immature, and need more work to understand existing code, nommu characteristic, etc. > So, for normal UML, I think we always do an rt_sigreturn. Which means, > we always go back to the corresponding *kernel* task. To schedule in > response to SIGALRM, we forward the signal to the userspace process. > I believe that means: > 1. We cannot schedule kernel threads (that seems like a bug) > 2. Scheduling for userspace happens once the signal is delivered. > Then userspace() saves the state and calls interrupt_end(). > > > Now, keep in mind that we are on the separate signal stack here. If we > jump anywhere directly, we abandon the old state information stored by > the host kernel into the mcontext. We can absolutely do that, but we > need to be careful to not forget anything. > > As such, I wonder whether nommu should: > 1. When entering from kernel, update "current->thread.switch_buf" > from the mcontext. > - If we need to schedule, push a stack frame that calls the scheduling > code and returns with the correct state. > 2. When entering from user, store the task registers from the > mcontext. At some point (here or earlier) ensure that the > "current->thread.switch_buf" is set up so that we can return to > userspace by restoring the task registers. > - To schedule, piggy back on 1. or add special code. > 3. Always do a UML_LONGJMP() back into the "current" task. thanks, the current code jumps in the signal handler and unblocking signals without returning the handler (and not calling rt_sigreturn at host either) upon SIGSEGV, which should not work as you mentioned. I will also investigate how I can handle. > That said, I am probably not having the full picture right now. > > Benjamin > > PS: On a further note, I think the current code to enter userspace > cannot handle single stepping. I suppose that is fine, but you should > probably set arch_has_single_step to 0 for nommu. I did almost zero tests with ptrace(2) (inside nommu UM) and might miss a lot of features that mmu-UM could. will also look into that. thanks, -- Hajime
diff --git a/arch/um/include/shared/kern_util.h b/arch/um/include/shared/kern_util.h index f21dc8517538..bcc8d28279ae 100644 --- a/arch/um/include/shared/kern_util.h +++ b/arch/um/include/shared/kern_util.h @@ -62,6 +62,9 @@ extern int singlestepping(void); extern void segv_handler(int sig, struct siginfo *unused_si, struct uml_pt_regs *regs); extern void winch(int sig, struct siginfo *unused_si, struct uml_pt_regs *regs); extern void fatal_sigsegv(void) __attribute__ ((noreturn)); +#ifndef CONFIG_MMU +extern void sigsegv_post_routine(void); +#endif void um_idle_sleep(void); diff --git a/arch/um/kernel/trap.c b/arch/um/kernel/trap.c index a7519b3de4bf..b9b54e777894 100644 --- a/arch/um/kernel/trap.c +++ b/arch/um/kernel/trap.c @@ -174,6 +174,16 @@ void fatal_sigsegv(void) os_dump_core(); } +#ifndef CONFIG_MMU +void sigsegv_post_routine(void) +{ + change_sig(SIGIO, 1); + change_sig(SIGALRM, 1); + change_sig(SIGWINCH, 1); + userspace(¤t->thread.regs.regs); +} +#endif + /** * segv_handler() - the SIGSEGV handler * @sig: the signal number diff --git a/arch/um/os-Linux/signal.c b/arch/um/os-Linux/signal.c index 52852018a3ad..a06622415d8f 100644 --- a/arch/um/os-Linux/signal.c +++ b/arch/um/os-Linux/signal.c @@ -36,7 +36,15 @@ static void sig_handler_common(int sig, struct siginfo *si, mcontext_t *mc) struct uml_pt_regs r; int save_errno = errno; - r.is_user = 0; +#ifndef CONFIG_MMU + memset(&r, 0, sizeof(r)); + /* mark is_user=1 when the IP is from userspace code. */ + if (mc && (REGS_IP(mc->gregs) > uml_reserved + && REGS_IP(mc->gregs) < high_physmem)) + r.is_user = 1; + else +#endif + r.is_user = 0; if (sig == SIGSEGV) { /* For segfaults, we want the data from the sigcontext. */ get_regs_from_mc(&r, mc); @@ -191,6 +199,7 @@ static void hard_handler(int sig, siginfo_t *si, void *p) ucontext_t *uc = p; mcontext_t *mc = &uc->uc_mcontext; unsigned long pending = 1UL << sig; + int is_segv = 0; do { int nested, bail; @@ -214,6 +223,7 @@ static void hard_handler(int sig, siginfo_t *si, void *p) while ((sig = ffs(pending)) != 0){ sig--; + is_segv = (sig == SIGSEGV) ? 1 : 0; pending &= ~(1 << sig); (*handlers[sig])(sig, (struct siginfo *)si, mc); } @@ -227,6 +237,12 @@ static void hard_handler(int sig, siginfo_t *si, void *p) if (!nested) pending = from_irq_stack(nested); } while (pending); + +#ifndef CONFIG_MMU + /* if there is SIGSEGV notified, let the userspace run w/ __noreturn */ + if (is_segv) + sigsegv_post_routine(); +#endif } void set_handler(int sig) diff --git a/arch/x86/um/signal.c b/arch/x86/um/signal.c index 75087e85b6fd..b7365c75a967 100644 --- a/arch/x86/um/signal.c +++ b/arch/x86/um/signal.c @@ -371,6 +371,13 @@ int setup_signal_stack_si(unsigned long stack_top, struct ksignal *ksig, round_down(stack_top - sizeof(struct rt_sigframe), 16); /* Add required space for math frame */ +#ifndef CONFIG_MMU + /* + * the sig_frame on !MMU needs be aligned for SSE as + * the frame is used as-is. + */ + math_size = round_down(math_size, 16); +#endif frame = (struct rt_sigframe __user *)((unsigned long)frame - math_size); /* Subtract 128 for a red zone and 8 for proper alignment */ @@ -417,6 +424,18 @@ int setup_signal_stack_si(unsigned long stack_top, struct ksignal *ksig, /* could use a vstub here */ return err; +#ifndef CONFIG_MMU + /* + * we need to push handler address at top of stack, as + * __kernel_vsyscall, called after this returns with ret with + * stack contents, thus push the handler here. + */ + frame = (struct rt_sigframe __user *) ((unsigned long) frame - + sizeof(unsigned long)); + err |= __put_user((unsigned long)ksig->ka.sa.sa_handler, + (unsigned long *)frame); +#endif + if (err) return err; @@ -442,9 +461,25 @@ SYSCALL_DEFINE0(rt_sigreturn) unsigned long sp = PT_REGS_SP(¤t->thread.regs); struct rt_sigframe __user *frame = (struct rt_sigframe __user *)(sp - sizeof(long)); - struct ucontext __user *uc = &frame->uc; + struct ucontext __user *uc; sigset_t set; +#ifndef CONFIG_MMU + /** + * we enter here with: + * + * __restore_rt: + * mov $15, %rax + * call *%rax (translated from syscall) + * + * (code is from musl libc) + * so, stack needs to be popped of "call"ed address before + * looking at rt_sigframe. + */ + frame = (struct rt_sigframe __user *)((unsigned long)frame + sizeof(long)); +#endif + uc = &frame->uc; + if (copy_from_user(&set, &uc->uc_sigmask, sizeof(set))) goto segfault;
This commit updates the behavior of signal handling under !MMU environment. 1) the stack preparation for the signal handlers and 2) restoration of stack after rt_sigreturn(2) syscall. Those are needed as the stack usage on vfork(2) syscall is different. It also adds the follow up routine for SIGSEGV as a signal delivery runs in the same stack frame while we have to avoid endless SIGSEGV. Signed-off-by: Hajime Tazaki <thehajime@gmail.com> --- arch/um/include/shared/kern_util.h | 3 +++ arch/um/kernel/trap.c | 10 ++++++++ arch/um/os-Linux/signal.c | 18 ++++++++++++++- arch/x86/um/signal.c | 37 +++++++++++++++++++++++++++++- 4 files changed, 66 insertions(+), 2 deletions(-)