Message ID | 20100922183200.GC19804@ZenIV.linux.org.uk |
---|---|
State | RFC |
Delegated to: | David Miller |
Headers | show |
On Wed, Sep 22, 2010 at 11:32 AM, Al Viro <viro@zeniv.linux.org.uk> wrote: > > That has unpleasant results - for starters, delivery of SIGSEGV upon > failure to set sigframe up is delayed unpredictably; we will take it > only when we trap again. I think this whole argument is a total red herring. It's a bug in next_signal() if we allow this to happen. We need to enqueue those synchronous signals first, and NO AMOUNT OF SIGNAL QUEUEING will ever change that. The fact is, even if you queue up all the signals at once, you need to queue up the synchronous ones first. Otherwise their instruction pointer information etc will simply be bogus. It's that simple. Your argument about queuing up one, two, or more signals is bogus, for the simple reason that it doesn't matter: whether you queue or not is irrelevant, since the "innermost" one in the queue always has to be the SIGSEGV. Whether we queue other signals on top of that (and they get executed first, since it's a stack) doesn't matter. That's a timing issue, and the program acts as if those asynchronous signals happened before the trap. But that's fine. All that matters is that the actual synchronous signal has the register contents of the time of the synchronous trap, ie it gets enqueued first. It's why we have that "if (x & SYNCHRONOUS_MASK)" in next_signal(). It's not pretty, it's not perfect, but it's required. Linus -- To unsubscribe from this list: send the line "unsubscribe sparclinux" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Sep 22, 2010 at 11:46:27AM -0700, Linus Torvalds wrote: > On Wed, Sep 22, 2010 at 11:32 AM, Al Viro <viro@zeniv.linux.org.uk> wrote: > > > > That has unpleasant results - for starters, delivery of SIGSEGV upon > > failure to set sigframe up is delayed unpredictably; we will take it > > only when we trap again. > > I think this whole argument is a total red herring. > > It's a bug in next_signal() if we allow this to happen. We need to > enqueue those synchronous signals first, and NO AMOUNT OF SIGNAL > QUEUEING will ever change that. > > The fact is, even if you queue up all the signals at once, you need to > queue up the synchronous ones first. Otherwise their instruction > pointer information etc will simply be bogus. It's that simple. Your > argument about queuing up one, two, or more signals is bogus, for the > simple reason that it doesn't matter: whether you queue or not is > irrelevant, since the "innermost" one in the queue always has to be > the SIGSEGV. > > Whether we queue other signals on top of that (and they get executed > first, since it's a stack) doesn't matter. That's a timing issue, and > the program acts as if those asynchronous signals happened before the > trap. But that's fine. All that matters is that the actual synchronous > signal has the register contents of the time of the synchronous trap, > ie it gets enqueued first. > > It's why we have that "if (x & SYNCHRONOUS_MASK)" in next_signal(). > It's not pretty, it's not perfect, but it's required. Um, no. You've *already* called get_signal_to_deliver(). There had been no SIGSEGV in sight. You happily went on to set a sigframe for e.g. SIGHUP, but ran out of stack. At that point you get force_sigsegv() from handle_signal(). _NOW_ you have a pending SIGSEGV; whether you'll be able to handle it (e.g. if your SIGSEGV handler is set to run on altstack) or not, you won't get to it until you call get_signal_to_deliver() again. Which requires do_signal() to run. Sure, it will be the first one to be picked, but we need to try and pick _something_ to get it. -- To unsubscribe from this list: send the line "unsubscribe sparclinux" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Sep 22, 2010 at 07:53:28PM +0100, Al Viro wrote: > Um, no. You've *already* called get_signal_to_deliver(). There had been > no SIGSEGV in sight. You happily went on to set a sigframe for e.g. > SIGHUP, but ran out of stack. At that point you get force_sigsegv() > from handle_signal(). _NOW_ you have a pending SIGSEGV; whether you'll > be able to handle it (e.g. if your SIGSEGV handler is set to run on > altstack) or not, you won't get to it until you call get_signal_to_deliver() > again. Which requires do_signal() to run. > > Sure, it will be the first one to be picked, but we need to try and pick > _something_ to get it. PS: on something like x86 that'll happen before we return to userland, since the glue on syscall exit does that: int_signal: testl $_TIF_DO_NOTIFY_MASK,%edx jz 1f movq %rsp,%rdi # &ptregs -> arg1 xorl %esi,%esi # oldset -> arg2 call do_notify_resume 1: movl $_TIF_WORK_MASK,%edi int_restore_rest: RESTORE_REST DISABLE_INTERRUPTS(CLBR_NONE) TRACE_IRQS_OFF jmp int_with_check and int_with_check leads back to int_signal if _TIF_SIGPENDING is still set. That kind of looping happens on everything except sparc and that's exactly the difference I'm talking about. On sparc the sucker manages to escape to userland with SIGSEGV still pending. Try it and you'll see - simple signal(SIGHUP, something) + munmap a little bit under your %sp + raise(SIGHUP), then look at the resulting coredump... On sparc you'll get that SIGSEGV later; in fact, putting write(1, "ouch\n", 5); after that raise() has a good chance of saying ouch before it dumps core. -- To unsubscribe from this list: send the line "unsubscribe sparclinux" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Sep 22, 2010 at 11:53 AM, Al Viro <viro@zeniv.linux.org.uk> wrote: > > Um, no. You've *already* called get_signal_to_deliver(). There had been > no SIGSEGV in sight. You happily went on to set a sigframe for e.g. > SIGHUP, but ran out of stack. At that point you get force_sigsegv() > from handle_signal(). _NOW_ you have a pending SIGSEGV Ahh. Ok. Different case from the one I thought you were worried about. And yeah, I guess that one does require us to mess with the low-level asm code (although I do wonder if we could not make the whole do_notify_resume + reschedule code be generic C code - it's a lot of duplicated subtle asm as it is). Linus -- To unsubscribe from this list: send the line "unsubscribe sparclinux" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Sep 22, 2010 at 12:08:53PM -0700, Linus Torvalds wrote: > On Wed, Sep 22, 2010 at 11:53 AM, Al Viro <viro@zeniv.linux.org.uk> wrote: > > > > Um, no. ?You've *already* called get_signal_to_deliver(). ?There had been > > no SIGSEGV in sight. ?You happily went on to set a sigframe for e.g. > > SIGHUP, but ran out of stack. ?At that point you get force_sigsegv() > > from handle_signal(). ?_NOW_ you have a pending SIGSEGV > > Ahh. Ok. Different case from the one I thought you were worried about. > And yeah, I guess that one does require us to mess with the low-level > asm code (although I do wonder if we could not make the whole > do_notify_resume + reschedule code be generic C code - it's a lot of > duplicated subtle asm as it is). Worse than just that... Note that on sparc you need to deal with fault_in_user_windows(), which can also trigger signals. So much for merging it cross-architecture, even if we ignore the differences between the ways we pass data required for restart to do_signal(). Note that e.g. on alpha we pass _two_ values - one for overwritten v0 (syscall number), another for overwritten a3 (error flag), etc. Worse, quite a few targets do not save all registers on syscall entry. Callee-saved stuff doesn't have to be in pt_regs. Except that you want *all* of them saved on stack when it's time to fill sigframes. And once you've entered a C function you can't guarantee that compiler hasn't already modified them; sure, they'll be restored on return, but that doesn't help you to get to their values. So you get switch_stack built on stack around calling do_notify_resume() on those. And you really want to avoid doing that if you've got no signals - remember, we hit that on exit from irqs as well, so it's one hell of a hot path. For processors with big flat register file it can get very ugly. There is a lot of ugly almost-duplication in there, no arguments about that. Asm glue is actually not the worst part... -- To unsubscribe from this list: send the line "unsubscribe sparclinux" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Sep 22, 2010 at 08:53:49PM +0100, Al Viro wrote: > On Wed, Sep 22, 2010 at 12:08:53PM -0700, Linus Torvalds wrote: > > On Wed, Sep 22, 2010 at 11:53 AM, Al Viro <viro@zeniv.linux.org.uk> wrote: > > > > > > Um, no. ?You've *already* called get_signal_to_deliver(). ?There had been > > > no SIGSEGV in sight. ?You happily went on to set a sigframe for e.g. > > > SIGHUP, but ran out of stack. ?At that point you get force_sigsegv() > > > from handle_signal(). ?_NOW_ you have a pending SIGSEGV > > > > Ahh. Ok. Different case from the one I thought you were worried about. > > And yeah, I guess that one does require us to mess with the low-level > > asm code (although I do wonder if we could not make the whole > > do_notify_resume + reschedule code be generic C code - it's a lot of > > duplicated subtle asm as it is). > > Worse than just that... Note that on sparc you need to deal with > fault_in_user_windows(), which can also trigger signals. Actually, I wonder why don't we do the following: 1) check wsaved first, do fault_in_user_windows() if needed (and probably do Something Cruel(tm) if we fail copy_to_user() in there) 2) in a loop check if we need to reschedule / if we need to handle signals 3) don't bother with wsaved checks in setup_frame() variants at all - wsaved can't grow back at that point; we also can use flush_user_windows() instead of full synchronize_user_stack() in there. It's definitely a separate patch, but it looks like it might be worth doing... Comments? -- To unsubscribe from this list: send the line "unsubscribe sparclinux" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Al Viro <viro@ZenIV.linux.org.uk> Date: Wed, 22 Sep 2010 21:43:24 +0100 > Actually, I wonder why don't we do the following: > 1) check wsaved first, do fault_in_user_windows() if needed (and probably do > Something Cruel(tm) if we fail copy_to_user() in there) > 2) in a loop check if we need to reschedule / if we need to handle signals > 3) don't bother with wsaved checks in setup_frame() variants at all - > wsaved can't grow back at that point; we also can use flush_user_windows() > instead of full synchronize_user_stack() in there. > > It's definitely a separate patch, but it looks like it might be worth > doing... Comments? Ok, let me think about this. I think there are two things: 1) I think I originally intended to allow the signal dispatch to succeed even if we had windows we couldn't fault in. The idea is that the wsaved windows would be put into the signal frame. This never was implemented, however... 2) It would be nice to, in this case, still be able to let the debugger have a look. And part of "having a look" would mean letting it see the registers from the windows we couldn't save onto the stack. Otherwise GDB is just going to try and access the stack pages we were unable to. Making #2 work could be done with an implementation of #1, since GDB would need to be able to fetch the values easily from somewhere and the signal frame storage we create for #1 would be as good as any. I can't see much utility for a user signal handler for #1, however, other than getting an accurate stack backtrace to emit a crash log message or similar. Alternatively, #2 could be implemented using a special ptrace getregs call created specifically to fetch the windowed registers. And the regset implementation of that could be used for dumping them into core files as well, and this specifically I've been meaning to do at some point. Again, let me think about this some more. -- To unsubscribe from this list: send the line "unsubscribe sparclinux" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Sep 22, 2010 at 02:15:14PM -0700, David Miller wrote: > Alternatively, #2 could be implemented using a special ptrace getregs > call created specifically to fetch the windowed registers. And > the regset implementation of that could be used for dumping them > into core files as well, and this specifically I've been meaning to > do at some point. > > Again, let me think about this some more. OK... sparc32 question: just what the !@#!@# happens if sun4c_rett_stackchk find (%fp & 7) != 0? We go to ret_trap_user_stack_is_bolixed, which tries to page in the underlying page. OK, suppose it's already there and readable; we return without doing anything - and go to signal_p. Which finds itself with nothing to do, and %fp is *still* buggered. Spin ad infinitum? srmmu_rett_stackchk will do the same, BTW... -- To unsubscribe from this list: send the line "unsubscribe sparclinux" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Sep 22, 2010 at 12:08:53PM -0700, Linus Torvalds wrote: > On Wed, Sep 22, 2010 at 11:53 AM, Al Viro <viro@zeniv.linux.org.uk> wrote: > > > > Um, no. ?You've *already* called get_signal_to_deliver(). ?There had been > > no SIGSEGV in sight. ?You happily went on to set a sigframe for e.g. > > SIGHUP, but ran out of stack. ?At that point you get force_sigsegv() > > from handle_signal(). ?_NOW_ you have a pending SIGSEGV > > Ahh. Ok. Different case from the one I thought you were worried about. > And yeah, I guess that one does require us to mess with the low-level > asm code (although I do wonder if we could not make the whole > do_notify_resume + reschedule code be generic C code - it's a lot of > duplicated subtle asm as it is). BTW, there's an interesting idea in s390 implementation (and I have to say that I'm bloody impressed by them - it's the only architecture besides x86 where I haven't found serious holes in signal handling yet; there are QOI issues, but that's it so far). What they do with syscall restarts is unusual and they might have a good point there. 1) They deal with restart immediately on the entry to do_signal(); if restarts are not suppressed and if the error is one of the restart-worthy ones, they do what should be done for no-handler case. 2) They store the pre-restart address, post-restart address and error. Then they call get_signal_to_deliver(). Of course, it may return us a positive signal number. In that case they may need to discard the restart they'd done. And they do it, but only if the address has remained equal to post-restart one. 3) They ignore ERESTART_RESTARTBLOCK if the address has changed. Actually, I suspect that they might need to clear the ->restart_block.fn in that case, but I haven't done analysis of that yet. They do have a reason for doing it that way and it's worth considering on other platforms. Think what happens if we are getting traced. We'll be stopped and tracer will be notified. Normally it'll tell us to continue execution, possibly with a different signal *AND* with a different userland address to return to. Suppose we've got a different return address set for us (e.g. with PTRACE_POKEUSR). Should we ever shift it back by what hopefully is a size of syscall insn? -- To unsubscribe from this list: send the line "unsubscribe sparclinux" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Al Viro <viro@ZenIV.linux.org.uk> Date: Thu, 23 Sep 2010 00:12:19 +0100 > OK... sparc32 question: just what the !@#!@# happens if sun4c_rett_stackchk > find (%fp & 7) != 0? We go to ret_trap_user_stack_is_bolixed, which > tries to page in the underlying page. OK, suppose it's already there and > readable; we return without doing anything - and go to signal_p. Which finds > itself with nothing to do, and %fp is *still* buggered. Spin ad infinitum? > srmmu_rett_stackchk will do the same, BTW... That's a bug. Likely all of the window_*_fault() routines should force a SIGILL when the stack is mis-aligned. -- To unsubscribe from this list: send the line "unsubscribe sparclinux" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Al Viro <viro@ZenIV.linux.org.uk> Date: Thu, 23 Sep 2010 05:59:22 +0100 > They do have a reason for doing it that way and it's worth considering on > other platforms. Think what happens if we are getting traced. We'll be > stopped and tracer will be notified. Normally it'll tell us to continue > execution, possibly with a different signal *AND* with a different userland > address to return to. Suppose we've got a different return address set > for us (e.g. with PTRACE_POKEUSR). Should we ever shift it back by what > hopefully is a size of syscall insn? These "tracer changing the program counter" issues are why on sparc we have this software state bit in the %tstate/%psr we report via regsets (and thus via ptrace) to debuggers which tells if we are inside of a syscall or not. GDB uses this to know whether the kernel signal handling is going to modify the program counter or not when the inferior continues. If GDB ever writes the program counter of the inferior, it clears this "in-syscall" bit, and this short-circuits the restart syscall logic in the signal dispatch code. This is what the GDB code looks like: static void sparc_linux_write_pc (struct regcache *regcache, CORE_ADDR pc) { struct gdbarch_tdep *tdep = gdbarch_tdep (get_regcache_arch (regcache)); ULONGEST psr; regcache_cooked_write_unsigned (regcache, tdep->pc_regnum, pc); regcache_cooked_write_unsigned (regcache, tdep->npc_regnum, pc + 4); /* Clear the "in syscall" bit to prevent the kernel from messing with the PCs we just installed, if we happen to be within an interrupted system call that the kernel wants to restart. Note that after we return from the dummy call, the PSR et al. registers will be automatically restored, and the kernel continues to restart the system call at this point. */ regcache_cooked_read_unsigned (regcache, SPARC32_PSR_REGNUM, &psr); psr &= ~PSR_SYSCALL; regcache_cooked_write_unsigned (regcache, SPARC32_PSR_REGNUM, psr); } -- To unsubscribe from this list: send the line "unsubscribe sparclinux" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/arch/sparc/kernel/rtrap_64.S b/arch/sparc/kernel/rtrap_64.S index 090b9e9..77f1b95 100644 --- a/arch/sparc/kernel/rtrap_64.S +++ b/arch/sparc/kernel/rtrap_64.S @@ -34,37 +34,9 @@ __handle_preemption: __handle_user_windows: call fault_in_user_windows wrpr %g0, RTRAP_PSTATE, %pstate - wrpr %g0, RTRAP_PSTATE_IRQOFF, %pstate - /* Redo sched+sig checks */ - ldx [%g6 + TI_FLAGS], %l0 - andcc %l0, _TIF_NEED_RESCHED, %g0 - - be,pt %xcc, 1f - nop - call schedule - wrpr %g0, RTRAP_PSTATE, %pstate - wrpr %g0, RTRAP_PSTATE_IRQOFF, %pstate - ldx [%g6 + TI_FLAGS], %l0 - -1: andcc %l0, _TIF_DO_NOTIFY_RESUME_MASK, %g0 - be,pt %xcc, __handle_user_windows_continue - nop - mov %l5, %o1 - add %sp, PTREGS_OFF, %o0 - mov %l0, %o2 - - call do_notify_resume - wrpr %g0, RTRAP_PSTATE, %pstate - wrpr %g0, RTRAP_PSTATE_IRQOFF, %pstate - /* Signal delivery can modify pt_regs tstate, so we must - * reload it. - */ - ldx [%sp + PTREGS_OFF + PT_V9_TSTATE], %l1 - sethi %hi(0xf << 20), %l4 - and %l1, %l4, %l4 - ba,pt %xcc, __handle_user_windows_continue + ba,pt %xcc, __handle_preemption_continue + wrpr %g0, RTRAP_PSTATE_IRQOFF, %pstate - andn %l1, %l4, %l1 __handle_userfpu: rd %fprs, %l5 andcc %l5, FPRS_FEF, %g0 @@ -87,7 +59,7 @@ __handle_signal: ldx [%sp + PTREGS_OFF + PT_V9_TSTATE], %l1 sethi %hi(0xf << 20), %l4 and %l1, %l4, %l4 - ba,pt %xcc, __handle_signal_continue + ba,pt %xcc, __handle_preemption_continue andn %l1, %l4, %l1 /* When returning from a NMI (%pil==15) interrupt we want to @@ -177,11 +149,9 @@ __handle_preemption_continue: bne,pn %xcc, __handle_preemption andcc %l0, _TIF_DO_NOTIFY_RESUME_MASK, %g0 bne,pn %xcc, __handle_signal -__handle_signal_continue: ldub [%g6 + TI_WSAVED], %o2 brnz,pn %o2, __handle_user_windows nop -__handle_user_windows_continue: sethi %hi(TSTATE_PEF), %o0 andcc %l1, %o0, %g0