Message ID | 20100215014028.GB13355@kryten (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Benjamin Herrenschmidt |
Headers | show |
On Mon, 2010-02-15 at 12:40 +1100, Anton Blanchard wrote: > Right now we clear the larx reservation on every system call exit. No code > should exist that tries to make use of larx/stcx across a system call (because > it would fail 100% of the time). > > We could continue to play it safe but system call latency affects so many > workloads. In the past we have already cut down the set of registers we > save and restore across a system call and this could be seen as an > extension of that. The PowerPC system call ABI does not (and could not) > preserve a larx reservation. > > On POWER6 the poster child for system call improvements, getppid, improves 6%. > > A more useful test is the private futex wake system call and that improves 5%. > This is a decent speedup on an important system call for threaded applications. > > Signed-off-by: Anton Blanchard <anton@samba.org> > --- > If my previous patches didn't worry you then this one is sure to. > > Getting this wrong will make someone's life miserable, so it could do with > some double checking (eg we don't branch through there on other exceptions and > we dont invoke system calls from the kernel that rely on the reservation being > cleared). Well, the main issue here is leaking kernel reservations into userspace, and thus the question of whether it is a big deal or not. There's also an issue I can see with signals. The risk with kernel reservations leaking into userspace is a problem on some processors that do not compare the reservation address locally (only for snoops), thus userspace code doing lwarx/syscall/stwcx. might end up with a suceeding stwcx. despite the fact that the original reservation was long lost. At this stage it becomes an ABI problem, ie, whether we define the behaviour of a lwarx/stwcx. accross a syscall as defined or not. The other problem I see is that signal handlers would have to be made very careful not to leave dangling reservations since the return from the syscall is a syscall, unless we add code specifically to this (and set_context too I'd say) to clear reservations. IE. You could have something like: lwarx, <interrupt>, signal handler, sigreturn, stwcx. In the above case, the reservation would be cleared by the return from the interrupt, but the signal handler might leave a dangling one, which sigreturn might fail to clear (in practice, our current implementation of sys_sigreturn() will probably clear any reservation as a side effect of restore_sigmask() spinlock or set_thread_flag() but it sounds a bit fragile to rely on unless it's well documented). Cheers, Ben. > Index: powerpc.git/arch/powerpc/kernel/entry_64.S > =================================================================== > --- powerpc.git.orig/arch/powerpc/kernel/entry_64.S 2010-02-13 16:26:43.794322638 +1100 > +++ powerpc.git/arch/powerpc/kernel/entry_64.S 2010-02-13 16:27:03.205575405 +1100 > @@ -202,7 +202,6 @@ syscall_exit: > bge- syscall_error > syscall_error_cont: > ld r7,_NIP(r1) > - stdcx. r0,0,r1 /* to clear the reservation */ > andi. r6,r8,MSR_PR > ld r4,_LINK(r1) > /*
Hi Ben, > Well, the main issue here is leaking kernel reservations into userspace, > and thus the question of whether it is a big deal or not. There's also > an issue I can see with signals. > > The risk with kernel reservations leaking into userspace is a problem on > some processors that do not compare the reservation address locally > (only for snoops), thus userspace code doing lwarx/syscall/stwcx. might > end up with a suceeding stwcx. despite the fact that the original > reservation was long lost. Yeah that was my primary concern. Right now these things fail 100%, so no one is relying on it. The worry is if people start writing their own crazy low level system call + locking stubs that might work most of the time (if we remove the stwcx in syscall exit). > At this stage it becomes an ABI problem, ie, whether we define the > behaviour of a lwarx/stwcx. accross a syscall as defined or not. > > The other problem I see is that signal handlers would have to be made > very careful not to leave dangling reservations since the return from > the syscall is a syscall, unless we add code specifically to this (and > set_context too I'd say) to clear reservations. > > IE. You could have something like: > > lwarx, <interrupt>, signal handler, sigreturn, stwcx. > > In the above case, the reservation would be cleared by the return from > the interrupt, but the signal handler might leave a dangling one, which > sigreturn might fail to clear (in practice, our current implementation > of sys_sigreturn() will probably clear any reservation as a side effect > of restore_sigmask() spinlock or set_thread_flag() but it sounds a bit > fragile to rely on unless it's well documented). Good point, I hadn't thought of signals and I agree we'd need to clear the reservation in the sigreturn path. Anton
On Mon, 2010-02-15 at 15:06 +1100, Anton Blanchard wrote: > > Yeah that was my primary concern. Right now these things fail 100%, so > no one is relying on it. The worry is if people start writing their own > crazy low level system call + locking stubs that might work most of the > time (if we remove the stwcx in syscall exit). Worse than that. It will look like it works, but it won't really since the dangling reservation matches a completely unrelated lwarx done by the kernel and not the original userspace one, so the stwcx. might succeed despite the fact that the original value -was- changed. So it's really a matter of documentation I suppose but we have to be careful as I've learned with time that there is nothing too silly for userspace to do :-) > Good point, I hadn't thought of signals and I agree we'd need to clear the > reservation in the sigreturn path. Right. As I said, I'm pretty sure it will happen as a side effect of other things but I'd rather pay the small price of having an explicit blurb to do it in sigreturn regardless. Cheers, Ben.
Index: powerpc.git/arch/powerpc/kernel/entry_64.S =================================================================== --- powerpc.git.orig/arch/powerpc/kernel/entry_64.S 2010-02-13 16:26:43.794322638 +1100 +++ powerpc.git/arch/powerpc/kernel/entry_64.S 2010-02-13 16:27:03.205575405 +1100 @@ -202,7 +202,6 @@ syscall_exit: bge- syscall_error syscall_error_cont: ld r7,_NIP(r1) - stdcx. r0,0,r1 /* to clear the reservation */ andi. r6,r8,MSR_PR ld r4,_LINK(r1) /*
Right now we clear the larx reservation on every system call exit. No code should exist that tries to make use of larx/stcx across a system call (because it would fail 100% of the time). We could continue to play it safe but system call latency affects so many workloads. In the past we have already cut down the set of registers we save and restore across a system call and this could be seen as an extension of that. The PowerPC system call ABI does not (and could not) preserve a larx reservation. On POWER6 the poster child for system call improvements, getppid, improves 6%. A more useful test is the private futex wake system call and that improves 5%. This is a decent speedup on an important system call for threaded applications. Signed-off-by: Anton Blanchard <anton@samba.org> --- If my previous patches didn't worry you then this one is sure to. Getting this wrong will make someone's life miserable, so it could do with some double checking (eg we don't branch through there on other exceptions and we dont invoke system calls from the kernel that rely on the reservation being cleared).