Message ID | 1450844167-7327-1-git-send-email-rob.gardner@oracle.com |
---|---|
State | Accepted |
Delegated to: | David Miller |
Headers | show |
From: Rob Gardner <rob.gardner@oracle.com> Date: Tue, 22 Dec 2015 21:16:06 -0700 > When an interrupt (such as a perf counter interrupt) is delivered > while executing in user space, the trap entry code puts ASI_AIUS in > %asi so that copy_from_user() and copy_to_user() will access the > correct memory. But if a perf counter interrupt is delivered while the > cpu is already executing in kernel space, then the trap entry code > will put ASI_P in %asi, and this will prevent copy_from_user() from > reading any useful stack data in either of the perf_callchain_user_X > functions, and thus no user callgraph data will be collected for this > sample period. An additional problem is that a fault is guaranteed > to occur, and though it will be silently covered up, it wastes time > and could perturb state. > > In perf_callchain_user(), we ensure that %asi contains ASI_AIUS > because we know for a fact that the subsequent calls to > copy_from_user() are intended to read the user's stack. > > Signed-off-by: Rob Gardner <rob.gardner@oracle.com> > Signed-off-by: Dave Aldridge <david.j.aldridge@oracle.com> Thanks for tracking this down, David Ahern and I spent quite some time trying to figure this one out. -- To unsubscribe from this list: send the line "unsubscribe sparclinux" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Rob Gardner <rob.gardner@oracle.com> Date: Tue, 22 Dec 2015 21:16:06 -0700 > When an interrupt (such as a perf counter interrupt) is delivered > while executing in user space, the trap entry code puts ASI_AIUS in > %asi so that copy_from_user() and copy_to_user() will access the > correct memory. But if a perf counter interrupt is delivered while the > cpu is already executing in kernel space, then the trap entry code > will put ASI_P in %asi, and this will prevent copy_from_user() from > reading any useful stack data in either of the perf_callchain_user_X > functions, and thus no user callgraph data will be collected for this > sample period. An additional problem is that a fault is guaranteed > to occur, and though it will be silently covered up, it wastes time > and could perturb state. > > In perf_callchain_user(), we ensure that %asi contains ASI_AIUS > because we know for a fact that the subsequent calls to > copy_from_user() are intended to read the user's stack. > > Signed-off-by: Rob Gardner <rob.gardner@oracle.com> > Signed-off-by: Dave Aldridge <david.j.aldridge@oracle.com> I applied this, but modified it slightly. We have shorthand helpers "get_fs()" and "set_fs()" for doing this kind of work, and as you can see in arch/sparc/kernel/process_64.c and elsewhere this is the canonical way to adjust the %asi value in these kinds of circumstances. Also, set_fs() updates the thread's current_ds value properly. Notice that NGmemcpy.S uses the TI_CURRENT_DS value via the RESTORE_ASI() macro. So if that's not set properly, the %asi restore won't be done properly. -- To unsubscribe from this list: send the line "unsubscribe sparclinux" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 12/24/2015 09:43 AM, David Miller wrote: > From: Rob Gardner <rob.gardner@oracle.com> > Date: Tue, 22 Dec 2015 21:16:06 -0700 > >> When an interrupt (such as a perf counter interrupt) is delivered >> while executing in user space, the trap entry code puts ASI_AIUS in >> %asi so that copy_from_user() and copy_to_user() will access the >> correct memory. But if a perf counter interrupt is delivered while the >> cpu is already executing in kernel space, then the trap entry code >> will put ASI_P in %asi, and this will prevent copy_from_user() from >> reading any useful stack data in either of the perf_callchain_user_X >> functions, and thus no user callgraph data will be collected for this >> sample period. An additional problem is that a fault is guaranteed >> to occur, and though it will be silently covered up, it wastes time >> and could perturb state. >> >> In perf_callchain_user(), we ensure that %asi contains ASI_AIUS >> because we know for a fact that the subsequent calls to >> copy_from_user() are intended to read the user's stack. >> >> Signed-off-by: Rob Gardner <rob.gardner@oracle.com> >> Signed-off-by: Dave Aldridge <david.j.aldridge@oracle.com> > I applied this, but modified it slightly. > > We have shorthand helpers "get_fs()" and "set_fs()" for doing this > kind of work, and as you can see in arch/sparc/kernel/process_64.c > and elsewhere this is the canonical way to adjust the %asi value in > these kinds of circumstances. > > Also, set_fs() updates the thread's current_ds value properly. Notice > that NGmemcpy.S uses the TI_CURRENT_DS value via the RESTORE_ASI() > macro. So if that's not set properly, the %asi restore won't be done > properly. Sorry I should have noted this in the log message, but we intentionally did not use get_fs() and set_fs() there because they are not safe to use in a "nested" interrupt context. n.b. get_fs() is not guaranteed to report a value consistent with %asi while we're executing the perf interrupt handler, because it may have interrupted kernel code where %asi is inconsistent with the thread_info current_ds value. This is common, e.g. right in NGmemcpy. Rob -- To unsubscribe from this list: send the line "unsubscribe sparclinux" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Rob Gardner <rob.gardner@oracle.com> Date: Thu, 24 Dec 2015 10:39:05 -0700 > Sorry I should have noted this in the log message, but we > intentionally did not use get_fs() and set_fs() there because they are > not safe to use in a "nested" interrupt context. n.b. get_fs() is not > guaranteed to report a value consistent with %asi while we're > executing the perf interrupt handler, because it may have interrupted > kernel code where %asi is inconsistent with the thread_info current_ds > value. This is common, e.g. right in NGmemcpy. Is that a real problem? The return from trap will restore the %asi register properly from the %tstate register. -- To unsubscribe from this list: send the line "unsubscribe sparclinux" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 12/25/2015 09:25 PM, David Miller wrote: > From: Rob Gardner <rob.gardner@oracle.com> > Date: Thu, 24 Dec 2015 10:39:05 -0700 > >> Sorry I should have noted this in the log message, but we >> intentionally did not use get_fs() and set_fs() there because they are >> not safe to use in a "nested" interrupt context. n.b. get_fs() is not >> guaranteed to report a value consistent with %asi while we're >> executing the perf interrupt handler, because it may have interrupted >> kernel code where %asi is inconsistent with the thread_info current_ds >> value. This is common, e.g. right in NGmemcpy. > Is that a real problem? > > The return from trap will restore the %asi register properly from > the %tstate register. > It's not a problem for the context that was interrupted since as you say, %asi will be restored properly. But in the perf interrupt context it seems a little messy to me because get_fs() can (and will) return a value inconsistent with %asi, so at the end of perf_callchain_user() when set_fs() is done, it will not truly restore %asi to what it was when the function began. So up until the return from trap, %asi contains an unintended value. I didn't track down all possibilities to see if anything bad might happen, but it just seemed wrong to leave things in that state. Also, in the code we submitted, there was an optimization in which %asi is read, and then only set to ASI_AIUS if necessary. This drastically reduces the number of writes to the %asi register since most of the time %asi will contain ASI_AIUS. This seems like a reasonable optimization, since this function may be called thousands of times per second on every cpu. But this doesn't work at all using get_fs() since it is inconsistent with %asi, and that is why we went with the inline assembler to read and write %asi directly. Merry Christmas. Rob -- To unsubscribe from this list: send the line "unsubscribe sparclinux" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Rob Gardner <rob.gardner@oracle.com> Date: Fri, 25 Dec 2015 22:02:43 -0700 > Also, in the code we submitted, there was an optimization in which > %asi is read, and then only set to ASI_AIUS if necessary. This > drastically reduces the number of writes to the %asi register since > most of the time %asi will contain ASI_AIUS. This seems like a > reasonable optimization, since this function may be called thousands > of times per second on every cpu. I noticed the optimization. If this was happening for every memcpy call, I'd say it's worth it. But it's happening once for a series of memcpy/copy_from_user_inatomic() calls so I'd say it's not really worth it. So unless you can show me how the current version fails, I'm keeping it as-is because either we should consistently use set_fs/get_fs in C code rather than open coded inline asm, or we should create a well documented set of helper functions for this specific situation and _ALSO_ use it elsewhere where the same problems exist such as some of the uses of set_fs/get_fs in process_64.c -- To unsubscribe from this list: send the line "unsubscribe sparclinux" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 12/25/2015 11:27 PM, David Miller wrote: > From: Rob Gardner <rob.gardner@oracle.com> > Date: Fri, 25 Dec 2015 22:02:43 -0700 > >> Also, in the code we submitted, there was an optimization in which >> %asi is read, and then only set to ASI_AIUS if necessary. This >> drastically reduces the number of writes to the %asi register since >> most of the time %asi will contain ASI_AIUS. This seems like a >> reasonable optimization, since this function may be called thousands >> of times per second on every cpu. > I noticed the optimization. > > If this was happening for every memcpy call, I'd say it's worth it. > > But it's happening once for a series of memcpy/copy_from_user_inatomic() > calls so I'd say it's not really worth it. > > So unless you can show me how the current version fails, I'm keeping it > as-is because either we should consistently use set_fs/get_fs in C > code rather than open coded inline asm, or we should create a well > documented set of helper functions for this specific situation and > _ALSO_ use it elsewhere where the same problems exist such as some > of the uses of set_fs/get_fs in process_64.c > Fair enough. You've convinced me that my worries are unfounded. Let's consider the matter settled. Thanks. Rob -- To unsubscribe from this list: send the line "unsubscribe sparclinux" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/arch/sparc/kernel/perf_event.c b/arch/sparc/kernel/perf_event.c index 689db65..55a6caa 100644 --- a/arch/sparc/kernel/perf_event.c +++ b/arch/sparc/kernel/perf_event.c @@ -1810,11 +1810,19 @@ static void perf_callchain_user_32(struct perf_callchain_entry *entry, void perf_callchain_user(struct perf_callchain_entry *entry, struct pt_regs *regs) { + u8 saved_asi; + perf_callchain_store(entry, regs->tpc); if (!current->mm) return; + /* ensure we get user memory when trying to read stacks */ + __asm__ __volatile__ ("rd %%asi, %0\n" : "=r" (saved_asi)); + if (saved_asi != ASI_AIUS) + __asm__ __volatile__ ( + "wr %%g0, %0, %%asi\n" : : "i" (ASI_AIUS)); + flushw_user(); pagefault_disable(); @@ -1825,4 +1833,8 @@ perf_callchain_user(struct perf_callchain_entry *entry, struct pt_regs *regs) perf_callchain_user_64(entry, regs); pagefault_enable(); + + if (saved_asi != ASI_AIUS) + __asm__ __volatile__ ( + "wr %%g0, %0, %%asi\n" : : "r" (saved_asi)); }