diff mbox

[1/2] sparc64: Ensure perf can access user stacks

Message ID 1450844167-7327-1-git-send-email-rob.gardner@oracle.com
State Accepted
Delegated to: David Miller
Headers show

Commit Message

Rob Gardner Dec. 23, 2015, 4:16 a.m. UTC
When an interrupt (such as a perf counter interrupt) is delivered
while executing in user space, the trap entry code puts ASI_AIUS in
%asi so that copy_from_user() and copy_to_user() will access the
correct memory. But if a perf counter interrupt is delivered while the
cpu is already executing in kernel space, then the trap entry code
will put ASI_P in %asi, and this will prevent copy_from_user() from
reading any useful stack data in either of the perf_callchain_user_X
functions, and thus no user callgraph data will be collected for this
sample period. An additional problem is that a fault is guaranteed
to occur, and though it will be silently covered up, it wastes time
and could perturb state.

In perf_callchain_user(), we ensure that %asi contains ASI_AIUS
because we know for a fact that the subsequent calls to
copy_from_user() are intended to read the user's stack.

Signed-off-by: Rob Gardner <rob.gardner@oracle.com>
Signed-off-by: Dave Aldridge <david.j.aldridge@oracle.com>
---
 arch/sparc/kernel/perf_event.c |   12 ++++++++++++
 1 file changed, 12 insertions(+)

Comments

David Miller Dec. 23, 2015, 5:45 a.m. UTC | #1
From: Rob Gardner <rob.gardner@oracle.com>
Date: Tue, 22 Dec 2015 21:16:06 -0700

> When an interrupt (such as a perf counter interrupt) is delivered
> while executing in user space, the trap entry code puts ASI_AIUS in
> %asi so that copy_from_user() and copy_to_user() will access the
> correct memory. But if a perf counter interrupt is delivered while the
> cpu is already executing in kernel space, then the trap entry code
> will put ASI_P in %asi, and this will prevent copy_from_user() from
> reading any useful stack data in either of the perf_callchain_user_X
> functions, and thus no user callgraph data will be collected for this
> sample period. An additional problem is that a fault is guaranteed
> to occur, and though it will be silently covered up, it wastes time
> and could perturb state.
> 
> In perf_callchain_user(), we ensure that %asi contains ASI_AIUS
> because we know for a fact that the subsequent calls to
> copy_from_user() are intended to read the user's stack.
> 
> Signed-off-by: Rob Gardner <rob.gardner@oracle.com>
> Signed-off-by: Dave Aldridge <david.j.aldridge@oracle.com>

Thanks for tracking this down, David Ahern and I spent quite some time
trying to figure this one out.
--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Dec. 24, 2015, 4:43 p.m. UTC | #2
From: Rob Gardner <rob.gardner@oracle.com>
Date: Tue, 22 Dec 2015 21:16:06 -0700

> When an interrupt (such as a perf counter interrupt) is delivered
> while executing in user space, the trap entry code puts ASI_AIUS in
> %asi so that copy_from_user() and copy_to_user() will access the
> correct memory. But if a perf counter interrupt is delivered while the
> cpu is already executing in kernel space, then the trap entry code
> will put ASI_P in %asi, and this will prevent copy_from_user() from
> reading any useful stack data in either of the perf_callchain_user_X
> functions, and thus no user callgraph data will be collected for this
> sample period. An additional problem is that a fault is guaranteed
> to occur, and though it will be silently covered up, it wastes time
> and could perturb state.
> 
> In perf_callchain_user(), we ensure that %asi contains ASI_AIUS
> because we know for a fact that the subsequent calls to
> copy_from_user() are intended to read the user's stack.
> 
> Signed-off-by: Rob Gardner <rob.gardner@oracle.com>
> Signed-off-by: Dave Aldridge <david.j.aldridge@oracle.com>

I applied this, but modified it slightly.

We have shorthand helpers "get_fs()" and "set_fs()" for doing this
kind of work, and as you can see in arch/sparc/kernel/process_64.c
and elsewhere this is the canonical way to adjust the %asi value in
these kinds of circumstances.

Also, set_fs() updates the thread's current_ds value properly.  Notice
that NGmemcpy.S uses the TI_CURRENT_DS value via the RESTORE_ASI()
macro.  So if that's not set properly, the %asi restore won't be done
properly.
--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rob Gardner Dec. 24, 2015, 5:39 p.m. UTC | #3
On 12/24/2015 09:43 AM, David Miller wrote:
> From: Rob Gardner <rob.gardner@oracle.com>
> Date: Tue, 22 Dec 2015 21:16:06 -0700
>
>> When an interrupt (such as a perf counter interrupt) is delivered
>> while executing in user space, the trap entry code puts ASI_AIUS in
>> %asi so that copy_from_user() and copy_to_user() will access the
>> correct memory. But if a perf counter interrupt is delivered while the
>> cpu is already executing in kernel space, then the trap entry code
>> will put ASI_P in %asi, and this will prevent copy_from_user() from
>> reading any useful stack data in either of the perf_callchain_user_X
>> functions, and thus no user callgraph data will be collected for this
>> sample period. An additional problem is that a fault is guaranteed
>> to occur, and though it will be silently covered up, it wastes time
>> and could perturb state.
>>
>> In perf_callchain_user(), we ensure that %asi contains ASI_AIUS
>> because we know for a fact that the subsequent calls to
>> copy_from_user() are intended to read the user's stack.
>>
>> Signed-off-by: Rob Gardner <rob.gardner@oracle.com>
>> Signed-off-by: Dave Aldridge <david.j.aldridge@oracle.com>
> I applied this, but modified it slightly.
>
> We have shorthand helpers "get_fs()" and "set_fs()" for doing this
> kind of work, and as you can see in arch/sparc/kernel/process_64.c
> and elsewhere this is the canonical way to adjust the %asi value in
> these kinds of circumstances.
>
> Also, set_fs() updates the thread's current_ds value properly.  Notice
> that NGmemcpy.S uses the TI_CURRENT_DS value via the RESTORE_ASI()
> macro.  So if that's not set properly, the %asi restore won't be done
> properly.

Sorry I should have noted this in the log message, but we intentionally 
did not use get_fs() and set_fs() there because they are not safe to use 
in a "nested" interrupt context.  n.b. get_fs() is not guaranteed to 
report a value consistent with %asi while we're executing the perf 
interrupt handler, because it may have interrupted kernel code where 
%asi is inconsistent with the thread_info current_ds value. This is 
common, e.g. right in NGmemcpy.

Rob




--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Dec. 26, 2015, 4:25 a.m. UTC | #4
From: Rob Gardner <rob.gardner@oracle.com>
Date: Thu, 24 Dec 2015 10:39:05 -0700

> Sorry I should have noted this in the log message, but we
> intentionally did not use get_fs() and set_fs() there because they are
> not safe to use in a "nested" interrupt context.  n.b. get_fs() is not
> guaranteed to report a value consistent with %asi while we're
> executing the perf interrupt handler, because it may have interrupted
> kernel code where %asi is inconsistent with the thread_info current_ds
> value. This is common, e.g. right in NGmemcpy.

Is that a real problem?

The return from trap will restore the %asi register properly from
the %tstate register.
--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rob Gardner Dec. 26, 2015, 5:02 a.m. UTC | #5
On 12/25/2015 09:25 PM, David Miller wrote:
> From: Rob Gardner <rob.gardner@oracle.com>
> Date: Thu, 24 Dec 2015 10:39:05 -0700
>
>> Sorry I should have noted this in the log message, but we
>> intentionally did not use get_fs() and set_fs() there because they are
>> not safe to use in a "nested" interrupt context.  n.b. get_fs() is not
>> guaranteed to report a value consistent with %asi while we're
>> executing the perf interrupt handler, because it may have interrupted
>> kernel code where %asi is inconsistent with the thread_info current_ds
>> value. This is common, e.g. right in NGmemcpy.
> Is that a real problem?
>
> The return from trap will restore the %asi register properly from
> the %tstate register.
>

It's not a problem for the context that was interrupted since as you 
say, %asi will be restored properly. But in the perf interrupt context 
it seems a little messy to me because get_fs() can (and will) return a 
value inconsistent with %asi, so at the end of perf_callchain_user() 
when set_fs() is done, it will not truly restore %asi to what it was 
when the function began. So up until the return from trap, %asi contains 
an unintended value. I didn't track down all possibilities to see if 
anything bad might happen, but it just seemed wrong to leave things in 
that state.

Also, in the code we submitted, there was an optimization in which %asi 
is read, and then only set to ASI_AIUS if necessary. This drastically 
reduces the number of writes to the %asi register since most of the time 
%asi will contain ASI_AIUS. This seems like a reasonable optimization, 
since this function may be called thousands of times per second on every 
cpu. But this doesn't work at all using get_fs() since it is 
inconsistent with %asi, and that is why we went with the inline 
assembler to read and write %asi directly.

Merry Christmas.

Rob

--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Dec. 26, 2015, 6:27 a.m. UTC | #6
From: Rob Gardner <rob.gardner@oracle.com>
Date: Fri, 25 Dec 2015 22:02:43 -0700

> Also, in the code we submitted, there was an optimization in which
> %asi is read, and then only set to ASI_AIUS if necessary. This
> drastically reduces the number of writes to the %asi register since
> most of the time %asi will contain ASI_AIUS. This seems like a
> reasonable optimization, since this function may be called thousands
> of times per second on every cpu.

I noticed the optimization.

If this was happening for every memcpy call, I'd say it's worth it.

But it's happening once for a series of memcpy/copy_from_user_inatomic()
calls so I'd say it's not really worth it.

So unless you can show me how the current version fails, I'm keeping it
as-is because either we should consistently use set_fs/get_fs in C
code rather than open coded inline asm, or we should create a well
documented set of helper functions for this specific situation and
_ALSO_ use it elsewhere where the same problems exist such as some
of the uses of set_fs/get_fs in process_64.c

--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rob Gardner Dec. 26, 2015, 10:25 p.m. UTC | #7
On 12/25/2015 11:27 PM, David Miller wrote:
> From: Rob Gardner <rob.gardner@oracle.com>
> Date: Fri, 25 Dec 2015 22:02:43 -0700
>
>> Also, in the code we submitted, there was an optimization in which
>> %asi is read, and then only set to ASI_AIUS if necessary. This
>> drastically reduces the number of writes to the %asi register since
>> most of the time %asi will contain ASI_AIUS. This seems like a
>> reasonable optimization, since this function may be called thousands
>> of times per second on every cpu.
> I noticed the optimization.
>
> If this was happening for every memcpy call, I'd say it's worth it.
>
> But it's happening once for a series of memcpy/copy_from_user_inatomic()
> calls so I'd say it's not really worth it.
>
> So unless you can show me how the current version fails, I'm keeping it
> as-is because either we should consistently use set_fs/get_fs in C
> code rather than open coded inline asm, or we should create a well
> documented set of helper functions for this specific situation and
> _ALSO_ use it elsewhere where the same problems exist such as some
> of the uses of set_fs/get_fs in process_64.c
>

Fair enough. You've convinced me that my worries are unfounded. Let's 
consider the matter settled. Thanks.

Rob

--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/arch/sparc/kernel/perf_event.c b/arch/sparc/kernel/perf_event.c
index 689db65..55a6caa 100644
--- a/arch/sparc/kernel/perf_event.c
+++ b/arch/sparc/kernel/perf_event.c
@@ -1810,11 +1810,19 @@  static void perf_callchain_user_32(struct perf_callchain_entry *entry,
 void
 perf_callchain_user(struct perf_callchain_entry *entry, struct pt_regs *regs)
 {
+	u8  saved_asi;
+
 	perf_callchain_store(entry, regs->tpc);
 
 	if (!current->mm)
 		return;
 
+	/* ensure we get user memory when trying to read stacks */
+	__asm__ __volatile__ ("rd %%asi, %0\n" : "=r" (saved_asi));
+	if (saved_asi != ASI_AIUS)
+		__asm__ __volatile__ (
+			"wr %%g0, %0, %%asi\n" : : "i" (ASI_AIUS));
+
 	flushw_user();
 
 	pagefault_disable();
@@ -1825,4 +1833,8 @@  perf_callchain_user(struct perf_callchain_entry *entry, struct pt_regs *regs)
 		perf_callchain_user_64(entry, regs);
 
 	pagefault_enable();
+
+	if (saved_asi != ASI_AIUS)
+		__asm__ __volatile__ (
+			"wr %%g0, %0, %%asi\n" : : "r" (saved_asi));
 }