diff mbox series

random32: Use rcuidle variant for tracepoint

Message ID 20200821063043.1949509-1-elver@google.com
State Not Applicable
Delegated to: David Miller
Headers show
Series random32: Use rcuidle variant for tracepoint | expand

Commit Message

Marco Elver Aug. 21, 2020, 6:30 a.m. UTC
With KCSAN enabled, prandom_u32() may be called from any context,
including idle CPUs.

Therefore, switch to using trace_prandom_u32_rcuidle(), to avoid various
issues due to recursion and lockdep warnings when KCSAN and tracing is
enabled.

Fixes: 94c7eb54c4b8 ("random32: add a tracepoint for prandom_u32()")
Link: https://lkml.kernel.org/r/20200820155923.3d5c4873@oasis.local.home
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Marco Elver <elver@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
---
 lib/random32.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Marco Elver Aug. 21, 2020, 8:58 a.m. UTC | #1
On Fri, 21 Aug 2020 at 08:30, Marco Elver <elver@google.com> wrote:
> With KCSAN enabled, prandom_u32() may be called from any context,
> including idle CPUs.
>
> Therefore, switch to using trace_prandom_u32_rcuidle(), to avoid various
> issues due to recursion and lockdep warnings when KCSAN and tracing is
> enabled.
>
> Fixes: 94c7eb54c4b8 ("random32: add a tracepoint for prandom_u32()")
> Link: https://lkml.kernel.org/r/20200820155923.3d5c4873@oasis.local.home
> Suggested-by: Steven Rostedt <rostedt@goodmis.org>
> Signed-off-by: Marco Elver <elver@google.com>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> ---
>  lib/random32.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/lib/random32.c b/lib/random32.c
> index 932345323af0..1c5607a411d4 100644
> --- a/lib/random32.c
> +++ b/lib/random32.c
> @@ -83,7 +83,7 @@ u32 prandom_u32(void)
>         u32 res;
>
>         res = prandom_u32_state(state);
> -       trace_prandom_u32(res);
> +       trace_prandom_u32_rcuidle(res);
>         put_cpu_var(net_rand_state);
>

Note that, for KCSAN's use, there is still a small chance this will
mess things up. So, as a short-term fix, I'd appreciate if this patch
will be applied to 5.9, assuming it doesn't mess with the original
usecase.

Longer-term, I think I need to switch KCSAN to use either
prandom_u32_state() or open-code its own prandom function, as the
trace-rcuidle code calls into srcu, which might have other weird
effects if KCSAN instruments that. Still working on the longer-term
fix.

Thanks,
-- Marco
Peter Zijlstra Aug. 21, 2020, 8:59 a.m. UTC | #2
On Fri, Aug 21, 2020 at 08:30:43AM +0200, Marco Elver wrote:
> With KCSAN enabled, prandom_u32() may be called from any context,
> including idle CPUs.
> 
> Therefore, switch to using trace_prandom_u32_rcuidle(), to avoid various
> issues due to recursion and lockdep warnings when KCSAN and tracing is
> enabled.

At some point we're going to have to introduce noinstr to idle as well.
But until that time this should indeed cure things.
Eric Dumazet Aug. 21, 2020, 3:06 p.m. UTC | #3
On Fri, Aug 21, 2020 at 1:59 AM <peterz@infradead.org> wrote:
>
> On Fri, Aug 21, 2020 at 08:30:43AM +0200, Marco Elver wrote:
> > With KCSAN enabled, prandom_u32() may be called from any context,
> > including idle CPUs.
> >
> > Therefore, switch to using trace_prandom_u32_rcuidle(), to avoid various
> > issues due to recursion and lockdep warnings when KCSAN and tracing is
> > enabled.
>
> At some point we're going to have to introduce noinstr to idle as well.
> But until that time this should indeed cure things.

I do not understand what the issue is.  This _rcuidle() is kind of opaque ;)

Would this alternative patch work, or is it something more fundamental ?

Thanks !

diff --git a/lib/random32.c b/lib/random32.c
index 932345323af092a93fc2690b0ebbf4f7485ae4f3..17af2d1631e5ab6e02ad1e9288af7e007bed6d5f
100644
--- a/lib/random32.c
+++ b/lib/random32.c
@@ -83,9 +83,10 @@ u32 prandom_u32(void)
        u32 res;

        res = prandom_u32_state(state);
-       trace_prandom_u32(res);
        put_cpu_var(net_rand_state);

+       trace_prandom_u32(res);
+
        return res;
 }
 EXPORT_SYMBOL(prandom_u32);
Marco Elver Aug. 21, 2020, 3:35 p.m. UTC | #4
On Fri, Aug 21, 2020 at 08:06AM -0700, Eric Dumazet wrote:
> On Fri, Aug 21, 2020 at 1:59 AM <peterz@infradead.org> wrote:
> >
> > On Fri, Aug 21, 2020 at 08:30:43AM +0200, Marco Elver wrote:
> > > With KCSAN enabled, prandom_u32() may be called from any context,
> > > including idle CPUs.
> > >
> > > Therefore, switch to using trace_prandom_u32_rcuidle(), to avoid various
> > > issues due to recursion and lockdep warnings when KCSAN and tracing is
> > > enabled.
> >
> > At some point we're going to have to introduce noinstr to idle as well.
> > But until that time this should indeed cure things.
> 
> I do not understand what the issue is.  This _rcuidle() is kind of opaque ;)
>
> Would this alternative patch work, or is it something more fundamental ?

There are 2 problems:

1. Recursion due to ending up in lockdep from the tracepoint. I need to
solve this either way. One way is to use _rcuidle() variant, which
doesn't call into lockdep.

2. Somehow running into trouble because we use tracing from an idle CPU.
At least that's what I gathered from the documentation -- but you'd have
to wait for Peter or Steven to get a better explanation.

> Thanks !
> 
> diff --git a/lib/random32.c b/lib/random32.c
> index 932345323af092a93fc2690b0ebbf4f7485ae4f3..17af2d1631e5ab6e02ad1e9288af7e007bed6d5f
> 100644
> --- a/lib/random32.c
> +++ b/lib/random32.c
> @@ -83,9 +83,10 @@ u32 prandom_u32(void)
>         u32 res;
> 
>         res = prandom_u32_state(state);
> -       trace_prandom_u32(res);
>         put_cpu_var(net_rand_state);
> 
> +       trace_prandom_u32(res);
> +
>         return res;
>  }
>  EXPORT_SYMBOL(prandom_u32);

That unfortunately still gets me the same warning:

| ------------[ cut here ]------------
| DEBUG_LOCKS_WARN_ON(lockdep_hardirqs_enabled())
| WARNING: CPU: 4 PID: 1861 at kernel/locking/lockdep.c:4875 check_flags.part.0+0x157/0x160 kernel/locking/lockdep.c:4875
| Modules linked in:
| CPU: 4 PID: 1861 Comm: kworker/u16:4 Not tainted 5.9.0-rc1+ #24
| Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
| RIP: 0010:check_flags.part.0+0x157/0x160 kernel/locking/lockdep.c:4875
| Code: c0 0f 84 70 5d 00 00 44 8b 0d fd 11 5f 06 45 85 c9 0f 85 60 5d 00 00 48 c7 c6 3e d0 f4 86 48 c7 c7 b2 49 f3 86 e8 8d 49 f6 ff <0f> 0b e9 46 5d 00 00 66 90 41 57 41 56 49 89 fe 41 55 41 89 d5 41
| RSP: 0000:ffffc900034bfcb0 EFLAGS: 00010082
| RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff8136161c
| RDX: ffff88881a9dcb00 RSI: ffffffff81363835 RDI: 0000000000000006
| RBP: ffffc900034bfd00 R08: 0000000000000000 R09: 0000ffffffffffff
| R10: 0000000000000104 R11: 0000ffff874efd6b R12: ffffffff874f26c0
| R13: 0000000000000244 R14: 0000000000000000 R15: 0000000000000046
| FS:  0000000000000000(0000) GS:ffff88881fc00000(0000) knlGS:0000000000000000
| CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
| CR2: 0000000000000000 CR3: 0000000007489001 CR4: 0000000000770ee0
| DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
| DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
| PKRU: 55555554
| Call Trace:
|  check_flags kernel/locking/lockdep.c:4871 [inline]
|  lock_is_held_type+0x42/0x100 kernel/locking/lockdep.c:5042
|  lock_is_held include/linux/lockdep.h:267 [inline]
|  rcu_read_lock_sched_held+0x41/0x80 kernel/rcu/update.c:136
|  trace_prandom_u32 include/trace/events/random.h:310 [inline]
|  prandom_u32+0x1bb/0x200 lib/random32.c:86
|  prandom_u32_max include/linux/prandom.h:46 [inline]
|  reset_kcsan_skip kernel/kcsan/core.c:277 [inline]
|  kcsan_setup_watchpoint+0x9b/0x600 kernel/kcsan/core.c:424
|  perf_lock_task_context+0x5e3/0x6e0 kernel/events/core.c:1491
|  perf_pin_task_context kernel/events/core.c:1506 [inline]
|  perf_event_exit_task_context kernel/events/core.c:12284 [inline]
|  perf_event_exit_task+0x1e2/0x910 kernel/events/core.c:12364
|  do_exit+0x70e/0x18b0 kernel/exit.c:815
|  call_usermodehelper_exec_async+0x2e2/0x2f0 kernel/umh.c:114
|  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
| irq event stamp: 107
| hardirqs last  enabled at (107): [<ffffffff815532ab>] perf_lock_task_context+0x5db/0x6e0 kernel/events/core.c:1491
| hardirqs last disabled at (106): [<ffffffff81552f12>] perf_lock_task_context+0x242/0x6e0 kernel/events/core.c:1459
| softirqs last  enabled at (0): [<ffffffff8129b95e>] copy_process+0xe9e/0x3970 kernel/fork.c:2004
| softirqs last disabled at (0): [<0000000000000000>] 0x0
| ---[ end trace a3058d9b157af5c4 ]---
| possible reason: unannotated irqs-off.
| irq event stamp: 107
| hardirqs last  enabled at (107): [<ffffffff815532ab>] perf_lock_task_context+0x5db/0x6e0 kernel/events/core.c:1491
| hardirqs last disabled at (106): [<ffffffff81552f12>] perf_lock_task_context+0x242/0x6e0 kernel/events/core.c:1459
| softirqs last  enabled at (0): [<ffffffff8129b95e>] copy_process+0xe9e/0x3970 kernel/fork.c:2004
| softirqs last disabled at (0): [<0000000000000000>] 0x0

I also have a patch which avoids the problem entirely by not using
prandom_u32(): https://lkml.kernel.org/r/20200821123126.3121494-1-elver@google.com
But that patch will likely only make it into the next merge window
(because of other conflicts).

So, if the _rcuidle() variant here doesn't break your usecase, there
should be no harm in using the _rcuidle() variant. This also lifts the
restriction on where prandom_u32() is usable to what it was before,
which should be any context.

Steven, Peter: What's the downside to of _rcuidle()?

Thanks,
-- Marco
Steven Rostedt Aug. 21, 2020, 3:38 p.m. UTC | #5
On Fri, 21 Aug 2020 08:06:49 -0700
Eric Dumazet <edumazet@google.com> wrote:

> On Fri, Aug 21, 2020 at 1:59 AM <peterz@infradead.org> wrote:
> >
> > On Fri, Aug 21, 2020 at 08:30:43AM +0200, Marco Elver wrote:  
> > > With KCSAN enabled, prandom_u32() may be called from any context,
> > > including idle CPUs.
> > >
> > > Therefore, switch to using trace_prandom_u32_rcuidle(), to avoid various
> > > issues due to recursion and lockdep warnings when KCSAN and tracing is
> > > enabled.  
> >
> > At some point we're going to have to introduce noinstr to idle as well.
> > But until that time this should indeed cure things.  
> 
> I do not understand what the issue is.  This _rcuidle() is kind of opaque ;)

kasan can be called when RCU is not "watching". That is, in the idle
code, RCU will stop bothering idle CPUs by checking on them to move
along the grace period. Just before going to idle, RCU will just set
that its in a quiescent state. The issue is, after RCU has basically
shutdown, and before getting to where the CPU is "sleeping", kasan is
called, and kasan call a tracepoint. The problem is that tracepoints
are protected by RCU. If RCU has shutdown, then it loses the
protection. There's code to detect this and give a warning.

All tracepoints have a _rcuidle() version. What this does is adds a
little bit more overhead to the tracepoint when enabled to check if RCU
is watching or not. If it is not watching, it tells RCU to start
watching again while it runs the tracepoint, and afterward it lets RCU
know that it can go back to not watching.

> 
> Would this alternative patch work, or is it something more fundamental ?

As I hope the above explained. The answer is "no".

-- Steve

> 
> Thanks !
> 
> diff --git a/lib/random32.c b/lib/random32.c
> index 932345323af092a93fc2690b0ebbf4f7485ae4f3..17af2d1631e5ab6e02ad1e9288af7e007bed6d5f
> 100644
> --- a/lib/random32.c
> +++ b/lib/random32.c
> @@ -83,9 +83,10 @@ u32 prandom_u32(void)
>         u32 res;
> 
>         res = prandom_u32_state(state);
> -       trace_prandom_u32(res);
>         put_cpu_var(net_rand_state);
> 
> +       trace_prandom_u32(res);
> +
>         return res;
>  }
>  EXPORT_SYMBOL(prandom_u32);
Steven Rostedt Aug. 21, 2020, 3:41 p.m. UTC | #6
On Fri, 21 Aug 2020 11:38:31 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> > > At some point we're going to have to introduce noinstr to idle as well.
> > > But until that time this should indeed cure things.    
> > 

What the above means, is that ideally we will get rid of all
tracepoints and kasan checks from these RCU not watching locations. But
to do so, we need to move the RCU not watching as close as possible to
where it doesn't need to be watching, and that is not as trivial of a
task as one might think. Once we get to a minimal code path for RCU not
to be watching, it will become "noinstr" and tracing and "debugging"
will be disabled in these sections.

Peter, please correct the above if it's not accurate.

-- Steve
Peter Zijlstra Aug. 21, 2020, 6:38 p.m. UTC | #7
On Fri, Aug 21, 2020 at 11:41:41AM -0400, Steven Rostedt wrote:
> On Fri, 21 Aug 2020 11:38:31 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > > > At some point we're going to have to introduce noinstr to idle as well.
> > > > But until that time this should indeed cure things.    
> > > 
> 
> What the above means, is that ideally we will get rid of all
> tracepoints and kasan checks from these RCU not watching locations. But

s/and kasan//

We only need to get rid of explicit tracepoints -- typically just move
them outside of the rcu_idle_enter/exit section.

> to do so, we need to move the RCU not watching as close as possible to
> where it doesn't need to be watching, and that is not as trivial of a
> task as one might think.

My recent patch series got a fair way towards that, but yes, there's
more work to be done still.

> Once we get to a minimal code path for RCU not
> to be watching, it will become "noinstr" and tracing and "debugging"
> will be disabled in these sections.

Right, noinstr is like notrace on steriods, not only does it disallow
tracing, it will also disable all the various *SAN/KCOV instrumentation.
It also has objtool based validation which ensures noinstr code doesn't
call out to regular code.
Steven Rostedt Sept. 18, 2020, 1:21 a.m. UTC | #8
[ Late reply, due to Plumbers followed by a much needed vacation and
  then drowning in 3 weeks of unread email! ]

On Fri, 21 Aug 2020 17:35:32 +0200
Marco Elver <elver@google.com> wrote:

> So, if the _rcuidle() variant here doesn't break your usecase, there
> should be no harm in using the _rcuidle() variant. This also lifts the
> restriction on where prandom_u32() is usable to what it was before,
> which should be any context.
> 
> Steven, Peter: What's the downside to of _rcuidle()?

_rcuidle() only has a slightly more overhead in the tracing path (it's
no different when not tracing). There's not a issue with _rcuidle()
itself. The issue is that we need to have it. We'd like it to be that
rcu *is* watching always except for a very minimal locations when
switching context (kernel to and from user and running to and from
idle), and then we just don't let tracing or anything that needs rcu in
those locations.

But for your patch:

Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

-- Steve
diff mbox series

Patch

diff --git a/lib/random32.c b/lib/random32.c
index 932345323af0..1c5607a411d4 100644
--- a/lib/random32.c
+++ b/lib/random32.c
@@ -83,7 +83,7 @@  u32 prandom_u32(void)
 	u32 res;
 
 	res = prandom_u32_state(state);
-	trace_prandom_u32(res);
+	trace_prandom_u32_rcuidle(res);
 	put_cpu_var(net_rand_state);
 
 	return res;