Message ID | 20201117211836.54acaef2@oasis.local.home |
---|---|
State | Superseded |
Headers | show |
Series | [v2] tracepoint: Do not fail unregistering a probe due to memory allocation | expand |
On Tue, Nov 17, 2020 at 6:18 PM Steven Rostedt <rostedt@goodmis.org> wrote: > > From: "Steven Rostedt (VMware)" <rostedt@goodmis.org> > > The list of tracepoint callbacks is managed by an array that is protected > by RCU. To update this array, a new array is allocated, the updates are > copied over to the new array, and then the list of functions for the > tracepoint is switched over to the new array. After a completion of an RCU > grace period, the old array is freed. > > This process happens for both adding a callback as well as removing one. > But on removing a callback, if the new array fails to be allocated, the > callback is not removed, and may be used after it is freed by the clients > of the tracepoint. > > There's really no reason to fail if the allocation for a new array fails > when removing a function. Instead, the function can simply be replaced by a > stub that will be ignored in the callback loop, and it will be cleaned up > on the next modification of the array. > > Link: https://lore.kernel.org/r/20201115055256.65625-1-mmullins@mmlx.us > Link: https://lkml.kernel.org/r/20201116175107.02db396d@gandalf.local.home > > Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> > Cc: Ingo Molnar <mingo@redhat.com> > Cc: Alexei Starovoitov <ast@kernel.org> > Cc: Daniel Borkmann <daniel@iogearbox.net> > Cc: Dmitry Vyukov <dvyukov@google.com> > Cc: Martin KaFai Lau <kafai@fb.com> > Cc: Song Liu <songliubraving@fb.com> > Cc: Yonghong Song <yhs@fb.com> > Cc: Andrii Nakryiko <andriin@fb.com> > Cc: John Fastabend <john.fastabend@gmail.com> > Cc: KP Singh <kpsingh@chromium.org> > Cc: netdev <netdev@vger.kernel.org> > Cc: bpf <bpf@vger.kernel.org> > Cc: Kees Cook <keescook@chromium.org> > Cc: stable@vger.kernel.org > Fixes: 97e1c18e8d17b ("tracing: Kernel Tracepoints") > Reported-by: syzbot+83aa762ef23b6f0d1991@syzkaller.appspotmail.com > Reported-by: syzbot+d29e58bb557324e55e5e@syzkaller.appspotmail.com > Reported-by: Matt Mullins <mmullins@mmlx.us> > Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> > --- > Changes since v1: > Use 1L value for stub function, and ignore calling it. > > include/linux/tracepoint.h | 9 ++++- > kernel/tracepoint.c | 80 +++++++++++++++++++++++++++++--------- > 2 files changed, 69 insertions(+), 20 deletions(-) > > diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h > index 0f21617f1a66..2e06e05b9d2a 100644 > --- a/include/linux/tracepoint.h > +++ b/include/linux/tracepoint.h > @@ -33,6 +33,8 @@ struct trace_eval_map { > > #define TRACEPOINT_DEFAULT_PRIO 10 > > +#define TRACEPOINT_STUB ((void *)0x1L) > + > extern struct srcu_struct tracepoint_srcu; > > extern int > @@ -310,7 +312,12 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p) > do { \ > it_func = (it_func_ptr)->func; \ > __data = (it_func_ptr)->data; \ > - ((void(*)(void *, proto))(it_func))(__data, args); \ > + /* \ > + * Removed functions that couldn't be allocated \ > + * are replaced with TRACEPOINT_STUB. \ > + */ \ > + if (likely(it_func != TRACEPOINT_STUB)) \ > + ((void(*)(void *, proto))(it_func))(__data, args); \ I think you're overreacting to the problem. Adding run-time check to extremely unlikely problem seems wasteful. 99.9% of the time allocate_probes() will do kmalloc from slab of small objects. If that slab is out of memory it means it cannot allocate a single page. In such case so many things will be failing to alloc that system is unlikely operational. oom should have triggered long ago. Imo Matt's approach to add __GFP_NOFAIL to allocate_probes() when it's called from func_remove() is much better. The error was reported by syzbot that was using memory fault injections. ENOMEM in allocate_probes() was never seen in real life and highly unlikely will ever be seen.
On Tue, 17 Nov 2020 20:54:24 -0800 Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > extern int > > @@ -310,7 +312,12 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p) > > do { \ > > it_func = (it_func_ptr)->func; \ > > __data = (it_func_ptr)->data; \ > > - ((void(*)(void *, proto))(it_func))(__data, args); \ > > + /* \ > > + * Removed functions that couldn't be allocated \ > > + * are replaced with TRACEPOINT_STUB. \ > > + */ \ > > + if (likely(it_func != TRACEPOINT_STUB)) \ > > + ((void(*)(void *, proto))(it_func))(__data, args); \ > > I think you're overreacting to the problem. I will disagree with that. > Adding run-time check to extremely unlikely problem seems wasteful. Show me a real benchmark that you can notice a problem here. I bet that check is even within the noise of calling an indirect function. Especially on a machine with retpolines. > 99.9% of the time allocate_probes() will do kmalloc from slab of small > objects. > If that slab is out of memory it means it cannot allocate a single page. > In such case so many things will be failing to alloc that system > is unlikely operational. oom should have triggered long ago. > Imo Matt's approach to add __GFP_NOFAIL to allocate_probes() Looking at the GFP_NOFAIL comment: * %__GFP_NOFAIL: The VM implementation _must_ retry infinitely: the caller * cannot handle allocation failures. The allocation could block * indefinitely but will never return with failure. Testing for * failure is pointless. * New users should be evaluated carefully (and the flag should be * used only when there is no reasonable failure policy) but it is * definitely preferable to use the flag rather than opencode endless * loop around allocator. I realized I made a mistake in my patch for using it, as my patch is a failure policy. It looks like something we want to avoid in general. Thanks, I'll send a v3 that removes it. > when it's called from func_remove() is much better. > The error was reported by syzbot that was using > memory fault injections. ENOMEM in allocate_probes() was > never seen in real life and highly unlikely will ever be seen. And the biggest thing you are missing here, is that if you are running on a machine that has static calls, this code is never hit unless you have more than one callback on a single tracepoint. That's because when there's only one callback, it gets called directly, and this loop is not involved. -- Steve
diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h index 0f21617f1a66..2e06e05b9d2a 100644 --- a/include/linux/tracepoint.h +++ b/include/linux/tracepoint.h @@ -33,6 +33,8 @@ struct trace_eval_map { #define TRACEPOINT_DEFAULT_PRIO 10 +#define TRACEPOINT_STUB ((void *)0x1L) + extern struct srcu_struct tracepoint_srcu; extern int @@ -310,7 +312,12 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p) do { \ it_func = (it_func_ptr)->func; \ __data = (it_func_ptr)->data; \ - ((void(*)(void *, proto))(it_func))(__data, args); \ + /* \ + * Removed functions that couldn't be allocated \ + * are replaced with TRACEPOINT_STUB. \ + */ \ + if (likely(it_func != TRACEPOINT_STUB)) \ + ((void(*)(void *, proto))(it_func))(__data, args); \ } while ((++it_func_ptr)->func); \ return 0; \ } \ diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c index 3f659f855074..20ce78b3f578 100644 --- a/kernel/tracepoint.c +++ b/kernel/tracepoint.c @@ -53,10 +53,10 @@ struct tp_probes { struct tracepoint_func probes[]; }; -static inline void *allocate_probes(int count) +static inline void *allocate_probes(int count, gfp_t extra_flags) { struct tp_probes *p = kmalloc(struct_size(p, probes, count), - GFP_KERNEL); + GFP_KERNEL | extra_flags); return p == NULL ? NULL : p->probes; } @@ -131,6 +131,7 @@ func_add(struct tracepoint_func **funcs, struct tracepoint_func *tp_func, { struct tracepoint_func *old, *new; int nr_probes = 0; + int stub_funcs = 0; int pos = -1; if (WARN_ON(!tp_func->func)) @@ -147,14 +148,34 @@ func_add(struct tracepoint_func **funcs, struct tracepoint_func *tp_func, if (old[nr_probes].func == tp_func->func && old[nr_probes].data == tp_func->data) return ERR_PTR(-EEXIST); + if (old[nr_probes].func == TRACEPOINT_STUB) + stub_funcs++; } } - /* + 2 : one for new probe, one for NULL func */ - new = allocate_probes(nr_probes + 2); + /* + 2 : one for new probe, one for NULL func - stub functions */ + new = allocate_probes(nr_probes + 2 - stub_funcs, 0); if (new == NULL) return ERR_PTR(-ENOMEM); if (old) { - if (pos < 0) { + if (stub_funcs) { + /* Need to copy one at a time to remove stubs */ + int probes = 0; + + pos = -1; + for (nr_probes = 0; old[nr_probes].func; nr_probes++) { + if (old[nr_probes].func == TRACEPOINT_STUB) + continue; + if (pos < 0 && old[nr_probes].prio < prio) + pos = probes++; + new[probes++] = old[nr_probes]; + } + nr_probes = probes; + if (pos < 0) + pos = probes; + else + nr_probes--; /* Account for insertion */ + + } else if (pos < 0) { pos = nr_probes; memcpy(new, old, nr_probes * sizeof(struct tracepoint_func)); } else { @@ -188,8 +209,9 @@ static void *func_remove(struct tracepoint_func **funcs, /* (N -> M), (N > 1, M >= 0) probes */ if (tp_func->func) { for (nr_probes = 0; old[nr_probes].func; nr_probes++) { - if (old[nr_probes].func == tp_func->func && - old[nr_probes].data == tp_func->data) + if ((old[nr_probes].func == tp_func->func && + old[nr_probes].data == tp_func->data) || + old[nr_probes].func == TRACEPOINT_STUB) nr_del++; } } @@ -207,15 +229,33 @@ static void *func_remove(struct tracepoint_func **funcs, int j = 0; /* N -> M, (N > 1, M > 0) */ /* + 1 for NULL */ - new = allocate_probes(nr_probes - nr_del + 1); - if (new == NULL) - return ERR_PTR(-ENOMEM); - for (i = 0; old[i].func; i++) - if (old[i].func != tp_func->func - || old[i].data != tp_func->data) - new[j++] = old[i]; - new[nr_probes - nr_del].func = NULL; - *funcs = new; + new = allocate_probes(nr_probes - nr_del + 1, __GFP_NOFAIL); + if (new) { + for (i = 0; old[i].func; i++) + if ((old[i].func != tp_func->func + || old[i].data != tp_func->data) + && old[i].func != TRACEPOINT_STUB) + new[j++] = old[i]; + new[nr_probes - nr_del].func = NULL; + *funcs = new; + } else { + /* + * Failed to allocate, replace the old function + * with TRACEPOINT_STUB. + */ + for (i = 0; old[i].func; i++) + if (old[i].func == tp_func->func && + old[i].data == tp_func->data) { + old[i].func = TRACEPOINT_STUB; + /* Set the prio to the next event. */ + if (old[i + 1].func) + old[i].prio = + old[i + 1].prio; + else + old[i].prio = -1; + } + *funcs = old; + } } debug_print_probes(*funcs); return old; @@ -295,10 +335,12 @@ static int tracepoint_remove_func(struct tracepoint *tp, tp_funcs = rcu_dereference_protected(tp->funcs, lockdep_is_held(&tracepoints_mutex)); old = func_remove(&tp_funcs, func); - if (IS_ERR(old)) { - WARN_ON_ONCE(PTR_ERR(old) != -ENOMEM); + if (WARN_ON_ONCE(IS_ERR(old))) return PTR_ERR(old); - } + + if (tp_funcs == old) + /* Failed allocating new tp_funcs, replaced func with stub */ + return 0; if (!tp_funcs) { /* Removed last function */