Message ID | e54f2aabf959f298939e5507b09c48f8c2e380be.1595170625.git.lorenzo@kernel.org |
---|---|
State | Accepted |
Delegated to: | BPF Maintainers |
Headers | show |
Series | [bpf-next] bpf: cpumap: fix possible rcpu kthread hung | expand |
On Sun, Jul 19, 2020 at 05:52 PM CEST, Lorenzo Bianconi wrote: > Fix the following cpumap kthread hung. The issue is currently occurring > when __cpu_map_load_bpf_program fails (e.g if the bpf prog has not > BPF_XDP_CPUMAP as expected_attach_type) > > $./test_progs -n 101 > 101/1 cpumap_with_progs:OK > 101 xdp_cpumap_attach:OK > Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED > [ 369.996478] INFO: task cpumap/0/map:7:205 blocked for more than 122 seconds. > [ 369.998463] Not tainted 5.8.0-rc4-01472-ge57892f50a07 #212 > [ 370.000102] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > [ 370.001918] cpumap/0/map:7 D 0 205 2 0x00004000 > [ 370.003228] Call Trace: > [ 370.003930] __schedule+0x5c7/0xf50 > [ 370.004901] ? io_schedule_timeout+0xb0/0xb0 > [ 370.005934] ? static_obj+0x31/0x80 > [ 370.006788] ? mark_held_locks+0x24/0x90 > [ 370.007752] ? cpu_map_bpf_prog_run_xdp+0x6c0/0x6c0 > [ 370.008930] schedule+0x6f/0x160 > [ 370.009728] schedule_preempt_disabled+0x14/0x20 > [ 370.010829] kthread+0x17b/0x240 > [ 370.011433] ? kthread_create_worker_on_cpu+0xd0/0xd0 > [ 370.011944] ret_from_fork+0x1f/0x30 > [ 370.012348] > Showing all locks held in the system: > [ 370.013025] 1 lock held by khungtaskd/33: > [ 370.013432] #0: ffffffff82b24720 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x28/0x1c3 > > [ 370.014461] ============================================= > > Fixes: 9216477449f3 ("bpf: cpumap: Add the possibility to attach an eBPF program to cpumap") > Reported-by: Jakub Sitnicki <jakub@cloudflare.com> > Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> > --- > kernel/bpf/cpumap.c | 11 +++++++---- > 1 file changed, 7 insertions(+), 4 deletions(-) > > diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c > index 4c95d0615ca2..f1c46529929b 100644 > --- a/kernel/bpf/cpumap.c > +++ b/kernel/bpf/cpumap.c > @@ -453,24 +453,27 @@ __cpu_map_entry_alloc(struct bpf_cpumap_val *value, u32 cpu, int map_id) > rcpu->map_id = map_id; > rcpu->value.qsize = value->qsize; > > + if (fd > 0 && __cpu_map_load_bpf_program(rcpu, fd)) > + goto free_ptr_ring; > + I realize it's a code move, but fd == 0 is a valid descriptor number. The check is too strict, IMHO. > /* Setup kthread */ > rcpu->kthread = kthread_create_on_node(cpu_map_kthread_run, rcpu, numa, > "cpumap/%d/map:%d", cpu, map_id); > if (IS_ERR(rcpu->kthread)) > - goto free_ptr_ring; > + goto free_prog; > > get_cpu_map_entry(rcpu); /* 1-refcnt for being in cmap->cpu_map[] */ > get_cpu_map_entry(rcpu); /* 1-refcnt for kthread */ > > - if (fd > 0 && __cpu_map_load_bpf_program(rcpu, fd)) > - goto free_ptr_ring; > - > /* Make sure kthread runs on a single CPU */ > kthread_bind(rcpu->kthread, cpu); > wake_up_process(rcpu->kthread); > > return rcpu; > > +free_prog: > + if (rcpu->prog) > + bpf_prog_put(rcpu->prog); > free_ptr_ring: > ptr_ring_cleanup(rcpu->queue, NULL); > free_queue: Hung task splat is gone: Tested-by: Jakub Sitnicki <jakub@cloudflare.com>
On 7/20/20 3:14 AM, Jakub Sitnicki wrote:
> I realize it's a code move, but fd == 0 is a valid descriptor number.
this follows the decision made for devmap entries in that fd == 0 is NOT
a valid program fd.
On Mon, Jul 20, 2020 at 05:45 PM CEST, David Ahern wrote: > On 7/20/20 3:14 AM, Jakub Sitnicki wrote: >> I realize it's a code move, but fd == 0 is a valid descriptor number. > > this follows the decision made for devmap entries in that fd == 0 is NOT > a valid program fd. Surprising. Thanks for clarifying.
On Sun, Jul 19, 2020 at 05:52 PM CEST, Lorenzo Bianconi wrote: > Fix the following cpumap kthread hung. The issue is currently occurring > when __cpu_map_load_bpf_program fails (e.g if the bpf prog has not > BPF_XDP_CPUMAP as expected_attach_type) > > $./test_progs -n 101 > 101/1 cpumap_with_progs:OK > 101 xdp_cpumap_attach:OK > Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED > [ 369.996478] INFO: task cpumap/0/map:7:205 blocked for more than 122 seconds. > [ 369.998463] Not tainted 5.8.0-rc4-01472-ge57892f50a07 #212 > [ 370.000102] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > [ 370.001918] cpumap/0/map:7 D 0 205 2 0x00004000 > [ 370.003228] Call Trace: > [ 370.003930] __schedule+0x5c7/0xf50 > [ 370.004901] ? io_schedule_timeout+0xb0/0xb0 > [ 370.005934] ? static_obj+0x31/0x80 > [ 370.006788] ? mark_held_locks+0x24/0x90 > [ 370.007752] ? cpu_map_bpf_prog_run_xdp+0x6c0/0x6c0 > [ 370.008930] schedule+0x6f/0x160 > [ 370.009728] schedule_preempt_disabled+0x14/0x20 > [ 370.010829] kthread+0x17b/0x240 > [ 370.011433] ? kthread_create_worker_on_cpu+0xd0/0xd0 > [ 370.011944] ret_from_fork+0x1f/0x30 > [ 370.012348] > Showing all locks held in the system: > [ 370.013025] 1 lock held by khungtaskd/33: > [ 370.013432] #0: ffffffff82b24720 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x28/0x1c3 > > [ 370.014461] ============================================= > > Fixes: 9216477449f3 ("bpf: cpumap: Add the possibility to attach an eBPF program to cpumap") > Reported-by: Jakub Sitnicki <jakub@cloudflare.com> > Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> > --- Tested-by: Jakub Sitnicki <jakub@cloudflare.com> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
On Tue, Jul 21, 2020 at 3:26 AM Jakub Sitnicki <jakub@cloudflare.com> wrote: > > On Sun, Jul 19, 2020 at 05:52 PM CEST, Lorenzo Bianconi wrote: > > Fix the following cpumap kthread hung. The issue is currently occurring > > when __cpu_map_load_bpf_program fails (e.g if the bpf prog has not > > BPF_XDP_CPUMAP as expected_attach_type) > > > > $./test_progs -n 101 > > 101/1 cpumap_with_progs:OK > > 101 xdp_cpumap_attach:OK > > Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED > > [ 369.996478] INFO: task cpumap/0/map:7:205 blocked for more than 122 seconds. > > [ 369.998463] Not tainted 5.8.0-rc4-01472-ge57892f50a07 #212 > > [ 370.000102] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > > [ 370.001918] cpumap/0/map:7 D 0 205 2 0x00004000 > > [ 370.003228] Call Trace: > > [ 370.003930] __schedule+0x5c7/0xf50 > > [ 370.004901] ? io_schedule_timeout+0xb0/0xb0 > > [ 370.005934] ? static_obj+0x31/0x80 > > [ 370.006788] ? mark_held_locks+0x24/0x90 > > [ 370.007752] ? cpu_map_bpf_prog_run_xdp+0x6c0/0x6c0 > > [ 370.008930] schedule+0x6f/0x160 > > [ 370.009728] schedule_preempt_disabled+0x14/0x20 > > [ 370.010829] kthread+0x17b/0x240 > > [ 370.011433] ? kthread_create_worker_on_cpu+0xd0/0xd0 > > [ 370.011944] ret_from_fork+0x1f/0x30 > > [ 370.012348] > > Showing all locks held in the system: > > [ 370.013025] 1 lock held by khungtaskd/33: > > [ 370.013432] #0: ffffffff82b24720 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x28/0x1c3 > > > > [ 370.014461] ============================================= > > > > Fixes: 9216477449f3 ("bpf: cpumap: Add the possibility to attach an eBPF program to cpumap") > > Reported-by: Jakub Sitnicki <jakub@cloudflare.com> > > Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> > > --- > > Tested-by: Jakub Sitnicki <jakub@cloudflare.com> > Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Applied. Thanks
diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c index 4c95d0615ca2..f1c46529929b 100644 --- a/kernel/bpf/cpumap.c +++ b/kernel/bpf/cpumap.c @@ -453,24 +453,27 @@ __cpu_map_entry_alloc(struct bpf_cpumap_val *value, u32 cpu, int map_id) rcpu->map_id = map_id; rcpu->value.qsize = value->qsize; + if (fd > 0 && __cpu_map_load_bpf_program(rcpu, fd)) + goto free_ptr_ring; + /* Setup kthread */ rcpu->kthread = kthread_create_on_node(cpu_map_kthread_run, rcpu, numa, "cpumap/%d/map:%d", cpu, map_id); if (IS_ERR(rcpu->kthread)) - goto free_ptr_ring; + goto free_prog; get_cpu_map_entry(rcpu); /* 1-refcnt for being in cmap->cpu_map[] */ get_cpu_map_entry(rcpu); /* 1-refcnt for kthread */ - if (fd > 0 && __cpu_map_load_bpf_program(rcpu, fd)) - goto free_ptr_ring; - /* Make sure kthread runs on a single CPU */ kthread_bind(rcpu->kthread, cpu); wake_up_process(rcpu->kthread); return rcpu; +free_prog: + if (rcpu->prog) + bpf_prog_put(rcpu->prog); free_ptr_ring: ptr_ring_cleanup(rcpu->queue, NULL); free_queue:
Fix the following cpumap kthread hung. The issue is currently occurring when __cpu_map_load_bpf_program fails (e.g if the bpf prog has not BPF_XDP_CPUMAP as expected_attach_type) $./test_progs -n 101 101/1 cpumap_with_progs:OK 101 xdp_cpumap_attach:OK Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED [ 369.996478] INFO: task cpumap/0/map:7:205 blocked for more than 122 seconds. [ 369.998463] Not tainted 5.8.0-rc4-01472-ge57892f50a07 #212 [ 370.000102] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 370.001918] cpumap/0/map:7 D 0 205 2 0x00004000 [ 370.003228] Call Trace: [ 370.003930] __schedule+0x5c7/0xf50 [ 370.004901] ? io_schedule_timeout+0xb0/0xb0 [ 370.005934] ? static_obj+0x31/0x80 [ 370.006788] ? mark_held_locks+0x24/0x90 [ 370.007752] ? cpu_map_bpf_prog_run_xdp+0x6c0/0x6c0 [ 370.008930] schedule+0x6f/0x160 [ 370.009728] schedule_preempt_disabled+0x14/0x20 [ 370.010829] kthread+0x17b/0x240 [ 370.011433] ? kthread_create_worker_on_cpu+0xd0/0xd0 [ 370.011944] ret_from_fork+0x1f/0x30 [ 370.012348] Showing all locks held in the system: [ 370.013025] 1 lock held by khungtaskd/33: [ 370.013432] #0: ffffffff82b24720 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x28/0x1c3 [ 370.014461] ============================================= Fixes: 9216477449f3 ("bpf: cpumap: Add the possibility to attach an eBPF program to cpumap") Reported-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> --- kernel/bpf/cpumap.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-)