Message ID | 20200309225702.63695-1-maheshb@google.com |
---|---|
State | Accepted |
Delegated to: | David Ahern |
Headers | show |
Series | [net] ipvlan: add cond_resched_rcu() while processing muticast backlog | expand |
From: Mahesh Bandewar <maheshb@google.com> Date: Mon, 9 Mar 2020 15:57:02 -0700 > If there are substantial number of slaves created as simulated by > Syzbot, the backlog processing could take much longer and result > into the issue found in the Syzbot report. > > INFO: rcu_sched detected stalls on CPUs/tasks: > (detected by 1, t=10502 jiffies, g=5049, c=5048, q=752) ... > > Fixes: ba35f8588f47 (“ipvlan: Defer multicast / broadcast processing to a work-queue”) > Signed-off-by: Mahesh Bandewar <maheshb@google.com> > Reported-by: syzbot <syzkaller@googlegroups.com> Applied and queued up for -stable.
On 3/9/20 3:57 PM, Mahesh Bandewar wrote: > If there are substantial number of slaves created as simulated by > Syzbot, the backlog processing could take much longer and result > into the issue found in the Syzbot report. > ... > > Fixes: ba35f8588f47 (“ipvlan: Defer multicast / broadcast processing to a work-queue”) > Signed-off-by: Mahesh Bandewar <maheshb@google.com> > Reported-by: syzbot <syzkaller@googlegroups.com> > --- > drivers/net/ipvlan/ipvlan_core.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/net/ipvlan/ipvlan_core.c b/drivers/net/ipvlan/ipvlan_core.c > index 53dac397db37..5759e91dec71 100644 > --- a/drivers/net/ipvlan/ipvlan_core.c > +++ b/drivers/net/ipvlan/ipvlan_core.c > @@ -277,6 +277,7 @@ void ipvlan_process_multicast(struct work_struct *work) > } > ipvlan_count_rx(ipvlan, len, ret == NET_RX_SUCCESS, true); > local_bh_enable(); > + cond_resched_rcu(); This does not work : If you release rcu_read_lock() here, then the surrounding loop can not be continued without risking use-after-free rcu_read_lock(); list_for_each_entry_rcu(ipvlan, &port->ipvlans, pnode) { ... cond_resched_rcu(); // after this point bad things can happen } You probably should do instead : diff --git a/drivers/net/ipvlan/ipvlan_core.c b/drivers/net/ipvlan/ipvlan_core.c index 30cd0c4f0be0b4d1dea2c0a4d68d0e33d1931ebc..57617ff5565fb87035c13dcf1de9fa5431d04e10 100644 --- a/drivers/net/ipvlan/ipvlan_core.c +++ b/drivers/net/ipvlan/ipvlan_core.c @@ -293,6 +293,7 @@ void ipvlan_process_multicast(struct work_struct *work) } if (dev) dev_put(dev); + cond_resched(); } }
On 3/9/20 6:01 PM, David Miller wrote: > From: Mahesh Bandewar <maheshb@google.com> > Date: Mon, 9 Mar 2020 15:57:02 -0700 > >> If there are substantial number of slaves created as simulated by >> Syzbot, the backlog processing could take much longer and result >> into the issue found in the Syzbot report. >> >> INFO: rcu_sched detected stalls on CPUs/tasks: >> (detected by 1, t=10502 jiffies, g=5049, c=5048, q=752) > ... >> >> Fixes: ba35f8588f47 (“ipvlan: Defer multicast / broadcast processing to a work-queue”) >> Signed-off-by: Mahesh Bandewar <maheshb@google.com> >> Reported-by: syzbot <syzkaller@googlegroups.com> > > Applied and queued up for -stable. Ok, I will submit a fix on top of it then ;) Thanks.
On Mon, Mar 9, 2020 at 6:07 PM Eric Dumazet <eric.dumazet@gmail.com> wrote: > > > > On 3/9/20 3:57 PM, Mahesh Bandewar wrote: > > If there are substantial number of slaves created as simulated by > > Syzbot, the backlog processing could take much longer and result > > into the issue found in the Syzbot report. > > > > ... > > > > > Fixes: ba35f8588f47 (“ipvlan: Defer multicast / broadcast processing to a work-queue”) > > Signed-off-by: Mahesh Bandewar <maheshb@google.com> > > Reported-by: syzbot <syzkaller@googlegroups.com> > > --- > > drivers/net/ipvlan/ipvlan_core.c | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/drivers/net/ipvlan/ipvlan_core.c b/drivers/net/ipvlan/ipvlan_core.c > > index 53dac397db37..5759e91dec71 100644 > > --- a/drivers/net/ipvlan/ipvlan_core.c > > +++ b/drivers/net/ipvlan/ipvlan_core.c > > @@ -277,6 +277,7 @@ void ipvlan_process_multicast(struct work_struct *work) > > } > > ipvlan_count_rx(ipvlan, len, ret == NET_RX_SUCCESS, true); > > local_bh_enable(); > > + cond_resched_rcu(); > > This does not work : If you release rcu_read_lock() here, > then the surrounding loop can not be continued without risking use-after-free > .. but cond_resched_rcu() is nothing but rcu_read_unlock(); cond_resched(); rcu_read_lock(); isn't that sufficient? > rcu_read_lock(); > list_for_each_entry_rcu(ipvlan, &port->ipvlans, pnode) { > ... > cond_resched_rcu(); > // after this point bad things can happen > } > > > You probably should do instead : > > diff --git a/drivers/net/ipvlan/ipvlan_core.c b/drivers/net/ipvlan/ipvlan_core.c > index 30cd0c4f0be0b4d1dea2c0a4d68d0e33d1931ebc..57617ff5565fb87035c13dcf1de9fa5431d04e10 100644 > --- a/drivers/net/ipvlan/ipvlan_core.c > +++ b/drivers/net/ipvlan/ipvlan_core.c > @@ -293,6 +293,7 @@ void ipvlan_process_multicast(struct work_struct *work) > } > if (dev) > dev_put(dev); > + cond_resched(); > } reason this may not work is because the inner loop is for slaves for a single packet and if there are 1k slaves, then skb_clone() will be called 1k times before doing cond_reched() and the problem may not even get mitigated. > } >
On 3/9/20 7:21 PM, Mahesh Bandewar (महेश बंडेवार) wrote: > On Mon, Mar 9, 2020 at 6:07 PM Eric Dumazet <eric.dumazet@gmail.com> wrote: >> >> >> >> On 3/9/20 3:57 PM, Mahesh Bandewar wrote: >>> If there are substantial number of slaves created as simulated by >>> Syzbot, the backlog processing could take much longer and result >>> into the issue found in the Syzbot report. >>> >> >> ... >> >>> >>> Fixes: ba35f8588f47 (“ipvlan: Defer multicast / broadcast processing to a work-queue”) >>> Signed-off-by: Mahesh Bandewar <maheshb@google.com> >>> Reported-by: syzbot <syzkaller@googlegroups.com> >>> --- >>> drivers/net/ipvlan/ipvlan_core.c | 1 + >>> 1 file changed, 1 insertion(+) >>> >>> diff --git a/drivers/net/ipvlan/ipvlan_core.c b/drivers/net/ipvlan/ipvlan_core.c >>> index 53dac397db37..5759e91dec71 100644 >>> --- a/drivers/net/ipvlan/ipvlan_core.c >>> +++ b/drivers/net/ipvlan/ipvlan_core.c >>> @@ -277,6 +277,7 @@ void ipvlan_process_multicast(struct work_struct *work) >>> } >>> ipvlan_count_rx(ipvlan, len, ret == NET_RX_SUCCESS, true); >>> local_bh_enable(); >>> + cond_resched_rcu(); >> >> This does not work : If you release rcu_read_lock() here, >> then the surrounding loop can not be continued without risking use-after-free >> > .. but cond_resched_rcu() is nothing but > rcu_read_unlock(); cond_resched(); rcu_read_lock(); > > isn't that sufficient? It is buggy. Think about iterating a list with a spinlock protection. Then in the middle of the loop, releasing the spinlock and re-acquiring it. The cursor in the loop might point to freed memory. Same for rcu really. > >> rcu_read_lock(); >> list_for_each_entry_rcu(ipvlan, &port->ipvlans, pnode) { >> ... >> cond_resched_rcu(); >> // after this point bad things can happen >> } >> >> >> You probably should do instead : >> >> diff --git a/drivers/net/ipvlan/ipvlan_core.c b/drivers/net/ipvlan/ipvlan_core.c >> index 30cd0c4f0be0b4d1dea2c0a4d68d0e33d1931ebc..57617ff5565fb87035c13dcf1de9fa5431d04e10 100644 >> --- a/drivers/net/ipvlan/ipvlan_core.c >> +++ b/drivers/net/ipvlan/ipvlan_core.c >> @@ -293,6 +293,7 @@ void ipvlan_process_multicast(struct work_struct *work) >> } >> if (dev) >> dev_put(dev); >> + cond_resched(); >> } > > reason this may not work is because the inner loop is for slaves for a > single packet and if there are 1k slaves, then skb_clone() will be > called 1k times before doing cond_reched() and the problem may not > even get mitigated. The problem that syzbot found is that queuing IPVLAN_QBACKLOG_LIMIT (1000) packets on the backlog could force the ipvlan_process_multicast() worker to process 1000 packets. Multiply this by the number of slaves, say 1000 -> 1,000,000 skbs clones. After the patch, we divide by 1000 the time taken in one invocation, that should be just good enough. You do not need to schedule after _each_ clone. Think about netdev_max_backlog which is set to 1000 : we believe it is fine to process 1000 packets per round.
diff --git a/drivers/net/ipvlan/ipvlan_core.c b/drivers/net/ipvlan/ipvlan_core.c index 53dac397db37..5759e91dec71 100644 --- a/drivers/net/ipvlan/ipvlan_core.c +++ b/drivers/net/ipvlan/ipvlan_core.c @@ -277,6 +277,7 @@ void ipvlan_process_multicast(struct work_struct *work) } ipvlan_count_rx(ipvlan, len, ret == NET_RX_SUCCESS, true); local_bh_enable(); + cond_resched_rcu(); } rcu_read_unlock();
If there are substantial number of slaves created as simulated by Syzbot, the backlog processing could take much longer and result into the issue found in the Syzbot report. INFO: rcu_sched detected stalls on CPUs/tasks: (detected by 1, t=10502 jiffies, g=5049, c=5048, q=752) All QSes seen, last rcu_sched kthread activity 10502 (4294965563-4294955061), jiffies_till_next_fqs=1, root ->qsmask 0x0 syz-executor.1 R running task on cpu 1 10984 11210 3866 0x30020008 179034491270 Call Trace: <IRQ> [<ffffffff81497163>] _sched_show_task kernel/sched/core.c:8063 [inline] [<ffffffff81497163>] _sched_show_task.cold+0x2fd/0x392 kernel/sched/core.c:8030 [<ffffffff8146a91b>] sched_show_task+0xb/0x10 kernel/sched/core.c:8073 [<ffffffff815c931b>] print_other_cpu_stall kernel/rcu/tree.c:1577 [inline] [<ffffffff815c931b>] check_cpu_stall kernel/rcu/tree.c:1695 [inline] [<ffffffff815c931b>] __rcu_pending kernel/rcu/tree.c:3478 [inline] [<ffffffff815c931b>] rcu_pending kernel/rcu/tree.c:3540 [inline] [<ffffffff815c931b>] rcu_check_callbacks.cold+0xbb4/0xc29 kernel/rcu/tree.c:2876 [<ffffffff815e3962>] update_process_times+0x32/0x80 kernel/time/timer.c:1635 [<ffffffff816164f0>] tick_sched_handle+0xa0/0x180 kernel/time/tick-sched.c:161 [<ffffffff81616ae4>] tick_sched_timer+0x44/0x130 kernel/time/tick-sched.c:1193 [<ffffffff815e75f7>] __run_hrtimer kernel/time/hrtimer.c:1393 [inline] [<ffffffff815e75f7>] __hrtimer_run_queues+0x307/0xd90 kernel/time/hrtimer.c:1455 [<ffffffff815e90ea>] hrtimer_interrupt+0x2ea/0x730 kernel/time/hrtimer.c:1513 [<ffffffff844050f4>] local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1031 [inline] [<ffffffff844050f4>] smp_apic_timer_interrupt+0x144/0x5e0 arch/x86/kernel/apic/apic.c:1056 [<ffffffff84401cbe>] apic_timer_interrupt+0x8e/0xa0 arch/x86/entry/entry_64.S:778 RIP: 0010:do_raw_read_lock+0x22/0x80 kernel/locking/spinlock_debug.c:153 RSP: 0018:ffff8801dad07ab8 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff12 RAX: 0000000000000000 RBX: ffff8801c4135680 RCX: 0000000000000000 RDX: 1ffff10038826afe RSI: ffff88019d816bb8 RDI: ffff8801c41357f0 RBP: ffff8801dad07ac0 R08: 0000000000004b15 R09: 0000000000310273 R10: ffff88019d816bb8 R11: 0000000000000001 R12: ffff8801c41357e8 R13: 0000000000000000 R14: ffff8801cfb19850 R15: ffff8801cfb198b0 [<ffffffff8101460e>] __raw_read_lock_bh include/linux/rwlock_api_smp.h:177 [inline] [<ffffffff8101460e>] _raw_read_lock_bh+0x3e/0x50 kernel/locking/spinlock.c:240 [<ffffffff840d78ca>] ipv6_chk_mcast_addr+0x11a/0x6f0 net/ipv6/mcast.c:1006 [<ffffffff84023439>] ip6_mc_input+0x319/0x8e0 net/ipv6/ip6_input.c:482 [<ffffffff840211c8>] dst_input include/net/dst.h:449 [inline] [<ffffffff840211c8>] ip6_rcv_finish+0x408/0x610 net/ipv6/ip6_input.c:78 [<ffffffff840214de>] NF_HOOK include/linux/netfilter.h:292 [inline] [<ffffffff840214de>] NF_HOOK include/linux/netfilter.h:286 [inline] [<ffffffff840214de>] ipv6_rcv+0x10e/0x420 net/ipv6/ip6_input.c:278 [<ffffffff83a29efa>] __netif_receive_skb_one_core+0x12a/0x1f0 net/core/dev.c:5303 [<ffffffff83a2a15c>] __netif_receive_skb+0x2c/0x1b0 net/core/dev.c:5417 [<ffffffff83a2f536>] process_backlog+0x216/0x6c0 net/core/dev.c:6243 [<ffffffff83a30d1b>] napi_poll net/core/dev.c:6680 [inline] [<ffffffff83a30d1b>] net_rx_action+0x47b/0xfb0 net/core/dev.c:6748 [<ffffffff846002c8>] __do_softirq+0x2c8/0x99a kernel/softirq.c:317 [<ffffffff813e656a>] invoke_softirq kernel/softirq.c:399 [inline] [<ffffffff813e656a>] irq_exit+0x16a/0x1a0 kernel/softirq.c:439 [<ffffffff84405115>] exiting_irq arch/x86/include/asm/apic.h:561 [inline] [<ffffffff84405115>] smp_apic_timer_interrupt+0x165/0x5e0 arch/x86/kernel/apic/apic.c:1058 [<ffffffff84401cbe>] apic_timer_interrupt+0x8e/0xa0 arch/x86/entry/entry_64.S:778 </IRQ> RIP: 0010:__sanitizer_cov_trace_pc+0x26/0x50 kernel/kcov.c:102 RSP: 0018:ffff880196033bd8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff12 RAX: ffff88019d8161c0 RBX: 00000000ffffffff RCX: ffffc90003501000 RDX: 0000000000000002 RSI: ffffffff816236d1 RDI: 0000000000000005 RBP: ffff880196033bd8 R08: ffff88019d8161c0 R09: 0000000000000000 R10: 1ffff10032c067f0 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000080 R14: 0000000000000000 R15: 0000000000000000 [<ffffffff816236d1>] do_futex+0x151/0x1d50 kernel/futex.c:3548 [<ffffffff816260f0>] C_SYSC_futex kernel/futex_compat.c:201 [inline] [<ffffffff816260f0>] compat_SyS_futex+0x270/0x3b0 kernel/futex_compat.c:175 [<ffffffff8101da17>] do_syscall_32_irqs_on arch/x86/entry/common.c:353 [inline] [<ffffffff8101da17>] do_fast_syscall_32+0x357/0xe1c arch/x86/entry/common.c:415 [<ffffffff84401a9b>] entry_SYSENTER_compat+0x8b/0x9d arch/x86/entry/entry_64_compat.S:139 RIP: 0023:0xf7f23c69 RSP: 002b:00000000f5d1f12c EFLAGS: 00000282 ORIG_RAX: 00000000000000f0 RAX: ffffffffffffffda RBX: 000000000816af88 RCX: 0000000000000080 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000816af8c RBP: 00000000f5d1f228 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 rcu_sched kthread starved for 10502 jiffies! g5049 c5048 f0x2 RCU_GP_WAIT_FQS(3) ->state=0x0 ->cpu=1 rcu_sched R running task on cpu 1 13048 8 2 0x90000000 179099587640 Call Trace: [<ffffffff8147321f>] context_switch+0x60f/0xa60 kernel/sched/core.c:3209 [<ffffffff8100095a>] __schedule+0x5aa/0x1da0 kernel/sched/core.c:3934 [<ffffffff810021df>] schedule+0x8f/0x1b0 kernel/sched/core.c:4011 [<ffffffff8101116d>] schedule_timeout+0x50d/0xee0 kernel/time/timer.c:1803 [<ffffffff815c13f1>] rcu_gp_kthread+0xda1/0x3b50 kernel/rcu/tree.c:2327 [<ffffffff8144b318>] kthread+0x348/0x420 kernel/kthread.c:246 [<ffffffff84400266>] ret_from_fork+0x56/0x70 arch/x86/entry/entry_64.S:393 Fixes: ba35f8588f47 (“ipvlan: Defer multicast / broadcast processing to a work-queue”) Signed-off-by: Mahesh Bandewar <maheshb@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> --- drivers/net/ipvlan/ipvlan_core.c | 1 + 1 file changed, 1 insertion(+)