Message ID | 20180928202608.uycdlytob75iphfu@codemonkey.org.uk |
---|---|
State | Accepted, archived |
Delegated to: | David Miller |
Headers | show |
Series | bond: take rcu lock in netpoll_send_skb_on_dev | expand |
From: Dave Jones <davej@codemonkey.org.uk> Date: Fri, 28 Sep 2018 16:26:08 -0400 > The bonding driver lacks the rcu lock when it calls down into > netdev_lower_get_next_private_rcu from bond_poll_controller, which > results in a trace like: ... > We're also doing rcu dereferences a layer up in netpoll_send_skb_on_dev > before we call down into netpoll_poll_dev, so just take the lock there. > > Suggested-by: Cong Wang <xiyou.wangcong@gmail.com> > Signed-off-by: Dave Jones <davej@codemonkey.org.uk> Applied and queued up for -stable.
On 9/28/2018 11:26 PM, Dave Jones wrote: > The bonding driver lacks the rcu lock when it calls down into > netdev_lower_get_next_private_rcu from bond_poll_controller, which > results in a trace like: > > WARNING: CPU: 2 PID: 179 at net/core/dev.c:6567 netdev_lower_get_next_private_rcu+0x34/0x40 > CPU: 2 PID: 179 Comm: kworker/u16:15 Not tainted 4.19.0-rc5-backup+ #1 > Workqueue: bond0 bond_mii_monitor > RIP: 0010:netdev_lower_get_next_private_rcu+0x34/0x40 > Code: 48 89 fb e8 fe 29 63 ff 85 c0 74 1e 48 8b 45 00 48 81 c3 c0 00 00 00 48 8b 00 48 39 d8 74 0f 48 89 45 00 48 8b 40 f8 5b 5d c3 <0f> 0b eb de 31 c0 eb f5 0f 1f 40 00 0f 1f 44 00 00 48 8> > RSP: 0018:ffffc9000087fa68 EFLAGS: 00010046 > RAX: 0000000000000000 RBX: ffff880429614560 RCX: 0000000000000000 > RDX: 0000000000000001 RSI: 00000000ffffffff RDI: ffffffffa184ada0 > RBP: ffffc9000087fa80 R08: 0000000000000001 R09: 0000000000000000 > R10: ffffc9000087f9f0 R11: ffff880429798040 R12: ffff8804289d5980 > R13: ffffffffa1511f60 R14: 00000000000000c8 R15: 00000000ffffffff > FS: 0000000000000000(0000) GS:ffff88042f880000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 00007f4b78fce180 CR3: 000000018180f006 CR4: 00000000001606e0 > Call Trace: > bond_poll_controller+0x52/0x170 > netpoll_poll_dev+0x79/0x290 > netpoll_send_skb_on_dev+0x158/0x2c0 > netpoll_send_udp+0x2d5/0x430 > write_ext_msg+0x1e0/0x210 > console_unlock+0x3c4/0x630 > vprintk_emit+0xfa/0x2f0 > printk+0x52/0x6e > ? __netdev_printk+0x12b/0x220 > netdev_info+0x64/0x80 > ? bond_3ad_set_carrier+0xe9/0x180 > bond_select_active_slave+0x1fc/0x310 > bond_mii_monitor+0x709/0x9b0 > process_one_work+0x221/0x5e0 > worker_thread+0x4f/0x3b0 > kthread+0x100/0x140 > ? process_one_work+0x5e0/0x5e0 > ? kthread_delayed_work_timer_fn+0x90/0x90 > ret_from_fork+0x24/0x30 > > We're also doing rcu dereferences a layer up in netpoll_send_skb_on_dev > before we call down into netpoll_poll_dev, so just take the lock there. > > Suggested-by: Cong Wang <xiyou.wangcong@gmail.com> > Signed-off-by: Dave Jones <davej@codemonkey.org.uk> > > diff --git a/net/core/netpoll.c b/net/core/netpoll.c > index 3219a2932463..692367d7c280 100644 > --- a/net/core/netpoll.c > +++ b/net/core/netpoll.c > @@ -330,6 +330,7 @@ void netpoll_send_skb_on_dev(struct netpoll *np, struct sk_buff *skb, > /* It is up to the caller to keep npinfo alive. */ > struct netpoll_info *npinfo; > > + rcu_read_lock_bh(); Hi, This suggested fix introduced a regression while using netconsole module with mlx5_core module loaded. During irq handling, we hit a warning that this rcu_read_lock_bh cannot be taken inside an IRQ. Isn't it accepted from a driver to print to kernel log inside irq handler or maybe the lock was taken too high in the calling chain of bond_poll_controller? Attached below the trace we are hitting once we applied your patch over our systems. [2018-10-15 10:45:30] mlx5_core 0000:00:09.0: firmware version: 16.22.8010 [2018-10-15 10:45:30] mlx5_core 0000:00:09.0: 63.008 Gb/s available PCIe bandwidth, limited by 8 GT/s x8 link at 0000:00:09.0 (capable of 126.016 Gb/s with 8 GT/s x16 link) [2018-10-15 10:45:31] (0000:00:09.0): E-Switch: Total vports 1, per vport: max uc(1024) max mc(16384) [2018-10-15 10:45:31] mlx5_core 0000:00:09.0: Port module event: module 0, Cable plugged [2018-10-15 10:45:31] WARNING: CPU: 1 PID: 0 at kernel/softirq.c:168 __local_bh_enable_ip+0x35/0x50 [2018-10-15 10:45:31] Modules linked in: mlx5_core(+) mlxfw bonding ip6_gre ip6_tunnel tunnel6 ip_gre ip_tunnel gre rdma_ucm ib_uverbs ib_ipoib ib_umad nfsv3 nfs_acl nfs lockd grace fscache netconsole mlx4_ib mlx4_en ptp pps_core mlx4_core cfg80211 devlink rfkill rpcrdma ib_isert iscsi_target_mod ib_iser ib_srpt target_core_mod ib_srp sunrpc rdma_cm ib_cm iw_cm ib_core snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore pcspkr i2c_piix4 sch_fq_codel ip_tables cirrus drm_kms_helper ata_generic pata_acpi syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm virtio_net net_failover i2c_core failover serio_raw floppy ata_piix [last unloaded: mlxfw] [2018-10-15 10:45:31] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.19.0-rc6-J4083-G9e91d710a170 #1 [2018-10-15 10:45:31] Hardware name: Red Hat KVM, BIOS Bochs 01/01/2011 [2018-10-15 10:45:31] RIP: 0010:__local_bh_enable_ip+0x35/0x50 [2018-10-15 10:45:31] Code: 7e a9 00 00 0f 00 75 22 83 ee 01 f7 de 65 01 35 91 8c f7 7e 65 8b 05 8a 8c f7 7e a9 00 ff 1f 00 74 0c 65 ff 0d 7c 8c f7 7e c3 <0f> 0b eb da 65 66 8b 05 1f 4e f8 7e 66 85 c0 74 e7 e8 55 ff ff ff [2018-10-15 10:45:31] RSP: 0018:ffff880237a43c10 EFLAGS: 00010006 [2018-10-15 10:45:31] RAX: 0000000080010200 RBX: 0000000000000006 RCX: 0000000000000001 [2018-10-15 10:45:31] RDX: 0000000000000000 RSI: 0000000000000200 RDI: ffffffff817a1321 [2018-10-15 10:45:31] RBP: ffff880237a43c60 R08: 0000000000480020 R09: 0000000000000000 [2018-10-15 10:45:31] R10: 000000020834c006 R11: 0000000000000000 R12: ffff880229963d68 [2018-10-15 10:45:31] R13: ffff88020834c034 R14: 0000000000006b00 R15: ffff8802297d8400 [2018-10-15 10:45:31] FS: 0000000000000000(0000) GS:ffff880237a40000(0000) knlGS:0000000000000000 [2018-10-15 10:45:31] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [2018-10-15 10:45:31] CR2: 00007f96d4f57080 CR3: 00000001a157d000 CR4: 00000000000006e0 [2018-10-15 10:45:31] Call Trace: [2018-10-15 10:45:31] [2018-10-15 10:45:31] netpoll_send_udp+0x2de/0x410 [2018-10-15 10:45:31] write_msg+0xdb/0xf0 [netconsole] [2018-10-15 10:45:31] console_unlock+0x33e/0x500 [2018-10-15 10:45:31] vprintk_emit+0x211/0x280 [2018-10-15 10:45:31] dev_vprintk_emit+0x10b/0x200 [2018-10-15 10:45:31] dev_printk_emit+0x3b/0x50 [2018-10-15 10:45:31] ? ttwu_do_wakeup+0x19/0x130 [2018-10-15 10:45:31] _dev_info+0x55/0x60 [2018-10-15 10:45:31] mlx5_eq_int+0x27a/0x690 [mlx5_core] [2018-10-15 10:45:31] __handle_irq_event_percpu+0x3a/0x190 [2018-10-15 10:45:31] handle_irq_event_percpu+0x20/0x50 [2018-10-15 10:45:31] handle_irq_event+0x27/0x50 [2018-10-15 10:45:31] handle_edge_irq+0x6d/0x180 [2018-10-15 10:45:31] handle_irq+0xa5/0x110 [2018-10-15 10:45:31] do_IRQ+0x49/0xd0 [2018-10-15 10:45:31] common_interrupt+0xf/0xf [2018-10-15 10:45:31] [2018-10-15 10:45:31] RIP: 0010:native_safe_halt+0x2/0x10 [2018-10-15 10:45:31] Code: 7e ff ff ff 7f f3 c3 65 48 8b 04 25 80 5b 01 00 f0 80 48 02 20 48 8b 00 a8 08 74 8b eb c1 90 90 90 90 90 90 90 90 90 90 fb f4 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90 [2018-10-15 10:45:31] RSP: 0018:ffff88023663fed0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffda [2018-10-15 10:45:31] RAX: 0000000080000000 RBX: 0000000000000001 RCX: ffff880237a5a880 [2018-10-15 10:45:31] RDX: ffffffff8221cd48 RSI: ffff880237a5a880 RDI: 0000000000000001 [2018-10-15 10:45:31] RBP: 0000000000000001 R08: 000000200b1d1602 R09: 0000000000000000 [2018-10-15 10:45:31] R10: ffff880236627d20 R11: 0000000000000000 R12: 0000000000000000 [2018-10-15 10:45:31] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [2018-10-15 10:45:31] default_idle+0x1c/0x140 [2018-10-15 10:45:31] do_idle+0x194/0x240 [2018-10-15 10:45:31] cpu_startup_entry+0x19/0x20 [2018-10-15 10:45:31] start_secondary+0x138/0x170 [2018-10-15 10:45:31] secondary_startup_64+0xa4/0xb0 [2018-10-15 10:45:31] ---[ end trace 10dfce1a9e88fa01 ]--- > lockdep_assert_irqs_disabled(); > > npinfo = rcu_dereference_bh(np->dev->npinfo); > @@ -374,6 +375,7 @@ void netpoll_send_skb_on_dev(struct netpoll *np, struct sk_buff *skb, > skb_queue_tail(&npinfo->txq, skb); > schedule_delayed_work(&npinfo->tx_work,0); > } > + rcu_read_unlock_bh(); > } > EXPORT_SYMBOL(netpoll_send_skb_on_dev); > >
On Mon, Oct 15, 2018 at 4:36 AM Eran Ben Elisha <eranbe@mellanox.com> wrote: > Hi, > > This suggested fix introduced a regression while using netconsole module > with mlx5_core module loaded. It is already reported here: https://marc.info/?l=linux-kernel&m=153917359528669&w=2 > > During irq handling, we hit a warning that this rcu_read_lock_bh cannot > be taken inside an IRQ. Yes, I mentioned the same even before this patch was sent out: https://marc.info/?l=linux-netdev&m=153816136624679&w=2 Thanks.
On 10/18/2018 8:46 AM, Cong Wang wrote: > On Mon, Oct 15, 2018 at 4:36 AM Eran Ben Elisha <eranbe@mellanox.com> wrote: >> Hi, >> >> This suggested fix introduced a regression while using netconsole module >> with mlx5_core module loaded. > > It is already reported here: > https://marc.info/?l=linux-kernel&m=153917359528669&w=2 > > >> >> During irq handling, we hit a warning that this rcu_read_lock_bh cannot >> be taken inside an IRQ. > > Yes, I mentioned the same even before this patch was sent out: > https://marc.info/?l=linux-netdev&m=153816136624679&w=2 Thanks Cong. From the discussion, I understand that the solution shouldn't be touching netpoll_send_skb_on_dev. Some modules/drivers logging will now trigger traces while netconsole is loaded. DaveJ, Can you please submit a proper fix or at least revert the current one in the meanwhile. Thanks, Eran > > Thanks. >
diff --git a/net/core/netpoll.c b/net/core/netpoll.c index 3219a2932463..692367d7c280 100644 --- a/net/core/netpoll.c +++ b/net/core/netpoll.c @@ -330,6 +330,7 @@ void netpoll_send_skb_on_dev(struct netpoll *np, struct sk_buff *skb, /* It is up to the caller to keep npinfo alive. */ struct netpoll_info *npinfo; + rcu_read_lock_bh(); lockdep_assert_irqs_disabled(); npinfo = rcu_dereference_bh(np->dev->npinfo); @@ -374,6 +375,7 @@ void netpoll_send_skb_on_dev(struct netpoll *np, struct sk_buff *skb, skb_queue_tail(&npinfo->txq, skb); schedule_delayed_work(&npinfo->tx_work,0); } + rcu_read_unlock_bh(); } EXPORT_SYMBOL(netpoll_send_skb_on_dev);
The bonding driver lacks the rcu lock when it calls down into netdev_lower_get_next_private_rcu from bond_poll_controller, which results in a trace like: WARNING: CPU: 2 PID: 179 at net/core/dev.c:6567 netdev_lower_get_next_private_rcu+0x34/0x40 CPU: 2 PID: 179 Comm: kworker/u16:15 Not tainted 4.19.0-rc5-backup+ #1 Workqueue: bond0 bond_mii_monitor RIP: 0010:netdev_lower_get_next_private_rcu+0x34/0x40 Code: 48 89 fb e8 fe 29 63 ff 85 c0 74 1e 48 8b 45 00 48 81 c3 c0 00 00 00 48 8b 00 48 39 d8 74 0f 48 89 45 00 48 8b 40 f8 5b 5d c3 <0f> 0b eb de 31 c0 eb f5 0f 1f 40 00 0f 1f 44 00 00 48 8> RSP: 0018:ffffc9000087fa68 EFLAGS: 00010046 RAX: 0000000000000000 RBX: ffff880429614560 RCX: 0000000000000000 RDX: 0000000000000001 RSI: 00000000ffffffff RDI: ffffffffa184ada0 RBP: ffffc9000087fa80 R08: 0000000000000001 R09: 0000000000000000 R10: ffffc9000087f9f0 R11: ffff880429798040 R12: ffff8804289d5980 R13: ffffffffa1511f60 R14: 00000000000000c8 R15: 00000000ffffffff FS: 0000000000000000(0000) GS:ffff88042f880000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f4b78fce180 CR3: 000000018180f006 CR4: 00000000001606e0 Call Trace: bond_poll_controller+0x52/0x170 netpoll_poll_dev+0x79/0x290 netpoll_send_skb_on_dev+0x158/0x2c0 netpoll_send_udp+0x2d5/0x430 write_ext_msg+0x1e0/0x210 console_unlock+0x3c4/0x630 vprintk_emit+0xfa/0x2f0 printk+0x52/0x6e ? __netdev_printk+0x12b/0x220 netdev_info+0x64/0x80 ? bond_3ad_set_carrier+0xe9/0x180 bond_select_active_slave+0x1fc/0x310 bond_mii_monitor+0x709/0x9b0 process_one_work+0x221/0x5e0 worker_thread+0x4f/0x3b0 kthread+0x100/0x140 ? process_one_work+0x5e0/0x5e0 ? kthread_delayed_work_timer_fn+0x90/0x90 ret_from_fork+0x24/0x30 We're also doing rcu dereferences a layer up in netpoll_send_skb_on_dev before we call down into netpoll_poll_dev, so just take the lock there. Suggested-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Dave Jones <davej@codemonkey.org.uk>