Message ID | 20190428030539.17776-1-yuehaibing@huawei.com |
---|---|
State | Changes Requested |
Delegated to: | David Miller |
Headers | show |
Series | tun: Fix use-after-free in tun_net_xmit | expand |
On 2019/4/28 上午11:05, Yue Haibing wrote: > From: YueHaibing <yuehaibing@huawei.com> > > KASAN report this: > > BUG: KASAN: use-after-free in tun_net_xmit+0x1670/0x1750 drivers/net/tun.c:1104 > Read of size 8 at addr ffff88836cc26a70 by task swapper/3/0 > > CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.19.32 #6 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 > Call Trace: > <IRQ> > __dump_stack lib/dump_stack.c:77 [inline] > dump_stack+0xca/0x13e lib/dump_stack.c:113 > print_address_description+0x79/0x330 mm/kasan/report.c:253 > kasan_report_error mm/kasan/report.c:351 [inline] > kasan_report+0x18a/0x2d0 mm/kasan/report.c:409 > tun_net_xmit+0x1670/0x1750 drivers/net/tun.c:1104 > __netdev_start_xmit include/linux/netdevice.h:4300 [inline] > netdev_start_xmit include/linux/netdevice.h:4309 [inline] > xmit_one net/core/dev.c:3243 [inline] > dev_hard_start_xmit+0x17c/0x780 net/core/dev.c:3259 > sch_direct_xmit+0x24f/0x8a0 net/sched/sch_generic.c:327 > qdisc_restart net/sched/sch_generic.c:390 [inline] > __qdisc_run+0x45b/0x1590 net/sched/sch_generic.c:398 > qdisc_run include/net/pkt_sched.h:120 [inline] > __dev_xmit_skb net/core/dev.c:3438 [inline] > __dev_queue_xmit+0xa6b/0x2500 net/core/dev.c:3797 > neigh_output include/net/neighbour.h:501 [inline] > ip6_finish_output2+0xa36/0x2290 net/ipv6/ip6_output.c:120 > ip6_finish_output+0x3e7/0xa20 net/ipv6/ip6_output.c:154 > NF_HOOK_COND include/linux/netfilter.h:278 [inline] > ip6_output+0x1e2/0x720 net/ipv6/ip6_output.c:171 > dst_output include/net/dst.h:444 [inline] > NF_HOOK include/linux/netfilter.h:289 [inline] > mld_sendpack+0x740/0xf20 net/ipv6/mcast.c:1683 > mld_send_cr net/ipv6/mcast.c:1979 [inline] > mld_ifc_timer_expire+0x3cc/0x7f0 net/ipv6/mcast.c:2478 > call_timer_fn+0x1ea/0x6a0 kernel/time/timer.c:1326 > expire_timers kernel/time/timer.c:1363 [inline] > __run_timers kernel/time/timer.c:1682 [inline] > run_timer_softirq+0x637/0x1070 kernel/time/timer.c:1695 > __do_softirq+0x26d/0xabd kernel/softirq.c:292 > invoke_softirq kernel/softirq.c:372 [inline] > irq_exit+0x209/0x290 kernel/softirq.c:412 > exiting_irq arch/x86/include/asm/apic.h:536 [inline] > smp_apic_timer_interrupt+0xf6/0x480 arch/x86/kernel/apic/apic.c:1056 > apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:864 > </IRQ> > RIP: 0010:native_safe_halt+0x2/0x10 arch/x86/include/asm/irqflags.h:58 > Code: 01 f0 0f 82 bc fd ff ff 48 c7 c7 40 25 b1 83 e8 a1 72 03 ff e9 ab fd ff ff 4c 89 e7 e8 37 f5 a7 fe e9 6a ff ff ff 90 90 fb f4 <c3> 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90 > RSP: 0018:ffff8883e0f2fd20 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13 > RAX: 0000000000000007 RBX: dffffc0000000000 RCX: 0000000000000000 > RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8883e0f04f1c > RBP: 0000000000000003 R08: ffffed107c5dc77b R09: 0000000000000000 > R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff848b96e0 > R13: 0000000000000003 R14: 1ffff1107c1e5fae R15: 0000000000000000 > arch_safe_halt arch/x86/include/asm/paravirt.h:94 [inline] > default_idle+0x24/0x2b0 arch/x86/kernel/process.c:561 > cpuidle_idle_call kernel/sched/idle.c:153 [inline] > do_idle+0x2ca/0x420 kernel/sched/idle.c:262 > cpu_startup_entry+0xcb/0xe0 kernel/sched/idle.c:368 > start_secondary+0x421/0x570 arch/x86/kernel/smpboot.c:271 > secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:243 > > Allocated by task 19764: > set_track mm/kasan/kasan.c:460 [inline] > kasan_kmalloc+0xa0/0xd0 mm/kasan/kasan.c:553 > __kmalloc+0x11b/0x2d0 mm/slub.c:3750 > kmalloc include/linux/slab.h:518 [inline] > sk_prot_alloc+0xf6/0x290 net/core/sock.c:1469 > sk_alloc+0x3d/0xc00 net/core/sock.c:1523 > tun_chr_open+0x80/0x560 drivers/net/tun.c:3204 > misc_open+0x367/0x4e0 drivers/char/misc.c:141 > chrdev_open+0x212/0x580 fs/char_dev.c:417 > do_dentry_open+0x704/0x1050 fs/open.c:777 > do_last fs/namei.c:3418 [inline] > path_openat+0x7ed/0x2ae0 fs/namei.c:3533 > do_filp_open+0x1aa/0x2b0 fs/namei.c:3564 > do_sys_open+0x307/0x430 fs/open.c:1069 > do_syscall_64+0xc8/0x580 arch/x86/entry/common.c:290 > entry_SYSCALL_64_after_hwframe+0x49/0xbe > > Freed by task 19764: > set_track mm/kasan/kasan.c:460 [inline] > __kasan_slab_free+0x12e/0x180 mm/kasan/kasan.c:521 > slab_free_hook mm/slub.c:1370 [inline] > slab_free_freelist_hook mm/slub.c:1397 [inline] > slab_free mm/slub.c:2952 [inline] > kfree+0xeb/0x2f0 mm/slub.c:3905 > sk_prot_free net/core/sock.c:1506 [inline] > __sk_destruct+0x4e6/0x6a0 net/core/sock.c:1588 > sk_destruct+0x48/0x70 net/core/sock.c:1596 > __sk_free+0xa9/0x270 net/core/sock.c:1607 > sk_free+0x2a/0x30 net/core/sock.c:1618 > sock_put include/net/sock.h:1696 [inline] > __tun_detach+0x464/0xf70 drivers/net/tun.c:735 > tun_detach drivers/net/tun.c:747 [inline] > tun_chr_close+0xd8/0x190 drivers/net/tun.c:3241 > __fput+0x27f/0x7f0 fs/file_table.c:278 > task_work_run+0x136/0x1b0 kernel/task_work.c:113 > tracehook_notify_resume include/linux/tracehook.h:193 [inline] > exit_to_usermode_loop+0x1a7/0x1d0 arch/x86/entry/common.c:166 > prepare_exit_to_usermode arch/x86/entry/common.c:197 [inline] > syscall_return_slowpath arch/x86/entry/common.c:268 [inline] > do_syscall_64+0x461/0x580 arch/x86/entry/common.c:293 > entry_SYSCALL_64_after_hwframe+0x49/0xbe > > The buggy address belongs to the object at ffff88836cc26600 > which belongs to the cache kmalloc-4096 of size 4096 > The buggy address is located 1136 bytes inside of > 4096-byte region [ffff88836cc26600, ffff88836cc27600) > The buggy address belongs to the page: > page:ffffea000db30800 count:1 mapcount:0 mapping:ffff8883e280e600 index:0x0 compound_mapcount: 0 > flags: 0x2fffff80008100(slab|head) > raw: 002fffff80008100 dead000000000100 dead000000000200 ffff8883e280e600 > raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000 > page dumped because: kasan: bad access detected > > Memory state around the buggy address: > ffff88836cc26900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > ffff88836cc26980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >> ffff88836cc26a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > ^ > ffff88836cc26a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > ffff88836cc26b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > > If tun driver have multiqueues, user close the last queue by > tun_detach, then tun->tfiles[index] is not cleared. Then a new > queue may add to the tun, which using rcu_assign_pointer > tun->tfiles[index] to the new tfile and increase the numqueues. > However if there send a packet during this time, which picking the last > queue, it may uses the old tun->tfiles[index], beacause there no > RCU grace period. > > 1) tun_chr_close //close the last queue > --> __tun_detach //close the last queue, but tun->tfiles[index] still exist > > 2) tun_chr_open //attach a new queue > --> tun_attach > -->rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile); > //there need a RCU grace period > > -->tun->numqueues++; > > 3) tun_net_xmit //a new packet is sending, which pick the last queue > -->if (txq >= tun->numqueues) > //above check passed, but tfile still not renew > -->if (tfile->socket.sk->sk_filter ... > //use the old tfile,trigger use-after-free > > Reported-by: Hulk Robot <hulkci@huawei.com> > Fixes: c8d68e6be1c3 ("tuntap: multiqueue support") > Signed-off-by: YueHaibing <yuehaibing@huawei.com> > --- > drivers/net/tun.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/net/tun.c b/drivers/net/tun.c > index e9ca1c0..3770aba 100644 > --- a/drivers/net/tun.c > +++ b/drivers/net/tun.c > @@ -876,6 +876,7 @@ static int tun_attach(struct tun_struct *tun, struct file *file, > */ > rcu_assign_pointer(tfile->tun, tun); > rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile); > + synchronize_net(); > tun->numqueues++; > tun_set_real_num_queues(tun); > out: Good catch, that's one of my suspicion as well. Assume that this has been tested. Acked-by: Jason Wang <jasowang@redhat.com> This patch is needed for -stable. I will post another theoretical issue similar to this soon. Thanks
On 2019/4/28 上午11:24, Jason Wang wrote: > > On 2019/4/28 上午11:05, Yue Haibing wrote: >> From: YueHaibing <yuehaibing@huawei.com> >> >> KASAN report this: >> >> BUG: KASAN: use-after-free in tun_net_xmit+0x1670/0x1750 >> drivers/net/tun.c:1104 >> Read of size 8 at addr ffff88836cc26a70 by task swapper/3/0 >> >> CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.19.32 #6 >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS >> 1.10.2-1ubuntu1 04/01/2014 >> Call Trace: >> <IRQ> >> __dump_stack lib/dump_stack.c:77 [inline] >> dump_stack+0xca/0x13e lib/dump_stack.c:113 >> print_address_description+0x79/0x330 mm/kasan/report.c:253 >> kasan_report_error mm/kasan/report.c:351 [inline] >> kasan_report+0x18a/0x2d0 mm/kasan/report.c:409 >> tun_net_xmit+0x1670/0x1750 drivers/net/tun.c:1104 >> __netdev_start_xmit include/linux/netdevice.h:4300 [inline] >> netdev_start_xmit include/linux/netdevice.h:4309 [inline] >> xmit_one net/core/dev.c:3243 [inline] >> dev_hard_start_xmit+0x17c/0x780 net/core/dev.c:3259 >> sch_direct_xmit+0x24f/0x8a0 net/sched/sch_generic.c:327 >> qdisc_restart net/sched/sch_generic.c:390 [inline] >> __qdisc_run+0x45b/0x1590 net/sched/sch_generic.c:398 >> qdisc_run include/net/pkt_sched.h:120 [inline] >> __dev_xmit_skb net/core/dev.c:3438 [inline] >> __dev_queue_xmit+0xa6b/0x2500 net/core/dev.c:3797 >> neigh_output include/net/neighbour.h:501 [inline] >> ip6_finish_output2+0xa36/0x2290 net/ipv6/ip6_output.c:120 >> ip6_finish_output+0x3e7/0xa20 net/ipv6/ip6_output.c:154 >> NF_HOOK_COND include/linux/netfilter.h:278 [inline] >> ip6_output+0x1e2/0x720 net/ipv6/ip6_output.c:171 >> dst_output include/net/dst.h:444 [inline] >> NF_HOOK include/linux/netfilter.h:289 [inline] >> mld_sendpack+0x740/0xf20 net/ipv6/mcast.c:1683 >> mld_send_cr net/ipv6/mcast.c:1979 [inline] >> mld_ifc_timer_expire+0x3cc/0x7f0 net/ipv6/mcast.c:2478 >> call_timer_fn+0x1ea/0x6a0 kernel/time/timer.c:1326 >> expire_timers kernel/time/timer.c:1363 [inline] >> __run_timers kernel/time/timer.c:1682 [inline] >> run_timer_softirq+0x637/0x1070 kernel/time/timer.c:1695 >> __do_softirq+0x26d/0xabd kernel/softirq.c:292 >> invoke_softirq kernel/softirq.c:372 [inline] >> irq_exit+0x209/0x290 kernel/softirq.c:412 >> exiting_irq arch/x86/include/asm/apic.h:536 [inline] >> smp_apic_timer_interrupt+0xf6/0x480 arch/x86/kernel/apic/apic.c:1056 >> apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:864 >> </IRQ> >> RIP: 0010:native_safe_halt+0x2/0x10 arch/x86/include/asm/irqflags.h:58 >> Code: 01 f0 0f 82 bc fd ff ff 48 c7 c7 40 25 b1 83 e8 a1 72 03 ff e9 >> ab fd ff ff 4c 89 e7 e8 37 f5 a7 fe e9 6a ff ff ff 90 90 fb f4 <c3> >> 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90 >> RSP: 0018:ffff8883e0f2fd20 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13 >> RAX: 0000000000000007 RBX: dffffc0000000000 RCX: 0000000000000000 >> RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8883e0f04f1c >> RBP: 0000000000000003 R08: ffffed107c5dc77b R09: 0000000000000000 >> R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff848b96e0 >> R13: 0000000000000003 R14: 1ffff1107c1e5fae R15: 0000000000000000 >> arch_safe_halt arch/x86/include/asm/paravirt.h:94 [inline] >> default_idle+0x24/0x2b0 arch/x86/kernel/process.c:561 >> cpuidle_idle_call kernel/sched/idle.c:153 [inline] >> do_idle+0x2ca/0x420 kernel/sched/idle.c:262 >> cpu_startup_entry+0xcb/0xe0 kernel/sched/idle.c:368 >> start_secondary+0x421/0x570 arch/x86/kernel/smpboot.c:271 >> secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:243 >> >> Allocated by task 19764: >> set_track mm/kasan/kasan.c:460 [inline] >> kasan_kmalloc+0xa0/0xd0 mm/kasan/kasan.c:553 >> __kmalloc+0x11b/0x2d0 mm/slub.c:3750 >> kmalloc include/linux/slab.h:518 [inline] >> sk_prot_alloc+0xf6/0x290 net/core/sock.c:1469 >> sk_alloc+0x3d/0xc00 net/core/sock.c:1523 >> tun_chr_open+0x80/0x560 drivers/net/tun.c:3204 >> misc_open+0x367/0x4e0 drivers/char/misc.c:141 >> chrdev_open+0x212/0x580 fs/char_dev.c:417 >> do_dentry_open+0x704/0x1050 fs/open.c:777 >> do_last fs/namei.c:3418 [inline] >> path_openat+0x7ed/0x2ae0 fs/namei.c:3533 >> do_filp_open+0x1aa/0x2b0 fs/namei.c:3564 >> do_sys_open+0x307/0x430 fs/open.c:1069 >> do_syscall_64+0xc8/0x580 arch/x86/entry/common.c:290 >> entry_SYSCALL_64_after_hwframe+0x49/0xbe >> >> Freed by task 19764: >> set_track mm/kasan/kasan.c:460 [inline] >> __kasan_slab_free+0x12e/0x180 mm/kasan/kasan.c:521 >> slab_free_hook mm/slub.c:1370 [inline] >> slab_free_freelist_hook mm/slub.c:1397 [inline] >> slab_free mm/slub.c:2952 [inline] >> kfree+0xeb/0x2f0 mm/slub.c:3905 >> sk_prot_free net/core/sock.c:1506 [inline] >> __sk_destruct+0x4e6/0x6a0 net/core/sock.c:1588 >> sk_destruct+0x48/0x70 net/core/sock.c:1596 >> __sk_free+0xa9/0x270 net/core/sock.c:1607 >> sk_free+0x2a/0x30 net/core/sock.c:1618 >> sock_put include/net/sock.h:1696 [inline] >> __tun_detach+0x464/0xf70 drivers/net/tun.c:735 >> tun_detach drivers/net/tun.c:747 [inline] >> tun_chr_close+0xd8/0x190 drivers/net/tun.c:3241 >> __fput+0x27f/0x7f0 fs/file_table.c:278 >> task_work_run+0x136/0x1b0 kernel/task_work.c:113 >> tracehook_notify_resume include/linux/tracehook.h:193 [inline] >> exit_to_usermode_loop+0x1a7/0x1d0 arch/x86/entry/common.c:166 >> prepare_exit_to_usermode arch/x86/entry/common.c:197 [inline] >> syscall_return_slowpath arch/x86/entry/common.c:268 [inline] >> do_syscall_64+0x461/0x580 arch/x86/entry/common.c:293 >> entry_SYSCALL_64_after_hwframe+0x49/0xbe >> >> The buggy address belongs to the object at ffff88836cc26600 >> which belongs to the cache kmalloc-4096 of size 4096 >> The buggy address is located 1136 bytes inside of >> 4096-byte region [ffff88836cc26600, ffff88836cc27600) >> The buggy address belongs to the page: >> page:ffffea000db30800 count:1 mapcount:0 mapping:ffff8883e280e600 >> index:0x0 compound_mapcount: 0 >> flags: 0x2fffff80008100(slab|head) >> raw: 002fffff80008100 dead000000000100 dead000000000200 ffff8883e280e600 >> raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000 >> page dumped because: kasan: bad access detected >> >> Memory state around the buggy address: >> ffff88836cc26900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >> ffff88836cc26980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >>> ffff88836cc26a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >> ^ >> ffff88836cc26a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >> ffff88836cc26b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >> >> If tun driver have multiqueues, user close the last queue by >> tun_detach, then tun->tfiles[index] is not cleared. Then a new >> queue may add to the tun, which using rcu_assign_pointer >> tun->tfiles[index] to the new tfile and increase the numqueues. >> However if there send a packet during this time, which picking the last >> queue, it may uses the old tun->tfiles[index], beacause there no >> RCU grace period. >> >> 1) tun_chr_close //close the last queue >> --> __tun_detach //close the last queue, but tun->tfiles[index] >> still exist >> >> 2) tun_chr_open //attach a new queue >> --> tun_attach >> -->rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile); >> //there need a RCU grace period >> >> -->tun->numqueues++; >> >> 3) tun_net_xmit //a new packet is sending, which pick the last queue >> -->if (txq >= tun->numqueues) >> //above check passed, but tfile still not renew >> -->if (tfile->socket.sk->sk_filter ... >> //use the old tfile,trigger use-after-free >> >> Reported-by: Hulk Robot <hulkci@huawei.com> >> Fixes: c8d68e6be1c3 ("tuntap: multiqueue support") >> Signed-off-by: YueHaibing <yuehaibing@huawei.com> >> --- >> drivers/net/tun.c | 1 + >> 1 file changed, 1 insertion(+) >> >> diff --git a/drivers/net/tun.c b/drivers/net/tun.c >> index e9ca1c0..3770aba 100644 >> --- a/drivers/net/tun.c >> +++ b/drivers/net/tun.c >> @@ -876,6 +876,7 @@ static int tun_attach(struct tun_struct *tun, >> struct file *file, >> */ >> rcu_assign_pointer(tfile->tun, tun); >> rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile); >> + synchronize_net(); >> tun->numqueues++; >> tun_set_real_num_queues(tun); >> out: > > > Good catch, that's one of my suspicion as well. Assume that this has > been tested. > > Acked-by: Jason Wang <jasowang@redhat.com> > > This patch is needed for -stable. > > I will post another theoretical issue similar to this soon. > > Thanks > Ok, a second thought. All evil come from the access of tun->numqueues without sufficient synchronization. This is a partial fix, another possible case is __tun_detach(), we need either 1) synchronize through pointers in tfiles 2) or smp_load_acquire()/smp_store_release() to solve it completely. Let me post a complete fix. Thanks
Hi Jason, > On 2019/4/28 上午11:24, Jason Wang wrote: > > > > On 2019/4/28 上午11:05, Yue Haibing wrote: > >> From: YueHaibing <yuehaibing@huawei.com> > >> > >> KASAN report this: > >> > >> BUG: KASAN: use-after-free in tun_net_xmit+0x1670/0x1750 > >> drivers/net/tun.c:1104 > >> Read of size 8 at addr ffff88836cc26a70 by task swapper/3/0 > >> > >> CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.19.32 #6 > >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS > >> 1.10.2-1ubuntu1 04/01/2014 > >> Call Trace: > >> <IRQ> > >> __dump_stack lib/dump_stack.c:77 [inline] > >> dump_stack+0xca/0x13e lib/dump_stack.c:113 > >> print_address_description+0x79/0x330 mm/kasan/report.c:253 > >> kasan_report_error mm/kasan/report.c:351 [inline] > >> kasan_report+0x18a/0x2d0 mm/kasan/report.c:409 > >> tun_net_xmit+0x1670/0x1750 drivers/net/tun.c:1104 > >> __netdev_start_xmit include/linux/netdevice.h:4300 [inline] > >> netdev_start_xmit include/linux/netdevice.h:4309 [inline] > >> xmit_one net/core/dev.c:3243 [inline] > >> dev_hard_start_xmit+0x17c/0x780 net/core/dev.c:3259 > >> sch_direct_xmit+0x24f/0x8a0 net/sched/sch_generic.c:327 > >> qdisc_restart net/sched/sch_generic.c:390 [inline] > >> __qdisc_run+0x45b/0x1590 net/sched/sch_generic.c:398 > >> qdisc_run include/net/pkt_sched.h:120 [inline] > >> __dev_xmit_skb net/core/dev.c:3438 [inline] > >> __dev_queue_xmit+0xa6b/0x2500 net/core/dev.c:3797 > >> neigh_output include/net/neighbour.h:501 [inline] > >> ip6_finish_output2+0xa36/0x2290 net/ipv6/ip6_output.c:120 > >> ip6_finish_output+0x3e7/0xa20 net/ipv6/ip6_output.c:154 > >> NF_HOOK_COND include/linux/netfilter.h:278 [inline] > >> ip6_output+0x1e2/0x720 net/ipv6/ip6_output.c:171 > >> dst_output include/net/dst.h:444 [inline] > >> NF_HOOK include/linux/netfilter.h:289 [inline] > >> mld_sendpack+0x740/0xf20 net/ipv6/mcast.c:1683 > >> mld_send_cr net/ipv6/mcast.c:1979 [inline] > >> mld_ifc_timer_expire+0x3cc/0x7f0 net/ipv6/mcast.c:2478 > >> call_timer_fn+0x1ea/0x6a0 kernel/time/timer.c:1326 > >> expire_timers kernel/time/timer.c:1363 [inline] > >> __run_timers kernel/time/timer.c:1682 [inline] > >> run_timer_softirq+0x637/0x1070 kernel/time/timer.c:1695 > >> __do_softirq+0x26d/0xabd kernel/softirq.c:292 > >> invoke_softirq kernel/softirq.c:372 [inline] > >> irq_exit+0x209/0x290 kernel/softirq.c:412 > >> exiting_irq arch/x86/include/asm/apic.h:536 [inline] > >> smp_apic_timer_interrupt+0xf6/0x480 arch/x86/kernel/apic/apic.c:1056 > >> apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:864 > >> </IRQ> > >> RIP: 0010:native_safe_halt+0x2/0x10 arch/x86/include/asm/irqflags.h:58 > >> Code: 01 f0 0f 82 bc fd ff ff 48 c7 c7 40 25 b1 83 e8 a1 72 03 ff e9 > >> ab fd ff ff 4c 89 e7 e8 37 f5 a7 fe e9 6a ff ff ff 90 90 fb f4 <c3> > >> 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90 > >> RSP: 0018:ffff8883e0f2fd20 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13 > >> RAX: 0000000000000007 RBX: dffffc0000000000 RCX: 0000000000000000 > >> RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8883e0f04f1c > >> RBP: 0000000000000003 R08: ffffed107c5dc77b R09: 0000000000000000 > >> R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff848b96e0 > >> R13: 0000000000000003 R14: 1ffff1107c1e5fae R15: 0000000000000000 > >> arch_safe_halt arch/x86/include/asm/paravirt.h:94 [inline] > >> default_idle+0x24/0x2b0 arch/x86/kernel/process.c:561 > >> cpuidle_idle_call kernel/sched/idle.c:153 [inline] > >> do_idle+0x2ca/0x420 kernel/sched/idle.c:262 > >> cpu_startup_entry+0xcb/0xe0 kernel/sched/idle.c:368 > >> start_secondary+0x421/0x570 arch/x86/kernel/smpboot.c:271 > >> secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:243 > >> > >> Allocated by task 19764: > >> set_track mm/kasan/kasan.c:460 [inline] > >> kasan_kmalloc+0xa0/0xd0 mm/kasan/kasan.c:553 > >> __kmalloc+0x11b/0x2d0 mm/slub.c:3750 > >> kmalloc include/linux/slab.h:518 [inline] > >> sk_prot_alloc+0xf6/0x290 net/core/sock.c:1469 > >> sk_alloc+0x3d/0xc00 net/core/sock.c:1523 > >> tun_chr_open+0x80/0x560 drivers/net/tun.c:3204 > >> misc_open+0x367/0x4e0 drivers/char/misc.c:141 > >> chrdev_open+0x212/0x580 fs/char_dev.c:417 > >> do_dentry_open+0x704/0x1050 fs/open.c:777 > >> do_last fs/namei.c:3418 [inline] > >> path_openat+0x7ed/0x2ae0 fs/namei.c:3533 > >> do_filp_open+0x1aa/0x2b0 fs/namei.c:3564 > >> do_sys_open+0x307/0x430 fs/open.c:1069 > >> do_syscall_64+0xc8/0x580 arch/x86/entry/common.c:290 > >> entry_SYSCALL_64_after_hwframe+0x49/0xbe > >> > >> Freed by task 19764: > >> set_track mm/kasan/kasan.c:460 [inline] > >> __kasan_slab_free+0x12e/0x180 mm/kasan/kasan.c:521 > >> slab_free_hook mm/slub.c:1370 [inline] > >> slab_free_freelist_hook mm/slub.c:1397 [inline] > >> slab_free mm/slub.c:2952 [inline] > >> kfree+0xeb/0x2f0 mm/slub.c:3905 > >> sk_prot_free net/core/sock.c:1506 [inline] > >> __sk_destruct+0x4e6/0x6a0 net/core/sock.c:1588 > >> sk_destruct+0x48/0x70 net/core/sock.c:1596 > >> __sk_free+0xa9/0x270 net/core/sock.c:1607 > >> sk_free+0x2a/0x30 net/core/sock.c:1618 > >> sock_put include/net/sock.h:1696 [inline] > >> __tun_detach+0x464/0xf70 drivers/net/tun.c:735 > >> tun_detach drivers/net/tun.c:747 [inline] > >> tun_chr_close+0xd8/0x190 drivers/net/tun.c:3241 > >> __fput+0x27f/0x7f0 fs/file_table.c:278 > >> task_work_run+0x136/0x1b0 kernel/task_work.c:113 > >> tracehook_notify_resume include/linux/tracehook.h:193 [inline] > >> exit_to_usermode_loop+0x1a7/0x1d0 arch/x86/entry/common.c:166 > >> prepare_exit_to_usermode arch/x86/entry/common.c:197 [inline] > >> syscall_return_slowpath arch/x86/entry/common.c:268 [inline] > >> do_syscall_64+0x461/0x580 arch/x86/entry/common.c:293 > >> entry_SYSCALL_64_after_hwframe+0x49/0xbe > >> > >> The buggy address belongs to the object at ffff88836cc26600 > >> which belongs to the cache kmalloc-4096 of size 4096 > >> The buggy address is located 1136 bytes inside of > >> 4096-byte region [ffff88836cc26600, ffff88836cc27600) > >> The buggy address belongs to the page: > >> page:ffffea000db30800 count:1 mapcount:0 mapping:ffff8883e280e600 > >> index:0x0 compound_mapcount: 0 > >> flags: 0x2fffff80008100(slab|head) > >> raw: 002fffff80008100 dead000000000100 dead000000000200 > ffff8883e280e600 > >> raw: 0000000000000000 0000000000070007 00000001ffffffff > 0000000000000000 > >> page dumped because: kasan: bad access detected > >> > >> Memory state around the buggy address: > >> ffff88836cc26900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > >> ffff88836cc26980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > >>> ffff88836cc26a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > >> ^ > >> ffff88836cc26a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > >> ffff88836cc26b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > >> > >> If tun driver have multiqueues, user close the last queue by > >> tun_detach, then tun->tfiles[index] is not cleared. Then a new > >> queue may add to the tun, which using rcu_assign_pointer > >> tun->tfiles[index] to the new tfile and increase the numqueues. > >> However if there send a packet during this time, which picking the last > >> queue, it may uses the old tun->tfiles[index], beacause there no > >> RCU grace period. > >> > >> 1) tun_chr_close //close the last queue > >> --> __tun_detach //close the last queue, but tun->tfiles[index] > >> still exist > >> > >> 2) tun_chr_open //attach a new queue > >> --> tun_attach > >> -->rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile); > >> //there need a RCU grace period > >> > >> -->tun->numqueues++; > >> > >> 3) tun_net_xmit //a new packet is sending, which pick the last queue > >> -->if (txq >= tun->numqueues) > >> //above check passed, but tfile still not renew > >> -->if (tfile->socket.sk->sk_filter ... > >> //use the old tfile,trigger use-after-free > >> > >> Reported-by: Hulk Robot <hulkci@huawei.com> > >> Fixes: c8d68e6be1c3 ("tuntap: multiqueue support") > >> Signed-off-by: YueHaibing <yuehaibing@huawei.com> > >> --- > >> drivers/net/tun.c | 1 + > >> 1 file changed, 1 insertion(+) > >> > >> diff --git a/drivers/net/tun.c b/drivers/net/tun.c > >> index e9ca1c0..3770aba 100644 > >> --- a/drivers/net/tun.c > >> +++ b/drivers/net/tun.c > >> @@ -876,6 +876,7 @@ static int tun_attach(struct tun_struct *tun, > >> struct file *file, > >> */ > >> rcu_assign_pointer(tfile->tun, tun); > >> rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile); > >> + synchronize_net(); > >> tun->numqueues++; > >> tun_set_real_num_queues(tun); > >> out: > > > > > > Good catch, that's one of my suspicion as well. Assume that this has > > been tested. > > > > Acked-by: Jason Wang <jasowang@redhat.com> > > > > This patch is needed for -stable. > > > > I will post another theoretical issue similar to this soon. > > > > Thanks > > > > Ok, a second thought. All evil come from the access of tun->numqueues > without sufficient synchronization. This is a partial fix, another > possible case is __tun_detach(), we need either In __tun_detach(),tun->tfiles changes should always call synchronize_net(), and free tfile after RCU grace period. tun_net_xmit() doesn't have the chance to access the change because it holding the rcu_read_lock(). So __tun_detach() path cannot cause the use after free, isn't it? Regards > > 1) synchronize through pointers in tfiles > > 2) or smp_load_acquire()/smp_store_release() to solve it completely. > > Let me post a complete fix. > > Thanks
On 2019/4/28 下午1:40, weiyongjun (A) wrote: > Hi Jason, > >> On 2019/4/28 上午11:24, Jason Wang wrote: >>> On 2019/4/28 上午11:05, Yue Haibing wrote: >>>> From: YueHaibing <yuehaibing@huawei.com> >>>> >>>> KASAN report this: >>>> >>>> BUG: KASAN: use-after-free in tun_net_xmit+0x1670/0x1750 >>>> drivers/net/tun.c:1104 >>>> Read of size 8 at addr ffff88836cc26a70 by task swapper/3/0 >>>> >>>> CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.19.32 #6 >>>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS >>>> 1.10.2-1ubuntu1 04/01/2014 >>>> Call Trace: >>>> <IRQ> >>>> __dump_stack lib/dump_stack.c:77 [inline] >>>> dump_stack+0xca/0x13e lib/dump_stack.c:113 >>>> print_address_description+0x79/0x330 mm/kasan/report.c:253 >>>> kasan_report_error mm/kasan/report.c:351 [inline] >>>> kasan_report+0x18a/0x2d0 mm/kasan/report.c:409 >>>> tun_net_xmit+0x1670/0x1750 drivers/net/tun.c:1104 >>>> __netdev_start_xmit include/linux/netdevice.h:4300 [inline] >>>> netdev_start_xmit include/linux/netdevice.h:4309 [inline] >>>> xmit_one net/core/dev.c:3243 [inline] >>>> dev_hard_start_xmit+0x17c/0x780 net/core/dev.c:3259 >>>> sch_direct_xmit+0x24f/0x8a0 net/sched/sch_generic.c:327 >>>> qdisc_restart net/sched/sch_generic.c:390 [inline] >>>> __qdisc_run+0x45b/0x1590 net/sched/sch_generic.c:398 >>>> qdisc_run include/net/pkt_sched.h:120 [inline] >>>> __dev_xmit_skb net/core/dev.c:3438 [inline] >>>> __dev_queue_xmit+0xa6b/0x2500 net/core/dev.c:3797 >>>> neigh_output include/net/neighbour.h:501 [inline] >>>> ip6_finish_output2+0xa36/0x2290 net/ipv6/ip6_output.c:120 >>>> ip6_finish_output+0x3e7/0xa20 net/ipv6/ip6_output.c:154 >>>> NF_HOOK_COND include/linux/netfilter.h:278 [inline] >>>> ip6_output+0x1e2/0x720 net/ipv6/ip6_output.c:171 >>>> dst_output include/net/dst.h:444 [inline] >>>> NF_HOOK include/linux/netfilter.h:289 [inline] >>>> mld_sendpack+0x740/0xf20 net/ipv6/mcast.c:1683 >>>> mld_send_cr net/ipv6/mcast.c:1979 [inline] >>>> mld_ifc_timer_expire+0x3cc/0x7f0 net/ipv6/mcast.c:2478 >>>> call_timer_fn+0x1ea/0x6a0 kernel/time/timer.c:1326 >>>> expire_timers kernel/time/timer.c:1363 [inline] >>>> __run_timers kernel/time/timer.c:1682 [inline] >>>> run_timer_softirq+0x637/0x1070 kernel/time/timer.c:1695 >>>> __do_softirq+0x26d/0xabd kernel/softirq.c:292 >>>> invoke_softirq kernel/softirq.c:372 [inline] >>>> irq_exit+0x209/0x290 kernel/softirq.c:412 >>>> exiting_irq arch/x86/include/asm/apic.h:536 [inline] >>>> smp_apic_timer_interrupt+0xf6/0x480 arch/x86/kernel/apic/apic.c:1056 >>>> apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:864 >>>> </IRQ> >>>> RIP: 0010:native_safe_halt+0x2/0x10 arch/x86/include/asm/irqflags.h:58 >>>> Code: 01 f0 0f 82 bc fd ff ff 48 c7 c7 40 25 b1 83 e8 a1 72 03 ff e9 >>>> ab fd ff ff 4c 89 e7 e8 37 f5 a7 fe e9 6a ff ff ff 90 90 fb f4 <c3> >>>> 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90 >>>> RSP: 0018:ffff8883e0f2fd20 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13 >>>> RAX: 0000000000000007 RBX: dffffc0000000000 RCX: 0000000000000000 >>>> RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8883e0f04f1c >>>> RBP: 0000000000000003 R08: ffffed107c5dc77b R09: 0000000000000000 >>>> R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff848b96e0 >>>> R13: 0000000000000003 R14: 1ffff1107c1e5fae R15: 0000000000000000 >>>> arch_safe_halt arch/x86/include/asm/paravirt.h:94 [inline] >>>> default_idle+0x24/0x2b0 arch/x86/kernel/process.c:561 >>>> cpuidle_idle_call kernel/sched/idle.c:153 [inline] >>>> do_idle+0x2ca/0x420 kernel/sched/idle.c:262 >>>> cpu_startup_entry+0xcb/0xe0 kernel/sched/idle.c:368 >>>> start_secondary+0x421/0x570 arch/x86/kernel/smpboot.c:271 >>>> secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:243 >>>> >>>> Allocated by task 19764: >>>> set_track mm/kasan/kasan.c:460 [inline] >>>> kasan_kmalloc+0xa0/0xd0 mm/kasan/kasan.c:553 >>>> __kmalloc+0x11b/0x2d0 mm/slub.c:3750 >>>> kmalloc include/linux/slab.h:518 [inline] >>>> sk_prot_alloc+0xf6/0x290 net/core/sock.c:1469 >>>> sk_alloc+0x3d/0xc00 net/core/sock.c:1523 >>>> tun_chr_open+0x80/0x560 drivers/net/tun.c:3204 >>>> misc_open+0x367/0x4e0 drivers/char/misc.c:141 >>>> chrdev_open+0x212/0x580 fs/char_dev.c:417 >>>> do_dentry_open+0x704/0x1050 fs/open.c:777 >>>> do_last fs/namei.c:3418 [inline] >>>> path_openat+0x7ed/0x2ae0 fs/namei.c:3533 >>>> do_filp_open+0x1aa/0x2b0 fs/namei.c:3564 >>>> do_sys_open+0x307/0x430 fs/open.c:1069 >>>> do_syscall_64+0xc8/0x580 arch/x86/entry/common.c:290 >>>> entry_SYSCALL_64_after_hwframe+0x49/0xbe >>>> >>>> Freed by task 19764: >>>> set_track mm/kasan/kasan.c:460 [inline] >>>> __kasan_slab_free+0x12e/0x180 mm/kasan/kasan.c:521 >>>> slab_free_hook mm/slub.c:1370 [inline] >>>> slab_free_freelist_hook mm/slub.c:1397 [inline] >>>> slab_free mm/slub.c:2952 [inline] >>>> kfree+0xeb/0x2f0 mm/slub.c:3905 >>>> sk_prot_free net/core/sock.c:1506 [inline] >>>> __sk_destruct+0x4e6/0x6a0 net/core/sock.c:1588 >>>> sk_destruct+0x48/0x70 net/core/sock.c:1596 >>>> __sk_free+0xa9/0x270 net/core/sock.c:1607 >>>> sk_free+0x2a/0x30 net/core/sock.c:1618 >>>> sock_put include/net/sock.h:1696 [inline] >>>> __tun_detach+0x464/0xf70 drivers/net/tun.c:735 >>>> tun_detach drivers/net/tun.c:747 [inline] >>>> tun_chr_close+0xd8/0x190 drivers/net/tun.c:3241 >>>> __fput+0x27f/0x7f0 fs/file_table.c:278 >>>> task_work_run+0x136/0x1b0 kernel/task_work.c:113 >>>> tracehook_notify_resume include/linux/tracehook.h:193 [inline] >>>> exit_to_usermode_loop+0x1a7/0x1d0 arch/x86/entry/common.c:166 >>>> prepare_exit_to_usermode arch/x86/entry/common.c:197 [inline] >>>> syscall_return_slowpath arch/x86/entry/common.c:268 [inline] >>>> do_syscall_64+0x461/0x580 arch/x86/entry/common.c:293 >>>> entry_SYSCALL_64_after_hwframe+0x49/0xbe >>>> >>>> The buggy address belongs to the object at ffff88836cc26600 >>>> which belongs to the cache kmalloc-4096 of size 4096 >>>> The buggy address is located 1136 bytes inside of >>>> 4096-byte region [ffff88836cc26600, ffff88836cc27600) >>>> The buggy address belongs to the page: >>>> page:ffffea000db30800 count:1 mapcount:0 mapping:ffff8883e280e600 >>>> index:0x0 compound_mapcount: 0 >>>> flags: 0x2fffff80008100(slab|head) >>>> raw: 002fffff80008100 dead000000000100 dead000000000200 >> ffff8883e280e600 >>>> raw: 0000000000000000 0000000000070007 00000001ffffffff >> 0000000000000000 >>>> page dumped because: kasan: bad access detected >>>> >>>> Memory state around the buggy address: >>>> ffff88836cc26900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >>>> ffff88836cc26980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >>>>> ffff88836cc26a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >>>> ^ >>>> ffff88836cc26a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >>>> ffff88836cc26b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >>>> >>>> If tun driver have multiqueues, user close the last queue by >>>> tun_detach, then tun->tfiles[index] is not cleared. Then a new >>>> queue may add to the tun, which using rcu_assign_pointer >>>> tun->tfiles[index] to the new tfile and increase the numqueues. >>>> However if there send a packet during this time, which picking the last >>>> queue, it may uses the old tun->tfiles[index], beacause there no >>>> RCU grace period. >>>> >>>> 1) tun_chr_close //close the last queue >>>> --> __tun_detach //close the last queue, but tun->tfiles[index] >>>> still exist >>>> >>>> 2) tun_chr_open //attach a new queue >>>> --> tun_attach >>>> -->rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile); >>>> //there need a RCU grace period >>>> >>>> -->tun->numqueues++; >>>> >>>> 3) tun_net_xmit //a new packet is sending, which pick the last queue >>>> -->if (txq >= tun->numqueues) >>>> //above check passed, but tfile still not renew >>>> -->if (tfile->socket.sk->sk_filter ... >>>> //use the old tfile,trigger use-after-free >>>> >>>> Reported-by: Hulk Robot <hulkci@huawei.com> >>>> Fixes: c8d68e6be1c3 ("tuntap: multiqueue support") >>>> Signed-off-by: YueHaibing <yuehaibing@huawei.com> >>>> --- >>>> drivers/net/tun.c | 1 + >>>> 1 file changed, 1 insertion(+) >>>> >>>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c >>>> index e9ca1c0..3770aba 100644 >>>> --- a/drivers/net/tun.c >>>> +++ b/drivers/net/tun.c >>>> @@ -876,6 +876,7 @@ static int tun_attach(struct tun_struct *tun, >>>> struct file *file, >>>> */ >>>> rcu_assign_pointer(tfile->tun, tun); >>>> rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile); >>>> + synchronize_net(); >>>> tun->numqueues++; >>>> tun_set_real_num_queues(tun); >>>> out: >>> >>> Good catch, that's one of my suspicion as well. Assume that this has >>> been tested. >>> >>> Acked-by: Jason Wang <jasowang@redhat.com> >>> >>> This patch is needed for -stable. >>> >>> I will post another theoretical issue similar to this soon. >>> >>> Thanks >>> >> Ok, a second thought. All evil come from the access of tun->numqueues >> without sufficient synchronization. This is a partial fix, another >> possible case is __tun_detach(), we need either > In __tun_detach(),tun->tfiles changes should always call synchronize_net(), > and free tfile after RCU grace period. Exactly, and that's why I suggest to check pointers in tfiles against NULL for method 1) I mentioned below. > tun_net_xmit() doesn't have the chance to > access the change because it holding the rcu_read_lock(). The problem is the following codes: --tun->numqueues; ... synchronize_net(); We need make sure the decrement of tun->numqueues be visible to readers after synchronize_net(). And in tun_net_xmit(): rcu_read_lock(); tfile = rcu_dereference(tun->tfiles[txq]); /* Drop packet if interface is not attached */ if (txq >= tun->numqueues) goto drop; We should make sure tun->numqueues were read after tfiles array dereference. Unfortunately, we don't have such guarantee even with this patch. So we should change to use smp_load_acquire()/smp_store_release() for those cases (method 2). It looks to me checking file against NULL (and NULL out tfiles[tun->numqueues] in __tun_detach) is much more easier to be understand. What do you think? Thanks > > So __tun_detach() path cannot cause the use after free, isn't it? > > Regards > >> 1) synchronize through pointers in tfiles >> >> 2) or smp_load_acquire()/smp_store_release() to solve it completely. >> >> Let me post a complete fix. >> >> Thanks
On Sat, Apr 27, 2019 at 8:06 PM Yue Haibing <yuehaibing@huawei.com> wrote: > > If tun driver have multiqueues, user close the last queue by > tun_detach, then tun->tfiles[index] is not cleared. Then a new > queue may add to the tun, which using rcu_assign_pointer > tun->tfiles[index] to the new tfile and increase the numqueues. > However if there send a packet during this time, which picking the last > queue, it may uses the old tun->tfiles[index], beacause there no > RCU grace period. This analysis makes sense. It is a normal scenario for RCU, where readers could still read even after we unpublish the RCU protected structure, we only need to worry about when we free it. > diff --git a/drivers/net/tun.c b/drivers/net/tun.c > index e9ca1c0..3770aba 100644 > --- a/drivers/net/tun.c > +++ b/drivers/net/tun.c > @@ -876,6 +876,7 @@ static int tun_attach(struct tun_struct *tun, struct file *file, > */ > rcu_assign_pointer(tfile->tun, tun); > rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile); > + synchronize_net(); > tun->numqueues++; > tun_set_real_num_queues(tun); But this fix doesn't make any sense, we only wait for RCU grace period when freeing old ones, not for new ones. RCU grace period is all about readers against free. This is why I came up with the SOCK_RCU_FREE patch, which is also blocking-free. Thanks.
On Sun, Apr 28, 2019 at 12:51 AM Jason Wang <jasowang@redhat.com> wrote: > > tun_net_xmit() doesn't have the chance to > > access the change because it holding the rcu_read_lock(). > > > The problem is the following codes: > > > --tun->numqueues; > > ... > > synchronize_net(); > > We need make sure the decrement of tun->numqueues be visible to readers > after synchronize_net(). And in tun_net_xmit(): It doesn't matter at all. Readers are okay to read it even they still use the stale tun->numqueues, as long as the tfile is not freed readers can read whatever they want... The decrement of tun->numqueues is just how we unpublish the old tfile, it is still valid for readers to read it _after_ unpublish, we only need to worry about free, not about unpublish. This is the whole spirit of RCU. You need to rethink about my SOCK_RCU_FREE patch. Thanks.
On 2019/4/29 上午1:59, Cong Wang wrote: > On Sun, Apr 28, 2019 at 12:51 AM Jason Wang <jasowang@redhat.com> wrote: >>> tun_net_xmit() doesn't have the chance to >>> access the change because it holding the rcu_read_lock(). >> >> >> The problem is the following codes: >> >> >> --tun->numqueues; >> >> ... >> >> synchronize_net(); >> >> We need make sure the decrement of tun->numqueues be visible to readers >> after synchronize_net(). And in tun_net_xmit(): > > It doesn't matter at all. Readers are okay to read it even they still use the > stale tun->numqueues, as long as the tfile is not freed readers can read > whatever they want... This is only true if we set SOCK_RCU_FREE, isn't it? > > The decrement of tun->numqueues is just how we unpublish the old > tfile, it is still valid for readers to read it _after_ unpublish, we only need > to worry about free, not about unpublish. This is the whole spirit of RCU. > The point is we don't convert tun->numqueues to RCU but use synchronize_net(). > You need to rethink about my SOCK_RCU_FREE patch. The code is wrote before SOCK_RCU_FREE is introduced and assume no de-reference from device after synchronize_net(). It doesn't harm to figure out the root cause which may give us more confidence to the fix (e.g like SOCK_RCU_FREE). I don't object to fix with SOCK_RCU_FREE, but then we should remove the redundant synchronize_net(). But I still prefer to synchronize everything explicitly like (completely untested): From df91f77d35a6aa7943b6f2a7d4b329990896a0fe Mon Sep 17 00:00:00 2001 From: Jason Wang <jasowang@redhat.com> Date: Mon, 29 Apr 2019 10:21:06 +0800 Subject: [PATCH] tuntap: synchronize through tfiles instead of numqueues Signed-off-by: Jason Wang <jasowang@redhat.com> --- drivers/net/tun.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/drivers/net/tun.c b/drivers/net/tun.c index 80bff1b4ec17..03715f605fb5 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -698,6 +698,7 @@ static void __tun_detach(struct tun_file *tfile, bool clean) rcu_assign_pointer(tun->tfiles[index], tun->tfiles[tun->numqueues - 1]); + rcu_assign_pointer(tun->tfiles[tun->numqueues], NULL); ntfile = rtnl_dereference(tun->tfiles[index]); ntfile->queue_index = index; @@ -1082,7 +1083,7 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev) tfile = rcu_dereference(tun->tfiles[txq]); /* Drop packet if interface is not attached */ - if (txq >= tun->numqueues) + if (!tfile) goto drop; if (!rcu_dereference(tun->steering_prog)) @@ -1305,15 +1306,13 @@ static int tun_xdp_xmit(struct net_device *dev, int n, rcu_read_lock(); - numqueues = READ_ONCE(tun->numqueues); - if (!numqueues) { + tfile = rcu_dereference(tun->tfiles[smp_processor_id() % + tun->numqueues]); + if (!tfile) { rcu_read_unlock(); return -ENXIO; /* Caller will free/return all frames */ } - tfile = rcu_dereference(tun->tfiles[smp_processor_id() % - numqueues]); - spin_lock(&tfile->tx_ring.producer_lock); for (i = 0; i < n; i++) { struct xdp_frame *xdp = frames[i];
> > On Sun, Apr 28, 2019 at 12:51 AM Jason Wang <jasowang@redhat.com> > wrote: > >>> tun_net_xmit() doesn't have the chance to > >>> access the change because it holding the rcu_read_lock(). > >> > >> > >> The problem is the following codes: > >> > >> > >> --tun->numqueues; > >> > >> ... > >> > >> synchronize_net(); > >> > >> We need make sure the decrement of tun->numqueues be visible to > readers > >> after synchronize_net(). And in tun_net_xmit(): > > > > It doesn't matter at all. Readers are okay to read it even they still use the > > stale tun->numqueues, as long as the tfile is not freed readers can read > > whatever they want... > > This is only true if we set SOCK_RCU_FREE, isn't it? > > > > > The decrement of tun->numqueues is just how we unpublish the old > > tfile, it is still valid for readers to read it _after_ unpublish, we only need > > to worry about free, not about unpublish. This is the whole spirit of RCU. > > > > The point is we don't convert tun->numqueues to RCU but use > synchronize_net(). > > > You need to rethink about my SOCK_RCU_FREE patch. > > The code is wrote before SOCK_RCU_FREE is introduced and assume no > de-reference from device after synchronize_net(). It doesn't harm to > figure out the root cause which may give us more confidence to the fix > (e.g like SOCK_RCU_FREE). > > I don't object to fix with SOCK_RCU_FREE, but then we should remove > the redundant synchronize_net(). But I still prefer to synchronize > everything explicitly like (completely untested): > > From df91f77d35a6aa7943b6f2a7d4b329990896a0fe Mon Sep 17 00:00:00 > 2001 > From: Jason Wang <jasowang@redhat.com> > Date: Mon, 29 Apr 2019 10:21:06 +0800 > Subject: [PATCH] tuntap: synchronize through tfiles instead of numqueues > > Signed-off-by: Jason Wang <jasowang@redhat.com> > --- > drivers/net/tun.c | 11 +++++------ > 1 file changed, 5 insertions(+), 6 deletions(-) > > diff --git a/drivers/net/tun.c b/drivers/net/tun.c > index 80bff1b4ec17..03715f605fb5 100644 > --- a/drivers/net/tun.c > +++ b/drivers/net/tun.c > @@ -698,6 +698,7 @@ static void __tun_detach(struct tun_file *tfile, bool > clean) > > rcu_assign_pointer(tun->tfiles[index], > tun->tfiles[tun->numqueues - 1]); > + rcu_assign_pointer(tun->tfiles[tun->numqueues], NULL); Should be "rcu_assign_pointer(tun->tfiles[tun->numqueues - 1], NULL);" > ntfile = rtnl_dereference(tun->tfiles[index]); > ntfile->queue_index = index; > > @@ -1082,7 +1083,7 @@ static netdev_tx_t tun_net_xmit(struct sk_buff > *skb, struct net_device *dev) > tfile = rcu_dereference(tun->tfiles[txq]); > > /* Drop packet if interface is not attached */ > - if (txq >= tun->numqueues) > + if (!tfile) > goto drop; > > if (!rcu_dereference(tun->steering_prog)) > @@ -1305,15 +1306,13 @@ static int tun_xdp_xmit(struct net_device *dev, > int n, > > rcu_read_lock(); > > - numqueues = READ_ONCE(tun->numqueues); > - if (!numqueues) { > + tfile = rcu_dereference(tun->tfiles[smp_processor_id() % > + tun->numqueues]); > + if (!tfile) { > rcu_read_unlock(); > return -ENXIO; /* Caller will free/return all frames */ > } > > - tfile = rcu_dereference(tun->tfiles[smp_processor_id() % > - numqueues]); > - > spin_lock(&tfile->tx_ring.producer_lock); > for (i = 0; i < n; i++) { > struct xdp_frame *xdp = frames[i]; > -- > 2.19.1
On 2019/4/29 10:23, Jason Wang wrote: > > On 2019/4/29 上午1:59, Cong Wang wrote: >> On Sun, Apr 28, 2019 at 12:51 AM Jason Wang <jasowang@redhat.com> wrote: >>>> tun_net_xmit() doesn't have the chance to >>>> access the change because it holding the rcu_read_lock(). >>> >>> >>> The problem is the following codes: >>> >>> >>> --tun->numqueues; >>> >>> ... >>> >>> synchronize_net(); >>> >>> We need make sure the decrement of tun->numqueues be visible to readers >>> after synchronize_net(). And in tun_net_xmit(): >> >> It doesn't matter at all. Readers are okay to read it even they still use the >> stale tun->numqueues, as long as the tfile is not freed readers can read >> whatever they want... > > This is only true if we set SOCK_RCU_FREE, isn't it? > >> >> The decrement of tun->numqueues is just how we unpublish the old >> tfile, it is still valid for readers to read it _after_ unpublish, we only need >> to worry about free, not about unpublish. This is the whole spirit of RCU. >> > > The point is we don't convert tun->numqueues to RCU but use > synchronize_net(). > >> You need to rethink about my SOCK_RCU_FREE patch. > > The code is wrote before SOCK_RCU_FREE is introduced and assume no > de-reference from device after synchronize_net(). It doesn't harm to > figure out the root cause which may give us more confidence to the fix > (e.g like SOCK_RCU_FREE). > > I don't object to fix with SOCK_RCU_FREE, but then we should remove > the redundant synchronize_net(). But I still prefer to synchronize > everything explicitly like (completely untested): > >>From df91f77d35a6aa7943b6f2a7d4b329990896a0fe Mon Sep 17 00:00:00 2001 > From: Jason Wang <jasowang@redhat.com> > Date: Mon, 29 Apr 2019 10:21:06 +0800 > Subject: [PATCH] tuntap: synchronize through tfiles instead of numqueues > > Signed-off-by: Jason Wang <jasowang@redhat.com> > --- > drivers/net/tun.c | 11 +++++------ > 1 file changed, 5 insertions(+), 6 deletions(-) > > diff --git a/drivers/net/tun.c b/drivers/net/tun.c > index 80bff1b4ec17..03715f605fb5 100644 > --- a/drivers/net/tun.c > +++ b/drivers/net/tun.c > @@ -698,6 +698,7 @@ static void __tun_detach(struct tun_file *tfile, bool clean) > > rcu_assign_pointer(tun->tfiles[index], > tun->tfiles[tun->numqueues - 1]); > + rcu_assign_pointer(tun->tfiles[tun->numqueues], NULL); > ntfile = rtnl_dereference(tun->tfiles[index]); move here to avoid NULL pointer dereference, it works for me rcu_assign_pointer(tun->tfiles[tun->numqueues -1 ], NULL); > ntfile->queue_index = index; > > @@ -1082,7 +1083,7 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev) > tfile = rcu_dereference(tun->tfiles[txq]); > > /* Drop packet if interface is not attached */ > - if (txq >= tun->numqueues) > + if (!tfile) > goto drop; > > if (!rcu_dereference(tun->steering_prog)) > @@ -1305,15 +1306,13 @@ static int tun_xdp_xmit(struct net_device *dev, int n, > > rcu_read_lock(); > > - numqueues = READ_ONCE(tun->numqueues); > - if (!numqueues) { > + tfile = rcu_dereference(tun->tfiles[smp_processor_id() % > + tun->numqueues]); > + if (!tfile) { > rcu_read_unlock(); > return -ENXIO; /* Caller will free/return all frames */ > } > > - tfile = rcu_dereference(tun->tfiles[smp_processor_id() % > - numqueues]); > - > spin_lock(&tfile->tx_ring.producer_lock); > for (i = 0; i < n; i++) { > struct xdp_frame *xdp = frames[i]; >
On Sun, Apr 28, 2019 at 11:05:39AM +0800, Yue Haibing wrote: > From: YueHaibing <yuehaibing@huawei.com> > > KASAN report this: > > BUG: KASAN: use-after-free in tun_net_xmit+0x1670/0x1750 drivers/net/tun.c:1104 > Read of size 8 at addr ffff88836cc26a70 by task swapper/3/0 > > CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.19.32 #6 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 > Call Trace: > <IRQ> > __dump_stack lib/dump_stack.c:77 [inline] > dump_stack+0xca/0x13e lib/dump_stack.c:113 > print_address_description+0x79/0x330 mm/kasan/report.c:253 > kasan_report_error mm/kasan/report.c:351 [inline] > kasan_report+0x18a/0x2d0 mm/kasan/report.c:409 > tun_net_xmit+0x1670/0x1750 drivers/net/tun.c:1104 > __netdev_start_xmit include/linux/netdevice.h:4300 [inline] > netdev_start_xmit include/linux/netdevice.h:4309 [inline] > xmit_one net/core/dev.c:3243 [inline] > dev_hard_start_xmit+0x17c/0x780 net/core/dev.c:3259 > sch_direct_xmit+0x24f/0x8a0 net/sched/sch_generic.c:327 > qdisc_restart net/sched/sch_generic.c:390 [inline] > __qdisc_run+0x45b/0x1590 net/sched/sch_generic.c:398 > qdisc_run include/net/pkt_sched.h:120 [inline] > __dev_xmit_skb net/core/dev.c:3438 [inline] > __dev_queue_xmit+0xa6b/0x2500 net/core/dev.c:3797 > neigh_output include/net/neighbour.h:501 [inline] > ip6_finish_output2+0xa36/0x2290 net/ipv6/ip6_output.c:120 > ip6_finish_output+0x3e7/0xa20 net/ipv6/ip6_output.c:154 > NF_HOOK_COND include/linux/netfilter.h:278 [inline] > ip6_output+0x1e2/0x720 net/ipv6/ip6_output.c:171 > dst_output include/net/dst.h:444 [inline] > NF_HOOK include/linux/netfilter.h:289 [inline] > mld_sendpack+0x740/0xf20 net/ipv6/mcast.c:1683 > mld_send_cr net/ipv6/mcast.c:1979 [inline] > mld_ifc_timer_expire+0x3cc/0x7f0 net/ipv6/mcast.c:2478 > call_timer_fn+0x1ea/0x6a0 kernel/time/timer.c:1326 > expire_timers kernel/time/timer.c:1363 [inline] > __run_timers kernel/time/timer.c:1682 [inline] > run_timer_softirq+0x637/0x1070 kernel/time/timer.c:1695 > __do_softirq+0x26d/0xabd kernel/softirq.c:292 > invoke_softirq kernel/softirq.c:372 [inline] > irq_exit+0x209/0x290 kernel/softirq.c:412 > exiting_irq arch/x86/include/asm/apic.h:536 [inline] > smp_apic_timer_interrupt+0xf6/0x480 arch/x86/kernel/apic/apic.c:1056 > apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:864 > </IRQ> > RIP: 0010:native_safe_halt+0x2/0x10 arch/x86/include/asm/irqflags.h:58 > Code: 01 f0 0f 82 bc fd ff ff 48 c7 c7 40 25 b1 83 e8 a1 72 03 ff e9 ab fd ff ff 4c 89 e7 e8 37 f5 a7 fe e9 6a ff ff ff 90 90 fb f4 <c3> 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90 > RSP: 0018:ffff8883e0f2fd20 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13 > RAX: 0000000000000007 RBX: dffffc0000000000 RCX: 0000000000000000 > RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8883e0f04f1c > RBP: 0000000000000003 R08: ffffed107c5dc77b R09: 0000000000000000 > R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff848b96e0 > R13: 0000000000000003 R14: 1ffff1107c1e5fae R15: 0000000000000000 > arch_safe_halt arch/x86/include/asm/paravirt.h:94 [inline] > default_idle+0x24/0x2b0 arch/x86/kernel/process.c:561 > cpuidle_idle_call kernel/sched/idle.c:153 [inline] > do_idle+0x2ca/0x420 kernel/sched/idle.c:262 > cpu_startup_entry+0xcb/0xe0 kernel/sched/idle.c:368 > start_secondary+0x421/0x570 arch/x86/kernel/smpboot.c:271 > secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:243 > > Allocated by task 19764: > set_track mm/kasan/kasan.c:460 [inline] > kasan_kmalloc+0xa0/0xd0 mm/kasan/kasan.c:553 > __kmalloc+0x11b/0x2d0 mm/slub.c:3750 > kmalloc include/linux/slab.h:518 [inline] > sk_prot_alloc+0xf6/0x290 net/core/sock.c:1469 > sk_alloc+0x3d/0xc00 net/core/sock.c:1523 > tun_chr_open+0x80/0x560 drivers/net/tun.c:3204 > misc_open+0x367/0x4e0 drivers/char/misc.c:141 > chrdev_open+0x212/0x580 fs/char_dev.c:417 > do_dentry_open+0x704/0x1050 fs/open.c:777 > do_last fs/namei.c:3418 [inline] > path_openat+0x7ed/0x2ae0 fs/namei.c:3533 > do_filp_open+0x1aa/0x2b0 fs/namei.c:3564 > do_sys_open+0x307/0x430 fs/open.c:1069 > do_syscall_64+0xc8/0x580 arch/x86/entry/common.c:290 > entry_SYSCALL_64_after_hwframe+0x49/0xbe > > Freed by task 19764: > set_track mm/kasan/kasan.c:460 [inline] > __kasan_slab_free+0x12e/0x180 mm/kasan/kasan.c:521 > slab_free_hook mm/slub.c:1370 [inline] > slab_free_freelist_hook mm/slub.c:1397 [inline] > slab_free mm/slub.c:2952 [inline] > kfree+0xeb/0x2f0 mm/slub.c:3905 > sk_prot_free net/core/sock.c:1506 [inline] > __sk_destruct+0x4e6/0x6a0 net/core/sock.c:1588 > sk_destruct+0x48/0x70 net/core/sock.c:1596 > __sk_free+0xa9/0x270 net/core/sock.c:1607 > sk_free+0x2a/0x30 net/core/sock.c:1618 > sock_put include/net/sock.h:1696 [inline] > __tun_detach+0x464/0xf70 drivers/net/tun.c:735 > tun_detach drivers/net/tun.c:747 [inline] > tun_chr_close+0xd8/0x190 drivers/net/tun.c:3241 > __fput+0x27f/0x7f0 fs/file_table.c:278 > task_work_run+0x136/0x1b0 kernel/task_work.c:113 > tracehook_notify_resume include/linux/tracehook.h:193 [inline] > exit_to_usermode_loop+0x1a7/0x1d0 arch/x86/entry/common.c:166 > prepare_exit_to_usermode arch/x86/entry/common.c:197 [inline] > syscall_return_slowpath arch/x86/entry/common.c:268 [inline] > do_syscall_64+0x461/0x580 arch/x86/entry/common.c:293 > entry_SYSCALL_64_after_hwframe+0x49/0xbe > > The buggy address belongs to the object at ffff88836cc26600 > which belongs to the cache kmalloc-4096 of size 4096 > The buggy address is located 1136 bytes inside of > 4096-byte region [ffff88836cc26600, ffff88836cc27600) > The buggy address belongs to the page: > page:ffffea000db30800 count:1 mapcount:0 mapping:ffff8883e280e600 index:0x0 compound_mapcount: 0 > flags: 0x2fffff80008100(slab|head) > raw: 002fffff80008100 dead000000000100 dead000000000200 ffff8883e280e600 > raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000 > page dumped because: kasan: bad access detected > > Memory state around the buggy address: > ffff88836cc26900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > ffff88836cc26980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > >ffff88836cc26a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > ^ > ffff88836cc26a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > ffff88836cc26b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > > If tun driver have multiqueues, user close the last queue by > tun_detach, then tun->tfiles[index] is not cleared. Then a new > queue may add to the tun, which using rcu_assign_pointer > tun->tfiles[index] to the new tfile and increase the numqueues. > However if there send a packet during this time, which picking the last > queue, it may uses the old tun->tfiles[index], beacause there no > RCU grace period. > > 1) tun_chr_close //close the last queue > --> __tun_detach //close the last queue, but tun->tfiles[index] still exist > > 2) tun_chr_open //attach a new queue > --> tun_attach > -->rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile); > //there need a RCU grace period > > -->tun->numqueues++; > > 3) tun_net_xmit //a new packet is sending, which pick the last queue > -->if (txq >= tun->numqueues) > //above check passed, but tfile still not renew > -->if (tfile->socket.sk->sk_filter ... > //use the old tfile,trigger use-after-free > > Reported-by: Hulk Robot <hulkci@huawei.com> > Fixes: c8d68e6be1c3 ("tuntap: multiqueue support") > Signed-off-by: YueHaibing <yuehaibing@huawei.com> > --- > drivers/net/tun.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/net/tun.c b/drivers/net/tun.c > index e9ca1c0..3770aba 100644 > --- a/drivers/net/tun.c > +++ b/drivers/net/tun.c > @@ -876,6 +876,7 @@ static int tun_attach(struct tun_struct *tun, struct file *file, > */ > rcu_assign_pointer(tfile->tun, tun); > rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile); > + synchronize_net(); > tun->numqueues++; > tun_set_real_num_queues(tun); > out: The problem seems real enough, but an extra synchronize_net on tun_attach might be a problem, slowing guest startup significantly. Better ideas? > -- > 2.7.0 >
On Sun, Apr 28, 2019 at 7:23 PM Jason Wang <jasowang@redhat.com> wrote: > > > On 2019/4/29 上午1:59, Cong Wang wrote: > > On Sun, Apr 28, 2019 at 12:51 AM Jason Wang <jasowang@redhat.com> wrote: > >>> tun_net_xmit() doesn't have the chance to > >>> access the change because it holding the rcu_read_lock(). > >> > >> > >> The problem is the following codes: > >> > >> > >> --tun->numqueues; > >> > >> ... > >> > >> synchronize_net(); > >> > >> We need make sure the decrement of tun->numqueues be visible to readers > >> after synchronize_net(). And in tun_net_xmit(): > > > > It doesn't matter at all. Readers are okay to read it even they still use the > > stale tun->numqueues, as long as the tfile is not freed readers can read > > whatever they want... > > This is only true if we set SOCK_RCU_FREE, isn't it? Sure, this is how RCU is supposed to work. > > > > > The decrement of tun->numqueues is just how we unpublish the old > > tfile, it is still valid for readers to read it _after_ unpublish, we only need > > to worry about free, not about unpublish. This is the whole spirit of RCU. > > > > The point is we don't convert tun->numqueues to RCU but use > synchronize_net(). Why tun->numqueues needs RCU? It is an integer, and reading a stale value is _perfectly_ fine. If you actually meant to say tun->tfiles[] itself, no, it is a fixed-size array, it doesn't shrink or grow, so we don't need RCU for it. This is also why a stale tun->numqueues is fine, as long as it never goes out-of-bound. > > > You need to rethink about my SOCK_RCU_FREE patch. > > The code is wrote before SOCK_RCU_FREE is introduced and assume no > de-reference from device after synchronize_net(). It doesn't harm to > figure out the root cause which may give us more confidence to the fix > (e.g like SOCK_RCU_FREE). I believe SOCK_RCU_FREE is the fix for the root cause, not just a cover-up. > > I don't object to fix with SOCK_RCU_FREE, but then we should remove > the redundant synchronize_net(). But I still prefer to synchronize > everything explicitly like (completely untested): I agree that synchronize_net() can be removed. However I don't understand your untested patch at all, it looks like to fix a completely different problem rather than this use-after-free. Thanks.
On Mon, Apr 29, 2019 at 7:55 AM Michael S. Tsirkin <mst@redhat.com> wrote: > The problem seems real enough, but an extra synchronize_net on tun_attach > might be a problem, slowing guest startup significantly. > Better ideas? Yes, I proposed the following patch in the other thread. diff --git a/drivers/net/tun.c b/drivers/net/tun.c index e9ca1c088d0b..31c3210288cb 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -3431,6 +3431,7 @@ static int tun_chr_open(struct inode *inode, struct file * file) file->private_data = tfile; INIT_LIST_HEAD(&tfile->next); + sock_set_flag(&tfile->sk, SOCK_RCU_FREE); sock_set_flag(&tfile->sk, SOCK_ZEROCOPY); return 0;
On 2019/4/30 0:38, Cong Wang wrote: > On Sun, Apr 28, 2019 at 7:23 PM Jason Wang <jasowang@redhat.com> wrote: >> >> >> On 2019/4/29 上午1:59, Cong Wang wrote: >>> On Sun, Apr 28, 2019 at 12:51 AM Jason Wang <jasowang@redhat.com> wrote: >>>>> tun_net_xmit() doesn't have the chance to >>>>> access the change because it holding the rcu_read_lock(). >>>> >>>> >>>> The problem is the following codes: >>>> >>>> >>>> --tun->numqueues; >>>> >>>> ... >>>> >>>> synchronize_net(); >>>> >>>> We need make sure the decrement of tun->numqueues be visible to readers >>>> after synchronize_net(). And in tun_net_xmit(): >>> >>> It doesn't matter at all. Readers are okay to read it even they still use the >>> stale tun->numqueues, as long as the tfile is not freed readers can read >>> whatever they want... >> >> This is only true if we set SOCK_RCU_FREE, isn't it? > > > Sure, this is how RCU is supposed to work. > >> >>> >>> The decrement of tun->numqueues is just how we unpublish the old >>> tfile, it is still valid for readers to read it _after_ unpublish, we only need >>> to worry about free, not about unpublish. This is the whole spirit of RCU. >>> >> >> The point is we don't convert tun->numqueues to RCU but use >> synchronize_net(). > > Why tun->numqueues needs RCU? It is an integer, and reading a stale > value is _perfectly_ fine. > > If you actually meant to say tun->tfiles[] itself, no, it is a fixed-size array, > it doesn't shrink or grow, so we don't need RCU for it. This is also why > a stale tun->numqueues is fine, as long as it never goes out-of-bound. > > >> >>> You need to rethink about my SOCK_RCU_FREE patch. >> >> The code is wrote before SOCK_RCU_FREE is introduced and assume no >> de-reference from device after synchronize_net(). It doesn't harm to >> figure out the root cause which may give us more confidence to the fix >> (e.g like SOCK_RCU_FREE). > > I believe SOCK_RCU_FREE is the fix for the root cause, not just a > cover-up. With SOCK_RCU_FREE tfile is ok , but tfile->sk is freed by sock_put in __tun_detach, it will trgger use-after-free in tun_net_xmit if tun->numqueues check passed. > > >> >> I don't object to fix with SOCK_RCU_FREE, but then we should remove >> the redundant synchronize_net(). But I still prefer to synchronize >> everything explicitly like (completely untested): > > I agree that synchronize_net() can be removed. However I don't > understand your untested patch at all, it looks like to fix a completely > different problem rather than this use-after-free. > > Thanks. > > . >
> Network Developers <netdev@vger.kernel.org> > Subject: Re: [PATCH] tun: Fix use-after-free in tun_net_xmit > > On Mon, Apr 29, 2019 at 7:55 AM Michael S. Tsirkin <mst@redhat.com> > wrote: > > The problem seems real enough, but an extra synchronize_net on > tun_attach > > might be a problem, slowing guest startup significantly. > > Better ideas? > > Yes, I proposed the following patch in the other thread. > > diff --git a/drivers/net/tun.c b/drivers/net/tun.c > index e9ca1c088d0b..31c3210288cb 100644 > --- a/drivers/net/tun.c > +++ b/drivers/net/tun.c > @@ -3431,6 +3431,7 @@ static int tun_chr_open(struct inode *inode, > struct file * file) > file->private_data = tfile; > INIT_LIST_HEAD(&tfile->next); > > + sock_set_flag(&tfile->sk, SOCK_RCU_FREE); > sock_set_flag(&tfile->sk, SOCK_ZEROCOPY); > > return 0; This patch should not work. The key point is that when detach the queue with index is equal to tun->numqueues - 1, we do not clear the point in tun->tfiles: static void __tun_detach(...) { ... **** if index == tun->numqueues - 1, nothing changed **** rcu_assign_pointer(tun->tfiles[index], tun->tfiles[tun->numqueues - 1]); .... } And after tfile free, xmit have change to get and use the freed file point. Regards
On Mon, Apr 29, 2019 at 7:44 PM YueHaibing <yuehaibing@huawei.com> wrote: > > With SOCK_RCU_FREE tfile is ok , > > but tfile->sk is freed by sock_put in __tun_detach, it will trgger SOCK_RCU_FREE is exactly for sock and for sock_put(), you need to look into sock_put() path to see where SOCK_RCU_FREE is tested. > > use-after-free in tun_net_xmit if tun->numqueues check passed. Why do you believe we still have use-after-free with SOCK_RCU_FREE? tun_net_xmit() holds RCU read lock, so with SOCK_RCU_FREE, the sock won't be freed until tun_net_xmit() releases RCU read lock. This is just how RCU works...
On Mon, Apr 29, 2019 at 10:11 PM weiyongjun (A) <weiyongjun1@huawei.com> wrote: > This patch should not work. The key point is that when detach the queue > with index is equal to tun->numqueues - 1, we do not clear the point > in tun->tfiles: > > static void __tun_detach(...) > { > ... > **** if index == tun->numqueues - 1, nothing changed **** > rcu_assign_pointer(tun->tfiles[index], > tun->tfiles[tun->numqueues - 1]); > .... > } This is _perfectly_ fine. This is just how we _unpublish_ it, RCU is NOT against unpublish, you keep missing this point. Think about list_del_rcu(). RCU readers could still read the list entry even _after_ list_del_rcu(), this is perfectly fine, list_del_rcu() just unpublishes the list entry from a global list, kfree_rcu() is the one frees it. So, RCU readers never hate "unpublish", they just hate "free". > > And after tfile free, xmit have change to get and use the freed file point. With SOCK_RCU_FREE, it won't be freed until the last reader is gone. This is the fundamental of RCU. Please, at least look into sk_destruct(). Thanks.
On 2019/4/30 上午12:38, Cong Wang wrote: > On Sun, Apr 28, 2019 at 7:23 PM Jason Wang <jasowang@redhat.com> wrote: >> >> On 2019/4/29 上午1:59, Cong Wang wrote: >>> On Sun, Apr 28, 2019 at 12:51 AM Jason Wang <jasowang@redhat.com> wrote: >>>>> tun_net_xmit() doesn't have the chance to >>>>> access the change because it holding the rcu_read_lock(). >>>> >>>> The problem is the following codes: >>>> >>>> >>>> --tun->numqueues; >>>> >>>> ... >>>> >>>> synchronize_net(); >>>> >>>> We need make sure the decrement of tun->numqueues be visible to readers >>>> after synchronize_net(). And in tun_net_xmit(): >>> It doesn't matter at all. Readers are okay to read it even they still use the >>> stale tun->numqueues, as long as the tfile is not freed readers can read >>> whatever they want... >> This is only true if we set SOCK_RCU_FREE, isn't it? > > Sure, this is how RCU is supposed to work. > >>> The decrement of tun->numqueues is just how we unpublish the old >>> tfile, it is still valid for readers to read it _after_ unpublish, we only need >>> to worry about free, not about unpublish. This is the whole spirit of RCU. >>> >> The point is we don't convert tun->numqueues to RCU but use >> synchronize_net(). > Why tun->numqueues needs RCU? It is an integer, and reading a stale > value is _perfectly_ fine. I meant we don't want e.g tun_net_xmit() to see the stale value after synchronize_net() in __tun_detach(), since it has various other steps with the assumption that no tfile dereference from data path. E.g one example is XDP rxq information un-registering which looks racy in the case of XDP_TX. > > If you actually meant to say tun->tfiles[] itself, no, it is a fixed-size array, > it doesn't shrink or grow, so we don't need RCU for it. This is also why > a stale tun->numqueues is fine, as long as it never goes out-of-bound. We do kind of shrinking or growing through tun->numqueues. That's why we check against it in various places. But, of course this is buggy. > > >>> You need to rethink about my SOCK_RCU_FREE patch. >> The code is wrote before SOCK_RCU_FREE is introduced and assume no >> de-reference from device after synchronize_net(). It doesn't harm to >> figure out the root cause which may give us more confidence to the fix >> (e.g like SOCK_RCU_FREE). > I believe SOCK_RCU_FREE is the fix for the root cause, not just a > cover-up. > > >> I don't object to fix with SOCK_RCU_FREE, but then we should remove >> the redundant synchronize_net(). But I still prefer to synchronize >> everything explicitly like (completely untested): > I agree that synchronize_net() can be removed. However I don't > understand your untested patch at all, it looks like to fix a completely > different problem rather than this use-after-free. As has been mentioned, the problem of current code is that we still leave pointers to freed tfile in tfiles[] array in __tun_detach() and the check with tun->numqueues seems racy. So the patch just NULL out the detached tfile pointers and make sure no it can not be dereferenced from tfile after synchronize_net() by dereferencing tfile instead of checking tun->numqueues . Thanks > > Thanks.
diff --git a/drivers/net/tun.c b/drivers/net/tun.c index e9ca1c0..3770aba 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -876,6 +876,7 @@ static int tun_attach(struct tun_struct *tun, struct file *file, */ rcu_assign_pointer(tfile->tun, tun); rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile); + synchronize_net(); tun->numqueues++; tun_set_real_num_queues(tun); out: