Message ID | 20181129222712.8396-1-cpaasch@apple.com |
---|---|
State | Changes Requested, archived |
Delegated to: | David Miller |
Series | [net] net: Prevent invalid access to skb->prev in __qdisc_drop_all |
On 11/29/2018 02:27 PM, Christoph Paasch wrote:
> There are places in the stack, where we access skb->prev directly and
> modify it. Namely, __qdisc_drop_all().
>
> With commit 68d2f84a1368 ("net: gro: properly remove skb from list")
> the skb-list handling has been changed to set skb->next to NULL and set
> the list-poison on skb->prev.
>
> With that change, __qdisc_drop_all() will panic when it tries to
> dereference skb->prev.
>
> Since commit 992cba7e276d ("net: Add and use skb_list_del_init().")
> __list_del_entry is used, leaving skb->prev unchanged (thus,
> pointing to the list-head if it's the first skb of the list).
> This will make __qdisc_drop_all modify the next-pointer of the list-head
> and result in a panic later on:
>
> [ 34.501053] general protection fault: 0000 [#1] SMP KASAN PTI
> [ 34.501968] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.20.0-rc2.mptcp #108
> [ 34.502887] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.5.1 01/01/2011
> [ 34.504074] RIP: 0010:dev_gro_receive+0x343/0x1f90
> [ 34.504751] Code: e0 48 c1 e8 03 42 80 3c 30 00 0f 85 4a 1c 00 00 4d 8b 24 24 4c 39 65 d0 0f 84 0a 04 00 00 49 8d 7c 24 38 48 89 f8 48 c1 e8 03 <42> 0f b6 04 30 84 c0 74 08 3c 04
> [ 34.507060] RSP: 0018:ffff8883af507930 EFLAGS: 00010202
> [ 34.507761] RAX: 0000000000000007 RBX: ffff8883970b2c80 RCX: 1ffff11072e165a6
> [ 34.508640] RDX: 1ffff11075867008 RSI: ffff8883ac338040 RDI: 0000000000000038
> [ 34.509493] RBP: ffff8883af5079d0 R08: ffff8883970b2d40 R09: 0000000000000062
> [ 34.510346] R10: 0000000000000034 R11: 0000000000000000 R12: 0000000000000000
> [ 34.511215] R13: 0000000000000000 R14: dffffc0000000000 R15: ffff8883ac338008
> [ 34.512082] FS: 0000000000000000(0000) GS:ffff8883af500000(0000) knlGS:0000000000000000
> [ 34.513036] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 34.513741] CR2: 000055ccc3e9d020 CR3: 00000003abf32000 CR4: 00000000000006e0
> [ 34.514593] Call Trace:
> [ 34.514893] <IRQ>
> [ 34.515157] napi_gro_receive+0x93/0x150
> [ 34.515632] receive_buf+0x893/0x3700
> [ 34.516094] ? __netif_receive_skb+0x1f/0x1a0
> [ 34.516629] ? virtnet_probe+0x1b40/0x1b40
> [ 34.517153] ? __stable_node_chain+0x4d0/0x850
> [ 34.517684] ? kfree+0x9a/0x180
> [ 34.518067] ? __kasan_slab_free+0x171/0x190
> [ 34.518582] ? detach_buf+0x1df/0x650
> [ 34.519061] ? lapic_next_event+0x5a/0x90
> [ 34.519539] ? virtqueue_get_buf_ctx+0x280/0x7f0
> [ 34.520093] virtnet_poll+0x2df/0xd60
> [ 34.520533] ? receive_buf+0x3700/0x3700
> [ 34.521027] ? qdisc_watchdog_schedule_ns+0xd5/0x140
> [ 34.521631] ? htb_dequeue+0x1817/0x25f0
> [ 34.522107] ? sch_direct_xmit+0x142/0xf30
> [ 34.522595] ? virtqueue_napi_schedule+0x26/0x30
> [ 34.523155] net_rx_action+0x2f6/0xc50
> [ 34.523601] ? napi_complete_done+0x2f0/0x2f0
> [ 34.524126] ? kasan_check_read+0x11/0x20
> [ 34.524608] ? _raw_spin_lock+0x7d/0xd0
> [ 34.525070] ? _raw_spin_lock_bh+0xd0/0xd0
> [ 34.525563] ? kvm_guest_apic_eoi_write+0x6b/0x80
> [ 34.526130] ? apic_ack_irq+0x9e/0xe0
> [ 34.526567] __do_softirq+0x188/0x4b5
> [ 34.527015] irq_exit+0x151/0x180
> [ 34.527417] do_IRQ+0xdb/0x150
> [ 34.527783] common_interrupt+0xf/0xf
> [ 34.528223] </IRQ>
>
> This patch makes sure that skb->prev is also set to NULL when removing
> it from the list.
>
> The bug is in v4.19.x as well, but the patch can't be backported easily.
> I can post a follow-up for that.
>
> Cc: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
> Cc: Tyler Hicks <tyhicks@canonical.com>
> Fixes: 68d2f84a1368 ("net: gro: properly remove skb from list")
> Signed-off-by: Christoph Paasch <cpaasch@apple.com>
> ---
>  include/linux/skbuff.h | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 0d1b2c3f127b..3bb3bfd390eb 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -1373,6 +1373,7 @@ static inline void skb_zcopy_abort(struct sk_buff *skb)
>  static inline void skb_mark_not_on_list(struct sk_buff *skb)
>  {
>  	skb->next = NULL;
> +	skb->prev = NULL;
>  }
>
>  static inline void skb_list_del_init(struct sk_buff *skb)
>

skb_mark_not_on_list() is used in many places where we do not care about skb->prev

What about fixing netem instead?

On 29/11/18 - 14:44:44, Eric Dumazet wrote:
> On 11/29/2018 02:27 PM, Christoph Paasch wrote:
> > @@ -1373,6 +1373,7 @@ static inline void skb_zcopy_abort(struct sk_buff *skb)
> >  static inline void skb_mark_not_on_list(struct sk_buff *skb)
> >  {
> >  	skb->next = NULL;
> > +	skb->prev = NULL;
> >  }
>
> skb_mark_not_on_list() is used in many places where we do not care about skb->prev
>
> What about fixing netem instead?

Yes, I have been looking at that and Alexey's patch which introduced the
access to skb->prev (cf. https://patchwork.ozlabs.org/patch/880717/).

But then I thought that setting skb->prev to NULL is a less risky approach for
-stable.


How would you go about fixing netem instead?

Because, from what I see, we can basically enter netem_enqueue here with two
different "types" of skbs: the ones where skb->prev points to the tail of
the list of the segment and the ones where skb->prev points to the
list-head.

Could I match on skb_is_gso() to see if skb->prev is something valid?


Christoph

On 11/29/2018 02:55 PM, Christoph Paasch wrote:
> How would you go about fixing netem instead?
>
> Because, from what I see we basically can enter netem_enqueue here with two
> different "types" of skb's. The ones where skb->prev points to the tail of
> the list of the segment and the ones where skb->prev points to the
> list-head.
>
> Could I match on skb_is_gso() to see if skb->prev is something valid?
>

I was simply thinking of something like :

diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 2c38e3d0792468162ee0dc4137f1400160ab9276..22cd46a600576f286803536d45875cd9d537cdca 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -431,6 +431,9 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 	int count = 1;
 	int rc = NET_XMIT_SUCCESS;
 
+	/* Do not fool qdisc_drop_all() */
+	skb->prev = NULL;
+
 	/* Random duplication */
 	if (q->duplicate && q->duplicate >= get_crandom(&q->dup_cor))
 		++count;

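The effect of resetting skb->prev on entry can be sketched with a toy userspace model. This is not kernel code: `struct toy_skb`, `toy_drop_all()`, `toy_enqueue_drop()` and `toy_demo()` are hypothetical names made up for illustration, and the drop helper only follows the behaviour described in this thread (it writes through prev when prev is set). The model assumes, as Christoph describes, that the head of a segment chain keeps a pointer to its tail in prev.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct sk_buff: only the two link pointers. */
struct toy_skb {
	struct toy_skb *next;
	struct toy_skb *prev;
};

/*
 * Drop helper in the style discussed in the thread: a non-NULL prev is
 * trusted to be the tail of a segment chain, so the whole chain is
 * spliced onto the to_free list with a single pointer write.
 */
static void toy_drop_all(struct toy_skb *skb, struct toy_skb **to_free)
{
	if (skb->prev)
		skb->prev->next = *to_free;
	else
		skb->next = *to_free;
	*to_free = skb;
}

/*
 * Hypothetical enqueue path: forget any stale prev first, then
 * (optionally) build a segment chain whose head remembers its tail,
 * and finally drop everything.
 */
static void toy_enqueue_drop(struct toy_skb *skb, struct toy_skb *segs,
			     size_t nsegs, struct toy_skb **to_free)
{
	struct toy_skb *tail = skb;
	size_t i;

	skb->prev = NULL;	/* the fix: do not fool the drop helper */

	for (i = 0; i < nsegs; i++) {
		segs[i].next = NULL;
		segs[i].prev = NULL;
		tail->next = &segs[i];
		tail = &segs[i];
	}
	if (nsegs)
		skb->prev = tail;	/* head of the chain points at its tail */

	toy_drop_all(skb, to_free);
}

#define TOY_MAX_SEGS 4

/*
 * Returns the length of the to_free list after the drop, or -1 if the
 * object the stale prev pointed at was written to.
 */
static int toy_demo(size_t nsegs)
{
	struct toy_skb bystander = { NULL, NULL };	/* e.g. an old list head */
	struct toy_skb old = { NULL, NULL };		/* already on to_free */
	struct toy_skb skb = { NULL, &bystander };	/* arrives with stale prev */
	struct toy_skb segs[TOY_MAX_SEGS];
	struct toy_skb *to_free = &old;
	const struct toy_skb *p;
	int n = 0;

	if (nsegs > TOY_MAX_SEGS)
		nsegs = TOY_MAX_SEGS;
	toy_enqueue_drop(&skb, segs, nsegs, &to_free);

	if (bystander.next != NULL)
		return -1;
	for (p = to_free; p; p = p->next)
		n++;
	return n;
}
```

Calling `toy_demo(0)` models a single skb that arrives with a stale prev, and `toy_demo(3)` models an skb that is split into a chain inside the enqueue path. In both cases the bystander object is left alone, and the chain case still reaches the drop helper with a valid tail pointer.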
On 29/11/18 - 15:09:18, Eric Dumazet wrote:
> I was simply thinking of something like :
>
> diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
> index 2c38e3d0792468162ee0dc4137f1400160ab9276..22cd46a600576f286803536d45875cd9d537cdca 100644
> --- a/net/sched/sch_netem.c
> +++ b/net/sched/sch_netem.c
> @@ -431,6 +431,9 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
>  	int count = 1;
>  	int rc = NET_XMIT_SUCCESS;
>
> +	/* Do not fool qdisc_drop_all() */
> +	skb->prev = NULL;
> +

Ah yeah, that should work!

I thought we would enter netem_enqueue with an skb that was already segmented.
Now I see that the segmentation actually happens inside netem_enqueue, for the
corruption case.

I can resubmit a patch.


Christoph

>  	/* Random duplication */
>  	if (q->duplicate && q->duplicate >= get_crandom(&q->dup_cor))
>  		++count;
>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 29 Nov 2018 15:09:18 -0800

> diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
> index 2c38e3d0792468162ee0dc4137f1400160ab9276..22cd46a600576f286803536d45875cd9d537cdca 100644
> --- a/net/sched/sch_netem.c
> +++ b/net/sched/sch_netem.c
> @@ -431,6 +431,9 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
>  	int count = 1;
>  	int rc = NET_XMIT_SUCCESS;
>
> +	/* Do not fool qdisc_drop_all() */
> +	skb->prev = NULL;
> +
>  	/* Random duplication */
>  	if (q->duplicate && q->duplicate >= get_crandom(&q->dup_cor))
>  		++count;

If this works I definitely prefer it to making the entire stack pay the
price to fix this crash.

From: Christoph Paasch <cpaasch@apple.com>
Date: Thu, 29 Nov 2018 15:45:19 -0800

> I can resubmit a patch.

Please do after testing.

On Thu, Nov 29, 2018 at 3:54 PM David Miller <davem@davemloft.net> wrote:
>
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Thu, 29 Nov 2018 15:09:18 -0800
>
> > +	/* Do not fool qdisc_drop_all() */
> > +	skb->prev = NULL;
> > +
>
> If this works I definitely prefer it to making the entire stack pay the
> price to fix this crash.

Yes, I tried it out and it works.


Christoph

There are places in the stack, where we access skb->prev directly and
modify it. Namely, __qdisc_drop_all().

With commit 68d2f84a1368 ("net: gro: properly remove skb from list")
the skb-list handling has been changed to set skb->next to NULL and set
the list-poison on skb->prev.

With that change, __qdisc_drop_all() will panic when it tries to
dereference skb->prev.

Since commit 992cba7e276d ("net: Add and use skb_list_del_init().")
__list_del_entry is used, leaving skb->prev unchanged (thus,
pointing to the list-head if it's the first skb of the list).
This will make __qdisc_drop_all modify the next-pointer of the list-head
and result in a panic later on:

[ 34.501053] general protection fault: 0000 [#1] SMP KASAN PTI
[ 34.501968] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.20.0-rc2.mptcp #108
[ 34.502887] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.5.1 01/01/2011
[ 34.504074] RIP: 0010:dev_gro_receive+0x343/0x1f90
[ 34.504751] Code: e0 48 c1 e8 03 42 80 3c 30 00 0f 85 4a 1c 00 00 4d 8b 24 24 4c 39 65 d0 0f 84 0a 04 00 00 49 8d 7c 24 38 48 89 f8 48 c1 e8 03 <42> 0f b6 04 30 84 c0 74 08 3c 04
[ 34.507060] RSP: 0018:ffff8883af507930 EFLAGS: 00010202
[ 34.507761] RAX: 0000000000000007 RBX: ffff8883970b2c80 RCX: 1ffff11072e165a6
[ 34.508640] RDX: 1ffff11075867008 RSI: ffff8883ac338040 RDI: 0000000000000038
[ 34.509493] RBP: ffff8883af5079d0 R08: ffff8883970b2d40 R09: 0000000000000062
[ 34.510346] R10: 0000000000000034 R11: 0000000000000000 R12: 0000000000000000
[ 34.511215] R13: 0000000000000000 R14: dffffc0000000000 R15: ffff8883ac338008
[ 34.512082] FS: 0000000000000000(0000) GS:ffff8883af500000(0000) knlGS:0000000000000000
[ 34.513036] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 34.513741] CR2: 000055ccc3e9d020 CR3: 00000003abf32000 CR4: 00000000000006e0
[ 34.514593] Call Trace:
[ 34.514893] <IRQ>
[ 34.515157] napi_gro_receive+0x93/0x150
[ 34.515632] receive_buf+0x893/0x3700
[ 34.516094] ? __netif_receive_skb+0x1f/0x1a0
[ 34.516629] ? virtnet_probe+0x1b40/0x1b40
[ 34.517153] ? __stable_node_chain+0x4d0/0x850
[ 34.517684] ? kfree+0x9a/0x180
[ 34.518067] ? __kasan_slab_free+0x171/0x190
[ 34.518582] ? detach_buf+0x1df/0x650
[ 34.519061] ? lapic_next_event+0x5a/0x90
[ 34.519539] ? virtqueue_get_buf_ctx+0x280/0x7f0
[ 34.520093] virtnet_poll+0x2df/0xd60
[ 34.520533] ? receive_buf+0x3700/0x3700
[ 34.521027] ? qdisc_watchdog_schedule_ns+0xd5/0x140
[ 34.521631] ? htb_dequeue+0x1817/0x25f0
[ 34.522107] ? sch_direct_xmit+0x142/0xf30
[ 34.522595] ? virtqueue_napi_schedule+0x26/0x30
[ 34.523155] net_rx_action+0x2f6/0xc50
[ 34.523601] ? napi_complete_done+0x2f0/0x2f0
[ 34.524126] ? kasan_check_read+0x11/0x20
[ 34.524608] ? _raw_spin_lock+0x7d/0xd0
[ 34.525070] ? _raw_spin_lock_bh+0xd0/0xd0
[ 34.525563] ? kvm_guest_apic_eoi_write+0x6b/0x80
[ 34.526130] ? apic_ack_irq+0x9e/0xe0
[ 34.526567] __do_softirq+0x188/0x4b5
[ 34.527015] irq_exit+0x151/0x180
[ 34.527417] do_IRQ+0xdb/0x150
[ 34.527783] common_interrupt+0xf/0xf
[ 34.528223] </IRQ>

This patch makes sure that skb->prev is also set to NULL when removing
it from the list.

The bug is in v4.19.x as well, but the patch can't be backported easily.
I can post a follow-up for that.

Cc: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Cc: Tyler Hicks <tyhicks@canonical.com>
Fixes: 68d2f84a1368 ("net: gro: properly remove skb from list")
Signed-off-by: Christoph Paasch <cpaasch@apple.com>
---
 include/linux/skbuff.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 0d1b2c3f127b..3bb3bfd390eb 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1373,6 +1373,7 @@ static inline void skb_zcopy_abort(struct sk_buff *skb)
 static inline void skb_mark_not_on_list(struct sk_buff *skb)
 {
 	skb->next = NULL;
+	skb->prev = NULL;
 }
 
 static inline void skb_list_del_init(struct sk_buff *skb)
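The failure mode described in the commit message can be reproduced in miniature with a toy userspace model. Everything here is an assumption made for illustration: `struct toy_skb` and the `toy_*` helpers are hypothetical names, the list is a plain circular list with a sentinel head, and the drop helper only mimics the behaviour the message attributes to __qdisc_drop_all() (writing through prev when it is non-NULL). It is a sketch of the pointer mechanics, not the kernel implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct sk_buff: only the two link pointers. */
struct toy_skb {
	struct toy_skb *next;
	struct toy_skb *prev;
};

/* Circular doubly linked list with a sentinel head. */
static void toy_list_init(struct toy_skb *head)
{
	head->next = head;
	head->prev = head;
}

static void toy_list_add_tail(struct toy_skb *skb, struct toy_skb *head)
{
	skb->prev = head->prev;
	skb->next = head;
	head->prev->next = skb;
	head->prev = skb;
}

/*
 * Unlink the entry and clear next, but leave prev untouched. This is
 * the behaviour the commit message describes after 992cba7e276d.
 */
static void toy_del_keep_prev(struct toy_skb *skb)
{
	skb->prev->next = skb->next;
	skb->next->prev = skb->prev;
	skb->next = NULL;
}

/* Drop helper that trusts a non-NULL prev and writes through it. */
static void toy_drop_all(struct toy_skb *skb, struct toy_skb **to_free)
{
	if (skb->prev)
		skb->prev->next = *to_free;
	else
		skb->next = *to_free;
	*to_free = skb;
}

/*
 * Removes the first of two entries, optionally clears its prev, drops
 * it, and reports whether the list head still points at the second
 * entry (1) or was overwritten by the drop helper (0).
 */
static int toy_head_survives(int clear_prev)
{
	struct toy_skb head, first, second;
	struct toy_skb *to_free = NULL;

	toy_list_init(&head);
	toy_list_add_tail(&first, &head);
	toy_list_add_tail(&second, &head);

	toy_del_keep_prev(&first);	/* first.prev still points at head */
	if (clear_prev)
		first.prev = NULL;	/* what the patch adds */

	toy_drop_all(&first, &to_free);
	return head.next == &second;
}
```

With `clear_prev` set to 0 the drop helper overwrites `head.next`, which corresponds to the corrupted list head that later faults in the trace above. With `clear_prev` set to 1 the list head is left intact.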