[net] net: Prevent invalid access to skb->prev in __qdisc_drop_all

Message ID 20181129222712.8396-1-cpaasch@apple.com
State Changes Requested, archived
Delegated to: David Miller

Commit Message

Christoph Paasch Nov. 29, 2018, 10:27 p.m. UTC
There are places in the stack where we access skb->prev directly and
modify it, namely __qdisc_drop_all().

With commit 68d2f84a1368 ("net: gro: properly remove skb from list")
the skb-list handling has been changed to set skb->next to NULL and set
the list-poison on skb->prev.

With that change, __qdisc_drop_all() will panic when it tries to
dereference skb->prev.

Since commit 992cba7e276d ("net: Add and use skb_list_del_init().")
__list_del_entry is used, leaving skb->prev unchanged (thus,
pointing to the list-head if it's the first skb of the list).
This will make __qdisc_drop_all modify the next-pointer of the list-head
and result in a panic later on:

[   34.501053] general protection fault: 0000 [#1] SMP KASAN PTI
[   34.501968] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.20.0-rc2.mptcp #108
[   34.502887] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.5.1 01/01/2011
[   34.504074] RIP: 0010:dev_gro_receive+0x343/0x1f90
[   34.504751] Code: e0 48 c1 e8 03 42 80 3c 30 00 0f 85 4a 1c 00 00 4d 8b 24 24 4c 39 65 d0 0f 84 0a 04 00 00 49 8d 7c 24 38 48 89 f8 48 c1 e8 03 <42> 0f b6 04 30 84 c0 74 08 3c 04
[   34.507060] RSP: 0018:ffff8883af507930 EFLAGS: 00010202
[   34.507761] RAX: 0000000000000007 RBX: ffff8883970b2c80 RCX: 1ffff11072e165a6
[   34.508640] RDX: 1ffff11075867008 RSI: ffff8883ac338040 RDI: 0000000000000038
[   34.509493] RBP: ffff8883af5079d0 R08: ffff8883970b2d40 R09: 0000000000000062
[   34.510346] R10: 0000000000000034 R11: 0000000000000000 R12: 0000000000000000
[   34.511215] R13: 0000000000000000 R14: dffffc0000000000 R15: ffff8883ac338008
[   34.512082] FS:  0000000000000000(0000) GS:ffff8883af500000(0000) knlGS:0000000000000000
[   34.513036] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   34.513741] CR2: 000055ccc3e9d020 CR3: 00000003abf32000 CR4: 00000000000006e0
[   34.514593] Call Trace:
[   34.514893]  <IRQ>
[   34.515157]  napi_gro_receive+0x93/0x150
[   34.515632]  receive_buf+0x893/0x3700
[   34.516094]  ? __netif_receive_skb+0x1f/0x1a0
[   34.516629]  ? virtnet_probe+0x1b40/0x1b40
[   34.517153]  ? __stable_node_chain+0x4d0/0x850
[   34.517684]  ? kfree+0x9a/0x180
[   34.518067]  ? __kasan_slab_free+0x171/0x190
[   34.518582]  ? detach_buf+0x1df/0x650
[   34.519061]  ? lapic_next_event+0x5a/0x90
[   34.519539]  ? virtqueue_get_buf_ctx+0x280/0x7f0
[   34.520093]  virtnet_poll+0x2df/0xd60
[   34.520533]  ? receive_buf+0x3700/0x3700
[   34.521027]  ? qdisc_watchdog_schedule_ns+0xd5/0x140
[   34.521631]  ? htb_dequeue+0x1817/0x25f0
[   34.522107]  ? sch_direct_xmit+0x142/0xf30
[   34.522595]  ? virtqueue_napi_schedule+0x26/0x30
[   34.523155]  net_rx_action+0x2f6/0xc50
[   34.523601]  ? napi_complete_done+0x2f0/0x2f0
[   34.524126]  ? kasan_check_read+0x11/0x20
[   34.524608]  ? _raw_spin_lock+0x7d/0xd0
[   34.525070]  ? _raw_spin_lock_bh+0xd0/0xd0
[   34.525563]  ? kvm_guest_apic_eoi_write+0x6b/0x80
[   34.526130]  ? apic_ack_irq+0x9e/0xe0
[   34.526567]  __do_softirq+0x188/0x4b5
[   34.527015]  irq_exit+0x151/0x180
[   34.527417]  do_IRQ+0xdb/0x150
[   34.527783]  common_interrupt+0xf/0xf
[   34.528223]  </IRQ>

This patch makes sure that skb->prev is also set to NULL when removing
it from the list.
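
For illustration only, the failure can be modelled in userspace (this is
not kernel code). struct node, del_init(), drop_all() and
head_corrupted_after_drop() are hypothetical names. drop_all() assumes
that the drop helper writes through a non-NULL ->prev, which is how the
description above characterises __qdisc_drop_all().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sk_buff: only the two list pointers. */
struct node {
	struct node *next;
	struct node *prev;
};

/* Circular doubly linked list helpers, in the style of <linux/list.h>. */
static void list_init(struct node *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_tail(struct node *n, struct node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink n from its neighbours without touching n's own pointers. */
static void unlink_entry(struct node *n)
{
	n->next->prev = n->prev;
	n->prev->next = n->next;
}

/* Removal as described above: unlink, then clear ->next only.
 * With clear_prev set, ->prev is cleared too, as this patch proposes. */
static void del_init(struct node *n, int clear_prev)
{
	unlink_entry(n);
	n->next = NULL;
	if (clear_prev)
		n->prev = NULL;
}

/* Model of a drop helper that trusts ->prev: a non-NULL ->prev is taken
 * to be the tail of a segment list and is written through. */
static void drop_all(struct node *n, struct node **to_free)
{
	if (n->prev)
		n->prev->next = *to_free;
	else
		n->next = *to_free;
	*to_free = n;
}

/* Returns 1 if dropping the first entry corrupted the list head. */
int head_corrupted_after_drop(int clear_prev)
{
	struct node head, a, b;
	struct node *to_free = NULL;

	assert(clear_prev == 0 || clear_prev == 1);

	list_init(&head);
	list_add_tail(&a, &head);
	list_add_tail(&b, &head);

	del_init(&a, clear_prev);	/* a was first, so a.prev == &head */
	drop_all(&a, &to_free);

	/* The list should now contain only b. */
	return head.next != &b;
}
```

In this model, head_corrupted_after_drop(0) reports a clobbered head
because the stale ->prev still points at it, while
head_corrupted_after_drop(1) leaves the list intact.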

The bug is in v4.19.x as well, but the patch can't be backported easily.
I can post a follow-up for that.

Cc: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Cc: Tyler Hicks <tyhicks@canonical.com>
Fixes: 68d2f84a1368 ("net: gro: properly remove skb from list")
Signed-off-by: Christoph Paasch <cpaasch@apple.com>
---
 include/linux/skbuff.h | 1 +
 1 file changed, 1 insertion(+)

Comments

Eric Dumazet Nov. 29, 2018, 10:44 p.m. UTC | #1
On 11/29/2018 02:27 PM, Christoph Paasch wrote:
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 0d1b2c3f127b..3bb3bfd390eb 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -1373,6 +1373,7 @@ static inline void skb_zcopy_abort(struct sk_buff *skb)
>  static inline void skb_mark_not_on_list(struct sk_buff *skb)
>  {
>  	skb->next = NULL;
> +	skb->prev = NULL;
>  }
>  
>  static inline void skb_list_del_init(struct sk_buff *skb)
> 

skb_mark_not_on_list() is used in many places where we do not care about skb->prev.

What about fixing netem instead?
Christoph Paasch Nov. 29, 2018, 10:55 p.m. UTC | #2
On 29/11/18 - 14:44:44, Eric Dumazet wrote:
> skb_mark_not_on_list() is used in many places where we do not care about skb->prev.
> 
> What about fixing netem instead?

Yes, I have been looking at that and Alexey's patch which introduced the
access to skb->prev (cfr.: https://patchwork.ozlabs.org/patch/880717/).

But then I thought that setting skb->prev to NULL is a less risky approach for
-stable.


How would you go about fixing netem instead?

Because, from what I see, we can basically enter netem_enqueue() with two
different "types" of skbs: the ones where skb->prev points to the tail of
the segment list, and the ones where skb->prev points to the
list-head.

Could I match on skb_is_gso() to see if skb->prev is something valid?


Christoph
Eric Dumazet Nov. 29, 2018, 11:09 p.m. UTC | #3
On 11/29/2018 02:55 PM, Christoph Paasch wrote:
> Yes, I have been looking at that and Alexey's patch which introduced the
> access to skb->prev (cfr.: https://patchwork.ozlabs.org/patch/880717/).
> 
> But then I thought that setting skb->prev to NULL is a less risky approach for
> -stable.
> 
> 
> How would you go about fixing netem instead?
> 
> Because, from what I see, we can basically enter netem_enqueue() with two
> different "types" of skbs: the ones where skb->prev points to the tail of
> the segment list, and the ones where skb->prev points to the
> list-head.
> 
> Could I match on skb_is_gso() to see if skb->prev is something valid?
>


I was simply thinking of something like :

diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 2c38e3d0792468162ee0dc4137f1400160ab9276..22cd46a600576f286803536d45875cd9d537cdca 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -431,6 +431,9 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
        int count = 1;
        int rc = NET_XMIT_SUCCESS;
 
+       /* Do not fool qdisc_drop_all() */
+       skb->prev = NULL;
+
        /* Random duplication */
        if (q->duplicate && q->duplicate >= get_crandom(&q->dup_cor))
                ++count;
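
A rough userspace model of why this is enough (not kernel code, and not
tested against the real qdisc): struct node, drop_all(),
enqueue_then_drop() and bystander_intact() are made-up names, and
drop_all() assumes the drop helper writes through a non-NULL ->prev.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sk_buff: only the two list pointers. */
struct node {
	struct node *next;
	struct node *prev;
};

/* Model of a drop helper that trusts ->prev: a non-NULL ->prev is taken
 * to be the tail of a segment list and is written through. */
static void drop_all(struct node *n, struct node **to_free)
{
	if (n->prev)
		n->prev->next = *to_free;
	else
		n->next = *to_free;
	*to_free = n;
}

/* Model of the top of an enqueue routine with the suggested change:
 * ->prev is reset before anything on the drop path can look at it. */
void enqueue_then_drop(struct node *pkt, struct node **to_free)
{
	assert(pkt != NULL && to_free != NULL);

	pkt->prev = NULL;		/* the suggested one-line fix */
	drop_all(pkt, to_free);		/* e.g. the queue was full */
}

/* Returns 1 if an unrelated list head survives the drop untouched. */
int bystander_intact(void)
{
	struct node bystander, first, pkt;
	struct node *to_free = NULL;

	/* A small circular list that the packet used to be part of. */
	bystander.next = &first;
	bystander.prev = &first;
	first.next = &bystander;
	first.prev = &bystander;

	pkt.next = NULL;
	pkt.prev = &bystander;	/* stale pointer left by an earlier unlink */

	enqueue_then_drop(&pkt, &to_free);

	return bystander.next == &first && to_free == &pkt && pkt.next == NULL;
}
```

The point of the model is that the stale pointer is discarded once, at
the entry of the only path that later interprets ->prev, rather than in
every list-removal helper.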
Christoph Paasch Nov. 29, 2018, 11:45 p.m. UTC | #4
On 29/11/18 - 15:09:18, Eric Dumazet wrote:
> I was simply thinking of something like :
> 
> diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
> index 2c38e3d0792468162ee0dc4137f1400160ab9276..22cd46a600576f286803536d45875cd9d537cdca 100644
> --- a/net/sched/sch_netem.c
> +++ b/net/sched/sch_netem.c
> @@ -431,6 +431,9 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
>         int count = 1;
>         int rc = NET_XMIT_SUCCESS;
>  
> +       /* Do not fool qdisc_drop_all() */
> +       skb->prev = NULL;
> +

Ah yeah, that should work!

I thought we would enter netem_enqueue() with an skb that was already segmented.
Now I see that the segmentation actually happens in netem_enqueue() for the
corruption.
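
To spell out the segmented case as I understand it, here is a small
userspace sketch (not kernel code). It assumes the first segment's ->prev
records the tail of the segment list, as discussed earlier in the thread;
struct node, make_segments(), drop_all(), chain_len() and
dropped_chain_length() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sk_buff: only the two list pointers. */
struct node {
	struct node *next;
	struct node *prev;
};

/* Link segs[0..n-1] into a NULL-terminated chain and record the tail in
 * the first segment's ->prev (assumed convention for segment lists). */
static struct node *make_segments(struct node *segs, size_t n)
{
	size_t i;

	assert(n > 0);
	for (i = 0; i < n; i++) {
		segs[i].next = (i + 1 < n) ? &segs[i + 1] : NULL;
		segs[i].prev = NULL;
	}
	segs[0].prev = &segs[n - 1];
	return &segs[0];
}

/* Model of a drop helper that uses ->prev as the tail of the chain, so the
 * whole chain is spliced onto the to_free list without walking it. */
static void drop_all(struct node *n, struct node **to_free)
{
	if (n->prev)
		n->prev->next = *to_free;
	else
		n->next = *to_free;
	*to_free = n;
}

static size_t chain_len(const struct node *n)
{
	size_t len = 0;

	for (; n; n = n->next)
		len++;
	return len;
}

/* Drops a 3-segment chain onto a to_free list that already holds one
 * entry and returns the resulting length of the to_free list. */
size_t dropped_chain_length(void)
{
	struct node segs[3], old;
	struct node *to_free = &old;
	struct node *first;

	old.next = NULL;
	old.prev = NULL;

	first = make_segments(segs, 3);
	drop_all(first, &to_free);

	return chain_len(to_free);
}
```

So a valid ->prev (the tail) is only expected for lists built inside the
enqueue path itself, which is why resetting it on entry should be safe.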


I can resubmit a patch.


Christoph


>         /* Random duplication */
>         if (q->duplicate && q->duplicate >= get_crandom(&q->dup_cor))
>                 ++count;
>
David Miller Nov. 29, 2018, 11:53 p.m. UTC | #5
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 29 Nov 2018 15:09:18 -0800

> diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
> index 2c38e3d0792468162ee0dc4137f1400160ab9276..22cd46a600576f286803536d45875cd9d537cdca 100644
> --- a/net/sched/sch_netem.c
> +++ b/net/sched/sch_netem.c
> @@ -431,6 +431,9 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
>         int count = 1;
>         int rc = NET_XMIT_SUCCESS;
>  
> +       /* Do not fool qdisc_drop_all() */
> +       skb->prev = NULL;
> +
>         /* Random duplication */
>         if (q->duplicate && q->duplicate >= get_crandom(&q->dup_cor))
>                 ++count;

If this works I definitely prefer it to making the entire stack pay the
price to fix this crash.
David Miller Nov. 29, 2018, 11:59 p.m. UTC | #6
From: Christoph Paasch <cpaasch@apple.com>
Date: Thu, 29 Nov 2018 15:45:19 -0800

> I can resubmit a patch.

Please do after testing.
Christoph Paasch Nov. 30, 2018, midnight UTC | #7
On Thu, Nov 29, 2018 at 3:54 PM David Miller <davem@davemloft.net> wrote:
>
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Thu, 29 Nov 2018 15:09:18 -0800
>
> > diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
> > index 2c38e3d0792468162ee0dc4137f1400160ab9276..22cd46a600576f286803536d45875cd9d537cdca 100644
> > --- a/net/sched/sch_netem.c
> > +++ b/net/sched/sch_netem.c
> > @@ -431,6 +431,9 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
> >         int count = 1;
> >         int rc = NET_XMIT_SUCCESS;
> >
> > +       /* Do not fool qdisc_drop_all() */
> > +       skb->prev = NULL;
> > +
> >         /* Random duplication */
> >         if (q->duplicate && q->duplicate >= get_crandom(&q->dup_cor))
> >                 ++count;
>
> If this works I definitely prefer it to making the entire stack pay the
> price to fix this crash.

Yes, I tried it out and it works.


Christoph
Patch

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 0d1b2c3f127b..3bb3bfd390eb 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1373,6 +1373,7 @@  static inline void skb_zcopy_abort(struct sk_buff *skb)
 static inline void skb_mark_not_on_list(struct sk_buff *skb)
 {
 	skb->next = NULL;
+	skb->prev = NULL;
 }
 
 static inline void skb_list_del_init(struct sk_buff *skb)