diff mbox series

[4.14] tcp: fix tcp_rtx_queue_tail in case of empty retransmit queue

Message ID 20190824060351.3776-1-tim.froidcoeur@tessares.net
State Not Applicable
Delegated to: David Miller
Headers show
Series [4.14] tcp: fix tcp_rtx_queue_tail in case of empty retransmit queue | expand

Commit Message

Tim Froidcoeur Aug. 24, 2019, 6:03 a.m. UTC
Commit 8c3088f895a0 ("tcp: be more careful in tcp_fragment()")
triggers following stack trace:

[25244.848046] kernel BUG at ./include/linux/skbuff.h:1406!
[25244.859335] RIP: 0010:skb_queue_prev+0x9/0xc
[25244.888167] Call Trace:
[25244.889182]  <IRQ>
[25244.890001]  tcp_fragment+0x9c/0x2cf
[25244.891295]  tcp_write_xmit+0x68f/0x988
[25244.892732]  __tcp_push_pending_frames+0x3b/0xa0
[25244.894347]  tcp_data_snd_check+0x2a/0xc8
[25244.895775]  tcp_rcv_established+0x2a8/0x30d
[25244.897282]  tcp_v4_do_rcv+0xb2/0x158
[25244.898666]  tcp_v4_rcv+0x692/0x956
[25244.899959]  ip_local_deliver_finish+0xeb/0x169
[25244.901547]  __netif_receive_skb_core+0x51c/0x582
[25244.903193]  ? inet_gro_receive+0x239/0x247
[25244.904756]  netif_receive_skb_internal+0xab/0xc6
[25244.906395]  napi_gro_receive+0x8a/0xc0
[25244.907760]  receive_buf+0x9a1/0x9cd
[25244.909160]  ? load_balance+0x17a/0x7b7
[25244.910536]  ? vring_unmap_one+0x18/0x61
[25244.911932]  ? detach_buf+0x60/0xfa
[25244.913234]  virtnet_poll+0x128/0x1e1
[25244.914607]  net_rx_action+0x12a/0x2b1
[25244.915953]  __do_softirq+0x11c/0x26b
[25244.917269]  ? handle_irq_event+0x44/0x56
[25244.918695]  irq_exit+0x61/0xa0
[25244.919947]  do_IRQ+0x9d/0xbb
[25244.921065]  common_interrupt+0x85/0x85
[25244.922479]  </IRQ>

tcp_rtx_queue_tail() (called by tcp_fragment()) can call
tcp_write_queue_prev() on the first packet in the queue, which will trigger
the BUG in tcp_write_queue_prev(), because there is no previous packet.

This happens when the retransmit queue is empty, for example in case of a
zero window.

Patch is needed for 4.4, 4.9 and 4.14 stable branches.

Fixes: 8c3088f895a0 ("tcp: be more careful in tcp_fragment()")
Change-Id: I839bde7167ae59e2f7d916c913507372445765c5
Signed-off-by: Tim Froidcoeur <tim.froidcoeur@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Christoph Paasch <cpaasch@apple.com>
---
 include/net/tcp.h | 4 ++++
 1 file changed, 4 insertions(+)

--
2.23.0

Comments

Jonathan Lemon Aug. 24, 2019, 10:05 p.m. UTC | #1
On 23 Aug 2019, at 23:03, Tim Froidcoeur wrote:

> Commit 8c3088f895a0 ("tcp: be more careful in tcp_fragment()")
> triggers following stack trace:
>
> [25244.848046] kernel BUG at ./include/linux/skbuff.h:1406!
> [25244.859335] RIP: 0010:skb_queue_prev+0x9/0xc
> [25244.888167] Call Trace:
> [25244.889182]  <IRQ>
> [25244.890001]  tcp_fragment+0x9c/0x2cf
> [25244.891295]  tcp_write_xmit+0x68f/0x988
> [25244.892732]  __tcp_push_pending_frames+0x3b/0xa0
> [25244.894347]  tcp_data_snd_check+0x2a/0xc8
> [25244.895775]  tcp_rcv_established+0x2a8/0x30d
> [25244.897282]  tcp_v4_do_rcv+0xb2/0x158
> [25244.898666]  tcp_v4_rcv+0x692/0x956
> [25244.899959]  ip_local_deliver_finish+0xeb/0x169
> [25244.901547]  __netif_receive_skb_core+0x51c/0x582
> [25244.903193]  ? inet_gro_receive+0x239/0x247
> [25244.904756]  netif_receive_skb_internal+0xab/0xc6
> [25244.906395]  napi_gro_receive+0x8a/0xc0
> [25244.907760]  receive_buf+0x9a1/0x9cd
> [25244.909160]  ? load_balance+0x17a/0x7b7
> [25244.910536]  ? vring_unmap_one+0x18/0x61
> [25244.911932]  ? detach_buf+0x60/0xfa
> [25244.913234]  virtnet_poll+0x128/0x1e1
> [25244.914607]  net_rx_action+0x12a/0x2b1
> [25244.915953]  __do_softirq+0x11c/0x26b
> [25244.917269]  ? handle_irq_event+0x44/0x56
> [25244.918695]  irq_exit+0x61/0xa0
> [25244.919947]  do_IRQ+0x9d/0xbb
> [25244.921065]  common_interrupt+0x85/0x85
> [25244.922479]  </IRQ>
>
> tcp_rtx_queue_tail() (called by tcp_fragment()) can call
> tcp_write_queue_prev() on the first packet in the queue, which will trigger
> the BUG in tcp_write_queue_prev(), because there is no previous packet.
>
> This happens when the retransmit queue is empty, for example in case of a
> zero window.
>
> Patch is needed for 4.4, 4.9 and 4.14 stable branches.
>
> Fixes: 8c3088f895a0 ("tcp: be more careful in tcp_fragment()")
> Change-Id: I839bde7167ae59e2f7d916c913507372445765c5
> Signed-off-by: Tim Froidcoeur <tim.froidcoeur@tessares.net>
> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
> Reviewed-by: Christoph Paasch <cpaasch@apple.com>

Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Christoph Paasch Aug. 30, 2019, 11:26 p.m. UTC | #2
Hello,

On 24/08/19 - 15:05:20, Jonathan Lemon wrote:
> 
> 
> On 23 Aug 2019, at 23:03, Tim Froidcoeur wrote:
> 
> > Commit 8c3088f895a0 ("tcp: be more careful in tcp_fragment()")
> > triggers following stack trace:
> >
> > [25244.848046] kernel BUG at ./include/linux/skbuff.h:1406!
> > [25244.859335] RIP: 0010:skb_queue_prev+0x9/0xc
> > [25244.888167] Call Trace:
> > [25244.889182]  <IRQ>
> > [25244.890001]  tcp_fragment+0x9c/0x2cf
> > [25244.891295]  tcp_write_xmit+0x68f/0x988
> > [25244.892732]  __tcp_push_pending_frames+0x3b/0xa0
> > [25244.894347]  tcp_data_snd_check+0x2a/0xc8
> > [25244.895775]  tcp_rcv_established+0x2a8/0x30d
> > [25244.897282]  tcp_v4_do_rcv+0xb2/0x158
> > [25244.898666]  tcp_v4_rcv+0x692/0x956
> > [25244.899959]  ip_local_deliver_finish+0xeb/0x169
> > [25244.901547]  __netif_receive_skb_core+0x51c/0x582
> > [25244.903193]  ? inet_gro_receive+0x239/0x247
> > [25244.904756]  netif_receive_skb_internal+0xab/0xc6
> > [25244.906395]  napi_gro_receive+0x8a/0xc0
> > [25244.907760]  receive_buf+0x9a1/0x9cd
> > [25244.909160]  ? load_balance+0x17a/0x7b7
> > [25244.910536]  ? vring_unmap_one+0x18/0x61
> > [25244.911932]  ? detach_buf+0x60/0xfa
> > [25244.913234]  virtnet_poll+0x128/0x1e1
> > [25244.914607]  net_rx_action+0x12a/0x2b1
> > [25244.915953]  __do_softirq+0x11c/0x26b
> > [25244.917269]  ? handle_irq_event+0x44/0x56
> > [25244.918695]  irq_exit+0x61/0xa0
> > [25244.919947]  do_IRQ+0x9d/0xbb
> > [25244.921065]  common_interrupt+0x85/0x85
> > [25244.922479]  </IRQ>
> >
> > tcp_rtx_queue_tail() (called by tcp_fragment()) can call
> > tcp_write_queue_prev() on the first packet in the queue, which will trigger
> > the BUG in tcp_write_queue_prev(), because there is no previous packet.
> >
> > This happens when the retransmit queue is empty, for example in case of a
> > zero window.
> >
> > Patch is needed for 4.4, 4.9 and 4.14 stable branches.
> >
> > Fixes: 8c3088f895a0 ("tcp: be more careful in tcp_fragment()")
> > Change-Id: I839bde7167ae59e2f7d916c913507372445765c5
> > Signed-off-by: Tim Froidcoeur <tim.froidcoeur@tessares.net>
> > Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
> > Reviewed-by: Christoph Paasch <cpaasch@apple.com>
> 
> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>

just checking in, if the patch is getting picked up for -stable ?

(I don't see it in the stable-queue)


Thanks,
Christoph
David Miller Aug. 31, 2019, 2:20 a.m. UTC | #3
From: Christoph Paasch <cpaasch@apple.com>
Date: Fri, 30 Aug 2019 16:26:57 -0700

> (I don't see it in the stable-queue)

I don't handle any stable branch before the most recent two, so this isn't
my territory.
maowenan Aug. 31, 2019, 10:53 a.m. UTC | #4
[<ffffff90094930dc>] skb_peek_tail include/linux/skbuff.h:1516 [inline]
[<ffffff90094930dc>] tcp_write_queue_tail include/net/tcp.h:1478 [inline]
[<ffffff90094930dc>] tcp_rtx_queue_tail include/net/tcp.h:1543 [inline]
[<ffffff90094930dc>] tcp_fragment+0xc64/0xce8 net/ipv4/tcp_output.c:1175
[<ffffff90094a37f0>] tcp_write_wakeup+0x3f8/0x4a0 net/ipv4/tcp_output.c:3496
[<ffffff90094a38d0>] tcp_send_probe0+0x38/0x2d8 net/ipv4/tcp_output.c:3523
[<ffffff90094a75a0>] tcp_probe_timer net/ipv4/tcp_timer.c:343 [inline]
[<ffffff90094a75a0>] tcp_write_timer_handler+0x640/0x720 net/ipv4/tcp_timer.c:548
[<ffffff90094a76f8>] tcp_write_timer+0x78/0x1d0 net/ipv4/tcp_timer.c:562
[<ffffff90082610b0>] call_timer_fn.isra.1+0x58/0x180 kernel/time/timer.c:1144
[<ffffff90082616ec>] __run_timers kernel/time/timer.c:1218 [inline]
[<ffffff90082616ec>] run_timer_softirq+0x514/0x848 kernel/time/timer.c:1401
[<ffffff9008141a28>] __do_softirq+0x340/0x878 kernel/softirq.c:273
[<ffffff9008142890>] do_softirq_own_stack include/linux/interrupt.h:498 [inline]
[<ffffff9008142890>] invoke_softirq kernel/softirq.c:357 [inline]
[<ffffff9008142890>] irq_exit+0x170/0x370 kernel/softirq.c:391
[<ffffff900821d550>] __handle_domain_irq+0x100/0x1c0 kernel/irq/irqdesc.c:459
[<ffffff90080914a0>] handle_domain_irq include/linux/irqdesc.h:168 [inline]
[<ffffff90080914a0>] gic_handle_irq+0xd0/0x1f0 drivers/irqchip/irq-gic.c:377
maowenan Aug. 31, 2019, 11:44 a.m. UTC | #5
Tim 

 Can you share the reproduce steps for this issue? C or syzkaller is ok.
 Thanks a lot.
Sasha Levin Aug. 31, 2019, 12:20 p.m. UTC | #6
On Sat, Aug 24, 2019 at 08:03:51AM +0200, Tim Froidcoeur wrote:
>Commit 8c3088f895a0 ("tcp: be more careful in tcp_fragment()")
>triggers following stack trace:
>
>[25244.848046] kernel BUG at ./include/linux/skbuff.h:1406!
>[25244.859335] RIP: 0010:skb_queue_prev+0x9/0xc
>[25244.888167] Call Trace:
>[25244.889182]  <IRQ>
>[25244.890001]  tcp_fragment+0x9c/0x2cf
>[25244.891295]  tcp_write_xmit+0x68f/0x988
>[25244.892732]  __tcp_push_pending_frames+0x3b/0xa0
>[25244.894347]  tcp_data_snd_check+0x2a/0xc8
>[25244.895775]  tcp_rcv_established+0x2a8/0x30d
>[25244.897282]  tcp_v4_do_rcv+0xb2/0x158
>[25244.898666]  tcp_v4_rcv+0x692/0x956
>[25244.899959]  ip_local_deliver_finish+0xeb/0x169
>[25244.901547]  __netif_receive_skb_core+0x51c/0x582
>[25244.903193]  ? inet_gro_receive+0x239/0x247
>[25244.904756]  netif_receive_skb_internal+0xab/0xc6
>[25244.906395]  napi_gro_receive+0x8a/0xc0
>[25244.907760]  receive_buf+0x9a1/0x9cd
>[25244.909160]  ? load_balance+0x17a/0x7b7
>[25244.910536]  ? vring_unmap_one+0x18/0x61
>[25244.911932]  ? detach_buf+0x60/0xfa
>[25244.913234]  virtnet_poll+0x128/0x1e1
>[25244.914607]  net_rx_action+0x12a/0x2b1
>[25244.915953]  __do_softirq+0x11c/0x26b
>[25244.917269]  ? handle_irq_event+0x44/0x56
>[25244.918695]  irq_exit+0x61/0xa0
>[25244.919947]  do_IRQ+0x9d/0xbb
>[25244.921065]  common_interrupt+0x85/0x85
>[25244.922479]  </IRQ>
>
>tcp_rtx_queue_tail() (called by tcp_fragment()) can call
>tcp_write_queue_prev() on the first packet in the queue, which will trigger
>the BUG in tcp_write_queue_prev(), because there is no previous packet.
>
>This happens when the retransmit queue is empty, for example in case of a
>zero window.
>
>Patch is needed for 4.4, 4.9 and 4.14 stable branches.

There needs to be a better explanation of why it's not needed
upstream...

--
Thanks,
Sasha
Matthieu Baerts Aug. 31, 2019, 1:14 p.m. UTC | #7
Hi Sasha,

Thank you for your reply!

On 31/08/2019 14:20, Sasha Levin wrote:
> On Sat, Aug 24, 2019 at 08:03:51AM +0200, Tim Froidcoeur wrote:
>> Commit 8c3088f895a0 ("tcp: be more careful in tcp_fragment()")
>> triggers following stack trace:
>>
>> [25244.848046] kernel BUG at ./include/linux/skbuff.h:1406!
>> [25244.859335] RIP: 0010:skb_queue_prev+0x9/0xc
>> [25244.888167] Call Trace:
>> [25244.889182]  <IRQ>
>> [25244.890001]  tcp_fragment+0x9c/0x2cf
>> [25244.891295]  tcp_write_xmit+0x68f/0x988
>> [25244.892732]  __tcp_push_pending_frames+0x3b/0xa0
>> [25244.894347]  tcp_data_snd_check+0x2a/0xc8
>> [25244.895775]  tcp_rcv_established+0x2a8/0x30d
>> [25244.897282]  tcp_v4_do_rcv+0xb2/0x158
>> [25244.898666]  tcp_v4_rcv+0x692/0x956
>> [25244.899959]  ip_local_deliver_finish+0xeb/0x169
>> [25244.901547]  __netif_receive_skb_core+0x51c/0x582
>> [25244.903193]  ? inet_gro_receive+0x239/0x247
>> [25244.904756]  netif_receive_skb_internal+0xab/0xc6
>> [25244.906395]  napi_gro_receive+0x8a/0xc0
>> [25244.907760]  receive_buf+0x9a1/0x9cd
>> [25244.909160]  ? load_balance+0x17a/0x7b7
>> [25244.910536]  ? vring_unmap_one+0x18/0x61
>> [25244.911932]  ? detach_buf+0x60/0xfa
>> [25244.913234]  virtnet_poll+0x128/0x1e1
>> [25244.914607]  net_rx_action+0x12a/0x2b1
>> [25244.915953]  __do_softirq+0x11c/0x26b
>> [25244.917269]  ? handle_irq_event+0x44/0x56
>> [25244.918695]  irq_exit+0x61/0xa0
>> [25244.919947]  do_IRQ+0x9d/0xbb
>> [25244.921065]  common_interrupt+0x85/0x85
>> [25244.922479]  </IRQ>
>>
>> tcp_rtx_queue_tail() (called by tcp_fragment()) can call
>> tcp_write_queue_prev() on the first packet in the queue, which will
>> trigger
>> the BUG in tcp_write_queue_prev(), because there is no previous packet.
>>
>> This happens when the retransmit queue is empty, for example in case of a
>> zero window.
>>
>> Patch is needed for 4.4, 4.9 and 4.14 stable branches.
> 
> There needs to be a better explanation of why it's not needed
> upstream...

Commit 8c3088f895a0 ("tcp: be more careful in tcp_fragment()") was not a
simple cherry-pick of the original one from master (b617158dc096)
because there is a specific TCP rtx queue only since v4.15. For more
details, please see the commit message of b617158dc096 ("tcp: be more
careful in tcp_fragment()").

The BUG() is hit due to the specific code added to versions older than
v4.15. The comment in skb_queue_prev() (include/linux/skbuff.h:1406),
just before the BUG_ON() somehow suggests to add a check before using
it, what Tim did.

In master, this code path causing the issue will not be taken because
the implementation of tcp_rtx_queue_tail() is different:

    tcp_fragment() → tcp_rtx_queue_tail() → tcp_write_queue_prev() →
skb_queue_prev() → BUG_ON()

Because this patch is specific to versions older than the two last
stable ones but still linked to the network architecture, who can review
and approve it? :)

Cheers,
Matt
Sasha Levin Sept. 1, 2019, 12:07 a.m. UTC | #8
On Sat, Aug 31, 2019 at 03:14:35PM +0200, Matthieu Baerts wrote:
>Hi Sasha,
>
>Thank you for your reply!
>
>On 31/08/2019 14:20, Sasha Levin wrote:
>> On Sat, Aug 24, 2019 at 08:03:51AM +0200, Tim Froidcoeur wrote:
>>> Commit 8c3088f895a0 ("tcp: be more careful in tcp_fragment()")
>>> triggers following stack trace:
>>>
>>> [25244.848046] kernel BUG at ./include/linux/skbuff.h:1406!
>>> [25244.859335] RIP: 0010:skb_queue_prev+0x9/0xc
>>> [25244.888167] Call Trace:
>>> [25244.889182]  <IRQ>
>>> [25244.890001]  tcp_fragment+0x9c/0x2cf
>>> [25244.891295]  tcp_write_xmit+0x68f/0x988
>>> [25244.892732]  __tcp_push_pending_frames+0x3b/0xa0
>>> [25244.894347]  tcp_data_snd_check+0x2a/0xc8
>>> [25244.895775]  tcp_rcv_established+0x2a8/0x30d
>>> [25244.897282]  tcp_v4_do_rcv+0xb2/0x158
>>> [25244.898666]  tcp_v4_rcv+0x692/0x956
>>> [25244.899959]  ip_local_deliver_finish+0xeb/0x169
>>> [25244.901547]  __netif_receive_skb_core+0x51c/0x582
>>> [25244.903193]  ? inet_gro_receive+0x239/0x247
>>> [25244.904756]  netif_receive_skb_internal+0xab/0xc6
>>> [25244.906395]  napi_gro_receive+0x8a/0xc0
>>> [25244.907760]  receive_buf+0x9a1/0x9cd
>>> [25244.909160]  ? load_balance+0x17a/0x7b7
>>> [25244.910536]  ? vring_unmap_one+0x18/0x61
>>> [25244.911932]  ? detach_buf+0x60/0xfa
>>> [25244.913234]  virtnet_poll+0x128/0x1e1
>>> [25244.914607]  net_rx_action+0x12a/0x2b1
>>> [25244.915953]  __do_softirq+0x11c/0x26b
>>> [25244.917269]  ? handle_irq_event+0x44/0x56
>>> [25244.918695]  irq_exit+0x61/0xa0
>>> [25244.919947]  do_IRQ+0x9d/0xbb
>>> [25244.921065]  common_interrupt+0x85/0x85
>>> [25244.922479]  </IRQ>
>>>
>>> tcp_rtx_queue_tail() (called by tcp_fragment()) can call
>>> tcp_write_queue_prev() on the first packet in the queue, which will
>>> trigger
>>> the BUG in tcp_write_queue_prev(), because there is no previous packet.
>>>
>>> This happens when the retransmit queue is empty, for example in case of a
>>> zero window.
>>>
>>> Patch is needed for 4.4, 4.9 and 4.14 stable branches.
>>
>> There needs to be a better explanation of why it's not needed
>> upstream...
>
>Commit 8c3088f895a0 ("tcp: be more careful in tcp_fragment()") was not a
>simple cherry-pick of the original one from master (b617158dc096)
>because there is a specific TCP rtx queue only since v4.15. For more
>details, please see the commit message of b617158dc096 ("tcp: be more
>careful in tcp_fragment()").
>
>The BUG() is hit due to the specific code added to versions older than
>v4.15. The comment in skb_queue_prev() (include/linux/skbuff.h:1406),
>just before the BUG_ON() somehow suggests to add a check before using
>it, what Tim did.
>
>In master, this code path causing the issue will not be taken because
>the implementation of tcp_rtx_queue_tail() is different:
>
>    tcp_fragment() → tcp_rtx_queue_tail() → tcp_write_queue_prev() →
>skb_queue_prev() → BUG_ON()
>
>Because this patch is specific to versions older than the two last
>stable ones but still linked to the network architecture, who can review
>and approve it? :)

Thanks for the explanation. I've changed the commit message to include
this explanation and queued it for 4.4, 4.9 and 4.14.

--
Thanks,
Sasha
Tim Froidcoeur Sept. 3, 2019, 6:58 a.m. UTC | #9
Hi,

I also tried to reproduce this in a targeted way, and run into the
same difficulty as you: satisfying the first condition “
(sk->sk_wmem_queued >> 1) > limit “.
I will not have bandwidth the coming days to try and reproduce it in
this way. Maybe simply forcing a very small send buffer using sysctl
net.ipv4.tcp_wmem might even do the trick?

I suspect that the bug is easier to trigger with the MPTCP patch like
I did originally, due to the way this patch manages the tcp subflow
buffers (it can temporarily overfill the buffers, satisfying that
first condition more often).

another thing, the stacktrace you shared before seems caused by
another issue (corrupted socket?), it will not be solved by the patch
we submitted.

kind regards,

Tim


On Tue, Sep 3, 2019 at 5:22 AM maowenan <maowenan@huawei.com> wrote:
>
> Hi Tim,
>
>
>
> I try to reproduce it with packetdrill or user application, but I can’t.
>
> The first condition “ (sk->sk_wmem_queued >> 1) > limit “    can’t be satisfied,
>
> This condition is to avoid tiny SO_SNDBUF values set by user.
>
> It also adds the some room due to the fact that tcp_sendmsg()
>
> and tcp_sendpage() might overshoot sk_wmem_queued by about one full
>
> TSO skb (64KB size).
>
>
>
>         limit = sk->sk_sndbuf + 2 * SKB_TRUESIZE(GSO_MAX_SIZE);
>
>         if (unlikely((sk->sk_wmem_queued >> 1) > limit &&
>
>                      skb != tcp_rtx_queue_head(sk) &&
>
>                      skb != tcp_rtx_queue_tail(sk))) {
>
>                 NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPWQUEUETOOBIG);
>
>                 return -ENOMEM;
>
>         }
>
>
>
> Can you try to reproduce it with packetdrill or C socket application?
>
>
maowenan Sept. 3, 2019, 8:55 a.m. UTC | #10
On 2019/9/3 14:58, Tim Froidcoeur wrote:
> Hi,
> 
> I also tried to reproduce this in a targeted way, and run into the
> same difficulty as you: satisfying the first condition “
> (sk->sk_wmem_queued >> 1) > limit “.
> I will not have bandwidth the coming days to try and reproduce it in
> this way. Maybe simply forcing a very small send buffer using sysctl
> net.ipv4.tcp_wmem might even do the trick?
> 
> I suspect that the bug is easier to trigger with the MPTCP patch like
> I did originally, due to the way this patch manages the tcp subflow
> buffers (it can temporarily overfill the buffers, satisfying that
> first condition more often).
> 
> another thing, the stacktrace you shared before seems caused by
> another issue (corrupted socket?), it will not be solved by the patch
> we submitted.

The trace shows zero window probe message can be BUG_ON in skb_queue_prev,
this is reproduced on our platform with syzkaller. It can be resolved by
your fix patch.
The thing I need to think is why the first condition can be satisfied?
Eric, Do you have any comments to reproduce it as the first condition
is hard to be true?
(sk->sk_wmem_queued >> 1) > limit

> 
> kind regards,
> 
> Tim
> 
> 
> On Tue, Sep 3, 2019 at 5:22 AM maowenan <maowenan@huawei.com> wrote:
>>
>> Hi Tim,
>>
>>
>>
>> I try to reproduce it with packetdrill or user application, but I can’t.
>>
>> The first condition “ (sk->sk_wmem_queued >> 1) > limit “    can’t be satisfied,
>>
>> This condition is to avoid tiny SO_SNDBUF values set by user.
>>
>> It also adds the some room due to the fact that tcp_sendmsg()
>>
>> and tcp_sendpage() might overshoot sk_wmem_queued by about one full
>>
>> TSO skb (64KB size).
>>
>>
>>
>>         limit = sk->sk_sndbuf + 2 * SKB_TRUESIZE(GSO_MAX_SIZE);
>>
>>         if (unlikely((sk->sk_wmem_queued >> 1) > limit &&
>>
>>                      skb != tcp_rtx_queue_head(sk) &&
>>
>>                      skb != tcp_rtx_queue_tail(sk))) {
>>
>>                 NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPWQUEUETOOBIG);
>>
>>                 return -ENOMEM;
>>
>>         }
>>
>>
>>
>> Can you try to reproduce it with packetdrill or C socket application?
>>
>>
> 
> 
>
diff mbox series

Patch

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 9de2c8cdcc51..1e70ca75c8bf 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1705,6 +1705,10 @@  static inline struct sk_buff *tcp_rtx_queue_tail(const struct sock *sk)
 {
 	struct sk_buff *skb = tcp_send_head(sk);

+	/* empty retransmit queue, for example due to zero window */
+	if (skb == tcp_write_queue_head(sk))
+		return NULL;
+
 	return skb ? tcp_write_queue_prev(sk, skb) : tcp_write_queue_tail(sk);
 }