ipv4: avoid divide 0 error in tcp_incr_quickack

Message ID 1414978173-6948-1-git-send-email-chenweilong@huawei.com
State Changes Requested, archived
Delegated to: David Miller
Commit Message

chenweilong Nov. 3, 2014, 1:29 a.m. UTC
From: Weilong Chen <chenweilong@huawei.com>

We got a problem like this:
 [ffff8801c1a05570] machine_kexec at ffffffff81025039
 [ffff8801c1a055d0] crash_kexec at ffffffff8109b253
 [ffff8801c1a056a0] oops_end at ffffffff81442aed
 [ffff8801c1a056d0] die at ffffffff81005603
 [ffff8801c1a05700] do_trap at ffffffff81442448
 [ffff8801c1a05760] do_divide_error at ffffffff81002c10
 [ffff8801c1a05888] tcp_send_dupack at ffffffff81385e44
 [ffff8801c1a058c8] tcp_validate_incoming at ffffffff813886b5
 [ffff8801c1a05908] tcp_rcv_state_process at ffffffff8138d0b7
 [ffff8801c1a05958] tcp_child_process at ffffffff81397255
 [ffff8801c1a05988] tcp_v4_do_rcv at ffffffff81395a70
 [ffff8801c1a059d8] tcp_v4_rcv at ffffffff81396fc8
 [ffff8801c1a05a48] ip_local_deliver_finish at ffffffff813746e9
 [ffff8801c1a05a78] ip_local_deliver at ffffffff81374a20
 [ffff8801c1a05aa8] ip_rcv_finish at ffffffff81374389
 [ffff8801c1a05ad8] ip_rcv at ffffffff81374c78
An invalid ACK packet arrived during the TCP handshake. The socket's state
was TCP_SYN_RECV and its rcv_mss was not initialized yet, so
tcp_send_dupack -> tcp_enter_quickack_mode hit a divide-by-zero error.
This patch adds a state check before tcp_enter_quickack_mode.

Signed-off-by: Weilong Chen <chenweilong@huawei.com>
---
 net/ipv4/tcp_input.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
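
For readers following the trace, the division sits in tcp_incr_quickack(), which tcp_enter_quickack_mode() calls. The block below is a minimal userspace sketch of that arithmetic, not kernel code: struct model_sock, model_incr_quickack() and MODEL_MAX_QUICKACKS are hypothetical names introduced for illustration, and the formula quickacks = rcv_wnd / (2 * rcv_mss) is taken from kernels of this era as an assumption. The model returns false where the kernel would trap instead of performing the division.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MAX_QUICKACKS 16U	/* assumed cap, stands in for TCP_MAX_QUICKACKS */

/* Hypothetical stand-in for the few socket fields involved. */
struct model_sock {
	uint32_t rcv_wnd;	/* current receive window */
	uint16_t rcv_mss;	/* receiver MSS estimate; 0 until initialized */
	uint8_t  quick;		/* number of quick ACKs scheduled */
};

/*
 * Models quickacks = rcv_wnd / (2 * rcv_mss).
 * Returns false without dividing when the divisor is zero, which is the
 * case that ends in do_divide_error in the trace above.
 */
static bool model_incr_quickack(struct model_sock *sk)
{
	uint32_t divisor = 2U * sk->rcv_mss;
	uint32_t quickacks;

	if (divisor == 0)
		return false;	/* the kernel would trap here */

	quickacks = sk->rcv_wnd / divisor;
	if (quickacks == 0)
		quickacks = 2;
	if (quickacks > sk->quick)
		sk->quick = quickacks < MODEL_MAX_QUICKACKS ?
			    quickacks : MODEL_MAX_QUICKACKS;
	return true;
}
```

With rcv_mss still 0, as on a TCP_SYN_RECV socket whose MSS estimate has not been set, the model reports the failure; with any non-zero rcv_mss it computes a bounded quick ACK count.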

Comments

Eric Dumazet Nov. 3, 2014, 3:42 a.m. UTC | #1
On Mon, 2014-11-03 at 09:29 +0800, Chen Weilong wrote:
> From: Weilong Chen <chenweilong@huawei.com>
> 
> We got a problem like this:

> There was a wrong ack packet coming during TCP handshake. The socket's state
> was TCP_SYN_RECV, its rcv_mss was not initialize yet. So
> tcp_send_dupack -> tcp_enter_quickack_mode got a divide 0 error.
> This patch add a state check before tcp_enter_quickack_mode.
> 
> Signed-off-by: Weilong Chen <chenweilong@huawei.com>
> ---
>  net/ipv4/tcp_input.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 4e4617e..9eb56dc 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -3986,7 +3986,8 @@ static void tcp_send_dupack(struct sock *sk, const struct sk_buff *skb)
>  	if (TCP_SKB_CB(skb)->end_seq != TCP_SKB_CB(skb)->seq &&
>  	    before(TCP_SKB_CB(skb)->seq, tp->rcv_nxt)) {
>  		NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_DELAYEDACKLOST);
> -		tcp_enter_quickack_mode(sk);
> +		if (sk->sk_state != TCP_SYN_RECV)
> +			tcp_enter_quickack_mode(sk);
>  
>  		if (tcp_is_sack(tp) && sysctl_tcp_dsack) {
>  			u32 end_seq = TCP_SKB_CB(skb)->end_seq;


Sorry I do not think this is the right fix.

We should not simply avoid the divide, but fix the issue by
understanding which steps are missing.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
chenweilong Nov. 3, 2014, 5:31 a.m. UTC | #2
On 2014/11/3 11:42, Eric Dumazet wrote:
> On Mon, 2014-11-03 at 09:29 +0800, Chen Weilong wrote:
>> From: Weilong Chen <chenweilong@huawei.com>
>>
>> We got a problem like this:
> 
>> There was a wrong ack packet coming during TCP handshake. The socket's state
>> was TCP_SYN_RECV, its rcv_mss was not initialize yet. So
>> tcp_send_dupack -> tcp_enter_quickack_mode got a divide 0 error.
>> This patch add a state check before tcp_enter_quickack_mode.
>>
>> Signed-off-by: Weilong Chen <chenweilong@huawei.com>
>> ---
>>  net/ipv4/tcp_input.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
>> index 4e4617e..9eb56dc 100644
>> --- a/net/ipv4/tcp_input.c
>> +++ b/net/ipv4/tcp_input.c
>> @@ -3986,7 +3986,8 @@ static void tcp_send_dupack(struct sock *sk, const struct sk_buff *skb)
>>  	if (TCP_SKB_CB(skb)->end_seq != TCP_SKB_CB(skb)->seq &&
>>  	    before(TCP_SKB_CB(skb)->seq, tp->rcv_nxt)) {
>>  		NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_DELAYEDACKLOST);
>> -		tcp_enter_quickack_mode(sk);
>> +		if (sk->sk_state != TCP_SYN_RECV)
>> +			tcp_enter_quickack_mode(sk);
>>  
>>  		if (tcp_is_sack(tp) && sysctl_tcp_dsack) {
>>  			u32 end_seq = TCP_SKB_CB(skb)->end_seq;
> 
> 
> Sorry I do not think this is the right fix.
> 
> We have to not simply avoid the divide, but fix this issue by
> understanding the missing steps.
> 
Hi Eric,

I checked the code and found that:

1. In function "tcp_rcv_state_process",
"tcp_initialize_rcv_mss" is called at "step 5: check the ACK field" when sk->sk_state is TCP_SYN_RECV,
and "tcp_validate_incoming" is called just before it.
So when we call "tcp_validate_incoming", rcv_mss may not have been initialized yet.

2. In function "tcp_validate_incoming",
at "Step 1: check sequence number", according to RFC 793 page 69,
if an incoming segment is not acceptable, an acknowledgment should be sent in reply (unless the RST
bit is set, in which case the segment is dropped and we return).
So we may call "tcp_send_dupack" while rcv_mss has not been initialized.

3. In function "tcp_send_dupack",
when the condition holds, the socket enters quick ACK mode. Note that it only checks the sequence numbers!
So I think adding a state check should be OK.

Any suggestions?

Thanks,
Weilong
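
The ordering described in points 1 to 3 can be sketched as a small userspace model. This is an illustration under stated assumptions, not the kernel implementation: every model_* name is hypothetical, the acceptability test is simplified to "segment ends before rcv_nxt", and 1460 is an arbitrary placeholder for whatever tcp_initialize_rcv_mss would compute. The return value says whether the quick ACK division would have been reached with rcv_mss == 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum model_state { MODEL_SYN_RECV, MODEL_ESTABLISHED };

/* Hypothetical stand-in for the socket fields discussed in the thread. */
struct model_sock {
	enum model_state state;
	uint32_t rcv_nxt;	/* next sequence number expected */
	uint16_t rcv_mss;	/* 0 until the "step 5" initialization runs */
};

/* Wrap-safe sequence comparison, in the spirit of the kernel's before(). */
static bool model_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* Point 3: quick ACK mode is entered based on sequence numbers only. */
static bool model_send_dupack(const struct model_sock *sk, uint32_t seq,
			      uint32_t end_seq, bool state_check)
{
	if (end_seq != seq && model_before(seq, sk->rcv_nxt)) {
		/* The proposed patch skips quick ACK mode in SYN_RECV. */
		if (state_check && sk->state == MODEL_SYN_RECV)
			return false;
		return sk->rcv_mss == 0;	/* division by 2 * rcv_mss */
	}
	return false;
}

/* Points 1 and 2: validation runs before rcv_mss is initialized. */
static bool model_rcv_state_process(struct model_sock *sk, uint32_t seq,
				    uint32_t end_seq, bool state_check)
{
	/* Simplified "step 1": an old segment is answered with a dup ACK. */
	if (model_before(end_seq, sk->rcv_nxt))
		return model_send_dupack(sk, seq, end_seq, state_check);

	/* Simplified "step 5": only now does rcv_mss get a value. */
	if (sk->state == MODEL_SYN_RECV) {
		sk->rcv_mss = 1460;	/* placeholder value */
		sk->state = MODEL_ESTABLISHED;
	}
	return false;
}
```

In this model, an old data segment arriving on a SYN_RECV socket reaches the division without the state check and avoids it with the check, while rcv_mss stays 0 in both cases, which is the part the patch does not address.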

Eric Dumazet Nov. 3, 2014, 3:30 p.m. UTC | #3
On Mon, 2014-11-03 at 13:31 +0800, chenweilong wrote:

> Hi Eric,
> 
> I check the code and find that:
> 
> 1.In function "tcp_rcv_state_process",
> the "tcp_initialize_rcv_mss" is called at "step 5: check the ACK field" when the sk->sk_state is TCP_SYN_RECV
> and there is a "tcp_validate_incoming" just before it.
> So when we call "tcp_validate_incoming", the rcv_mss may not been initialized.
> 
> 2.In function "tcp_validate_incoming",
> the "Step 1: check sequence number", according to RFC793 page 69,
> If an incoming segment is not acceptable,an acknowledgment should be sent in reply (unless the RST
> bit is set, if so drop the segment and return).
> So we may call "tcp_send_dupack" while the rcv_mss hasn't been initialized.
> 
> 3.In function "tcp_send_dupack",
> when the condition is suitable, it'll enter quick ack mode. Notice it only check the seq !
> So I think add another state check should be OK.
> 
> Any suggestion ?
> 

You did find the immediate conditions for the crash (rcv_mss = 0, state
= TCP_SYN_RECV).

Your patch avoids the zero divide, but leaves other issues. rcv_mss = 0
here is a sign some logic is wrong in the stack.

Given that this potential zero divide has been there for years, I believe we
should take the time for a more complete fix, instead of papering over
the immediate problem.

We have been working with Neal to reproduce the issue with packetdrill;
we'll post our results when we manage to get our first crash ;)

Thanks !


Patch

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 4e4617e..9eb56dc 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3986,7 +3986,8 @@  static void tcp_send_dupack(struct sock *sk, const struct sk_buff *skb)
 	if (TCP_SKB_CB(skb)->end_seq != TCP_SKB_CB(skb)->seq &&
 	    before(TCP_SKB_CB(skb)->seq, tp->rcv_nxt)) {
 		NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_DELAYEDACKLOST);
-		tcp_enter_quickack_mode(sk);
+		if (sk->sk_state != TCP_SYN_RECV)
+			tcp_enter_quickack_mode(sk);
 
 		if (tcp_is_sack(tp) && sysctl_tcp_dsack) {
 			u32 end_seq = TCP_SKB_CB(skb)->end_seq;