tcp: fix premature termination of FIN_WAIT2 time-wait sockets

Message ID 200908150339.12730.opurdila@ixiacom.com
State Accepted, archived
Delegated to: David Miller

Commit Message

Octavian Purdila Aug. 15, 2009, 12:39 a.m. UTC
NOTE: this issue has been found, fixed and tested on an ancient 2.6.7 kernel. 
This patch is a blind port of that fix, since unfortunately there is no easy 
way for me to reproduce the original issue with a newer kernel. But the issue 
still seems to be there.

tavi
---

There is a race condition in the time-wait sockets code that can lead
to premature termination of FIN_WAIT2 and, subsequently, to RST
generation when the FIN,ACK from the peer finally arrives:

Time     TCP header
0.000000 30755 > http [SYN] Seq=0 Win=2920 Len=0 MSS=1460 TSV=282912 TSER=0
0.000008 http > 30755 [SYN, ACK] Seq=0 Ack=1 Win=2896 Len=0 MSS=1460 TSV=...
0.136899 HEAD /1b.html?n1Lg=v1 HTTP/1.0 [Packet size limited during capture]
0.136934 HTTP/1.0 200 OK [Packet size limited during capture]
0.136945 http > 30755 [FIN, ACK] Seq=187 Ack=207 Win=2690 Len=0 TSV=270521...
0.136974 30755 > http [ACK] Seq=207 Ack=187 Win=2734 Len=0 TSV=283049 TSER=...
0.177983 30755 > http [ACK] Seq=207 Ack=188 Win=2733 Len=0 TSV=283089 TSER=...
0.238618 30755 > http [FIN, ACK] Seq=207 Ack=188 Win=2733 Len=0 TSV=283151...
0.238625 http > 30755 [RST] Seq=188 Win=0 Len=0

Say twdr->slot = 1 and inet_twdr_hangman is running, and in this
instance inet_twdr_do_twkill_work returns 1, i.e. slot 1 was not fully
purged. At that point we mark slot 1 and schedule
inet_twdr_twkill_work. We also advance twdr->slot to 2.

Next, a connection is closed and tcp_time_wait(TCP_FIN_WAIT2, timeo)
is called, which creates a new FIN_WAIT2 time-wait socket and places
it in the last slot to be reached, i.e. slot 1 (the one just behind
twdr->slot = 2).
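
The "last slot to be reached" is the one just behind the cursor on the
circular death row. A minimal sketch of that arithmetic, assuming 8 slots
(a power of two, which the mask used in the patch relies on).
TWKILL_SLOTS and last_reached_slot are illustrative names, not kernel
symbols:

```c
#include <assert.h>

/* Assumption: 8 slots; must be a power of two for the mask to act as modulo. */
#define TWKILL_SLOTS 8

/* Slot the hangman reaches last when its cursor currently sits at `cur`. */
static unsigned int last_reached_slot(unsigned int cur)
{
	assert(cur < TWKILL_SLOTS);
	return (cur + TWKILL_SLOTS - 1) & (TWKILL_SLOTS - 1);
}
```

With the cursor at 2, as in the scenario above, last_reached_slot(2)
gives 1, the slot that was just handed to the worker.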

Now suppose inet_twdr_twkill_work runs: it starts destroying the
time-wait sockets in slot 1, including the just-added TCP_FIN_WAIT2
one.

To avoid this issue we increment the slot only if all entries in the
slot have been purged.
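
The sequence can be replayed in a small userspace model. This is only a
sketch under stated assumptions, not the kernel code: struct death_row,
the deferred bitmask, KILL_QUOTA and all function names are hypothetical
stand-ins, sockets are reduced to per-slot counters, and the timer and
the worker run strictly one after the other instead of concurrently.

```c
#include <assert.h>
#include <stdbool.h>

#define SLOTS      8    /* assumed power of two */
#define KILL_QUOTA 100  /* hypothetical per-tick purge budget */

struct death_row {
	unsigned int slot;      /* hangman cursor */
	unsigned int deferred;  /* bitmask of slots handed to the worker */
	int cells[SLOTS];       /* time-wait sockets queued in each slot */
};

/* Purge up to `quota` sockets from `slot`; true if some are left over. */
static bool purge_slot(struct death_row *dr, unsigned int slot, int quota)
{
	assert(slot < SLOTS);
	int killed = dr->cells[slot] < quota ? dr->cells[slot] : quota;

	dr->cells[slot] -= killed;
	return dr->cells[slot] > 0;
}

/* One timer tick. `fixed` selects the patched cursor handling. */
static void hangman(struct death_row *dr, bool fixed)
{
	bool leftover = purge_slot(dr, dr->slot, KILL_QUOTA);

	if (leftover)
		dr->deferred |= 1u << dr->slot;   /* worker finishes it later */
	/* Unpatched: always advance. Patched: advance only after a full purge. */
	if (!leftover || !fixed)
		dr->slot = (dr->slot + 1) & (SLOTS - 1);
}

/* Queue a fresh FIN_WAIT2 socket in the last slot to be reached. */
static unsigned int schedule_fin_wait2(struct death_row *dr)
{
	unsigned int target = (dr->slot + SLOTS - 1) & (SLOTS - 1);

	dr->cells[target]++;
	return target;
}

/* Deferred worker: destroys everything in the slots marked for it. */
static void twkill_work(struct death_row *dr)
{
	for (unsigned int i = 0; i < SLOTS; i++)
		if (dr->deferred & (1u << i))
			dr->cells[i] = 0;
	dr->deferred = 0;
}

/* Replay: overfull slot 1, timer tick, new socket, then the worker. */
static bool fresh_socket_survives(bool fixed)
{
	struct death_row dr = { .slot = 1 };
	unsigned int fresh;

	dr.cells[1] = KILL_QUOTA + 50;  /* more than one tick can purge */
	hangman(&dr, fixed);
	fresh = schedule_fin_wait2(&dr);
	twkill_work(&dr);
	return dr.cells[fresh] > 0;
}
```

In this model fresh_socket_survives(false) returns false, because the new
socket lands in slot 1 and is wiped by the worker, while
fresh_socket_survives(true) returns true, because the cursor stays at 1
and the new socket is queued in slot 0.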

This change may delay slot cleanup by one time-wait death row period,
but only if the worker thread didn't have time to run and purge the
current slot before the next period (6 seconds with default sysctl
settings). However, on such a busy system we would probably see delays
even without this change...

Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
---
 net/ipv4/inet_timewait_sock.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

Comments

Octavian Purdila Aug. 24, 2009, 8:47 p.m. UTC | #1
On Saturday 15 August 2009 03:39:12 Octavian Purdila wrote:

> NOTE: this issue has been found, fixed and tested on an ancient 2.6.7
> kernel. This patch is a blind port of that fix, since unfortunately there
> is no easy way for me to reproduce the original issue with a newer kernel.
> But the issue still seems to be there.

Update: I was able to reproduce the issue on a 2.6.30 debian kernel with the 
attached test. It took me about 10 runs of 2-5 mins each to reproduce it 
(multiple runs to keep the capture file reasonable in terms of size).

tavi

David Miller Aug. 29, 2009, 7 a.m. UTC | #2
From: Octavian Purdila <opurdila@ixiacom.com>
Date: Mon, 24 Aug 2009 23:47:05 +0300

> On Saturday 15 August 2009 03:39:12 Octavian Purdila wrote:
> 
>> NOTE: this issue has been found, fixed and tested on an ancient 2.6.7
>> kernel. This patch is a blind port of that fix, since unfortunately there
>> is no easy way for me to reproduce the original issue with a newer kernel.
>> But the issue still seems to be there.
> 
> Update: I was able to reproduce the issue on a 2.6.30 debian kernel with the 
> attached test. It took me about 10 runs of 2-5 mins each to reproduce it 
> (multiple runs to keep the capture file reasonable in terms of size).

Thanks a lot for fixing this bug.

I've applied your patch to net-next-2.6

Patch

diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c
index 61283f9..13f0781 100644
--- a/net/ipv4/inet_timewait_sock.c
+++ b/net/ipv4/inet_timewait_sock.c
@@ -218,8 +218,8 @@  void inet_twdr_hangman(unsigned long data)
 		/* We purged the entire slot, anything left?  */
 		if (twdr->tw_count)
 			need_timer = 1;
+		twdr->slot = ((twdr->slot + 1) & (INET_TWDR_TWKILL_SLOTS - 1));
 	}
-	twdr->slot = ((twdr->slot + 1) & (INET_TWDR_TWKILL_SLOTS - 1));
 	if (need_timer)
 		mod_timer(&twdr->tw_timer, jiffies + twdr->period);
 out: