From patchwork Sat Aug 15 00:39:12 2009 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Octavian Purdila X-Patchwork-Id: 31463 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming@bilbo.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from ozlabs.org (ozlabs.org [203.10.76.45]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client CN "mx.ozlabs.org", Issuer "CA Cert Signing Authority" (verified OK)) by bilbo.ozlabs.org (Postfix) with ESMTPS id 7FF08B6F34 for ; Sat, 15 Aug 2009 10:41:33 +1000 (EST) Received: by ozlabs.org (Postfix) id 697ACDDD1B; Sat, 15 Aug 2009 10:41:33 +1000 (EST) Delivered-To: patchwork-incoming@ozlabs.org Received: from vger.kernel.org (vger.kernel.org [209.132.176.167]) by ozlabs.org (Postfix) with ESMTP id EBE35DDD01 for ; Sat, 15 Aug 2009 10:41:32 +1000 (EST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752988AbZHOAlZ (ORCPT ); Fri, 14 Aug 2009 20:41:25 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1751353AbZHOAlZ (ORCPT ); Fri, 14 Aug 2009 20:41:25 -0400 Received: from ixro-out-rtc.ixiacom.com ([92.87.192.98]:16936 "EHLO ixro-ex1.ixiacom.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1750805AbZHOAlY (ORCPT ); Fri, 14 Aug 2009 20:41:24 -0400 Received: from ixro-opurdila-lap.localnet ([10.205.9.70]) by ixro-ex1.ixiacom.com with Microsoft SMTPSVC(6.0.3790.3959); Sat, 15 Aug 2009 03:41:24 +0300 Subject: [PATCH] tcp: fix premature termination of FIN_WAIT2 time-wait sockets From: Octavian Purdila Organization: Ixia Date: Sat, 15 Aug 2009 03:39:12 +0300 MIME-Version: 1.0 X-UID: 1804 To: netdev@vger.kernel.org Message-Id: <200908150339.12730.opurdila@ixiacom.com> X-OriginalArrivalTime: 15 Aug 2009 00:41:24.0465 (UTC) FILETIME=[1DDE0610:01CA1D41] Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org NOTE: this issue has been found, fixed and tested on an ancient 2.6.7 kernel. This patch is a blind port of that fix, since unfortunately there is no easy way for me to reproduce the original issue with a newer kernel. But the issue still seems to be there. tavi --- There is a race condition in the time-wait sockets code that can lead to premature termination of FIN_WAIT2 and, subsequently, to RST generation when the FIN,ACK from the peer finally arrives: Time TCP header 0.000000 30755 > http [SYN] Seq=0 Win=2920 Len=0 MSS=1460 TSV=282912 TSER=0 0.000008 http > 30755 aSYN, ACK] Seq=0 Ack=1 Win=2896 Len=0 MSS=1460 TSV=... 0.136899 HEAD /1b.html?n1Lg=v1 HTTP/1.0 [Packet size limited during capture] 0.136934 HTTP/1.0 200 OK [Packet size limited during capture] 0.136945 http > 30755 [FIN, ACK] Seq=187 Ack=207 Win=2690 Len=0 TSV=270521... 0.136974 30755 > http [ACK] Seq=207 Ack=187 Win=2734 Len=0 TSV=283049 TSER=... 0.177983 30755 > http [ACK] Seq=207 Ack=188 Win=2733 Len=0 TSV=283089 TSER=... 0.238618 30755 > http [FIN, ACK] Seq=207 Ack=188 Win=2733 Len=0 TSV=283151... 0.238625 http > 30755 [RST] Seq=188 Win=0 Len=0 Say twdr->slot = 1 and we are running inet_twdr_hangman and in this instance inet_twdr_do_twkill_work returns 1. At that point we will mark slot 1 and schedule inet_twdr_twkill_work. We will also make twdr->slot = 2. Next, a connection is closed and tcp_time_wait(TCP_FIN_WAIT2, timeo) is called which will create a new FIN_WAIT2 time-wait socket and will place it in the last to be reached slot, i.e. twdr->slot = 1. At this point say inet_twdr_twkill_work will run which will start destroying the time-wait sockets in slot 1, including the just added TCP_FIN_WAIT2 one. To avoid this issue we increment the slot only if all entries in the slot have been purged. This change may delay the slots cleanup by a time-wait death row period but only if the worker thread didn't had the time to run/purge the current slot in the next period (6 seconds with default sysctl settings). However, on such a busy system even without this change we would probably see delays... Signed-off-by: Octavian Purdila --- net/ipv4/inet_timewait_sock.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c index 61283f9..13f0781 100644 --- a/net/ipv4/inet_timewait_sock.c +++ b/net/ipv4/inet_timewait_sock.c @@ -218,8 +218,8 @@ void inet_twdr_hangman(unsigned long data) /* We purged the entire slot, anything left? */ if (twdr->tw_count) need_timer = 1; + twdr->slot = ((twdr->slot + 1) & (INET_TWDR_TWKILL_SLOTS - 1)); } - twdr->slot = ((twdr->slot + 1) & (INET_TWDR_TWKILL_SLOTS - 1)); if (need_timer) mod_timer(&twdr->tw_timer, jiffies + twdr->period); out: