From patchwork Tue Nov 26 15:51:55 2013
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Venkat Venkatsubra
X-Patchwork-Id: 294347
X-Patchwork-Delegate: davem@davemloft.net
Return-Path:
X-Original-To: patchwork-incoming@ozlabs.org
Delivered-To: patchwork-incoming@ozlabs.org
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
 by ozlabs.org (Postfix) with ESMTP id 899402C00A7
 for ; Wed, 27 Nov 2013 02:52:00 +1100 (EST)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S1757206Ab3KZPv4 (ORCPT ); Tue, 26 Nov 2013 10:51:56 -0500
Received: from userp1040.oracle.com ([156.151.31.81]:28625 "EHLO
 userp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1756983Ab3KZPvz convert rfc822-to-8bit (ORCPT );
 Tue, 26 Nov 2013 10:51:55 -0500
Received: from acsinet21.oracle.com (acsinet21.oracle.com [141.146.126.237])
 by userp1040.oracle.com (Sentrion-MTA-4.3.1/Sentrion-MTA-4.3.1)
 with ESMTP id rAQFpqvR016235
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK);
 Tue, 26 Nov 2013 15:51:53 GMT
Received: from userz7021.oracle.com (userz7021.oracle.com [156.151.31.85])
 by acsinet21.oracle.com (8.14.4+Sun/8.14.4) with ESMTP id rAQFppmH006794
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
 Tue, 26 Nov 2013 15:51:52 GMT
Received: from abhmp0006.oracle.com (abhmp0006.oracle.com [141.146.116.12])
 by userz7021.oracle.com (8.14.4+Sun/8.14.4) with ESMTP id rAQFppL3029770;
 Tue, 26 Nov 2013 15:51:51 GMT
MIME-Version: 1.0
Message-ID: <4b6029b3-55da-441a-9550-0fed3b49506a@default>
Date: Tue, 26 Nov 2013 07:51:55 -0800 (PST)
From: Venkat Venkatsubra
To: netdev@vger.kernel.org
Cc: David Miller
Subject: When TCP keepalives tuned shorter than retransmission timeouts
X-Priority: 3
X-Mailer: Oracle Beehive Extensions for Outlook 2.0.1.8 (707110)
 [OL 12.0.6680.5000 (x86)]
X-Source-IP: acsinet21.oracle.com [141.146.126.237]
Sender:
 netdev-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org

Some of our customers set the following TCP socket-level options:

    TCP_KEEPIDLE  60
    TCP_KEEPINTVL 6
    TCP_KEEPCNT   10

When the peer is dead, they expect the connection to time out in
2 minutes (60 + 10 * 6 seconds) instead of the roughly 15 minutes it
takes for the retransmission timeouts to give up.
(We know the tunables are set very low.)

As this code in tcp_keepalive_timer() shows, we skip keepalive probes
if there are packets in flight or we have more data to send:

        /* It is alive without keepalive 8) */
        if (tp->packets_out || tcp_send_head(sk))
                goto resched;

The reason, I guess, is: why burden the network with keepalive packets
when something else (retransmissions) is already probing for you?

The change we tried does not actually send keepalive probes in this
situation, but keeps counting them as sent.
To avoid doing this when the receiver window is closed, we check
tp->snd_wnd. Maybe there are other (more correct?) ways to do that.

By the way, we have not yet tried to address the similar issue where
communication with the peer dies after the receiver closes the window.

This is the code change we tried. We seek your opinion.

Thanks.
Venkat

--- tcp_timer.c.orig    2013-11-25 07:09:18.328112851 -0800
+++ tcp_timer.c 2013-11-25 08:06:47.339666980 -0800
@@ -588,18 +588,13 @@
                        }
                }
                tcp_send_active_reset(sk, GFP_ATOMIC);
-               goto death;
+               tcp_done(sk);
+               goto out;
        }
 
        if (!sock_flag(sk, SOCK_KEEPOPEN) || sk->sk_state == TCP_CLOSE)
                goto out;
 
-       elapsed = keepalive_time_when(tp);
-
-       /* It is alive without keepalive 8) */
-       if (tp->packets_out || tcp_send_head(sk))
-               goto resched;
-
        elapsed = keepalive_time_elapsed(tp);
 
        if (elapsed >= keepalive_time_when(tp)) {
@@ -615,8 +610,9 @@
                        tcp_write_err(sk);
                        goto out;
                }
-               if (tcp_write_wakeup(sk) <= 0) {
-                       icsk->icsk_probes_out++;
+               if (tp->packets_out || tcp_send_head(sk) || (tcp_write_wakeup(sk) <= 0)) {
+                       if (tp->snd_wnd)
+                               icsk->icsk_probes_out++;
                        elapsed = keepalive_intvl_when(tp);
                } else {
                        /* If keepalive was lost due to local congestion,
@@ -631,12 +627,7 @@
 
        sk_mem_reclaim(sk);
 
-resched:
        inet_csk_reset_keepalive_timer (sk, elapsed);
-       goto out;
-
-death:
-       tcp_done(sk);
 
 out:
        bh_unlock_sock(sk);