scp stalls mysteriously

From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>

On Sat, 5 Dec 2009, Damian Lukowski wrote:

> Could you please make another test and unplug the cable or drop ACKs for
> several seconds, so that some RTO retransmissions are performed?
> I'd like to see if retrans_stamp remains constant. In the dmesg output of
> the 11th run, it seems to change while icsk_retransmits also increases.
> This is kind of bad for connection timeout calculation in the RTO case ...

After looking into this some more, this is partly a red herring. It 
only looks that way because the printk was still placed at the end of 
the function. You can see the full trace of what happens in .13.; 
they come from independent incarnations of RTO recovery (when finally no 
error happens in tcp_retransmit_skb).

However, the problem itself could occur. Here's a patch which should 
prevent that (I'm rather convinced that this really isn't stable-worthy, 
but net-next or net-2.6 would be fine):

--
[PATCH] tcp: fix retrans_stamp advancing in error cases

It can happen that tcp_retransmit_skb fails due to some error.
In such cases we might end up in a state where tp->retrans_out
is zero, but only because we removed the TCPCB_SACKED_RETRANS
bit from a segment and then couldn't retransmit it because of the
error. Therefore some assumptions that the retrans_out checks
are based on do not necessarily hold, as there can still be an old
retransmission that is visible only in the TCPCB_EVER_RETRANS bit.
As retransmissions happen in sequential order (except for some very
rare corner cases), it's enough to check the head skb for that bit.
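
The check described above can be sketched as a small userspace model.
This is only an illustration, not the patch itself: the struct and
function names (model_skb, model_tp, any_retrans_done, model_check) and
the flag values are assumptions made for this example, standing in for
the kernel's tcp_sock, the write queue head and the TCPCB_* bits.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag values for this model only. */
#define SACKED_RETRANS 0x02u /* a retransmission of the segment is outstanding */
#define EVER_RETRANS   0x80u /* the segment was retransmitted at some point */

/* Hypothetical stand-in for the per-skb control block. */
struct model_skb {
    unsigned int sacked;
};

/* Hypothetical stand-in for the relevant tcp_sock state. */
struct model_tp {
    unsigned int retrans_out;     /* segments currently marked SACKED_RETRANS */
    const struct model_skb *head; /* head of the write queue, NULL if empty */
};

/*
 * True if some retransmission has been done in this recovery, even when
 * retrans_out has dropped to zero because a later attempt failed.
 */
static bool any_retrans_done(const struct model_tp *tp)
{
    if (tp->retrans_out)
        return true;

    /* Retransmissions go in sequence, so the head skb is enough. */
    return tp->head != NULL && (tp->head->sacked & EVER_RETRANS) != 0;
}

/* Convenience wrapper that builds the state and runs the check. */
static bool model_check(unsigned int retrans_out, bool has_head,
                        unsigned int head_sacked)
{
    struct model_skb head = { .sacked = head_sacked };
    struct model_tp tp = {
        .retrans_out = retrans_out,
        .head = has_head ? &head : NULL,
    };

    return any_retrans_done(&tp);
}
```

For example, model_check(0, true, EVER_RETRANS) models the error case:
retrans_out is zero, yet the head still carries the EVER_RETRANS bit, so
the check reports that a retransmission exists and retrans_stamp should
be left alone.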

The main reason for all this complexity is that the connection dying
time now depends on the validity of retrans_stamp, in particular,
that successive retransmissions of a segment must not advance
retrans_stamp under any conditions. After some quick thinking, this
seems to have relatively low impact, as eventually TCP will go into
CA_Loss and either use the existing check for the !retrans_stamp case
or send a retransmission successfully, setting a new base time for the
dying timer (this can happen only once). At worst, the dying time will
be approximately double the intended time. In addition,
tcp_packet_delayed() will return a wrong result (this has some cc
aspects, but due to the rarity of these errors it's hardly an issue).
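
The "approximately double" worst case can be illustrated with a toy
timeline. This is a simplified sketch under stated assumptions, not
kernel code: time is counted in abstract ticks, the names (dying_timer,
on_retransmit_ok, on_spurious_clear, timed_out, death_tick) are
hypothetical, and the stamp is assumed to be cleared and re-set exactly
once, as the paragraph above describes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: time in abstract ticks, 0 means "stamp not set". */
struct dying_timer {
    uint32_t retrans_stamp;
};

/* A successful retransmission sets the base time only if none is set. */
static void on_retransmit_ok(struct dying_timer *t, uint32_t now)
{
    if (t->retrans_stamp == 0)
        t->retrans_stamp = now;
}

/* Models the error case: the stamp is cleared although it should not be. */
static void on_spurious_clear(struct dying_timer *t)
{
    t->retrans_stamp = 0;
}

/* The connection dies once `timeout` ticks have passed since the stamp. */
static bool timed_out(const struct dying_timer *t, uint32_t now,
                      uint32_t timeout)
{
    return t->retrans_stamp != 0 &&
           (uint32_t)(now - t->retrans_stamp) >= timeout;
}

/*
 * Returns the tick at which the connection dies. If with_clear is true,
 * the stamp is wrongly cleared at clear_at and immediately re-set by a
 * successful retransmission. first_retrans must be non-zero.
 */
static uint32_t death_tick(uint32_t first_retrans, uint32_t clear_at,
                           uint32_t timeout, bool with_clear)
{
    struct dying_timer t = { 0 };

    on_retransmit_ok(&t, first_retrans);
    for (uint32_t now = first_retrans; ; now++) {
        if (with_clear && now == clear_at) {
            on_spurious_clear(&t);
            on_retransmit_ok(&t, now);
        }
        if (timed_out(&t, now, timeout))
            return now;
    }
}
```

With the first retransmission at tick 100 and a timeout of 1000 ticks,
death_tick(100, 1099, 1000, false) gives 1100, while clearing the stamp
one tick before the deadline, death_tick(100, 1099, 1000, true), gives
2099. The connection then dies 1999 ticks after the first retransmission
instead of 1000.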

One of the retrans_stamp clearings happens indirectly: TCP first goes
into the CA_Open state and a later ACK then lets the clearing happen.
Thus tcp_try_keep_open has to be modified too.

Thanks to Damian Lukowski <damian@tvk.rwth-aachen.de> for hinting
that this possibility exists (though in the particular case discussed
it did not actually happen; that was just a debug patch artifact).

Compile tested.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 net/ipv4/tcp_input.c |   35 ++++++++++++++++++++++++++++++++---
 1 files changed, 32 insertions(+), 3 deletions(-)

Message ID	alpine.DEB.2.00.0912071532080.7024@wel-95.cs.helsinki.fi
State	Accepted, archived
Delegated to:	David Miller
Date	Mon, 7 Dec 2009 16:01:53 +0200 (EET)
To	Damian Lukowski <damian@tvk.rwth-aachen.de>
Cc	Frederic Leroy <fredo@starox.org>, Netdev <netdev@vger.kernel.org>, David Miller <davem@davemloft.net>, Eric Dumazet <eric.dumazet@gmail.com>, Herbert Xu <herbert@gondor.apana.org.au>, Greg KH <gregkh@suse.de>
Subject	Re: scp stalls mysteriously
