diff mbox

scp stalls mysteriously

Message ID alpine.DEB.2.00.0912071532080.7024@wel-95.cs.helsinki.fi
State Accepted, archived
Delegated to: David Miller
Headers show

Commit Message

Ilpo Järvinen Dec. 7, 2009, 2:01 p.m. UTC
On Sat, 5 Dec 2009, Damian Lukowski wrote:

> Could you please make another test and unplug the cable or drop ACKs for
> several seconds, so that some RTO retransmissions are performed?
> I'd like to see if retrans_stamp remains constant. In the dmesg output of
> the 11th run, it seems to change while icsk_retransmits also increases.
> This is kind of bad for connection timeout calculation in the RTO case ...

After taking some more look into this, this is partly a red herring. It 
looks like that because of the place of the printk that was still in the 
end of the function. You can see the full trace of what happens in .13., 
they come from independent incarnations of RTO recovery (when finally no 
error happens in tcp_retransmit_skb).

However, the problem itself could occur. Here's the patch which should 
prevent that (I'm rather convinced that this really isn't stable worthy 
but net-next or net-2.6 would be fine):

--
[PATCH] tcp: fix retrans_stamp advancing in error cases

It can happen, that tcp_retransmit_skb fails due to some error.
In such cases we might end up into a state where tp->retrans_out
is zero but that's only because we removed the TCPCB_SACKED_RETRANS
bit from a segment but couldn't retransmit it because of the error
that happened. Therefore some assumptions that retrans_out checks
are based do not necessarily hold, as there still can be an old
retransmission but that is only visible in TCPCB_EVER_RETRANS bit.
As retransmission happen in sequential order (except for some very
rare corner cases), it's enough to check the head skb for that bit.

Main reason for all this complexity is the fact that connection dying
time now depends on the validity of the retrans_stamp, in particular,
that successive retransmissions of a segment must not advance
retrans_stamp under any conditions. It seems after quick thinking that
this has relatively low impact as eventually TCP will go into CA_Loss
and either use the existing check for !retrans_stamp case or send a
retransmission successfully, setting a new base time for the dying
timer (can happen only once). At worst, the dying time will be
approximately the double of the intented time. In addition,
tcp_packet_delayed() will return wrong result (has some cc aspects
but due to rarity of these errors, it's hardly an issue).

One of retrans_stamp clearing happens indirectly through first going
into CA_Open state and then a later ACK lets the clearing to happen.
Thus tcp_try_keep_open has to be modified too.

Thanks to Damian Lukowski <damian@tvk.rwth-aachen.de> for hinting
that this possibility exists (though the particular case discussed
didn't after all have it happening but was just a debug patch
artifact).

Compile tested.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 net/ipv4/tcp_input.c |   35 ++++++++++++++++++++++++++++++++---
 1 files changed, 32 insertions(+), 3 deletions(-)

Comments

Frederic Leroy Dec. 7, 2009, 10:18 p.m. UTC | #1
Le Mon, 7 Dec 2009 16:01:53 +0200 (EET),
"Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> a écrit :

> On Sat, 5 Dec 2009, Damian Lukowski wrote:
> 
> > Could you please make another test and unplug the cable or drop
> > [...]
> After taking some more look into this, this is partly a red herring.
> It looks like that because of the place of the printk that was still
> in the end of the function. You can see the full trace of what
> happens in .13., they come from independent incarnations of RTO
> recovery (when finally no error happens in tcp_retransmit_skb).

Doh ! Sorry :( 

> However, the problem itself could occur. Here's the patch which
> should prevent that (I'm rather convinced that this really isn't
> stable worthy but net-next or net-2.6 would be fine):
> 
> --
> [PATCH] tcp: fix retrans_stamp advancing in error cases
> [...]

Tonight, I made 2 more tests : .20 and .21 . 

The first with latest damian patch from today.
Added the printk (This time I doubled checked ;).
Start the copy, wait 20s, disconnect cable 20s, reconnect. 

The second try was identical, but I added your patch.
The copy was slower comparing to the first try.

I didn't took time to understand tcp retransmission timeout and read
the code. So, I'm not sure the printk is at the good place and usefull.
Ilpo Järvinen Dec. 7, 2009, 10:38 p.m. UTC | #2
Trimmed Ccs.

On Mon, 7 Dec 2009, Frederic Leroy wrote:

> Le Mon, 7 Dec 2009 16:01:53 +0200 (EET),
> "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> a écrit :
> 
> > On Sat, 5 Dec 2009, Damian Lukowski wrote:
> > 
> > > Could you please make another test and unplug the cable or drop
> > > [...]
> > After taking some more look into this, this is partly a red herring.
> > It looks like that because of the place of the printk that was still
> > in the end of the function. You can see the full trace of what
> > happens in .13., they come from independent incarnations of RTO
> > recovery (when finally no error happens in tcp_retransmit_skb).
> 
> Doh ! Sorry :( 

Now I think we have had just too many testcases and are all confused :-).
I was referring to the case .11. (the same case as Damian did) ...Not 
something too newish you did, sorry about that.

> > However, the problem itself could occur. Here's the patch which
> > should prevent that (I'm rather convinced that this really isn't
> > stable worthy but net-next or net-2.6 would be fine):
> > 
> > --
> > [PATCH] tcp: fix retrans_stamp advancing in error cases
> > [...]
> 
> Tonight, I made 2 more tests : .20 and .21 . 
> 
> The first with latest damian patch from today.
> Added the printk (This time I doubled checked ;).
> Start the copy, wait 20s, disconnect cable 20s, reconnect. 
> 
> The second try was identical, but I added your patch.
> The copy was slower comparing to the first try.

The losses you are getting are somewhat random process, so it is usually 
the main explination on different transfer rates. One thing leads to 
another and therefore one case suffers more than other.

> I didn't took time to understand tcp retransmission timeout and read
> the code. So, I'm not sure the printk is at the good place and usefull.

Thanks anyway for all testing so far. I'll try to come up with the other 
debug patch tomorrow to get some information on that -EAGAIN. Unless you 
want to do it yourself and printk all the variables involved in this check 
(in tcp_output.c):

        /* Do not sent more than we queued. 1/4 is reserved for possible
         * copying overhead: fragmentation, tunneling, mangling etc.
         */
        if (atomic_read(&sk->sk_wmem_alloc) >
            min(sk->sk_wmem_queued + (sk->sk_wmem_queued >> 2), sk->sk_sndbuf))
                return -EAGAIN;

...better to print them before the check it regardless of the actual 
result of the comparison.


...So far only the Damian's patch is clearly required for stable (but I 
suppose DaveM will handle the stable submissions as usual, hopefully it 
won't take too long though as some other people might start reporting this 
same issue once some time has passed and they notice that something is 
wrong with TCP of their new and shiny 2.6.32 :-)).
David Miller Dec. 9, 2009, 4:54 a.m. UTC | #3
From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>
Date: Mon, 7 Dec 2009 16:01:53 +0200 (EET)

> [PATCH] tcp: fix retrans_stamp advancing in error cases

Applied to net-2.6

Not queued to -stable, let me know if that becomes necessary.

Thanks!
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index d86784b..5608691 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2717,6 +2717,35 @@  static void tcp_try_undo_dsack(struct sock *sk)
 	}
 }
 
+/* We can clear retrans_stamp when there are no retransmissions in the
+ * window. It would seem that it is trivially available for us in
+ * tp->retrans_out, however, that kind of assumptions doesn't consider
+ * what will happen if errors occur when sending retransmission for the
+ * second time. ...It could the that such segment has only
+ * TCPCB_EVER_RETRANS set at the present time. It seems that checking
+ * the head skb is enough except for some reneging corner cases that
+ * are not worth the effort.
+ *
+ * Main reason for all this complexity is the fact that connection dying
+ * time now depends on the validity of the retrans_stamp, in particular,
+ * that successive retransmissions of a segment must not advance
+ * retrans_stamp under any conditions.
+ */
+static int tcp_any_retrans_done(struct sock *sk)
+{
+	struct tcp_sock *tp = tcp_sk(sk);
+	struct sk_buff *skb;
+
+	if (tp->retrans_out)
+		return 1;
+
+	skb = tcp_write_queue_head(sk);
+	if (unlikely(skb && TCP_SKB_CB(skb)->sacked & TCPCB_EVER_RETRANS))
+		return 1;
+
+	return 0;
+}
+
 /* Undo during fast recovery after partial ACK. */
 
 static int tcp_try_undo_partial(struct sock *sk, int acked)
@@ -2729,7 +2758,7 @@  static int tcp_try_undo_partial(struct sock *sk, int acked)
 		/* Plain luck! Hole if filled with delayed
 		 * packet, rather than with a retransmit.
 		 */
-		if (tp->retrans_out == 0)
+		if (!tcp_any_retrans_done(sk))
 			tp->retrans_stamp = 0;
 
 		tcp_update_reordering(sk, tcp_fackets_out(tp) + acked, 1);
@@ -2788,7 +2817,7 @@  static void tcp_try_keep_open(struct sock *sk)
 	struct tcp_sock *tp = tcp_sk(sk);
 	int state = TCP_CA_Open;
 
-	if (tcp_left_out(tp) || tp->retrans_out || tp->undo_marker)
+	if (tcp_left_out(tp) || tcp_any_retrans_done(sk) || tp->undo_marker)
 		state = TCP_CA_Disorder;
 
 	if (inet_csk(sk)->icsk_ca_state != state) {
@@ -2803,7 +2832,7 @@  static void tcp_try_to_open(struct sock *sk, int flag)
 
 	tcp_verify_left_out(tp);
 
-	if (!tp->frto_counter && tp->retrans_out == 0)
+	if (!tp->frto_counter && !tcp_any_retrans_done(sk))
 		tp->retrans_stamp = 0;
 
 	if (flag & FLAG_ECE)