[PATCHv2] tcp: fix crash in tcp_xmit_retransmit_queue

Message ID	alpine.DEB.2.00.1007191319010.13002@wel-95.cs.helsinki.fi
State	Accepted, archived
Delegated to:	David Miller
Headers	Date: Mon, 19 Jul 2010 14:16:18 +0300 (EEST)
From: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
To: Lennart Schulte <lennart.schulte@nets.rwth-aachen.de>, David Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>, Tejun Heo <tj@kernel.org>, lkml <linux-kernel@vger.kernel.org>, "netdev@vger.kernel.org" <netdev@vger.kernel.org>, "Fehrmann, Henning" <henning.fehrmann@aei.mpg.de>, Carsten Aulbert <carsten.aulbert@aei.mpg.de>
Subject: [PATCHv2] tcp: fix crash in tcp_xmit_retransmit_queue
In-Reply-To: <4C440771.7080107@nets.rwth-aachen.de>
X-Mailing-List: netdev@vger.kernel.org

Ilpo Järvinen July 19, 2010, 11:16 a.m. UTC

On Mon, 19 Jul 2010, Lennart Schulte wrote:

> I ran tests for about 2 hours with this patch and I got no output from the
> debug patch. This seems to have solved at least my problem :)
> 
> Thanks!
> > [PATCH] tcp: fix crash in tcp_xmit_retransmit_queue
> > 
> > It can happen that there are no packets in queue while calling
> > tcp_xmit_retransmit_queue(). tcp_write_queue_head() then returns
> > NULL and that gets deref'ed to get sacked into a local var.
> > 
> > There is no work to do if no packets are outstanding so we just
> > exit early.
> > 
> > There may still be another bug affecting this same function.

Thanks for testing.

DaveM, I think this oops was introduced in 2.6.28 (by
08ebd1721ab8fd362e90ae17b461c07b23fa2824, it seems, to be exact), so please
queue it for stable too. I've only tweaked the message (so there's no need
for Lennart to retest v2 :-)).

--
[PATCHv2] tcp: fix crash in tcp_xmit_retransmit_queue

It can happen that there are no packets in the queue while
tcp_xmit_retransmit_queue() is called. tcp_write_queue_head() then returns
NULL, and that gets dereferenced to read ->sacked into a local variable.

There is no work to do if no packets are outstanding, so we just
exit early.

This oops was introduced by 08ebd1721ab8fd (tcp: remove tp->lost_out
guard to make joining diff nicer).

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Reported-by: Lennart Schulte <lennart.schulte@nets.rwth-aachen.de>
Tested-by: Lennart Schulte <lennart.schulte@nets.rwth-aachen.de>
---
 net/ipv4/tcp_output.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)
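To make the failure mode concrete, here is a minimal userspace sketch, assuming a simplified model of the write queue: a circular list with a sentinel node, where the head accessor returns NULL on an empty queue (as tcp_write_queue_head() does) and the walk stops only at the sentinel (as tcp_for_write_queue_from() does). All model_* names are hypothetical and do not correspond to real kernel structures or functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an skb; 'sacked' plays the role of
 * TCP_SKB_CB(skb)->sacked. Not the real struct sk_buff. */
struct model_skb {
	unsigned char sacked;
	struct model_skb *next;
};

/* Hypothetical stand-in for the socket: 'queue' is a sentinel node,
 * so an empty queue has queue.next == &queue. */
struct model_sock {
	struct model_skb queue;
	unsigned int packets_out;   /* segments sent but not yet acked */
};

static void model_sock_init(struct model_sock *sk)
{
	sk->queue.sacked = 0;
	sk->queue.next = &sk->queue;
	sk->packets_out = 0;
}

/* Append at the tail and count the segment as outstanding. */
static void model_enqueue(struct model_sock *sk, struct model_skb *skb)
{
	struct model_skb *tail = &sk->queue;

	assert(skb != NULL);
	while (tail->next != &sk->queue)
		tail = tail->next;
	skb->next = &sk->queue;
	tail->next = skb;
	sk->packets_out++;
}

/* Like tcp_write_queue_head(): NULL when the queue is empty. */
static struct model_skb *model_write_queue_head(struct model_sock *sk)
{
	struct model_skb *first = sk->queue.next;

	return first == &sk->queue ? NULL : first;
}

/* Returns the number of segments walked. Without the early return, an
 * empty queue gives skb == NULL, which is != &sk->queue, so the loop
 * body would dereference NULL when reading skb->sacked. */
static int model_xmit_retransmit_queue(struct model_sock *sk)
{
	struct model_skb *skb;
	int walked = 0;

	/* Mirrors the v2 guard: nothing outstanding, nothing to retransmit. */
	if (!sk->packets_out)
		return 0;

	skb = model_write_queue_head(sk);
	for (; skb != &sk->queue; skb = skb->next) {
		unsigned char sacked = skb->sacked;  /* the read that oopsed */

		(void)sacked;
		walked++;
	}
	return walked;
}
```

The point of the early return is that it does not rely on any later comparison happening to catch the NULL; an empty queue is rejected before the head is ever fetched.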

Eric Dumazet July 19, 2010, 2:09 p.m. UTC | #1

On Monday, 19 July 2010 at 14:16 +0300, Ilpo Järvinen wrote:

> This oops was introduced by 08ebd1721ab8fd (tcp: remove tp->lost_out
> guard to make joining diff nicer).
> 

But prior to commit 08ebd1721ab8fd3, we were not testing
tp->packets_out but tp->lost_out.

If it was 0, we were not doing the tcp_for_write_queue_from() loop.

Not sure it makes a difference?

> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
> Reported-by: Lennart Schulte <lennart.schulte@nets.rwth-aachen.de>
> Tested-by: Lennart Schulte <lennart.schulte@nets.rwth-aachen.de>
> ---
>  net/ipv4/tcp_output.c |    3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
> 
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index b4ed957..7ed9dc1 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -2208,6 +2208,9 @@ void tcp_xmit_retransmit_queue(struct sock *sk)
>  	int mib_idx;
>  	int fwd_rexmitting = 0;
>  
> +	if (!tp->packets_out)
> +		return;
> +
>  	if (!tp->lost_out)
>  		tp->retransmit_high = tp->snd_una;
>  


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Ilpo Järvinen July 19, 2010, 5:25 p.m. UTC | #2

On Mon, 19 Jul 2010, Eric Dumazet wrote:

> But prior to commit 08ebd1721ab8fd3, we were not testing
> tp->packets_out but tp->lost_out.

That's right, but back then we were not testing it for the same purpose.

> If it was 0, we were not doing the tcp_for_write_queue_from() loop.

This invariant _should_ hold all the time:
 lost_out <= packets_out

...and if it didn't, we would get "Leak" printouts every now and then. Thus,
if packets_out is zero, there is no NULL deref with the if (lost_out) check
either. The other loop (in pre-08eb kernels) works too, because of the
send_head check side-effects mentioned earlier.
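The argument above is a small implication: if lost_out <= packets_out always holds, then packets_out == 0 forces lost_out == 0, so the old if (lost_out) guard also kept the loop from starting on an empty queue. A hedged sketch that checks this exhaustively for small counter values; the struct and function names are hypothetical, and only the two field names come from the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical container for the two tcp_sock counters discussed. */
struct model_counters {
	unsigned int packets_out;
	unsigned int lost_out;
};

/* The invariant from the thread; per Ilpo, violations would show up
 * as "Leak" printouts. */
static bool model_invariant_holds(struct model_counters c)
{
	return c.lost_out <= c.packets_out;
}

/* Pre-08ebd1721ab8fd3 style guard: run the lost-segment loop only if
 * something is marked lost. */
static bool model_old_guard_runs_loop(struct model_counters c)
{
	return c.lost_out != 0;
}

/* For all counter pairs up to 'max', checks that whenever the invariant
 * holds and packets_out == 0, the old guard does not run the loop. */
static bool model_old_guard_safe_up_to(unsigned int max)
{
	for (unsigned int p = 0; p <= max; p++) {
		for (unsigned int l = 0; l <= max; l++) {
			struct model_counters c = { p, l };

			if (model_invariant_holds(c) && c.packets_out == 0 &&
			    model_old_guard_runs_loop(c))
				return false;
		}
	}
	return true;
}
```

Only a state that already breaks the invariant (for example packets_out = 0, lost_out = 1) would let the old guard enter the loop with nothing outstanding.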

> Not sure it makes a difference?

This difference is well thought out and intentional; I didn't use a different
check by accident. We want to make sure we won't use NULL from
tcp_write_queue_head(), whereas the pre-08ebd1721ab8fd3 kernels were
interested mainly in whether the first loop should run or not (and of course
that ends up avoiding the NULL deref too, but there it's more of an
optimization, i.e., if there are no lost packets, there is no work to do). The
deref could have been fixed by moving the TCP_SKB_CB(skb)->sacked read a bit
later, but that would again make us depend on the side-effect of the send_head
check (in the case of packets_out being zero and the write queue empty), which
is something I don't like too much.
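The rejected alternative can be sketched the same way: read ->sacked only after comparing against the send head. Assuming, as the thread describes, that the send head is NULL whenever the write queue is empty, a NULL starting point then equals the send head and the loop breaks before dereferencing, but only as a side effect of that coincidence. The model_* names below are hypothetical; this is not the kernel loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an skb; not the real struct sk_buff. */
struct model_skb {
	unsigned char sacked;
	struct model_skb *next;
};

/* 'queue' is a sentinel node; 'send_head' is the first not-yet-sent
 * segment, or NULL when everything queued has been sent. */
struct model_sock {
	struct model_skb queue;
	struct model_skb *send_head;
};

/* Like tcp_write_queue_head(): NULL when the queue is empty. */
static struct model_skb *model_write_queue_head(struct model_sock *sk)
{
	struct model_skb *first = sk->queue.next;

	return first == &sk->queue ? NULL : first;
}

/* Variant with the ->sacked read moved below the send_head check.
 * On an empty queue skb starts as NULL; it is safe only because
 * send_head is NULL too, so the first comparison breaks out. */
static int model_walk_read_late(struct model_sock *sk)
{
	struct model_skb *skb = model_write_queue_head(sk);
	int walked = 0;

	for (; skb != &sk->queue; skb = skb->next) {
		if (skb == sk->send_head)
			break;
		unsigned char sacked = skb->sacked;  /* reached only for sent segments */

		(void)sacked;
		walked++;
	}
	return walked;
}
```

If send_head were ever non-NULL while the queue is empty, this variant would dereference NULL, which is why an explicit guard that does not depend on send_head is preferred in the thread.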

Eric Dumazet July 19, 2010, 5:39 p.m. UTC | #3

On Monday, 19 July 2010 at 20:25 +0300, Ilpo Järvinen wrote:

> This difference is well thought out and intentional; I didn't use a different
> check by accident. We want to make sure we won't use NULL from
> tcp_write_queue_head(), whereas the pre-08ebd1721ab8fd3 kernels were
> interested mainly in whether the first loop should run or not (and of course
> that ends up avoiding the NULL deref too, but there it's more of an
> optimization, i.e., if there are no lost packets, there is no work to do). The
> deref could have been fixed by moving the TCP_SKB_CB(skb)->sacked read a bit
> later, but that would again make us depend on the side-effect of the send_head
> check (in the case of packets_out being zero and the write queue empty), which
> is something I don't like too much.
> 

Thanks Ilpo.

Do you know in exactly what circumstances the bug triggers?

It's hard to believe thousands of machines on the Internet never hit
it :(

Maybe another problem in congestion control?



David Miller July 19, 2010, 7:55 p.m. UTC | #4

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 19 Jul 2010 19:39:08 +0200

> Do you know in exactly what circumstances the bug triggers?
> 
> It's hard to believe thousands of machines on the Internet never hit
> it :(
> 
> Maybe another problem in congestion control?

This is something to investigate, but the conditions under which
tcp_fastretrans_alert() (the main invoker of tcp_xmit_retransmit_queue())
does its thing are complicated enough that I'm going to add this fix
for the time being and push it out to stable too.

Thanks everyone.

Ilpo Järvinen July 20, 2010, 8:33 a.m. UTC | #5

On Mon, 19 Jul 2010, David Miller wrote:

> This is something to investigate, but the conditions under which
> tcp_fastretrans_alert() (the main invoker of tcp_xmit_retransmit_queue())
> does its thing are complicated enough that I'm going to add this fix
> for the time being and push it out to stable too.

This is so true. ...So far I've twice managed to rule out the possibility
of this really being triggerable (i.e., that would mean Lennart's
out-of-tree changes broke it), and once in between came to the opposite
conclusion. Thus, by majority vote, we can deduce that it won't happen;
how reassuring :-/. It seems that tcp_try_undo_recovery causes a return
if TCP remained in CA_Loss/CA_Recovery, that tcp_time_to_recover won't
really let us past the return either under normal circumstances (more
details below), and that tcp_simple_retransmit requires lost_out to
change; so it seems safe in mainline to me.

Hmm... It seems that I've just solved another report too. ...Somebody found
out a while back that setting the reordering sysctl to zero (i.e., to a
value which does not make much sense) crashed the kernel. It seems that at
least in that case tcp_time_to_recover() would return true and trigger
this bug (though I'm not sure that's the only breakage that would happen).
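Why a zero reordering setting matters can be illustrated with a simplified duplicate-ACK rule: recovery is entered once a dupack estimate exceeds the reordering threshold. The sketch below assumes the non-FACK estimate sacked_out + 1 and omits the other heuristics in tcp_time_to_recover(); the struct and function names are hypothetical. Under these assumptions, reordering = 0 makes the rule fire even with nothing outstanding, which is consistent with Ilpo's reading that tcp_time_to_recover() would return true and trigger this bug.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of the state the recovery decision looks at. */
struct model_recovery_state {
	unsigned int packets_out;  /* kept only to show nothing is outstanding */
	unsigned int lost_out;
	unsigned int sacked_out;
	unsigned int reordering;   /* reordering threshold, e.g. from the sysctl */
};

/* Assumed non-FACK duplicate-ACK estimate. */
static unsigned int model_dupack_estimate(const struct model_recovery_state *s)
{
	return s->sacked_out + 1;
}

/* Simplified recovery decision: loss already marked, or the dupack
 * estimate exceeds the reordering threshold. Other checks are omitted. */
static bool model_time_to_recover(const struct model_recovery_state *s)
{
	if (s->lost_out)
		return true;
	return model_dupack_estimate(s) > s->reordering;
}
```

With a threshold of 3 (the usual default) the empty state does not enter recovery; with 0 it does, even though packets_out is 0.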

Also worth keeping in mind is the bugzilla entry ("New freez in TCP" or
something like that), so I'm not really sure I could say for sure that
nobody ever hit it. The bugzilla one goes away by disabling SACK (at least
for some people), but it might mix two different issues. It seems that
there really are two different issues; the other may have something to do
with SACK, though there are other variables involved then, e.g., the
changes in retransmission logic/timing, so it's impossible to say whether
disabling SACK really "fixed" the bugzilla one or not. Also, Tejun's
->next == NULL finding points to a different bug than Lennart's one.
