TCP_FAILFAST: a new socket option to timeout/abort a connection quicker

Message ID 1282630819-23104-1-git-send-email-hkchu@google.com
State Changes Requested, archived
Delegated to: David Miller

Commit Message

Jerry Chu Aug. 24, 2010, 6:20 a.m. UTC
From: Jerry Chu <hkchu@google.com>

This is a TCP level socket option that takes an unsigned int to specify
how long in ms TCP should resend a lost data packet before giving up
and returning ETIMEDOUT. The normal TCP retry/abort timeout limit still
applies. In other words this option is only meant for those applications
that need to "fail faster" than the default TCP timeout. The latter
may take up to 20 minutes in a normal WAN environment.

The option is disabled (by default) when set to 0. Also it does not
apply during the connection establishment phase.

Signed-off-by: H.K. Jerry Chu <hkchu@google.com>
---
 include/linux/tcp.h                |    1 +
 include/net/inet_connection_sock.h |    1 +
 net/ipv4/tcp.c                     |   11 ++++++++-
 net/ipv4/tcp_timer.c               |   42 +++++++++++++++++++++++++++++++----
 4 files changed, 49 insertions(+), 6 deletions(-)
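
From an application's point of view, the option described in the commit message would be set like any other TCP-level socket option. The following is a minimal sketch, assuming the option number 18 proposed in this patch. TCP_FAILFAST is not in any released header, so it is defined locally here, and the helper name set_failfast_ms is hypothetical. On a kernel without this patch the call may fail or reach a different option that uses the same number.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Assumption: the value proposed by this patch; not in any released header. */
#ifndef TCP_FAILFAST
#define TCP_FAILFAST 18
#endif

/*
 * Hypothetical helper: ask TCP to give up retransmitting lost data after
 * timeout_ms milliseconds. A value of 0 disables the cap.
 * Returns 0 on success, or -1 with errno set by setsockopt().
 */
static int set_failfast_ms(int fd, unsigned int timeout_ms)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_FAILFAST,
                      &timeout_ms, sizeof(timeout_ms));
}
```

A caller would invoke set_failfast_ms(fd, 5000) on a connected socket and treat a later ETIMEDOUT from send() or recv() as the fast-fail signal.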

Comments

Eric Dumazet Aug. 24, 2010, 6:44 a.m. UTC | #1
On Monday, 23 August 2010 at 23:20 -0700, H.K. Jerry Chu wrote:
> From: Jerry Chu <hkchu@google.com>
> 
> This is a TCP level socket option that takes an unsigned int to specify
> how long in ms TCP should resend a lost data packet before giving up
> and returning ETIMEDOUT. The normal TCP retry/abort timeout limit still
> applies. In other words this option is only meant for those applications
> that need to "fail faster" than the default TCP timeout. The latter
> may take up to 20 minutes in a normal WAN environment.
> 
> The option is disabled (by default) when set to 0. Also it does not
> apply during the connection establishment phase.
> 
> Signed-off-by: H.K. Jerry Chu <hkchu@google.com>

TCP_FAILFAST might be misleading. It reads as a boolean option, while
it's an option to cap the timeout, with a time unit, instead of the usual
"number of retransmits".

It's also funny you don't ask for a default value, given by a sysctl
tunable ;)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Arnd Hannemann Aug. 24, 2010, 8:04 a.m. UTC | #2
On 24.08.2010 08:44, Eric Dumazet wrote:
> On Monday, 23 August 2010 at 23:20 -0700, H.K. Jerry Chu wrote:
>> From: Jerry Chu <hkchu@google.com>
>>
>> This is a TCP level socket option that takes an unsigned int to specify
>> how long in ms TCP should resend a lost data packet before giving up
>> and returning ETIMEDOUT. The normal TCP retry/abort timeout limit still
>> applies. In other words this option is only meant for those applications
>> that need to "fail faster" than the default TCP timeout. The latter
>> may take up to 20 minutes in a normal WAN environment.
>>
>> The option is disabled (by default) when set to 0. Also it does not
>> apply during the connection establishment phase.
>>
>> Signed-off-by: H.K. Jerry Chu <hkchu@google.com>
> 
> TCP_FAILFAST might be misleading. It reads as a boolean option, while
> it's an option to cap the timeout, with a time unit, instead of the usual
> "number of retransmits".

Why not call it TCP_USERTIMEOUT?
Later you can also send it via the TCP user timeout option... (RFC5482)
Hmm... is the ms granularity really needed? Does it make sense to abort
a connection below a second?

> It's also funny you don't ask for a default value, given by a sysctl
> tunable ;)

Well retries1/2 would be the tunables, no?

Best regards,
Arnd
Hagen Paul Pfeifer Aug. 24, 2010, 9:10 a.m. UTC | #3
On Tue, 24 Aug 2010 10:04:37 +0200, Arnd Hannemann wrote:

> Why not call it TCP_USERTIMEOUT?
> Later you can also send it via the TCP user timeout option... (RFC5482)
> Hmm... is the ms granularity really needed? Does it make sense to abort
> a connection below a second?

I am working on a patch for UTO; the lion's share is already implemented. As
I can see, this patch introduces an upper limit (max), whereas UTO, on the
other hand, provides a lower limit (min). Therefore I am not sure if we
should call this option TCP_USERTIMEOUT.

Cheers, Hagen
Arnd Hannemann Aug. 24, 2010, 2:58 p.m. UTC | #4
Hi,

On 24.08.2010 11:10, Hagen Paul Pfeifer wrote:
> On Tue, 24 Aug 2010 10:04:37 +0200, Arnd Hannemann wrote:
>
>> Why not call it TCP_USERTIMEOUT?
>> Later you can also send it via the TCP user timeout option... (RFC5482)
>> Hmm... is the ms granularity really needed? Does it make sense to abort
>> a connection below a second?
>
> I am working on a patch for UTO; the lion's share is already implemented. As

Nice, so did you come up with a name for the socket option yet?

> I can see, this patch introduces an upper limit (max), whereas UTO, on the
> other hand, provides a lower limit (min). Therefore I am not sure if we
> should call this option TCP_USERTIMEOUT.

Hmm, is there really a difference? If an application specifies
a wanted timeout e.g. with USER_TIMEOUT, CHANGEABLE will
become false and the value would be announced via ADV_UTO.
The connection could be aborted locally after that time passed,
regardless of what the remote site thinks the timeout should be.

As I understand it U_LIMIT and L_LIMIT would only be there
for safety to disallow nonsensical values of USER_TIMEOUT.

Did I miss something?

Best regards,
Arnd
Hagen Paul Pfeifer Aug. 24, 2010, 4:28 p.m. UTC | #5
* Arnd Hannemann | 2010-08-24 16:58:58 [+0200]:

>Nice, so did you come up with a name for the socket option yet?

+#define      TCP_UTO       18  /* User Timeout Option */

The patch is in an early state, and the details as well as the testing are
a little bit costly.

>Hmm, is there really a difference? If an application specifies
>a wanted timeout e.g. with USER_TIMEOUT, CHANGEABLE will
>become false and the value would be announced via ADV_UTO.
>The connection could be aborted locally after that time passed,
>regardless of what the remote site thinks the timeout should be.
>
>As I understand it U_LIMIT and L_LIMIT would only be there
>for safety to disallow nonsensical values of USER_TIMEOUT.
>
>Did I miss something?

Maybe not, I'm not sure. I must take a look at the patch from Jerry. I have
had no time until now.

Hagen
Jerry Chu Aug. 24, 2010, 8:47 p.m. UTC | #6
On Mon, Aug 23, 2010 at 11:44 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Monday, 23 August 2010 at 23:20 -0700, H.K. Jerry Chu wrote:
>> From: Jerry Chu <hkchu@google.com>
>>
>> This is a TCP level socket option that takes an unsigned int to specify
>> how long in ms TCP should resend a lost data packet before giving up
>> and returning ETIMEDOUT. The normal TCP retry/abort timeout limit still
>> applies. In other words this option is only meant for those applications
>> that need to "fail faster" than the default TCP timeout. The latter
>> may take up to 20 minutes in a normal WAN environment.
>>
>> The option is disabled (by default) when set to 0. Also it does not
>> apply during the connection establishment phase.
>>
>> Signed-off-by: H.K. Jerry Chu <hkchu@google.com>
>
> TCP_FAILFAST might be misleading. It reads as a boolean option, while
> it's an option to cap the timeout, with a time unit, instead of the usual
> "number of retransmits".

I'm open to better names. Perhaps it can be combined with TCP_UTO,
mentioned in a subsequent reply?

>
> It's also funny you don't ask for a default value, given by a sysctl
> tunable ;)

This socket option takes a time value directly. The other sysctls
use the number of retries and max_rto. (Not sure if that's what you asked.)

>
>
>
>
Jerry Chu Aug. 24, 2010, 9:56 p.m. UTC | #7
On Tue, Aug 24, 2010 at 1:04 AM, Arnd Hannemann
<hannemann@nets.rwth-aachen.de> wrote:
> On 24.08.2010 08:44, Eric Dumazet wrote:
>> On Monday, 23 August 2010 at 23:20 -0700, H.K. Jerry Chu wrote:
>>> From: Jerry Chu <hkchu@google.com>
>>>
>>> This is a TCP level socket option that takes an unsigned int to specify
>>> how long in ms TCP should resend a lost data packet before giving up
>>> and returning ETIMEDOUT. The normal TCP retry/abort timeout limit still
>>> applies. In other words this option is only meant for those applications
>>> that need to "fail faster" than the default TCP timeout. The latter
>>> may take up to 20 minutes in a normal WAN environment.
>>>
>>> The option is disabled (by default) when set to 0. Also it does not
>>> apply during the connection establishment phase.
>>>
>>> Signed-off-by: H.K. Jerry Chu <hkchu@google.com>
>>
>> TCP_FAILFAST might be misleading. It reads as a boolean option, while
>> it's an option to cap the timeout, with a time unit, instead of the usual
>> "number of retransmits".
>
> Why not call it TCP_USERTIMEOUT?

Sure, except that it was designed to shorten the system default user timeout,
not lengthen it. (But perhaps it can be combined with TCP_UTO?)

The current default user timeout of 13-20 minutes in Linux may be adequate for
some apps but too long for many others. A per connection socket option solves
this problem.

> Later you can also send it via the TCP user timeout option... (RFC5482)
> Hmm... is the ms granularity really needed? Does it make sense to abort
> a connection below a second?

Yes, I thought about that too, but decided it's better to allow the
flexibility of a sub-second timeout for possible future use in HPC-type
applications, rather than regret it later.

>
>> It's also funny you don't ask for a default value, given by a sysctl
>> tunable ;)
>
> Well retries1/2 would be the tunables, no?

That was my first thought, to allow tcp_retries2 to be reduced on a per
connection basis. But I also see a need to reduce TCP_RTO_MAX in
order to allow a reasonable # of retries, given a shorter timeout.

I saw a patch submitted a couple of months ago to allow tcp_retries2 to be
configured but haven't seen any forward progress on the patch. If people
think letting apps configure tcp_retries2 and TCP_RTO_MAX directly is
a better solution I'm for it too.

Thanks,

Jerry

>
> Best regards,
> Arnd
>
Jerry Chu Aug. 24, 2010, 10:13 p.m. UTC | #8
On Tue, Aug 24, 2010 at 9:28 AM, Hagen Paul Pfeifer <hagen@jauu.net> wrote:
> * Arnd Hannemann | 2010-08-24 16:58:58 [+0200]:
>
>>Nice, so did you come up with a name for the socket option yet?
>
> +#define      TCP_UTO       18  /* User Timeout Option */
>
> The patch is an early state and details as well as testing is a little bit
> costly.
>
>>Hmm, is there really a difference? If an application specifies
>>a wanted timeout e.g. with USER_TIMEOUT, CHANGEABLE will
>>become false and the value would be announced via ADV_UTO.
>>The connection could be aborted locally after that time passed,
>>regardless of what the remote site thinks the timeout should be.
>>
>>As I understand it U_LIMIT and L_LIMIT would only be there
>>for safety to disallow nonsensical values of USER_TIMEOUT.
>>
>>Did I miss something?
>
> Maybe not, not sure. I must take a look at the patch from Jerry. I had no time
> until now.

According to RFC5482
"Decreasing the user timeouts allows busy servers to explicitly notify
their clients that they will maintain the connection state only for a
short time without connectivity."

So it looks like the user timeout can be used in either scenario (shortening
or lengthening) and in both cases is a lower bound, i.e., the connection
should abort at or shortly after the specified user timeout.

In this case does it make sense to combine the two? Will your TCP_UTO
patch be ready anytime soon?

Again an alternative is to allow configuring tcp_retries2 and TCP_RTO_MAX
directly. I'm open to suggestion but we'd like to get something in sooner.

Thanks,

Jerry

>
> Hagen
>
Hagen Paul Pfeifer Aug. 25, 2010, 8:21 a.m. UTC | #9
On Tue, 24 Aug 2010 15:13:44 -0700, Jerry Chu <hkchu@google.com> wrote:
> On Tue, Aug 24, 2010 at 9:28 AM, Hagen Paul Pfeifer <hagen@jauu.net> wrote:
>
> So it looks like the user timeout can be used in either scenario (shortening
> or lengthening) and in both cases is a lower bound, i.e., the connection
> should abort at or shortly after the specified user timeout.
>
> In this case does it make sense to combine the two? Will your TCP_UTO
> patch be ready anytime soon?
>
> Again an alternative is to allow configuring tcp_retries2 and TCP_RTO_MAX
> directly. I'm open to suggestion but we'd like to get something in sooner.

Hello Chu! My idea: you provide the functionality to modify the user timeout.
The interface should be generic enough to allow small as well as large (up
to 22 days) values. This interface should be sufficient for you and later
for me. Afterwards I will provide a patch that applies on top of your
groundwork. My patch handles the TCP UTO specific functionality, such as TCP
option protocol handling, socket option, permissions, lower and upper
bounds, ...

Did you check interactions with other TCP timers like keep-alive timer? 

Today in the evening I will focus on TCP Quick ACK modifications. After
that I am in the Alps for vacation for 5 days. Later on I will work on the
patch (the patch is in a good state, modification and testing should
consume only 2 evenings - hopefully ;-).

Cherry, is this ok for you?

Hagen
Jerry Chu Aug. 25, 2010, 8:20 p.m. UTC | #10
Hi Hagen,

On Wed, Aug 25, 2010 at 1:21 AM, Hagen Paul Pfeifer <hagen@jauu.net> wrote:
>
> On Tue, 24 Aug 2010 15:13:44 -0700, Jerry Chu <hkchu@google.com> wrote:
>> On Tue, Aug 24, 2010 at 9:28 AM, Hagen Paul Pfeifer <hagen@jauu.net> wrote:
>
>> So it looks like the user timeout can be used in either scenario (shortening
>> or lengthening) and in both cases is a lower bound, i.e., the connection
>> should abort at or shortly after the specified user timeout.
>>
>> In this case does it make sense to combine the two? Will your TCP_UTO
>> patch be ready anytime soon?
>>
>> Again an alternative is to allow configuring tcp_retries2 and TCP_RTO_MAX
>> directly. I'm open to suggestion but we'd like to get something in sooner.
>
> Hello Chu! My Idea: you provide functionality to modify the user timeout.
> The interface should be generic enough to allow small as well as large - up
> to 22 days - values.

Ok, let's try to finalize the API signature so our apps folks can program to it
now and don't have to change it later when we have a more complete
implementation involving the TCP option as well.

What should we call this new option? TCP_UTO or TCP_USERTIMEOUT
or else?

It will take a single argument of unsigned int in milliseconds (right?) that
specifies "user_timeout". The first retransmit timer that pops after
user_timeout has elapsed will cause the connection to be aborted and
ETIMEDOUT to be returned.

The RTO backoff code is largely intact. I've added a small tweak for when
user_timeout is small, to allow for a couple more retries.
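
The abort rule described here (the first retransmit timer that pops at or after user_timeout aborts the connection) can be illustrated with a small simulation. This is only a sketch of plain exponential backoff under simplifying assumptions: time is counted from the original transmission, the tweak for small timeouts mentioned above is not modelled, and the names abort_point and simulate_abort are hypothetical.

```c
#include <assert.h>

struct abort_point {
    unsigned int retransmits;   /* retransmissions sent before aborting */
    unsigned long elapsed_ms;   /* time of the timer pop that aborts */
};

/*
 * Simplified model: the RTO starts at initial_rto_ms and doubles after each
 * retransmission, capped at rto_max_ms. The connection is aborted at the
 * first timer pop that happens at or after user_timeout_ms.
 */
static struct abort_point simulate_abort(unsigned int initial_rto_ms,
                                         unsigned int rto_max_ms,
                                         unsigned int user_timeout_ms)
{
    struct abort_point p = { 0, 0 };
    unsigned long rto = initial_rto_ms;

    assert(initial_rto_ms > 0 && rto_max_ms >= initial_rto_ms);

    for (;;) {
        p.elapsed_ms += rto;                /* the retransmit timer pops */
        if (p.elapsed_ms >= user_timeout_ms)
            return p;                       /* give up: ETIMEDOUT */
        p.retransmits++;                    /* otherwise resend ... */
        rto = (rto * 2 < rto_max_ms) ? rto * 2 : rto_max_ms; /* ... and back off */
    }
}
```

With a 200 ms initial RTO, a 120 s cap and a 5000 ms user timeout, the timer pops at 200, 600, 1400 and 3000 ms (four retransmissions) and the connection is aborted at the 6200 ms pop. That overshoot past the requested 5000 ms is the kind of gap an RTO adjustment near the deadline would narrow.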

> This interface should be sufficient for you and later
> for me. Afterwards I provide an patch which apply on your groundwork. My
> patch handle TCP UTO specific functionality like TCP option protocol
> handling functionality, socket option, permissions, lower- and upper
> bounds, ...

Sounds good - you will provide all the missing details as described in RFC5482.

>
> Did you check interactions with other TCP timers like keep-alive timer?

The keepalive timer is driven off a different timer (sk_timer) than
icsk_retransmit_timer, so as far as the code is concerned they are separate
(and I don't see any interaction between the two).

But RFC5482 does contain the following paragraph:

4.2. TCP Keep-Alives

   Some TCP implementations, such as those in BSD systems, use a
   different abort policy for TCP keep-alives than for user data.  Thus,
   the TCP keep-alive mechanism might abort a connection that would
   otherwise have survived the transient period without connectivity.
   Therefore, if a connection that enables keep-alives is also using the
   TCP User Timeout Option, then the keep-alive timer MUST be set to a
   value larger than that of the adopted USER TIMEOUT.

At this moment I'm not inclined to muck with the keepalive code (although
the change could be simple) so I'll leave this case for you to handle.
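
The RFC requirement quoted above amounts to a single comparison once both values are in the same unit. A minimal sketch of such a check follows; it is not part of the patch, the function name is hypothetical, and it assumes the keep-alive idle time is given in seconds while the user timeout is in milliseconds.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical check, not from the patch: returns true when the keep-alive
 * idle time (seconds) is strictly larger than the user timeout (milliseconds).
 */
static bool keepalive_exceeds_user_timeout(unsigned int keepalive_idle_s,
                                           unsigned int user_timeout_ms)
{
    /* Widen before multiplying so large idle times cannot overflow. */
    return (unsigned long long)keepalive_idle_s * 1000ULL > user_timeout_ms;
}
```

For example, a 7200 s keep-alive idle time is compatible with a 30000 ms user timeout, while a 10 s idle time is not.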

>
> Today in the evening I will focus on TCP Quick ACK modifications. After
> that I am in the Alps for vacation for 5 days. Later on I will work on the
> patch (the patch is in a good state, modification and testing should
> consume only 2 evenings - hopefully ;-).
>
> Cherry, is this ok for you?

Sounds good (and it's Jerry, not the girlish Cherry :) Oh, please comment on
my plan above.

Jerry

>
> Hagen
>
Hagen Paul Pfeifer Aug. 25, 2010, 10:59 p.m. UTC | #11
* Jerry Chu | 2010-08-25 13:20:52 [-0700]:

>Ok, let's try to finalize the API signature so our apps folks can program to it
>now and don't have to change it later when we have a more complete
>implementation involving the TCP option as well.
>
>What should we call this new option? TCP_UTO or TCP_USERTIMEOUT
>or else?

Well, currently I am not sure if this is the best idea. Your implementation
addresses the local timeout: users can tweak their local timeout. UTO, on the
other hand, provides functionality to tweak the peer's timeout. I will use your
timeout implementation (see the comments below) if I receive a TCP UTO option
message, but via setsockopt TCP_UTO I try to modify the peer's TCP timeout. The
local and remote RTO are not necessarily coherent. For example, the local
RTO can be larger than the remote RTO.

My idea is that TCP_UTO should be used for the remote part. Another option
should be used for the local part, regardless of whether TCP_UTO will also
overwrite the local part if the local timeout is smaller than the announced
timeout.

TCP_REMOTE_UTO and TCP_LOCAL_RTO, ... no idea at the moment! ;(

Any ideas on that?

>It will take a single argument of unsigned int in milliseconds (right?) that
>specifies "user_timeout". The first retransmit timer pops after user_timeout
>will cause the connection to be aborted and ETIMEDOUT to be returned.

2^32 / (1000 * 60 * 60 * 24) > 22 days -> great!
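
The arithmetic above can be checked directly. The sketch below computes how many whole days an unsigned 32-bit millisecond count covers, and compares it with the range of a 15-bit count of minutes. The 15-bit, minute-granularity layout is an assumption about where the 22-day figure comes from (the UTO field of RFC 5482); the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MS_PER_DAY (1000ULL * 60 * 60 * 24)

/* Whole days representable by an unsigned 32-bit millisecond count. */
static uint64_t u32_ms_range_days(void)
{
    return ((uint64_t)UINT32_MAX + 1) / MS_PER_DAY;
}

/* Whole days representable by a 15-bit count of minutes (assumed UTO layout). */
static uint64_t uto15_minutes_range_days(void)
{
    return ((1ULL << 15) - 1) / (60 * 24);
}
```

The first function returns 49 and the second 22, so a 32-bit millisecond argument comfortably covers the 22-day range.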

>>This interface should be sufficient for you and later
>> for me. Afterwards I provide an patch which apply on your groundwork. My
>> patch handle TCP UTO specific functionality like TCP option protocol
>> handling functionality, socket option, permissions, lower- and upper
>> bounds, ...
>
>The keepalive timer is driven off a different timer sk_timer than
>icsk_retransmit_timer so as far as code is concerned they are separate
>(and I don't see any interaction between the two).
>
>But RFC5482 does contain the following paragraph:
>
>4.2. TCP Keep-Alives
>
>   Some TCP implementations, such as those in BSD systems, use a
>   different abort policy for TCP keep-alives than for user data.  Thus,
>   the TCP keep-alive mechanism might abort a connection that would
>   otherwise have survived the transient period without connectivity.
>   Therefore, if a connection that enables keep-alives is also using the
>   TCP User Timeout Option, then the keep-alive timer MUST be set to a
>   value larger than that of the adopted USER TIMEOUT.
>
>At this moment I'm not inclined to muck with the keepalive code (although
>the change could be simple) so I'll leave this case for you to handle.
>
>>
>> Today in the evening I will focus on TCP Quick ACK modifications. After
>> that I am in the Alps for vacation for 5 days. Later on I will work on the
>> patch (the patch is in a good state, modification and testing should
>> consume only 2 evenings - hopefully ;-).
>>
>> Cherry, is this ok for you?
>
>Sound good (and it's Jerry, not the girlish Cherry :) Oh, please comment on
>my plan above.

Sounds good to me! I mean, I must see working code before a final conclusion,
but the basic components are good. And sorry for the naming, Jerry - it was
not intentional!

Hagen
Jerry Chu Aug. 26, 2010, 1:49 a.m. UTC | #12
On Wed, Aug 25, 2010 at 3:59 PM, Hagen Paul Pfeifer <hagen@jauu.net> wrote:
> * Jerry Chu | 2010-08-25 13:20:52 [-0700]:
>
>>Ok, let's try to finalize the API signature so our apps folks can program to it
>>now and don't have to change it later when we have a more complete
>>implementation involving the TCP option as well.
>>
>>What should we call this new option? TCP_UTO or TCP_USERTIMEOUT
>>or else?
>
> Well, currently I am not sure if this is the best idea. Your implementation
> address the local timeout, user's can tweak their local timeout. UTO on the
> other hand provides functionality to tweak peer's timeout. I will use your

Yes, on a second look RFC5482 seems more complex than I originally thought,
allowing many different combinations of local/adv/remote UTO... Are they really
useful? E.g., why allow USER_TIMEOUT to be different from ADV_UTO? My original
thought was that the local UTO would always be the same as the one advertised
to the remote, so only one API would be needed, plus a bunch of flags for
ENABLED, CHANGEABLE...

> timeout implementation (see the comments below) if I receive a TCP UTO options
> message, but via setsockopt TCP_UTO I try to modify peer's TCP timeout. Local
> and remote RTO of them must not necessarily coherent.  For example the local
> RTO can be larger as the remote RTO.
>
> My idea is that TCP_UTO should be used for the remote part. Another option
> should be taken for the local part, no matter if TCP_UTO will overwrite the
> local part too if the local timeout if smaller as the announced timeout.

Ok. How about TCP_USER_TIMEOUT, which clearly refers to the local timeout?
(Is it useful to elevate it to SO_USER_TIMEOUT?)

You can call yours TCP_UTO and the key differentiator is 'O' (referring to a TCP
option).

>
> TCP_REMOTE_UTO and TCP_LOCAL_RTO, ... no idea at the moment! ;(
>
> Any ideas on that?
>
>>It will take a single argument of unsigned int in milliseconds (right?) that
>>specifies "user_timeout". The first retransmit timer pops after user_timeout
>>will cause the connection to be aborted and ETIMEDOUT to be returned.
>
> 2^32 / (1000 * 60 * 60 * 24) > 22 days -> great!
>
>>>This interface should be sufficient for you and later
>>> for me. Afterwards I provide an patch which apply on your groundwork. My
>>> patch handle TCP UTO specific functionality like TCP option protocol
>>> handling functionality, socket option, permissions, lower- and upper
>>> bounds, ...
>>
>>The keepalive timer is driven off a different timer sk_timer than
>>icsk_retransmit_timer so as far as code is concerned they are separate
>>(and I don't see any interaction between the two).
>>
>>But RFC5482 does contain the following paragraph:
>>
>>4.2. TCP Keep-Alives
>>
>>   Some TCP implementations, such as those in BSD systems, use a
>>   different abort policy for TCP keep-alives than for user data.  Thus,
>>   the TCP keep-alive mechanism might abort a connection that would
>>   otherwise have survived the transient period without connectivity.
>>   Therefore, if a connection that enables keep-alives is also using the
>>   TCP User Timeout Option, then the keep-alive timer MUST be set to a
>>   value larger than that of the adopted USER TIMEOUT.
>>
>>At this moment I'm not inclined to muck with the keepalive code (although
>>the change could be simple) so I'll leave this case for you to handle.
>>
>>>
>>> Today in the evening I will focus on TCP Quick ACK modifications. After
>>> that I am in the Alps for vacation for 5 days. Later on I will work on the
>>> patch (the patch is in a good state, modification and testing should
>>> consume only 2 evenings - hopefully ;-).
>>>
>>> Cherry, is this ok for you?
>>
>>Sound good (and it's Jerry, not the girlish Cherry :) Oh, please comment on
>>my plan above.
>
> Sounds good for me! I mean I must see working code for final conclusion, but
> the basic components are good. And sorry for the naming, Jerry - it was not
> intention!

Ok, will send a new patch soon (but please comment on the above and naming).

Jerry

>
> Hagen
>
Lars Eggert Aug. 26, 2010, 6:01 a.m. UTC | #13
Hi,

On 2010-8-26, at 4:49, Jerry Chu wrote:
> Yes on a 2nd look RFC5482 seems more complex than I originally thought, allowing
> many different combinations of local/adv/remote UTO... Are they really
> useful, e.g.,
> why allowing USER_TIMEOUT to be different from ADV_UTO?? My original thought
> was the local UTO will always be the same as the one advertised to
> remote so only
> one API will be needed plus bunch of flags for ENABLED, CHANGEABLE...


USER_TIMEOUT is what is locally used for a connection (i.e., takes into account what the remote peer advertised and what we'd like to use), while ADV_UTO is (only) what we'd like to use and are advertising.

(Yes, we initially thought we could make the mechanism simpler, but then we started to think through all the corner cases...)
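
The relationship between the locally adopted USER_TIMEOUT and the advertised ADV_UTO can be written down compactly. The sketch below follows the combination rule as I read it in RFC 5482, Section 3.1, for a connection whose timeout is CHANGEABLE: USER_TIMEOUT = min(U_LIMIT, max(ADV_UTO, REMOTE_UTO, L_LIMIT)). The function name is hypothetical, and all values are assumed to share one unit.

```c
#include <assert.h>

/*
 * Sketch of the adoption rule (assumed from RFC 5482, Section 3.1):
 * take the largest of what we advertise, what the peer advertised and the
 * local lower limit, then clamp the result to the local upper limit.
 */
static unsigned int uto_adopt(unsigned int adv_uto, unsigned int remote_uto,
                              unsigned int l_limit, unsigned int u_limit)
{
    unsigned int t = adv_uto;

    if (remote_uto > t)
        t = remote_uto;
    if (l_limit > t)
        t = l_limit;
    return (t < u_limit) ? t : u_limit;
}
```

With ADV_UTO = 30, REMOTE_UTO = 120, L_LIMIT = 10 and U_LIMIT = 600, the adopted value is 120: the peer's larger request wins, which is why USER_TIMEOUT can differ from ADV_UTO.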

Lars
Arnd Hannemann Aug. 26, 2010, 7:12 a.m. UTC | #14
Hi Lars,

On 26.08.2010 08:01, Lars Eggert wrote:
> On 2010-8-26, at 4:49, Jerry Chu wrote:
>   
>> Yes on a 2nd look RFC5482 seems more complex than I originally thought, allowing
>> many different combinations of local/adv/remote UTO... Are they really
>> useful, e.g.,
>> why allowing USER_TIMEOUT to be different from ADV_UTO?? My original thought
>> was the local UTO will always be the same as the one advertised to
>> remote so only
>> one API will be needed plus bunch of flags for ENABLED, CHANGEABLE...
>>     
>
> USER_TIMEOUT is what is locally used for a connection (i.e., takes into account what the remote peer advertised and what we'd like to use), while ADV_UTO is (only) what we'd like to use and are advertising.
>
> (Yes, we initially thought we could make the mechanism simpler, but then we started to think through all the corner cases...)
>   

But from the application's point of view it is enough to request a specific
UTO as a socket option (which will then get announced via ADV_UTO), right?
Is there any reason (besides local policy) not to abort the connection locally
after the time the application specified via the above-mentioned socket
option?


Best regards,
Arnd
Hagen Paul Pfeifer Aug. 26, 2010, 7:27 a.m. UTC | #15
On Thu, 26 Aug 2010 09:01:31 +0300, Lars Eggert wrote:

Hi Jerry, Hi Lars

> USER_TIMEOUT is what is locally used for a connection (i.e., takes into
> account what the remote peer advertised and what we'd like to use), while
> ADV_UTO is (only) what we'd like to use and are advertising.
> 
> (Yes, we initially thought we could make the mechanism simpler, but then
> we started to think through all the corner cases...)

o TCP_USER_TIMEOUT for the local timeout seems fine. This is consistent
with the literature and everybody knows how to interpret such a variable.

o TCP_ADV_UTO for the announced timeout. It corresponds with the RFC and
contains the word "advertise".

Jerry, for you the former is of interest, you can completely ignore my
on-top-functionality. I am satisfied with this!

Cheers, Hagen
Hagen Paul Pfeifer Aug. 26, 2010, 7:42 a.m. UTC | #16
On Thu, 26 Aug 2010 09:12:24 +0200, Arnd Hannemann wrote:

>> USER_TIMEOUT is what is locally used for a connection (i.e., takes into
>> account what the remote peer advertised and what we'd like to use), while
>> ADV_UTO is (only) what we'd like to use and are advertising.
>>
>> (Yes, we initially thought we could make the mechanism simpler, but then
>> we started to think through all the corner cases...)
>>   
> 
> But from the application point of view it is enough to request a specific
> UTO as a socket option, (which will then get announced via ADV_UTO), right?
> Is there any reason, (besides local policy) to not abort the connection
> locally after the time the application specified via the above mentioned
> socket option?

The original USER_TIMEOUT (RFC 793) functionality boils down to Jerry's
TCP_FAILFAST patch. This _per se_ has no correlation with the ADV_UTO.
It is likely that the ADV_UTO will update the USER_TIMEOUT too. I change my
position from day to day: either artificially limit the mechanism, provide a
per se "clever" mechanism and keep both values coherent, or give all the
freedom to the user.

Hagen
Patch

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index a778ee0..60b7244 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -105,6 +105,7 @@  enum {
 #define TCP_COOKIE_TRANSACTIONS	15	/* TCP Cookie Transactions */
 #define TCP_THIN_LINEAR_TIMEOUTS 16      /* Use linear timeouts for thin streams*/
 #define TCP_THIN_DUPACK         17      /* Fast retrans. after 1 dupack */
+#define TCP_FAILFAST		18	/* Abort connection in loss retry sooner*/
 
 /* for TCP_INFO socket option */
 #define TCPI_OPT_TIMESTAMPS	1
diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
index b6d3b55..6553921 100644
--- a/include/net/inet_connection_sock.h
+++ b/include/net/inet_connection_sock.h
@@ -125,6 +125,7 @@  struct inet_connection_sock {
 		int		  probe_size;
 	} icsk_mtup;
 	u32			  icsk_ca_priv[16];
+	u32			  icsk_max_timeout;
 #define ICSK_CA_PRIV_SIZE	(16 * sizeof(u32))
 };
 
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 176e11a..ddb548a 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2391,7 +2391,12 @@  static int do_tcp_setsockopt(struct sock *sk, int level,
 		err = tp->af_specific->md5_parse(sk, optval, optlen);
 		break;
 #endif
-
+	case TCP_FAILFAST:
+		/* Cap the max timeout in ms TCP will retry/retrans
+		 * before giving up and aborting (ETIMEDOUT) a connection.
+		 */
+		icsk->icsk_max_timeout = msecs_to_jiffies(val);
+		break;
 	default:
 		err = -ENOPROTOOPT;
 		break;
@@ -2610,6 +2615,10 @@  static int do_tcp_getsockopt(struct sock *sk, int level,
 	case TCP_THIN_DUPACK:
 		val = tp->thin_dupack;
 		break;
+
+	case TCP_FAILFAST:
+		val = jiffies_to_msecs(icsk->icsk_max_timeout);
+		break;
 	default:
 		return -ENOPROTOOPT;
 	}
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index 808bb92..95c2548 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -138,7 +138,8 @@  static void tcp_mtu_probing(struct inet_connection_sock *icsk, struct sock *sk)
  * retransmissions with an initial RTO of TCP_RTO_MIN.
  */
 static bool retransmits_timed_out(struct sock *sk,
-				  unsigned int boundary)
+				  unsigned int boundary,
+				  unsigned int max_timeout)
 {
 	unsigned int timeout, linear_backoff_thresh;
 	unsigned int start_ts;
@@ -159,6 +160,9 @@  static bool retransmits_timed_out(struct sock *sk,
 		timeout = ((2 << linear_backoff_thresh) - 1) * TCP_RTO_MIN +
 			  (boundary - linear_backoff_thresh) * TCP_RTO_MAX;
 
+	if (max_timeout != 0 && timeout > max_timeout)
+		timeout = max_timeout;
+
 	return (tcp_time_stamp - start_ts) >= timeout;
 }
 
@@ -174,7 +178,7 @@  static int tcp_write_timeout(struct sock *sk)
 			dst_negative_advice(sk);
 		retry_until = icsk->icsk_syn_retries ? : sysctl_tcp_syn_retries;
 	} else {
-		if (retransmits_timed_out(sk, sysctl_tcp_retries1)) {
+		if (retransmits_timed_out(sk, sysctl_tcp_retries1, 0)) {
 			/* Black hole detection */
 			tcp_mtu_probing(icsk, sk);
 
@@ -187,14 +191,16 @@  static int tcp_write_timeout(struct sock *sk)
 
 			retry_until = tcp_orphan_retries(sk, alive);
 			do_reset = alive ||
-				   !retransmits_timed_out(sk, retry_until);
+				   !retransmits_timed_out(sk, retry_until, 0);
 
 			if (tcp_out_of_resources(sk, do_reset))
 				return 1;
 		}
 	}
 
-	if (retransmits_timed_out(sk, retry_until)) {
+	if (retransmits_timed_out(sk, retry_until,
+	    (1 << sk->sk_state) & (TCPF_SYN_SENT | TCPF_SYN_RECV) ? 0 :
+	    icsk->icsk_max_timeout)) {
 		/* Has it gone just too far? */
 		tcp_write_err(sk);
 		return 1;
@@ -434,9 +440,35 @@  out_reset_timer:
 	} else {
 		/* Use normal (exponential) backoff */
 		icsk->icsk_rto = min(icsk->icsk_rto << 1, TCP_RTO_MAX);
+		if (icsk->icsk_max_timeout &&
+		    ((1 << sk->sk_state) & (TCPF_SYN_SENT | TCPF_SYN_RECV)) == 0) {
+			int ts;
+			unsigned int base_rto =
+			    min(__tcp_set_rto(tp), TCP_RTO_MAX);
+
+			if (unlikely(!tp->retrans_stamp))
+				ts = (int)TCP_SKB_CB(tcp_write_queue_head(sk))->when;
+			else
+				ts = (int)tp->retrans_stamp;
+			ts = icsk->icsk_max_timeout - (tcp_time_stamp - ts) -
+				base_rto-1;
+			/*
+			 * Adjust rto so that the total timeout is not far off
+			 * the max_timeout range. Also if the total # of
+			 * retries would be less than 6, allow one more shot.
+			 */
+			if (icsk->icsk_rto > ts && icsk->icsk_retransmits < 6)
+				icsk->icsk_rto >>= 1;
+			if ((int)(icsk->icsk_rto) > ts) {
+				if (ts < (int)base_rto)
+					icsk->icsk_rto = base_rto;
+				else
+					icsk->icsk_rto = ts;
+			}
+		}
 	}
 	inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS, icsk->icsk_rto, TCP_RTO_MAX);
-	if (retransmits_timed_out(sk, sysctl_tcp_retries1 + 1))
+	if (retransmits_timed_out(sk, sysctl_tcp_retries1 + 1, 0))
 		__sk_dst_reset(sk);
 
 out:;
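
The deadline computed in retransmits_timed_out() can be reproduced in user space to see what the new cap does. The sketch below works in milliseconds instead of jiffies and assumes TCP_RTO_MIN is 200 ms and TCP_RTO_MAX is 120 s. Only the second branch of the computation and the max_timeout clamp are visible in the hunks above; the first branch (for boundary values at or below the linear backoff threshold) is my reconstruction and should be read as an assumption. The helper names are hypothetical.

```c
#include <assert.h>

#define RTO_MIN_MS 200u     /* assumed: TCP_RTO_MIN = HZ/5 */
#define RTO_MAX_MS 120000u  /* assumed: TCP_RTO_MAX = 120*HZ */

/* Integer log2, rounding down (stand-in for the kernel's ilog2()). */
static unsigned int ilog2_u(unsigned int v)
{
    unsigned int r = 0;

    while (v >>= 1)
        r++;
    return r;
}

/*
 * Total time after which `boundary` retransmissions count as timed out,
 * optionally capped by max_timeout_ms (0 means no cap, as in the patch).
 */
static unsigned int retrans_deadline_ms(unsigned int boundary,
                                        unsigned int max_timeout_ms)
{
    unsigned int thresh = ilog2_u(RTO_MAX_MS / RTO_MIN_MS);
    unsigned int timeout;

    if (boundary <= thresh)     /* assumed branch: pure exponential phase */
        timeout = ((2u << boundary) - 1) * RTO_MIN_MS;
    else                        /* exponential phase, then RTO_MAX steps */
        timeout = ((2u << thresh) - 1) * RTO_MIN_MS +
                  (boundary - thresh) * RTO_MAX_MS;

    if (max_timeout_ms != 0 && timeout > max_timeout_ms)
        timeout = max_timeout_ms;   /* the clamp added by this patch */

    return timeout;
}
```

Under these assumptions a boundary of 15 gives 924600 ms (about 15.4 minutes) without a cap, and exactly 5000 ms when max_timeout_ms is 5000.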