diff mbox

[PATCHv3,net-next,3/4] ipv4: shrink current mss for tcp PMTU blackhole detection

Message ID 1425093785-27380-4-git-send-email-fan.du@intel.com
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Fan Du Feb. 28, 2015, 3:23 a.m. UTC
Reducing current tcp mss instead of search_low will make
more sense, as current tcp mss still got lost.

In addition, rename tcp_mtu_probing to tcp_blackhole_probe
to clearify confusion between tcp_mtu_probing and
tcp_mtu_probe.

Signed-off-by: Fan Du <fan.du@intel.com>
---
 net/ipv4/tcp_timer.c |    8 +++++---
 1 files changed, 5 insertions(+), 3 deletions(-)

Comments

John Heffner Feb. 28, 2015, 11:39 p.m. UTC | #1
On Fri, Feb 27, 2015 at 10:23 PM, Fan Du <fan.du@intel.com> wrote:
> Reducing current tcp mss instead of search_low will make
> more sense, as current tcp mss still got lost.

> @@ -113,7 +114,8 @@ static void tcp_mtu_probing(struct inet_connection_sock *icsk, struct sock *sk)
>                         struct tcp_sock *tp = tcp_sk(sk);
>                         int mss;
>
> -                       mss = tcp_mtu_to_mss(sk, icsk->icsk_mtup.search_low) >> 1;
> +                       /* try mss smaller than current mss */
> +                       mss = tcp_current_mss(sk) >> 1;
>                         mss = min(net->ipv4.sysctl_tcp_base_mss, mss);
>                         mss = max(mss, 68 - tp->tcp_header_len);
>                         icsk->icsk_mtup.search_low = tcp_mss_to_mtu(sk, mss);
> @@ -176,7 +178,7 @@ static int tcp_write_timeout(struct sock *sk)

I don't believe there's a problem with the code as written.  Modulo
headers, search_low <= mss =< search_high.  This code, on successive
timeouts, widens the search range by adjusting search_low (and mss)
downward.

  -John
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
FengYu LeiDian March 2, 2015, 9:29 a.m. UTC | #2
于 2015年03月01日 07:39, John Heffner 写道:
> On Fri, Feb 27, 2015 at 10:23 PM, Fan Du<fan.du@intel.com>  wrote:
>> >Reducing current tcp mss instead of search_low will make
>> >more sense, as current tcp mss still got lost.
>> >@@ -113,7 +114,8 @@ static void tcp_mtu_probing(struct inet_connection_sock *icsk, struct sock *sk)
>> >                         struct tcp_sock *tp = tcp_sk(sk);
>> >                         int mss;
>> >
>> >-                       mss = tcp_mtu_to_mss(sk, icsk->icsk_mtup.search_low) >> 1;
>> >+                       /* try mss smaller than current mss */
>> >+                       mss = tcp_current_mss(sk) >> 1;
>> >                         mss = min(net->ipv4.sysctl_tcp_base_mss, mss);
>> >                         mss = max(mss, 68 - tp->tcp_header_len);
>> >                         icsk->icsk_mtup.search_low = tcp_mss_to_mtu(sk, mss);
>> >@@ -176,7 +178,7 @@ static int tcp_write_timeout(struct sock *sk)
> I don't believe there's a problem with the code as written.  Modulo
> headers, search_low <= mss =< search_high.  This code, on successive
> timeouts, widens the search range by adjusting search_low (and mss)
> downward.
With original approach(doubling mss), mss is not in between search_low and search_high,
it always equates search_low(subtract headers), the potential mss in case of blackhole
is 256 and 128, after doubling, it will become 512 and 256 eventually no matter how
route changes, even mtu reduced from 1500 to 1100 in intermediate node.

After using binary search, halve search_low will make search window left boundary expending
in a slower manner, as a result the limit of mss is search_high/2, because search_high does
not change.

Timeout indicates search_high should be set to the new mtu corresponding to current_mss no
matter how we change search_low. So the best shot here IMO would be updating search_high
with current_mss, which in return makes the search window *slide* from right to left, and
the probing will converge in good speed eventually.

So my thoughts is:
@@ -113,6 +113,7 @@ static void tcp_mtu_probing(struct inet_connection_sock *icsk, struct sock *sk)
                         struct tcp_sock *tp = tcp_sk(sk);
                         int mss;

+                       icsk_mtup.search_high = tcp_mss_to_mtu(sk, tcp_current_mss(sk));
                         mss = tcp_mtu_to_mss(sk, icsk->icsk_mtup.search_low) >> 1;
                         mss = min(net->ipv4.sysctl_tcp_base_mss, mss);
                         mss = max(mss, 68 - tp->tcp_header_len);
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
John Heffner March 2, 2015, 8:32 p.m. UTC | #3
On Mon, Mar 2, 2015 at 4:29 AM, Fan Du <fengyuleidian0615@gmail.com> wrote:
> Timeout indicates search_high should be set to the new mtu corresponding to
> current_mss no
> matter how we change search_low. So the best shot here IMO would be updating
> search_high
> with current_mss, which in return makes the search window *slide* from right
> to left, and
> the probing will converge in good speed eventually.
>
> So my thoughts is:
> @@ -113,6 +113,7 @@ static void tcp_mtu_probing(struct inet_connection_sock
> *icsk, struct sock *sk)
>                         struct tcp_sock *tp = tcp_sk(sk);
>                         int mss;
>
> +                       icsk_mtup.search_high = tcp_mss_to_mtu(sk,
> tcp_current_mss(sk));
>                         mss = tcp_mtu_to_mss(sk, icsk->icsk_mtup.search_low)
>>> 1;
>                         mss = min(net->ipv4.sysctl_tcp_base_mss, mss);
>                         mss = max(mss, 68 - tp->tcp_header_len);

Search_high should be adjusted downward only when you're quite certain
that you've gotten a negative signal.  There are many possible reasons
for successive timeouts including intermittent disconnection, and they
should not have a persistent (or very long-term) effect on MTU.  Leave
search_high where it is until a working MTU can be established, then
probe upward until probing can give you confidence you've found a new
ceiling, or gotten back to the old one.

If you think the current approach is broken, it would help to see a
concrete demonstration of how it's deficient (a real packet trace is
good!), and how a different approach work better.

Thanks,
  -John
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
FengYu LeiDian March 3, 2015, 9:18 a.m. UTC | #4
于 2015年03月03日 04:32, John Heffner 写道:
> On Mon, Mar 2, 2015 at 4:29 AM, Fan Du<fengyuleidian0615@gmail.com>  wrote:
>> >Timeout indicates search_high should be set to the new mtu corresponding to
>> >current_mss no
>> >matter how we change search_low. So the best shot here IMO would be updating
>> >search_high
>> >with current_mss, which in return makes the search window*slide*  from right
>> >to left, and
>> >the probing will converge in good speed eventually.
>> >
>> >So my thoughts is:
>> >@@ -113,6 +113,7 @@ static void tcp_mtu_probing(struct inet_connection_sock
>> >*icsk, struct sock *sk)
>> >                         struct tcp_sock *tp = tcp_sk(sk);
>> >                         int mss;
>> >
>> >+                       icsk_mtup.search_high = tcp_mss_to_mtu(sk,
>> >tcp_current_mss(sk));
>> >                         mss = tcp_mtu_to_mss(sk, icsk->icsk_mtup.search_low)
>>>> >>>1;
>> >                         mss = min(net->ipv4.sysctl_tcp_base_mss, mss);
>> >                         mss = max(mss, 68 - tp->tcp_header_len);
> Search_high should be adjusted downward only when you're quite certain
> that you've gotten a negative signal.  There are many possible reasons
> for successive timeouts including intermittent disconnection, and they
> should not have a persistent (or very long-term) effect on MTU.  Leave
> search_high where it is until a working MTU can be established, then
> probe upward until probing can give you confidence you've found a new
> ceiling, or gotten back to the old one.
>
> If you think the current approach is broken, it would help to see a
> concrete demonstration of how it's deficient (a real packet trace is
> good!), and how a different approach work better.

 > With original approach(doubling mss), mss is not in between search_low and search_high,
 > it always equates search_low(subtract headers), the potential mss in case of blackhole
 > is 256 and 128, after doubling, it will become 512 and 256 eventually no matter how
 > route changes, even mtu reduced from 1500 to 1100 in intermediate node.

As for above statement, my test scenario is simple, using vxlan tunnel to connect two
docker instances on two hosts. All mtu default to 1500, run iperf between docker instances.
After the connection is setup, adjust the iperf sender host physical eth0 mtu to
1100, and after a couple of seconds, mss will be set to 512.

And after using binary search, actually, search_high will be set to probe_size trigger from
tcp_fastretrans_alert by my investigation. So the searching window does slide to left, thus
I will drop this patch. Maybe this is because my test method is not practical. And I believe
what you said about:
> There are many possible reasons for successive timeouts including intermittent disconnection,
 > and they should not have a persistent (or very long-term) effect on MTU.

Never mind, no matter whether to adjust search_high, the reprobe timer will restore search_high
to maximum allowed, and once again, new mss will be available.

I will resend the rest of patches after incorporating your comments for reviewing.
Thanks for your feedback.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index 0732b78..3465175 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -99,7 +99,8 @@  static int tcp_orphan_retries(struct sock *sk, int alive)
 	return retries;
 }
 
-static void tcp_mtu_probing(struct inet_connection_sock *icsk, struct sock *sk)
+static void tcp_blackhole_probe(struct inet_connection_sock *icsk,
+				struct sock *sk)
 {
 	struct net *net = sock_net(sk);
 
@@ -113,7 +114,8 @@  static void tcp_mtu_probing(struct inet_connection_sock *icsk, struct sock *sk)
 			struct tcp_sock *tp = tcp_sk(sk);
 			int mss;
 
-			mss = tcp_mtu_to_mss(sk, icsk->icsk_mtup.search_low) >> 1;
+			/* try mss smaller than current mss */
+			mss = tcp_current_mss(sk) >> 1;
 			mss = min(net->ipv4.sysctl_tcp_base_mss, mss);
 			mss = max(mss, 68 - tp->tcp_header_len);
 			icsk->icsk_mtup.search_low = tcp_mss_to_mtu(sk, mss);
@@ -176,7 +178,7 @@  static int tcp_write_timeout(struct sock *sk)
 	} else {
 		if (retransmits_timed_out(sk, sysctl_tcp_retries1, 0, 0)) {
 			/* Black hole detection */
-			tcp_mtu_probing(icsk, sk);
+			tcp_blackhole_probe(icsk, sk);
 
 			dst_negative_advice(sk);
 		}