From patchwork Wed Feb 4 08:10:32 2015
X-Patchwork-Submitter: Fan Du
X-Patchwork-Id: 436174
X-Patchwork-Delegate: davem@davemloft.net
From: Fan Du
To: netdev@vger.kernel.org
Cc: jesse@nicira.com, pshelar@nicira.com, dev@openvswitch.org, fengyuleidian0615@gmail.com
Subject: [PATCH RFC] ipv4 tcp: Use fine granularity to increase probe_size for tcp pmtu
Date: Wed, 4 Feb 2015 16:10:32 +0800
Message-Id: <1423037432-13996-1-git-send-email-fan.du@intel.com>

A couple of months ago I proposed a fix for over-MTU-sized vxlan packet
loss at link [1]; neither fragmenting the tunnelled vxlan packet nor
pushing back a PMTU "fragmentation needed" ICMP message was accepted by
the community. The upstream workaround is to lower the guest mtu or raise
the host mtu, or to have the virtio driver auto-tune the guest mtu (no
consensus so far). Note that gre tunnels suffer the same over-MTU-sized
packet loss.

For the TCPv4 case, this issue can be solved with Packetization Layer
Path MTU Discovery, defined in [2] and implemented since commit
5d424d5a674f ("[TCP]: MTU probing"):

  echo 1 > /proc/sys/net/ipv4/tcp_mtu_probing

One drawback of tcp level mtu probing is that the original strategy
doubles mss_cache for each probe, which is far too aggressive for the
over-MTU-sized vxlan packet loss issue, judging from the performance
results. In addition, the probing is driven by tcp retransmission, which
usually takes about 6 seconds from the first dropped packet until normal
connectivity is recovered.

By incrementing mss_cache by 25% of its original value on each probe,
throughput improves from ~1.3Gbits/s (mss_cache 1024 bytes) to
~1.55Gbits/s (mss_cache 1250 bytes); a more generic scheme could be used
here for other tunnel technologies (see the illustrative sketch after the
patch below).

I am not sure why tcp level mtu probing is disabled by default; are there
any historically known issues or pitfalls?

[1]: http://www.spinics.net/lists/netdev/msg306502.html
[2]: http://www.ietf.org/rfc/rfc4821.txt

Signed-off-by: Fan Du
---
 net/ipv4/tcp_output.c | 6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 20ab06b..ab7e46b 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1856,9 +1856,11 @@ static int tcp_mtu_probe(struct sock *sk)
 	    tp->rx_opt.num_sacks || tp->rx_opt.dsack)
 		return -1;
 
-	/* Very simple search strategy: just double the MSS. */
+	/* Very simple search strategy:
+	 * Increment 25% of original MSS forward
+	 */
 	mss_now = tcp_current_mss(sk);
-	probe_size = 2 * tp->mss_cache;
+	probe_size = (tp->mss_cache + (tp->mss_cache >> 2));
 	size_needed = probe_size + (tp->reordering + 1) * tp->mss_cache;
 	if (probe_size > tcp_mtu_to_mss(sk, icsk->icsk_mtup.search_high)) {
 		/* TODO: set timer for probe_converge_event */
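
For illustration only, not part of the patch: a minimal user-space sketch
of how probe_size would grow under the old doubling strategy versus the
25% increment above. The starting mss_cache (1024 bytes), the MSS cap
derived from icsk_mtup.search_high (1410 bytes), and the premise that
every probe succeeds so mss_cache is raised to the probed size before the
next round are assumptions chosen for the sketch, not figures taken from
the measurements quoted above.

	/*
	 * Illustrative sketch only.  The starting mss_cache, the cap, and
	 * the assumption that every probe succeeds are example values.
	 */
	#include <stdio.h>

	int main(void)
	{
		unsigned int cap = 1410;	/* assumed MSS cap from search_high */
		unsigned int dbl = 1024;	/* assumed initial mss_cache, doubling */
		unsigned int inc = 1024;	/* assumed initial mss_cache, +25% steps */
		int round;

		for (round = 1; round <= 4; round++) {
			dbl = 2 * dbl;		/* old: probe_size = 2 * tp->mss_cache */
			inc = inc + (inc >> 2);	/* new: mss_cache + 25% of mss_cache */
			printf("round %d: double -> %u%s, +25%% -> %u%s\n",
			       round,
			       dbl, dbl > cap ? " (exceeds cap)" : "",
			       inc, inc > cap ? " (exceeds cap)" : "");
		}
		return 0;
	}

The >> 2 shift mirrors the integer arithmetic used in the patch.  With
the assumed values, doubling overshoots the tunnel-reduced MSS cap on the
very first probe, while the 25% steps approach it gradually.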