From patchwork Mon Sep 21 08:00:09 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Wengang Wang
X-Patchwork-Id: 520166
X-Patchwork-Delegate: davem@davemloft.net
Return-Path:
X-Original-To: patchwork-incoming@ozlabs.org
Delivered-To: patchwork-incoming@ozlabs.org
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by ozlabs.org (Postfix) with ESMTP id 2715E14027C
	for ; Mon, 21 Sep 2015 17:59:00 +1000 (AEST)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1756030AbbIUH64 (ORCPT ); Mon, 21 Sep 2015 03:58:56 -0400
Received: from userp1040.oracle.com ([156.151.31.81]:40016 "EHLO
	userp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1755990AbbIUH6z (ORCPT );
	Mon, 21 Sep 2015 03:58:55 -0400
Received: from aserv0021.oracle.com (aserv0021.oracle.com [141.146.126.233])
	by userp1040.oracle.com (Sentrion-MTA-4.3.2/Sentrion-MTA-4.3.2)
	with ESMTP id t8L7wqqm016810
	(version=TLSv1 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK)
	for ; Mon, 21 Sep 2015 07:58:53 GMT
Received: from userv0121.oracle.com (userv0121.oracle.com [156.151.31.72])
	by aserv0021.oracle.com (8.13.8/8.13.8) with ESMTP id t8L7wq01026160
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=FAIL)
	for ; Mon, 21 Sep 2015 07:58:52 GMT
Received: from abhmp0001.oracle.com (abhmp0001.oracle.com [141.146.116.7])
	by userv0121.oracle.com (8.13.8/8.13.8) with ESMTP id t8L7wpGp016934
	for ; Mon, 21 Sep 2015 07:58:52 GMT
Received: from oracle.com (/10.182.64.160)
	by default (Oracle Beehive Gateway v4.0) with ESMTP ;
	Mon, 21 Sep 2015 00:58:51 -0700
From: Wengang Wang
To: netdev@vger.kernel.org
Cc: wen.gang.wang@oracle.com
Subject: [PATCH] ip: find correct route for socket which is not bound (v2)
Date: Mon, 21 Sep 2015 16:00:09 +0800
Message-Id: <1442822409-9799-1-git-send-email-wen.gang.wang@oracle.com>
X-Mailer: git-send-email 2.1.0
MIME-Version: 1.0
X-Source-IP: aserv0021.oracle.com [141.146.126.233]
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org

This is v2. Changes since v1:
 * For a loopback outbound device, it continues to skip the cached
   route; for other devices, it goes through the cached route.

For multicast, we should also find a valid route (and thus get a
meaningful pmtu) for packets on a socket that is not bound to a device
(sk_bound_dev_if being 0).

From the socket(7) man page:

   SO_BINDTODEVICE
          Bind this socket to a particular device like “eth0”, as
          specified in the passed interface name.  If the name is an
          empty string or the option length is zero, the socket device
          binding is removed.  The passed option is a variable-length
          null-terminated interface name string with the maximum size
          of IFNAMSIZ.  If a socket is bound to an interface, only
          packets received from that particular interface are processed
          by the socket.  Note that this works only for some socket
          types, particularly AF_INET sockets.  It is not supported for
          packet sockets (use normal bind(2) there).

The man page does not say that packets sent on a socket that is not
bound to a device will not be routed.

The problem I hit is that all multicast packets are dropped by the
kernel on the sending host. The lower layer is IPoIB with an MTU of
7000, and I was sending multicast packets of length 4096. Inside IPoIB,
the first send is dropped because it exceeds the internal packet size
limit, mcast_mtu, which is 2044. IPoIB therefore calls
ip_rt_update_pmtu (indirectly) to set the path MTU. A correct route is
configured for the multicast destination, so setting the pmtu succeeded,
and the next multicast packet to the same target was expected to succeed
(it should be fragmented according to the pmtu just set). In practice,
the second and all later multicast packets were dropped too. The reason
is that the route lookup (fib_lookup) is skipped because the socket is
not bound to a device (sk_bound_dev_if being 0).
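As a rough illustration of the fragmentation expected once the pmtu is
set to 2044, the arithmetic can be sketched in user-space C. This is not
kernel code: ipv4_udp_fragment_count() is a hypothetical helper, and it
assumes the 4096-byte payload is sent over UDP with a 20-byte IPv4
header without options, neither of which is stated in the report.

```c
#include <assert.h>

#define IPV4_HDR_LEN 20u /* assumption: IPv4 header without options */
#define UDP_HDR_LEN   8u /* assumption: the payload is carried in UDP */

/*
 * Hypothetical helper (not a kernel function): number of IPv4 fragments
 * needed to carry a UDP datagram with payload_len bytes of payload over
 * a path whose MTU is pmtu.
 */
unsigned int ipv4_udp_fragment_count(unsigned int payload_len,
				     unsigned int pmtu)
{
	/* Bytes that IP has to carry: UDP header plus payload. */
	unsigned int l4_len = payload_len + UDP_HDR_LEN;
	unsigned int per_frag;

	/* The path must fit the IP header plus one 8-byte unit. */
	assert(pmtu >= IPV4_HDR_LEN + 8u);

	/*
	 * Every fragment except the last carries a multiple of 8 bytes,
	 * because the fragment offset is counted in 8-byte units.
	 */
	per_frag = ((pmtu - IPV4_HDR_LEN) / 8u) * 8u;

	/* Round up to whole fragments. */
	return (l4_len + per_frag - 1u) / per_frag;
}
```

Under these assumptions, ipv4_udp_fragment_count(4096, 2044) gives 3
fragments, while at the 7000-byte link MTU the same datagram fits in a
single packet.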

With the patch proposed here applied, it works fine.

Signed-off-by: Wengang Wang
---
 net/ipv4/route.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 5f4a556..c0534c2 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2097,7 +2097,10 @@ struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *fl4)
 			 */
 
 			fl4->flowi4_oif = dev_out->ifindex;
-			goto make_route;
+			if (dev_out->flags & IFF_LOOPBACK)
+				goto make_route;
+			else
+				goto lookup;
 		}
 
 		if (!(fl4->flowi4_flags & FLOWI_FLAG_ANYSRC)) {
@@ -2153,6 +2156,7 @@ struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *fl4)
 		goto make_route;
 	}
 
+lookup:
 	if (fib_lookup(net, fl4, &res, 0)) {
 		res.fi = NULL;
 		res.table = NULL;