Message ID | 1442385255-27014-1-git-send-email-wen.gang.wang@oracle.com |
---|---|
State | Changes Requested, archived |
Delegated to: | David Miller |
Headers | show |
From: Wengang Wang <wen.gang.wang@oracle.com> Date: Wed, 16 Sep 2015 14:34:15 +0800 > For multi-cast, we should find valid route(thus get the meaniful pmtu) for > the package on the socket which is not bound to a device(sk_bound_dev_if > being 0) too. Your patch breaks exactly the situation explained in full detail in the huge comment about the first change you are making: /* Special hack: user can direct multicasts and limited broadcast via necessary interface without fiddling with IP_MULTICAST_IF or IP_PKTINFO. This hack is not just for fun, it allows vic,vat and friends to work. They bind socket to loopback, set ttl to zero and expect that it will work. From the viewpoint of routing cache they are broken, because we are not allowed to build multicast path with loopback source addr (look, routing cache cannot know, that ttl is zero, so that packet will not leave this host and route is valid). Luckily, this hack is good workaround. */ That situation will now fail after your patch. So I cannot apply this patch, sorry. I know what you want, you want to end up with a cached route that will track the PMTU. But this hack will break other things at the same time so is not acceptable. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi David, Thanks for your review on this. 在 2015年09月18日 12:20, David Miller 写道: > From: Wengang Wang <wen.gang.wang@oracle.com> > Date: Wed, 16 Sep 2015 14:34:15 +0800 > >> For multi-cast, we should find valid route(thus get the meaniful pmtu) for >> the package on the socket which is not bound to a device(sk_bound_dev_if >> being 0) too. > Your patch breaks exactly the situation explained in full detail > in the huge comment about the first change you are making: > > /* Special hack: user can direct multicasts > and limited broadcast via necessary interface > without fiddling with IP_MULTICAST_IF or IP_PKTINFO. > This hack is not just for fun, it allows > vic,vat and friends to work. > They bind socket to loopback, set ttl to zero > and expect that it will work. > From the viewpoint of routing cache they are broken, > because we are not allowed to build multicast path > with loopback source addr (look, routing cache > cannot know, that ttl is zero, so that packet > will not leave this host and route is valid). > Luckily, this hack is good workaround. > */ > > That situation will now fail after your patch. Yes, I noticed the above comment before I made the patch. I have question regarding that comment: Seems it's talking about the loopback source address(loopback device interface is with the index of 1). Do you think we can make something specific to loopback device and let the package with other outbound devices go through the routing cache? > So I cannot apply this patch, sorry. > > I know what you want, you want to end up with a cached route that > will track the PMTU. Yes, that's exactly what I want. Actually the user space can work fine by adding a syscall setsockopt(..., SO_BINDTODEVICE, ..) to bind socket to the device. Well, the problem here is that there are a lot of user space applications which don't bind sockets and they are working fine with old kernel(2.6.39). When moving to new kernel(4.x), the problem appeared. And seems it's hard for force all of them change to bind sockets and recompile. -- that's the pain. Do you have any idea which won't break the existing hack but can help with my case? thanks, wengang -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Wengang Wang <wen.gang.wang@oracle.com> Date: Mon, 21 Sep 2015 10:04:59 +0800 > Do you have any idea which won't break the existing > -- hack but can help with my case? Although I am obligated to find deficiencies and shortcomings in your submissions, I am not obligated to design or implement your changes correctly for you, sorry. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 5f4a556..032481a 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -2097,7 +2097,7 @@ struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *fl4) */ fl4->flowi4_oif = dev_out->ifindex; - goto make_route; + goto lookup; } if (!(fl4->flowi4_flags & FLOWI_FLAG_ANYSRC)) { @@ -2153,6 +2153,7 @@ struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *fl4) goto make_route; } +lookup: if (fib_lookup(net, fl4, &res, 0)) { res.fi = NULL; res.table = NULL;
For multi-cast, we should find valid route(thus get the meaniful pmtu) for the package on the socket which is not bound to a device(sk_bound_dev_if being 0) too. From man page of socket(7) SO_BINDTODEVICE Bind this socket to a particular device like “eth0”, as specified in the passed interface name. If the name is an empty string or the option length is zero, the socket device binding is removed. The passed option is a variable-length null-terminated interface name string with the maximum size of IFNAMSIZ. If a socket is bound to an interface, only packets received from that particular interface are processed by the socket. Note that this works only for some socket types, particularly AF_INET sockets. It is not supported for packet sockets (use normal bind(2) there). The man page doesn't say when socket not bound packages won't be routed. A problem is hit that all multi-cast packages dropped by kernel(from sender host). The lower layer is IPoIB with MTU being 7000. And I was sending 4096 length multi-cast package. In side IPoIB the first send is dropped because is exeeding the internal package size limitation mcast_mtu which is 2044. So IPoIB calls ip_rt_update_pmtu (indirectly) trying to set path mtu. A correct route is configured for the multi-cast, so the setting of pmtu cucceeded and the next multi-cast package(to the same target) is expected to succeed(it would be well fragmented accroding to the pmtu I just set). But actually the second and later multi-cast packages got dropped too. And the reason is that the neighor looking up(fib_lookup) is skipped because of the socket is not bound to device(sk_bound_dev_if being 0). After applied the patch I proposed here, it works fine. Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> --- net/ipv4/route.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)