Message ID | 4D7F9592.5050408@iki.fi |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Headers | show |
On 03/15/2011 06:36 PM, Timo Teräs wrote: > On 03/15/2011 05:34 PM, Eric Dumazet wrote: >> (Timo mentioned : >> If the NOARP packets are not dropped, ipgre_tunnel_xmit() will >> take rt->rt_gateway (= NBMA IP) and use that for route >> look up (and may lead to bogus xfrm acquires).) >> >> Is the following works for you ? > > I have memory that _header() is called with daddr being valid pointer, > but pointing to zero memory. So basically my situation would break with > this. Ok, I've gone through now the code paths. And I believe I made originally the assumption that ipgre_tunnel_xmit would should not ever get tiph->daddr == 0 if we got ipgre_header() call. However, what actually happens is for (NOARP interfaces) in arp.c: - unicast traffic gets NOARP entries mapped to dev->dev_addr (in gre case it's the tunnel 'local' address) - multicast gets mapped to dev->broadcast And if we create gre tunnel without local or remote address we end up getting the NOARP entries with hwaddr 0.0.0.0. Now, for unicast traffic it's mostly pointless. If the tunnel was locally bound the packets would never leave: they'd get NOARP entry for local address. And if it's locally unbound, the packets get rt_gateway, which is pretty confusing routing wise (it apparently assumes your link device has same ipv4 subnet as the gre device). On multicast side it makes a bit more sense to map multicast groups. And this happened implicitly. IMHO, we should fix the arp code in ipv4 and ipv6 to do proper ARPHRD_IPGRE mappings so that the _header() gets called with proper data. I think the multicast-to-same-multicast group mapping makes sense. But do not really know what to do with unicast packets sent to gre interface with NOARP and no link broadcast IP address. Actually this was my problem: the unicast packets for gre interface with NOARP flag resulted in trying to send packets out. So I could probably just fix this by creating my gre interface *with* the ARP flag in the first place. But is there any sensible thing to do with the unicast packets in above case? I think those should be just dropped. Or does someone think that it'd ever make sense to take the inner unicast address and use it as the outer address too? If so, my patch should be just reverted. My honest thought is to keep the ip_gre header check as it is currently and fix arp code in ipv4 / neighbour code in ipv6 to do the proper NOARP mappings as needed. We might be able to get rid of the huge protocol dependent "tiph->daddr == 0" check in xmit path this way, and make sure that the header is set properly. This would also allow us to see proper NOARP entries when doing "ip neigh show nud noarp". Now it will just show 0.0.0.0 entries with gre devices without telling where to the packets are actually being sent to. Any thoughts? Cheers, Timo -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Timo, --- On Tue, 3/15/11, Timo Teräs <timo.teras@iki.fi> wrote: > From: Timo Teräs <timo.teras@iki.fi> > Subject: Re: Multicast Fails Over Multipoint GRE Tunnel > To: "Eric Dumazet" <eric.dumazet@gmail.com> > Cc: "Doug Kehn" <rdkehn@yahoo.com>, netdev@vger.kernel.org > Date: Tuesday, March 15, 2011, 12:36 PM > On 03/15/2011 05:34 PM, Eric Dumazet > wrote: > > Le lundi 14 mars 2011 à 16:34 -0700, Doug Kehn a > écrit : > >> I'm running kernel version 2.6.36 on ARM XSCALE > (big-endian) and multicast over a multipoint GRE tunnel > isn't working. For my architecture, this worked on > 2.6.26.8. For x86, multicast over a multipoint GRE > tunnel worked with kernel version 2.6.31 but failed with > version 2.6.35. Multicast over a multipoint GRE tunnel > fails because ipgre_header() fails the 'if (iph->daddr)' > check and reutrns -t->hlen. ipgre_header() is being > called, from neigh_connected_output(), with a non-null > daddr; the contents of daddr is zero. > >> > >> Reverting the ip_gre.c patch posted in http://marc.info/?l=linux-netdev&m=126762491525281&w=2 > resolves the problem. (Reviewing the HEAD of > net-next-2.6 it appears that ipgre_header() remains > unchanged from 2.6.36.) > >> > >> The configuration used to discover/diagnose the > problem: > >> > >> ip tunnel add tun1 mode gre key 11223344 ttl 64 > csum remote any > >> ip link set dev tun1 up > >> ip link set dev tun1 multicast on > >> ip addr flush dev tun1 > >> ip addr add 10.40.92.114/24 broadcast 10.40.92.255 > dev tun1 > >> > >> 12: tun1: <MULTICAST,NOARP,UP,10000> mtu > 1468 qdisc noqueue > >> link/gre 0.0.0.0 brd > 0.0.0.0 > >> inet 10.40.92.114/24 brd > 10.40.92.255 scope global tun1 > >> > >> Then attempt: > >> ping -I tun1 224.0.0.9 > >> > >> Are additional configuration steps now required > for multicast over multipoint GRE tunnel or is > ipgre_header() in error? > > > > Hi Doug > > > > CC Timo Teras <timo.teras@iki.fi> > > > > I would do a partial revert of Timo patch, but this > means initial > > concern should be addressed ? > > > > (Timo mentioned : > > If the NOARP packets are not > dropped, ipgre_tunnel_xmit() will > > take rt->rt_gateway (= NBMA IP) > and use that for route > > look up (and may lead to bogus xfrm > acquires).) > > > > > > Is the following works for you ? > > I have memory that _header() is called with daddr being > valid pointer, > but pointing to zero memory. So basically my situation > would break with > this. > > The above configuration would be fixable by setting > broadcast to tun1 > interface explicitly. But I'm not sure if the above > configuration is > somehow different that it'd need to work. > > Basically how things work on send path is: > 1. arp_constructor maps multicast address to NUD_NOARP > 2. arp_mc_map in turn copies dev->broadcast to haddr > (so you get the ip packet pointing to zero > ip) > - i assumed normally gre tunnels would have > the > broadcast address set > 3. ipgre_tunnel_xmit checks for daddr==0 and uses the > rt_gateway then, which would map to the > multicast > address > > So I assume that the above ping command not working would > end up sending > gre encapsulated packet where both the inner and outer IP > address is the > same multicast address? > > Looks like my patch also broke the default behaviour that > if one has > such GRE tunnel, and you send unicast packets, it would > default the link > destination to be same as the inner destination. Not sure > if this would > be useful. > > So basically my situation is undistinguishable from the > above one. The > only difference is that the above problems only with > multicast, and I'm > having with unicast. > > I think the fundamental problem is that arp_mc_map maps the > multicast > address to zeroes (due to device broadcast being zero). > > Could we instead maybe do something like: > > diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c > index 7927589..372448a 100644 > --- a/net/ipv4/arp.c > +++ b/net/ipv4/arp.c > @@ -215,6 +215,12 @@ int arp_mc_map(__be32 addr, u8 *haddr, > struct > net_device *dev, int dir) > case ARPHRD_INFINIBAND: > > ip_ib_mc_map(addr, dev->broadcast, haddr); > > return 0; > + case ARPHRD_IPGRE: > + > if (dev->broadcast) > + > memcpy(haddr, > dev->broadcast, dev->addr_len); > + > else > + > memcpy(haddr, &addr, > sizeof(addr)); > + > return 0; > default: > if > (dir) { > > memcpy(haddr, dev->broadcast, > dev->addr_len); > I wasn't sure if the above patch was in addition too or in lieu of the partial revert of ipgre_header() suggested by Eric; both cases were attempted. The multicast scenario described does not work if only the arp_mc_map() patch is applied. The multicast scenario described does work if the partial revert of ipgre_header(), suggested by Eric, and the arp_mc_map patch are applied. Regards, ...doug -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Timo, --- On Tue, 3/15/11, Timo Teräs <timo.teras@iki.fi> wrote: > From: Timo Teräs <timo.teras@iki.fi> > Subject: Re: Multicast Fails Over Multipoint GRE Tunnel > To: "Eric Dumazet" <eric.dumazet@gmail.com> > Cc: "Doug Kehn" <rdkehn@yahoo.com>, netdev@vger.kernel.org > Date: Tuesday, March 15, 2011, 12:36 PM > On 03/15/2011 05:34 PM, Eric Dumazet > wrote: > > Le lundi 14 mars 2011 à 16:34 -0700, Doug Kehn a > écrit : > >> I'm running kernel version 2.6.36 on ARM XSCALE > (big-endian) and multicast over a multipoint GRE tunnel > isn't working. For my architecture, this worked on > 2.6.26.8. For x86, multicast over a multipoint GRE > tunnel worked with kernel version 2.6.31 but failed with > version 2.6.35. Multicast over a multipoint GRE tunnel > fails because ipgre_header() fails the 'if (iph->daddr)' > check and reutrns -t->hlen. ipgre_header() is being > called, from neigh_connected_output(), with a non-null > daddr; the contents of daddr is zero. > >> > >> Reverting the ip_gre.c patch posted in http://marc.info/?l=linux-netdev&m=126762491525281&w=2 > resolves the problem. (Reviewing the HEAD of > net-next-2.6 it appears that ipgre_header() remains > unchanged from 2.6.36.) > >> > >> The configuration used to discover/diagnose the > problem: > >> > >> ip tunnel add tun1 mode gre key 11223344 ttl 64 > csum remote any > >> ip link set dev tun1 up > >> ip link set dev tun1 multicast on > >> ip addr flush dev tun1 > >> ip addr add 10.40.92.114/24 broadcast 10.40.92.255 > dev tun1 > >> > >> 12: tun1: <MULTICAST,NOARP,UP,10000> mtu > 1468 qdisc noqueue > >> link/gre 0.0.0.0 brd > 0.0.0.0 > >> inet 10.40.92.114/24 brd > 10.40.92.255 scope global tun1 > >> > >> Then attempt: > >> ping -I tun1 224.0.0.9 > >> > >> Are additional configuration steps now required > for multicast over multipoint GRE tunnel or is > ipgre_header() in error? > > > > Hi Doug > > > > CC Timo Teras <timo.teras@iki.fi> > > > > I would do a partial revert of Timo patch, but this > means initial > > concern should be addressed ? > > > > (Timo mentioned : > > If the NOARP packets are not > dropped, ipgre_tunnel_xmit() will > > take rt->rt_gateway (= NBMA IP) > and use that for route > > look up (and may lead to bogus xfrm > acquires).) > > > > > > Is the following works for you ? > > I have memory that _header() is called with daddr being > valid pointer, > but pointing to zero memory. So basically my situation > would break with > this. > > The above configuration would be fixable by setting > broadcast to tun1 > interface explicitly. But I'm not sure if the above > configuration is > somehow different that it'd need to work. > > Basically how things work on send path is: > 1. arp_constructor maps multicast address to NUD_NOARP > 2. arp_mc_map in turn copies dev->broadcast to haddr > (so you get the ip packet pointing to zero > ip) > - i assumed normally gre tunnels would have > the > broadcast address set > 3. ipgre_tunnel_xmit checks for daddr==0 and uses the > rt_gateway then, which would map to the > multicast > address > > So I assume that the above ping command not working would > end up sending > gre encapsulated packet where both the inner and outer IP > address is the > same multicast address? > > Looks like my patch also broke the default behaviour that > if one has > such GRE tunnel, and you send unicast packets, it would > default the link > destination to be same as the inner destination. Not sure > if this would > be useful. > > So basically my situation is undistinguishable from the > above one. The > only difference is that the above problems only with > multicast, and I'm > having with unicast. > > I think the fundamental problem is that arp_mc_map maps the > multicast > address to zeroes (due to device broadcast being zero). > > Could we instead maybe do something like: > > diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c > index 7927589..372448a 100644 > --- a/net/ipv4/arp.c > +++ b/net/ipv4/arp.c > @@ -215,6 +215,12 @@ int arp_mc_map(__be32 addr, u8 *haddr, > struct > net_device *dev, int dir) > case ARPHRD_INFINIBAND: > > ip_ib_mc_map(addr, dev->broadcast, haddr); > > return 0; > + case ARPHRD_IPGRE: > + > if (dev->broadcast) > + > memcpy(haddr, > dev->broadcast, dev->addr_len); > + > else > + > memcpy(haddr, &addr, > sizeof(addr)); > + > return 0; > default: > if > (dir) { > > memcpy(haddr, dev->broadcast, > dev->addr_len); > I wasn't sure if the above patch was in addition too or in lieu of the partial revert of ipgre_header() suggested by Eric; both cases were attempted. The multicast scenario described does not work if only the arp_mc_map() patch is applied. The multicast scenario described does work if the partial revert of ipgre_header(), suggested by Eric, and the arp_mc_map patch are applied. Regards, ...doug -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c index 7927589..372448a 100644 --- a/net/ipv4/arp.c +++ b/net/ipv4/arp.c @@ -215,6 +215,12 @@ int arp_mc_map(__be32 addr, u8 *haddr, struct net_device *dev, int dir) case ARPHRD_INFINIBAND: ip_ib_mc_map(addr, dev->broadcast, haddr); return 0; + case ARPHRD_IPGRE: + if (dev->broadcast) + memcpy(haddr, dev->broadcast, dev->addr_len); + else + memcpy(haddr, &addr, sizeof(addr)); + return 0; default: if (dir) {