Message ID | 20160506055705.GA9276@penelope.isobedori.kobe.vergenet.net |
---|---|
State | Not Applicable |
Headers | show |
On Fri, 6 May 2016 14:57:07 +0900, Simon Horman wrote: > On Thu, May 05, 2016 at 10:37:08AM -0700, pravin shelar wrote: > > On transmit side you are using mac_len to detect l3 packet, why not do > > same while extracting the key? I agree. The skb should be self-contained, i.e. it should be obvious whether it has the MAC header set or not just from the skb itself at any point in the packet processing. Otherwise, I'd expect things like recirculation to break after push/pop of eth header. > Unfortunately mac_len can't be relied on here, emprically it has the same > value (28 in my tests) for both the TEB and layer3 case above. That's strange, it looks like there's something setting the mac header unconditionally in ovs code. We should find that place and change it. The ARPHRD_NONE interfaces don't even set mac_header and mac_len, this will need to be set by ovs upon getting frame from such interface. > Perhaps that could be changed by futher enhancements in the tunneling code > but I think things are symetric as they stand: > > * On recieve skb->protocol can be read to distinguish TEB and layer3 packets > * On transmit skb->protocol should be set to distinguish TEB and layer3 packets Yes, but you need to act upon this directly after receiving the frame/just before sending the frame and set up an internal flag that will be used throughout the code. That way, the packet can be handed over to different parts of the code, recirculated, etc. without worries. skb->mac_len is probably a good candidate for such flag. Jiri
On Fri, May 06, 2016 at 11:25:14AM +0200, Jiri Benc wrote: > On Fri, 6 May 2016 14:57:07 +0900, Simon Horman wrote: > > On Thu, May 05, 2016 at 10:37:08AM -0700, pravin shelar wrote: > > > On transmit side you are using mac_len to detect l3 packet, why not do > > > same while extracting the key? > > I agree. The skb should be self-contained, i.e. it should be obvious > whether it has the MAC header set or not just from the skb itself at > any point in the packet processing. Otherwise, I'd expect things like > recirculation to break after push/pop of eth header. > > > Unfortunately mac_len can't be relied on here, emprically it has the same > > value (28 in my tests) for both the TEB and layer3 case above. > > That's strange, it looks like there's something setting the mac header > unconditionally in ovs code. We should find that place and change it. It seems to be caused by the following: 1. __ipgre_rcv() calls skb_pop_mac_header() which sets skb->mac_header to the skb->network_header. 2. __ipgre_rcv() then calls ip_tunnel_rcv() which calls skb_reset_network_header(). This updates skb->network_header to just after the end of the GRE header. This is 28 bytes after the old skb->network_header as there is a 20 byte IPv4 header followed by an 8 byte GRE header. 3. Later, dev_gro_receive() calls skb_reset_mac_len(). This calculates skb->mac_len based on skb->network_header and skb->mac_header. I.e. 28 bytes. I think this may be possible to address by calling skb_reset_network_header() instead of skb_pop_mac_header() in __ipgre_rcv(). I think that would leave skb->mac_header and skb->network_header both set to just after the end of the GRE header and result in mac_len of 0. It looks like this owuld be for for both TEB and non-TEB GRE packets and that OVS would need to check against mac_len, protocol and most likely dev->type early on. > The ARPHRD_NONE interfaces don't even set mac_header and mac_len, this > will need to be set by ovs upon getting frame from such interface. Noted. > > Perhaps that could be changed by futher enhancements in the tunneling code > > but I think things are symetric as they stand: > > > > * On recieve skb->protocol can be read to distinguish TEB and layer3 packets > > * On transmit skb->protocol should be set to distinguish TEB and layer3 packets > > Yes, but you need to act upon this directly after receiving the > frame/just before sending the frame and set up an internal flag that > will be used throughout the code. That way, the packet can be handed > over to different parts of the code, recirculated, etc. without > worries. skb->mac_len is probably a good candidate for such flag. Its possible that I've overlooked something but as things stand I think things look like this: * ovs_flow_key_extract() keys off dev->type and skb->protocol. * ovs_flow_key_extract() calls key_extract() which amongst other things sets up the skb->mac_header and skb->mac_len of the skb. * ovs_flow_key_extract() sets skb->protocol to that of the inner packet in the case of TEB * Actions update the above mentioned skb fields as appropriate. So it seems to me that it should be safe to rely on skb->protocol in the receive path. Or more specifically, in netdev_port_receive(). If mac_len is also able to be used then I think fine. But it seems to me that it needs to be set up by OvS at least for the ARPHRD_NONE case. This could be done early on, say in netdev_port_receive(). But it seems that would involve duplicating some of what is already occurring in key_extract().
On Mon, 9 May 2016 17:04:22 +0900, Simon Horman wrote: > It seems to be caused by the following: > > 1. __ipgre_rcv() calls skb_pop_mac_header() which > sets skb->mac_header to the skb->network_header. > > 2. __ipgre_rcv() then calls ip_tunnel_rcv() which calls > skb_reset_network_header(). This updates skb->network_header to > just after the end of the GRE header. > > This is 28 bytes after the old skb->network_header > as there is a 20 byte IPv4 header followed by an > 8 byte GRE header. > > 3. Later, dev_gro_receive() calls skb_reset_mac_len(). > This calculates skb->mac_len based on skb->network_header and > skb->mac_header. I.e. 28 bytes. Right. Thanks for tracking this down! > I think this may be possible to address by calling > skb_reset_network_header() instead of skb_pop_mac_header() > in __ipgre_rcv(). We can't do that. The interface type is ARPHRD_IPGRE and not ARPHRD_NONE, so the current behavior makes pretty good sense. See e.g. commit 0e3da5bb8da45. We have two options here: 1. As for metadata tunnels all the info is in metadata_dst and we don't need the IP/GRE header for anything, we can make the ipgre interface ARPHRD_NONE in metadata based mode. 2. We can fix this up in ovs after receiving the packet from ARPHRD_IPGRE interface. I think the first option is the correct one. We already don't assign dev->header_ops in metadata mode. I'll prepare a patch. > Its possible that I've overlooked something but as things stand I think > things look like this: > > * ovs_flow_key_extract() keys off dev->type and skb->protocol. > * ovs_flow_key_extract() calls key_extract() which amongst other things > sets up the skb->mac_header and skb->mac_len of the skb. > * ovs_flow_key_extract() sets skb->protocol to that of the inner packet > in the case of TEB > * Actions update the above mentioned skb fields as appropriate. Okay, that actually eases things somewhat. > So it seems to me that it should be safe to rely on skb->protocol > in the receive path. Or more specifically, in netdev_port_receive(). > > If mac_len is also able to be used then I think fine. But it seems to me > that it needs to be set up by OvS at least for the ARPHRD_NONE case. This > could be done early on, say in netdev_port_receive(). But it seems that > would involve duplicating some of what is already occurring in > key_extract(). I'd actually prefer doing this earlier, netdev_port_receive looks like the right place. Just set mac_len there (or whatever) and let key_extract do the rest of the work, not depending on dev->type in there. My point about recirculation was not actually valid, as I missed you're doing this in ovs_flow_key_extract and not in key_extract. Still, I think the special handling of particular interface types belongs to the tx processing on those interfaces, not to the common code. Thanks! Jiri
Hi Jiri, On Tue, May 10, 2016 at 02:01:06PM +0200, Jiri Benc wrote: > On Mon, 9 May 2016 17:04:22 +0900, Simon Horman wrote: > > It seems to be caused by the following: > > > > 1. __ipgre_rcv() calls skb_pop_mac_header() which > > sets skb->mac_header to the skb->network_header. > > > > 2. __ipgre_rcv() then calls ip_tunnel_rcv() which calls > > skb_reset_network_header(). This updates skb->network_header to > > just after the end of the GRE header. > > > > This is 28 bytes after the old skb->network_header > > as there is a 20 byte IPv4 header followed by an > > 8 byte GRE header. > > > > 3. Later, dev_gro_receive() calls skb_reset_mac_len(). > > This calculates skb->mac_len based on skb->network_header and > > skb->mac_header. I.e. 28 bytes. > > Right. Thanks for tracking this down! > > > I think this may be possible to address by calling > > skb_reset_network_header() instead of skb_pop_mac_header() > > in __ipgre_rcv(). > > We can't do that. The interface type is ARPHRD_IPGRE and not > ARPHRD_NONE, so the current behavior makes pretty good sense. See > e.g. commit 0e3da5bb8da45. > > We have two options here: > > 1. As for metadata tunnels all the info is in metadata_dst and we > don't need the IP/GRE header for anything, we can make the ipgre > interface ARPHRD_NONE in metadata based mode. > > 2. We can fix this up in ovs after receiving the packet from > ARPHRD_IPGRE interface. > > I think the first option is the correct one. We already don't assign > dev->header_ops in metadata mode. I'll prepare a patch. I agree that 1. seems to be the better approach. > > Its possible that I've overlooked something but as things stand I think > > things look like this: > > > > * ovs_flow_key_extract() keys off dev->type and skb->protocol. > > * ovs_flow_key_extract() calls key_extract() which amongst other things > > sets up the skb->mac_header and skb->mac_len of the skb. > > * ovs_flow_key_extract() sets skb->protocol to that of the inner packet > > in the case of TEB > > * Actions update the above mentioned skb fields as appropriate. > > Okay, that actually eases things somewhat. > > > So it seems to me that it should be safe to rely on skb->protocol > > in the receive path. Or more specifically, in netdev_port_receive(). > > > > If mac_len is also able to be used then I think fine. But it seems to me > > that it needs to be set up by OvS at least for the ARPHRD_NONE case. This > > could be done early on, say in netdev_port_receive(). But it seems that > > would involve duplicating some of what is already occurring in > > key_extract(). > > I'd actually prefer doing this earlier, netdev_port_receive looks like > the right place. Just set mac_len there (or whatever) and let > key_extract do the rest of the work, not depending on dev->type in > there. > > My point about recirculation was not actually valid, as I missed you're > doing this in ovs_flow_key_extract and not in key_extract. Still, > I think the special handling of particular interface types belongs to > the tx processing on those interfaces, not to the common code. Sure, if that is your preference I think it should be simple enough to implement. I agree that netdev_port_receive() looks like a good place for this.
On Wed, 11 May 2016 10:50:12 +0900, Simon Horman wrote: > On Tue, May 10, 2016 at 02:01:06PM +0200, Jiri Benc wrote: > > We have two options here: > > > > 1. As for metadata tunnels all the info is in metadata_dst and we > > don't need the IP/GRE header for anything, we can make the ipgre > > interface ARPHRD_NONE in metadata based mode. > > > > 2. We can fix this up in ovs after receiving the packet from > > ARPHRD_IPGRE interface. > > > > I think the first option is the correct one. We already don't assign > > dev->header_ops in metadata mode. I'll prepare a patch. > > I agree that 1. seems to be the better approach. I just sent a patch that fixes this. And we have the same bug in VXLAN-GPE, I sent a patch, too. > Sure, if that is your preference I think it should be simple enough to > implement. I agree that netdev_port_receive() looks like a good place for > this. Great, thanks! Jiri
diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c index d320c2657627..fc92cf542101 100644 --- a/net/openvswitch/flow.c +++ b/net/openvswitch/flow.c @@ -701,7 +701,6 @@ int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info, struct sk_buff *skb, struct sw_flow_key *key) { bool is_layer3 = false; - bool is_teb = false; int err; /* Extract metadata from packet. */ @@ -709,13 +708,9 @@ int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info, key->tun_proto = ip_tunnel_info_af(tun_info); memcpy(&key->tun_key, &tun_info->key, sizeof(key->tun_key)); - if (OVS_CB(skb)->input_vport->dev->type != ARPHRD_ETHER) { - if (skb->protocol == htons(ETH_P_TEB)) - is_teb = true; - else - is_layer3 = true; - } - + if (OVS_CB(skb)->input_vport->dev->type != ARPHRD_ETHER && + skb->protocol != htons(ETH_P_TEB)) + is_layer3 = true; if (tun_info->options_len) { BUILD_BUG_ON((1 << (sizeof(tun_info->options_len) * @@ -746,10 +741,12 @@ int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info, if (err < 0) return err; - if (is_teb) - skb->protocol = key->eth.type; - else if (is_layer3) - key->eth.type = skb->protocol; + if (tun_info && OVS_CB(skb)->input_vport->dev->type != ARPHRD_ETHER) { + if (is_layer3) + key->eth.type = skb->protocol; + else + skb->protocol = key->eth.type; + } return err; }