[v3,net-next,3/3] openvswitch: Fix skb->protocol for vlan frames.

Message ID 1480462253-114713-3-git-send-email-jarno@ovn.org
State Changes Requested, archived
Delegated to: David Miller

Commit Message

Jarno Rajahalme Nov. 29, 2016, 11:30 p.m. UTC
Do not always set skb->protocol to be the ethertype of the L3 header.
For a packet with non-accelerated VLAN tags skb->protocol needs to be
the ethertype of the outermost non-accelerated VLAN ethertype.

Any VLAN offloading is undone on the OVS netlink interface, and any
VLAN tags added by userspace are non-accelerated, as are double tagged
VLAN packets.

Fixes: 018c1dda5f ("openvswitch: 802.1AD Flow handling, actions, vlan parsing, netlink attributes")
Fixes: 5108bbaddc ("openvswitch: add processing of L3 packets")
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
---
v3: Set skb->protocol properly also for double tagged packets, as suggested
    by Pravin.  This patch is no longer needed for the MTU test failure, as
    the new patch 2/3 addresses that.

 net/openvswitch/datapath.c |  1 -
 net/openvswitch/flow.c     | 62 +++++++++++++++++++++++-----------------------
 2 files changed, 31 insertions(+), 32 deletions(-)
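
As a rough illustration of the rule stated in the commit message, the userspace sketch below walks an Ethernet frame and reports two values: the first ethertype found in the frame data (what the patch wants in skb->protocol) and the ethertype that follows any in-frame VLAN tags (what ends up in key->eth.type). This is not kernel code. parse_frame(), struct parsed and the helpers are hypothetical names, values are kept in host byte order, and 802.2/LLC handling is left out. An accelerated tag lives in skb metadata rather than in the frame bytes, so it never appears in this walk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_ALEN     6
#define ETH_HLEN     14
#define VLAN_HLEN    4
#define ETH_P_8021Q  0x8100
#define ETH_P_8021AD 0x88A8

/* Hypothetical result type: the two values the patch keeps apart. */
struct parsed {
	uint16_t skb_protocol;	/* first ethertype present in the frame data */
	uint16_t eth_type;	/* ethertype after the in-frame VLAN tags */
};

/* Userspace stand-in for the kernel's eth_type_vlan(), host byte order. */
static int is_vlan_tpid(uint16_t type)
{
	return type == ETH_P_8021Q || type == ETH_P_8021AD;
}

static uint16_t read_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* Returns 0 on success, -1 if the frame is too short to parse. */
static int parse_frame(const uint8_t *frame, size_t len, struct parsed *out)
{
	size_t off = 2 * ETH_ALEN;	/* type field follows both MAC addresses */
	uint16_t type;
	int tags = 0;

	assert(frame && out);
	if (len < ETH_HLEN)
		return -1;

	type = read_be16(frame + off);
	out->skb_protocol = type;	/* outermost non-accelerated type */

	/* Skip at most two tags, like the outer/inner pair in parse_vlan(). */
	while (is_vlan_tpid(type) && tags < 2) {
		off += VLAN_HLEN;
		if (len < off + 2)
			return -1;
		type = read_be16(frame + off);
		tags++;
	}
	out->eth_type = type;
	return 0;
}
```

For an untagged IPv4 frame both fields come out as 0x0800. For a frame with a single in-frame 802.1Q tag, skb_protocol is 0x8100 while eth_type is still 0x0800.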

Comments

Pravin Shelar Nov. 30, 2016, 7:34 a.m. UTC | #1
On Tue, Nov 29, 2016 at 3:30 PM, Jarno Rajahalme <jarno@ovn.org> wrote:
> Do not always set skb->protocol to be the ethertype of the L3 header.
> For a packet with non-accelerated VLAN tags skb->protocol needs to be
> the ethertype of the outermost non-accelerated VLAN ethertype.
>
> Any VLAN offloading is undone on the OVS netlink interface, and any
> VLAN tags added by userspace are non-accelerated, as are double tagged
> VLAN packets.
>
> Fixes: 018c1dda5f ("openvswitch: 802.1AD Flow handling, actions, vlan parsing, netlink attributes")
> Fixes: 5108bbaddc ("openvswitch: add processing of L3 packets")
> Signed-off-by: Jarno Rajahalme <jarno@ovn.org>

Looks much better now. Thanks for fixing it. I have one minor comment.

Acked-by: Pravin B Shelar <pshelar@ovn.org>

> ---
> v3: Set skb->protocol properly also for double tagged packets, as suggested
>     by Pravin.  This patch is no longer needed for the MTU test failure, as
>     the new patch 2/3 addresses that.
>
>  net/openvswitch/datapath.c |  1 -
>  net/openvswitch/flow.c     | 62 +++++++++++++++++++++++-----------------------
>  2 files changed, 31 insertions(+), 32 deletions(-)
>
> diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
> index 08aa926..e2fe2e5 100644
> --- a/net/openvswitch/flow.c
> +++ b/net/openvswitch/flow.c
> @@ -354,6 +354,7 @@ static int parse_vlan(struct sk_buff *skb, struct sw_flow_key *key)
>                 res = parse_vlan_tag(skb, &key->eth.vlan);
>                 if (res <= 0)
>                         return res;
> +               skb->protocol = key->eth.vlan.tpid;
>         }
>
>         /* Parse inner vlan tag. */
> @@ -361,6 +362,11 @@ static int parse_vlan(struct sk_buff *skb, struct sw_flow_key *key)
>         if (res <= 0)
>                 return res;
>
> +       /* If the outer vlan tag was accelerated, skb->protocol should
> +        * refelect the inner vlan type. */
> +       if (!eth_type_vlan(skb->protocol))

Since you would be spinning another version, can you change this
condition to directly check for skb-vlan-tag rather than indirectly
checking for the vlan accelerated case?
Jiri Benc Nov. 30, 2016, 2:30 p.m. UTC | #2
On Tue, 29 Nov 2016 15:30:53 -0800, Jarno Rajahalme wrote:
> Do not always set skb->protocol to be the ethertype of the L3 header.
> For a packet with non-accelerated VLAN tags skb->protocol needs to be
> the ethertype of the outermost non-accelerated VLAN ethertype.

Well, the current handling of skb->protocol matches what used to be the
handling of the kernel net stack before Jiri Pirko cleaned up the vlan
code.

I'm not opposed to changing this but I'm afraid it needs much deeper
review. Because with this in place, no core kernel functions that
depend on skb->protocol may be called from within openvswitch.

> @@ -361,6 +362,11 @@ static int parse_vlan(struct sk_buff *skb, struct sw_flow_key *key)
>  	if (res <= 0)
>  		return res;
>  
> +	/* If the outer vlan tag was accelerated, skb->protocol should
> +	 * refelect the inner vlan type. */
> +	if (!eth_type_vlan(skb->protocol))
> +		skb->protocol = key->eth.cvlan.tpid;

This should not depend on the current value in skb->protocol which
could be arbitrary at this point (from the point of view of how this
patch understands the skb->protocol values). It's easy to fix, though -
just add a local bool variable tracking whether the skb->protocol has
been set.

> @@ -531,15 +544,16 @@ static int key_extract(struct sk_buff *skb, struct sw_flow_key *key)
>  		if (unlikely(parse_vlan(skb, key)))
>  			return -ENOMEM;
>  
> -		skb->protocol = parse_ethertype(skb);
> -		if (unlikely(skb->protocol == htons(0)))
> +		key->eth.type = parse_ethertype(skb);
> +		if (unlikely(key->eth.type == htons(0)))
>  			return -ENOMEM;
>  
>  		skb_reset_network_header(skb);
>  		__skb_push(skb, skb->data - skb_mac_header(skb));
>  	}
>  	skb_reset_mac_len(skb);
> -	key->eth.type = skb->protocol;
> +	if (!eth_type_vlan(skb->protocol))
> +		skb->protocol = key->eth.type;

This leaves key->eth.type undefined for key->mac_proto ==
MAC_PROTO_NONE.

Plus the same problem as above with unknown value of skb->protocol. But
this is more complicated here, as skb->protocol may be either
uninitialized at this point or already initialized by parse_vlan.

>  
>  	/* Network layer. */
>  	if (key->eth.type == htons(ETH_P_IP)) {
> @@ -800,29 +814,15 @@ int ovs_flow_key_extract_userspace(struct net *net, const struct nlattr *attr,
>  	if (err)
>  		return err;
>  
> -	if (ovs_key_mac_proto(key) == MAC_PROTO_NONE) {
> -		/* key_extract assumes that skb->protocol is set-up for
> -		 * layer 3 packets which is the case for other callers,
> -		 * in particular packets recieved from the network stack.
> -		 * Here the correct value can be set from the metadata
> -		 * extracted above.
> -		 */
> -		skb->protocol = key->eth.type;
> -	} else {
> -		struct ethhdr *eth;
> -
> -		skb_reset_mac_header(skb);
> -		eth = eth_hdr(skb);
> -
> -		/* Normally, setting the skb 'protocol' field would be
> -		 * handled by a call to eth_type_trans(), but it assumes
> -		 * there's a sending device, which we may not have.
> -		 */
> -		if (eth_proto_is_802_3(eth->h_proto))
> -			skb->protocol = eth->h_proto;
> -		else
> -			skb->protocol = htons(ETH_P_802_2);
> -	}
> +	/* key_extract assumes that skb->protocol is set-up for
> +	 * layer 3 packets which is the case for other callers,
> +	 * in particular packets recieved from the network stack.
> +	 * Here the correct value can be set from the metadata
> +	 * extracted above.  For layer 2 packets we initialize
> +         * skb->protocol to zero and set it in key_extract() while
> +         * parsing the L2 headers.
> +	 */
> +	skb->protocol = key->eth.type;
>  
>  	return key_extract(skb, key);
>  }

Interesting. This hunk looks safe even without the rest of the patch.
You should fix the comment indentation, though.

 Jiri
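
The local-flag suggestion for parse_vlan() made earlier in this reply could look roughly like the userspace model below. It compares the v3 approach (test the current protocol value) with a variant that tracks whether the function itself has set the protocol. Everything here is a stand-in: struct model_skb, its fields and both function names are hypothetical, in-frame tags are modeled as a small array of TPIDs, and values are in host byte order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ETH_P_8021Q  0x8100
#define ETH_P_8021AD 0x88A8

/* Hypothetical stand-in for the parts of struct sk_buff used here. */
struct model_skb {
	uint16_t protocol;	/* stand-in for skb->protocol */
	bool hw_tag;		/* outer tag is accelerated (kept in metadata) */
	uint16_t frame_tpid[2];	/* TPIDs of tags still in the frame data */
	int frame_tags;		/* number of valid entries in frame_tpid */
};

static bool is_vlan_tpid(uint16_t type)
{
	return type == ETH_P_8021Q || type == ETH_P_8021AD;
}

/* v3 behaviour: decide by looking at whatever skb->protocol holds. */
static void parse_vlan_check_protocol(struct model_skb *skb)
{
	int next = 0;

	if (!skb->hw_tag) {
		if (next >= skb->frame_tags)
			return;			/* no tag in the frame */
		skb->protocol = skb->frame_tpid[next++];
	}
	if (next >= skb->frame_tags)
		return;				/* no inner tag */
	if (!is_vlan_tpid(skb->protocol))	/* depends on incoming value */
		skb->protocol = skb->frame_tpid[next];
}

/* Suggested variant: remember locally whether the protocol was set. */
static void parse_vlan_with_flag(struct model_skb *skb)
{
	bool protocol_set = false;
	int next = 0;

	if (!skb->hw_tag) {
		if (next >= skb->frame_tags)
			return;
		skb->protocol = skb->frame_tpid[next++];
		protocol_set = true;
	}
	if (next >= skb->frame_tags)
		return;
	if (!protocol_set)			/* independent of incoming value */
		skb->protocol = skb->frame_tpid[next];
}
```

The two versions differ only when the packet arrives with an accelerated outer tag and skb->protocol already holds a VLAN ethertype. In that case the first version leaves the incoming value alone, while the flag-based version overwrites it with the TPID of the tag that is actually in the frame.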
Pravin Shelar Dec. 1, 2016, 8:31 p.m. UTC | #3
On Wed, Nov 30, 2016 at 6:30 AM, Jiri Benc <jbenc@redhat.com> wrote:
> On Tue, 29 Nov 2016 15:30:53 -0800, Jarno Rajahalme wrote:
>> Do not always set skb->protocol to be the ethertype of the L3 header.
>> For a packet with non-accelerated VLAN tags skb->protocol needs to be
>> the ethertype of the outermost non-accelerated VLAN ethertype.
>
> Well, the current handling of skb->protocol matches what used to be the
> handling of the kernel net stack before Jiri Pirko cleaned up the vlan
> code.
>
> I'm not opposed to changing this but I'm afraid it needs much deeper
> review. Because with this in place, no core kernel functions that
> depend on skb->protocol may be called from within openvswitch.
>
Can you give a specific example where it does not work?

>> @@ -361,6 +362,11 @@ static int parse_vlan(struct sk_buff *skb, struct sw_flow_key *key)
>>       if (res <= 0)
>>               return res;
>>
>> +     /* If the outer vlan tag was accelerated, skb->protocol should
>> +      * refelect the inner vlan type. */
>> +     if (!eth_type_vlan(skb->protocol))
>> +             skb->protocol = key->eth.cvlan.tpid;
>
> This should not depend on the current value in skb->protocol which
> could be arbitrary at this point (from the point of view of how this
> patch understands the skb->protocol values). It's easy to fix, though -
> just add a local bool variable tracking whether the skb->protocol has
> been set.
>
skb->protocol value is set by the caller, so it should not be
arbitrary. Is it missing in any case?
Jiri Benc Dec. 2, 2016, 9:42 a.m. UTC | #4
On Thu, 1 Dec 2016 12:31:09 -0800, Pravin Shelar wrote:
> On Wed, Nov 30, 2016 at 6:30 AM, Jiri Benc <jbenc@redhat.com> wrote:
> > I'm not opposed to changing this but I'm afraid it needs much deeper
> > review. Because with this in place, no core kernel functions that
> > depend on skb->protocol may be called from within openvswitch.
> >
> Can you give a specific example where it does not work?

I can't, I haven't reviewed the usage. I'm just saying that the stack
does not expect skb->protocol being ETH_P_8021Q for e.g. IPv4 packets.
It may not be relevant for the calls used by openvswitch but we should
be sure about that. Especially defragmentation and conntrack are worth
looking at.

Again, I'm not saying this is wrong nor that there is an actual
problem. I'm just pointing out that openvswitch has different
expectations about skb wrt. vlans than the rest of the kernel and we
should be reasonably sure the behavior is correct when passing between
the two.

> skb->protocol value is set by the caller, so it should not be
> arbitrary. Is it missing in any case?

It's not set exactly by the caller, because that's what this patch is
removing. It is set by whoever handed over the packet to openvswitch.
The point is we don't know *what* it is set to. It may as well be
ETH_P_8021Q, breaking the conditions here. It should not happen in
practice but still, it seems weird to depend on the fact that the
packet coming to ovs has never skb->protocol equal to ETH_P_8021Q nor
ETH_P_8021AD.

 Jiri
Jiri Benc Dec. 2, 2016, 9:49 a.m. UTC | #5
On Fri, 2 Dec 2016 10:42:02 +0100, Jiri Benc wrote:
> On Thu, 1 Dec 2016 12:31:09 -0800, Pravin Shelar wrote:
> It's not set exactly by the caller, because that's what this patch is
> removing. It is set by whoever handed over the packet to openvswitch.
> The point is we don't know *what* it is set to. It may as well be
> ETH_P_8021Q, breaking the conditions here. It should not happen in
> practice but still, it seems weird to depend on the fact that the
> packet coming to ovs has never skb->protocol equal to ETH_P_8021Q nor
> ETH_P_8021AD.

I'm wondering whether we should not revive the patchset that makes the
first vlan tag always accelerated. It makes handling of various packet
formats and the checks for forwardability so much simpler...

 Jiri
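
The idea of keeping the first VLAN tag always accelerated can be pictured with a small userspace model: if the frame data starts with a VLAN tag and no tag is held in metadata yet, the tag is moved out of the frame bytes into a side structure. This is only a sketch of the concept, not the patchset referred to above and not the kernel's implementation. struct accel_tag and accelerate_outer_tag() are hypothetical names, and values are in host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN     6
#define ETH_HLEN     14
#define VLAN_HLEN    4
#define ETH_P_8021Q  0x8100
#define ETH_P_8021AD 0x88A8

/* Hypothetical stand-in for the accelerated tag kept in skb metadata. */
struct accel_tag {
	int present;
	uint16_t tpid;
	uint16_t tci;
};

static int is_vlan_tpid(uint16_t type)
{
	return type == ETH_P_8021Q || type == ETH_P_8021AD;
}

static uint16_t read_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Move the outermost in-frame tag into 'tag'.
 * Returns 1 if a tag was moved, 0 if nothing was done, -1 if truncated.
 */
static int accelerate_outer_tag(uint8_t *frame, size_t *len,
				struct accel_tag *tag)
{
	const size_t off = 2 * ETH_ALEN;
	uint16_t tpid;

	assert(frame && len && tag);
	if (*len < ETH_HLEN)
		return -1;

	tpid = read_be16(frame + off);
	if (tag->present || !is_vlan_tpid(tpid))
		return 0;		/* already accelerated, or untagged */
	if (*len < ETH_HLEN + VLAN_HLEN)
		return -1;		/* tag header is cut short */

	tag->present = 1;
	tag->tpid = tpid;
	tag->tci = read_be16(frame + off + 2);

	/* Close the 4-byte gap so the next ethertype follows the MACs. */
	memmove(frame + off, frame + off + VLAN_HLEN, *len - off - VLAN_HLEN);
	*len -= VLAN_HLEN;
	return 1;
}
```

After such a step the first ethertype in the frame data is never that of the outermost tag. For a single-tagged packet it is the L3 type, so the value stored in skb->protocol would not need a special case.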
Pravin Shelar Dec. 5, 2016, 12:58 a.m. UTC | #6
On Fri, Dec 2, 2016 at 1:42 AM, Jiri Benc <jbenc@redhat.com> wrote:
> On Thu, 1 Dec 2016 12:31:09 -0800, Pravin Shelar wrote:
>> On Wed, Nov 30, 2016 at 6:30 AM, Jiri Benc <jbenc@redhat.com> wrote:
>> > I'm not opposed to changing this but I'm afraid it needs much deeper
>> > review. Because with this in place, no core kernel functions that
>> > depend on skb->protocol may be called from within openvswitch.
>> >
>> Can you give a specific example where it does not work?
>
> I can't, I haven't reviewed the usage. I'm just saying that the stack
> does not expect skb->protocol being ETH_P_8021Q for e.g. IPv4 packets.
> It may not be relevant for the calls used by openvswitch but we should
> be sure about that. Especially defragmentation and conntrack are worth
> looking at.
>
> Again, I'm not saying this is wrong nor that there is an actual
> problem. I'm just pointing out that openvswitch has different
> expectations about skb wrt. vlans than the rest of the kernel and we
> should be reasonably sure the behavior is correct when passing between
> the two.
>
I agree that conntrack does not expect skb->protocol to be a VLAN
protocol. We could accelerate the VLAN tag if there is a VLAN header in
the packet itself. That would make the packet consistent across upcalls.

>> skb->protocol value is set by the caller, so it should not be
>> arbitrary. Is it missing in any case?
>
> It's not set exactly by the caller, because that's what this patch is
> removing. It is set by whoever handed over the packet to openvswitch.
> The point is we don't know *what* it is set to. It may as well be
> ETH_P_8021Q, breaking the conditions here. It should not happen in
> practice but still, it seems weird to depend on the fact that the
> packet coming to ovs has never skb->protocol equal to ETH_P_8021Q nor
> ETH_P_8021AD.
>

We more or less depend on this already, at least for L3 packets injected
back by vswitchd. For the rest of the entry points, I think we have to
trust that the networking stack sets skb->protocol to the correct value.
If that is not true in some case, it is a bug and we will need to fix it.

Patch

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 2d4c4d3..9c62b63 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -606,7 +606,6 @@  static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
 	rcu_assign_pointer(flow->sf_acts, acts);
 	packet->priority = flow->key.phy.priority;
 	packet->mark = flow->key.phy.skb_mark;
-	packet->protocol = flow->key.eth.type;
 
 	rcu_read_lock();
 	dp = get_dp_rcu(net, ovs_header->dp_ifindex);
diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index 08aa926..e2fe2e5 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -354,6 +354,7 @@  static int parse_vlan(struct sk_buff *skb, struct sw_flow_key *key)
 		res = parse_vlan_tag(skb, &key->eth.vlan);
 		if (res <= 0)
 			return res;
+		skb->protocol = key->eth.vlan.tpid;
 	}
 
 	/* Parse inner vlan tag. */
@@ -361,6 +362,11 @@  static int parse_vlan(struct sk_buff *skb, struct sw_flow_key *key)
 	if (res <= 0)
 		return res;
 
+	/* If the outer vlan tag was accelerated, skb->protocol should
+	 * refelect the inner vlan type. */
+	if (!eth_type_vlan(skb->protocol))
+		skb->protocol = key->eth.cvlan.tpid;
+
 	return 0;
 }
 
@@ -477,12 +483,18 @@  static int parse_icmpv6(struct sk_buff *skb, struct sw_flow_key *key,
 }
 
 /**
- * key_extract - extracts a flow key from an Ethernet frame.
+ * key_extract - extracts a flow key from a packet with or without an
+ * Ethernet header.
  * @skb: sk_buff that contains the frame, with skb->data pointing to the
- * Ethernet header
+ * beginning of the packet.
  * @key: output flow key
  *
- * The caller must ensure that skb->len >= ETH_HLEN.
+ * 'key->mac_proto' must be initialized to indicate the frame type.  For an L3
+ * frame 'key->mac_proto' must equal 'MAC_PROTO_NONE', and the caller must
+ * ensure that 'skb->protocol' is set to the ethertype of the L3 header.
+ * Otherwise the presence of an Ethernet header is assumed and the caller must
+ * ensure that skb->len >= ETH_HLEN and that 'skb->protocol' is initialized to
+ * zero.
  *
  * Returns 0 if successful, otherwise a negative errno value.
  *
@@ -498,8 +510,9 @@  static int parse_icmpv6(struct sk_buff *skb, struct sw_flow_key *key,
  *      of a correct length, otherwise the same as skb->network_header.
  *      For other key->eth.type values it is left untouched.
  *
- *    - skb->protocol: the type of the data starting at skb->network_header.
- *      Equals to key->eth.type.
+ *    - skb->protocol: For non-accelerated VLAN, one of the VLAN ether types,
+ *      otherwise the same as key->eth.type, the ether type of the payload
+ *      starting at skb->network_header.
  */
 static int key_extract(struct sk_buff *skb, struct sw_flow_key *key)
 {
@@ -531,15 +544,16 @@  static int key_extract(struct sk_buff *skb, struct sw_flow_key *key)
 		if (unlikely(parse_vlan(skb, key)))
 			return -ENOMEM;
 
-		skb->protocol = parse_ethertype(skb);
-		if (unlikely(skb->protocol == htons(0)))
+		key->eth.type = parse_ethertype(skb);
+		if (unlikely(key->eth.type == htons(0)))
 			return -ENOMEM;
 
 		skb_reset_network_header(skb);
 		__skb_push(skb, skb->data - skb_mac_header(skb));
 	}
 	skb_reset_mac_len(skb);
-	key->eth.type = skb->protocol;
+	if (!eth_type_vlan(skb->protocol))
+		skb->protocol = key->eth.type;
 
 	/* Network layer. */
 	if (key->eth.type == htons(ETH_P_IP)) {
@@ -800,29 +814,15 @@  int ovs_flow_key_extract_userspace(struct net *net, const struct nlattr *attr,
 	if (err)
 		return err;
 
-	if (ovs_key_mac_proto(key) == MAC_PROTO_NONE) {
-		/* key_extract assumes that skb->protocol is set-up for
-		 * layer 3 packets which is the case for other callers,
-		 * in particular packets recieved from the network stack.
-		 * Here the correct value can be set from the metadata
-		 * extracted above.
-		 */
-		skb->protocol = key->eth.type;
-	} else {
-		struct ethhdr *eth;
-
-		skb_reset_mac_header(skb);
-		eth = eth_hdr(skb);
-
-		/* Normally, setting the skb 'protocol' field would be
-		 * handled by a call to eth_type_trans(), but it assumes
-		 * there's a sending device, which we may not have.
-		 */
-		if (eth_proto_is_802_3(eth->h_proto))
-			skb->protocol = eth->h_proto;
-		else
-			skb->protocol = htons(ETH_P_802_2);
-	}
+	/* key_extract assumes that skb->protocol is set-up for
+	 * layer 3 packets which is the case for other callers,
+	 * in particular packets recieved from the network stack.
+	 * Here the correct value can be set from the metadata
+	 * extracted above.  For layer 2 packets we initialize
+         * skb->protocol to zero and set it in key_extract() while
+         * parsing the L2 headers.
+	 */
+	skb->protocol = key->eth.type;
 
 	return key_extract(skb, key);
 }