[net-next,1/7] openvswitch: Use inverted tuple in ovs_ct_find_existing() if NATted.

Message ID: 1486084206-109903-1-git-send-email-jarno@ovn.org
State: Changes Requested, archived
Delegated to: David Miller

Commit Message

Jarno Rajahalme Feb. 3, 2017, 1:10 a.m. UTC
When looking for an existing conntrack entry, the packet 5-tuple
must be inverted if NAT has already been applied, as the current
packet headers do not match any conntrack tuple.  For
example, if a packet from private address X to a public address B is
source-NATted to A, the conntrack entry will have the following tuples
(ignoring the protocol and port numbers) after the conntrack entry is
committed:

Original direction tuple: (X,B)
Reply direction tuple: (B,A)

Now, if a reply packet is already transformed back to the private
address space (e.g., with a CT(nat) action), the tuple corresponding
to the current packet headers is:

Current packet tuple: (B,X)

This does not match either of the conntrack tuples above.  Normally
this does not matter, as the conntrack lookup was already done using
the tuple (B,A), but if the current packet does not match any flow in
the OVS datapath, the packet is sent to userspace via an upcall,
during which the packet's skb is freed, and the conntrack entry
pointer in the skb is lost.  When the packet is reintroduced to the
datapath, any further conntrack action will need to perform a new
conntrack lookup to find the entry again.  Prior to this patch this
second lookup failed for NATted packets.  The datapath flow setup
corresponding to the upcall can succeed, however, allowing all further
packets in the reply direction to re-use the conntrack entry pointer
in the skb, so typically the lookup failure only causes a packet drop.

The solution is to invert the tuple derived from the current packet
headers in case the conntrack state stored in the packet metadata
indicates that the packet has been transformed by NAT:

Inverted tuple: (X,B)

With this the conntrack entry can be found, matching the original
direction tuple.

This same logic also works for the original direction packets:

Current packet tuple (after NAT): (A,B)
Inverted tuple: (B,A)

While the current packet tuple (A,B) does not match either of the
conntrack tuples, the inverted one (B,A) does match the reply
direction tuple.

Since the inverted tuple matches the reply direction tuple, the
direction of the packet must be reversed as well.
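
In code, the fix boils down to the following sketch, condensed from the
patch at the bottom of this message (l3proto/l4proto resolution and
error reporting omitted):

	/* Must invert the tuple if the skb has been transformed by NAT. */
	if (natted) {
		struct nf_conntrack_tuple inverse;

		if (!nf_ct_invert_tuple(&inverse, &tuple, l3proto, l4proto))
			return NULL;
		tuple = inverse;
	}

	/* Look for a tuple match with the (possibly inverted) tuple. */
	h = nf_conntrack_find_get(net, zone, &tuple);
	...
	/* The inverted packet tuple matched the reverse direction conntrack
	 * tuple, so select the other tuplehash to derive the right 'ctinfo'
	 * bits for this packet.
	 */
	if (natted)
		h = &ct->tuplehash[!h->tuple.dst.dir];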

Fixes: 05752523e565 ("openvswitch: Interface with NAT.")
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
---
 net/openvswitch/conntrack.c | 24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

Comments

David Miller Feb. 5, 2017, 10:28 p.m. UTC | #1
From: Jarno Rajahalme <jarno@ovn.org>
Date: Thu,  2 Feb 2017 17:10:00 -0800

> This does not match either of the conntrack tuples above.  Normally
> this does not matter, as the conntrack lookup was already done using
> the tuple (B,A), but if the current packet does not match any flow in
> the OVS datapath, the packet is sent to userspace via an upcall,
> during which the packet's skb is freed, and the conntrack entry
> pointer in the skb is lost.

This is the real bug.

If the metadata for a packet is important, as it obviously is here,
then upcalls should preserve that metadata across the upcall.  This
is exactly how NF_QUEUE handles this problem and so should OVS.
Pravin Shelar Feb. 6, 2017, 5:06 p.m. UTC | #2
On Sun, Feb 5, 2017 at 2:28 PM, David Miller <davem@davemloft.net> wrote:
> From: Jarno Rajahalme <jarno@ovn.org>
> Date: Thu,  2 Feb 2017 17:10:00 -0800
>
>> This does not match either of the conntrack tuples above.  Normally
>> this does not matter, as the conntrack lookup was already done using
>> the tuple (B,A), but if the current packet does not match any flow in
>> the OVS datapath, the packet is sent to userspace via an upcall,
>> during which the packet's skb is freed, and the conntrack entry
>> pointer in the skb is lost.
>
> This is the real bug.
>
> If the metadata for a packet is important, as it obviously is here,
> then upcalls should preserve that metadata across the upcall.  This
> is exactly how NF_QUEUE handles this problem and so should OVS.

The OVS kernel datapath serializes the skb context and sends it along
with the skb during the upcall via genetlink attributes.  Userspace
echoes the same information back to the kernel module.  This way OVS
does maintain metadata across the upcall.
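
For reference, a rough sketch of that round trip (the attribute and
command names below are from include/uapi/linux/openvswitch.h; the flow
is simplified):

	upcall:   the kernel sends OVS_PACKET_ATTR_PACKET (the packet data)
	          and OVS_PACKET_ATTR_KEY (the flow key, including the
	          conntrack state) to userspace over genetlink.
	execute:  userspace echoes the attributes back with
	          OVS_PACKET_CMD_EXECUTE, and the kernel rebuilds the flow
	          key, including key->ct.state, from them.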
Pravin Shelar Feb. 6, 2017, 5:07 p.m. UTC | #3
On Thu, Feb 2, 2017 at 5:10 PM, Jarno Rajahalme <jarno@ovn.org> wrote:
> When looking for an existing conntrack entry, the packet 5-tuple
> must be inverted if NAT has already been applied, as the current
> packet headers do not match any conntrack tuple.  For
> example, if a packet from private address X to a public address B is
> source-NATted to A, the conntrack entry will have the following tuples
> (ignoring the protocol and port numbers) after the conntrack entry is
> committed:
>
> Original direction tuple: (X,B)
> Reply direction tuple: (B,A)
>
> Now, if a reply packet is already transformed back to the private
> address space (e.g., with a CT(nat) action), the tuple corresponding
> to the current packet headers is:
>
> Current packet tuple: (B,X)
>
> This does not match either of the conntrack tuples above.  Normally
> this does not matter, as the conntrack lookup was already done using
> the tuple (B,A), but if the current packet does not match any flow in
> the OVS datapath, the packet is sent to userspace via an upcall,
> during which the packet's skb is freed, and the conntrack entry
> pointer in the skb is lost.  When the packet is reintroduced to the
> datapath, any further conntrack action will need to perform a new
> conntrack lookup to find the entry again.  Prior to this patch this
> second lookup failed for NATted packets.  The datapath flow setup
> corresponding to the upcall can succeed, however, allowing all further
> packets in the reply direction to re-use the conntrack entry pointer
> in the skb, so typically the lookup failure only causes a packet drop.
>
> The solution is to invert the tuple derived from the current packet
> headers in case the conntrack state stored in the packet metadata
> indicates that the packet has been transformed by NAT:
>
> Inverted tuple: (X,B)
>
> With this the conntrack entry can be found, matching the original
> direction tuple.
>
> This same logic also works for the original direction packets:
>
> Current packet tuple (after NAT): (A,B)
> Inverted tuple: (B,A)
>
> While the current packet tuple (A,B) does not match either of the
> conntrack tuples, the inverted one (B,A) does match the reply
> direction tuple.
>
> Since the inverted tuple matches the reply direction tuple, the
> direction of the packet must be reversed as well.
>
> Fixes: 05752523e565 ("openvswitch: Interface with NAT.")
> Signed-off-by: Jarno Rajahalme <jarno@ovn.org>

I could not apply this patch series to the net-next branch, but it
does apply to net. Which branch are you targeting it for?
David Miller Feb. 6, 2017, 5:15 p.m. UTC | #4
From: Pravin Shelar <pshelar@ovn.org>
Date: Mon, 6 Feb 2017 09:06:29 -0800

> On Sun, Feb 5, 2017 at 2:28 PM, David Miller <davem@davemloft.net> wrote:
>> From: Jarno Rajahalme <jarno@ovn.org>
>> Date: Thu,  2 Feb 2017 17:10:00 -0800
>>
>>> This does not match either of the conntrack tuples above.  Normally
>>> this does not matter, as the conntrack lookup was already done using
>>> the tuple (B,A), but if the current packet does not match any flow in
>>> the OVS datapath, the packet is sent to userspace via an upcall,
>>> during which the packet's skb is freed, and the conntrack entry
>>> pointer in the skb is lost.
>>
>> This is the real bug.
>>
>> If the metadata for a packet is important, as it obviously is here,
>> then upcalls should preserve that metadata across the upcall.  This
>> is exactly how NF_QUEUE handles this problem and so should OVS.
> 
> The OVS kernel datapath serializes the skb context and sends it along
> with the skb during the upcall via genetlink attributes.  Userspace
> echoes the same information back to the kernel module.  This way OVS
> does maintain metadata across the upcall.

Then the conntrack state looked up before the NAT operation done in
the upcall should be maintained and therefore this bug should not
exist.
Joe Stringer Feb. 7, 2017, 12:45 a.m. UTC | #5
On 5 February 2017 at 14:28, David Miller <davem@davemloft.net> wrote:
> From: Jarno Rajahalme <jarno@ovn.org>
> Date: Thu,  2 Feb 2017 17:10:00 -0800
>
>> This does not match either of the conntrack tuples above.  Normally
>> this does not matter, as the conntrack lookup was already done using
>> the tuple (B,A), but if the current packet does not match any flow in
>> the OVS datapath, the packet is sent to userspace via an upcall,
>> during which the packet's skb is freed, and the conntrack entry
>> pointer in the skb is lost.
>
> This is the real bug.
>
> If the metadata for a packet is important, as it obviously is here,
> then upcalls should preserve that metadata across the upcall.  This
> is exactly how NF_QUEUE handles this problem and so should OVS.

Looks like patch #5 provides this preservation across the upcall, so
this patch can be converted to use the key->ct.orig_* fields from that
patch instead of doing the invert.
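
A sketch of what that conversion might look like (the field names below
are hypothetical placeholders; patch #5 defines the actual
key->ct.orig_* members):

	/* Build the lookup tuple directly from the original direction
	 * 5-tuple preserved in the flow key, instead of inverting the
	 * current packet headers.  Field names are assumptions only:
	 */
	memset(&tuple, 0, sizeof(tuple));
	tuple.src.u3 = key->ct.orig_src;	/* hypothetical */
	tuple.dst.u3 = key->ct.orig_dst;	/* hypothetical */
	h = nf_conntrack_find_get(net, zone, &tuple);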
Pravin Shelar Feb. 7, 2017, 5:14 p.m. UTC | #6
On Mon, Feb 6, 2017 at 9:15 AM, David Miller <davem@davemloft.net> wrote:
> From: Pravin Shelar <pshelar@ovn.org>
> Date: Mon, 6 Feb 2017 09:06:29 -0800
>
>> On Sun, Feb 5, 2017 at 2:28 PM, David Miller <davem@davemloft.net> wrote:
>>> From: Jarno Rajahalme <jarno@ovn.org>
>>> Date: Thu,  2 Feb 2017 17:10:00 -0800
>>>
>>>> This does not match either of the conntrack tuples above.  Normally
>>>> this does not matter, as the conntrack lookup was already done using
>>>> the tuple (B,A), but if the current packet does not match any flow in
>>>> the OVS datapath, the packet is sent to userspace via an upcall,
>>>> during which the packet's skb is freed, and the conntrack entry
>>>> pointer in the skb is lost.
>>>
>>> This is the real bug.
>>>
>>> If the metadata for a packet is important, as it obviously is here,
>>> then upcalls should preserve that metadata across the upcall.  This
>>> is exactly how NF_QUEUE handles this problem and so should OVS.
>>
>> The OVS kernel datapath serializes the skb context and sends it along
>> with the skb during the upcall via genetlink attributes.  Userspace
>> echoes the same information back to the kernel module.  This way OVS
>> does maintain metadata across the upcall.
>
> Then the conntrack state looked up before the NAT operation done in
> the upcall should be maintained and therefore this bug should not
> exist.

I think it fails because, after the upcall, OVS performs a lookup for a
nonexistent conntrack entry, since the packet has already been NATted.
So one could argue that there is already enough context available in
OVS to look up the original conntrack entry after the upcall, as shown
in this patch.  But I am also fine with using the original context to
look up the conntrack entry, as Joe has suggested.
Jarno Rajahalme Feb. 7, 2017, 9:29 p.m. UTC | #7
> On Feb 7, 2017, at 9:14 AM, Pravin Shelar <pshelar@ovn.org> wrote:
> 
> On Mon, Feb 6, 2017 at 9:15 AM, David Miller <davem@davemloft.net> wrote:
>> From: Pravin Shelar <pshelar@ovn.org>
>> Date: Mon, 6 Feb 2017 09:06:29 -0800
>> 
>>> On Sun, Feb 5, 2017 at 2:28 PM, David Miller <davem@davemloft.net> wrote:
>>>> From: Jarno Rajahalme <jarno@ovn.org>
>>>> Date: Thu,  2 Feb 2017 17:10:00 -0800
>>>> 
>>>>> This does not match either of the conntrack tuples above.  Normally
>>>>> this does not matter, as the conntrack lookup was already done using
>>>>> the tuple (B,A), but if the current packet does not match any flow in
>>>>> the OVS datapath, the packet is sent to userspace via an upcall,
>>>>> during which the packet's skb is freed, and the conntrack entry
>>>>> pointer in the skb is lost.
>>>> 
>>>> This is the real bug.
>>>> 
>>>> If the metadata for a packet is important, as it obviously is here,
>>>> then upcalls should preserve that metadata across the upcall.  This
>>>> is exactly how NF_QUEUE handles this problem and so should OVS.
>>> 
>>> The OVS kernel datapath serializes the skb context and sends it along
>>> with the skb during the upcall via genetlink attributes.  Userspace
>>> echoes the same information back to the kernel module.  This way OVS
>>> does maintain metadata across the upcall.
>> 
>> Then the conntrack state looked up before the NAT operation done in
>> the upcall should be maintained and therefore this bug should not
>> exist.
> 

We already maintain enough metadata across the userspace upcall, but so
far we have failed to use it correctly in the case of NATted packets.
The NAT flags are there, based on which we now know that we have to do
an inverted lookup to locate the conntrack entry.  Conntrack NAT by
design only stores tuples for the incoming packets, and there are
various instances of tuple inversion in netfilter code for this reason
(see, for example, net/netfilter/nf_nat_core.c:nf_nat_used_tuple()).
I'll make the commit message clearer about this for v2 of the series.
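
Condensed from the patch below, the NAT flags in the flow key gate the
inverted lookup like this:

	/* In skb_nfct_cached(): */
	ct = ovs_ct_find_existing(net, &info->zone, info->family, skb,
				  !!(key->ct.state & OVS_CS_F_NAT_MASK));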

> I think it fails because, after the upcall, OVS performs a lookup for a
> nonexistent conntrack entry, since the packet has already been NATted.
> So one could argue that there is already enough context available in
> OVS to look up the original conntrack entry after the upcall, as shown
> in this patch.  But I am also fine with using the original context to
> look up the conntrack entry, as Joe has suggested.

For an expected (i.e., related) connection, using the master's original
direction 5-tuple added to the flow metadata in patch 5 would find the
master connection, not the related connection, so it would not work.
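
For example (illustrative, ports omitted): if the master (control)
connection has the original direction tuple (X,B), a related (data)
connection is a separate conntrack entry with its own tuples.  The
5-tuple preserved in the flow metadata is the master's (X,B), so a
lookup with it returns the master entry rather than the related entry
the packet actually belongs to.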

  Jarno
Jarno Rajahalme Feb. 8, 2017, 5:30 a.m. UTC | #8
> On Feb 6, 2017, at 9:07 AM, Pravin Shelar <pshelar@ovn.org> wrote:
> 
> On Thu, Feb 2, 2017 at 5:10 PM, Jarno Rajahalme <jarno@ovn.org> wrote:
>> When looking for an existing conntrack entry, the packet 5-tuple
>> must be inverted if NAT has already been applied, as the current
>> packet headers do not match any conntrack tuple.  For
>> example, if a packet from private address X to a public address B is
>> source-NATted to A, the conntrack entry will have the following tuples
>> (ignoring the protocol and port numbers) after the conntrack entry is
>> committed:
>> 
>> Original direction tuple: (X,B)
>> Reply direction tuple: (B,A)
>> 
>> Now, if a reply packet is already transformed back to the private
>> address space (e.g., with a CT(nat) action), the tuple corresponding
>> to the current packet headers is:
>> 
>> Current packet tuple: (B,X)
>> 
>> This does not match either of the conntrack tuples above.  Normally
>> this does not matter, as the conntrack lookup was already done using
>> the tuple (B,A), but if the current packet does not match any flow in
>> the OVS datapath, the packet is sent to userspace via an upcall,
>> during which the packet's skb is freed, and the conntrack entry
>> pointer in the skb is lost.  When the packet is reintroduced to the
>> datapath, any further conntrack action will need to perform a new
>> conntrack lookup to find the entry again.  Prior to this patch this
>> second lookup failed for NATted packets.  The datapath flow setup
>> corresponding to the upcall can succeed, however, allowing all further
>> packets in the reply direction to re-use the conntrack entry pointer
>> in the skb, so typically the lookup failure only causes a packet drop.
>> 
>> The solution is to invert the tuple derived from the current packet
>> headers in case the conntrack state stored in the packet metadata
>> indicates that the packet has been transformed by NAT:
>> 
>> Inverted tuple: (X,B)
>> 
>> With this the conntrack entry can be found, matching the original
>> direction tuple.
>> 
>> This same logic also works for the original direction packets:
>> 
>> Current packet tuple (after NAT): (A,B)
>> Inverted tuple: (B,A)
>> 
>> While the current packet tuple (A,B) does not match either of the
>> conntrack tuples, the inverted one (B,A) does match the reply
>> direction tuple.
>> 
>> Since the inverted tuple matches the reply direction tuple, the
>> direction of the packet must be reversed as well.
>> 
>> Fixes: 05752523e565 ("openvswitch: Interface with NAT.")
>> Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
> 
> I could not apply this patch series to the net-next branch, but it
> does apply to net. Which branch are you targeting it for?

The patches were against net-next, but there was likely a merge from
netfilter around the time I sent the email out, which caused the
difficulty.  I will address all comments, rebase, and post a v2 later
today.

 Jarno

Patch

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 54253ea..b91baa2 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -430,7 +430,7 @@  ovs_ct_get_info(const struct nf_conntrack_tuple_hash *h)
  */
 static struct nf_conn *
 ovs_ct_find_existing(struct net *net, const struct nf_conntrack_zone *zone,
-		     u8 l3num, struct sk_buff *skb)
+		     u8 l3num, struct sk_buff *skb, bool natted)
 {
 	struct nf_conntrack_l3proto *l3proto;
 	struct nf_conntrack_l4proto *l4proto;
@@ -453,6 +453,17 @@  ovs_ct_find_existing(struct net *net, const struct nf_conntrack_zone *zone,
 		return NULL;
 	}
 
+	/* Must invert the tuple if skb has been transformed by NAT. */
+	if (natted) {
+		struct nf_conntrack_tuple inverse;
+
+		if (!nf_ct_invert_tuple(&inverse, &tuple, l3proto, l4proto)) {
+			pr_debug("ovs_ct_find_existing: Inversion failed!\n");
+			return NULL;
+		}
+		tuple = inverse;
+	}
+
 	/* look for tuple match */
 	h = nf_conntrack_find_get(net, zone, &tuple);
 	if (!h)
@@ -460,6 +471,13 @@  ovs_ct_find_existing(struct net *net, const struct nf_conntrack_zone *zone,
 
 	ct = nf_ct_tuplehash_to_ctrack(h);
 
+	/* Inverted packet tuple matches the reverse direction conntrack tuple,
+	 * select the other tuplehash to get the right 'ctinfo' bits for this
+	 * packet.
+	 */
+	if (natted)
+		h = &ct->tuplehash[!h->tuple.dst.dir];
+
 	skb->nfct = &ct->ct_general;
 	skb->nfctinfo = ovs_ct_get_info(h);
 	return ct;
@@ -483,7 +501,9 @@  static bool skb_nfct_cached(struct net *net,
 	if (!ct && key->ct.state & OVS_CS_F_TRACKED &&
 	    !(key->ct.state & OVS_CS_F_INVALID) &&
 	    key->ct.zone == info->zone.id)
-		ct = ovs_ct_find_existing(net, &info->zone, info->family, skb);
+		ct = ovs_ct_find_existing(net, &info->zone, info->family, skb,
+					  !!(key->ct.state
+					     & OVS_CS_F_NAT_MASK));
 	if (!ct)
 		return false;
 	if (!net_eq(net, read_pnet(&ct->ct_net)))