mbox series

[ovs-dev,net-next,0/3] net: handle the exp removal problem with ovs upcall properly

Message ID cover.1689541664.git.lucien.xin@gmail.com
Headers show
Series net: handle the exp removal problem with ovs upcall properly | expand

Message

Xin Long July 16, 2023, 9:09 p.m. UTC
With the OVS upcall, the original ct in the skb will be dropped, and when
the skb comes back from userspace it has to create a new ct again through
nf_conntrack_in() in either OVS __ovs_ct_lookup() or TC tcf_ct_act().

However, the new ct will not be able to have the exp as the original ct
has taken it away from the hash table in nf_ct_find_expectation(). This
will cause some flow never to be matched, like:

  'ip,ct_state=-trk,in_port=1 actions=ct(zone=1)'
  'ip,ct_state=+trk+new+rel,in_port=1 actions=ct(commit,zone=1)'
  'ip,ct_state=+trk+new+rel,in_port=1 actions=ct(commit,zone=2),normal'

if the 2nd flow triggers the OVS upcall, the 3rd flow will never get
matched.

OVS conntrack works around this by adding its own exp lookup function to
not remove the exp from the hash table and saving the exp and its master
info to the flow keys instead of create a real ct. But this way doesn't
work for TC act_ct.

The patch 1/3 allows nf_ct_find_expectation() not to remove the exp from
the hash table if tmpl is set with IPS_CONFIRMED when doing lookup. This
allows both OVS conntrack and TC act_ct to have a simple and clear fix
for this problem in the patch 2/3 and 3/3.

Xin Long (3):
  netfilter: allow exp not to be removed in nf_ct_find_expectation
  net: sched: set IPS_CONFIRMED in tmpl status only when commit is set
    in act_ct
  openvswitch: set IPS_CONFIRMED in tmpl status only when commit is set
    in conntrack

 include/net/netfilter/nf_conntrack_expect.h |  2 +-
 net/netfilter/nf_conntrack_core.c           |  2 +-
 net/netfilter/nf_conntrack_expect.c         |  4 +-
 net/netfilter/nft_ct.c                      |  2 +
 net/openvswitch/conntrack.c                 | 78 +++------------------
 net/sched/act_ct.c                          |  3 +-
 6 files changed, 18 insertions(+), 73 deletions(-)

Comments

Jakub Kicinski July 19, 2023, 2:58 a.m. UTC | #1
On Sun, 16 Jul 2023 17:09:16 -0400 Xin Long wrote:
> With the OVS upcall, the original ct in the skb will be dropped, and when
> the skb comes back from userspace it has to create a new ct again through
> nf_conntrack_in() in either OVS __ovs_ct_lookup() or TC tcf_ct_act().
> 
> However, the new ct will not be able to have the exp as the original ct
> has taken it away from the hash table in nf_ct_find_expectation(). This
> will cause some flow never to be matched, like:
> 
>   'ip,ct_state=-trk,in_port=1 actions=ct(zone=1)'
>   'ip,ct_state=+trk+new+rel,in_port=1 actions=ct(commit,zone=1)'
>   'ip,ct_state=+trk+new+rel,in_port=1 actions=ct(commit,zone=2),normal'
> 
> if the 2nd flow triggers the OVS upcall, the 3rd flow will never get
> matched.
> 
> OVS conntrack works around this by adding its own exp lookup function to
> not remove the exp from the hash table and saving the exp and its master
> info to the flow keys instead of create a real ct. But this way doesn't
> work for TC act_ct.
> 
> The patch 1/3 allows nf_ct_find_expectation() not to remove the exp from
> the hash table if tmpl is set with IPS_CONFIRMED when doing lookup. This
> allows both OVS conntrack and TC act_ct to have a simple and clear fix
> for this problem in the patch 2/3 and 3/3.

Florian, Pablo, any opinion on these? 
Would you prefer to take them via netfilter?
Florian Westphal July 19, 2023, 3:01 a.m. UTC | #2
Jakub Kicinski <kuba@kernel.org> wrote:
> On Sun, 16 Jul 2023 17:09:16 -0400 Xin Long wrote:
> > With the OVS upcall, the original ct in the skb will be dropped, and when
> > the skb comes back from userspace it has to create a new ct again through
> > nf_conntrack_in() in either OVS __ovs_ct_lookup() or TC tcf_ct_act().
> > 
> > However, the new ct will not be able to have the exp as the original ct
> > has taken it away from the hash table in nf_ct_find_expectation(). This
> > will cause some flow never to be matched, like:
> > 
> >   'ip,ct_state=-trk,in_port=1 actions=ct(zone=1)'
> >   'ip,ct_state=+trk+new+rel,in_port=1 actions=ct(commit,zone=1)'
> >   'ip,ct_state=+trk+new+rel,in_port=1 actions=ct(commit,zone=2),normal'
> > 
> > if the 2nd flow triggers the OVS upcall, the 3rd flow will never get
> > matched.
> > 
> > OVS conntrack works around this by adding its own exp lookup function to
> > not remove the exp from the hash table and saving the exp and its master
> > info to the flow keys instead of create a real ct. But this way doesn't
> > work for TC act_ct.
> > 
> > The patch 1/3 allows nf_ct_find_expectation() not to remove the exp from
> > the hash table if tmpl is set with IPS_CONFIRMED when doing lookup. This
> > allows both OVS conntrack and TC act_ct to have a simple and clear fix
> > for this problem in the patch 2/3 and 3/3.
> 
> Florian, Pablo, any opinion on these?

Sorry for the silence.  I dislike moving tc/ovs artifacts into
the conntrack core.

But so far I could not come up with a counter-proposal,
and it doesn't look too outrageous.

Let me look at it again later today (its early morning here)
and I will get back to this in my late evening.
Aaron Conole July 19, 2023, 1:31 p.m. UTC | #3
Xin Long <lucien.xin@gmail.com> writes:

> With the OVS upcall, the original ct in the skb will be dropped, and when
> the skb comes back from userspace it has to create a new ct again through
> nf_conntrack_in() in either OVS __ovs_ct_lookup() or TC tcf_ct_act().
>
> However, the new ct will not be able to have the exp as the original ct
> has taken it away from the hash table in nf_ct_find_expectation(). This
> will cause some flow never to be matched, like:
>
>   'ip,ct_state=-trk,in_port=1 actions=ct(zone=1)'
>   'ip,ct_state=+trk+new+rel,in_port=1 actions=ct(commit,zone=1)'
>   'ip,ct_state=+trk+new+rel,in_port=1 actions=ct(commit,zone=2),normal'
>
> if the 2nd flow triggers the OVS upcall, the 3rd flow will never get
> matched.
>
> OVS conntrack works around this by adding its own exp lookup function to
> not remove the exp from the hash table and saving the exp and its master
> info to the flow keys instead of create a real ct. But this way doesn't
> work for TC act_ct.
>
> The patch 1/3 allows nf_ct_find_expectation() not to remove the exp from
> the hash table if tmpl is set with IPS_CONFIRMED when doing lookup. This
> allows both OVS conntrack and TC act_ct to have a simple and clear fix
> for this problem in the patch 2/3 and 3/3.
>
> Xin Long (3):
>   netfilter: allow exp not to be removed in nf_ct_find_expectation
>   net: sched: set IPS_CONFIRMED in tmpl status only when commit is set
>     in act_ct
>   openvswitch: set IPS_CONFIRMED in tmpl status only when commit is set
>     in conntrack
>
>  include/net/netfilter/nf_conntrack_expect.h |  2 +-
>  net/netfilter/nf_conntrack_core.c           |  2 +-
>  net/netfilter/nf_conntrack_expect.c         |  4 +-
>  net/netfilter/nft_ct.c                      |  2 +
>  net/openvswitch/conntrack.c                 | 78 +++------------------
>  net/sched/act_ct.c                          |  3 +-
>  6 files changed, 18 insertions(+), 73 deletions(-)

Hi Xin,

The changes look okay to me, and I don't see anything that immediately
jumps out.  I would like to test it today / tomorrow with the ovs
check-kernel testsuite if you haven't done so already.

-Aaron
Florian Westphal July 19, 2023, 4:12 p.m. UTC | #4
Florian Westphal <fw@strlen.de> wrote:
> Jakub Kicinski <kuba@kernel.org> wrote:
> > On Sun, 16 Jul 2023 17:09:16 -0400 Xin Long wrote:
> > > With the OVS upcall, the original ct in the skb will be dropped, and when
> > > the skb comes back from userspace it has to create a new ct again through
> > > nf_conntrack_in() in either OVS __ovs_ct_lookup() or TC tcf_ct_act().
> > > 
> > > However, the new ct will not be able to have the exp as the original ct
> > > has taken it away from the hash table in nf_ct_find_expectation(). This
> > > will cause some flow never to be matched, like:
> > > 
> > >   'ip,ct_state=-trk,in_port=1 actions=ct(zone=1)'
> > >   'ip,ct_state=+trk+new+rel,in_port=1 actions=ct(commit,zone=1)'
> > >   'ip,ct_state=+trk+new+rel,in_port=1 actions=ct(commit,zone=2),normal'
> > > 
> > > if the 2nd flow triggers the OVS upcall, the 3rd flow will never get
> > > matched.
> > > 
> > > OVS conntrack works around this by adding its own exp lookup function to
> > > not remove the exp from the hash table and saving the exp and its master
> > > info to the flow keys instead of create a real ct. But this way doesn't
> > > work for TC act_ct.
> > > 
> > > The patch 1/3 allows nf_ct_find_expectation() not to remove the exp from
> > > the hash table if tmpl is set with IPS_CONFIRMED when doing lookup. This
> > > allows both OVS conntrack and TC act_ct to have a simple and clear fix
> > > for this problem in the patch 2/3 and 3/3.
> > 
> > Florian, Pablo, any opinion on these?
> 
> Sorry for the silence.  I dislike moving tc/ovs artifacts into
> the conntrack core.

Can't find a better solution, feel free to take this though the net-next tree.

Acked-by: Florian Westphal <fw@strlen.de>
patchwork-bot+netdevbpf@kernel.org July 20, 2023, 8:20 a.m. UTC | #5
Hello:

This series was applied to netdev/net-next.git (main)
by Paolo Abeni <pabeni@redhat.com>:

On Sun, 16 Jul 2023 17:09:16 -0400 you wrote:
> With the OVS upcall, the original ct in the skb will be dropped, and when
> the skb comes back from userspace it has to create a new ct again through
> nf_conntrack_in() in either OVS __ovs_ct_lookup() or TC tcf_ct_act().
> 
> However, the new ct will not be able to have the exp as the original ct
> has taken it away from the hash table in nf_ct_find_expectation(). This
> will cause some flow never to be matched, like:
> 
> [...]

Here is the summary with links:
  - [net-next,1/3] netfilter: allow exp not to be removed in nf_ct_find_expectation
    https://git.kernel.org/netdev/net-next/c/4914109a8e1e
  - [net-next,2/3] net: sched: set IPS_CONFIRMED in tmpl status only when commit is set in act_ct
    https://git.kernel.org/netdev/net-next/c/76622ced50a1
  - [net-next,3/3] openvswitch: set IPS_CONFIRMED in tmpl status only when commit is set in conntrack
    https://git.kernel.org/netdev/net-next/c/8c8b73320805

You are awesome, thank you!