diff mbox series

[ovs-dev,ovn] Fix conntrack entry leaks because of TCP RST packets not sent to conntrack.

Message ID 20200424075507.1811244-1-numans@ovn.org
State Superseded
Headers show
Series [ovs-dev,ovn] Fix conntrack entry leaks because of TCP RST packets not sent to conntrack. | expand

Commit Message

Numan Siddique April 24, 2020, 7:55 a.m. UTC
From: Numan Siddique <numans@ovn.org>

The commit [1] - 28097d5adb95("Fix tcp_reset action handling") fixed an issue
with tcp_reset OVN action. In order to fix that issue, this commit added
logical flows to skip all the TCP RST packets from conntrack.
Ideally it should have skipped only the TCP RST packets generated by
ovn-controller from conntrack. Since all the TCP RST packets are
skipped from conntrack, the connections in conntrack remain in
ESTABLISHED state even if the client/server sends TCP RST to close the
connection.  And these entries live for a long time and this is
causing performance issues as reported in the BZ.

This patch reverts the logical flows added in [1] and modifies the inner
actions of tcp_reset in the ingress logical switch pipeline
from - "tcp_reset { outport <-> inport; output; }"
to "tcp_reset { output <-> inport; next(pipeline=egress,table=5); }".
This causes the packet to resubmit to the egress table ls_out_qos_mark
skipping the egress ACL stage. Prior to this packet, next action was
not allowing a resubmit from ingress to egress pipeline. This patch
relaxes this limitation.

For the tcp_reset action in the egress logical switch pipeline, this patch
modifies the inner action
from - "tcp_reset { outport <-> inport; next(pipeline=ingress,table=0); }"
to - "tcp_reset { outport <-> inport; next(pipeline=ingress,table=19); }".
This causes the packet to enter the ingress table ls_in_l2_lkup.

We don't see similar conntrack leaks with UDP. Although there is an issue
with the acl reject action for UDP packets. When ovn-controller generates icmp
destination unreachable packet, it doesn't get delivered. And the IP checksum is
incorrect in this packet. A follow up patch will fix these issues.

[1] - 28097d5adb95("Fix tcp_reset action handling")

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1819785
Co-Authored-by: Tim Rozet <trozet@redhat.com>
Signed-off-by: Tim Rozet <trozet@redhat.com>
Signed-off-by: Numan Siddique <numans@ovn.org>
---
 lib/actions.c           |   6 +-
 northd/ovn-northd.8.xml |   8 ++
 northd/ovn-northd.c     |  14 ++--
 ovn-sb.xml              |  10 ++-
 tests/automake.mk       |   3 +-
 tests/ovn.at            |   6 +-
 tests/system-ovn.at     | 170 +++++++++++++++++++++++++++++++++++-----
 tests/test-tcp-rst.py   |  37 +++++++++
 8 files changed, 216 insertions(+), 38 deletions(-)
 create mode 100644 tests/test-tcp-rst.py

Comments

Dumitru Ceara April 27, 2020, 10:23 a.m. UTC | #1
On 4/24/20 9:55 AM, numans@ovn.org wrote:
> From: Numan Siddique <numans@ovn.org>
> 
> The commit [1] - 28097d5adb95("Fix tcp_reset action handling") fixed an issue
> with tcp_reset OVN action. In order to fix that issue, this commit added
> logical flows to skip all the TCP RST packets from conntrack.
> Ideally it should have skipped only the TCP RST packets generated by
> ovn-controller from conntrack. Since all the TCP RST packets are
> skipped from conntrack, the connections in conntrack remain in
> ESTABLISHED state even if the client/server sends TCP RST to close the
> connection.  And these entries live for a long time and this is
> causing performance issues as reported in the BZ.
> 
> This patch reverts the logical flows added in [1] and modifies the inner
> actions of tcp_reset in the ingress logical switch pipeline
> from - "tcp_reset { outport <-> inport; output; }"
> to "tcp_reset { output <-> inport; next(pipeline=egress,table=5); }".
> This causes the packet to resubmit to the egress table ls_out_qos_mark
> skipping the egress ACL stage. Prior to this packet, next action was
> not allowing a resubmit from ingress to egress pipeline. This patch
> relaxes this limitation.
> 
> For the tcp_reset action in the egress logical switch pipeline, this patch
> modifies the inner action
> from - "tcp_reset { outport <-> inport; next(pipeline=ingress,table=0); }"
> to - "tcp_reset { outport <-> inport; next(pipeline=ingress,table=19); }".
> This causes the packet to enter the ingress table ls_in_l2_lkup.
> 
> We don't see similar conntrack leaks with UDP. Although there is an issue
> with the acl reject action for UDP packets. When ovn-controller generates icmp
> destination unreachable packet, it doesn't get delivered. And the IP checksum is
> incorrect in this packet. A follow up patch will fix these issues.
> 
> [1] - 28097d5adb95("Fix tcp_reset action handling")
> 
> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1819785
> Co-Authored-by: Tim Rozet <trozet@redhat.com>
> Signed-off-by: Tim Rozet <trozet@redhat.com>
> Signed-off-by: Numan Siddique <numans@ovn.org>

Hi Numan,

Looks good to me. Only a question inline regarding options for avoiding
hardcoding table IDs. Except for that:

Acked-by: Dumitru Ceara <dceara@redhat.com>

Thanks,
Dumitru

> ---
>  lib/actions.c           |   6 +-
>  northd/ovn-northd.8.xml |   8 ++
>  northd/ovn-northd.c     |  14 ++--
>  ovn-sb.xml              |  10 ++-
>  tests/automake.mk       |   3 +-
>  tests/ovn.at            |   6 +-
>  tests/system-ovn.at     | 170 +++++++++++++++++++++++++++++++++++-----
>  tests/test-tcp-rst.py   |  37 +++++++++
>  8 files changed, 216 insertions(+), 38 deletions(-)
>  create mode 100644 tests/test-tcp-rst.py
> 
> diff --git a/lib/actions.c b/lib/actions.c
> index c19813de0..91619c889 100644
> --- a/lib/actions.c
> +++ b/lib/actions.c
> @@ -319,11 +319,7 @@ parse_NEXT(struct action_context *ctx)
>          }
>      }
>  
> -    if (pipeline == OVNACT_P_EGRESS && ctx->pp->pipeline == OVNACT_P_INGRESS) {
> -        lexer_error(ctx->lexer,
> -                    "\"next\" action cannot advance from ingress to egress "
> -                    "pipeline (use \"output\" action instead)");
> -    } else if (table >= ctx->pp->n_tables) {
> +    if (table >= ctx->pp->n_tables) {
>          lexer_error(ctx->lexer,
>                      "\"next\" action cannot advance beyond table %d.",
>                      ctx->pp->n_tables - 1);
> diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
> index efcc4b7fc..d39e259f6 100644
> --- a/northd/ovn-northd.8.xml
> +++ b/northd/ovn-northd.8.xml
> @@ -373,6 +373,14 @@
>          for new connections and <code>reg0[1] = 1; next;</code> for existing
>          connections.
>        </li>
> +      <li>
> +        <code>reject</code> ACLs translate into logical
> +        flows with the
> +        <code>tcp_reset { output &lt;-&gt; inport;
> +        next(pipeline=egress,table=5);}</code>
> +        action for TCP connections and <code>icmp4/icmp6</code> action
> +        for UDP connections.
> +      </li>
>        <li>
>          Other ACLs translate to <code>drop;</code> for new or untracked
>          connections and <code>ct_commit(ct_label=1/1);</code> for known
> diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> index c4675bd68..f8ae155a2 100644
> --- a/northd/ovn-northd.c
> +++ b/northd/ovn-northd.c
> @@ -4721,11 +4721,11 @@ build_pre_acls(struct ovn_datapath *od, struct hmap *lflows)
>           * unreachable packets. */
>          ovn_lflow_add(lflows, od, S_SWITCH_IN_PRE_ACL, 110,
>                        "nd || nd_rs || nd_ra || icmp4.type == 3 || "
> -                      "icmp6.type == 1 || (tcp && tcp.flags == 20) || "
> +                      "icmp6.type == 1 || "
>                        "(udp && udp.src == 546 && udp.dst == 547)", "next;");
>          ovn_lflow_add(lflows, od, S_SWITCH_OUT_PRE_ACL, 110,
>                        "nd || nd_rs || nd_ra || icmp4.type == 3 || "
> -                      "icmp6.type == 1 || (tcp && tcp.flags == 20) ||"
> +                      "icmp6.type == 1 || "
>                        "(udp && udp.src == 546 && udp.dst == 547)", "next;");
>  
>          /* Ingress and Egress Pre-ACL Table (Priority 100).
> @@ -4838,11 +4838,11 @@ build_pre_lb(struct ovn_datapath *od, struct hmap *lflows,
>      /* Do not send ND packets to conntrack */
>      ovn_lflow_add(lflows, od, S_SWITCH_IN_PRE_LB, 110,
>                    "nd || nd_rs || nd_ra || icmp4.type == 3 ||"
> -                  "icmp6.type == 1 || (tcp && tcp.flags == 20)",
> +                  "icmp6.type == 1",
>                    "next;");
>      ovn_lflow_add(lflows, od, S_SWITCH_OUT_PRE_LB, 110,
>                    "nd || nd_rs || nd_ra || icmp4.type == 3 ||"
> -                  "icmp6.type == 1 || (tcp && tcp.flags == 20)",
> +                  "icmp6.type == 1",
>                    "next;");
>  
>      /* Do not send service monitor packets to conntrack. */
> @@ -4988,7 +4988,8 @@ build_reject_acl_rules(struct ovn_datapath *od, struct hmap *lflows,
>      ds_put_format(&actions, "reg0 = 0; "
>                    "eth.dst <-> eth.src; ip4.dst <-> ip4.src; "
>                    "tcp_reset { outport <-> inport; %s };",
> -                  ingress ? "output;" : "next(pipeline=ingress,table=0);");
> +                  ingress ? "next(pipeline=egress,table=5);"
> +                          : "next(pipeline=ingress,table=19);");

Do you think there's a way we could avoid hardcoding the table numbers?
I'm thinking of the case when a stage is added/deleted and the
ls_out_qos_mark/"ls_in_l2_lkup" end up having different table ids. I
don't think a macro definition is enough as that still needs updating
every time a table is added/deleted.

>      ovn_lflow_add_with_hint(lflows, od, stage,
>                              acl->priority + OVN_ACL_PRI_OFFSET + 10,
>                              ds_cstr(&match), ds_cstr(&actions), stage_hint);
> @@ -5002,7 +5003,8 @@ build_reject_acl_rules(struct ovn_datapath *od, struct hmap *lflows,
>      ds_put_format(&actions, "reg0 = 0; "
>                    "eth.dst <-> eth.src; ip6.dst <-> ip6.src; "
>                    "tcp_reset { outport <-> inport; %s };",
> -                  ingress ? "output;" : "next(pipeline=ingress,table=0);");
> +                  ingress ? "next(pipeline=egress,table=5);"
> +                          : "next(pipeline=ingress,table=19);");

Same question as above.

>      ovn_lflow_add_with_hint(lflows, od, stage,
>                              acl->priority + OVN_ACL_PRI_OFFSET + 10,
>                              ds_cstr(&match), ds_cstr(&actions), stage_hint);
> diff --git a/ovn-sb.xml b/ovn-sb.xml
> index 5f8da534c..3aa7cd4da 100644
> --- a/ovn-sb.xml
> +++ b/ovn-sb.xml
> @@ -1112,10 +1112,12 @@
>            <var>pipeline</var> as a subroutine.  The default <var>table</var> is
>            just after the current one.  If <var>pipeline</var> is specified, it
>            may be <code>ingress</code> or <code>egress</code>; the default
> -          <var>pipeline</var> is the one currently executing.  Actions in the
> -          ingress pipeline may not use <code>next</code> to jump into the
> -          egress pipeline (use the <code>output</code> instead), but
> -          transitions in the opposite direction are allowed.
> +          <var>pipeline</var> is the one currently executing. Actions in the
> +          both ingress and egress pipeline can use <code>next</code> to jump
> +          across the other pipeline.  Actions in the ingress pipeline should
> +          use <code>next</code> to jump into the specific table of egress
> +          pipeline only if it is certain that the packets are local and not
> +          tunnelled and wants to skip certain stages in the packet processing.
>          </dd>
>  
>          <dt><code><var>field</var> = <var>constant</var>;</code></dt>
> diff --git a/tests/automake.mk b/tests/automake.mk
> index 215fb432b..ed530dd77 100644
> --- a/tests/automake.mk
> +++ b/tests/automake.mk
> @@ -205,7 +205,8 @@ tests_ovstest_LDADD = $(OVS_LIBDIR)/libopenvswitch.la lib/libovn.la
>  # Python tests.
>  CHECK_PYFILES = \
>  	tests/test-l7.py \
> -	tests/uuidfilt.py
> +	tests/uuidfilt.py \
> +	tests/test-tcp-rst.py
>  
>  EXTRA_DIST += $(CHECK_PYFILES)
>  PYCOV_CLEAN_FILES += $(CHECK_PYFILES:.py=.py,cover) .coverage
> diff --git a/tests/ovn.at b/tests/ovn.at
> index 2a7ee7528..b00670816 100644
> --- a/tests/ovn.at
> +++ b/tests/ovn.at
> @@ -852,7 +852,11 @@ next(pipeline=ingress, table=11);
>      encodes as resubmit(,19)
>  
>  next(pipeline=egress);
> -    "next" action cannot advance from ingress to egress pipeline (use "output" action instead)
> +    formats as next(pipeline=egress, table=11);
> +    encodes as resubmit(,51)
> +
> +next(pipeline=egress, table=5);
> +    encodes as resubmit(,45)
>  
>  next(table=10);
>      formats as next(10);
> diff --git a/tests/system-ovn.at b/tests/system-ovn.at
> index bdb9768d2..3b11cf92b 100644
> --- a/tests/system-ovn.at
> +++ b/tests/system-ovn.at
> @@ -3719,60 +3719,86 @@ start_daemon ovn-controller
>  
>  ovn-nbctl ls-add sw0
>  
> -ovn-nbctl lsp-add sw0 sw0-p1
> -ovn-nbctl lsp-set-addresses sw0-p1 "50:54:00:00:00:03 10.0.0.3 aef0::3"
> -ovn-nbctl lsp-set-port-security sw0-p1 "50:54:00:00:00:03 10.0.0.3 aef0::3"
> +ovn-nbctl lsp-add sw0 sw0-p1-rej
> +ovn-nbctl lsp-set-addresses sw0-p1-rej "50:54:00:00:00:03 10.0.0.3 aef0::3"
> +ovn-nbctl lsp-set-port-security sw0-p1-rej "50:54:00:00:00:03 10.0.0.3 aef0::3"
>  
> -ovn-nbctl lsp-add sw0 sw0-p2
> -ovn-nbctl lsp-set-addresses sw0-p2 "50:54:00:00:00:04 10.0.0.4 aef0::4"
> -ovn-nbctl lsp-set-port-security sw0-p2 "50:54:00:00:00:04 10.0.0.4 aef0::4"
> +ovn-nbctl lsp-add sw0 sw0-p2-rej
> +ovn-nbctl lsp-set-addresses sw0-p2-rej "50:54:00:00:00:04 10.0.0.4 aef0::4"
> +ovn-nbctl lsp-set-port-security sw0-p2-rej "50:54:00:00:00:04 10.0.0.4 aef0::4"
> +
> +#ovn-nbctl --log acl-add sw0 from-lport 1000 "inport == \"sw0-p1\" && tcp && tcp.dst == 80" reject
> +#ovn-nbctl --log acl-add sw0 from-lport 1000 "inport == \"sw0-p2\" && ip6 && tcp && tcp.dst == 80" reject
> +
> +# Create port group and ACLs for sw0 ports.
> +ovn-nbctl pg-add pg0_drop sw0-p1-rej sw0-p2-rej
> +ovn-nbctl acl-add pg0_drop from-lport 1001 "inport == @pg0_drop && ip" drop
> +ovn-nbctl acl-add pg0_drop to-lport 1001 "outport == @pg0_drop && ip" drop
> +
> +ovn-nbctl pg-add pg0 sw0-p1-rej sw0-p2-rej
> +ovn-nbctl acl-add pg0 from-lport 1002 "inport == @pg0 && ip4" allow-related
> +ovn-nbctl --log acl-add pg0 from-lport 1004 "inport == @pg0 && ip && tcp && tcp.dst == 80" reject
>  
> -ovn-nbctl --log acl-add sw0 from-lport 1000 "inport == \"sw0-p1\" && tcp && tcp.dst == 80" reject
> -ovn-nbctl --log acl-add sw0 from-lport 1000 "inport == \"sw0-p2\" && ip6 && tcp && tcp.dst == 80" reject
> +ovn-nbctl acl-add pg0 to-lport 1002 "outport == @pg0 && ip4 && ip4.src == 0.0.0.0/0 && icmp4" allow-related
> +ovn-nbctl acl-add pg0 to-lport 1002 "outport == @pg0 && ip4 && ip4.src == 0.0.0.0/0 && tcp && tcp.dst == 82" allow-related
> +ovn-nbctl acl-add pg0 to-lport 1002 "outport == @pg0 && ip4 && ip4.src == 0.0.0.0/0 && udp && udp.dst == 82" allow-related
> +ovn-nbctl --log acl-add pg0 to-lport 1004 "inport == @pg0 && ip && tcp && tcp.dst == 84" reject
>  
>  OVN_POPULATE_ARP
>  ovn-nbctl --wait=hv sync
>  
> -ADD_NAMESPACES(sw0-p1)
> -ADD_VETH(sw0-p1, sw0-p1, br-int, "10.0.0.3/24", "50:54:00:00:00:03", \
> +ADD_NAMESPACES(sw0-p1-rej)
> +ADD_VETH(sw0-p1-rej, sw0-p1-rej, br-int, "10.0.0.3/24", "50:54:00:00:00:03", \
>           "10.0.0.1")
>  
> -ADD_NAMESPACES(sw0-p2)
> -ADD_VETH(sw0-p2, sw0-p2, br-int, "10.0.0.4/24", "50:54:00:00:00:04", \
> +ADD_NAMESPACES(sw0-p2-rej)
> +ADD_VETH(sw0-p2-rej, sw0-p2-rej, br-int, "10.0.0.4/24", "50:54:00:00:00:04", \
>           "10.0.0.1")
>  
> -NS_CHECK_EXEC([sw0-p1], [ip a a aef0::3/64 dev sw0-p1], [0])
> -NS_CHECK_EXEC([sw0-p2], [ip a a aef0::4/64 dev sw0-p2], [0])
> +NS_CHECK_EXEC([sw0-p1-rej], [ip a a aef0::3/64 dev sw0-p1-rej], [0])
> +NS_CHECK_EXEC([sw0-p2-rej], [ip a a aef0::4/64 dev sw0-p2-rej], [0])
>  
> -# Capture packets in sw0-p1.
> -NS_CHECK_EXEC([sw0-p1], [tcpdump -n -c 2 -i sw0-p1 tcp port 80 > sw0-p1-ip4.pcap &], [0])
> +# Capture packets in sw0-p1-rej.
> +NS_CHECK_EXEC([sw0-p1-rej], [tcpdump -n -c 2 -i sw0-p1-rej tcp port 80 > sw0-p1-rej-ip4.pcap &], [0])
>  sleep 1
>  
> -NS_CHECK_EXEC([sw0-p1], [nc 10.0.0.4 80], [1], [],
> +NS_CHECK_EXEC([sw0-p1-rej], [nc 10.0.0.4 80], [1], [],
>  [dnl
>  Ncat: Connection refused.
>  ])
>  
>  OVS_WAIT_UNTIL([
> -    total=`cat sw0-p1-ip4.pcap |  wc -l`
> +    total=`cat sw0-p1-rej-ip4.pcap |  wc -l`
>      echo "total = $total"
>      test "${total}" = "2"
>  ])
>  
> +# Now send traffic to port 84
> +NS_CHECK_EXEC([sw0-p1-rej], [nc 10.0.0.4 84], [1], [],
> +[dnl
> +Ncat: Connection refused.
> +])
> +
> +AT_CHECK([
> +    n_pkt=$(ovs-ofctl dump-flows br-int table=44 | grep -v n_packets=0 | \
> +grep controller | grep tp_dst=84 -c)
> +    test $n_pkt -eq 1
> +])
> +
>  # Without this sleep, test case fails intermittently.
>  sleep 3
>  
> -NS_CHECK_EXEC([sw0-p2], [tcpdump -n -c 2 -i sw0-p2 tcp port 80 > sw0-p2-ip6.pcap &], [0])
> +NS_CHECK_EXEC([sw0-p2-rej], [tcpdump -n -c 2 -i sw0-p2-rej tcp port 80 > sw0-p2-rej-ip6.pcap &], [0])
>  
>  sleep 1
>  
> -NS_CHECK_EXEC([sw0-p2], [nc -6 aef0::3 80], [1], [],
> +NS_CHECK_EXEC([sw0-p2-rej], [nc -6 aef0::3 80], [1], [],
>  [dnl
>  Ncat: Connection refused.
>  ])
>  
>  OVS_WAIT_UNTIL([
> -    total=`cat sw0-p2-ip6.pcap |  wc -l`
> +    total=`cat sw0-p2-rej-ip6.pcap |  wc -l`
>      echo "total = $total"
>      test "${total}" = "2"
>  ])
> @@ -3936,3 +3962,105 @@ as
>  OVS_TRAFFIC_VSWITCHD_STOP(["/.*error receiving.*/d
>  /.*terminating with signal 15.*/d"])
>  AT_CLEANUP
> +
> +# Tests that when an established connection sends TCP reset,
> +# the conntrack entry is not in established state.
> +AT_SETUP([ovn -- conntrack TCP reset])
> +AT_KEYWORDS([conntrack])
> +ovn_start
> +
> +OVS_TRAFFIC_VSWITCHD_START()
> +ADD_BR([br-int])
> +
> +# Set external-ids in br-int needed for ovn-controller
> +ovs-vsctl \
> +        -- set Open_vSwitch . external-ids:system-id=hv1 \
> +        -- set Open_vSwitch . external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \
> +        -- set Open_vSwitch . external-ids:ovn-encap-type=geneve \
> +        -- set Open_vSwitch . external-ids:ovn-encap-ip=169.0.0.1 \
> +        -- set bridge br-int fail-mode=secure other-config:disable-in-band=true
> +
> +# Start ovn-controller
> +start_daemon ovn-controller
> +
> +ovn-nbctl ls-add sw0
> +
> +ovn-nbctl lsp-add sw0 rst-p1
> +ovn-nbctl lsp-set-addresses rst-p1 "50:54:00:00:00:03"
> +ovn-nbctl lsp-set-port-security rst-p1 "50:54:00:00:00:03"
> +
> +ovn-nbctl lsp-add sw0 rst-p2
> +ovn-nbctl lsp-set-addresses rst-p2 "50:54:00:00:00:04 10.0.0.4"
> +ovn-nbctl lsp-set-port-security rst-p2 "50:54:00:00:00:04 10.0.0.4"
> +
> +# Create port group and ACLs for sw0 ports.
> +ovn-nbctl pg-add pg0_drop rst-p1 rst-p2
> +ovn-nbctl acl-add pg0_drop from-lport 1001 "inport == @pg0_drop && ip" drop
> +ovn-nbctl acl-add pg0_drop to-lport 1001 "outport == @pg0_drop && ip" drop
> +
> +ovn-nbctl pg-add pg0 rst-p1 rst-p2
> +ovn-nbctl acl-add pg0 from-lport 1002 "inport == @pg0 && ip4" allow-related
> +ovn-nbctl acl-add pg0 to-lport 1002 "outport == @pg0 && ip4 && ip4.src == 0.0.0.0/0 && icmp4" allow-related
> +ovn-nbctl acl-add pg0 to-lport 1002 "outport == @pg0 && ip4 && ip4.src == 0.0.0.0/0 && tcp && tcp.dst == 80" allow-related
> +ovn-nbctl acl-add pg0 to-lport 1002 "outport == @pg0 && ip4 && ip4.src == 0.0.0.0/0 && udp && udp.dst == 80" allow-related
> +
> +# Create a logical router and attach to logical switch.
> +ovn-nbctl lr-add lr0
> +ovn-nbctl lrp-add lr0 lr0-sw0 00:00:00:00:ff:01 10.0.0.1/24
> +ovn-nbctl lsp-add sw0 sw0-lr0
> +ovn-nbctl lsp-set-type sw0-lr0 router
> +ovn-nbctl lsp-set-addresses sw0-lr0 router
> +ovn-nbctl lsp-set-options sw0-lr0 router-port=lr0-sw0
> +
> +ovn-nbctl lb-add lb1 10.0.0.10:80 10.0.0.3:80
> +ovn-nbctl --wait=sb ls-lb-add sw0 lb1
> +ovn-nbctl --wait=sb lr-lb-add lr0 lb1
> +
> +OVN_POPULATE_ARP
> +ovn-nbctl --wait=hv sync
> +
> +ADD_NAMESPACES(rst-p1)
> +ADD_VETH(rst-p1, rst-p1, br-int, "10.0.0.3/24", "50:54:00:00:00:03", \
> +         "10.0.0.1")
> +
> +ADD_NAMESPACES(rst-p2)
> +ADD_VETH(rst-p2, rst-p2, br-int, "10.0.0.4/24", "50:54:00:00:00:04", \
> +         "10.0.0.1")
> +
> +OVS_WAIT_UNTIL([test x$(ovn-nbctl lsp-get-up rst-p1) = xup])
> +OVS_WAIT_UNTIL([test x$(ovn-nbctl lsp-get-up rst-p2) = xup])
> +
> +# Start webservers in 'rst-p1'.
> +OVS_START_L7([rst-p1], [http])
> +
> +NS_CHECK_EXEC([rst-p2], [$PYTHON $srcdir/test-tcp-rst.py --dst-port 80 --dst-ip 10.0.0.10])
> +
> +# When tcp reset is sent, conntrack entry should be in the state - CLOSED or CLOSING.
> +# But there is a bug where tcp reset packet was not sent to the conntrack.
> +# This test case checks that the tcp reset packet is sent to conntrack
> +# and the state is not in established state.
> +AT_CHECK([
> +    ct_est_count=$(ovs-appctl dpctl/dump-conntrack | grep 10.0.0.10 | grep state=ESTABLISHED -c)
> +    test $ct_est_count -eq 0
> +
> +    ct_est_count=$(ovs-appctl dpctl/dump-conntrack | grep 10.0.0.10 | grep state=CLOS -c)
> +    test $ct_est_count -eq 1
> +])
> +
> +OVS_APP_EXIT_AND_WAIT([ovn-controller])
> +
> +as ovn-sb
> +OVS_APP_EXIT_AND_WAIT([ovsdb-server])
> +
> +as ovn-nb
> +OVS_APP_EXIT_AND_WAIT([ovsdb-server])
> +
> +as northd
> +OVS_APP_EXIT_AND_WAIT([ovn-northd])
> +
> +as
> +OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d
> +/connection dropped.*/d
> +/Service monitor not found.*/d"])
> +
> +AT_CLEANUP
> diff --git a/tests/test-tcp-rst.py b/tests/test-tcp-rst.py
> new file mode 100644
> index 000000000..6f96c5706
> --- /dev/null
> +++ b/tests/test-tcp-rst.py
> @@ -0,0 +1,37 @@
> +#!/usr/bin/env python3
> +# Copyright (c) 2020 Red Hat, Inc.
> +#
> +# Licensed under the Apache License, Version 2.0 (the "License");
> +# you may not use this file except in compliance with the License.
> +# You may obtain a copy of the License at:
> +#
> +#     http://www.apache.org/licenses/LICENSE-2.0
> +#
> +# Unless required by applicable law or agreed to in writing, software
> +# distributed under the License is distributed on an "AS IS" BASIS,
> +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> +# See the License for the specific language governing permissions and
> +# limitations under the License.
> +
> +# Simple python script which connects to tcp server and then
> +# resets the connection.
> +import argparse
> +import socket
> +import sys
> +import struct
> +import time
> +
> +parser = argparse.ArgumentParser(description='')
> +parser.add_argument("--src-port", type=int, default=11337, help="source port to use")
> +parser.add_argument("--dst-port", type=int, help="dst port to use")
> +parser.add_argument("--dst-ip", help="server ip to use")
> +args = parser.parse_args()
> +sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
> +server_address = (args.dst_ip, args.dst_port)
> +sock.bind(('0.0.0.0', args.src_port))
> +sock.connect(server_address)
> +l_onoff = 1
> +l_linger = 0
> +time.sleep(1)
> +sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack('ii', l_onoff, l_linger))
> +sock.close()
>
Lorenzo Bianconi April 27, 2020, 3:09 p.m. UTC | #2
>
> On 4/24/20 9:55 AM, numans@ovn.org wrote:
> > From: Numan Siddique <numans@ovn.org>
> >
> > The commit [1] - 28097d5adb95("Fix tcp_reset action handling") fixed an issue
> > with tcp_reset OVN action. In order to fix that issue, this commit added
> > logical flows to skip all the TCP RST packets from conntrack.
> > Ideally it should have skipped only the TCP RST packets generated by
> > ovn-controller from conntrack. Since all the TCP RST packets are
> > skipped from conntrack, the connections in conntrack remain in
> > ESTABLISHED state even if the client/server sends TCP RST to close the
> > connection.  And these entries live for a long time and this is
> > causing performance issues as reported in the BZ.
> >
> > This patch reverts the logical flows added in [1] and modifies the inner
> > actions of tcp_reset in the ingress logical switch pipeline
> > from - "tcp_reset { outport <-> inport; output; }"
> > to "tcp_reset { output <-> inport; next(pipeline=egress,table=5); }".
> > This causes the packet to resubmit to the egress table ls_out_qos_mark
> > skipping the egress ACL stage. Prior to this packet, next action was
> > not allowing a resubmit from ingress to egress pipeline. This patch
> > relaxes this limitation.
> >
> > For the tcp_reset action in the egress logical switch pipeline, this patch
> > modifies the inner action
> > from - "tcp_reset { outport <-> inport; next(pipeline=ingress,table=0); }"
> > to - "tcp_reset { outport <-> inport; next(pipeline=ingress,table=19); }".
> > This causes the packet to enter the ingress table ls_in_l2_lkup.
> >
> > We don't see similar conntrack leaks with UDP. Although there is an issue
> > with the acl reject action for UDP packets. When ovn-controller generates icmp
> > destination unreachable packet, it doesn't get delivered. And the IP checksum is
> > incorrect in this packet. A follow up patch will fix these issues.
> >
> > [1] - 28097d5adb95("Fix tcp_reset action handling")
> >
> > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1819785
> > Co-Authored-by: Tim Rozet <trozet@redhat.com>
> > Signed-off-by: Tim Rozet <trozet@redhat.com>
> > Signed-off-by: Numan Siddique <numans@ovn.org>

Acked-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>


>
> Hi Numan,
>
> Looks good to me. Only a question inline regarding options for avoiding
> hardcoding table IDs. Except for that:
>
> Acked-by: Dumitru Ceara <dceara@redhat.com>
>
> Thanks,
> Dumitru
>
> > ---
> >  lib/actions.c           |   6 +-
> >  northd/ovn-northd.8.xml |   8 ++
> >  northd/ovn-northd.c     |  14 ++--
> >  ovn-sb.xml              |  10 ++-
> >  tests/automake.mk       |   3 +-
> >  tests/ovn.at            |   6 +-
> >  tests/system-ovn.at     | 170 +++++++++++++++++++++++++++++++++++-----
> >  tests/test-tcp-rst.py   |  37 +++++++++
> >  8 files changed, 216 insertions(+), 38 deletions(-)
> >  create mode 100644 tests/test-tcp-rst.py
> >
> > diff --git a/lib/actions.c b/lib/actions.c
> > index c19813de0..91619c889 100644
> > --- a/lib/actions.c
> > +++ b/lib/actions.c
> > @@ -319,11 +319,7 @@ parse_NEXT(struct action_context *ctx)
> >          }
> >      }
> >
> > -    if (pipeline == OVNACT_P_EGRESS && ctx->pp->pipeline == OVNACT_P_INGRESS) {
> > -        lexer_error(ctx->lexer,
> > -                    "\"next\" action cannot advance from ingress to egress "
> > -                    "pipeline (use \"output\" action instead)");
> > -    } else if (table >= ctx->pp->n_tables) {
> > +    if (table >= ctx->pp->n_tables) {
> >          lexer_error(ctx->lexer,
> >                      "\"next\" action cannot advance beyond table %d.",
> >                      ctx->pp->n_tables - 1);
> > diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
> > index efcc4b7fc..d39e259f6 100644
> > --- a/northd/ovn-northd.8.xml
> > +++ b/northd/ovn-northd.8.xml
> > @@ -373,6 +373,14 @@
> >          for new connections and <code>reg0[1] = 1; next;</code> for existing
> >          connections.
> >        </li>
> > +      <li>
> > +        <code>reject</code> ACLs translate into logical
> > +        flows with the
> > +        <code>tcp_reset { output &lt;-&gt; inport;
> > +        next(pipeline=egress,table=5);}</code>
> > +        action for TCP connections and <code>icmp4/icmp6</code> action
> > +        for UDP connections.
> > +      </li>
> >        <li>
> >          Other ACLs translate to <code>drop;</code> for new or untracked
> >          connections and <code>ct_commit(ct_label=1/1);</code> for known
> > diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> > index c4675bd68..f8ae155a2 100644
> > --- a/northd/ovn-northd.c
> > +++ b/northd/ovn-northd.c
> > @@ -4721,11 +4721,11 @@ build_pre_acls(struct ovn_datapath *od, struct hmap *lflows)
> >           * unreachable packets. */
> >          ovn_lflow_add(lflows, od, S_SWITCH_IN_PRE_ACL, 110,
> >                        "nd || nd_rs || nd_ra || icmp4.type == 3 || "
> > -                      "icmp6.type == 1 || (tcp && tcp.flags == 20) || "
> > +                      "icmp6.type == 1 || "
> >                        "(udp && udp.src == 546 && udp.dst == 547)", "next;");
> >          ovn_lflow_add(lflows, od, S_SWITCH_OUT_PRE_ACL, 110,
> >                        "nd || nd_rs || nd_ra || icmp4.type == 3 || "
> > -                      "icmp6.type == 1 || (tcp && tcp.flags == 20) ||"
> > +                      "icmp6.type == 1 || "
> >                        "(udp && udp.src == 546 && udp.dst == 547)", "next;");
> >
> >          /* Ingress and Egress Pre-ACL Table (Priority 100).
> > @@ -4838,11 +4838,11 @@ build_pre_lb(struct ovn_datapath *od, struct hmap *lflows,
> >      /* Do not send ND packets to conntrack */
> >      ovn_lflow_add(lflows, od, S_SWITCH_IN_PRE_LB, 110,
> >                    "nd || nd_rs || nd_ra || icmp4.type == 3 ||"
> > -                  "icmp6.type == 1 || (tcp && tcp.flags == 20)",
> > +                  "icmp6.type == 1",
> >                    "next;");
> >      ovn_lflow_add(lflows, od, S_SWITCH_OUT_PRE_LB, 110,
> >                    "nd || nd_rs || nd_ra || icmp4.type == 3 ||"
> > -                  "icmp6.type == 1 || (tcp && tcp.flags == 20)",
> > +                  "icmp6.type == 1",
> >                    "next;");
> >
> >      /* Do not send service monitor packets to conntrack. */
> > @@ -4988,7 +4988,8 @@ build_reject_acl_rules(struct ovn_datapath *od, struct hmap *lflows,
> >      ds_put_format(&actions, "reg0 = 0; "
> >                    "eth.dst <-> eth.src; ip4.dst <-> ip4.src; "
> >                    "tcp_reset { outport <-> inport; %s };",
> > -                  ingress ? "output;" : "next(pipeline=ingress,table=0);");
> > +                  ingress ? "next(pipeline=egress,table=5);"
> > +                          : "next(pipeline=ingress,table=19);");
>
> Do you think there's a way we could avoid hardcoding the table numbers?
> I'm thinking of the case when a stage is added/deleted and the
> ls_out_qos_mark/"ls_in_l2_lkup" end up having different table ids. I
> don't think a macro definition is enough as that still needs updating
> every time a table is added/deleted.
>
> >      ovn_lflow_add_with_hint(lflows, od, stage,
> >                              acl->priority + OVN_ACL_PRI_OFFSET + 10,
> >                              ds_cstr(&match), ds_cstr(&actions), stage_hint);
> > @@ -5002,7 +5003,8 @@ build_reject_acl_rules(struct ovn_datapath *od, struct hmap *lflows,
> >      ds_put_format(&actions, "reg0 = 0; "
> >                    "eth.dst <-> eth.src; ip6.dst <-> ip6.src; "
> >                    "tcp_reset { outport <-> inport; %s };",
> > -                  ingress ? "output;" : "next(pipeline=ingress,table=0);");
> > +                  ingress ? "next(pipeline=egress,table=5);"
> > +                          : "next(pipeline=ingress,table=19);");
>
> Same question as above.
>
> >      ovn_lflow_add_with_hint(lflows, od, stage,
> >                              acl->priority + OVN_ACL_PRI_OFFSET + 10,
> >                              ds_cstr(&match), ds_cstr(&actions), stage_hint);
> > diff --git a/ovn-sb.xml b/ovn-sb.xml
> > index 5f8da534c..3aa7cd4da 100644
> > --- a/ovn-sb.xml
> > +++ b/ovn-sb.xml
> > @@ -1112,10 +1112,12 @@
> >            <var>pipeline</var> as a subroutine.  The default <var>table</var> is
> >            just after the current one.  If <var>pipeline</var> is specified, it
> >            may be <code>ingress</code> or <code>egress</code>; the default
> > -          <var>pipeline</var> is the one currently executing.  Actions in the
> > -          ingress pipeline may not use <code>next</code> to jump into the
> > -          egress pipeline (use the <code>output</code> instead), but
> > -          transitions in the opposite direction are allowed.
> > +          <var>pipeline</var> is the one currently executing. Actions in the
> > +          both ingress and egress pipeline can use <code>next</code> to jump
> > +          across the other pipeline.  Actions in the ingress pipeline should
> > +          use <code>next</code> to jump into the specific table of egress
> > +          pipeline only if it is certain that the packets are local and not
> > +          tunnelled and wants to skip certain stages in the packet processing.
> >          </dd>
> >
> >          <dt><code><var>field</var> = <var>constant</var>;</code></dt>
> > diff --git a/tests/automake.mk b/tests/automake.mk
> > index 215fb432b..ed530dd77 100644
> > --- a/tests/automake.mk
> > +++ b/tests/automake.mk
> > @@ -205,7 +205,8 @@ tests_ovstest_LDADD = $(OVS_LIBDIR)/libopenvswitch.la lib/libovn.la
> >  # Python tests.
> >  CHECK_PYFILES = \
> >       tests/test-l7.py \
> > -     tests/uuidfilt.py
> > +     tests/uuidfilt.py \
> > +     tests/test-tcp-rst.py
> >
> >  EXTRA_DIST += $(CHECK_PYFILES)
> >  PYCOV_CLEAN_FILES += $(CHECK_PYFILES:.py=.py,cover) .coverage
> > diff --git a/tests/ovn.at b/tests/ovn.at
> > index 2a7ee7528..b00670816 100644
> > --- a/tests/ovn.at
> > +++ b/tests/ovn.at
> > @@ -852,7 +852,11 @@ next(pipeline=ingress, table=11);
> >      encodes as resubmit(,19)
> >
> >  next(pipeline=egress);
> > -    "next" action cannot advance from ingress to egress pipeline (use "output" action instead)
> > +    formats as next(pipeline=egress, table=11);
> > +    encodes as resubmit(,51)
> > +
> > +next(pipeline=egress, table=5);
> > +    encodes as resubmit(,45)
> >
> >  next(table=10);
> >      formats as next(10);
> > diff --git a/tests/system-ovn.at b/tests/system-ovn.at
> > index bdb9768d2..3b11cf92b 100644
> > --- a/tests/system-ovn.at
> > +++ b/tests/system-ovn.at
> > @@ -3719,60 +3719,86 @@ start_daemon ovn-controller
> >
> >  ovn-nbctl ls-add sw0
> >
> > -ovn-nbctl lsp-add sw0 sw0-p1
> > -ovn-nbctl lsp-set-addresses sw0-p1 "50:54:00:00:00:03 10.0.0.3 aef0::3"
> > -ovn-nbctl lsp-set-port-security sw0-p1 "50:54:00:00:00:03 10.0.0.3 aef0::3"
> > +ovn-nbctl lsp-add sw0 sw0-p1-rej
> > +ovn-nbctl lsp-set-addresses sw0-p1-rej "50:54:00:00:00:03 10.0.0.3 aef0::3"
> > +ovn-nbctl lsp-set-port-security sw0-p1-rej "50:54:00:00:00:03 10.0.0.3 aef0::3"
> >
> > -ovn-nbctl lsp-add sw0 sw0-p2
> > -ovn-nbctl lsp-set-addresses sw0-p2 "50:54:00:00:00:04 10.0.0.4 aef0::4"
> > -ovn-nbctl lsp-set-port-security sw0-p2 "50:54:00:00:00:04 10.0.0.4 aef0::4"
> > +ovn-nbctl lsp-add sw0 sw0-p2-rej
> > +ovn-nbctl lsp-set-addresses sw0-p2-rej "50:54:00:00:00:04 10.0.0.4 aef0::4"
> > +ovn-nbctl lsp-set-port-security sw0-p2-rej "50:54:00:00:00:04 10.0.0.4 aef0::4"
> > +
> > +#ovn-nbctl --log acl-add sw0 from-lport 1000 "inport == \"sw0-p1\" && tcp && tcp.dst == 80" reject
> > +#ovn-nbctl --log acl-add sw0 from-lport 1000 "inport == \"sw0-p2\" && ip6 && tcp && tcp.dst == 80" reject
> > +
> > +# Create port group and ACLs for sw0 ports.
> > +ovn-nbctl pg-add pg0_drop sw0-p1-rej sw0-p2-rej
> > +ovn-nbctl acl-add pg0_drop from-lport 1001 "inport == @pg0_drop && ip" drop
> > +ovn-nbctl acl-add pg0_drop to-lport 1001 "outport == @pg0_drop && ip" drop
> > +
> > +ovn-nbctl pg-add pg0 sw0-p1-rej sw0-p2-rej
> > +ovn-nbctl acl-add pg0 from-lport 1002 "inport == @pg0 && ip4" allow-related
> > +ovn-nbctl --log acl-add pg0 from-lport 1004 "inport == @pg0 && ip && tcp && tcp.dst == 80" reject
> >
> > -ovn-nbctl --log acl-add sw0 from-lport 1000 "inport == \"sw0-p1\" && tcp && tcp.dst == 80" reject
> > -ovn-nbctl --log acl-add sw0 from-lport 1000 "inport == \"sw0-p2\" && ip6 && tcp && tcp.dst == 80" reject
> > +ovn-nbctl acl-add pg0 to-lport 1002 "outport == @pg0 && ip4 && ip4.src == 0.0.0.0/0 && icmp4" allow-related
> > +ovn-nbctl acl-add pg0 to-lport 1002 "outport == @pg0 && ip4 && ip4.src == 0.0.0.0/0 && tcp && tcp.dst == 82" allow-related
> > +ovn-nbctl acl-add pg0 to-lport 1002 "outport == @pg0 && ip4 && ip4.src == 0.0.0.0/0 && udp && udp.dst == 82" allow-related
> > +ovn-nbctl --log acl-add pg0 to-lport 1004 "inport == @pg0 && ip && tcp && tcp.dst == 84" reject
> >
> >  OVN_POPULATE_ARP
> >  ovn-nbctl --wait=hv sync
> >
> > -ADD_NAMESPACES(sw0-p1)
> > -ADD_VETH(sw0-p1, sw0-p1, br-int, "10.0.0.3/24", "50:54:00:00:00:03", \
> > +ADD_NAMESPACES(sw0-p1-rej)
> > +ADD_VETH(sw0-p1-rej, sw0-p1-rej, br-int, "10.0.0.3/24", "50:54:00:00:00:03", \
> >           "10.0.0.1")
> >
> > -ADD_NAMESPACES(sw0-p2)
> > -ADD_VETH(sw0-p2, sw0-p2, br-int, "10.0.0.4/24", "50:54:00:00:00:04", \
> > +ADD_NAMESPACES(sw0-p2-rej)
> > +ADD_VETH(sw0-p2-rej, sw0-p2-rej, br-int, "10.0.0.4/24", "50:54:00:00:00:04", \
> >           "10.0.0.1")
> >
> > -NS_CHECK_EXEC([sw0-p1], [ip a a aef0::3/64 dev sw0-p1], [0])
> > -NS_CHECK_EXEC([sw0-p2], [ip a a aef0::4/64 dev sw0-p2], [0])
> > +NS_CHECK_EXEC([sw0-p1-rej], [ip a a aef0::3/64 dev sw0-p1-rej], [0])
> > +NS_CHECK_EXEC([sw0-p2-rej], [ip a a aef0::4/64 dev sw0-p2-rej], [0])
> >
> > -# Capture packets in sw0-p1.
> > -NS_CHECK_EXEC([sw0-p1], [tcpdump -n -c 2 -i sw0-p1 tcp port 80 > sw0-p1-ip4.pcap &], [0])
> > +# Capture packets in sw0-p1-rej.
> > +NS_CHECK_EXEC([sw0-p1-rej], [tcpdump -n -c 2 -i sw0-p1-rej tcp port 80 > sw0-p1-rej-ip4.pcap &], [0])
> >  sleep 1
> >
> > -NS_CHECK_EXEC([sw0-p1], [nc 10.0.0.4 80], [1], [],
> > +NS_CHECK_EXEC([sw0-p1-rej], [nc 10.0.0.4 80], [1], [],
> >  [dnl
> >  Ncat: Connection refused.
> >  ])
> >
> >  OVS_WAIT_UNTIL([
> > -    total=`cat sw0-p1-ip4.pcap |  wc -l`
> > +    total=`cat sw0-p1-rej-ip4.pcap |  wc -l`
> >      echo "total = $total"
> >      test "${total}" = "2"
> >  ])
> >
> > +# Now send traffic to port 84
> > +NS_CHECK_EXEC([sw0-p1-rej], [nc 10.0.0.4 84], [1], [],
> > +[dnl
> > +Ncat: Connection refused.
> > +])
> > +
> > +AT_CHECK([
> > +    n_pkt=$(ovs-ofctl dump-flows br-int table=44 | grep -v n_packets=0 | \
> > +grep controller | grep tp_dst=84 -c)
> > +    test $n_pkt -eq 1
> > +])
> > +
> >  # Without this sleep, test case fails intermittently.
> >  sleep 3
> >
> > -NS_CHECK_EXEC([sw0-p2], [tcpdump -n -c 2 -i sw0-p2 tcp port 80 > sw0-p2-ip6.pcap &], [0])
> > +NS_CHECK_EXEC([sw0-p2-rej], [tcpdump -n -c 2 -i sw0-p2-rej tcp port 80 > sw0-p2-rej-ip6.pcap &], [0])
> >
> >  sleep 1
> >
> > -NS_CHECK_EXEC([sw0-p2], [nc -6 aef0::3 80], [1], [],
> > +NS_CHECK_EXEC([sw0-p2-rej], [nc -6 aef0::3 80], [1], [],
> >  [dnl
> >  Ncat: Connection refused.
> >  ])
> >
> >  OVS_WAIT_UNTIL([
> > -    total=`cat sw0-p2-ip6.pcap |  wc -l`
> > +    total=`cat sw0-p2-rej-ip6.pcap |  wc -l`
> >      echo "total = $total"
> >      test "${total}" = "2"
> >  ])
> > @@ -3936,3 +3962,105 @@ as
> >  OVS_TRAFFIC_VSWITCHD_STOP(["/.*error receiving.*/d
> >  /.*terminating with signal 15.*/d"])
> >  AT_CLEANUP
> > +
> > +# Tests that when an established connection sends TCP reset,
> > +# the conntrack entry is not in established state.
> > +AT_SETUP([ovn -- conntrack TCP reset])
> > +AT_KEYWORDS([conntrack])
> > +ovn_start
> > +
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +ADD_BR([br-int])
> > +
> > +# Set external-ids in br-int needed for ovn-controller
> > +ovs-vsctl \
> > +        -- set Open_vSwitch . external-ids:system-id=hv1 \
> > +        -- set Open_vSwitch . external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \
> > +        -- set Open_vSwitch . external-ids:ovn-encap-type=geneve \
> > +        -- set Open_vSwitch . external-ids:ovn-encap-ip=169.0.0.1 \
> > +        -- set bridge br-int fail-mode=secure other-config:disable-in-band=true
> > +
> > +# Start ovn-controller
> > +start_daemon ovn-controller
> > +
> > +ovn-nbctl ls-add sw0
> > +
> > +ovn-nbctl lsp-add sw0 rst-p1
> > +ovn-nbctl lsp-set-addresses rst-p1 "50:54:00:00:00:03"
> > +ovn-nbctl lsp-set-port-security rst-p1 "50:54:00:00:00:03"
> > +
> > +ovn-nbctl lsp-add sw0 rst-p2
> > +ovn-nbctl lsp-set-addresses rst-p2 "50:54:00:00:00:04 10.0.0.4"
> > +ovn-nbctl lsp-set-port-security rst-p2 "50:54:00:00:00:04 10.0.0.4"
> > +
> > +# Create port group and ACLs for sw0 ports.
> > +ovn-nbctl pg-add pg0_drop rst-p1 rst-p2
> > +ovn-nbctl acl-add pg0_drop from-lport 1001 "inport == @pg0_drop && ip" drop
> > +ovn-nbctl acl-add pg0_drop to-lport 1001 "outport == @pg0_drop && ip" drop
> > +
> > +ovn-nbctl pg-add pg0 rst-p1 rst-p2
> > +ovn-nbctl acl-add pg0 from-lport 1002 "inport == @pg0 && ip4" allow-related
> > +ovn-nbctl acl-add pg0 to-lport 1002 "outport == @pg0 && ip4 && ip4.src == 0.0.0.0/0 && icmp4" allow-related
> > +ovn-nbctl acl-add pg0 to-lport 1002 "outport == @pg0 && ip4 && ip4.src == 0.0.0.0/0 && tcp && tcp.dst == 80" allow-related
> > +ovn-nbctl acl-add pg0 to-lport 1002 "outport == @pg0 && ip4 && ip4.src == 0.0.0.0/0 && udp && udp.dst == 80" allow-related
> > +
> > +# Create a logical router and attach to logical switch.
> > +ovn-nbctl lr-add lr0
> > +ovn-nbctl lrp-add lr0 lr0-sw0 00:00:00:00:ff:01 10.0.0.1/24
> > +ovn-nbctl lsp-add sw0 sw0-lr0
> > +ovn-nbctl lsp-set-type sw0-lr0 router
> > +ovn-nbctl lsp-set-addresses sw0-lr0 router
> > +ovn-nbctl lsp-set-options sw0-lr0 router-port=lr0-sw0
> > +
> > +ovn-nbctl lb-add lb1 10.0.0.10:80 10.0.0.3:80
> > +ovn-nbctl --wait=sb ls-lb-add sw0 lb1
> > +ovn-nbctl --wait=sb lr-lb-add lr0 lb1
> > +
> > +OVN_POPULATE_ARP
> > +ovn-nbctl --wait=hv sync
> > +
> > +ADD_NAMESPACES(rst-p1)
> > +ADD_VETH(rst-p1, rst-p1, br-int, "10.0.0.3/24", "50:54:00:00:00:03", \
> > +         "10.0.0.1")
> > +
> > +ADD_NAMESPACES(rst-p2)
> > +ADD_VETH(rst-p2, rst-p2, br-int, "10.0.0.4/24", "50:54:00:00:00:04", \
> > +         "10.0.0.1")
> > +
> > +OVS_WAIT_UNTIL([test x$(ovn-nbctl lsp-get-up rst-p1) = xup])
> > +OVS_WAIT_UNTIL([test x$(ovn-nbctl lsp-get-up rst-p2) = xup])
> > +
> > +# Start webservers in 'rst-p1'.
> > +OVS_START_L7([rst-p1], [http])
> > +
> > +NS_CHECK_EXEC([rst-p2], [$PYTHON $srcdir/test-tcp-rst.py --dst-port 80 --dst-ip 10.0.0.10])
> > +
> > +# When tcp reset is sent, conntrack entry should be in the state - CLOSED or CLOSING.
> > +# But there is a bug where tcp reset packet was not sent to the conntrack.
> > +# This test case checks that the tcp reset packet is sent to conntrack
> > +# and the state is not in established state.
> > +AT_CHECK([
> > +    ct_est_count=$(ovs-appctl dpctl/dump-conntrack | grep 10.0.0.10 | grep state=ESTABLISHED -c)
> > +    test $ct_est_count -eq 0
> > +
> > +    ct_est_count=$(ovs-appctl dpctl/dump-conntrack | grep 10.0.0.10 | grep state=CLOS -c)
> > +    test $ct_est_count -eq 1
> > +])
> > +
> > +OVS_APP_EXIT_AND_WAIT([ovn-controller])
> > +
> > +as ovn-sb
> > +OVS_APP_EXIT_AND_WAIT([ovsdb-server])
> > +
> > +as ovn-nb
> > +OVS_APP_EXIT_AND_WAIT([ovsdb-server])
> > +
> > +as northd
> > +OVS_APP_EXIT_AND_WAIT([ovn-northd])
> > +
> > +as
> > +OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d
> > +/connection dropped.*/d
> > +/Service monitor not found.*/d"])
> > +
> > +AT_CLEANUP
> > diff --git a/tests/test-tcp-rst.py b/tests/test-tcp-rst.py
> > new file mode 100644
> > index 000000000..6f96c5706
> > --- /dev/null
> > +++ b/tests/test-tcp-rst.py
> > @@ -0,0 +1,37 @@
> > +#!/usr/bin/env python3
> > +# Copyright (c) 2020 Red Hat, Inc.
> > +#
> > +# Licensed under the Apache License, Version 2.0 (the "License");
> > +# you may not use this file except in compliance with the License.
> > +# You may obtain a copy of the License at:
> > +#
> > +#     http://www.apache.org/licenses/LICENSE-2.0
> > +#
> > +# Unless required by applicable law or agreed to in writing, software
> > +# distributed under the License is distributed on an "AS IS" BASIS,
> > +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> > +# See the License for the specific language governing permissions and
> > +# limitations under the License.
> > +
> > +# Simple python script which connects to tcp server and then
> > +# resets the connection.
> > +import argparse
> > +import socket
> > +import sys
> > +import struct
> > +import time
> > +
> > +parser = argparse.ArgumentParser(description='')
> > +parser.add_argument("--src-port", type=int, default=11337, help="source port to use")
> > +parser.add_argument("--dst-port", type=int, help="dst port to use")
> > +parser.add_argument("--dst-ip", help="server ip to use")
> > +args = parser.parse_args()
> > +sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
> > +server_address = (args.dst_ip, args.dst_port)
> > +sock.bind(('0.0.0.0', args.src_port))
> > +sock.connect(server_address)
> > +l_onoff = 1
> > +l_linger = 0
> > +time.sleep(1)
> > +sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack('ii', l_onoff, l_linger))
> > +sock.close()
> >
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
Numan Siddique April 27, 2020, 3:40 p.m. UTC | #3
On Mon, Apr 27, 2020 at 8:40 PM Lorenzo Bianconi
<lorenzo.bianconi@redhat.com> wrote:
>
> >
> > On 4/24/20 9:55 AM, numans@ovn.org wrote:
> > > From: Numan Siddique <numans@ovn.org>
> > >
> > > The commit [1] - 28097d5adb95("Fix tcp_reset action handling") fixed an issue
> > > with tcp_reset OVN action. In order to fix that issue, this commit added
> > > logical flows to skip all the TCP RST packets from conntrack.
> > > Ideally it should have skipped only the TCP RST packets generated by
> > > ovn-controller from conntrack. Since all the TCP RST packets are
> > > skipped from conntrack, the connections in conntrack remain in
> > > ESTABLISHED state even if the client/server sends TCP RST to close the
> > > connection.  And these entries live for a long time and this is
> > > causing performance issues as reported in the BZ.
> > >
> > > This patch reverts the logical flows added in [1] and modifies the inner
> > > actions of tcp_reset in the ingress logical switch pipeline
> > > from - "tcp_reset { outport <-> inport; output; }"
> > > to "tcp_reset { output <-> inport; next(pipeline=egress,table=5); }".
> > > This causes the packet to resubmit to the egress table ls_out_qos_mark
> > > skipping the egress ACL stage. Prior to this packet, next action was
> > > not allowing a resubmit from ingress to egress pipeline. This patch
> > > relaxes this limitation.
> > >
> > > For the tcp_reset action in the egress logical switch pipeline, this patch
> > > modifies the inner action
> > > from - "tcp_reset { outport <-> inport; next(pipeline=ingress,table=0); }"
> > > to - "tcp_reset { outport <-> inport; next(pipeline=ingress,table=19); }".
> > > This causes the packet to enter the ingress table ls_in_l2_lkup.
> > >
> > > We don't see similar conntrack leaks with UDP. Although there is an issue
> > > with the acl reject action for UDP packets. When ovn-controller generates icmp
> > > destination unreachable packet, it doesn't get delivered. And the IP checksum is
> > > incorrect in this packet. A follow up patch will fix these issues.
> > >
> > > [1] - 28097d5adb95("Fix tcp_reset action handling")
> > >
> > > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1819785
> > > Co-Authored-by: Tim Rozet <trozet@redhat.com>
> > > Signed-off-by: Tim Rozet <trozet@redhat.com>
> > > Signed-off-by: Numan Siddique <numans@ovn.org>
>
> Acked-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
>
>
> >
> > Hi Numan,
> >
> > Looks good to me. Only a question inline regarding options for avoiding
> > hardcoding table IDs. Except for that:
> >
> > Acked-by: Dumitru Ceara <dceara@redhat.com>
> >

Thanks Dumitru and Lorenzo for the reviews. I applied this patch to master.

> > Thanks,
> > Dumitru
> >
> > > ---
> > >  lib/actions.c           |   6 +-
> > >  northd/ovn-northd.8.xml |   8 ++
> > >  northd/ovn-northd.c     |  14 ++--
> > >  ovn-sb.xml              |  10 ++-
> > >  tests/automake.mk       |   3 +-
> > >  tests/ovn.at            |   6 +-
> > >  tests/system-ovn.at     | 170 +++++++++++++++++++++++++++++++++++-----
> > >  tests/test-tcp-rst.py   |  37 +++++++++
> > >  8 files changed, 216 insertions(+), 38 deletions(-)
> > >  create mode 100644 tests/test-tcp-rst.py
> > >
> > > diff --git a/lib/actions.c b/lib/actions.c
> > > index c19813de0..91619c889 100644
> > > --- a/lib/actions.c
> > > +++ b/lib/actions.c
> > > @@ -319,11 +319,7 @@ parse_NEXT(struct action_context *ctx)
> > >          }
> > >      }
> > >
> > > -    if (pipeline == OVNACT_P_EGRESS && ctx->pp->pipeline == OVNACT_P_INGRESS) {
> > > -        lexer_error(ctx->lexer,
> > > -                    "\"next\" action cannot advance from ingress to egress "
> > > -                    "pipeline (use \"output\" action instead)");
> > > -    } else if (table >= ctx->pp->n_tables) {
> > > +    if (table >= ctx->pp->n_tables) {
> > >          lexer_error(ctx->lexer,
> > >                      "\"next\" action cannot advance beyond table %d.",
> > >                      ctx->pp->n_tables - 1);
> > > diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
> > > index efcc4b7fc..d39e259f6 100644
> > > --- a/northd/ovn-northd.8.xml
> > > +++ b/northd/ovn-northd.8.xml
> > > @@ -373,6 +373,14 @@
> > >          for new connections and <code>reg0[1] = 1; next;</code> for existing
> > >          connections.
> > >        </li>
> > > +      <li>
> > > +        <code>reject</code> ACLs translate into logical
> > > +        flows with the
> > > +        <code>tcp_reset { output &lt;-&gt; inport;
> > > +        next(pipeline=egress,table=5);}</code>
> > > +        action for TCP connections and <code>icmp4/icmp6</code> action
> > > +        for UDP connections.
> > > +      </li>
> > >        <li>
> > >          Other ACLs translate to <code>drop;</code> for new or untracked
> > >          connections and <code>ct_commit(ct_label=1/1);</code> for known
> > > diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> > > index c4675bd68..f8ae155a2 100644
> > > --- a/northd/ovn-northd.c
> > > +++ b/northd/ovn-northd.c
> > > @@ -4721,11 +4721,11 @@ build_pre_acls(struct ovn_datapath *od, struct hmap *lflows)
> > >           * unreachable packets. */
> > >          ovn_lflow_add(lflows, od, S_SWITCH_IN_PRE_ACL, 110,
> > >                        "nd || nd_rs || nd_ra || icmp4.type == 3 || "
> > > -                      "icmp6.type == 1 || (tcp && tcp.flags == 20) || "
> > > +                      "icmp6.type == 1 || "
> > >                        "(udp && udp.src == 546 && udp.dst == 547)", "next;");
> > >          ovn_lflow_add(lflows, od, S_SWITCH_OUT_PRE_ACL, 110,
> > >                        "nd || nd_rs || nd_ra || icmp4.type == 3 || "
> > > -                      "icmp6.type == 1 || (tcp && tcp.flags == 20) ||"
> > > +                      "icmp6.type == 1 || "
> > >                        "(udp && udp.src == 546 && udp.dst == 547)", "next;");
> > >
> > >          /* Ingress and Egress Pre-ACL Table (Priority 100).
> > > @@ -4838,11 +4838,11 @@ build_pre_lb(struct ovn_datapath *od, struct hmap *lflows,
> > >      /* Do not send ND packets to conntrack */
> > >      ovn_lflow_add(lflows, od, S_SWITCH_IN_PRE_LB, 110,
> > >                    "nd || nd_rs || nd_ra || icmp4.type == 3 ||"
> > > -                  "icmp6.type == 1 || (tcp && tcp.flags == 20)",
> > > +                  "icmp6.type == 1",
> > >                    "next;");
> > >      ovn_lflow_add(lflows, od, S_SWITCH_OUT_PRE_LB, 110,
> > >                    "nd || nd_rs || nd_ra || icmp4.type == 3 ||"
> > > -                  "icmp6.type == 1 || (tcp && tcp.flags == 20)",
> > > +                  "icmp6.type == 1",
> > >                    "next;");
> > >
> > >      /* Do not send service monitor packets to conntrack. */
> > > @@ -4988,7 +4988,8 @@ build_reject_acl_rules(struct ovn_datapath *od, struct hmap *lflows,
> > >      ds_put_format(&actions, "reg0 = 0; "
> > >                    "eth.dst <-> eth.src; ip4.dst <-> ip4.src; "
> > >                    "tcp_reset { outport <-> inport; %s };",
> > > -                  ingress ? "output;" : "next(pipeline=ingress,table=0);");
> > > +                  ingress ? "next(pipeline=egress,table=5);"
> > > +                          : "next(pipeline=ingress,table=19);");
> >
> > Do you think there's a way we could avoid hardcoding the table numbers?
> > I'm thinking of the case when a stage is added/deleted and the
> > ls_out_qos_mark/"ls_in_l2_lkup" end up having different table ids. I
> > don't think a macro definition is enough as that still needs updating
> > every time a table is added/deleted.

I agree.  Hardcoding table numbers are prone to regressions later in this
case. As I discussed offline, I think it's better to have a macro defined
for each table whose value gets generated.

I'll work on a follow up patch to have this.

Thanks
Numan

> >
> > >      ovn_lflow_add_with_hint(lflows, od, stage,
> > >                              acl->priority + OVN_ACL_PRI_OFFSET + 10,
> > >                              ds_cstr(&match), ds_cstr(&actions), stage_hint);
> > > @@ -5002,7 +5003,8 @@ build_reject_acl_rules(struct ovn_datapath *od, struct hmap *lflows,
> > >      ds_put_format(&actions, "reg0 = 0; "
> > >                    "eth.dst <-> eth.src; ip6.dst <-> ip6.src; "
> > >                    "tcp_reset { outport <-> inport; %s };",
> > > -                  ingress ? "output;" : "next(pipeline=ingress,table=0);");
> > > +                  ingress ? "next(pipeline=egress,table=5);"
> > > +                          : "next(pipeline=ingress,table=19);");
> >
> > Same question as above.
> >
> > >      ovn_lflow_add_with_hint(lflows, od, stage,
> > >                              acl->priority + OVN_ACL_PRI_OFFSET + 10,
> > >                              ds_cstr(&match), ds_cstr(&actions), stage_hint);
> > > diff --git a/ovn-sb.xml b/ovn-sb.xml
> > > index 5f8da534c..3aa7cd4da 100644
> > > --- a/ovn-sb.xml
> > > +++ b/ovn-sb.xml
> > > @@ -1112,10 +1112,12 @@
> > >            <var>pipeline</var> as a subroutine.  The default <var>table</var> is
> > >            just after the current one.  If <var>pipeline</var> is specified, it
> > >            may be <code>ingress</code> or <code>egress</code>; the default
> > > -          <var>pipeline</var> is the one currently executing.  Actions in the
> > > -          ingress pipeline may not use <code>next</code> to jump into the
> > > -          egress pipeline (use the <code>output</code> instead), but
> > > -          transitions in the opposite direction are allowed.
> > > +          <var>pipeline</var> is the one currently executing. Actions in the
> > > +          both ingress and egress pipeline can use <code>next</code> to jump
> > > +          across the other pipeline.  Actions in the ingress pipeline should
> > > +          use <code>next</code> to jump into the specific table of egress
> > > +          pipeline only if it is certain that the packets are local and not
> > > +          tunnelled and wants to skip certain stages in the packet processing.
> > >          </dd>
> > >
> > >          <dt><code><var>field</var> = <var>constant</var>;</code></dt>
> > > diff --git a/tests/automake.mk b/tests/automake.mk
> > > index 215fb432b..ed530dd77 100644
> > > --- a/tests/automake.mk
> > > +++ b/tests/automake.mk
> > > @@ -205,7 +205,8 @@ tests_ovstest_LDADD = $(OVS_LIBDIR)/libopenvswitch.la lib/libovn.la
> > >  # Python tests.
> > >  CHECK_PYFILES = \
> > >       tests/test-l7.py \
> > > -     tests/uuidfilt.py
> > > +     tests/uuidfilt.py \
> > > +     tests/test-tcp-rst.py
> > >
> > >  EXTRA_DIST += $(CHECK_PYFILES)
> > >  PYCOV_CLEAN_FILES += $(CHECK_PYFILES:.py=.py,cover) .coverage
> > > diff --git a/tests/ovn.at b/tests/ovn.at
> > > index 2a7ee7528..b00670816 100644
> > > --- a/tests/ovn.at
> > > +++ b/tests/ovn.at
> > > @@ -852,7 +852,11 @@ next(pipeline=ingress, table=11);
> > >      encodes as resubmit(,19)
> > >
> > >  next(pipeline=egress);
> > > -    "next" action cannot advance from ingress to egress pipeline (use "output" action instead)
> > > +    formats as next(pipeline=egress, table=11);
> > > +    encodes as resubmit(,51)
> > > +
> > > +next(pipeline=egress, table=5);
> > > +    encodes as resubmit(,45)
> > >
> > >  next(table=10);
> > >      formats as next(10);
> > > diff --git a/tests/system-ovn.at b/tests/system-ovn.at
> > > index bdb9768d2..3b11cf92b 100644
> > > --- a/tests/system-ovn.at
> > > +++ b/tests/system-ovn.at
> > > @@ -3719,60 +3719,86 @@ start_daemon ovn-controller
> > >
> > >  ovn-nbctl ls-add sw0
> > >
> > > -ovn-nbctl lsp-add sw0 sw0-p1
> > > -ovn-nbctl lsp-set-addresses sw0-p1 "50:54:00:00:00:03 10.0.0.3 aef0::3"
> > > -ovn-nbctl lsp-set-port-security sw0-p1 "50:54:00:00:00:03 10.0.0.3 aef0::3"
> > > +ovn-nbctl lsp-add sw0 sw0-p1-rej
> > > +ovn-nbctl lsp-set-addresses sw0-p1-rej "50:54:00:00:00:03 10.0.0.3 aef0::3"
> > > +ovn-nbctl lsp-set-port-security sw0-p1-rej "50:54:00:00:00:03 10.0.0.3 aef0::3"
> > >
> > > -ovn-nbctl lsp-add sw0 sw0-p2
> > > -ovn-nbctl lsp-set-addresses sw0-p2 "50:54:00:00:00:04 10.0.0.4 aef0::4"
> > > -ovn-nbctl lsp-set-port-security sw0-p2 "50:54:00:00:00:04 10.0.0.4 aef0::4"
> > > +ovn-nbctl lsp-add sw0 sw0-p2-rej
> > > +ovn-nbctl lsp-set-addresses sw0-p2-rej "50:54:00:00:00:04 10.0.0.4 aef0::4"
> > > +ovn-nbctl lsp-set-port-security sw0-p2-rej "50:54:00:00:00:04 10.0.0.4 aef0::4"
> > > +
> > > +#ovn-nbctl --log acl-add sw0 from-lport 1000 "inport == \"sw0-p1\" && tcp && tcp.dst == 80" reject
> > > +#ovn-nbctl --log acl-add sw0 from-lport 1000 "inport == \"sw0-p2\" && ip6 && tcp && tcp.dst == 80" reject
> > > +
> > > +# Create port group and ACLs for sw0 ports.
> > > +ovn-nbctl pg-add pg0_drop sw0-p1-rej sw0-p2-rej
> > > +ovn-nbctl acl-add pg0_drop from-lport 1001 "inport == @pg0_drop && ip" drop
> > > +ovn-nbctl acl-add pg0_drop to-lport 1001 "outport == @pg0_drop && ip" drop
> > > +
> > > +ovn-nbctl pg-add pg0 sw0-p1-rej sw0-p2-rej
> > > +ovn-nbctl acl-add pg0 from-lport 1002 "inport == @pg0 && ip4" allow-related
> > > +ovn-nbctl --log acl-add pg0 from-lport 1004 "inport == @pg0 && ip && tcp && tcp.dst == 80" reject
> > >
> > > -ovn-nbctl --log acl-add sw0 from-lport 1000 "inport == \"sw0-p1\" && tcp && tcp.dst == 80" reject
> > > -ovn-nbctl --log acl-add sw0 from-lport 1000 "inport == \"sw0-p2\" && ip6 && tcp && tcp.dst == 80" reject
> > > +ovn-nbctl acl-add pg0 to-lport 1002 "outport == @pg0 && ip4 && ip4.src == 0.0.0.0/0 && icmp4" allow-related
> > > +ovn-nbctl acl-add pg0 to-lport 1002 "outport == @pg0 && ip4 && ip4.src == 0.0.0.0/0 && tcp && tcp.dst == 82" allow-related
> > > +ovn-nbctl acl-add pg0 to-lport 1002 "outport == @pg0 && ip4 && ip4.src == 0.0.0.0/0 && udp && udp.dst == 82" allow-related
> > > +ovn-nbctl --log acl-add pg0 to-lport 1004 "inport == @pg0 && ip && tcp && tcp.dst == 84" reject
> > >
> > >  OVN_POPULATE_ARP
> > >  ovn-nbctl --wait=hv sync
> > >
> > > -ADD_NAMESPACES(sw0-p1)
> > > -ADD_VETH(sw0-p1, sw0-p1, br-int, "10.0.0.3/24", "50:54:00:00:00:03", \
> > > +ADD_NAMESPACES(sw0-p1-rej)
> > > +ADD_VETH(sw0-p1-rej, sw0-p1-rej, br-int, "10.0.0.3/24", "50:54:00:00:00:03", \
> > >           "10.0.0.1")
> > >
> > > -ADD_NAMESPACES(sw0-p2)
> > > -ADD_VETH(sw0-p2, sw0-p2, br-int, "10.0.0.4/24", "50:54:00:00:00:04", \
> > > +ADD_NAMESPACES(sw0-p2-rej)
> > > +ADD_VETH(sw0-p2-rej, sw0-p2-rej, br-int, "10.0.0.4/24", "50:54:00:00:00:04", \
> > >           "10.0.0.1")
> > >
> > > -NS_CHECK_EXEC([sw0-p1], [ip a a aef0::3/64 dev sw0-p1], [0])
> > > -NS_CHECK_EXEC([sw0-p2], [ip a a aef0::4/64 dev sw0-p2], [0])
> > > +NS_CHECK_EXEC([sw0-p1-rej], [ip a a aef0::3/64 dev sw0-p1-rej], [0])
> > > +NS_CHECK_EXEC([sw0-p2-rej], [ip a a aef0::4/64 dev sw0-p2-rej], [0])
> > >
> > > -# Capture packets in sw0-p1.
> > > -NS_CHECK_EXEC([sw0-p1], [tcpdump -n -c 2 -i sw0-p1 tcp port 80 > sw0-p1-ip4.pcap &], [0])
> > > +# Capture packets in sw0-p1-rej.
> > > +NS_CHECK_EXEC([sw0-p1-rej], [tcpdump -n -c 2 -i sw0-p1-rej tcp port 80 > sw0-p1-rej-ip4.pcap &], [0])
> > >  sleep 1
> > >
> > > -NS_CHECK_EXEC([sw0-p1], [nc 10.0.0.4 80], [1], [],
> > > +NS_CHECK_EXEC([sw0-p1-rej], [nc 10.0.0.4 80], [1], [],
> > >  [dnl
> > >  Ncat: Connection refused.
> > >  ])
> > >
> > >  OVS_WAIT_UNTIL([
> > > -    total=`cat sw0-p1-ip4.pcap |  wc -l`
> > > +    total=`cat sw0-p1-rej-ip4.pcap |  wc -l`
> > >      echo "total = $total"
> > >      test "${total}" = "2"
> > >  ])
> > >
> > > +# Now send traffic to port 84
> > > +NS_CHECK_EXEC([sw0-p1-rej], [nc 10.0.0.4 84], [1], [],
> > > +[dnl
> > > +Ncat: Connection refused.
> > > +])
> > > +
> > > +AT_CHECK([
> > > +    n_pkt=$(ovs-ofctl dump-flows br-int table=44 | grep -v n_packets=0 | \
> > > +grep controller | grep tp_dst=84 -c)
> > > +    test $n_pkt -eq 1
> > > +])
> > > +
> > >  # Without this sleep, test case fails intermittently.
> > >  sleep 3
> > >
> > > -NS_CHECK_EXEC([sw0-p2], [tcpdump -n -c 2 -i sw0-p2 tcp port 80 > sw0-p2-ip6.pcap &], [0])
> > > +NS_CHECK_EXEC([sw0-p2-rej], [tcpdump -n -c 2 -i sw0-p2-rej tcp port 80 > sw0-p2-rej-ip6.pcap &], [0])
> > >
> > >  sleep 1
> > >
> > > -NS_CHECK_EXEC([sw0-p2], [nc -6 aef0::3 80], [1], [],
> > > +NS_CHECK_EXEC([sw0-p2-rej], [nc -6 aef0::3 80], [1], [],
> > >  [dnl
> > >  Ncat: Connection refused.
> > >  ])
> > >
> > >  OVS_WAIT_UNTIL([
> > > -    total=`cat sw0-p2-ip6.pcap |  wc -l`
> > > +    total=`cat sw0-p2-rej-ip6.pcap |  wc -l`
> > >      echo "total = $total"
> > >      test "${total}" = "2"
> > >  ])
> > > @@ -3936,3 +3962,105 @@ as
> > >  OVS_TRAFFIC_VSWITCHD_STOP(["/.*error receiving.*/d
> > >  /.*terminating with signal 15.*/d"])
> > >  AT_CLEANUP
> > > +
> > > +# Tests that when an established connection sends TCP reset,
> > > +# the conntrack entry is not in established state.
> > > +AT_SETUP([ovn -- conntrack TCP reset])
> > > +AT_KEYWORDS([conntrack])
> > > +ovn_start
> > > +
> > > +OVS_TRAFFIC_VSWITCHD_START()
> > > +ADD_BR([br-int])
> > > +
> > > +# Set external-ids in br-int needed for ovn-controller
> > > +ovs-vsctl \
> > > +        -- set Open_vSwitch . external-ids:system-id=hv1 \
> > > +        -- set Open_vSwitch . external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \
> > > +        -- set Open_vSwitch . external-ids:ovn-encap-type=geneve \
> > > +        -- set Open_vSwitch . external-ids:ovn-encap-ip=169.0.0.1 \
> > > +        -- set bridge br-int fail-mode=secure other-config:disable-in-band=true
> > > +
> > > +# Start ovn-controller
> > > +start_daemon ovn-controller
> > > +
> > > +ovn-nbctl ls-add sw0
> > > +
> > > +ovn-nbctl lsp-add sw0 rst-p1
> > > +ovn-nbctl lsp-set-addresses rst-p1 "50:54:00:00:00:03"
> > > +ovn-nbctl lsp-set-port-security rst-p1 "50:54:00:00:00:03"
> > > +
> > > +ovn-nbctl lsp-add sw0 rst-p2
> > > +ovn-nbctl lsp-set-addresses rst-p2 "50:54:00:00:00:04 10.0.0.4"
> > > +ovn-nbctl lsp-set-port-security rst-p2 "50:54:00:00:00:04 10.0.0.4"
> > > +
> > > +# Create port group and ACLs for sw0 ports.
> > > +ovn-nbctl pg-add pg0_drop rst-p1 rst-p2
> > > +ovn-nbctl acl-add pg0_drop from-lport 1001 "inport == @pg0_drop && ip" drop
> > > +ovn-nbctl acl-add pg0_drop to-lport 1001 "outport == @pg0_drop && ip" drop
> > > +
> > > +ovn-nbctl pg-add pg0 rst-p1 rst-p2
> > > +ovn-nbctl acl-add pg0 from-lport 1002 "inport == @pg0 && ip4" allow-related
> > > +ovn-nbctl acl-add pg0 to-lport 1002 "outport == @pg0 && ip4 && ip4.src == 0.0.0.0/0 && icmp4" allow-related
> > > +ovn-nbctl acl-add pg0 to-lport 1002 "outport == @pg0 && ip4 && ip4.src == 0.0.0.0/0 && tcp && tcp.dst == 80" allow-related
> > > +ovn-nbctl acl-add pg0 to-lport 1002 "outport == @pg0 && ip4 && ip4.src == 0.0.0.0/0 && udp && udp.dst == 80" allow-related
> > > +
> > > +# Create a logical router and attach to logical switch.
> > > +ovn-nbctl lr-add lr0
> > > +ovn-nbctl lrp-add lr0 lr0-sw0 00:00:00:00:ff:01 10.0.0.1/24
> > > +ovn-nbctl lsp-add sw0 sw0-lr0
> > > +ovn-nbctl lsp-set-type sw0-lr0 router
> > > +ovn-nbctl lsp-set-addresses sw0-lr0 router
> > > +ovn-nbctl lsp-set-options sw0-lr0 router-port=lr0-sw0
> > > +
> > > +ovn-nbctl lb-add lb1 10.0.0.10:80 10.0.0.3:80
> > > +ovn-nbctl --wait=sb ls-lb-add sw0 lb1
> > > +ovn-nbctl --wait=sb lr-lb-add lr0 lb1
> > > +
> > > +OVN_POPULATE_ARP
> > > +ovn-nbctl --wait=hv sync
> > > +
> > > +ADD_NAMESPACES(rst-p1)
> > > +ADD_VETH(rst-p1, rst-p1, br-int, "10.0.0.3/24", "50:54:00:00:00:03", \
> > > +         "10.0.0.1")
> > > +
> > > +ADD_NAMESPACES(rst-p2)
> > > +ADD_VETH(rst-p2, rst-p2, br-int, "10.0.0.4/24", "50:54:00:00:00:04", \
> > > +         "10.0.0.1")
> > > +
> > > +OVS_WAIT_UNTIL([test x$(ovn-nbctl lsp-get-up rst-p1) = xup])
> > > +OVS_WAIT_UNTIL([test x$(ovn-nbctl lsp-get-up rst-p2) = xup])
> > > +
> > > +# Start webservers in 'rst-p1'.
> > > +OVS_START_L7([rst-p1], [http])
> > > +
> > > +NS_CHECK_EXEC([rst-p2], [$PYTHON $srcdir/test-tcp-rst.py --dst-port 80 --dst-ip 10.0.0.10])
> > > +
> > > +# When tcp reset is sent, conntrack entry should be in the state - CLOSED or CLOSING.
> > > +# But there is a bug where tcp reset packet was not sent to the conntrack.
> > > +# This test case checks that the tcp reset packet is sent to conntrack
> > > +# and the state is not in established state.
> > > +AT_CHECK([
> > > +    ct_est_count=$(ovs-appctl dpctl/dump-conntrack | grep 10.0.0.10 | grep state=ESTABLISHED -c)
> > > +    test $ct_est_count -eq 0
> > > +
> > > +    ct_est_count=$(ovs-appctl dpctl/dump-conntrack | grep 10.0.0.10 | grep state=CLOS -c)
> > > +    test $ct_est_count -eq 1
> > > +])
> > > +
> > > +OVS_APP_EXIT_AND_WAIT([ovn-controller])
> > > +
> > > +as ovn-sb
> > > +OVS_APP_EXIT_AND_WAIT([ovsdb-server])
> > > +
> > > +as ovn-nb
> > > +OVS_APP_EXIT_AND_WAIT([ovsdb-server])
> > > +
> > > +as northd
> > > +OVS_APP_EXIT_AND_WAIT([ovn-northd])
> > > +
> > > +as
> > > +OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d
> > > +/connection dropped.*/d
> > > +/Service monitor not found.*/d"])
> > > +
> > > +AT_CLEANUP
> > > diff --git a/tests/test-tcp-rst.py b/tests/test-tcp-rst.py
> > > new file mode 100644
> > > index 000000000..6f96c5706
> > > --- /dev/null
> > > +++ b/tests/test-tcp-rst.py
> > > @@ -0,0 +1,37 @@
> > > +#!/usr/bin/env python3
> > > +# Copyright (c) 2020 Red Hat, Inc.
> > > +#
> > > +# Licensed under the Apache License, Version 2.0 (the "License");
> > > +# you may not use this file except in compliance with the License.
> > > +# You may obtain a copy of the License at:
> > > +#
> > > +#     http://www.apache.org/licenses/LICENSE-2.0
> > > +#
> > > +# Unless required by applicable law or agreed to in writing, software
> > > +# distributed under the License is distributed on an "AS IS" BASIS,
> > > +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> > > +# See the License for the specific language governing permissions and
> > > +# limitations under the License.
> > > +
> > > +# Simple python script which connects to tcp server and then
> > > +# resets the connection.
> > > +import argparse
> > > +import socket
> > > +import sys
> > > +import struct
> > > +import time
> > > +
> > > +parser = argparse.ArgumentParser(description='')
> > > +parser.add_argument("--src-port", type=int, default=11337, help="source port to use")
> > > +parser.add_argument("--dst-port", type=int, help="dst port to use")
> > > +parser.add_argument("--dst-ip", help="server ip to use")
> > > +args = parser.parse_args()
> > > +sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
> > > +server_address = (args.dst_ip, args.dst_port)
> > > +sock.bind(('0.0.0.0', args.src_port))
> > > +sock.connect(server_address)
> > > +l_onoff = 1
> > > +l_linger = 0
> > > +time.sleep(1)
> > > +sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack('ii', l_onoff, l_linger))
> > > +sock.close()
> > >
> >
> > _______________________________________________
> > dev mailing list
> > dev@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
Han Zhou April 29, 2020, 6:50 a.m. UTC | #4
On Mon, Apr 27, 2020 at 8:41 AM Numan Siddique <numans@ovn.org> wrote:
>
> On Mon, Apr 27, 2020 at 8:40 PM Lorenzo Bianconi
> <lorenzo.bianconi@redhat.com> wrote:
> >
> > >
> > > On 4/24/20 9:55 AM, numans@ovn.org wrote:
> > > > From: Numan Siddique <numans@ovn.org>
> > > >
> > > > The commit [1] - 28097d5adb95("Fix tcp_reset action handling")
fixed an issue
> > > > with tcp_reset OVN action. In order to fix that issue, this commit
added
> > > > logical flows to skip all the TCP RST packets from conntrack.
> > > > Ideally it should have skipped only the TCP RST packets generated by
> > > > ovn-controller from conntrack. Since all the TCP RST packets are
> > > > skipped from conntrack, the connections in conntrack remain in
> > > > ESTABLISHED state even if the client/server sends TCP RST to close
the
> > > > connection.  And these entries live for a long time and this is
> > > > causing performance issues as reported in the BZ.
> > > >
> > > > This patch reverts the logical flows added in [1] and modifies the
inner
> > > > actions of tcp_reset in the ingress logical switch pipeline
> > > > from - "tcp_reset { outport <-> inport; output; }"
> > > > to "tcp_reset { output <-> inport; next(pipeline=egress,table=5);
}".
> > > > This causes the packet to resubmit to the egress table
ls_out_qos_mark
> > > > skipping the egress ACL stage. Prior to this packet, next action was
> > > > not allowing a resubmit from ingress to egress pipeline. This patch
> > > > relaxes this limitation.
> > > >
> > > > For the tcp_reset action in the egress logical switch pipeline,
this patch
> > > > modifies the inner action
> > > > from - "tcp_reset { outport <-> inport;
next(pipeline=ingress,table=0); }"
> > > > to - "tcp_reset { outport <-> inport;
next(pipeline=ingress,table=19); }".
> > > > This causes the packet to enter the ingress table ls_in_l2_lkup.
> > > >
> > > > We don't see similar conntrack leaks with UDP. Although there is an
issue
> > > > with the acl reject action for UDP packets. When ovn-controller
generates icmp
> > > > destination unreachable packet, it doesn't get delivered. And the
IP checksum is
> > > > incorrect in this packet. A follow up patch will fix these issues.
> > > >
> > > > [1] - 28097d5adb95("Fix tcp_reset action handling")
> > > >
> > > > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1819785
> > > > Co-Authored-by: Tim Rozet <trozet@redhat.com>
> > > > Signed-off-by: Tim Rozet <trozet@redhat.com>
> > > > Signed-off-by: Numan Siddique <numans@ovn.org>
> >
> > Acked-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
> >
> >
> > >
> > > Hi Numan,
> > >
> > > Looks good to me. Only a question inline regarding options for
avoiding
> > > hardcoding table IDs. Except for that:
> > >
> > > Acked-by: Dumitru Ceara <dceara@redhat.com>
> > >
>
> Thanks Dumitru and Lorenzo for the reviews. I applied this patch to
master.

Thanks Numan. Shall this be backported to 20.03?

>
> > > Thanks,
> > > Dumitru
> > >
> > > > ---
> > > >  lib/actions.c           |   6 +-
> > > >  northd/ovn-northd.8.xml |   8 ++
> > > >  northd/ovn-northd.c     |  14 ++--
> > > >  ovn-sb.xml              |  10 ++-
> > > >  tests/automake.mk       |   3 +-
> > > >  tests/ovn.at            |   6 +-
> > > >  tests/system-ovn.at     | 170
+++++++++++++++++++++++++++++++++++-----
> > > >  tests/test-tcp-rst.py   |  37 +++++++++
> > > >  8 files changed, 216 insertions(+), 38 deletions(-)
> > > >  create mode 100644 tests/test-tcp-rst.py
> > > >
> > > > diff --git a/lib/actions.c b/lib/actions.c
> > > > index c19813de0..91619c889 100644
> > > > --- a/lib/actions.c
> > > > +++ b/lib/actions.c
> > > > @@ -319,11 +319,7 @@ parse_NEXT(struct action_context *ctx)
> > > >          }
> > > >      }
> > > >
> > > > -    if (pipeline == OVNACT_P_EGRESS && ctx->pp->pipeline ==
OVNACT_P_INGRESS) {
> > > > -        lexer_error(ctx->lexer,
> > > > -                    "\"next\" action cannot advance from ingress
to egress "
> > > > -                    "pipeline (use \"output\" action instead)");
> > > > -    } else if (table >= ctx->pp->n_tables) {
> > > > +    if (table >= ctx->pp->n_tables) {
> > > >          lexer_error(ctx->lexer,
> > > >                      "\"next\" action cannot advance beyond table
%d.",
> > > >                      ctx->pp->n_tables - 1);
> > > > diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
> > > > index efcc4b7fc..d39e259f6 100644
> > > > --- a/northd/ovn-northd.8.xml
> > > > +++ b/northd/ovn-northd.8.xml
> > > > @@ -373,6 +373,14 @@
> > > >          for new connections and <code>reg0[1] = 1; next;</code>
for existing
> > > >          connections.
> > > >        </li>
> > > > +      <li>
> > > > +        <code>reject</code> ACLs translate into logical
> > > > +        flows with the
> > > > +        <code>tcp_reset { output &lt;-&gt; inport;
> > > > +        next(pipeline=egress,table=5);}</code>
> > > > +        action for TCP connections and <code>icmp4/icmp6</code>
action
> > > > +        for UDP connections.
> > > > +      </li>
> > > >        <li>
> > > >          Other ACLs translate to <code>drop;</code> for new or
untracked
> > > >          connections and <code>ct_commit(ct_label=1/1);</code> for
known
> > > > diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> > > > index c4675bd68..f8ae155a2 100644
> > > > --- a/northd/ovn-northd.c
> > > > +++ b/northd/ovn-northd.c
> > > > @@ -4721,11 +4721,11 @@ build_pre_acls(struct ovn_datapath *od,
struct hmap *lflows)
> > > >           * unreachable packets. */
> > > >          ovn_lflow_add(lflows, od, S_SWITCH_IN_PRE_ACL, 110,
> > > >                        "nd || nd_rs || nd_ra || icmp4.type == 3 || "
> > > > -                      "icmp6.type == 1 || (tcp && tcp.flags == 20)
|| "
> > > > +                      "icmp6.type == 1 || "
> > > >                        "(udp && udp.src == 546 && udp.dst == 547)",
"next;");
> > > >          ovn_lflow_add(lflows, od, S_SWITCH_OUT_PRE_ACL, 110,
> > > >                        "nd || nd_rs || nd_ra || icmp4.type == 3 || "
> > > > -                      "icmp6.type == 1 || (tcp && tcp.flags == 20)
||"
> > > > +                      "icmp6.type == 1 || "
> > > >                        "(udp && udp.src == 546 && udp.dst == 547)",
"next;");
> > > >
> > > >          /* Ingress and Egress Pre-ACL Table (Priority 100).
> > > > @@ -4838,11 +4838,11 @@ build_pre_lb(struct ovn_datapath *od,
struct hmap *lflows,
> > > >      /* Do not send ND packets to conntrack */
> > > >      ovn_lflow_add(lflows, od, S_SWITCH_IN_PRE_LB, 110,
> > > >                    "nd || nd_rs || nd_ra || icmp4.type == 3 ||"
> > > > -                  "icmp6.type == 1 || (tcp && tcp.flags == 20)",
> > > > +                  "icmp6.type == 1",
> > > >                    "next;");
> > > >      ovn_lflow_add(lflows, od, S_SWITCH_OUT_PRE_LB, 110,
> > > >                    "nd || nd_rs || nd_ra || icmp4.type == 3 ||"
> > > > -                  "icmp6.type == 1 || (tcp && tcp.flags == 20)",
> > > > +                  "icmp6.type == 1",
> > > >                    "next;");
> > > >
> > > >      /* Do not send service monitor packets to conntrack. */
> > > > @@ -4988,7 +4988,8 @@ build_reject_acl_rules(struct ovn_datapath
*od, struct hmap *lflows,
> > > >      ds_put_format(&actions, "reg0 = 0; "
> > > >                    "eth.dst <-> eth.src; ip4.dst <-> ip4.src; "
> > > >                    "tcp_reset { outport <-> inport; %s };",
> > > > -                  ingress ? "output;" :
"next(pipeline=ingress,table=0);");
> > > > +                  ingress ? "next(pipeline=egress,table=5);"
> > > > +                          : "next(pipeline=ingress,table=19);");
> > >
> > > Do you think there's a way we could avoid hardcoding the table
numbers?
> > > I'm thinking of the case when a stage is added/deleted and the
> > > ls_out_qos_mark/"ls_in_l2_lkup" end up having different table ids. I
> > > don't think a macro definition is enough as that still needs updating
> > > every time a table is added/deleted.
>
> I agree.  Hardcoding table numbers are prone to regressions later in this
> case. As I discussed offline, I think it's better to have a macro defined
> for each table whose value gets generated.
>
> I'll work on a follow up patch to have this.
>
> Thanks
> Numan
>
> > >
> > > >      ovn_lflow_add_with_hint(lflows, od, stage,
> > > >                              acl->priority + OVN_ACL_PRI_OFFSET +
10,
> > > >                              ds_cstr(&match), ds_cstr(&actions),
stage_hint);
> > > > @@ -5002,7 +5003,8 @@ build_reject_acl_rules(struct ovn_datapath
*od, struct hmap *lflows,
> > > >      ds_put_format(&actions, "reg0 = 0; "
> > > >                    "eth.dst <-> eth.src; ip6.dst <-> ip6.src; "
> > > >                    "tcp_reset { outport <-> inport; %s };",
> > > > -                  ingress ? "output;" :
"next(pipeline=ingress,table=0);");
> > > > +                  ingress ? "next(pipeline=egress,table=5);"
> > > > +                          : "next(pipeline=ingress,table=19);");
> > >
> > > Same question as above.
> > >
> > > >      ovn_lflow_add_with_hint(lflows, od, stage,
> > > >                              acl->priority + OVN_ACL_PRI_OFFSET +
10,
> > > >                              ds_cstr(&match), ds_cstr(&actions),
stage_hint);
> > > > diff --git a/ovn-sb.xml b/ovn-sb.xml
> > > > index 5f8da534c..3aa7cd4da 100644
> > > > --- a/ovn-sb.xml
> > > > +++ b/ovn-sb.xml
> > > > @@ -1112,10 +1112,12 @@
> > > >            <var>pipeline</var> as a subroutine.  The default
<var>table</var> is
> > > >            just after the current one.  If <var>pipeline</var> is
specified, it
> > > >            may be <code>ingress</code> or <code>egress</code>; the
default
> > > > -          <var>pipeline</var> is the one currently executing.
Actions in the
> > > > -          ingress pipeline may not use <code>next</code> to jump
into the
> > > > -          egress pipeline (use the <code>output</code> instead),
but
> > > > -          transitions in the opposite direction are allowed.
> > > > +          <var>pipeline</var> is the one currently executing.
Actions in the
> > > > +          both ingress and egress pipeline can use
<code>next</code> to jump
> > > > +          across the other pipeline.  Actions in the ingress
pipeline should
> > > > +          use <code>next</code> to jump into the specific table of
egress
> > > > +          pipeline only if it is certain that the packets are
local and not
> > > > +          tunnelled and wants to skip certain stages in the packet
processing.
> > > >          </dd>
> > > >
> > > >          <dt><code><var>field</var> =
<var>constant</var>;</code></dt>
> > > > diff --git a/tests/automake.mk b/tests/automake.mk
> > > > index 215fb432b..ed530dd77 100644
> > > > --- a/tests/automake.mk
> > > > +++ b/tests/automake.mk
> > > > @@ -205,7 +205,8 @@ tests_ovstest_LDADD = $(OVS_LIBDIR)/
libopenvswitch.la lib/libovn.la
> > > >  # Python tests.
> > > >  CHECK_PYFILES = \
> > > >       tests/test-l7.py \
> > > > -     tests/uuidfilt.py
> > > > +     tests/uuidfilt.py \
> > > > +     tests/test-tcp-rst.py
> > > >
> > > >  EXTRA_DIST += $(CHECK_PYFILES)
> > > >  PYCOV_CLEAN_FILES += $(CHECK_PYFILES:.py=.py,cover) .coverage
> > > > diff --git a/tests/ovn.at b/tests/ovn.at
> > > > index 2a7ee7528..b00670816 100644
> > > > --- a/tests/ovn.at
> > > > +++ b/tests/ovn.at
> > > > @@ -852,7 +852,11 @@ next(pipeline=ingress, table=11);
> > > >      encodes as resubmit(,19)
> > > >
> > > >  next(pipeline=egress);
> > > > -    "next" action cannot advance from ingress to egress pipeline
(use "output" action instead)
> > > > +    formats as next(pipeline=egress, table=11);
> > > > +    encodes as resubmit(,51)
> > > > +
> > > > +next(pipeline=egress, table=5);
> > > > +    encodes as resubmit(,45)
> > > >
> > > >  next(table=10);
> > > >      formats as next(10);
> > > > diff --git a/tests/system-ovn.at b/tests/system-ovn.at
> > > > index bdb9768d2..3b11cf92b 100644
> > > > --- a/tests/system-ovn.at
> > > > +++ b/tests/system-ovn.at
> > > > @@ -3719,60 +3719,86 @@ start_daemon ovn-controller
> > > >
> > > >  ovn-nbctl ls-add sw0
> > > >
> > > > -ovn-nbctl lsp-add sw0 sw0-p1
> > > > -ovn-nbctl lsp-set-addresses sw0-p1 "50:54:00:00:00:03 10.0.0.3
aef0::3"
> > > > -ovn-nbctl lsp-set-port-security sw0-p1 "50:54:00:00:00:03 10.0.0.3
aef0::3"
> > > > +ovn-nbctl lsp-add sw0 sw0-p1-rej
> > > > +ovn-nbctl lsp-set-addresses sw0-p1-rej "50:54:00:00:00:03 10.0.0.3
aef0::3"
> > > > +ovn-nbctl lsp-set-port-security sw0-p1-rej "50:54:00:00:00:03
10.0.0.3 aef0::3"
> > > >
> > > > -ovn-nbctl lsp-add sw0 sw0-p2
> > > > -ovn-nbctl lsp-set-addresses sw0-p2 "50:54:00:00:00:04 10.0.0.4
aef0::4"
> > > > -ovn-nbctl lsp-set-port-security sw0-p2 "50:54:00:00:00:04 10.0.0.4
aef0::4"
> > > > +ovn-nbctl lsp-add sw0 sw0-p2-rej
> > > > +ovn-nbctl lsp-set-addresses sw0-p2-rej "50:54:00:00:00:04 10.0.0.4
aef0::4"
> > > > +ovn-nbctl lsp-set-port-security sw0-p2-rej "50:54:00:00:00:04
10.0.0.4 aef0::4"
> > > > +
> > > > +#ovn-nbctl --log acl-add sw0 from-lport 1000 "inport == \"sw0-p1\"
&& tcp && tcp.dst == 80" reject
> > > > +#ovn-nbctl --log acl-add sw0 from-lport 1000 "inport == \"sw0-p2\"
&& ip6 && tcp && tcp.dst == 80" reject
> > > > +
> > > > +# Create port group and ACLs for sw0 ports.
> > > > +ovn-nbctl pg-add pg0_drop sw0-p1-rej sw0-p2-rej
> > > > +ovn-nbctl acl-add pg0_drop from-lport 1001 "inport == @pg0_drop &&
ip" drop
> > > > +ovn-nbctl acl-add pg0_drop to-lport 1001 "outport == @pg0_drop &&
ip" drop
> > > > +
> > > > +ovn-nbctl pg-add pg0 sw0-p1-rej sw0-p2-rej
> > > > +ovn-nbctl acl-add pg0 from-lport 1002 "inport == @pg0 && ip4"
allow-related
> > > > +ovn-nbctl --log acl-add pg0 from-lport 1004 "inport == @pg0 && ip
&& tcp && tcp.dst == 80" reject
> > > >
> > > > -ovn-nbctl --log acl-add sw0 from-lport 1000 "inport == \"sw0-p1\"
&& tcp && tcp.dst == 80" reject
> > > > -ovn-nbctl --log acl-add sw0 from-lport 1000 "inport == \"sw0-p2\"
&& ip6 && tcp && tcp.dst == 80" reject
> > > > +ovn-nbctl acl-add pg0 to-lport 1002 "outport == @pg0 && ip4 &&
ip4.src == 0.0.0.0/0 && icmp4" allow-related
> > > > +ovn-nbctl acl-add pg0 to-lport 1002 "outport == @pg0 && ip4 &&
ip4.src == 0.0.0.0/0 && tcp && tcp.dst == 82" allow-related
> > > > +ovn-nbctl acl-add pg0 to-lport 1002 "outport == @pg0 && ip4 &&
ip4.src == 0.0.0.0/0 && udp && udp.dst == 82" allow-related
> > > > +ovn-nbctl --log acl-add pg0 to-lport 1004 "inport == @pg0 && ip &&
tcp && tcp.dst == 84" reject
> > > >
> > > >  OVN_POPULATE_ARP
> > > >  ovn-nbctl --wait=hv sync
> > > >
> > > > -ADD_NAMESPACES(sw0-p1)
> > > > -ADD_VETH(sw0-p1, sw0-p1, br-int, "10.0.0.3/24",
"50:54:00:00:00:03", \
> > > > +ADD_NAMESPACES(sw0-p1-rej)
> > > > +ADD_VETH(sw0-p1-rej, sw0-p1-rej, br-int, "10.0.0.3/24",
"50:54:00:00:00:03", \
> > > >           "10.0.0.1")
> > > >
> > > > -ADD_NAMESPACES(sw0-p2)
> > > > -ADD_VETH(sw0-p2, sw0-p2, br-int, "10.0.0.4/24",
"50:54:00:00:00:04", \
> > > > +ADD_NAMESPACES(sw0-p2-rej)
> > > > +ADD_VETH(sw0-p2-rej, sw0-p2-rej, br-int, "10.0.0.4/24",
"50:54:00:00:00:04", \
> > > >           "10.0.0.1")
> > > >
> > > > -NS_CHECK_EXEC([sw0-p1], [ip a a aef0::3/64 dev sw0-p1], [0])
> > > > -NS_CHECK_EXEC([sw0-p2], [ip a a aef0::4/64 dev sw0-p2], [0])
> > > > +NS_CHECK_EXEC([sw0-p1-rej], [ip a a aef0::3/64 dev sw0-p1-rej],
[0])
> > > > +NS_CHECK_EXEC([sw0-p2-rej], [ip a a aef0::4/64 dev sw0-p2-rej],
[0])
> > > >
> > > > -# Capture packets in sw0-p1.
> > > > -NS_CHECK_EXEC([sw0-p1], [tcpdump -n -c 2 -i sw0-p1 tcp port 80 >
sw0-p1-ip4.pcap &], [0])
> > > > +# Capture packets in sw0-p1-rej.
> > > > +NS_CHECK_EXEC([sw0-p1-rej], [tcpdump -n -c 2 -i sw0-p1-rej tcp
port 80 > sw0-p1-rej-ip4.pcap &], [0])
> > > >  sleep 1
> > > >
> > > > -NS_CHECK_EXEC([sw0-p1], [nc 10.0.0.4 80], [1], [],
> > > > +NS_CHECK_EXEC([sw0-p1-rej], [nc 10.0.0.4 80], [1], [],
> > > >  [dnl
> > > >  Ncat: Connection refused.
> > > >  ])
> > > >
> > > >  OVS_WAIT_UNTIL([
> > > > -    total=`cat sw0-p1-ip4.pcap |  wc -l`
> > > > +    total=`cat sw0-p1-rej-ip4.pcap |  wc -l`
> > > >      echo "total = $total"
> > > >      test "${total}" = "2"
> > > >  ])
> > > >
> > > > +# Now send traffic to port 84
> > > > +NS_CHECK_EXEC([sw0-p1-rej], [nc 10.0.0.4 84], [1], [],
> > > > +[dnl
> > > > +Ncat: Connection refused.
> > > > +])
> > > > +
> > > > +AT_CHECK([
> > > > +    n_pkt=$(ovs-ofctl dump-flows br-int table=44 | grep -v
n_packets=0 | \
> > > > +grep controller | grep tp_dst=84 -c)
> > > > +    test $n_pkt -eq 1
> > > > +])
> > > > +
> > > >  # Without this sleep, test case fails intermittently.
> > > >  sleep 3
> > > >
> > > > -NS_CHECK_EXEC([sw0-p2], [tcpdump -n -c 2 -i sw0-p2 tcp port 80 >
sw0-p2-ip6.pcap &], [0])
> > > > +NS_CHECK_EXEC([sw0-p2-rej], [tcpdump -n -c 2 -i sw0-p2-rej tcp
port 80 > sw0-p2-rej-ip6.pcap &], [0])
> > > >
> > > >  sleep 1
> > > >
> > > > -NS_CHECK_EXEC([sw0-p2], [nc -6 aef0::3 80], [1], [],
> > > > +NS_CHECK_EXEC([sw0-p2-rej], [nc -6 aef0::3 80], [1], [],
> > > >  [dnl
> > > >  Ncat: Connection refused.
> > > >  ])
> > > >
> > > >  OVS_WAIT_UNTIL([
> > > > -    total=`cat sw0-p2-ip6.pcap |  wc -l`
> > > > +    total=`cat sw0-p2-rej-ip6.pcap |  wc -l`
> > > >      echo "total = $total"
> > > >      test "${total}" = "2"
> > > >  ])
> > > > @@ -3936,3 +3962,105 @@ as
> > > >  OVS_TRAFFIC_VSWITCHD_STOP(["/.*error receiving.*/d
> > > >  /.*terminating with signal 15.*/d"])
> > > >  AT_CLEANUP
> > > > +
> > > > +# Tests that when an established connection sends TCP reset,
> > > > +# the conntrack entry is not in established state.
> > > > +AT_SETUP([ovn -- conntrack TCP reset])
> > > > +AT_KEYWORDS([conntrack])
> > > > +ovn_start
> > > > +
> > > > +OVS_TRAFFIC_VSWITCHD_START()
> > > > +ADD_BR([br-int])
> > > > +
> > > > +# Set external-ids in br-int needed for ovn-controller
> > > > +ovs-vsctl \
> > > > +        -- set Open_vSwitch . external-ids:system-id=hv1 \
> > > > +        -- set Open_vSwitch .
external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \
> > > > +        -- set Open_vSwitch . external-ids:ovn-encap-type=geneve \
> > > > +        -- set Open_vSwitch . external-ids:ovn-encap-ip=169.0.0.1 \
> > > > +        -- set bridge br-int fail-mode=secure
other-config:disable-in-band=true
> > > > +
> > > > +# Start ovn-controller
> > > > +start_daemon ovn-controller
> > > > +
> > > > +ovn-nbctl ls-add sw0
> > > > +
> > > > +ovn-nbctl lsp-add sw0 rst-p1
> > > > +ovn-nbctl lsp-set-addresses rst-p1 "50:54:00:00:00:03"
> > > > +ovn-nbctl lsp-set-port-security rst-p1 "50:54:00:00:00:03"
> > > > +
> > > > +ovn-nbctl lsp-add sw0 rst-p2
> > > > +ovn-nbctl lsp-set-addresses rst-p2 "50:54:00:00:00:04 10.0.0.4"
> > > > +ovn-nbctl lsp-set-port-security rst-p2 "50:54:00:00:00:04 10.0.0.4"
> > > > +
> > > > +# Create port group and ACLs for sw0 ports.
> > > > +ovn-nbctl pg-add pg0_drop rst-p1 rst-p2
> > > > +ovn-nbctl acl-add pg0_drop from-lport 1001 "inport == @pg0_drop &&
ip" drop
> > > > +ovn-nbctl acl-add pg0_drop to-lport 1001 "outport == @pg0_drop &&
ip" drop
> > > > +
> > > > +ovn-nbctl pg-add pg0 rst-p1 rst-p2
> > > > +ovn-nbctl acl-add pg0 from-lport 1002 "inport == @pg0 && ip4"
allow-related
> > > > +ovn-nbctl acl-add pg0 to-lport 1002 "outport == @pg0 && ip4 &&
ip4.src == 0.0.0.0/0 && icmp4" allow-related
> > > > +ovn-nbctl acl-add pg0 to-lport 1002 "outport == @pg0 && ip4 &&
ip4.src == 0.0.0.0/0 && tcp && tcp.dst == 80" allow-related
> > > > +ovn-nbctl acl-add pg0 to-lport 1002 "outport == @pg0 && ip4 &&
ip4.src == 0.0.0.0/0 && udp && udp.dst == 80" allow-related
> > > > +
> > > > +# Create a logical router and attach to logical switch.
> > > > +ovn-nbctl lr-add lr0
> > > > +ovn-nbctl lrp-add lr0 lr0-sw0 00:00:00:00:ff:01 10.0.0.1/24
> > > > +ovn-nbctl lsp-add sw0 sw0-lr0
> > > > +ovn-nbctl lsp-set-type sw0-lr0 router
> > > > +ovn-nbctl lsp-set-addresses sw0-lr0 router
> > > > +ovn-nbctl lsp-set-options sw0-lr0 router-port=lr0-sw0
> > > > +
> > > > +ovn-nbctl lb-add lb1 10.0.0.10:80 10.0.0.3:80
> > > > +ovn-nbctl --wait=sb ls-lb-add sw0 lb1
> > > > +ovn-nbctl --wait=sb lr-lb-add lr0 lb1
> > > > +
> > > > +OVN_POPULATE_ARP
> > > > +ovn-nbctl --wait=hv sync
> > > > +
> > > > +ADD_NAMESPACES(rst-p1)
> > > > +ADD_VETH(rst-p1, rst-p1, br-int, "10.0.0.3/24",
"50:54:00:00:00:03", \
> > > > +         "10.0.0.1")
> > > > +
> > > > +ADD_NAMESPACES(rst-p2)
> > > > +ADD_VETH(rst-p2, rst-p2, br-int, "10.0.0.4/24",
"50:54:00:00:00:04", \
> > > > +         "10.0.0.1")
> > > > +
> > > > +OVS_WAIT_UNTIL([test x$(ovn-nbctl lsp-get-up rst-p1) = xup])
> > > > +OVS_WAIT_UNTIL([test x$(ovn-nbctl lsp-get-up rst-p2) = xup])
> > > > +
> > > > +# Start webservers in 'rst-p1'.
> > > > +OVS_START_L7([rst-p1], [http])
> > > > +
> > > > +NS_CHECK_EXEC([rst-p2], [$PYTHON $srcdir/test-tcp-rst.py
--dst-port 80 --dst-ip 10.0.0.10])
> > > > +
> > > > +# When tcp reset is sent, conntrack entry should be in the state -
CLOSED or CLOSING.
> > > > +# But there is a bug where tcp reset packet was not sent to the
conntrack.
> > > > +# This test case checks that the tcp reset packet is sent to
conntrack
> > > > +# and the state is not in established state.
> > > > +AT_CHECK([
> > > > +    ct_est_count=$(ovs-appctl dpctl/dump-conntrack | grep
10.0.0.10 | grep state=ESTABLISHED -c)
> > > > +    test $ct_est_count -eq 0
> > > > +
> > > > +    ct_est_count=$(ovs-appctl dpctl/dump-conntrack | grep
10.0.0.10 | grep state=CLOS -c)
> > > > +    test $ct_est_count -eq 1
> > > > +])
> > > > +
> > > > +OVS_APP_EXIT_AND_WAIT([ovn-controller])
> > > > +
> > > > +as ovn-sb
> > > > +OVS_APP_EXIT_AND_WAIT([ovsdb-server])
> > > > +
> > > > +as ovn-nb
> > > > +OVS_APP_EXIT_AND_WAIT([ovsdb-server])
> > > > +
> > > > +as northd
> > > > +OVS_APP_EXIT_AND_WAIT([ovn-northd])
> > > > +
> > > > +as
> > > > +OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d
> > > > +/connection dropped.*/d
> > > > +/Service monitor not found.*/d"])
> > > > +
> > > > +AT_CLEANUP
> > > > diff --git a/tests/test-tcp-rst.py b/tests/test-tcp-rst.py
> > > > new file mode 100644
> > > > index 000000000..6f96c5706
> > > > --- /dev/null
> > > > +++ b/tests/test-tcp-rst.py
> > > > @@ -0,0 +1,37 @@
> > > > +#!/usr/bin/env python3
> > > > +# Copyright (c) 2020 Red Hat, Inc.
> > > > +#
> > > > +# Licensed under the Apache License, Version 2.0 (the "License");
> > > > +# you may not use this file except in compliance with the License.
> > > > +# You may obtain a copy of the License at:
> > > > +#
> > > > +#     http://www.apache.org/licenses/LICENSE-2.0
> > > > +#
> > > > +# Unless required by applicable law or agreed to in writing,
software
> > > > +# distributed under the License is distributed on an "AS IS" BASIS,
> > > > +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied.
> > > > +# See the License for the specific language governing permissions
and
> > > > +# limitations under the License.
> > > > +
> > > > +# Simple python script which connects to tcp server and then
> > > > +# resets the connection.
> > > > +import argparse
> > > > +import socket
> > > > +import sys
> > > > +import struct
> > > > +import time
> > > > +
> > > > +parser = argparse.ArgumentParser(description='')
> > > > +parser.add_argument("--src-port", type=int, default=11337,
help="source port to use")
> > > > +parser.add_argument("--dst-port", type=int, help="dst port to use")
> > > > +parser.add_argument("--dst-ip", help="server ip to use")
> > > > +args = parser.parse_args()
> > > > +sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
> > > > +server_address = (args.dst_ip, args.dst_port)
> > > > +sock.bind(('0.0.0.0', args.src_port))
> > > > +sock.connect(server_address)
> > > > +l_onoff = 1
> > > > +l_linger = 0
> > > > +time.sleep(1)
> > > > +sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
struct.pack('ii', l_onoff, l_linger))
> > > > +sock.close()
> > > >
> > >
> > > _______________________________________________
> > > dev mailing list
> > > dev@openvswitch.org
> > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> > >
> >
> > _______________________________________________
> > dev mailing list
> > dev@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Numan Siddique April 29, 2020, 7:54 a.m. UTC | #5
On Wed, Apr 29, 2020 at 12:21 PM Han Zhou <hzhou@ovn.org> wrote:

> On Mon, Apr 27, 2020 at 8:41 AM Numan Siddique <numans@ovn.org> wrote:
> >
> > On Mon, Apr 27, 2020 at 8:40 PM Lorenzo Bianconi
> > <lorenzo.bianconi@redhat.com> wrote:
> > >
> > > >
> > > > On 4/24/20 9:55 AM, numans@ovn.org wrote:
> > > > > From: Numan Siddique <numans@ovn.org>
> > > > >
> > > > > The commit [1] - 28097d5adb95("Fix tcp_reset action handling")
> fixed an issue
> > > > > with tcp_reset OVN action. In order to fix that issue, this commit
> added
> > > > > logical flows to skip all the TCP RST packets from conntrack.
> > > > > Ideally it should have skipped only the TCP RST packets generated
> by
> > > > > ovn-controller from conntrack. Since all the TCP RST packets are
> > > > > skipped from conntrack, the connections in conntrack remain in
> > > > > ESTABLISHED state even if the client/server sends TCP RST to close
> the
> > > > > connection.  And these entries live for a long time and this is
> > > > > causing performance issues as reported in the BZ.
> > > > >
> > > > > This patch reverts the logical flows added in [1] and modifies the
> inner
> > > > > actions of tcp_reset in the ingress logical switch pipeline
> > > > > from - "tcp_reset { outport <-> inport; output; }"
> > > > > to "tcp_reset { output <-> inport; next(pipeline=egress,table=5);
> }".
> > > > > This causes the packet to resubmit to the egress table
> ls_out_qos_mark
> > > > > skipping the egress ACL stage. Prior to this packet, next action
> was
> > > > > not allowing a resubmit from ingress to egress pipeline. This patch
> > > > > relaxes this limitation.
> > > > >
> > > > > For the tcp_reset action in the egress logical switch pipeline,
> this patch
> > > > > modifies the inner action
> > > > > from - "tcp_reset { outport <-> inport;
> next(pipeline=ingress,table=0); }"
> > > > > to - "tcp_reset { outport <-> inport;
> next(pipeline=ingress,table=19); }".
> > > > > This causes the packet to enter the ingress table ls_in_l2_lkup.
> > > > >
> > > > > We don't see similar conntrack leaks with UDP. Although there is an
> issue
> > > > > with the acl reject action for UDP packets. When ovn-controller
> generates icmp
> > > > > destination unreachable packet, it doesn't get delivered. And the
> IP checksum is
> > > > > incorrect in this packet. A follow up patch will fix these issues.
> > > > >
> > > > > [1] - 28097d5adb95("Fix tcp_reset action handling")
> > > > >
> > > > > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1819785
> > > > > Co-Authored-by: Tim Rozet <trozet@redhat.com>
> > > > > Signed-off-by: Tim Rozet <trozet@redhat.com>
> > > > > Signed-off-by: Numan Siddique <numans@ovn.org>
> > >
> > > Acked-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
> > >
> > >
> > > >
> > > > Hi Numan,
> > > >
> > > > Looks good to me. Only a question inline regarding options for
> avoiding
> > > > hardcoding table IDs. Except for that:
> > > >
> > > > Acked-by: Dumitru Ceara <dceara@redhat.com>
> > > >
> >
> > Thanks Dumitru and Lorenzo for the reviews. I applied this patch to
> master.
>
> Thanks Numan. Shall this be backported to 20.03?
>

I forgot to backport to 20.03 earlier when I applied to master. I
backported it yesterday. Forgot to  update here.

Thanks
Numan.


> >
> > > > Thanks,
> > > > Dumitru
> > > >
> > > > > ---
> > > > >  lib/actions.c           |   6 +-
> > > > >  northd/ovn-northd.8.xml |   8 ++
> > > > >  northd/ovn-northd.c     |  14 ++--
> > > > >  ovn-sb.xml              |  10 ++-
> > > > >  tests/automake.mk       |   3 +-
> > > > >  tests/ovn.at            |   6 +-
> > > > >  tests/system-ovn.at     | 170
> +++++++++++++++++++++++++++++++++++-----
> > > > >  tests/test-tcp-rst.py   |  37 +++++++++
> > > > >  8 files changed, 216 insertions(+), 38 deletions(-)
> > > > >  create mode 100644 tests/test-tcp-rst.py
> > > > >
> > > > > diff --git a/lib/actions.c b/lib/actions.c
> > > > > index c19813de0..91619c889 100644
> > > > > --- a/lib/actions.c
> > > > > +++ b/lib/actions.c
> > > > > @@ -319,11 +319,7 @@ parse_NEXT(struct action_context *ctx)
> > > > >          }
> > > > >      }
> > > > >
> > > > > -    if (pipeline == OVNACT_P_EGRESS && ctx->pp->pipeline ==
> OVNACT_P_INGRESS) {
> > > > > -        lexer_error(ctx->lexer,
> > > > > -                    "\"next\" action cannot advance from ingress
> to egress "
> > > > > -                    "pipeline (use \"output\" action instead)");
> > > > > -    } else if (table >= ctx->pp->n_tables) {
> > > > > +    if (table >= ctx->pp->n_tables) {
> > > > >          lexer_error(ctx->lexer,
> > > > >                      "\"next\" action cannot advance beyond table
> %d.",
> > > > >                      ctx->pp->n_tables - 1);
> > > > > diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
> > > > > index efcc4b7fc..d39e259f6 100644
> > > > > --- a/northd/ovn-northd.8.xml
> > > > > +++ b/northd/ovn-northd.8.xml
> > > > > @@ -373,6 +373,14 @@
> > > > >          for new connections and <code>reg0[1] = 1; next;</code>
> for existing
> > > > >          connections.
> > > > >        </li>
> > > > > +      <li>
> > > > > +        <code>reject</code> ACLs translate into logical
> > > > > +        flows with the
> > > > > +        <code>tcp_reset { output &lt;-&gt; inport;
> > > > > +        next(pipeline=egress,table=5);}</code>
> > > > > +        action for TCP connections and <code>icmp4/icmp6</code>
> action
> > > > > +        for UDP connections.
> > > > > +      </li>
> > > > >        <li>
> > > > >          Other ACLs translate to <code>drop;</code> for new or
> untracked
> > > > >          connections and <code>ct_commit(ct_label=1/1);</code> for
> known
> > > > > diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> > > > > index c4675bd68..f8ae155a2 100644
> > > > > --- a/northd/ovn-northd.c
> > > > > +++ b/northd/ovn-northd.c
> > > > > @@ -4721,11 +4721,11 @@ build_pre_acls(struct ovn_datapath *od,
> struct hmap *lflows)
> > > > >           * unreachable packets. */
> > > > >          ovn_lflow_add(lflows, od, S_SWITCH_IN_PRE_ACL, 110,
> > > > >                        "nd || nd_rs || nd_ra || icmp4.type == 3 ||
> "
> > > > > -                      "icmp6.type == 1 || (tcp && tcp.flags == 20)
> || "
> > > > > +                      "icmp6.type == 1 || "
> > > > >                        "(udp && udp.src == 546 && udp.dst == 547)",
> "next;");
> > > > >          ovn_lflow_add(lflows, od, S_SWITCH_OUT_PRE_ACL, 110,
> > > > >                        "nd || nd_rs || nd_ra || icmp4.type == 3 ||
> "
> > > > > -                      "icmp6.type == 1 || (tcp && tcp.flags == 20)
> ||"
> > > > > +                      "icmp6.type == 1 || "
> > > > >                        "(udp && udp.src == 546 && udp.dst == 547)",
> "next;");
> > > > >
> > > > >          /* Ingress and Egress Pre-ACL Table (Priority 100).
> > > > > @@ -4838,11 +4838,11 @@ build_pre_lb(struct ovn_datapath *od,
> struct hmap *lflows,
> > > > >      /* Do not send ND packets to conntrack */
> > > > >      ovn_lflow_add(lflows, od, S_SWITCH_IN_PRE_LB, 110,
> > > > >                    "nd || nd_rs || nd_ra || icmp4.type == 3 ||"
> > > > > -                  "icmp6.type == 1 || (tcp && tcp.flags == 20)",
> > > > > +                  "icmp6.type == 1",
> > > > >                    "next;");
> > > > >      ovn_lflow_add(lflows, od, S_SWITCH_OUT_PRE_LB, 110,
> > > > >                    "nd || nd_rs || nd_ra || icmp4.type == 3 ||"
> > > > > -                  "icmp6.type == 1 || (tcp && tcp.flags == 20)",
> > > > > +                  "icmp6.type == 1",
> > > > >                    "next;");
> > > > >
> > > > >      /* Do not send service monitor packets to conntrack. */
> > > > > @@ -4988,7 +4988,8 @@ build_reject_acl_rules(struct ovn_datapath
> *od, struct hmap *lflows,
> > > > >      ds_put_format(&actions, "reg0 = 0; "
> > > > >                    "eth.dst <-> eth.src; ip4.dst <-> ip4.src; "
> > > > >                    "tcp_reset { outport <-> inport; %s };",
> > > > > -                  ingress ? "output;" :
> "next(pipeline=ingress,table=0);");
> > > > > +                  ingress ? "next(pipeline=egress,table=5);"
> > > > > +                          : "next(pipeline=ingress,table=19);");
> > > >
> > > > Do you think there's a way we could avoid hardcoding the table
> numbers?
> > > > I'm thinking of the case when a stage is added/deleted and the
> > > > ls_out_qos_mark/"ls_in_l2_lkup" end up having different table ids. I
> > > > don't think a macro definition is enough as that still needs updating
> > > > every time a table is added/deleted.
> >
> > I agree.  Hardcoding table numbers are prone to regressions later in this
> > case. As I discussed offline, I think it's better to have a macro defined
> > for each table whose value gets generated.
> >
> > I'll work on a follow up patch to have this.
> >
> > Thanks
> > Numan
> >
> > > >
> > > > >      ovn_lflow_add_with_hint(lflows, od, stage,
> > > > >                              acl->priority + OVN_ACL_PRI_OFFSET +
> 10,
> > > > >                              ds_cstr(&match), ds_cstr(&actions),
> stage_hint);
> > > > > @@ -5002,7 +5003,8 @@ build_reject_acl_rules(struct ovn_datapath
> *od, struct hmap *lflows,
> > > > >      ds_put_format(&actions, "reg0 = 0; "
> > > > >                    "eth.dst <-> eth.src; ip6.dst <-> ip6.src; "
> > > > >                    "tcp_reset { outport <-> inport; %s };",
> > > > > -                  ingress ? "output;" :
> "next(pipeline=ingress,table=0);");
> > > > > +                  ingress ? "next(pipeline=egress,table=5);"
> > > > > +                          : "next(pipeline=ingress,table=19);");
> > > >
> > > > Same question as above.
> > > >
> > > > >      ovn_lflow_add_with_hint(lflows, od, stage,
> > > > >                              acl->priority + OVN_ACL_PRI_OFFSET +
> 10,
> > > > >                              ds_cstr(&match), ds_cstr(&actions),
> stage_hint);
> > > > > diff --git a/ovn-sb.xml b/ovn-sb.xml
> > > > > index 5f8da534c..3aa7cd4da 100644
> > > > > --- a/ovn-sb.xml
> > > > > +++ b/ovn-sb.xml
> > > > > @@ -1112,10 +1112,12 @@
> > > > >            <var>pipeline</var> as a subroutine.  The default
> <var>table</var> is
> > > > >            just after the current one.  If <var>pipeline</var> is
> specified, it
> > > > >            may be <code>ingress</code> or <code>egress</code>; the
> default
> > > > > -          <var>pipeline</var> is the one currently executing.
> Actions in the
> > > > > -          ingress pipeline may not use <code>next</code> to jump
> into the
> > > > > -          egress pipeline (use the <code>output</code> instead),
> but
> > > > > -          transitions in the opposite direction are allowed.
> > > > > +          <var>pipeline</var> is the one currently executing.
> Actions in the
> > > > > +          both ingress and egress pipeline can use
> <code>next</code> to jump
> > > > > +          across the other pipeline.  Actions in the ingress
> pipeline should
> > > > > +          use <code>next</code> to jump into the specific table of
> egress
> > > > > +          pipeline only if it is certain that the packets are
> local and not
> > > > > +          tunnelled and wants to skip certain stages in the packet
> processing.
> > > > >          </dd>
> > > > >
> > > > >          <dt><code><var>field</var> =
> <var>constant</var>;</code></dt>
> > > > > diff --git a/tests/automake.mk b/tests/automake.mk
> > > > > index 215fb432b..ed530dd77 100644
> > > > > --- a/tests/automake.mk
> > > > > +++ b/tests/automake.mk
> > > > > @@ -205,7 +205,8 @@ tests_ovstest_LDADD = $(OVS_LIBDIR)/
> libopenvswitch.la lib/libovn.la
> > > > >  # Python tests.
> > > > >  CHECK_PYFILES = \
> > > > >       tests/test-l7.py \
> > > > > -     tests/uuidfilt.py
> > > > > +     tests/uuidfilt.py \
> > > > > +     tests/test-tcp-rst.py
> > > > >
> > > > >  EXTRA_DIST += $(CHECK_PYFILES)
> > > > >  PYCOV_CLEAN_FILES += $(CHECK_PYFILES:.py=.py,cover) .coverage
> > > > > diff --git a/tests/ovn.at b/tests/ovn.at
> > > > > index 2a7ee7528..b00670816 100644
> > > > > --- a/tests/ovn.at
> > > > > +++ b/tests/ovn.at
> > > > > @@ -852,7 +852,11 @@ next(pipeline=ingress, table=11);
> > > > >      encodes as resubmit(,19)
> > > > >
> > > > >  next(pipeline=egress);
> > > > > -    "next" action cannot advance from ingress to egress pipeline
> (use "output" action instead)
> > > > > +    formats as next(pipeline=egress, table=11);
> > > > > +    encodes as resubmit(,51)
> > > > > +
> > > > > +next(pipeline=egress, table=5);
> > > > > +    encodes as resubmit(,45)
> > > > >
> > > > >  next(table=10);
> > > > >      formats as next(10);
> > > > > diff --git a/tests/system-ovn.at b/tests/system-ovn.at
> > > > > index bdb9768d2..3b11cf92b 100644
> > > > > --- a/tests/system-ovn.at
> > > > > +++ b/tests/system-ovn.at
> > > > > @@ -3719,60 +3719,86 @@ start_daemon ovn-controller
> > > > >
> > > > >  ovn-nbctl ls-add sw0
> > > > >
> > > > > -ovn-nbctl lsp-add sw0 sw0-p1
> > > > > -ovn-nbctl lsp-set-addresses sw0-p1 "50:54:00:00:00:03 10.0.0.3
> aef0::3"
> > > > > -ovn-nbctl lsp-set-port-security sw0-p1 "50:54:00:00:00:03 10.0.0.3
> aef0::3"
> > > > > +ovn-nbctl lsp-add sw0 sw0-p1-rej
> > > > > +ovn-nbctl lsp-set-addresses sw0-p1-rej "50:54:00:00:00:03 10.0.0.3
> aef0::3"
> > > > > +ovn-nbctl lsp-set-port-security sw0-p1-rej "50:54:00:00:00:03
> 10.0.0.3 aef0::3"
> > > > >
> > > > > -ovn-nbctl lsp-add sw0 sw0-p2
> > > > > -ovn-nbctl lsp-set-addresses sw0-p2 "50:54:00:00:00:04 10.0.0.4
> aef0::4"
> > > > > -ovn-nbctl lsp-set-port-security sw0-p2 "50:54:00:00:00:04 10.0.0.4
> aef0::4"
> > > > > +ovn-nbctl lsp-add sw0 sw0-p2-rej
> > > > > +ovn-nbctl lsp-set-addresses sw0-p2-rej "50:54:00:00:00:04 10.0.0.4
> aef0::4"
> > > > > +ovn-nbctl lsp-set-port-security sw0-p2-rej "50:54:00:00:00:04
> 10.0.0.4 aef0::4"
> > > > > +
> > > > > +#ovn-nbctl --log acl-add sw0 from-lport 1000 "inport == \"sw0-p1\"
> && tcp && tcp.dst == 80" reject
> > > > > +#ovn-nbctl --log acl-add sw0 from-lport 1000 "inport == \"sw0-p2\"
> && ip6 && tcp && tcp.dst == 80" reject
> > > > > +
> > > > > +# Create port group and ACLs for sw0 ports.
> > > > > +ovn-nbctl pg-add pg0_drop sw0-p1-rej sw0-p2-rej
> > > > > +ovn-nbctl acl-add pg0_drop from-lport 1001 "inport == @pg0_drop &&
> ip" drop
> > > > > +ovn-nbctl acl-add pg0_drop to-lport 1001 "outport == @pg0_drop &&
> ip" drop
> > > > > +
> > > > > +ovn-nbctl pg-add pg0 sw0-p1-rej sw0-p2-rej
> > > > > +ovn-nbctl acl-add pg0 from-lport 1002 "inport == @pg0 && ip4"
> allow-related
> > > > > +ovn-nbctl --log acl-add pg0 from-lport 1004 "inport == @pg0 && ip
> && tcp && tcp.dst == 80" reject
> > > > >
> > > > > -ovn-nbctl --log acl-add sw0 from-lport 1000 "inport == \"sw0-p1\"
> && tcp && tcp.dst == 80" reject
> > > > > -ovn-nbctl --log acl-add sw0 from-lport 1000 "inport == \"sw0-p2\"
> && ip6 && tcp && tcp.dst == 80" reject
> > > > > +ovn-nbctl acl-add pg0 to-lport 1002 "outport == @pg0 && ip4 &&
> ip4.src == 0.0.0.0/0 && icmp4" allow-related
> > > > > +ovn-nbctl acl-add pg0 to-lport 1002 "outport == @pg0 && ip4 &&
> ip4.src == 0.0.0.0/0 && tcp && tcp.dst == 82" allow-related
> > > > > +ovn-nbctl acl-add pg0 to-lport 1002 "outport == @pg0 && ip4 &&
> ip4.src == 0.0.0.0/0 && udp && udp.dst == 82" allow-related
> > > > > +ovn-nbctl --log acl-add pg0 to-lport 1004 "inport == @pg0 && ip &&
> tcp && tcp.dst == 84" reject
> > > > >
> > > > >  OVN_POPULATE_ARP
> > > > >  ovn-nbctl --wait=hv sync
> > > > >
> > > > > -ADD_NAMESPACES(sw0-p1)
> > > > > -ADD_VETH(sw0-p1, sw0-p1, br-int, "10.0.0.3/24",
> "50:54:00:00:00:03", \
> > > > > +ADD_NAMESPACES(sw0-p1-rej)
> > > > > +ADD_VETH(sw0-p1-rej, sw0-p1-rej, br-int, "10.0.0.3/24",
> "50:54:00:00:00:03", \
> > > > >           "10.0.0.1")
> > > > >
> > > > > -ADD_NAMESPACES(sw0-p2)
> > > > > -ADD_VETH(sw0-p2, sw0-p2, br-int, "10.0.0.4/24",
> "50:54:00:00:00:04", \
> > > > > +ADD_NAMESPACES(sw0-p2-rej)
> > > > > +ADD_VETH(sw0-p2-rej, sw0-p2-rej, br-int, "10.0.0.4/24",
> "50:54:00:00:00:04", \
> > > > >           "10.0.0.1")
> > > > >
> > > > > -NS_CHECK_EXEC([sw0-p1], [ip a a aef0::3/64 dev sw0-p1], [0])
> > > > > -NS_CHECK_EXEC([sw0-p2], [ip a a aef0::4/64 dev sw0-p2], [0])
> > > > > +NS_CHECK_EXEC([sw0-p1-rej], [ip a a aef0::3/64 dev sw0-p1-rej],
> [0])
> > > > > +NS_CHECK_EXEC([sw0-p2-rej], [ip a a aef0::4/64 dev sw0-p2-rej],
> [0])
> > > > >
> > > > > -# Capture packets in sw0-p1.
> > > > > -NS_CHECK_EXEC([sw0-p1], [tcpdump -n -c 2 -i sw0-p1 tcp port 80 >
> sw0-p1-ip4.pcap &], [0])
> > > > > +# Capture packets in sw0-p1-rej.
> > > > > +NS_CHECK_EXEC([sw0-p1-rej], [tcpdump -n -c 2 -i sw0-p1-rej tcp
> port 80 > sw0-p1-rej-ip4.pcap &], [0])
> > > > >  sleep 1
> > > > >
> > > > > -NS_CHECK_EXEC([sw0-p1], [nc 10.0.0.4 80], [1], [],
> > > > > +NS_CHECK_EXEC([sw0-p1-rej], [nc 10.0.0.4 80], [1], [],
> > > > >  [dnl
> > > > >  Ncat: Connection refused.
> > > > >  ])
> > > > >
> > > > >  OVS_WAIT_UNTIL([
> > > > > -    total=`cat sw0-p1-ip4.pcap |  wc -l`
> > > > > +    total=`cat sw0-p1-rej-ip4.pcap |  wc -l`
> > > > >      echo "total = $total"
> > > > >      test "${total}" = "2"
> > > > >  ])
> > > > >
> > > > > +# Now send traffic to port 84
> > > > > +NS_CHECK_EXEC([sw0-p1-rej], [nc 10.0.0.4 84], [1], [],
> > > > > +[dnl
> > > > > +Ncat: Connection refused.
> > > > > +])
> > > > > +
> > > > > +AT_CHECK([
> > > > > +    n_pkt=$(ovs-ofctl dump-flows br-int table=44 | grep -v
> n_packets=0 | \
> > > > > +grep controller | grep tp_dst=84 -c)
> > > > > +    test $n_pkt -eq 1
> > > > > +])
> > > > > +
> > > > >  # Without this sleep, test case fails intermittently.
> > > > >  sleep 3
> > > > >
> > > > > -NS_CHECK_EXEC([sw0-p2], [tcpdump -n -c 2 -i sw0-p2 tcp port 80 >
> sw0-p2-ip6.pcap &], [0])
> > > > > +NS_CHECK_EXEC([sw0-p2-rej], [tcpdump -n -c 2 -i sw0-p2-rej tcp
> port 80 > sw0-p2-rej-ip6.pcap &], [0])
> > > > >
> > > > >  sleep 1
> > > > >
> > > > > -NS_CHECK_EXEC([sw0-p2], [nc -6 aef0::3 80], [1], [],
> > > > > +NS_CHECK_EXEC([sw0-p2-rej], [nc -6 aef0::3 80], [1], [],
> > > > >  [dnl
> > > > >  Ncat: Connection refused.
> > > > >  ])
> > > > >
> > > > >  OVS_WAIT_UNTIL([
> > > > > -    total=`cat sw0-p2-ip6.pcap |  wc -l`
> > > > > +    total=`cat sw0-p2-rej-ip6.pcap |  wc -l`
> > > > >      echo "total = $total"
> > > > >      test "${total}" = "2"
> > > > >  ])
> > > > > @@ -3936,3 +3962,105 @@ as
> > > > >  OVS_TRAFFIC_VSWITCHD_STOP(["/.*error receiving.*/d
> > > > >  /.*terminating with signal 15.*/d"])
> > > > >  AT_CLEANUP
> > > > > +
> > > > > +# Tests that when an established connection sends TCP reset,
> > > > > +# the conntrack entry is not in established state.
> > > > > +AT_SETUP([ovn -- conntrack TCP reset])
> > > > > +AT_KEYWORDS([conntrack])
> > > > > +ovn_start
> > > > > +
> > > > > +OVS_TRAFFIC_VSWITCHD_START()
> > > > > +ADD_BR([br-int])
> > > > > +
> > > > > +# Set external-ids in br-int needed for ovn-controller
> > > > > +ovs-vsctl \
> > > > > +        -- set Open_vSwitch . external-ids:system-id=hv1 \
> > > > > +        -- set Open_vSwitch .
> external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \
> > > > > +        -- set Open_vSwitch . external-ids:ovn-encap-type=geneve \
> > > > > +        -- set Open_vSwitch . external-ids:ovn-encap-ip=169.0.0.1
> \
> > > > > +        -- set bridge br-int fail-mode=secure
> other-config:disable-in-band=true
> > > > > +
> > > > > +# Start ovn-controller
> > > > > +start_daemon ovn-controller
> > > > > +
> > > > > +ovn-nbctl ls-add sw0
> > > > > +
> > > > > +ovn-nbctl lsp-add sw0 rst-p1
> > > > > +ovn-nbctl lsp-set-addresses rst-p1 "50:54:00:00:00:03"
> > > > > +ovn-nbctl lsp-set-port-security rst-p1 "50:54:00:00:00:03"
> > > > > +
> > > > > +ovn-nbctl lsp-add sw0 rst-p2
> > > > > +ovn-nbctl lsp-set-addresses rst-p2 "50:54:00:00:00:04 10.0.0.4"
> > > > > +ovn-nbctl lsp-set-port-security rst-p2 "50:54:00:00:00:04
> 10.0.0.4"
> > > > > +
> > > > > +# Create port group and ACLs for sw0 ports.
> > > > > +ovn-nbctl pg-add pg0_drop rst-p1 rst-p2
> > > > > +ovn-nbctl acl-add pg0_drop from-lport 1001 "inport == @pg0_drop &&
> ip" drop
> > > > > +ovn-nbctl acl-add pg0_drop to-lport 1001 "outport == @pg0_drop &&
> ip" drop
> > > > > +
> > > > > +ovn-nbctl pg-add pg0 rst-p1 rst-p2
> > > > > +ovn-nbctl acl-add pg0 from-lport 1002 "inport == @pg0 && ip4"
> allow-related
> > > > > +ovn-nbctl acl-add pg0 to-lport 1002 "outport == @pg0 && ip4 &&
> ip4.src == 0.0.0.0/0 && icmp4" allow-related
> > > > > +ovn-nbctl acl-add pg0 to-lport 1002 "outport == @pg0 && ip4 &&
> ip4.src == 0.0.0.0/0 && tcp && tcp.dst == 80" allow-related
> > > > > +ovn-nbctl acl-add pg0 to-lport 1002 "outport == @pg0 && ip4 &&
> ip4.src == 0.0.0.0/0 && udp && udp.dst == 80" allow-related
> > > > > +
> > > > > +# Create a logical router and attach to logical switch.
> > > > > +ovn-nbctl lr-add lr0
> > > > > +ovn-nbctl lrp-add lr0 lr0-sw0 00:00:00:00:ff:01 10.0.0.1/24
> > > > > +ovn-nbctl lsp-add sw0 sw0-lr0
> > > > > +ovn-nbctl lsp-set-type sw0-lr0 router
> > > > > +ovn-nbctl lsp-set-addresses sw0-lr0 router
> > > > > +ovn-nbctl lsp-set-options sw0-lr0 router-port=lr0-sw0
> > > > > +
> > > > > +ovn-nbctl lb-add lb1 10.0.0.10:80 10.0.0.3:80
> > > > > +ovn-nbctl --wait=sb ls-lb-add sw0 lb1
> > > > > +ovn-nbctl --wait=sb lr-lb-add lr0 lb1
> > > > > +
> > > > > +OVN_POPULATE_ARP
> > > > > +ovn-nbctl --wait=hv sync
> > > > > +
> > > > > +ADD_NAMESPACES(rst-p1)
> > > > > +ADD_VETH(rst-p1, rst-p1, br-int, "10.0.0.3/24",
> "50:54:00:00:00:03", \
> > > > > +         "10.0.0.1")
> > > > > +
> > > > > +ADD_NAMESPACES(rst-p2)
> > > > > +ADD_VETH(rst-p2, rst-p2, br-int, "10.0.0.4/24",
> "50:54:00:00:00:04", \
> > > > > +         "10.0.0.1")
> > > > > +
> > > > > +OVS_WAIT_UNTIL([test x$(ovn-nbctl lsp-get-up rst-p1) = xup])
> > > > > +OVS_WAIT_UNTIL([test x$(ovn-nbctl lsp-get-up rst-p2) = xup])
> > > > > +
> > > > > +# Start webservers in 'rst-p1'.
> > > > > +OVS_START_L7([rst-p1], [http])
> > > > > +
> > > > > +NS_CHECK_EXEC([rst-p2], [$PYTHON $srcdir/test-tcp-rst.py
> --dst-port 80 --dst-ip 10.0.0.10])
> > > > > +
> > > > > +# When tcp reset is sent, conntrack entry should be in the state -
> CLOSED or CLOSING.
> > > > > +# But there is a bug where tcp reset packet was not sent to the
> conntrack.
> > > > > +# This test case checks that the tcp reset packet is sent to
> conntrack
> > > > > +# and the state is not in established state.
> > > > > +AT_CHECK([
> > > > > +    ct_est_count=$(ovs-appctl dpctl/dump-conntrack | grep
> 10.0.0.10 | grep state=ESTABLISHED -c)
> > > > > +    test $ct_est_count -eq 0
> > > > > +
> > > > > +    ct_est_count=$(ovs-appctl dpctl/dump-conntrack | grep
> 10.0.0.10 | grep state=CLOS -c)
> > > > > +    test $ct_est_count -eq 1
> > > > > +])
> > > > > +
> > > > > +OVS_APP_EXIT_AND_WAIT([ovn-controller])
> > > > > +
> > > > > +as ovn-sb
> > > > > +OVS_APP_EXIT_AND_WAIT([ovsdb-server])
> > > > > +
> > > > > +as ovn-nb
> > > > > +OVS_APP_EXIT_AND_WAIT([ovsdb-server])
> > > > > +
> > > > > +as northd
> > > > > +OVS_APP_EXIT_AND_WAIT([ovn-northd])
> > > > > +
> > > > > +as
> > > > > +OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d
> > > > > +/connection dropped.*/d
> > > > > +/Service monitor not found.*/d"])
> > > > > +
> > > > > +AT_CLEANUP
> > > > > diff --git a/tests/test-tcp-rst.py b/tests/test-tcp-rst.py
> > > > > new file mode 100644
> > > > > index 000000000..6f96c5706
> > > > > --- /dev/null
> > > > > +++ b/tests/test-tcp-rst.py
> > > > > @@ -0,0 +1,37 @@
> > > > > +#!/usr/bin/env python3
> > > > > +# Copyright (c) 2020 Red Hat, Inc.
> > > > > +#
> > > > > +# Licensed under the Apache License, Version 2.0 (the "License");
> > > > > +# you may not use this file except in compliance with the License.
> > > > > +# You may obtain a copy of the License at:
> > > > > +#
> > > > > +#     http://www.apache.org/licenses/LICENSE-2.0
> > > > > +#
> > > > > +# Unless required by applicable law or agreed to in writing,
> software
> > > > > +# distributed under the License is distributed on an "AS IS"
> BASIS,
> > > > > +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
> implied.
> > > > > +# See the License for the specific language governing permissions
> and
> > > > > +# limitations under the License.
> > > > > +
> > > > > +# Simple python script which connects to tcp server and then
> > > > > +# resets the connection.
> > > > > +import argparse
> > > > > +import socket
> > > > > +import sys
> > > > > +import struct
> > > > > +import time
> > > > > +
> > > > > +parser = argparse.ArgumentParser(description='')
> > > > > +parser.add_argument("--src-port", type=int, default=11337,
> help="source port to use")
> > > > > +parser.add_argument("--dst-port", type=int, help="dst port to
> use")
> > > > > +parser.add_argument("--dst-ip", help="server ip to use")
> > > > > +args = parser.parse_args()
> > > > > +sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
> > > > > +server_address = (args.dst_ip, args.dst_port)
> > > > > +sock.bind(('0.0.0.0', args.src_port))
> > > > > +sock.connect(server_address)
> > > > > +l_onoff = 1
> > > > > +l_linger = 0
> > > > > +time.sleep(1)
> > > > > +sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
> struct.pack('ii', l_onoff, l_linger))
> > > > > +sock.close()
> > > > >
> > > >
> > > > _______________________________________________
> > > > dev mailing list
> > > > dev@openvswitch.org
> > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> > > >
> > >
> > > _______________________________________________
> > > dev mailing list
> > > dev@openvswitch.org
> > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> > >
> > _______________________________________________
> > dev mailing list
> > dev@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
>
diff mbox series

Patch

diff --git a/lib/actions.c b/lib/actions.c
index c19813de0..91619c889 100644
--- a/lib/actions.c
+++ b/lib/actions.c
@@ -319,11 +319,7 @@  parse_NEXT(struct action_context *ctx)
         }
     }
 
-    if (pipeline == OVNACT_P_EGRESS && ctx->pp->pipeline == OVNACT_P_INGRESS) {
-        lexer_error(ctx->lexer,
-                    "\"next\" action cannot advance from ingress to egress "
-                    "pipeline (use \"output\" action instead)");
-    } else if (table >= ctx->pp->n_tables) {
+    if (table >= ctx->pp->n_tables) {
         lexer_error(ctx->lexer,
                     "\"next\" action cannot advance beyond table %d.",
                     ctx->pp->n_tables - 1);
diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
index efcc4b7fc..d39e259f6 100644
--- a/northd/ovn-northd.8.xml
+++ b/northd/ovn-northd.8.xml
@@ -373,6 +373,14 @@ 
         for new connections and <code>reg0[1] = 1; next;</code> for existing
         connections.
       </li>
+      <li>
+        <code>reject</code> ACLs translate into logical
+        flows with the
+        <code>tcp_reset { output &lt;-&gt; inport;
+        next(pipeline=egress,table=5);}</code>
+        action for TCP connections and <code>icmp4/icmp6</code> action
+        for UDP connections.
+      </li>
       <li>
         Other ACLs translate to <code>drop;</code> for new or untracked
         connections and <code>ct_commit(ct_label=1/1);</code> for known
diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
index c4675bd68..f8ae155a2 100644
--- a/northd/ovn-northd.c
+++ b/northd/ovn-northd.c
@@ -4721,11 +4721,11 @@  build_pre_acls(struct ovn_datapath *od, struct hmap *lflows)
          * unreachable packets. */
         ovn_lflow_add(lflows, od, S_SWITCH_IN_PRE_ACL, 110,
                       "nd || nd_rs || nd_ra || icmp4.type == 3 || "
-                      "icmp6.type == 1 || (tcp && tcp.flags == 20) || "
+                      "icmp6.type == 1 || "
                       "(udp && udp.src == 546 && udp.dst == 547)", "next;");
         ovn_lflow_add(lflows, od, S_SWITCH_OUT_PRE_ACL, 110,
                       "nd || nd_rs || nd_ra || icmp4.type == 3 || "
-                      "icmp6.type == 1 || (tcp && tcp.flags == 20) ||"
+                      "icmp6.type == 1 || "
                       "(udp && udp.src == 546 && udp.dst == 547)", "next;");
 
         /* Ingress and Egress Pre-ACL Table (Priority 100).
@@ -4838,11 +4838,11 @@  build_pre_lb(struct ovn_datapath *od, struct hmap *lflows,
     /* Do not send ND packets to conntrack */
     ovn_lflow_add(lflows, od, S_SWITCH_IN_PRE_LB, 110,
                   "nd || nd_rs || nd_ra || icmp4.type == 3 ||"
-                  "icmp6.type == 1 || (tcp && tcp.flags == 20)",
+                  "icmp6.type == 1",
                   "next;");
     ovn_lflow_add(lflows, od, S_SWITCH_OUT_PRE_LB, 110,
                   "nd || nd_rs || nd_ra || icmp4.type == 3 ||"
-                  "icmp6.type == 1 || (tcp && tcp.flags == 20)",
+                  "icmp6.type == 1",
                   "next;");
 
     /* Do not send service monitor packets to conntrack. */
@@ -4988,7 +4988,8 @@  build_reject_acl_rules(struct ovn_datapath *od, struct hmap *lflows,
     ds_put_format(&actions, "reg0 = 0; "
                   "eth.dst <-> eth.src; ip4.dst <-> ip4.src; "
                   "tcp_reset { outport <-> inport; %s };",
-                  ingress ? "output;" : "next(pipeline=ingress,table=0);");
+                  ingress ? "next(pipeline=egress,table=5);"
+                          : "next(pipeline=ingress,table=19);");
     ovn_lflow_add_with_hint(lflows, od, stage,
                             acl->priority + OVN_ACL_PRI_OFFSET + 10,
                             ds_cstr(&match), ds_cstr(&actions), stage_hint);
@@ -5002,7 +5003,8 @@  build_reject_acl_rules(struct ovn_datapath *od, struct hmap *lflows,
     ds_put_format(&actions, "reg0 = 0; "
                   "eth.dst <-> eth.src; ip6.dst <-> ip6.src; "
                   "tcp_reset { outport <-> inport; %s };",
-                  ingress ? "output;" : "next(pipeline=ingress,table=0);");
+                  ingress ? "next(pipeline=egress,table=5);"
+                          : "next(pipeline=ingress,table=19);");
     ovn_lflow_add_with_hint(lflows, od, stage,
                             acl->priority + OVN_ACL_PRI_OFFSET + 10,
                             ds_cstr(&match), ds_cstr(&actions), stage_hint);
diff --git a/ovn-sb.xml b/ovn-sb.xml
index 5f8da534c..3aa7cd4da 100644
--- a/ovn-sb.xml
+++ b/ovn-sb.xml
@@ -1112,10 +1112,12 @@ 
           <var>pipeline</var> as a subroutine.  The default <var>table</var> is
           just after the current one.  If <var>pipeline</var> is specified, it
           may be <code>ingress</code> or <code>egress</code>; the default
-          <var>pipeline</var> is the one currently executing.  Actions in the
-          ingress pipeline may not use <code>next</code> to jump into the
-          egress pipeline (use the <code>output</code> instead), but
-          transitions in the opposite direction are allowed.
+          <var>pipeline</var> is the one currently executing. Actions in the
+          both ingress and egress pipeline can use <code>next</code> to jump
+          across the other pipeline.  Actions in the ingress pipeline should
+          use <code>next</code> to jump into the specific table of egress
+          pipeline only if it is certain that the packets are local and not
+          tunnelled and wants to skip certain stages in the packet processing.
         </dd>
 
         <dt><code><var>field</var> = <var>constant</var>;</code></dt>
diff --git a/tests/automake.mk b/tests/automake.mk
index 215fb432b..ed530dd77 100644
--- a/tests/automake.mk
+++ b/tests/automake.mk
@@ -205,7 +205,8 @@  tests_ovstest_LDADD = $(OVS_LIBDIR)/libopenvswitch.la lib/libovn.la
 # Python tests.
 CHECK_PYFILES = \
 	tests/test-l7.py \
-	tests/uuidfilt.py
+	tests/uuidfilt.py \
+	tests/test-tcp-rst.py
 
 EXTRA_DIST += $(CHECK_PYFILES)
 PYCOV_CLEAN_FILES += $(CHECK_PYFILES:.py=.py,cover) .coverage
diff --git a/tests/ovn.at b/tests/ovn.at
index 2a7ee7528..b00670816 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -852,7 +852,11 @@  next(pipeline=ingress, table=11);
     encodes as resubmit(,19)
 
 next(pipeline=egress);
-    "next" action cannot advance from ingress to egress pipeline (use "output" action instead)
+    formats as next(pipeline=egress, table=11);
+    encodes as resubmit(,51)
+
+next(pipeline=egress, table=5);
+    encodes as resubmit(,45)
 
 next(table=10);
     formats as next(10);
diff --git a/tests/system-ovn.at b/tests/system-ovn.at
index bdb9768d2..3b11cf92b 100644
--- a/tests/system-ovn.at
+++ b/tests/system-ovn.at
@@ -3719,60 +3719,86 @@  start_daemon ovn-controller
 
 ovn-nbctl ls-add sw0
 
-ovn-nbctl lsp-add sw0 sw0-p1
-ovn-nbctl lsp-set-addresses sw0-p1 "50:54:00:00:00:03 10.0.0.3 aef0::3"
-ovn-nbctl lsp-set-port-security sw0-p1 "50:54:00:00:00:03 10.0.0.3 aef0::3"
+ovn-nbctl lsp-add sw0 sw0-p1-rej
+ovn-nbctl lsp-set-addresses sw0-p1-rej "50:54:00:00:00:03 10.0.0.3 aef0::3"
+ovn-nbctl lsp-set-port-security sw0-p1-rej "50:54:00:00:00:03 10.0.0.3 aef0::3"
 
-ovn-nbctl lsp-add sw0 sw0-p2
-ovn-nbctl lsp-set-addresses sw0-p2 "50:54:00:00:00:04 10.0.0.4 aef0::4"
-ovn-nbctl lsp-set-port-security sw0-p2 "50:54:00:00:00:04 10.0.0.4 aef0::4"
+ovn-nbctl lsp-add sw0 sw0-p2-rej
+ovn-nbctl lsp-set-addresses sw0-p2-rej "50:54:00:00:00:04 10.0.0.4 aef0::4"
+ovn-nbctl lsp-set-port-security sw0-p2-rej "50:54:00:00:00:04 10.0.0.4 aef0::4"
+
+#ovn-nbctl --log acl-add sw0 from-lport 1000 "inport == \"sw0-p1\" && tcp && tcp.dst == 80" reject
+#ovn-nbctl --log acl-add sw0 from-lport 1000 "inport == \"sw0-p2\" && ip6 && tcp && tcp.dst == 80" reject
+
+# Create port group and ACLs for sw0 ports.
+ovn-nbctl pg-add pg0_drop sw0-p1-rej sw0-p2-rej
+ovn-nbctl acl-add pg0_drop from-lport 1001 "inport == @pg0_drop && ip" drop
+ovn-nbctl acl-add pg0_drop to-lport 1001 "outport == @pg0_drop && ip" drop
+
+ovn-nbctl pg-add pg0 sw0-p1-rej sw0-p2-rej
+ovn-nbctl acl-add pg0 from-lport 1002 "inport == @pg0 && ip4" allow-related
+ovn-nbctl --log acl-add pg0 from-lport 1004 "inport == @pg0 && ip && tcp && tcp.dst == 80" reject
 
-ovn-nbctl --log acl-add sw0 from-lport 1000 "inport == \"sw0-p1\" && tcp && tcp.dst == 80" reject
-ovn-nbctl --log acl-add sw0 from-lport 1000 "inport == \"sw0-p2\" && ip6 && tcp && tcp.dst == 80" reject
+ovn-nbctl acl-add pg0 to-lport 1002 "outport == @pg0 && ip4 && ip4.src == 0.0.0.0/0 && icmp4" allow-related
+ovn-nbctl acl-add pg0 to-lport 1002 "outport == @pg0 && ip4 && ip4.src == 0.0.0.0/0 && tcp && tcp.dst == 82" allow-related
+ovn-nbctl acl-add pg0 to-lport 1002 "outport == @pg0 && ip4 && ip4.src == 0.0.0.0/0 && udp && udp.dst == 82" allow-related
+ovn-nbctl --log acl-add pg0 to-lport 1004 "inport == @pg0 && ip && tcp && tcp.dst == 84" reject
 
 OVN_POPULATE_ARP
 ovn-nbctl --wait=hv sync
 
-ADD_NAMESPACES(sw0-p1)
-ADD_VETH(sw0-p1, sw0-p1, br-int, "10.0.0.3/24", "50:54:00:00:00:03", \
+ADD_NAMESPACES(sw0-p1-rej)
+ADD_VETH(sw0-p1-rej, sw0-p1-rej, br-int, "10.0.0.3/24", "50:54:00:00:00:03", \
          "10.0.0.1")
 
-ADD_NAMESPACES(sw0-p2)
-ADD_VETH(sw0-p2, sw0-p2, br-int, "10.0.0.4/24", "50:54:00:00:00:04", \
+ADD_NAMESPACES(sw0-p2-rej)
+ADD_VETH(sw0-p2-rej, sw0-p2-rej, br-int, "10.0.0.4/24", "50:54:00:00:00:04", \
          "10.0.0.1")
 
-NS_CHECK_EXEC([sw0-p1], [ip a a aef0::3/64 dev sw0-p1], [0])
-NS_CHECK_EXEC([sw0-p2], [ip a a aef0::4/64 dev sw0-p2], [0])
+NS_CHECK_EXEC([sw0-p1-rej], [ip a a aef0::3/64 dev sw0-p1-rej], [0])
+NS_CHECK_EXEC([sw0-p2-rej], [ip a a aef0::4/64 dev sw0-p2-rej], [0])
 
-# Capture packets in sw0-p1.
-NS_CHECK_EXEC([sw0-p1], [tcpdump -n -c 2 -i sw0-p1 tcp port 80 > sw0-p1-ip4.pcap &], [0])
+# Capture packets in sw0-p1-rej.
+NS_CHECK_EXEC([sw0-p1-rej], [tcpdump -n -c 2 -i sw0-p1-rej tcp port 80 > sw0-p1-rej-ip4.pcap &], [0])
 sleep 1
 
-NS_CHECK_EXEC([sw0-p1], [nc 10.0.0.4 80], [1], [],
+NS_CHECK_EXEC([sw0-p1-rej], [nc 10.0.0.4 80], [1], [],
 [dnl
 Ncat: Connection refused.
 ])
 
 OVS_WAIT_UNTIL([
-    total=`cat sw0-p1-ip4.pcap |  wc -l`
+    total=`cat sw0-p1-rej-ip4.pcap |  wc -l`
     echo "total = $total"
     test "${total}" = "2"
 ])
 
+# Now send traffic to port 84
+NS_CHECK_EXEC([sw0-p1-rej], [nc 10.0.0.4 84], [1], [],
+[dnl
+Ncat: Connection refused.
+])
+
+AT_CHECK([
+    n_pkt=$(ovs-ofctl dump-flows br-int table=44 | grep -v n_packets=0 | \
+grep controller | grep tp_dst=84 -c)
+    test $n_pkt -eq 1
+])
+
 # Without this sleep, test case fails intermittently.
 sleep 3
 
-NS_CHECK_EXEC([sw0-p2], [tcpdump -n -c 2 -i sw0-p2 tcp port 80 > sw0-p2-ip6.pcap &], [0])
+NS_CHECK_EXEC([sw0-p2-rej], [tcpdump -n -c 2 -i sw0-p2-rej tcp port 80 > sw0-p2-rej-ip6.pcap &], [0])
 
 sleep 1
 
-NS_CHECK_EXEC([sw0-p2], [nc -6 aef0::3 80], [1], [],
+NS_CHECK_EXEC([sw0-p2-rej], [nc -6 aef0::3 80], [1], [],
 [dnl
 Ncat: Connection refused.
 ])
 
 OVS_WAIT_UNTIL([
-    total=`cat sw0-p2-ip6.pcap |  wc -l`
+    total=`cat sw0-p2-rej-ip6.pcap |  wc -l`
     echo "total = $total"
     test "${total}" = "2"
 ])
@@ -3936,3 +3962,105 @@  as
 OVS_TRAFFIC_VSWITCHD_STOP(["/.*error receiving.*/d
 /.*terminating with signal 15.*/d"])
 AT_CLEANUP
+
+# Tests that when an established connection sends TCP reset,
+# the conntrack entry is not in established state.
+AT_SETUP([ovn -- conntrack TCP reset])
+AT_KEYWORDS([conntrack])
+ovn_start
+
+OVS_TRAFFIC_VSWITCHD_START()
+ADD_BR([br-int])
+
+# Set external-ids in br-int needed for ovn-controller
+ovs-vsctl \
+        -- set Open_vSwitch . external-ids:system-id=hv1 \
+        -- set Open_vSwitch . external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \
+        -- set Open_vSwitch . external-ids:ovn-encap-type=geneve \
+        -- set Open_vSwitch . external-ids:ovn-encap-ip=169.0.0.1 \
+        -- set bridge br-int fail-mode=secure other-config:disable-in-band=true
+
+# Start ovn-controller
+start_daemon ovn-controller
+
+ovn-nbctl ls-add sw0
+
+ovn-nbctl lsp-add sw0 rst-p1
+ovn-nbctl lsp-set-addresses rst-p1 "50:54:00:00:00:03"
+ovn-nbctl lsp-set-port-security rst-p1 "50:54:00:00:00:03"
+
+ovn-nbctl lsp-add sw0 rst-p2
+ovn-nbctl lsp-set-addresses rst-p2 "50:54:00:00:00:04 10.0.0.4"
+ovn-nbctl lsp-set-port-security rst-p2 "50:54:00:00:00:04 10.0.0.4"
+
+# Create port group and ACLs for sw0 ports.
+ovn-nbctl pg-add pg0_drop rst-p1 rst-p2
+ovn-nbctl acl-add pg0_drop from-lport 1001 "inport == @pg0_drop && ip" drop
+ovn-nbctl acl-add pg0_drop to-lport 1001 "outport == @pg0_drop && ip" drop
+
+ovn-nbctl pg-add pg0 rst-p1 rst-p2
+ovn-nbctl acl-add pg0 from-lport 1002 "inport == @pg0 && ip4" allow-related
+ovn-nbctl acl-add pg0 to-lport 1002 "outport == @pg0 && ip4 && ip4.src == 0.0.0.0/0 && icmp4" allow-related
+ovn-nbctl acl-add pg0 to-lport 1002 "outport == @pg0 && ip4 && ip4.src == 0.0.0.0/0 && tcp && tcp.dst == 80" allow-related
+ovn-nbctl acl-add pg0 to-lport 1002 "outport == @pg0 && ip4 && ip4.src == 0.0.0.0/0 && udp && udp.dst == 80" allow-related
+
+# Create a logical router and attach to logical switch.
+ovn-nbctl lr-add lr0
+ovn-nbctl lrp-add lr0 lr0-sw0 00:00:00:00:ff:01 10.0.0.1/24
+ovn-nbctl lsp-add sw0 sw0-lr0
+ovn-nbctl lsp-set-type sw0-lr0 router
+ovn-nbctl lsp-set-addresses sw0-lr0 router
+ovn-nbctl lsp-set-options sw0-lr0 router-port=lr0-sw0
+
+ovn-nbctl lb-add lb1 10.0.0.10:80 10.0.0.3:80
+ovn-nbctl --wait=sb ls-lb-add sw0 lb1
+ovn-nbctl --wait=sb lr-lb-add lr0 lb1
+
+OVN_POPULATE_ARP
+ovn-nbctl --wait=hv sync
+
+ADD_NAMESPACES(rst-p1)
+ADD_VETH(rst-p1, rst-p1, br-int, "10.0.0.3/24", "50:54:00:00:00:03", \
+         "10.0.0.1")
+
+ADD_NAMESPACES(rst-p2)
+ADD_VETH(rst-p2, rst-p2, br-int, "10.0.0.4/24", "50:54:00:00:00:04", \
+         "10.0.0.1")
+
+OVS_WAIT_UNTIL([test x$(ovn-nbctl lsp-get-up rst-p1) = xup])
+OVS_WAIT_UNTIL([test x$(ovn-nbctl lsp-get-up rst-p2) = xup])
+
+# Start webservers in 'rst-p1'.
+OVS_START_L7([rst-p1], [http])
+
+NS_CHECK_EXEC([rst-p2], [$PYTHON $srcdir/test-tcp-rst.py --dst-port 80 --dst-ip 10.0.0.10])
+
+# When tcp reset is sent, conntrack entry should be in the state - CLOSED or CLOSING.
+# But there is a bug where tcp reset packet was not sent to the conntrack.
+# This test case checks that the tcp reset packet is sent to conntrack
+# and the state is not in established state.
+AT_CHECK([
+    ct_est_count=$(ovs-appctl dpctl/dump-conntrack | grep 10.0.0.10 | grep state=ESTABLISHED -c)
+    test $ct_est_count -eq 0
+
+    ct_est_count=$(ovs-appctl dpctl/dump-conntrack | grep 10.0.0.10 | grep state=CLOS -c)
+    test $ct_est_count -eq 1
+])
+
+OVS_APP_EXIT_AND_WAIT([ovn-controller])
+
+as ovn-sb
+OVS_APP_EXIT_AND_WAIT([ovsdb-server])
+
+as ovn-nb
+OVS_APP_EXIT_AND_WAIT([ovsdb-server])
+
+as northd
+OVS_APP_EXIT_AND_WAIT([ovn-northd])
+
+as
+OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d
+/connection dropped.*/d
+/Service monitor not found.*/d"])
+
+AT_CLEANUP
diff --git a/tests/test-tcp-rst.py b/tests/test-tcp-rst.py
new file mode 100644
index 000000000..6f96c5706
--- /dev/null
+++ b/tests/test-tcp-rst.py
@@ -0,0 +1,37 @@ 
+#!/usr/bin/env python3
+# Copyright (c) 2020 Red Hat, Inc.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at:
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# Simple python script which connects to tcp server and then
+# resets the connection.
+import argparse
+import socket
+import sys
+import struct
+import time
+
+parser = argparse.ArgumentParser(description='')
+parser.add_argument("--src-port", type=int, default=11337, help="source port to use")
+parser.add_argument("--dst-port", type=int, help="dst port to use")
+parser.add_argument("--dst-ip", help="server ip to use")
+args = parser.parse_args()
+sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+server_address = (args.dst_ip, args.dst_port)
+sock.bind(('0.0.0.0', args.src_port))
+sock.connect(server_address)
+l_onoff = 1
+l_linger = 0
+time.sleep(1)
+sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack('ii', l_onoff, l_linger))
+sock.close()