diff mbox series

[ovs-dev,v2] northd: Add ECMP symmetric replies for egress.

Message ID 20240718212926.3637396-1-numans@ovn.org
State Accepted
Headers show
Series [ovs-dev,v2] northd: Add ECMP symmetric replies for egress. | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test success github build: passed
ovsrobot/github-robot-_ovn-kubernetes success github build: passed

Commit Message

Numan Siddique July 18, 2024, 9:29 p.m. UTC
From: Numan Siddique <numans@ovn.org>

When there are ECMP symmetric static routes configured, OVN selects
one of the next hop for the traffic originated from within the
cluster.  For the subsequent packets to the same destination,
OVN may select a different next hop (which is fine).  But there can
be certain usecases, where the next hop entity can be stateful and
selecting the same next hop is desirable.

This patch address this usecase in the following way

   1.  For the first packet originating from the OVN logical port
       VIF, OVN selects a next hop 'A' and forwards the traffic to
       it.

   2.  When the reply traffic is received (either from next hop 'A'
       or any other next hop), it commits the connection in the
       DNAT zone of the logical router and saves the state in
       ct_label.ecmp_reply_eth and ct_label.ecmp_reply_port.
       Note that we already support this for the traffic
       originating from an ECMP route [1].  We are now extending
       the same for the traffic originating from the cluster towards
       the ECMP route.

    3. For the subsequent packets from the cluster, we select the
       next hop eth address and the port from the saved conntrack
       state.  This is straightforward as we anyway send the packet
       to the DNAT zone of the logical router.

Example: If a logical router lr0 is configured with the below
EMCP static routes

ovn-nbctl lr-route-list lr0
IPv4 Routes
Route Table <main>:
                0.0.0.0/0                172.20.0.1 dst-ip ecmp
                0.0.0.0/0                172.20.0.2 dst-ip ecmp ecmp-symmetric-reply

Before this patch, we were adding the below logical flows in the router
pipeline for the ECMP route handling:

-------------
table=10(lr_in_ecmp_stateful), priority=100  , match=(inport == "lr0-public" && ip4.src == 0.0.0.0/0 && !ct.rpl && (ct.new || ct.est)), action=(ct_commit { ct_label.ecmp_reply_eth = eth.src;  ct_mark.ecmp_reply_port = 3;}; next;)
table=14(lr_in_ip_routing   ), priority=10300, match=(ct.rpl && ct_mark.ecmp_reply_port == 3 && reg7 == 0 && ip4.dst == 0.0.0.0/0), action=(ip.ttl--; flags.loopback = 1; eth.src = 00:11:22:00:ff:01; reg1 = 172.20.0.100; outport = "lr0-public"; next;)
table=16(lr_in_policy       ), priority=65535, match=(ct.rpl && ct_mark.ecmp_reply_port == 3), action=(next;)
table=20(lr_in_arp_resolve  ), priority=200  , match=(ct.rpl && ct_mark.ecmp_reply_port == 3), action=(push(xxreg1); xxreg1 = ct_label; eth.dst = xxreg1[32..79]; pop(xxreg1); next;)
-------------

After this patch, we add the below logical flows:

--------------
table=10(lr_in_ecmp_stateful), priority=100  , match=(inport == "lr0-public" && ip4.src == 0.0.0.0/0 && (ct.new || ct.est)), action=(ct_commit { ct_label.ecmp_reply_eth = eth.src;  ct_mark.ecmp_reply_port = 3;}; next;)
table=14(lr_in_ip_routing   ), priority=10300, match=(ct_mark.ecmp_reply_port == 3 && reg7 == 0 && ip4.dst == 0.0.0.0/0), action=(ip.ttl--; flags.loopback = 1; eth.src = 00:11:22:00:ff:01; reg1 = 172.20.0.100; outport = "lr0-public"; next;)
table=16(lr_in_policy       ), priority=65535, match=(ct_mark.ecmp_reply_port == 3), action=(next;)
table=20(lr_in_arp_resolve  ), priority=200  , match=(ct_mark.ecmp_reply_port == 3), action=(push(xxreg1); xxreg1 = ct_label; eth.dst = xxreg1[32..79]; pop(xxreg1); next;)
--------------

[1] - 4fdca656857d ("Add ECMP symmetric replies.")
Reported-at: https://issues.redhat.com/browse/FDP-628
Signed-off-by: Numan Siddique <numans@ovn.org>
---

v1 -> v2
------
  - Addressed Ilya's comments - Fixed the failing system tests related
    to ECMP symmetric replies
  - And enhanced those tests to cover the scenario of packet originating
    from the VIF.

 northd/northd.c     |  4 ++--
 tests/ovn-northd.at | 26 ++++++++++++++++++-----
 tests/system-ovn.at | 50 +++++++++++++++++++++++++++++++++++----------
 3 files changed, 62 insertions(+), 18 deletions(-)

Comments

Mark Michelson July 26, 2024, 3:16 p.m. UTC | #1
Hi Numan,

The only issue I have with this patch is that I think it should have a 
change in ovn-nb.xml to mention that the ecmp_symmetric_reply option 
also now applies for egress-originated traffic.

Other than that,

Acked-by: Mark Michelson <mmichels@redhat.com>

On 7/18/24 17:29, numans@ovn.org wrote:
> From: Numan Siddique <numans@ovn.org>
> 
> When there are ECMP symmetric static routes configured, OVN selects
> one of the next hop for the traffic originated from within the
> cluster.  For the subsequent packets to the same destination,
> OVN may select a different next hop (which is fine).  But there can
> be certain usecases, where the next hop entity can be stateful and
> selecting the same next hop is desirable.
> 
> This patch address this usecase in the following way
> 
>     1.  For the first packet originating from the OVN logical port
>         VIF, OVN selects a next hop 'A' and forwards the traffic to
>         it.
> 
>     2.  When the reply traffic is received (either from next hop 'A'
>         or any other next hop), it commits the connection in the
>         DNAT zone of the logical router and saves the state in
>         ct_label.ecmp_reply_eth and ct_label.ecmp_reply_port.
>         Note that we already support this for the traffic
>         originating from an ECMP route [1].  We are now extending
>         the same for the traffic originating from the cluster towards
>         the ECMP route.
> 
>      3. For the subsequent packets from the cluster, we select the
>         next hop eth address and the port from the saved conntrack
>         state.  This is straightforward as we anyway send the packet
>         to the DNAT zone of the logical router.
> 
> Example: If a logical router lr0 is configured with the below
> EMCP static routes
> 
> ovn-nbctl lr-route-list lr0
> IPv4 Routes
> Route Table <main>:
>                  0.0.0.0/0                172.20.0.1 dst-ip ecmp
>                  0.0.0.0/0                172.20.0.2 dst-ip ecmp ecmp-symmetric-reply
> 
> Before this patch, we were adding the below logical flows in the router
> pipeline for the ECMP route handling:
> 
> -------------
> table=10(lr_in_ecmp_stateful), priority=100  , match=(inport == "lr0-public" && ip4.src == 0.0.0.0/0 && !ct.rpl && (ct.new || ct.est)), action=(ct_commit { ct_label.ecmp_reply_eth = eth.src;  ct_mark.ecmp_reply_port = 3;}; next;)
> table=14(lr_in_ip_routing   ), priority=10300, match=(ct.rpl && ct_mark.ecmp_reply_port == 3 && reg7 == 0 && ip4.dst == 0.0.0.0/0), action=(ip.ttl--; flags.loopback = 1; eth.src = 00:11:22:00:ff:01; reg1 = 172.20.0.100; outport = "lr0-public"; next;)
> table=16(lr_in_policy       ), priority=65535, match=(ct.rpl && ct_mark.ecmp_reply_port == 3), action=(next;)
> table=20(lr_in_arp_resolve  ), priority=200  , match=(ct.rpl && ct_mark.ecmp_reply_port == 3), action=(push(xxreg1); xxreg1 = ct_label; eth.dst = xxreg1[32..79]; pop(xxreg1); next;)
> -------------
> 
> After this patch, we add the below logical flows:
> 
> --------------
> table=10(lr_in_ecmp_stateful), priority=100  , match=(inport == "lr0-public" && ip4.src == 0.0.0.0/0 && (ct.new || ct.est)), action=(ct_commit { ct_label.ecmp_reply_eth = eth.src;  ct_mark.ecmp_reply_port = 3;}; next;)
> table=14(lr_in_ip_routing   ), priority=10300, match=(ct_mark.ecmp_reply_port == 3 && reg7 == 0 && ip4.dst == 0.0.0.0/0), action=(ip.ttl--; flags.loopback = 1; eth.src = 00:11:22:00:ff:01; reg1 = 172.20.0.100; outport = "lr0-public"; next;)
> table=16(lr_in_policy       ), priority=65535, match=(ct_mark.ecmp_reply_port == 3), action=(next;)
> table=20(lr_in_arp_resolve  ), priority=200  , match=(ct_mark.ecmp_reply_port == 3), action=(push(xxreg1); xxreg1 = ct_label; eth.dst = xxreg1[32..79]; pop(xxreg1); next;)
> --------------
> 
> [1] - 4fdca656857d ("Add ECMP symmetric replies.")
> Reported-at: https://issues.redhat.com/browse/FDP-628
> Signed-off-by: Numan Siddique <numans@ovn.org>
> ---
> 
> v1 -> v2
> ------
>    - Addressed Ilya's comments - Fixed the failing system tests related
>      to ECMP symmetric replies
>    - And enhanced those tests to cover the scenario of packet originating
>      from the VIF.
> 
>   northd/northd.c     |  4 ++--
>   tests/ovn-northd.at | 26 ++++++++++++++++++-----
>   tests/system-ovn.at | 50 +++++++++++++++++++++++++++++++++++----------
>   3 files changed, 62 insertions(+), 18 deletions(-)
> 
> diff --git a/northd/northd.c b/northd/northd.c
> index 6898daa00d..3227c093cb 100644
> --- a/northd/northd.c
> +++ b/northd/northd.c
> @@ -10706,7 +10706,7 @@ add_ecmp_symmetric_reply_flows(struct lflow_table *lflows,
>        * NOTE: we purposely are not clearing match before this
>        * ds_put_cstr() call. The previous contents are needed.
>        */
> -    ds_put_cstr(&match, " && !ct.rpl && (ct.new || ct.est)");
> +    ds_put_cstr(&match, " && (ct.new || ct.est)");
>       ds_put_format(&actions,
>               "ct_commit { ct_label.ecmp_reply_eth = eth.src; "
>               " %s = %" PRId64 ";}; "
> @@ -10721,7 +10721,7 @@ add_ecmp_symmetric_reply_flows(struct lflow_table *lflows,
>        * for where to route the packet.
>        */
>       ds_put_format(&ecmp_reply,
> -                  "ct.rpl && %s == %"PRId64,
> +                  "%s == %"PRId64,
>                     ct_ecmp_reply_port_match, out_port->sb->tunnel_key);
>       ds_clear(&match);
>       ds_put_format(&match, "%s && %s", ds_cstr(&ecmp_reply),
> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
> index a389d19886..98a5006cb5 100644
> --- a/tests/ovn-northd.at
> +++ b/tests/ovn-northd.at
> @@ -6722,8 +6722,14 @@ check ovn-nbctl --wait=sb --ecmp-symmetric-reply lr-route-add lr0 1.0.0.1 192.16
>   
>   ovn-sbctl dump-flows lr0 > lr0flows
>   
> -AT_CHECK([grep -e "lr_in_ip_routing.*select" lr0flows |sort], [0], [dnl
> +AT_CHECK([grep -w "lr_in_ip_routing" lr0flows | ovn_strip_lflows], [0], [dnl
> +  table=??(lr_in_ip_routing   ), priority=0    , match=(1), action=(drop;)
> +  table=??(lr_in_ip_routing   ), priority=10550, match=(nd_rs || nd_ra), action=(drop;)
> +  table=??(lr_in_ip_routing   ), priority=194  , match=(inport == "lr0-public" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; xxreg0 = ip6.dst; xxreg1 = fe80::200:20ff:fe20:1213; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;)
> +  table=??(lr_in_ip_routing   ), priority=74   , match=(ip4.dst == 192.168.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;)
> +  table=??(lr_in_ip_routing   ), priority=97   , match=(reg7 == 0 && ip4.dst == 1.0.0.1/32), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = 192.168.0.10; reg1 = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;)
>   ])
> +
>   AT_CHECK([grep -e "lr_in_ip_routing_ecmp" lr0flows | ovn_strip_lflows], [0], [dnl
>     table=??(lr_in_ip_routing_ecmp), priority=0    , match=(1), action=(drop;)
>     table=??(lr_in_ip_routing_ecmp), priority=150  , match=(reg8[[0..15]] == 0), action=(next;)
> @@ -6732,7 +6738,12 @@ AT_CHECK([grep -e "lr_in_ip_routing_ecmp" lr0flows | ovn_strip_lflows], [0], [dn
>   check ovn-nbctl --wait=sb --ecmp-symmetric-reply lr-route-add lr0 1.0.0.1 192.168.0.20
>   
>   ovn-sbctl dump-flows lr0 > lr0flows
> -AT_CHECK([grep -e "lr_in_ip_routing.*select" lr0flows | ovn_strip_lflows], [0], [dnl
> +AT_CHECK([grep -w "lr_in_ip_routing" lr0flows | ovn_strip_lflows], [0], [dnl
> +  table=??(lr_in_ip_routing   ), priority=0    , match=(1), action=(drop;)
> +  table=??(lr_in_ip_routing   ), priority=10300, match=(ct_label.ecmp_reply_port == 1 && reg7 == 0 && ip4.dst == 1.0.0.1/32), action=(ip.ttl--; flags.loopback = 1; eth.src = 00:00:20:20:12:13; reg1 = 192.168.0.1; outport = "lr0-public"; next;)
> +  table=??(lr_in_ip_routing   ), priority=10550, match=(nd_rs || nd_ra), action=(drop;)
> +  table=??(lr_in_ip_routing   ), priority=194  , match=(inport == "lr0-public" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; xxreg0 = ip6.dst; xxreg1 = fe80::200:20ff:fe20:1213; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;)
> +  table=??(lr_in_ip_routing   ), priority=74   , match=(ip4.dst == 192.168.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;)
>     table=??(lr_in_ip_routing   ), priority=97   , match=(reg7 == 0 && ip4.dst == 1.0.0.1/32), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] = 1; reg8[[16..31]] = select(1, 2);)
>   ])
>   AT_CHECK([grep -e "lr_in_ip_routing_ecmp" lr0flows | sed 's/192\.168\.0\..0/192.168.0.??/' | ovn_strip_lflows], [0], [dnl
> @@ -6751,7 +6762,7 @@ AT_CHECK([grep -e "lr_in_defrag" lr0flows | ovn_strip_lflows], [0], [dnl
>   dnl The chassis was created with other_config:ct-no-masked-label=false, the flows
>   dnl should be using ct_label.ecmp_reply_port.
>   AT_CHECK([grep -e "lr_in_arp_resolve.*ecmp" lr0flows | ovn_strip_lflows], [0], [dnl
> -  table=??(lr_in_arp_resolve  ), priority=200  , match=(ct.rpl && ct_label.ecmp_reply_port == 1), action=(push(xxreg1); xxreg1 = ct_label; eth.dst = xxreg1[[32..79]]; pop(xxreg1); next;)
> +  table=??(lr_in_arp_resolve  ), priority=200  , match=(ct_label.ecmp_reply_port == 1), action=(push(xxreg1); xxreg1 = ct_label; eth.dst = xxreg1[[32..79]]; pop(xxreg1); next;)
>   ])
>   
>   dnl Simulate an ovn-controller upgrade to a version that supports
> @@ -6761,14 +6772,19 @@ check ovn-sbctl set chassis ch1 other_config:ct-no-masked-label=true
>   check ovn-nbctl --wait=sb sync
>   ovn-sbctl dump-flows lr0 > lr0flows
>   AT_CHECK([grep -e "lr_in_arp_resolve.*ecmp" lr0flows | ovn_strip_lflows], [0], [dnl
> -  table=??(lr_in_arp_resolve  ), priority=200  , match=(ct.rpl && ct_mark.ecmp_reply_port == 1), action=(push(xxreg1); xxreg1 = ct_label; eth.dst = xxreg1[[32..79]]; pop(xxreg1); next;)
> +  table=??(lr_in_arp_resolve  ), priority=200  , match=(ct_mark.ecmp_reply_port == 1), action=(push(xxreg1); xxreg1 = ct_label; eth.dst = xxreg1[[32..79]]; pop(xxreg1); next;)
>   ])
>   
>   # add ecmp route with wrong nexthop
>   check ovn-nbctl --wait=sb --ecmp-symmetric-reply lr-route-add lr0 1.0.0.1 192.168.1.20
>   
>   ovn-sbctl dump-flows lr0 > lr0flows
> -AT_CHECK([grep -e "lr_in_ip_routing.*select" lr0flows | ovn_strip_lflows], [0], [dnl
> +AT_CHECK([grep -w "lr_in_ip_routing" lr0flows | ovn_strip_lflows], [0], [dnl
> +  table=??(lr_in_ip_routing   ), priority=0    , match=(1), action=(drop;)
> +  table=??(lr_in_ip_routing   ), priority=10300, match=(ct_mark.ecmp_reply_port == 1 && reg7 == 0 && ip4.dst == 1.0.0.1/32), action=(ip.ttl--; flags.loopback = 1; eth.src = 00:00:20:20:12:13; reg1 = 192.168.0.1; outport = "lr0-public"; next;)
> +  table=??(lr_in_ip_routing   ), priority=10550, match=(nd_rs || nd_ra), action=(drop;)
> +  table=??(lr_in_ip_routing   ), priority=194  , match=(inport == "lr0-public" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; xxreg0 = ip6.dst; xxreg1 = fe80::200:20ff:fe20:1213; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;)
> +  table=??(lr_in_ip_routing   ), priority=74   , match=(ip4.dst == 192.168.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;)
>     table=??(lr_in_ip_routing   ), priority=97   , match=(reg7 == 0 && ip4.dst == 1.0.0.1/32), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] = 1; reg8[[16..31]] = select(1, 2);)
>   ])
>   AT_CHECK([grep -e "lr_in_ip_routing_ecmp" lr0flows | sed 's/192\.168\.0\..0/192.168.0.??/' | ovn_strip_lflows], [0], [dnl
> diff --git a/tests/system-ovn.at b/tests/system-ovn.at
> index c24ede7c50..7fadd6a19b 100644
> --- a/tests/system-ovn.at
> +++ b/tests/system-ovn.at
> @@ -6180,11 +6180,10 @@ tcp,orig=(src=172.16.0.1,dst=10.0.0.2,sport=<cleared>,dport=<cleared>),reply=(sr
>   # Ensure datapaths show conntrack states as expected
>   # Like with conntrack entries, we shouldn't try to predict
>   # port binding tunnel keys. So omit them from expected labels.
> -ovs-appctl dpctl/dump-flows | grep 'ct_state(+new-est-rpl+trk).*ct(.*label=0x401020400000000/.*)'
> -AT_CHECK([ovs-appctl dpctl/dump-flows | grep 'ct_state(+new-est-rpl+trk).*ct(.*label=0x401020400000000/.*)' -c], [0], [dnl
> +AT_CHECK([ovs-appctl dpctl/dump-flows | grep 'ct_state(+new-est+trk).*ct(.*label=0x401020400000000/.*)' -c], [0], [dnl
>   2
>   ])
> -AT_CHECK([ovs-appctl dpctl/dump-flows | grep 'ct_state(-new+est+rpl+trk).*ct_label(0x401020400000000)' -c], [0], [dnl
> +AT_CHECK([ovs-appctl dpctl/dump-flows | grep 'ct_state(-new+est+trk).*ct_label(0x401020400000000)' -c], [0], [dnl
>   2
>   ])
>   
> @@ -6203,10 +6202,10 @@ NS_CHECK_EXEC([bob1], [ping -q -c 3 -i 0.3 -w 2 10.0.0.2 | FORMAT_PING], \
>   [0], [dnl
>   3 packets transmitted, 3 received, 0% packet loss, time 0ms
>   ])
> -AT_CHECK([ovs-appctl dpctl/dump-flows | grep 'ct_state(+new-est-rpl+trk).*ct(.*label=0x1001020400000000/.*)' -c], [0], [dnl
> +AT_CHECK([ovs-appctl dpctl/dump-flows | grep 'ct_state(+new-est+trk).*ct(.*label=0x1001020400000000/.*)' -c], [0], [dnl
>   2
>   ])
> -AT_CHECK([ovs-appctl dpctl/dump-flows | grep 'ct_state(-new+est+rpl+trk).*ct_label(0x1001020400000000)' -c], [0], [dnl
> +AT_CHECK([ovs-appctl dpctl/dump-flows | grep 'ct_state(-new+est+trk).*ct_label(0x1001020400000000)' -c], [0], [dnl
>   2
>   ])
>   
> @@ -6224,7 +6223,23 @@ OVS_WAIT_UNTIL([
>   test $(ovs-ofctl dump-flows br-int | grep -c 'table=OFTABLE_ECMP_NH, n_packets') -eq 0
>   ])
>   
> -ovs-ofctl dump-flows br-int
> +AT_CHECK([ovs-appctl dpctl/flush-conntrack])
> +
> +NS_CHECK_EXEC([bob1], [ip r d default via 172.16.0.2 dev bob1], [0])
> +NS_CHECK_EXEC([bob1], [ip r a default via 172.16.0.3 dev bob1], [0])
> +
> +NETNS_DAEMONIZE([bob1], [nc -l -k 8080], [bob1.pid])
> +NS_CHECK_EXEC([alice1], [nc -z 172.16.0.1 8080], [0])
> +NS_CHECK_EXEC([alice1], [ping -q -c 3 -i 0.3 -w 2 172.16.0.1 | FORMAT_PING], \
> +[0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep 0x401020500000000 | FORMAT_CT(172.16.0.1) | \
> +sed -e 's/zone=[[0-9]]*/zone=<cleared>/' |
> +sed -e 's/mark=[[0-9]]*/mark=<cleared>/' | sort], [0], [dnl
> +tcp,orig=(src=172.16.0.1,dst=10.0.0.2,sport=<cleared>,dport=<cleared>),reply=(src=10.0.0.2,dst=172.16.0.1,sport=<cleared>,dport=<cleared>),zone=<cleared>,mark=<cleared>,labels=0x401020500000000,protoinfo=(state=<cleared>)
> +])
>   
>   OVS_APP_EXIT_AND_WAIT([ovn-controller])
>   
> @@ -6373,11 +6388,11 @@ NS_CHECK_EXEC([bob1], [ping -q -c 3 -i 0.3 -w 2 fd01::2 | FORMAT_PING], \
>   # Ensure datapaths show conntrack states as expected
>   # Like with conntrack entries, we shouldn't try to predict
>   # port binding tunnel keys. So omit them from expected labels.
> -AT_CHECK([ovs-appctl dpctl/dump-flows | grep 'ct_state(+new-est-rpl+trk).*ct(.*label=0x401020400000000/.*)' -c], [0], [dnl
> +AT_CHECK([ovs-appctl dpctl/dump-flows | grep 'ct_state(+new-est+trk).*ct(.*label=0x401020400000000/.*)' -c], [0], [dnl
>   2
>   ])
>   
> -AT_CHECK([ovs-appctl dpctl/dump-flows | grep 'ct_state(-new+est+rpl+trk).*ct_label(0x401020400000000)' -c], [0], [dnl
> +AT_CHECK([ovs-appctl dpctl/dump-flows | grep 'ct_state(-new+est+trk).*ct_label(0x401020400000000)' -c], [0], [dnl
>   2
>   ])
>   
> @@ -6405,10 +6420,10 @@ NS_CHECK_EXEC([bob1], [ping -q -c 3 -i 0.3 -w 2 fd01::2 | FORMAT_PING], \
>   3 packets transmitted, 3 received, 0% packet loss, time 0ms
>   ])
>   
> -AT_CHECK([ovs-appctl dpctl/dump-flows | grep 'ct_state(+new-est-rpl+trk).*ct(.*label=0x1001020400000000/.*)' -c], [0], [dnl
> +AT_CHECK([ovs-appctl dpctl/dump-flows | grep 'ct_state(+new-est+trk).*ct(.*label=0x1001020400000000/.*)' -c], [0], [dnl
>   2
>   ])
> -AT_CHECK([ovs-appctl dpctl/dump-flows | grep 'ct_state(-new+est+rpl+trk).*ct_label(0x1001020400000000)' -c], [0], [dnl
> +AT_CHECK([ovs-appctl dpctl/dump-flows | grep 'ct_state(-new+est+trk).*ct_label(0x1001020400000000)' -c], [0], [dnl
>   2
>   ])
>   
> @@ -6427,7 +6442,20 @@ OVS_WAIT_UNTIL([
>   test $(ovs-ofctl dump-flows br-int | grep -c 'table=OFTABLE_ECMP_NH, n_packets') -eq 0
>   ])
>   
> -ovs-ofctl dump-flows br-int
> +AT_CHECK([ovs-appctl dpctl/flush-conntrack])
> +
> +NETNS_DAEMONIZE([bob1], [nc -6 -l -k 8080], [bob1.pid])
> +NS_CHECK_EXEC([alice1], [nc -6 -z fd07::1 8080], [0])
> +NS_CHECK_EXEC([alice1], [ping -q -c 3 -i 0.3 -w 2 fd07::1 | FORMAT_PING], \
> +[0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep 0x1001020400000000 | FORMAT_CT(fd07::1) | \
> +sed -e 's/zone=[[0-9]]*/zone=<cleared>/' |
> +sed -e 's/mark=[[0-9]]*/mark=<cleared>/' | sort], [0], [dnl
> +tcp,orig=(src=fd07::1,dst=fd01::2,sport=<cleared>,dport=<cleared>),reply=(src=fd01::2,dst=fd07::1,sport=<cleared>,dport=<cleared>),zone=<cleared>,mark=<cleared>,labels=0x1001020400000000,protoinfo=(state=<cleared>)
> +])
>   
>   OVS_APP_EXIT_AND_WAIT([ovn-controller])
>
Numan Siddique July 30, 2024, 2:19 a.m. UTC | #2
On Fri, Jul 26, 2024 at 11:16 AM Mark Michelson <mmichels@redhat.com> wrote:
>
> Hi Numan,
>
> The only issue I have with this patch is that I think it should have a
> change in ovn-nb.xml to mention that the ecmp_symmetric_reply option
> also now applies for egress-originated traffic.
>
> Other than that,
>
> Acked-by: Mark Michelson <mmichels@redhat.com>

Thanks for the reviews.

I applied this patch to the main with the below changes in ovn-nb.xml.

--------------------
diff --git a/ovn-nb.xml b/ovn-nb.xml
index 6a2b964ace..a4362a4ef1 100644
--- a/ovn-nb.xml
+++ b/ovn-nb.xml
@@ -3686,12 +3686,29 @@ or
       </column>

       <column name="options" key="ecmp_symmetric_reply">
-        If true, then new traffic that arrives over this route will have
-        its reply traffic bypass ECMP route selection and will be sent out
-        this route instead. Note that this option overrides any rules set
-        in the <ref table="Logical_Router_policy" /> table. This option
-        only works on gateway routers (routers that have
-        <ref column="options" key="chassis" table="Logical_Router" /> set).
+        <p>
+          If true, then
+        </p>
+        <ul>
+          <li>
+            New ingress-originated traffic that arrives over this route will
+            have its reply traffic bypass ECMP route selection and will be
+            sent out this route instead.
+          </li>
+
+          <li>
+            For the egress-originated traffic, the ingress reply traffic route
+            gets saved.  And the subsequent traffic will bypass ECMP route
+            selection and instead be sent out the same route.
+          </li>
+        </ul>
+
+        <p>
+          Note that this option overrides any rules set in the
+          <ref table="Logical_Router_policy" /> table.  This option only works
+          on gateway routers (routers that have
+          <ref column="options" key="chassis" table="Logical_Router" /> set).
+        </p>
       </column>

       <column name="options" key="origin">

----------------------------------------------------

Numan

>
> On 7/18/24 17:29, numans@ovn.org wrote:
> > From: Numan Siddique <numans@ovn.org>
> >
> > When there are ECMP symmetric static routes configured, OVN selects
> > one of the next hop for the traffic originated from within the
> > cluster.  For the subsequent packets to the same destination,
> > OVN may select a different next hop (which is fine).  But there can
> > be certain usecases, where the next hop entity can be stateful and
> > selecting the same next hop is desirable.
> >
> > This patch address this usecase in the following way
> >
> >     1.  For the first packet originating from the OVN logical port
> >         VIF, OVN selects a next hop 'A' and forwards the traffic to
> >         it.
> >
> >     2.  When the reply traffic is received (either from next hop 'A'
> >         or any other next hop), it commits the connection in the
> >         DNAT zone of the logical router and saves the state in
> >         ct_label.ecmp_reply_eth and ct_label.ecmp_reply_port.
> >         Note that we already support this for the traffic
> >         originating from an ECMP route [1].  We are now extending
> >         the same for the traffic originating from the cluster towards
> >         the ECMP route.
> >
> >      3. For the subsequent packets from the cluster, we select the
> >         next hop eth address and the port from the saved conntrack
> >         state.  This is straightforward as we anyway send the packet
> >         to the DNAT zone of the logical router.
> >
> > Example: If a logical router lr0 is configured with the below
> > EMCP static routes
> >
> > ovn-nbctl lr-route-list lr0
> > IPv4 Routes
> > Route Table <main>:
> >                  0.0.0.0/0                172.20.0.1 dst-ip ecmp
> >                  0.0.0.0/0                172.20.0.2 dst-ip ecmp ecmp-symmetric-reply
> >
> > Before this patch, we were adding the below logical flows in the router
> > pipeline for the ECMP route handling:
> >
> > -------------
> > table=10(lr_in_ecmp_stateful), priority=100  , match=(inport == "lr0-public" && ip4.src == 0.0.0.0/0 && !ct.rpl && (ct.new || ct.est)), action=(ct_commit { ct_label.ecmp_reply_eth = eth.src;  ct_mark.ecmp_reply_port = 3;}; next;)
> > table=14(lr_in_ip_routing   ), priority=10300, match=(ct.rpl && ct_mark.ecmp_reply_port == 3 && reg7 == 0 && ip4.dst == 0.0.0.0/0), action=(ip.ttl--; flags.loopback = 1; eth.src = 00:11:22:00:ff:01; reg1 = 172.20.0.100; outport = "lr0-public"; next;)
> > table=16(lr_in_policy       ), priority=65535, match=(ct.rpl && ct_mark.ecmp_reply_port == 3), action=(next;)
> > table=20(lr_in_arp_resolve  ), priority=200  , match=(ct.rpl && ct_mark.ecmp_reply_port == 3), action=(push(xxreg1); xxreg1 = ct_label; eth.dst = xxreg1[32..79]; pop(xxreg1); next;)
> > -------------
> >
> > After this patch, we add the below logical flows:
> >
> > --------------
> > table=10(lr_in_ecmp_stateful), priority=100  , match=(inport == "lr0-public" && ip4.src == 0.0.0.0/0 && (ct.new || ct.est)), action=(ct_commit { ct_label.ecmp_reply_eth = eth.src;  ct_mark.ecmp_reply_port = 3;}; next;)
> > table=14(lr_in_ip_routing   ), priority=10300, match=(ct_mark.ecmp_reply_port == 3 && reg7 == 0 && ip4.dst == 0.0.0.0/0), action=(ip.ttl--; flags.loopback = 1; eth.src = 00:11:22:00:ff:01; reg1 = 172.20.0.100; outport = "lr0-public"; next;)
> > table=16(lr_in_policy       ), priority=65535, match=(ct_mark.ecmp_reply_port == 3), action=(next;)
> > table=20(lr_in_arp_resolve  ), priority=200  , match=(ct_mark.ecmp_reply_port == 3), action=(push(xxreg1); xxreg1 = ct_label; eth.dst = xxreg1[32..79]; pop(xxreg1); next;)
> > --------------
> >
> > [1] - 4fdca656857d ("Add ECMP symmetric replies.")
> > Reported-at: https://issues.redhat.com/browse/FDP-628
> > Signed-off-by: Numan Siddique <numans@ovn.org>
> > ---
> >
> > v1 -> v2
> > ------
> >    - Addressed Ilya's comments - Fixed the failing system tests related
> >      to ECMP symmetric replies
> >    - And enhanced those tests to cover the scenario of packet originating
> >      from the VIF.
> >
> >   northd/northd.c     |  4 ++--
> >   tests/ovn-northd.at | 26 ++++++++++++++++++-----
> >   tests/system-ovn.at | 50 +++++++++++++++++++++++++++++++++++----------
> >   3 files changed, 62 insertions(+), 18 deletions(-)
> >
> > diff --git a/northd/northd.c b/northd/northd.c
> > index 6898daa00d..3227c093cb 100644
> > --- a/northd/northd.c
> > +++ b/northd/northd.c
> > @@ -10706,7 +10706,7 @@ add_ecmp_symmetric_reply_flows(struct lflow_table *lflows,
> >        * NOTE: we purposely are not clearing match before this
> >        * ds_put_cstr() call. The previous contents are needed.
> >        */
> > -    ds_put_cstr(&match, " && !ct.rpl && (ct.new || ct.est)");
> > +    ds_put_cstr(&match, " && (ct.new || ct.est)");
> >       ds_put_format(&actions,
> >               "ct_commit { ct_label.ecmp_reply_eth = eth.src; "
> >               " %s = %" PRId64 ";}; "
> > @@ -10721,7 +10721,7 @@ add_ecmp_symmetric_reply_flows(struct lflow_table *lflows,
> >        * for where to route the packet.
> >        */
> >       ds_put_format(&ecmp_reply,
> > -                  "ct.rpl && %s == %"PRId64,
> > +                  "%s == %"PRId64,
> >                     ct_ecmp_reply_port_match, out_port->sb->tunnel_key);
> >       ds_clear(&match);
> >       ds_put_format(&match, "%s && %s", ds_cstr(&ecmp_reply),
> > diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
> > index a389d19886..98a5006cb5 100644
> > --- a/tests/ovn-northd.at
> > +++ b/tests/ovn-northd.at
> > @@ -6722,8 +6722,14 @@ check ovn-nbctl --wait=sb --ecmp-symmetric-reply lr-route-add lr0 1.0.0.1 192.16
> >
> >   ovn-sbctl dump-flows lr0 > lr0flows
> >
> > -AT_CHECK([grep -e "lr_in_ip_routing.*select" lr0flows |sort], [0], [dnl
> > +AT_CHECK([grep -w "lr_in_ip_routing" lr0flows | ovn_strip_lflows], [0], [dnl
> > +  table=??(lr_in_ip_routing   ), priority=0    , match=(1), action=(drop;)
> > +  table=??(lr_in_ip_routing   ), priority=10550, match=(nd_rs || nd_ra), action=(drop;)
> > +  table=??(lr_in_ip_routing   ), priority=194  , match=(inport == "lr0-public" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; xxreg0 = ip6.dst; xxreg1 = fe80::200:20ff:fe20:1213; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;)
> > +  table=??(lr_in_ip_routing   ), priority=74   , match=(ip4.dst == 192.168.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;)
> > +  table=??(lr_in_ip_routing   ), priority=97   , match=(reg7 == 0 && ip4.dst == 1.0.0.1/32), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = 192.168.0.10; reg1 = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;)
> >   ])
> > +
> >   AT_CHECK([grep -e "lr_in_ip_routing_ecmp" lr0flows | ovn_strip_lflows], [0], [dnl
> >     table=??(lr_in_ip_routing_ecmp), priority=0    , match=(1), action=(drop;)
> >     table=??(lr_in_ip_routing_ecmp), priority=150  , match=(reg8[[0..15]] == 0), action=(next;)
> > @@ -6732,7 +6738,12 @@ AT_CHECK([grep -e "lr_in_ip_routing_ecmp" lr0flows | ovn_strip_lflows], [0], [dn
> >   check ovn-nbctl --wait=sb --ecmp-symmetric-reply lr-route-add lr0 1.0.0.1 192.168.0.20
> >
> >   ovn-sbctl dump-flows lr0 > lr0flows
> > -AT_CHECK([grep -e "lr_in_ip_routing.*select" lr0flows | ovn_strip_lflows], [0], [dnl
> > +AT_CHECK([grep -w "lr_in_ip_routing" lr0flows | ovn_strip_lflows], [0], [dnl
> > +  table=??(lr_in_ip_routing   ), priority=0    , match=(1), action=(drop;)
> > +  table=??(lr_in_ip_routing   ), priority=10300, match=(ct_label.ecmp_reply_port == 1 && reg7 == 0 && ip4.dst == 1.0.0.1/32), action=(ip.ttl--; flags.loopback = 1; eth.src = 00:00:20:20:12:13; reg1 = 192.168.0.1; outport = "lr0-public"; next;)
> > +  table=??(lr_in_ip_routing   ), priority=10550, match=(nd_rs || nd_ra), action=(drop;)
> > +  table=??(lr_in_ip_routing   ), priority=194  , match=(inport == "lr0-public" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; xxreg0 = ip6.dst; xxreg1 = fe80::200:20ff:fe20:1213; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;)
> > +  table=??(lr_in_ip_routing   ), priority=74   , match=(ip4.dst == 192.168.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;)
> >     table=??(lr_in_ip_routing   ), priority=97   , match=(reg7 == 0 && ip4.dst == 1.0.0.1/32), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] = 1; reg8[[16..31]] = select(1, 2);)
> >   ])
> >   AT_CHECK([grep -e "lr_in_ip_routing_ecmp" lr0flows | sed 's/192\.168\.0\..0/192.168.0.??/' | ovn_strip_lflows], [0], [dnl
> > @@ -6751,7 +6762,7 @@ AT_CHECK([grep -e "lr_in_defrag" lr0flows | ovn_strip_lflows], [0], [dnl
> >   dnl The chassis was created with other_config:ct-no-masked-label=false, the flows
> >   dnl should be using ct_label.ecmp_reply_port.
> >   AT_CHECK([grep -e "lr_in_arp_resolve.*ecmp" lr0flows | ovn_strip_lflows], [0], [dnl
> > -  table=??(lr_in_arp_resolve  ), priority=200  , match=(ct.rpl && ct_label.ecmp_reply_port == 1), action=(push(xxreg1); xxreg1 = ct_label; eth.dst = xxreg1[[32..79]]; pop(xxreg1); next;)
> > +  table=??(lr_in_arp_resolve  ), priority=200  , match=(ct_label.ecmp_reply_port == 1), action=(push(xxreg1); xxreg1 = ct_label; eth.dst = xxreg1[[32..79]]; pop(xxreg1); next;)
> >   ])
> >
> >   dnl Simulate an ovn-controller upgrade to a version that supports
> > @@ -6761,14 +6772,19 @@ check ovn-sbctl set chassis ch1 other_config:ct-no-masked-label=true
> >   check ovn-nbctl --wait=sb sync
> >   ovn-sbctl dump-flows lr0 > lr0flows
> >   AT_CHECK([grep -e "lr_in_arp_resolve.*ecmp" lr0flows | ovn_strip_lflows], [0], [dnl
> > -  table=??(lr_in_arp_resolve  ), priority=200  , match=(ct.rpl && ct_mark.ecmp_reply_port == 1), action=(push(xxreg1); xxreg1 = ct_label; eth.dst = xxreg1[[32..79]]; pop(xxreg1); next;)
> > +  table=??(lr_in_arp_resolve  ), priority=200  , match=(ct_mark.ecmp_reply_port == 1), action=(push(xxreg1); xxreg1 = ct_label; eth.dst = xxreg1[[32..79]]; pop(xxreg1); next;)
> >   ])
> >
> >   # add ecmp route with wrong nexthop
> >   check ovn-nbctl --wait=sb --ecmp-symmetric-reply lr-route-add lr0 1.0.0.1 192.168.1.20
> >
> >   ovn-sbctl dump-flows lr0 > lr0flows
> > -AT_CHECK([grep -e "lr_in_ip_routing.*select" lr0flows | ovn_strip_lflows], [0], [dnl
> > +AT_CHECK([grep -w "lr_in_ip_routing" lr0flows | ovn_strip_lflows], [0], [dnl
> > +  table=??(lr_in_ip_routing   ), priority=0    , match=(1), action=(drop;)
> > +  table=??(lr_in_ip_routing   ), priority=10300, match=(ct_mark.ecmp_reply_port == 1 && reg7 == 0 && ip4.dst == 1.0.0.1/32), action=(ip.ttl--; flags.loopback = 1; eth.src = 00:00:20:20:12:13; reg1 = 192.168.0.1; outport = "lr0-public"; next;)
> > +  table=??(lr_in_ip_routing   ), priority=10550, match=(nd_rs || nd_ra), action=(drop;)
> > +  table=??(lr_in_ip_routing   ), priority=194  , match=(inport == "lr0-public" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; xxreg0 = ip6.dst; xxreg1 = fe80::200:20ff:fe20:1213; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;)
> > +  table=??(lr_in_ip_routing   ), priority=74   , match=(ip4.dst == 192.168.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;)
> >     table=??(lr_in_ip_routing   ), priority=97   , match=(reg7 == 0 && ip4.dst == 1.0.0.1/32), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] = 1; reg8[[16..31]] = select(1, 2);)
> >   ])
> >   AT_CHECK([grep -e "lr_in_ip_routing_ecmp" lr0flows | sed 's/192\.168\.0\..0/192.168.0.??/' | ovn_strip_lflows], [0], [dnl
> > diff --git a/tests/system-ovn.at b/tests/system-ovn.at
> > index c24ede7c50..7fadd6a19b 100644
> > --- a/tests/system-ovn.at
> > +++ b/tests/system-ovn.at
> > @@ -6180,11 +6180,10 @@ tcp,orig=(src=172.16.0.1,dst=10.0.0.2,sport=<cleared>,dport=<cleared>),reply=(sr
> >   # Ensure datapaths show conntrack states as expected
> >   # Like with conntrack entries, we shouldn't try to predict
> >   # port binding tunnel keys. So omit them from expected labels.
> > -ovs-appctl dpctl/dump-flows | grep 'ct_state(+new-est-rpl+trk).*ct(.*label=0x401020400000000/.*)'
> > -AT_CHECK([ovs-appctl dpctl/dump-flows | grep 'ct_state(+new-est-rpl+trk).*ct(.*label=0x401020400000000/.*)' -c], [0], [dnl
> > +AT_CHECK([ovs-appctl dpctl/dump-flows | grep 'ct_state(+new-est+trk).*ct(.*label=0x401020400000000/.*)' -c], [0], [dnl
> >   2
> >   ])
> > -AT_CHECK([ovs-appctl dpctl/dump-flows | grep 'ct_state(-new+est+rpl+trk).*ct_label(0x401020400000000)' -c], [0], [dnl
> > +AT_CHECK([ovs-appctl dpctl/dump-flows | grep 'ct_state(-new+est+trk).*ct_label(0x401020400000000)' -c], [0], [dnl
> >   2
> >   ])
> >
> > @@ -6203,10 +6202,10 @@ NS_CHECK_EXEC([bob1], [ping -q -c 3 -i 0.3 -w 2 10.0.0.2 | FORMAT_PING], \
> >   [0], [dnl
> >   3 packets transmitted, 3 received, 0% packet loss, time 0ms
> >   ])
> > -AT_CHECK([ovs-appctl dpctl/dump-flows | grep 'ct_state(+new-est-rpl+trk).*ct(.*label=0x1001020400000000/.*)' -c], [0], [dnl
> > +AT_CHECK([ovs-appctl dpctl/dump-flows | grep 'ct_state(+new-est+trk).*ct(.*label=0x1001020400000000/.*)' -c], [0], [dnl
> >   2
> >   ])
> > -AT_CHECK([ovs-appctl dpctl/dump-flows | grep 'ct_state(-new+est+rpl+trk).*ct_label(0x1001020400000000)' -c], [0], [dnl
> > +AT_CHECK([ovs-appctl dpctl/dump-flows | grep 'ct_state(-new+est+trk).*ct_label(0x1001020400000000)' -c], [0], [dnl
> >   2
> >   ])
> >
> > @@ -6224,7 +6223,23 @@ OVS_WAIT_UNTIL([
> >   test $(ovs-ofctl dump-flows br-int | grep -c 'table=OFTABLE_ECMP_NH, n_packets') -eq 0
> >   ])
> >
> > -ovs-ofctl dump-flows br-int
> > +AT_CHECK([ovs-appctl dpctl/flush-conntrack])
> > +
> > +NS_CHECK_EXEC([bob1], [ip r d default via 172.16.0.2 dev bob1], [0])
> > +NS_CHECK_EXEC([bob1], [ip r a default via 172.16.0.3 dev bob1], [0])
> > +
> > +NETNS_DAEMONIZE([bob1], [nc -l -k 8080], [bob1.pid])
> > +NS_CHECK_EXEC([alice1], [nc -z 172.16.0.1 8080], [0])
> > +NS_CHECK_EXEC([alice1], [ping -q -c 3 -i 0.3 -w 2 172.16.0.1 | FORMAT_PING], \
> > +[0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep 0x401020500000000 | FORMAT_CT(172.16.0.1) | \
> > +sed -e 's/zone=[[0-9]]*/zone=<cleared>/' |
> > +sed -e 's/mark=[[0-9]]*/mark=<cleared>/' | sort], [0], [dnl
> > +tcp,orig=(src=172.16.0.1,dst=10.0.0.2,sport=<cleared>,dport=<cleared>),reply=(src=10.0.0.2,dst=172.16.0.1,sport=<cleared>,dport=<cleared>),zone=<cleared>,mark=<cleared>,labels=0x401020500000000,protoinfo=(state=<cleared>)
> > +])
> >
> >   OVS_APP_EXIT_AND_WAIT([ovn-controller])
> >
> > @@ -6373,11 +6388,11 @@ NS_CHECK_EXEC([bob1], [ping -q -c 3 -i 0.3 -w 2 fd01::2 | FORMAT_PING], \
> >   # Ensure datapaths show conntrack states as expected
> >   # Like with conntrack entries, we shouldn't try to predict
> >   # port binding tunnel keys. So omit them from expected labels.
> > -AT_CHECK([ovs-appctl dpctl/dump-flows | grep 'ct_state(+new-est-rpl+trk).*ct(.*label=0x401020400000000/.*)' -c], [0], [dnl
> > +AT_CHECK([ovs-appctl dpctl/dump-flows | grep 'ct_state(+new-est+trk).*ct(.*label=0x401020400000000/.*)' -c], [0], [dnl
> >   2
> >   ])
> >
> > -AT_CHECK([ovs-appctl dpctl/dump-flows | grep 'ct_state(-new+est+rpl+trk).*ct_label(0x401020400000000)' -c], [0], [dnl
> > +AT_CHECK([ovs-appctl dpctl/dump-flows | grep 'ct_state(-new+est+trk).*ct_label(0x401020400000000)' -c], [0], [dnl
> >   2
> >   ])
> >
> > @@ -6405,10 +6420,10 @@ NS_CHECK_EXEC([bob1], [ping -q -c 3 -i 0.3 -w 2 fd01::2 | FORMAT_PING], \
> >   3 packets transmitted, 3 received, 0% packet loss, time 0ms
> >   ])
> >
> > -AT_CHECK([ovs-appctl dpctl/dump-flows | grep 'ct_state(+new-est-rpl+trk).*ct(.*label=0x1001020400000000/.*)' -c], [0], [dnl
> > +AT_CHECK([ovs-appctl dpctl/dump-flows | grep 'ct_state(+new-est+trk).*ct(.*label=0x1001020400000000/.*)' -c], [0], [dnl
> >   2
> >   ])
> > -AT_CHECK([ovs-appctl dpctl/dump-flows | grep 'ct_state(-new+est+rpl+trk).*ct_label(0x1001020400000000)' -c], [0], [dnl
> > +AT_CHECK([ovs-appctl dpctl/dump-flows | grep 'ct_state(-new+est+trk).*ct_label(0x1001020400000000)' -c], [0], [dnl
> >   2
> >   ])
> >
> > @@ -6427,7 +6442,20 @@ OVS_WAIT_UNTIL([
> >   test $(ovs-ofctl dump-flows br-int | grep -c 'table=OFTABLE_ECMP_NH, n_packets') -eq 0
> >   ])
> >
> > -ovs-ofctl dump-flows br-int
> > +AT_CHECK([ovs-appctl dpctl/flush-conntrack])
> > +
> > +NETNS_DAEMONIZE([bob1], [nc -6 -l -k 8080], [bob1.pid])
> > +NS_CHECK_EXEC([alice1], [nc -6 -z fd07::1 8080], [0])
> > +NS_CHECK_EXEC([alice1], [ping -q -c 3 -i 0.3 -w 2 fd07::1 | FORMAT_PING], \
> > +[0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep 0x1001020400000000 | FORMAT_CT(fd07::1) | \
> > +sed -e 's/zone=[[0-9]]*/zone=<cleared>/' |
> > +sed -e 's/mark=[[0-9]]*/mark=<cleared>/' | sort], [0], [dnl
> > +tcp,orig=(src=fd07::1,dst=fd01::2,sport=<cleared>,dport=<cleared>),reply=(src=fd01::2,dst=fd07::1,sport=<cleared>,dport=<cleared>),zone=<cleared>,mark=<cleared>,labels=0x1001020400000000,protoinfo=(state=<cleared>)
> > +])
> >
> >   OVS_APP_EXIT_AND_WAIT([ovn-controller])
> >
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
diff mbox series

Patch

diff --git a/northd/northd.c b/northd/northd.c
index 6898daa00d..3227c093cb 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -10706,7 +10706,7 @@  add_ecmp_symmetric_reply_flows(struct lflow_table *lflows,
      * NOTE: we purposely are not clearing match before this
      * ds_put_cstr() call. The previous contents are needed.
      */
-    ds_put_cstr(&match, " && !ct.rpl && (ct.new || ct.est)");
+    ds_put_cstr(&match, " && (ct.new || ct.est)");
     ds_put_format(&actions,
             "ct_commit { ct_label.ecmp_reply_eth = eth.src; "
             " %s = %" PRId64 ";}; "
@@ -10721,7 +10721,7 @@  add_ecmp_symmetric_reply_flows(struct lflow_table *lflows,
      * for where to route the packet.
      */
     ds_put_format(&ecmp_reply,
-                  "ct.rpl && %s == %"PRId64,
+                  "%s == %"PRId64,
                   ct_ecmp_reply_port_match, out_port->sb->tunnel_key);
     ds_clear(&match);
     ds_put_format(&match, "%s && %s", ds_cstr(&ecmp_reply),
diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
index a389d19886..98a5006cb5 100644
--- a/tests/ovn-northd.at
+++ b/tests/ovn-northd.at
@@ -6722,8 +6722,14 @@  check ovn-nbctl --wait=sb --ecmp-symmetric-reply lr-route-add lr0 1.0.0.1 192.16
 
 ovn-sbctl dump-flows lr0 > lr0flows
 
-AT_CHECK([grep -e "lr_in_ip_routing.*select" lr0flows |sort], [0], [dnl
+AT_CHECK([grep -w "lr_in_ip_routing" lr0flows | ovn_strip_lflows], [0], [dnl
+  table=??(lr_in_ip_routing   ), priority=0    , match=(1), action=(drop;)
+  table=??(lr_in_ip_routing   ), priority=10550, match=(nd_rs || nd_ra), action=(drop;)
+  table=??(lr_in_ip_routing   ), priority=194  , match=(inport == "lr0-public" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; xxreg0 = ip6.dst; xxreg1 = fe80::200:20ff:fe20:1213; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;)
+  table=??(lr_in_ip_routing   ), priority=74   , match=(ip4.dst == 192.168.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;)
+  table=??(lr_in_ip_routing   ), priority=97   , match=(reg7 == 0 && ip4.dst == 1.0.0.1/32), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = 192.168.0.10; reg1 = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;)
 ])
+
 AT_CHECK([grep -e "lr_in_ip_routing_ecmp" lr0flows | ovn_strip_lflows], [0], [dnl
   table=??(lr_in_ip_routing_ecmp), priority=0    , match=(1), action=(drop;)
   table=??(lr_in_ip_routing_ecmp), priority=150  , match=(reg8[[0..15]] == 0), action=(next;)
@@ -6732,7 +6738,12 @@  AT_CHECK([grep -e "lr_in_ip_routing_ecmp" lr0flows | ovn_strip_lflows], [0], [dn
 check ovn-nbctl --wait=sb --ecmp-symmetric-reply lr-route-add lr0 1.0.0.1 192.168.0.20
 
 ovn-sbctl dump-flows lr0 > lr0flows
-AT_CHECK([grep -e "lr_in_ip_routing.*select" lr0flows | ovn_strip_lflows], [0], [dnl
+AT_CHECK([grep -w "lr_in_ip_routing" lr0flows | ovn_strip_lflows], [0], [dnl
+  table=??(lr_in_ip_routing   ), priority=0    , match=(1), action=(drop;)
+  table=??(lr_in_ip_routing   ), priority=10300, match=(ct_label.ecmp_reply_port == 1 && reg7 == 0 && ip4.dst == 1.0.0.1/32), action=(ip.ttl--; flags.loopback = 1; eth.src = 00:00:20:20:12:13; reg1 = 192.168.0.1; outport = "lr0-public"; next;)
+  table=??(lr_in_ip_routing   ), priority=10550, match=(nd_rs || nd_ra), action=(drop;)
+  table=??(lr_in_ip_routing   ), priority=194  , match=(inport == "lr0-public" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; xxreg0 = ip6.dst; xxreg1 = fe80::200:20ff:fe20:1213; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;)
+  table=??(lr_in_ip_routing   ), priority=74   , match=(ip4.dst == 192.168.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;)
   table=??(lr_in_ip_routing   ), priority=97   , match=(reg7 == 0 && ip4.dst == 1.0.0.1/32), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] = 1; reg8[[16..31]] = select(1, 2);)
 ])
 AT_CHECK([grep -e "lr_in_ip_routing_ecmp" lr0flows | sed 's/192\.168\.0\..0/192.168.0.??/' | ovn_strip_lflows], [0], [dnl
@@ -6751,7 +6762,7 @@  AT_CHECK([grep -e "lr_in_defrag" lr0flows | ovn_strip_lflows], [0], [dnl
 dnl The chassis was created with other_config:ct-no-masked-label=false, the flows
 dnl should be using ct_label.ecmp_reply_port.
 AT_CHECK([grep -e "lr_in_arp_resolve.*ecmp" lr0flows | ovn_strip_lflows], [0], [dnl
-  table=??(lr_in_arp_resolve  ), priority=200  , match=(ct.rpl && ct_label.ecmp_reply_port == 1), action=(push(xxreg1); xxreg1 = ct_label; eth.dst = xxreg1[[32..79]]; pop(xxreg1); next;)
+  table=??(lr_in_arp_resolve  ), priority=200  , match=(ct_label.ecmp_reply_port == 1), action=(push(xxreg1); xxreg1 = ct_label; eth.dst = xxreg1[[32..79]]; pop(xxreg1); next;)
 ])
 
 dnl Simulate an ovn-controller upgrade to a version that supports
@@ -6761,14 +6772,19 @@  check ovn-sbctl set chassis ch1 other_config:ct-no-masked-label=true
 check ovn-nbctl --wait=sb sync
 ovn-sbctl dump-flows lr0 > lr0flows
 AT_CHECK([grep -e "lr_in_arp_resolve.*ecmp" lr0flows | ovn_strip_lflows], [0], [dnl
-  table=??(lr_in_arp_resolve  ), priority=200  , match=(ct.rpl && ct_mark.ecmp_reply_port == 1), action=(push(xxreg1); xxreg1 = ct_label; eth.dst = xxreg1[[32..79]]; pop(xxreg1); next;)
+  table=??(lr_in_arp_resolve  ), priority=200  , match=(ct_mark.ecmp_reply_port == 1), action=(push(xxreg1); xxreg1 = ct_label; eth.dst = xxreg1[[32..79]]; pop(xxreg1); next;)
 ])
 
 # add ecmp route with wrong nexthop
 check ovn-nbctl --wait=sb --ecmp-symmetric-reply lr-route-add lr0 1.0.0.1 192.168.1.20
 
 ovn-sbctl dump-flows lr0 > lr0flows
-AT_CHECK([grep -e "lr_in_ip_routing.*select" lr0flows | ovn_strip_lflows], [0], [dnl
+AT_CHECK([grep -w "lr_in_ip_routing" lr0flows | ovn_strip_lflows], [0], [dnl
+  table=??(lr_in_ip_routing   ), priority=0    , match=(1), action=(drop;)
+  table=??(lr_in_ip_routing   ), priority=10300, match=(ct_mark.ecmp_reply_port == 1 && reg7 == 0 && ip4.dst == 1.0.0.1/32), action=(ip.ttl--; flags.loopback = 1; eth.src = 00:00:20:20:12:13; reg1 = 192.168.0.1; outport = "lr0-public"; next;)
+  table=??(lr_in_ip_routing   ), priority=10550, match=(nd_rs || nd_ra), action=(drop;)
+  table=??(lr_in_ip_routing   ), priority=194  , match=(inport == "lr0-public" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; xxreg0 = ip6.dst; xxreg1 = fe80::200:20ff:fe20:1213; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;)
+  table=??(lr_in_ip_routing   ), priority=74   , match=(ip4.dst == 192.168.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;)
   table=??(lr_in_ip_routing   ), priority=97   , match=(reg7 == 0 && ip4.dst == 1.0.0.1/32), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] = 1; reg8[[16..31]] = select(1, 2);)
 ])
 AT_CHECK([grep -e "lr_in_ip_routing_ecmp" lr0flows | sed 's/192\.168\.0\..0/192.168.0.??/' | ovn_strip_lflows], [0], [dnl
diff --git a/tests/system-ovn.at b/tests/system-ovn.at
index c24ede7c50..7fadd6a19b 100644
--- a/tests/system-ovn.at
+++ b/tests/system-ovn.at
@@ -6180,11 +6180,10 @@  tcp,orig=(src=172.16.0.1,dst=10.0.0.2,sport=<cleared>,dport=<cleared>),reply=(sr
 # Ensure datapaths show conntrack states as expected
 # Like with conntrack entries, we shouldn't try to predict
 # port binding tunnel keys. So omit them from expected labels.
-ovs-appctl dpctl/dump-flows | grep 'ct_state(+new-est-rpl+trk).*ct(.*label=0x401020400000000/.*)'
-AT_CHECK([ovs-appctl dpctl/dump-flows | grep 'ct_state(+new-est-rpl+trk).*ct(.*label=0x401020400000000/.*)' -c], [0], [dnl
+AT_CHECK([ovs-appctl dpctl/dump-flows | grep 'ct_state(+new-est+trk).*ct(.*label=0x401020400000000/.*)' -c], [0], [dnl
 2
 ])
-AT_CHECK([ovs-appctl dpctl/dump-flows | grep 'ct_state(-new+est+rpl+trk).*ct_label(0x401020400000000)' -c], [0], [dnl
+AT_CHECK([ovs-appctl dpctl/dump-flows | grep 'ct_state(-new+est+trk).*ct_label(0x401020400000000)' -c], [0], [dnl
 2
 ])
 
@@ -6203,10 +6202,10 @@  NS_CHECK_EXEC([bob1], [ping -q -c 3 -i 0.3 -w 2 10.0.0.2 | FORMAT_PING], \
 [0], [dnl
 3 packets transmitted, 3 received, 0% packet loss, time 0ms
 ])
-AT_CHECK([ovs-appctl dpctl/dump-flows | grep 'ct_state(+new-est-rpl+trk).*ct(.*label=0x1001020400000000/.*)' -c], [0], [dnl
+AT_CHECK([ovs-appctl dpctl/dump-flows | grep 'ct_state(+new-est+trk).*ct(.*label=0x1001020400000000/.*)' -c], [0], [dnl
 2
 ])
-AT_CHECK([ovs-appctl dpctl/dump-flows | grep 'ct_state(-new+est+rpl+trk).*ct_label(0x1001020400000000)' -c], [0], [dnl
+AT_CHECK([ovs-appctl dpctl/dump-flows | grep 'ct_state(-new+est+trk).*ct_label(0x1001020400000000)' -c], [0], [dnl
 2
 ])
 
@@ -6224,7 +6223,23 @@  OVS_WAIT_UNTIL([
 test $(ovs-ofctl dump-flows br-int | grep -c 'table=OFTABLE_ECMP_NH, n_packets') -eq 0
 ])
 
-ovs-ofctl dump-flows br-int
+AT_CHECK([ovs-appctl dpctl/flush-conntrack])
+
+NS_CHECK_EXEC([bob1], [ip r d default via 172.16.0.2 dev bob1], [0])
+NS_CHECK_EXEC([bob1], [ip r a default via 172.16.0.3 dev bob1], [0])
+
+NETNS_DAEMONIZE([bob1], [nc -l -k 8080], [bob1.pid])
+NS_CHECK_EXEC([alice1], [nc -z 172.16.0.1 8080], [0])
+NS_CHECK_EXEC([alice1], [ping -q -c 3 -i 0.3 -w 2 172.16.0.1 | FORMAT_PING], \
+[0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep 0x401020500000000 | FORMAT_CT(172.16.0.1) | \
+sed -e 's/zone=[[0-9]]*/zone=<cleared>/' |
+sed -e 's/mark=[[0-9]]*/mark=<cleared>/' | sort], [0], [dnl
+tcp,orig=(src=172.16.0.1,dst=10.0.0.2,sport=<cleared>,dport=<cleared>),reply=(src=10.0.0.2,dst=172.16.0.1,sport=<cleared>,dport=<cleared>),zone=<cleared>,mark=<cleared>,labels=0x401020500000000,protoinfo=(state=<cleared>)
+])
 
 OVS_APP_EXIT_AND_WAIT([ovn-controller])
 
@@ -6373,11 +6388,11 @@  NS_CHECK_EXEC([bob1], [ping -q -c 3 -i 0.3 -w 2 fd01::2 | FORMAT_PING], \
 # Ensure datapaths show conntrack states as expected
 # Like with conntrack entries, we shouldn't try to predict
 # port binding tunnel keys. So omit them from expected labels.
-AT_CHECK([ovs-appctl dpctl/dump-flows | grep 'ct_state(+new-est-rpl+trk).*ct(.*label=0x401020400000000/.*)' -c], [0], [dnl
+AT_CHECK([ovs-appctl dpctl/dump-flows | grep 'ct_state(+new-est+trk).*ct(.*label=0x401020400000000/.*)' -c], [0], [dnl
 2
 ])
 
-AT_CHECK([ovs-appctl dpctl/dump-flows | grep 'ct_state(-new+est+rpl+trk).*ct_label(0x401020400000000)' -c], [0], [dnl
+AT_CHECK([ovs-appctl dpctl/dump-flows | grep 'ct_state(-new+est+trk).*ct_label(0x401020400000000)' -c], [0], [dnl
 2
 ])
 
@@ -6405,10 +6420,10 @@  NS_CHECK_EXEC([bob1], [ping -q -c 3 -i 0.3 -w 2 fd01::2 | FORMAT_PING], \
 3 packets transmitted, 3 received, 0% packet loss, time 0ms
 ])
 
-AT_CHECK([ovs-appctl dpctl/dump-flows | grep 'ct_state(+new-est-rpl+trk).*ct(.*label=0x1001020400000000/.*)' -c], [0], [dnl
+AT_CHECK([ovs-appctl dpctl/dump-flows | grep 'ct_state(+new-est+trk).*ct(.*label=0x1001020400000000/.*)' -c], [0], [dnl
 2
 ])
-AT_CHECK([ovs-appctl dpctl/dump-flows | grep 'ct_state(-new+est+rpl+trk).*ct_label(0x1001020400000000)' -c], [0], [dnl
+AT_CHECK([ovs-appctl dpctl/dump-flows | grep 'ct_state(-new+est+trk).*ct_label(0x1001020400000000)' -c], [0], [dnl
 2
 ])
 
@@ -6427,7 +6442,20 @@  OVS_WAIT_UNTIL([
 test $(ovs-ofctl dump-flows br-int | grep -c 'table=OFTABLE_ECMP_NH, n_packets') -eq 0
 ])
 
-ovs-ofctl dump-flows br-int
+AT_CHECK([ovs-appctl dpctl/flush-conntrack])
+
+NETNS_DAEMONIZE([bob1], [nc -6 -l -k 8080], [bob1.pid])
+NS_CHECK_EXEC([alice1], [nc -6 -z fd07::1 8080], [0])
+NS_CHECK_EXEC([alice1], [ping -q -c 3 -i 0.3 -w 2 fd07::1 | FORMAT_PING], \
+[0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep 0x1001020400000000 | FORMAT_CT(fd07::1) | \
+sed -e 's/zone=[[0-9]]*/zone=<cleared>/' |
+sed -e 's/mark=[[0-9]]*/mark=<cleared>/' | sort], [0], [dnl
+tcp,orig=(src=fd07::1,dst=fd01::2,sport=<cleared>,dport=<cleared>),reply=(src=fd01::2,dst=fd07::1,sport=<cleared>,dport=<cleared>),zone=<cleared>,mark=<cleared>,labels=0x1001020400000000,protoinfo=(state=<cleared>)
+])
 
 OVS_APP_EXIT_AND_WAIT([ovn-controller])