[ovs-dev,0/7] OVN IC bugfixes & proposals/questions

Message ID 20221202173147.3032702-1-odivlad@gmail.com

Message

Vladislav Odintsov Dec. 2, 2022, 5:31 p.m. UTC
Hi,

we’ve run into an issue where it was possible to create multiple identical
routes within an LR (same ip_prefix, nexthop, and route_table).

Initially the problem started after an OVN upgrade.  We use the Python ovsdbapp
library, and we found a problem in python-ovs, which is described by my
colleague Anton here:
https://mail.openvswitch.org/pipermail/ovs-dev/2022-November/399722.html
@Terry Wilson, please take a look at this.

The problem itself touches both OVN and OVS.  Sorry for the long read, but it
seems there are several bugs in different places, some of which this RFC aims
to cover.

How the issue was initially reproduced:

1. assume we have (at least) a 2-Availability-Zone OVN deployment
   (utilising the ovn-ic infrastructure).
2. create a transit switch in the IC NB
3. create an LR in each AZ and connect it to the transit switch
4. create one logical switch with a VIF port attached to the local OVS and
   connect this logical switch to the LR (e.g. 192.168.0.1/24)
5. install 2 static routes in the LR in one AZ with a create command (invoke
   the following command twice):

   ovn-nbctl --id=@id create logical-router-static-route ip_prefix=1.2.3.4/32 nexthop=192.168.0.10 -- add logical_router lr1 static_routes @id
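
For completeness, the full setup looks roughly like this (a sketch for one AZ;
all names are illustrative, and the second AZ is configured symmetrically):

# transit switch in the IC NB (ovn-ic propagates it into each AZ's NB)
ovn-ic-nbctl ts-add ts1

# LR and its connection to the transit switch
ovn-nbctl lr-add lr1
ovn-nbctl lrp-add lr1 lr1-ts1 aa:aa:aa:aa:aa:01 169.254.100.1/24
ovn-nbctl lsp-add ts1 ts1-lr1 -- lsp-set-type ts1-lr1 router \
    -- lsp-set-addresses ts1-lr1 router \
    -- lsp-set-options ts1-lr1 router-port=lr1-ts1

# LS with a VIF port, connected to the LR
ovn-nbctl ls-add ls1
ovn-nbctl lrp-add lr1 lr1-ls1 aa:aa:aa:aa:aa:02 192.168.0.1/24
ovn-nbctl lsp-add ls1 ls1-lr1 -- lsp-set-type ls1-lr1 router \
    -- lsp-set-addresses ls1-lr1 router \
    -- lsp-set-options ls1-lr1 router-port=lr1-ls1
ovn-nbctl lsp-add ls1 vif1 -- lsp-set-addresses vif1 "aa:aa:aa:aa:aa:10 192.168.0.10"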

From this point on, a couple of strange behaviours/bugs appear:

1. [possible problem] There is a duplicated route in the NB within a
   single LR.  The lflow is computed to have an ECMP group with two
   identical routes:

   table=11(lr_in_ip_routing   ), priority=97   , match=(reg7 == 0 && ip4.dst == 1.2.3.4/32), action=(ip.ttl--; flags.loopback = 1; reg8[0..15] = 1; reg8[16..31] = select(1, 2);)
   table=12(lr_in_ip_routing_ecmp), priority=100  , match=(reg8[0..15] == 1 && reg8[16..31] == 1), action=(reg0 = 192.168.0.10; reg1 = 192.168.0.1; eth.src = d0:fe:00:00:00:04; outport = "subnet-45661000"; next;)
   table=12(lr_in_ip_routing_ecmp), priority=100  , match=(reg8[0..15] == 2 && reg8[16..31] == 1), action=(reg0 = 192.168.0.10; reg1 = 192.168.0.1; eth.src = d0:fe:00:00:00:04; outport = "subnet-45661000"; next;)

   Maybe it would be better to have some kind of handling for such
   routes: an ovsdb index or some logic in ovn-northd?
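
   A quick way to spot such duplicates from the CLI could look like this
   (a sketch; note it scans the whole NB DB rather than a single LR, since
   the static route rows themselves do not reference their router):

   ovn-nbctl --format=csv --no-headings \
       --columns=ip_prefix,nexthop,route_table \
       list Logical_Router_Static_Route | sort | uniq -d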

2. [bug] There is a duplicated route advertisement in the
   OVN_IC_Southbound:Route table.  IMO, this should be fixed by adding a
   new index to this table over availability_zone, transit_switch,
   ip_prefix, nexthop and route_table, and by adding logic to check
   whether the route was already advertised (covered in Patch #7).
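
   The duplicates are easy to confirm directly in the IC SB DB, e.g.
   (a sketch, using the column set listed above):

   ovn-ic-sbctl --format=csv --no-headings \
       --columns=availability_zone,transit_switch,ip_prefix,nexthop,route_table \
       list Route | sort | uniq -d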

3. [bug] The same route is learned over and over again.  Each ovn-ic
   iteration in the opposite availability zone adds one more copy of the
   same route, creating thousands of identical routes per second.  This
   bug is covered by Patch #7.
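
   This is easy to observe from the learning side by watching the route
   count grow (the router name is illustrative):

   watch -n1 'ovn-nbctl lr-route-list lr2 | wc -l'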

4. [possible problem] After multiple routes are learned into the NB in
   the opposite availability zone, ovn-northd generates ECMP lflows.  Same
   as in #1: one flow in lr_in_ip_routing with select(<thousands of
   elements>) and thousands of identical records in lr_in_ip_routing_ecmp.
   OVN allows installing up to UINT_MAX routes within an ECMP group.
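
   The resulting lflow explosion can be quantified on the southbound side,
   e.g. (the router name is illustrative):

   ovn-sbctl lflow-list lr2 | grep -c lr_in_ip_routing_ecmp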

5. [OVS bug?] I'd like someone from the OVS team to look at this.
   ovn-controller installed a very long OpenFlow group rule
   (group #3):

   # ovn-appctl -t ovn-controller group-table-list | grep :3 | wc -c
   797824

   When I try to dump groups with ovs-ofctl dump-groups br-int, I get the
   following error in the console:

   # ovs-ofctl dump-groups br-int
   ovs-ofctl: OpenFlow packet receive failed (End of file)

   In the ovs-vswitchd logs I see the following error, and after this
   line OVS is restarted:

   2022-11-16T15:21:29.898Z|00145|util|EMER|lib/ofp-msgs.c:995: assertion start_ofs <= UINT16_MAX failed in ofpmp_postappend()

   If I issue the command again, sometimes it prints the same error, but
   sometimes this one (I had another OVN LB on the dev machine, so there
   are extra groups):

   # ovs-ofctl dump-groups br-int
   NXST_GROUP_DESC reply (xid=0x2): flags=[more]
   group_id=3,type=select,selection_method=dp_hash,bucket=bucket_id:0,weight:100,actions=ct(commit,table=20,zone=NXM_NX_REG13[0..15],nat(dst=...),exec(load:0x1->NXM_NX_CT_LABEL[1]))
   group_id=1,type=select,selection_method=dp_hash,bucket=bucket_id:0,weight:100,actions=ct(commit,table=20,zone=NXM_NX_REG13[0..15],nat(dst=...),exec(load:0x1->NXM_NX_CT_LABEL[1]))
   2022-11-17T17:53:41Z|00001|ofp_group|WARN|OpenFlow message bucket length 56 exceeds remaining buckets data size 40
   NXST_GROUP_DESC reply (xid=0x2): ***decode error: OFPGMFC_BAD_BUCKET***
   00000000  01 11 a9 58 00 00 00 02-ff ff 00 00 00 00 23 20 |...X..........# |
   00000010  00 00 00 08 00 00 00 00-a9 40 01 00 00 00 00 02 |.........@......|
   00000020  a9 08 00 00 00 00 00 00-00 38 00 28 00 00 00 00 |.........8.(....|
   00000030  ff ff 00 18 00 00 23 20-00 07 0c 0f 80 01 08 08 |......# ........|
   00000040  00 00 00 00 00 00 00 01-ff ff 00 10 00 00 23 20 |..............# |
   00000050  00 0e ff f8 14 00 00 00-00 00 00 08 00 64 00 00 |.............d..|
   00000060  00 38 00 28 00 00 00 01-ff ff 00 18 00 00 23 20 |.8.(..........# |
   00000070  00 07 0c 0f 80 01 08 08-00 00 00 00 00 00 00 02 |................|
   00000080  ff ff 00 10 00 00 23 20-00 0e ff f8 14 00 00 00 |......# ........|
   00000090  00 00 00 08 00 64 00 00-00 38 00 28 00 00 00 02 |.....d...8.(....|
   000000a0  ff ff 00 18 00 00 23 20-00 07 0c 0f 80 01 08 08 |......# ........|
   000000b0  00 00 00 00 00 00 00 03-ff ff 00 10 00 00 23 20 |..............# |
   000000c0  00 0e ff f8 14 00 00 00-00 00 00 08 00 64 00 00 |.............d..|
   000000d0  00 38 00 28 00 00 00 03-ff ff 00 18 00 00 23 20 |.8.(..........# |
   000000e0  00 07 0c 0f 80 01 08 08-00 00 00 00 00 00 00 04 |................|
   000000f0  ff ff 00 10 00 00 23 20-00 0e ff f8 14 00 00 00 |......# ........|
   00000100  00 00 00 08 00 64 00 00-00 38 00 28 00 00 00 04 |.....d...8.(....|
   00000110  ff ff 00 18 00 00 23 20-00 07 0c 0f 80 01 08 08 |......# ........|
   00000120  00 00 00 00 00 00 00 05-ff ff 00 10 00 00 23 20 |..............# |
   00000130  00 0e ff f8 14 00 00 00-00 00 00 08 00 64 00 00 |.............d..|
   00000140  00 38 00 28 00 00 00 05-ff ff 00 18 00 00 23 20 |.8.(..........# |
   00000150  00 07 0c 0f 80 01 08 08-00 00 00 00 00 00 00 06 |................|
   00000160  ff ff 00 10 00 00 23 20-00 0e ff f8 14 00 00 00 |......# ........|
   00000170  00 00 00 08 00 64 00 00-00 38 00 28 00 00 00 06 |.....d...8.(....|
   00000180  ff ff 00 18 00 00 23 20-00 07 0c 0f 80 01 08 08 |......# ........|
   00000190  00 00 00 00 00 00 00 07-ff ff 00 10 00 00 23 20 |..............# |
   000001a0  00 0e ff f8 14 00 00 00-00 00 00 08 00 64 00 00 |.............d..|
   000001b0  00 38 00 28 00 00 00 07-ff ff 00 18 00 00 23 20 |.8.(..........# |
   000001c0  00 07 0c 0f 80 01 08 08-00 00 00 00 00 00 00 08 |................|
   000001d0  ff ff 00 10 00 00 23 20-00 0e ff f8 14 00 00 00 |......# ........|
   000001e0  00 00 00 08 00 64 00 00-00 38 00 28 00 00 00 08 |.....d...8.(....|
   000001f0  ff ff 00 18 00 00 23 20-00 07 0c 0f 80 01 08 08 |......# ........|
   00000200  00 00 00 00 00 00 00 09-ff ff 00 10 00 00 23 20 |..............# |
   00000210  00 0e ff f8 14 00 00 00-00 00 00 08 00 64 00 00 |.............d..|

6. This problem with dump-groups raises some questions:
   1. Is there a limit on the number of buckets in a group, or a limit on
      the group string length?
   2. If yes, should OVN limit the number of buckets in a group on its
      side? (Patches #4 and #6).

7. I also tried to find out at which point the problems with dump-groups
   begin.  I created multiple ECMP routes in a for-loop in OVN and saw
   that starting from about 1200 buckets in a group the error from the
   last example appears.  I then tried to create 10k buckets, and while
   they were being configured on my machine, the following lines also
   appeared in the logfile:

   2022-11-17T18:23:30.992Z|00554|ovs_rcu(urcu6)|WARN|blocked 1000 ms waiting for main to quiesce
   2022-11-17T18:23:31.992Z|00555|ovs_rcu(urcu6)|WARN|blocked 2000 ms waiting for main to quiesce
   2022-11-17T18:23:33.993Z|00556|ovs_rcu(urcu6)|WARN|blocked 4001 ms waiting for main to quiesce

   When the routes finished creating, I issued ovs-ofctl dump-groups
   br-int and got just an error:

   # ovs-ofctl dump-groups br-int
   ovs-ofctl: OpenFlow packet receive failed (End of file)

   After that, OVS crashed.  OVS 2.17.3 is used.

   My script:

# cat ./repro.sh
#!/bin/bash

count=$1

echo "Creating ${count} same routes..."

ovn-nbctl lr-route-del lr1 1.2.3.4/32

for i in $(seq 1 ${count}); do
    echo $i
    ovn-nbctl --id=@id create logical-router-static-route ip_prefix=1.2.3.4/32 nexthop=172.31.32.4 policy=dst-ip -- add logical-router vpc-FC7D6A54 static_routes @id
done
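
Example invocation (the first argument is the number of duplicate routes
to create), e.g.:

# ./repro.sh 1500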

Thanks for reading this.  I'm ready to provide any additional information to
help investigate.

Vladislav Odintsov (7):
  ic: move routes_ad hmap insert to separate function
  ic: remove orphan ovn interconnection routes
  ic: lookup southbound port_binding only if needed
  actions: limit possible OF group bucket count
  ic: minor code improvements
  northd: limit ECMP group by 1024 members
  ic: prevent advertising/learning multiple same routes

 ic/ovn-ic.c         | 123 ++++++++++++++++++++++++++++------------
 lib/actions.c       |  40 ++++++++++++-
 northd/northd.c     |   2 +-
 ovn-ic-sb.ovsschema |   6 +-
 tests/ovn-ic.at     | 133 ++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 263 insertions(+), 41 deletions(-)

Comments

Dumitru Ceara Dec. 5, 2022, 4:40 p.m. UTC | #1
On 12/2/22 18:31, Vladislav Odintsov wrote:
> [...]
> 
>    In the ovs-vswitchd logs I see the following error, and after this
>    line OVS is restarted:
> 
>    2022-11-16T15:21:29.898Z|00145|util|EMER|lib/ofp-msgs.c:995: assertion start_ofs <= UINT16_MAX failed in ofpmp_postappend()

This looks like an OVS bug to me.  Ilya, what do you think the best way
to fix this is?

> [...]
Ilya Maximets Dec. 5, 2022, 6:59 p.m. UTC | #2
On 12/5/22 17:40, Dumitru Ceara wrote:
> On 12/2/22 18:31, Vladislav Odintsov wrote:
>> [...]
>>
>>    In the ovs-vswitchd logs I see the following error, and after this
>>    line OVS is restarted:
>>
>>    2022-11-16T15:21:29.898Z|00145|util|EMER|lib/ofp-msgs.c:995: assertion start_ofs <= UINT16_MAX failed in ofpmp_postappend()
> 
> This looks like an OVS bug to me.  Ilya, what do you think the best way
> to fix this is?

This might be considered a bug in OVS.  In any case, OVS
should not crash, but print an error and continue.

I'm not sure what the best way to fix it is; I need to look
closer at the code.

However...

>> 6. This problem with dump-groups raises some questions:
>>    1. Is there a limit on the number of buckets in a group, or a limit
>>       on the group string length?
>>    2. If yes, should OVN limit the number of buckets in a group on its
>>       side? (Patches #4 and #6).

Reading the OpenFlow 1.5 spec, there is a limit on the number
of buckets, but it is derived from the maximum bucket id, which
is close to a 32-bit unsigned value.  So, there is no meaningful
limit until you reach the 32-bit limit, which is unlikely.

But there are other indirect limits:

1. For the group modification message, a bucket must fit into a
   single OFPT_GROUP_MOD message (struct ofp_group_mod), meaning
   that each bucket (including actions) cannot take more than
   roughly 64K - 24 bytes (not accounting for headers).

2. Actions within a bucket cannot exceed 64K bytes (but they
   are further limited by the total bucket size above).

3. In order to be dumpable with OFPMP_GROUP_DESC, each group
   (with all of its buckets and their actions) must fit into a
   single OF message, i.e. 64K.  This is caused by the fact
   that multipart messages must contain an integral number of
   objects and objects cannot be split between messages.
   The 'object' for OFPMP_GROUP_DESC is a group, so we can't
   split it at the bucket level.

So, technically, a group with a very large number of buckets
can be created using OFPT_GROUP_MOD with OFPGC_INSERT_BUCKET,
but it will not be possible to dump that group with OFPMP_GROUP_DESC.
Depending on the size, it might still be possible to get group
stats with OFPMP_GROUP_STATS, since that reply will not contain
the actual buckets, but only per-bucket stats, which might be
smaller in total size.
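
As a rough sanity check against the numbers above: with buckets on the
order of ~56 bytes (the bucket length reported in the decode warning),
a single 64K multipart entry fits about

    65535 / 56 ~= 1170 buckets,

which is in the same ballpark as the ~1200 threshold observed with
dump-groups.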

OVN should definitely check and not create buckets with actions
longer than 64K minus some overhead.  OVN has the same issue for
OpenFlow rules as well, which is currently not handled in any
way.

I'm not sure if limiting the total number of buckets makes sense,
unless we're talking about the 32-bit range.

Processing of very large groups may be a performance concern
for OVS, as you saw in the logs, but that's a different story
and can potentially be optimized if necessary.

Best regards, Ilya Maximets.