Message ID: 20221202173147.3032702-1-odivlad@gmail.com
Series: OVN IC bugfixes & proposals/questions
On 12/2/22 18:31, Vladislav Odintsov wrote:
> Hi,
>
> we've hit an issue where it was possible to create multiple identical
> routes within an LR (same ip_prefix, nexthop, and route table).
>
> Initially the problem started after an OVN upgrade. We use the python
> ovsdbapp library, and we found a problem in python-ovs, which is
> described by my colleague Anton here:
> https://mail.openvswitch.org/pipermail/ovs-dev/2022-November/399722.html
> @Terry Wilson, please take a look at this.
>
> The problem itself touches both OVN and OVS. Sorry for the long read,
> but it seems that there are a couple of bugs in different places, part
> of which this RFC covers.
>
> How the issue was initially reproduced:
>
> 1. assume we have (at least) a 2-Availability-Zone OVN deployment
>    (utilising the ovn-ic infrastructure).
> 2. create a transit switch in the IC NB
> 3. create an LR in each AZ and connect them to the transit switch
> 4. create one logical switch with a VIF port attached to the local OVS
>    and connect this logical switch to the LR (e.g. 192.168.0.1/24)
> 5. install 2 static routes in the LR in one AZ with a create command
>    (invoke the following command twice):
>
> ovn-nbctl --id=@id create logical-router-static-route ip_prefix=1.2.3.4/32 nexthop=192.168.0.10 -- logical_router add lr1 static_routes @id
>
> From this point, a couple of strange behaviours/bugs appear:
>
> 1. [possible problem] There is a duplicated route in the NB within a
>    single LR.
>    The lflow is computed to have an ECMP group with two identical
>    routes:
>
> table=11(lr_in_ip_routing ), priority=97 , match=(reg7 == 0 && ip4.dst == 1.2.3.4/32), action=(ip.ttl--; flags.loopback = 1; reg8[0..15] = 1; reg8[16..31] = select(1, 2);)
> table=12(lr_in_ip_routing_ecmp), priority=100 , match=(reg8[0..15] == 1 && reg8[16..31] == 1), action=(reg0 = 192.168.0.10; reg1 = 192.168.0.1; eth.src = d0:fe:00:00:00:04; outport = "subnet-45661000"; next;)
> table=12(lr_in_ip_routing_ecmp), priority=100 , match=(reg8[0..15] == 2 && reg8[16..31] == 1), action=(reg0 = 192.168.0.10; reg1 = 192.168.0.1; eth.src = d0:fe:00:00:00:04; outport = "subnet-45661000"; next;)
>
> Maybe it's better to have some kind of handling for such routes?
> An ovsdb index or some logic in ovn-northd?
>
> 2. [bug] There is a duplicated route advertisement in the
>    OVN_IC_Southbound:Route table. IMO, this should be fixed by adding
>    a new index to this table for availability_zone, transit_switch,
>    ip_prefix, nexthop and route_table, and adding logic to check
>    whether the route was already advertised (covered in Patch #7).
>
> 3. [bug] The same route is constantly re-learned. Each ovn-ic
>    iteration on the opposite availability zone adds one new copy of
>    the same route. This creates thousands of identical routes every
>    second. This bug is covered by Patch #7.
>
> 4. [possible problem] After multiple routes are learned into the NB on
>    the opposite availability zone, ovn-northd generates ECMP lflows.
>    Same as in #1: one lflow in lr_in_ip_routing with
>    select(<thousands of elements>) and thousands of identical records
>    in lr_in_ip_routing_ecmp. OVN allows installing UINT_MAX routes
>    within an ECMP group.
>
> 5. [OVS bug?] I'd like someone from the OVS team to look at this.
>    ovn-controller installed a very long OpenFlow group rule
>    (group #3):
>
> # ovn-appctl -t ovn-controller group-table-list | grep :3 | wc -c
> 797824
>
> When I try to dump groups with ovs-ofctl dump-groups br-int, I get the
> following error in the console:
>
> # ovs-ofctl dump-groups br-int
> ovs-ofctl: OpenFlow packet receive failed (End of file)
>
> In ovs-vswitchd I see the following error in the logs, and after this
> line ovs is restarted:
>
> 2022-11-16T15:21:29.898Z|00145|util|EMER|lib/ofp-msgs.c:995: assertion start_ofs <= UINT16_MAX failed in ofpmp_postappend()

This looks like an OVS bug to me. Ilya, what do you think the best way
to fix this is?

> If I issue the command again, sometimes it prints the same error, but
> sometimes this one (I had another OVN LB on the dev machine, so there
> are excess groups):
>
> # ovs-ofctl dump-groups br-int
> NXST_GROUP_DESC reply (xid=0x2): flags=[more]
> group_id=3,type=select,selection_method=dp_hash,bucket=bucket_id:0,weight:100,actions=ct(commit,table=20,zone=NXM_NX_REG13[0..15],nat(dst=...),exec(load:0x1->NXM_NX_CT_LABEL[1]))
> group_id=1,type=select,selection_method=dp_hash,bucket=bucket_id:0,weight:100,actions=ct(commit,table=20,zone=NXM_NX_REG13[0..15],nat(dst=...),exec(load:0x1->NXM_NX_CT_LABEL[1]))
> 2022-11-17T17:53:41Z|00001|ofp_group|WARN|OpenFlow message bucket length 56 exceeds remaining buckets data size 40
> NXST_GROUP_DESC reply (xid=0x2): ***decode error: OFPGMFC_BAD_BUCKET***
> 00000000 01 11 a9 58 00 00 00 02-ff ff 00 00 00 00 23 20 |...X..........# |
> 00000010 00 00 00 08 00 00 00 00-a9 40 01 00 00 00 00 02 |.........@......|
> 00000020 a9 08 00 00 00 00 00 00-00 38 00 28 00 00 00 00 |.........8.(....|
> 00000030 ff ff 00 18 00 00 23 20-00 07 0c 0f 80 01 08 08 |......# ........|
> 00000040 00 00 00 00 00 00 00 01-ff ff 00 10 00 00 23 20 |..............# |
> 00000050 00 0e ff f8 14 00 00 00-00 00 00 08 00 64 00 00 |.............d..|
> 00000060 00 38 00 28 00 00 00 01-ff ff 00 18 00 00 23 20 |.8.(..........# |
> 00000070 00 07 0c 0f 80 01 08 08-00 00 00 00 00 00 00 02 |................|
> 00000080 ff ff 00 10 00 00 23 20-00 0e ff f8 14 00 00 00 |......# ........|
> 00000090 00 00 00 08 00 64 00 00-00 38 00 28 00 00 00 02 |.....d...8.(....|
> 000000a0 ff ff 00 18 00 00 23 20-00 07 0c 0f 80 01 08 08 |......# ........|
> 000000b0 00 00 00 00 00 00 00 03-ff ff 00 10 00 00 23 20 |..............# |
> 000000c0 00 0e ff f8 14 00 00 00-00 00 00 08 00 64 00 00 |.............d..|
> 000000d0 00 38 00 28 00 00 00 03-ff ff 00 18 00 00 23 20 |.8.(..........# |
> 000000e0 00 07 0c 0f 80 01 08 08-00 00 00 00 00 00 00 04 |................|
> 000000f0 ff ff 00 10 00 00 23 20-00 0e ff f8 14 00 00 00 |......# ........|
> 00000100 00 00 00 08 00 64 00 00-00 38 00 28 00 00 00 04 |.....d...8.(....|
> 00000110 ff ff 00 18 00 00 23 20-00 07 0c 0f 80 01 08 08 |......# ........|
> 00000120 00 00 00 00 00 00 00 05-ff ff 00 10 00 00 23 20 |..............# |
> 00000130 00 0e ff f8 14 00 00 00-00 00 00 08 00 64 00 00 |.............d..|
> 00000140 00 38 00 28 00 00 00 05-ff ff 00 18 00 00 23 20 |.8.(..........# |
> 00000150 00 07 0c 0f 80 01 08 08-00 00 00 00 00 00 00 06 |................|
> 00000160 ff ff 00 10 00 00 23 20-00 0e ff f8 14 00 00 00 |......# ........|
> 00000170 00 00 00 08 00 64 00 00-00 38 00 28 00 00 00 06 |.....d...8.(....|
> 00000180 ff ff 00 18 00 00 23 20-00 07 0c 0f 80 01 08 08 |......# ........|
> 00000190 00 00 00 00 00 00 00 07-ff ff 00 10 00 00 23 20 |..............# |
> 000001a0 00 0e ff f8 14 00 00 00-00 00 00 08 00 64 00 00 |.............d..|
> 000001b0 00 38 00 28 00 00 00 07-ff ff 00 18 00 00 23 20 |.8.(..........# |
> 000001c0 00 07 0c 0f 80 01 08 08-00 00 00 00 00 00 00 08 |................|
> 000001d0 ff ff 00 10 00 00 23 20-00 0e ff f8 14 00 00 00 |......# ........|
> 000001e0 00 00 00 08 00 64 00 00-00 38 00 28 00 00 00 08 |.....d...8.(....|
> 000001f0 ff ff 00 18 00 00 23 20-00 07 0c 0f 80 01 08 08 |......# ........|
> 00000200 00 00 00 00 00 00 00 09-ff ff 00 10 00 00 23 20 |..............# |
> 00000210 00 0e ff f8 14 00 00 00-00 00 00 08 00 64 00 00 |.............d..|
>
> 7. From this problem with groups-dump I have some questions:
>    1. Is there a limit on the bucket count in a group? Or a limit on
>       the group string length?
>    2. If yes, should OVN limit the count of buckets in a group on its
>       side? (Patches #4 && #6).
>
> 8. I've also tried to see at which values this problem with
>    dump-groups begins. I created multiple ECMP routes in a for-loop in
>    OVN and saw that starting from 1200 items in a group, the error
>    from the last example appears. I tried to create 10k buckets, and
>    while they were being configured on my machine, the following lines
>    also appeared in the logfile:
>
> 2022-11-17T18:23:30.992Z|00554|ovs_rcu(urcu6)|WARN|blocked 1000 ms waiting for main to quiesce
> 2022-11-17T18:23:31.992Z|00555|ovs_rcu(urcu6)|WARN|blocked 2000 ms waiting for main to quiesce
> 2022-11-17T18:23:33.993Z|00556|ovs_rcu(urcu6)|WARN|blocked 4001 ms waiting for main to quiesce
>
> When the routes finished creating, I issued ovs-ofctl dump-groups
> br-int and there was just an error:
>
> # ovs-ofctl dump-groups br-int
> ovs-ofctl: OpenFlow packet receive failed (End of file)
>
> And OVS crashed. OVS 2.17.3 is used.
>
> My script:
>
> # cat ./repro.sh
> #!/bin/bash
>
> count=$1
>
> echo "Creating ${count} same routes..."
>
> ovn-nbctl lr-route-del lr1 1.2.3.4/32
>
> for i in $(seq 1 ${count}); do
>     echo $i
>     ovn-nbctl --id=@id create logical-router-static-route ip_prefix=1.2.3.4/32 nexthop=172.31.32.4 policy=dst-ip -- add logical-router vpc-FC7D6A54 static_routes @id
> done
>
> Thanks for reading this. I'm ready to provide any additional
> information to help investigate this.
>
> Vladislav Odintsov (7):
>   ic: move routes_ad hmap insert to separate function
>   ic: remove orphan ovn interconnection routes
>   ic: lookup southbound port_binding only if needed
>   actions: limit possible OF group bucket count
>   ic: minor code improvements
>   northd: limit ECMP group by 1024 members
>   ic: prevent advertising/learning multiple same routes
>
>  ic/ovn-ic.c         | 123 ++++++++++++++++++++++++++++------------
>  lib/actions.c       |  40 ++++++++++++-
>  northd/northd.c     |   2 +-
>  ovn-ic-sb.ovsschema |   6 +-
>  tests/ovn-ic.at     | 133 ++++++++++++++++++++++++++++++++++++++++++++
>  5 files changed, 263 insertions(+), 41 deletions(-)
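[Editor's note] The advertising/learning fix proposed above (Patch #7) boils down to de-duplicating routes by a composite key before touching the OVN_IC_Southbound:Route table. The actual fix is C code in ic/ovn-ic.c; the following is only a minimal Python sketch of the idea, with illustrative dict field names matching the index columns the author proposes (availability_zone, transit_switch, ip_prefix, nexthop, route_table):

```python
def dedup_routes(routes):
    """Keep only the first route seen for each advertisement key.

    Sketch of the Patch #7 idea: a route is a duplicate if all five
    key columns match an already-processed route, so it must be
    neither advertised nor learned a second time.
    """
    seen = set()
    unique = []
    for r in routes:
        key = (r["availability_zone"], r["transit_switch"],
               r["ip_prefix"], r["nexthop"], r["route_table"])
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique
```

In OVSDB itself, the same invariant would be enforced declaratively by adding an index over those five columns to the Route table, which is what the schema change in ovn-ic-sb.ovsschema does.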
On 12/5/22 17:40, Dumitru Ceara wrote:
> On 12/2/22 18:31, Vladislav Odintsov wrote:
>> Hi,
>>
>> [...]
>>
>> 5. [OVS bug?] I'd like someone from the OVS team to look at this.
>>    ovn-controller installed a very long OpenFlow group rule
>>    (group #3):
>>
>> # ovn-appctl -t ovn-controller group-table-list | grep :3 | wc -c
>> 797824
>>
>> When I try to dump groups with ovs-ofctl dump-groups br-int, I get
>> the following error in the console:
>>
>> # ovs-ofctl dump-groups br-int
>> ovs-ofctl: OpenFlow packet receive failed (End of file)
>>
>> In ovs-vswitchd I see the following error in the logs, and after this
>> line ovs is restarted:
>>
>> 2022-11-16T15:21:29.898Z|00145|util|EMER|lib/ofp-msgs.c:995: assertion start_ofs <= UINT16_MAX failed in ofpmp_postappend()
>
> This looks like an OVS bug to me. Ilya, what do you think the best way
> to fix this is?

This might be considered a bug in OVS. In any case, OVS should not
crash, but print an error and continue. I'm not sure what the best way
to fix that is; I need to look closer at the code. However...

>> 7. From this problem with groups-dump I have some questions:
>>    1. Is there a limit on the bucket count in a group? Or a limit on
>>       the group string length?
>>    2. If yes, should OVN limit the count of buckets in a group on its
>>       side? (Patches #4 && #6).

Reading the OpenFlow 1.5 spec, there is a limit on the number of
buckets, but it is derived from the maximum bucket id, which is close
to a 32-bit unsigned value. So, there is no meaningful limit until you
reach the 32-bit limit, which is unlikely.

But there are other indirect limits:

1. For the group modification message, each bucket must fit into a
   single OFPT_GROUP_MOD message (struct ofp_group_mod). Meaning that
   each bucket (including actions) cannot take more than roughly
   (not accounting for headers) 64K - 24 bytes.

2. Actions within a bucket cannot exceed 64K bytes (but they are more
   limited by the total bucket size above).

3. In order to be dumpable with OFPMP_GROUP_DESC, each group (with all
   its buckets and their actions) must fit into a single OF message,
   i.e. 64K.
   This is caused by the fact that multipart messages must contain an
   integral number of objects, and objects cannot be split between
   messages. The 'object' for OFPMP_GROUP_DESC is a group, so we can't
   split it at the bucket level.

So, technically, a group with a very large number of buckets can be
created using OFPT_GROUP_MOD with OFPGC_INSERT_BUCKET, but it will not
be possible to dump that group with OFPMP_GROUP_DESC. Depending on the
size, it might still be possible to get group stats with
OFPMP_GROUP_STATS, since that reply will not contain the actual
buckets, only per-bucket stats, which might be smaller in total size.

OVN should definitely check and not create buckets with actions longer
than 64K minus some overhead. OVN has the same issue for OpenFlow rules
as well; that is currently not handled in any way. I'm not sure if
limiting the total number of buckets makes sense, unless we're talking
about the 32-bit range. Processing of very large groups may be a
performance concern for OVS, as you saw in the logs, but that's a
different story and can potentially be optimized if necessary.

Best regards, Ilya Maximets.
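[Editor's note] Limit 3 above can be sanity-checked with back-of-the-envelope arithmetic. This sketch assumes the OpenFlow 16-bit message length field (max 65535 bytes) and approximate per-struct sizes (an 8-byte group desc header, a 16-byte bucket header); the exact overheads vary by OF version, but the order of magnitude is what matters:

```python
OFP_MAX_MSG_LEN = 65535  # OpenFlow message length field is 16 bits
GROUP_DESC_HDR = 8       # approx. size of the group desc entry header
BUCKET_HDR = 16          # approx. size of a bucket header

def group_fits_in_one_message(bucket_action_lens):
    """True if a group with buckets carrying these action lengths
    could be encoded as a single OFPMP_GROUP_DESC reply entry."""
    total = GROUP_DESC_HDR + sum(BUCKET_HDR + n for n in bucket_action_lens)
    return total <= OFP_MAX_MSG_LEN

# With the 56-byte buckets visible in the hexdump above ("bucket
# length 56"), roughly 65535 / 56 ~= 1170 buckets fit in one message,
# which is consistent with the reported failure around 1200 buckets.
```

This is why dumping the ~1200-bucket group trips the `start_ofs <= UINT16_MAX` assertion: the single group object no longer fits in one multipart message and cannot be split.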