Message ID | 20231208124410.3013939-1-xsimonar@redhat.com |
---|---|
State | Accepted |
Headers | show |
Series | [ovs-dev,v3] controller: fixed potential segfault when changing tunnel_key and deleting ls | expand |
Context | Check | Description |
---|---|---|
ovsrobot/apply-robot | success | apply and check: success |
ovsrobot/github-robot-_Build_and_Test | success | github build: passed |
ovsrobot/github-robot-_ovn-kubernetes | success | github build: passed |
On 12/8/23 13:44, Xavier Simonart wrote: > When a tunnel_key for a datapath was changed, the local_datapaths hmap was not properly updated > as the old/initial entry was not removed. > - If the datapath was not deleted at the same time, a new entry (for the same dp) was created > in the local_datapaths as the previous entry was not found (wrong hash). > - If the datapath was deleted at the same time, the former entry also remained (as, again, the hash > was wrong). So, we kept a deleted (dp) entry in the hmap, and might crash when we used it later. > > When tunnel_key is updated for an existing datapath, we now clean the local_datapaths, > removing bad entries (i.e entries for which the hash is not the tunnel_key). > > This issue was identified by flaky failures of test "ovn-ic -- route deletion upon TS deletion". > > Backtrace: > 0 0x0000000000504a9a in hmap_first_with_hash (hmap=0x20f4d90, hmap@entry=0x5d5634, hmap=0x20f4d90, hmap@entry=0x5d5634, hash=1956414673) at ./include/openvswitch/hmap.h:360 > 1 smap_find__ (smap=smap@entry=0x20f4d90, key=key@entry=0x5d5634 "snat-ct-zone", key_len=key_len@entry=12, hash=1956414673) at lib/smap.c:421 > 2 0x0000000000505073 in smap_get_node (key=0x5d5634 "snat-ct-zone", smap=0x20f4d90) at lib/smap.c:217 > 3 smap_get_def (def=0x0, key=0x5d5634 "snat-ct-zone", smap=0x20f4d90) at lib/smap.c:208 > 4 smap_get (smap=0x20f4d90, key=key@entry=0x5d5634 "snat-ct-zone") at lib/smap.c:200 > 5 0x0000000000419898 in get_common_nat_zone (ldp=0x20a8c40, ldp=0x20a8c40) at controller/lflow.c:831 > 6 add_matches_to_flow_table (lflow=lflow@entry=0x21ddf90, ldp=ldp@entry=0x20a8c40, matches=matches@entry=0x211bb50, ptable=ptable@entry=10 '\n', output_ptable=output_ptable@entry=37 '%', ovnacts=ovnacts@entry=0x7ffd19b99de0, > ingress=true, l_ctx_in=0x7ffd19b9a300, l_ctx_out=0x7ffd19b9a2c0) at controller/lflow.c:892 > 7 0x000000000041a29b in consider_logical_flow__ (lflow=lflow@entry=0x21ddf90, dp=0x20eb720, l_ctx_in=l_ctx_in@entry=0x7ffd19b9a300, l_ctx_out=l_ctx_out@entry=0x7ffd19b9a2c0) at controller/lflow.c:1207 > 8 0x000000000041a587 in consider_logical_flow (lflow=lflow@entry=0x21ddf90, is_recompute=is_recompute@entry=false, l_ctx_in=l_ctx_in@entry=0x7ffd19b9a300, l_ctx_out=l_ctx_out@entry=0x7ffd19b9a2c0) at controller/lflow.c:1276 > 9 0x000000000041e30f in lflow_handle_changed_flows (l_ctx_in=l_ctx_in@entry=0x7ffd19b9a300, l_ctx_out=l_ctx_out@entry=0x7ffd19b9a2c0) at controller/lflow.c:271 > 10 0x0000000000439fc7 in lflow_output_sb_logical_flow_handler (node=0x7ffd19ba5b70, data=<optimized out>) at controller/ovn-controller.c:4045 > 11 0x00000000004682fe in engine_compute (recompute_allowed=<optimized out>, node=<optimized out>) at lib/inc-proc-eng.c:441 > 12 engine_run_node (recompute_allowed=true, node=0x7ffd19ba5b70) at lib/inc-proc-eng.c:503 > 13 engine_run (recompute_allowed=recompute_allowed@entry=true) at lib/inc-proc-eng.c:528 > 14 0x000000000040ade2 in main (argc=<optimized out>, argv=<optimized out>) at controller/ovn-controller.c:5709 > > Signed-off-by: Xavier Simonart <xsimonar@redhat.com> > > --- > v2: Fixed potential memory leak. > v3: - Updated based on Dumitru's feedback: style & fixed another potential memory leak. > - Modify test case to detect potential memory leak. > - Rebased on latest main. > --- > controller/local_data.c | 14 ++++++++++++ > controller/local_data.h | 4 ++++ > controller/ovn-controller.c | 23 ++++++++++++++++++++ > tests/ovn.at | 43 +++++++++++++++++++++++++++++++++++++ > 4 files changed, 84 insertions(+) > > diff --git a/controller/local_data.c b/controller/local_data.c > index 3a7d3afeb..8f0890a14 100644 > --- a/controller/local_data.c > +++ b/controller/local_data.c > @@ -56,6 +56,20 @@ static bool datapath_is_transit_switch(const struct sbrec_datapath_binding *); > > static uint64_t local_datapath_usage; > > +/* To be used when hmap_node.hash might be wrong e.g. tunnel_key got updated */ > +struct local_datapath * > +get_local_datapath_no_hash(const struct hmap *local_datapaths, > + uint32_t tunnel_key) Nit: indentation. > +{ > + struct local_datapath *ld; > + HMAP_FOR_EACH (ld, hmap_node, local_datapaths) { > + if (ld->datapath->tunnel_key == tunnel_key) { > + return ld; > + } > + } > + return NULL; > +} > + > struct local_datapath * > get_local_datapath(const struct hmap *local_datapaths, uint32_t tunnel_key) > { > diff --git a/controller/local_data.h b/controller/local_data.h > index f6d8f725f..1586a76da 100644 > --- a/controller/local_data.h > +++ b/controller/local_data.h > @@ -69,6 +69,10 @@ struct local_datapath *local_datapath_alloc( > const struct sbrec_datapath_binding *); > struct local_datapath *get_local_datapath(const struct hmap *, > uint32_t tunnel_key); > +struct local_datapath > +*get_local_datapath_no_hash(const struct hmap *local_datapaths, > + uint32_t tunnel_key); > + Nit: for prototypes we usually keep the return type and function name/declaration on the same line. Also, IMO there's no need for an empty line here. > bool > need_add_peer_to_local( > struct ovsdb_idl_index *sbrec_port_binding_by_name, > diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c > index 760b7788b..e4fb3a5e4 100644 > --- a/controller/ovn-controller.c > +++ b/controller/ovn-controller.c > @@ -1895,6 +1895,7 @@ runtime_data_sb_datapath_binding_handler(struct engine_node *node OVS_UNUSED, > engine_get_input("SB_datapath_binding", node)); > const struct sbrec_datapath_binding *dp; > struct ed_type_runtime_data *rt_data = data; > + struct local_datapath *ld; > > SBREC_DATAPATH_BINDING_TABLE_FOR_EACH_TRACKED (dp, dp_table) { > if (sbrec_datapath_binding_is_deleted(dp)) { > @@ -1902,6 +1903,28 @@ runtime_data_sb_datapath_binding_handler(struct engine_node *node OVS_UNUSED, > dp->tunnel_key)) { > return false; > } > + /* If the tunnel key got updated, get_local_datapath will not find > + * the ld. Use get_local_datapath_no_hash which does not > + * rely on the hash. > + */ > + if (sbrec_datapath_binding_is_updated( > + dp, SBREC_DATAPATH_BINDING_COL_TUNNEL_KEY)) { > + if (get_local_datapath_no_hash(&rt_data->local_datapaths, > + dp->tunnel_key)) { > + return false; > + } > + } > + } else if (sbrec_datapath_binding_is_updated( > + dp, SBREC_DATAPATH_BINDING_COL_TUNNEL_KEY) > + && !sbrec_datapath_binding_is_new(dp)) { > + /* If the tunnel key is updated, remove the entry (with a wrong > + * hash) from the map. It will be (properly) added back later. > + */ > + if ((ld = get_local_datapath_no_hash(&rt_data->local_datapaths, > + dp->tunnel_key))) { > + hmap_remove(&rt_data->local_datapaths, &ld->hmap_node); > + local_datapath_destroy(ld); > + } > } > } > > diff --git a/tests/ovn.at b/tests/ovn.at > index e8c79512b..969a8fb9c 100644 > --- a/tests/ovn.at > +++ b/tests/ovn.at > @@ -37419,3 +37419,46 @@ check_flow_count hv2 12 > OVN_CLEANUP([hv1]) > AT_CLEANUP > ]) > + > +OVN_FOR_EACH_NORTHD([ > +AT_SETUP([Changing tunnel_key]) > +ovn_start > + > +net_add n1 > + > +sim_add hv1 > +as hv1 > +check ovs-vsctl add-br br-phys > +ovn_attach n1 br-phys 192.168.0.11 > + > +check ovn-nbctl --wait=hv ls-add ls \ > + -- lsp-add ls lsp1 \ > + -- lsp-add ls ls-lr \ > + -- lr-add lr \ > + -- lrp-add lr lr-ls f0:00:00:00:00:f1 192.168.1.1/24 \ > + -- set Logical_Switch_Port ls-lr \ > + type=router \ > + options:router-port=lr-ls \ > + addresses=router \ > + -- lrp-set-gateway-chassis lr-ls hv1 > + > +sleep_controller hv1 > + > +check ovn-nbctl --wait=sb set Logical_Switch ls other_config:requested-tnl-key=1000 > +check ovn-nbctl --wait=sb ls-del ls > +wake_up_controller hv1 > + > +check ovn-nbctl --wait=hv sync > + > +check ovn-nbctl --wait=hv ls-add ls1 \ > + -- lsp-add ls1 ls1-lr \ > + -- lrp-add lr lr-ls1 f0:00:00:00:00:f2 192.168.2.1/24 \ > + -- set Logical_Switch_Port ls1-lr type=router options:router-port=lr-ls1 addresses=router \ > + -- lrp-set-gateway-chassis lr-ls1 hv1 > + > +check ovn-nbctl --wait=hv set Logical_Switch ls1 other_config:requested-tnl-key=1001 > + > +OVN_CLEANUP([hv1]) > + > +AT_CLEANUP > +]) Except for the minor nits above, the patch looks good to me: Acked-by: Dumitru Ceara <dceara@redhat.com> Thanks, Dumitru
On 1/4/24 11:43, Dumitru Ceara wrote: > On 12/8/23 13:44, Xavier Simonart wrote: >> When a tunnel_key for a datapath was changed, the local_datapaths hmap was not properly updated >> as the old/initial entry was not removed. >> - If the datapath was not deleted at the same time, a new entry (for the same dp) was created >> in the local_datapaths as the previous entry was not found (wrong hash). >> - If the datapath was deleted at the same time, the former entry also remained (as, again, the hash >> was wrong). So, we kept a deleted (dp) entry in the hmap, and might crash when we used it later. >> >> When tunnel_key is updated for an existing datapath, we now clean the local_datapaths, >> removing bad entries (i.e entries for which the hash is not the tunnel_key). >> >> This issue was identified by flaky failures of test "ovn-ic -- route deletion upon TS deletion". >> >> Backtrace: >> 0 0x0000000000504a9a in hmap_first_with_hash (hmap=0x20f4d90, hmap@entry=0x5d5634, hmap=0x20f4d90, hmap@entry=0x5d5634, hash=1956414673) at ./include/openvswitch/hmap.h:360 >> 1 smap_find__ (smap=smap@entry=0x20f4d90, key=key@entry=0x5d5634 "snat-ct-zone", key_len=key_len@entry=12, hash=1956414673) at lib/smap.c:421 >> 2 0x0000000000505073 in smap_get_node (key=0x5d5634 "snat-ct-zone", smap=0x20f4d90) at lib/smap.c:217 >> 3 smap_get_def (def=0x0, key=0x5d5634 "snat-ct-zone", smap=0x20f4d90) at lib/smap.c:208 >> 4 smap_get (smap=0x20f4d90, key=key@entry=0x5d5634 "snat-ct-zone") at lib/smap.c:200 >> 5 0x0000000000419898 in get_common_nat_zone (ldp=0x20a8c40, ldp=0x20a8c40) at controller/lflow.c:831 >> 6 add_matches_to_flow_table (lflow=lflow@entry=0x21ddf90, ldp=ldp@entry=0x20a8c40, matches=matches@entry=0x211bb50, ptable=ptable@entry=10 '\n', output_ptable=output_ptable@entry=37 '%', ovnacts=ovnacts@entry=0x7ffd19b99de0, >> ingress=true, l_ctx_in=0x7ffd19b9a300, l_ctx_out=0x7ffd19b9a2c0) at controller/lflow.c:892 >> 7 0x000000000041a29b in consider_logical_flow__ (lflow=lflow@entry=0x21ddf90, dp=0x20eb720, l_ctx_in=l_ctx_in@entry=0x7ffd19b9a300, l_ctx_out=l_ctx_out@entry=0x7ffd19b9a2c0) at controller/lflow.c:1207 >> 8 0x000000000041a587 in consider_logical_flow (lflow=lflow@entry=0x21ddf90, is_recompute=is_recompute@entry=false, l_ctx_in=l_ctx_in@entry=0x7ffd19b9a300, l_ctx_out=l_ctx_out@entry=0x7ffd19b9a2c0) at controller/lflow.c:1276 >> 9 0x000000000041e30f in lflow_handle_changed_flows (l_ctx_in=l_ctx_in@entry=0x7ffd19b9a300, l_ctx_out=l_ctx_out@entry=0x7ffd19b9a2c0) at controller/lflow.c:271 >> 10 0x0000000000439fc7 in lflow_output_sb_logical_flow_handler (node=0x7ffd19ba5b70, data=<optimized out>) at controller/ovn-controller.c:4045 >> 11 0x00000000004682fe in engine_compute (recompute_allowed=<optimized out>, node=<optimized out>) at lib/inc-proc-eng.c:441 >> 12 engine_run_node (recompute_allowed=true, node=0x7ffd19ba5b70) at lib/inc-proc-eng.c:503 >> 13 engine_run (recompute_allowed=recompute_allowed@entry=true) at lib/inc-proc-eng.c:528 >> 14 0x000000000040ade2 in main (argc=<optimized out>, argv=<optimized out>) at controller/ovn-controller.c:5709 >> >> Signed-off-by: Xavier Simonart <xsimonar@redhat.com> >> >> --- >> v2: Fixed potential memory leak. >> v3: - Updated based on Dumitru's feedback: style & fixed another potential memory leak. >> - Modify test case to detect potential memory leak. >> - Rebased on latest main. >> --- >> controller/local_data.c | 14 ++++++++++++ >> controller/local_data.h | 4 ++++ >> controller/ovn-controller.c | 23 ++++++++++++++++++++ >> tests/ovn.at | 43 +++++++++++++++++++++++++++++++++++++ >> 4 files changed, 84 insertions(+) >> >> diff --git a/controller/local_data.c b/controller/local_data.c >> index 3a7d3afeb..8f0890a14 100644 >> --- a/controller/local_data.c >> +++ b/controller/local_data.c >> @@ -56,6 +56,20 @@ static bool datapath_is_transit_switch(const struct sbrec_datapath_binding *); >> >> static uint64_t local_datapath_usage; >> >> +/* To be used when hmap_node.hash might be wrong e.g. tunnel_key got updated */ >> +struct local_datapath * >> +get_local_datapath_no_hash(const struct hmap *local_datapaths, >> + uint32_t tunnel_key) > > Nit: indentation. > >> +{ >> + struct local_datapath *ld; >> + HMAP_FOR_EACH (ld, hmap_node, local_datapaths) { >> + if (ld->datapath->tunnel_key == tunnel_key) { >> + return ld; >> + } >> + } >> + return NULL; >> +} >> + >> struct local_datapath * >> get_local_datapath(const struct hmap *local_datapaths, uint32_t tunnel_key) >> { >> diff --git a/controller/local_data.h b/controller/local_data.h >> index f6d8f725f..1586a76da 100644 >> --- a/controller/local_data.h >> +++ b/controller/local_data.h >> @@ -69,6 +69,10 @@ struct local_datapath *local_datapath_alloc( >> const struct sbrec_datapath_binding *); >> struct local_datapath *get_local_datapath(const struct hmap *, >> uint32_t tunnel_key); >> +struct local_datapath >> +*get_local_datapath_no_hash(const struct hmap *local_datapaths, >> + uint32_t tunnel_key); >> + > > Nit: for prototypes we usually keep the return type and function > name/declaration on the same line. > > Also, IMO there's no need for an empty line here. > >> bool >> need_add_peer_to_local( >> struct ovsdb_idl_index *sbrec_port_binding_by_name, >> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c >> index 760b7788b..e4fb3a5e4 100644 >> --- a/controller/ovn-controller.c >> +++ b/controller/ovn-controller.c >> @@ -1895,6 +1895,7 @@ runtime_data_sb_datapath_binding_handler(struct engine_node *node OVS_UNUSED, >> engine_get_input("SB_datapath_binding", node)); >> const struct sbrec_datapath_binding *dp; >> struct ed_type_runtime_data *rt_data = data; >> + struct local_datapath *ld; >> >> SBREC_DATAPATH_BINDING_TABLE_FOR_EACH_TRACKED (dp, dp_table) { >> if (sbrec_datapath_binding_is_deleted(dp)) { >> @@ -1902,6 +1903,28 @@ runtime_data_sb_datapath_binding_handler(struct engine_node *node OVS_UNUSED, >> dp->tunnel_key)) { >> return false; >> } >> + /* If the tunnel key got updated, get_local_datapath will not find >> + * the ld. Use get_local_datapath_no_hash which does not >> + * rely on the hash. >> + */ >> + if (sbrec_datapath_binding_is_updated( >> + dp, SBREC_DATAPATH_BINDING_COL_TUNNEL_KEY)) { >> + if (get_local_datapath_no_hash(&rt_data->local_datapaths, >> + dp->tunnel_key)) { >> + return false; >> + } >> + } >> + } else if (sbrec_datapath_binding_is_updated( >> + dp, SBREC_DATAPATH_BINDING_COL_TUNNEL_KEY) >> + && !sbrec_datapath_binding_is_new(dp)) { >> + /* If the tunnel key is updated, remove the entry (with a wrong >> + * hash) from the map. It will be (properly) added back later. >> + */ >> + if ((ld = get_local_datapath_no_hash(&rt_data->local_datapaths, >> + dp->tunnel_key))) { >> + hmap_remove(&rt_data->local_datapaths, &ld->hmap_node); >> + local_datapath_destroy(ld); >> + } >> } >> } >> >> diff --git a/tests/ovn.at b/tests/ovn.at >> index e8c79512b..969a8fb9c 100644 >> --- a/tests/ovn.at >> +++ b/tests/ovn.at >> @@ -37419,3 +37419,46 @@ check_flow_count hv2 12 >> OVN_CLEANUP([hv1]) >> AT_CLEANUP >> ]) >> + >> +OVN_FOR_EACH_NORTHD([ >> +AT_SETUP([Changing tunnel_key]) >> +ovn_start >> + >> +net_add n1 >> + >> +sim_add hv1 >> +as hv1 >> +check ovs-vsctl add-br br-phys >> +ovn_attach n1 br-phys 192.168.0.11 >> + >> +check ovn-nbctl --wait=hv ls-add ls \ >> + -- lsp-add ls lsp1 \ >> + -- lsp-add ls ls-lr \ >> + -- lr-add lr \ >> + -- lrp-add lr lr-ls f0:00:00:00:00:f1 192.168.1.1/24 \ >> + -- set Logical_Switch_Port ls-lr \ >> + type=router \ >> + options:router-port=lr-ls \ >> + addresses=router \ >> + -- lrp-set-gateway-chassis lr-ls hv1 >> + >> +sleep_controller hv1 >> + >> +check ovn-nbctl --wait=sb set Logical_Switch ls other_config:requested-tnl-key=1000 >> +check ovn-nbctl --wait=sb ls-del ls >> +wake_up_controller hv1 >> + >> +check ovn-nbctl --wait=hv sync >> + >> +check ovn-nbctl --wait=hv ls-add ls1 \ >> + -- lsp-add ls1 ls1-lr \ >> + -- lrp-add lr lr-ls1 f0:00:00:00:00:f2 192.168.2.1/24 \ >> + -- set Logical_Switch_Port ls1-lr type=router options:router-port=lr-ls1 addresses=router \ >> + -- lrp-set-gateway-chassis lr-ls1 hv1 >> + >> +check ovn-nbctl --wait=hv set Logical_Switch ls1 other_config:requested-tnl-key=1001 >> + >> +OVN_CLEANUP([hv1]) >> + >> +AT_CLEANUP >> +]) > > Except for the minor nits above, the patch looks good to me: > Acked-by: Dumitru Ceara <dceara@redhat.com> > I changed my ack into a signoff and applied the patch to main and all stable branches down to 22.03. Thanks, Xavier, for the fix! Regards, Dumitru
diff --git a/controller/local_data.c b/controller/local_data.c index 3a7d3afeb..8f0890a14 100644 --- a/controller/local_data.c +++ b/controller/local_data.c @@ -56,6 +56,20 @@ static bool datapath_is_transit_switch(const struct sbrec_datapath_binding *); static uint64_t local_datapath_usage; +/* To be used when hmap_node.hash might be wrong e.g. tunnel_key got updated */ +struct local_datapath * +get_local_datapath_no_hash(const struct hmap *local_datapaths, + uint32_t tunnel_key) +{ + struct local_datapath *ld; + HMAP_FOR_EACH (ld, hmap_node, local_datapaths) { + if (ld->datapath->tunnel_key == tunnel_key) { + return ld; + } + } + return NULL; +} + struct local_datapath * get_local_datapath(const struct hmap *local_datapaths, uint32_t tunnel_key) { diff --git a/controller/local_data.h b/controller/local_data.h index f6d8f725f..1586a76da 100644 --- a/controller/local_data.h +++ b/controller/local_data.h @@ -69,6 +69,10 @@ struct local_datapath *local_datapath_alloc( const struct sbrec_datapath_binding *); struct local_datapath *get_local_datapath(const struct hmap *, uint32_t tunnel_key); +struct local_datapath +*get_local_datapath_no_hash(const struct hmap *local_datapaths, + uint32_t tunnel_key); + bool need_add_peer_to_local( struct ovsdb_idl_index *sbrec_port_binding_by_name, diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c index 760b7788b..e4fb3a5e4 100644 --- a/controller/ovn-controller.c +++ b/controller/ovn-controller.c @@ -1895,6 +1895,7 @@ runtime_data_sb_datapath_binding_handler(struct engine_node *node OVS_UNUSED, engine_get_input("SB_datapath_binding", node)); const struct sbrec_datapath_binding *dp; struct ed_type_runtime_data *rt_data = data; + struct local_datapath *ld; SBREC_DATAPATH_BINDING_TABLE_FOR_EACH_TRACKED (dp, dp_table) { if (sbrec_datapath_binding_is_deleted(dp)) { @@ -1902,6 +1903,28 @@ runtime_data_sb_datapath_binding_handler(struct engine_node *node OVS_UNUSED, dp->tunnel_key)) { return false; } + /* If the tunnel key got updated, get_local_datapath will not find + * the ld. Use get_local_datapath_no_hash which does not + * rely on the hash. + */ + if (sbrec_datapath_binding_is_updated( + dp, SBREC_DATAPATH_BINDING_COL_TUNNEL_KEY)) { + if (get_local_datapath_no_hash(&rt_data->local_datapaths, + dp->tunnel_key)) { + return false; + } + } + } else if (sbrec_datapath_binding_is_updated( + dp, SBREC_DATAPATH_BINDING_COL_TUNNEL_KEY) + && !sbrec_datapath_binding_is_new(dp)) { + /* If the tunnel key is updated, remove the entry (with a wrong + * hash) from the map. It will be (properly) added back later. + */ + if ((ld = get_local_datapath_no_hash(&rt_data->local_datapaths, + dp->tunnel_key))) { + hmap_remove(&rt_data->local_datapaths, &ld->hmap_node); + local_datapath_destroy(ld); + } } } diff --git a/tests/ovn.at b/tests/ovn.at index e8c79512b..969a8fb9c 100644 --- a/tests/ovn.at +++ b/tests/ovn.at @@ -37419,3 +37419,46 @@ check_flow_count hv2 12 OVN_CLEANUP([hv1]) AT_CLEANUP ]) + +OVN_FOR_EACH_NORTHD([ +AT_SETUP([Changing tunnel_key]) +ovn_start + +net_add n1 + +sim_add hv1 +as hv1 +check ovs-vsctl add-br br-phys +ovn_attach n1 br-phys 192.168.0.11 + +check ovn-nbctl --wait=hv ls-add ls \ + -- lsp-add ls lsp1 \ + -- lsp-add ls ls-lr \ + -- lr-add lr \ + -- lrp-add lr lr-ls f0:00:00:00:00:f1 192.168.1.1/24 \ + -- set Logical_Switch_Port ls-lr \ + type=router \ + options:router-port=lr-ls \ + addresses=router \ + -- lrp-set-gateway-chassis lr-ls hv1 + +sleep_controller hv1 + +check ovn-nbctl --wait=sb set Logical_Switch ls other_config:requested-tnl-key=1000 +check ovn-nbctl --wait=sb ls-del ls +wake_up_controller hv1 + +check ovn-nbctl --wait=hv sync + +check ovn-nbctl --wait=hv ls-add ls1 \ + -- lsp-add ls1 ls1-lr \ + -- lrp-add lr lr-ls1 f0:00:00:00:00:f2 192.168.2.1/24 \ + -- set Logical_Switch_Port ls1-lr type=router options:router-port=lr-ls1 addresses=router \ + -- lrp-set-gateway-chassis lr-ls1 hv1 + +check ovn-nbctl --wait=hv set Logical_Switch ls1 other_config:requested-tnl-key=1001 + +OVN_CLEANUP([hv1]) + +AT_CLEANUP +])
When a tunnel_key for a datapath was changed, the local_datapaths hmap was not properly updated as the old/initial entry was not removed. - If the datapath was not deleted at the same time, a new entry (for the same dp) was created in the local_datapaths as the previous entry was not found (wrong hash). - If the datapath was deleted at the same time, the former entry also remained (as, again, the hash was wrong). So, we kept a deleted (dp) entry in the hmap, and might crash when we used it later. When tunnel_key is updated for an existing datapath, we now clean the local_datapaths, removing bad entries (i.e entries for which the hash is not the tunnel_key). This issue was identified by flaky failures of test "ovn-ic -- route deletion upon TS deletion". Backtrace: 0 0x0000000000504a9a in hmap_first_with_hash (hmap=0x20f4d90, hmap@entry=0x5d5634, hmap=0x20f4d90, hmap@entry=0x5d5634, hash=1956414673) at ./include/openvswitch/hmap.h:360 1 smap_find__ (smap=smap@entry=0x20f4d90, key=key@entry=0x5d5634 "snat-ct-zone", key_len=key_len@entry=12, hash=1956414673) at lib/smap.c:421 2 0x0000000000505073 in smap_get_node (key=0x5d5634 "snat-ct-zone", smap=0x20f4d90) at lib/smap.c:217 3 smap_get_def (def=0x0, key=0x5d5634 "snat-ct-zone", smap=0x20f4d90) at lib/smap.c:208 4 smap_get (smap=0x20f4d90, key=key@entry=0x5d5634 "snat-ct-zone") at lib/smap.c:200 5 0x0000000000419898 in get_common_nat_zone (ldp=0x20a8c40, ldp=0x20a8c40) at controller/lflow.c:831 6 add_matches_to_flow_table (lflow=lflow@entry=0x21ddf90, ldp=ldp@entry=0x20a8c40, matches=matches@entry=0x211bb50, ptable=ptable@entry=10 '\n', output_ptable=output_ptable@entry=37 '%', ovnacts=ovnacts@entry=0x7ffd19b99de0, ingress=true, l_ctx_in=0x7ffd19b9a300, l_ctx_out=0x7ffd19b9a2c0) at controller/lflow.c:892 7 0x000000000041a29b in consider_logical_flow__ (lflow=lflow@entry=0x21ddf90, dp=0x20eb720, l_ctx_in=l_ctx_in@entry=0x7ffd19b9a300, l_ctx_out=l_ctx_out@entry=0x7ffd19b9a2c0) at controller/lflow.c:1207 8 0x000000000041a587 in consider_logical_flow (lflow=lflow@entry=0x21ddf90, is_recompute=is_recompute@entry=false, l_ctx_in=l_ctx_in@entry=0x7ffd19b9a300, l_ctx_out=l_ctx_out@entry=0x7ffd19b9a2c0) at controller/lflow.c:1276 9 0x000000000041e30f in lflow_handle_changed_flows (l_ctx_in=l_ctx_in@entry=0x7ffd19b9a300, l_ctx_out=l_ctx_out@entry=0x7ffd19b9a2c0) at controller/lflow.c:271 10 0x0000000000439fc7 in lflow_output_sb_logical_flow_handler (node=0x7ffd19ba5b70, data=<optimized out>) at controller/ovn-controller.c:4045 11 0x00000000004682fe in engine_compute (recompute_allowed=<optimized out>, node=<optimized out>) at lib/inc-proc-eng.c:441 12 engine_run_node (recompute_allowed=true, node=0x7ffd19ba5b70) at lib/inc-proc-eng.c:503 13 engine_run (recompute_allowed=recompute_allowed@entry=true) at lib/inc-proc-eng.c:528 14 0x000000000040ade2 in main (argc=<optimized out>, argv=<optimized out>) at controller/ovn-controller.c:5709 Signed-off-by: Xavier Simonart <xsimonar@redhat.com> --- v2: Fixed potential memory leak. v3: - Updated based on Dumitru's feedback: style & fixed another potential memory leak. - Modify test case to detect potential memory leak. - Rebased on latest main. --- controller/local_data.c | 14 ++++++++++++ controller/local_data.h | 4 ++++ controller/ovn-controller.c | 23 ++++++++++++++++++++ tests/ovn.at | 43 +++++++++++++++++++++++++++++++++++++ 4 files changed, 84 insertions(+)