Message ID | 20220615094610.2343717-1-xsimonar@redhat.com |
---|---|
State | Superseded |
Headers | show |
Series | [ovs-dev,v2] controller: avoid recomputes triggered by SBDB Port_Binding updates. | expand |
Context | Check | Description |
---|---|---|
ovsrobot/apply-robot | success | apply and check: success |
ovsrobot/github-robot-_Build_and_Test | success | github build: passed |
ovsrobot/github-robot-_ovn-kubernetes | success | github build: passed |
On 6/15/22 11:46, Xavier Simonart wrote: > When VIF ports are claimed on a chassis, SBDB Port_Binding table is updated. > If the SBDB IDL is still is read-only ("in transaction") when such a update > is required, the update is not possible and recompute is triggered through > I+P failure. > > This situation can happen: > - after updating Port_Binding->chassis to SBDB for one port, in a following > iteration, ovn-controller handles Interface:external_ids:ovn-installed > (for the same port) while SBDB is still read-only. > - after updating Port_Binding->chassis to SBDB for one port, in a following > iteration, ovn-controller updates Port_Binding->chassis for another port, > while SBDB is still read-only. > > This patch prevent the recompute, by having the if-status module > updating the Port_Binding chassis (if needed) when possible. > This does not delay Port_Binding chassis update compared to before this patch. > - With the patch, Port_Binding chassis will be updated as soon as SBDB is > again writable, without recompute. > - Without the patch, Port_Binding chassis was updated as soon as SBDB was > again writable, through a recompute. > > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2059253 > Signed-off-by: Xavier Simonart <xsimonar@redhat.com> > > --- Hi Xavier, Overall I think the approach is good. I do have a question related to I-P of tracked datapaths/ports below and some more rather minor comments. > v2: - handled Dumitru's comments. > - handled Han's comments, mainly ensure we moved out of CLAIMED state > only after updating pb->chassis to guarentee physical flows are installed > when ovn-installed is updated in OVS. > - slighly reorganize the code to isolate 'notify_up = false' cases in > claim_port (i.e. ports such as virtual ports), in the idea of making > future patch preventing recomputes when virtual ports are claimed. > - updated test case to cause more race conditions. > - rebased on origin/main > - note that "additional chassis" as now supported by > "Support LSP:options:requested-chassis as a list" might still cause > recomputes. > - fixed missing flows when Port_Binding chassis was updated by mgr_update > w/o any lflow recalculation. > --- > controller/binding.c | 154 ++++++++++++++++++++++--------- > controller/binding.h | 15 ++- > controller/if-status.c | 179 ++++++++++++++++++++++++++++++++---- > controller/if-status.h | 17 +++- > controller/ovn-controller.c | 121 +++++++++++++++++++++++- > tests/ovn.at | 113 ++++++++++++++++++++++- > 6 files changed, 528 insertions(+), 71 deletions(-) > > diff --git a/controller/binding.c b/controller/binding.c > index 2279570f9..b21577f71 100644 > --- a/controller/binding.c > +++ b/controller/binding.c > @@ -644,11 +644,17 @@ local_binding_get_lport_ofport(const struct shash *local_bindings, > } > > bool > -local_binding_is_up(struct shash *local_bindings, const char *pb_name) > +local_binding_is_up(struct shash *local_bindings, const char *pb_name, > + const struct sbrec_chassis *chassis_rec) > { > struct local_binding *lbinding = > local_binding_find(local_bindings, pb_name); > struct binding_lport *b_lport = local_binding_get_primary_lport(lbinding); > + > + if (b_lport && b_lport->pb->chassis != chassis_rec) { > + return false; > + } > + > if (lbinding && b_lport && lbinding->iface) { > if (b_lport->pb->n_up && !b_lport->pb->up[0]) { > return false; > @@ -660,13 +666,23 @@ local_binding_is_up(struct shash *local_bindings, const char *pb_name) > } > > bool > -local_binding_is_down(struct shash *local_bindings, const char *pb_name) > +local_binding_is_down(struct shash *local_bindings, const char *pb_name, > + const struct sbrec_chassis *chassis_rec) > { > struct local_binding *lbinding = > local_binding_find(local_bindings, pb_name); > > struct binding_lport *b_lport = local_binding_get_primary_lport(lbinding); > > + if (b_lport) { > + if (b_lport->pb->chassis == chassis_rec) { > + return false; > + } else if (b_lport->pb->chassis) { > + VLOG_DBG("lport %s already claimed by other chassis", > + b_lport->pb->logical_port); > + } > + } > + > if (!lbinding) { > return true; > } > @@ -884,37 +900,80 @@ get_lport_type_str(enum en_lport_type lport_type) > OVS_NOT_REACHED(); > } > > -/* For newly claimed ports, if 'notify_up' is 'false': > +void > +set_pb_chassis_in_sbrec(const struct sbrec_port_binding *pb, > + const struct sbrec_chassis *chassis_rec, > + bool is_set) > +{ > + if (pb->chassis != chassis_rec) { > + if (is_set) { > + if (pb->chassis) { > + VLOG_INFO("Changing chassis for lport %s from %s to %s.", > + pb->logical_port, pb->chassis->name, > + chassis_rec->name); > + } else { > + VLOG_INFO("Claiming lport %s for this chassis.", > + pb->logical_port); > + } > + for (int i = 0; i < pb->n_mac; i++) { > + VLOG_INFO("%s: Claiming %s", pb->logical_port, pb->mac[i]); > + } > + sbrec_port_binding_set_chassis(pb, chassis_rec); > + } > + } else if (!is_set) { > + sbrec_port_binding_set_chassis(pb, NULL); > + } > +} > + > +void > +local_binding_set_pb(struct shash *local_bindings, const char *pb_name, > + const struct sbrec_chassis *chassis_rec, > + struct hmap *tracked_datapaths, bool is_set) > +{ > + struct local_binding *lbinding = > + local_binding_find(local_bindings, pb_name); > + struct binding_lport *b_lport = local_binding_get_primary_lport(lbinding); > + > + if (b_lport) { > + set_pb_chassis_in_sbrec(b_lport->pb, chassis_rec, is_set); > + if (tracked_datapaths) { > + update_lport_tracking(b_lport->pb, tracked_datapaths, true); > + } > + } > +} > + > +/* For newly claimed ports: > * - set the 'pb.up' field to true if 'pb' has no 'parent_pb'. > * - set the 'pb.up' field to true if 'parent_pb.up' is 'true' (e.g., for > * container and virtual ports). > - * Otherwise request a notification to be sent when the OVS flows > - * corresponding to 'pb' have been installed. > + * > + * Returns false if lport is not claimed due to 'sb_readonly'. > + * Returns true otherwise. > * > * Note: > - * Updates (directly or through a notification) the 'pb->up' field only if > - * it's explicitly set to 'false'. > + * Updates the 'pb->up' field only if it's explicitly set to 'false'. > * This is to ensure compatibility with older versions of ovn-northd. > */ > -static void > +static bool > claimed_lport_set_up(const struct sbrec_port_binding *pb, > const struct sbrec_port_binding *parent_pb, > - const struct sbrec_chassis *chassis_rec, > - bool notify_up, struct if_status_mgr *if_mgr) > + bool sb_readonly) > { > - if (!notify_up) { > - bool up = true; > - if (!parent_pb || (parent_pb->n_up && parent_pb->up[0])) { > + /* When notify_up is false in claim_port(), no state is created > + * by if_status_mgr. In such cases, return false (i.e. trigger recompute) > + * if we can't update sb (because it is readonly). > + */ > + bool up = true; > + if (!parent_pb || (parent_pb->n_up && parent_pb->up[0])) { > + if (!sb_readonly) { > if (pb->n_up) { > sbrec_port_binding_set_up(pb, &up, 1); > } > + } else if (pb->n_up && !pb->up[0]) { > + return false; > } > - return; > - } > - > - if (pb->chassis != chassis_rec || (pb->n_up && !pb->up[0])) { > - if_status_mgr_claim_iface(if_mgr, pb->logical_port); > } > + return true; > } > > typedef void (*set_func)(const struct sbrec_port_binding *pb, > @@ -1057,37 +1116,35 @@ claim_lport(const struct sbrec_port_binding *pb, > struct hmap *tracked_datapaths, > struct if_status_mgr *if_mgr) > { > - if (!sb_readonly) { > - claimed_lport_set_up(pb, parent_pb, chassis_rec, notify_up, if_mgr); > - } > - > enum can_bind can_bind = lport_can_bind_on_this_chassis(chassis_rec, pb); > bool update_tracked = false; > > if (can_bind == CAN_BIND_AS_MAIN) { > if (pb->chassis != chassis_rec) { > - if (sb_readonly) { > - return false; > - } > - > - if (pb->chassis) { > - VLOG_INFO("Changing chassis for lport %s from %s to %s.", > - pb->logical_port, pb->chassis->name, > - chassis_rec->name); > - } else { > - VLOG_INFO("Claiming lport %s for this chassis.", > - pb->logical_port); > - } > - for (size_t i = 0; i < pb->n_mac; i++) { > - VLOG_INFO("%s: Claiming %s", pb->logical_port, pb->mac[i]); > - } > - > - sbrec_port_binding_set_chassis(pb, chassis_rec); > if (is_additional_chassis(pb, chassis_rec)) { > + if (sb_readonly) { > + return false; > + } > remove_additional_chassis(pb, chassis_rec); > } > update_tracked = true; > } > + if (!notify_up) { > + if (!claimed_lport_set_up(pb, parent_pb, sb_readonly)) { > + return false; > + } > + if (pb->chassis != chassis_rec) { > + if (sb_readonly) { > + return false; > + } > + set_pb_chassis_in_sbrec(pb, chassis_rec, true); > + } > + } else { > + if ((pb->chassis != chassis_rec) || (pb->n_up && !pb->up[0])) { > + if_status_mgr_claim_iface(if_mgr, pb, chassis_rec, > + sb_readonly); > + } > + } > } else if (can_bind == CAN_BIND_AS_ADDITIONAL) { > if (!is_additional_chassis(pb, chassis_rec)) { > if (sb_readonly) { > @@ -1132,7 +1189,8 @@ claim_lport(const struct sbrec_port_binding *pb, > */ > static bool > release_lport_main_chassis(const struct sbrec_port_binding *pb, > - bool sb_readonly) > + bool sb_readonly, > + struct if_status_mgr *if_mgr) > { > if (pb->encap) { > if (sb_readonly) { > @@ -1141,11 +1199,13 @@ release_lport_main_chassis(const struct sbrec_port_binding *pb, > sbrec_port_binding_set_encap(pb, NULL); > } > > + /* If sb readonly, pb->chassis unset through if-status if present. */ > if (pb->chassis) { > - if (sb_readonly) { > + if (!sb_readonly) { > + sbrec_port_binding_set_chassis(pb, NULL); > + } else if (!if_status_mgr_iface_is_present(if_mgr, pb->logical_port)) { I would really love it if we could get rid of the !notify_up special case for ports with parents in the future. That will also allow us to get rid of this check too, right? It can be a future patch though. > return false; > } > - sbrec_port_binding_set_chassis(pb, NULL); > } > > if (pb->virtual_parent) { > @@ -1155,7 +1215,8 @@ release_lport_main_chassis(const struct sbrec_port_binding *pb, > sbrec_port_binding_set_virtual_parent(pb, NULL); > } > > - VLOG_INFO("Releasing lport %s from this chassis.", pb->logical_port); > + VLOG_INFO("Releasing lport %s from this chassis (sb_readonly=%d)", > + pb->logical_port, sb_readonly); > return true; > } > > @@ -1189,7 +1250,7 @@ release_lport(const struct sbrec_port_binding *pb, > struct hmap *tracked_datapaths, struct if_status_mgr *if_mgr) > { > if (pb->chassis == chassis_rec) { > - if (!release_lport_main_chassis(pb, sb_readonly)) { > + if (!release_lport_main_chassis(pb, sb_readonly, if_mgr)) { > return false; > } > } else if (is_additional_chassis(pb, chassis_rec)) { > @@ -1271,7 +1332,7 @@ consider_vif_lport_(const struct sbrec_port_binding *pb, > b_lport->lbinding->iface, > !b_ctx_in->ovnsb_idl_txn, > !parent_pb, b_ctx_out->tracked_dp_bindings, > - b_ctx_out->if_mgr)){ > + b_ctx_out->if_mgr)) { > return false; > } > > @@ -1527,7 +1588,8 @@ consider_localport(const struct sbrec_port_binding *pb, > enum can_bind can_bind = lport_can_bind_on_this_chassis( > b_ctx_in->chassis_rec, pb); > if (can_bind == CAN_BIND_AS_MAIN) { > - if (!release_lport_main_chassis(pb, !b_ctx_in->ovnsb_idl_txn)) { > + if (!release_lport_main_chassis(pb, !b_ctx_in->ovnsb_idl_txn, > + b_ctx_out->if_mgr)) { > return false; > } > } else if (can_bind == CAN_BIND_AS_ADDITIONAL) { > diff --git a/controller/binding.h b/controller/binding.h > index 1fed06674..d20659b0b 100644 > --- a/controller/binding.h > +++ b/controller/binding.h > @@ -151,8 +151,10 @@ const struct sbrec_port_binding *local_binding_get_primary_pb( > ofp_port_t local_binding_get_lport_ofport(const struct shash *local_bindings, > const char *pb_name); > > -bool local_binding_is_up(struct shash *local_bindings, const char *pb_name); > -bool local_binding_is_down(struct shash *local_bindings, const char *pb_name); > +bool local_binding_is_up(struct shash *local_bindings, const char *pb_name, > + const struct sbrec_chassis *); > +bool local_binding_is_down(struct shash *local_bindings, const char *pb_name, > + const struct sbrec_chassis *); > void local_binding_set_up(struct shash *local_bindings, const char *pb_name, > const struct sbrec_chassis *chassis_rec, > const char *ts_now_str, bool sb_readonly, > @@ -160,7 +162,10 @@ void local_binding_set_up(struct shash *local_bindings, const char *pb_name, > void local_binding_set_down(struct shash *local_bindings, const char *pb_name, > const struct sbrec_chassis *chassis_rec, > bool sb_readonly, bool ovs_readonly); > - > +void local_binding_set_pb(struct shash *local_bindings, const char *pb_name, > + const struct sbrec_chassis *chassis_rec, > + struct hmap *tracked_datapaths, > + bool is_set); > void binding_register_ovs_idl(struct ovsdb_idl *); > void binding_run(struct binding_ctx_in *, struct binding_ctx_out *); > bool binding_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn, > @@ -178,6 +183,10 @@ void binding_dump_local_bindings(struct local_binding_data *, struct ds *); > bool is_additional_chassis(const struct sbrec_port_binding *pb, > const struct sbrec_chassis *chassis_rec); > > +void set_pb_chassis_in_sbrec(const struct sbrec_port_binding *pb, > + const struct sbrec_chassis *chassis_rec, > + bool is_set); > + > /* Corresponds to each Port_Binding.type. */ > enum en_lport_type { > LP_UNKNOWN, > diff --git a/controller/if-status.c b/controller/if-status.c > index ad61844d8..5b0eef859 100644 > --- a/controller/if-status.c > +++ b/controller/if-status.c > @@ -24,6 +24,7 @@ > #include "lib/util.h" > #include "timeval.h" > #include "openvswitch/vlog.h" > +#include "lib/ovn-sb-idl.h" > > VLOG_DEFINE_THIS_MODULE(if_status); > > @@ -53,9 +54,11 @@ VLOG_DEFINE_THIS_MODULE(if_status); > */ > > enum if_state { > - OIF_CLAIMED, /* Newly claimed interface. */ > - OIF_INSTALL_FLOWS, /* Already claimed interface for which flows are still > - * being installed. > + OIF_CLAIMED, /* Newly claimed interface. pb->chassis not yet updated. > + */ > + OIF_INSTALL_FLOWS, /* Claimed interface with pb->chassis successfully > + * updated in SB and for which flows are still being > + * installed. > */ > OIF_MARK_UP, /* Interface with flows successfully installed in OVS > * but not yet marked "up" in the binding module (in > @@ -78,6 +81,55 @@ static const char *if_state_names[] = { > [OIF_INSTALLED] = "INSTALLED", > }; > > +/* > + * +----------------------+ > + * +---> | | > + * | +-> | NULL | <--------------------------------------+++-+ > + * | | +----------------------+ | > + * | | ^ release_iface | claim_iface | > + * | | | V - sbrec_update_chassis(if sb is rw) | > + * | | +----------------------+ | > + * | | | | <----------------------------------------+ | > + * | | | CLAIMED | <--------------------------------------+ | | > + * | | +----------------------+ | | | > + * | | | mgr_update(when sb is rw) | | | > + * | | release_iface | - sbrec_update_chassis | | | > + * | | | - request seqno | | | > + * | | V | | | > + * | | +----------------------+ | | | > + * | +-- | | mgr_run(seqno not rcvd) | | | > + * | | INSTALL_FLOWS | - set port down in sb | | | > + * | | | mgr_update() | | | > + * | +----------------------+ - sbrec_update_chassis if needed | | | > + * | | | | | > + * | | mgr_run(seqno rcvd) | | | > + * | | - set port up in sb | | | > + * | release_iface | - set ovn-installed in ovs | | | > + * | V | | | > + * | +----------------------+ | | | > + * | | | mgr_run() | | | > + * +-- | MARK_UP | - set port up in sb | | | > + * | | - set ovn-installed in ovs | | | > + * | | mgr_update() | | | > + * +----------------------+ - sbrec_update_chassis if needed | | | > + * | | | | > + * | mgr_update(rcvd port up / ovn_installed & chassis set) | | | > + * V | | | > + * +----------------------+ | | | > + * | INSTALLED | ------------> claim_iface ---------------+ | | > + * +----------------------+ | | > + * | | | > + * | release_iface | | > + * V | | > + * +----------------------+ | | > + * | | ------------> claim_iface -----------------+ | > + * | MARK_DOWN | ------> mgr_update(rcvd port down) ----------+ > + * | | mgr_run() > + * | | - set port down in sb > + * | | mgr_update() > + * +----------------------+ - sbrec_update_chassis(NULL) > + */ > + Thanks for adding this, it's really useful! > struct ovs_iface { > char *id; /* Extracted from OVS external_ids.iface_id. */ > enum if_state state; /* State of the interface in the state machine. */ > @@ -85,6 +137,7 @@ struct ovs_iface { > * be fully programmed in OVS. Only used in state > * OIF_INSTALL_FLOWS. > */ > + bool chassis_update_required; /* If true, pb->chassis must be updated. */ > }; > > static uint64_t ifaces_usage; > @@ -158,14 +211,23 @@ if_status_mgr_destroy(struct if_status_mgr *mgr) > } > > void > -if_status_mgr_claim_iface(struct if_status_mgr *mgr, const char *iface_id) > +if_status_mgr_claim_iface(struct if_status_mgr *mgr, > + const struct sbrec_port_binding *pb, > + const struct sbrec_chassis *chassis_rec, > + bool sb_readonly) > { > + const char *iface_id = pb->logical_port; > struct ovs_iface *iface = shash_find_data(&mgr->ifaces, iface_id); > > if (!iface) { > iface = ovs_iface_create(mgr, iface_id, OIF_CLAIMED); > } > - > + if (!sb_readonly) { > + set_pb_chassis_in_sbrec(pb, chassis_rec, true); > + iface->chassis_update_required = false; > + } else { > + iface->chassis_update_required = true; > + } > switch (iface->state) { > case OIF_CLAIMED: > case OIF_INSTALL_FLOWS: > @@ -182,6 +244,13 @@ if_status_mgr_claim_iface(struct if_status_mgr *mgr, const char *iface_id) > } > } > > +bool > +if_status_mgr_iface_is_present(struct if_status_mgr *mgr, const char *iface_id) > +{ > + struct ovs_iface *iface = shash_find_data(&mgr->ifaces, iface_id); > + return (!!iface); Nit: I'd rewrite this as: return !!shash_find_data(&mgr->ifaces, iface_id); > +} > + > void > if_status_mgr_release_iface(struct if_status_mgr *mgr, const char *iface_id) > { > @@ -246,9 +315,43 @@ if_status_mgr_delete_iface(struct if_status_mgr *mgr, const char *iface_id) > } > } > > +bool > +if_status_handle_claims(struct if_status_mgr *mgr, > + struct local_binding_data *binding_data, > + const struct sbrec_chassis *chassis_rec, > + struct hmap *tracked_datapath, > + bool sb_readonly) > +{ > + if (!binding_data) { > + return false; > + } > + > + struct shash *bindings = &binding_data->bindings; > + struct hmapx_node *node; > + > + bool rc = false; > + HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_CLAIMED]) { We don't need the _SAFE here as far as I can tell, right? > + struct ovs_iface *iface = node->data; > + if (sb_readonly) { > + return false; > + } > + if (iface->chassis_update_required) { > + VLOG_INFO("if_status_handle_claims for %s", iface->id); > + local_binding_set_pb(bindings, iface->id, chassis_rec, > + tracked_datapath, true); > + rc = true; > + } > + iface->chassis_update_required = false; > + } > + return rc; > +} > + > void > if_status_mgr_update(struct if_status_mgr *mgr, > - struct local_binding_data *binding_data) > + struct local_binding_data *binding_data, > + const struct sbrec_chassis *chassis_rec, > + struct hmap *tracked_datapath, > + bool sb_readonly) > { > if (!binding_data) { > return; > @@ -257,13 +360,25 @@ if_status_mgr_update(struct if_status_mgr *mgr, > struct shash *bindings = &binding_data->bindings; > struct hmapx_node *node; > > + /* Interfaces in OIF_MARK_UP state have already set their pb->chassis. > + * However, it might have been reset by another hv. > + */ > /* Move all interfaces that have been confirmed "up" by the binding module, > * from OIF_MARK_UP to OIF_INSTALLED. > */ > HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_MARK_UP]) { > struct ovs_iface *iface = node->data; > > - if (local_binding_is_up(bindings, iface->id)) { > + if (iface->chassis_update_required) { > + if (!sb_readonly) { > + iface->chassis_update_required = false; > + local_binding_set_pb(bindings, iface->id, chassis_rec, > + tracked_datapath, true); > + } else { > + continue; > + } > + } > + if (local_binding_is_up(bindings, iface->id, chassis_rec)) { > ovs_iface_set_state(mgr, iface, OIF_INSTALLED); > } > } > @@ -274,23 +389,53 @@ if_status_mgr_update(struct if_status_mgr *mgr, > HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_MARK_DOWN]) { > struct ovs_iface *iface = node->data; > > - if (local_binding_is_down(bindings, iface->id)) { > + if (!sb_readonly) { > + local_binding_set_pb(bindings, iface->id, chassis_rec, > + tracked_datapath, false); > + } > + if (local_binding_is_down(bindings, iface->id, chassis_rec)) { > ovs_iface_destroy(mgr, iface); > } > } > > - /* Register for a notification about flows being installed in OVS for all > - * newly claimed interfaces. > + if (!sb_readonly) { > + HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_INSTALL_FLOWS]) { > + struct ovs_iface *iface = node->data; > + > + if (iface->chassis_update_required) { > + iface->chassis_update_required = false; > + local_binding_set_pb(bindings, iface->id, chassis_rec, > + tracked_datapath, true); > + } > + } > + } > + > + /* Update Port_Binding->chassis for newly claimed interfaces > + * Register for a notification about flows being installed in OVS for all > + * newly claimed interfaces for which we could update pb->chassis. > * > * Move them from OIF_CLAIMED to OIF_INSTALL_FLOWS. > */ > - bool new_ifaces = false; > - HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_CLAIMED]) { > - struct ovs_iface *iface = node->data; > > - ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS); > - iface->install_seqno = mgr->iface_seqno + 1; > - new_ifaces = true; > + bool new_ifaces = false; > + if (!sb_readonly) { > + HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_CLAIMED]) { > + struct ovs_iface *iface = node->data; > + if (iface->chassis_update_required) { > + local_binding_set_pb(bindings, iface->id, chassis_rec, > + tracked_datapath, true); > + } > + ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS); > + iface->install_seqno = mgr->iface_seqno + 1; > + iface->chassis_update_required = false; > + new_ifaces = true; > + } > + } else { > + HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_CLAIMED]) { > + struct ovs_iface *iface = node->data; > + VLOG_INFO("Not updating pb chassis for %s now as sb is readonly", > + iface->id); Should we rate limit this? Nit: indentation. > + } > } > > /* Request a seqno update when the flows for new interfaces have been > @@ -403,7 +548,7 @@ if_status_mgr_update_bindings(struct if_status_mgr *mgr, > struct hmapx_node *node; > > /* Notify the binding module to set "down" all bindings that are still > - * in the process of being installed in OVS, i.e., are not yet instsalled. > + * in the process of being installed in OVS, i.e., are not yet installed. > */ > HMAPX_FOR_EACH (node, &mgr->ifaces_per_state[OIF_INSTALL_FLOWS]) { > struct ovs_iface *iface = node->data; > diff --git a/controller/if-status.h b/controller/if-status.h > index bb8a3950d..9b1200ff0 100644 > --- a/controller/if-status.h > +++ b/controller/if-status.h > @@ -27,15 +27,28 @@ struct if_status_mgr *if_status_mgr_create(void); > void if_status_mgr_clear(struct if_status_mgr *); > void if_status_mgr_destroy(struct if_status_mgr *); > > -void if_status_mgr_claim_iface(struct if_status_mgr *, const char *iface_id); > +void if_status_mgr_claim_iface(struct if_status_mgr *, > + const struct sbrec_port_binding *pb, > + const struct sbrec_chassis *chassis_rec, > + bool sb_readonly); > void if_status_mgr_release_iface(struct if_status_mgr *, const char *iface_id); > void if_status_mgr_delete_iface(struct if_status_mgr *, const char *iface_id); > > -void if_status_mgr_update(struct if_status_mgr *, struct local_binding_data *); > +void if_status_mgr_update(struct if_status_mgr *, struct local_binding_data *, > + const struct sbrec_chassis *chassis, > + struct hmap *tracked_dp_bindings, > + bool sb_readonly); > void if_status_mgr_run(struct if_status_mgr *mgr, struct local_binding_data *, > const struct sbrec_chassis *, > bool sb_readonly, bool ovs_readonly); > void if_status_mgr_get_memory_usage(struct if_status_mgr *mgr, > struct simap *usage); > +bool if_status_mgr_iface_is_present(struct if_status_mgr *mgr, > + const char *iface_id); > +bool if_status_handle_claims(struct if_status_mgr *mgr, > + struct local_binding_data *binding_data, > + const struct sbrec_chassis *chassis_rec, > + struct hmap *tracked_datapath, > + bool sb_readonly); > > # endif /* controller/if-status.h */ > diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c > index 2793c8687..1c3cac2c1 100644 > --- a/controller/ovn-controller.c > +++ b/controller/ovn-controller.c > @@ -1417,6 +1417,111 @@ en_runtime_data_run(struct engine_node *node, void *data) > engine_set_node_state(node, EN_UPDATED); > } > > +struct ed_type_sb_ro { > + bool sb_readonly; > +}; > + > +static void * > +en_sb_ro_init(struct engine_node *node OVS_UNUSED, > + struct engine_arg *arg OVS_UNUSED) > +{ > + struct ed_type_sb_ro *data = xzalloc(sizeof *data); > + return data; > +} > + > +static void > +en_sb_ro_run(struct engine_node *node, void *data) > +{ > + struct ed_type_sb_ro *sb_ro_data = data; > + bool sb_readonly = !engine_get_context()->ovnsb_idl_txn; > + if (sb_ro_data->sb_readonly != sb_readonly) { > + sb_ro_data->sb_readonly = sb_readonly; > + if (!sb_ro_data->sb_readonly) { > + engine_set_node_state(node, EN_UPDATED); > + } > + } > +} > + > +static void > +en_sb_ro_cleanup(void *data OVS_UNUSED) > +{ > +} > + > +static bool > +pb_claims_sb_ro_handler(struct engine_node *node, void *data OVS_UNUSED) > +{ > + engine_set_node_state(node, EN_UPDATED); > + return true; > +} > + > +static void * > +en_pb_claims_init(struct engine_node *node OVS_UNUSED, > + struct engine_arg *arg OVS_UNUSED) > +{ > + return NULL; > +} > + > +static void > +en_pb_claims_run(struct engine_node *node OVS_UNUSED, void *data_ OVS_UNUSED) > +{ > +} > + > +static void > +en_pb_claims_cleanup(void *data OVS_UNUSED) > +{ > +} Not really a problem of this patch but maybe we can find a way to avoid having to explcitly define cleanup/run callbacks that don't do much. Maybe some sensible defaults for engine node callbacks would make sense. Han, what do you think? > + > +static bool > +pb_claims_runtime_data_handler(struct engine_node *node OVS_UNUSED, > + void *data OVS_UNUSED) > +{ > + return true; > +} > + > +static bool > +lflow_output_runtime_data_handler(struct engine_node *node, > + void *data); > +static bool > +lflow_output_pb_claims_handler(struct engine_node *node, void *data OVS_UNUSED) > +{ > + const struct sbrec_chassis *chassis = NULL; > + > + struct ovsrec_open_vswitch_table *ovs_table = > + (struct ovsrec_open_vswitch_table *)EN_OVSDB_GET( > + engine_get_input("OVS_open_vswitch", node)); > + > + const char *chassis_id = get_ovs_chassis_id(ovs_table); > + > + struct ovsdb_idl_index *sbrec_chassis_by_name = > + engine_ovsdb_node_get_index( > + engine_get_input("SB_chassis", node), > + "name"); > + > + if (chassis_id) { > + chassis = chassis_lookup_by_name(sbrec_chassis_by_name, chassis_id); > + } > + if (chassis) { > + struct ed_type_runtime_data *runtime_data = > + engine_get_input_data("runtime_data", node); > + bool sb_readonly = !engine_get_context()->ovnsb_idl_txn; > + struct controller_engine_ctx *ctrl_ctx = > + engine_get_context()->client_ctx; > + > + if (if_status_handle_claims(ctrl_ctx->if_mgr, > + &runtime_data->lbinding_data, > + chassis, > + &runtime_data->tracked_dp_bindings, > + sb_readonly)) { > + struct engine_node *rt_node = > + engine_get_input("runtime_data", node); > + if (!engine_node_changed(rt_node)) { > + lflow_output_runtime_data_handler(node, data); Instead of explicitly calling another input's handler here I think I would factor out the logic of the handler into a separate helper function which we could call from lflow_output_runtime_data_handler() and here. Based on its return value we could set the node state, if needed. > + } > + } > + } > + return true; > +} > + > static bool > runtime_data_ovs_interface_shadow_handler(struct engine_node *node, void *data) > { > @@ -3438,6 +3543,8 @@ main(int argc, char *argv[]) > stopwatch_create(VIF_PLUG_RUN_STOPWATCH_NAME, SW_MS); > > /* Define inc-proc-engine nodes. */ > + ENGINE_NODE(sb_ro, "sb_ro"); > + ENGINE_NODE(pb_claims, "pb_claims"); > ENGINE_NODE_WITH_CLEAR_TRACK_DATA_IS_VALID(ct_zones, "ct_zones"); > ENGINE_NODE_WITH_CLEAR_TRACK_DATA(ovs_interface_shadow, > "ovs_interface_shadow"); > @@ -3477,6 +3584,13 @@ main(int argc, char *argv[]) > engine_add_input(&en_non_vif_data, &en_ovs_interface, > non_vif_data_ovs_iface_handler); > > + /* This ensures we run after runtime data. */ > + engine_add_input(&en_pb_claims, &en_runtime_data, > + pb_claims_runtime_data_handler); This is neat but I think we need to at least add a comment to lflow_output_pb_claims_handler() to explain why it's ok to access the runtime_data input there. Also, we can use engine_noop_handler() instead of defining an explicit one. > + > + /* This ensures we always run when sb becomes writable. */ > + engine_add_input(&en_pb_claims, &en_sb_ro, pb_claims_sb_ro_handler); > + > /* Note: The order of inputs is important, all OVS interface changes must > * be handled before any ct_zone changes. > */ > @@ -3518,6 +3632,8 @@ main(int argc, char *argv[]) > lflow_output_port_groups_handler); > engine_add_input(&en_lflow_output, &en_runtime_data, > lflow_output_runtime_data_handler); > + engine_add_input(&en_lflow_output, &en_pb_claims, > + lflow_output_pb_claims_handler); > engine_add_input(&en_lflow_output, &en_non_vif_data, > NULL); > > @@ -3999,7 +4115,10 @@ main(int argc, char *argv[]) > runtime_data ? &runtime_data->lbinding_data : NULL; > stopwatch_start(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME, > time_msec()); > - if_status_mgr_update(if_mgr, binding_data); > + if_status_mgr_update(if_mgr, binding_data, chassis, > + (runtime_data > + ? &runtime_data->tracked_dp_bindings > + : NULL), !ovnsb_idl_txn); I'm not completely sure that this is correct. The engine has already run so any nodes that depend on runtime_data have already been processed. But now we might be adding more "tracked changes" to runtime_data, e.g.: if_status_mgr_update() -> local_binding_set_pb() -> update_lport_tracking() -> tracked_datapath_lport_add(pb, TRACKED_RESOURCE_NEW, ..) After this point these tracked changes will not be processed anymore and will be cleared before the next run starts in the next iteration in en_runtime_data_clear_tracked_data(). The question is: are we ever adding more "tracked changes" to runtime_data->tracked_dp_bindings here? If the answer is "yes" then this approach might be problematic. If the answer is "no" then we might as well avoid passing the runtime_data->tracked_dp_bindings all the way down and just call local_binding_set_pb() with tracked_datapaths set to NULL where needed. > stopwatch_stop(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME, > time_msec()); > > diff --git a/tests/ovn.at b/tests/ovn.at > index 59d51f3e0..e9c061939 100644 > --- a/tests/ovn.at > +++ b/tests/ovn.at > @@ -14990,7 +14990,10 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep actions=output:1], > echo "verifying that lsp0 binding moves when requested-chassis is changed" > > ovn-nbctl lsp-set-options lsp0 requested-chassis=hv2 > -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0 from this chassis" hv1/ovn-controller.log)]) > + > +# We might see multiple "Releasing lport ...", when sb is read only > +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0 from this chassis" hv1/ovn-controller.log)]) > + > wait_column "$hv2_uuid" Port_Binding chassis logical_port=lsp0 > > # (6) Chassis hv2 should add flows and hv1 should not. > @@ -15036,7 +15039,7 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=0 | grep in_port=1], [0], [ig > AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep actions=output:1], [0], [ignore]) > > check ovn-nbctl --wait=hv lsp-set-options lsp0 requested-chassis=non-existant-chassis > -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0 from this chassis" hv1/ovn-controller.log)]) > +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0 from this chassis" hv1/ovn-controller.log)]) > check ovn-nbctl --wait=hv sync > wait_column '' Port_Binding chasssi logical_port=lsp0 > AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=0 | grep in_port=1], [1], []) > @@ -32041,3 +32044,109 @@ AT_CHECK([test $(ovn-sbctl list fdb | grep -c "00:00:00:00:10:30") = 0]) > OVN_CLEANUP([hv1]) > AT_CLEANUP > ]) > + > +# RUN_OVN_NBCTL() executes list of commands built by the OVN_NBCTL() macro. > +m4_define([OVN_NBCTL], [ > + command="${command} -- $1" > +]) > + > +m4_define([RUN_OVN_NBCTL], [ > + check ovn-nbctl ${command} > + unset command > +]) Why can't we factor this out to ovn-macros.at (and remove the version from perf-northd.at too)? > + > +OVN_FOR_EACH_NORTHD([ > +AT_SETUP([recomputes]) As this is a test that is also performance related I think we could add AT_KEYWORDS([performance]). We could also add that to the tests in perf-northd.at. But we can do all this as a follow up, I guess, to allow us to run all performance related tests separately if needed. > +ovn_start > + > +n_hv=4 > + > +# Add chassis > +net_add n1 > +for i in $(seq 1 $n_hv); do > + sim_add hv$i > + as hv$i > + check ovs-vsctl add-br br-phys > + ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys > + ovn_attach n1 br-phys 192.168.0.$i 24 geneve > +done > + > +add_switch_ports() { > + start_port=$1 > + end_port=$2 > + nb_hv=$3 > + bulk_size=$4 > + for ((i=start_port; i<end_port; )) do > + start_bulk=$i > + for hv in $(seq 1 $nb_hv); do > + end_bulk=$((start_bulk+bulk_size-1)) > + for port in $(seq $start_bulk $end_bulk); do > + logical_switch_port=lsp${port} > + OVN_NBCTL(lsp-add ls1 $logical_switch_port) > + OVN_NBCTL(lsp-set-addresses $logical_switch_port dynamic) > + done > + start_bulk=$((end_bulk+1)) > + done > + RUN_OVN_NBCTL() > + > + start_bulk=$i > + for hv in $(seq 1 $nb_hv); do > + end_bulk=$((start_bulk+bulk_size-1)) > + for port in $(seq $start_bulk $end_bulk); do > + logical_switch_port=lsp${port} > + as hv$hv ovs-vsctl \ > + --no-wait -- add-port br-int vif${port} \ > + -- set Interface vif${port} external_ids:iface-id=$logical_switch_port > + done > + start_bulk=$((end_bulk+1)) > + done > + i=$((end_bulk+1)) > + done > + > + # Claiming still WIP here but check for recomputes so test can fail faster > + AT_CHECK([test $(ovn-appctl -t ovn-controller coverage/read-counter lflow_run) == $lflow_run]) > +} > +check ovn-nbctl ls-add ls1 > +check ovn-nbctl set Logical_Switch ls1 other_config:subnet=10.1.0.0/16 > +check ovn-nbctl set Logical_Switch ls1 other_config:exclude_ips=10.1.255.254 > + > +check ovn-nbctl lr-add lr1 > +check ovn-nbctl lsp-add ls1 lsp0 -- set Logical_Switch_Port lsp0 type=router options:router-port=lrp0 addresses=dynamic > +check ovn-nbctl lrp-add lr1 lrp0 "f0:00:00:01:00:01" 10.1.255.254/16 > +check ovn-nbctl lr-nat-add lr1 snat 10.2.0.1 10.1.0.0/16 > + > +check ovn-nbctl --wait=hv sync > +lflow_run=$(ovn-appctl -t ovn-controller coverage/read-counter lflow_run) > + > +add_switch_ports 1 1000 $n_hv 5 > + > +wait_for_ports_up > +check ovn-nbctl --wait=hv sync > + > +for i in $(seq 1 $n_hv); do > + pid=$(cat hv${i}/ovn-controller.pid) > + u=$((u+$(cat "/proc/$pid/stat" | awk '{print $14}'))) > + s=$((s+$(cat "/proc/$pid/stat" | awk '{print $15}'))) > +done > + > +n_pid=$(cat northd/ovn-northd.pid) > +n_u=$(cat "/proc/$pid/stat" | awk '{print $14}') > +n_s=$(cat "/proc/$pid/stat" | awk '{print $15}') > + > +echo "Total Northd User Time: $n_u" > +echo "Total Northd System Time: $n_s" > +echo "Total Controller User Time: $u" > +echo "Total Controller System Time: $s" > + > +lflow_run1=$(ovn-appctl -t ovn-controller coverage/read-counter lflow_run) > +n_recomputes=`expr $lflow_run1 - $lflow_run` > +echo "$n_recomputes recomputes" > + > +AT_CHECK([test $lflow_run1 == $lflow_run]) > + > +for i in $(seq 2 $n_hv); do > + OVN_CLEANUP_SBOX([hv$i]) > +done > +OVN_CLEANUP([hv1]) > +AT_CLEANUP > +]) Thanks, Dumitru
Hi Dumitru Thanks for the review and the comments. I'll send an update (v3) taking into account your comments. See comments/feedback embedded below Thanks Xavier On Thu, Jun 16, 2022 at 12:55 PM Dumitru Ceara <dceara@redhat.com> wrote: > On 6/15/22 11:46, Xavier Simonart wrote: > > When VIF ports are claimed on a chassis, SBDB Port_Binding table is > updated. > > If the SBDB IDL is still is read-only ("in transaction") when such a > update > > is required, the update is not possible and recompute is triggered > through > > I+P failure. > > > > This situation can happen: > > - after updating Port_Binding->chassis to SBDB for one port, in a > following > > iteration, ovn-controller handles Interface:external_ids:ovn-installed > > (for the same port) while SBDB is still read-only. > > - after updating Port_Binding->chassis to SBDB for one port, in a > following > > iteration, ovn-controller updates Port_Binding->chassis for another > port, > > while SBDB is still read-only. > > > > This patch prevent the recompute, by having the if-status module > > updating the Port_Binding chassis (if needed) when possible. > > This does not delay Port_Binding chassis update compared to before this > patch. > > - With the patch, Port_Binding chassis will be updated as soon as SBDB is > > again writable, without recompute. > > - Without the patch, Port_Binding chassis was updated as soon as SBDB was > > again writable, through a recompute. > > > > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2059253 > > Signed-off-by: Xavier Simonart <xsimonar@redhat.com> > > > > --- > > Hi Xavier, > > Overall I think the approach is good. I do have a question related to > I-P of tracked datapaths/ports below and some more rather minor comments. > > > v2: - handled Dumitru's comments. > > - handled Han's comments, mainly ensure we moved out of CLAIMED > state > > only after updating pb->chassis to guarentee physical flows are > installed > > when ovn-installed is updated in OVS. > > - slighly reorganize the code to isolate 'notify_up = false' cases > in > > claim_port (i.e. ports such as virtual ports), in the idea of > making > > future patch preventing recomputes when virtual ports are claimed. > > - updated test case to cause more race conditions. > > - rebased on origin/main > > - note that "additional chassis" as now supported by > > "Support LSP:options:requested-chassis as a list" might still > cause > > recomputes. > > - fixed missing flows when Port_Binding chassis was updated by > mgr_update > > w/o any lflow recalculation. > > --- > > controller/binding.c | 154 ++++++++++++++++++++++--------- > > controller/binding.h | 15 ++- > > controller/if-status.c | 179 ++++++++++++++++++++++++++++++++---- > > controller/if-status.h | 17 +++- > > controller/ovn-controller.c | 121 +++++++++++++++++++++++- > > tests/ovn.at | 113 ++++++++++++++++++++++- > > 6 files changed, 528 insertions(+), 71 deletions(-) > > > > diff --git a/controller/binding.c b/controller/binding.c > > index 2279570f9..b21577f71 100644 > > --- a/controller/binding.c > > +++ b/controller/binding.c > > @@ -644,11 +644,17 @@ local_binding_get_lport_ofport(const struct shash > *local_bindings, > > } > > > > bool > > -local_binding_is_up(struct shash *local_bindings, const char *pb_name) > > +local_binding_is_up(struct shash *local_bindings, const char *pb_name, > > + const struct sbrec_chassis *chassis_rec) > > { > > struct local_binding *lbinding = > > local_binding_find(local_bindings, pb_name); > > struct binding_lport *b_lport = > local_binding_get_primary_lport(lbinding); > > + > > + if (b_lport && b_lport->pb->chassis != chassis_rec) { > > + return false; > > + } > > + > > if (lbinding && b_lport && lbinding->iface) { > > if (b_lport->pb->n_up && !b_lport->pb->up[0]) { > > return false; > > @@ -660,13 +666,23 @@ local_binding_is_up(struct shash *local_bindings, > const char *pb_name) > > } > > > > bool > > -local_binding_is_down(struct shash *local_bindings, const char *pb_name) > > +local_binding_is_down(struct shash *local_bindings, const char *pb_name, > > + const struct sbrec_chassis *chassis_rec) > > { > > struct local_binding *lbinding = > > local_binding_find(local_bindings, pb_name); > > > > struct binding_lport *b_lport = > local_binding_get_primary_lport(lbinding); > > > > + if (b_lport) { > > + if (b_lport->pb->chassis == chassis_rec) { > > + return false; > > + } else if (b_lport->pb->chassis) { > > + VLOG_DBG("lport %s already claimed by other chassis", > > + b_lport->pb->logical_port); > > + } > > + } > > + > > if (!lbinding) { > > return true; > > } > > @@ -884,37 +900,80 @@ get_lport_type_str(enum en_lport_type lport_type) > > OVS_NOT_REACHED(); > > } > > > > -/* For newly claimed ports, if 'notify_up' is 'false': > > +void > > +set_pb_chassis_in_sbrec(const struct sbrec_port_binding *pb, > > + const struct sbrec_chassis *chassis_rec, > > + bool is_set) > > +{ > > + if (pb->chassis != chassis_rec) { > > + if (is_set) { > > + if (pb->chassis) { > > + VLOG_INFO("Changing chassis for lport %s from %s to > %s.", > > + pb->logical_port, pb->chassis->name, > > + chassis_rec->name); > > + } else { > > + VLOG_INFO("Claiming lport %s for this chassis.", > > + pb->logical_port); > > + } > > + for (int i = 0; i < pb->n_mac; i++) { > > + VLOG_INFO("%s: Claiming %s", pb->logical_port, > pb->mac[i]); > > + } > > + sbrec_port_binding_set_chassis(pb, chassis_rec); > > + } > > + } else if (!is_set) { > > + sbrec_port_binding_set_chassis(pb, NULL); > > + } > > +} > > + > > +void > > +local_binding_set_pb(struct shash *local_bindings, const char *pb_name, > > + const struct sbrec_chassis *chassis_rec, > > + struct hmap *tracked_datapaths, bool is_set) > > +{ > > + struct local_binding *lbinding = > > + local_binding_find(local_bindings, pb_name); > > + struct binding_lport *b_lport = > local_binding_get_primary_lport(lbinding); > > + > > + if (b_lport) { > > + set_pb_chassis_in_sbrec(b_lport->pb, chassis_rec, is_set); > > + if (tracked_datapaths) { > > + update_lport_tracking(b_lport->pb, tracked_datapaths, true); > > + } > > + } > > +} > > + > > +/* For newly claimed ports: > > * - set the 'pb.up' field to true if 'pb' has no 'parent_pb'. > > * - set the 'pb.up' field to true if 'parent_pb.up' is 'true' (e.g., > for > > * container and virtual ports). > > - * Otherwise request a notification to be sent when the OVS flows > > - * corresponding to 'pb' have been installed. > > + * > > + * Returns false if lport is not claimed due to 'sb_readonly'. > > + * Returns true otherwise. > > * > > * Note: > > - * Updates (directly or through a notification) the 'pb->up' field > only if > > - * it's explicitly set to 'false'. > > + * Updates the 'pb->up' field only if it's explicitly set to 'false'. > > * This is to ensure compatibility with older versions of ovn-northd. > > */ > > -static void > > +static bool > > claimed_lport_set_up(const struct sbrec_port_binding *pb, > > const struct sbrec_port_binding *parent_pb, > > - const struct sbrec_chassis *chassis_rec, > > - bool notify_up, struct if_status_mgr *if_mgr) > > + bool sb_readonly) > > { > > - if (!notify_up) { > > - bool up = true; > > - if (!parent_pb || (parent_pb->n_up && parent_pb->up[0])) { > > + /* When notify_up is false in claim_port(), no state is created > > + * by if_status_mgr. In such cases, return false (i.e. trigger > recompute) > > + * if we can't update sb (because it is readonly). > > + */ > > + bool up = true; > > + if (!parent_pb || (parent_pb->n_up && parent_pb->up[0])) { > > + if (!sb_readonly) { > > if (pb->n_up) { > > sbrec_port_binding_set_up(pb, &up, 1); > > } > > + } else if (pb->n_up && !pb->up[0]) { > > + return false; > > } > > - return; > > - } > > - > > - if (pb->chassis != chassis_rec || (pb->n_up && !pb->up[0])) { > > - if_status_mgr_claim_iface(if_mgr, pb->logical_port); > > } > > + return true; > > } > > > > typedef void (*set_func)(const struct sbrec_port_binding *pb, > > @@ -1057,37 +1116,35 @@ claim_lport(const struct sbrec_port_binding *pb, > > struct hmap *tracked_datapaths, > > struct if_status_mgr *if_mgr) > > { > > - if (!sb_readonly) { > > - claimed_lport_set_up(pb, parent_pb, chassis_rec, notify_up, > if_mgr); > > - } > > - > > enum can_bind can_bind = > lport_can_bind_on_this_chassis(chassis_rec, pb); > > bool update_tracked = false; > > > > if (can_bind == CAN_BIND_AS_MAIN) { > > if (pb->chassis != chassis_rec) { > > - if (sb_readonly) { > > - return false; > > - } > > - > > - if (pb->chassis) { > > - VLOG_INFO("Changing chassis for lport %s from %s to > %s.", > > - pb->logical_port, pb->chassis->name, > > - chassis_rec->name); > > - } else { > > - VLOG_INFO("Claiming lport %s for this chassis.", > > - pb->logical_port); > > - } > > - for (size_t i = 0; i < pb->n_mac; i++) { > > - VLOG_INFO("%s: Claiming %s", pb->logical_port, > pb->mac[i]); > > - } > > - > > - sbrec_port_binding_set_chassis(pb, chassis_rec); > > if (is_additional_chassis(pb, chassis_rec)) { > > + if (sb_readonly) { > > + return false; > > + } > > remove_additional_chassis(pb, chassis_rec); > > } > > update_tracked = true; > > } > > + if (!notify_up) { > > + if (!claimed_lport_set_up(pb, parent_pb, sb_readonly)) { > > + return false; > > + } > > + if (pb->chassis != chassis_rec) { > > + if (sb_readonly) { > > + return false; > > + } > > + set_pb_chassis_in_sbrec(pb, chassis_rec, true); > > + } > > + } else { > > + if ((pb->chassis != chassis_rec) || (pb->n_up && > !pb->up[0])) { > > + if_status_mgr_claim_iface(if_mgr, pb, chassis_rec, > > + sb_readonly); > > + } > > + } > > } else if (can_bind == CAN_BIND_AS_ADDITIONAL) { > > if (!is_additional_chassis(pb, chassis_rec)) { > > if (sb_readonly) { > > @@ -1132,7 +1189,8 @@ claim_lport(const struct sbrec_port_binding *pb, > > */ > > static bool > > release_lport_main_chassis(const struct sbrec_port_binding *pb, > > - bool sb_readonly) > > + bool sb_readonly, > > + struct if_status_mgr *if_mgr) > > { > > if (pb->encap) { > > if (sb_readonly) { > > @@ -1141,11 +1199,13 @@ release_lport_main_chassis(const struct > sbrec_port_binding *pb, > > sbrec_port_binding_set_encap(pb, NULL); > > } > > > > + /* If sb readonly, pb->chassis unset through if-status if present. > */ > > if (pb->chassis) { > > - if (sb_readonly) { > > + if (!sb_readonly) { > > + sbrec_port_binding_set_chassis(pb, NULL); > > + } else if (!if_status_mgr_iface_is_present(if_mgr, > pb->logical_port)) { > > I would really love it if we could get rid of the !notify_up special > case for ports with parents in the future. That will also allow us to > get rid of this check too, right? It can be a future patch though. > > ack. Will do in further patch. > > return false; > > } > > - sbrec_port_binding_set_chassis(pb, NULL); > > } > > > > if (pb->virtual_parent) { > > @@ -1155,7 +1215,8 @@ release_lport_main_chassis(const struct > sbrec_port_binding *pb, > > sbrec_port_binding_set_virtual_parent(pb, NULL); > > } > > > > - VLOG_INFO("Releasing lport %s from this chassis.", > pb->logical_port); > > + VLOG_INFO("Releasing lport %s from this chassis (sb_readonly=%d)", > > + pb->logical_port, sb_readonly); > > return true; > > } > > > > @@ -1189,7 +1250,7 @@ release_lport(const struct sbrec_port_binding *pb, > > struct hmap *tracked_datapaths, struct if_status_mgr > *if_mgr) > > { > > if (pb->chassis == chassis_rec) { > > - if (!release_lport_main_chassis(pb, sb_readonly)) { > > + if (!release_lport_main_chassis(pb, sb_readonly, if_mgr)) { > > return false; > > } > > } else if (is_additional_chassis(pb, chassis_rec)) { > > @@ -1271,7 +1332,7 @@ consider_vif_lport_(const struct > sbrec_port_binding *pb, > > b_lport->lbinding->iface, > > !b_ctx_in->ovnsb_idl_txn, > > !parent_pb, b_ctx_out->tracked_dp_bindings, > > - b_ctx_out->if_mgr)){ > > + b_ctx_out->if_mgr)) { > > return false; > > } > > > > @@ -1527,7 +1588,8 @@ consider_localport(const struct sbrec_port_binding > *pb, > > enum can_bind can_bind = lport_can_bind_on_this_chassis( > > b_ctx_in->chassis_rec, pb); > > if (can_bind == CAN_BIND_AS_MAIN) { > > - if (!release_lport_main_chassis(pb, !b_ctx_in->ovnsb_idl_txn)) { > > + if (!release_lport_main_chassis(pb, !b_ctx_in->ovnsb_idl_txn, > > + b_ctx_out->if_mgr)) { > > return false; > > } > > } else if (can_bind == CAN_BIND_AS_ADDITIONAL) { > > diff --git a/controller/binding.h b/controller/binding.h > > index 1fed06674..d20659b0b 100644 > > --- a/controller/binding.h > > +++ b/controller/binding.h > > @@ -151,8 +151,10 @@ const struct sbrec_port_binding > *local_binding_get_primary_pb( > > ofp_port_t local_binding_get_lport_ofport(const struct shash > *local_bindings, > > const char *pb_name); > > > > -bool local_binding_is_up(struct shash *local_bindings, const char > *pb_name); > > -bool local_binding_is_down(struct shash *local_bindings, const char > *pb_name); > > +bool local_binding_is_up(struct shash *local_bindings, const char > *pb_name, > > + const struct sbrec_chassis *); > > +bool local_binding_is_down(struct shash *local_bindings, const char > *pb_name, > > + const struct sbrec_chassis *); > > void local_binding_set_up(struct shash *local_bindings, const char > *pb_name, > > const struct sbrec_chassis *chassis_rec, > > const char *ts_now_str, bool sb_readonly, > > @@ -160,7 +162,10 @@ void local_binding_set_up(struct shash > *local_bindings, const char *pb_name, > > void local_binding_set_down(struct shash *local_bindings, const char > *pb_name, > > const struct sbrec_chassis *chassis_rec, > > bool sb_readonly, bool ovs_readonly); > > - > > +void local_binding_set_pb(struct shash *local_bindings, const char > *pb_name, > > + const struct sbrec_chassis *chassis_rec, > > + struct hmap *tracked_datapaths, > > + bool is_set); > > void binding_register_ovs_idl(struct ovsdb_idl *); > > void binding_run(struct binding_ctx_in *, struct binding_ctx_out *); > > bool binding_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn, > > @@ -178,6 +183,10 @@ void binding_dump_local_bindings(struct > local_binding_data *, struct ds *); > > bool is_additional_chassis(const struct sbrec_port_binding *pb, > > const struct sbrec_chassis *chassis_rec); > > > > +void set_pb_chassis_in_sbrec(const struct sbrec_port_binding *pb, > > + const struct sbrec_chassis *chassis_rec, > > + bool is_set); > > + > > /* Corresponds to each Port_Binding.type. */ > > enum en_lport_type { > > LP_UNKNOWN, > > diff --git a/controller/if-status.c b/controller/if-status.c > > index ad61844d8..5b0eef859 100644 > > --- a/controller/if-status.c > > +++ b/controller/if-status.c > > @@ -24,6 +24,7 @@ > > #include "lib/util.h" > > #include "timeval.h" > > #include "openvswitch/vlog.h" > > +#include "lib/ovn-sb-idl.h" > > > > VLOG_DEFINE_THIS_MODULE(if_status); > > > > @@ -53,9 +54,11 @@ VLOG_DEFINE_THIS_MODULE(if_status); > > */ > > > > enum if_state { > > - OIF_CLAIMED, /* Newly claimed interface. */ > > - OIF_INSTALL_FLOWS, /* Already claimed interface for which flows are > still > > - * being installed. > > + OIF_CLAIMED, /* Newly claimed interface. pb->chassis not yet > updated. > > + */ > > + OIF_INSTALL_FLOWS, /* Claimed interface with pb->chassis > successfully > > + * updated in SB and for which flows are still > being > > + * installed. > > */ > > OIF_MARK_UP, /* Interface with flows successfully installed > in OVS > > * but not yet marked "up" in the binding module > (in > > @@ -78,6 +81,55 @@ static const char *if_state_names[] = { > > [OIF_INSTALLED] = "INSTALLED", > > }; > > > > +/* > > + * +----------------------+ > > + * +---> | | > > + * | +-> | NULL | > <--------------------------------------+++-+ > > + * | | +----------------------+ > | > > + * | | ^ release_iface | claim_iface > | > > + * | | | V - sbrec_update_chassis(if sb is rw) > | > > + * | | +----------------------+ > | > > + * | | | | > <----------------------------------------+ | > > + * | | | CLAIMED | > <--------------------------------------+ | | > > + * | | +----------------------+ > | | | > > + * | | | mgr_update(when sb is rw) > | | | > > + * | | release_iface | - sbrec_update_chassis > | | | > > + * | | | - request seqno > | | | > > + * | | V > | | | > > + * | | +----------------------+ > | | | > > + * | +-- | | mgr_run(seqno not rcvd) > | | | > > + * | | INSTALL_FLOWS | - set port down in sb > | | | > > + * | | | mgr_update() > | | | > > + * | +----------------------+ - sbrec_update_chassis if needed > | | | > > + * | | > | | | > > + * | | mgr_run(seqno rcvd) > | | | > > + * | | - set port up in sb > | | | > > + * | release_iface | - set ovn-installed in ovs > | | | > > + * | V > | | | > > + * | +----------------------+ > | | | > > + * | | | mgr_run() > | | | > > + * +-- | MARK_UP | - set port up in sb > | | | > > + * | | - set ovn-installed in ovs > | | | > > + * | | mgr_update() > | | | > > + * +----------------------+ - sbrec_update_chassis if needed > | | | > > + * | > | | | > > + * | mgr_update(rcvd port up / ovn_installed & chassis > set) | | | > > + * V > | | | > > + * +----------------------+ > | | | > > + * | INSTALLED | ------------> claim_iface > ---------------+ | | > > + * +----------------------+ > | | > > + * | > | | > > + * | release_iface > | | > > + * V > | | > > + * +----------------------+ > | | > > + * | | ------------> claim_iface > -----------------+ | > > + * | MARK_DOWN | ------> mgr_update(rcvd port down) > ----------+ > > + * | | mgr_run() > > + * | | - set port down in sb > > + * | | mgr_update() > > + * +----------------------+ - sbrec_update_chassis(NULL) > > + */ > > + > > Thanks for adding this, it's really useful! > > > struct ovs_iface { > > char *id; /* Extracted from OVS > external_ids.iface_id. */ > > enum if_state state; /* State of the interface in the state > machine. */ > > @@ -85,6 +137,7 @@ struct ovs_iface { > > * be fully programmed in OVS. Only used > in state > > * OIF_INSTALL_FLOWS. > > */ > > + bool chassis_update_required; /* If true, pb->chassis must be > updated. */ > > }; > > > > static uint64_t ifaces_usage; > > @@ -158,14 +211,23 @@ if_status_mgr_destroy(struct if_status_mgr *mgr) > > } > > > > void > > -if_status_mgr_claim_iface(struct if_status_mgr *mgr, const char > *iface_id) > > +if_status_mgr_claim_iface(struct if_status_mgr *mgr, > > + const struct sbrec_port_binding *pb, > > + const struct sbrec_chassis *chassis_rec, > > + bool sb_readonly) > > { > > + const char *iface_id = pb->logical_port; > > struct ovs_iface *iface = shash_find_data(&mgr->ifaces, iface_id); > > > > if (!iface) { > > iface = ovs_iface_create(mgr, iface_id, OIF_CLAIMED); > > } > > - > > + if (!sb_readonly) { > > + set_pb_chassis_in_sbrec(pb, chassis_rec, true); > > + iface->chassis_update_required = false; > > + } else { > > + iface->chassis_update_required = true; > > + } > > switch (iface->state) { > > case OIF_CLAIMED: > > case OIF_INSTALL_FLOWS: > > @@ -182,6 +244,13 @@ if_status_mgr_claim_iface(struct if_status_mgr > *mgr, const char *iface_id) > > } > > } > > > > +bool > > +if_status_mgr_iface_is_present(struct if_status_mgr *mgr, const char > *iface_id) > > +{ > > + struct ovs_iface *iface = shash_find_data(&mgr->ifaces, iface_id); > > + return (!!iface); > > Nit: I'd rewrite this as: > > return !!shash_find_data(&mgr->ifaces, iface_id); > > > > +} > > + > > void > > if_status_mgr_release_iface(struct if_status_mgr *mgr, const char > *iface_id) > > { > > @@ -246,9 +315,43 @@ if_status_mgr_delete_iface(struct if_status_mgr > *mgr, const char *iface_id) > > } > > } > > > > +bool > > +if_status_handle_claims(struct if_status_mgr *mgr, > > + struct local_binding_data *binding_data, > > + const struct sbrec_chassis *chassis_rec, > > + struct hmap *tracked_datapath, > > + bool sb_readonly) > > +{ > > + if (!binding_data) { > > + return false; > > + } > > + > > + struct shash *bindings = &binding_data->bindings; > > + struct hmapx_node *node; > > + > > + bool rc = false; > > + HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_CLAIMED]) { > > We don't need the _SAFE here as far as I can tell, right? > Correct. > > > + struct ovs_iface *iface = node->data; > > + if (sb_readonly) { > > + return false; > > + } > > + if (iface->chassis_update_required) { > > + VLOG_INFO("if_status_handle_claims for %s", iface->id); > > + local_binding_set_pb(bindings, iface->id, chassis_rec, > > + tracked_datapath, true); > > + rc = true; > > + } > > + iface->chassis_update_required = false; > > + } > > + return rc; > > +} > > + > > void > > if_status_mgr_update(struct if_status_mgr *mgr, > > - struct local_binding_data *binding_data) > > + struct local_binding_data *binding_data, > > + const struct sbrec_chassis *chassis_rec, > > + struct hmap *tracked_datapath, > > + bool sb_readonly) > > { > > if (!binding_data) { > > return; > > @@ -257,13 +360,25 @@ if_status_mgr_update(struct if_status_mgr *mgr, > > struct shash *bindings = &binding_data->bindings; > > struct hmapx_node *node; > > > > + /* Interfaces in OIF_MARK_UP state have already set their > pb->chassis. > > + * However, it might have been reset by another hv. > > + */ > > /* Move all interfaces that have been confirmed "up" by the binding > module, > > * from OIF_MARK_UP to OIF_INSTALLED. > > */ > > HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_MARK_UP]) { > > struct ovs_iface *iface = node->data; > > > > - if (local_binding_is_up(bindings, iface->id)) { > > + if (iface->chassis_update_required) { > > + if (!sb_readonly) { > > + iface->chassis_update_required = false; > > + local_binding_set_pb(bindings, iface->id, chassis_rec, > > + tracked_datapath, true); > > + } else { > > + continue; > > + } > > + } > > + if (local_binding_is_up(bindings, iface->id, chassis_rec)) { > > ovs_iface_set_state(mgr, iface, OIF_INSTALLED); > > } > > } > > @@ -274,23 +389,53 @@ if_status_mgr_update(struct if_status_mgr *mgr, > > HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_MARK_DOWN]) { > > struct ovs_iface *iface = node->data; > > > > - if (local_binding_is_down(bindings, iface->id)) { > > + if (!sb_readonly) { > > + local_binding_set_pb(bindings, iface->id, chassis_rec, > > + tracked_datapath, false); > > + } > > + if (local_binding_is_down(bindings, iface->id, chassis_rec)) { > > ovs_iface_destroy(mgr, iface); > > } > > } > > > > - /* Register for a notification about flows being installed in OVS > for all > > - * newly claimed interfaces. > > + if (!sb_readonly) { > > + HMAPX_FOR_EACH_SAFE (node, > &mgr->ifaces_per_state[OIF_INSTALL_FLOWS]) { > > + struct ovs_iface *iface = node->data; > > + > > + if (iface->chassis_update_required) { > > + iface->chassis_update_required = false; > > + local_binding_set_pb(bindings, iface->id, chassis_rec, > > + tracked_datapath, true); > > + } > > + } > > + } > > + > > + /* Update Port_Binding->chassis for newly claimed interfaces > > + * Register for a notification about flows being installed in OVS > for all > > + * newly claimed interfaces for which we could update pb->chassis. > > * > > * Move them from OIF_CLAIMED to OIF_INSTALL_FLOWS. > > */ > > - bool new_ifaces = false; > > - HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_CLAIMED]) { > > - struct ovs_iface *iface = node->data; > > > > - ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS); > > - iface->install_seqno = mgr->iface_seqno + 1; > > - new_ifaces = true; > > + bool new_ifaces = false; > > + if (!sb_readonly) { > > + HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_CLAIMED]) > { > > + struct ovs_iface *iface = node->data; > > + if (iface->chassis_update_required) { > > + local_binding_set_pb(bindings, iface->id, chassis_rec, > > + tracked_datapath, true); > > + } > > + ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS); > > + iface->install_seqno = mgr->iface_seqno + 1; > > + iface->chassis_update_required = false; > > + new_ifaces = true; > > + } > > + } else { > > + HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_CLAIMED]) > { > > + struct ovs_iface *iface = node->data; > > + VLOG_INFO("Not updating pb chassis for %s now as sb is > readonly", > > + iface->id); > > Should we rate limit this? > Nit: indentation. > Good idea in case sb is really slow. > > > + } > > } > > > > /* Request a seqno update when the flows for new interfaces have > been > > @@ -403,7 +548,7 @@ if_status_mgr_update_bindings(struct if_status_mgr > *mgr, > > struct hmapx_node *node; > > > > /* Notify the binding module to set "down" all bindings that are > still > > - * in the process of being installed in OVS, i.e., are not yet > instsalled. > > + * in the process of being installed in OVS, i.e., are not yet > installed. > > */ > > HMAPX_FOR_EACH (node, &mgr->ifaces_per_state[OIF_INSTALL_FLOWS]) { > > struct ovs_iface *iface = node->data; > > diff --git a/controller/if-status.h b/controller/if-status.h > > index bb8a3950d..9b1200ff0 100644 > > --- a/controller/if-status.h > > +++ b/controller/if-status.h > > @@ -27,15 +27,28 @@ struct if_status_mgr *if_status_mgr_create(void); > > void if_status_mgr_clear(struct if_status_mgr *); > > void if_status_mgr_destroy(struct if_status_mgr *); > > > > -void if_status_mgr_claim_iface(struct if_status_mgr *, const char > *iface_id); > > +void if_status_mgr_claim_iface(struct if_status_mgr *, > > + const struct sbrec_port_binding *pb, > > + const struct sbrec_chassis *chassis_rec, > > + bool sb_readonly); > > void if_status_mgr_release_iface(struct if_status_mgr *, const char > *iface_id); > > void if_status_mgr_delete_iface(struct if_status_mgr *, const char > *iface_id); > > > > -void if_status_mgr_update(struct if_status_mgr *, struct > local_binding_data *); > > +void if_status_mgr_update(struct if_status_mgr *, struct > local_binding_data *, > > + const struct sbrec_chassis *chassis, > > + struct hmap *tracked_dp_bindings, > > + bool sb_readonly); > > void if_status_mgr_run(struct if_status_mgr *mgr, struct > local_binding_data *, > > const struct sbrec_chassis *, > > bool sb_readonly, bool ovs_readonly); > > void if_status_mgr_get_memory_usage(struct if_status_mgr *mgr, > > struct simap *usage); > > +bool if_status_mgr_iface_is_present(struct if_status_mgr *mgr, > > + const char *iface_id); > > +bool if_status_handle_claims(struct if_status_mgr *mgr, > > + struct local_binding_data *binding_data, > > + const struct sbrec_chassis *chassis_rec, > > + struct hmap *tracked_datapath, > > + bool sb_readonly); > > > > # endif /* controller/if-status.h */ > > diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c > > index 2793c8687..1c3cac2c1 100644 > > --- a/controller/ovn-controller.c > > +++ b/controller/ovn-controller.c > > @@ -1417,6 +1417,111 @@ en_runtime_data_run(struct engine_node *node, > void *data) > > engine_set_node_state(node, EN_UPDATED); > > } > > > > +struct ed_type_sb_ro { > > + bool sb_readonly; > > +}; > > + > > +static void * > > +en_sb_ro_init(struct engine_node *node OVS_UNUSED, > > + struct engine_arg *arg OVS_UNUSED) > > +{ > > + struct ed_type_sb_ro *data = xzalloc(sizeof *data); > > + return data; > > +} > > + > > +static void > > +en_sb_ro_run(struct engine_node *node, void *data) > > +{ > > + struct ed_type_sb_ro *sb_ro_data = data; > > + bool sb_readonly = !engine_get_context()->ovnsb_idl_txn; > > + if (sb_ro_data->sb_readonly != sb_readonly) { > > + sb_ro_data->sb_readonly = sb_readonly; > > + if (!sb_ro_data->sb_readonly) { > > + engine_set_node_state(node, EN_UPDATED); > > + } > > + } > > +} > > + > > +static void > > +en_sb_ro_cleanup(void *data OVS_UNUSED) > > +{ > > +} > > + > > +static bool > > +pb_claims_sb_ro_handler(struct engine_node *node, void *data OVS_UNUSED) > > +{ > > + engine_set_node_state(node, EN_UPDATED); > > + return true; > > +} > > + > > +static void * > > +en_pb_claims_init(struct engine_node *node OVS_UNUSED, > > + struct engine_arg *arg OVS_UNUSED) > > +{ > > + return NULL; > > +} > > + > > +static void > > +en_pb_claims_run(struct engine_node *node OVS_UNUSED, void *data_ > OVS_UNUSED) > > +{ > > +} > > + > > +static void > > +en_pb_claims_cleanup(void *data OVS_UNUSED) > > +{ > > +} > > Not really a problem of this patch but maybe we can find a way to avoid > having to explcitly define cleanup/run callbacks that don't do much. > Maybe some sensible defaults for engine node callbacks would make sense. > Han, what do you think? > > > + > > +static bool > > +pb_claims_runtime_data_handler(struct engine_node *node OVS_UNUSED, > > + void *data OVS_UNUSED) > > +{ > > + return true; > > +} > > + > > +static bool > > +lflow_output_runtime_data_handler(struct engine_node *node, > > + void *data); > > +static bool > > +lflow_output_pb_claims_handler(struct engine_node *node, void *data > OVS_UNUSED) > > +{ > > + const struct sbrec_chassis *chassis = NULL; > > + > > + struct ovsrec_open_vswitch_table *ovs_table = > > + (struct ovsrec_open_vswitch_table *)EN_OVSDB_GET( > > + engine_get_input("OVS_open_vswitch", node)); > > + > > + const char *chassis_id = get_ovs_chassis_id(ovs_table); > > + > > + struct ovsdb_idl_index *sbrec_chassis_by_name = > > + engine_ovsdb_node_get_index( > > + engine_get_input("SB_chassis", node), > > + "name"); > > + > > + if (chassis_id) { > > + chassis = chassis_lookup_by_name(sbrec_chassis_by_name, > chassis_id); > > + } > > + if (chassis) { > > + struct ed_type_runtime_data *runtime_data = > > + engine_get_input_data("runtime_data", node); > > + bool sb_readonly = !engine_get_context()->ovnsb_idl_txn; > > + struct controller_engine_ctx *ctrl_ctx = > > + engine_get_context()->client_ctx; > > + > > + if (if_status_handle_claims(ctrl_ctx->if_mgr, > > + &runtime_data->lbinding_data, > > + chassis, > > + &runtime_data->tracked_dp_bindings, > > + sb_readonly)) { > > + struct engine_node *rt_node = > > + engine_get_input("runtime_data", node); > > + if (!engine_node_changed(rt_node)) { > > + lflow_output_runtime_data_handler(node, data); > > Instead of explicitly calling another input's handler here I think I > would factor out the logic of the handler into a separate helper > function which we could call from lflow_output_runtime_data_handler() > and here. Based on its return value we could set the node state, if > needed. > > > + } > > + } > > + } > > + return true; > > +} > > + > > static bool > > runtime_data_ovs_interface_shadow_handler(struct engine_node *node, > void *data) > > { > > @@ -3438,6 +3543,8 @@ main(int argc, char *argv[]) > > stopwatch_create(VIF_PLUG_RUN_STOPWATCH_NAME, SW_MS); > > > > /* Define inc-proc-engine nodes. */ > > + ENGINE_NODE(sb_ro, "sb_ro"); > > + ENGINE_NODE(pb_claims, "pb_claims"); > > ENGINE_NODE_WITH_CLEAR_TRACK_DATA_IS_VALID(ct_zones, "ct_zones"); > > ENGINE_NODE_WITH_CLEAR_TRACK_DATA(ovs_interface_shadow, > > "ovs_interface_shadow"); > > @@ -3477,6 +3584,13 @@ main(int argc, char *argv[]) > > engine_add_input(&en_non_vif_data, &en_ovs_interface, > > non_vif_data_ovs_iface_handler); > > > > + /* This ensures we run after runtime data. */ > > + engine_add_input(&en_pb_claims, &en_runtime_data, > > + pb_claims_runtime_data_handler); > > This is neat but I think we need to at least add a comment to > lflow_output_pb_claims_handler() to explain why it's ok to access the > runtime_data input there. > > Also, we can use engine_noop_handler() instead of defining an explicit one. > > > + > > + /* This ensures we always run when sb becomes writable. */ > > + engine_add_input(&en_pb_claims, &en_sb_ro, pb_claims_sb_ro_handler); > > + > > /* Note: The order of inputs is important, all OVS interface > changes must > > * be handled before any ct_zone changes. > > */ > > @@ -3518,6 +3632,8 @@ main(int argc, char *argv[]) > > lflow_output_port_groups_handler); > > engine_add_input(&en_lflow_output, &en_runtime_data, > > lflow_output_runtime_data_handler); > > + engine_add_input(&en_lflow_output, &en_pb_claims, > > + lflow_output_pb_claims_handler); > > engine_add_input(&en_lflow_output, &en_non_vif_data, > > NULL); > > > > @@ -3999,7 +4115,10 @@ main(int argc, char *argv[]) > > runtime_data ? &runtime_data->lbinding_data : > NULL; > > stopwatch_start(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME, > > time_msec()); > > - if_status_mgr_update(if_mgr, binding_data); > > + if_status_mgr_update(if_mgr, binding_data, chassis, > > + (runtime_data > > + ? > &runtime_data->tracked_dp_bindings > > + : NULL), !ovnsb_idl_txn); > > I'm not completely sure that this is correct. The engine has already > run so any nodes that depend on runtime_data have already been processed. > > But now we might be adding more "tracked changes" to runtime_data, e.g.: > > if_status_mgr_update() > -> local_binding_set_pb() > -> update_lport_tracking() > -> tracked_datapath_lport_add(pb, TRACKED_RESOURCE_NEW, ..) > > After this point these tracked changes will not be processed anymore and > will be cleared before the next run starts in the next iteration in > en_runtime_data_clear_tracked_data(). > > The question is: are we ever adding more "tracked changes" to > runtime_data->tracked_dp_bindings here? If the answer is "yes" then > this approach might be problematic. If the answer is "no" then we might > as well avoid passing the runtime_data->tracked_dp_bindings all the way > down and just call local_binding_set_pb() with tracked_datapaths set to > NULL where needed. > > I think we were adding tracked changes, and so the approach was problematic. I reorg the nodes, and now have only one additional node as input of runtime data. Only if_status_handle_claims or if_status_mgr_claim_iface update tracked data ( as part of I+P). > > stopwatch_stop(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME, > > time_msec()); > > > > diff --git a/tests/ovn.at b/tests/ovn.at > > index 59d51f3e0..e9c061939 100644 > > --- a/tests/ovn.at > > +++ b/tests/ovn.at > > @@ -14990,7 +14990,10 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows br-int > table=65 | grep actions=output:1], > > echo "verifying that lsp0 binding moves when requested-chassis is > changed" > > > > ovn-nbctl lsp-set-options lsp0 requested-chassis=hv2 > > -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0 from this > chassis" hv1/ovn-controller.log)]) > > + > > +# We might see multiple "Releasing lport ...", when sb is read only > > +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0 from this > chassis" hv1/ovn-controller.log)]) > > + > > wait_column "$hv2_uuid" Port_Binding chassis logical_port=lsp0 > > > > # (6) Chassis hv2 should add flows and hv1 should not. > > @@ -15036,7 +15039,7 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows br-int > table=0 | grep in_port=1], [0], [ig > > AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep > actions=output:1], [0], [ignore]) > > > > check ovn-nbctl --wait=hv lsp-set-options lsp0 > requested-chassis=non-existant-chassis > > -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0 from this > chassis" hv1/ovn-controller.log)]) > > +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0 from this > chassis" hv1/ovn-controller.log)]) > > check ovn-nbctl --wait=hv sync > > wait_column '' Port_Binding chasssi logical_port=lsp0 > > AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=0 | grep in_port=1], > [1], []) > > @@ -32041,3 +32044,109 @@ AT_CHECK([test $(ovn-sbctl list fdb | grep -c > "00:00:00:00:10:30") = 0]) > > OVN_CLEANUP([hv1]) > > AT_CLEANUP > > ]) > > + > > +# RUN_OVN_NBCTL() executes list of commands built by the OVN_NBCTL() > macro. > > +m4_define([OVN_NBCTL], [ > > + command="${command} -- $1" > > +]) > > + > > +m4_define([RUN_OVN_NBCTL], [ > > + check ovn-nbctl ${command} > > + unset command > > +]) > > Why can't we factor this out to ovn-macros.at (and remove the version > from perf-northd.at too)? > > Done > > + > > +OVN_FOR_EACH_NORTHD([ > > +AT_SETUP([recomputes]) > > As this is a test that is also performance related I think we could add > AT_KEYWORDS([performance]). We could also add that to the tests in > perf-northd.at. But we can do all this as a follow up, I guess, to > allow us to run all performance related tests separately if needed. > > Will do in further patch. > > +ovn_start > > + > > +n_hv=4 > > + > > +# Add chassis > > +net_add n1 > > +for i in $(seq 1 $n_hv); do > > + sim_add hv$i > > + as hv$i > > + check ovs-vsctl add-br br-phys > > + ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys > > + ovn_attach n1 br-phys 192.168.0.$i 24 geneve > > +done > > + > > +add_switch_ports() { > > + start_port=$1 > > + end_port=$2 > > + nb_hv=$3 > > + bulk_size=$4 > > + for ((i=start_port; i<end_port; )) do > > + start_bulk=$i > > + for hv in $(seq 1 $nb_hv); do > > + end_bulk=$((start_bulk+bulk_size-1)) > > + for port in $(seq $start_bulk $end_bulk); do > > + logical_switch_port=lsp${port} > > + OVN_NBCTL(lsp-add ls1 $logical_switch_port) > > + OVN_NBCTL(lsp-set-addresses $logical_switch_port > dynamic) > > + done > > + start_bulk=$((end_bulk+1)) > > + done > > + RUN_OVN_NBCTL() > > + > > + start_bulk=$i > > + for hv in $(seq 1 $nb_hv); do > > + end_bulk=$((start_bulk+bulk_size-1)) > > + for port in $(seq $start_bulk $end_bulk); do > > + logical_switch_port=lsp${port} > > + as hv$hv ovs-vsctl \ > > + --no-wait -- add-port br-int vif${port} \ > > + -- set Interface vif${port} > external_ids:iface-id=$logical_switch_port > > + done > > + start_bulk=$((end_bulk+1)) > > + done > > + i=$((end_bulk+1)) > > + done > > + > > + # Claiming still WIP here but check for recomputes so test can fail > faster > > + AT_CHECK([test $(ovn-appctl -t ovn-controller coverage/read-counter > lflow_run) == $lflow_run]) > > +} > > +check ovn-nbctl ls-add ls1 > > +check ovn-nbctl set Logical_Switch ls1 other_config:subnet=10.1.0.0/16 > > +check ovn-nbctl set Logical_Switch ls1 > other_config:exclude_ips=10.1.255.254 > > + > > +check ovn-nbctl lr-add lr1 > > +check ovn-nbctl lsp-add ls1 lsp0 -- set Logical_Switch_Port lsp0 > type=router options:router-port=lrp0 addresses=dynamic > > +check ovn-nbctl lrp-add lr1 lrp0 "f0:00:00:01:00:01" 10.1.255.254/16 > > +check ovn-nbctl lr-nat-add lr1 snat 10.2.0.1 10.1.0.0/16 > > + > > +check ovn-nbctl --wait=hv sync > > +lflow_run=$(ovn-appctl -t ovn-controller coverage/read-counter > lflow_run) > > + > > +add_switch_ports 1 1000 $n_hv 5 > > + > > +wait_for_ports_up > > +check ovn-nbctl --wait=hv sync > > + > > +for i in $(seq 1 $n_hv); do > > + pid=$(cat hv${i}/ovn-controller.pid) > > + u=$((u+$(cat "/proc/$pid/stat" | awk '{print $14}'))) > > + s=$((s+$(cat "/proc/$pid/stat" | awk '{print $15}'))) > > +done > > + > > +n_pid=$(cat northd/ovn-northd.pid) > > +n_u=$(cat "/proc/$pid/stat" | awk '{print $14}') > > +n_s=$(cat "/proc/$pid/stat" | awk '{print $15}') > > + > > +echo "Total Northd User Time: $n_u" > > +echo "Total Northd System Time: $n_s" > > +echo "Total Controller User Time: $u" > > +echo "Total Controller System Time: $s" > > + > > +lflow_run1=$(ovn-appctl -t ovn-controller coverage/read-counter > lflow_run) > > +n_recomputes=`expr $lflow_run1 - $lflow_run` > > +echo "$n_recomputes recomputes" > > + > > +AT_CHECK([test $lflow_run1 == $lflow_run]) > > + > > +for i in $(seq 2 $n_hv); do > > + OVN_CLEANUP_SBOX([hv$i]) > > +done > > +OVN_CLEANUP([hv1]) > > +AT_CLEANUP > > +]) > > Thanks, > Dumitru > > Thanks Xavier
diff --git a/controller/binding.c b/controller/binding.c index 2279570f9..b21577f71 100644 --- a/controller/binding.c +++ b/controller/binding.c @@ -644,11 +644,17 @@ local_binding_get_lport_ofport(const struct shash *local_bindings, } bool -local_binding_is_up(struct shash *local_bindings, const char *pb_name) +local_binding_is_up(struct shash *local_bindings, const char *pb_name, + const struct sbrec_chassis *chassis_rec) { struct local_binding *lbinding = local_binding_find(local_bindings, pb_name); struct binding_lport *b_lport = local_binding_get_primary_lport(lbinding); + + if (b_lport && b_lport->pb->chassis != chassis_rec) { + return false; + } + if (lbinding && b_lport && lbinding->iface) { if (b_lport->pb->n_up && !b_lport->pb->up[0]) { return false; @@ -660,13 +666,23 @@ local_binding_is_up(struct shash *local_bindings, const char *pb_name) } bool -local_binding_is_down(struct shash *local_bindings, const char *pb_name) +local_binding_is_down(struct shash *local_bindings, const char *pb_name, + const struct sbrec_chassis *chassis_rec) { struct local_binding *lbinding = local_binding_find(local_bindings, pb_name); struct binding_lport *b_lport = local_binding_get_primary_lport(lbinding); + if (b_lport) { + if (b_lport->pb->chassis == chassis_rec) { + return false; + } else if (b_lport->pb->chassis) { + VLOG_DBG("lport %s already claimed by other chassis", + b_lport->pb->logical_port); + } + } + if (!lbinding) { return true; } @@ -884,37 +900,80 @@ get_lport_type_str(enum en_lport_type lport_type) OVS_NOT_REACHED(); } -/* For newly claimed ports, if 'notify_up' is 'false': +void +set_pb_chassis_in_sbrec(const struct sbrec_port_binding *pb, + const struct sbrec_chassis *chassis_rec, + bool is_set) +{ + if (pb->chassis != chassis_rec) { + if (is_set) { + if (pb->chassis) { + VLOG_INFO("Changing chassis for lport %s from %s to %s.", + pb->logical_port, pb->chassis->name, + chassis_rec->name); + } else { + VLOG_INFO("Claiming lport %s for this chassis.", + pb->logical_port); + } + for (int i = 0; i < pb->n_mac; i++) { + VLOG_INFO("%s: Claiming %s", pb->logical_port, pb->mac[i]); + } + sbrec_port_binding_set_chassis(pb, chassis_rec); + } + } else if (!is_set) { + sbrec_port_binding_set_chassis(pb, NULL); + } +} + +void +local_binding_set_pb(struct shash *local_bindings, const char *pb_name, + const struct sbrec_chassis *chassis_rec, + struct hmap *tracked_datapaths, bool is_set) +{ + struct local_binding *lbinding = + local_binding_find(local_bindings, pb_name); + struct binding_lport *b_lport = local_binding_get_primary_lport(lbinding); + + if (b_lport) { + set_pb_chassis_in_sbrec(b_lport->pb, chassis_rec, is_set); + if (tracked_datapaths) { + update_lport_tracking(b_lport->pb, tracked_datapaths, true); + } + } +} + +/* For newly claimed ports: * - set the 'pb.up' field to true if 'pb' has no 'parent_pb'. * - set the 'pb.up' field to true if 'parent_pb.up' is 'true' (e.g., for * container and virtual ports). - * Otherwise request a notification to be sent when the OVS flows - * corresponding to 'pb' have been installed. + * + * Returns false if lport is not claimed due to 'sb_readonly'. + * Returns true otherwise. * * Note: - * Updates (directly or through a notification) the 'pb->up' field only if - * it's explicitly set to 'false'. + * Updates the 'pb->up' field only if it's explicitly set to 'false'. * This is to ensure compatibility with older versions of ovn-northd. */ -static void +static bool claimed_lport_set_up(const struct sbrec_port_binding *pb, const struct sbrec_port_binding *parent_pb, - const struct sbrec_chassis *chassis_rec, - bool notify_up, struct if_status_mgr *if_mgr) + bool sb_readonly) { - if (!notify_up) { - bool up = true; - if (!parent_pb || (parent_pb->n_up && parent_pb->up[0])) { + /* When notify_up is false in claim_port(), no state is created + * by if_status_mgr. In such cases, return false (i.e. trigger recompute) + * if we can't update sb (because it is readonly). + */ + bool up = true; + if (!parent_pb || (parent_pb->n_up && parent_pb->up[0])) { + if (!sb_readonly) { if (pb->n_up) { sbrec_port_binding_set_up(pb, &up, 1); } + } else if (pb->n_up && !pb->up[0]) { + return false; } - return; - } - - if (pb->chassis != chassis_rec || (pb->n_up && !pb->up[0])) { - if_status_mgr_claim_iface(if_mgr, pb->logical_port); } + return true; } typedef void (*set_func)(const struct sbrec_port_binding *pb, @@ -1057,37 +1116,35 @@ claim_lport(const struct sbrec_port_binding *pb, struct hmap *tracked_datapaths, struct if_status_mgr *if_mgr) { - if (!sb_readonly) { - claimed_lport_set_up(pb, parent_pb, chassis_rec, notify_up, if_mgr); - } - enum can_bind can_bind = lport_can_bind_on_this_chassis(chassis_rec, pb); bool update_tracked = false; if (can_bind == CAN_BIND_AS_MAIN) { if (pb->chassis != chassis_rec) { - if (sb_readonly) { - return false; - } - - if (pb->chassis) { - VLOG_INFO("Changing chassis for lport %s from %s to %s.", - pb->logical_port, pb->chassis->name, - chassis_rec->name); - } else { - VLOG_INFO("Claiming lport %s for this chassis.", - pb->logical_port); - } - for (size_t i = 0; i < pb->n_mac; i++) { - VLOG_INFO("%s: Claiming %s", pb->logical_port, pb->mac[i]); - } - - sbrec_port_binding_set_chassis(pb, chassis_rec); if (is_additional_chassis(pb, chassis_rec)) { + if (sb_readonly) { + return false; + } remove_additional_chassis(pb, chassis_rec); } update_tracked = true; } + if (!notify_up) { + if (!claimed_lport_set_up(pb, parent_pb, sb_readonly)) { + return false; + } + if (pb->chassis != chassis_rec) { + if (sb_readonly) { + return false; + } + set_pb_chassis_in_sbrec(pb, chassis_rec, true); + } + } else { + if ((pb->chassis != chassis_rec) || (pb->n_up && !pb->up[0])) { + if_status_mgr_claim_iface(if_mgr, pb, chassis_rec, + sb_readonly); + } + } } else if (can_bind == CAN_BIND_AS_ADDITIONAL) { if (!is_additional_chassis(pb, chassis_rec)) { if (sb_readonly) { @@ -1132,7 +1189,8 @@ claim_lport(const struct sbrec_port_binding *pb, */ static bool release_lport_main_chassis(const struct sbrec_port_binding *pb, - bool sb_readonly) + bool sb_readonly, + struct if_status_mgr *if_mgr) { if (pb->encap) { if (sb_readonly) { @@ -1141,11 +1199,13 @@ release_lport_main_chassis(const struct sbrec_port_binding *pb, sbrec_port_binding_set_encap(pb, NULL); } + /* If sb readonly, pb->chassis unset through if-status if present. */ if (pb->chassis) { - if (sb_readonly) { + if (!sb_readonly) { + sbrec_port_binding_set_chassis(pb, NULL); + } else if (!if_status_mgr_iface_is_present(if_mgr, pb->logical_port)) { return false; } - sbrec_port_binding_set_chassis(pb, NULL); } if (pb->virtual_parent) { @@ -1155,7 +1215,8 @@ release_lport_main_chassis(const struct sbrec_port_binding *pb, sbrec_port_binding_set_virtual_parent(pb, NULL); } - VLOG_INFO("Releasing lport %s from this chassis.", pb->logical_port); + VLOG_INFO("Releasing lport %s from this chassis (sb_readonly=%d)", + pb->logical_port, sb_readonly); return true; } @@ -1189,7 +1250,7 @@ release_lport(const struct sbrec_port_binding *pb, struct hmap *tracked_datapaths, struct if_status_mgr *if_mgr) { if (pb->chassis == chassis_rec) { - if (!release_lport_main_chassis(pb, sb_readonly)) { + if (!release_lport_main_chassis(pb, sb_readonly, if_mgr)) { return false; } } else if (is_additional_chassis(pb, chassis_rec)) { @@ -1271,7 +1332,7 @@ consider_vif_lport_(const struct sbrec_port_binding *pb, b_lport->lbinding->iface, !b_ctx_in->ovnsb_idl_txn, !parent_pb, b_ctx_out->tracked_dp_bindings, - b_ctx_out->if_mgr)){ + b_ctx_out->if_mgr)) { return false; } @@ -1527,7 +1588,8 @@ consider_localport(const struct sbrec_port_binding *pb, enum can_bind can_bind = lport_can_bind_on_this_chassis( b_ctx_in->chassis_rec, pb); if (can_bind == CAN_BIND_AS_MAIN) { - if (!release_lport_main_chassis(pb, !b_ctx_in->ovnsb_idl_txn)) { + if (!release_lport_main_chassis(pb, !b_ctx_in->ovnsb_idl_txn, + b_ctx_out->if_mgr)) { return false; } } else if (can_bind == CAN_BIND_AS_ADDITIONAL) { diff --git a/controller/binding.h b/controller/binding.h index 1fed06674..d20659b0b 100644 --- a/controller/binding.h +++ b/controller/binding.h @@ -151,8 +151,10 @@ const struct sbrec_port_binding *local_binding_get_primary_pb( ofp_port_t local_binding_get_lport_ofport(const struct shash *local_bindings, const char *pb_name); -bool local_binding_is_up(struct shash *local_bindings, const char *pb_name); -bool local_binding_is_down(struct shash *local_bindings, const char *pb_name); +bool local_binding_is_up(struct shash *local_bindings, const char *pb_name, + const struct sbrec_chassis *); +bool local_binding_is_down(struct shash *local_bindings, const char *pb_name, + const struct sbrec_chassis *); void local_binding_set_up(struct shash *local_bindings, const char *pb_name, const struct sbrec_chassis *chassis_rec, const char *ts_now_str, bool sb_readonly, @@ -160,7 +162,10 @@ void local_binding_set_up(struct shash *local_bindings, const char *pb_name, void local_binding_set_down(struct shash *local_bindings, const char *pb_name, const struct sbrec_chassis *chassis_rec, bool sb_readonly, bool ovs_readonly); - +void local_binding_set_pb(struct shash *local_bindings, const char *pb_name, + const struct sbrec_chassis *chassis_rec, + struct hmap *tracked_datapaths, + bool is_set); void binding_register_ovs_idl(struct ovsdb_idl *); void binding_run(struct binding_ctx_in *, struct binding_ctx_out *); bool binding_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn, @@ -178,6 +183,10 @@ void binding_dump_local_bindings(struct local_binding_data *, struct ds *); bool is_additional_chassis(const struct sbrec_port_binding *pb, const struct sbrec_chassis *chassis_rec); +void set_pb_chassis_in_sbrec(const struct sbrec_port_binding *pb, + const struct sbrec_chassis *chassis_rec, + bool is_set); + /* Corresponds to each Port_Binding.type. */ enum en_lport_type { LP_UNKNOWN, diff --git a/controller/if-status.c b/controller/if-status.c index ad61844d8..5b0eef859 100644 --- a/controller/if-status.c +++ b/controller/if-status.c @@ -24,6 +24,7 @@ #include "lib/util.h" #include "timeval.h" #include "openvswitch/vlog.h" +#include "lib/ovn-sb-idl.h" VLOG_DEFINE_THIS_MODULE(if_status); @@ -53,9 +54,11 @@ VLOG_DEFINE_THIS_MODULE(if_status); */ enum if_state { - OIF_CLAIMED, /* Newly claimed interface. */ - OIF_INSTALL_FLOWS, /* Already claimed interface for which flows are still - * being installed. + OIF_CLAIMED, /* Newly claimed interface. pb->chassis not yet updated. + */ + OIF_INSTALL_FLOWS, /* Claimed interface with pb->chassis successfully + * updated in SB and for which flows are still being + * installed. */ OIF_MARK_UP, /* Interface with flows successfully installed in OVS * but not yet marked "up" in the binding module (in @@ -78,6 +81,55 @@ static const char *if_state_names[] = { [OIF_INSTALLED] = "INSTALLED", }; +/* + * +----------------------+ + * +---> | | + * | +-> | NULL | <--------------------------------------+++-+ + * | | +----------------------+ | + * | | ^ release_iface | claim_iface | + * | | | V - sbrec_update_chassis(if sb is rw) | + * | | +----------------------+ | + * | | | | <----------------------------------------+ | + * | | | CLAIMED | <--------------------------------------+ | | + * | | +----------------------+ | | | + * | | | mgr_update(when sb is rw) | | | + * | | release_iface | - sbrec_update_chassis | | | + * | | | - request seqno | | | + * | | V | | | + * | | +----------------------+ | | | + * | +-- | | mgr_run(seqno not rcvd) | | | + * | | INSTALL_FLOWS | - set port down in sb | | | + * | | | mgr_update() | | | + * | +----------------------+ - sbrec_update_chassis if needed | | | + * | | | | | + * | | mgr_run(seqno rcvd) | | | + * | | - set port up in sb | | | + * | release_iface | - set ovn-installed in ovs | | | + * | V | | | + * | +----------------------+ | | | + * | | | mgr_run() | | | + * +-- | MARK_UP | - set port up in sb | | | + * | | - set ovn-installed in ovs | | | + * | | mgr_update() | | | + * +----------------------+ - sbrec_update_chassis if needed | | | + * | | | | + * | mgr_update(rcvd port up / ovn_installed & chassis set) | | | + * V | | | + * +----------------------+ | | | + * | INSTALLED | ------------> claim_iface ---------------+ | | + * +----------------------+ | | + * | | | + * | release_iface | | + * V | | + * +----------------------+ | | + * | | ------------> claim_iface -----------------+ | + * | MARK_DOWN | ------> mgr_update(rcvd port down) ----------+ + * | | mgr_run() + * | | - set port down in sb + * | | mgr_update() + * +----------------------+ - sbrec_update_chassis(NULL) + */ + struct ovs_iface { char *id; /* Extracted from OVS external_ids.iface_id. */ enum if_state state; /* State of the interface in the state machine. */ @@ -85,6 +137,7 @@ struct ovs_iface { * be fully programmed in OVS. Only used in state * OIF_INSTALL_FLOWS. */ + bool chassis_update_required; /* If true, pb->chassis must be updated. */ }; static uint64_t ifaces_usage; @@ -158,14 +211,23 @@ if_status_mgr_destroy(struct if_status_mgr *mgr) } void -if_status_mgr_claim_iface(struct if_status_mgr *mgr, const char *iface_id) +if_status_mgr_claim_iface(struct if_status_mgr *mgr, + const struct sbrec_port_binding *pb, + const struct sbrec_chassis *chassis_rec, + bool sb_readonly) { + const char *iface_id = pb->logical_port; struct ovs_iface *iface = shash_find_data(&mgr->ifaces, iface_id); if (!iface) { iface = ovs_iface_create(mgr, iface_id, OIF_CLAIMED); } - + if (!sb_readonly) { + set_pb_chassis_in_sbrec(pb, chassis_rec, true); + iface->chassis_update_required = false; + } else { + iface->chassis_update_required = true; + } switch (iface->state) { case OIF_CLAIMED: case OIF_INSTALL_FLOWS: @@ -182,6 +244,13 @@ if_status_mgr_claim_iface(struct if_status_mgr *mgr, const char *iface_id) } } +bool +if_status_mgr_iface_is_present(struct if_status_mgr *mgr, const char *iface_id) +{ + struct ovs_iface *iface = shash_find_data(&mgr->ifaces, iface_id); + return (!!iface); +} + void if_status_mgr_release_iface(struct if_status_mgr *mgr, const char *iface_id) { @@ -246,9 +315,43 @@ if_status_mgr_delete_iface(struct if_status_mgr *mgr, const char *iface_id) } } +bool +if_status_handle_claims(struct if_status_mgr *mgr, + struct local_binding_data *binding_data, + const struct sbrec_chassis *chassis_rec, + struct hmap *tracked_datapath, + bool sb_readonly) +{ + if (!binding_data) { + return false; + } + + struct shash *bindings = &binding_data->bindings; + struct hmapx_node *node; + + bool rc = false; + HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_CLAIMED]) { + struct ovs_iface *iface = node->data; + if (sb_readonly) { + return false; + } + if (iface->chassis_update_required) { + VLOG_INFO("if_status_handle_claims for %s", iface->id); + local_binding_set_pb(bindings, iface->id, chassis_rec, + tracked_datapath, true); + rc = true; + } + iface->chassis_update_required = false; + } + return rc; +} + void if_status_mgr_update(struct if_status_mgr *mgr, - struct local_binding_data *binding_data) + struct local_binding_data *binding_data, + const struct sbrec_chassis *chassis_rec, + struct hmap *tracked_datapath, + bool sb_readonly) { if (!binding_data) { return; @@ -257,13 +360,25 @@ if_status_mgr_update(struct if_status_mgr *mgr, struct shash *bindings = &binding_data->bindings; struct hmapx_node *node; + /* Interfaces in OIF_MARK_UP state have already set their pb->chassis. + * However, it might have been reset by another hv. + */ /* Move all interfaces that have been confirmed "up" by the binding module, * from OIF_MARK_UP to OIF_INSTALLED. */ HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_MARK_UP]) { struct ovs_iface *iface = node->data; - if (local_binding_is_up(bindings, iface->id)) { + if (iface->chassis_update_required) { + if (!sb_readonly) { + iface->chassis_update_required = false; + local_binding_set_pb(bindings, iface->id, chassis_rec, + tracked_datapath, true); + } else { + continue; + } + } + if (local_binding_is_up(bindings, iface->id, chassis_rec)) { ovs_iface_set_state(mgr, iface, OIF_INSTALLED); } } @@ -274,23 +389,53 @@ if_status_mgr_update(struct if_status_mgr *mgr, HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_MARK_DOWN]) { struct ovs_iface *iface = node->data; - if (local_binding_is_down(bindings, iface->id)) { + if (!sb_readonly) { + local_binding_set_pb(bindings, iface->id, chassis_rec, + tracked_datapath, false); + } + if (local_binding_is_down(bindings, iface->id, chassis_rec)) { ovs_iface_destroy(mgr, iface); } } - /* Register for a notification about flows being installed in OVS for all - * newly claimed interfaces. + if (!sb_readonly) { + HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_INSTALL_FLOWS]) { + struct ovs_iface *iface = node->data; + + if (iface->chassis_update_required) { + iface->chassis_update_required = false; + local_binding_set_pb(bindings, iface->id, chassis_rec, + tracked_datapath, true); + } + } + } + + /* Update Port_Binding->chassis for newly claimed interfaces + * Register for a notification about flows being installed in OVS for all + * newly claimed interfaces for which we could update pb->chassis. * * Move them from OIF_CLAIMED to OIF_INSTALL_FLOWS. */ - bool new_ifaces = false; - HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_CLAIMED]) { - struct ovs_iface *iface = node->data; - ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS); - iface->install_seqno = mgr->iface_seqno + 1; - new_ifaces = true; + bool new_ifaces = false; + if (!sb_readonly) { + HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_CLAIMED]) { + struct ovs_iface *iface = node->data; + if (iface->chassis_update_required) { + local_binding_set_pb(bindings, iface->id, chassis_rec, + tracked_datapath, true); + } + ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS); + iface->install_seqno = mgr->iface_seqno + 1; + iface->chassis_update_required = false; + new_ifaces = true; + } + } else { + HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_CLAIMED]) { + struct ovs_iface *iface = node->data; + VLOG_INFO("Not updating pb chassis for %s now as sb is readonly", + iface->id); + } } /* Request a seqno update when the flows for new interfaces have been @@ -403,7 +548,7 @@ if_status_mgr_update_bindings(struct if_status_mgr *mgr, struct hmapx_node *node; /* Notify the binding module to set "down" all bindings that are still - * in the process of being installed in OVS, i.e., are not yet instsalled. + * in the process of being installed in OVS, i.e., are not yet installed. */ HMAPX_FOR_EACH (node, &mgr->ifaces_per_state[OIF_INSTALL_FLOWS]) { struct ovs_iface *iface = node->data; diff --git a/controller/if-status.h b/controller/if-status.h index bb8a3950d..9b1200ff0 100644 --- a/controller/if-status.h +++ b/controller/if-status.h @@ -27,15 +27,28 @@ struct if_status_mgr *if_status_mgr_create(void); void if_status_mgr_clear(struct if_status_mgr *); void if_status_mgr_destroy(struct if_status_mgr *); -void if_status_mgr_claim_iface(struct if_status_mgr *, const char *iface_id); +void if_status_mgr_claim_iface(struct if_status_mgr *, + const struct sbrec_port_binding *pb, + const struct sbrec_chassis *chassis_rec, + bool sb_readonly); void if_status_mgr_release_iface(struct if_status_mgr *, const char *iface_id); void if_status_mgr_delete_iface(struct if_status_mgr *, const char *iface_id); -void if_status_mgr_update(struct if_status_mgr *, struct local_binding_data *); +void if_status_mgr_update(struct if_status_mgr *, struct local_binding_data *, + const struct sbrec_chassis *chassis, + struct hmap *tracked_dp_bindings, + bool sb_readonly); void if_status_mgr_run(struct if_status_mgr *mgr, struct local_binding_data *, const struct sbrec_chassis *, bool sb_readonly, bool ovs_readonly); void if_status_mgr_get_memory_usage(struct if_status_mgr *mgr, struct simap *usage); +bool if_status_mgr_iface_is_present(struct if_status_mgr *mgr, + const char *iface_id); +bool if_status_handle_claims(struct if_status_mgr *mgr, + struct local_binding_data *binding_data, + const struct sbrec_chassis *chassis_rec, + struct hmap *tracked_datapath, + bool sb_readonly); # endif /* controller/if-status.h */ diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c index 2793c8687..1c3cac2c1 100644 --- a/controller/ovn-controller.c +++ b/controller/ovn-controller.c @@ -1417,6 +1417,111 @@ en_runtime_data_run(struct engine_node *node, void *data) engine_set_node_state(node, EN_UPDATED); } +struct ed_type_sb_ro { + bool sb_readonly; +}; + +static void * +en_sb_ro_init(struct engine_node *node OVS_UNUSED, + struct engine_arg *arg OVS_UNUSED) +{ + struct ed_type_sb_ro *data = xzalloc(sizeof *data); + return data; +} + +static void +en_sb_ro_run(struct engine_node *node, void *data) +{ + struct ed_type_sb_ro *sb_ro_data = data; + bool sb_readonly = !engine_get_context()->ovnsb_idl_txn; + if (sb_ro_data->sb_readonly != sb_readonly) { + sb_ro_data->sb_readonly = sb_readonly; + if (!sb_ro_data->sb_readonly) { + engine_set_node_state(node, EN_UPDATED); + } + } +} + +static void +en_sb_ro_cleanup(void *data OVS_UNUSED) +{ +} + +static bool +pb_claims_sb_ro_handler(struct engine_node *node, void *data OVS_UNUSED) +{ + engine_set_node_state(node, EN_UPDATED); + return true; +} + +static void * +en_pb_claims_init(struct engine_node *node OVS_UNUSED, + struct engine_arg *arg OVS_UNUSED) +{ + return NULL; +} + +static void +en_pb_claims_run(struct engine_node *node OVS_UNUSED, void *data_ OVS_UNUSED) +{ +} + +static void +en_pb_claims_cleanup(void *data OVS_UNUSED) +{ +} + +static bool +pb_claims_runtime_data_handler(struct engine_node *node OVS_UNUSED, + void *data OVS_UNUSED) +{ + return true; +} + +static bool +lflow_output_runtime_data_handler(struct engine_node *node, + void *data); +static bool +lflow_output_pb_claims_handler(struct engine_node *node, void *data OVS_UNUSED) +{ + const struct sbrec_chassis *chassis = NULL; + + struct ovsrec_open_vswitch_table *ovs_table = + (struct ovsrec_open_vswitch_table *)EN_OVSDB_GET( + engine_get_input("OVS_open_vswitch", node)); + + const char *chassis_id = get_ovs_chassis_id(ovs_table); + + struct ovsdb_idl_index *sbrec_chassis_by_name = + engine_ovsdb_node_get_index( + engine_get_input("SB_chassis", node), + "name"); + + if (chassis_id) { + chassis = chassis_lookup_by_name(sbrec_chassis_by_name, chassis_id); + } + if (chassis) { + struct ed_type_runtime_data *runtime_data = + engine_get_input_data("runtime_data", node); + bool sb_readonly = !engine_get_context()->ovnsb_idl_txn; + struct controller_engine_ctx *ctrl_ctx = + engine_get_context()->client_ctx; + + if (if_status_handle_claims(ctrl_ctx->if_mgr, + &runtime_data->lbinding_data, + chassis, + &runtime_data->tracked_dp_bindings, + sb_readonly)) { + struct engine_node *rt_node = + engine_get_input("runtime_data", node); + if (!engine_node_changed(rt_node)) { + lflow_output_runtime_data_handler(node, data); + } + } + } + return true; +} + static bool runtime_data_ovs_interface_shadow_handler(struct engine_node *node, void *data) { @@ -3438,6 +3543,8 @@ main(int argc, char *argv[]) stopwatch_create(VIF_PLUG_RUN_STOPWATCH_NAME, SW_MS); /* Define inc-proc-engine nodes. */ + ENGINE_NODE(sb_ro, "sb_ro"); + ENGINE_NODE(pb_claims, "pb_claims"); ENGINE_NODE_WITH_CLEAR_TRACK_DATA_IS_VALID(ct_zones, "ct_zones"); ENGINE_NODE_WITH_CLEAR_TRACK_DATA(ovs_interface_shadow, "ovs_interface_shadow"); @@ -3477,6 +3584,13 @@ main(int argc, char *argv[]) engine_add_input(&en_non_vif_data, &en_ovs_interface, non_vif_data_ovs_iface_handler); + /* This ensures we run after runtime data. */ + engine_add_input(&en_pb_claims, &en_runtime_data, + pb_claims_runtime_data_handler); + + /* This ensures we always run when sb becomes writable. */ + engine_add_input(&en_pb_claims, &en_sb_ro, pb_claims_sb_ro_handler); + /* Note: The order of inputs is important, all OVS interface changes must * be handled before any ct_zone changes. */ @@ -3518,6 +3632,8 @@ main(int argc, char *argv[]) lflow_output_port_groups_handler); engine_add_input(&en_lflow_output, &en_runtime_data, lflow_output_runtime_data_handler); + engine_add_input(&en_lflow_output, &en_pb_claims, + lflow_output_pb_claims_handler); engine_add_input(&en_lflow_output, &en_non_vif_data, NULL); @@ -3999,7 +4115,10 @@ main(int argc, char *argv[]) runtime_data ? &runtime_data->lbinding_data : NULL; stopwatch_start(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME, time_msec()); - if_status_mgr_update(if_mgr, binding_data); + if_status_mgr_update(if_mgr, binding_data, chassis, + (runtime_data + ? &runtime_data->tracked_dp_bindings + : NULL), !ovnsb_idl_txn); stopwatch_stop(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME, time_msec()); diff --git a/tests/ovn.at b/tests/ovn.at index 59d51f3e0..e9c061939 100644 --- a/tests/ovn.at +++ b/tests/ovn.at @@ -14990,7 +14990,10 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep actions=output:1], echo "verifying that lsp0 binding moves when requested-chassis is changed" ovn-nbctl lsp-set-options lsp0 requested-chassis=hv2 -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0 from this chassis" hv1/ovn-controller.log)]) + +# We might see multiple "Releasing lport ...", when sb is read only +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0 from this chassis" hv1/ovn-controller.log)]) + wait_column "$hv2_uuid" Port_Binding chassis logical_port=lsp0 # (6) Chassis hv2 should add flows and hv1 should not. @@ -15036,7 +15039,7 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=0 | grep in_port=1], [0], [ig AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep actions=output:1], [0], [ignore]) check ovn-nbctl --wait=hv lsp-set-options lsp0 requested-chassis=non-existant-chassis -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0 from this chassis" hv1/ovn-controller.log)]) +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0 from this chassis" hv1/ovn-controller.log)]) check ovn-nbctl --wait=hv sync wait_column '' Port_Binding chasssi logical_port=lsp0 AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=0 | grep in_port=1], [1], []) @@ -32041,3 +32044,109 @@ AT_CHECK([test $(ovn-sbctl list fdb | grep -c "00:00:00:00:10:30") = 0]) OVN_CLEANUP([hv1]) AT_CLEANUP ]) + +# RUN_OVN_NBCTL() executes list of commands built by the OVN_NBCTL() macro. +m4_define([OVN_NBCTL], [ + command="${command} -- $1" +]) + +m4_define([RUN_OVN_NBCTL], [ + check ovn-nbctl ${command} + unset command +]) + +OVN_FOR_EACH_NORTHD([ +AT_SETUP([recomputes]) +ovn_start + +n_hv=4 + +# Add chassis +net_add n1 +for i in $(seq 1 $n_hv); do + sim_add hv$i + as hv$i + check ovs-vsctl add-br br-phys + ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys + ovn_attach n1 br-phys 192.168.0.$i 24 geneve +done + +add_switch_ports() { + start_port=$1 + end_port=$2 + nb_hv=$3 + bulk_size=$4 + for ((i=start_port; i<end_port; )) do + start_bulk=$i + for hv in $(seq 1 $nb_hv); do + end_bulk=$((start_bulk+bulk_size-1)) + for port in $(seq $start_bulk $end_bulk); do + logical_switch_port=lsp${port} + OVN_NBCTL(lsp-add ls1 $logical_switch_port) + OVN_NBCTL(lsp-set-addresses $logical_switch_port dynamic) + done + start_bulk=$((end_bulk+1)) + done + RUN_OVN_NBCTL() + + start_bulk=$i + for hv in $(seq 1 $nb_hv); do + end_bulk=$((start_bulk+bulk_size-1)) + for port in $(seq $start_bulk $end_bulk); do + logical_switch_port=lsp${port} + as hv$hv ovs-vsctl \ + --no-wait -- add-port br-int vif${port} \ + -- set Interface vif${port} external_ids:iface-id=$logical_switch_port + done + start_bulk=$((end_bulk+1)) + done + i=$((end_bulk+1)) + done + + # Claiming still WIP here but check for recomputes so test can fail faster + AT_CHECK([test $(ovn-appctl -t ovn-controller coverage/read-counter lflow_run) == $lflow_run]) +} +check ovn-nbctl ls-add ls1 +check ovn-nbctl set Logical_Switch ls1 other_config:subnet=10.1.0.0/16 +check ovn-nbctl set Logical_Switch ls1 other_config:exclude_ips=10.1.255.254 + +check ovn-nbctl lr-add lr1 +check ovn-nbctl lsp-add ls1 lsp0 -- set Logical_Switch_Port lsp0 type=router options:router-port=lrp0 addresses=dynamic +check ovn-nbctl lrp-add lr1 lrp0 "f0:00:00:01:00:01" 10.1.255.254/16 +check ovn-nbctl lr-nat-add lr1 snat 10.2.0.1 10.1.0.0/16 + +check ovn-nbctl --wait=hv sync +lflow_run=$(ovn-appctl -t ovn-controller coverage/read-counter lflow_run) + +add_switch_ports 1 1000 $n_hv 5 + +wait_for_ports_up +check ovn-nbctl --wait=hv sync + +for i in $(seq 1 $n_hv); do + pid=$(cat hv${i}/ovn-controller.pid) + u=$((u+$(cat "/proc/$pid/stat" | awk '{print $14}'))) + s=$((s+$(cat "/proc/$pid/stat" | awk '{print $15}'))) +done + +n_pid=$(cat northd/ovn-northd.pid) +n_u=$(cat "/proc/$pid/stat" | awk '{print $14}') +n_s=$(cat "/proc/$pid/stat" | awk '{print $15}') + +echo "Total Northd User Time: $n_u" +echo "Total Northd System Time: $n_s" +echo "Total Controller User Time: $u" +echo "Total Controller System Time: $s" + +lflow_run1=$(ovn-appctl -t ovn-controller coverage/read-counter lflow_run) +n_recomputes=`expr $lflow_run1 - $lflow_run` +echo "$n_recomputes recomputes" + +AT_CHECK([test $lflow_run1 == $lflow_run]) + +for i in $(seq 2 $n_hv); do + OVN_CLEANUP_SBOX([hv$i]) +done +OVN_CLEANUP([hv1]) +AT_CLEANUP +])
When VIF ports are claimed on a chassis, SBDB Port_Binding table is updated. If the SBDB IDL is still is read-only ("in transaction") when such a update is required, the update is not possible and recompute is triggered through I+P failure. This situation can happen: - after updating Port_Binding->chassis to SBDB for one port, in a following iteration, ovn-controller handles Interface:external_ids:ovn-installed (for the same port) while SBDB is still read-only. - after updating Port_Binding->chassis to SBDB for one port, in a following iteration, ovn-controller updates Port_Binding->chassis for another port, while SBDB is still read-only. This patch prevent the recompute, by having the if-status module updating the Port_Binding chassis (if needed) when possible. This does not delay Port_Binding chassis update compared to before this patch. - With the patch, Port_Binding chassis will be updated as soon as SBDB is again writable, without recompute. - Without the patch, Port_Binding chassis was updated as soon as SBDB was again writable, through a recompute. Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2059253 Signed-off-by: Xavier Simonart <xsimonar@redhat.com> --- v2: - handled Dumitru's comments. - handled Han's comments, mainly ensure we moved out of CLAIMED state only after updating pb->chassis to guarentee physical flows are installed when ovn-installed is updated in OVS. - slighly reorganize the code to isolate 'notify_up = false' cases in claim_port (i.e. ports such as virtual ports), in the idea of making future patch preventing recomputes when virtual ports are claimed. - updated test case to cause more race conditions. - rebased on origin/main - note that "additional chassis" as now supported by "Support LSP:options:requested-chassis as a list" might still cause recomputes. - fixed missing flows when Port_Binding chassis was updated by mgr_update w/o any lflow recalculation. --- controller/binding.c | 154 ++++++++++++++++++++++--------- controller/binding.h | 15 ++- controller/if-status.c | 179 ++++++++++++++++++++++++++++++++---- controller/if-status.h | 17 +++- controller/ovn-controller.c | 121 +++++++++++++++++++++++- tests/ovn.at | 113 ++++++++++++++++++++++- 6 files changed, 528 insertions(+), 71 deletions(-)